“Plus ça change, plus c’est la même chose”
Jean-Baptiste Alphonse Karr
So it’s 2023 and the new [sequencing] kid on the block, MGI, hasn’t figured out adapter trimming.
Here is a quick primer [pun intended] on the easiest way to remove primers and filter reads.
Use fastp and give it this file of Illumina adapters to trim against.
I use this command to filter reads and remove adapters from our sequences:
mkdir -p output
fastp -n 1 -l 100 -i "fastq/$R1" -I "fastq/$R2" -o "output/$R1" -O "output/$R2" --adapter_fasta IlluminaAdapters.fa
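The command above expects $R1 and $R2 to already hold the names of a read pair inside the fastq directory. Here is a minimal sketch of setting them by hand in bash. The file names are made up for illustration, and the _R1/_R2 naming convention is an assumption about your data, so adjust both to match your files.

```shell
# Hypothetical file name; substitute one of your own R1 files.
R1=sample01_R1.fastq.gz

# Bash substitution: the first "_R1" in the name becomes "_R2".
R2=${R1/_R1/_R2}
echo "$R1 $R2"    # prints: sample01_R1.fastq.gz sample01_R2.fastq.gz

# Why match on "_R1" and not just "R1": only the FIRST match is replaced,
# so a sample name that itself contains "R1" would be mangled.
tricky=R1seq_R1.fastq.gz
echo "${tricky/R1/R2}"      # prints: R2seq_R1.fastq.gz  (wrong mate name)
echo "${tricky/_R1/_R2}"    # prints: R1seq_R2.fastq.gz
```

Matching on the underscore as well is a small change, but it avoids pointing fastp at a mate file that does not exist.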
If you have a lot of files, you can easily wrap this in a for loop to process all the files in a directory:
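mkdir -p fastq_fastp
# -printf is a GNU find extension; %f prints just the file name, without the directory
for R1 in $(find fastq -name \*R1\* -printf "%f\n"); do
    # the matching R2 file name: replace the first R1 with R2
    R2=${R1/R1/R2}
    fastp -n 1 -l 100 -i "fastq/$R1" -I "fastq/$R2" -o "fastq_fastp/$R1" -O "fastq_fastp/$R2" --adapter_fasta IlluminaAdapters.fa
done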
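One thing to watch with the loop: by default fastp writes its reports to fastp.json and fastp.html in the current directory, so each iteration overwrites the previous sample's report. Here is a sketch of a variant that keeps one report per sample using fastp's -j and -h options. The function name trim_dir, the <sample>_R1... naming convention, and the report file names are my own assumptions, not something fastp requires.

```shell
#!/usr/bin/env bash
# Sketch: run fastp on every R1/R2 pair in a directory, one report per sample.
# Assumes files are named <sample>_R1<rest> and <sample>_R2<rest>.
trim_dir() {
    local in_dir=$1 out_dir=$2 adapters=${3:-IlluminaAdapters.fa}
    local path r1 r2 sample
    mkdir -p "$out_dir"
    for path in "$in_dir"/*_R1*; do
        [[ -e $path ]] || continue      # the glob matched nothing
        r1=${path##*/}                  # file name without the directory
        r2=${r1/_R1/_R2}                # name of the mate file
        sample=${r1%%_R1*}              # everything before the first _R1
        if [[ ! -e $in_dir/$r2 ]]; then
            echo "skipping $r1: no $r2 found" >&2
            continue
        fi
        fastp -n 1 -l 100 \
            -i "$in_dir/$r1" -I "$in_dir/$r2" \
            -o "$out_dir/$r1" -O "$out_dir/$r2" \
            --adapter_fasta "$adapters" \
            -j "$out_dir/$sample.fastp.json" \
            -h "$out_dir/$sample.fastp.html"
    done
}
```

Calling trim_dir fastq fastq_fastp does the same job as the loop above. The shell glob replaces find, so it also works where find has no -printf, and file names with spaces are handled because every variable is quoted.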
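In addition to trimming all adapter sequences, this will remove any read with more than one N
(fastp discards a read when its N count exceeds the -n limit, so use -n 0 to drop every read that
contains an N), and remove any read shorter than 100 bp. fastp will also keep your reads paired:
if either mate fails a filter, both reads of the pair are discarded.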
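If you want to confirm the pairing for yourself, a quick sanity check is to count the reads in each output file. This is a rough sketch, assuming standard four-line FASTQ records (no wrapped sequence lines). The function names count_reads and check_paired are hypothetical helpers of mine, not part of fastp.

```shell
#!/usr/bin/env bash
# Count reads in a FASTQ file (plain or gzipped), assuming 4 lines per record.
count_reads() {
    local file=$1 lines
    if [[ $file == *.gz ]]; then
        lines=$(gzip -cd "$file" | wc -l)
    else
        lines=$(wc -l < "$file")
    fi
    echo $(( lines / 4 ))
}

# Compare the read counts of an R1/R2 pair; returns non-zero on a mismatch.
check_paired() {
    local n1 n2
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [[ $n1 -eq $n2 ]]; then
        echo "OK: $n1 reads in each file"
    else
        echo "MISMATCH: $1 has $n1 reads, $2 has $n2" >&2
        return 1
    fi
}
```

For example, check_paired fastq_fastp/sample01_R1.fastq.gz fastq_fastp/sample01_R2.fastq.gz (file names made up) should report the same count for both files. Equal counts do not prove the reads are in matching order, but a mismatch is a clear sign something went wrong.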