Sorting FASTQ files by their sequence identifiers

In certain cases, you need to sort FASTQ files by their sequence identifiers (e.g. to restore the matching order of paired-end or mate-pair reads). There are several ways to sort FASTQ files, but the simplest is usually the best. Here is a one-liner that does the job:

cat file.fastq | paste - - - - | sort -k1,1 -t " " | tr "\t" "\n" > file_sorted.fastq

The cat command prints the file content to STDOUT, where the pipe hands it to the next command.

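The paste command joins the four lines of a FASTQ entry into a single line, with the original lines separated by tabs. This assumes every entry occupies exactly four lines, i.e. the sequence and quality strings are not wrapped across several lines.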
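To see what the paste step produces, here is a minimal sketch on a tiny file. The two reads and the file names (paste_demo.fastq, paste_demo.tsv) are made up for illustration, and the tabs are turned into "|" only to make them visible on screen.

```shell
# Two made-up records, deliberately out of order (illustrative data only).
printf '@read2 1:N:0\nACGT\n+\nIIII\n@read1 1:N:0\nTTGA\n+\nFFFF\n' > paste_demo.fastq

# Each "-" tells paste to read one line from STDIN, so four dashes
# consume four lines per output row and join them with tabs.
paste - - - - < paste_demo.fastq > paste_demo.tsv

# Replace the tabs with a visible marker for display.
tr '\t' '|' < paste_demo.tsv
# prints:
# @read2 1:N:0|ACGT|+|IIII
# @read1 1:N:0|TTGA|+|FFFF
```

Each record now sits on one line, which is what allows a line-based tool like sort to move whole records around.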

The sort command sorts the lines by everything before the first space, which is our sequence identifier: -t " " sets the field separator to a space, and -k1,1 restricts the sort key to the first field.
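The effect of the sort key can be checked on a few already-joined records. This is only a sketch: the reads and the file name sort_demo.tsv are invented, and LC_ALL=C is an addition that is not part of the one-liner above. It forces plain byte order so the result does not depend on the machine's locale.

```shell
# Three made-up records, already joined onto one tab-separated line each.
printf '@read2 1:N:0\tACGT\t+\tIIII\n@read10 1:N:0\tGGCA\t+\tHHHH\n@read1 1:N:0\tTTGA\t+\tFFFF\n' > sort_demo.tsv

# -t ' ' makes a space the field separator; -k1,1 limits the key to field 1.
# LC_ALL=C (an assumption added here) selects byte-wise comparison.
# cut keeps only the identifier so the resulting order is easy to read.
LC_ALL=C sort -t ' ' -k1,1 sort_demo.tsv | cut -d ' ' -f 1
# prints:
# @read1
# @read10
# @read2
```

Note that the comparison is textual, so @read10 comes before @read2. For putting two paired files back in step this does not matter, as long as both files are sorted the same way.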

The tr command replaces the tabs with line breaks, which undoes the paste step and restores the four-line layout of each entry.
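One way to convince yourself that tr really reverses paste is to run both steps without the sort and compare the result with the input. The sketch below uses a made-up file, roundtrip_demo.fastq, and assumes no header contains a literal tab.

```shell
# Two made-up records (illustrative data only).
printf '@read2 1:N:0\nACGT\n+\nIIII\n@read1 1:N:0\nTTGA\n+\nFFFF\n' > roundtrip_demo.fastq

# Join, split again, and compare byte for byte with the original.
# cmp -s is silent and reports the outcome through its exit status;
# "-" makes it read the first input from STDIN.
if paste - - - - < roundtrip_demo.fastq | tr '\t' '\n' | cmp -s - roundtrip_demo.fastq; then
    echo "round trip is lossless"
else
    echo "round trip changed the file"
fi
# prints: round trip is lossless
```

If the number of lines in a file is not a multiple of four (for example after a truncated download), paste pads the last row with empty fields and this comparison fails, so it also works as a quick sanity check before sorting.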

The “>” operator redirects the sorted output into the file named after it.
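Putting the pieces together for the paired-end case mentioned at the top, the following is a rough sketch rather than a definitive recipe. The function names sort_fastq and ids are hypothetical helpers, the reads are made up, and the headers are assumed to carry the shared identifier before the first space. It also reads the input with "<" instead of cat and adds LC_ALL=C for a locale-independent order, neither of which is in the original one-liner.

```shell
# sort_fastq is a hypothetical helper: $1 = input FASTQ, $2 = output FASTQ.
sort_fastq() {
    paste - - - - < "$1" | LC_ALL=C sort -t ' ' -k1,1 | tr '\t' '\n' > "$2"
}

# ids is a hypothetical helper: print the first word of every header line.
ids() {
    awk 'NR % 4 == 1 { print $1 }' "$1"
}

# Made-up forward and reverse reads whose records are in different orders.
printf '@r2 1\nAC\n+\nII\n@r1 1\nGT\n+\nFF\n' > R1.fastq
printf '@r1 2\nTT\n+\nII\n@r2 2\nCC\n+\nFF\n' > R2.fastq

sort_fastq R1.fastq R1_sorted.fastq
sort_fastq R2.fastq R2_sorted.fastq

# After sorting, both files should list the same identifiers in the same order.
if [ "$(ids R1_sorted.fastq)" = "$(ids R2_sorted.fastq)" ]; then
    echo "pairs in sync"
else
    echo "pairs out of sync"
fi
# prints: pairs in sync
```

Sorting only lines the mates up if both files contain the same set of identifiers; if reads were filtered from one file but not the other, the check above reports the files as out of sync.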