Changing the label of paired-end sequences in FASTQ files

Programs such as MIRA require paired-end reads to be labeled according to specific rules. If, for example, the two reads of a pair have the same name and are not marked with either /1 and /2 or _F and _R, you can add /1 and /2 suffixes with the following one-liners:

cat file_1.fastq | paste - - | sed 's/^\(\S*\)/\1\/1/' | tr "\t" "\n" > file_1_renamed.fastq

cat file_2.fastq | paste - - | sed 's/^\(\S*\)/\1\/2/' | tr "\t" "\n" > file_2_renamed.fastq
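Before running the one-liners on real data, it can help to try them on a tiny made-up pair of files. The sketch below is only an illustration under these assumptions: the file names demo_1.fastq and demo_2.fastq and the read name read1 are hypothetical, each file holds a single record with the identifier repeated on the + line, and GNU sed is used because \S is a GNU extension.

```shell
#!/usr/bin/env bash
# Demo on made-up reads; demo_1.fastq, demo_2.fastq and "read1" are hypothetical.
# Assumes GNU sed (\S is a GNU sed extension).
set -euo pipefail

# Work in a throwaway directory so no real data is touched
# (remove "$workdir" yourself when done).
workdir=$(mktemp -d)
cd "$workdir"

# One record per file; both mates share the same name, and the
# identifier is repeated on the + line.
printf '@read1\nACGT\n+read1\nIIII\n' > demo_1.fastq
printf '@read1\nTGCA\n+read1\nIIII\n' > demo_2.fastq

# The one-liners from above, with only the file names changed.
cat demo_1.fastq | paste - - | sed 's/^\(\S*\)/\1\/1/' | tr "\t" "\n" > demo_1_renamed.fastq
cat demo_2.fastq | paste - - | sed 's/^\(\S*\)/\1\/2/' | tr "\t" "\n" > demo_2_renamed.fastq

# Print both results for inspection.
cat demo_1_renamed.fastq
cat demo_2_renamed.fastq
```

The renamed files should start with @read1/1 and @read1/2 respectively, and the + lines should carry the same suffix as their headers.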

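The cat command prints the file content to STDOUT.

The paste command joins each pair of consecutive lines into a single tab-separated line, so every four-line FASTQ record becomes two lines: the sequence header with the sequence, and the quality header with the quality string.

The sed command adds /1 or /2 to the first whitespace-free word at the start of each of those lines, i.e. to the sequence identifier and to the identifier on the + line. Note that \S is a GNU sed extension, so this may not work with BSD or macOS sed.

The tr command replaces the tabs with line breaks, which restores the original four-line layout. In effect it undoes the paste step, as long as the reads themselves contain no tab characters.

The ">" sign writes the renamed output to the file specified after it.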
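To see what each step contributes, the pipeline can be run stage by stage on a single made-up record. This is only an illustration of the intermediate output, assuming GNU sed; the read name read1 is hypothetical.

```shell
#!/usr/bin/env bash
# Show the intermediate output of each stage for one made-up FASTQ record.
# "read1" is a hypothetical read name; assumes GNU sed for \S.
set -euo pipefail

# One FASTQ record; printf '%b' expands the \n escapes into line breaks.
record='@read1\nACGT\n+read1\nIIII\n'

# Stage 1: paste - - joins lines 1+2 and 3+4, separated by a tab.
after_paste=$(printf '%b' "$record" | paste - -)

# Stage 2: sed appends /1 to the first whitespace-free word of every line,
# which here is the @ header and the + header.
after_sed=$(printf '%s\n' "$after_paste" | sed 's/^\(\S*\)/\1\/1/')

# Stage 3: tr turns every tab back into a newline, giving four lines again.
after_tr=$(printf '%s\n' "$after_sed" | tr "\t" "\n")

echo "--- after paste ---"; echo "$after_paste"
echo "--- after sed ---";   echo "$after_sed"
echo "--- after tr ---";    echo "$after_tr"
```

After paste the record is two tab-separated lines, after sed both of those lines start with read1/1, and after tr the record is back to four lines.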


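This assumes that the sequence identifier is repeated on the quality line (+ followed by the read name). If the quality header is empty (just +), use "paste - - - -" instead of "paste - -" to rename your FASTQ files. This puts each whole four-line record on a single line, so only the sequence header at the start of the line gets the /1 or /2 suffix and the bare + stays unchanged.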
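The difference between the two paste variants is easiest to see on a record whose quality header is a bare +. The sketch below runs both variants on the same made-up record; the read name read1 is hypothetical and GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Compare "paste - -" and "paste - - - -" on a made-up record with a bare "+".
# "read1" is a hypothetical read name; assumes GNU sed for \S.
set -euo pipefail

record='@read1\nACGT\n+\nIIII\n'

# paste - -: the "+" starts its own joined line, so sed turns it into "+/1".
two_col=$(printf '%b' "$record" | paste - - | sed 's/^\(\S*\)/\1\/1/' | tr "\t" "\n")

# paste - - - -: the whole record is one line, so only the @ header is at
# the start of a line and the bare "+" is left untouched.
four_col=$(printf '%b' "$record" | paste - - - - | sed 's/^\(\S*\)/\1\/1/' | tr "\t" "\n")

echo "--- paste - - ---";     echo "$two_col"
echo "--- paste - - - - ---"; echo "$four_col"
```

With paste - - the third line becomes +/1, which is no longer a bare + and does not match the read name; with paste - - - - it stays +.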