Perl one liner to extract sequences by their ID from a FASTA file

The first one-liner is useful if you only want to extract a few sequences by their identifiers from a FASTA file.

perl -ne 'if(/^>(\S+)/){$c=grep{$_ eq $1}qw(id1 id2)}print if $c' fasta.file

This extracts the two sequences with the sequence identifiers id1 and id2. To extract other sequences, replace the identifiers within the parentheses, separating them with spaces.

If you want to extract a large number of sequences, you most likely have the sequence identifiers in a separate file. Assuming that the file ids.file contains one sequence identifier per line, you can use this one-liner (the ID file must come before the FASTA file on the command line):

perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if @ARGV' ids.file fasta.file

EdwardsLab
