EdwardsLab

Is there a difference between a ‘query to db’ BLAST and a ‘db to query’ BLAST?

The question often comes up whether it makes a difference if you use your sequence file as the query in a BLAST search or as the database. There is, in fact, a difference between the two, and that difference is in the e-value. Consider the following db file:

>one
acgtacgtagCtagctagctagctgactagc
>two
acgtacgtagAtagctagctagctgactagc
>three
acgtacgtagTtagctagctagctgactagc
>four
acgtacgtagGtagctagctagctgactagc

And the following query file:

>qs1
acgtacgtagCtagctagctagctgactagc

When you search a BLAST database, the e-value is calculated from the bit score and takes into account the size of the database. It estimates how many hits with at least that score you would expect to find by chance in a database of that size. In the example above, the database holds four sequences that differ from each other by only one base. When you BLAST the query sequence qs1 against this database of four sequences, the search space is four times larger than it would be for a single sequence, so your e-values will be higher (worse).
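As a quick sanity check on the claim that the sequences differ by a single base, here is a minimal Python sketch that compares qs1 with each database sequence position by position. The helper name `percent_identity` is made up for this illustration, and the comparison is a plain ungapped one, not what BLAST does internally.

```python
# Sequences copied from the db and query files above.
db = {
    "one":   "acgtacgtagCtagctagctagctgactagc",
    "two":   "acgtacgtagAtagctagctagctgactagc",
    "three": "acgtacgtagTtagctagctagctgactagc",
    "four":  "acgtacgtagGtagctagctagctgactagc",
}
query = "acgtacgtagCtagctagctagctgactagc"  # qs1


def percent_identity(a: str, b: str) -> float:
    """Ungapped, position-by-position identity of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    # Compare case-insensitively: the upper-case base only marks the variable site.
    matches = sum(x.lower() == y.lower() for x, y in zip(a, b))
    return 100.0 * matches / len(a)


for name, seq in db.items():
    pid = percent_identity(query, seq)
    print(f"qs1 vs {name}: {pid:.2f}% identity over {len(seq)} bases")
```

Every sequence is 31 bases long, so one mismatch leaves 30 of 31 positions identical, which is the 96.77% that shows up in the BLAST output for two, three and four.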

$ blastn -db db.fna -query query.fna -outfmt '6'
qs1 one 100.00  31  0   0   1   31  1   31  7e-15   58.4
qs1 four    96.77   31  1   0   1   31  1   31  3e-13   52.8
qs1 three   96.77   31  1   0   1   31  1   31  3e-13   52.8
qs1 two 96.77   31  1   0   1   31  1   31  3e-13   52.8

However, if you swap the database and the query, the database contains only the single sequence qs1. The search space is smaller, so your e-values will be lower (better).

$ blastn -db query.fna -query db.fna -outfmt '6'
one qs1 100.00  31  0   0   1   31  1   31  2e-15   58.4
two qs1 96.77   31  1   0   1   31  1   31  9e-14   52.8
three   qs1 96.77   31  1   0   1   31  1   31  9e-14   52.8
four    qs1 96.77   31  1   0   1   31  1   31  9e-14   52.8

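You might expect the e-values for the same pair of sequences to be identical in both directions, but they are not. The alignment of qs1 and one, for example, scores 58.4 bits both times, yet its e-value is 7e-15 in the first search and 2e-15 in the second.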
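To see where the gap comes from, it helps to write the calculation down. The sketch below uses the textbook simplification E = m × n × 2^(-S), where m is the query length, n is the total database length and S is the bit score. This is an assumption made for illustration: BLAST itself works with edge-corrected "effective" lengths, so the numbers will not match the reported e-values exactly. The names `simple_evalue`, `SEQ_LEN` and `BIT_SCORE` are made up for this example.

```python
def simple_evalue(bit_score: float, query_len: int, db_len: int) -> float:
    """Simplified e-value: search space (m * n) scaled by 2^(-bit score).

    Assumption: raw lengths are used, without BLAST's effective-length correction.
    """
    return query_len * db_len * 2.0 ** (-bit_score)


SEQ_LEN = 31      # every sequence in the example is 31 bases long
BIT_SCORE = 58.4  # bit score of the perfect qs1/one match in both searches

# Query qs1 against the four-sequence database (4 * 31 = 124 bases).
query_to_db = simple_evalue(BIT_SCORE, SEQ_LEN, 4 * SEQ_LEN)

# Query "one" against the single-sequence database (31 bases).
db_to_query = simple_evalue(BIT_SCORE, SEQ_LEN, 1 * SEQ_LEN)

print(f"query to db: {query_to_db:.1e}")
print(f"db to query: {db_to_query:.1e}")
print(f"ratio:       {query_to_db / db_to_query:.1f}")
```

With these assumptions the two values come out at roughly 1e-14 and 2.5e-15. The bit score and the query length are the same in both directions, so the ratio is simply the ratio of the database sizes, which is 4. The reported values (7e-15 and 2e-15) differ by a factor of about 3.5, which is in the same range once the effective-length correction and rounding are taken into account.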

While the skew is only marginal here, consider what would happen with a database consisting of millions of bases.
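To get a feel for how quickly this grows, here is a rough Python sketch that rescales one of the reported e-values to larger databases. It assumes the e-value scales linearly with the total database length and ignores BLAST's effective-length correction, so treat the output as an order-of-magnitude estimate only. The function name `rescale_evalue` and the database sizes are made up for this example.

```python
def rescale_evalue(evalue: float, old_db_len: int, new_db_len: int) -> float:
    """Rescale an e-value from one database size to another.

    Assumption: the e-value is proportional to database length, with the
    bit score and query length held constant.
    """
    if old_db_len <= 0 or new_db_len <= 0:
        raise ValueError("database lengths must be positive")
    return evalue * new_db_len / old_db_len


reported = 2e-15   # e-value of one vs qs1 with the 31-base database
reported_db = 31   # total bases in that database

# Hypothetical database sizes, from the toy example up to a billion bases.
for db_len in (31, 124, 1_000_000, 1_000_000_000):
    scaled = rescale_evalue(reported, reported_db, db_len)
    print(f"{db_len:>13,} bases -> e-value ~ {scaled:.1e}")
```

Under this assumption the same 58.4-bit hit that looks like 2e-15 against a 31-base database would be reported at roughly 6e-8 against a billion bases. It is still a good hit, but the two numbers are not comparable unless the database sizes are the same.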
