I had to rant about this. In phage P22, Moak and Molineux showed that gp7 is a murein hydrolase, the enzyme that breaks down the peptidoglycan layer and allows the phage to enter the host [here’s the paper]. Cool, we can find gp7 and annotate it as a phage murein hydrolase. Here’s the snag: the original paper doesn’t include the DNA or protein sequence. So we go find gp7 in the genome [here’s the P22 genome in GenBank], and there is a gene called P22gp07, whose protein sequence starts “MQIKTKGDLVRAALRKLGVASD….”
However, if we look for P22 gp7 somewhere other than the genome record, we find a completely different protein, whose sequence starts “mlhaftlgrk lrgeepsype…”.
Which one is the real gp7, and which one is the interloper? Of course, the people who annotated the genome just started with gp1 at the start of the genome and incremented from there, so the gene labeled P22gp07 need not be the gp7 of the older papers. While that’s one way to do it, it completely screws up anyone trying to use the historical literature to annotate genomes.
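If you want to convince yourself that the two candidates really are different proteins and not just the same sequence formatted two ways (uppercase versus GenBank-style lowercase in blocks of ten), a quick check like the one below is enough. It is only a rough sketch: the function names are made up for this post, and the two strings are just the truncated starts quoted above, not full sequences.

```python
def normalize(seq: str) -> str:
    """Uppercase the sequence and keep letters only (drops spaces, digits, ellipses)."""
    return "".join(ch for ch in seq.upper() if ch.isalpha())


def same_start(a: str, b: str) -> bool:
    """True if the two fragments agree over the length of the shorter one."""
    a, b = normalize(a), normalize(b)
    n = min(len(a), len(b))
    return n > 0 and a[:n] == b[:n]


# Truncated fragments as quoted in this post; the rest of each sequence is not shown here.
genome_gp07 = "MQIKTKGDLVRAALRKLGVASD"   # P22gp07 from the genome record
other_gp7 = "mlhaftlgrk lrgeepsype"      # the other "P22 gp7"

print(same_start(genome_gp07, other_gp7))  # False: they already differ at the second residue
```

A prefix comparison like this can only tell you that two records are different, never which one matches the protein in the original paper. For that you still need the biology.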
Caveat emptor!