phage terminase orientation

I was recently looking at the relative orientation of phage terminase genes along the genome. Here is a little summary.

I have a set of phage genomes that I downloaded from GenBank a while back. I was curious about the relative orientation of the terminase subunits: are they both oriented in the same direction, or are they sometimes in opposite directions? Phages are notorious for their syntenic organization, and it is one of the features we use in PhiSpy to differentiate prophage regions from bacterial regions.

First, we extract a list of all potential terminase regions from our GenBank files. There are several ways to do this, but a simple Perl one-liner is a straightforward way to get the coordinates and the function of the protein (what GenBank calls the product):

perl -ne 'if (/^\s+CDS.*\d\.\.\d/) {chomp; $cds=$_} if (/terminase/i && /\/product=/) {print "$ARGV\t$cds\t$_"}' * > ../terminases.txt
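For readers who would rather not decode the one-liner, here is a minimal Python sketch of the same idea. It is not what I ran for the analysis; the function name find_terminases and the sample lines are made up for illustration. Like the one-liner, it assumes the CDS location fits on one line and that the word "terminase" appears on the same line as /product=.

```python
import re

# A CDS feature line with a simple location such as 600..2000 or complement(100..500)
CDS_LINE = re.compile(r"^\s+CDS\s+(\S*\d\.\.\d\S*)")
# The /product qualifier; the value is captured up to the closing quote
PRODUCT_LINE = re.compile(r'/product="?([^"]*)')


def find_terminases(lines, name):
    """Yield (name, location, product) for every terminase product line.

    lines is any iterable of GenBank flat-file lines; name is a label
    for the genome (for example the file name).
    """
    location = None
    for line in lines:
        cds = CDS_LINE.match(line)
        if cds:
            # Remember the most recent CDS location, as $cds does in the one-liner
            location = cds.group(1)
            continue
        product = PRODUCT_LINE.search(line)
        if product and location is not None and re.search("terminase", line, re.I):
            yield (name, location, product.group(1).strip())


if __name__ == "__main__":
    # Hypothetical feature-table lines, not taken from a real genome
    sample = [
        "     CDS             complement(100..500)\n",
        '                     /product="terminase small subunit"\n',
        "     CDS             600..2000\n",
        '                     /product="terminase large subunit"\n',
    ]
    for hit in find_terminases(sample, "example.gbk"):
        print("\t".join(hit))
```

To use it on real data you would open each GenBank file and pass the file handle as lines.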

This gets us the terminase genes from 2,165 genomes (when I did the analysis).

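Because the one-liner reads one file at a time, the lines for each genome are grouped together in the output. Therefore we can compare each line with the one before it and report just those genomes whose terminases are on different strands (the output was written to the parent directory, so run this from there):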

perl -lane 'next if ($F[3] !~ m#/product#); $test=($F[2] =~ /complement/); if ($last eq $F[0]) {if ($test != $lasttest) {print "$lastline\n$_\n"}} $lasttest=$test; $lastline=$_; $last=$F[0]' terminases.txt

This gives us a list of those terminases that are on different strands.
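The strand comparison can be sketched in Python too. Again, this is an illustration under assumptions rather than the code I ran: mixed_strand_genomes is a hypothetical name, and the input is assumed to be (genome, location, product) tuples. Instead of comparing consecutive lines it collects the strands seen per genome, which flags the same genomes, since a genome has a strand change between neighbouring hits exactly when its hits are not all on one strand.

```python
from collections import defaultdict


def mixed_strand_genomes(records):
    """Return {genome: [(location, product), ...]} for genomes whose
    terminase hits are found on both strands.

    A location containing "complement" is treated as reverse strand,
    which is the same test the Perl one-liner applies.
    """
    by_genome = defaultdict(list)
    for genome, location, product in records:
        by_genome[genome].append((location, product))

    mixed = {}
    for genome, hits in by_genome.items():
        # Set of booleans: True for reverse strand, False for forward
        strands = {"complement" in location for location, _ in hits}
        if len(strands) == 2:
            mixed[genome] = hits
    return mixed


if __name__ == "__main__":
    # Hypothetical records, not real genomes
    records = [
        ("phageA.gbk", "100..500", "terminase small subunit"),
        ("phageA.gbk", "600..2000", "terminase large subunit"),
        ("phageB.gbk", "complement(100..500)", "terminase small subunit"),
        ("phageB.gbk", "600..2000", "terminase large subunit"),
    ]
    for genome, hits in mixed_strand_genomes(records).items():
        for location, product in hits:
            print(genome, location, product, sep="\t")
```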

There are 38 genomes whose “terminase genes” are on different strands. However, matching the word “terminase” in the product field is a simple heuristic, and some of these hits may not be real terminase genes. How to check? I would run them through PATRIC and see how they reannotate.