Category Archives: Phage


Phage Genometrics. I’m only 23 years late!

“Eureka! If I just ran these analyses 23 years earlier!”

This is what I was telling myself a few minutes after getting so excited by the nice graphs I generated by analyzing the GC% statistics of about 600 phage genomes (graphs below).

Like any good scientist (although they would tell you otherwise), I did the analysis before digging deeply into the literature. I just got this urge to perform all kinds of calculations on the phage codons and compare them to the overall phage GC%. Of course I know well that the third nucleotide in a codon is under much less selective pressure than the other two, and thus varies depending on which genome it is located in (depending on many factors, including the translational machinery of the host). However, I didn’t know whether anybody had bothered to look at the difference between codon GC statistics and genome GC statistics, or at the slopes of codon GC% versus genome GC%.
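For anyone who wants to try this kind of calculation, here is a minimal Python sketch of GC% by codon position and of a least-squares slope. It is not the code behind the figures in this post; the function names and the toy sequence are made up for illustration, and it assumes each coding sequence is given in frame, starting at the first base of the first codon.

```python
def gc_percent(seq):
    """Return GC% over unambiguous bases (A, C, G, T); 0.0 if there are none."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def codon_position_gc(cds):
    """Return (GC1, GC2, GC3): GC% at each codon position of an in-frame CDS."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    cds = cds[:usable]
    return tuple(gc_percent(cds[pos::3]) for pos in range(3))


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy example (a made-up sequence, not a real phage gene).
toy_cds = "ATGGCCGAA"
gc1, gc2, gc3 = codon_position_gc(toy_cds)
print(f"overall {gc_percent(toy_cds):.1f}%  GC1 {gc1:.1f}%  GC2 {gc2:.1f}%  GC3 {gc3:.1f}%")
```

To look at slopes, one could collect the genome GC% and the mean GC3 for each phage into two lists and pass them to `least_squares_slope`; a slope above 1 would mean that the third codon position swings more widely than the genome as a whole.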

Whether or not the data below are new (they are new in the sense that they come from new sequences, but old in that the conclusion was reached in 1987, before some of our lab members were born, or maybe earlier), the figures look really cool! Another neat study from 1997 is here.

PhiGenometrics1

Fig. 1: GC% statistics for individual protein-coding genes (all phage proteins in PhAnToMe)

PhiGenometrics2

Fig. 2: GC% statistics for individual phages (all phages in PhAnToMe)

PhiGenometrics3

Fig. 3: nucleotide GC% (first and second derivatives) vs genome GC% statistics for individual phages (all phages in PhAnToMe)


Phages without borders (1)

Phages are everywhere. We know this very well. But where is everywhere? Can we locate particular phages in certain ecosystems? Is there a pattern there?

This new PhAnToMe Labs tool may help us answer these questions, or at least ask the right ones:

1) Are cyanophages enriched in certain marine metagenomes?

2) Are enteric bacteriophages enriched in mammalian fecal samples?

3) Can we locate a certain phage entirely in one metagenome (remember the classical riddle: can one locate the same virus twice somewhere?)

 


 

Here are some examples:


Phage module of the day: Phage DNA packaging

A major challenge for a bacteriophage is to quickly pack DNA that is long (relative to the phage’s size) into newly formed phage heads (capsids). This packaging involves “pressurizing” the DNA in the available space.

A good place to start reading about it is this review by Rao and Feiss: The Bacteriophage DNA Packaging Motor (Annu. Rev. Genet. 2008. 42:647–81).

Once you get the big picture, follow these proteins in phage genomes in the Phage Packaging Machinery subsystem.


 

 


Phage of the day: Bacteriophage r1t

Why am I working on phage r1t today?

I was working on mycobacteriophages, starting with Che9d; but I found out that since Rob was working on a phage closely related to that one, we were continuously reversing each other’s annotations.

On the other hand, r1t is quite important because: 

i) I know some things about it

ii) it has been (re-)annotated by Brüssow’s group, known to be meticulous and accurate

iii) more importantly, it has multiple close relatives in my favorite organism, Streptococcus pyogenes

Phages and Mycobacterial virulence

Mycobacterium tuberculosis is the cause of TB, which is still a major health problem worldwide. There is a class of proteins with no known function, called the PE-PGRS family. They may be fibronectin-binding proteins (fibronectin is one of those things that holds us together at the cellular level); they are involved in disease, because if you delete some of them the mycobacteria don’t grow as well; and they could be variable surface antigens that allow the bacteria to avoid being seen by our immune system.

Regardless of what these proteins are actually doing, the phages like them. Several of the mycobacterial phages contain PE-PGRS proteins, suggesting, again, that the phages are helping their hosts cause disease.

Will the real P22 gp7 please stand up?

I had to rant about this. In phage P22, Moak and Molineux showed that gp7 is a murein hydrolase, the enzyme that breaks down the peptidoglycan layer and allows the phage to enter the host [here’s the paper]. Cool, we can find gp7 and annotate it as a phage murein hydrolase. Here’s the snag: the original paper doesn’t have the DNA or protein sequence. Well, we go find gp7 in the genome [here’s the P22 genome in GenBank] and there is a gene called P22gp07, whose protein sequence starts “MQIKTKGDLVRAALRKLGVASD….”

However, if we look for P22 gp7 outside the genome record, we find a completely different protein, whose sequence starts “mlhaftlgrk lrgeepsype…”.

Which one is the real gp7, and which one is the interloper? Of course, the people who annotated the genome just started with gp1 at the start of the genome and incremented from there. While that’s one way to do it, it completely screws up anyone trying to use the historical literature to annotate genomes.

Caveat emptor!
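A quick way to check whether two records that share a name actually describe the same protein is to compare the sequences themselves rather than the labels. The sketch below is only an illustration: the function names are made up, and it uses just the two sequence starts quoted above, so it can say that the starts differ but nothing about the full-length proteins.

```python
def normalize_protein(text):
    """Uppercase and keep letters only, so GenBank-style text
    ('mlhaftlgrk lrgeepsype') and FASTA-style text compare cleanly."""
    return "".join(ch for ch in text.upper() if ch.isalpha())


def same_protein_start(a, b, min_len=10):
    """True if the two sequences agree over their shared leading residues."""
    a, b = normalize_protein(a), normalize_protein(b)
    shared = min(len(a), len(b))
    if shared < min_len:
        raise ValueError("not enough residues to compare")
    return a[:shared] == b[:shared]


# The two sequence starts quoted in this post (both are truncated).
genome_p22gp07 = "MQIKTKGDLVRAALRKLGVASD"
other_gp7 = "mlhaftlgrk lrgeepsype"

print(same_protein_start(genome_p22gp07, other_gp7))
```

Running a check like this before copying a function from the literature onto a gene named "gp7" would flag the mismatch straight away.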

 

Hummingbirds and phages

Hummingbird

Image credit: http://www.deskpicture.com/DPs/Nature/Animals/hummingbird.jpg

As I’m sitting outside Café Vita and getting ready to work on phage subsystems, I can’t avoid being distracted by the number of bees and hummingbirds surrounding me. I have never had such a close view of a hummingbird. Because phage subsystems are keeping me still, these birds are totally peaceful around me. I am getting a really close view of hummingbirds feeding from flowers. I have never noticed their beaks before. They are really interesting: well adapted to their feeding style (read about co-evolution of hummingbirds and their favorite flowers). And because phage subsystems are keeping my mind busy, I can’t help but draw an analogy between phage modules and bird modules.

The hummingbird has several modules: a flight module (the unique wings), a feeding module (i.e., the beak), a body module, etc. There may be other birds with the same bodies but different beaks, the same wings but different feet, and so forth. Phages are similar. Phage genomes, and consequently their encoded proteomes, are modular: one set of clustered protein-encoding genes (Pegs) is dedicated to encoding the phage heads (capsids); another set encodes the tails; a third set encodes host-specificity proteins, and so on. If a phage “decides” to “feed on” a novel bacterial host (for any of several reasons, including the extinction of its old host), the phage will have to switch its host specificity. An entire phage can thus “exchange modules” with one that attacks the new host. For example, a phage in the human throat may face a crisis when the human host uses an antibiotic for a couple of weeks. The phage may be forced to switch hosts from streptococci (almost extinct after antibiosis) to Bacteroides, for example (I’m just making this up). To do so, the phage needs proteins that are specific to Bacteroides.

Unlike birds, the phage cannot afford the slow process of mutagenesis and selection for evolving Bacteroides-specific attack molecules. Instead, it would just “exchange” a module with another phage that “knows how” to attack Bacteroides but has been unsuccessful in replicating inside these anaerobic bacteria (probably due to bacterial immunity). The novel, re-invented phage will keep the successful modules that replicate well (from the streptococcal phage) and the Bacteroides-specific module from the less successful Bacteroides phage.

 

Finding sets of paralogs in phages

Input: BlastP files (all versus all)

Outputs:

1) Lists of proteins with nearly exact paralogs in each genome (how you define “nearly exact” is up to you: E-value < 1e-10, or 70% similarity, or both, or whatever)

2) Sets of paralogs (pairs, triplets, or more)

3) Number of sets per genome. This later allows us to ask which genomes tend to have more paralogs and what factors control this.
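The three outputs above could be produced along the following lines. This is a rough sketch under several assumptions that are not part of the plan itself: the BlastP results are in the standard 12-column tabular format (`-outfmt 6`), protein IDs look like `genome|protein` (a hypothetical naming scheme used here to recover the genome), percent identity stands in for "similarity", and hits are linked transitively into sets with a small union-find.

```python
from collections import defaultdict


def genome_of(protein_id):
    """Hypothetical ID scheme: everything before the first '|' is the genome."""
    return protein_id.split("|", 1)[0]


def parse_hits(lines, max_evalue=1e-10, min_identity=70.0):
    """Yield (query, subject) pairs from tabular BLAST lines that pass both cutoffs."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity, evalue = float(fields[2]), float(fields[10])
        if query != subject and evalue < max_evalue and identity >= min_identity:
            yield query, subject


def paralog_sets(hits):
    """Group same-genome hits into sets; return {genome: [sorted member lists]}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject in hits:
        if genome_of(query) == genome_of(subject):  # paralogs only, not orthologs
            parent[find(query)] = find(subject)

    groups = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)

    by_genome = defaultdict(list)
    for members in groups.values():
        by_genome[genome_of(next(iter(members)))].append(sorted(members))
    return {genome: sorted(sets) for genome, sets in by_genome.items()}


# Made-up example rows: qseqid sseqid pident length mismatch gapopen
# qstart qend sstart send evalue bitscore
example = [
    "gA|p1\tgA|p2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-50\t200",
    "gA|p2\tgA|p3\t80.0\t100\t20\t0\t1\t100\t1\t100\t1e-30\t150",
    "gA|p1\tgB|p9\t99.0\t100\t1\t0\t1\t100\t1\t100\t1e-60\t210",
    "gB|p7\tgB|p8\t50.0\t100\t50\t0\t1\t100\t1\t100\t1e-20\t90",
    "gA|p4\tgA|p4\t100.0\t100\t0\t0\t1\t100\t1\t100\t1e-70\t220",
]
sets_by_genome = paralog_sets(parse_hits(example))
proteins_with_paralogs = {g: sorted(p for s in sets for p in s)
                          for g, sets in sets_by_genome.items()}
sets_per_genome = {g: len(sets) for g, sets in sets_by_genome.items()}
print(sets_by_genome, sets_per_genome)
```

Linking hits transitively means that if p1 matches p2 and p2 matches p3, all three end up in one set even when p1 and p3 do not pass the cutoff directly; requiring all-against-all matches within a set would be a stricter alternative.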