Finding sets of paralogs in phages

Input: BlastP files (all versus all)
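
The format of the BlastP files is not stated here. As a minimal sketch, assuming tabular output (BLAST+ `-outfmt 6`) with the default 12 columns, where column 3 is percent identity and column 11 is the E-value, the hits could be read like this. The names `Hit` and `read_hits` are hypothetical and only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query: str      # column 1: query protein ID
    subject: str    # column 2: subject protein ID
    pident: float   # column 3: percent identity
    evalue: float   # column 11: E-value


def read_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield one Hit per row, skipping comments, blanks, and self-hits.

    Assumes the default 12-column tabular layout; other layouts would
    need different column indices.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        if query == subject:
            # In an all-versus-all search every protein hits itself.
            continue
        yield Hit(query, subject, float(fields[2]), float(fields[10]))
```

Any iterable of lines works, so an open file handle can be passed directly, for example `with open(path) as fh: hits = list(read_hits(fh))`.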

Outputs:

1) Lists of proteins with nearly exact paralogs in each genome. The definition of "nearly exact" is up to you: for example, E-value < 1e-10, at least 70% similarity, or both.

2) Sets of paralogs (pairs, triplets, or larger groups)

3) Number of sets per genome. This makes it possible to ask later which genomes tend to have more paralogs and what factors control this.
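
The three outputs could be derived along the following lines. This is a sketch under stated assumptions, not a prescribed method: hits are already available as (query, subject, percent identity, E-value) tuples for one genome at a time; both cutoffs must pass; percent identity stands in for "similarity"; and proteins are grouped by single linkage, so if A matches B and B matches C, all three land in one set even without a direct A to C hit. The cutoff constants and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Assumed cutoffs; the definition of "nearly exact" is left open above.
MAX_EVALUE = 1e-10
MIN_PIDENT = 70.0

HitTuple = Tuple[str, str, float, float]  # query, subject, pident, evalue


def paralog_sets(hits: Iterable[HitTuple]) -> List[Set[str]]:
    """Group proteins linked by passing hits into sets (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for query, subject, pident, evalue in hits:
        if query == subject:
            continue  # ignore self-hits
        if evalue < MAX_EVALUE and pident >= MIN_PIDENT:
            parent[find(query)] = find(subject)  # merge the two groups

    groups: Dict[str, Set[str]] = defaultdict(set)
    for protein in list(parent):
        groups[find(protein)].add(protein)
    # Only proteins with at least one passing hit were ever added,
    # so every set has two or more members.
    return list(groups.values())


def count_sets_per_genome(
    hits_by_genome: Dict[str, Iterable[HitTuple]]
) -> Dict[str, int]:
    """Output 3: number of paralog sets for each genome."""
    return {g: len(paralog_sets(h)) for g, h in hits_by_genome.items()}
```

Output 2 is the return value of `paralog_sets`, and output 1 (the list of proteins that have a paralog) is the union of those sets, for example `sorted(set().union(*paralog_sets(hits)))`.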

EdwardsLab