Monthly Archives: August 2010

Pangenomes

A project that started with the question, “how many microbial genes are there in the world?” has grown into one that could answer this and broader questions about the microbial universe. First, known taxa (E. coli) were organized into matrices, with strains as rows and proteins as columns. Hamming distances between the rows define a metric for organizing strains into phylogenetic trees. The phylogenetic distance measures the importance of the split between the strains, or the alpha score, as it is referred to in the d-splits literature. This approach became our main focus when we attempted the same heuristic with viral data, with surprisingly strong results. At present, we are taking “pie slices” of the phage proteomic tree and seeing to what extent we can recreate the observed internal structure, as a “proof of concept” for viral applicability. Reading and work on splits trees, d-splits, and the consecutive ones property will drive the next developments. In addition, this coming week, on August 18th, our group will attend a lecture on whole genome taxonomy, which should help drive further progress on our project.
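
The matrix-and-distance step described above can be sketched in Perl. This is only an illustration: it assumes the matrix is coded as presence/absence (1 or 0) of each protein in each strain, and the strain names and values are invented, not data from the project.

```perl
use strict;
use warnings;

# Hypothetical presence/absence matrix: one row per strain, one column per
# protein (1 = present, 0 = absent). All names and values are made up.
my %matrix = (
    strain_A => [1, 1, 0, 1, 0],
    strain_B => [1, 0, 0, 1, 1],
    strain_C => [0, 0, 1, 1, 1],
);

# Hamming distance: the number of columns at which two rows differ.
sub hamming {
    my ($row_a, $row_b) = @_;
    die "rows differ in length\n" unless @$row_a == @$row_b;
    my $diff = 0;
    for my $i (0 .. $#$row_a) {
        $diff++ if $row_a->[$i] != $row_b->[$i];
    }
    return $diff;
}

# Pairwise distances between every pair of strains, as a hash of hashes.
sub distance_matrix {
    my ($m) = @_;
    my @strains = sort keys %$m;
    my %dist;
    for my $s (@strains) {
        for my $t (@strains) {
            $dist{$s}{$t} = hamming($m->{$s}, $m->{$t});
        }
    }
    return \%dist;
}

# Print the distance matrix, one tab-separated row per strain.
my $dist = distance_matrix(\%matrix);
for my $s (sort keys %$dist) {
    print join("\t", $s, map { $dist->{$s}{$_} } sort keys %$dist), "\n";
}
```

A distance matrix like this is the kind of input a tree-building or splits-based method would take; that step is not shown here.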

PhiRAST

Phages are the most abundant biological entities on the planet and have had a tremendous impact on the biological sciences; however, phage genomes lag behind bacterial and eukaryotic genomes in the quality of annotation. To close this gap, the PhAnToMe project was launched to establish a phage annotation database, a rapid annotation pipeline for phage genomes (PhiRAST, or phage Rapid Annotation Using Subsystem Technology), and a graphical programming interface for biologists (using the BioBIKE interface).

The PhAnToMe project involves multiple research centers in the United States and includes several stages. The SDSU center is in charge of developing phage genomic subsystems, phage protein families (FIGFams), and subsequently the first release of PhiRAST. As a member of this team, I am in charge of building or coordinating subsystems, and of establishing links with the phage research community. In addition, once PhiRAST is developed, I will also be in charge of coordinating training workshops and developing testable hypotheses based on the PhAnToMe annotations and subsystems.

About Rob

Robert Edwards – Biography

After receiving his Ph.D. from the University of Sussex in England, where he studied nitrogen regulation in bacteria, Dr. Edwards moved to the United States to continue his studies. He worked as a Post-Doctoral Researcher with Dr. Dieter Schifferli at the University of Pennsylvania, Philadelphia, studying the mechanisms and regulation of virulence of enterotoxigenic E. coli, a leading cause of traveler’s diarrhea. Dr. Edwards then moved to the University of Illinois, Urbana-Champaign to work with Dr. Stanley Maloy on understanding the virulence of Salmonella. These studies merged the nascent area of genomics with traditional microbial genetics to investigate the virulence of Salmonella enterica serovar Enteritidis, a leading bacterial cause of food-borne illness. During this period, Dr. Edwards began sequencing several Salmonella genomes, and began collaborating on the open-source BioPerl project, to which he remains an active contributor.

From 2000 to 2004, Dr. Edwards was an Assistant Professor at the University of Tennessee Health Sciences Center in Memphis, TN. Here, Dr. Edwards continued his studies on pathogenic bacteria, notably Salmonella and the class A Select Agent Francisella. Dr. Edwards was responsible for overseeing the renovation of space into a BSL-3 laboratory, capable of handling Select Agents, and for registering that facility with the CDC. Dr. Edwards received FBI clearance to work on Select Agents, and was invited to the NIH to comment on the use of Select Agents at basic research laboratories.

In 2004, Dr. Edwards moved to the non-profit Fellowship for Interpretation of Genomes to work at the interface of biologists and computer scientists. He remains an active software developer for the Fellowship, and helps guide their development direction through liaisons with microbiological researchers. Using breakout technologies, like pyrosequencing and high throughput bioinformatics analysis, Dr. Edwards’ studies are pushing the forefront of both sequencing technology and bioinformatics. This work was highlighted in three independent publications in Nature at the start of 2008. Dr. Edwards maintains interactions with mathematicians and computer scientists, developing open-source software such as the Perl modules for biological analysis and parallel computing he has released to the community through BioPerl and CPAN.

Most recently, Dr. Edwards has returned to academia, and taken a research and teaching position in the Department of Computer Science at San Diego State University. Here he is continuing to work at the interface of biology and computing, but also expanding his research into grid-enabled research and high performance computing. Dr. Edwards has written open-source code for high performance parallel computing that is used worldwide.

Dr. Edwards’ research is currently funded by the National Science Foundation, and aims to bring high performance computing to the smallest biological organisms: the viruses. His research is leading to breakthroughs in our understanding of how viruses interact with their hosts, and how virus samples from around the world carry important genetic information.

Committed to teaching, Dr. Edwards received the SGAEC award for outstanding educator at the University of Tennessee, and the teacher-scholar award (2008) and outstanding faculty award (2009) at San Diego State University. He has taught bioinformatics classes around the US, in Brazil, China, Europe, and Mexico. He is funded by the Department of Education to develop joint courses in marine sciences between San Diego and Brazil.

Dr. Edwards maintains strong interactions with biologists, working closely with groups sequencing uncultured microbes (“metagenomes”) from diverse environments such as human samples, oceans, coral reefs, and mines. In addition to bioinformatics analysis, Dr. Edwards is a scientific SCUBA diver who has studied both Pacific and Atlantic coral reefs, and he also enjoys racing sailboats.

PhAnToMe

The lab’s flagship PhAnToMe project is funded by the National Science Foundation to understand viral life. We are researching the genomics of phages, the viruses that infect bacteria, with Dr. Mya Breitbart (Univ. of South Florida), Dr. Matt Sullivan (U. Arizona), and Dr. Jeff Elhai (Virginia Commonwealth University). These viruses are the most abundant biological entities on the planet, and are responsible for many of the evolutionary changes that bacteria undergo. Phages carry virulence genes that allow bacteria to cause disease, they carry photosynthetic genes that allow bacteria to grow in the oceans, and they carry many genes whose functions we don’t yet know. Our project will unearth the role of some of those genes and proteins, and help biologists get to grips with the most diverse part of microbiology: the phages. At the PhAnToMe website you can browse complete genomes, and download phage genomes and associated data.

Research

Rob Edwards’ bioinformatics lab at San Diego State University is all about decoding life’s best kept secrets. These secrets are encoded, as you must have already guessed, in the genomes of bacteria, archaea, and eukaryotes, and of the viruses that infect them.

We use all kinds of computers, from clusters to cell phones, to solve the most unsolvable computational problems that help us better understand biology.

We are funded by the National Science Foundation to explore phage genomes, through our PhAnToMe project, and to explore phage metagenomes (and the unknown genes in them) through our new Viral Dark Matter Project.

Rob has collaborations all over the world, and has taught in Europe, Asia, and Latin America. We are currently funded by the Department of Education through the Fund for the Improvement of Postsecondary Education and the Brazilian Ministry of Education (FIPSE-CAPES) to develop a marine sciences course in Brazil.

Rob has published over 60 peer-reviewed papers, and given an equal number of talks. A short biography about Rob describes his background, and his CV has more information. You can contact Rob for more information.

Lab highlights

CrAssphage

Sequencing on the boat

SEED Facelift

The new project I’m working on is a facelift for the TBLASTX results that are displayed on the Phage SEED here at edwardslab. The current one is functional but a little bit hard to read. Also it’s not very easy on the eyes, in my opinion. So I have taken up the charge of giving it a so-called facelift.

The code for the SeedViewer is all .cgi, so I can use any web-friendly scripting language I want. I’ve chosen Perl to be consistent with the rest of the SEED’s programming.

I haven’t chosen a graphics library because I’m not sure what I want, and I haven’t done enough research into pros and cons. Or enough research, period.

I am becoming skilled in Perl and am coding the framework while I spin my wheels with regard to the graphics.

Current graphics choices are: GD graphics library (simple, easy, on-demand .jpg/.png generation), Flash (popular, high-power, highest quality, interactive), Cairo (Pros/Cons not researched YET).

One big question I have is that the data I need is stored on Octopussy and updated regularly. I’m unsure whether I want to just open a pipe up to Octopussy, read the file, then copy it to a local data structure in memory, or copy the files over regularly via a daemon or script, and just make sure that I’m cleaning up after myself so I don’t blow up the server. There are also probably third or fourth options. If you think of anything, let me know. The first sounds slow and hard on memory, but local storage-efficient. The second requires me to write a daemon or script that then spends a LONG time copying files from one server to another. I’m not sure that’s the best way of doing things.
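
The second option, copying files over on a schedule and cleaning up afterwards, could look roughly like the sketch below. It is not the SEED's actual code: it assumes the remote data is visible as a directory (for example through a network mount), and the function names `sync_dir` and `plain_files` and the example paths are hypothetical.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# List the plain files (not subdirectories) directly inside $dir.
sub plain_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open '$dir': $!";
    my @names = grep { -f "$dir/$_" } readdir $dh;
    closedir $dh;
    return @names;
}

# Mirror the plain files in $src into $dst. Files that are missing or
# older in $dst are copied; files in $dst whose source has disappeared
# are deleted. Returns the list (number copied, number removed).
sub sync_dir {
    my ($src, $dst) = @_;
    my ($copied, $removed) = (0, 0);
    my %in_source;

    for my $name (plain_files($src)) {
        $in_source{$name} = 1;
        my ($from, $to) = ("$src/$name", "$dst/$name");

        my $src_mtime = (stat $from)[9];
        my $dst_mtime = (-e $to) ? (stat $to)[9] : -1;
        next if $dst_mtime >= $src_mtime;      # local copy is up to date

        copy($from, $to) or die "Copy of '$from' failed: $!";
        utime $src_mtime, $src_mtime, $to;     # keep the source timestamp
        $copied++;
    }

    # Clean up: drop local copies that no longer exist at the source.
    for my $name (plain_files($dst)) {
        next if $in_source{$name};
        unlink "$dst/$name" or die "Can't remove '$dst/$name': $!";
        $removed++;
    }
    return ($copied, $removed);
}

# Example call (hypothetical paths):
#   my ($copied, $removed) = sync_dir('/mnt/octopussy/tblastx', '/var/cache/tblastx');
```

Because only new or changed files are copied, repeat runs should be much cheaper than the first one. An existing tool such as rsync run from cron covers the same ground and may be the simpler choice.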

 

Perl tips: saving a hash to the disk

From: Perl Cookbook
    use Storable;
    store(\%hash, "filename");            # pass a reference to the hash
    # later on...
    $href = retrieve("filename");         # by ref
    %hash = %{ retrieve("filename") };    # direct to hash
OR From Perl Monks

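    # Save
    use Data::Dumper;
    $Data::Dumper::Purity = 1;    # emit extra statements so nested references restore correctly
    open my $out, '>', $outfile or die "Can't open '$outfile': $!";
    print $out Data::Dumper->Dump([$main], ['*main']);
    close $out;

    # Restore
    open my $in, '<', $infile or die "Can't open '$infile': $!";
    my $code = do { local $/; <$in> };    # slurp the whole file
    close $in;
    eval $code;    # recreates %main if $main was a hash reference (declare "our %main" under strict)
    die $@ if $@;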
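
For a quick check that a saved hash comes back intact, a self-contained round trip with Storable might look like this. The data are illustrative, a temporary file stands in for "filename", and `nstore` is used in place of `store` so the file is written in network byte order and can be read on another architecture.

```perl
use strict;
use warnings;
use Storable qw(nstore retrieve);
use File::Temp qw(tempfile);

# Illustrative data: a nested hash.
my %hash = (
    phage => 'lambda',
    genes => { cI => 'repressor', cro => 'regulator' },
);

# A temporary file stands in for "filename"; it is removed at exit.
my (undef, $file) = tempfile(UNLINK => 1);

nstore(\%hash, $file);               # save: note the reference to the hash
my %copy = %{ retrieve($file) };     # restore straight into a new hash

print "$copy{phage}: $copy{genes}{cI}\n";    # prints "lambda: repressor"
```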
 
