Monthly Archives: August 2010

Pangenomes

A project that started with the question, “how many microbial genes are there in the world?” has grown into one that could answer this and broader questions about the microbial universe. First, known taxa (E. coli) were organized into matrices, with strains as rows and proteins as columns. Hamming distances between the rows define a metric for organizing strains into phylogenetic trees. The phylogenetic distance is the importance of the split between the strains, referred to as the alpha score in the d-splits literature. This approach became our main focus when we attempted the same heuristic with viral data, with surprisingly strong results. At present, we are taking “pie slices” of the phage proteomic tree and seeing to what extent we can recreate the observed internal structure, as a “proof of concept” for viral applicability. Reading and work on splitstrees, d-splits, and the consecutive ones property will drive the next developments. In addition, this coming week, on August 18th, our group will attend a lecture on whole genome taxonomy, which should help drive further progress on our project.
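The matrix-and-distance step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not our actual pipeline: the strain names, the presence/absence values, and the helper names `hamming` and `distance_matrix` are all invented for the example.

```python
from itertools import combinations

# Hypothetical presence/absence matrix: one row per strain, one column per
# protein (1 = present, 0 = absent). Names and values are made up.
strains = {
    "strain_A": [1, 1, 0, 1, 0],
    "strain_B": [1, 0, 0, 1, 1],
    "strain_C": [0, 1, 1, 1, 0],
}


def hamming(row_a, row_b):
    """Count the columns (proteins) at which two strains differ."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must cover the same set of proteins")
    return sum(a != b for a, b in zip(row_a, row_b))


def distance_matrix(rows):
    """Return {(name1, name2): distance} for every unordered pair of strains."""
    return {
        (n1, n2): hamming(rows[n1], rows[n2])
        for n1, n2 in combinations(sorted(rows), 2)
    }


distances = distance_matrix(strains)
for pair, distance in sorted(distances.items()):
    print(pair, distance)
```

The resulting pairwise distances are the kind of input a tree-building or split-decomposition method would take; that later step is not shown here.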

PhiRAST

Phages are the most abundant biological entities on the planet and have had tremendous impact on the biological sciences; however, phage genomes lag behind bacterial and eukaryotic genomes in the quality of annotation. To close this gap, the PhAnToMe project was launched to establish a phage annotation database, a rapid annotation pipeline for phage genomes (PhiRAST, or phage Rapid Annotation using Subsystem Technology), and a graphical programming interface for biologists (using the BioBIKE interface).

The PhAnToMe project involves multiple research centers in the United States and includes several stages. The SDSU center is in charge of developing phage genomic subsystems, phage protein families (FIGFams), and subsequently the first release of PhiRAST. As a member of this team, I am in charge of building or coordinating subsystems, and of establishing links with the phage research community. In addition, once PhiRAST is developed, I will also be in charge of coordinating training workshops and developing testable hypotheses based on the PhAnToMe annotations and subsystems.

Our transposase paper featured among the top 5% and on the cover of Nucleic Acids Research

Our paper “Transposases are the most abundant, most ubiquitous genes in nature” has been featured on the cover of Nucleic Acids Research Vol 38 Issue 13. The paper is also featured among the journal’s top 5% of articles.

About Rob

After receiving a Ph.D. from the University of Sussex in England, studying nitrogen regulation in bacteria, Dr. Edwards moved to the United States to continue studying. As a Post-Doctoral Researcher at the University of Pennsylvania, Philadelphia, Edwards researched how a leading cause of traveller’s diarrhea (E. coli) causes disease. Dr. Edwards then moved to the University of Illinois, Urbana-Champaign to study another food-borne pathogen, Salmonella. These studies merged the nascent area of genomics with traditional microbial genetics to investigate how a particular type of Salmonella became the leading cause of food-borne illness in the United States, and Edwards showed that phages are responsible for the diseases that Salmonella causes in different animals.

From 2000 to 2004, Dr. Edwards was an Assistant Professor at the University of Tennessee Health Sciences Center in Memphis, TN. Here, Dr. Edwards continued studying pathogenic bacteria, notably Salmonella and the bioterrorism weapon Francisella. Dr. Edwards received FBI clearance to work on these bacteria and was invited to the NIH to comment on the use of Select Agents at basic research laboratories.

In 2004, Dr. Edwards moved to the non-profit Fellowship for Interpretation of Genomes to work at the interface of biologists and computer scientists and worked with their team at Argonne National Laboratory. Edwards remains an active software developer for Argonne and the Fellowship, developing open-source software, including Perl and Python software for biological analysis and parallel computing, that is used by scientists worldwide. Using breakout DNA sequencing technologies, Dr. Edwards’ studies have continually pushed the forefront of both sequencing technology and bioinformatics. Edwards’ work has been published in leading journals including multiple papers in both Nature and Science.

Dr. Edwards returned to academia in 2007, taking a research and teaching position in the Departments of Computer Science and Biology at San Diego State University, and quickly rose through the ranks to become a Full Professor, continuing to work at the interface of biology and computing. The National Institutes of Health, the National Science Foundation, the Department of Education, the Department of Defense, the USGS, and private donors funded Dr. Edwards’ research at SDSU, and that work led to breakthroughs in our understanding of how viruses interact with their hosts, and how viruses from around the world carry important genetic information. Dr. Edwards has continued to push current sequencing and bioinformatics technologies. In 2013, Edwards took a next-generation sequencing machine to the remote Southern Line Islands to explore metagenomics of coral reefs in real-time. In 2014, Dr. Edwards’ team identified a virus that is present in the intestines of approximately half the people in the world, and in 2019 Dr. Edwards demonstrated the global spread of the virus in a paper that includes collaborators from every continent who collected and sequenced samples. In 2017, Dr. Edwards was elected to the American Academy of Microbiology in recognition of contributions to the field of microbiology. In 2020, Dr. Edwards took the position of Matthew Flinders Fellow in Bioinformatics at Flinders University in Adelaide, South Australia, to start the Flinders Accelerator for Microbiome Exploration and to enhance microbiome and metagenome studies in South Australia.

Committed to teaching, Dr. Edwards received the Graduate Student Award for the Outstanding Educator at the University of Tennessee, the Teacher-scholar Award and the outstanding faculty award four times at San Diego State University. Edwards was the Graduate Advisor to the Biological and Medical Informatics Program at SDSU. Edwards travels extensively to share a passion for bioinformatics and has taught bioinformatics classes in Australia, Africa, China, Chile, Europe, Mexico, and North and South America.

In addition to science and teaching, Dr. Edwards is also an advanced scientific SCUBA diver, having led teams to study coral reefs all over the world. Edwards is also an avid international yachtsman, navigating in long-distance offshore races, including navigating the 2019 TransPac race from Los Angeles to Honolulu, finishing 4th out of 89 boats, and racing from Adelaide to Port Lincoln in South Australia.

Here are some photos of Rob you are free to use with appropriate credits.

PhAnToMe

The lab’s spearhead PhAnToMe project is funded by the National Science Foundation to understand viral life. We are researching the genomics of phages, the viruses that infect bacteria, with Dr. Mya Breitbart (Univ. South Florida), Dr. Matt Sullivan (U. Arizona), and Dr. Jeff Elhai (Virginia Commonwealth University). These viruses are the most abundant biological entities on the planet, and are responsible for many of the evolutionary changes that bacteria undergo. Phages carry virulence genes that allow bacteria to cause disease, photosynthetic genes that allow bacteria to grow in the oceans, and many genes whose functions we don’t yet know. Our project will unearth the role of some of those genes and proteins, and help biologists get to grips with the most diverse part of microbiology: the phages. At the PhAnToMe website you can browse complete genomes, and download phage genomes and associated data.

Research

Rob Edwards’ bioinformatics lab at San Diego State University is all about decoding life’s best kept secrets. These secrets are encoded, as you must have already guessed, in genomes of bacteria, archaea, eukaryotes and the viruses that infect them.

We use all kinds of computers, from clusters to cell phones, to solve the most unsolvable computational problems that help us better understand biology.

We are funded by the National Science Foundation to explore phage genomes, through our PhAnToMe project, and to explore phage metagenomes (and the unknown genes in them) through our new Viral Dark Matter Project.

Rob has collaborations all over the world, and has taught in Europe, Asia, and Latin America. We are currently funded by the Department of Education through the Fund for the Improvement of Postsecondary Education and the Brazilian Ministry of Education (FIPSE-CAPES) to develop a marine sciences course in Brazil.

Rob has published over 60 peer-reviewed papers, and given an equal number of talks. A short biography about Rob describes his background, and his CV has more information. You can contact Rob for more information.

Lab highlights

CrAssphage

Sequencing on the boat

SEED Facelift

The new project I’m working on is a facelift for the TBLASTX results that are displayed on the Phage SEED here at edwardslab. The current one is functional but a little bit hard to read. Also it’s not very easy on the eyes, in my opinion. So I have taken up the charge of giving it a so-called facelift.

The code for the SeedViewer is all .cgi, so I can use any web-friendly scripting language I want. I’ve chosen Perl to be consistent with the rest of the SEED’s programming.

I haven’t chosen a graphics library because I’m not sure what I want, and I haven’t done enough research into the pros and cons. Or enough research, period.

I am becoming skilled in Perl and am coding the framework as I spin my wheels with regards to the graphics.

Current graphics choices are: GD graphics library (simple, easy, on-demand .jpg/.png generation), Flash (popular, high-power, highest quality, interactive), Cairo (Pros/Cons not researched YET).

One big question I have is that the data I need is stored on Octopussy and updated regularly. I’m unsure whether I want to just open a pipe up to Octopussy, read the file, then copy it to a local data structure in memory, or copy the files over regularly via a daemon or script, and just make sure that I’m cleaning up after myself so I don’t blow up the server. There are also probably third or fourth options. If you think of anything, let me know. The first sounds slow and hard on memory, but local storage-efficient. The second requires me to write a daemon or script that then spends a LONG time copying files from one server to another. I’m not sure that’s the best way of doing things.
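One way to keep the second option from copying for a LONG time is to copy a file only when the source has changed since the last copy. Below is a minimal sketch of that idea in Python (not the Perl used by the SeedViewer). It assumes the Octopussy data is reachable as an ordinary file path, for example through a network mount; the function name `refresh_local_copy` and the file names are hypothetical, and temporary files stand in for the two servers.

```python
import os
import shutil
import tempfile


def refresh_local_copy(source_path, local_path):
    """Copy source_path to local_path only if the local copy is missing or
    older than the source. Returns True when a copy was made."""
    if os.path.exists(local_path):
        source_mtime = os.stat(source_path).st_mtime_ns
        local_mtime = os.stat(local_path).st_mtime_ns
        if local_mtime >= source_mtime:
            return False  # local copy is already current, nothing to do
    # Copy to a temporary name first so readers never see a half-written
    # file, then swap it into place in one step.
    tmp_path = local_path + ".tmp"
    shutil.copy2(source_path, tmp_path)  # copy2 preserves the source mtime
    os.replace(tmp_path, local_path)
    return True


# Demo: a temporary directory stands in for both servers.
with tempfile.TemporaryDirectory() as workdir:
    source = os.path.join(workdir, "remote_results.txt")
    local = os.path.join(workdir, "local_results.txt")
    with open(source, "w") as handle:
        handle.write("example tblastx result line\n")
    first = refresh_local_copy(source, local)   # no local copy yet
    second = refresh_local_copy(source, local)  # source unchanged
    print(first, second)
```

Run from cron or a small daemon, a check like this touches only files that have actually changed. The temporary-file swap also covers the cleaning-up concern, since a failed copy never replaces the good local file.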

Perl tips: saving a hash to the disk

From: Perl Cookbook

    use Storable;

    # store() expects a reference, so pass \%hash rather than %hash
    store(\%hash, "filename");

    # later on...
    $href = retrieve("filename");        # by ref
    %hash = %{ retrieve("filename") };   # direct to hash

OR From Perl Monks

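    # Save
    use Data::Dumper;
    $Data::Dumper::Purity = 1;   # emit extra statements so nested data restores correctly
    open FILE, ">$outfile" or die "Can't open '$outfile':$!";
    print FILE Data::Dumper->Dump([$main], ['*main']);
    close FILE;

    # Restore
    open FILE, $infile or die "Can't open '$infile':$!";
    undef $/;        # slurp mode: read the whole file in one go
    eval <FILE>;     # run the dumped Perl code to rebuild the data
    close FILE;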


EdwardsLab

Delivering the best in bioinformatics…
