Monthly Archives: February 2011

Extreme caution is needed when sequencing

One of my friends from U of I sent me an interesting discussion of a recent paper with a potentially fatal error… Note the article’s editor… hehe. Anyway, the paper describes a gene horizontally transferred from the human genome to the genome of the intracellular pathogen Neisseria gonorrhoeae (http://mbio.asm.org/content/2/1/e00005-11.full).

The following blog has an interesting description of a more plausible reason for the published finding:

http://pathogenomics.bham.ac.uk/blog/2011/02/human-dna-in-bacterial-genomes-yes-no-maybe/

Reminds me of the story about Shewanella and Burkholderia being the clearly dominant organisms in one of the Sargasso Sea metagenomes… (http://www.nature.com/nrmicro/journal/v3/n6/pdf/nrmicro1158.pdf)

Lab Responsibilities

There are many people responsible for making the lab run smoothly. These are the key people for these tasks. However, if you notice something that needs fixing, you should fix it and not wait for someone else. Remember to clean up after yourself!

  1. Printer and office supplies. Sajia
  2. Posters in the hall. RobS
  3. Food area, fridge, and buying soda. Cristiane+Daniel
  4. Organizing lab meeting. Cristiane
  5. Organizing journal club. Pedro
  6. Organizing social events. Blaire
  7. Organizing lab clean up. RobS
  8. Machine/Website/Mailing list accounts. RobE

We also need to work on the lab website. This is everyone’s responsibility, too. These people are responsible for arranging the sections, but you should provide the content for them.

  1. Front Page. Daniel
  2. Research and Projects. Sandi+HQ
  3. Lab members. Geni+Haydee
  4. Software. Jeff
  5. Publications
    1. All publications. Sajia
    2. White papers. Sajia
    3. Talks. Jeremy
    4. Posters. RobS
  6. Photos. Kate
  7. Intranet
    1. Lab responsibilities. RobS
    2. Hardware and Software in the lab. RobE+Blaire
  8. Lab Blog and User Accounts. Geni

Computers in the lab

We have a bunch of computers for day-to-day use. Some machines are on the public network, some are on the SDSU network but behind the firewall, and some are on the Edwards Lab internal network.

Public Network

You can access these machines from anywhere in the world!

edwards.sdsu.edu (external IP: 130.191.27.146; internal IP 192.168.0.20)

This is the main server for the lab. It houses all of the websites that we use, so you should not use it for computing except under special circumstances.

This machine has http/https ports open, and ssh available from outside the firewall on port 7010. There are a wide range of domain names that point to this machine!
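For example, logging in from off campus might look like this (a sketch; your_username and the file name are placeholders):

```shell
# ssh on edwards.sdsu.edu listens on the non-standard port 7010,
# so pass the port explicitly:
ssh -p 7010 your_username@edwards.sdsu.edu

# Note that scp uses a capital -P for the port:
scp -P 7010 results.tar.gz your_username@edwards.sdsu.edu:~/
```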

pipeline1.acel.sdsu.edu (external IP: 130.191.27.145; internal IP: 192.168.0.1)

This is the main gateway to the lab and to the internal network. It runs a few http services, and you can use it for developing new webpages, etc., but keep in mind that it also routes all traffic from the internal network to the outside world.

This machine has an exemption to the sdsu firewall policy, and so almost every port is open. You can ssh to this on port 22 (the default port).

pipeline3.acel.sdsu.edu (external IP: 130.191.27.147)

This is mainly used by Mary Thomas’ group for GCOM development

SDSU Network

You can access these machines from within SDSU.

octopussy.sdsu.edu (external IP: 130.191.28.81)

This is a general workhorse machine. It’s a 32-bit, 8-core machine that still has plenty of life left in it. Use it at will for anything you want.

goldeneye (external IP: 130.191.27.151) [note that this is not a fully qualified domain name; you need to use the IP address to connect to it]

Another workhorse, this is a 64-bit 8-core machine that you can use for anything.

anthill.sdsu.edu (external IP: 130.191.226.86)

This is a 12-node compute cluster, with 8 cores per node (most of the time), running SGE. You should use it for repetitive tasks that are less time sensitive (because sometimes the cluster is full).
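As a sketch of how a repetitive task might be farmed out on anthill with SGE (the script name, input file names, and the analysis command here are illustrative assumptions, not a lab standard):

```shell
#!/bin/bash
# run_task.sh -- minimal SGE array-job script (illustrative sketch)
#$ -N my_analysis        # job name
#$ -cwd                  # run in the directory qsub was called from
#$ -t 1-100              # array job: tasks numbered 1..100
# Each task picks its own input file via $SGE_TASK_ID:
./analyze input.$SGE_TASK_ID.fna > output.$SGE_TASK_ID.txt
```

Submit it with `qsub run_task.sh` and check on it with `qstat`; SGE runs the script once per task number, so 100 inputs get processed without you babysitting them.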

Internal Network

You can only access these machines from within the network, and you need to be on pipeline1.acel.sdsu.edu to reach them. You can use either the name or the IP address to access these machines.
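In practice that means two hops: ssh to pipeline1 first, then ssh again from there to the internal machine (a sketch; your_username is a placeholder):

```shell
# First hop: pipeline1 is reachable from outside on the default port 22.
ssh your_username@pipeline1.acel.sdsu.edu

# Second hop, run from pipeline1: use the internal name or IP.
ssh rambox                # equivalently: ssh 192.168.0.50
```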

pipeline0 (internal IP: 192.168.0.10) and pipeline2 (internal IP: 192.168.0.11)

These are older 32-bit 4-core machines that are perfect for development and are not usually used by anyone. If you want a machine to yourself, these would be good candidates!

shortboard (internal IP: 192.168.0.30)

This machine is currently backing up several other machines and also storing the HMP data.

longerboard (internal IP: 192.168.0.40)

Currently backing up one other machine, and housing much of the GCOM data.

rambox (internal IP: 192.168.0.50)

This 64-bit, 12-core computing monster has 198 GB of RAM for you to play with. Use it for all your heavy-lifting needs.


Automated analysis of ARISA data using ADAPT system – White Paper

This white paper was written by Robert in 2008 while working on different community profiling projects that combined microbiology and computer science.

The white paper describes a computational system consisting of the database ADAPTdb and the program ADAPT for the analysis of Automated Ribosomal Intergenic Spacer Analysis (ARISA) data sets. ARISA is a method for analyzing the composition of microbial communities that is both faster and cheaper than other community profiling techniques. In an application example, we describe the use of the tool for an unpublished data set and compare the results to work previously published using different analysis methods. Although many papers published up to 2008 used ARISA to analyze community samples, none of them described computational approaches that allow automatic analysis of the raw data sets. We took the manual process, automated it, and developed a web-based program for the automatic analysis, including taxonomic classifications, as well as autotrophic/heterotrophic and pathogenic/non-pathogenic comparisons.

This paper was submitted to BMC Bioinformatics and reviewed, but it was never published because of the comments that reviewer #2 made.

If this software is published in BMC Bioinformatics, this software will likely be used by many colleagues who will not look carefully at how ARISA works and doesn’t work. As a result, many analyses of microbial diversity will be highly flawed. Eventually, the community will learn the inaccuracy provided by the program but not before lots of scarce resources are spent and many meaningless papers are published.

That’s a pretty harsh criticism of their fellow microbial ecologists, who are apparently too stupid to understand their work and analyze their data.

Frankly, whoever wrote that review should be ashamed of themselves. Given comments like that, there was no incentive for us to carry on making software that would mislead people, and so we never bothered. The journal was not interested in publishing the paper, and we are not interested in helping people who are idiots.

Here are the complete reviews and the paper so you can decide for yourself. If you use ARISA, please cite this paper as:

Schmieder, R., Haynes, M., Dinsdale, E., Rohwer, F., and Edwards, R.A. Automated analysis of ARISA data using ADAPT system. 2009. https://edwardslab.wpengine.com/adapt/

The ADAPT paper (1.4 MB).

Reviewer #1 comments

Reviewer #2 comments

Random Community Genomics (metagenomics) – White Paper


The random community genomics white paper was written by Rob in the spring of 2006 while traveling around Europe for meetings. You’ll notice that some of the problems have been solved, but many have been ignored!

Random community genomics, sequencing whole DNA without growing the microbes or cloning their DNA, is now a reality. Our group alone has sequenced in excess of 850 M bp of DNA from environmental samples. Sample preparation and sequencing are very cheap and easy, costing less than $500 per million bp. The major limitation to the advancement of our understanding of environmental genomics is no longer our ability to see the DNA; it is the lack of access to high-performance computing. Once the technologies and techniques being used by select labs today become commonplace, there will be overwhelming demand for computational power beyond anything that is readily available.

You can download the white paper (0.2 MB).