Category Archives: SRA

Describing metagenomes in the SRA

I love standards; there are always so many to choose from. The Sequence Read Archive (SRA) tries hard to capture appropriate information about the sequences that people deposit, but in the end scientists are people too, and the way they describe their data is never uniform or standard. This means there are a lot of ways to describe metagenomes. If you want other people to use your data (and cite your papers), make sure you tag it so we can find it!


fastq-dump

NCBI’s fastq-dump has to be one of the worst-documented programs available online. Its default parameters are also ridiculous and certainly not what you want to use. The documentation mixes absolutely required parameters in with totally optional ones, so you have no idea which is which. Here, we take a look at some of the options and hopefully help you decide which parameters to run.
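As a rough illustration of moving away from the defaults, here is a minimal Python sketch that assembles a fastq-dump command line. The helper name `fastq_dump_command`, the output directory `fastq`, and the run accession in the usage note are made up for this example, and the particular set of flags is an assumption, not necessarily the set the full post settles on.

```python
from typing import List


def fastq_dump_command(run: str, outdir: str = "fastq") -> List[str]:
    """Build an argument list for fastq-dump (hypothetical helper)."""
    return [
        "fastq-dump",
        "--outdir", outdir,   # write output files into this directory
        "--gzip",             # compress the output
        "--skip-technical",   # drop technical reads, keep biological ones
        "--readids",          # append the read number to each sequence ID
        "--split-3",          # separate files for paired and unpaired reads
        run,                  # the SRR accession to dump
    ]
```

Passing the result of `fastq_dump_command("SRR000001")` to `subprocess.run` would execute it, assuming the SRA Toolkit is installed and on your PATH.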


SRA Metadata

The metadata in the Sequence Read Archive (SRA, aka the Short Read Archive) is complex! This is a brief guide to help you navigate it.

One key thing to remember is that:

A project (SRP) has one or more samples. However, projects are in the table called study.
A sample (SRS) has one or more experiments (SRX).
An experiment has one or more runs (SRR).
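The hierarchy above can be sketched as nested records. This is only an illustration of the project, sample, experiment, run relationship: the class names, the `all_runs` helper, and the accession numbers are invented for the example and are not the actual SRA table schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Experiment:
    accession: str                                  # SRX accession
    runs: List[str] = field(default_factory=list)   # SRR accessions


@dataclass
class Sample:
    accession: str                                  # SRS accession
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class Project:
    accession: str                                  # SRP accession (the "study" table)
    samples: List[Sample] = field(default_factory=list)


def all_runs(project: Project) -> List[str]:
    """Walk project -> samples -> experiments and collect every run."""
    return [
        run
        for sample in project.samples
        for experiment in sample.experiments
        for run in experiment.runs
    ]


# Made-up accessions, purely for illustration.
example = Project(
    "SRP000001",
    [
        Sample(
            "SRS000001",
            [
                Experiment("SRX000001", ["SRR000001", "SRR000002"]),
                Experiment("SRX000002", ["SRR000003"]),
            ],
        )
    ],
)

print(all_runs(example))
```

Because the runs sit at the bottom of the hierarchy, getting them for a project means walking through every sample and experiment on the way down.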

[source: davetang.org]

What you really want are the runs, and this is how you can get them!
