Category Archives: SRA

Hidden SRA metadata

The metadata in the SRA is not all the data you can get about a run. Here is how to get more data about a run from the SRA without going to the SRA website.

Runs and Experiments in the SRA

Recall that in the SRA, a project (SRP) has one or more samples, a sample (SRS) has one or more experiments (SRX), and an experiment has one or more runs (SRR). [source: davetang.org]

How many experiments only have one run, and how many experiments have lots of runs?
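One way to approach that question is to tally runs per experiment and then count how many experiments share each tally. The sketch below is only an illustration: the accession pairs are invented, and the function names are hypothetical rather than anything taken from the post.

```python
from collections import Counter

# Hypothetical (experiment, run) accession pairs, invented for illustration.
pairs = [
    ("SRX000001", "SRR000001"),
    ("SRX000002", "SRR000002"),
    ("SRX000002", "SRR000003"),
    ("SRX000003", "SRR000004"),
    ("SRX000003", "SRR000005"),
    ("SRX000003", "SRR000006"),
]


def runs_per_experiment(pairs):
    """Count how many distinct runs belong to each experiment."""
    unique_pairs = set(pairs)  # guard against duplicated rows
    return Counter(experiment for experiment, _run in unique_pairs)


def run_count_histogram(pairs):
    """Map 'number of runs' to 'number of experiments with that many runs'."""
    return Counter(runs_per_experiment(pairs).values())


histogram = run_count_histogram(pairs)
for n_runs in sorted(histogram):
    print(f"{histogram[n_runs]} experiment(s) with {n_runs} run(s)")
```

With real metadata, the pairs would come from whatever export of experiment and run accessions you have on hand; the counting step stays the same.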

Instruments in the SRA

While answering some reviewers' comments, I pulled out this data about the sequencing instruments behind the data submitted to the SRA. Clearly, the HiSeq and MiSeq dominate the number of runs that people are submitting.

Describing metagenomes in the SRA

I love standards; there are always so many to choose from. The Sequence Read Archive strives hard to capture appropriate information about the sequences that people deposit, but in the end scientists are people too, and they are never uniform and standard. This means there are a lot of ways to describe metagenomes. To get your data used by other people (and your papers cited), make sure you tag it so we can find it!

All the ways to get metagenomics data from the SRA

There is a lot of metagenomics data in the SRA, but it is not very well organized. To get it all, you need some wicked SQL-FU … or you can copy these recipes!

SRA attributes

These are all the attributes in the SRA files.

Getting data from the SRA

Getting data from the NCBI Sequence Read Archive is not easy. Here we combine a few of our posts to go step by step through getting the data.

fastq-dump

NCBI’s fastq-dump has to be one of the worst-documented programs available online. The default parameters for fastq-dump are also ridiculous and certainly not what you want to use. Absolutely required parameters are mixed in with totally optional ones, so you have no idea what is required and what is optional. Here, we take a look at some of the options and hopefully help you decide which parameters to use.
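As a small illustration of spelling options out instead of relying on the defaults, here is a sketch that assembles a fastq-dump command line without running it. The helper name, accession, and output directory are hypothetical, and the three flags shown (`--outdir`, `--gzip`, `--split-3`) are only a commonly used subset, not necessarily the set the post recommends; check `fastq-dump --help` for your installed version.

```python
import shlex


def build_fastq_dump_command(accession, outdir="fastq"):
    """Assemble a fastq-dump argument list; nothing is executed here.

    Hypothetical helper for illustration only.
    """
    return [
        "fastq-dump",
        "--outdir", outdir,  # write output files into this directory
        "--gzip",            # compress the output with gzip
        "--split-3",         # separate files for paired reads, plus one for unpaired
        accession,
    ]


command = build_fastq_dump_command("SRR000001")
print(shlex.join(command))
```

Keeping the command as a list means it can later be handed to `subprocess.run(command, check=True)` without any shell quoting concerns.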

fastq-dump options

Not all the options available to fastq-dump are listed on the NCBI website. It is not a very well documented program! Here is the current list.

SRA Metadata

The metadata in the Sequence Read Archive (SRA, also known as the Short Read Archive) is complex! This is a brief guide to help you navigate it.

One key thing to remember is that:

A project (SRP) has one or more samples. However, projects are in the table called study.
A sample (SRS) has one or more experiments (SRX).
An experiment has one or more runs (SRR).

[source: davetang.org]
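To make the hierarchy concrete, here is a toy sketch that walks from a project down to its runs. It uses an in-memory SQLite database whose tables and columns are assumptions chosen to mirror the description above (including a table named study for projects); it is not the real SRA metadata layout, and the accessions are invented.

```python
import sqlite3

# Toy schema mirroring the hierarchy described above. Table and column
# names are assumptions for illustration, not the real SRA metadata layout.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE study (study_accession TEXT PRIMARY KEY);
    CREATE TABLE sample (sample_accession TEXT PRIMARY KEY, study_accession TEXT);
    CREATE TABLE experiment (experiment_accession TEXT PRIMARY KEY, sample_accession TEXT);
    CREATE TABLE run (run_accession TEXT PRIMARY KEY, experiment_accession TEXT);

    INSERT INTO study VALUES ('SRP000001');
    INSERT INTO sample VALUES ('SRS000001', 'SRP000001'), ('SRS000002', 'SRP000001');
    INSERT INTO experiment VALUES ('SRX000001', 'SRS000001'), ('SRX000002', 'SRS000002');
    INSERT INTO run VALUES ('SRR000001', 'SRX000001'), ('SRR000002', 'SRX000002'),
                           ('SRR000003', 'SRX000002');
    """
)


def runs_for_project(conn, study_accession):
    """Return the sorted run accessions reachable from one project."""
    rows = conn.execute(
        """
        SELECT run.run_accession
        FROM study
        JOIN sample ON sample.study_accession = study.study_accession
        JOIN experiment ON experiment.sample_accession = sample.sample_accession
        JOIN run ON run.experiment_accession = experiment.experiment_accession
        WHERE study.study_accession = ?
        ORDER BY run.run_accession
        """,
        (study_accession,),
    ).fetchall()
    return [row[0] for row in rows]


print(runs_for_project(conn, "SRP000001"))
```

Each join follows one "has one or more" step of the hierarchy, so the query reads top to bottom in the same order as the list.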

What you really want are the runs, and this is how you can get them!

EdwardsLab

Delivering the best in bioinformatics…
