We have been working on updating access to the SRA metadata. In previous posts, we used the SQLite database provided by the Meltzer lab, but for a variety of reasons, we are now using the XML provided by the NCBI.
Tag Archives: sra
Metagenomes with and without crAssphage
There is lots of crAssphage in the world, and there are lots of metagenomes in the Sequence Read Archive (SRA). Can we find the metagenomes in the SRA that do, or do not, have crAssphage in them? Let's try…
Hidden SRA metadata
The metadata in the SRA is not all the data you can get about a run. Here is how to get more data about a run from the SRA without going to the SRA website.
Runs and Experiments in the SRA
Recall that in the SRA, a project (SRP) has one or more samples, a sample (SRS) has one or more experiments (SRX), and an experiment has one or more runs (SRR). [source: davetang.org]
How many experiments only have one run, and how many experiments have lots of runs?
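One way to answer that question is to tally runs per experiment and then count how many experiments share each tally. The snippet below is only a sketch: the accession pairs are made up for illustration, and the function names `runs_per_experiment` and `experiments_by_run_count` are hypothetical helpers, not part of any SRA tool.

```python
from collections import Counter

# Hypothetical (experiment, run) pairs; real accessions would come
# from whatever SRA metadata source you are using.
pairs = [
    ("SRX000001", "SRR000001"),
    ("SRX000002", "SRR000002"),
    ("SRX000002", "SRR000003"),
    ("SRX000003", "SRR000004"),
    ("SRX000003", "SRR000005"),
    ("SRX000003", "SRR000006"),
]


def runs_per_experiment(pairs):
    """Count distinct runs for each experiment accession."""
    # set() drops duplicate (experiment, run) rows before counting
    return Counter(exp for exp, _run in set(pairs))


def experiments_by_run_count(pairs):
    """Map a run count to the number of experiments with that many runs."""
    return Counter(runs_per_experiment(pairs).values())


histogram = experiments_by_run_count(pairs)
for n_runs, n_experiments in sorted(histogram.items()):
    print(f"{n_experiments} experiment(s) with {n_runs} run(s)")
```

Sorting the histogram by run count makes it easy to see whether single-run experiments dominate.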
Aspera, fastq-dump, and prefetch
To download things from NCBI a bit faster, you can try Aspera Connect. This is proprietary, closed-source software that the NCBI uses for large data transfers, but to run it in batch you need to figure out where to download it from and what to do with it.
SRA attributes
These are all the attributes in the SRA files.
SRA Metadata
The metadata in the Sequence Read Archive (SRA, also known as the Short Read Archive) is complex! This is a brief guide to help you navigate it.
One key thing to remember is that:
A project (SRP) has one or more samples. However, projects are in the table called study.
A sample (SRS) has one or more experiments (SRX).
An experiment has one or more runs (SRR).
[source: davetang.org]
What you really want are the runs, and this is how you can get them!
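To make the hierarchy concrete, here is a rough sketch of walking from a project down to its runs with SQL joins. It builds a tiny in-memory SQLite database, so nothing needs to be downloaded. The four-table schema, the column names, and the accessions are simplified assumptions for illustration and will not match a real SRA metadata database exactly; `build_demo_db` and `runs_for_project` are hypothetical helper names.

```python
import sqlite3


def build_demo_db():
    """Create a toy in-memory database (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Projects live in the table called study
        CREATE TABLE study (study_accession TEXT);
        CREATE TABLE sample (sample_accession TEXT, study_accession TEXT);
        CREATE TABLE experiment (experiment_accession TEXT, sample_accession TEXT);
        CREATE TABLE run (run_accession TEXT, experiment_accession TEXT);

        INSERT INTO study VALUES ('SRP000001'), ('SRP000002');
        INSERT INTO sample VALUES
            ('SRS000001', 'SRP000001'),
            ('SRS000002', 'SRP000001'),
            ('SRS000003', 'SRP000002');
        INSERT INTO experiment VALUES
            ('SRX000001', 'SRS000001'),
            ('SRX000002', 'SRS000002'),
            ('SRX000003', 'SRS000003');
        INSERT INTO run VALUES
            ('SRR000001', 'SRX000001'),
            ('SRR000002', 'SRX000002'),
            ('SRR000003', 'SRX000002'),
            ('SRR000004', 'SRX000003');
        """
    )
    return conn


def runs_for_project(conn, project):
    """Return the run accessions under one project, sorted by accession."""
    rows = conn.execute(
        """
        SELECT r.run_accession
        FROM study st
        JOIN sample sa ON sa.study_accession = st.study_accession
        JOIN experiment e ON e.sample_accession = sa.sample_accession
        JOIN run r ON r.experiment_accession = e.experiment_accession
        WHERE st.study_accession = ?
        ORDER BY r.run_accession
        """,
        (project,),
    )
    return [row[0] for row in rows]


conn = build_demo_db()
print(runs_for_project(conn, "SRP000001"))
```

Each join follows one level of the hierarchy (project to sample, sample to experiment, experiment to run), so the same query shape should carry over once the table and column names are adjusted to the real database.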