We have been working on updating access to the SRA metadata. In previous posts, we used the SQLite database provided by the Meltzer lab, but for a variety of reasons, we are now using the XML provided by the NCBI.
There is lots of crAssphage in the world, and there are lots of metagenomes in the Sequence Read Archive (SRA). Can we find the metagenomes in the SRA that do, or do not, contain crAssphage? Let's try…
The metadata in the SRA is not all the data you can get about a run. Here is how to get more data about a run from the SRA without going to the SRA website.
Recall that in the SRA, a project (SRP) has one or more samples, a sample (SRS) has one or more experiments (SRX), and an experiment has one or more runs (SRR). [source: davetang.org]
How many experiments only have one run, and how many experiments have lots of runs?
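One way to answer that question is to tally runs per experiment, then tally how many experiments share each run count. The sketch below is only an illustration: the accession pairs are made up, and the function names `runs_per_experiment` and `run_count_histogram` are hypothetical helpers rather than anything from the SRA tooling. Real pairs would come from the SRA metadata.

```python
from collections import Counter

# Hypothetical (experiment, run) accession pairs, invented for illustration.
# Real pairs would be extracted from the SRA metadata.
experiment_runs = [
    ("SRX000001", "SRR000001"),
    ("SRX000002", "SRR000002"),
    ("SRX000002", "SRR000003"),
    ("SRX000003", "SRR000004"),
    ("SRX000003", "SRR000005"),
    ("SRX000003", "SRR000006"),
]


def runs_per_experiment(pairs):
    """Count how many runs each experiment accession has."""
    return Counter(experiment for experiment, _run in pairs)


def run_count_histogram(pairs):
    """Map 'number of runs' to 'number of experiments with that many runs'."""
    return Counter(runs_per_experiment(pairs).values())


histogram = run_count_histogram(experiment_runs)
for n_runs in sorted(histogram):
    print(f"{histogram[n_runs]} experiment(s) with {n_runs} run(s)")
```

Experiments with a count of 1 are the single-run experiments; everything else has multiple runs.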
To download things from NCBI a bit faster, you can try Aspera Connect. This is proprietary, closed-source software that the NCBI uses for large data transfers, but to run it in batch you need to figure out where to download it from and what to do with it.
These are all the attributes in the SRA files
The Sequence Read Archive (SRA, also known as the Short Read Archive) metadata is complex! This is a brief guide to help you navigate it.
One key thing to remember is that:
A project (SRP) has one or more samples. However, projects are in the table called study.
A sample (SRS) has one or more experiments (SRX).
An experiment has one or more runs (SRR).
[source: davetang.org]
What you really want are the runs, and this is how you can get them!
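As a rough sketch of walking that hierarchy from a project down to its runs, the example below builds a tiny in-memory SQLite database and joins the four levels. Everything in it is an assumption made for illustration: the simplified schema, the column names, the accessions, and the helper `runs_for_project` are all hypothetical, and the real SRA metadata tables carry many more columns and may link the levels differently. The one detail taken from the text is that projects live in a table called `study`.

```python
import sqlite3

# Toy, in-memory schema mirroring the hierarchy described above.
# Table layout, column names, and accessions are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE study (study_accession TEXT PRIMARY KEY);
    CREATE TABLE sample (sample_accession TEXT PRIMARY KEY, study_accession TEXT);
    CREATE TABLE experiment (experiment_accession TEXT PRIMARY KEY, sample_accession TEXT);
    CREATE TABLE run (run_accession TEXT PRIMARY KEY, experiment_accession TEXT);

    INSERT INTO study VALUES ('SRP000001');
    INSERT INTO sample VALUES ('SRS000001', 'SRP000001'), ('SRS000002', 'SRP000001');
    INSERT INTO experiment VALUES ('SRX000001', 'SRS000001'), ('SRX000002', 'SRS000002');
    INSERT INTO run VALUES ('SRR000001', 'SRX000001'),
                           ('SRR000002', 'SRX000002'),
                           ('SRR000003', 'SRX000002');
    """
)


def runs_for_project(connection, study_accession):
    """Return the sorted run accessions that belong to one project (study)."""
    rows = connection.execute(
        """
        SELECT run.run_accession
        FROM study
        JOIN sample ON sample.study_accession = study.study_accession
        JOIN experiment ON experiment.sample_accession = sample.sample_accession
        JOIN run ON run.experiment_accession = experiment.experiment_accession
        WHERE study.study_accession = ?
        ORDER BY run.run_accession
        """,
        (study_accession,),
    ).fetchall()
    return [row[0] for row in rows]


print(runs_for_project(conn, "SRP000001"))
```

The accession is passed as a query parameter rather than formatted into the SQL string, so arbitrary input cannot alter the query.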