The tools fastq-dump and fasterq-dump are used to extract reads from the Sequence Read Archive and export them to (for example) fastq format. There is a hidden gotcha that you should be aware of when using fastq-dump to extract data.

CentOS 8 does not include a command that responds to python! Here are some solutions.
Sometimes when you look at a record in RefSeq/GenBank it is a virtual record that is really a pointer to a set of records. For example, the entry for Callorhinchus milii isolate IMCB2004 points you to the WGS records AAVX02000001-AAVX02067420. Here we show how to get these records.
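As a rough illustration of what such a range contains, the sketch below expands it into the individual accessions. It assumes a WGS accession is a letter prefix, a two-digit assembly version and a zero-padded contig number. The helper name `expand_wgs_range` is hypothetical. This only builds the list of identifiers and does not download anything.

```python
import re
from typing import List

# Hypothetical pattern: assumes letter prefix + two-digit version + zero-padded contig number.
WGS_PATTERN = re.compile(r"^([A-Z]{4,6}\d{2})(\d{6,})$")


def expand_wgs_range(first: str, last: str) -> List[str]:
    """Expand a range like AAVX02000001-AAVX02067420 into every accession."""
    m_first, m_last = WGS_PATTERN.match(first), WGS_PATTERN.match(last)
    if m_first is None or m_last is None:
        raise ValueError("unrecognised WGS accession format")
    prefix, start = m_first.groups()
    prefix_last, end = m_last.groups()
    if prefix != prefix_last or len(start) != len(end):
        raise ValueError("both ends must share the same prefix and number width")
    width = len(start)
    # Rebuild each accession, keeping the zero padding of the original numbers.
    return [f"{prefix}{n:0{width}d}" for n in range(int(start), int(end) + 1)]


if __name__ == "__main__":
    accessions = expand_wgs_range("AAVX02000001", "AAVX02067420")
    print(len(accessions), accessions[0], accessions[-1])
```

For the Callorhinchus milii example this prints `67420 AAVX02000001 AAVX02067420`. That is one accession per contig record in the range.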
As part of the STRIDES initiative, the NIH has moved the SRA to the cloud. This includes the metadata and the whole SRA archive. Here, I show how to set up a new instance to access the Sequence Read Archive in the cloud. In a separate post, we’ll explore getting the metadata out of BigQuery.
We have just moved most of the servers from CentOS 6 or CentOS 7 to CentOS 8. Almost everything should now be unified on CentOS 8 (unless you know what you are doing).
This brings several changes (as always) and some added benefits. This is a summary and does not cover every change.
To check your server's operating system version, use this command:
cat /etc/redhat-release
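If you need the same check inside a script, the minimal sketch below pulls the major version number out of that file. It assumes the file contains the word "release" followed by the version, as in `CentOS Linux release 8.2.2004 (Core)`. The sample strings and the `major_version` helper are illustrative, not part of any tool.

```python
import os
import re
from typing import Optional


def major_version(release_text: str) -> Optional[int]:
    """Pull the major version out of a line such as
    'CentOS Linux release 8.2.2004 (Core)'. Returns None if no match."""
    match = re.search(r"release\s+(\d+)", release_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    path = "/etc/redhat-release"
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read().strip()
        print(text, "-> major version", major_version(text))
    else:
        print(path, "not found; this may not be a Red Hat-family system")
```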
The biggest change is that you should now be able to install software yourself! There are two easy ways to install software, provided that whatever you are trying to install supports one of them.
Please note that if you do not want to do either of these, that is fine. Just let me know and I am happy to install software for you (and everyone else) to use.
A lot of bioinformatics software is now available via conda. Conda itself is installed globally, but you cannot install packages globally. Instead, create your own environment and install packages into that.
The first time you use conda, you will need to create a local environment. Start with:
source /usr/local/anaconda3/bin/activate
conda create --name <username>
But use your username instead of <username>!
After this has run, any time you need to use conda, you can use the command
conda activate <username>
to enter your environment.
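If you want a script to confirm it is running inside your environment, one option is to read the `CONDA_DEFAULT_ENV` variable that `conda activate` normally sets. This minimal sketch assumes that behaviour. The `active_conda_env` helper is hypothetical. After only sourcing `/usr/local/anaconda3/bin/activate`, you would typically see `base` rather than your username.

```python
import os
from typing import Optional


def active_conda_env() -> Optional[str]:
    """Return the name of the active conda environment, or None.

    Assumes `conda activate` has set CONDA_DEFAULT_ENV, which it normally does.
    """
    return os.environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    env = active_conda_env()
    if env is None:
        print("No conda environment is active; try: conda activate <username>")
    else:
        print("Active conda environment:", env)
```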
A simple test is to install my fastq-pair package and see if it works:
conda install -c bioconda fastq-pair
Once it has installed, this command should print some output:
fastq-pair
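If that command reports "command not found", the usual culprit is that the environment is not active. This minimal sketch checks whether an executable is on your `PATH` using the standard library's `shutil.which`. The `find_tool` wrapper is a hypothetical name used only for illustration.

```python
import shutil
from typing import Optional


def find_tool(name: str) -> Optional[str]:
    """Return the full path of an executable on PATH, or None if absent."""
    return shutil.which(name)


if __name__ == "__main__":
    location = find_tool("fastq-pair")
    if location is None:
        print("fastq-pair is not on your PATH; is your conda environment active?")
    else:
        print("fastq-pair found at", location)
```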
Another popular way of sharing software is by using docker. We don’t support docker, but we support a drop-in replacement called podman.
Anywhere you see docker, you can use podman instead. For example, we created a FOCUS Docker image for the CAMI challenge, described here: https://hub.docker.com/r/linsalrob/cami-focus and you can install that with
podman pull linsalrob/cami-focus
If you are trying to run some Python code and don’t have the appropriate library, you should be able to use pip install as a user to add it. For example:
pip3 install --user xmlschema
This will install the appropriate libraries into your account. Of course, if you want them installed globally, just let me know.
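To see where those user-level packages go, and whether Python can now find one, a minimal sketch using the standard `site` and `importlib.util` modules is below. The `is_importable` helper is hypothetical and only handles top-level module names such as `xmlschema`.

```python
import importlib.util
import site


def is_importable(module_name: str) -> bool:
    """True if a top-level module can be found on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Packages installed with `pip3 install --user` end up under this directory.
    print("User site-packages:", site.getusersitepackages())
    print("xmlschema importable:", is_importable("xmlschema"))
```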
| Deprecated | Replacement | Used for | Notes |
|---|---|---|---|
| screen | tmux | Virtual terminals. You should use this! | tmux has similar keys to screen, but uses ctrl-b instead of ctrl-a as the prefix. For example, to create a new window: "ctrl-b c" |
| cd-hit | mmseqs2 | Clustering sequences | cd-hit is still an option if you want it, but mmseqs2 appears to be much better |