Aspera, fastq-dump, and prefetch

To download data from NCBI a bit faster, you can try Aspera Connect. This is proprietary, closed-source software that NCBI uses for large data transfers, but to run it in batch you need to figure out where to download it from and what to do with it.

You can download it from Aspera, but you have to install it as a local user first and then copy the files somewhere useful.

The installer puts everything in your home directory and politely warns you that it is installing only for the local user. However, if you try to run it with sudo to (presumably) install for all users, it will fail!

In addition, the installer installs the required software, and then just hangs. It doesn’t do anything!

Install Aspera as a local user; it will probably end up somewhere like $HOME/.aspera/connect

Take a look in that directory to see whether you have bin/ascp. If you do, wait until you are bored, and then Ctrl-C out of the Aspera installer.

You will need to find two files:

  1. The ascp executable binary, which is probably in $HOME/.aspera/connect/bin/ascp
  2. The ascp ssh key, which is probably in $HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh
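If you want to check for both files in one go, here is a minimal sketch. The function name ascp_path_arg is made up for this post, and the default directory is just the "probably" location mentioned above, so adjust it if your installer put things elsewhere. It prints the two paths joined with a pipe, or complains if either file is missing.

```shell
# Hypothetical helper (not part of Aspera or the SRA tools):
# check that both files exist and print them joined with a pipe.
ascp_path_arg() {
    # Assumed default install location; pass another directory as $1 to override.
    connect_dir="${1:-$HOME/.aspera/connect}"
    ascp_bin="$connect_dir/bin/ascp"
    ascp_key="$connect_dir/etc/asperaweb_id_dsa.openssh"

    if [ ! -x "$ascp_bin" ]; then
        echo "ascp binary not found or not executable: $ascp_bin" >&2
        return 1
    fi
    if [ ! -f "$ascp_key" ]; then
        echo "ascp ssh key not found: $ascp_key" >&2
        return 1
    fi

    # Both files are present: print "binary|key" on one line.
    printf '%s|%s\n' "$ascp_bin" "$ascp_key"
}
```

Running ascp_path_arg with no arguments checks the default location; ascp_path_arg /some/other/dir checks a different one.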

Now you can use ascp with either prefetch or fastq-dump. Try the command below. The two file paths above go into a single quoted argument, separated by a pipe character (|), and you should now get much faster downloads!

 

prefetch --ascp-path "$HOME/.aspera/connect/bin/ascp|$HOME/.aspera/connect/etc/asperaweb_id_dsa.openssh" ERR1303010
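Since the whole point was to run this in batch, here is a rough sketch of looping over a list of accessions. It assumes your version of prefetch accepts --ascp-path as in the command above, and that Aspera landed in the default location. The function name fetch_all, the PREFETCH variable, and the accessions.txt file name are placeholders invented for this example.

```shell
# Sketch only: download every accession listed in a text file, one per line.
# PREFETCH can be overridden to point at a specific prefetch binary.
PREFETCH="${PREFETCH:-prefetch}"
ASPERA="$HOME/.aspera/connect"    # assumed install location

fetch_all() {
    list="$1"
    while IFS= read -r acc; do
        # Skip blank lines and lines starting with '#'.
        case "$acc" in
            ''|'#'*) continue ;;
        esac
        # The double quotes keep the pipe literal and let $ASPERA expand.
        # stdin is redirected so the download cannot swallow the rest of the list.
        "$PREFETCH" --ascp-path "$ASPERA/bin/ascp|$ASPERA/etc/asperaweb_id_dsa.openssh" "$acc" < /dev/null \
            || echo "download failed: $acc" >&2
    done < "$list"
}
```

With a file called accessions.txt that has one accession per line (for example ERR1303010), you would run fetch_all accessions.txt. A failed download is reported on stderr and the loop carries on with the next accession.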