CAMI challenge datasets

CAMI (Critical Assessment of Metagenome Interpretation) is a community-led initiative designed to tackle the problems faced in metagenomics analysis by providing an independent, comprehensive, and bias-free evaluation of metagenomics pipelines [source]. As part of the challenge, simulated datasets were generated to evaluate the assembly, profiling, and binning tools submitted for review. Three datasets were generated, simulating microbiomes of low, medium, and high complexity. A pre-print version of the CAMI manuscript is available on bioRxiv: http://biorxiv.org/content/early/2017/01/09/099127
This blog post contains links to the binning and profiling results for those datasets.

The datasets and results were downloaded from the CAMI data website here: https://data.cami-challenge.org/participate

| Complexity | Description | # of samples | Total size | Read length | Insert size (mean) |
|---|---|---|---|---|---|
| Low | Simulated Illumina HiSeq data, small insert size | 1 | 15 Gbp | 2 x 150 bp | 270 bp |
| Medium | Medium complexity community, sampled twice, with differential abundances of the respective organisms, and short and long insert sizes used for sequencing: 2 HiSeq samples with small insert sizes of 15 Gbp each, and 2 HiSeq samples with large insert sizes (5 kbp insert) of 5 Gbp each | 2 | 40 Gbp | 2 x 150 bp | 270 bp and 5 kbp |
| High | Time series with 5 HiSeq samples of 15 Gbp each, with small insert sizes, sampled from a complex microbial community | 5 | 75 Gbp | 2 x 150 bp | 270 bp |

Downloads

File notes

Profile results are tab-delimited plain text files with header lines and column headers. Columns begin on the 5th line. Column headers:

  1. @@TAXID
  2. RANK
  3. TAXPATH
  4. TAXPATHSN
  5. PERCENTAGE
  6. _CAMI_genomeID
  7. _CAMI_OTU
  • TAXPATH and TAXPATHSN hold the lineage as taxonomy IDs and taxon names respectively, with the levels separated by a vertical bar (|).
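
As a rough illustration of the layout described above, the following Python sketch reads a profile results file into dictionaries. It assumes that everything before the line starting with "@@" is header metadata, rather than hard-coding the line count. The function name read_profile and the sample text are invented for this example and do not come from the CAMI downloads.

```python
import io


def read_profile(handle):
    """Yield one dict per data row of a profile results file (sketch)."""
    columns = None
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if columns is None:
            # Assumption: all lines before the "@@TAXID ..." line are header metadata.
            if line.startswith("@@"):
                columns = line[2:].split("\t")
            continue
        row = dict(zip(columns, line.split("\t")))
        row["PERCENTAGE"] = float(row["PERCENTAGE"])
        # Split the bar-delimited lineage into one entry per taxonomic level.
        row["TAXPATH"] = row["TAXPATH"].split("|")
        row["TAXPATHSN"] = row["TAXPATHSN"].split("|")
        yield row


# Invented sample: four header lines, then the column headers on the 5th line.
EXAMPLE_PROFILE = (
    "@SampleID:demo\n"
    "@Version:demo\n"
    "@Ranks:superkingdom|phylum\n"
    "\n"
    "@@TAXID\tRANK\tTAXPATH\tTAXPATHSN\tPERCENTAGE\t_CAMI_genomeID\t_CAMI_OTU\n"
    "2\tsuperkingdom\t2\tBacteria\t100.0\t\t\n"
    "1224\tphylum\t2|1224\tBacteria|Proteobacteria\t60.0\t\t\n"
)

for row in read_profile(io.StringIO(EXAMPLE_PROFILE)):
    print(row["RANK"], row["TAXPATHSN"][-1], row["PERCENTAGE"])
```

For a downloaded file, the same function can be given an open text handle, for example `open(path, newline="")`, where `path` is wherever the profile file was saved.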

Binning results are tab-delimited, gzipped plain text files with header lines and column headers. Columns begin on the 4th line. Column headers:

  1. @@SEQUENCEID
  2. BINID
  3. TAXID
  4. _READID
  • SEQUENCEID column maps to sequence IDs in raw read files.
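
The binning files are large, so it can help to stream them rather than load them whole. The sketch below counts sequences per bin and, as with the profile files, assumes that all lines before the "@@" column header line are header metadata. The names open_binning and count_sequences_per_bin and the sample rows are invented for illustration.

```python
import gzip
import io
from collections import Counter


def open_binning(path):
    """Open a gzipped binning results file as text."""
    return gzip.open(path, "rt")


def count_sequences_per_bin(handle):
    """Return a Counter mapping BINID to its number of sequences (sketch)."""
    counts = Counter()
    columns = None
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        if columns is None:
            # Assumption: all lines before "@@SEQUENCEID ..." are header metadata.
            if line.startswith("@@"):
                columns = line[2:].split("\t")
            continue
        row = dict(zip(columns, line.split("\t")))
        counts[row["BINID"]] += 1
    return counts


# Invented sample: three header lines, then the column headers on the 4th line.
EXAMPLE_BINNING = (
    "@Version:demo\n"
    "@SampleID:demo\n"
    "\n"
    "@@SEQUENCEID\tBINID\tTAXID\t_READID\n"
    "seq1\tbinA\t2\tread1\n"
    "seq2\tbinA\t2\tread2\n"
    "seq3\tbinB\t1224\tread3\n"
)

print(count_sequences_per_bin(io.StringIO(EXAMPLE_BINNING)))
```

With a real download, the call would look like `with open_binning(path) as handle: counts = count_sequences_per_bin(handle)`, where `path` points to the saved .gz file.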

Low complexity

FASTA sequences: download [4.6 GB]
FASTQ sequences: download [9.6 GB]
Profile results: download [22 KB]
Binning results: download [448 MB]


Medium complexity

Sample 1 270bp insert FASTA sequences: download [4.6 GB]
Sample 1 270bp insert FASTQ sequences: download [9.6 GB]
Sample 1 5kbp insert FASTA sequences: download [1.5 GB]
Sample 1 5kbp insert FASTQ sequences: download [3.2 GB]
Sample 1 profile results: download [69 KB]
Sample 1 270bp insert binning results: download [481 MB]
Sample 1 5kbp insert binning results: download [153 MB]

Sample 2 270bp insert FASTA sequences: download [4.6 GB]
Sample 2 270bp insert FASTQ sequences: download [9.6 GB]
Sample 2 5kbp insert FASTA sequences: download [1.5 GB]
Sample 2 5kbp insert FASTQ sequences: download [3.2 GB]
Sample 2 profile results: download [69 KB]
Sample 2 270bp insert binning results: download [482 MB]
Sample 2 5kbp insert binning results: download [154 MB]


High complexity

Sample 1 FASTA sequences: download [4.6 GB]
Sample 1 FASTQ sequences: download [9.6 GB]
Sample 1 profile results: download [252 KB]
Sample 1 binning results: download [559 MB]

Sample 2 FASTA sequences: download [4.6 GB]
Sample 2 FASTQ sequences: download [9.6 GB]
Sample 2 profile results: download [252 KB]
Sample 2 binning results: download [566 MB]

Sample 3 FASTA sequences: download [4.6 GB]
Sample 3 FASTQ sequences: download [9.6 GB]
Sample 3 profile results: download [252 KB]
Sample 3 binning results: download [572 MB]

Sample 4 FASTA sequences: download [4.6 GB]
Sample 4 FASTQ sequences: download [9.6 GB]
Sample 4 profile results: download [252 KB]
Sample 4 binning results: download [572 MB]

Sample 5 FASTA sequences: download [4.6 GB]
Sample 5 FASTQ sequences: download [9.6 GB]
Sample 5 profile results: download [252 KB]
Sample 5 binning results: download [574 MB]