Virus Hunting in the Cloud Codeathon v2!

We are pleased to announce the second installment of the Virus Hunting Codeathon!

From 4-6 November, 2019, the NCBI will help run a bioinformatics codeathon in College Park, MD, hosted by UMIACS and CBCB at the University of Maryland. We are going to put a few hundred thousand metagenomic datasets on cloud infrastructure and further identify known, taxonomically definable, and novel viruses with even faster approaches!

We’re specifically looking for folks who have experience in computational virus hunting or adjacent fields! If this describes you, please apply! This event is for researchers, including students and postdocs, who are already engaged in the use of bioinformatics data or in the development of pipelines for virological analyses from high-throughput experiments. The event is open to anyone selected for the codeathon and willing to travel to College Park (see below).

Participants will be organized into five to eight teams of five to six individuals each. These teams will build pipelines to analyze large datasets within a cloud infrastructure.

Potential topics include:
  • Fast, federated indexing
    • BigQuery
  • Metadata features 
  • Genome graphs for viruses
  • Approximate taxonomic analysis
  • Domain/HMM Boundary and Taxonomic Refinement
  • Bringing together approximate taxonomy and domain models
  • Sequence data quality metrics
  • Phage-host interactions

The final list of projects will be unveiled before the codeathon starts and will build on previous NCBI codeathons.


After a brief organizational session, teams will spend three days addressing a challenging set of scientific problems related to a group of datasets. Participants will analyze and combine datasets in order to work on these problems. We will be writing code and solving problems.

Throughout the three days, we will break out to discuss progress on each of the topics, bioinformatics best practices, coding styles, etc.


Datasets will come from public repositories, with a focus on metagenomic datasets in the Sequence Read Archive (SRA) that have been ported to cloud infrastructure, as well as contigs derived from those datasets.


All pipelines and other scripts, software, and programs generated in this codeathon will be added to a public GitHub repository designed for that purpose (a repository currently exists, but a new one may be created by the event).

Manuscripts describing the design and usage of the software tools constructed by each team may be submitted to an appropriate journal such as the F1000Research hackathons channel, BMC Bioinformatics, GigaScience, Genome Research, or PLOS Computational Biology. Ideally, we will present a searchable, streamlined virological index from these datasets on cloud infrastructure.

How To Apply

To apply, please complete this form (approximately 10 minutes to complete). Initial applications are due Monday, October 7th, 2019 by 3 pm ET. Participants will be selected based on the experience and motivation they provide on the form.

Prior participants and applicants are especially encouraged to apply. The first round of accepted applicants will be notified on October 8th by 11:59 pm ET, and have until October 11th at 4 pm ET to confirm their participation.  International applicants or those with particular skillsets may be accepted early. If you confirm, please make sure it is highly likely you can attend, as confirming and not attending prevents other data scientists from attending this event. Please include a monitored email address, in case there are follow-up questions.

Note: Participants will need to bring their own laptops to this event. A working knowledge of scripting (e.g., Shell, Python, R) is useful but not necessary to be successful in this event. Experience with higher-level scripting or programming languages may also be useful. Applicants must be willing to commit to all three days of the event.

No financial support for travel, lodging, or meals is available for this event. Also, note that the codeathon may extend into the evening hours each day. Please make any necessary arrangements to accommodate this possibility. Depending on the number of people who need accommodation, we will attempt to get a group rate at one of the local hotels. Please indicate on the registration form if you need a hotel room.

There will be no registration fee or cost associated with attending this event.

For more information, or with any questions, please contact Ben Busby ( ).