SoCal Hackathon 2018

We are pleased to announce the SoCal Bioinformatics Hackathon.

From 10 to 12 January 2018, the NCBI will help run a bioinformatics hackathon in Southern California, hosted by San Diego State University! The hackathon will focus on advanced bioinformatics analysis of next-generation sequencing data, proteomics, and metadata. This event is for researchers, including students and postdocs, who have already used bioinformatics data or developed pipelines for bioinformatics analyses of high-throughput experiments. Some projects are also open to developers without a scientific background, mathematicians, or librarians.

The event is open to anyone selected for the hackathon and willing to travel to SDSU (see below).


Participants will be organized into five to eight teams of five to six individuals each. These teams will build pipelines and tools to analyze large datasets within a cloud infrastructure. Potential subjects for this iteration include:

  • Identify phages and viruses from metagenomes
  • Classify SRA datasets by source
  • Identify QTLs in plants
  • Build a tool to automatically obtain expression and variation data for any gene from an SRA dataset
  • Use machine learning to characterize viral sequences
  • Develop a machine learning tool to differentiate between synthetic and natural genomic regions in plants
  • Compute human ancestral alleles from chimpanzee, gorilla, orangutan, and macaque, and provide API access to the ancestral allele for a given position on the human genome (GRCh38)

We are looking for new topics: if you would like to propose a topic for this or a future hackathon, please complete this form.


After a brief organizational session, teams will spend three days addressing a challenging set of scientific problems related to a group of datasets. Participants will analyze and combine datasets in order to work on these problems. We will be writing code and solving problems.

Throughout the three days, we will break out to discuss progress on each of the topics, bioinformatics best practices, coding styles, etc.


Datasets will come from public repositories, with a focus on the Sequence Read Archive (SRA), or will be supplied by the project lead. During the hackathon, participants will have an opportunity to include other datasets and tools for analysis. Please note: if you use your own data during the hackathon, we ask that you submit it to a public database within six months of the end of the event.


All pipelines and other scripts, software, and programs generated in this hackathon will be added to a public GitHub repository designed for that purpose.

Manuscripts describing the design and usage of the software tools constructed by each team may be submitted to an appropriate journal such as the F1000Research hackathons channel.

San Diego State University

Founded in 1897, San Diego State University is a public institution of higher education located in Southern California. SDSU is the oldest and largest university in San Diego and the third largest in the state.

SDSU is conveniently located on the same trolley line (the Green Line) as the Town and Country Hotel, home of the 2018 PAG, eight stops away.


To apply, complete this form (approximately 10 minutes to complete). Applications are due Monday, December 11th, 2017 by 3 pm PT. Participants will be selected based on the experience and motivation they describe on the form.

Prior participants and applicants are especially encouraged to apply. The first round of accepted applicants will be notified on December 13 by 3 pm PT and will have until December 15 at noon PT to confirm their participation. Please confirm only if it is highly likely that you can attend, as confirming and not attending prevents other data scientists from attending this event. Please include a monitored email address in case there are follow-up questions.

Note: Participants will need to bring their own laptop to this program. A working knowledge of scripting (e.g., Shell, Python, R) is necessary to be successful in this event. Familiarity with higher-level scripting or programming languages may also be useful.

Applicants must be willing to commit to all three days of the event.

No financial support for travel, lodging, or meals is available for this event. Also note that the hackathon may extend into the evening hours each day. Please make any necessary arrangements to accommodate this possibility. Depending on the number of people who need accommodation, Rob will attempt to get a group rate at one of the local hotels. Please indicate on the registration form if you need a hotel room.

There will be no registration fee or cost associated with attending this event.

For more information, or with any questions, please contact Ben Busby and Rob Edwards.