Introduction
ReadMapper is based on R code that was written to analyze our inhouse data. The code has been modified to make it user friendly and allow other people, who are not as versed in the R language, to be able to map sequencing reads to a nucleotide backbone, and create publication quality figures. In our project we were mapping translatome data to the genome of the phage Lambda (this is what the included sample data is), as well as the currently annotated gene open reading frames. Which looks just like the figure below.
The idea was to visualize the direction of annotated genes and see their level of transcription. The figure above is for a single time point after induction, and shows which genes are being transcribed at that time point.
Mapping transcriptome data is not the only use for ReadMapper, it can also be used to visualize read coverage of whole genome shotgun sequencing, or any other projects that requires reads to be mapped to a nucleotide sequence.
Usage
ReadMapper can be obtained from the github repository: https://github.com/deprekate/ReadMapper
Since ReadMapper is an R script you will need to have R installed. It also has the dependency ggplot2, which you can install in R, using the command install.packages('ggplot2')
.
ReadMapper does not perform the alignment of the read data, since this step is a time intensive task, and usually run on a cluster in parallel. The input for ReadMapper is a single file containing two† columns: the start location and end location of each read on the nucleotide reference. The easiest way to create this data is to use the BLASTN command:
blastn -subject GENOME.FNA -query READS.FNA -outfmt '6 sstart send slen' -max_target_seqs 1 > READ_MAPPINGS.BLASTN
If your sequencing reads file is large, an alternative to BLASTN, would be optimized aligners, such as bowtie or bwa.
Converting your GENOME.FNA
file to a BLASTN database will also speed up the read alignment step. This can be accomplished by using the command:
makeblastdb -in GENOME.FNA -dbtype nucl
and then running the command:
blastn -db GENOME.FNA -query READS.FNA -outfmt '6 sstart send slen' -max_target_seqs 1 > READ_MAPPINGS.BLASTN
Optionally you can plot the ORFs on the figure, in their respective frames. To create these mappings use the command below:
blastn -subject GENOME.FNA -query ORFS.FNA -outfmt '6 sstart send slen' -max_target_seqs 1 > ORF_MAPPINGS.BLASTN
†ReadMapper currently requires the very first line to contain the backbone length in a third column