
Download a genome and remove the ribosomal RNA operon

For our SRA search engine, we want to remove the ribosomal RNA operon (not just the 16S gene, the whole operon) before we run the search; otherwise, all our hits are to the rRNA genes!

Here’s how you can use PATRIC to download a genome and remove the rRNA operon region. For this example, we’re going to use a Faecalibacterium prausnitzii genome because, well, why not!

First, we download the genome as a GTO file and convert it to FASTA:

p3-gto 657322.3
rast-export-genome -i 657322.3.gto contig_fasta > 657322.3.fna
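If you don't have the RAST tools installed, the conversion is simple enough to sketch in Python. This is a minimal sketch, assuming the GTO is a JSON file with a "contigs" list whose entries carry "id" and "dna" keys; gto_to_fasta is a hypothetical helper, not part of PATRIC or the EdwardsLab repo.

```python
import json


def gto_to_fasta(gto_path, fasta_path, width=60):
    """Write every contig in a GTO (JSON) file to FASTA.

    Assumption: contigs live in gto["contigs"], each with "id" and "dna" keys.
    """
    with open(gto_path) as fh:
        gto = json.load(fh)
    with open(fasta_path, "w") as out:
        for contig in gto["contigs"]:
            out.write(f">{contig['id']}\n")
            seq = contig["dna"]
            # Wrap the sequence at `width` characters per line
            for i in range(0, len(seq), width):
                out.write(seq[i:i + width] + "\n")
```

Calling gto_to_fasta("657322.3.gto", "657322.3.fna") should give the same contigs as the rast-export-genome command, although the line wrapping may differ.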

Next, we use a couple of helper scripts from the EdwardsLab Git repo. We start by converting the GTO to a tab-separated file of features and their locations:

python3.7 ~/EdwardsLab/patric/parse_gto.py -f 657322.3.gto -p > 657322.3.tab
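To see roughly what that conversion involves, here is a minimal Python sketch. It assumes each GTO feature has "id", "function", and "location" keys, with "location" a list of [contig, begin, strand, length] regions where begin is the first base on that strand (so it is the higher coordinate on the minus strand). feature_table is a hypothetical helper, and its rows are a simplified version of the table, not parse_gto.py's exact output format.

```python
import json


def feature_table(gto_path):
    """Yield (id, function, contig, start, stop, strand) for each feature region.

    Assumption: each region is [contig_id, begin, strand, length].
    start/stop are reported low-to-high regardless of strand.
    """
    with open(gto_path) as fh:
        gto = json.load(fh)
    for feat in gto.get("features", []):
        for contig, begin, strand, length in feat["location"]:
            if strand == "+":
                start, stop = begin, begin + length - 1
            else:
                # On the minus strand, begin is the high coordinate
                start, stop = begin - length + 1, begin
            yield feat["id"], feat.get("function", ""), contig, start, stop, strand
```

Printing each row with "\t".join(map(str, row)) gives a tab-separated file you can grep just like the one above.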

Then we can grep through that file for the ribosomal genes:

grep rna 657322.3.tab | grep Subunit

We find only two rRNA genes, the large and small subunits:

fig|657322.3.rna.5      Large Subunit Ribosomal RNA; lsuRNA; LSU rRNA   FP929046 586941 - 589785 (-)

fig|657322.3.rna.6      Small Subunit Ribosomal RNA; ssuRNA; SSU rRNA   FP929046 590567 - 591540 (-)
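The coordinates we need for trimming can be pulled straight out of these rows. This is a minimal sketch, assuming each row ends with "CONTIG START - STOP (STRAND)" as shown above; rrna_span is a hypothetical helper that merges all rRNA hits on a contig into one span and pads it by 10,000 bp on each side, which reproduces the coordinates used in the trim commands.

```python
import re

# Matches the trailing "CONTIG START - STOP (STRAND)" part of a row
LOC = re.compile(r"(\S+)\s+(\d+)\s+-\s+(\d+)\s+\(([+-])\)\s*$")


def rrna_span(rows, pad=10_000):
    """Return {contig: (keep_until, keep_from)} around all rRNA hits.

    keep_until is the last base kept before the padded rRNA region and
    keep_from is the first base kept after it. Rows that don't match
    LOC (e.g. blank lines) are skipped.
    """
    spans = {}
    for row in rows:
        m = LOC.search(row)
        if not m:
            continue
        contig, a, b = m.group(1), int(m.group(2)), int(m.group(3))
        lo, hi = min(a, b), max(a, b)
        old_lo, old_hi = spans.get(contig, (lo, hi))
        # Grow the span so it covers every rRNA gene on this contig
        spans[contig] = (min(old_lo, lo), max(old_hi, hi))
    # Clamp at 0 so a span near the contig start keeps nothing on the left
    return {c: (max(lo - pad, 0), hi + pad) for c, (lo, hi) in spans.items()}
```

For the two rows above, rrna_span returns {"FP929046": (576941, 601540)}, which are exactly the -e and -b values passed to trim_fasta.py.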

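Now we can trim out the rRNA region and keep only the sequence on either side of it. The first command keeps contig FP929046 up to position 576941 and the second keeps it from position 601540 onward. Note that this trims an extra 10,000 bp on each side of the rRNA genes (the LSU gene starts at 586941 and the SSU gene ends at 591540), but you may not wish to do that: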

python3.7 ~/EdwardsLab/manipulate_genomes/trim_fasta.py -f 657322.3.fna -e 576941 -c FP929046 > FP929046.fna
python3.7 ~/EdwardsLab/manipulate_genomes/trim_fasta.py -f 657322.3.fna -b 601540 -c FP929046 >> FP929046.fna
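Since running the script twice is a little clunky, here is a sketch that does both trims in one pass. It assumes 1-based, inclusive coordinates (keep_until is the last base of the left flank and keep_from the first base of the right flank), which may not match trim_fasta.py's conventions exactly. read_fasta, remove_region, and write_flanks are hypothetical helpers, and the _left/_right suffixes are just illustrative names.

```python
def read_fasta(path):
    """Return {id: sequence}; the id is the header up to the first whitespace."""
    seqs, name = {}, None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if line.startswith(">"):
                name = line[1:].split()[0]
                seqs[name] = []
            elif name is not None:
                seqs[name].append(line)
    return {k: "".join(v) for k, v in seqs.items()}


def remove_region(seq, keep_until, keep_from):
    """Return the left and right flanks of seq (1-based, inclusive coordinates)."""
    return seq[:keep_until], seq[keep_from - 1:]


def write_flanks(fasta_in, fasta_out, contig, keep_until, keep_from):
    """Write both flanks of one contig as two FASTA records (unwrapped)."""
    seq = read_fasta(fasta_in)[contig]
    left, right = remove_region(seq, keep_until, keep_from)
    with open(fasta_out, "w") as out:
        out.write(f">{contig}_left\n{left}\n")
        out.write(f">{contig}_right\n{right}\n")
```

With the coordinates from above, that would be write_flanks("657322.3.fna", "FP929046.fna", "FP929046", 576941, 601540).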

We run the script twice, which is suboptimal, but this is definitely not the most computationally challenging thing we will do with these sequences!