Just a quick note about installing CentOS Stream 9 because I know I am going to have to do this again!
Monthly Archives: February 2022
AlphaFold of all Phage Lambda Proteins
DeepMind’s AlphaFold is winning at predicting tertiary structures from primary amino acid sequences. We thought it would be fun to investigate how it performed on phage Lambda.
We took the NCBI version of λ, extracted all the proteins, and ran them through AlphaFold. It made a prediction for every protein except three: NP_040594.1 (144 amino acids), NP_040597.1 (232 amino acids), and NP_040645.1 (158 amino acids).
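As a rough illustration of the "extract all the proteins" step, here is a minimal Python sketch that splits a multi-record protein FASTA file into one file per protein, which is a convenient shape for running predictions one protein at a time. This is not the script we actually used: the function names `parse_fasta` and `write_single_fastas` are hypothetical, and it assumes you have already saved the λ proteins from NCBI as a FASTA file.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence ID: sequence} from FASTA-formatted lines."""
    records: Dict[str, str] = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("FASTA header with no ID")
            # The ID is the first whitespace-delimited token after ">"
            current_id = fields[0]
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines are concatenated until the next header
            records[current_id] += line
    return records


def write_single_fastas(records: Dict[str, str], outdir: Path) -> List[Path]:
    """Write one FASTA file per protein and return the paths written."""
    outdir.mkdir(parents=True, exist_ok=True)
    paths = []
    for seq_id, seq in records.items():
        path = outdir / f"{seq_id}.fasta"
        path.write_text(f">{seq_id}\n{seq}\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Toy records for illustration only; these are NOT real lambda proteins.
    toy = [">prot_A toy protein", "MKT", "AYI", ">prot_B", "MSE"]
    recs = parse_fasta(toy)
    for seq_id, seq in recs.items():
        print(seq_id, len(seq), "amino acids")
```

With a real file you would pass an open file handle to `parse_fasta` and then call `write_single_fastas(recs, Path("lambda_proteins"))`, where the output directory name is just an example.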
As you can see, many of the structures are predicted to be just long alpha helices with little order, but some are more complex and look closer to what we would expect of a folded protein.
There are, of course, a heap of caveats to this analysis, including the fact that we did not (at this time) filter out any of the existing phage λ structures so one would hope that those are really good!
You can download all the best-ranked structures for phage Lambda and view them in your favorite structure viewer.
NCBI datasets and genome assembly data
Recently, NCBI released their new datasets API that might replace NCBI E-utils. At the moment, datasets is focused on genomes, genes, and viruses, but no doubt it will expand over time. [As an aside: I think the name is terrible, and they should use ncbi_datasets (see this tweet).]
Here is a rough guide to extracting some data about genomes using datasets.