If you are working with UTF-8 documents in Python, you may occasionally run into this error:

`UnicodeDecodeError: 'ascii' codec can't decode byte 0xc2 in position 124106: ordinal not in range(128)`

The fix is trivial!

How can we generate a list of the lengths of all the proteins [in a specific group] in GenBank? It's easy with FTP!
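Once the protein sequences are downloaded in FASTA format from the NCBI FTP site, the counting step takes only a few lines. Here is a minimal Perl sketch, assuming a plain-text FASTA file (one `>` header per protein, with the sequence possibly wrapped over several lines) and taking the first word of each header as the ID. The sub name `fasta_lengths` is hypothetical and not part of any NCBI tool.

```perl
use strict;
use warnings;

# Read FASTA records from a filehandle and return a list of [id, length] pairs.
# The ID is the first word after '>' on each header line.
sub fasta_lengths {
    my ($fh) = @_;
    my @lengths;
    my ($id, $len);
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r$//;    # tolerate Windows line endings
        if ($line =~ /^>(\S*)/) {
            # New record: save the previous one, if any.
            push @lengths, [$id, $len] if defined $id;
            ($id, $len) = ($1, 0);
        } elsif (defined $id) {
            $line =~ s/\s+//g;    # ignore whitespace inside sequence lines
            $len += length $line;
        }
    }
    push @lengths, [$id, $len] if defined $id;    # last record
    return @lengths;
}
```

To print a tab-separated list, append `print "$_->[0]\t$_->[1]\n" for fasta_lengths(\*STDIN);` to the script and pipe the file into it (through `zcat` first if the download is gzip-compressed).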

CentOS is great because it is secure, but not so great because it doesn't ship the latest software. Here is how to install a C++11-capable compiler on CentOS 6 or CentOS 7 and temporarily activate it in a shell. This does not change the default compiler, so it should cause fewer problems with your system (but that is not a money-back guarantee … you are on your own if it does!)

When writing scientific names: italicize family, genus, species, and variety or subspecies. Begin family and genus with a capital letter. Kingdom, phylum, class, order, and suborder begin with a capital letter but are not italicized.

Here are the main taxonomic ranks, from broadest to narrowest:

- Domain
- Kingdom
- Phylum
- Class
- Order
- Family
- Genus
- Species
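The capitalization and italics rules above are mechanical enough to sketch in code. The Perl sub below is a hypothetical helper (`format_taxon` is our own name) that applies them for Markdown output, using `*...*` for italics. It assumes species and lower ranks are passed as the full name starting with the genus (for example a binomial) without abbreviations such as "var." or "subsp.", and it treats domain like kingdom (capitalized, not italicized), which the rules above do not state explicitly.

```perl
use strict;
use warnings;

# Ranks the rules above say to italicize.
my %ITALIC = map { $_ => 1 } qw(family genus species variety subspecies);

# Ranks this sketch knows about (domain handled like kingdom: an assumption).
my %KNOWN = map { $_ => 1 }
    qw(domain kingdom phylum class order suborder family genus species variety subspecies);

# Capitalize the first letter (for species and below this is the genus),
# lower-case the rest, and wrap italicized ranks in Markdown *...*.
sub format_taxon {
    my ($rank, $name) = @_;
    $rank = lc $rank;
    die "Unknown rank: $rank\n" unless $KNOWN{$rank};
    my $text = ucfirst lc $name;
    return $ITALIC{$rank} ? "*$text*" : $text;
}
```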

The 2015 SDSU Metagenomics Workshop is designed to combine lectures, discussions, and practical, hands-on experience to bring people up to date on data analysis for metagenomics.

The workshop is being held in **Adams Humanities Room 2108** from 10 am to 6 pm every day, June 22–26, 2015.

Registration is closed.

The agenda is online here, and will be updated as we progress.

We will use a VirtualBox virtual machine during the class. More information about the image and how to download it is here. (Please note: the image is still subject to change, so don't download it yet!)

We often have people ask us how to convert fastq files to fasta format. We have a variety of code on this website, but sometimes that is not easy enough.

Here are a few ways to do it from the command line: with a Perl script written by Bas, directly with command-line tools, or with prinseq-lite. There is also a C++ version that you can compile (e.g. with `c++ -o fastq2fasta fastq2fasta.cpp`) and run on your machine.

We also have a simple form that converts fastq files to fasta files (DNA only … it does not give you the quality scores).
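The conversion itself is simple, as this minimal Perl sketch shows. It is not Bas's script or the C++ version mentioned above; it assumes standard four-line fastq records (an `@` header, the sequence, a `+` line, and the quality line) with no wrapped sequence lines, and the sub name `fastq_to_fasta` is our own. Like the form, it keeps only the DNA and drops the quality scores.

```perl
use strict;
use warnings;

# Convert four-line fastq records read from $in into fasta written to $out.
# The '@' header becomes a '>' header; the '+' and quality lines are dropped.
sub fastq_to_fasta {
    my ($in, $out) = @_;
    while (my $header = <$in>) {
        next if $header =~ /^\s*$/;    # skip blank lines (e.g. at end of file)
        my $seq  = <$in>;
        my $plus = <$in>;
        my $qual = <$in>;
        die "Truncated fastq record after: $header" unless defined $qual;
        die "Expected '\@' header, got: $header" unless $header =~ /^\@/;
        $header =~ s/^\@/>/;
        print {$out} $header, $seq;
    }
}
```

To use it as a filter, add `fastq_to_fasta(\*STDIN, \*STDOUT);` at the end of the script and run something like `perl fastq2fasta.pl < reads.fastq > reads.fasta` (the file names here are hypothetical).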

We successfully completed a one-day training course for ~40 people on how to use anthill, and everyone is now an expert, right?

The latest version of the anthill training notes is now available at this link: AnthillTrainingNotes

Here we are testing the ROV in a pool.

There are two Perl modules available on CPAN that deal with Chi-squared analysis (`Statistics::ChiSquare` and `Statistics::Distributions`). However, neither one outputs the Chi-squared value for the analysis of two binary populations.

We can use the formula below to calculate the Chi-squared value with one degree of freedom.

$$\chi^2 = \frac{n\,(ad - bc)^2}{(a + b)(c + d)(a + c)(b + d)}, \qquad n = a + b + c + d$$

Where a, b, c and d are the counts in the 2×2 table:

| variable | population 1 | population 2 |
|---|---|---|
| + | a | b |
| – | c | d |

Example:

Suppose we wish to determine whether disease is associated with species, using counts from two species. Both disease status and species are binary variables, so the Chi-squared test can be applied:

Diseased | species 1 | species 2 |
---|---|---|
No | 57 | 36 |
Yes | 63 | 88 |

$$n = 57 + 36 + 63 + 88 = 244$$

$$\chi^2 = \frac{244 \times (57 \times 88 - 36 \times 63)^2}{(57 + 36)(63 + 88)(57 + 63)(36 + 88)} = \frac{244 \times 2748^2}{93 \times 151 \times 120 \times 124} \approx 8.82$$

The critical values of the Chi-squared distribution with 1 degree of freedom, at common significance levels (P), are:

D.F. | P = 0.1 | P = 0.05 | P = 0.025 | P = 0.01 | P = 0.005 |
---|---|---|---|---|---|
1 | 2.71 | 3.84 | 5.02 | 6.63 | 7.88 |

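The $\chi^2$ value (8.82) is greater than the critical value at P = 0.005 (7.88), so P < 0.005.

Since P is well below the conventional 0.05 threshold, we reject the null hypothesis that disease and species are independent. Comparing the proportions, the prevalence of disease is significantly higher in species 2 (88/124, about 71.0%) than in species 1 (63/120, 52.5%).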
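The table lookup above can also be automated. This is a minimal Perl sketch using only the five tabulated critical values for 1 degree of freedom; `p_below` is a hypothetical helper name. It reports the smallest tabulated significance level that the statistic beats, not an exact P-value (an exact value would need the Chi-squared CDF, for example `chisqrprob` in `Statistics::Distributions`).

```perl
use strict;
use warnings;

# Critical values for 1 d.f. from the table above: [significance level, critical value],
# ordered by increasing critical value.
my @CRITICAL_1DF = (
    [0.1,   2.71],
    [0.05,  3.84],
    [0.025, 5.02],
    [0.01,  6.63],
    [0.005, 7.88],
);

# Return the smallest tabulated level whose critical value the statistic exceeds
# (meaning P is below that level), or undef if P > 0.1.
sub p_below {
    my ($chi2) = @_;
    my $level;
    for my $row (@CRITICAL_1DF) {
        my ($alpha, $critical) = @$row;
        $level = $alpha if $chi2 > $critical;
    }
    return $level;
}
```

For the worked example, `p_below(8.82)` returns `0.005`, matching the conclusion that P < 0.005.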

Below is a Perl subroutine to automatically calculate Chi-squared.

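```
use strict;
use warnings;

# Chi-squared statistic (1 d.f.) for the 2x2 table
#   +  a  b
#   -  c  d
sub chi_squared {
    my ($x_a, $x_b, $x_c, $x_d) = @_;    # avoid $a/$b, which Perl uses for sort
    # If any marginal total is zero the statistic is undefined; return 0.
    my @margins = ($x_a + $x_b, $x_c + $x_d, $x_a + $x_c, $x_b + $x_d);
    return 0 if grep { $_ == 0 } @margins;
    my $n = $x_a + $x_b + $x_c + $x_d;
    my $denom = 1;
    $denom *= $_ for @margins;
    return $n * ($x_a * $x_d - $x_b * $x_c)**2 / $denom;
}
print chi_squared(57, 36, 63, 88), "\n";
```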

Output:

`8.81780430153469`