Calculating Chi-squared with Perl

There are two Perl modules on CPAN that deal with Chi-squared analysis (`Statistics::ChiSquare` and `Statistics::Distributions`). However, neither one outputs the Chi-squared value for a comparison of two binary populations.

For a 2×2 table of counts, which has one degree of freedom, we can calculate the Chi-squared value with the formula below.

χ² = [n(ad − bc)²] / [(a + b)(c + d)(a + c)(b + d)]

n = a + b + c + d

Where:

| variable | population 1 | population 2 |
|----------|--------------|--------------|
| +        | a            | b            |
| −        | c            | d            |

Example:
Suppose we wish to determine whether disease status is associated with species. Both disease status and species are binary variables, so the Chi-squared test can be applied:

| Diseased | species 1 | species 2 |
|----------|-----------|-----------|
| No       | 57        | 36        |
| Yes      | 63        | 88        |

n = (57 + 36 + 63 + 88) = 244

χ² = [244 × (57 × 88 − 36 × 63)²] / [(57 + 36)(63 + 88)(57 + 63)(36 + 88)]

χ² = 1,842,566,976 / 208,959,840

χ² = 8.82
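As a cross-check, the shortcut formula should agree with the general Pearson statistic, the sum over the four cells of (observed − expected)² / expected, where each expected count is (row total × column total) / n. The sketch below recomputes the example that way. The name `pearson_chi_squared` is a hypothetical helper, not part of either CPAN module, and the code assumes every row and column total is non-zero.

```perl
use strict;
use warnings;

# Pearson chi-squared for a 2x2 table, computed from expected counts:
# sum over cells of (observed - expected)^2 / expected.
# Hypothetical helper; assumes all row and column totals are non-zero.
sub pearson_chi_squared {
    my ($a, $b, $c, $d) = @_;
    my $n    = $a + $b + $c + $d;
    my @rows = ($a + $b, $c + $d);      # row totals
    my @cols = ($a + $c, $b + $d);      # column totals
    my @obs  = ([$a, $b], [$c, $d]);    # observed counts

    my $chi2 = 0;
    for my $i (0, 1) {
        for my $j (0, 1) {
            # Expected count under independence of rows and columns
            my $expected = $rows[$i] * $cols[$j] / $n;
            $chi2 += ($obs[$i][$j] - $expected)**2 / $expected;
        }
    }
    return $chi2;
}

printf "%.4f\n", pearson_chi_squared(57, 36, 63, 88);    # prints 8.8178
```

Both routes give the same statistic for a 2×2 table. The shortcut simply avoids computing the expected counts explicitly.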

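The critical values of the Chi-squared distribution at 1 degree of freedom, for selected P-values, are:

| D.F. | 0.1  | 0.05 | 0.025 | 0.01 | 0.005 |
|------|------|------|-------|------|-------|
| 1    | 2.71 | 3.84 | 5.02  | 6.63 | 7.88  |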

The χ² value (8.82) exceeds the critical value for P = 0.005 (7.88), so the corresponding P-value is below 0.005.

Since the P-value is less than 0.05 (P<0.05), we reject the null hypothesis that disease status is independent of species. The data suggest that the prevalence of disease is significantly higher in species 2 (88 of 124, about 71%) than in species 1 (63 of 120, about 52.5%).
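Instead of bracketing the P-value with a table of critical values, it can be computed directly when there is one degree of freedom. In that case the statistic is distributed as a squared standard normal variable, so the upper-tail probability is erfc(√(χ² / 2)). The following is a minimal sketch under two assumptions: Perl 5.22 or later, where the core `POSIX` module exports the C99 `erfc` function, and a helper name, `chi2_p_value_1df`, that is purely illustrative.

```perl
use strict;
use warnings;
use POSIX qw(erfc);    # C99 erfc; assumes Perl 5.22 or later

# Upper-tail P-value for a chi-squared statistic with 1 degree of freedom.
# With 1 d.f., P(X >= x) = erfc(sqrt(x / 2)).
# Hypothetical helper name, not from a CPAN module.
sub chi2_p_value_1df {
    my ($chi2) = @_;
    return 1 if $chi2 <= 0;    # a non-positive statistic carries no evidence
    return erfc(sqrt($chi2 / 2));
}

printf "P = %.4f\n", chi2_p_value_1df(8.8178);
```

For the example this gives a P-value of roughly 0.003, consistent with the table lookup (below 0.005). The identity holds only for one degree of freedom; larger tables need a general Chi-squared distribution function.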

Below is a Perl subroutine to automatically calculate Chi-squared.

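```perl
use strict;
use warnings;

# Chi-squared (1 d.f.) for the 2x2 table with rows (a, b) and (c, d)
sub chi_squared {
    my ($a, $b, $c, $d) = @_;
    my $n     = $a + $b + $c + $d;
    my $denom = ($a + $b) * ($c + $d) * ($a + $c) * ($b + $d);
    return 0 if $denom == 0;    # empty row or column: statistic is undefined
    return $n * ($a * $d - $b * $c)**2 / $denom;
}

print chi_squared(57, 36, 63, 88), "\n";
```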

Output:

``8.81780430153469``