xaktly | Probability & Statistics

$\chi^2$ analysis

Comparing distributions


Often we have the results of some statistics or probability experiment and we want to compare the overall distribution of those results to some expected distribution. Here's a classic and easy example that we'll use to develop the concept.

Let's say that there's a candy company that claims that it places a roughly equal number of red, green, purple, blue and yellow candies in each bag, regardless of the size of the bag. So for a bag with 100 pieces of candy, given that there are five varieties, we'd expect about 20 of each. Of course there will be some random variation. We can imagine that at the candy factory, equal numbers of each color/flavor are made, then mixed, and that the mixture is used to fill bags. Thus we would expect some randomness in the number of each flavor per bag.

Our sample

Now let's imagine we purchase a bag of candies to check out the company's claims. We count the pieces and obtain these results:

Color Count
Red 10
Green 13
Purple 9
Blue 12
Yellow 6
Total 50

Now if we'd like to learn about how much this distribution differs from the stated distribution, we'll need to add a column to our table, the expected distribution. Here's the complete table:

Color Observed Expected
Red 10 10
Green 13 10
Purple 9 10
Blue 12 10
Yellow 6 10
Total 50 50

Graphically, here's the comparison we're trying to evaluate:

The black bars represent our sample, and the colored bars are the hypothetical distribution. Now we need a test statistic to evaluate the differences between these two distributions. Here we'll use the Greek letter $\chi$ to define the $\chi^2$ ("chi-squared") statistic:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}}$$

In this case the observed values are our counts and the expected are as in the graph (all equal to 10). Here's the value of our new statistic:

$$ \begin{align} \chi^2 &= \frac{(10 - 10)^2}{10} + \frac{(13-10)^2}{10} \\[5pt] &+ \frac{(9-10)^2}{10} + \frac{(12-10)^2}{10} + \frac{(6-10)^2}{10} \\[5pt] &= \frac{1}{10} (0 + 3^2 + 1^2 + 2^2 + 4^2) \\[5pt] &= \frac{1}{10} (30) = 3.0 \end{align}$$
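The same arithmetic can be checked with a few lines of Python. This is only a minimal sketch of the formula above; the variable names are my own choices.

```python
# Observed counts from our bag of 50 candies
observed = {"Red": 10, "Green": 13, "Purple": 9, "Blue": 12, "Yellow": 6}
total = sum(observed.values())            # 50 candies
expected = total / len(observed)          # 10 of each color if evenly distributed

# Sum of (observed - expected)^2 / expected over all categories
chi_sq = sum((count - expected) ** 2 / expected for count in observed.values())

print(f"chi-squared = {chi_sq:.2f}")      # chi-squared = 3.00
```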

Now here's a picture of the $\chi^2$ distribution, a function of just one parameter, the number of degrees of freedom of the problem, which in this case is one less than the number of categories (5): $5 - 1 = 4.$



Our $\chi^2$ value is $3.00$. The area to the right of that value is the probability, if the colors really are evenly distributed, of getting evidence against that hypothesis at least as strong as ours. That's a P value. We can calculate that area using tables, or on the TI-84 calculator using either the $\chi^2$cdf function or the $\chi^2$-GOF test. The "GOF" stands for "goodness of fit." Here's the command line:

χ2cdf(3.0, 1E99, 4) = 0.5578

So our P value is relatively large by any standard, thus we conclude that we lack evidence to reject the hypothesis that the colors are evenly distributed, and further conclude that the distribution in our bag of candy could easily have occurred by random chance.
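For readers without a TI-84, here is a rough Python equivalent of the upper-tail area that χ2cdf(3.0, 1E99, 4) returns. It is a sketch, not the calculator's actual algorithm: the function name `chi2_upper_tail` is hypothetical, and it relies on a standard recurrence for the upper-tail area that starts from df = 1 or df = 2 and adds one term for every 2 additional degrees of freedom.

```python
import math

def chi2_upper_tail(x, df):
    """Area under the chi-squared density to the right of x (df a positive integer)."""
    if x <= 0:
        return 1.0
    z = x / 2
    # Starting values: df = 2 gives exp(-z); df = 1 gives erfc(sqrt(z))
    if df % 2 == 0:
        a, area = 1.0, math.exp(-z)
    else:
        a, area = 0.5, math.erfc(math.sqrt(z))
    # Each step raises the degrees of freedom by 2 and adds one more term
    while a < df / 2:
        area += math.exp(a * math.log(z) - z - math.lgamma(a + 1))
        a += 1
    return area

p_value = chi2_upper_tail(3.0, 4)
print(f"P = {p_value:.4f}")   # P = 0.5578
```

If SciPy happens to be installed, `scipy.stats.chi2.sf(3.0, 4)` should return the same area.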

Now we'll need to pick this example apart a little to clarify some of the things we did. Then we'll expand the idea of the $\chi^2$ test statistic to compare several distributions at once to see if there are significant differences between them.

$\chi^2$ distributions


The $\chi^2$ distribution is the probability distribution that goes with the $\chi^2$ statistic,

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \tag{1}$$

The only flexible parameter in such a distribution is the number of terms (separated by $+$ signs), which corresponds to the number of categories of data. The $\chi^2$ test applies to categorical data, which means that we're only comparing numbers of individuals per category like "red," "green," and so on.

The number of degrees of freedom per $\chi^2$ distribution is the number of categories $(n)$ minus 1: $\text{df} = n - 1$. Here are some representative $\chi^2$ distributions:

Notice that as the number of degrees of freedom increases, the $\chi^2$ distributions look more like normal distributions. One difference is that there can be no negative values in the $\chi^2$ distribution [see equation (1) above.] Another way to think of it is that we can't have negative counts in categorical data.
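For reference, the curves described above follow the standard $\chi^2$ probability density function. It isn't written out in this section, so take the notation here ($k$ for the number of degrees of freedom) as my own choice:

$$f(x;\, k) = \frac{x^{k/2 - 1}\, e^{-x/2}}{2^{k/2}\, \Gamma(k/2)}, \quad x \gt 0$$

The mean of this distribution is $k$ and its variance is $2k$, so the curves shift to the right and spread out as the number of degrees of freedom grows, which fits the more symmetric, normal-looking shapes at larger df.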


Automation


When the number of categories gets large, you might imagine that calculating $\chi^2$ gets a little time-consuming. Fortunately, there are technological solutions that make it easy. One could use statistical programming packages like Minitab or R, but we'll focus on the TI-84 calculator here.

When we compare two distributions, our null hypothesis is generally something like, "$H_0$: The distribution is the one expected." Our alternative hypothesis is then something like "$H_a$: The distribution is different from the one we expected." These hypotheses are generally written as text like I just did, but you are also free to use mathematical symbols if that makes more sense to you in the context of the problem.

The appropriate test on the TI-84 for problems like this (like our first example) is the $\chi^2$GOF test, accessed through the STAT → TESTS menu. GOF stands for "goodness of fit."


Example 1

Let's say that a pair of dice is rolled 100 times, yielding this set of outcomes (sum of the numbers showing on the two dice):

Sum 2 3 4 5 6 7 8 9 10 11 12
Number 4 4 10 9 14 17 14 10 8 8 2

Is this distribution evidence that these dice might not be fair? That is, is the distribution so far from the expected one that we should suspect some sort of cheating?


Solution: First we need to use the expected distribution of two-dice sums to find the expected distribution of results for 100 rolls. Here's the table of expected outcomes.

Sum Probability Expected/100
2 $\frac{1}{36}$ 2.78
3 $\frac{2}{36}$ 5.56
4 $\frac{3}{36}$ 8.33
5 $\frac{4}{36}$ 11.11
6 $\frac{5}{36}$ 13.89
7 $\frac{6}{36}$ 16.67
8 $\frac{5}{36}$ 13.89
9 $\frac{4}{36}$ 11.11
10 $\frac{3}{36}$ 8.33
11 $\frac{2}{36}$ 5.56
12 $\frac{1}{36}$ 2.78

Now we enter the observed and expected distributions into two lists, L1 and L2. Here's what that looks like:

Then we'll run the $\chi^2$GOF test, found in STAT → TESTS.

Enter the list names for the observed and expected distributions, and the number of degrees of freedom. In this case $df = 11 - 1 = 10$.

Hitting "CALCULATE" gives these results.

The P value is large by any standard, so we conclude that the observed distribution is not evidence that there is something amiss with these dice. This distribution could have arisen just due to random chance.

Here's the graph that goes with this calculation. It's easy to see that most of the area of this distribution is covered by our P value, so that makes sense.

Just for drill, let's put in another observed distribution, this time with nearly equal instances of each category (sum of dice), $S = \{10, 10, 11, 7, 7, 9, 9, 11, 12, 6, 8\}$. If we re-perform our test with these observations, we get $P = 2.4 \times 10^{-6}$, which is small enough to reject the hypothesis that the distribution is the expected one. In this case, we'd suspect that there's something amiss with these dice.
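Both dice calculations can be sketched in Python without a calculator. The helper name `gof_test` is hypothetical, and its P value uses a closed-form tail area that is valid only for an even number of degrees of freedom (here df = 10). The sketch uses unrounded expected counts, so its results can differ slightly in the last digit from values based on the rounded table above.

```python
from itertools import product
from math import exp, factorial

def gof_test(observed, expected):
    """Return (chi-squared, df, P) for a goodness-of-fit test with even df."""
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    if df % 2:
        raise ValueError("this closed-form P value needs an even df")
    half = chi_sq / 2
    # Upper-tail area for even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
    p_value = exp(-half) * sum(half ** i / factorial(i) for i in range(df // 2))
    return chi_sq, df, p_value

# Count how many of the 36 equally likely (die 1, die 2) outcomes give each sum
ways = {s: 0 for s in range(2, 13)}
for a, b in product(range(1, 7), repeat=2):
    ways[a + b] += 1
expected = [100 * ways[s] / 36 for s in range(2, 13)]   # expected counts in 100 rolls

first = [4, 4, 10, 9, 14, 17, 14, 10, 8, 8, 2]          # the original observations
second = [10, 10, 11, 7, 7, 9, 9, 11, 12, 6, 8]         # the nearly uniform set S

for label, counts in (("first", first), ("second", second)):
    chi_sq, df, p_value = gof_test(counts, expected)
    print(f"{label}: chi-squared = {chi_sq:.2f}, df = {df}, P = {p_value:.2g}")
```

The first set gives a P value close to 1 and the second a P value on the order of $10^{-6}$, in line with the conclusions above.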


Example 2


In simple genetics experiments, we expect cross-breeding to produce offspring in predictable ratios. For example, if we perform a dihybrid cross (as Gregor Mendel did) on pea plants with dominant (capital letters) and recessive (lower case) genes, where

  • Y = yellow peas
  • y = green peas
  • R = round, smooth peas
  • r = wrinkled peas

then we expect a $9:3:3:1$ ratio of round-yellow, round-green, wrinkled-yellow and wrinkled-green peas, respectively. Imagine that we perform such a dihybrid cross experiment and obtain the following results:

Observation Observed Exp. ratio Expected
Round, yellow 315 $\frac{9}{16}$ 312.8
Round, green 108 $\frac{3}{16}$ 104.2
Wrinkled, yellow 101 $\frac{3}{16}$ 104.2
Wrinkled, green 32 $\frac{1}{16}$ 34.8
Total 556 $1$ 556

Solution: In the table above we already started our solution by using the expected ratios (9:3:3:1) to generate a set of expected gene combinations out of the 556 total plants observed (far-right column). If we perform a $\chi^2$-GOF test using the TI-84, we get

  • $\chi^2 = 0.4776$

  • $P = 0.9238$

  • $\text{df} = 3$

That's a large P value by any measure, so we lack evidence to reject the null hypothesis that the expected $9:3:3:1$ distribution holds for these data. We conclude that the fluctuations we observed could easily be due to random chance in our observation. That's the way of the universe.
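The expected counts in the table are rounded to one decimal place, and that rounding shows up in the third decimal of the statistic. The sketch below (variable and function names are mine) scales the 9:3:3:1 ratio to 556 plants and computes $\chi^2$ both ways.

```python
observed = [315, 108, 101, 32]   # round-yellow, round-green, wrinkled-yellow, wrinkled-green
ratio = [9, 3, 3, 1]             # expected 9:3:3:1 phenotype ratio
total = sum(observed)            # 556 plants

def chi_squared(obs, exp):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

exact = [total * r / sum(ratio) for r in ratio]   # 312.75, 104.25, 104.25, 34.75
rounded = [312.8, 104.2, 104.2, 34.8]             # expected counts as listed in the table

chi_exact = chi_squared(observed, exact)
chi_rounded = chi_squared(observed, rounded)

print(f"exact expected counts:   chi-squared = {chi_exact:.4f}")    # 0.4700
print(f"rounded expected counts: chi-squared = {chi_rounded:.4f}")  # 0.4776
```

Either way the statistic is tiny for 3 degrees of freedom, so the conclusion does not change.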

Practice problems


  1. According to PPG, one of the largest suppliers of colors and painting equipment for cars, in 2020 the popularity of car colors in the U.S. had this distribution:

    Color Wh Bk Bl Gr Br Red Sil Gray
    Pct. 34 18 9 2 5 8 12 12

    Wh = white, Bk = black, Bl = blue, Gr = green, Br = brown/natural, Sil = silver

    A random survey of the colors of 553 cars was performed in the same year in a certain city yielding this distribution:

    Color Wh Bk Bl Gr Br Red Sil Gray
    Count 180 104 25 37 84 22 17 84

    Is there any evidence to suggest that the car-color distribution in this city is significantly different from the national average?

    Solution

    First we need to use the national color percentages in the first table to prepare the expected array of colors by multiplying each percent by the number of cars in our sample, $n = 553$. Here are the two lists we'll put in our $\chi^2$GOF analysis:

    Color Wh Bk Bl Gr Br Red Sil Gray
    Obs 180 104 25 37 84 22 17 84
    Exp 188.02 99.54 49.77 11.06 27.65 44.24 66.36 66.36

    The $\chi^2$ goodness-of-fit test yields $\chi^2 = 241.13$ and $P \approx 2.1 \times 10^{-48}$, a very small P value, so we conclude that the cars in this sample do not follow the expected distribution.


  2. Six identical six-sided dice are rolled 100 times. Each time all six are rolled, the number of dice that come up with an even number is recorded. Here are the results of one trial:

    n Evens Observed
    0 2
    1 10
    2 18
    3 35
    4 30
    5 4
    6 1

    Use a $\chi^2$ goodness of fit test to determine whether these dice are producing an outcome outside of what would be expected for six fair dice.

    Solution

    In principle, this experiment should follow a binomial distribution: $X \rightarrow B(6, 0.5)$, with probabilities of the form $P(n, k) = \binom{n}{k} p^k (1-p)^{n-k}$, where $n = 6$ and $k$ is the number of evens showing on the dice.

    Here's our table of observed outcomes with the expected values for 100 rolls, rounded to whole numbers:

    n Evens Observed Prob. Expected
    0 2 0.0156 2
    1 10 0.0938 9
    2 18 0.2344 23
    3 35 0.3125 31
    4 30 0.2344 23
    5 4 0.0938 9
    6 1 0.0156 2

    If we make a calculator list out of the observed and expected columns and run a $\chi^2$-GOF test on it, the results are

    • $\chi^2 = 7.1224$

    • $P = 0.3097$

    • $\text{df} = 6$

    Our null hypothesis is that the distribution of evens is as we calculated. This P value is relatively large compared to any significance ($\alpha$) level we'd choose, so we fail to reject that hypothesis and conclude that the observed distribution could have occurred by random chance. We have no evidence that anything is wrong with these six dice.


  3. A high school has reason to believe that student absences are higher for math classes than for the school as a whole. They select 100 students at random and search last year's attendance records for their math-class attendance.

    Absences per term   Test days   Expected
    0-2                 35          50
    3-5                 40          30
    6-8                 20          12
    9-11                1           6
    ≥12                 4           2
    Totals              100         100

    The results are shown in the table. Determine whether the number of absences in math classes is significantly different from the general attendance of students at this school.


    Notice that these data have been put into categorical form by binning the results into categories, 0-2, 3-5, and so on. The $\chi^2$ test can only determine differences between categorical distributions.

    Solution

    The total number of students in each of the observed and expected columns are the same, so we don't need to do any calculations to scale the expected numbers. We can simply do the $\chi^2$-GOF test on a calculator,

    χ2-GOF(L1, L2, 4)

    The results are

    • $\chi^2 = 19.33$

    • $P = 0.000675$

    • $\text{df} = 4$


    This is a small P value. In the case, for example, where we perform this test at an α level of $\alpha = 0.05$, the P value is small by comparison, so we have ample evidence to reject the hypothesis that the two distributions are equal to within random-chance differences and conclude that there is something different about math class attendance.
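The probabilities and expected counts used in practice problem 2 can be reproduced with a short sketch like the one below. The variable names are my own, and the expected counts are rounded to whole numbers only because the table in that solution does so (which is also why they add up to 99 rather than 100).

```python
from math import comb

n_dice, p_even, trials = 6, 0.5, 100
observed = [2, 10, 18, 35, 30, 4, 1]     # how often 0, 1, ..., 6 evens were seen

# Binomial probability of k evens among n_dice fair dice
probs = [comb(n_dice, k) * p_even**k * (1 - p_even)**(n_dice - k)
         for k in range(n_dice + 1)]

# Expected counts for 100 rolls, rounded to whole numbers as in the table
expected = [round(trials * p) for p in probs]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(expected)                          # [2, 9, 23, 31, 23, 9, 2]
print(f"chi-squared = {chi_sq:.4f}")     # chi-squared = 7.1224
```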

Comparing multiple distributions


Often we're interested in comparing more than one distribution to see if there is significant evidence that they are or are not the same, to within the fluctuations we'd expect from random chance. We do this in a similar way to the $\chi^2$ goodness-of-fit test above, just over a whole table of observed data.

As an example, let's say our client is wondering whether gender is a factor in political party preference in their area. We might make a survey of a random sample of 500 voters, asking about gender and political preference, and come up with these results:

Observed values
  Democrat Republican Independent Total
Male 92 118 40 250
Female 114 91 45 250
Total 206 209 85 500

Now if we expect the party affiliations to be unaffected by gender, we'd expect each kind of comparison between two parties,

  • Democrat ↔ Republican
  • Democrat ↔ Independent
  • Republican ↔ Independent

to have the same distribution between males and females. We can set up a table of expected values based on this equality expectation and on the numbers of individuals in each category (Democrat, Republican, Independent) whom we surveyed. That table will look like this:

Expected values
  Democrat Republican Independent Total
Male 103.0 104.5 42.5 250
Female 103.0 104.5 42.5 250
Total 206 209 85 500
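One common way to build such a table, sketched here in Python with names of my own choosing, is to compute each expected cell as (row total × column total) / (grand total). That is what "unaffected by gender" means numerically: every row gets the same party proportions as the survey as a whole.

```python
observed = [
    [92, 118, 40],    # Male:   Democrat, Republican, Independent
    [114, 91, 45],    # Female: Democrat, Republican, Independent
]

row_totals = [sum(row) for row in observed]           # [250, 250]
col_totals = [sum(col) for col in zip(*observed)]     # [206, 209, 85]
grand_total = sum(row_totals)                         # 500

# Expected count in each cell = row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for row in expected:
    print(row)    # [103.0, 104.5, 42.5] for both rows
```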

Now we can calculate a $\chi^2$ statistic that includes each of the six cells in our table. Let's do it by hand here just once before we turn to automation.

$$ \begin{align} \chi^2 &= \frac{(92-103)^2}{103.0} + \frac{(118-104.5)^2}{104.5} \\[5pt] &+ \frac{(40-42.5)^2}{42.5} + \frac{(114-103)^2}{103.0} \\[5pt] &+ \frac{(91-104.5)^2}{104.5} + \frac{(45-42.5)^2}{42.5} \end{align}$$

$$ \begin{align} \chi^2 &= 1.1748 + 1.7440 + 0.1470 \\[5pt] &+ 1.1748 + 1.7440 + 0.1470 \\[5pt] &= 6.1316 \end{align}$$

Now we need to find the P value for this $\chi^2$ value in a distribution with the correct number of degrees of freedom. In this case that number is $df = (3-1)\times(2-1) = 2$. That is, the number of degrees of freedom is the product of the numbers of categories in each direction of the table (in this case, political parties and genders), each decreased by 1. Here is the $\chi^2$ distribution with $\chi^2 = 6.1316$ indicated.



The P value indicated in the graph can be found using the $\chi^2$cdf function on the calculator like this:

χ2cdf(6.1316, 1E99, 2) = 0.0466

In that command line we've used $1E99$ or $1 \times 10^{99}$ instead of infinity because it's about the largest number the calculator can handle. Actually, just about any sufficiently large number will work. Try it yourself. Using 1000 for the endpoint is just fine because the height of the function at $x = 1000$ is so vanishingly small.
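The "try it yourself" experiment can also be run in Python. This sketch leans on the fact that for 2 degrees of freedom the cumulative $\chi^2$ distribution has the simple closed form $1 - e^{-x/2}$; the function name `cdf_df2` and the list of endpoints are my own choices.

```python
from math import exp

chi_sq = 6.1316

def cdf_df2(x):
    """Chi-squared cumulative distribution for 2 degrees of freedom: 1 - exp(-x/2)."""
    return 1 - exp(-x / 2)

# Area between chi_sq and each upper endpoint, mimicking χ2cdf(6.1316, upper, 2)
areas = {}
for upper in (10, 20, 1000, 1e99):
    areas[upper] = cdf_df2(upper) - cdf_df2(chi_sq)
    print(f"upper endpoint {upper:g}: area = {areas[upper]:.4f}")
```

An endpoint of 10 is too small (it gives about 0.0399), but 20, 1000 and $10^{99}$ all give 0.0466 to four decimal places.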

Because this P value is relatively small ($P = 0.047 \lt 0.05$, for example), we have evidence to conclude that party preference is not independent of gender. We ought to do more work in this area if we're to uncover any trends, though.


Automation (TI-84)


The test we did above can be automated in many ways, including through statistical software packages and using spreadsheets. Here we'll run through performing these tests using the TI-84 calculator's built-in program $\chi^2$test. This only requires the core data, just the distributions among categories. For example, the input from our last example would just be the $2\times 3$ table

92 118 40
114 91 45

The totals at the end of each column and row and the overall total number of individuals can easily be calculated by the appropriate program. That means the table of expected values can also be calculated and the $\chi^2$ test can be run. All we need is a way to input that table of data, and on the TI-84 we do that using a matrix.

Let's run through it for this example with calculator screen shots. First we'll need to enter the core data into a matrix. The matrix functions on the TI-84 are located using the [2ND]→[x^-1] buttons. Arrow to the right to select EDIT and edit matrix [A]. You can edit up to 10 matrices on the TI-84.

First change the dimensions of the matrix (upper right) to 2 × 3 — 2 rows and 3 columns. The matrix will readjust and allow you to type in values across, then down. Fill in the matrix as I've done here.

Now you can run the $\chi^2$-test program. It needs an input matrix, which you've prepared as matrix [A]. It also needs an output matrix to hold the expected values, just in case you want to take a look at those. Choose any matrix you're not using and the program will reset the dimensions as needed; they'll match the dimensions of the input matrix, of course.



Running the test will give you the output we've already calculated the long way:



... but it's a lot easier. Now you can use this program to determine distribution differences for a wide variety of problems in statistics.

Note: Just because we enter the data for this test as a matrix, doesn't mean we're using matrix methods to solve the problem. Matrices are extremely useful for solving all kinds of problems in mathematics, but this isn't one of them. The fact that you enter your data into a matrix here is just an artifact of the way this program was written.


Example 3

A hospital chain offered five influenza vaccine types to its employees and kept (voluntary) records of who received which vaccine and whether they contracted influenza. The results are summarized in the table below. Perform a statistical analysis to determine whether there are any significant differences between the vaccines in terms of what proportion of people got or did not get the flu.

Influenza incidence
Vaccine type   Got influenza   Avoided influenza   Total
A              43              237                 280
B              52              198                 250
C              25              245                 270
D              48              212                 260
E              57              233                 290
Total          225             1125                1350

Solution: The data from 43 ... 233 were entered into a 5×2 matrix and that was used as input for a $\chi^2$-test on a TI-84 calculator. The results were

  • $\chi^2 = 16.555$

  • $P = 0.00236$

  • $\text{df} = 4$

This is a small P value by any measure, so we conclude that there are significant differences between the outcomes produced by these 5 vaccines. One would want to do more work, perhaps focusing in on vaccine C, whose recipients contracted influenza at a noticeably lower rate (25 of 270, about 9%) than recipients of the other four (roughly 15% to 21%).
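As a cross-check on the calculator output, here is a sketch of the same test in plain Python, followed by the fraction of each vaccine's recipients who got influenza. The names are hypothetical, and the one-line P value formula is valid only for 4 degrees of freedom, which is what this 5×2 table has.

```python
from math import exp

vaccines = ["A", "B", "C", "D", "E"]
observed = [          # [got influenza, avoided influenza]
    [43, 237],
    [52, 198],
    [25, 245],
    [48, 212],
    [57, 233],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Accumulate (observed - expected)^2 / expected over all ten cells
chi_sq = 0.0
for row, r_tot in zip(observed, row_totals):
    for obs, c_tot in zip(row, col_totals):
        exp_count = r_tot * c_tot / n
        chi_sq += (obs - exp_count) ** 2 / exp_count

df = (len(observed) - 1) * (len(col_totals) - 1)      # (5 - 1)(2 - 1) = 4

# Closed-form upper-tail area, valid only for df = 4: exp(-x/2) * (1 + x/2)
p_value = exp(-chi_sq / 2) * (1 + chi_sq / 2)

print(f"chi-squared = {chi_sq:.3f}, df = {df}, P = {p_value:.5f}")
# chi-squared = 16.555, df = 4, P = 0.00236

# Fraction of each vaccine's recipients who got influenza
flu_rate = {name: row[0] / r_tot for name, row, r_tot in zip(vaccines, observed, row_totals)}
for name, rate in flu_rate.items():
    print(f"{name}: {rate:.3f}")
```

The last loop shows why vaccine C stands out: its influenza rate is the lowest of the five.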



Practice problems


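  1. A survey of 9th grade students from rural, suburban and urban schools asked whether students valued getting good grades, being popular or being able to play sports in school. The results are in the table below. Determine whether there is evidence that these priorities are distributed differently among the three types of school.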


    Goal         Rural School   Suburban School   Urban School   Total
    Grades       57             87                24             168
    Popularity   50             42                6              98
    Sports       42             22                5              69

    Solution

    The matrix of expected values (output from the TI-84 $\chi^2$-test program) is


    Goal         Rural School   Suburban School   Urban School
    Grades       74.7           75.7              17.6
    Popularity   43.6           44.2              10.2
    Sports       30.7           31.1              7.2

    The results are

    • $\chi^2 = 18.56$
    • $P = 0.00096$
    • $\text{df} = 4$

    This is a small P value, so we conclude that these distributions are separated by more than random chance; they are different. Below is a set of graphs to bring the difference home. There is clearly a difference in the educational priorities between these groups. Note that the y-axes are of different scales, but the proportions of the bars are still meaningful.



artifact

In math and science, an artifact is something that happens as the result of applying a certain method to a problem. It is usually expected and easily corrected for if needed. For example, in some X-ray scans, parts of the image might look like a tumor, but qualified interpreters might recognize the object as having resulted from some well-known phenomenon having to do with the imaging technique.

xaktly.com by Dr. Jeff Cruzan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. © 2012-2024, Jeff Cruzan. All text and images on this website not specifically attributed to another source were created by me and I reserve all rights as to their use. Any opinions expressed on this website are entirely mine, and do not necessarily reflect the views of any of my employers. Please feel free to send any questions or comments to jeff.cruzan@verizon.net.