
Section 6.1 Analysis of Variance

Testing Multiple Means.

In chapter five, we saw how a hypothesis test could be used to decide if there is evidence that two populations have the same mean. What do we do, however, if we want to look at more than two populations? Consider the following example.

A tomato farmer wishes to determine if there is any difference in the heights of tomato plants grown with three different types of fertilizer. To test this, he grows four tomato plants each using fertilizers \(A\text{,}\) \(B\text{,}\) and \(C\text{.}\) He finds the following data. How can this data be used to test the farmer's claim?

Fertilizer A: 32, 29, 34, 31
Fertilizer B: 30, 27, 33, 32
Fertilizer C: 28, 31, 29, 30
Table 6.1.2. Tomato Plant Heights
Solution

We could test this using three hypothesis tests for the mean heights observed using fertilizers A, B, and C.

  1. First Test.
    \begin{align*} H_0\amp:\ \mu_A = \mu_B\\ H_A\amp:\ \mu_A \not= \mu_B \end{align*}
  2. Second Test.
    \begin{align*} H_0\amp:\ \mu_A = \mu_C\\ H_A\amp:\ \mu_A \not= \mu_C \end{align*}
  3. Third Test.
    \begin{align*} H_0\amp:\ \mu_B = \mu_C\\ H_A\amp:\ \mu_B \not= \mu_C \end{align*}

If in each of the three tests we fail to reject the null hypothesis, then there is no evidence that the fertilizers correspond to different heights.

There is, however, a problem. In each of these hypothesis tests, there is a chance that we make the wrong decision. For example, suppose we used the \(\alpha = 0.05\) significance level on each test. The complement rule tells us that there is a \(1 - \alpha = 0.95\) probability that we correctly fail to reject the null hypothesis when two means really are equal. But if we do this three times, then by the multiplication rule the probability that we correctly identify all three pairs of means as equal is only

\begin{equation*} (0.95)(0.95)(0.95) \approx 0.8574\text{.} \end{equation*}

The more means we add to this mix (suppose there were, for example, four fertilizers) the worse our chances become. The above is an example of how conducting multiple tests increases the probability that we make an error. To remedy this, we will introduce a way to check if multiple means are equal using a single test. This process is called an Analysis of Variance test, or ANOVA test for short.
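
To make this concrete, here is a short Python sketch (purely illustrative, not part of the ANOVA procedure itself) that applies the multiplication and complement rules above to show how quickly the chance of at least one wrong rejection grows as more independent tests are run at the \(\alpha = 0.05\) level.

```python
# Illustrative sketch: probability of at least one false rejection when
# several independent tests are each run at the alpha = 0.05 level.
alpha = 0.05

for k in range(1, 7):  # k = number of pairwise tests performed
    p_all_correct = (1 - alpha) ** k          # multiplication rule
    p_some_error = 1 - p_all_correct          # complement rule
    print(f"{k} tests: P(all correct) = {p_all_correct:.4f}, "
          f"P(at least one error) = {p_some_error:.4f}")
```

For three tests this reproduces the 0.8574 figure above; with six tests the chance that every decision is correct drops to about 0.735.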

Definition 6.1.3.

An ANOVA Test, short for Analysis of Variance test, is used to test hypotheses concerning three or more population means.

In this section, we will see how to set up and conduct this type of test.

Subsection 6.1.1 ANOVA Hypotheses

Because the Analysis of Variance test is a single hypothesis test, it uses a single null and alternative hypothesis involving the means of our populations. In an ANOVA test, the various populations are often referred to as the treatment groups, because the test was originally designed to detect differences between groups in an experiment that received different treatments. The null hypothesis for an ANOVA test is that the different treatments all have the same mean. The alternative is that at least one of those means is different. This is summarized below.

Let's see what this would look like in the example from the first page of this lesson.

A tomato farmer wishes to determine if there is any difference in the heights of tomato plants grown with three different types of fertilizer. To test this, he grows four tomato plants each using fertilizers A, B, and C. He finds the following data. What hypotheses should be used to test the farmer's claim?

Fertilizer A: 32, 29, 34, 31
Fertilizer B: 30, 27, 33, 32
Fertilizer C: 28, 31, 29, 30
Table 6.1.6. Tomato Plant Heights
Solution

To conduct an ANOVA test of the farmer's claim, the null hypothesis is that the average heights found in all three groups of tomatoes are the same. These three groups represent the three treatments—the different types of fertilizer. Thus, the null and alternative hypotheses are:

\begin{align*} H_0\amp:\ \mu_A = \mu_B = \mu_C\\ H_A\amp:\ \text{at least one group of plants has a different mean height} \end{align*}

Notice that the null hypothesis says that all three means are equal. The opposite of that is not \(\mu_A \not= \mu_B \not=\mu_C\text{,}\) which says that none of the means are equal. If the mean heights for fertilizers A and B were the same, but C was different, the null hypothesis would still be false. So we usually just state in words that “at least one of the means is different” instead of trying to write out our alternative hypothesis symbolically.

Figure 6.1.7. ANOVA Hypotheses I
Figure 6.1.8. ANOVA Hypotheses II

Suppose that the ANOVA null hypothesis for a certain test is:

\begin{equation*} H_0:\ \mu_1 = \mu_2 = \mu_3 = \mu_4 \end{equation*}

Question: what is the alternative to this null hypothesis?

Answer

\(H_A:\ \text{At least one of the means is not equal}\)

A consumer watchdog group wishes to determine if three toothpaste brands have the same whitening power. To test this, they collect samples from all three types of toothpastes and record their whitening power. They plan to conduct an ANOVA test using this data.

Question: if \(\mu_1\text{,}\) \(\mu_2\text{,}\) and \(\mu_3\) are the average whitening powers of the three toothpaste brands, what should the null hypothesis be for this ANOVA test?

Answer

\(H_0:\ \mu_1 = \mu_2 = \mu_3\)

Subsection 6.1.2 ANOVA Tables

Computing the test statistic for an ANOVA test can, unfortunately, be somewhat involved. The basic idea is that we are going to compare the variation within treatment groups to the variation between treatment groups. This will allow us to determine how much of the overall variation in our data is due to the randomness of our samples, and how much is due to an actual difference between treatment means. While the mathematics of how these variations are computed is beyond the scope of this text, in the example below, we examine a graphical representation of these variations to help us understand what is being computed.

Use a graph to illustrate and compare the variation in heights within each group of tomato plants to that between the groups of tomato plants seen in Example 6.1.5. The heights are given in the table below.

Fertilizer A: 32, 29, 34, 31
Fertilizer B: 30, 27, 33, 32
Fertilizer C: 28, 31, 29, 30
Table 6.1.12. Tomato Plant Heights
Solution

In the figure below, each tomato plant height is represented by a black shape. They are grouped by the type of fertilizer used.

Figure 6.1.13. Illustration of Tomato Plant Heights

As we try to understand the variation in our data, observe the following from this illustration.

  • Each of the three groups overlaps—that is, the highest value in each group is larger than the lowest value in either of the other groups and the lowest value in each group is lower than the highest value in each of the others.

  • The overall mean of all twelve data points is relatively close to the individual means of each group.

  • If we measure the horizontal variation within each group and compare that to the variation between the means of the groups, we see that they are very similar—one is not clearly larger than the other.
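
A picture along these lines can be drawn with a few lines of code. The following Python sketch (assuming the matplotlib package is available) is only meant to suggest how such an illustration could be produced; it plots each height against its fertilizer group and marks the group means and the overall mean.

```python
# Sketch of a plot like Figure 6.1.13: each height is plotted beside its
# fertilizer group, with group means and the overall mean marked.
import matplotlib.pyplot as plt

groups = {
    "A": [32, 29, 34, 31],
    "B": [30, 27, 33, 32],
    "C": [28, 31, 29, 30],
}

all_values = [x for g in groups.values() for x in g]
grand_mean = sum(all_values) / len(all_values)

for i, (name, heights) in enumerate(groups.items()):
    plt.scatter(heights, [i] * len(heights), color="black")
    group_mean = sum(heights) / len(heights)
    plt.vlines(group_mean, i - 0.2, i + 0.2, colors="blue")   # group mean

plt.axvline(grand_mean, linestyle="--", color="red")          # overall mean
plt.yticks(range(len(groups)), list(groups.keys()))
plt.xlabel("Height")
plt.ylabel("Fertilizer")
plt.show()
```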

Creating a graph like this any time we wish to conduct an ANOVA test would be involved and would require a lot of “guess-work” when analyzing it. Instead, we use the mathematical computations mentioned earlier to create what is called an ANOVA table. While we won't go into the details of how each entry in the table is computed, it is important to understand what the entries represent and how they are related to each other.

Definition 6.1.14.

A table for a one-way analysis of variance test of \(k\) different treatment groups containing a total of \(n\) values, called an ANOVA Table, has the following entries.

Source df Sum Squares Mean Squares \(F_\text{test}\)
Treatments \(k - 1\) SST MST MST/MSE
Error \(n - k\) SSE MSE
Total \(n - 1\) TSS
Table 6.1.15. General ANOVA Table

The components of the table are:

  • Treatment degrees of freedom (df).

    The number of treatments, which we call \(k\text{,}\) minus one.

  • Error degrees of freedom (df).

    The total number of values, \(n\text{,}\) minus the number of treatments, \(k\text{.}\)

  • Total degrees of freedom.

    Sum of the treatment and error degrees of freedom: \((k-1) + (n - k) = n - 1\text{.}\)

  • Sum of the Squares of the Treatments (SST).

    This measures the variation between the treatment groups.

  • Sum of the Squares of the Error (SSE).

    This measures the variation within the treatment groups, due to random sampling “error”.

  • Total Sum of the Squares (TSS).

    Again, the total is the sum of treatment and error values: TSS = SST + SSE.

  • Mean Squares of the Treatments (MST).

    This is SST divided by the treatment degrees of freedom, \(k-1\text{.}\)

  • Mean Squares of the Error (MSE).

    This is SSE divided by the error degrees of freedom, \(n-k\text{.}\)

  • Test Statistic (\(F_\text{test}\)).

    This is the ratio of the average variation due to differences between treatments to the average variation due to sampling error (MST / MSE).

Just because we will not learn how to compute these values doesn't mean we can't build ANOVA tables. Software packages that do statistics, such as Excel, can create ANOVA tables for us. Our task will be to identify and interpret the values contained in such tables.

A tomato farmer wishes to determine if there is any difference in the heights of tomato plants grown with three different types of fertilizer. To test this, he grows four tomato plants each using fertilizers A, B, and C. He finds the following data.

Fertilizer A: 32, 29, 34, 31
Fertilizer B: 30, 27, 33, 32
Fertilizer C: 28, 31, 29, 30
Table 6.1.17. Tomato Plant Heights

A computer program indicates that the sum of the squares of the treatments is SST = 8 and the total sum of the squares is TSS = 47. Use this information to construct an ANOVA table for the farmer's data.

Solution

We will analyze the data and fill in the ANOVA table one column at a time.

  • Degrees of Freedom.

    In our height data we see a total of twelve measurements, so \(n=12\text{.}\) We have three different treatment groups (the three fertilizers), so \(k = 3\text{.}\) This gives us the following degrees of freedom:

    • Treatments: \(k - 1 = 3 - 1 = 2\)

    • Error: \(n - k = 12 - 3 = 9\)

    • Total: \(n - 1 = 12 - 1 = 11\)

    As a double-check, we verify that the sum of the degrees of freedom for the treatments and error is equal to the total degrees of freedom, and indeed, \(2 + 9 = 11\text{.}\)

  • Sum of the Squares.

    We are told that SST = 8 and TSS = 47. Since we know that SST + SSE = TSS, we can solve:

    \begin{equation*} 8 + \text{SSE} = 47 \Rightarrow \text{SSE} = 47 - 8 = 39 \end{equation*}
  • Mean Squares.

    We get the mean squares by dividing the sum of the squares by the degrees of freedom. In particular,

    • Treatment: \(\text{MST} = \text{SST} / \text{df} = 8 / 2 = 4\)

    • Error: \(\text{MSE} = \text{SSE} / \text{df} = 39 / 9 \approx 4.3333\)

    Note that there is no “total mean square” to compute.

  • \(F_\text{test}\) Statistic.

    Finally, we compute our test statistic by dividing the mean squares of the treatment by the mean squares of the error. This gives

    \begin{equation*} F_\text{test} = \text{MST} / \text{MSE} = 4 / 4.3333 \approx 0.923 \end{equation*}

This results in the following ANOVA table.

Source df Sum Squares Mean Squares \(F_\text{test}\)
Treatments \(2\) \(8\) \(4\) \(0.923\)
Error \(9\) \(39\) \(4.3333\)
Total \(11\) \(47\)
Table 6.1.18. ANOVA Table for Tomato Plant Heights
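
If you would like to see where these entries come from, the short Python sketch below reproduces the table from the raw heights using only the relationships listed in Definition 6.1.14. The variable names are illustrative; in practice a package such as Excel would produce the table for us.

```python
# Sketch: reproduce the entries of Table 6.1.18 from the raw heights.
groups = {
    "A": [32, 29, 34, 31],
    "B": [30, 27, 33, 32],
    "C": [28, 31, 29, 30],
}

all_values = [x for g in groups.values() for x in g]
n = len(all_values)          # total number of observations (12)
k = len(groups)              # number of treatment groups (3)
grand_mean = sum(all_values) / n

# Sum of squares between treatments (SST) and within treatments (SSE)
sst = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values())
sse = sum((x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g)
tss = sst + sse

mst = sst / (k - 1)          # mean square for treatments
mse = sse / (n - k)          # mean square for error
f_test = mst / mse           # ANOVA test statistic

print(f"df: {k - 1}, {n - k}, {n - 1}")
print(f"SST = {sst}, SSE = {sse}, TSS = {tss}")
print(f"MST = {mst}, MSE = {mse:.4f}, F = {f_test:.3f}")
```

(If the scipy package is available, scipy.stats.f_oneway(*groups.values()) should report the same \(F\) statistic as a cross-check.)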

As we have stated, the \(F_\text{test}\) statistic measures how much of the variation within our samples can be attributed to differences between the treatments. The larger the value of this test statistic, the more variation there is between the populations, and the less likely it is that the means are all the same. In order to make decisions based on this test statistic, we will need to know what distribution it comes from.

Figure 6.1.19. ANOVA Tables I
Figure 6.1.20. ANOVA Tables II

Consider the following ANOVA table.

Source df Sum Squares Mean Squares \(F_\text{test}\)
Treatments \(5\) \(4.562\) \(0.912\) \(2.980\)
Error \(12\) \(3.667\) \(0.306\)
Total \(17\) \(8.229\)
Table 6.1.22. ANOVA Table

Question: how many individuals were sampled in total? That is, what is \(n\text{?}\)

Answer

18

Consider the following ANOVA table.

Source df Sum Squares Mean Squares \(F_\text{test}\)
Treatments \(5\) \(4.562\) \(0.912\) \(2.980\)
Error \(12\) \(3.667\) \(0.306\)
Total \(17\) \(8.229\)
Table 6.1.24. ANOVA Table

Question: how many populations are involved? In other words, what is \(k\text{?}\)

Answer

6

Subsection 6.1.3 The F Distribution

The test statistic in an ANOVA test is called \(F_\text{test}\) because it comes from the F-distribution. The F-distribution is a continuous probability distribution much like the normal and t-distributions with which we've already worked. Like the t-distribution, the F-distribution is actually a family of distributions depending on a pair of degrees of freedom—one for the numerator and one for the denominator. There are, however, several properties of the F-distribution which are different.

Definition 6.1.25.

The f-distribution is a family of probability distributions depending on two degrees of freedom—one for the numerator and one for the denominator. The f-distribution has the following properties:

  • values in the f-distribution are non-negative (zero or greater)

  • the f-distribution curve is skewed to the right

  • The larger the two degrees of freedom become, the more mound-shaped the f-distribution becomes

The picture below shows the f-distribution with several different pairs of degrees of freedom.

Figure 6.1.26. Various f-Distributions

We could use a table of critical values to look up probabilities for the f-distribution, or to look up critical values for a given probability in the tail. However, because we will only use this distribution in this one section, and will rely on technology for constructing our ANOVA tables, it also makes sense to rely on technology to compute critical values and probabilities from the f-distribution.
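
For instance, in Python the scipy.stats.f object can supply both critical values and tail probabilities. The sketch below is one possible way to do this and uses the degrees of freedom from the tomato example; any comparable tool works just as well.

```python
# Sketch: critical values and tail probabilities from the F-distribution
# using scipy.stats.f (one of many possible tools).
from scipy import stats

df_treat, df_error = 2, 9    # degrees of freedom from the tomato example

# Critical value with alpha = 0.05 of area in the right tail
f_crit = stats.f.ppf(1 - 0.05, df_treat, df_error)
print(f"F_0.05 critical value: {f_crit:.3f}")    # about 4.256

# Right-tail probability for a given test statistic (its p-value)
p_value = stats.f.sf(0.923, df_treat, df_error)
print(f"P(F > 0.923) = {p_value:.3f}")
```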

Recall that the f-distribution has the following probability density curve.

Figure 6.1.28. f-Distribution Density Curve

Question: which of the following is not a property of the f-distribution?

  1. It has both positive and negative values.

  2. The probability under the curve is one.

  3. It is mound shaped.

  4. It is symmetric.

  5. It is skewed left.

  6. It is skewed right.

Answer

(a), (c), (d), and (e) are not properties of the f-distribution

Recall that the f-distribution has the following probability density curve.

Figure 6.1.30. f-Distribution Density Curve

Question: which of the following is a property of the f-distribution?

  1. It has both positive and negative values.

  2. It has no negative values.

  3. It is skewed right.

  4. The area under the curve is 0.5.

  5. It has two peaks.

  6. It is symmetric.

Answer

(b) and (c)

Subsection 6.1.4 ANOVA Test

Recall that the \(F_\text{test}\) statistic is larger when the variation due to differences between treatments is bigger than the variation due to sampling error. Because the null hypothesis in an ANOVA test is that the treatment groups all have the same mean, we will only reject it if we get a large test statistic. Therefore, all ANOVA tests are right-tailed tests.

Figure 6.1.31. f-Distribution Density Curve

As with the normal and t-distributions, we will reject the null hypothesis when our test statistic is further into the tail than the critical value allows. In the picture above, the critical value is approximately 3, and the red region is the rejection region. As mentioned previously, critical values for ANOVA tests in this class will always be supplied or found using technology, so we will not need to learn how to read an f-distribution table.

A tomato farmer wishes to determine if there is any difference in the heights of tomato plants grown with three different types of fertilizer. To test this, he grows four tomato plants each using fertilizers A, B, and C. He finds the following data.

Fertilizer A: 32, 29, 34, 31
Fertilizer B: 30, 27, 33, 32
Fertilizer C: 28, 31, 29, 30
Table 6.1.33. Tomato Plant Heights

This data leads to the following ANOVA table.

Source df Sum Squares Mean Squares \(F_\text{test}\)
Treatments \(2\) \(8\) \(4\) \(0.923\)
Error \(9\) \(39\) \(4.3333\)
Total \(11\) \(47\)
Table 6.1.34. ANOVA Table for Tomato Plant Heights

If the critical value from the f-distribution with 2 and 9 degrees of freedom at the \(\alpha = 0.05\) significance level is \(F_{0.05} = 4.256\text{,}\) what conclusion should we make?

Solution

The test statistic is \(F_\text{test} = 0.923\) while the critical value is \(F_\alpha = 4.256\text{.}\) This situation is shown below.

Figure 6.1.35. Rejection Region in f-Distribution

To reject the null hypothesis we would need our test statistic to be further into the right tail than 4.256. But \(0.923 \lt 4.256\text{,}\) so we must fail to reject the null hypothesis. There is no evidence that these fertilizers produce tomato plants with different average heights.
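
Expressed as a computation, the decision made here is just a right-tail comparison. The sketch below uses the two values quoted in this example; the variable names are only illustrative.

```python
# Right-tailed ANOVA decision for the tomato example, using the values
# quoted in the text (F_test = 0.923, critical value F_0.05 = 4.256).
f_test = 0.923
f_crit = 4.256

if f_test > f_crit:
    print("Reject H0: at least one mean height appears to differ.")
else:
    print("Fail to reject H0: no evidence of a difference in mean heights.")
```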

Figure 6.1.36. ANOVA Test I
Figure 6.1.37. ANOVA Test II

Consider the following ANOVA table.

Source df Sum Squares Mean Squares \(F_\text{test}\)
Treatments \(5\) \(4.562\) \(0.912\) \(2.980\)
Error \(12\) \(3.667\) \(0.306\)
Total \(17\) \(8.229\)
Table 6.1.39. ANOVA Table

Question: if the critical value from the f-distribution with 5 and 12 degrees of freedom is \(F_{0.05} = 3.106\text{,}\) what decision do you make at the \(\alpha = 0.05\) significance level?

Answer

Fail to Reject the Null Hypothesis

Consider the following ANOVA table.

Source df Sum Squares Mean Squares \(F_\text{test}\)
Treatments \(5\) \(4.562\) \(0.912\) \(2.980\)
Error \(12\) \(3.667\) \(0.306\)
Total \(17\) \(8.229\)
Table 6.1.41. ANOVA Table

Question: if the critical value from the f-distribution with 5 and 12 degrees of freedom is \(F_{0.10} = 2.394\text{,}\) what decision do you make at the \(\alpha = 0.10\) significance level?

Answer

Reject the Null Hypothesis
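
Both checkpoint decisions above can be confirmed with a short computation. The sketch below (assuming the scipy package is available) looks up the critical value at each significance level and applies the right-tailed decision rule.

```python
# Sketch: check the two decisions above using scipy.stats.f.
from scipy import stats

f_test = 2.980               # test statistic from the ANOVA table
df_treat, df_error = 5, 12   # treatment and error degrees of freedom

for alpha in (0.05, 0.10):
    f_crit = stats.f.ppf(1 - alpha, df_treat, df_error)
    decision = "Reject H0" if f_test > f_crit else "Fail to reject H0"
    print(f"alpha = {alpha}: critical value = {f_crit:.3f} -> {decision}")
```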