
Section 5.4 Hypothesis Tests for Differences Between Means and Proportions

Testing Claims about Differences.

Just as we constructed confidence intervals for the difference between two population means or proportions in Section 4.4, we can also conduct hypothesis tests for the difference between two means or proportions. The most common tests for differences are tests which seek to determine if two population parameters are equal to each other (so their difference is zero) or if one is greater than the other (so their difference is greater than or less than zero). Consider the following tests.

  • Early childhood education researchers wish to determine if babies whose parents spend time reading to them will have more success in school than babies who are not read to. To test this claim, they select a sample of 100 high school seniors who were read to as infants and 100 seniors who were not read to as infants. The mean G.P.A. for those who were read to was found to be 2.46 with a standard deviation of 0.77. The mean G.P.A. for the students who were not read to was found to be 2.33 with a standard deviation of 0.86.

  • An independent senator believes that she has equal support among members of both the Republican and Democrat parties. To test this belief, she commissions a study in which 340 Republicans and 418 Democrats are polled. 138 of the Republicans and 157 of the Democrats are found to support the senator.

In this section we will review how to state the null and alternative hypotheses for examples such as those above, present the test statistic formula for these differences, and finish by conducting both traditional and p-value tests.

Subsection 5.4.1 Formulating Hypotheses

When formulating hypotheses for the comparison of two means or two proportions, we will rephrase these comparisons in terms of the difference. For example, if we claim that proportions from two populations are equal,

\begin{equation*} p_1 = p_2\text{,} \end{equation*}

we would state the null hypothesis as

\begin{equation*} p_1 - p_2 = 0\text{.} \end{equation*}

The possible null and alternative hypotheses for comparing two population proportions are listed below.
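
As a sketch in the notation of this section (a reconstruction that matches the worked examples below, not a quotation of the original listing), the three standard pairings are:

\begin{align*} \text{Two-tailed: } \amp H_0:\ p_1 - p_2 = 0, \quad H_A:\ p_1 - p_2 \not= 0\\ \text{Right-tailed: } \amp H_0:\ p_1 - p_2 \leq 0, \quad H_A:\ p_1 - p_2 \gt 0\\ \text{Left-tailed: } \amp H_0:\ p_1 - p_2 \geq 0, \quad H_A:\ p_1 - p_2 \lt 0\text{.} \end{align*}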

In each of these tests, the assumption from the null hypothesis is that \(p_1 = p_2\text{,}\) or in other words \(p_1 - p_2 = 0\text{.}\)

When dealing with means, however, we sometimes want a little more flexibility. Instead of saying that the mean of one population is larger than the mean of another, we may wish to say how much larger. For example, the statement “dogs live at least 5 years longer than cats” can be written as

\begin{equation*} \mu_1 \gt \mu_2 + 5 \quad \text{ or }\quad \mu_1 - \mu_2 \gt 5\text{.} \end{equation*}

To get this added flexibility, we state our null and alternative hypotheses in terms of some difference \(d_0\text{,}\) which would have been 5 in this example. If we are testing a claim that two means are equal to each other, we set \(d_0 = 0\text{.}\) In most of our tests, we will use \(d_0 = 0\text{.}\)
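
Written out with a general \(d_0\text{,}\) and assuming the same three tail conventions used elsewhere in this chapter, the hypotheses for a difference between means take one of the following forms:

\begin{align*} \text{Two-tailed: } \amp H_0:\ \mu_1 - \mu_2 = d_0, \quad H_A:\ \mu_1 - \mu_2 \not= d_0\\ \text{Right-tailed: } \amp H_0:\ \mu_1 - \mu_2 \leq d_0, \quad H_A:\ \mu_1 - \mu_2 \gt d_0\\ \text{Left-tailed: } \amp H_0:\ \mu_1 - \mu_2 \geq d_0, \quad H_A:\ \mu_1 - \mu_2 \lt d_0\text{.} \end{align*}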

Let's look at several examples involving these hypotheses.

Early childhood education researchers wish to determine if babies whose parents spend time reading to them will have more success in school than babies who are not read to. To test this claim, they select a sample of 100 high school seniors who were read to as infants and 100 seniors who were not read to as infants. The mean G.P.A. for those who were read to was found to be 2.46 with a standard deviation of 0.77. The mean G.P.A. for the students who were not read to was found to be 2.33 with a standard deviation of 0.86. Formulate hypotheses for this test.

Solution

This is a claim about two population means. The researchers believe that those in population 1, the students whose parents read to them, will have a higher mean G.P.A. The null hypothesis is that it makes no difference, so in other words the two means are equal. Therefore the difference is \(d_0 = 0\text{.}\) This gives the following hypotheses:

\begin{align*} H_0\amp:\ \mu_1 - \mu_2 \leq 0\\ H_A\amp:\ \mu_1 - \mu_2 > 0\text{.} \end{align*}

An independent senator believes that she has equal support among members of both the Republican and Democrat parties. To test this belief, she commissions a study in which 340 Republicans and 418 Democrats are polled. 138 of the Republicans and 157 of the Democrats are found to support the senator. Formulate hypotheses for this test.

Solution

This is a claim about two population proportions. The senator believes that the proportions of Republicans (\(p_R\)) and Democrats (\(p_D\)) who support her are equal. Thus, the hypotheses are:

\begin{align*} H_0\amp:\ p_R - p_D = 0\\ H_A\amp:\ p_R - p_D \not= 0\text{.} \end{align*}
Figure 5.4.5. Hypotheses for Differences I
Figure 5.4.6. Hypotheses for Differences II

A veterinarian believes that dogs and cats have, on average, the same number of offspring in each birth. To test this claim, she observes that in 96 cat pregnancies, the average number of offspring was 4.9 with a standard deviation of 1.26 offspring. In 85 dog pregnancies, the vet observed an average of 3.7 offspring with a standard deviation of 0.84 offspring.

Question: what null hypothesis should the vet use to test her claim?

Answer

\(\mu_1 = \mu_2\)

An IRS agent believes that tax fraud is more prevalent on income tax returns where the gross adjusted income is more than $200,000. He takes a sample of 400 returns with income of less than $200,000 and finds that 12 of them are fraudulent. He also takes a sample of 300 returns with more than $200,000 reported income and finds that 15 of them are fraudulent.

Question: if those making under $200,000 are population 1, what should your alternative hypothesis be in this test?

Answer

\(p_1 \lt p_2\)

Subsection 5.4.2 Test Statistic for a Difference Between Means

The test statistic for a difference between means measures how unusual the difference between our two sample means would be if the assumed difference from the null hypothesis is true. This measure of “unusualness” is again a z-score from the normal distribution. To find that z-score, we look at the difference between our observed difference and the assumed difference, and then divide that by the standard deviation for the difference between sample means. That mouthful is represented symbolically below.
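
Writing \(\bar x_1, \bar x_2\) for the sample means, \(s_1, s_2\) for the sample standard deviations, and \(n_1, n_2\) for the sample sizes (notation assumed here so that it matches the worked examples in this subsection), the description above corresponds to:

\begin{equation*} z_{\text{test}} = \frac{(\bar x_1 - \bar x_2) - d_0}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\text{.} \end{equation*}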

The following examples show this computation.

Early childhood education researchers wish to determine if babies whose parents spend time reading to them will have more success in school than babies who are not read to. To test this claim, they select a sample of 100 high school seniors who were read to as infants and 100 seniors who were not read to as infants. The mean G.P.A. for those who were read to was found to be 2.46 with a standard deviation of 0.77. The mean G.P.A. for the students who were not read to was found to be 2.33 with a standard deviation of 0.86. Find the test statistic for the difference between these sample means.

Solution

In Example 5.4.3, we formulated the following hypotheses.

\begin{align*} H_0\amp:\ \mu_1 - \mu_2 \leq 0\\ H_A\amp:\ \mu_1 - \mu_2 > 0 \end{align*}

From this null hypothesis, we assume \(d_0 = 0\) giving the following test statistic for these samples:

\begin{equation*} z_{\text{test}} = \frac{(2.46 - 2.33) - 0}{\sqrt{\frac{(0.77)^2}{100} + \frac{(0.86)^2}{100}}} \approx 1.13\text{.} \end{equation*}

A pet lover believes that dogs live on average at least 5 years longer than cats. To test this claim, he collects data on 63 randomly selected dogs, and 55 randomly selected cats. The average lifespan of the dogs is found to be 18.7 years, with a standard deviation of 3.1 years. The average lifespan for the sample of cats is 12.3 years with a standard deviation of 1.9 years. Find the test statistic for the difference between these sample means.

Solution

Because we are testing the claim that \(\mu_1\) is at least 5 more than \(\mu_2\text{,}\) our hypotheses will be:

\begin{align*} H_0\amp:\ \mu_1 - \mu_2 \leq 5\\ H_A\amp:\ \mu_1 - \mu_2 > 5 \end{align*}

Under this null hypothesis, the test statistic for the above samples is:

\begin{equation*} z_{\text{test}} = \frac{(18.7 - 12.3) - 5}{\sqrt{\frac{(3.1)^2}{63} + \frac{(1.9)^2}{55}}} \approx 3.00\text{.} \end{equation*}
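
The arithmetic in the two examples above can be checked with a few lines of Python. This is a minimal sketch rather than part of the original text: the function name `two_mean_z` and its parameter names are our own, and it assumes, as the examples do, that the samples are large enough for the sample standard deviations to be used in a z-score.

```python
from math import sqrt


def two_mean_z(mean1, sd1, n1, mean2, sd2, n2, d0=0.0):
    """z statistic for a difference between means, assuming mu1 - mu2 = d0."""
    # Standard deviation of the difference between the two sample means.
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    # Observed difference minus assumed difference, in standard-error units.
    return ((mean1 - mean2) - d0) / standard_error


# G.P.A. example: students who were read to are population 1, d0 = 0.
z_gpa = two_mean_z(2.46, 0.77, 100, 2.33, 0.86, 100)

# Lifespan example: dogs are population 1, and the assumed difference is 5 years.
z_pets = two_mean_z(18.7, 3.1, 63, 12.3, 1.9, 55, d0=5)

print(round(z_gpa, 2), round(z_pets, 2))
```

Keeping \(d_0\) as a parameter with a default of zero mirrors the way most tests in this section use \(d_0 = 0\) while still allowing claims such as "5 years longer."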
Figure 5.4.12. Test Statistic for Differences Between Means I
Figure 5.4.13. Test Statistic for Differences Between Means II

A veterinarian believes that dogs and cats have, on average, the same number of offspring in each birth. To test this claim, she observes that in 96 cat pregnancies, the average number of offspring was 4.9 with a standard deviation of 1.26 offspring. In 85 dog pregnancies, the vet observed an average of 3.7 offspring with a standard deviation of 0.84 offspring. The cat population is designated as population number one.

Question: what is the test statistic? Round your answer to two decimal places.

Answer

7.61

A widget manufacturer uses two assembly lines to build widgets. The quality control engineer believes that the average weight of a widget made by the first assembly line is greater than the average weight of a widget made by the second assembly line. To test this theory he takes a sample of widgets from each assembly line and finds the following information.

Source              Sample Size      Mean          Standard Dev.
Assembly Line #1    \(n_1 = 120\)    12.2 ounces   0.72 ounces
Assembly Line #2    \(n_2 = 120\)    11.9 ounces   0.81 ounces
Table 5.4.16. Widget Statistics

Question: what is the test statistic for this hypothesis test?

Answer

3.03

Subsection 5.4.3 Test Statistic for a Difference Between Proportions

When computing the test statistic for a difference between proportions, we again want to measure how unusual the observed difference is. However, because our null hypotheses will always use the assumption that the two proportions are equal, the test statistic formula is slightly simpler.
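
In symbols, with \(\hat p_1\) and \(\hat p_2\) the sample proportions and \(\hat p_{\text{pooled}}\) the pooled estimate given in Definition 5.4.18, the statistic used in the worked examples of this subsection has the form below. The notation is assumed here so that it agrees with those examples.

\begin{equation*} z_{\text{test}} = \frac{\hat p_1 - \hat p_2}{\sqrt{\frac{\hat p_{\text{pooled}}(1 - \hat p_{\text{pooled}})}{n_1} + \frac{\hat p_{\text{pooled}}(1 - \hat p_{\text{pooled}})}{n_2}}}\text{.} \end{equation*}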

The null hypothesis asserts that \(p_1 = p_2\text{,}\) but doesn't tell us what that proportion of successes is. We must approximate \(p\) using the two samples that were drawn from these populations. While it is unlikely that these two sample proportions will equal \(p\) exactly, or even each other, by pooling them into a single proportion \(\hat p_{\text{pooled}}\text{,}\) we can get an estimate for the populations' common proportion \(p\text{.}\)

Definition 5.4.18.

The pooled estimate for a proportion based on the sample proportions \(\hat p_1\) and \(\hat p_2\) is:

\begin{equation*} \hat p_{\text{pooled}} = \frac{n_1 \hat p_1 + n_2 \hat p_2}{n_1 + n_2}\text{.} \end{equation*}

If your sample data is reported in terms of number of successes instead of proportion of successes, then you should use \(x_1\) in place of \(n_1 \hat p_1\) in the above formula, and similarly for \(x_2\text{.}\) Let's see how this pooling works in the examples below.

An independent senator believes that she has equal support among members of both the Republican and Democrat parties. To test this belief, she commissions a study in which 340 Republicans and 418 Democrats are polled. 138 of the Republicans and 157 of the Democrats are found to support the senator. Find the test statistic for this hypothesis test.

Solution

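We have already seen the hypotheses for this test. Writing \(p_1\) for the Republican proportion \(p_R\) and \(p_2\) for the Democrat proportion \(p_D\text{,}\) they are: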

\begin{align*} H_0\amp:\ p_1 - p_2 = 0\\ H_A\amp:\ p_1 - p_2 \not= 0 \end{align*}

Under the assumption that \(p_1 = p_2 = p\) for some population proportion \(p\text{,}\) we must approximate \(p\) using a pooled estimate.

\begin{equation*} \hat p_{\text{pooled}} = \frac{138 + 157}{340+418} = \frac{295}{758} \approx 0.3892\text{.} \end{equation*}

Plugging this pooled estimate in for \(p\) in the test statistic formula above yields:

\begin{equation*} z_{\text{test}} = \frac{138/340 - 157/418}{\sqrt{\frac{(0.3892)(0.6108)}{340} + \frac{(0.3892)(0.6108)}{418}}} \approx 0.85\text{.} \end{equation*}

An educator believes that the proportion of females in the US who have completed college is greater than the proportion of males. To test this claim, a sample of 600 women is randomly selected and 227 of them are found to have completed college. A sample of 570 men is randomly selected and only 192 of them are found to have completed college. Find the test statistic for this hypothesis test.

Solution

The claim in this test is that \(p_W\text{,}\) the proportion of women who finish college, is bigger than \(p_M\text{,}\) the proportion of men who finish college. This leads to the following hypotheses.

\begin{align*} H_0\amp:\ p_W - p_M \leq 0\\ H_A\amp:\ p_W - p_M > 0 \end{align*}

If we assume that \(p_W = p_M = p\text{,}\) we must approximate \(p\) using a pooled proportion from the samples.

\begin{equation*} \hat p_{\text{pooled}} = \frac{227 + 192}{600+570} = \frac{419}{1170} \approx 0.3581\text{.} \end{equation*}

Using this in our test statistic formula yields the following test statistic.

\begin{equation*} z_{\text{test}} = \frac{ 227/600 - 192/570}{\sqrt{\frac{(0.3581)(0.6419)}{600} + \frac{(0.3581)(0.6419)}{570}}} \approx 1.48\text{.} \end{equation*}
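
The pooling step and the test statistic in the two examples above can be reproduced with a short Python sketch. The helper name `pooled_proportion_z` is hypothetical, and the code works from success counts \(x_1\) and \(x_2\) rather than from sample proportions, as the remark after Definition 5.4.18 suggests.

```python
from math import sqrt


def pooled_proportion_z(x1, n1, x2, n2):
    """Return (pooled estimate, z statistic) under the assumption p1 = p2."""
    # Pool the successes from both samples into one estimate of the common p.
    p_pooled = (x1 + x2) / (n1 + n2)
    # Standard deviation of the difference between sample proportions,
    # using the pooled estimate in place of the unknown p.
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / standard_error
    return p_pooled, z


# Senator example: Republicans are sample 1, Democrats are sample 2.
p_senator, z_senator = pooled_proportion_z(138, 340, 157, 418)

# College example: women are sample 1, men are sample 2.
p_college, z_college = pooled_proportion_z(227, 600, 192, 570)

print(round(p_senator, 4), round(z_senator, 2))
print(round(p_college, 4), round(z_college, 2))
```

Because the code carries the pooled estimate at full precision instead of rounding it to four decimal places first, later digits may differ slightly from a hand computation, but the two-decimal test statistics agree.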
Figure 5.4.21. Test Statistic for Differences Between Proportions I
Figure 5.4.22. Test Statistic for Differences Between Proportions II

An IRS agent believes that tax fraud is more prevalent on income tax returns where the gross adjusted income is more than $200,000. He takes a sample of 400 returns with income of less than $200,000 and finds that 12 of them are fraudulent. He also takes a sample of 300 returns with more than $200,000 reported income and finds that 15 of them are fraudulent. Suppose that tax returns for those making over $200,000 make up population one.

Question: what is the test statistic for this test?

Answer

1.36

A used car salesperson believes that a larger proportion of sports cars sold on her lot are red than the proportion of sedans that are red. To test this hypothesis, she collects the following samples.

Vehicle Type   Sample Size      Number Red
Sports Cars    \(n_1 = 73\)     \(x_1 = 21\)
Sedans         \(n_2 = 129\)    \(x_2 = 33\)
Table 5.4.25. Used Car Statistics

Question: what is the test statistic for this situation?

Answer

0.49

Subsection 5.4.4 The Traditional Test

Conducting a hypothesis test for a difference between means or proportions requires a different set of hypotheses, and a different test statistic formula. However, once we have the test statistic, the rest of the hypothesis test works just as it did for single means or proportions. On this page, we will finish two of the previously seen examples using the traditional test method.

Early childhood education researchers wish to determine if babies whose parents spend time reading to them will have more success in school than babies who are not read to. To test this claim, they select a sample of 100 high school seniors who were read to as infants and 100 seniors who were not read to as infants. The mean G.P.A. for those who were read to was found to be 2.46 with a standard deviation of 0.77. The mean G.P.A. for the students who were not read to was found to be 2.33 with a standard deviation of 0.86. Conduct a traditional hypothesis test at the \(\alpha = 0.05\) significance level.

Solution

As seen before, the hypotheses are:

\begin{align*} H_0\amp:\ \mu_1 - \mu_2 \leq 0\\ H_A\amp:\ \mu_1 - \mu_2 > 0 \end{align*}

From this null hypothesis, we computed the test statistic:

\begin{equation*} z_{\text{test}} = \frac{(2.46 - 2.33) - 0}{\sqrt{\frac{(0.77)^2}{100} + \frac{(0.86)^2}{100}}} \approx 1.13\text{.} \end{equation*}

Because the alternative hypothesis involves “\(\gt\)”, this is a right-tailed test. Therefore, at the \(\alpha = 0.05\) significance level, our critical value is \(z_{0.05} = 1.645\) as shown below.

Figure 5.4.27. Critical Region for Example 5.4.26

Since the test statistic is not larger than the critical value, it is not in that right-tailed rejection region. We must therefore fail to reject the null hypothesis. There is no statistically significant evidence that G.P.A.s are higher for those seniors who were read to as infants.

An educator believes that the proportion of females in the US who have completed college is greater than the proportion of males. To test this claim, a sample of 600 women is randomly selected and 227 of them are found to have completed college. A sample of 570 men is randomly selected and only 192 of them are found to have completed college. Test this educator's claim using a traditional hypothesis test at the \(\alpha = 0.10\) significance level.

Solution

From previous work, with women as population 1 and men as population 2, we have hypotheses:

\begin{align*} H_0\amp:\ p_1 - p_2 \leq 0\\ H_A\amp:\ p_1 - p_2 > 0 \end{align*}

The pooled proportion for the population is:

\begin{equation*} \hat p_{\text{pooled}} = \frac{227 + 192}{600+570} = \frac{419}{1170} \approx 0.3581\text{.} \end{equation*}

Using this in our test statistic formula yielded the following test statistic.

\begin{equation*} z_{\text{test}} = \frac{ 227/600 - 192/570}{\sqrt{\frac{(0.3581)(0.6419)}{600} + \frac{(0.3581)(0.6419)}{570}}} \approx 1.48\text{.} \end{equation*}

Now because the alternative hypothesis involves “\(\gt\)”, this is a right-tailed test. At the \(\alpha = 0.10\) significance level, the critical value is \(z_{0.10} = 1.28\) as shown below.

Figure 5.4.29. Critical Region for Example 5.4.28

Because the test statistic 1.48 is further into the right tail than 1.28, it is in the rejection region. We therefore reject the null hypothesis. There is evidence tending towards significance that a higher proportion of women have finished college than men.
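
The critical-value comparison in the two examples above can be sketched in Python with the standard library's `statistics.NormalDist`. The function name and the tail labels `"right"`, `"left"`, and `"two"` are our own choices for illustration, and the test statistics are entered as the rounded values found earlier.

```python
from statistics import NormalDist


def critical_value_decision(z_test, alpha, tail):
    """Traditional test: return (critical value, True if H0 is rejected)."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        return critical, z_test > critical
    if tail == "left":
        critical = std_normal.inv_cdf(alpha)
        return critical, z_test < critical
    if tail == "two":
        # Split alpha evenly between the two tails.
        critical = std_normal.inv_cdf(1 - alpha / 2)
        return critical, abs(z_test) > critical
    raise ValueError("tail must be 'right', 'left', or 'two'")


# G.P.A. example: right-tailed test at alpha = 0.05.
gpa_critical, gpa_reject = critical_value_decision(1.13, 0.05, "right")

# College completion example: right-tailed test at alpha = 0.10.
college_critical, college_reject = critical_value_decision(1.48, 0.10, "right")

print(round(gpa_critical, 3), gpa_reject)
print(round(college_critical, 2), college_reject)
```

Computing the critical value from `inv_cdf` replaces the table lookup, but the decision rule is the same: reject only when the test statistic lies beyond the critical value in the direction named by the alternative hypothesis.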

Figure 5.4.30. Traditional Hypothesis Test for Differences
Figure 5.4.31. Traditional Hypothesis Test for Differences

A veterinarian believes that dogs and cats have, on average, the same number of offspring in each birth. To test this claim, she observes that in 96 cat pregnancies, the average number of offspring was 4.9 with a standard deviation of 1.26 offspring. In 85 dog pregnancies, the vet observed an average of 3.7 offspring with a standard deviation of 0.84 offspring. The cat population is designated as population number one.

Question: what conclusion do you reach using a traditional hypothesis test at the \(\alpha = 0.01\) significance level?

Answer

Reject the Null Hypothesis

A used car salesperson believes that a larger proportion of sports cars sold on her lot are red than the proportion of sedans that are red. To test this hypothesis, she collects the following samples.

Vehicle Type   Sample Size      Number Red
Sports Cars    \(n_1 = 73\)     \(x_1 = 21\)
Sedans         \(n_2 = 129\)    \(x_2 = 33\)
Table 5.4.34. Used Car Statistics

Question: what conclusion do you reach using a traditional hypothesis test at the \(\alpha = 0.05\) significance level?

Answer

Fail to Reject the Null Hypothesis

Subsection 5.4.5 The p-Value Test

As with the traditional hypothesis test, the p-value test is the same for testing claims about differences as it was for testing claims about individual population parameters. The following examples show how the p-value test works for tests of differences.

A pet lover believes that dogs live on average at least 5 years longer than cats. To test this claim, he collects data on 63 randomly selected dogs, and 55 randomly selected cats. The average lifespan of the dogs is found to be 18.7 years, with a standard deviation of 3.1 years. The average lifespan for the sample of cats is 12.3 years with a standard deviation of 1.9 years. Conduct a p-value test to see if the pet lover's claim has merit.

Solution

As seen earlier in this section, the hypotheses are:

\begin{align*} H_0\amp:\ \mu_1 - \mu_2 \leq 5\\ H_A\amp:\ \mu_1 - \mu_2 > 5 \end{align*}

Under this null hypothesis, the test statistic for the above samples was:

\begin{equation*} z_{\text{test}} = \frac{(18.7 - 12.3) - 5}{\sqrt{\frac{(3.1)^2}{63} + \frac{(1.9)^2}{55}}} \approx 3.00\text{.} \end{equation*}

Now because the alternative hypothesis involves “\(\gt\)”, this is a right-tailed test. The p-value for the test statistic is the area of the region shown below.

Figure 5.4.36. Critical Region for Example 5.4.35

This gives us

\begin{equation*} P(Z > 3.00) = 1 - 0.9987 = 0.0013\text{,} \end{equation*}

which is smaller than all of our standard significance levels of 0.10, 0.05, and 0.01. We therefore reject the null hypothesis at each of these significance levels. There is highly significant evidence that dogs live at least 5 years longer than cats.
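
As a quick check of the right-tail area above, the sketch below uses Python's `statistics.NormalDist` in place of the standard normal table. The variable names are our own, and the test statistic is entered as the rounded value 3.00, so the result should match the table value to about four decimal places.

```python
from statistics import NormalDist

z_test = 3.00  # rounded test statistic from the lifespan example

# Right-tailed test: the p-value is the area to the right of z_test.
p_value = 1 - NormalDist().cdf(z_test)

# Compare the p-value with each of the three common significance levels.
reject_at = {alpha: p_value < alpha for alpha in (0.10, 0.05, 0.01)}

print(round(p_value, 4))
print(reject_at)
```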

You may have noticed that we did not give a significance level at which to conduct our test in this last example. When a p-value test is being conducted, we sometimes don't state a significance level as part of the problem statement. Instead, once we have the p-value for the test, we compare it with all three of the Common Significance Levels to see at which levels, if any, we can reject the null hypothesis.

An independent senator believes that she has equal support among members of both the Republican and Democrat parties. To test this belief, she commissions a study in which 340 Republicans and 418 Democrats are polled. 138 of the Republicans and 157 of the Democrats are found to support the senator. Conduct a p-value test to determine if the senator's support levels are different.

Solution

We have already seen that, with Republicans as population 1 and Democrats as population 2, the hypotheses are:

\begin{align*} H_0\amp:\ p_1 - p_2 = 0\\ H_A\amp:\ p_1 - p_2 \not= 0 \end{align*}

Our pooled estimate for the common proportion was:

\begin{equation*} \hat p_{\text{pooled}} = \frac{138 + 157}{340+418} = \frac{295}{758} \approx 0.3892\text{.} \end{equation*}

Finally, the test statistic was:

\begin{equation*} z_{\text{test}} = \frac{138/340 - 157/418}{\sqrt{\frac{(0.3892)(0.6108)}{340} + \frac{(0.3892)(0.6108)}{418}}} \approx 0.85\text{.} \end{equation*}

Now as the alternative hypothesis involves “\(\not =\)”, this is a two-tailed test. The p-value is therefore the probability of being further into either tail than the test statistic of 0.85. This is twice the area in the right tail, as shown.

Figure 5.4.38. Critical Region for Example 5.4.37

So, from the standard normal distribution table, the p-value is:

\begin{equation*} 2\times P(Z > 0.85) = 2(1 - 0.8023) = 2(0.1977) = 0.3954\text{.} \end{equation*}

Therefore, if the null hypothesis is true and support levels are equal among Republicans and Democrats, we would see a difference between sample proportions at least this large about 39.5% of the time. That is not unusual. The p-value of 0.3954 is larger than all common significance levels, 0.10, 0.05, and 0.01. We therefore fail to reject the null hypothesis. There is no statistically significant evidence that support levels differ between Republicans and Democrats. The senator could well be correct.
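
The doubling used for the two-tailed p-value above can be written as one case of a small general-purpose helper. This is a sketch under our own naming: `p_value_for` and its tail labels are hypothetical, and the test statistic is the rounded 0.85 from the example, so the result agrees with the table-based 0.3954 only to about three decimal places.

```python
from statistics import NormalDist


def p_value_for(z_test, tail):
    """p-value of a z test statistic for a 'right', 'left', or 'two' tailed test."""
    std_normal = NormalDist()
    if tail == "right":
        return 1 - std_normal.cdf(z_test)
    if tail == "left":
        return std_normal.cdf(z_test)
    if tail == "two":
        # Area beyond |z_test| in one tail, doubled to cover both tails.
        return 2 * (1 - std_normal.cdf(abs(z_test)))
    raise ValueError("tail must be 'right', 'left', or 'two'")


senator_p = p_value_for(0.85, "two")

# Fail to reject H0 at every common significance level when p is this large.
senator_reject_at = {alpha: senator_p < alpha for alpha in (0.10, 0.05, 0.01)}

print(round(senator_p, 3), senator_reject_at)
```

Taking the absolute value in the two-tailed branch means the helper gives the same p-value whichever population is labeled as population 1.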

Figure 5.4.39. P-Value Tests for Differences I
Figure 5.4.40. P-Value Tests for Differences II

An IRS agent believes that tax fraud is more prevalent on income tax returns where the gross adjusted income is more than $200,000. He takes a sample of 400 returns with income of less than $200,000 and finds that 12 of them are fraudulent. He also takes a sample of 300 returns with more than $200,000 reported income and finds that 15 of them are fraudulent. Suppose that tax returns for those making over $200,000 make up population one.

Question: what is the p-value of the test statistic for these samples?

Answer

0.0869

A widget manufacturer uses two assembly lines to build widgets. The quality control engineer believes that the average weight of a widget made by the first assembly line is greater than the average weight of a widget made by the second assembly line. To test this theory he takes a sample of widgets from each assembly line and finds the following information.

Source              Sample Size      Mean          Standard Dev.
Assembly Line #1    \(n_1 = 120\)    12.2 ounces   0.72 ounces
Assembly Line #2    \(n_2 = 120\)    11.9 ounces   0.81 ounces
Table 5.4.43. Widget Statistics

Question: what is the p-value for this test statistic?

Answer

0.0012