Section 5.2 Hypothesis Tests for a Mean
Testing Claims About a Mean.
In this section we will look at our first type of hypothesis test—tests for single population means. Consider the following situations, with which we will work throughout the section.
A certain line of tires is said to have an average lifespan of 60,000 miles or more. To test this claim, a consumer advocacy group collects data for a random sample of 160 different customers who purchased these tires and subsequently had them fail. They find that the mean lifespan of tires in the sample was 58,952.1 miles with a standard deviation of 6,439.6 miles.
An environmentalist claims that the level of mercury in a local stream has risen above the government mandated 2 parts per million (ppm). To test this, he collects 40 random samples of stream water and has them tested for mercury content. He finds that the average mercury content in these samples is 2.3 ppm with a standard deviation of 0.6 ppm.
A parts manufacturer has just set up a new production line to make gears with an average width of 8 centimeters. One of their customers complains that the gears being produced do not have an average width of 8 centimeters. To test this claim, the manufacturer takes a sample of 125 gears and finds that they have a mean width of 7.9 centimeters with a standard deviation of 0.88 centimeters.
Objectives
After finishing this section you should be able to
-
describe the following terms:
Hypotheses for a Single Population Mean
Test Statistic for a Single Population Mean
-
accomplish the following tasks:
Formulate null and alternative hypotheses for tests of a single mean.
Compute the test statistic for a single mean.
Use this test statistic to conduct a traditional hypothesis test.
Use this test statistic to conduct a p-value hypothesis test.
Understand and identify type I and type II errors.
Subsection 5.2.1 Formulating Hypotheses
As we saw in Section 5.1, the first step in any hypothesis test is to identify the null and alternative hypotheses. When testing a claim about a single population mean, there are three basic types of null/alternative hypothesis combinations.
Principle 5.2.1. Hypotheses for a Single Population Mean.
To test a claim about a single population mean, we use one of the following sets of hypotheses, where \(\mu_0\) is a given value.
-
Left-Tailed.
\begin{align*} H_0\amp :\ \mu \geq \mu_0\\ H_A\amp :\ \mu \lt \mu_0 \end{align*}
-
Two-Tailed.
\begin{align*} H_0\amp :\ \mu = \mu_0\\ H_A\amp :\ \mu \neq \mu_0 \end{align*}
-
Right-Tailed.
\begin{align*} H_0\amp :\ \mu \leq \mu_0\\ H_A\amp :\ \mu \gt \mu_0 \end{align*}
Let's look at each of the three examples from the introduction and see if we can determine which of these sets of hypotheses should be used.
Example 5.2.2. Stating Hypotheses for a Left-Tailed Test.
A certain line of tires is said to have an average lifespan of 60,000 miles or more. To test this claim, a consumer advocacy group collects data for a random sample of 160 different customers who purchased these tires and subsequently had them fail. They find that the mean lifespan of tires in the sample was 58,952.1 miles with a standard deviation of 6,439.6 miles. Find the null and alternative hypotheses for this test.
The claim is that the average lifespan is at least 60,000 miles. This involves equality, and must therefore be the null hypothesis. Thus, this is a left-tailed test with hypotheses:
\begin{align*} H_0\amp :\ \mu \geq 60000\\ H_A\amp :\ \mu \lt 60000 \end{align*}
Example 5.2.3. Stating Hypotheses for a Right-Tailed Test.
An environmentalist claims that the level of mercury in a local stream has risen above the government mandated 2 parts per million (ppm). To test this, he collects 40 random samples of stream water and has them tested for mercury content. He finds that the average mercury content in these samples is 2.3 ppm with a standard deviation of 0.6 ppm. Find the null and alternative hypotheses for this test.
The environmentalist's claim is that the mercury levels are more than 2 ppm. This does not involve equality, so it must be the alternative hypothesis. Therefore, this is a right-tailed test with hypotheses:
\begin{align*} H_0\amp :\ \mu \leq 2\\ H_A\amp :\ \mu \gt 2 \end{align*}
Example 5.2.4. Stating Hypotheses for a Two-Tailed Test.
A parts manufacturer has just set up a new production line to make gears with an average width of 8 centimeters. One of their customers complains that the gears being produced do not have an average width of 8 centimeters. To test this claim, the manufacturer takes a sample of 125 gears and finds that they have a mean width of 7.9 centimeters with a standard deviation of 0.88 centimeters. Find the null and alternative hypotheses for this test.
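The manufacturer's target is an average width of exactly 8 cm, while the customer's complaint is that the average is not 8 cm. The statement involving equality must be the null hypothesis, so this yields a two-tailed test with hypotheses:
\begin{align*} H_0\amp :\ \mu = 8\\ H_A\amp :\ \mu \neq 8 \end{align*}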
At this point it is important to point out that a slight change in the wording of a problem can change the type of hypothesis test that we use. If Example 5.2.2 had said “more than 60,000 miles” instead of “at least 60,000 miles,” then we would have reversed the tails in our test, using the null hypothesis that \(\mu \leq 60,000\text{.}\) If you are conducting your own hypothesis test, you need to be careful how you phrase your questions so that you wind up rejecting, or failing to reject, the hypothesis you really want to investigate.
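The effect of wording on the choice of hypotheses can be sketched in a few lines of Python. This is only an illustration: the helper `hypotheses_from_claim` and its two lookup tables are hypothetical names introduced here, and the claim is assumed to have the form "mu (comparison) mu_0".

```python
# Hypothetical helper: turns the comparison used in a claim into hypotheses.
# Each comparison is paired with its logical complement.
COMPLEMENT = {">=": "<", "<=": ">", "=": "!=", "<": ">=", ">": "<=", "!=": "="}
# The tail of the test is determined by the alternative hypothesis.
TAIL = {"<": "left-tailed", ">": "right-tailed", "!=": "two-tailed"}

def hypotheses_from_claim(claim_op):
    """Return (null_op, alt_op, tail) for a claim 'mu <claim_op> mu_0'."""
    other = COMPLEMENT[claim_op]
    # The statement that includes equality is always the null hypothesis.
    if "=" in claim_op and claim_op != "!=":
        null_op, alt_op = claim_op, other
    else:
        null_op, alt_op = other, claim_op
    return null_op, alt_op, TAIL[alt_op]

# "at least 60,000 miles" versus "more than 60,000 miles"
print(hypotheses_from_claim(">="))
print(hypotheses_from_claim(">"))
```

Changing the claim from `">="` to `">"` moves it from the null hypothesis to the alternative, which is exactly the reversal of tails described above.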
Checkpoint 5.2.7.
A biologist believes that the fish in a certain lake have stunted growth, never reaching the typical 14 inch length for their species. To test this theory, she randomly selects a sample of 215 fish and finds the average length in the sample to be 13.5 inches with a standard deviation of 3.85 inches.
Question: what should the null hypothesis be in this test?
\(\mu \geq 14\)
Checkpoint 5.2.8.
The Wrigley candy company claims that a standard bag of skittles contains an average of 15 yellow candies. To test this claim you purchase 60 regular sized bags and find an average of 16.1 yellow candies in these bags, with a standard deviation of 4.9 candies.
Question: what should your null hypothesis be?
\(\mu = 15\)
Subsection 5.2.2 Computing the Test Statistic
When testing a claim about a population mean, the test statistic measures how unusual the observed sample is if the null hypothesis is true. The test statistic is really just a z-score for the sample mean based on the assumption that the population mean is as indicated in the null hypothesis. This formula should be familiar already from lesson 3.5, but is repeated below in the context of a hypothesis test.
Theorem 5.2.9. Test Statistic for a Single Sample Mean.
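The test statistic for a sample mean \(\overline{x}\) from a sample of size \(n\text{,}\) used to test the assumption of the null hypothesis that \(\mu = \mu_0\text{,}\) is:
\begin{equation*} z_\text{test} = \frac{\overline{x} - \mu_0}{\sigma / \sqrt{n}} \end{equation*}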
Note that if the sample size is 30 or more, we can approximate \(\sigma \approx s\text{.}\)
When computing a test statistic, the null hypothesis must give us one value for the population mean. In the case of a two-tailed test, the null hypothesis that \(\mu = \mu_0\) does just that. In a left- or right-tailed test, we use the “worst-case” value of \(\mu_0\) from the null hypothesis. That is, even if we have:
\(H_0:\ \mu \geq \mu_0\text{,}\) or
\(H_0:\ \mu \leq \mu_0\)
we will use \(\mu = \mu_0\) in computing our test statistic. Examples of this can be found as we continue working on the problems from the beginning of this lesson.
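As a quick check on the arithmetic, the test statistic can be computed with a short Python function. This is a minimal sketch: the function name `z_test_statistic` is introduced here for illustration, and it assumes the sample size is at least 30 so that the sample standard deviation \(s\) can stand in for \(\sigma\text{.}\)

```python
import math

def z_test_statistic(sample_mean, mu_0, s, n):
    """z-score of the sample mean, assuming the population mean equals mu_0.

    Uses the sample standard deviation s in place of sigma (n >= 30 assumed).
    """
    standard_error = s / math.sqrt(n)
    return (sample_mean - mu_0) / standard_error

# The three running examples from this section.
print(round(z_test_statistic(58952.1, 60000, 6439.6, 160), 2))  # tires:   -2.06
print(round(z_test_statistic(2.3, 2, 0.6, 40), 2))              # mercury:  3.16
print(round(z_test_statistic(7.9, 8, 0.88, 125), 2))            # gears:   -1.27
```

Note that the left- and right-tailed examples use the boundary value \(\mu_0\) from the null hypothesis, just as the two-tailed example does.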
Checkpoint 5.2.10. Computing the Test Statistic for a Left-Tailed Test.
A certain line of tires is said to have an average lifespan of 60,000 miles or more. To test this claim, a consumer advocacy group collects data for a random sample of 160 different customers who purchased these tires and subsequently had them fail. They find that the mean lifespan of tires in the sample was 58,952.1 miles with a standard deviation of 6,439.6 miles. Find the test statistic for this sample.
Recall that the null and alternative hypotheses were:
\begin{align*} H_0\amp :\ \mu \geq 60000\\ H_A\amp :\ \mu \lt 60000 \end{align*}
Using the assumption that \(\mu = 60000\) from the null hypothesis, we compute the test statistic for our sample as follows.
\begin{equation*} z_\text{test} = \frac{58952.1 - 60000}{6439.6 / \sqrt{160}} \approx -2.06 \end{equation*}
Checkpoint 5.2.11. Computing the Test Statistic for a Right-Tailed Test.
An environmentalist claims that the level of mercury in a local stream has risen above the government mandated 2 parts per million (ppm). To test this, he collects 40 random samples of stream water and has them tested for mercury content. He finds that the average mercury content in these samples is 2.3 ppm with a standard deviation of 0.6 ppm. Find the test statistic for this sample.
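In Example 5.2.3, we found the null and alternative hypotheses to be:
\begin{align*} H_0\amp :\ \mu \leq 2\\ H_A\amp :\ \mu \gt 2 \end{align*}
Under the assumption that \(\mu = 2\text{,}\) we compute the test statistic for this sample as follows.
\begin{equation*} z_\text{test} = \frac{2.3 - 2}{0.6 / \sqrt{40}} \approx 3.16 \end{equation*}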
Example 5.2.12. Computing the Test Statistic for a Two-Tailed Test.
A parts manufacturer has just set up a new production line to make gears with an average width of 8 centimeters. One of their customers complains that the gears being produced do not have an average width of 8 centimeters. To test this claim, the manufacturer takes a sample of 125 gears and finds that they have a mean width of 7.9 centimeters with a standard deviation of 0.88 centimeters. Find the test statistic for this sample.
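In Example 5.2.4, we found the following hypotheses for this situation.
\begin{align*} H_0\amp :\ \mu = 8\\ H_A\amp :\ \mu \neq 8 \end{align*}
Under the null hypothesis assumption that \(\mu = 8\text{,}\) the test statistic is as shown below.
\begin{equation*} z_\text{test} = \frac{7.9 - 8}{0.88 / \sqrt{125}} \approx -1.27 \end{equation*}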
Checkpoint 5.2.15.
A biologist believes that the fish in a certain lake have stunted growth, never reaching the typical 14 inch length for their species. To test this theory, she randomly selects a sample of 215 fish and finds the average length in the sample to be 13.5 inches with a standard deviation of 3.85 inches.
Question: what is the test statistic in this problem?
-1.90
Checkpoint 5.2.16.
The Wrigley candy company claims that a standard bag of skittles contains an average of 15 yellow candies. To test this claim you purchase 60 regular sized bags and find an average of 16.1 yellow candies in these bags, with a standard deviation of 4.9 candies.
Question: find the test statistic for this sample.
1.74
Subsection 5.2.3 The Traditional Test
We are now ready to conduct a traditional hypothesis test and draw conclusions. Remember that the steps to conducting a traditional hypothesis test are as follows.
State the null and alternative hypotheses (done).
Compute the test statistic (done).
Find the rejection region and its critical value or values.
Compare the test statistic with the critical values to reach your conclusion.
Since we have already completed steps one and two for our example problems, only steps three and four remain. These last two steps are carried out in each example below.
Example 5.2.17. Conducting a Left-Tailed Traditional Hypothesis Test.
A certain line of tires is said to have an average lifespan of 60,000 miles or more. To test this claim, a consumer advocacy group collects data for a random sample of 160 different customers who purchased these tires and subsequently had them fail. They find that the mean lifespan of tires in the sample was 58,952.1 miles with a standard deviation of 6,439.6 miles. Conduct a traditional hypothesis test at the \(\alpha = 0.05\) significance level to determine if the company's claim has merit.
Recall that the null and alternative hypotheses were:
\begin{align*} H_0\amp :\ \mu \geq 60000\\ H_A\amp :\ \mu \lt 60000 \end{align*}
We also computed the test statistic as:
\begin{equation*} z_\text{test} = -2.06 \end{equation*}
The next step is to identify the rejection region and the critical value that separates it from the rest of the normal distribution. Since the alternative hypothesis involves \(\lt\text{,}\) this is a left-tailed test with the entire significance level \(\alpha = 0.05\) in that left tail. This gives a critical value \(z_\alpha = -1.645\) as shown below.
Since our test statistic of \(z_\text{test} = -2.06\) is in the rejection region (less than -1.645), we reject the null hypothesis. There is statistically significant evidence that the tires have a lifespan that is less than 60,000 miles.
Example 5.2.19. Conducting a Right-Tailed Traditional Hypothesis Test.
An environmentalist claims that the level of mercury in a local stream has risen above the government mandated 2 parts per million (ppm). To test this, he collects 40 random samples of stream water and has them tested for mercury content. He finds that the average mercury content in these samples is 2.3 ppm with a standard deviation of 0.6 ppm. Conduct a traditional hypothesis test to check the environmentalist's claim at the \(\alpha = 0.01\) significance level.
Remember that the hypotheses are:
\begin{align*} H_0\amp :\ \mu \leq 2\\ H_A\amp :\ \mu \gt 2 \end{align*}
And the test statistic is:
\begin{equation*} z_\text{test} = 3.16 \end{equation*}
Now we identify the rejection region and the critical value. This is a right-tailed test (because the alternative involves \(\gt\)) with 0.01 in the tail. This picture is shown below, along with the corresponding critical value of 2.33.
Because our test statistic 3.16 is greater than the critical value of 2.33, and hence in the rejection region, we reject the null hypothesis. There is highly significant evidence that this stream contains more than 2 ppm mercury.
Example 5.2.21. Conducting a Two-Tailed Traditional Hypothesis Test.
A parts manufacturer has just set up a new production line to make gears with an average width of 8 centimeters. One of their customers complains that the gears being produced do not have an average width of 8 centimeters. To test this claim, the manufacturer takes a sample of 125 gears and finds that they have a mean width of 7.9 centimeters with a standard deviation of 0.88 centimeters. Conduct a traditional hypothesis test at the \(\alpha = 0.05\) significance level.
Recall the hypotheses:
\begin{align*} H_0\amp :\ \mu = 8\\ H_A\amp :\ \mu \neq 8 \end{align*}
The test statistic was found to be:
\begin{equation*} z_\text{test} = -1.27 \end{equation*}
Because the alternative hypothesis involves “not equal to,” this is a two-tailed test with the significance level \(0.05\) split evenly between the two tails. This gives us critical values of plus and minus \(1.96\) as shown below.
Because our test statistic \(-1.27\) is not in the rejection region, we fail to reject the null hypothesis. There is no statistically significant evidence that the average width of these gears does not equal 8 cm.
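Steps three and four of the traditional test can also be carried out in software rather than with a table. The sketch below uses `NormalDist` from Python's standard `statistics` module to find critical values; the function name `traditional_test` and the tail labels `'left'`, `'right'`, and `'two'` are choices made here for illustration only.

```python
from statistics import NormalDist

def traditional_test(z_test, alpha, tail):
    """Return (critical_values, reject) for tail in {'left', 'right', 'two'}."""
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        crit = std_normal.inv_cdf(alpha)          # all of alpha in the left tail
        return (crit,), z_test < crit
    if tail == "right":
        crit = std_normal.inv_cdf(1 - alpha)      # all of alpha in the right tail
        return (crit,), z_test > crit
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)  # alpha split between both tails
        return (-crit, crit), abs(z_test) > crit
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(traditional_test(-2.06, 0.05, "left"))   # tires
print(traditional_test(3.16, 0.01, "right"))   # mercury
print(traditional_test(-1.27, 0.05, "two"))    # gears
```

The critical values returned carry more decimal places than the table values \(-1.645\text{,}\) \(2.33\text{,}\) and \(\pm 1.96\text{,}\) but the decisions match the three examples above: reject, reject, and fail to reject.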
Checkpoint 5.2.25.
A biologist believes that the fish in a certain lake have stunted growth, never reaching the typical 14 inch length for their species. To test this theory, she randomly selects a sample of 215 fish and finds the average length in the sample to be 13.5 inches with a standard deviation of 3.85 inches.
Question: what decision do you make at the \(\alpha = 0.01\) significance level? Use a traditional test.
Fail to Reject the Null Hypothesis
Checkpoint 5.2.26.
The Wrigley candy company claims that a standard bag of skittles contains an average of 15 yellow candies. To test this claim you purchase 60 regular sized bags and find an average of 16.1 yellow candies in these bags, with a standard deviation of 4.9 candies.
Question: what conclusion do you reach? Use a traditional test at the \(\alpha = 0.05\) significance level.
Fail to Reject the Null Hypothesis
Subsection 5.2.4 The P-Value Test
Like a traditional test, a p-value test will help us reach conclusions about our hypotheses based on the test statistic. The first two steps in a p-value test are exactly the same as the first two steps in a traditional test. It is the last two steps which differ, as outlined below.
State the null and alternative hypotheses (done).
Compute the test statistic (done).
Find the p-value for this test statistic.
Compare the p-value with the significance level to reach your conclusion.
We will now repeat the tests seen in Example 5.2.17, Example 5.2.19, and Example 5.2.21 using these revised steps three and four. That is, we will find the p-value of each test statistic and compare it directly with the significance level to determine if we should reject the null hypothesis.
Example 5.2.27. Conducting a Left-Tailed p-Value Hypothesis Test.
A certain line of tires is said to have an average lifespan of 60,000 miles or more. To test this claim, a consumer advocacy group collects data for a random sample of 160 different customers who purchased these tires and subsequently had them fail. They find that the mean lifespan of tires in the sample was 58,952.1 miles with a standard deviation of 6,439.6 miles. Conduct a p-value hypothesis test at the \(\alpha = 0.05\) significance level to determine if the company's claim has merit.
Recall that the null and alternative hypotheses were:
\begin{align*} H_0\amp :\ \mu \geq 60000\\ H_A\amp :\ \mu \lt 60000 \end{align*}
We also computed the test statistic as:
\begin{equation*} z_\text{test} = -2.06 \end{equation*}
The next step is to find the p-value for -2.06. Because this is a left-tailed test, this is \(P(Z \lt -2.06)\) as depicted in the sketch below.
From the standard normal distribution table, that probability is \(P(Z \lt -2.06) = 0.0197\text{.}\) Therefore, the p-value for this test statistic is \(0.0197\text{.}\) Since this p-value is less than the significance level of \(\alpha = 0.05\text{,}\) this sample is less likely than we are willing to tolerate. We must therefore reject the null hypothesis. There is statistically significant evidence that the tires have a lifespan that is less than 60,000 miles.
In the above example, note that had we been using the \(0.01\) significance level, we would have failed to reject the null hypothesis. This is because our p-value of \(0.0197\) is greater than \(0.01\text{.}\) This shows one advantage of a p-value test. Reporting the p-value lets the reader see that, while the null hypothesis may have been rejected at this significance level, at others it may not have been rejected. This allows readers to decide for themselves how strict a significance level they would like to use.
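To see how one p-value supports different decisions at different significance levels, consider the short Python sketch below. It assumes the rounded test statistic \(-2.06\) from the tire example and uses `NormalDist` from the standard `statistics` module in place of the standard normal table; the variable names are illustrative.

```python
from statistics import NormalDist

z_test = -2.06                       # rounded test statistic from the tire example
p_value = NormalDist().cdf(z_test)   # left-tailed test: P(Z < z_test)

# Compare the same p-value against several significance levels.
for alpha in (0.10, 0.05, 0.01):
    decision = "reject H0" if p_value < alpha else "fail to reject H0"
    print(f"alpha = {alpha:.2f}: p-value = {p_value:.4f}, {decision}")
```

The p-value itself never changes; only the threshold it is compared against does.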
Example 5.2.29. Conducting a Right-Tailed p-Value Hypothesis Test.
An environmentalist claims that the level of mercury in a local stream has risen above the government mandated 2 parts per million (ppm). To test this, he collects 40 random samples of stream water and has them tested for mercury content. He finds that the average mercury content in these samples is 2.3 ppm with a standard deviation of 0.6 ppm. Conduct a p-value hypothesis test to check the environmentalist's claim at the \(\alpha = 0.01\) significance level.
Remember that the hypotheses are:
\begin{align*} H_0\amp :\ \mu \leq 2\\ H_A\amp :\ \mu \gt 2 \end{align*}
And the test statistic is:
\begin{equation*} z_\text{test} = 3.16 \end{equation*}
Because this is a right-tailed test, the p-value for our test statistic is \(P(Z \gt 3.16)\) as shown in the diagram below.
Using the standard normal distribution table, this gives a probability of \(1 - 0.9992 = 0.0008\text{.}\) Therefore, as this p-value is less than the significance level of \(0.01\text{,}\) this sample is more unusual than we are willing to accept. Thus we must reject the null hypothesis. There is highly significant evidence that this stream contains more than 2 ppm mercury.
Example 5.2.31. Conducting a Two-Tailed p-Value Hypothesis Test.
A parts manufacturer has just set up a new production line to make gears with an average width of 8 centimeters. One of their customers complains that the gears being produced do not have an average width of 8 centimeters. To test this claim, the manufacturer takes a sample of 125 gears and finds that they have a mean width of 7.9 centimeters with a standard deviation of 0.88 centimeters. Conduct a p-value hypothesis test at the \(\alpha = 0.05\) significance level.
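Recall the hypotheses:
\begin{align*} H_0\amp :\ \mu = 8\\ H_A\amp :\ \mu \neq 8 \end{align*}
The test statistic was found to be:
\begin{equation*} z_\text{test} = -1.27 \end{equation*}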
To find the p-value for our test statistic in this two-tailed test, we must find the probability of being more extreme than \(-1.27\text{.}\) That means we are either less than \(-1.27\) or greater than \(1.27\)—further into the left or right tails respectively as shown below.
This computation can be simplified using symmetry. We find \(P(Z \lt -1.27)\) and double it. This gives a p-value of \(2(0.1020) = 0.2040\text{.}\) Since our p-value is greater than the significance level of \(0.05\text{,}\) we fail to reject the null hypothesis. There is no statistically significant evidence that the average width of these gears does not equal 8 cm. In fact, our p-value is greater than \(0.10\text{,}\) so there is not even evidence tending towards significance that the mean width is other than 8 cm.
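The three p-value calculations in this subsection differ only in which tail area is measured, which a single Python function can capture. This is a sketch under the same normal approximation used throughout the section; the function name `p_value_for` and the tail labels are hypothetical choices made here.

```python
from statistics import NormalDist

def p_value_for(z_test, tail):
    """p-value of a z test statistic for tail in {'left', 'right', 'two'}."""
    std_normal = NormalDist()
    if tail == "left":
        return std_normal.cdf(z_test)            # P(Z < z_test)
    if tail == "right":
        return 1 - std_normal.cdf(z_test)        # P(Z > z_test)
    if tail == "two":
        # By symmetry, double the area beyond |z_test| in one tail.
        return 2 * std_normal.cdf(-abs(z_test))
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(round(p_value_for(-2.06, "left"), 4))   # tires
print(round(p_value_for(3.16, "right"), 4))   # mercury
print(round(p_value_for(-1.27, "two"), 4))    # gears
```

Software values may differ from table values in the last decimal place because the table rounds each probability to four places before any doubling or subtraction.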
Checkpoint 5.2.35.
A biologist believes that the fish in a certain lake have stunted growth, never reaching the typical 14 inch length for their species. To test this theory, she randomly selects a sample of 215 fish and finds the average length in the sample to be 13.5 inches with a standard deviation of 3.85 inches.
Question: what is the p-value of the test statistic?
0.0287
Checkpoint 5.2.36.
The Wrigley candy company claims that a standard bag of skittles contains an average of 15 yellow candies. To test this claim you purchase 60 regular sized bags and find an average of 16.1 yellow candies in these bags, with a standard deviation of 4.9 candies.
Question: what is the p-value for this test?
0.0818
Subsection 5.2.5 Type I and Type II Errors
Any time we conduct a hypothesis test there is a chance, because of randomness in our sampling techniques, that we will make an error. Recall that these errors come in two forms.
-
Type I Error.
This is the error of rejecting a null hypothesis even though it is true.
-
Type II Error.
This is the error of failing to reject the null hypothesis even though it is in fact wrong.
To be sure we understand what these errors would look like when conducting hypothesis tests for a mean, consider the following examples.
Example 5.2.37. Detecting a Type I Error.
A certain line of tires is said to have an average lifespan of 60,000 miles or more. To test this claim, a consumer advocacy group collects data for a random sample of 160 different customers who purchased these tires and subsequently had them fail. They find that the mean lifespan of tires in the sample was 58,952.1 miles with a standard deviation of 6,439.6 miles. If you conduct a hypothesis test at the \(\alpha = 0.05\) significance level and the tires really do last 60,000 miles or more, what type of error will be made?
We saw in both the traditional and p-value test that we rejected the null hypothesis. If, however, the mean lifespan really is 60,000 miles or more, then the null hypothesis is true. Rejecting a true null hypothesis is a type I error, and the probability that this happens is the significance level, or 0.05 in this case.
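The link between the significance level and the chance of a type I error can be illustrated with a small simulation. The sketch below rests on assumptions that are not part of the example: tire lifespans are taken to be normally distributed, the true mean is set to exactly 60,000 miles (the boundary of the null hypothesis), and the population standard deviation is set equal to the sample value of 6,439.6 miles. The function name and the number of trials are arbitrary choices.

```python
import math
import random

def type_one_error_rate(mu_0, sigma, n, trials, seed=0):
    """Fraction of simulated samples that reject a TRUE H0: mu >= mu_0.

    Assumes normally distributed data with true mean mu_0 and a
    left-tailed test at the alpha = 0.05 significance level.
    """
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    critical_value = -1.645     # left-tailed critical value for alpha = 0.05
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
        z_test = (mean - mu_0) / (s / math.sqrt(n))
        if z_test < critical_value:
            rejections += 1     # the null hypothesis is true, so this is a type I error
    return rejections / trials

rate = type_one_error_rate(mu_0=60000, sigma=6439.6, n=160, trials=2000)
print(f"Simulated type I error rate: {rate:.3f}")
```

Under these assumptions the simulated rate should land near 0.05, since every rejection in the simulation is a rejection of a true null hypothesis.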
Example 5.2.38. Detecting a Type II Error.
A parts manufacturer has just set up a new production line to make gears with an average width of 8 centimeters. One of their customers complains that the gears being produced do not have an average width of 8 centimeters. To test this claim, the manufacturer takes a sample of 125 gears and finds that they have a mean width of 7.9 centimeters with a standard deviation of 0.88 centimeters. If you conduct a hypothesis test at the \(\alpha = 0.05\) significance level and the mean width is actually 7.9 centimeters, what type of error will be made?
As we have seen before in Example 5.2.21 and Example 5.2.31, this sample leads us to fail to reject the null hypothesis. If the mean really is 7.9, then the null hypothesis that it is 8 is wrong. Failing to reject an incorrect null hypothesis is a type II error.
Checkpoint 5.2.41.
A biologist believes that the fish in a certain lake have stunted growth, never reaching the typical 14 inch length for their species. To test this theory, she randomly selects a sample of 215 fish and finds the average length in the sample to be 13.5 inches with a standard deviation of 3.85 inches. Based on this sample, you conclude that there is not enough evidence to reject the null hypothesis at the \(\alpha = 0.01\) significance level.
Question: if the true mean is 13.9 inches, what type of error have you made?
Type II Error
Checkpoint 5.2.42.
The Wrigley candy company claims that a standard bag of skittles contains an average of 15 yellow candies. To test this claim you purchase 60 regular sized bags and find an average of 16.1 yellow candies in these bags, with a standard deviation of 4.9 candies. Based on this sample you reject the null hypothesis at the \(\alpha = 0.05\) significance level.
Question: if in fact the true average is 15 yellow candies, what type of error did you make?
Type I Error