Section 4.4 Confidence Intervals for the Difference Between Means or Proportions
Estimating Differences.
While estimating a mean or a proportion for a single population is useful, there are instances where we need to compare two (or more) populations. In any sort of comparative study, estimating the difference between two populations plays an important role. This section shows how to estimate the difference between two population means or two population proportions.
Consider the following situations, which will appear as examples later in the section.
A researcher believes that, on average, adults can hold their breath longer than children. The researcher collects a sample of 80 adults and finds that the average time they can hold their breath is 72 seconds, with a standard deviation of 4.9 seconds. He also collects a sample of 43 children and finds that the average time they can hold their breath is 58 seconds, with a standard deviation of 12.1 seconds. Estimate the difference, \(\mu_1 - \mu_2\text{,}\) between the average time an adult can hold their breath and the average time a child can hold their breath.
A certain mathematics department wishes to estimate the difference between the proportion of students who pass calculus after having had high school geometry and the proportion who pass without having had high school geometry. Fifty students who had high school geometry and took calculus were randomly selected, and 46 of them were found to have passed calculus. Thirty-seven students who did not have high school geometry before taking calculus were also selected, and only 32 of them were found to have passed calculus. Estimate the difference, \(p_1 - p_2\text{,}\) between the proportion of students with high school geometry who pass calculus and the proportion without high school geometry who pass calculus.
Objectives
After finishing this section you should be able to
- describe the following terms:
  - Confidence Interval for the Difference Between Means
  - Confidence Interval for the Difference Between Proportions
  - Margin of Error for a Difference Between Means
  - Margin of Error for a Difference Between Proportions
  - Point Estimate for a Difference Between Means
  - Point Estimate for a Difference Between Proportions
  - Sample Size when Estimating a Difference Between Means
  - Sample Size when Estimating a Difference Between Proportions
- accomplish the following tasks:
  - Identify the best point estimate for a difference between means or proportions
  - Find the margin of error for a difference between means or proportions
  - Construct a confidence interval for a difference between means or proportions
  - Understand and list the assumptions that must be made to construct these confidence intervals
  - Compute the minimum sample size necessary for a given margin of error
Subsection 4.4.1 Point Estimate
It should come as no surprise by now that the point estimate we use for the difference between population means or proportions is the difference between the sample means or proportions. We state this formally below, first for means and then for proportions.
Theorem 4.4.1. Point Estimator for a Difference Between Means.
If samples with means \(\overline{x}_1\) and \(\overline{x}_2\) are drawn from two independent populations with means \(\mu_1\) and \(\mu_2\text{,}\) then the best point estimate for \(\mu_1 - \mu_2\) is \(\overline{x}_1 - \overline{x}_2\text{.}\)
Example 4.4.2. Finding a Point Estimate for the Difference Between Means.
A researcher believes that, on average, adults can hold their breath longer than children. The researcher collects a sample of 80 adults and finds that the average time they can hold their breath is 72 seconds, with a standard deviation of 4.9 seconds. He also collects a sample of 43 children and finds that the average time they can hold their breath is 58 seconds, with a standard deviation of 12.1 seconds. What is the best point estimate for \(\mu_1 - \mu_2\text{,}\) the difference between the average time an adult can hold their breath and the average time a child can hold their breath?
As stated above, the best point estimate for \(\mu_1 - \mu_2\) is \(\overline{x}_1 - \overline{x}_2\text{.}\) Thus, we should use \(72-58 = 14\) for our estimate.
And now we repeat this process for proportions.
Theorem 4.4.3. Point Estimator for a Difference Between Proportions.
If samples with proportions \(\hat p_1\) and \(\hat p_2\) are drawn from two independent populations with proportions \(p_1\) and \(p_2\text{,}\) then the best point estimate for \(p_1 - p_2\) is \(\hat p_1 - \hat p_2\text{.}\)
Example 4.4.4. Finding a Point Estimate for the Difference Between Proportions.
A certain mathematics department wishes to estimate the difference between the proportion of students who pass calculus after having had high school geometry and the proportion who pass without having had high school geometry. Fifty students who had high school geometry and took calculus were randomly selected, and 46 of them were found to have passed calculus. Thirty-seven students who did not have high school geometry before taking calculus were also selected, and only 32 of them were found to have passed calculus. What point estimate for \(p_1 - p_2\text{,}\) the difference between the proportion of students with high school geometry who pass calculus and the proportion without high school geometry who pass calculus, should be used?
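Again, the best point estimate is the difference between the sample statistics, \(\hat p_1 - \hat p_2\) in this case. This yields:
\begin{equation*} \hat p_1 - \hat p_2 = \frac{46}{50} - \frac{32}{37} \approx 0.92 - 0.8649 = 0.0551\text{.} \end{equation*}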
Checkpoint 4.4.7.
You wish to estimate the difference between the average proportion of Skittles that are yellow and the average proportion of M&M's that are yellow. You randomly sample bags of each type of candy and find the following statistics.
Candy | Sample Size | Mean | Standard Deviation |
Skittles | \(n_1 = 43\) | \(0.2990\) | \(0.01650\) |
M&M's | \(n_2 = 39\) | \(0.3210\) | \(0.02231\) |
Question: what is the best point estimate for the difference between the average proportion of yellow Skittles and yellow M&M's?
\(-0.0220\)
Checkpoint 4.4.9.
You wish to estimate the difference between the proportion of men who regularly play video games and the proportion of women who regularly play video games. To do this, you take a sample from each gender and ask each person if they play video games at least once a week. The summary of these surveys is provided below.
Gender | Sample Size | Number who play |
Male | \(n_1 = 63\) | \(x_1 = 49\) |
Female | \(n_2 = 67\) | \(x_2 = 38\) |
Question: what is the best estimate for the difference between the proportion of males and females who regularly play video games?
0.2106
Subsection 4.4.2 Margin of Error
We need to be careful when computing the margin of error for a difference between means or proportions, because the variability of both samples must be taken into account. We start with the difference between two means. Recall the standard deviations of the two sampling distributions.
The standard deviation for \(\overline{x}_1\) is \(\frac{\sigma_1}{\sqrt{n_1}}\) so the variance is \(\frac{\sigma_1^2}{n_1}\text{.}\)
The standard deviation for \(\overline{x}_2\) is \(\frac{\sigma_2}{\sqrt{n_2}}\) so the variance is \(\frac{\sigma_2^2}{n_2}\text{.}\)
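When we subtract the two sample means, the variances from both (independent) samples are added together, not subtracted, to get the combined variance for the difference. This becomes:
\begin{equation*} \sigma^2_{\overline{x}_1 - \overline{x}_2} = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\text{,} \end{equation*}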
and taking the square root of this gives us the standard deviation for \(\overline{x}_1 - \overline{x}_2\text{.}\)
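The variance-addition rule can be checked numerically. The sketch below is a Monte Carlo simulation under stated assumptions: it treats both populations as normal, borrows \(\sigma_1 = 4.9\text{,}\) \(\sigma_2 = 12.1\) and the sample sizes from the breath-holding example, and its function names are hypothetical. It compares the simulated variance of \(\overline{x}_1 - \overline{x}_2\) with \(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\text{.}\)

```python
import random
import statistics

def theoretical_variance(sigma1, n1, sigma2, n2):
    """Variance of xbar1 - xbar2 for independent samples: sigma1^2/n1 + sigma2^2/n2."""
    return sigma1**2 / n1 + sigma2**2 / n2

def simulated_variance(mu1, sigma1, n1, mu2, sigma2, n2, trials=5000, seed=1):
    """Draw many pairs of independent normal samples (an assumption) and return
    the sample variance of the observed differences xbar1 - xbar2."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    diffs = []
    for _ in range(trials):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return statistics.variance(diffs)

# Hypothetical normal populations; parameters borrowed from the breath-holding example.
theory = theoretical_variance(4.9, 80, 12.1, 43)
sim = simulated_variance(72, 4.9, 80, 58, 12.1, 43)
print(f"theory: {theory:.3f}, simulation: {sim:.3f}")
```

The two numbers should agree closely. The simulated value shifts slightly with the seed and the number of trials.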
Theorem 4.4.11. Margin of Error for a Difference Between Means.
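The margin of error when estimating the difference between two population means at the \((1-\alpha)100\%\) confidence level is given by:
\begin{equation*} E = z_{\alpha/2}\times \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\text{.} \end{equation*}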
As was the case with a single population, to use this margin of error we need to know that both \(\overline{x}_1\) and \(\overline{x}_2\) have (at least approximately) normal distributions. This is true if either the populations from which they are sampled have normal distributions, or both sample sizes are at least thirty. When \(\sigma_1\) and \(\sigma_2\) are unknown, the sample standard deviations \(s_1\) and \(s_2\) are used in their place. Both samples in the breath-holding example, continued below, are larger than thirty.
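The critical value \(z_{\alpha/2}\) used throughout this section can be computed rather than read from a table. Here is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8 or later). The helper name `z_critical` is hypothetical. The worked examples and checkpoints in this section use rounded table values such as 1.96 and 2.33, so answers computed with unrounded values may differ in the last reported digit.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Return z_{alpha/2}: the value with area alpha/2 to its right under the
    standard normal curve, for a (1 - alpha)100% confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.80, 0.95, 0.98, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.4f}")
```

This prints approximately 1.2816, 1.9600, 2.3263 and 2.5758.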
Example 4.4.12. Finding the Margin of Error for a Difference Between Means.
A researcher believes that, on average, adults can hold their breath longer than children. The researcher collects a sample of 80 adults and finds that the average time they can hold their breath is 72 seconds, with a standard deviation of 4.9 seconds. He also collects a sample of 43 children and finds that the average time they can hold their breath is 58 seconds, with a standard deviation of 12.1 seconds. What is the 95% margin of error for the estimate of \(\mu_1 - \mu_2\) given by the point estimate \(72 - 58 = 14\text{?}\)
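The margin of error is computed as shown, using \(z_{0.025} = 1.96\) and the sample standard deviations in place of \(\sigma_1\) and \(\sigma_2\text{.}\)
\begin{equation*} E = 1.96\times \sqrt{\frac{4.9^2}{80} + \frac{12.1^2}{43}} \approx 1.96 \times 1.9248 \approx 3.77\text{ seconds.} \end{equation*}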
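The arithmetic above can be reproduced with a small helper. This is a sketch: the function name is hypothetical and, as in the example, the sample standard deviations stand in for \(\sigma_1\) and \(\sigma_2\text{.}\)

```python
import math

def margin_of_error_means(z, s1, n1, s2, n2):
    """z_{alpha/2} * sqrt(s1^2/n1 + s2^2/n2), with sample SDs standing in for sigma."""
    return z * math.sqrt(s1**2 / n1 + s2**2 / n2)

# Breath-holding example: adults (n1 = 80, s1 = 4.9), children (n2 = 43, s2 = 12.1)
E = margin_of_error_means(1.96, 4.9, 80, 12.1, 43)
print(round(E, 2))  # 3.77
```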
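For the difference between proportions, things work in much the same way. The variance of \(\hat p_1\) is \(\frac{p_1q_1}{n_1}\) and the variance of \(\hat p_2\) is \(\frac{p_2q_2}{n_2}\text{,}\) where \(q_1 = 1 - p_1\) and \(q_2 = 1 - p_2\text{.}\) Taking the square root of the sum of these variances gives the standard deviation for the difference \(\hat p_1 - \hat p_2\text{.}\) This computation is shown below.
\begin{equation*} \sigma_{\hat p_1 - \hat p_2} = \sqrt{\frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}} \end{equation*}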
The margin of error is then the following.
Theorem 4.4.13. Margin of Error for a Difference Between Proportions.
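The margin of error when estimating the difference between two population proportions at the \((1-\alpha)100\%\) confidence level is given by:
\begin{equation*} E = z_{\alpha/2}\times \sqrt{\frac{\hat p_1\hat q_1}{n_1} + \frac{\hat p_2\hat q_2}{n_2}}\text{.} \end{equation*}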
Note that we use \(p_1 \approx \hat p_1\) and \(p_2 \approx \hat p_2\) (with \(\hat q_1 = 1 - \hat p_1\) and \(\hat q_2 = 1 - \hat p_2\)) since \(p\) and \(q\) are not known. Again, we must make sure that \(\hat p_1\) and \(\hat p_2\) both have approximately normal distributions. This is true so long as \(\hat p_1 \times n_1\text{,}\) \(\hat q_1\times n_1\text{,}\) \(\hat p_2\times n_2\text{,}\) and \(\hat q_2\times n_2\) are all greater than 5. We revisit the calculus example to see how to use this formula.
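Because \(\hat p\times n\) is just the number of successes \(x\) and \(\hat q\times n\) is the number of failures \(n - x\text{,}\) the condition above reduces to checking four counts. Below is a sketch of that check. The function name is hypothetical, and the "greater than 5" threshold is taken from the text.

```python
def normal_approx_ok(x1, n1, x2, n2, threshold=5):
    """Return True when n*p_hat and n*q_hat exceed the threshold in both samples.
    n*p_hat equals the success count x, and n*q_hat equals n - x."""
    counts = (x1, n1 - x1, x2, n2 - x2)
    return all(count > threshold for count in counts)

# Calculus data: 46 of 50 and 32 of 37 passed.
print(normal_approx_ok(46, 50, 32, 37))  # False: the failure counts are 4 and 5
# Video-game data: 49 of 63 and 38 of 67 play regularly.
print(normal_approx_ok(49, 63, 38, 67))  # True
```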
Example 4.4.14. Finding the Margin of Error for a Difference Between Proportions.
A certain mathematics department wishes to estimate the difference between the proportion of students who pass calculus after having had high school geometry and the proportion who pass without having had high school geometry. Fifty students who had high school geometry and took calculus were randomly selected, and 46 of them were found to have passed calculus. Thirty-seven students who did not have high school geometry before taking calculus were also selected, and only 32 of them were found to have passed calculus. Find the 99% margin of error in the estimate of \(p_1 - p_2\text{,}\) the difference between the proportion of students with high school geometry who pass calculus and the proportion without high school geometry who pass calculus.
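Computing \(\hat p_1 = \frac{46}{50} = 0.92\) and \(\hat p_2 = \frac{32}{37} \approx 0.8649\text{,}\) and using \(z_{0.005} \approx 2.575\text{,}\) we get the following.
\begin{equation*} E = 2.575\times \sqrt{\frac{(0.92)(0.08)}{50} + \frac{(0.8649)(0.1351)}{37}} \approx 2.575 \times 0.06805 \approx 0.1752 \end{equation*}
Note, however, that \(\hat q_1\times n_1 = 4\) and \(\hat q_2\times n_2 = 5\text{,}\) so these samples do not meet the condition stated above. The calculation shows how the formula is used, but the normal approximation should be treated with caution here.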
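The same margin of error can be computed directly from the counts. This sketch assumes \(z_{0.005} \approx 2.575\) for 99% confidence, and its function name is hypothetical.

```python
import math

def margin_of_error_proportions(z, x1, n1, x2, n2):
    """z_{alpha/2} * sqrt(p1_hat*q1_hat/n1 + p2_hat*q2_hat/n2), with the sample
    proportions standing in for the unknown p1 and p2."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    return z * math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

E = margin_of_error_proportions(2.575, 46, 50, 32, 37)
print(round(E, 4))  # 0.1752
```

This helper does not check the normality condition. Pair it with a check on the four counts before trusting the result.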
Checkpoint 4.4.17.
You wish to estimate the difference between the average proportion of Skittles that are yellow and the average proportion of M&M's that are yellow. You randomly sample bags of each type of candy and find the following statistics.
Candy | Sample Size | Mean | Standard Deviation |
Skittles | \(n_1 = 43\) | \(0.2990\) | \(0.01650\) |
M&M's | \(n_2 = 39\) | \(0.3210\) | \(0.02231\) |
Question: what is the 99% margin of error for the difference between these means? Round your answer to four decimal places.
0.0113
Checkpoint 4.4.19.
You wish to estimate the difference between the proportion of men who regularly play video games and the proportion of women who regularly play video games. To do this, you take a sample from each gender and ask each person if they play video games at least once a week. The summary of these surveys is provided below.
Gender | Sample Size | Number who play |
Male | \(n_1 = 63\) | \(x_1 = 49\) |
Female | \(n_2 = 67\) | \(x_2 = 38\) |
Question: what is the 98% margin of error for the difference between these two proportions? Round your answer to four decimal places.
0.1865
Subsection 4.4.3 Confidence Interval
Once we have found the point estimate and margin of error, putting together the confidence interval is a fairly straightforward task. The confidence interval formulas for the differences between means and proportions are given below, along with examples.
Theorem 4.4.21. Confidence Interval for the Difference Between Means.
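The \((1-\alpha)100\%\) confidence interval for the difference \(\mu_1 - \mu_2\) between means of independent populations is given by:
\begin{equation*} (\overline{x}_1 - \overline{x}_2) - E \lt \mu_1 - \mu_2 \lt (\overline{x}_1 - \overline{x}_2) + E\text{,} \quad \text{where } E = z_{\alpha/2}\times \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\text{.} \end{equation*}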
Example 4.4.22. Finding the Confidence Interval for a Difference Between Means.
A researcher believes that, on average, adults can hold their breath longer than children. The researcher collects a sample of 80 adults and finds that the average time they can hold their breath is 72 seconds, with a standard deviation of 4.9 seconds. He also collects a sample of 43 children and finds that the average time they can hold their breath is 58 seconds, with a standard deviation of 12.1 seconds. Construct the 95% confidence interval for \(\mu_1 - \mu_2\) based on these samples.
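Using the formula above, with \(z_{0.025} = 1.96\text{,}\) we get the following.
\begin{equation*} (72 - 58) \pm 1.96\times \sqrt{\frac{4.9^2}{80} + \frac{12.1^2}{43}} \approx 14 \pm 3.77 \end{equation*}
Adding and subtracting these values gives us the confidence interval below.
\begin{equation*} 10.23 \lt \mu_1 - \mu_2 \lt 17.77 \end{equation*}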
This means that we are 95% confident that the average adult can hold his or her breath between 10.2 seconds and 17.8 seconds longer than the average child.
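This short sketch assembles the interval from the point estimate and margin of error for the breath-holding data. The function name is hypothetical, and the sample standard deviations stand in for \(\sigma_1\) and \(\sigma_2\text{,}\) as in the example. The last line mirrors the interpretation above: the whole interval lies above zero.

```python
import math

def ci_difference_means(z, xbar1, s1, n1, xbar2, s2, n2):
    """Return (lower, upper) for mu1 - mu2: (xbar1 - xbar2) -/+ z*sqrt(s1^2/n1 + s2^2/n2)."""
    estimate = xbar1 - xbar2
    margin = z * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return estimate - margin, estimate + margin

low, high = ci_difference_means(1.96, 72, 4.9, 80, 58, 12.1, 43)
print(f"{low:.2f} < mu1 - mu2 < {high:.2f}")  # 10.23 < mu1 - mu2 < 17.77
print(low > 0)  # True: every difference in the interval is positive
```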
And now we look at population proportions.
Theorem 4.4.23. Confidence Interval for the Difference Between Proportions.
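The \((1-\alpha)100\%\) confidence interval for the difference \(p_1 - p_2\) between proportions in independent populations is given by:
\begin{equation*} (\hat p_1 - \hat p_2) - E \lt p_1 - p_2 \lt (\hat p_1 - \hat p_2) + E\text{,} \quad \text{where } E = z_{\alpha/2}\times \sqrt{\frac{\hat p_1\hat q_1}{n_1} + \frac{\hat p_2\hat q_2}{n_2}}\text{.} \end{equation*}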
Note that we use \(p_1 \approx \hat p_1\) and \(p_2 \approx \hat p_2\) since \(p\) and \(q\) are unknown.
Example 4.4.24. Finding the Confidence Interval for a Difference Between Proportions.
A certain mathematics department wishes to estimate the difference between the proportion of students who pass calculus after having had high school geometry and the proportion who pass without having had high school geometry. Fifty students who had high school geometry and took calculus were randomly selected, and 46 of them were found to have passed calculus. Thirty-seven students who did not have high school geometry before taking calculus were also selected, and only 32 of them were found to have passed calculus. Construct the 99% confidence interval for \(p_1 - p_2\text{,}\) the difference between the proportion of students with high school geometry who pass calculus and the proportion without high school geometry who pass calculus.
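Recall that in Example 4.4.4 we determined that \(\hat p_1 = 0.92\) and \(\hat p_2 = 0.8649\text{.}\) Using these values in the formula for a confidence interval, with \(z_{0.005} \approx 2.575\text{,}\) yields the following.
\begin{equation*} (0.92 - 0.8649) \pm 2.575\times \sqrt{\frac{(0.92)(0.08)}{50} + \frac{(0.8649)(0.1351)}{37}} \approx 0.0551 \pm 0.1752 \end{equation*}
Again we add and subtract to get the following confidence interval.
\begin{equation*} -0.1201 \lt p_1 - p_2 \lt 0.2303 \end{equation*}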
Note that according to this confidence interval, \(p_1\) may be:
bigger than \(p_2\) because there are positive differences in the interval,
equal to \(p_2\) because zero is in the interval, or
smaller than \(p_2\) because there are negative differences in the interval.
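The three-way reading above can be automated. This sketch uses hypothetical function names and assumes \(z = 2.575\) for 99% confidence. It builds the interval for the calculus data and reports which case applies.

```python
import math

def ci_difference_proportions(z, x1, n1, x2, n2):
    """Return (lower, upper) for p1 - p2 using the sample proportions."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    estimate = p1_hat - p2_hat
    margin = z * math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return estimate - margin, estimate + margin

def interpret(lower, upper):
    """Only an interval lying entirely on one side of zero points to one
    proportion being larger than the other."""
    if lower > 0:
        return "p1 appears larger than p2"
    if upper < 0:
        return "p1 appears smaller than p2"
    return "zero is in the interval, so p1 may be larger, equal, or smaller"

low, high = ci_difference_proportions(2.575, 46, 50, 32, 37)
print(f"{low:.3f} < p1 - p2 < {high:.3f}")  # -0.120 < p1 - p2 < 0.230
print(interpret(low, high))
```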
Checkpoint 4.4.29.
You wish to estimate the difference between the average proportion of Skittles that are yellow and the average proportion of M&M's that are yellow using a 99% confidence interval. You randomly sample bags of each type of candy and find the following statistics.
Candy | Sample Size | Mean | Standard Deviation |
Skittles | \(n_1 = 43\) | \(0.2990\) | \(0.01650\) |
M&M's | \(n_2 = 39\) | \(0.3210\) | \(0.02231\) |
Question: what is the 99% confidence interval for the difference between the average proportion of yellow Skittles and yellow M&M's in a bag of candy? Round your confidence bounds to four decimal places.
\(-0.0333 \lt \mu_1 - \mu_2 \lt -0.0107\)
Checkpoint 4.4.31.
You wish to estimate the difference between the proportion of men who regularly play video games and the proportion of women who regularly play video games using a 98% confidence interval. To do this, you take a sample from each gender and ask each person if they play video games at least once a week. The summary of these surveys is provided below.
Gender | Sample Size | Number who play |
Male | \(n_1 = 63\) | \(x_1 = 49\) |
Female | \(n_2 = 67\) | \(x_2 = 38\) |
Question: what is the 98% confidence interval for this difference? Round your answer to four decimal places.
\(0.0241 \lt p_1 - p_2 \lt 0.3971\)
Subsection 4.4.4 Sample Size
When trying to determine the sample size that we need for a given margin of error in a confidence interval for differences, we need to make an extra assumption. Note that in the two margin of error formulas below, there are two, potentially different, values of \(n\text{.}\)
- Margin of Error for Means.
\begin{equation*} z_{\alpha/2}\times \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \end{equation*}
- Margin of Error for Proportions.
\begin{equation*} z_{\alpha/2}\times \sqrt{\frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}} \end{equation*}
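Because we cannot solve a single inequality for both \(n_1\) and \(n_2\text{,}\) we must make the simplifying assumption that both samples will be of the same size. That is, we assume that \(n_1 = n_2 = n\text{.}\) When dealing with the mean, we can now find the value of \(n\) that guarantees that we will have a margin of error no bigger than some fixed \(E\text{.}\)
\begin{equation*} z_{\alpha/2}\times \sqrt{\frac{\sigma_1^2}{n} + \frac{\sigma_2^2}{n}} \le E \quad\Longrightarrow\quad \frac{z_{\alpha/2}^2\left(\sigma_1^2 + \sigma_2^2\right)}{n} \le E^2 \quad\Longrightarrow\quad n \ge \frac{z_{\alpha/2}^2\left(\sigma_1^2 + \sigma_2^2\right)}{E^2} \end{equation*}
This result is summarized below.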
Theorem 4.4.33. Sample Size when Estimating a Difference Between Means.
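To get a maximum margin of error of \(E\) at the \((1-\alpha)100\%\) confidence level, we must take samples of size \(n\) from each population where:
\begin{equation*} n = \frac{z_{\alpha/2}^2\left(\sigma_1^2 + \sigma_2^2\right)}{E^2}\text{.} \end{equation*}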
As was the case with a single population mean, we need to know the standard deviations before we can use this formula. Since we haven't yet conducted our survey, we must have some other way to approximate the standard deviations. Recall that our method was to either:
use standard deviations from a previous or preliminary study, or
use the range approximation.
In the example below, we will use the range approximation.
Example 4.4.34. Finding the Sample Size for a Difference Between Means.
You wish to determine the difference between the average time adults spend watching television, and the average time children spend watching television. To do this you will use a 95% confidence interval, and you wish your margin of error to be no more than 15 minutes. You assume that both adults and children watch between zero and 12 hours of television a day. How many individuals will you need to include in your samples to achieve this goal?
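Both standard deviations, for adults and for children, will be approximated by \(\frac{12 - 0}{4} = 3\) hours. Plugging this into the formula above, and recognizing that 15 minutes is 0.25 hours, we get the following.
\begin{equation*} n = \frac{1.96^2\left(3^2 + 3^2\right)}{0.25^2} = \frac{3.8416 \times 18}{0.0625} \approx 1106.38 \end{equation*}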
Remembering to always round up, we must include at least 1107 adults and 1107 children in this study.
This sounds quite large, and it would be wise to either get a better approximation for the standard deviation with a preliminary study, or to allow for a larger margin of error.
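The sample-size formula for means, together with the round-up rule, fits in one line of code. Here is a sketch with a hypothetical function name. It reproduces the television example.

```python
import math

def sample_size_means(z, sigma1, sigma2, E):
    """Smallest common n with z*sqrt(sigma1^2/n + sigma2^2/n) <= E,
    i.e. n = z^2 (sigma1^2 + sigma2^2) / E^2 rounded up."""
    return math.ceil(z**2 * (sigma1**2 + sigma2**2) / E**2)

# TV example: range rule gives sigma ~ (12 - 0)/4 = 3 hours for both groups; E = 0.25 hours
print(sample_size_means(1.96, 3, 3, 0.25))  # 1107
```

`math.ceil` implements the "always round up" rule. Rounding to the nearest integer could give a margin of error slightly larger than \(E\text{.}\)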
Following a procedure similar to that above, we get the following formula for the minimum sample size for a difference between proportions.
Theorem 4.4.35. Sample Size when Estimating a Difference Between Proportions.
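To get a maximum margin of error of \(E\) at the \((1-\alpha)100\%\) confidence level, we must take samples of size \(n\) from each population where:
\begin{equation*} n = \frac{z_{\alpha/2}^2\left(p_1q_1 + p_2q_2\right)}{E^2}\text{.} \end{equation*}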
To fill in the unknown values of \(p_1\) and \(p_2\text{,}\) we again have two methods available:
we can use values for \(p_1\) and \(p_2\) from previous studies, or
we can use the “worst case scenario” that \(p_1 = p_2 = 0.5\text{.}\)
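The worst-case choice works because the product \(pq = p(1-p)\) can never exceed \(\frac{1}{4}\text{.}\) Completing the square makes this explicit:
\begin{equation*} p(1-p) = \frac{1}{4} - \left(p - \frac{1}{2}\right)^2 \le \frac{1}{4}\text{,} \end{equation*}
with equality only at \(p = 0.5\text{.}\) Hence \(p_1q_1 + p_2q_2 \le \frac{1}{2}\text{,}\) so taking \(p_1 = p_2 = 0.5\) gives the largest, and therefore safest, sample size.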
Example 4.4.36. Finding the Sample Size for a Difference Between Proportions.
A study is to be conducted to determine the difference between the proportion of Democrats and Republicans who support the death penalty. A study from five years ago found that 39% of Democrats and 67% of Republicans support the death penalty. How many persons should be surveyed from each party if we wish to construct a 98% confidence interval for the difference between these two proportions, and we want a margin of error of no more than 4%?
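We will use the provided values \(p_1= 0.39\) and \(p_2 = 0.67\) from the previous study, together with \(z_{0.01} \approx 2.33\text{,}\) in the following formula.
\begin{equation*} n = \frac{2.33^2\left[(0.39)(0.61) + (0.67)(0.33)\right]}{0.04^2} = \frac{5.4289 \times 0.459}{0.0016} \approx 1557.42 \end{equation*}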
As usual, we round this up to 1558. We must include both 1558 Democrats and 1558 Republicans in our survey in order to meet the criteria specified.
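Here is a sketch of the proportion sample-size formula, with a hypothetical function name. It reproduces the example and, for comparison, the worst-case answer with \(p_1 = p_2 = 0.5\text{,}\) assuming \(z_{0.01} \approx 2.33\) as in the example.

```python
import math

def sample_size_proportions(z, p1, p2, E):
    """n = z^2 (p1*q1 + p2*q2) / E^2, rounded up, for equal sample sizes."""
    return math.ceil(z**2 * (p1 * (1 - p1) + p2 * (1 - p2)) / E**2)

print(sample_size_proportions(2.33, 0.39, 0.67, 0.04))  # 1558 (earlier study's values)
print(sample_size_proportions(2.33, 0.5, 0.5, 0.04))    # 1697 (worst case)
```

Using the earlier study's proportions saves 139 people per party compared with the worst case. This saving holds only if those proportions are still roughly accurate.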
Checkpoint 4.4.39.
The average reading speed, in words per minute, is to be measured among children in school districts designated as “low income” and school districts designated as “affluent.” A previous study suggests that the standard deviation of this reading score will be approximately 12 words per minute in the low income children, and 11.5 words per minute in the affluent children. A 99% confidence interval is to be constructed for the difference between the reading speeds of these two groups of children and a margin of error of 5 words per minute or less is desired.
Question: what minimum sample size must we collect from each group of school districts?
74
Checkpoint 4.4.40.
The local humane society plans on conducting a survey to compare the proportion of cats that have been spayed or neutered with the proportion of dogs that have been spayed or neutered. They wish to construct an 80% confidence interval for the difference between these two proportions with a margin of error of less than 5%.
Question: how many dogs and cats total (combined) must they sample?
656 (328 of each)