Confidence Intervals for a Proportion

Section 4.3 Confidence Intervals for a Proportion

Estimating a Proportion.

The process for estimating the proportion of a population which has a given characteristic is very similar to the process for estimating population means. In order to build a confidence interval for \(p\text{,}\) the population proportion, we follow these steps.

Get a point estimate for \(p\text{.}\)
Find the margin of error for \(p\text{.}\)
Add the margin of error to and subtract it from the point estimate to get the confidence interval.

The tools we use to find the margin of error again rely on the assumption that the sample proportion, \(\hat p\text{,}\) is approximately normally distributed. Recall that this is true as long as \(n\times p\) and \(n\times q\) are both greater than 5.

Objectives

After finishing this section you should be able to

describe the following terms:
- Confidence Interval for a Proportion
- Margin of Error for a Proportion
- Point Estimate for a Proportion
- Sample Size when Estimating a Proportion
accomplish the following tasks:
- Identify the best point estimate for a population proportion
- Find the margin of error for a population proportion
- Construct a confidence interval for a population proportion
- Understand and list the assumptions that must be made to construct this confidence interval
- Compute the minimum sample size necessary for a given margin of error

Subsection 4.3.1 Point Estimate

How do we use a sample from a population to estimate the population proportion? Consider the following example.

Example 4.3.1. Describing the Process for Finding a Point Estimate for a Proportion.

An urn contains a large number of colored marbles. You wish to estimate what proportion of those marbles are red. Describe how this might be done.

Solution

Assuming that the urn contains too many marbles to go through, we will base our estimate on a sample. Draw a sample of 100 marbles, so that \(n=100\text{.}\) Then count the number that are red and call that \(x\text{.}\) The proportion of red marbles in the sample is

\begin{equation*} \hat p = \frac{x}{100}\text{.} \end{equation*}

We should use \(\hat p\) to estimate \(p\text{,}\) the proportion of all of the marbles that are red.

This is a specific example of using the sample proportion as a point estimate for the population proportion.

Theorem 4.3.2. Point Estimator for a Proportion.

The best point estimate for a population proportion \(p\) is the sample proportion \(\hat p\text{.}\)

In many examples, such as the last one, we are not directly given the value of \(\hat p\text{.}\) Instead, we are told the sample size, \(n\text{,}\) and the number of individuals in the sample that have the desired characteristic, \(x\text{.}\) It is up to us then to compute \(\hat p = \frac{x}{n}\text{.}\) Another example of this can be seen below.

Example 4.3.3. Finding a Point Estimate for a Proportion.

A storybook writer wishes to know the proportion of 2nd graders who believe in unicorns. He randomly samples \(350\) 2nd graders and finds that \(183\) of them believe in unicorns. What should he use as an estimate for the proportion of all 2nd graders who believe in unicorns?

Solution

The best estimate for the population proportion \(p\) is the sample proportion \(\hat p\) which is:

\begin{equation*} \hat p = \frac{x}{n} = \frac{183}{350} \approx 0.5229\text{.} \end{equation*}

The storybook writer should estimate that about 52.3% of 2nd graders believe in unicorns.
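The computation \(\hat p = \frac{x}{n}\) can also be sketched in a few lines of Python. This is only an illustration; the function name `point_estimate` is our own and not part of any library.

```python
def point_estimate(x, n):
    """Sample proportion p_hat = x / n, used as the point estimate for p."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    return x / n

# Storybook writer example: 183 of 350 sampled 2nd graders believe in unicorns.
p_hat = point_estimate(183, 350)
print(round(p_hat, 4))
```

Rounding to four decimal places matches the precision used in the worked solution above.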

Figure 4.3.4. Point Estimates for a Proportion I

Figure 4.3.5. Point Estimates for a Proportion II

Checkpoint 4.3.6.

You wish to estimate the proportion of men who know how to cook. To do this, you collect a random sample of \(725\) men and find that \(433\) of them know how to cook.

Question: what should you use as a point estimate for \(p\text{,}\) the proportion of men who know how to cook?

Answer

0.5972

Checkpoint 4.3.7.

In a sample of 100 bags of M&M's candy, you find that 34.4% of them contain more brown candies than any other color.

Question: what is the best estimate of the proportion of M&M's candy bags that contain more brown candies than any other color?

Answer

0.344

Subsection 4.3.2 Margin of Error

How good is a point estimate for a proportion? To answer this question we need to find the margin of error for such an estimate. The basic formula for margin of error is the same as in the past two lessons. We multiply a critical value by the standard deviation of the sampling distribution. In this case, we are using the sampling distribution for \(\hat p\).

Theorem 4.3.8. Margin of Error for a Proportion.

The margin of error when estimating a population proportion with a sample proportion \(\hat p\) drawn from a sample of size \(n\) at the \((1-\alpha)\) confidence level is:

\begin{equation*} z_{\alpha/2}\times \sqrt{\frac{pq}{n}}\text{.} \end{equation*}

Note that since \(p\) is not known, we approximate with \(p \approx \hat p\) and \(q \approx 1-\hat p\text{.}\)

Let's apply this formula to the storybook writer example.

Example 4.3.9. Finding the Margin of Error for a Proportion.

A storybook writer wishes to know the proportion of 2nd graders who believe in unicorns. He randomly samples \(350\) 2nd graders and finds that \(183\) of them believe in unicorns. What is the margin of error in estimating the proportion of 2nd graders who believe in unicorns with this sample at the 99% confidence level?

Solution

At the 99% confidence level our critical value is \(z_{\alpha/2} = \pm 2.575\text{.}\) We use our point estimate \(\hat p = \frac{183}{350} \approx 0.5229\) to approximate \(p\text{.}\) This gives a margin of error of:

\begin{equation*} z_{\alpha/2} \times \sqrt{\frac{pq}{n}} = \pm 2.575\sqrt{\frac{(0.5229)(0.4771)}{350}} \approx \pm 0.0687\text{.} \end{equation*}

We are therefore 99% confident that our estimate, that 52.3% of 2nd graders believe in unicorns, is within 6.9 percentage points of the true population proportion.

One final note on the margin of error formula above. To use critical values, \(z_{\alpha/2}\text{,}\) from the standard normal distribution, we have to assume that \(n\times p\) and \(n\times q\) are greater than 5. Since we don't know \(p\) (after all, that is what we are trying to estimate), we again substitute \(\hat p\text{.}\) So we must make sure that \(n\times \hat p = x\) and \(n\times \hat q = n-x\) are both greater than 5 before we can use the formula above for margin of error. In other words, we must have more than five “successes” and more than five “failures” in our sample in order to proceed.
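The margin of error formula and the success/failure check can be sketched together in Python. The function names here are our own. We assume the critical value is either read from a table (such as \(2.575\) for 99% confidence, as in this section) or computed with the standard library's `statistics.NormalDist`. The computed critical value carries more decimal places than the table value, so the two margins of error can differ slightly in the fourth decimal place.

```python
import math
from statistics import NormalDist

def critical_value(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level such as 0.99."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def margin_of_error(x, n, z):
    """z * sqrt(p_hat * q_hat / n), after checking the success/failure condition."""
    if x <= 5 or n - x <= 5:
        raise ValueError("need more than 5 successes and more than 5 failures")
    p_hat = x / n
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Storybook writer example: x = 183, n = 350, 99% confidence.
E_table = margin_of_error(183, 350, 2.575)                 # table critical value
E_exact = margin_of_error(183, 350, critical_value(0.99))  # computed critical value
print(round(E_table, 4), round(E_exact, 4))
```

Passing the critical value as an argument makes it easy to reproduce hand calculations that use rounded table values.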

Figure 4.3.10. Margin of Error for a Proportion I

Figure 4.3.11. Margin of Error for a Proportion II

Checkpoint 4.3.12.

You wish to estimate the proportion of men who know how to cook. To do this, you collect a random sample of \(725\) men and find that \(433\) of them know how to cook.

Question: what is the margin of error at the 95% confidence level in the estimation of \(p\) by \(\hat p\text{?}\) Round your answer to four decimal places.

Answer

\(\pm 0.0357\)

Checkpoint 4.3.13.

In a sample of 100 bags of M&M's candy, you find that 34.4% of them contain more brown candies than any other color.

Question: what is the 98% margin of error in your estimation, based on this sample, of the proportion of M&M's bags that contain more brown candies than any other color? Round your answer to four decimal places.

Answer

\(\pm 0.1107\)

Subsection 4.3.3 Confidence Interval

Confidence intervals for proportions are found in the same way as they were for means. We take the point estimate, which is \(\hat p\) in this case, and we add and subtract the margin of error we just saw. This formula is summarized below.

Theorem 4.3.14. Confidence Interval for a Proportion.

The \((1-\alpha)100\%\) confidence interval for a population proportion \(p\) is given by:

\begin{equation*} \hat p \pm z_{\alpha/2}\times \sqrt{\frac{p q}{n}}\text{.} \end{equation*}

In the example below we make use of this formula to construct a confidence interval.

Example 4.3.15. Constructing a Confidence Interval for a Proportion.

In order to judge the popularity of a certain politician, a polling firm surveys \(1500\) registered voters and finds that 53.5% of them have a favorable opinion of the politician. Construct a 98% confidence interval for the proportion of registered voters who have a favorable opinion of the politician.

Solution

In this example we are given \(\hat p = 0.535\text{.}\) Since we are using the 98% confidence level, our critical value is \(z_{\alpha/2}= 2.33\text{.}\) Putting this together yields the formula:

\begin{equation*} 0.535 \pm 2.33\sqrt{\frac{(0.535)(0.465)}{1500}} = 0.535 \pm 0.0300\text{.} \end{equation*}

Therefore, we are 98% confident that the true proportion of registered voters who have a favorable opinion of the politician is in the range:

\begin{equation*} 0.505 \lt p \lt 0.565\text{.} \end{equation*}

Note that this result could also have been reported as a 53.5% favorable rating with a margin of error of plus or minus 3%.
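As a rough check of the polling calculation, the interval \(\hat p \pm z_{\alpha/2}\sqrt{\hat p \hat q / n}\) can be sketched in Python. The name `proportion_ci` is our own, and the critical value \(2.33\) is the table value used in the example.

```python
import math

def proportion_ci(p_hat, n, z):
    """Return (lower, upper) = p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    E = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - E, p_hat + E

# Polling example: p_hat = 0.535, n = 1500, 98% confidence (z = 2.33).
lower, upper = proportion_ci(0.535, 1500, 2.33)
print(f"{lower:.3f} < p < {upper:.3f}")
```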

We can also construct upper or lower confidence bounds for a population proportion.

Example 4.3.16. Constructing a One-Sided Confidence Interval.

A cereal company claims that at least 20% of its cereal boxes contain a special prize. You collect a sample of \(500\) cereal boxes and find that \(83\) of them contain prizes. Based on this sample you suspect that less than 20% of boxes actually contain a prize. Can you be:

95% confident of this?
99% confident?

Solution

This question calls for an upper confidence bound on the proportion of cereal boxes that contain a prize. Using our sample data,

\begin{equation*} \hat p = \frac{83}{500} = 0.166\text{.} \end{equation*}

For a 95% upper confidence bound, \(z_\alpha = 1.645\text{.}\) Therefore, the upper confidence bound is:

\begin{equation*} 0.166 + 1.645\sqrt{\frac{(0.166)(0.834)}{500}} \approx 0.1934\text{.} \end{equation*}

Yes, we can be 95% confident that \(p \lt 0.20\text{,}\) because \(0.20\) lies above the upper confidence bound of \(0.1934\) and is therefore outside the one-sided confidence interval.

Now for a 99% upper confidence bound, \(z_\alpha = 2.33\text{.}\) Therefore, the upper confidence bound is:

\begin{equation*} 0.166 + 2.33\sqrt{\frac{(0.166)(0.834)}{500}} \approx 0.2048\text{.} \end{equation*}

In this case we cannot be 99% confident that \(p \lt 0.20\) since 0.20 is below the upper confidence bound, and hence still in the one-sided confidence interval.
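The two upper confidence bounds in this example can be reproduced with a short Python sketch. The name `upper_bound` is our own, and the one-sided critical values \(1.645\) and \(2.33\) are the table values used above.

```python
import math

def upper_bound(x, n, z):
    """One-sided upper confidence bound: p_hat + z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n
    return p_hat + z * math.sqrt(p_hat * (1 - p_hat) / n)

claimed = 0.20  # the company's claim: at least 20% of boxes contain a prize
for label, z in [("95%", 1.645), ("99%", 2.33)]:
    bound = upper_bound(83, 500, z)
    verdict = "p < 0.20 supported" if bound < claimed else "p < 0.20 not supported"
    print(label, round(bound, 4), verdict)
```

The suspicion that \(p \lt 0.20\) is supported at a given confidence level only when the upper bound falls below the claimed value.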

Figure 4.3.17. Confidence Interval for a Proportion I

Figure 4.3.18. Confidence Interval for a Proportion II

Figure 4.3.19. Confidence Interval for a Proportion III

Checkpoint 4.3.20.

You wish to estimate the proportion of men who know how to cook with a 99% confidence interval. To do this, you collect a random sample of 725 men and find that 433 of them know how to cook.

Question: what is the confidence interval for this problem? Round your confidence bounds to four decimal places.

Answer

\(0.5503 \lt p \lt 0.6441\)

Checkpoint 4.3.21.

In a sample of 100 bags of M&M's candy, you find that 34.4% of them contain more brown candies than any other color.

Question: what is the 98% confidence interval for this problem? Round your confidence bounds to four decimal places.

Answer

\(0.2333 \lt p \lt 0.4547\)

Subsection 4.3.4 Sample Size

A polling organization needs to construct a 95% confidence interval. They want to be sure that their margin of error is no more than 4%. How many individuals do they need to sample to make this happen? This is just one example of a common situation. Before we even start to collect data, we need to know how big a sample we should plan on gathering. Selecting too large a sample will be expensive and time-consuming. On the other hand, if we pick too small a sample, we will get a large margin of error, and may not even be able to use the normal distribution methods from this section.

To find the “just right” sample size, we need to refer back to the margin of error formula. Suppose that we want to know how big \(n\) must be so that our margin of error is no more than some fixed amount \(E\text{.}\) Then, we need to solve the following inequality for \(n\text{.}\)

\begin{equation*} z_{\alpha/2}\times \sqrt{\frac{pq}{n}} \leq E\text{.} \end{equation*}

Squaring both sides yields the following.

\begin{equation*} \frac{(z_{\alpha/2})^2pq}{n} \leq E^2\text{.} \end{equation*}

Finally, multiplying both sides by \(n\) and dividing by \(E^2\) gives us the formula below.

\begin{equation*} n \geq \frac{(z_{\alpha/2})^2pq}{E^2}\text{.} \end{equation*}

The results are summarized in the following theorem.

Theorem 4.3.22. Sample Size when Estimating a Proportion.

To get a maximum margin of error of \(E\) at the \((1-\alpha)\) confidence level, we must take a sample of size \(n\) where:

\begin{equation*} n \geq \frac{(z_{\alpha/2})^2pq}{E^2}\text{.} \end{equation*}

Note that we usually don't know \(p\) and \(q\) before we take our sample. The population proportion \(p\) is, after all, what we are trying to estimate. To make up for this we can do one of two things: use an estimate for \(p\) from a previous study, or assume the “worst case scenario” that \(p = q = 0.5\text{.}\) We call this the “worst case scenario” because these values make the product \(pq\text{,}\) and hence the required sample size, as large as possible. To see examples of both of these, consider the following.

Example 4.3.23. Finding the Sample Size for a Given Margin of Error.

You wish to estimate the proportion of fish in a given lake that are too small to be legally kept by fishermen with a 95% confidence interval. To minimize the cost of your sample, you want to know the minimum number of fish you need to catch so that your margin of error is no more than 2%. Find this value of \(n\) if:

a previous study showed that 60% of the fish were too small to be kept.
you have no additional information.

Solution

We use the two options mentioned above to find our sample sizes.

If we estimate \(p = 0.6\) based on the previous study, we get

\begin{equation*} n \geq \frac{(1.96)^2(0.60)(0.40)}{(0.02)^2} = 2304.96\text{.} \end{equation*}

Since we must sample a whole number of fish, we round this up to a sample size of \(n = 2305\) fish. Remember to always round sample sizes up!

If we have no additional information, we must assume that both \(p\) and \(q\) are \(0.5\text{.}\) This gives

\begin{equation*} n \geq \frac{(1.96)^2(0.5)(0.5)}{(0.02)^2} = 2401\text{.} \end{equation*}

Thus, we must sample \(n = 2401\) fish.

Note that if we use the “worst case scenario” estimate of \(p = q = 0.5\text{,}\) we get a bigger sample size, by almost 100, than we did using the information from a previous study.
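The sample size formula from Theorem 4.3.22 can be sketched in Python as follows. The name `min_sample_size` is our own. As a design choice for this sketch, the inputs are passed as decimal strings and handled with exact fractions, so that a result that is already a whole number (such as \(2401\)) is not pushed up by floating-point round-off before we round up.

```python
import math
from fractions import Fraction

def min_sample_size(z, E, p="0.5"):
    """Smallest whole n with z * sqrt(p * q / n) <= E.

    z, E and p are decimal strings (e.g. "1.96"); p defaults to the
    worst case 0.5 for when no earlier estimate is available.
    """
    z, E, p = Fraction(z), Fraction(E), Fraction(p)
    q = 1 - p
    return math.ceil(z**2 * p * q / E**2)  # always round sample sizes up

# Fish example: 95% confidence (z = 1.96), margin of error at most 2%.
print(min_sample_size("1.96", "0.02", p="0.6"))  # previous study: p = 0.6
print(min_sample_size("1.96", "0.02"))           # no prior information
```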

Figure 4.3.24. Sample Size I

Figure 4.3.25. Sample Size II

Checkpoint 4.3.26.

In an effort to determine support for a certain ballot measure, sponsors of the ballot measure wish to construct a 99% confidence interval for the proportion of registered voters who plan to vote for the measure. They have no previous information on level of support for this measure, but they wish to get a margin of error that is no more than 4%.

Question: what is the smallest number of individuals that they can sample to meet these requirements?

Answer

1037

Checkpoint 4.3.27.

A grocery store chain wishes to determine what proportion of its customers use coupons. They will construct a 95% confidence interval for this proportion based on a sample of size \(n\text{,}\) and they want the margin of error to be no more than 3.5%. Studies at other grocery stores have indicated that approximately 30% of customers actually use coupons.

Question: what is the minimum number of customers that the store should survey to meet the above requirements?

Answer

659