Section 2.4 Conditional Probability and Independence
Relationships Between Events.
Consider the following problem, which appeared in Parade magazine's “Ask Marilyn” column.
Example 2.4.1. The Monty Hall Problem.
Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, “Do you want to pick door No. 2?” Is it to your advantage to switch your choice? (Whitaker 1990)
What would you do? Think carefully and make up your mind before checking out the video below, which explains the solution. It turns out that it is to your advantage to switch.
If you guessed wrong, don't feel bad. After the problem was published with a solution similar to that shown in the video, approximately 10,000 readers, including many with Ph.D.s, wrote in to the magazine claiming that the solution was wrong.
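If the switch-wins-2/3 claim still feels suspect, it is easy to check empirically. The short simulation below is a sketch (the function name `monty_hall`, the seed, and the trial count are our own choices); it plays the game many times with each strategy and reports the fraction of wins.

```python
import random

def monty_hall(switch, trials=100_000, seed=1):
    """Play the game `trials` times; return the fraction of games won."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)   # door hiding the car
        pick = rng.randrange(3)  # contestant's initial pick
        # The host opens a goat door that is neither the pick nor the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the one door that is neither picked nor opened.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=True))   # close to 2/3
print(monty_hall(switch=False))  # close to 1/3
```

Switching wins roughly two games in three, while staying wins roughly one in three, matching the solution explained in the video.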
In this section, we will study how knowing extra information can change probabilities. This means that we will first need to understand how the probability of one event can depend on another event happening or not happening.
Objectives
After finishing this section you should be able to
- describe the following terms:
Bayes' Theorem
conditional probability
conditional probability formula
general multiplication rule
independent
test of independence
- accomplish the following tasks:
Describe what it means for events to be independent
Identify pairs of events which are independent or mutually exclusive
Compute conditional probabilities
Use the general multiplication rule for probabilities
Use tree diagrams to organize events and compute their probabilities
Use Bayes' Theorem to find conditional probabilities from a tree diagram
Subsection 2.4.1 Independence
As the Monty Hall problem shows, sometimes knowing something about one event can change the probability of another event. Let's restate the Monty Hall problem solution in a different way.
Example 2.4.3. Restating the Monty Hall Problem.
In the context of the Monty Hall problem, define the following events:
\(A\text{:}\) the car is behind door number one
\(B\text{:}\) the car is behind door number two
\(C\text{:}\) the car is behind door number three
Suppose you pick door number three, and the game show host reveals that behind door number two there is a goat. Restate the conclusion to the Monty Hall problem in terms of these events.
When you choose door three, you are hoping that \(P(C) = 1\text{.}\) That is, you are hoping that event \(C\) definitely happens. When the game show host reveals the goat behind door number two, he tells you that event \(B\) definitely did not happen. This changes the probability of event \(C\text{,}\) and as we saw in the video, you should switch to door number one.
When knowledge about one event happening or not happening changes the probability of another event happening, the events are called dependent. When knowledge about one event makes no difference in the probability of another event happening, the events are called independent.
Definition 2.4.4.
Two events, \(A\) and \(B\text{,}\) are called independent if the occurrence of \(A\) has no influence on the probability of \(B\text{.}\)
To better understand the idea of independence, let's practice identifying dependent and independent events.
Example 2.4.5. Determining if Events are Independent or Dependent.
Determine if the pairs of events \(A\) and \(B\) described below are independent or dependent.
A coin is flipped three times and \(A =\) “the first flip is a heads” while \(B =\) “the last flip is a heads.”
A coin is flipped three times and \(A =\) “the first flip is a heads” while \(B =\) “all flips are heads.”
An urn contains three marbles: 2 red and 1 white. Two marbles are drawn, with replacement. \(A =\) “the first marble is white” while \(B =\) “the second marble is white.”
An urn contains three marbles: 2 red and 1 white. Two marbles are drawn, without replacement. \(A =\) “the first marble is white” while \(B =\) “the second marble is white.”
Does knowing if the first flip was a heads or tails change the probability that the last flip will be a heads? Coins do not have memories. Anything we know about the first flip has no effect on what happens on the last flip, so \(A\) and \(B\) are independent.
This experiment is different. Knowing that the first flip came up a heads does have an effect on whether all three flips come up heads. After all, if \(A\) did not happen, then \(B\) can not possibly happen either. Therefore, \(A\) and \(B\) are dependent.
Since we are replacing the marbles after we draw them, the probability of getting a white on the first draw and on the second draw is the same: \(\frac{1}{3}\text{.}\) What happens on the first draw does not affect the probability of getting a white on the second draw. Because of this, \(A\) and \(B\) are independent.
In our final question, we are again drawing marbles, but this time what happens on the first draw does make a difference. Since we do not replace the marbles we draw, if \(A\) does not happen (we draw red), there are two marbles left: 1 red and 1 white. So \(P(B) = \frac{1}{2}\text{.}\) However, if \(A\) does happen (we draw a white), then there are two marbles left, both of which are red. So \(P(B) = 0\text{.}\) Since \(P(B)\) changes depending on whether \(A\) happens or not, \(A\) and \(B\) are dependent.
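The with- and without-replacement cases above can also be checked by brute-force enumeration. In this sketch (the helper names `p` and `p_cond` are ours), we list every equally likely ordered pair of draws and compare \(P(B)\) with \(P(B\,|\,A)\text{:}\)

```python
from fractions import Fraction
from itertools import permutations, product

marbles = ["R", "R", "W"]  # the urn: 2 red, 1 white

# With replacement every ordered pair is possible; without replacement
# the two positions must hold different physical marbles.
with_rep = list(product(marbles, repeat=2))
without_rep = list(permutations(marbles, 2))

def p(outcomes, event):
    """P(event) over a list of equally likely outcomes."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def p_cond(outcomes, event, given):
    """P(event | given): restrict the outcome list, then count."""
    restricted = [o for o in outcomes if given(o)]
    return Fraction(sum(1 for o in restricted if event(o)), len(restricted))

A = lambda o: o[0] == "W"  # first marble is white
B = lambda o: o[1] == "W"  # second marble is white

print(p(with_rep, B), p_cond(with_rep, B, A))        # 1/3 1/3: independent
print(p(without_rep, B), p_cond(without_rep, B, A))  # 1/3 0: dependent
```

With replacement, conditioning on \(A\) leaves \(P(B)\) unchanged; without replacement it changes, exactly as argued above.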
As we shall see moving forward in this section, independent and dependent events play an important role in understanding more complex, multi-step probability experiments. This concept also becomes important later in the course when we are taking “independent random samples” and comparing them.
Checkpoint 2.4.8.
An urn contains 6 marbles: 4 blue and 2 red. Two marbles are drawn at random, without replacement, and the resulting colors are noted. Then, a single die is thrown and the number is noted. An example of an outcome in this experiment would be R-B-3 (1st marble was red, 2nd was blue, then a 3 was rolled). Consider the following events.
\(A =\) the first marble drawn is red
\(B =\) both marbles drawn are red
\(C =\) the die roll is an even number
\(D =\) the sum of the number of red marbles drawn and the die roll is more than 2
\(E =\) the die rolls a 1
Question: Which pairs of events are independent?
A and C, A and E, B and C, B and E
Checkpoint 2.4.9.
Events in everyday life can be classified as dependent or independent using common sense and the notion of cause-and-effect. For example,
(a) You forget to put gas in your car, and the next day it won't start.
(b) You toss salt over your shoulder for luck, buy a lottery ticket, and win $20.
(c) You stay up all night playing games and sleep through a meeting the next day.
(d) You miss paying the electricity bill and your lights and other electrical appliances stop working.
(e) You speed through a school zone on Monday and don't get a ticket. You do the same thing on Tuesday and are pulled over and ticketed.
Question: which of the pairs of events above are independent?
(b) and (e) are likely independent
Checkpoint 2.4.10.
You toss a fair coin four times, noting whether heads or tails comes up on each toss. The event \(A\) is that the first toss is a heads. Consider the following pairs of events:
(a) The second toss is a heads
(b) All tosses are heads
(c) None of the tosses are heads
(d) One of the last three tosses is a heads
(e) An even number of the tosses are tails
Question: which of the events described above are independent from A?
(a) and (d) are independent from \(A\)
Subsection 2.4.2 Conditional Probability
As we saw in the previous subsection, the probability of one event can change depending on the occurrence of another. Because these probabilities can differ, we need a way to denote the probability of one event given that another has occurred. This is called the conditional probability of the first event, given that the second event happens. The formal definition is given below.
Definition 2.4.11.
The conditional probability of \(A\) given \(B\) is the probability of event \(A\) given that event \(B\) has occurred. Symbolically, this is written as \(P(A|B)\text{.}\)
Many times we read \(P(A|B)\) as “the probability of \(A\) given \(B\text{.}\)” This simply means the probability of \(A\) with the extra information that \(B\) has happened. In Example 2.4.5 we saw several pairs \(A\) and \(B\) of events for which having the information that \(B\) happened changed the probability of \(A\text{.}\) These dependent events provide good examples for computing conditional probabilities.
Example 2.4.12. Computing Conditional Probabilities from Outcomes.
A coin is flipped three times. Events \(A\) and \(B\) are defined as follows.
\(A\text{:}\) the first flip is a heads
\(B\text{:}\) all three flips are heads
Find the sample space for this experiment, the outcomes in \(A\text{,}\) and the outcomes in \(B\text{.}\) Use this information to find \(P(A)\) and \(P(A|B)\text{.}\)
Since the experiment involves flipping a coin three times, a sample outcome would be \(HHT\text{,}\) representing the first flip coming up heads, the second heads, and the last tails. The set of all outcomes is therefore:
\begin{equation*} S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}\text{.} \end{equation*}
The first event of interest, \(A\text{,}\) consists of all of the outcomes above resulting in a heads on the first flip. This is:
\begin{equation*} A = \{HHH, HHT, HTH, HTT\}\text{.} \end{equation*}
We can therefore conclude that
\begin{equation*} P(A) = \frac{4}{8} = \frac{1}{2}\text{.} \end{equation*}
But what about \(B\text{?}\) The event \(B\) is:
\begin{equation*} B = \{HHH\}\text{.} \end{equation*}
If we know that \(B\) has happened, then we automatically know that \(A\) has happened as well. This is because if all three flips are heads, the first one must have been a heads (\(B\) is a subset of \(A\)). Therefore,
\begin{equation*} P(A|B) = 1\text{.} \end{equation*}
Note that \(P(A|B)\) and \(P(A)\) are different! This is equivalent to saying that \(A\) and \(B\) are dependent.
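Since this sample space has only eight equally likely outcomes, we can verify both probabilities by listing the outcomes directly. A minimal sketch (the names `S`, `A`, and `B` mirror the events above):

```python
from fractions import Fraction
from itertools import product

S = list(product("HT", repeat=3))  # all 8 equally likely outcomes

A = [o for o in S if o[0] == "H"]               # first flip is heads
B = [o for o in S if all(f == "H" for f in o)]  # all flips are heads

P_A = Fraction(len(A), len(S))
# Conditioning on B shrinks the sample space to B itself.
P_A_given_B = Fraction(sum(1 for o in B if o[0] == "H"), len(B))

print(P_A)          # 1/2
print(P_A_given_B)  # 1
```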
One good tool for visualizing conditional probability is a Venn Diagram. In the two Venn Diagrams below, we see how the “extra” information that \(B\) has happened changes the probability of \(A\) by, in a sense, redefining the sample space.
To compute \(P(A)\text{,}\) we ignore \(B\) and take the ratio of the yellow area in \(A\) to the area of the whole box. This is equivalent to saying that
\begin{equation*} P(A) = \frac{\text{area of } A}{\text{area of } S}\text{.} \end{equation*}
When computing \(P(A|B)\text{,}\) we know \(B\) has happened, so we are in the purple area above. To figure out how likely \(A\) is, we take the ratio of the area common to \(A\) and \(B\) to the area of \(B\text{.}\) This is equivalent to saying that
\begin{equation*} P(A|B) = \frac{P(A\cap B)}{P(B)}\text{.} \end{equation*}
Theorem 2.4.15. Conditional Probability Formula.
If \(A\) and \(B\) are events in a sample space \(S\text{,}\) then the conditional probability of \(A\) given \(B\) is:
\begin{equation*} P(A|B) = \frac{P(A\cap B)}{P(B)}\text{.} \end{equation*}
It is a good idea to have several different ways to think of a difficult concept such as conditional probability. Let's take a look at another example in which we use a contingency table to represent two different events and then to compute conditional probabilities.
Example 2.4.16. Computing Conditional Probabilities Using the Formula.
A hospital conducts a research study on 138 patients who have a history of serious headaches. Several of the patients are given an experimental headache drug, while others are given ordinary Tylenol. Patients are then checked after 15 minutes to see if they are still suffering from headaches. The resulting data is summarized in the table below.
| Experimental Drug | Tylenol |
Headache Persists | 12 | 40 |
Headache Gone | 53 | 33 |
Suppose one of these people is randomly selected. Use the given table to answer the following questions.
Find \(P(\text{H. Gone})\text{,}\) the probability the person's headache is gone.
Find \(P(\text{H. Gone}\,|\,\text{Tylenol})\text{,}\) the probability the person's headache is gone given that they took Tylenol.
Are the events “H. Gone” and “Tylenol” independent? Explain.
Our first step is to find the marginal distributions for this contingency table. With those added, the table becomes:
| Experimental Drug | Tylenol | Total |
Headache Persists | 12 | 40 | 52 |
Headache Gone | 53 | 33 | 86 |
Total | 65 | 73 | 138 |
Using this table, we compute the following probabilities.
\(P(\text{H. Gone}) = \frac{86}{138} \approx 0.6232\) since 86 of the 138 people's headaches were gone after 15 minutes.
To find \(P(\text{H. Gone}\,|\,\text{Tylenol})\text{,}\) we use the conditional probability formula.
\begin{equation*} P(\text{H. Gone}\,|\,\text{Tylenol}) = \frac{P(\text{H. Gone}\,\cap\,\text{Tylenol})}{P(\text{Tylenol})} = \frac{\frac{33}{138}}{\frac{73}{138}} = \frac{33}{73} \approx 0.4521\text{.} \end{equation*} Since the probability of the headache being gone changed when we were told the person took Tylenol, these two events are not independent—they are dependent.
Note: When we have a contingency table, there is a short-cut to finding conditional probability. Since we are told in (b) that the person took Tylenol, we restrict ourselves to the Tylenol column. Then, the probability that their headache is gone is simply \(\frac{33}{73} \approx 0.4521\text{,}\) as above.
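The same computations can be organized in a few lines of code. In this sketch (the dictionary `counts` and its labels are our own encoding of the table), restricting to the Tylenol column reproduces the shortcut:

```python
from fractions import Fraction

# The study's counts: (headache status, treatment) -> number of patients.
counts = {
    ("gone", "drug"): 53, ("gone", "tylenol"): 33,
    ("persists", "drug"): 12, ("persists", "tylenol"): 40,
}
total = sum(counts.values())  # 138 patients

p_gone = Fraction(counts["gone", "drug"] + counts["gone", "tylenol"], total)

# Restrict to the Tylenol column, then count the headaches that are gone.
tylenol = counts["gone", "tylenol"] + counts["persists", "tylenol"]
p_gone_given_tylenol = Fraction(counts["gone", "tylenol"], tylenol)

print(p_gone)                # 43/69, i.e. 86/138
print(p_gone_given_tylenol)  # 33/73
```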
Checkpoint 2.4.21.
A large bowl of candy contains 120 colored M&M's in plain and peanut varieties. The contingency table for these M&Ms is shown below. You randomly select one M&M from this bowl.
| Brown | Green | Yellow | Red | Blue |
Plain | 13 | 17 | 12 | 19 | 7 |
Peanut | 9 | 12 | 21 | 8 | 2 |
Question: what is the probability that the M&M you select is red given that it is a plain M&M?
0.2794
Checkpoint 2.4.23.
A large bowl of candy contains 120 colored M&M's in plain and peanut varieties. The contingency table for these M&Ms is shown below. You randomly select one M&M from this bowl.
| Brown | Green | Yellow | Red | Blue |
Plain | 13 | 17 | 12 | 19 | 7 |
Peanut | 9 | 12 | 21 | 8 | 2 |
Question: what is the probability that the M&M you select is peanut given that it is yellow?
0.6364
Checkpoint 2.4.25.
Events \(A\) and \(B\) in a sample space have probabilities \(P(A) = 0.45\) and \(P(B) = 0.36\text{.}\) The probability of their union is \(P(A\cup B) = 0.69\text{.}\)
Question: what is \(P(A|B)\text{?}\)
0.3333
Subsection 2.4.3 General Multiplication Rule
In Subsection 2.3.3, we saw a rule for finding \(P(A\cup B)\text{,}\) which we called the general addition rule (Theorem 2.3.24). We have not, however, seen a general rule for finding \(P(A\cap B)\text{,}\) the probability that both \(A\) and \(B\) occur. A little bit of algebra with the conditional probability formula produces just such a rule.
Theorem 2.4.26. General Multiplication Rule.
If \(A\) and \(B\) are events in a sample space \(S\text{,}\) then
\begin{equation*} P(A\cap B) = P(A)\,P(B|A)\text{.} \end{equation*}
This is called the general multiplication rule because it works in general—for any \(A\) and \(B\text{.}\)
Example 2.4.27. Using the General Multiplication Rule.
An urn contains 10 marbles: 7 red and 3 white. Two marbles are drawn randomly, one after the other, without replacement. Find the probabilities that:
The second marble is red given that the first marble was red.
The first and second marbles are both red.
Since we are told that the first marble was red, when drawing the second marble we have 9 marbles remaining, 6 of which are red. So, the probability the second marble is red is \(\frac{6}{9} = \frac{2}{3}\text{.}\)
The probability that the first marble is red will be \(\frac{7}{10}\) since seven of the 10 marbles are red. In (a), we saw that the probability the second marble is red given that the first was red is \(\frac{2}{3}\text{.}\) Using the general multiplication rule, the probability that the first and second marbles are red is
\begin{equation*} P(\text{1st }R) \times P(\text{2nd }R\,|\,\text{1st }R) = \frac{7}{10} \times \frac{2}{3} = \frac{14}{30} = \frac{7}{15}\text{.} \end{equation*}
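The arithmetic in this example is short enough to script directly; the sketch below (variable names are ours) applies the general multiplication rule with exact fractions:

```python
from fractions import Fraction

red, white = 7, 3  # the urn's contents

p_first_red = Fraction(red, red + white)  # 7/10
# Given the first was red, 6 of the remaining 9 marbles are red.
p_second_given_first = Fraction(red - 1, red + white - 1)
# General multiplication rule: P(A and B) = P(A) * P(B | A).
p_both_red = p_first_red * p_second_given_first
print(p_both_red)  # 7/15
```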
In the special case where \(A\) and \(B\) are independent, the conditional probability for \(B\) given \(A\) is the same as the probability for \(B\text{.}\) That is, \(P(B|A) = P(B)\text{.}\) The general multiplication rule then becomes:
\begin{equation*} P(A\cap B) = P(A)\,P(B)\text{.} \end{equation*}
This is in fact one of the best ways to test two events and see if they are independent.
Theorem 2.4.28. Test of Independence.
Events \(A\) and \(B\) in a sample space \(S\) are independent if and only if \(P(A\cap B) = P(A) P(B)\text{.}\)
Let's try this new test for independence out on a few examples.
Example 2.4.29. Testing for Independence.
Two events \(A\) and \(B\) in a sample space \(S\) have probabilities \(P(A) = 0.36\) and \(P(B) = 0.25\text{.}\) The probability of their union is \(0.52\text{.}\) Are \(A\) and \(B\) independent?
We first use the general addition rule to find \(P(A\cap B)\text{:}\)
\begin{equation*} P(A\cap B) = P(A) + P(B) - P(A\cup B) = 0.36 + 0.25 - 0.52 = 0.09\text{.} \end{equation*}
Next, we check using our test of independence:
\begin{equation*} P(A)\,P(B) = 0.36 \times 0.25 = 0.09 = P(A\cap B)\text{.} \end{equation*}
Since the above equation is true, \(A\) and \(B\) are in fact independent events.
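We can confirm this independence check with a quick computation; the sketch below (names are ours) uses exact fractions to avoid floating-point rounding:

```python
from fractions import Fraction

p_a, p_b = Fraction(36, 100), Fraction(25, 100)
p_union = Fraction(52, 100)

# General addition rule, rearranged: P(A ∩ B) = P(A) + P(B) - P(A ∪ B).
p_intersect = p_a + p_b - p_union
print(p_intersect)               # 9/100
# Test of independence: does P(A ∩ B) equal P(A) P(B)?
print(p_intersect == p_a * p_b)  # True
```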
Example 2.4.30. Using Independence to Compute Probabilities.
A coin is weighted so that the probability of a heads is \(\frac{1}{3}\) while the probability of a tails is \(\frac{2}{3}\text{.}\) This coin is flipped four times. Find the probability that:
all four flips are tails.
three of the four flips are tails.
Because flipping a coin multiple times gives independent events (remember, the coin has no memory, so one flip cannot affect another), we can use the multiplication rule.
The probability all four flips are tails is computed as follows.
\begin{align*} P(\text{1st T}\cap\text{2nd T}\cap\text{3rd T}\cap\text{4th T}) \amp = P(\text{1st T})P(\text{2nd T})P(\text{3rd T})P(\text{4th T})\\ \amp = \frac{2}{3}\times\frac{2}{3}\times\frac{2}{3}\times\frac{2}{3}\\ \amp = \frac{16}{81} \approx 0.1975. \end{align*}
This event is similar, but one of the four flips must be a heads instead of a tails. So, we split this into the four different cases that give us exactly one heads and three tails, and add the results of the multiplication rule together, giving:
\begin{align*} \amp \left(\frac{1}{3}\times\frac{2}{3}\times\frac{2}{3}\times\frac{2}{3}\right) + \left(\frac{2}{3}\times\frac{1}{3}\times\frac{2}{3}\times\frac{2}{3}\right) + \\ \amp \left(\frac{2}{3}\times\frac{2}{3}\times\frac{1}{3}\times\frac{2}{3}\right) + \left(\frac{2}{3}\times\frac{2}{3}\times\frac{2}{3}\times\frac{1}{3}\right) = \frac{32}{81} \approx 0.3951. \end{align*}
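Rather than listing the cases by hand, we can enumerate all \(2^4 = 16\) outcomes and weight each by the product of its per-flip probabilities, since the flips are independent. A sketch (the helper `prob` is our own name):

```python
from fractions import Fraction
from itertools import product

p = {"H": Fraction(1, 3), "T": Fraction(2, 3)}

def prob(outcome):
    """Multiply the per-flip probabilities; flips are independent."""
    result = Fraction(1)
    for flip in outcome:
        result *= p[flip]
    return result

outcomes = list(product("HT", repeat=4))
all_tails = sum(prob(o) for o in outcomes if o.count("T") == 4)
three_tails = sum(prob(o) for o in outcomes if o.count("T") == 3)

print(all_tails)    # 16/81
print(three_tails)  # 32/81
```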
Checkpoint 2.4.33.
An urn contains 10 balls: 5 red, 3 blue, and 2 white. Two balls are drawn randomly, without replacement, and the colors are noted.
Question: what is the probability that the first ball is red and the second is white?
0.1111
Checkpoint 2.4.34.
A large bowl of candy contains 120 colored M&M's in plain and peanut varieties. The contingency table for these M&Ms is shown below. You randomly select two M&Ms from this bowl.
| Brown | Green | Yellow | Red | Blue |
Plain | 13 | 17 | 12 | 19 | 7 |
Peanut | 9 | 12 | 21 | 8 | 2 |
Question: what is the probability that the first is a plain M&M and the second is a peanut M&M?
0.2476
Checkpoint 2.4.36.
A coin is weighted so that \(P(H) = \frac{1}{3}\) and \(P(T) = \frac{2}{3}\text{.}\) This coin is tossed three times. Note that each toss is independent of the previous tosses.
Question: what is the probability that the first two tosses are heads and the last toss is a tails?
0.0741
Subsection 2.4.4 Tree Diagrams and Probability
Tree diagrams have been useful in the past when working with multi-stage experiments. Since the multiplication rule naturally fits experiments that can be broken up into stages (first \(A\) happens, then \(B\text{,}\) etc.), it makes sense to use a tree diagram again. Consider the following example.
Example 2.4.37. Computing Probabilities with a Tree Diagram.
An urn contains six marbles: 3 blue marbles, 2 red marbles, and 1 white marble. Two marbles are drawn, one after the other, without replacement and their colors are noted. Draw a tree diagram and label the branches with appropriate probabilities for this experiment. Use the tree diagram to find
\(P(\text{2nd }W\,|\,\text{1st }R)\)
\(P(\text{2nd }W)\)
In the tree diagram below, the labels represent the color of marble drawn. The fractions on the branches represent the probability of drawing that color marble at that stage in the experiment.
Now using this tree diagram, we find the desired probabilities.
Note that \(P(\text{2nd }W\,|\,\text{1st }R)\) can be read right off the tree. Since we are given that the first marble was red, we go across that middle \(R\) branch. Now, the probability the second is white is the probability on the \(W\) branch, so \(\frac{1}{5}\text{.}\)
There are two ways the 2nd marble can be white: the 1st is red, or the 1st is blue. These are distinct (mutually exclusive) paths through the tree. The probability of each is found by multiplying down the branches. Together, their probability is the sum of these two products, giving us:
\begin{align*} P(\text{2nd }W) \amp = P(\text{1st }R\cap \text{2nd }W) + P(\text{1st }B\cap \text{2nd }W)\\ \amp = \left(\frac{2}{6}\times \frac{1}{5}\right) + \left(\frac{3}{6}\times \frac{1}{5}\right)\\ \amp = \frac{2}{30} + \frac{3}{30}\\ \amp = \frac{1}{6}\text{.} \end{align*}
Note that the probabilities on the branches of such a tree diagram are conditional probabilities. In the example above, the \(\frac{2}{5}\) on the bottom branch from \(W\) to \(R\) is actually \(P(\text{2nd }R|\text{1st }W)\) because it connects the event “1st \(W\)” to the event “2nd \(R\)”. The general multiplication rule is then used to find the probability of any path through the tree.
Probability both red: \(\frac{2}{6}\times \frac{1}{5} = \frac{2}{30}\)
Probability the first is red and the second blue: \(\frac{2}{6}\times \frac{3}{5} = \frac{6}{30}\)
Etc...
Finally, as in the second question in our example, we can find the probability of events such as “the second is white” by adding together all the different probabilities for paths in which the second marble is white.
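This multiply-down-the-branches, add-the-paths procedure is easy to mechanize. The sketch below (the dictionary `urn` and helper `branch` are our own names) reproduces the example's computation of the probability that the second marble is white:

```python
from fractions import Fraction

urn = {"B": 3, "R": 2, "W": 1}  # the urn from the example
total = sum(urn.values())

def branch(first, second):
    """Multiply down one path: P(1st = first) * P(2nd = second | 1st = first)."""
    p_first = Fraction(urn[first], total)
    remaining = dict(urn)
    remaining[first] -= 1
    return p_first * Fraction(remaining[second], total - 1)

# Sum the mutually exclusive paths on which the 2nd marble is white.
# (The white-then-white path contributes 0, since there is only one white marble.)
p_second_white = sum(branch(first, "W") for first in urn)
print(p_second_white)  # 1/6
```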
Checkpoint 2.4.42.
The following tree diagram depicts the outcomes in a two-step experiment. Use it to answer the question below.
Question: what is the probability of outcome \(X\text{?}\)
0.4500
Checkpoint 2.4.44.
The following tree diagram depicts the outcomes in a two-step experiment. Use it to answer the question below.
Question: what is the probability of outcome Y given that outcome A has occurred?
0.60
Checkpoint 2.4.46.
The following tree diagram depicts the outcomes in a two-step experiment. Use it to answer the question below.
Question: what is the probability of outcome \(B\) and then \(Y\) occurring?
0.10
Subsection 2.4.5 Bayesian Probability
Tree diagrams can be very useful for dealing with conditional probabilities, as long as the conditions are in the correct order. In a multi-stage experiment, the tree diagram does a good job of showing the probability of a particular outcome on the second step, given that a certain outcome on the first step occurred. What if, however, we want to look at things in the other order? Consider the following example.
Example 2.4.48. Computing “Backwards” Probabilities.
Recall from Example 2.4.37 that the tree diagram below gives us the probabilities for an experiment in which two marbles are drawn, without replacement, from an urn containing 3 blue marbles, 2 red marbles, and 1 white marble. Use the tree to help find the probability that the 1st marble was white given that the second is red.
We wish to find \(P(\text{1st }W\,|\,\text{2nd }R)\text{.}\) Unfortunately, our tree diagram is “backwards” for this computation. The probabilities on the branches give the probability of a particular color on the 2nd marble given that we got a certain color on the first. So, we must resort to the conditional probability formula:
\begin{equation*} P(\text{1st }W\,|\,\text{2nd }R) = \frac{P(\text{1st }W\cap\text{2nd }R)}{P(\text{2nd }R)}\text{.} \end{equation*}
The numerator of this fraction can be found by multiplying along a single path in the tree: \(\frac{1}{6} \times \frac{2}{5} = \frac{2}{30}\text{.}\) To get the denominator, we need to add together the probabilities for all possible ways the last marble can be red. This is:
\begin{equation*} P(\text{2nd }R) = \left(\frac{3}{6}\times\frac{2}{5}\right) + \left(\frac{2}{6}\times\frac{1}{5}\right) + \left(\frac{1}{6}\times\frac{2}{5}\right) = \frac{6}{30} + \frac{2}{30} + \frac{2}{30} = \frac{10}{30}\text{.} \end{equation*}
Therefore, the desired conditional probability is:
\begin{equation*} P(\text{1st }W\,|\,\text{2nd }R) = \frac{2/30}{10/30} = \frac{2}{10} = \frac{1}{5}\text{.} \end{equation*}
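The same “backwards” computation can be scripted: multiply along each path for the numerator and the denominator, then divide. A sketch (the helper `path` is our own name):

```python
from fractions import Fraction

urn = {"B": 3, "R": 2, "W": 1}  # the urn from the example
total = sum(urn.values())

def path(first, second):
    """P(1st = first and 2nd = second), multiplying along one tree path."""
    left = dict(urn)
    left[first] -= 1
    return Fraction(urn[first], total) * Fraction(left[second], total - 1)

numerator = path("W", "R")                    # P(1st W and 2nd R) = 2/30
denominator = sum(path(c, "R") for c in urn)  # P(2nd R) = 10/30
print(numerator / denominator)  # 1/5
```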
These “backwards” conditional probability questions make use of a formula created by a British mathematician and Presbyterian minister, Thomas Bayes. Bayes' theorem, as it was named after his death, is given below.
Theorem 2.4.50. Bayes' Theorem.
If a sample space \(S\) can be split into mutually exclusive events \(A_1\text{,}\) \(A_2\text{,}\) \(\ldots\text{,}\) \(A_n\) and \(E\) is an event in \(S\) with \(P(E) > 0\text{,}\) then:
\begin{equation*} P(A_i|E) = \frac{P(A_i)\,P(E|A_i)}{P(A_1)\,P(E|A_1) + P(A_2)\,P(E|A_2) + \cdots + P(A_n)\,P(E|A_n)}\text{.} \end{equation*}
While this theorem formally states the rule we use to solve problems such as the one above, it is not something you should memorize. Instead, use a tree diagram and the conditional probability formula to solve these problems as we did in Example 2.4.48.
Checkpoint 2.4.54.
The following tree diagram depicts the outcomes in a two-step experiment. Use it to answer the question below.
Question: what is the probability of \(A\) given that \(X\) occurred?
0.6667
Checkpoint 2.4.56.
An urn contains 8 marbles numbered 1-8. The marbles numbered 1-6 are red, and marbles 7 and 8 are blue. Two marbles are drawn randomly, without replacement.
Question: what is the probability the first marble was blue, given that the second is red?
0.2857
Checkpoint 2.4.57.
A math professor uses one of two overhead projectors — \(A\) or \(B\text{.}\) Projector \(A\) breaks down 20% of the time, while projector \(B\) breaks down 40% of the time. On Monday, the professor chose a projector at random, and it broke down.
Question: what is the probability that it was projector \(A\text{?}\)
0.3333