
Conditional probability


Prior conditions

In this section we'll consider a new type of probability, the probability that one event will occur given that another has already occurred, or conditional probability. We'll do this by example. The first will be a simple pictorial (Venn diagram) example, and the second a more practical one. The rest of this section will consist of examples in conditional probability.

Consider the Venn diagram below, consisting of the universal set Ω, and within it, overlapping sets A and B. These could be sets of outcomes in a probability experiment.

There are 16 possible outcomes (black dots) in the whole space. Assuming an equal probability of each, the probability of any one outcome is 1/16. There are 7 elements in set A, so P(A) = 7/16. There are 10 elements in set B, so P(B) = 10/16 = 5/8. We also know the probability of finding an outcome in both sets A and B, P(A ∩ B) = 3/16.

Now we'd like to ask a different kind of question:

What is the probability of obtaining an outcome of set A after event B has already occurred?

We use a special notation for this kind of probability, a conditional probability:

$$P(A|B)$$

We read this as "the probability of A given that B has already occurred," or sometimes "the probability of A given B."

Now if we assume that event B has already occurred, then any outcome outside of set B is impossible, so in this diagram everything outside of B has been blotted out.

Now in this reduced universe of possibilities, the probability of event A occurring is 3/10 or 0.3. That is, P(A|B) = 0.3. So we see that

  • P(A) = 7/16 = 0.44
  • P(A|B) = 3/10 = 0.30

In this case, the prior constraint that event B has occurred actually reduced the probability of event A. That's not always the case, though. Finally, notice that our probability P(A|B) was just the ratio of P(A ∩ B) to P(B): (3/16)/(10/16) = 3/10. We'll have more to say about that in the next example.
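
The counting argument above can be sketched in a few lines of Python. The dot labels 0 through 15 and the particular members chosen for A and B are assumptions made for illustration; only the set sizes (7, 10, and 3 in the overlap) come from the diagram.

```python
from fractions import Fraction

# Hypothetical labeling: the 16 dots are numbered 0..15.
omega = set(range(16))
A = set(range(0, 7))    # 7 outcomes: 0..6
B = set(range(4, 14))   # 10 outcomes: 4..13, overlapping A in {4, 5, 6}

def prob(event, space):
    """Probability of an event when every outcome in `space` is equally likely."""
    return Fraction(len(event & space), len(space))

p_a = prob(A, omega)                           # unconditional P(A)
p_a_given_b = prob(A, B)                       # B is the reduced universe
p_ratio = prob(A & B, omega) / prob(B, omega)  # P(A ∩ B) / P(B)

print(p_a, p_a_given_b, p_ratio)  # 7/16 3/10 3/10
```

Counting inside the reduced universe and taking the ratio of probabilities give the same number, which is the point of the example.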

Example: weather forecasting

Here's a more concrete example: Consider rainy days and weather forecasts. Let's say we know that on about 30% of days last year there was both rain and a forecast of it. So we have

$$P(R \cap F) = 0.3,$$

where we let R be a rain event and F be a forecast of rain. Now imagine that about 40% of days we get a forecast of rain, whether it rains or not. We can represent this situation with a Venn diagram:

The red set represents days when a rain forecast is made, P(F) = 0.4. The yellow set contains the days that it actually rains, with probability P(R), which isn't known. Finally, the blue area, the set F ∩ R, contains the days when both occur together. Notice that we don't imply any order here: forecast before rain or rain before forecast. Recall that we usually refer to our universal set (gray) with the Greek letter omega (Ω).

Now let's consider the probability of rain occurring after a rain forecast has been made, P(R | F), which we read as "rain given a forecast."

Here's the Venn diagram, the same as above, but we remove any rain that occurs without a forecast:

Now the probability P(R|F) is clearly just the ratio of the blue area, the set F ∩ R, to the whole set F. Algebraically, that's

$$P(R|F) = \frac{P(R \cap F)}{P(F)}$$

We have those probabilities, so we can calculate the result:

$$ \begin{align} P(R|F) &= \frac{P(R \cap F)}{P(F)} \\[5pt] &= \frac{0.3}{0.4} = \frac{3}{4} \\[5pt] &= 75 \% \end{align}$$

So if we have a forecast of rain, there is a 75% chance that it will actually rain — not bad for weather forecasting, actually.

Conditional probability

The probability that event A will occur after event B has already occurred is written as P(A|B), and read "Probability of A given B."

$$P(A|B) = \frac{P(A \cap B)}{P(B)}$$
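
As a minimal sketch, the definition translates directly into a small Python function. The function name `conditional` is our own choice, and we use exact fractions so that the weather numbers from above come out without rounding error. Note that the ratio is undefined when P(B) = 0, so the sketch refuses that case.

```python
from fractions import Fraction

def conditional(p_joint, p_given):
    """Return P(A|B) = P(A ∩ B) / P(B); undefined when P(B) = 0."""
    if p_given == 0:
        raise ValueError("P(B) is zero, so P(A|B) is undefined")
    return p_joint / p_given

# Weather example: P(R ∩ F) = 0.3 and P(F) = 0.4
p_rain_given_forecast = conditional(Fraction(3, 10), Fraction(4, 10))
print(p_rain_given_forecast)         # 3/4
print(float(p_rain_given_forecast))  # 0.75
```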

Example – simple die roll

Let's imagine that we roll a six-sided die, and we define two events on the result of that roll. The first is that either a 1, 2, 3, or 4 is rolled. The second is that a 1, 2 or 3 is rolled. We can define these events using sets:

$$A = \{1, 2, 3, 4 \} \: \: \text{ and } \: \: B = \{1, 2, 3 \}$$

Now we can immediately calculate some elementary probabilities. These may or may not be helpful, but they're easy to do. First the probabilities of each of our events. Recall that there are only six possible outcomes of rolling a die, so we have

$$ \begin{align} P(A) = \frac{4}{6} = \frac{2}{3} \\[5pt] P(B) = \frac{3}{6} = \frac{1}{2}. \end{align}$$

The intersection of sets A and B is

$$(A \cap B) = \{1, 2, 3 \} \; \text{ and } \; P(A \cap B) = \frac{1}{2}.$$

Now let's calculate the probability of event B given that event A has already occurred. This means that we know the roll came up 1, 2, 3 or 4; a 5 or 6 is ruled out.

We'll find this conditional probability two ways:

1. Using reduced sample space

Once event A has occurred, our new sample space is reduced from S = {1, 2, 3, 4, 5, 6} to {1, 2, 3, 4}. Now event B is the set {1, 2, 3}, and the ratio of the sizes of these sets is ¾, so our probability is

$$P(B|A) = \frac{3}{4}.$$

2. Using the definition

We've already got the necessary probabilities, so we have

$$ \begin{align} P(B|A) &= \frac{P(A \cap B)}{P(A)} \\[5pt] &= \frac{1/2}{2/3} = \frac{1}{2} \cdot \frac{3}{2} = \frac{3}{4} \end{align}$$

(Recall that division by a fraction is the same as multiplication by its reciprocal.)

We get the same result either way, as we should. The definition formula will always work, but sometimes it's useful to use a more intuitive way, like writing sets or drawing diagrams — more examples to follow.
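
Both methods can be checked with Python sets. This is only a sketch; the variable names are ours, and it assumes a fair die so that every face is equally likely.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # full sample space of one roll
A = {1, 2, 3, 4}
B = {1, 2, 3}

# Method 1: count the outcomes of B inside the reduced sample space A
p_reduced = Fraction(len(B & A), len(A))

# Method 2: the definition, P(B|A) = P(A ∩ B) / P(A)
p_a = Fraction(len(A), len(S))
p_a_and_b = Fraction(len(A & B), len(S))
p_definition = p_a_and_b / p_a

print(p_reduced, p_definition)  # 3/4 3/4
```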

Example: removal without replacement

Here's another example. Imagine that we have a jar containing two blue marbles and three red marbles.

First, consider the probability of choosing a red or a blue marble on one draw. There are two chances out of five to draw a blue and three out of five to draw a red, so we have

$$P(B) = \frac{2}{5} \phantom{00} \text{and} \phantom{00} P(R) = \frac{3}{5}$$

Now we can ask some interesting questions, like: What is the probability of drawing a blue marble after a blue has already been drawn? We have to make a distinction here, however, and say "without replacement," meaning that once a marble is drawn, we don't put it back.

An easy way to think of this problem is by simply writing down the sample space. In the case of the first draw, the space is

$$S_0 = \{B, B, R, R, R\},$$

and the probability of drawing a blue marble is ⅖. Once a blue marble is gone, we have a modified sample space:

$$S_1 = \{B, R, R, R\},$$

and now the probability of drawing a blue is just ¼, so we have

$$P(B|B) = \frac{1}{4}.$$

For such a simple system, drawing a complete tree of the possibilities can be easy and helpful. Take a look at this one. It shows two rounds of choosing marbles without replacement. The probability of each choice is shown on the branch of the tree.

From the tree we can read a number of useful probabilities. The probability of drawing a red and a blue marble in two turns is the same as for drawing two reds, P(B ∩ R) = P(R ∩ B) = P(R ∩ R) = 3/10. The probability of drawing two blue marbles is 1/10. We can also read the conditional probabilities: P(B|B) = ¼, P(R|B) = ¾, P(B|R) = ½ and P(R|R) = ½.

Finally, we can confirm that the algebraic definition of conditional probability agrees with our results by calculating

$$ \begin{align} P(B|B) &= \frac{P(B \cap B)}{P(B)} \\[5pt] &= \frac{1/10}{2/5} = \frac{1}{10}\frac{5}{2} \\[5pt] &= \frac{5}{20} = \frac{1}{4} \end{align}$$

Sometimes it's easier to solve these conditional probability problems simply by looking at a tree diagram or writing sample sets (like S0) and modified sample sets (like S1).
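
The tree can also be reproduced by brute force. The sketch below lists every ordered pair of distinct marbles (20 equally likely pairs) and counts; the helper name `prob` and the list-of-letters encoding of the jar are our own assumptions.

```python
from fractions import Fraction
from itertools import permutations

marbles = ["B", "B", "R", "R", "R"]

# Every ordered pair of two different marbles is equally likely (5 * 4 = 20).
draws = list(permutations(range(len(marbles)), 2))

def prob(predicate):
    """Fraction of two-marble draws whose (first, second) colors satisfy predicate."""
    hits = sum(1 for i, j in draws if predicate(marbles[i], marbles[j]))
    return Fraction(hits, len(draws))

p_first_blue = prob(lambda first, second: first == "B")
p_blue_blue = prob(lambda first, second: first == "B" and second == "B")
p_blue_red = prob(lambda first, second: first == "B" and second == "R")
p_blue_given_blue = p_blue_blue / p_first_blue

print(p_first_blue, p_blue_blue, p_blue_red, p_blue_given_blue)
# 2/5 1/10 3/10 1/4
```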

Example – lottery ticket

Now let's consider the odds of winning a certain prize after purchasing a lottery ticket. We'll make up an example of a ticket that has four different payouts, with their odds or probabilities (those are generally listed on the back of the ticket).

Now let's define two events. The first is winning anything at all. In terms of payout, the set is A = {1, 10, 100}. The second event will be winning more than \$1. Maybe we want to see how often we'll win more than just the cost of another ticket, given that we win anything at all. That set can be B = {10, 100}.

The elementary probabilities, given the universal set Ω = {0, 1, 10, 100}, are

$$ \begin{align} P(A) &= 0.289 + 0.008 + 0.003 = 0.30 \\[5pt] P(B) &= 0.008 + 0.003 = 0.011 \\[5pt] P(A \cap B) &= 0.011, \end{align}$$

where the set (A ∩ B) = {10, 100}. Now we'll ask: What is the probability of winning more than \$1 on a winning ticket?

Again, we'll do this in two ways, first by looking at the reduced sample spaces using our table, and second using the algebraic definition of conditional probability.

1. Using the table

The sample space is reduced from Ω = {0, 1, 10, 100} to A = {1, 10, 100}. Now we're trying to find the probability of a sample space of B = {10, 100} in that. The result is:

$$ \begin{align} P(B|A) &= \frac{0.008 + 0.003}{0.289 + 0.008 + 0.003} \\[5pt] &= \frac{0.011}{0.30} = 3.7 \% \end{align}$$

where the numerator and denominator of our starting fraction above are the sums of the probabilities of the elements of sets B and A, respectively.

2. By the definition

$$ \begin{align} P(B|A) &= \frac{P(A \cap B)}{P(A)} \\[5pt] &= \frac{0.011}{0.30} = 3.7 \%. \end{align}$$

So the probability of winning more than \$1 on any ticket is only 1.1%, but given the condition that we don't have a losing ticket, it improves to just under 4%.
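
Here is a sketch of the same calculation with the ticket stored as a payout-to-probability table. The three winning probabilities are the terms of the sums above; the probability of a losing ticket (0.70) is our inference from the remainder, and the names `ticket` and `prob` are ours.

```python
from fractions import Fraction

# Payout in dollars -> probability, written in thousandths to stay exact.
ticket = {
    0: Fraction(700, 1000),    # inferred as 1 - 0.30 (assumption)
    1: Fraction(289, 1000),
    10: Fraction(8, 1000),
    100: Fraction(3, 1000),
}

def prob(event):
    """Probability of an event given as a set of payouts."""
    return sum(ticket[payout] for payout in event)

A = {1, 10, 100}   # win anything at all
B = {10, 100}      # win more than $1

p_b_given_a = prob(A & B) / prob(A)
print(p_b_given_a, round(float(p_b_given_a) * 100, 1))  # 11/300 3.7
```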

Example – Two dice

Here's an example using two dice. We'll ask the conditional question, "What is the probability of rolling a total of 7 on two dice if the first roll lands on 1 ?" So if we let

$$ \begin{align} P(1) &= \text{probability of rolling 1} \\[5pt] P(7) &= \text{probability that sum is 7}, \end{align}$$

then we are looking for $P(7|1).$ Here are all possible outcomes (36 of them) for the sum of two independent, six-sided dice. Along the diagonal are all possible ways of getting a sum of 7, and on the top row are all possible ways of rolling two dice with the first showing a 1.

Without a constraint, there are six ways to roll a sum of 7, but with the constraint, there is only one, the combination 1, 6. So the conditional probability is ⅙.

The Venn diagram shows the scenario in a different way. The conditional probability is the probability of the intersection of the sets, the blue area, divided by the probability of rolling a 1 first, a 1-in-6 chance.

Now let's do the same calculation using the definition of conditional probability:

$$ \begin{align} P(7|1) &= \frac{P(7 \cap 1)}{P(1)} \\[5pt] &= \frac{1/36}{6/36} = \frac{1}{6} \end{align}$$

In the numerator, the intersection of the sets {total of 7} and {first roll = 1} has just one element of the 36 possible two-die rolls. The denominator is the probability that the first die shows a 1: 6 of the 36 rolls, or ⅙.
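
The 36-outcome table is small enough to enumerate directly. This sketch assumes two fair dice and uses our own set names for the two events.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 (first, second) pairs

first_is_1 = {r for r in rolls if r[0] == 1}   # top row of the table
sum_is_7 = {r for r in rolls if sum(r) == 7}   # the diagonal

p_first_1 = Fraction(len(first_is_1), len(rolls))
p_both = Fraction(len(first_is_1 & sum_is_7), len(rolls))
p_7_given_1 = p_both / p_first_1

print(p_first_1, p_both, p_7_given_1)  # 1/6 1/36 1/6
```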

Example – color blindness

Let's imagine that in a given population, 5% of the men and 0.25% of the women are colorblind. Given this information, what is the probability that if we choose a colorblind person at random, they will be male? Assume that the population is half male ( ♂ ), half female ( ♀ ).

First, let's draw a tree of the possibilities:

We've used the symbols ♂ and ♀ for men and women, respectively; B will mean colorblind and !B = not colorblind. The intersections have been calculated. Now to calculate the conditional probabilities:

The conditional probability P(♂|B) is then

$$P(\unicode{9794} | B) = \frac{P(\unicode{9794} \cap B)}{P(B)}.$$

The numerator can be read straight off of our tree, but the denominator is trickier. Look at the tree and notice that the total probability of being colorblind is the sum of the intersections (♂ ∩ B) and (♀ ∩ B), so our conditional probability is:

$$ \begin{align} P(\unicode{9794} | B) &= \frac{P(\unicode{9794} \cap B)}{P(\unicode{9794} \cap B) + P(\unicode{9792} \cap B)} \\[5pt] &= \frac{0.025}{0.025 + 0.00125} = 95.24 \% \end{align}$$

So given the prevalence of colorblindness in men in this population, it's highly likely that if presented with a colorblind person, that person will be male. A similar calculation gives $P(\unicode{9792}|B) = 4.76 \% ,$ for a total probability of one, of all colorblind people being either male or female.

Notice that if our population were tilted in favor of males or females (like 60% ♀, 40% ♂), then our conditional probabilities would change because the intersections (see the tree diagram) would change accordingly.
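
To see how the answer shifts with the makeup of the population, here is a small sketch in which the male fraction is a parameter. The function name and its arguments are our own; the two colorblindness rates are the ones given in the problem.

```python
def p_male_given_colorblind(p_male, rate_men=0.05, rate_women=0.0025):
    """P(male | colorblind) for a population with male fraction p_male."""
    joint_male = p_male * rate_men              # P(male ∩ B)
    joint_female = (1 - p_male) * rate_women    # P(female ∩ B)
    return joint_male / (joint_male + joint_female)

print(round(p_male_given_colorblind(0.5), 4))  # 0.9524
print(round(p_male_given_colorblind(0.4), 4))  # 0.9302
```

With a 40% male population the conditional probability drops by about two percentage points, because the male intersection shrinks while the female one grows.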

Practice problems


Let's say we have three coins. One is a two-headed coin, another a fair coin (one side heads, the other tails), and the third is a biased coin, which comes up heads 75% of the time. When one of the three coins is selected at random and flipped, it shows heads. Calculate the probability that it was the two-headed coin.


For coins 1, 2 and 3, we have the following probabilities of elementary outcomes and intersections:

| Coin | Outcomes | P(H) for that coin | Intersection |
|------|----------|--------------------|--------------|
| 1 | {H, H} | $P(H) = 1$ | $P(1 \cap H) = \frac{1}{3}(1) = \frac{1}{3}$ |
| 2 | {H, T} | $P(H) = \frac{1}{2}$ | $P(2 \cap H) = \frac{1}{3}\frac{1}{2} = \frac{1}{6}$ |
| 3 | {H, T} | $P(H) = \frac{3}{4}$ | $P(3 \cap H) = \frac{1}{3}\frac{3}{4} = \frac{1}{4}$ |

Here we have taken P(1) = P(2) = P(3) = ⅓. That is, each coin is equally likely to be selected. The two-headed coin is coin 1, so we want

$$P(1|H) = \frac{P(1 \cap H)}{P(H)}$$

We have the numerator in our table. The denominator is just the sum $P(1 \cap H) + P(2 \cap H) + P(3 \cap H).$ So our conditional probability is

$$ \begin{align} P(1|H) &= \frac{P(1 \cap H)}{P(H)} \\[5pt] &= \frac{1/3}{\frac{1}{3}+\frac{1}{6}+\frac{1}{4}} = \frac{1/3}{3/4} \\[5pt] &= \frac{1}{3} \cdot \frac{4}{3} = \frac{4}{9} \end{align}$$

The same calculation for the biased coin gives $P(3|H) = \frac{1/4}{3/4} = \frac{1}{3}.$
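
As a check, this sketch computes the conditional probability for every coin at once. The dictionary names are ours; the heads probabilities and the ⅓ selection probability are the ones used above. The three results must add up to 1, since the coin that showed heads has to be one of the three.

```python
from fractions import Fraction

# P(H) for each coin: 1 = two-headed, 2 = fair, 3 = biased
p_heads = {1: Fraction(1), 2: Fraction(1, 2), 3: Fraction(3, 4)}
p_select = Fraction(1, 3)   # each coin is equally likely to be picked

# Intersections P(coin ∩ H), then the total P(H)
joint = {coin: p_select * p for coin, p in p_heads.items()}
p_h = sum(joint.values())

# Conditional probabilities P(coin | H)
given_heads = {coin: joint[coin] / p_h for coin in joint}

print(p_h)                                               # 3/4
print(given_heads[1], given_heads[2], given_heads[3])    # 4/9 2/9 1/3
```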


A bag contains marbles, 5 white ones and 10 black ones. A fair, six-sided die is rolled and that number of marbles is randomly chosen from the bag (no peeking). (a) Calculate the probability that all of the selected marbles are white. (b) Calculate the conditional probability that the die landed on 3 if all the marbles selected are white.


This tree shows the branches relevant to the problem. First a die is thrown. This event is obviously independent of the marble draws.

For clarity, the probabilities of the black-marble branches aren't calculated.

If a 1 is thrown, then one marble is drawn, with a 1/3 chance of drawing a white marble and a 2/3 chance of drawing black. The total marble-drawing probability for a die roll of 1 is 1. This process is repeated for throws of 2-5, drawing marbles without replacement. Notice that because there are only five white marbles, the probability of drawing six whites is zero.

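The tree gives the probability of drawing all whites for each particular die roll, and each roll itself has probability 1/6. The total probability (regardless of die roll) of drawing all whites is the sum of all of these probabilities, each weighted by 1/6:

$$ \begin{align} P(\text{all white}) &= \frac{1}{6}(0.333 + 0.095 + 0.022 + 0.004 + 0.0003) \\[5pt] &= \frac{0.454}{6} \approx 0.076 \end{align}$$

The conditional probability P(3 whites|3) can be read from the tree; it's 0.022 or 2.2%. Part (b) asks for the reverse condition, the probability that the die landed on 3 given that all of the marbles are white:

$$P(3|\text{all white}) = \frac{P(3 \cap \text{all white})}{P(\text{all white})} = \frac{\frac{1}{6}(0.022)}{\frac{1}{6}(0.454)} \approx 0.048$$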
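
For a check with exact fractions, here is a sketch that weights each branch of the tree by the 1/6 chance of its die roll. It assumes that drawing k marbles at once is equivalent to drawing them one at a time without replacement, so that P(all white | k) is a ratio of binomial coefficients; the function and variable names are ours.

```python
from fractions import Fraction
from math import comb

WHITE, TOTAL = 5, 15

def p_all_white(k):
    """P(all k drawn marbles are white | die shows k), without replacement."""
    return Fraction(comb(WHITE, k), comb(TOTAL, k))   # comb(5, 6) is 0

p_roll = Fraction(1, 6)
joint = {k: p_roll * p_all_white(k) for k in range(1, 7)}   # P(k ∩ all white)

p_white = sum(joint.values())           # part (a)
p_3_given_white = joint[3] / p_white    # part (b)

print(p_white, round(float(p_white), 4))                  # 5/66 0.0758
print(p_3_given_white, round(float(p_3_given_white), 4))  # 22/455 0.0484
```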


Let A, B and C be events, and we know that $P(A|C) = 0.05$ and $P(B|C) = 0.05.$ Which of these statements must be true?

  1. $P(A \cap B \,|\, C) = \left( \frac{1}{2} \right)^2$
  2. $P(!A \cap !C) \ge 0.90$
  3. $P(A \cup B \,|\, C) \le 0.05$
  4. $P(A \cup B \,|\, !C) \ge 1 - \left( \frac{1}{2} \right)^2$
  5. $P(A \cup B \,|\, !C) \ge 0.10$


Let A, C and D be events such that $C \cap D = \emptyset,$ and $P(A) = \frac{1}{4},$ $P(!A) = \frac{3}{4},$ $P(C|A) = \frac{1}{2},$ $P(C|!A) = \frac{3}{4},$ $P(D|A) = \frac{1}{4},$ $P(D|!A) = \frac{1}{8}.$ Calculate $P(C \cup D).$


First, let's recall that $P(C \cup D) = P(C) + P(D) - P(C \cap D).$ But because we have $C \cap D = \emptyset,$ we know that

$$P(C \cup D) = P(C) + P(D)$$

Now use the definition of conditional probability and the fact that the events $A$ and $!A$ are mutually exclusive and exhaustive to find:

$$ \begin{align} P(C) &= P(C \cap A) + P(C \cap !A) \\[5pt] &= P(C|A)P(A) + P(C|!A)P(!A) \\[5pt] &= \frac{1}{2}\left( \frac{1}{4} \right) + \frac{3}{4}\left( \frac{3}{4} \right) \\[5pt] &= \frac{1}{8} + \frac{9}{16} = \frac{11}{16} \end{align}$$


$$ \begin{align} P(D) &= P(D \cap A) + P(D \cap !A) \\[5pt] &= P(D|A)P(A) + P(D|!A)P(!A) \\[5pt] &= \frac{1}{4}\left( \frac{1}{4} \right) + \frac{1}{8}\left( \frac{3}{4} \right) \\[5pt] &= \frac{1}{16} + \frac{3}{32} = \frac{5}{32} \end{align}$$


$$ \begin{align} P(C) + P(D) &= \frac{11}{16} + \frac{5}{32} \\[5pt] &= \frac{27}{32} = 0.844 \end{align}$$
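
The whole solution fits in a few lines of exact arithmetic. In this sketch the helper `total_probability` is our own name for the step that splits an event over A and !A.

```python
from fractions import Fraction as F

p_a = F(1, 4)
p_not_a = 1 - p_a   # 3/4

def total_probability(p_given_a, p_given_not_a):
    """P(X) = P(X|A)P(A) + P(X|!A)P(!A)."""
    return p_given_a * p_a + p_given_not_a * p_not_a

p_c = total_probability(F(1, 2), F(3, 4))
p_d = total_probability(F(1, 4), F(1, 8))
p_c_or_d = p_c + p_d   # valid only because C and D are disjoint

print(p_c, p_d, p_c_or_d)  # 11/16 5/32 27/32
```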

Are two events even related?

It's an important question, one answered in the next probability section: Independence of events. That should be your next stop. Think about it in terms of conditional probability: If P(A|B) = P(A), then doesn't this suggest that the occurrence of event A has nothing to do with whether or not event B has occurred?

xaktly.com by Dr. Jeff Cruzan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. © 2012, Jeff Cruzan. All text and images on this website not specifically attributed to another source were created by me and I reserve all rights as to their use. Any opinions expressed on this website are entirely mine, and do not necessarily reflect the views of any of my employers. Please feel free to send any questions or comments to jeff.cruzan@verizon.net.