The existence of gambling for many centuries is evidence of long-running interest in probability. But a good understanding of probability transcends mere gambling. The mathematics of probability is essential for understanding topics ranging from experimental measurement to chemistry and medicine.
In this section we will consider probability for discrete random variables.
Discrete in this sense means that a variable can take on one of only a few specific values. A good example is a coin. When lying flat, only one side can possibly be showing at a time. Another is a die (singular of dice), which can show numbers 1-6 only, and only one of those at a time. In the section on continuous probability we'll consider continuous random variables, but we're not there yet.
Our universe is driven mostly by random events, so it's very important to understand randomness and the probability of any event occurring in such a universe. Here are a few examples of where you might need to understand probability, but there are many, many others.
Any experimental measurement, no matter how carefully performed, is affected by random errors or “noise.” Shown at right is data from a high resolution far-infrared spectroscopy experiment. The "peak" represents absorption of a very small amount of far-infrared light by the C_{3} molecule. You can easily see the noise (roughness) in the signal. Random errors follow the laws of probability, which form the basis of how we estimate the effect of those errors on our results.
For example, we might measure a length and report it as 3.45 ± 0.03 meters, where the 0.03 is a measure of the “average” random error present in the measurement. Just how we estimate that average error comes from a study of probability.
Source: Giesen et al., Astrophys. J., 551:L181-L184 (2001)
Whether a chemical reaction takes place depends on a number of factors, like whether reactants collide (necessary for a reaction to occur), with what kinetic energy they collide, and in what orientation they collide (see the illustration). Because in any ensemble (group) of reacting molecules there will be a wide and randomly-occurring range of speeds, paths and orientations, these processes are best understood using the laws of probability.
There is a whole field in physics/chemistry called statistical mechanics, based on probability theory, that derives the laws of thermodynamics from a study of the behavior of large ensembles of atoms & molecules.
The laws of probability are crucial in the medical sciences. Among other things, they are important for developing effective tests for diseases and in testing for the presence of drugs and other substances.
In testing the effectiveness of drugs, researchers must carefully employ the laws of probability and statistics. There are legendary cases of reliance upon a drug to treat some disease which later was proven to be completely ineffectual by careful probability-based analysis.
Much of the work of public health professionals is backed up by a solid knowledge of probability to prove or disprove cause-and-effect relationships.
Image: Wikipedia Commons
A discrete random variable is one that can only take on one of a set of specific values at a time.
Discrete events are those with a finite number of outcomes, e.g. tossing dice or coins. For example, when we flip a coin, there are only two possible outcomes: heads or tails. When we roll a six-sided die, we can only obtain one of six possible outcomes, 1, 2, 3, 4, 5, or 6. Discrete probabilities are simpler to understand than continuous probabilities, so that's where we'll begin.
Let's look at flipping a coin first. The probability, which we'll call P, of obtaining a particular outcome (heads or tails) is 1 chance in 2, or 1:2, or just ½.
The possible elementary outcomes of our experiment (coin flipping) form a set {H, T}. If we call P the probability function and H and T the two possible outcomes, then P(H) = ½, and P(T) = ½. When we flip a coin, we have to get either H or T, so the total probability is 1. Here, of course, we need to say that we're ruling out the unlikely event that the coin will land in such a way that it sticks on its edge. When we flip a coin, we make the reasonable assumption that there are only two possible outcomes, and the one we get can only be one of those, or ½ of the total.
So the probability of obtaining either outcome H or outcome T from our experiment (flipping the coin) can be written:
P(H) + P(T) = 1
In other words, the sum of the probabilities of all possible discrete outcomes is one. Note that this is only true when outcomes H and T are mutually exclusive, i.e. when they can't occur at the same time. The story would be different if we could get heads and tails at the same time.
The sum of the probabilities of all possible discrete outcomes of a probability experiment is one.
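We can write down two important rules of probability now, where $P_i$ is the probability of the i-th of n possible outcomes:
(1) $P_i > 0$
(2) $\sum_{i=1}^{n} P_i = 1$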
The first says that for an outcome of a probability experiment to be defined, it must have a finite positive probability. If the probability is zero, it can never happen and we don't have to worry about it, and negative probabilities don't make any sense.
If you haven't seen summation notation before, rule (2) translates like this: the sum of the probabilities of all n outcomes (from one to n, labeled with the index i) is one. It means that for a particular experiment, like flipping a coin, the sum of the probabilities of all outcomes (½ for heads, ½ for tails) must equal 1. It's another way of saying that something has to happen.
Now let's take a look at something more complicated, rolling dice ...
Here are the 36 possible outcomes of rolling two distinguishable (one white, one black) dice.
The probability of rolling a three, for example, is just the number of ways of forming a total of three divided by the total number of possibilities (36). For distinguishable dice, we can roll a (1, 2) and a (2, 1), where the first number in each pair corresponds to the white die. The probability of rolling a 3 is then 2/36 = 1/18. It's the same calculation for any other roll. Notice that there are six ways of rolling a 7 for a probability of 6/36 = 1/6, making it the most probable result.
If we insist that in rolling a 7, we roll a 1 on the white die and a 6 on the black, then the probability of that roll is 1/36. If we want a (6, 1) combination but we don't care which die comes up 1 or 6, the probability is 2/36 = 1/18. This is the case with indistinguishable dice (e.g. both white or both black).
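The counting argument above can be reproduced by listing all 36 ordered rolls. This is a minimal Python sketch, not part of the original lesson; the names `outcomes`, `ways`, and `sum_probability` are chosen here for illustration, and each roll is written as (white, black).

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (white, black) for two distinguishable dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each total, then divide by 36.
ways = Counter(white + black for white, black in outcomes)
sum_probability = {total: Fraction(count, len(outcomes))
                   for total, count in ways.items()}

for total in sorted(sum_probability):
    print(total, ways[total], sum_probability[total])

# One specific ordered roll: white = 1 and black = 6.
p_specific = Fraction(outcomes.count((1, 6)), len(outcomes))

# A 1 and a 6 in either order.
p_either_order = Fraction(
    sum(1 for roll in outcomes if sorted(roll) == [1, 6]), len(outcomes)
)
print(p_specific, p_either_order)
```

Using `Fraction` keeps every probability exact, so the results can be compared directly with the hand-counted values such as 2/36 = 1/18.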
Continuing with the distinguishable dice from above, let's now define some more complicated events. An event will be something that can happen in our probability experiment. It can be an elementary outcome, such as rolling a (1, 2), or something more complicated, such as rolling a sum of 3 OR a sum of 4. Here are a few examples of how we might define a few events for two dice:
Event | Description | Elementary outcomes | Probability |
---|---|---|---|
A | Dice add to 3 | (1,2),(2,1) | 2/36 = 1/18 |
B | Dice add to 6 | (1,5),(5,1),(2,4),(4,2),(3,3) | 5/36 |
C | White die = 1 | (1,1),(1,2),(1,3),(1,4),(1,5),(1,6) | 6/36 = 1/6 |
D | No 4 on either die | 11,12,13,15,16, ... 63,65,66 | 25/36 |
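The four events in the table can be written as sets of elementary outcomes and counted by machine. The sketch below is one possible way to do that in Python; the helper `prob` is a name introduced here, and it assumes all 36 outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# Every outcome is a pair (white, black).
outcomes = set(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if sum(o) == 3}   # dice add to 3
B = {o for o in outcomes if sum(o) == 6}   # dice add to 6
C = {o for o in outcomes if o[0] == 1}     # white die = 1
D = {o for o in outcomes if 4 not in o}    # no 4 on either die

for name, event in [("A", A), ("B", B), ("C", C), ("D", D)]:
    print(name, len(event), prob(event))
```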
It's very often an advantage to combine two events. In the table below we'll combine some of the events (A, B, C, D) from the table above, and introduce some new notation while we're at it. This notation will be particularly helpful a bit later on in our study of discrete probability and Bayesian probability.
∩ | AND |
∪ | OR |
! | NOT |
Combination of events | Definition | Notation |
---|---|---|
A and B | A and B both occur | A ∩ B |
A or B | Either A or B occurs | A ∪ B |
not A | Event A does not occur | !A |
A and not C | A occurs and C does not | A ∩ !C |
not B or not C | Either B does not occur or C does not | !B ∪ !C |
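If events are stored as sets of outcomes, the notation in this table maps onto ordinary set operations: ∩ becomes `&`, ∪ becomes `|`, and !A becomes the set difference from the full outcome set. The following is an illustrative Python sketch under that assumption; the variable names are hypothetical, and A, B, and C are the events "dice add to 3", "dice add to 6", and "white die = 1".

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))  # (white, black)

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if sum(o) == 3}   # dice add to 3
B = {o for o in outcomes if sum(o) == 6}   # dice add to 6
C = {o for o in outcomes if o[0] == 1}     # white die = 1

a_and_b = A & B                                    # A ∩ B
a_or_b = A | B                                     # A ∪ B
not_a = outcomes - A                               # !A
a_and_not_c = A & (outcomes - C)                   # A ∩ !C
not_b_or_not_c = (outcomes - B) | (outcomes - C)   # !B ∪ !C

for label, event in [("A and B", a_and_b), ("A or B", a_or_b),
                     ("not A", not_a), ("A and not C", a_and_not_c),
                     ("not B or not C", not_b_or_not_c)]:
    print(label, prob(event))
```

Because an event and its complement share no outcomes and together cover all 36, `prob(A) + prob(not_a)` comes out to exactly 1.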
An event either occurs or it does not. The sum of the probabilities of an event occurring and not occurring is 1:
P(A) + P(!A) = 1
What is the total probability of rolling a 1 on either die, rolling two dice at a time?
First, we'll establish some events. We'll let event A be rolling a 1 on the white die, and event B be rolling a 1 on the dark die.
The diagram above shows all 36 possible combinations of rolling two distinguishable dice. In red along the left side are the six ways that a 1 can be rolled on the white die, and across the top are the ways that 1 can be rolled on the dark die.
Now the probability of event A is P(A) = 6/36 = 1/6. Likewise, the probability of event B is P(B) = 6/36 = 1/6.
Now it would seem that the probability of rolling a 1 on either the white or dark die would be the sum of these two probabilities because by joining the two events with the word "or," we're increasing our chance of success.
But we have to be careful not to overcount the (1,1) outcome. We can't double count it as both an outcome of event A and event B, so the total number of ways of success is 11, not 12, and the total probability is 11/36, not 1/3.
Symbolically we write the probability of either event A or event B occurring as P(A ∪ B), and
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 6/36 + 6/36 - 1/36 = 11/36
where P(A ∩ B) is the probability that both events happen together. This takes care of our overlap.
The probability, P(A ∪ B), that either event A or event B occurs is
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
where P(A) and P(B) are the probabilities of events A and B, and P(A ∩ B) is the probability that both A and B occur together.
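The overlap correction can be checked by counting outcomes directly and comparing with the rule. This is a small Python sketch for the "1 on either die" example; the names `direct`, `by_rule`, and `naive` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))  # (white, dark)

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if o[0] == 1}  # 1 on the white die
B = {o for o in outcomes if o[1] == 1}  # 1 on the dark die

direct = prob(A | B)                        # count the union itself
by_rule = prob(A) + prob(B) - prob(A & B)   # P(A) + P(B) - P(A ∩ B)
naive = prob(A) + prob(B)                   # double counts the (1, 1) roll

print(direct, by_rule, naive)
```

The direct count and the rule agree, while the naive sum is too large by exactly P(A ∩ B) = 1/36.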
Let event A be rolling a sum of 3 on two dice, and let event B be rolling a sum of 5 on two dice. Calculate the probability of rolling a sum of either a 3 or a 5. The possibilities are illustrated below. The ways of rolling 3 and 5 are highlighted in blue and red, respectively.
In this situation, P(A) = 2/36 = 1/18, and P(B) = 4/36 = 1/9. Our experience up to this point suggests (correctly) that P(A ∪ B) = P(A) + P(B) - P(A ∩ B). But P(A ∩ B) is the probability that both A and B occur together. In this case, the figure shows that they clearly cannot occur together.
Either we roll a sum of 3 or we roll a sum of 5. We can't do both. These events are called mutually exclusive, and P(A ∩ B) = 0.
So the total probability is
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 2/36 + 4/36 - 0 = 6/36 = 1/6
Notice that P(A ∩ B) = 0, reflecting the fact that A and B cannot occur simultaneously.
Events A and B are mutually exclusive if it is the case that, if A occurs, B cannot possibly occur, and if B occurs, A cannot possibly occur.
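In terms of sets, two events are mutually exclusive when they share no elementary outcomes. A possible Python sketch of that test follows; the function name `mutually_exclusive` is an assumption made here, and it simply wraps the built-in `set.isdisjoint`.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))  # (white, black)

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

def mutually_exclusive(a, b):
    """True when events a and b have no outcome in common."""
    return a.isdisjoint(b)

sum_is_3 = {o for o in outcomes if sum(o) == 3}
sum_is_5 = {o for o in outcomes if sum(o) == 5}
white_is_1 = {o for o in outcomes if o[0] == 1}
black_is_1 = {o for o in outcomes if o[1] == 1}

print(mutually_exclusive(sum_is_3, sum_is_5))      # no roll sums to both
print(mutually_exclusive(white_is_1, black_is_1))  # (1, 1) belongs to both

# For mutually exclusive events the probabilities simply add.
p_3_or_5 = prob(sum_is_3 | sum_is_5)
print(p_3_or_5)
```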
This Venn diagram is a nice illustration of how combining event probabilities and mutual exclusion works.
Rolling two dice: Let event A be that the sum of the dice is even, event B be that at least one of the dice shows a 6, and event C be that the values of the two dice are equal. In the figure, A is highlighted in yellow, and B in blue. Note that some outcomes share yellow and blue, and C outcomes are outlined in red. We'll take a look at a variety of combinations of these events and their probabilities in the table below the figure.
The table below summarizes the probabilities of several event combinations. Let's step through the logic of each in these paragraphs:
P(A) is the probability that the sum of the dice is even. There are 18 such outcomes highlighted in yellow above. 18/36 = ½, so half of all rolls of two dice, on average, will yield an even sum. Notice that we could call the probability of rolling an odd sum P(!A) = ½, and because the two events are mutually exclusive and one of them must occur, P(A) + P(!A) = 1.
P(B) is the probability that at least one die is a 6. For that we can simply count the number of rolls that contain one or two 6's (including 6,6) to get 11/36. We have to remember not to double count the 6,6 roll.
P(C) is the probability that both dice show the same number. Those six outcomes are along the diagonal of the figure and P(C) = ⅙.
P(A ∩ B) is the probability that the sum of two dice is even and that at least one of them is a 6.
For that probability, simply count outcomes that share blue and yellow color in the figure; there are five such rolls: (26), (46), (66), (62) and (64). Remember that (46) and (64) are different rolls because we're using distinguishable dice. The probability is 5/36.
P(B ∩ C) is the probability that at least one die shows a 6 and that the two values are equal. In this case, there's only one such outcome, (66), so the probability is 1/36.
P(A ∪ B) is the probability that either the sum of two dice is even or that at least one die shows a 6. "OR" probabilities like this are calculated by summing the probabilities of each event occurring alone and subtracting the probability of both occurring together: P(A ∪ B) = P(A) + P(B) - P(A ∩ B), so the probability is ½ + 11/36 - 5/36 = ⅔.
Finally, P(B ∪ C) = P(B) + P(C) - P(B ∩ C) = 11/36 + 1/6 - 1/36 = 4/9. There is a 4-in-9 chance of rolling at least one six or having both dice show the same number.
The results are tabulated below.
Event | Expression | Probability |
---|---|---|
P(A) | P(A) | 18/36 = ½ |
P(B) | P(B) | 11/36 |
P(C) | P(C) | 6/36 = ⅙ |
P(A ∩ B) | P(A ∩ B) | 5/36 |
P(B ∩ C) | P(B ∩ C) | 1/36 |
P(A ∪ B) | P(A) + P(B) - P(A ∩ B) | ½ + 11/36 - 5/36 = 24/36 = ⅔ |
P(B ∪ C) | P(B) + P(C) - P(B ∩ C) | 11/36 + 6/36 - 1/36 = 16/36 = 4/9 |
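Each row of this table can be reproduced by building the three events as sets and counting. The sketch below is one way to do so in Python; the dictionary `table` and the helper `prob` are illustrative names, not part of the original material.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))  # (white, black)

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

A = {o for o in outcomes if sum(o) % 2 == 0}   # sum is even
B = {o for o in outcomes if 6 in o}            # at least one 6
C = {o for o in outcomes if o[0] == o[1]}      # both dice equal

table = {
    "P(A)": prob(A),
    "P(B)": prob(B),
    "P(C)": prob(C),
    "P(A ∩ B)": prob(A & B),
    "P(B ∩ C)": prob(B & C),
    "P(A ∪ B)": prob(A) + prob(B) - prob(A & B),
    "P(B ∪ C)": prob(B) + prob(C) - prob(B & C),
}

for expression, value in table.items():
    print(expression, value)
```

The two "or" rows are computed with the addition rule rather than by counting the union, so comparing them with `prob(A | B)` and `prob(B | C)` gives an independent check on the rule.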
Term | Definition |
---|---|
Random | without any order or definite aim – a list of random numbers will have no discernable pattern. True randomness is often difficult to achieve. |
Discrete random variable | |
Experiment | |
Elementary outcome | |
Probability | |
And ( ∩ ) | |
Or ( ∪ ) | |
Mutually exclusive | |
Conditional probability is the probability that a second event will occur, provided that a first event has already occurred. Think about this experiment: What is the probability that the sum of two dice will be three?
We can run this experiment in two ways:
Method 1—throw distinguishable dice at the same time, and let A = {12} and B = {21}. P(A ∪ B) will be the sum of all of the probabilities of getting a sum of three.
Because A and B are mutually exclusive, P(A ∪ B) = P(A) + P(B) = 1/18.
Method 2—throw one die first, and suppose it comes up 1. Now what is P(A ∪ B)?
Now P(A ∪ B) is ⅙, because the only way left to reach a sum of three is for the second die to show a 2. This may seem trivial right now, but hang on ... it will become very important.
We will use this notation to denote conditional probabilities:
P(B|A), read "the probability of B given A"
We define the conditional probability that event B occurs after A as the probability that A and B occur together divided by the probability that event A occurs at all:
P(B|A) = P(A ∩ B) / P(A)
This definition rearranges to what we call the multiplication rule of probability:
P(A ∩ B) = P(B|A)P(A)
Think about this for a bit and convince yourself that it gets P(A|A) = 1 right, and also predicts that P(B|A) = 0 if A and B are mutually exclusive events.
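Those two checks, and the "one die already shows a 1" experiment, can be run numerically. This is a minimal Python sketch assuming events are sets of equally likely outcomes; `conditional` is a helper name introduced here.

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))  # (first die, second die)

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

def conditional(b, a):
    """P(B|A) = P(A ∩ B) / P(A); undefined when P(A) = 0."""
    if not a:
        raise ValueError("P(A) = 0, so P(B|A) is undefined")
    return prob(a & b) / prob(a)

first_is_1 = {o for o in outcomes if o[0] == 1}
sum_is_3 = {o for o in outcomes if sum(o) == 3}
sum_is_5 = {o for o in outcomes if sum(o) == 5}

print(conditional(sum_is_3, first_is_1))  # total of 3, given the first die shows 1
print(conditional(sum_is_3, sum_is_3))    # P(A|A)
print(conditional(sum_is_5, sum_is_3))    # mutually exclusive events

# Multiplication rule: P(A ∩ B) = P(B|A) P(A)
lhs = prob(first_is_1 & sum_is_3)
rhs = conditional(sum_is_3, first_is_1) * prob(first_is_1)
print(lhs, rhs)
```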
Events may or may not be independent. We might want to know whether the occurrence of one event affects the occurrence of another. Two events, A and C, are independent if the occurrence of one does not affect the probability of occurrence of the other. That means
P(A) = P(A|C) or P(C) = P(C|A).
Here are two examples of a two-dice experiment to illustrate how we can check for independence:
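One way to run such a check is to compare P(A) with P(A|C) by counting outcomes. The Python sketch below uses two pairs of events picked here purely for illustration (they are assumptions, not necessarily the examples originally shown): a 1 on each die, and "sum is even" versus "both dice equal".

```python
from fractions import Fraction
from itertools import product

outcomes = set(product(range(1, 7), repeat=2))  # (white, dark)

def prob(event):
    """Probability of an event given as a set of equally likely outcomes."""
    return Fraction(len(event), len(outcomes))

def conditional(a, c):
    """P(A|C) = P(A ∩ C) / P(C), assuming P(C) > 0."""
    return prob(a & c) / prob(c)

def independent(a, c):
    """True when knowing that C occurred leaves P(A) unchanged."""
    return prob(a) == conditional(a, c)

white_is_1 = {o for o in outcomes if o[0] == 1}
dark_is_1 = {o for o in outcomes if o[1] == 1}
sum_is_even = {o for o in outcomes if sum(o) % 2 == 0}
doubles = {o for o in outcomes if o[0] == o[1]}

print(independent(white_is_1, dark_is_1))  # one die does not influence the other
print(independent(sum_is_even, doubles))   # doubles always give an even sum
```

In the second pair, P(A) = ½ but P(A|C) = 1, so learning that the dice match changes the probability of an even sum and the events are not independent.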
Bayes' rule is the root of so-called Bayesian probability. It's really just a rearrangement of the multiplication rule we developed above. If we know that P(A ∩ B) = P(B ∩ A), then we have
P(A ∩ B) = P(A|B)P(B) = P(B|A)P(A).
The result is a valuable link between P(A|B) and P(B|A) called Bayes' rule:
P(A|B) = P(B|A)P(A) / P(B)
Bayes' rule links conditional probabilities that go in "opposite directions". We will make a lot of use of it as we go on. To see how Bayes' rule can clarify problems in probability, take a look at the example below and its solution.
Suppose that a drug test is 99% sensitive (i.e. it will correctly identify a drug user 99% of the time) and 99% specific (i.e. it will correctly identify a nonuser as testing negative 99% of the time). This seems like a pretty reliable test. Assume a test group in which only 0.5% of members are drug users. Find the probability that, given a positive test, the subject is actually a drug user.
In this problem, we have a drug test that correctly identifies a user 99% of the time. That's a conditional probability: If a person uses (D), s/he will be caught 99% of the time, so we have P(+|D) = 0.99.
Likewise, we are given that if a person does not use (!D), then the probability of getting a negative test is also 99%. That translates to the conditional probability expression P(-|!D) = 0.99. Here again are the conditional probabilities we know:
P(+|D) = 0.99 and P(-|!D) = 0.99
Finally, we assume that of all people, 0.5% are drug users, so that's P(D) = 0.005. Notice that that fact also gives us P(!D) = 0.995. That is,
P(!D) = 1 - P(D) = 1 - 0.005 = 0.995
Now what we are asking in this problem is: If a person gets a positive test, what is the probability that s/he is actually a user? If you think about it, that's really the most important question about such a test. We don't want to go around making a lot of mistaken accusations. The conditional probability we're looking for is P(D|+), and it's defined like this:
P(D|+) = P(D ∩ +) / P(+) = P(+|D)P(D) / P(+)
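We already know the numerator because we organized ourselves at the beginning. The tricky part is the denominator: What is the probability of getting a positive test, P(+), at all? Well, it is the probability that a person is a user and tests positive plus the probability that a person is a nonuser and tests positive, like this:
P(+) = P(+ ∩ D) + P(+ ∩ !D)
We don't know what's on the right side of that equal sign until we expand those "and" expressions with the multiplication rule (the same relationship that underlies Bayes' rule), using P(+|!D) = 1 - P(-|!D) = 0.01:
P(+) = P(+|D)P(D) + P(+|!D)P(!D) = (0.99)(0.005) + (0.01)(0.995) = 0.00495 + 0.00995 = 0.0149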
So now we have P(+), and it's possible to step back to the P(D|+) expression to calculate
P(D|+) = P(+|D)P(D) / P(+) = (0.99)(0.005) / [(0.99)(0.005) + (0.01)(0.995)] = 0.00495 / 0.0149 ≈ 0.332
That's a pretty remarkable result. Even though this test seems very accurate (99% accurate at identifying both users and non-users), the probability that someone who receives a positive test result is actually a drug user is only about 1/3! There are two reasons for this:
(1) We don't get to know ahead of time who is a user and who is a non-user, so the 99% accuracies don't really help us there, and
(2) Only 0.5% of all people are actually users, so even a small rate of false positives can make a big difference in our overall accuracy. The 1% of nonusers who test positive make up about 0.995% of the whole group, roughly twice the 0.495% who are users and test positive.
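The whole calculation fits in a few lines of code, which also makes it easy to see how strongly the answer depends on how rare drug use is. This is a minimal Python sketch; the function name `p_user_given_positive` is introduced here, and the 50% prevalence in the second call is a hypothetical value chosen only for contrast.

```python
from fractions import Fraction

def p_user_given_positive(sensitivity, specificity, prevalence):
    """Bayes' rule: P(D|+) = P(+|D) P(D) / P(+)."""
    false_positive_rate = 1 - specificity                      # P(+|!D)
    p_positive = (sensitivity * prevalence
                  + false_positive_rate * (1 - prevalence))    # P(+)
    return sensitivity * prevalence / p_positive

# Values from the problem: 99% sensitive, 99% specific, 0.5% users.
rare = p_user_given_positive(Fraction(99, 100), Fraction(99, 100), Fraction(5, 1000))

# Hypothetical contrast: same test, but half of the group are users.
common = p_user_given_positive(Fraction(99, 100), Fraction(99, 100), Fraction(1, 2))

print(rare, float(rare))
print(common, float(common))
```

With the problem's numbers the result is 99/298, about one in three; with the hypothetical 50% prevalence the same test gives 99%, which shows that the low prevalence, not the test itself, drives the surprising answer.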
Finally, it's often useful to write out a scenario like this in a tree diagram that shows all of the probabilities. You might even find that solving these problems is easier if you just write out the whole tree. Here's the tree for this scenario:
Sometimes it is helpful to arrange a scenario like the one in that last problem in a tree diagram like this. If every member of some population can fall into category A or !A (e.g. has a disease or doesn't), and we let + and !+ be positive and negative tests for the drug, then we can, in principle, calculate the probabilities of each branch of the tree. At each step, the probabilities should sum to one.
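The four end branches of such a tree can be tabulated directly. The sketch below does this in Python for the drug-test numbers (0.5% users, 99% sensitivity and specificity); the dictionary `branches` and the group size of 100,000 people are assumptions made here to turn probabilities into head counts.

```python
from fractions import Fraction

p_user = Fraction(5, 1000)       # P(D)
sensitivity = Fraction(99, 100)  # P(+|D)
specificity = Fraction(99, 100)  # P(-|!D)

# Each end branch is the product of the probabilities along its path.
branches = {
    ("D", "+"): p_user * sensitivity,
    ("D", "-"): p_user * (1 - sensitivity),
    ("!D", "+"): (1 - p_user) * (1 - specificity),
    ("!D", "-"): (1 - p_user) * specificity,
}

population = 100_000  # hypothetical group size, for illustration only
for (status, result), p in branches.items():
    print(f"{status:>2} {result}  P = {float(p):.5f}  about {int(p * population)} people")

# The end branches cover every possibility, so they sum to one.
total = sum(branches.values())

# Fraction of positive tests that come from actual users.
p_user_given_pos = branches[("D", "+")] / (branches[("D", "+")] + branches[("!D", "+")])
print(total, p_user_given_pos)
```

Read as head counts, about 495 users and 995 nonusers test positive out of 100,000 people, so only 495 of the 1,490 positive tests belong to users.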
xaktly.com by Dr. Jeff Cruzan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. © 2012, Jeff Cruzan. All text and images on this website not specifically attributed to another source were created by me and I reserve all rights as to their use. Any opinions expressed on this website are entirely mine, and do not necessarily reflect the views of any of my employers. Please feel free to send any questions or comments to jeff.cruzan@verizon.net.