Most events that depend on random chance ultimately tend to conform to a normal or Gaussian distribution. For example, if we measured the heights of all 25-year-old males in America, we'd come up with an average, and the other heights would be distributed normally around it, like this:

Think about it: if we measure just two heights and plot them, that's just two points, and our "distribution" of heights won't look anything like that purple curve. But as we measure more and more 25-year-olds, right up to the roughly 2-3 million 25-year-old males in the U.S., the distribution will get smoother and smoother, ultimately looking just like a normal distribution. It will look smooth because with that many data points, we'll see only very small differences between one measured height and the next, say 6'-0" and 6'-1/64". The distribution looks practically **continuous**.

In the example below, we can see the same type of behavior in a **discrete probability** problem: the sum of two dice. If we toss two dice, we obtain one of 11 different sums between 2 and 12. There's only *one* way to get a 2 (1 and 1), but *six* ways to roll a 7 (1-6, 6-1, 2-5, 5-2, 3-4, 4-3). If order matters – say one die is red and one is green – there are 36 different arrangements of the faces of two dice. That means the probability of rolling a 2 is 1/36, while the probability of rolling a 7 is 6/36, or 1/6.

In the graph below, the yellow circles show those exact probabilities. The gray curves show the results of trials of 100, 200 and 1,000 two-dice throws, respectively, and the red curve shows a trial of 10,000 throws. Notice the considerable fluctuations from the expected values in the smaller trials, but that when we do a very large number of throws (10,000), we get very close to the expected values.
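The exact probabilities and the way trial results converge toward them can be checked with a short simulation. This is a sketch of the idea, not the code behind the graph; the throw counts and the fixed seed are my choices:

```python
import random
from fractions import Fraction

# Exact probabilities: count the ordered (red, green) pairs that give each sum.
exact = {s: Fraction(sum(1 for r in range(1, 7) for g in range(1, 7) if r + g == s), 36)
         for s in range(2, 13)}

def simulate(n_throws, seed=0):
    """Empirical frequency of each two-dice sum over n_throws throws."""
    rng = random.Random(seed)
    counts = {s: 0 for s in range(2, 13)}
    for _ in range(n_throws):
        counts[rng.randint(1, 6) + rng.randint(1, 6)] += 1
    return {s: c / n_throws for s, c in counts.items()}

for n in (100, 1000, 10000):
    freqs = simulate(n)
    worst = max(abs(freqs[s] - float(exact[s])) for s in range(2, 13))
    print(f"{n:6d} throws: largest gap from exact probability = {worst:.4f}")
```

The largest gap between the simulated frequencies and the exact probabilities shrinks as the number of throws grows, which is exactly the behavior the gray and red curves show.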

- If we take a large enough random sample from a bigger distribution, the mean of the sample will be very nearly the same as the mean of the distribution.

- The same is true of the standard deviation (**σ**). The standard deviation of a large enough sample will be very nearly equal to the **σ** of the larger distribution.

- A distribution of sample means, provided we take enough of them, will be distributed normally.

Below we'll go through each of these consequences in turn using the following set of data:

Take a binary situation: each member of a population can choose one option ("yes") or the other ("no"). Let's say that 50% choose "yes" and 50% choose "no." I generated 10,000 members of the population using a spreadsheet, where for each member of the population, 1 means "yes" and 0 means "no." The following examples come from considerations of that data set.
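The spreadsheet step can be mimicked in a few lines of code. This is a sketch with an arbitrary seed, not the actual data set used in the examples:

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# 10,000 members, each answering "yes" (1) or "no" (0) with 50/50 odds.
population = [rng.randint(0, 1) for _ in range(10_000)]

# The population mean should land very close to 0.5.
print(f"population mean = {sum(population) / len(population):.4f}")
```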

**1. If we take a large enough random sample from a bigger distribution, the mean of the sample will be the same as the mean of the distribution.**

In this little experiment, we'll draw some small and large samples from our list of 10,000 yes/no decisions. Remember, the average of the data is 50% "yes." The table on the right shows the average number of "yes" answers (1's) in variously-sized random samples drawn from it, from 100 points up to 5,000.

It's not surprising that choosing 5,000 of the 10,000 data points yields an average "yes" response near 50%. The averages are only a little worse for samples of 2,500 or 1,000 points. After all, that's still quite a bit of data. Even the means of 100-point samples are pretty close to the mean of the "parent" data set, particularly if we draw ten 100-point samples and average the results.

While this doesn't prove our assertion, it is a pretty good example of it at work.
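One way to watch the claim at work is a quick simulation. This is a sketch, not the spreadsheet experiment described above; the seed and the regenerated 0/1 population are my stand-ins:

```python
import random

rng = random.Random(1)
population = [rng.randint(0, 1) for _ in range(10_000)]  # stand-in for the spreadsheet data
pop_mean = sum(population) / len(population)

# Draw samples of increasing size (without replacement) and compare means.
for size in (100, 1000, 2500, 5000):
    sample = rng.sample(population, size)
    mean = sum(sample) / size
    print(f"sample of {size:4d}: mean = {mean:.3f}   (population mean = {pop_mean:.3f})")
```

As the sample size grows, the sample mean closes in on the population mean, just as the table illustrates.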

**2. If we take a large enough random sample from a bigger distribution, the standard deviation of the sample will be the same as the standard deviation of the distribution.**

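The same kind of simulation works for the standard deviation. Again, this is a hedged sketch with a stand-in population, not the site's spreadsheet data:

```python
import random
from statistics import pstdev, stdev

rng = random.Random(2)
population = [rng.randint(0, 1) for _ in range(10_000)]  # stand-in yes/no data
pop_sigma = pstdev(population)  # σ of the full 0/1 population, close to 0.5

# Each sample's standard deviation should approach the population σ.
for size in (100, 1000, 2500, 5000):
    sample = rng.sample(population, size)
    print(f"sample of {size:4d}: s = {stdev(sample):.4f}   (population σ = {pop_sigma:.4f})")
```

For a 50/50 yes/no population the σ sits near 0.5, and even modest samples reproduce it closely.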

**3. A set of sample means, if we take enough of them, will be distributed normally.**

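To see the normality of sample means numerically, we can draw many samples, record each sample's mean, and check how closely those means follow the 68/95 rule of a normal distribution. This is a sketch with stand-in data; the number of samples and sample size are my choices:

```python
import random
from statistics import mean, pstdev

rng = random.Random(3)
population = [rng.randint(0, 1) for _ in range(10_000)]  # stand-in yes/no data

# Draw 2,000 samples of 100 points each and record every sample's mean.
means = [mean(rng.sample(population, 100)) for _ in range(2000)]

# For a normal distribution, ~68% of values fall within 1σ of the
# center and ~95% within 2σ.
mu, sigma = mean(means), pstdev(means)
within_1 = sum(abs(m - mu) <= sigma for m in means) / len(means)
within_2 = sum(abs(m - mu) <= 2 * sigma for m in means) / len(means)
print(f"fraction within 1σ: {within_1:.2f}  (normal: ~0.68)")
print(f"fraction within 2σ: {within_2:.2f}  (normal: ~0.95)")
```

Even though each underlying data point is just a 0 or a 1 – about as non-normal as a distribution gets – the sample means cluster into a bell shape, which is the heart of the central limit theorem.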

**xaktly.com** by Dr. Jeff Cruzan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. © 2012, Jeff Cruzan. All text and images on this website not specifically attributed to another source were created by me and I reserve all rights as to their use. Any opinions expressed on this website are entirely mine, and do not necessarily reflect the views of any of my employers. Please feel free to send any questions or comments to jeff.cruzan@verizon.net.