The new average is
Now think about that. Just because a very rich person walked into the room does not mean that your income increased by more than 700%!
So sometimes we need a better way to describe the situation of most people (or the state of most data) than the average.
One of the simplest ways to achieve this goal is just to think about the data point "in the middle." That's what median means—in the middle.
The middle in this case means the middle of a sorted list of all data points. In the case of our example, that would be
Now the median income is $100,000, and that makes a lot more sense. The introduction of a much higher income really shouldn't have a disproportionate effect on how all of the others are interpreted.
The median value of a dataset is the value in the middle of a sorted list. If there is an odd number of data points, it's exactly the value in the middle. If there is an even number, the median is the average of the middle two.
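The odd/even rule above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `median` and `mean` and the income figures are made up for the sketch and are not the numbers from the example above.

```python
def mean(values):
    """Arithmetic average of a non-empty list of numbers."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data (average of the middle two if even)."""
    if len(values) == 0:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)          # the median is defined on the sorted list
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: exactly the middle value
        return ordered[mid]
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


# Hypothetical incomes, invented for illustration only
incomes = [40_000, 55_000, 60_000, 75_000, 90_000]
with_outlier = incomes + [5_000_000]  # one very rich person walks in

print("mean:  ", mean(incomes), "->", mean(with_outlier))
print("median:", median(incomes), "->", median(with_outlier))
```

With these made-up numbers the mean jumps by more than a factor of ten when the large income is added, while the median barely moves.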
In general, if the sample size gets large enough and the data are spread symmetrically about their center, the mean and median will be roughly the same, and they'll get closer as the number of data points gets larger.
So in a sense, the adequacy of a sample's size can be judged by how close the median and mean are. If they're roughly the same, you've probably got enough data from which to draw some valid conclusions. The only problem is that the mean and median can be equal by accident, so later we'll develop some other ways to judge the quality of a dataset.
If a dataset is large enough and meets the normal expectations, the mean and median will be roughly equal.
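Here's a rough way to see this for yourself. The sketch below assumes the data are drawn from a bell-shaped (Gaussian) distribution, which is my choice for the illustration and not something required by the statement above. The helper name `mean_median_gap`, the center of 50, the spread of 10 and the sample sizes are all arbitrary.

```python
import random
import statistics


def mean_median_gap(n, seed=0):
    """Draw n values from a symmetric distribution; return |mean - median|."""
    rng = random.Random(seed)                     # fixed seed: repeatable runs
    sample = [rng.gauss(50.0, 10.0) for _ in range(n)]
    return abs(statistics.mean(sample) - statistics.median(sample))


# Gap between mean and median for a few (arbitrary) sample sizes
for n in (10, 100, 10_000):
    print(n, round(mean_median_gap(n), 3))
```

The gap for any single small sample can be large or small by chance, so don't expect it to shrink perfectly steadily, but for the largest sample it should be tiny compared to the spread of the data.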
To find the mode, it's best to put the data in numerical order so that like data points can be grouped and counted. Here's an example:
Notice that the mode is not necessarily unique. If there were one more value of 8 in our dataset, for example, we'd have two modes, at 5 and 8. We refer to such a dataset as bimodal (in this case) or multimodal (in the general case, where there could be many modes).
The mode is another number that converges to the mean and median if a distribution of data is large enough and predictable enough.
I've referred to "distributions" of data and random fluctuations that are "predictable" and "large enough" a few times in this section. I'll try to tell you just a little about that below.
The mode of a dataset is the most frequently occurring member of the set. There can be more than one mode. If there is one mode, the data is described as "mono-modal." If there are two, it's "bimodal," and so on.
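Counting like values is easy to hand off to a computer. Here is a minimal sketch using Python's `collections.Counter`. The function name `modes` and the data are hypothetical, chosen only so that 5 is the single mode until one more 8 is added.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often, in ascending order."""
    if len(values) == 0:
        raise ValueError("mode of an empty dataset is undefined")
    counts = Counter(values)              # value -> number of occurrences
    highest = max(counts.values())        # the largest count found
    return sorted(v for v, c in counts.items() if c == highest)


# Hypothetical data, invented for illustration only
data = [2, 5, 5, 5, 8, 8, 9]
print(modes(data))          # one mode
print(modes(data + [8]))    # one more 8 makes the set bimodal
```

Returning a list rather than a single number keeps the bimodal and multimodal cases from being hidden.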
xaktly.com by Dr. Jeff Cruzan is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License. © 2012, Jeff Cruzan. All text and images on this website not specifically attributed to another source were created by me and I reserve all rights as to their use. Any opinions expressed on this website are entirely mine, and do not necessarily reflect the views of any of my employers. Please feel free to send any questions or comments to firstname.lastname@example.org.