This post is about visualising the calculation of the mean (aka the average).
An average is a useful way of summarising a set of values into a single value. Averages allow us to quickly get a summary of observed data, and also to give a simple prediction of what might happen in the future. Knowing the average mark in an exam allows you to see if your own performance is particularly good or bad compared to the rest of your class; knowing the average winnings for each ticket in a lottery can help you to decide whether it’s worth buying a ticket (hint: by this metric, it’s never worth it!).
There are actually several different types of average, but in this post we’ll focus on the one everyone usually means by “the average”: the mean. It’s surprisingly hard to find a simple definition of what the mean is (try to find one on the Wikipedia page!), so here is my definition: if you sum up all the data and then spread it out equally again across the observations, each observation’s share is the mean.
Let’s take the lottery example again. Suppose 10 million tickets are bought for a given draw, at 1 pound each. Most of them win nothing, some win small amounts and a few win big: the total winnings come to 8 million pounds. If you spread those winnings equally among all the tickets, you get 0.8 pounds (i.e. 80 pence) per ticket. So the mean winnings for a ticket is 80 pence.
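Written out as a worked equation, using the illustrative figures above, the lottery calculation is:

```latex
\text{mean winnings per ticket}
  = \frac{8{,}000{,}000 \text{ pounds}}{10{,}000{,}000 \text{ tickets}}
  = 0.8 \text{ pounds per ticket}
```

Set against the 1 pound ticket price, that is an average loss of 1 - 0.8 = 0.2 pounds (20 pence) per ticket, which is why the mean suggests a ticket isn’t worth buying.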
My definition of the mean already explains the calculation: if you have, say, 4 values, then you add up the 4 values and divide the total by 4. More generally, if you have n values, then you add up all n values and divide the total by n. You can write this as a mathematical formula (i.e. $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$), but to me it’s an algorithm, which means we can express it as a program. Here’s some simple Java code for calculating the mean of a non-empty array of numbers:
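```java
// Calculates the mean of a non-empty array of numbers.
// (For an empty array, total / numbers.length is 0.0 / 0, which gives NaN.)
double calculateMean(double[] numbers)
{
    double total = 0;
    // Add up all the values...
    for (double x : numbers)
    {
        total += x;
    }
    // ...then share the total out equally across the observations.
    return total / numbers.length;
}
```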
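On Java 8 or later, the standard library can do the same sum-then-divide for you. The sketch below (the class name `MeanStreams` is just for illustration) uses `Arrays.stream` and `DoubleStream.average()`, which returns an `OptionalDouble` that is empty for an empty array, so the non-empty requirement becomes explicit instead of silently producing NaN.

```java
import java.util.Arrays;
import java.util.OptionalDouble;

public class MeanStreams {
    // Sum-then-divide via the standard library. DoubleStream.average()
    // returns an empty OptionalDouble when there are no values.
    static OptionalDouble mean(double[] numbers) {
        return Arrays.stream(numbers).average();
    }

    public static void main(String[] args) {
        OptionalDouble result = mean(new double[] {4, 1, 6, 7});
        if (result.isPresent()) {
            System.out.println(result.getAsDouble()); // prints 4.5
        } else {
            System.out.println("no data");
        }

        // An empty array gives an empty result rather than NaN.
        System.out.println(mean(new double[0]).isPresent()); // prints false
    }
}
```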
Let’s make that more exciting and tangible by visualising the calculation in Greenfoot. My data values will be whole numbers (because they’re easier to visualise), shown as stacks of discs in columns. For example, here are the four values 4, 1, 6 and 7:
I’ve actually made two different visualisations for calculating the average, which we’ll explore in turn.
For my first visualisation, I will have a little spaceship that does the average calculation. First, it will fly across the columns, gathering up the data values into one giant stack of discs (i.e. adding up all the values into one total value). Then, it will go across the columns, dropping the discs equally into each column (i.e. dividing the total value by the number of data values, which is the number of columns). What we have left is the average. Play with the scenario, or watch this animation of the scenario in action:
The way to read the final result is that the average is 4.5: each column has 4 discs, and if we share the 2 leftover discs at the end across the four columns, that’s 2/4, which is 0.5, giving 4.5 per column. (The mean has the occasionally awkward property that even if all the original data values are whole numbers, the mean often isn’t, which is irritating when you want a nice visualisation!)
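The spaceship’s two passes can be sketched in Java with whole-number disc counts. This is a minimal sketch under my own assumptions, not the Greenfoot scenario’s actual code: `SpaceshipMean` and `spaceshipMean` are hypothetical names. The first pass gathers every disc into one carried stack; the second drops the same whole number of discs (total / number of columns) into each column, and whatever is still carried is the leftover that gets shared out as a fraction.

```java
import java.util.Arrays;

public class SpaceshipMean {
    // Hypothetical sketch of the spaceship visualisation using whole discs.
    // Modifies the array; columns must be non-empty.
    static double spaceshipMean(int[] columns) {
        // Pass 1: fly across the columns, gathering every disc into one stack.
        int carried = 0;
        for (int i = 0; i < columns.length; i++) {
            carried += columns[i];
            columns[i] = 0;
        }

        // Pass 2: drop the same whole number of discs into each column.
        int perColumn = carried / columns.length;
        for (int i = 0; i < columns.length; i++) {
            columns[i] = perColumn;
            carried -= perColumn;
        }

        // Discs still carried are the leftovers, shared out as a fraction.
        return perColumn + (double) carried / columns.length;
    }

    public static void main(String[] args) {
        int[] columns = {4, 1, 6, 7};
        double mean = spaceshipMean(columns);
        System.out.println(Arrays.toString(columns) + " with mean " + mean);
        // Prints: [4, 4, 4, 4] with mean 4.5
    }
}
```

With 18 discs and four columns, each column gets 4, the ship is left carrying 2, and 4 + 2/4 gives the 4.5 described above.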
I think the previous animation is probably the most obvious way to visualise the calculation of the mean, but here’s an alternative approach. Think of having a cake, covered with icing (US: frosting) on top. You want to calculate the average depth of the icing. The easiest way to do this, rather than trying to lift off all the icing and divide the volume by the area, is just to smooth the icing until it’s perfectly even, and then measure the depth.
This exploits an interesting property of the mean: taking a disc from one of our stacks and moving it to another does not affect the value of the mean (the total and the number of stacks both stay the same), so if you move the discs around until everything is even, each stack will be the mean of the original data. Thus, we can visualise the calculation of the mean by getting the discs on top of the tallest stacks to roll off onto the shortest stacks, until all the stacks are as even as they can be. With this visualisation, I have room for more stacks, so I’ll be starting with this wider data-set:
You can have a play with the visualisation, or watch the animated calculation of the mean for the above data-set here:
Because there are ten stacks of discs this time, reading off the average is quite easy: the whole-number part is the highest level that all the stacks reach (in this case, 6), and the first decimal place is the number of stacks with one extra disc above that level (in this case, 8), since each extra disc shared across ten stacks adds one tenth. So the average here is 6.8.
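The levelling idea and the reading-off rule can also be sketched in Java. Again, this is a hypothetical sketch rather than the scenario’s own code (`LevellingMean`, `level` and `readMean` are illustrative names). `level` repeatedly moves one disc from a tallest stack to a shortest stack until no two stacks differ by more than one disc; each move keeps the total unchanged, and the loop must stop because every move makes the stacks strictly more even. `readMean` then reads the result as the common level plus the fraction of stacks carrying one extra disc, which for ten stacks is exactly the “first decimal place” rule above.

```java
import java.util.Arrays;

public class LevellingMean {
    // Hypothetical sketch of the levelling visualisation using whole discs.
    // Rolls one disc at a time from a tallest stack onto a shortest stack
    // until every stack is within one disc of every other.
    // Modifies the array; stacks must be non-empty.
    static void level(int[] stacks) {
        while (true) {
            int tallest = 0, shortest = 0;
            for (int i = 1; i < stacks.length; i++) {
                if (stacks[i] > stacks[tallest]) tallest = i;
                if (stacks[i] < stacks[shortest]) shortest = i;
            }
            // As even as they can be: no stack is two or more discs above another.
            if (stacks[tallest] - stacks[shortest] <= 1) {
                return;
            }
            // Moving one disc keeps the total, and hence the mean, unchanged.
            stacks[tallest]--;
            stacks[shortest]++;
        }
    }

    // Reads the mean off levelled stacks: the level every stack reaches,
    // plus the fraction of stacks that carry one extra disc.
    static double readMean(int[] levelled) {
        int base = Integer.MAX_VALUE;
        for (int h : levelled) {
            base = Math.min(base, h);
        }
        int extra = 0;
        for (int h : levelled) {
            if (h > base) extra++;
        }
        return base + (double) extra / levelled.length;
    }

    public static void main(String[] args) {
        int[] stacks = {4, 1, 6, 7};
        level(stacks);
        System.out.println(Arrays.toString(stacks) + " -> " + readMean(stacks));
        // Prints: [4, 4, 5, 5] -> 4.5
    }
}
```

The example input reuses the four values from the first visualisation, since the ten-stack data-set is only shown in the animation. With ten stacks, each extra disc contributes 1/10, so eight extra discs above a level of 6 read as 6.8.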