Averages Mostly Suck at Almost Everything

...unless you’re dealing with baseball. When dealing with systems, many of us think “Average” is a measure of “Typical” or “Normal”. Many systems people will also use averages to look for “Abnormal”. However, the average (or mean) doesn’t represent either “normal” or “abnormal” very well.

Looking for “Normal”

Engineering and the business unit might like to compare a newly installed upgrade to the previous version. Or perhaps they’d like to look at long-term trends. Should they use averages? They often do.

As a simple hypothetical example, let’s say a database query takes ten seconds for each of four executions and a minute for the fifth (not unheard of). The average would be 20 seconds, but “typical” would be better represented by the median, or 50th percentile: 10 seconds.
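The arithmetic in that example is easy to check. Here is a minimal sketch in Python using only the standard library; the variable names are made up for illustration.

```python
import statistics

# Latencies in seconds: four executions at 10 s and one at 60 s,
# matching the hypothetical query in the text.
latencies_s = [10, 10, 10, 10, 60]

mean_s = statistics.mean(latencies_s)      # (10 + 10 + 10 + 10 + 60) / 5
median_s = statistics.median(latencies_s)  # middle value of the sorted list

print(f"mean:   {mean_s} s")
print(f"median: {median_s} s")
```

One slow execution doubles the mean, while the median stays at the value four of the five executions actually took.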

Long term trends are also better represented by medians, as they are not influenced by outliers as much as averages are. When looking at trends, you are not interested in outliers as much.

Lastly, when discussing “typical” with clients, the median is generally a more attractive number to use because it is usually lower. Again, this is because averages are influenced more by outliers, and search performance is “unbounded”: a slow query can take arbitrarily long, but no query can take less than zero time. With medians, you can say half the queries are less than the median. (If the customer is a “glass half empty” person, they’ll of course respond that half of them are more than the median.)

Looking for “Abnormal”

Averages also tend to “smooth out” the abnormal, hiding the very thing that you may be looking for. Yet averaging is the most common form of data aggregation I see people use.
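To see the smoothing effect, consider a made-up sample in which a small fraction of queries is badly stuck. The sketch below is in Python; the data and the 10-second "slow" cutoff are assumptions chosen purely for illustration.

```python
import statistics

# Made-up sample: 980 queries at 1 second and 20 queries stuck at 30 seconds.
latencies_s = [1.0] * 980 + [30.0] * 20

SLOW_THRESHOLD_S = 10.0  # hypothetical cutoff for "abnormal"

mean_s = statistics.mean(latencies_s)
worst_s = max(latencies_s)
slow_count = sum(1 for x in latencies_s if x > SLOW_THRESHOLD_S)

print(f"mean: {mean_s:.2f} s")
print(f"max:  {worst_s:.2f} s")
print(f"slow: {slow_count} of {len(latencies_s)} queries over {SLOW_THRESHOLD_S} s")
```

The mean works out to 1.58 seconds, which looks unremarkable on a dashboard, even though 20 queries took 30 seconds each.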

Why Do We Use It?

It’s cheap and easy and we often need to aggregate lots of data into bins. Sum the latencies, divide by the count. Nothing could be simpler. Unfortunately, whether you’re looking for “normal” or “abnormal”, it’s usually not the best way to characterize the data.

Percentiles used to be harder to calculate; systems were slower and languages and libraries more primitive. There’s really no excuse today. If you’re looking for “normal”, use medians. If you’re looking for “abnormal”, use other percentiles, like the 90th or 95th, to find those elusive datapoints and fix those outliers! Previous articles on “Event Pair Difference Graphs” or “Event Pair Latency Dotplots” may also help in finding and diagnosing those issues.
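For a sense of how little code a percentile takes today, here is a sketch in Python. It uses the nearest-rank definition, which is only one of several percentile definitions in common use (many libraries interpolate between values instead), and the helper names `percentile` and `summarize` are hypothetical.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("percentile() needs at least one value")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]

def summarize(values):
    """Median for 'normal', 90th and 95th percentiles for 'abnormal'."""
    return {
        "median": percentile(values, 50),
        "p90": percentile(values, 90),
        "p95": percentile(values, 95),
    }

# The five-execution query example: four runs at 10 s, one at 60 s.
latencies_s = [10, 10, 10, 10, 60]
summary = summarize(latencies_s)
print(summary)
```

On the five-execution example, the median reports 10 seconds while the 90th and 95th percentiles both land on the 60-second outlier, so the two kinds of number answer the two different questions.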

In my opinion, average/mean is at best a mediocre measure of anything. Except in baseball. Maybe.

Characterizing system performance by any single number is usually not a great idea anyway, and I’ll have more on this subject in a series of articles being prepared: (Ab)use of the “R” Language.