Category Archives: Category: Detecting Performance Issues

Averages Mostly Suck at Almost Everything..

..unless you’re dealing with baseball. When dealing with systems, many of us think “Average” is a measure of “Typical” or “Normal“. Many systems people will also use averages to look for “Abnormal“. However, average (or mean) doesn’t represent either “normal” or “abnormal” very well..

Deep Dive, EPL Dotplots

While working at RIM, I had the privilege of working with some brilliant engineers. During that time I developed a few of the techniques that I’ll be describing; the EPD (Event-Pair-Difference) graph described in my previous post and the EPL (Event-Pair-Latency) Dotplot are a few of them.

Shades of Grey

System failures are often not black and white, but shades of grey (gray?).. Detecting and alerting on “performance-challenged” system components are a lot more difficult than detecting black or white (catastrophic failures). The metrics used are usually of the “time vs. latency” or “time vs. event count” variety, often aggregated and, often by using averages.