The Tail at Scale Revisited

My last article discussed some of the missing math related to setting back-end objectives. This article presents a chart which is useful in understanding the relationship to the user experience and we examine ways to dramatically improve the overall performance. Continue reading “The Tail at Scale Revisited”

Averages Mostly Suck at Almost Everything..

..unless you’re dealing with baseball. When dealing with systems, many of us think “Average” is a measure of “Typical” or “Normal“. Many systems people will also use averages to look for “Abnormal“. However, average (or mean) doesn’t represent either “normal” or “abnormal” very well.. Continue reading “Averages Mostly Suck at Almost Everything..”

WSMeter: Performance Evaluation for Warehouse-Scale Computers

Many of us have dealt with making changes in production environments, possibly against hundreds or thousands of systems and, we’d like to know how the change impacted performance. It was with this in mind that I eagerly read through the paper describing WSMeter.  Continue reading “WSMeter: Performance Evaluation for Warehouse-Scale Computers”

Shades of Grey

System failures are often not black and white, but shades of grey (gray?)..

Detecting and alerting on “performance-challenged” system components are a lot more difficult than detecting black or white (catastrophic failures). The metrics used are usually of the “time vs. latency” or “time vs. event count” variety, often aggregated and, often by using averages. All of these tend to obscure what we are looking for and have a very low “signal to noise ratio“.

Continue reading “Shades of Grey”