Looking for help naming (and finding other uses for) a novel technique in detecting grey failures. Possible use cases are discussed here: load balancing, finding saturation points, alerting.. [ed. Decided on the name “Saturation Factor“.]
Do you have insatiable curiosity and are driven by a relentless pursuit of the truth? You might make a great problem solver, but be careful how you deal with your findings!
If you’ve been around systems long enough, you know that opportunity for performance gains goes up dramatically, the further up the stack you look..
For years I’ve done most of my log scraping and analysis with the usual suspects; bash, sed, awk, perl even. The log scraping still uses those tools, but lately I’ve been toying around with “R” for the analysis.
..unless you’re dealing with baseball. When dealing with systems, many of us think “Average” is a measure of “Typical” or “Normal“. Many systems people will also use averages to look for “Abnormal“. However, average (or mean) doesn’t represent either “normal” or “abnormal” very well..
Many of us have dealt with making changes in production environments, possibly against hundreds or thousands of systems and, we’d like to know how the change impacted performance. It was with this in mind that I eagerly read through the paper describing WSMeter.
System failures are often not black and white, but shades of grey (gray?).. Detecting and alerting on “performance-challenged” system components are a lot more difficult than detecting black or white (catastrophic failures). The metrics used are usually of the “time vs. latency” or “time vs. event count” variety, often aggregated and, often by using averages. …