Reading Week #4 | bill duncan's blog

Monitoring the SRE Golden Signals, an excellent overview by Steve Mushero.

Steve Mushero gives an awesome overview of which metrics to monitor, and how, for various services. He describes how the “USE” and “RED” methods overlap and combine, and covers why they matter, what to do with them, and some of the challenges.

Be sure to check out some of the links to other articles, especially those from Baron Schwartz and Brendan Gregg.

Monitoring and Observability with USE and RED
Why Percentiles Don’t Work the Way you Think
Anomaly Detection for Monitoring
The Essential Guide to Queueing Theory
Practical Scalability Analysis with the Universal Scalability Law
The USE Method

Steve, Baron and Brendan all agree that saturation is important for understanding where the bottlenecks are. They suggest using “queue length”, and sometimes “concurrency”, to measure saturation indirectly. However, these metrics are often difficult to get and unreliable.
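For a feel of what the indirect approach looks like when concurrency is not exposed directly, here is a minimal sketch that estimates it from throughput and mean latency using Little's Law (L = λW). This is not the Saturation Factor and not taken from any of the linked articles; the function names and the idea of comparing against a fixed worker pool size are assumptions made purely for illustration.

```python
def estimate_concurrency(throughput_per_s: float, mean_latency_s: float) -> float:
    """Estimate average in-flight requests via Little's Law: L = lambda * W.

    throughput_per_s: completed requests per second (lambda)
    mean_latency_s:   mean time a request spends in the system, in seconds (W)
    """
    if throughput_per_s < 0 or mean_latency_s < 0:
        raise ValueError("throughput and latency must be non-negative")
    return throughput_per_s * mean_latency_s


def pool_fill_ratio(concurrency: float, pool_size: int) -> float:
    """Hypothetical helper: fraction of an assumed fixed-size worker pool in use.

    Values above 1.0 suggest requests are waiting in a queue rather than
    being served, which is one rough, indirect hint of saturation.
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return concurrency / pool_size


# Example: 200 requests/s at a mean latency of 0.25 s, with an assumed pool of 64 workers.
in_flight = estimate_concurrency(200.0, 0.25)
ratio = pool_fill_ratio(in_flight, 64)
print(f"estimated in-flight requests: {in_flight:.1f}, pool fill ratio: {ratio:.2f}")
```

Because this uses mean latency, it inherits the noise and averaging problems mentioned above: a short burst can fill the pool without moving the estimate much.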

The Saturation Factor I’ve written about may be a more direct method in many cases: it is often easier to calculate, less noisy and more accurate.

Shades of Grey
Realtime Component Request Deficit