This article demonstrates a quick and easy approximation for the probability formulae which I described in two previous articles. Continue reading “The Tail at Scale Approximation”
My last article discussed some of the missing math related to setting back-end objectives. This article presents a chart that helps explain how those objectives relate to the user experience, and examines ways to dramatically improve overall performance. Continue reading “The Tail at Scale Revisited”
The landmark “Tail at Scale” article was missing some of the math. We’re diving into it a bit here to show how the math can be used in setting objectives for latency budgets in back-end systems. Continue reading “The Tail at Scale”
The current state of confusion around what a “Site Reliability Engineer” (SRE) role is.. Continue reading “What is SRE?”
BPF is one of the Swiss Army Knife tools for Performance Engineering on Linux. Continue reading “BPF Performance Tools”
Many companies in the logging/monitoring space will try to sell you on AI and ML (Artificial Intelligence and Machine Learning) to find abnormal. Continue reading “Event Logs and A.I.”
I’ve worked with event logs for, well, decades. There are quite a few companies that offer services for managing logs and, afaik, only a few doing it right. Continue reading “Event Logs and K.I.S.S.”
In both aviation and systems we build in redundancies wherever practical to avoid unpleasantness when components or subsystems fail. Continue reading “SPOFs and Partial Panel”
Up in the air, your eyes can’t be everywhere, all the time. You’re trained to scan the skies for “traffic” (other flying machines) as well as scanning instrumentation in the cockpit. Continue reading “Traffic At 2 O’clock!”
We were heading back from the practice area to the airport. I didn’t have my pilot’s license yet, and my instructor said: “Push the throttle to Rental Speed!” Continue reading “Own It !!”
Monitoring the SRE Golden Signals, an excellent overview by Steve Mushero.. Continue reading “Reading Week #4”
Here are some interesting reads if you’re fortunate enough to have some extra time off this Holiday Season.. Continue reading “Reading Week #3”
First of all, Merry Christmas if you celebrate it, Happy Holidays if you don’t! This week’s interesting read is about a subject I love.. Continue reading “Reading Week #2”
I’d like to start a new series highlighting interesting articles to read each week.. Continue reading “Reading Week #1”
We’ve been flying planes much longer than we’ve been running systems in production, so it might be instructive to learn what we can from our fellow aviators.. Continue reading “Checklists and Runbooks”
This is a limited time deal on 15 O’Reilly books for $15. Go. Buy. Right. Now! Continue reading “A Steal on O’Reilly DevOps/SRE Books!”
I was lucky to catch most of the SRE track and the keynote speakers at the All Day DevOps event this year. Fortunately, if you missed it or want to watch some of the other tracks, the videos have been made available.
We want to learn things from any idea, test, change, upgrade or (heaven forbid) outage in production..
This book arrived this morning and I’m just through the chapter on building SRE teams. Continue reading “Seeking SRE”
Some thoughts about handling critical system issues at scale.. Continue reading “Don’t Panic!”
We called our albino squirrel in the backyard, “Snowflake”..
As an SRE, I’m very fortunate to have had training as a pilot. There are many similarities to system operations.. Continue reading “Operations in the Cloud”
Not just “Must Have”, but “Must Read!”. A new book has been released and is available, free to download for a short time. Continue reading “Must Have Books.. Another One!”