BPF is one of the Swiss Army Knife tools for Performance Engineering on Linux. Continue reading “BPF Performance Tools”
Category: Diagnosing Issues
Event Logs and A.I.
Many companies in the logging/monitoring space will try to sell you on AI and ML (Artificial Intelligence and Machine Learning) to find abnormal behavior. Continue reading “Event Logs and A.I.”
Event Logs and K.I.S.S.
I’ve worked with event logs for, well, decades. There are quite a few companies that offer services for managing logs and, afaik, only a few doing it right. Continue reading “Event Logs and K.I.S.S.”
One Thing At A Time..
We want to learn things from any idea, test, change, upgrade or (heaven forbid) outage in production..
Don’t Panic!
Some thoughts about handling critical system issues at scale.. Continue reading “Don’t Panic!”
There’s Always a Problem
Do you have insatiable curiosity and are driven by a relentless pursuit of the truth? You might make a great problem solver, but be careful how you deal with your findings! Continue reading “There’s Always a Problem”
Look Up the Stack!
If you’ve been around systems long enough, you know that the opportunity for performance gains goes up dramatically the further up the stack you look.. Continue reading “Look Up the Stack!”
Averages Mostly Suck at Almost Everything..
..unless you’re dealing with baseball. When dealing with systems, many of us think “Average” is a measure of “Typical” or “Normal”. Many systems people will also use averages to look for “Abnormal”. However, average (or mean) doesn’t represent either “normal” or “abnormal” very well.. Continue reading “Averages Mostly Suck at Almost Everything..”
Deep Dive, EPL Dotplots
While working at RIM, I had the privilege of working with some brilliant engineers. During that time I developed a few of the techniques that I’ll be describing: the EPD (Event-Pair-Difference) graph described in my previous post and the EPL (Event-Pair-Latency) Dotplot are two of them. Continue reading “Deep Dive, EPL Dotplots”