Realtime Component Request Deficit

Looking for help naming (and finding other uses for) a novel technique for detecting grey failures. Possible use cases discussed here include load balancing, finding saturation points, and alerting.. [ed. Decided on the name “Saturation Factor”.]

CI/CD and Optimization

When we talk of CI/CD we’re often referring to Continuous Integration and Delivery while Optimization refers to Services/Systems. What I’d like to discuss is Constant Improvement/Continuous Development and Self-Optimization..

The DevOps Alternative

In a previous article, “There’s Always a Problem”, I described situations that can arise with the “Engineering vs. Operations” old way. The new way is a DevOps culture..

Don’t Aggregate, Consolidate!

In previous posts, I’ve emphasized that averages are particularly bad at characterizing most things that you might be looking for. However, storing aggregated data of any type can limit your ability to analyze data later.

Bitrot, Part 2

This article has a link to a simple script I’ve used for over a decade to detect corrupted files. It will detect and report on files that have changed, been added, deleted or possibly moved within the same directory structure.
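
The script itself is in the linked article and is not reproduced here. As a rough illustration of the general idea only, the following Python sketch compares two checksum manifests and reports the four cases mentioned above. The function names `build_manifest` and `compare_manifests`, the choice of SHA-256, and the rule that "same checksum under a new path means possibly moved" are all assumptions made for this example, not details taken from the original script.

```python
import hashlib
import os


def build_manifest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            digest = hashlib.sha256()
            with open(full, "rb") as fh:
                # Read in 1 MiB chunks so large files are not loaded whole.
                for chunk in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(chunk)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def compare_manifests(old, new):
    """Classify differences between two {path: checksum} manifests."""
    changed = sorted(p for p in old if p in new and old[p] != new[p])
    deleted = {p for p in old if p not in new}
    added = {p for p in new if p not in old}

    # Assumption: a deleted path and an added path with identical checksums
    # are reported as a possible move rather than a delete plus an add.
    by_hash = {}
    for path in sorted(deleted):
        by_hash.setdefault(old[path], []).append(path)
    moved = []
    for path in sorted(added):
        sources = by_hash.get(new[path])
        if sources:
            moved.append((sources.pop(0), path))
    for src, dst in moved:
        deleted.discard(src)
        added.discard(dst)

    return {
        "changed": changed,
        "added": sorted(added),
        "deleted": sorted(deleted),
        "moved": moved,
    }
```

In use, one would save the output of `build_manifest` after each run and compare it with the next run's manifest. A file listed under `changed` whose modification time has not moved is the suspicious case.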

Bitrot, Part 1

Your systems have drives set up in RAID configurations and besides, you have data copied to redundant systems and backups, right? Safe? Maybe not. I recently found corruption in a quarter of a million files that had not previously been detected, for years!

(Ab)use of the R Language

For years I’ve done most of my log scraping and analysis with the usual suspects: bash, sed, awk, even perl. The log scraping still uses those tools, but lately I’ve been toying around with “R” for the analysis.

Averages Mostly Suck at Almost Everything..

..unless you’re dealing with baseball. When dealing with systems, many of us think “Average” is a measure of “Typical” or “Normal”. Many systems people will also use averages to look for “Abnormal”. However, average (or mean) doesn’t represent either “normal” or “abnormal” very well..
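
A small Python sketch of why the mean can mislead. The latency values are synthetic, invented purely for illustration (95 requests at 10 ms and 5 at 2000 ms), and the helper `percentile` is a hypothetical nearest-rank implementation, not something from the post.

```python
import math
import statistics


def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds: mostly fast, with a few very slow requests.
latencies_ms = [10] * 95 + [2000] * 5

mean_ms = statistics.mean(latencies_ms)      # pulled upward by the slow requests
median_ms = statistics.median(latencies_ms)  # what a typical request sees
p99_ms = percentile(latencies_ms, 99)        # what the slow tail looks like

print(f"mean={mean_ms} median={median_ms} p99={p99_ms}")
```

With this data the mean lands far from both the typical request and the slow ones, so it describes neither group. The median and a high percentile each describe one of them.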

Friday the 13th One Liner

Just for fun, how many combinations of months are there where Friday falls on the 13th?  This one-liner will print out a table of month combinations along with the years for a given range.
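
The one-liner lives in the post itself. For readers who want to experiment, here is a longer Python sketch of the same idea, not the author's one-liner. The function names `friday_13th_combinations` and `print_table` are made up for this example.

```python
import calendar
from collections import defaultdict
from datetime import date


def friday_13th_combinations(first_year, last_year):
    """Group years by the tuple of months in which the 13th falls on a Friday."""
    combos = defaultdict(list)
    for year in range(first_year, last_year + 1):
        # date.weekday() numbers Monday as 0, so Friday is 4.
        months = tuple(
            m for m in range(1, 13) if date(year, m, 13).weekday() == 4
        )
        combos[months].append(year)
    return dict(combos)


def print_table(combos):
    """Print one row per month combination, followed by the matching years."""
    for months, years in sorted(combos.items()):
        names = " ".join(calendar.month_abbr[m] for m in months)
        print(f"{names:<12} {' '.join(map(str, years))}")
```

Calling `print_table(friday_13th_combinations(2000, 2030))` prints one row per distinct combination found in that range.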

Deep Dive, EPL Dotplots

While working at RIM, I had the privilege of working with some brilliant engineers. During that time I developed a few of the techniques that I’ll be describing: the EPD (Event-Pair-Difference) graph described in my previous post and the EPL (Event-Pair-Latency) Dotplot are two of them.

Shades of Grey

System failures are often not black and white, but shades of grey (gray?).. Detecting and alerting on “performance-challenged” system components is a lot more difficult than detecting black-or-white (catastrophic) failures. The metrics used are usually of the “time vs. latency” or “time vs. event count” variety, often aggregated, and often by using averages.