SPOFs and Partial Panel | bill duncan's blog

In both aviation and systems we build in redundancies wherever practical to avoid unpleasantness when components or subsystems fail.

In aviation, redundancies can include things like two spark plugs per cylinder, a magneto for each set of plugs, and two sets of fuel lines. Sometimes there are even two engines and two pilots!

In systems work we’ll have things like RAID, multiple availability zones, and redundancies at the application and storage layers.

As systems engineers, we look for any SPOFs (single points of failure) that we can practically eliminate. That often means doubling up on components, but it can also be an “N plus 1” configuration. I’ve seen systems with three power supplies that can tolerate a failure in one of them.

Yet I’ve sometimes heard of companies relying on only one monitoring solution, often a service. What happens when it goes down or starts lying to you?

Pilots are often trained in Cessna airplanes, whose instruments rely on two different underlying technologies: some are driven by a vacuum pump system while others are electric.* Cross-checking one set of instruments against the other can alert you to a failure. For example, if the attitude indicator shows 10 degrees pitch down, yet the altimeter and vertical speed indicator show no descent, the attitude indicator may be lying to you.

I would suggest that in systems work, we need more than one way to monitor systems as well. You don’t want to be flying blind during an incident.
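To make the cross-check idea a little more concrete, here is a minimal sketch in Python. It is only an illustration, not a real monitoring integration: the `Reading` type, the `cross_check` function, and the monitor names are all made up for this example, and each monitor is assumed to report either healthy, unhealthy, or nothing at all.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Reading:
    """One monitor's opinion of a service (hypothetical type for this sketch)."""
    source: str               # name of the monitor, e.g. "hosted-service"
    healthy: Optional[bool]   # None means the monitor itself did not answer


def cross_check(readings: List[Reading]) -> str:
    """Compare independent monitors, the way a pilot scans the panel."""
    # Keep only the monitors that actually reported something.
    answered = [r for r in readings if r.healthy is not None]
    if not answered:
        return "blind"          # nothing is reporting: flying blind
    if len({r.healthy for r in answered}) > 1:
        return "disagree"       # somebody may be lying to you
    if len(answered) < len(readings):
        return "partial-panel"  # one source left, nothing to check it against
    return "healthy" if answered[0].healthy else "unhealthy"


# Hypothetical monitors: a hosted service and an in-house probe.
print(cross_check([
    Reading("hosted-service", True),
    Reading("in-house-probe", False),
]))  # prints "disagree"
```

The point of the "disagree" and "partial-panel" results is that they are worth an alert in their own right, even before you know which monitor is the one telling the truth.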

* Actually, there’s also the magnetic compass, which relies on neither and is used to cross-check and correct the heading indicator, and to replace it if it fails.