We’ve been flying planes much longer than we’ve been running systems in production, so it might be instructive to learn what we can from our fellow aviators.
Obviously safety is a concern with flying and one of the primary things you learn while becoming a pilot. Your life depends on it. Customer experience is what we are concerned with in system operations. Your life also depends on it. (Customer experience trumps almost everything for companies.)
A common theme for checklists (and runbooks) is that they document procedures that cannot be automated (yet). They all deal with our very human tendency to forget steps sometimes.
Beyond that, the reasons for creating checklists are differentiated by “urgency” and “familiarity”.
- Pre-flight and post-flight checklists. You are on the ground (not in production), so urgency is low. These procedures are done often, so the checklists (runbooks) serve as reminders of familiar procedures. These are checks for “airworthiness” and prevent launching with known issues.
- Takeoffs, Landings (hopefully an equal number) and “Normal Flight (Operating) Procedures”. These are done often, so the procedures are familiar and reflexes should be sound. Checklists serve as reminders so nothing is forgotten.
- Emergency (or outage) procedures. Hopefully, not required often, but practiced (usually in a simulator or with an instructor). Again, used to augment our fallible memory, especially under stress.
What Makes a Good Checklist (Runbook)?
Ideally, checklists and runbooks serve as reminders of procedures that you already know how to execute. No detail. Links to expanded procedures if you need them, but the expanded procedures are normally studied on the ground while not in production (or in the air).
The reasoning behind what is in the checklist, why it is there, what order the steps should be executed in, and what results should be expected should all be discussed and thought out while on the ground (pre-production). If operating under duress (e.g., engine outage, fire, critical production issue) there may not be time to reason about any of the steps. Execute.
Checklists and runbooks can serve both the familiar situations and the not-so-familiar emergency ones. They should be brief reminders of procedures that should already be known, practiced or rehearsed. Not everything can be automated; when a procedure can’t be, create the list.
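As a rough illustration of the "brief reminder plus link to the expanded procedure" idea, here is a minimal Python sketch of how a runbook could be kept as data. Everything in it is an assumption made for the example: the `Step` and `Runbook` names, the eight-word limit on a reminder, the outage scenario and the `wiki.example.com` URL are all hypothetical, not part of any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    reminder: str                      # short prompt, not a full explanation
    expected: str                      # result agreed on beforehand, "on the ground"
    details_url: Optional[str] = None  # link to the expanded procedure, if any


@dataclass
class Runbook:
    title: str
    steps: List[Step] = field(default_factory=list)

    def render(self) -> str:
        """Return the runbook as a short numbered list, one line per step."""
        lines = [self.title]
        for number, step in enumerate(self.steps, start=1):
            line = f"{number}. {step.reminder} -> expect: {step.expected}"
            if step.details_url:
                line += f" (details: {step.details_url})"
            lines.append(line)
        return "\n".join(lines)

    def wordy_steps(self, max_words: int = 8) -> List[Step]:
        """Return steps whose reminder is longer than max_words words.

        The limit is an arbitrary illustration of "brief", not a rule.
        """
        return [s for s in self.steps if len(s.reminder.split()) > max_words]


# Hypothetical outage runbook; the steps and the URL are made up.
runbook = Runbook(
    title="Primary database unreachable",
    steps=[
        Step("Confirm replica is healthy", "replication lag near zero"),
        Step(
            "Promote replica",
            "replica accepts writes",
            "https://wiki.example.com/runbooks/db-failover",
        ),
        Step("Repoint application", "error rate falls"),
    ],
)

print(runbook.render())
```

Keeping the expected result next to each reminder means the "what should happen" question is answered ahead of time, and a check like `wordy_steps` can flag steps that have drifted from reminder into explanation and belong in the linked procedure instead.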
Test It!
Test the checklist or runbook with a colleague or two to verify that you are not assuming too much tacit knowledge and that others can successfully execute the procedures.