NOTAM for SREs

In aviation, NOTAMs (“Notices to Airmen”) cover conditions that are generally temporary and therefore not published in the usual places.

NOTAMs[1] alert aircraft pilots to potential hazards along a flight route or at a location that could affect the safety of the flight. They normally contain information about temporary conditions or deviations in any aeronautical facility, service, procedure, or hazard.

While we like to think of our systems and environments as cookie-cutter, I’ve never seen one that is. They each have their quirks, idiosyncrasies, and temporary deviations.

As SREs, we often work across many environments with many other team members. That many-to-many relationship makes it difficult to communicate effectively across the team about temporary conditions in each environment. Telling everyone everything would mean “too much information” for team members who aren’t currently involved in the environments being described.

So how do you communicate the information effectively to the people who need it, when they need it?

  • Team meeting? No. Most people likely won’t need the information at that time, and what about other teams who might need it?
  • Slack or chat channel? It will often end up here, but it is difficult to find when you need it.
  • JIRA or other ticketing system? Possibly, but again, it is difficult to search for when you need it.
  • Silos or environment owners? I’ve seen this used effectively at one company, but it only works under certain conditions. I have written about it here: Own It!!
  • Permanent documentation about each environment? Not a bad place to put it, but this documentation is often scattered among many documents, not just one. For example, there might be one document describing the network, another for the hardware, another for administering the environment, etc. So where in that bundle would you put temporary information so that it would (a) be easily found and (b) be updated when the temporary condition no longer exists?

Unfortunately, the information often ends up in any of the above places, or perhaps nowhere at all. So if you trip over something like “Why are these nodes cordoned?”, it can take valuable time to find the information you need.

The best place, IMO, is a single place in whatever documentation system you use, dedicated to just this purpose: temporary information. It should describe any deviations from the standard permanent documents about an environment: anything abnormal, issues currently under investigation, links to upcoming maintenance, performance issues, etc. In short, temporary conditions that someone doing operations, diagnosing issues, or planning might want to know. Quickly.
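If that single place is kept as structured data rather than free text, it might look something like the sketch below. This is only an illustration: the `Notam` record, its field names, the `current_notams` helper, and the environment names are all hypothetical, not part of any existing tool. The idea it shows is that each notice names its environment and carries an optional expiry, so stale entries drop out of view instead of lingering.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Notam:
    """One temporary notice about an environment (hypothetical structure)."""
    environment: str                    # e.g. "prod-east" (made-up name)
    summary: str                        # what deviates from the permanent docs
    posted: datetime                    # when the notice was written
    expires: Optional[datetime] = None  # None means "until someone cancels it"
    link: str = ""                      # ticket or maintenance page, if any

    def is_current(self, now: datetime) -> bool:
        # A notice stays visible until its expiry time has passed.
        return self.expires is None or now < self.expires


def current_notams(notams: List[Notam], environment: str, now: datetime) -> List[Notam]:
    """Return the unexpired notices for a single environment."""
    return [n for n in notams if n.environment == environment and n.is_current(now)]


if __name__ == "__main__":
    utc = timezone.utc
    board = [
        Notam("prod-east", "Nodes cordoned pending investigation",
              posted=datetime(2024, 1, 8, tzinfo=utc)),
        Notam("prod-east", "Network maintenance window",
              posted=datetime(2024, 1, 1, tzinfo=utc),
              expires=datetime(2024, 1, 5, tzinfo=utc)),
    ]
    for notam in current_notams(board, "prod-east", datetime(2024, 1, 10, tzinfo=utc)):
        print(f"[{notam.environment}] {notam.summary}")
```

Filtering by environment is what keeps this from becoming “too much information”: someone about to work on one environment sees only the notices for it, and only the ones that still apply.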

 


[1] Wikipedia NOTAM
