What’s a postmortem, and what makes it blameless?
Many engineers are familiar with the concept of a postmortem. A postmortem is a tool for examining an engineering mistake that resulted in a severe problem, such as an outage or data loss.
In these situations, the most common alternative to the postmortem is scapegoating the person who made the mistake and punishing them.
The logic behind punishing people in these situations is unclear to me. My best guess is that it lets management feel good about having done something to address the issue.
This approach is short-sighted, though. Where human error is possible, it’s inevitable given enough time. Punitive action assumes that people avoid mistakes only to escape punishment, not because they want to do a good job. If the former were the case, maybe the company’s hiring process is the problem?
In contrast, blameless postmortems focus on the system or process surrounding a human mistake rather than the mistake itself. Barring rare cases of malice or gross negligence, the engineer was applying their skills to their problem in a way that made the most sense to them at the time.
For example, say an engineer at your company deleted a huge chunk of data. Nightly backups were running but no one had ever tested them, so the data loss was catastrophic. Here are some questions that might come out of the postmortem process:
- Were there any guardrails on the engineer being able to delete a massive amount of production data?
- Why was the engineer doing any work on the production database directly? Why wasn’t there a UI where they could get that work done in a less dangerous way?
- Why weren’t backups routinely tested?
- Was the engineer aware of the inherent danger of what they were doing?
The ultimate goal of the postmortem is to produce a list of corrective actions. These actions should prevent similar mistakes from happening in the future for anyone in the errant engineer’s position.
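To make the first question above concrete, here’s a minimal sketch of what a guardrail on bulk deletes might look like. Everything in it is hypothetical and for illustration only: the names `guarded_delete`, `BulkDeleteBlocked`, `max_rows`, and `confirmed` are my own, and a real system would more likely enforce this at the database or tooling layer than in a Python function over an in-memory list.

```python
class BulkDeleteBlocked(Exception):
    """Raised when a delete would touch more rows than the guardrail allows."""


def guarded_delete(rows, should_delete, max_rows=100, confirmed=False):
    """Return the rows that survive the delete.

    Hypothetical guardrail: if more than `max_rows` rows match and the
    caller has not passed `confirmed=True`, refuse and raise instead.
    """
    # Work out what would be deleted before deleting anything.
    doomed = [row for row in rows if should_delete(row)]

    if len(doomed) > max_rows and not confirmed:
        raise BulkDeleteBlocked(
            f"Refusing to delete {len(doomed)} rows (limit {max_rows}) "
            "without explicit confirmation."
        )

    # Small or explicitly confirmed deletes go through.
    return [row for row in rows if not should_delete(row)]
```

The point isn’t the specific threshold. It’s that the system, not the engineer’s memory or caution, is what stands between a typo and a catastrophic delete.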
The principles behind blameless postmortems
While companies’ implementations of postmortems are often far from perfect, I’m completely aligned with the principles behind them. I often wonder why other disciplines don’t adopt similar principles.
In engineering, the postmortem-worthy problems might look like outages, data loss, and data breaches. In other disciplines, there are lost sales, bad hires, and PR disasters. Many of the reasons the postmortem is such an effective tool have nothing to do with engineering.
Transparency
To me, the most valuable element of postmortems by far is the level of honesty required to conduct them. Everyone involved is allowed to be 100% forthcoming about their roles, knowing their jobs are safe. Without that honesty, engineers (and everyone, really) would default to hiding mistakes, and teams would inevitably repeat them.
Grace
Blameless postmortems emphasize that you can make mistakes and still be competent. We’re all just doing the best we can with the information we have. The goal of the process is continuous improvement, not perfection.
Thoroughness
Many postmortems ask a chain of five whys, each one probing the answer to the previous one, in order to get to the root cause of a mistake. This root cause analysis is critical to treating the underlying issue instead of the issue’s symptoms.
A postmortem for my behavior
I’ve become fascinated by the theory behind human behavior and motivation in recent years. My interest started with Daniel Kahneman’s terrific book Thinking, Fast and Slow, which discusses the competing priorities in two different parts of your mind.
This was a startling revelation — I’m not a single coherent self. I’m just a bundle of competing priorities with the illusion of being in total control of my actions.
Armed with this realization, I’m able to step back from my problem behavior and see it for what it is: a mishmash of conflicting emotions and goals rather than a “self” deciding what I’ll do. In other words, there are broken systems informing my behavior. As the postmortem process has taught me, broken systems can and should be repaired.
An example: analyzing my sleep habits
I’ll use my constant battle to get to sleep at a reasonable hour as an example. Since becoming a parent, 6:30 AM is the latest I can wake up and be on track for my family’s morning routine. However, I routinely find myself going to bed near or after midnight. I feel guilty when I eventually go to sleep, then again when I wake up feeling like a zombie.
Guilt has been a bad solution. It’s roughly the same concept as a company reprimanding someone for making a mistake, and it’s equally ineffective. I thought about how to reconcile the inconsistency between how I treat problems at work vs. problems in my personal life.
When I apply the postmortem to my life, it’s more of a mindset shift than a full process. I don’t produce any written documents or go through the issue to the level of depth I would at work, but the exercise is helpful nonetheless.
First, I admitted this was an issue and that I wasn’t going to solve it through willpower alone. The jury’s very much out on whether it’s possible to increase willpower as a long-term solution, or whether it even exists at all.
Next, I committed to stop being annoyed with myself. The negative feelings weren’t actually changing my behavior, so they weren’t worth keeping around. This is easier said than done, but I’m able to ignore them more easily after noticing how unproductive they are.
Lastly, I examined the emotions surrounding my decisions that lead to late bedtimes. I noticed a strong sense of urgency around finishing up some personal projects (like this blog) or having some time to decompress. This led me to three sub-problems:
- I have a 7-month-old and the time after he goes to sleep is the only real free time I have. Apparently this mindset is common enough that it has a name: revenge bedtime procrastination.
- I lose track of time, possibly because I’m sleep-deprived from the previous night.
- I’m overcommitted on projects and want to get too much done in any given day.
Each of these problems has its own set of potential solutions. For now, I’m working on the first issue by using a gratitude journal to cultivate my sense of thankfulness for having a healthy child that I get to spend a lot of time with. For the second, the solution involves changing my environment by setting reminders and alarms. I haven’t quite figured out the third issue yet, but I’ll get there!
I’ve experienced a surprising amount of success with my strategies so far. In particular, the connection between gratitude and my late bedtime would have been totally opaque to me if not for this exercise.
Stop feeling bad, start solving the problem
If you’re anything like me, your bad habits impose a great deal of guilt on you. Rather than unproductively trying to change yourself through guilt or blame, give this process a try. It works for many tech companies. Maybe it can work for you too.