In this episode, Matt Stratton discusses incident response communication, leading by example, the way we should be handling postmortems, and telling the hero’s story vs the story of the people.
If you like Greater Than Code, you should check out The Transatlantic Cable Podcast from Kaspersky Lab. They look at cybersecurity issues that affect everybody, and also make sure the podcast fits into your busy day by keeping episodes to 20 minutes or less. Check it out and subscribe wherever you get your podcasts.
Matt Stratton: @mattstratton
Matty Stratton is a HumanOps Advocate at PagerDuty, where he helps dev and ops teams advance the practice of their craft and become more operationally mature. He collaborates with PagerDuty customers and industry thought leaders in the broader DevOps community, and back when he drove, his license plate actually said “DevOps”.
Matty has over 20 years of experience in IT operations, ranging from large financial institutions such as JPMorgan Chase to internet firms, including Apartments.com. He is a sought-after speaker internationally, presenting at Agile, DevOps, and ITSM focused events, including ChefConf, DevOpsDays, Interop, PINK, and others worldwide. Matty is the founder and co-host of the popular Arrested DevOps podcast, as well as a global organizer of the DevOpsDays set of conferences.
He lives in San Francisco and has three awesome kids, who he loves just a little bit more than he loves Doctor Who. He is currently on a mission to discover the best pho in the world.
02:59 – Matt’s Superpower: Taking metaphors and ideas around self-help and turning them into allegories and analogies of how we could be better at technology
03:58 – What does healing organizational trauma mean?
05:50 – Incident Response Communication
16:00 – Trust, Hyperarousal, and Hypoarousal; Stuck On or Stuck Off
23:32 – Leading By Example, Not Being in a Rush to Solve Problems, Seeking to Understand, and Encouraging Safety
29:23 – Handling Postmortems: How to do them well and how to do them effectively
39:17 – The Hero’s Story vs The Story of the People; Crafting Our Narratives
Coraline: The metaphors of storytelling.
Matty: Creating a forum of discussion around postmortems.
Janelle: Thinking about Metaphors We Live By by George Lakoff, and how at the foundation of our minds is essentially a system of shapes that we see the world through, reason through, and feel emotions through, which creates our gut sense.
Please leave us a review on iTunes!
To make a one-time donation so that we can continue to bring you more content and transcripts like this, please do so at paypal.me/devreps. You will also get an invitation to our Slack community this way as well.
Amazon links may be affiliate links, which means you’re supporting the show when you purchase our recommendations. Thanks!
[If you like Greater Than Code, you should check out The Transatlantic Cable Podcast from Kaspersky Lab. They look at cybersecurity issues that affect everybody and also make sure the podcast fits into your busy day by keeping episodes to 20 minutes or less. Check it out and subscribe wherever you get your podcasts.]
JANELLE: Hi everyone and welcome to Episode 1-1-6 or 116 or however you would like to pronounce that number, like 11 and 16 squashed together in one. Today, I am happy to introduce a fun and exciting show. We’re going to talk about healing organizations and all the trauma and things that happen in our software lives. I’d like to introduce my co-host, Coraline Ada Ehmke.
CORALINE: Hi, everybody. Glad to be here and Jessica Kerr is here with me too. Jessica, do you want to introduce our fabulous guest?
JESSICA: I do. Thank you, Coraline. I’m happy to have Matty Stratton here. Matty Stratton is a HumanOps Advocate at PagerDuty, where he helps dev and ops teams advance the practice of their craft and become more operationally mature. Matty has over 20 years’ experience in IT operations. He’s a sought-after speaker internationally and the founder and co-host of the popular Arrested DevOps Podcast (I hear he’s responsible for the name of that one), as well as a global organizer of the DevOpsDays set of conferences.
Matty lives in San Francisco and has three awesome kids, and he loves them a little bit more than Doctor Who, he claims. He is currently on a mission to discover the best pho in the world, which I got to participate in the last time I was in San Francisco with him, for REdeploy Conference last August. That was an amazing conference, my favorite conference ever, and Matty did a really awesome talk about healing organizational trauma and I was like, “Oh, my gosh. That is so much greater than code, you need to come on our podcast.”
MATTY: So, I guess this is the part where I introduce myself or say hi? So yeah, Matty Stratton. It was really kind of fun when Jess and I, after trying to find great pho, sat down after the event and she was like, “You should really come on Greater Than Code,” and I said, “Oh, my. I super love that podcast. You should come on Arrested DevOps,” and she said, “What is that?” But then she came on and it was great and it was fun. It was awesome. I’m really excited to be here. I’m in New York right now, getting ready to give a talk at DevOpsDays New York, which is the same talk I gave at REdeploy, called ‘Fight, Flight, or Freeze – Releasing Organizational Trauma.’ I want to talk a little bit about that: we can talk about the metaphor of post-traumatic stress, how organizations experience things that are kind of similar to that, and how we can take some of the things we’ve learned from dealing with post-traumatic stress so our organizations can be better about it.
JESSICA: But first, the introductory question, “What is your superpower?”
MATTY: I used to like to say my superpower was impeccable fashion sense, but that’s proven to not be true anymore. That’s maybe a superpower that I’ve always wanted to have. I think my superpower is being able to take metaphors and ideas around self-help and turn them into allegories and analogies for how we can be better in technology. I’ve given talks called ‘The Five Love Languages of DevOps’ and ‘The Four Agreements of Incident Response,’ and then also this talk, so that’s what I do: I take what I know. Spoiler: I’m not a mental health professional, so this is what I know from my own experiences, and I take that and apply it to things like DevOps and software development.
JESSICA: It’s so much better when you’re not a professional because then you can just say things and let people take them or not, and not have to back them up.
MATTY: Yes, exactly. I’m like, “I super don’t know anything about what I’m talking about, but listen anyway.” That’s my superpower.
JESSICA: And when you talk about technology, people care about it.
CORALINE: Matt, for those of our listeners who have not had the opportunity to see your talk, can you kind of summarize at a high level what you mean by organizational trauma?
MATTY: Absolutely. If we take a step back and think about personal trauma, think about how animals work. Imagine a zebra. The zebra is just chilling out, there are no predators, and it’s operating within what’s called rest and digest. This is the parasympathetic nervous system. Now switch to the zebra being chased by a lion. There are these physiological changes that happen when the zebra’s fight or flight response, the sympathetic nervous system, is activated. The zebra’s heart rate increases, its breathing increases, stress hormones like cortisol and adrenaline end up in the bloodstream, and basically, the non-essential functions stop because the zebra is literally preparing to run for its life.
Then, if the zebra escapes and survives this encounter, it shakes it off. It literally shakes it off and returns to a resting state. Now, the thing is, we’re not zebras. We have this thing called a prefrontal cortex, and this is usually an advantage, but it’s also how we mentally replay traumatic scenarios, which activates our sympathetic nervous system exactly like the real threat would.
When we think about that, when something activates us and we don’t process it properly, that’s called trauma. This happens with organizations too. An example of organizational trauma would be an incident or an outage. Trauma occurs when our nervous system, as an organization, is overwhelmed and our active response to the threat doesn’t work. That’s kind of the definition of [inaudible]. Our organization, our team, our system’s solution didn’t work. We lost service, something happened, and organizational trauma that we don’t process means we continue to revisit it over and over again, we don’t respond to it very well, and we have some really negative experiences about how we respond to this stuff.
CORALINE: Matty, you said before that processes can be a kind of scar over an organizational failure to communicate. How important is the role of communication in incident response, in that moment where the sympathetic nervous system of the organization has taken over?
MATTY: Communication is super key during the process, for sure, because this is how we make sure of things, even simple things like informing stakeholders so the stakeholders stay out of our way while we restore service. That’s kind of key, because otherwise our incident calls get interrupted every five minutes by someone from senior management jumping in and saying, “What’s the status? What’s the status? What’s the status?” It’s really key to have good communication practices around that.
The place where that scarring really happens, I think, is the communication that happens afterwards, the processing of it, because we need to process this. We tend to sometimes be good about writing postmortems and accident reviews, but we don’t really communicate about them and we don’t tell them as stories, and that storytelling into the rest of our organization is what processes the trauma, what keeps it from staying unprocessed. So I absolutely love that point you made: if I’m taking your analogy correctly, these processes we build are scar tissue that we build over it because we think we’re going to prevent that from ever happening again if only we had a process, right?
CORALINE: Exactly right.
MATTY: And spoiler alert, that ain’t going to happen. We have this cognitive distortion where we feel like we can predict the future if only we had enough data, if only we knew enough to keep bad things from happening. The point is our systems are always in a state of some type of partial degradation. Things are always going to go pear-shaped, and it’s about how we respond, being able to process and understand, and actually treating incidents and outages as a safe and normal occurrence, not an outlier. That doesn’t mean we should be okay with it and say, “Stuff’s breaking all the time, who cares?”
But the process itself, the experience, if you will, of dealing with incident response, should be something that doesn’t cause us stress, because it’s just business as usual. It’s just something we do, and then, like the zebra, we shake it off. We’re done and we move on.
JESSICA: Instead of trying to prevent this ever happening again and rejecting the experience, if we make a story out of the incident response, then we incorporate it into the narrative of the organization, which is much healthier.
MATTY: I think that’s a really good way to put it. It’s the narrative stories that we tell, because the other thing is that to process something, it has to have an end. That doesn’t mean we keep it from ever happening again, but it’s not just open-ended, where we throw a report out there, whatever. It’s a story, and a story has an end, and when the story ends, our processing ends, and then we can move on within a normal, regulated window of tolerance for our stress response.
CORALINE: I like the idea of turning it into a story, but to have value, stories have to be repeated so they become part of our working memory. They become something we can draw on when a future occurrence happens. I think that’s a really difficult challenge. Where I work, we do these postmortems, and there’s always a document involved, and we have a story in these documents, but I don’t feel like we’ve really internalized the postmortems, and I don’t think we internalize the story of what happened in a way that is useful in the future. Not to say that we can predict it or prevent it, but we need to be able to draw on that experience to move forward and grow.
MATTY: Yeah, absolutely, and I think it’s important to have a record of the stories, because you need to be able to refer back to them, but you also need to share them in a processing way, in a way that, like you said, makes them part of the narrative of your organization. The other thing we run into when we look at things like postmortems is that we have to understand why we’re writing a postmortem. Depending upon the type of organization, sometimes it’s considered part of due diligence. It’s a thing that we do, it’s CYA, we have to fill out the forms, like doing your taxes. It’s filling out your tax forms rather than exploring.
Something John Allspaw talks about a lot is that the best postmortems don’t come out of a template, because they should be documents that raise questions. Rather than answer questions, postmortems should ask questions, and that evolves into a conversation, which means we need to actually have a conversation rather than just produce a report.
J. Paul Reed, who just wrote his dissertation on postmortems, shared something really interesting: he found that the larger the organization, the less postmortems get shared within it. It’s not even about sharing externally. Even internally, in a small organization, things like [inaudible], after action reports, incident reports, and postmortems tend to be widely shared across the entire org, whereas in larger orgs, they stay very, very siloed. That’s where they become much more dangerous, because again, we’re being predictive about who we think actually cares about this and could find value in it.
JANELLE: I’ve got so many thoughts right now, and the thing I keep thinking about is one experience I had years ago with what I would describe as a traumatized organization that was struggling to get over its trauma. The way I learned what was going on, as a consultant, was that I took everyone out for beer, one at a time, to get the undercurrent story of what was really going on and what they were stressed about, with alcohol having the functional effect of disabling our prefrontal cortex, so we just spill all our trauma. And with this idea you presented, of using personal identity and the challenges of just being a human in the world as a metaphor for our organizations, the mechanic that becomes evident to me is self-deception.
I think about these conversations. You’re talking about having conversations and spreading the message of what this story is, this narrative in the postmortem, and what Coraline brings up is that we have this postmortem, we run through these rituals, but we don’t actually internalize the narrative. I’m thinking the problem is that the narrative is almost like a lie. It’s dissonant with the undercurrent story that we’re actually telling ourselves. There’s the story we say in our rooms, the happy smiling mask that we put on to cover up dealing with the trauma in the context of our organization, and then you get this dissonant view, because the real story is that we’re telling ourselves that management is all fucked up.
All these things that we say: that is the real story we’re telling ourselves and brooding on inside. So it seems like what needs to happen is essentially authentic conversation, in order to heal and work through these kinds of things and come together around a plan to make things better that people genuinely feel is going to make things better, and that isn’t just a political document getting broadcast around to check a box somewhere. So I’m wondering, in your firsthand experiences with these organizations, what kinds of things have you done to inspire more authentic conversation?
MATTY: You really nailed it there. The challenge is making that conversation authentic and facilitating it so that it’s safe to have, because oftentimes, in the cold light of day, with hindsight being 20/20, the decisions that were made get questioned. So people don’t necessarily feel safe sharing their decisions, because sometimes a really hard decision gets made, and then nothing goes wrong, and it’s like, “Wait a minute. Why did you actually take down all of production? Nothing actually went wrong, so maybe we didn’t need to do that,” whereas it was the right decision to make in the moment.
We already have this sort of second-guessing nature, and really, this is where management matters. When we think about what we can do at different levels of an organization, the way to create a space where it’s safe is, frankly, for management to lead by example and to be vulnerable themselves in a postmortem, because nobody believes in blamelessness until they actually break production and don’t get fired. Part of it is just examples that have to happen. You can tell people this is how things work, but the doubt still exists, and like all trust, it’s very easy to lose and very hard to build. All it takes is one time for a finger to get pointed at somebody, and all the work you’ve done over the last six to 12 months to make people feel safe is completely eroded.
I think when you’re in management and leadership in these areas, rather than figuring out the most efficient way to narrow down contributing factors and root cause, it’s far more important for you to say, “If this is going to happen, I need to go above and beyond, set the example, and build these spaces where there is trust,” because these after action reports can feel very filtered, since people are afraid of what’s going to happen if the CTO reads one and sees that it’s truly challenging.
A big part of it is leading by example, and then it takes time. I always like to say, “I was born in Missouri, show me,” since it’s the ‘Show Me State.’ I won’t believe it until I see it, and you’re probably going to have to show me a whole bunch of times.
JANELLE: What’s interesting about that is that these examples of trust eroding become stuff that piles onto this trauma, because I imagine once you trust someone, whether that’s trusting the organization or trusting your leadership, and then you have one experience of that trust being violated, it’s hard to come back from that. That then becomes part of the trauma that nobody talks about.
MATTY: Right. You become kind of hypervigilant. What happens with this is, when we talk about a dysregulated nervous system in a person, we’re either hyperaroused or hypoaroused, which means we’re either stuck on or stuck off. Stuck on, which is hyperarousal, means we’re always in fight or flight. That’s where anxiety and panic and hyperactivity come in. If we’re stuck off, we’re stuck frozen, and that’s where we see symptoms like depression and lethargy and chronic fatigue and a lot of things like that.
Now, organizations do the same thing. They can be hyperaroused or hypoaroused. To the point you’re making, when this trust is violated, that’s what’s probably going to put us back into being hypoaroused as an organization. We don’t move forward. We’re stuck off. We’re always in freeze, because we feel like maybe we extended ourselves and got slapped right down, so it’s much safer to not move at all. It just builds, you’re right. It’s kind of a compound interest effect: when we don’t trust, we aren’t forthcoming, which means we don’t learn as an organization, which means people don’t trust us, because we’re not volunteering information, because the last time we did, we got shamed, and who wants that kind of stuff, right? It’s a really hard cycle to break out of when it happens to us as individuals, and then also when it happens to us as teams and organizations.
CORALINE: I’ve personally wrestled with not only bipolar disorder and anxiety disorder but also PTSD. I found that on my own, I don’t really have the tools for dealing with any of these things. The internal dialogue I have with myself is not productive, it’s not healthy, and it doesn’t lead to healing. I have to work with a therapist who has an outside perspective on what my brain is doing and the patterns my brain is following. How do you do that within the context of an organization, when your day-to-day has been influenced and affected by a lack of trust, by hypoarousal, by these sorts of things? How do you break out of that and start to see the pattern, or begin to heal that wound?
MATTY: That’s a really, really good point about needing to have guidance. We talk about this a lot, and so again, to give some background, I’m not a mental health professional, but I also have bipolar anxiety, just a generalized anxiety disorder, and post-traumatic stress. I’ve dealt with this. These are sort of my theories, and they initially began with my personal experiences.
Think about meditation, for example: if the meditation isn’t guided, we’ll do things like avoid the trauma. We avoid it because we need to be guided; we need to know what’s going to happen when we visit the trauma again in our meditation. Similarly, we need to be guided through the processing we’re doing, and one concrete thing you can do is have your postmortem facilitated by someone outside of your group.
We do this at PagerDuty. We have a group of folks who facilitate postmortems. The trick is that if you’re not right in the middle of it, it’s a little easier to start to see those patterns, and you can have people who are really good at that particular practice guiding teams through these productive conversations, without the bias of having been there in the middle of it, which would steer the conversation in a certain direction, even subconsciously.
JANELLE: I feel like I need to change my job title to organizational therapist. I’m thinking about, what is it that I actually do? It’s not necessarily an easy thing to answer but I think that’s a special skill — being able to look at the patterns of organizational dysfunction, of broken communication, of people holding a lot of tension inside themselves and it’s like you can see the waves of tension. You can see how people bond into different sub-tribes within the organization. You can see when people have conversation and there’s this undercurrent of contempt.
When you’re in the middle of it, when you’re directly involved, it’s like you’re almost blind to the situation you’re in. You know it’s going on, you talk about these things, but in the same way that a therapist hears you, validates that the things you’re experiencing are actually real, and helps you see and observe your patterns so you can shift them, I think that same dynamic occurs when looking at the nature of the relationships, observing them and just pointing those things out: “There’s an interesting dynamic going on here. Is what I’m seeing here real to you? Let’s talk about this.”
MATTY: I think if you have somebody who’s not directly involved with the team, it feels like there’s no agenda. When we’re not directly involved, we’re more objective, coming from the outside and able to spot those trends. Someone like that can see the patterns that are occurring, because if someone is helping facilitate your postmortems and that’s how they interact with the team, they’re going to see the patterns that occur inside those postmortems, since they’re happening all the time. But they’re not seeing it through the lens of “I have all this other background information that’s coloring how I’m looking at this.”
JANELLE: It’s interesting, because it feels like on one hand, you have the ability to be objective. On the other hand, there are just things you can’t understand because you don’t have the context, you don’t have the history of all the things that aren’t really written down. But that doesn’t mean you can’t help other people get to a place where they can figure that out.
I think this is a brilliant observation you’ve made, though, about the importance of having an outside group, in an internal consultancy model. I think about a company I worked with that had an internal lean initiative, with a team of people who were deeply trained in the discipline, techniques, and processes of lean. They would work with one team at a time: set up a workshop, help them build a value stream map of their process, and facilitate discussion around it.
But it also kind of felt like the same sort of organizational therapy, because once you map out your process and start looking at all the dysfunction, a lot of the same kinds of things come up, with this undercurrent of how messed up this is and how we’ve told management this stuff 100 times and they don’t listen. The age-old problems in an organization haven’t fundamentally changed. We’ve had the management world versus the engineering world, with this wall of ignorance between them, where these two groups can’t see or understand each other, and then you’ve got this power dynamic at the same time. That’s one of the reasons why, in order for trust to be built, it has to start with the people who ultimately have the power setting an example.
MATTY: Absolutely. The most important thing you can do as a leader in any kind of organization is lead by example. There are certainly some trite examples of that, and we don’t have to get into the whole unlimited PTO thing, but if you’re going to have an organization that claims to encourage people to take time off, it’s really important that the leadership takes time off, publicly and clearly, because we will all take our guidance from that.
First, you have leadership by example, which means we’re going to ask insightful questions. There are going to be questions that build a story, and you are going to listen to this story, and you’re not going to be argumentative about it. I think what happens a lot, when stuff gets presented and we’re trying to tell the story, is that leadership, in a rush to solve the problem, asks a lot of problem-solving questions rather than empathizing questions, trying-to-understand questions. I think that’s a good place to start.
I think it’s also really key to… I always hesitate to say “rewarding failure.” It’s not about rewarding failure; it’s about rewarding people for speaking up. I work with a lot of organizations where what the CTO or the CIO cares about is the number of SEV 1s that occur. Now your most important metric on any incident call is going to be how do we make this not be a SEV 1, because that’s what our leadership cares about, versus, “It’s okay to raise a SEV 1. In fact, I encourage it, and I’m going to reward it, because it means we’re looking for the right thing.”
Likewise, bringing up challenges in the organization should be something that, as we talked about with failure, leads to inquiry. If we’re bringing things up, we should be asking questions because we want to understand the system better, we want to understand our organization better, not because we’re trying to find root cause. The sooner leadership can get away from saying, “We’re trying to figure out what the root cause was,” and learn to talk about contributing factors and ask the right types of questions, the safer people will feel bringing things up.
JESSICA: Every failure is a clue.
MATTY: Right. They talked in the Stella Report about how incidents basically provide a glimpse into the code, and by code, I mean sort of the undecipherable nature of our systems. Incidents provide a decoded view into our systems that we don’t usually have, so they’re an opportunity.
JANELLE: I think what I’m hearing you say here is that it’s not about the clues themselves. It’s not about what the root cause is. It’s about the process of creating safety and seeking to understand. Our first priority is to create this context, and the right things will start to flow. We’ll almost start to get the answers we seek if we don’t focus on them directly and instead focus on creating safety and understanding, making it safe to have these conversations in the first place. Because if we shut down the conversations and focus on the objective clues as the goal, if we try to shut down our emotions and be purely rational about it, then maybe we don’t get the actual clues we’re after.
MATTY: We don’t, and we miss so many of the key clues, because so many of them have to do with people. Our systems are complex. They’re made up of technology and they’re made up of humans, and every bit of insight or visibility that we have into our systems is colored by the humans who are observing it. It comes through our own perceptions. We ignore the humans at our peril, because that’s the process by which we can actually observe our systems and understand them.
JANELLE: So fascinating. For someone sitting at home listening to this podcast right now: you talked about the importance of leading by example and not being in a rush to solve the problem, but instead seeking to understand and creating this safe kind of context. If you think about a specific example, a specific story where you’ve seen someone do a great job in management and leadership, what does that look like? Can you tell a story of someone who did something cool where you’re like, “I really admire this person”?
MATTY: I’ll give an example, and I’m going to bring it back to the company where I work now, but it’s something that I find really inspiring. Our CEO listens to our incident calls, the recordings of them, after hours, almost like listening to a podcast. Then she raises really interesting questions, and they’re rarely about our technology, and they’re certainly never about how we can prevent this from ever happening again. She asks a lot of ‘whys,’ and a lot of the whys are about fatigue, or things like, “What can be done to make it so we could restore service more quickly, but also expose the information we need to understand it later?”
I think that’s one of the things when I look at how this works. What I find inspiring in this story is understanding the difference between what we do during the incident, during the trauma, which is about restoring service. That’s what’s most important. We’re not actually trying to solve the problem at that point. We’re trying to restore service. We’re trying to get things back on track.
Then we’re able to do some more forensic work later and expose that information. Being able to go back and review is one of the hardest things to do. One of the bits of advice I give folks, and it’s a really hard one, is to record your incident calls. It’s hard because people feel like, “Now I have this recording,” but actually the hard part isn’t the recording. The hard part is the listening. You can listen to it at 2X speed, like it’s a podcast or something, but…
CORALINE: I was going to say, you kind of buried the lead there. I’d never thought about recording the audio or video and using it in the postmortem. You talked about how the write-up can be problematic if you’re using a template, for example. It seems like you might get a lot more authenticity from having people speak and almost forget that they’re being recorded, and from hearing different people, not just the person responsible for writing the report, share their perspectives, their understanding of what happened, and their ideas for how to be better at responding to incidents in the future.
MATTY: I think that’s a really good point, because when we’re filling out a template, we only answer the questions that the template asks. It gets to the point that Janelle raised earlier: when you get everybody to the pub, that’s when you hear what really happened. We can’t really say we’re going to do all of our postmortems at a pub, for a lot of reasons. But what’s happening there, again, is that we’re just having a conversation. It causes us to ask questions, so it’s facilitating more of a forum as opposed to a report.
The report is something that could eventually come out of this forum, but more often than not, we run into the scenario where we say, "We had an incident. Someone’s got to be assigned to write the postmortem." In fact, that’s part of the good practice that we tell people: "Before you get off the incident call, assign someone to write the postmortem."
Maybe what’s a little bit better is that someone still has ownership, but they own setting up the postmortem forum. We’re going to sit and have a conversation, and that conversation is going to contribute to the postmortem more than the chat log, more than what we’re pulling out of our tooling. What’s hard about that is that it’s more work, and there’s no magical tool that’s going to do it for you, unfortunately. Not even paid to do it.
JANELLE: I’ve been listening to this and I have to ask, why not at the pub?
MATTY: I think there are a couple of reasons. We have a bit of a problem with alcohol culture in tech to begin with. While having it in a pub is going to make some people feel more comfortable, for a lot of folks it’s going to make them less comfortable, so it’s going to work against you from that perspective. I think finding some way to host the postmortem discussion where it feels more casual or more comfortable is absolutely key, but we need to remember that just because one way of doing that makes some folks feel more comfortable doesn’t mean it won’t make a lot of other folks feel less comfortable.
CORALINE: I think maybe the important part is stepping away, stepping away from the office and changing the context so that you have a more relaxed perspective, a more relaxed feeling. You’re not in the war room with lots of people talking at the same time. Maybe it makes it feel a little bit more personal.
Janelle, what I liked about what you said was doing one-on-ones with all of the people involved. That, of course, is a big time commitment, but maybe that’s how we get the authenticity that we’ve been talking about.
MATTY: And maybe we have to start by doing it one-on-one for a while, so people feel comfortable that it’s okay. What you really want to get to is a position where we feel comfortable sharing regardless, where we’re not worried that if I say the wrong thing, my skip-level manager is going to find out about it and I’m going to be put on a performance plan because how dare I speak against management.
We want people to see that that’s okay. If we can have these conversations one-on-one, where maybe we have the ability to check them and understand them a little bit better, and then we start to see that these topics do come up and there aren’t repercussions, it makes us feel more comfortable about doing it.
CORALINE: Or even just a feeling that we’re speaking off the record, right?
MATTY: That’s the trick. If we say we’re speaking off the record, then we actually have to be off the record. We don’t want people to feel like they’re speaking off the record while they’re being recorded. We just want them to feel comfortable.
CORALINE: So in a world where someone outside of the silo oversees the postmortem process, is that the person who’s having these one-on-one conversations and looking for patterns? Even if people are speaking off the record, maybe something comes up, like, as an example, a process we have in place that a manager really likes because it gives them a dashboard of what happens, but it isn’t working. If you hear that from two or three different people, you can raise it without violating that off-the-record provision.
MATTY: That’s awesome, to be able to bring it up and say, "Here’s a theme that we heard," as opposed to, "So-and-so said this." It’s about looking for consistent themes. Another thing that comes up when you think about telling a story is understanding that not all of the actions we think we’re going to take out of this are things we’re actually going to do. It’s very common, coming out of a postmortem, to have way more ideas of things to do than are realistic, either because it’s too much, or because, yes, a complete dataset migration would solve the problem for us, but we’re probably not going to do that.
But having a kind of dispassionate facilitator who can take those things together and look for common themes can be really helpful. The only thing I would say is that this sounds like a great dream: "We’ve got these folks. We have organizational therapists in our company who do nothing but facilitate postmortems and do all this stuff," and that’s not realistic for a lot of folks. But the trick is that you could say, "There’s someone on my team who’s good at this, and they’re going to do it for another team to help them, and then we’re sort of trading it off." Like, my team is going to facilitate these postmortems because we aren’t necessarily directly involved, so we can be a little more dispassionate.
JANELLE: I’ve generally found that there are people within an organization who have a natural knack for this, who sort of do it automatically within the dynamics of their team, natural facilitator-reflector types. If you state, "This is a role that we need within our organization. Would anyone like to volunteer to do this part-time?" you can solve the problem that way.
I think it would be helpful to have an initial discipline to start from, a set of things to watch for. Then, whatever this part-time role is, you start defining the job as you do it: these are the things that work, here’s the postmortem process as it has evolved. Maybe in that you can index into all the lessons learned and the narratives and stories that we communicate across our organization, and start putting together a knowledge base of "here’s our story."
JESSICA: Speaking of our story, real quick before this plane takes off: I was around a couple of skiers the other day and they were telling stories about skiing, and they never tell the stories of when it goes well. All the stories they tell are terrifying. It just sounds like a terrible idea; why would anyone ski? How they ran into this tree, or almost ran into the tree and lost a ski and had to crawl 50 feet back up. The stories that define our narrative are the scary ones. I think that’s why adding these incidents to our organizational narrative really adds meat to what it means to be in this organization.
JANELLE: It sounds like there’s a missing side there. You point out that we define our world in terms of scary stories, but if I’m going to share the story of my organization, I don’t want it to be all angry and scary stories. That kind of sucks. And you think, what’s the other side of that? It’s all about innovation and passion and...
JESSICA: They’re in between these scary things.
MATTY: I think we have to be careful, because that’s where the cognitive distortions of overgeneralization or polarized thinking can come in. We do a lot of telling of negative stories in tech, especially those of us in ops. We feel like it’s our job to one-up each other about the most terrible thing that’s ever happened, and we don’t talk about the successes. But where things get dangerous is overgeneralization. For an individual human, the distortion of overgeneralization would be something like, "Oh, I got a C on the test, so I’m stupid and a failure."
In an organization, this is like, "Oh, we had a SEV 1 incident on this particular service, so it’s unstable and terrible." This is also where the mental filter distortion comes in: we only see the negative and eliminate all the positives about a situation, about a person, about a system. That can be pretty dangerous, because we only pay attention to the negative. Then when things go right, it’s really hard, because we might have had to make a tough decision to make things go right, but we only see that things didn’t go wrong, so maybe we made the wrong decision.
JANELLE: Interesting. So how do we capture more of the stories on the other side of things? What would that process even look like?
CORALINE: Is it the story of a hero’s journey?
MATTY: Delve into that a little bit more. I’m intrigued.
CORALINE: Most postmortems, at least at healthy companies, are blameless, but what about telling the story of the heroism of the people who resolved the issue?
MATTY: That’s a tricky line, because we want to avoid hero culture. We don’t want to celebrate people for having to do so much work. We want to celebrate them because they did a lot, but we don’t want that to become a justification in its own right. But I think telling the story of the humans is what really matters, because that’s what we miss.
When it comes to looking at the history of an incident, John Allspaw has given this talk several times. He gave a talk similar to the one at REdeploy that Jess and I talked about, and he gave it at PagerDuty Summit; I’ll make sure to get the link so we can put it in the show notes. He talks about analyzing incidents from the perspective of when specific people got involved and what they were doing. You can see that when you get the right person involved, it happens to be the right person just because they happened to know a thing that was relevant at that particular time. Unwinding a little bit, I really like this idea that it is the hero’s journey. It’s the story of the people, and those are the things we should be asking about, because those are also the things we have the most control over: who gets involved and when, and how they were able to contribute. That’s the story of what happened, because again, we can only work with how we act as humans. We can’t really control the technology. The technology is happening behind a filter that we barely understand, because it’s going through the context and understanding of these people, so we need to focus on the people and their journey.
CORALINE: I think what raised that question for me is, again, thinking of this in terms of a story. A story has characters, so who is the main character of a postmortem? Is it the system? Is it the organization? Is it the people who were engaged in resolving the issue? That’s my broader question.
MATTY: I think the characters are the people who are involved in restoring service. That’s who we know to go to the [inaudible] as well, because how often do we run into this, where we fix it but we don’t understand how? We did something and we don’t know why it made it work, but it did.
CORALINE: Switching on and off seemed to work, so let’s go back to coding.
MATTY: Right, so we need to start to interrogate. Not interrogate; we need to start to interview the person who made the decision to turn it on and off and say, "So your gut told you to do that, but maybe we can understand why. Maybe we can understand a little bit more about why you chose to do what you did." A lot of people who are subject matter experts do operate on an emotional and, again, kind of a gut response, where this feels right, because the problem during an incident is that if we wait for 100% data proof of everything we’re going to do, we’ll never try anything and we’ll never experiment. So we need to start to understand: when you get the 20-year DBA whose intuition told them to do this one experiment, how can we learn from that experiment?
CORALINE: I love how this ties back into what we talked about at the beginning: figuring out what the story of the organization is, what questions we should ask more than what the answers are, and also asking what stories we are not telling.
JANELLE: This is really interesting. Just summarizing some of the points you made: looking at the journey as a story of the people, or a story of characters. One of the things you brought up was hero culture as a bad thing that we want to avoid. But at the same time, you’re advocating that we put deliberate effort into crafting our narratives, our organizational narratives, and into how to properly do that. I’m thinking about this idea of hero culture and why it’s bad: usually it’s not that the celebration is bad. It’s when that becomes the shiny justification for abusing people continuously.
MATTY: It becomes the expectation.
JANELLE: It becomes the expectation.
MATTY: I remember when I was a young pup coming up as a sysadmin, there was this poster in our office that I thought was the coolest poster in the world, and now, if I saw it, I would burn it. It said, "How does it feel to save the day every day? Not all heroes wear capes."
Again, I was like, "Yes, that’s what I do." We swoop in there and we save the day and we do all this stuff, but the problem is that it becomes the expectation of what the job is. That poster was propaganda aimed at young systems engineers to make them think that what they had to do was work crazy hours and always be pulling the irons out of the fire. We don’t want that as an expectation, because that’s how you get burnout.
JANELLE: That seems potentially a little different, though. If the things you end up optimizing for in your organization are creating a safe culture, how much learning you get out of things, and honor in sharing, and if we tell stories about the things we want to create more of, if we align our propaganda with our philosophy about what better is, it kind of shifts the definition of hero to be about whatever it is you’re trying to create more of in your organization. In a general sense, being a hero can mean striving for greatness.
I think the arrow of what greatness is, is essentially the core of our identity, both as individuals and as an organization. Maybe what we need to do is take explicit ownership of our organizational arrow, if you will, through deliberate reflection and deliberate crafting of our narratives. Who doesn’t want to be a hero? I think that is a good anchor, and maybe hero culture is just something we need to define to mean something a little bit different.
MATTY: Maybe less of a superhero culture, but I like the idea of the hero as a protagonist. I like the idea of having a story that celebrates behavior we want more of. That’s a really, really, really insightful way to think about that. I like it.
CORALINE: At the end of our show, we like to reflect on the conversation: what was learned and what we want to do with it. As a writer, I love the whole metaphor of the story. I feel like we started out talking about metaphors of mental health and we ended up talking about the metaphor of storytelling. I really like the way that went; it was kind of unexpected from the start. Things to think about are who is telling the story, who the main characters are, and how the story reinforces who we want to be as individuals and as an organization. I loved the practical advice that you gave, Matty, and I’m going to be encouraging everyone at Stitch Fix, where I work, to listen to this podcast and think about how we can change the way we do postmortems to better reflect who we want to be as a company. Thank you for that.
MATTY: When you reflect on that, I think there’s actually a bit here about creating more of a narrative. I really like this idea of asking how we take ourselves out of the normal ritual of the postmortem and bring it into something that’s more of a forum, something more comfortable, taking it out of the work context. I’m finding that really interesting. I’d like to take that back to my organization and figure out how we can do that a little bit more, because we ourselves have certainly fallen into the practice of the ticky box that says we wrote a postmortem. We don’t necessarily do as much exploratory conversation as we could, so I’d like to talk to the folks who help facilitate our postmortems and see how we could shape things up a little bit. It’s continuous improvement: "Let’s try it with one team. Let’s try to create more of a forum for discussion." I’m really excited to see what we can do with that.
JANELLE: This has been a super interesting show, Matty. There are so many great threads in here. The thing I keep coming back to is George Lakoff’s book, ‘Metaphors We Live By,’ and how at the foundation of our mind there is essentially a system of shapes that we see the world through, reason through, and feel emotions through. That creates the sense of gut, and the narratives of our identity are very wrapped up in that fabric. We see the world through stories. The stories are always there; whether or not we take the time to process and find them, there is an unconscious, fuzzy model based on our past experience that’s always there.
If we start taking ownership of our narrative, I think it’s almost like installing a compression algorithm inside ourselves. We take all these fuzzy things, take ownership of the fabric of shapes in our brain, and define our own identity, the identities of those around us as characters in these stories, and the identity of our organizations: who we are as a team and who we are as a team of teams.
I feel like this really extends to all levels of abstraction. If we start talking about our communities and our countries and our world, they’re all just different levels of abstraction of identity. All these things we’re talking about today can really be applied at every level of abstraction, like: what is self-deception at the global internet level? It’s an interesting question when you start looking at these metaphors of identity, how far those things can extend. I’m really grateful to you for bringing all those threads together, and I think this is one of those episodes that I’m going to have all my friends listen to. I’ll be telling everyone I know, "You need to listen to this one. It’s really great." So thank you so much.
MATTY: It’s been a real pleasure. I’m having my fanboy moment of being on one of my favorite podcasts, so I really appreciate being able to join you all, and I really liked this conversation. There was a lot of insight, and I think there are a couple of things I’m going to tweak in my talk in Upstate New York tomorrow night, based on this conversation.
CORALINE: Awesome. We really appreciate you taking the time to speak with us, Matty. As a reminder, if you want to support us in bringing great conversations like this to you regularly, please join our Patreon at Patreon.com/GreaterThanCode. Donate at any level and you’ll get access to our Patreon-only Slack community, which is a wonderful community filled with very thoughtful people, and all of our guests take part in that community as well, so we can continue the conversation online. Thanks, everybody, and we will talk to you again.
Amazon links may be affiliate links, which means you’re supporting the show when you purchase our recommendations. Thanks!