This article is Part 3 in the 6-part series “The Bulletproof Maintenance Window”. For the rest of the story, see the links at the bottom of the page.
You need your technical peers to review your plans before they are executed. This review means different things for different kinds of changes. If you are making a standard change, a detached, independent, audit-style review is probably most appropriate. This is where you build the entire change document and then hand it over to somebody who is a technical peer but who is not involved in this particular effort. The audit-style review requires the reviewer to basically rebuild the change from the ground up, which works well for changes of low to moderate complexity. If you have a lot of these changes happening, that is a sign that you need to standardize your environment and automate these things, because humans have a bad habit of screwing up simple tasks.
For complex and highly impactful changes, don’t use the audit style. Instead, use a collaborative approach. Get your technical peer to build the change with you. You’ll find that the process of building the change is more efficient, and more effective at preventing unplanned outages, than the audit style, but only if you follow a couple of rules:
- There must be a clear leader who is responsible for the change.
- The collaborating peer reviewer’s job is not to tell the change leader how to accomplish the work or to try to persuade the leader to do it some other way. The collaborator’s job is to provide guardrails and protect the change leader from the pitfalls common to complex changes and to the technical environment they are working in. The collaborator should defer to the leader unless the leader is clearly heading down a dangerous road. This part requires some emotional intelligence from both engineers but it produces excellent outcomes.
- The leader and collaborator must be organizational peers – or at least without a direct reporting relationship. These people must be truly “peers” to get good peer review. A balance of power in the peer review relationship is what produces the best ideas and what reduces the most errors.
Who Are Your Technical Peers?
You may find them in some surprising places. When looking for peers, don’t compare yourself and your experience or title to others – instead, look at the task you are trying to accomplish. If your change is to a routing protocol, go find someone who just passed a certification test focused on routing. If your change is to a switching environment, go talk to the guy who does port VLAN changes all the time (while you’re at it, mentor that guy and help him move up in the world). You aren’t looking for someone who has a brain (and ego) as big as yours – you’re looking for someone who can find your errors in the daylight before you get slapped by them in the nighttime. Very often, the person who can find those errors will be more junior than you think you are – and that is okay. A helpful side effect of this method is that you’ll gain an understanding of the unique strengths of the other engineers on your team, and this opens up “all sorts of interesting possibilities”.
The Right Tools
So far we have been speaking very much in the abstract. How do you actually execute on these ideas? In my career I’ve seen lots of different ways to document the steps needed to accomplish a change, among them:
- MS Word templates
- Service management web applications
- Plain text files
- Combinations of the above
A common document format will save you and your team a lot of grief. If everybody is using the same layout inside the document, you’ll find even more efficiency. Use the following principles:
- Choose a format that allows reviewers to embed questions and suggestions right in the document. This could be done in MS Word with tracked changes, for example, or, even better, with plain text documents in a version control system like git.
- Expect reviewers to engage with the document and give either an explicit endorsement or specific suggestions for improving the proposed procedures.
- Use a consistent set of headings and sections within the document. A consistent layout will allow people to quickly find the right information. There’s nothing worse than scrolling through a 40-page Word document trying to find a single set of configuration commands.
- Change your business processes, if necessary, to allow the document to undergo revisions as part of the review process. I have seen systems that are so cumbersome that engineers will attach a document that is mostly empty that simply says “see procedures.doc on my laptop for more information”. I may have even been guilty of that myself one or two times…
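To make the plain-text option a little more concrete, here is a minimal sketch in Python of how a team might check a change document for a consistent layout and surface embedded reviewer questions. Everything specific in it is an assumption made for illustration: the section names in REQUIRED_SECTIONS, the "REVIEW:" marker, and the function names are hypothetical, not a format this series prescribes. Pick whatever headings and markers your own team agrees on.

```python
# Hypothetical section headings; substitute whatever your team standardizes on.
REQUIRED_SECTIONS = [
    "Summary",
    "Impact",
    "Pre-checks",
    "Procedure",
    "Rollback",
    "Verification",
]


def missing_sections(text, required=REQUIRED_SECTIONS):
    """Return the required headings that never appear on a line of their own."""
    present = {line.strip().lower() for line in text.splitlines()}
    return [name for name in required if name.lower() not in present]


def open_review_notes(text, marker="REVIEW:"):
    """Return (line_number, note) for each embedded reviewer comment.

    The "REVIEW:" prefix is an assumed convention for questions and
    suggestions that reviewers write directly into the document.
    """
    notes = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(marker):
            notes.append((number, stripped[len(marker):].strip()))
    return notes


# A deliberately incomplete example document.
EXAMPLE = """Summary
Move uplink to new core switch.

Procedure
REVIEW: should step 2 come before step 1?
1. Shut the old uplink.
2. Bring up the new uplink.

Rollback
Reverse the steps above.
"""

if __name__ == "__main__":
    print("Missing sections:", missing_sections(EXAMPLE))
    for number, note in open_review_notes(EXAMPLE):
        print(f"Open review note on line {number}: {note}")
```

A check like this could be run by the change leader before asking for review, or by the reviewer before giving an explicit endorsement. It only verifies layout and flags unanswered questions; it is no substitute for a peer actually reading the procedure.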
A Word to Managers
If you are trying to implement or improve a peer review process, you will probably start to think about metrics. Be careful of what you choose to measure. It might be tempting to measure something like “number of errors discovered in a peer review” as a way to incentivize rigor and thoroughness on the part of the reviewer. Or, you might swing the other way with positive reinforcement and measure “number of ‘clean’ changes”. Measurement is powerful because it creates incentives, and you can inadvertently create toxicity in your culture with this stuff if you are not careful. Reviewers who are incentivized to find minor typos may overlook far more significant problems in search of the “easy wins”. Reviewers who are incentivized to find only major issues may overlook the devil in the details and miss a critical error. In severely pathological cases, you can unintentionally set up competition between engineers where the goal mutates from “do good work” to “step on your peers to make yourself look good”. You can see where this is going. Peer review is one of the most powerful tools you have in your toolbox for building culture – be careful what you build with it. Finally, involve your team when you are building the process. The wealth of operational experience they have amassed in their careers is a critical resource that you ignore at your peril.
In Part 4, we’ll talk about how to execute the plan – and about one way to measure your own level of professionalism.