While the various concepts behind automation and programmability have spread into the network space at an accelerating rate, enterprises have been left scratching their heads regarding the most effective way to incorporate these ideas into their teams. Do you send your entire team on a week-long Python retreat and assume everyone can immediately start “coding the network” afterwards? Do you hire dedicated network automation engineers (developers) and hope they’ll pick up network engineering experience along the way? Do you start exclusively hiring individuals who are proficient in both coding and networking? Truth be told, organizations are experimenting with some version of all of the above and finding what works specifically for them. Web-scale companies, at the forefront of this experimentation, have created a new hybrid role in response to these challenges.
Network Reliability Engineer (NRE)
Building on the SRE concepts that came out of Google, an NRE would spend at least 50% of their time on automation and the remainder deeply embedded in the operations/engineering/architecture realms of networking. They participate in an on-call rotation to stay in touch with the ops side of the house, with a focus on “treating operations as if it’s a software problem.” NREs would provide an expert, big-picture view of BOTH the development/automation and network operation/design sides of the house. Sounds like we’ve checked all the boxes and can consider this a solved problem by exclusively hiring NREs, right?
Absorbing all of that, the skeptic inside of me instantly starts thinking back to the various teams and engineers I’ve worked with over the years. That senior network architect with 20 years of experience can’t possibly be bothered with ops issues, right? When outages occur, engineers leave the company, or the overall amount of ops work outweighs the engineers’ available cycles, doesn’t automation take a back seat? How could this possibly translate to the real world of average enterprises?
The short answers: This paradigm shift requires full company buy-in ($$). Teams need to be properly structured and staffed to avoid treating automation as a second-class citizen ($$). Finding truly qualified NREs is hard ($$).
Disconnect Between Ops And Architecture
Far too often, I’ve seen architecture teams fully isolate themselves from ops. Whether it’s reporting to separate areas of technology management or hiring dedicated engineers/analysts as liaisons between ops and architecture, efforts to create silos around architecture teams are common. It is also very clear why these divisions exist: operations and monitoring are often an afterthought, and nothing about them is sexy. Architects have usually done their time in ops earlier in their careers and want nothing to do with it anymore. While I understand the argument for these silos in large organizations, the outcome is often the same: architecture teams become out of touch with how their designs and decisions affect the supportability of networks. With increasing demand to treat your network devices as cattle instead of pets, it is impossible to make informed design decisions without a constant view into what type of work comes into ops and how it is dealt with. The idea of a one-time operational acceptance, made by ops teams when the architects hand them a new project or environment to support, seems about as useful as that Visio diagram of your network you made a year ago. Operational acceptance and fault-tolerance testing must be an ongoing exercise that does not involve one team throwing something over the fence and never dealing with it again.
The New Network Architect Role?
With the rift between ops and architecture widening, will the NRE role start overtaking the traditional network architect in large enterprises? I’ll err on the side of caution and say that it is too early to tell. Reaching an expert level in network engineering takes decades and requires ongoing learning. Acquiring an expert-level understanding of application development also takes decades and requires ongoing learning. Add all those years up and you’re left with a very small group of people today who have done both, and most of them probably focused on one area at a time. Looking to the future, as the necessity of automation systems and concepts continues to grow in networking (and it should), we will see more of these people and roles surface. One thing is certain, though: the days of designing networks without heavy consideration of how those networks will be operated and monitored on a daily basis are numbered. Today’s architects, who decide between vendor X and vendor Y for a new environment, will shift to deciding things like build vs. buy for that same design. If we’re to believe that the rise in demand for open source and programmable systems in networking will continue, the need for this role will undoubtedly increase as engineers begin to focus on both network design and programmability as a whole.