You Can’t Build A System In A Silo: Let’s Reorganize IT

An idea I’ve come to believe in strongly over the last few years is that IT needs to align its staff around business function, not just technology. Silos are killing us. By “silo”, I mean that IT practitioners are almost always grouped strictly according to technical competency. Teams are often grouped as follows (and understand that I’m generalizing A LOT here):

  • The Server Team – these folks rack and stack servers and blade centers. They know what sort of RAM to order, how many CPUs are in a particular model number, and have procedures to get a new x86 machine from the box to the data center. They worry about driver revisions and analyze performance issues. When a server is dead, they’re the ones hitting the iLO port to figure out what broke. They probably don’t know much about networking, but are expected to configure network interfaces with redundancy and perform basic communication troubleshooting based on some inadequately written vendor documentation. They probably know a little more about storage, but mostly just connect to whatever storage someone else provisioned for them.
  • The Systems Administration and Engineering Teams – these folks administer the base OS that sits on the metal. They’re the Windows people. Or the VMware people. Or (heaven help us), the *NIX people. They can do things like administer Active Directory, provide file and print services, and have strong opinions about why their favorite OS is superior to all the rest. They probably deal with internal DNS. Some of them use the abomination that is Microsoft Network Load Balancing. They construct exacting templates of what a cleanly provisioned OS should be like, down to the drive partitions, OS patches, management agents, and anti-virus software. They probably don’t have much experience with the networking side of things. As a result, systems are often provisioned with little knowledge of capacity, redundancy, or load sharing across the data center topology. When considering disaster recovery or load shifting, extending layer 2 to locations hundreds or even thousands of miles away seems like a good idea that would just make everyone’s life easier.
  • The Storage Team – these folks make sure that the company’s data is always readily available. They monitor available capacity constantly, partition expensive disk cautiously, and fret about IOPS. They think about how their physical disk redundancy is built, and whether or not the striping scheme is optimized for the traffic being thrown at it. They know about fibre channel networks, think of iSCSI as a transport for plebeians or the poor, and think of Ethernet as a rusting ’72 Ford Pinto about to explode into lossy chunks at the slightest provocation. There’s a good chance the storage guys worry about intersite data replication, backup schemes, and data retention.
  • The Networking Team – these folks handle routers, switches, and transit security, and are often called upon to perform dark magic to overcome the limitations of the applications being run across the network, much to the apparent surprise of their authors. As a result, WAN optimizers, load balancers, NAT boxes, tunneling devices, and other such jiggery-pokery are used to make the network perform in ways it was not originally intended to, manipulate traffic to mask an application’s shortcomings, or simply improve application robustness. In addition, networking folks are under constant pressure to manage aggregate capacity, port density, and resiliency. The modern corporation does not like to tolerate network outages resulting from saturated circuits, dead links, or failed equipment. The networking team’s usual design principle is to build it bigger: to provision as much bandwidth as possible, wherever possible, under the usually correct assumption that it will get used.

There’s a lot more that could be said here. Your organization might not line up with these “team” designations exactly. In larger shops, there tends to be a team dedicated to security, for example. In smaller shops, sometimes these designations blur, or the “team” consists of one person. Just go with me here.

My point is that there’s a problem with this model of building and supporting an IT infrastructure siloed into our cozy little teams like this: we don’t effectively communicate. We sit in our corners, we talk to our fellow team members, and from our siloed viewpoint, we focus on what we’ve identified as important. When other teams call on our team to provision a service, it’s almost a nuisance, and the request is therefore fulfilled in the vacuum of a contextless ticket. Issue opened, service provisioned, ticket closed, don’t ask questions. Just get it done.

This is broken. Very, very broken.

IT is not in the business of building networks. Or creating massive disk arrays. Or auditing firewall rules. Or spinning up VMs. Yes, we engineer-types love to do these things, but that’s not why we get a paycheck. From a corporation’s perspective, IT exists to facilitate business goals. The very best IT groups recognize this and are organized accordingly. Or should be.

The New Data Center

One of the recurring themes of this past Tech Field Day event – Networking Field Day 3 – that came up both in offline conversations and in vendor presentations was the notion of how difficult it is becoming to build the data center. This is due in large part to the advent of software switching and the fact that the network edge has been virtualized. It’s hard to nail down a moving target, and I’m not just referring to that guest being vMotioned somewhere. I also mean the plethora of new technologies being hurled at IT practitioners by vendors and (to a notably and disappointingly lesser degree) standards bodies.

  • Meshed fabrics have taken the stage, with offerings from Brocade, Juniper, Cisco, and others. Here’s the problem I face as a network designer. If I don’t know the applications and traffic patterns, how can I properly provision such a service? How big should I make the mesh? What do the application tiers look like, which tiers talk to which other tiers, and in what way? Where will physical hardware be racked, and what will the consequent cabling topology look like?
  • Converged storage continues to be a hot topic. FCoE is still getting press, if not much of an install base. NFS and iSCSI continue in unabated popularity. Data center Ethernet has become a chocolate peanut butter cup, carrying storage and application traffic on the same links. The challenge here is to make sure that storage is prioritized appropriately. Ideally, the Ethernet should not be a bottleneck for storage. But how can network designers create the network architecture appropriately when their knowledge of the storage infrastructure is limited to a ticket informing them that a gargantuan new storage array has been purchased, and would they kindly please light up some Ethernet ports so that the storage team can make it go?
  • Much of the new data center is made up of fiber optics. Optics are expensive. They need not be so pricey; I heard from an insider this week that the cost to manufacture a 10GBase-SR optic is about $50-60. But that’s one of the items vendors rake end users over the coals for. The point is that the cost per port for 10GbE is still far above that of 1GbE copper, and parity is not coming anytime soon. Therefore, it becomes fiscally irresponsible for a network designer to just throw as many optics into a purchase as possible and hope for the best. Instead, a designer must know what is needed, and be able to justify that expenditure specifically (a rough sizing sketch follows this list). That’s a departure from the old mantra of, “Just buy the 48 port switch. They don’t cost that much more, and we might be sorry if we only buy the 24 port model.”
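
To make that concrete, here’s a minimal back-of-the-napkin sketch in Python. Every server count, traffic estimate, and price in it is an assumption I’ve invented for illustration; the point is that none of these inputs live inside the networking silo. Host counts come from the server team, per-host traffic from the application and virtualization folks, storage throughput from the storage team, and only with all of that in hand can a network designer size the fabric and defend the optic line item.

    import math

    # Back-of-the-napkin fabric sizing and optic budget. Every input below is
    # an assumption for illustration; the real numbers have to come from the
    # server, virtualization, and storage teams.
    SERVERS = 240                    # physical hosts the server team plans to rack
    NICS_PER_SERVER = 2              # redundant 10GbE ports per host
    LEAF_ACCESS_PORTS = 48           # access ports per leaf switch
    UPLINKS_PER_LEAF = 4             # 10GbE uplinks from each leaf to the spine
    LINK_SPEED_GBPS = 10

    # Per-host traffic estimates (Gbps) only the app and storage teams can supply.
    APP_TRAFFIC_PER_HOST = 1.5       # east-west, tier-to-tier application traffic
    STORAGE_TRAFFIC_PER_HOST = 2.0   # NFS/iSCSI traffic that must not be starved

    leaves = math.ceil(SERVERS * NICS_PER_SERVER / LEAF_ACCESS_PORTS)
    access_ports = SERVERS * NICS_PER_SERVER
    uplinks = leaves * UPLINKS_PER_LEAF

    # Worst case: every host on a fully populated leaf pushes its full estimate
    # toward the spine at the same time.
    hosts_per_leaf = LEAF_ACCESS_PORTS // NICS_PER_SERVER
    offered_gbps = hosts_per_leaf * (APP_TRAFFIC_PER_HOST + STORAGE_TRAFFIC_PER_HOST)
    uplink_gbps = UPLINKS_PER_LEAF * LINK_SPEED_GBPS
    oversubscription = offered_gbps / uplink_gbps

    # Optic spend at an assumed street price per 10GBase-SR optic -- a long way
    # from the ~$50-60 it reportedly costs to manufacture one.
    OPTIC_STREET_PRICE = 400
    optics = access_ports + uplinks * 2   # one per access port, two per uplink
    print(f"{leaves} leaves, {access_ports} access ports, {uplinks} uplinks")
    print(f"Worst-case leaf oversubscription: {oversubscription:.1f}:1")
    print(f"Optic budget: ${optics * OPTIC_STREET_PRICE:,}")

Change any one of those inputs – say, the storage team doubles its iSCSI estimate – and the leaf count, oversubscription ratio, and optic budget all move with it. That’s exactly the context a “please light up some ports” ticket never carries.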

Those are just a few examples of the larger issue: to effectively deploy a modern data center infrastructure, silos must be broken down. IT has to work as a holistic team. A single entity. A communicating, connected, comprehensive group.

How We Fix It

Unifying IT is a tough nut to crack, and there are at least three mindsets that must be smashed to break through the shell.

  1. Managerial isolation. Managers can be very territorial. These are my people. This is the technology I’m responsible for. If you want to talk to my people, you’ll go through me. If my people want to talk to you, they’ll go through me. I WILL be CC’ed on every e-mail you send outside of the group. Why didn’t you invite me to that meeting?!? This attitude is the ultimate silo-maker. The manager who operates like this doesn’t care about the business as much as he cares about creating an aura of worthiness around himself. This manager needs to be needed, and therefore gets into the middle of as much as possible. Controlling conversations and manipulating outcomes to his perceived advantage is the way he functions. This person is only going to cause harm to the overall IT team, and will only be able to blame-shift the inevitable project failures for so long before everyone else can see that the emperor has no clothes.
  2. Engineering apathy. Fellow engineers should be curious about what the other teams are doing, but often simply don’t care. Or worse yet, they actively don’t want to know. I sympathize with this viewpoint, because frankly, I’ve got enough on my plate most of the time. I don’t want to have to worry about what the other groups are working on. But the fact of the matter is, we should all know what each other is working on, because everything we’re doing affects everyone else. If we were smart, we’d coordinate our projects tightly, contextualizing them within specific business goals. Most organizations aren’t doing this.
  3. Engineering isolation. Like the isolating manager, an engineer can isolate by keeping other teams purposefully in the dark about what they are doing, how it’s done, and why. They don’t want anyone else getting up in their face about anything, so they figure the best way to be left alone is to keep everyone else at least a cubicle away.

Smashing these mindsets is something that comes from the top. The CIO, global IT manager, or other person who’s near the top of the IT management structure needs to set policy and precedent in the team to get everyone working together.

  • Let’s start by breaking up the management paradigm. Technical leads? Yes. Pure managers? No. Streamlining IT needs to happen, and managers have an unfortunate tendency to get in the way. While that means more “human resources” work will be spread among fewer managers, the reality is that in IT, a lot of highly competent engineers get forced into (or worse, think they want) a management role where their impact on the organization is reduced by half or worse. Promotion to management is not some sort of reward, although it’s true that many people take pride in being a manager. Competent IT people don’t need managers to hold their hand every day. They mostly just need priorities and direction set for them. That said, not every IT person is especially motivated, competent, or concerned. They just want to show up late, click the mouse, take a lot of smoke breaks, and go home at 5pm. Fine. Identify your rock stars, make them technical leads, and pay them accordingly. Let your rock stars do the work and your mouse-clickers do the tickets. At the same time, don’t overlook the up-and-comers.
  • Your rock stars should all be on the same team, and should all have a pretty good idea what the others are doing or working on at any given time. Call this the architecture team, or the design group if you like. But just make sure there are actual engineers involved. Some organizations have an architecture team, but make the horrible mistake of staffing that team purely with architects who haven’t implemented anything in years. Whiteboard engineering is not sufficient in most organizational contexts. The IT people setting the direction for a data center build need to also be the people who do actual work. Why? Because the devil is in the details. For example, it might seem architecturally sound to standardize on vendor C’s firewalls, until you find that vendor C has a screwy way of implementing IPsec that makes peering with non-vendor C firewalls failure-prone. You don’t know that unless you’re in the trenches, dealing with those issues. The best engineers are also good architects, and the best architects are also good engineers.
  • Cross-train the people who can impact the business. IT people have their specialties, to be sure. I’m a hardcore network specialist. I know about Ethernet, IP, load balancers, firewalls, and so on. I know my networks well enough that I can answer most questions without looking anything up, and I instinctively know what the root cause of an issue is without checking any stats. That’s, in part, the value I bring to a business. There’s a virtualization guy I work with who’s the same way for his areas of expertise. When we look stuff up, it’s just to confirm what we are pretty sure about already. That said, if all I know is networking, or if all my buddy knows is VMware, or all the storage guy knows is disk, then the business is not being as well-served as it could be. When you train the network folks on virtualization technologies, we have a much better idea of how traffic moves around inside a virtualization host, which has an impact on security design, network resilience, SPOFs, and potential bottlenecks. If you train the virtualization team on storage, they have a much better idea of what sort of configurations should be considered to best leverage that fancy new array weighing down the raised floor. Etc. The issue here is that all of these technologies directly impact each other, so having everyone design “their piece” of the data center from inside their silo is a non-starter. There is no “their piece”. A data center is a wholly integrated system, with no room for territorialism.

What We Get As A Result

When you put the right folks together on the same team, you end up with the right technology being implemented for the right reasons, helping the business succeed. Not only that, the equipment will be installed and configured as well as it can be, and it will be maintained by a team of people who understand how everything works together as a system. We’ll stop staring at the screws that hold the racks together, arguing about how much torque to apply, and start seeing the data center as an information engine that makes the business go. An engine we built. Together.

 

Ethan Banks
Ethan Banks, CCIE #20655, has been managing networks for higher ed, government, financials and high tech since 1995. Ethan co-hosts the Packet Pushers Podcast, which has seen over 2M downloads and reaches over 10K listeners. With whatever time is left, Ethan writes for fun & profit, studies for certifications, and enjoys science fiction. @ecbanks
  • Matt Ward

    Awesome article, I could not agree more that IT is in desperate need of a re-org. We are working on spreading some of our rock stars out so that they get into the silos and open them up a bit. And we’re definitely trying to become more cross-functional, not just WRT training but with big projects.

  • Darragh Delaney

    Very interesting article. I see a lot of ‘tasks’ being thrown from silo to silo in IT; once someone takes a look at something, it’s thrown to the next person and forgotten. A lot of organizations are centralising IT, which only makes the problem worse. Large data centres equal lots of silos for some places.

  • Matt Thompson

    You’ve outdone yourself here, Ethan. A concise overview of the way things sadly are, and how they could and should be.

  • Joel Knight

    Very well written and thoughtful, Ethan.

  • Bogi Aditya

    Nicely phrased, but I think a small IT department with a not-yet-established IT platform will still need teams in silos, just to accelerate the platform to a stable version.
