Automate, orchestrate, disable the CLI, fire your entire network team…
The message has been pretty universal over the past couple of years, whether you're catching up on your favorite blogs, listening to the latest podcast, or engaging with other engineers on social media. While no one will deny its importance or even its inevitability (sort of), why is the extent of automation found in most networks today minimal at best? Why do the majority of infrastructure teams still operate their networks the way they did 20 years ago? Scripting tedious tasks is a great start, but that is about as far as most network operators will go today to introduce automation and software principles into the management of their networks. Why?
The real answer is rather straightforward and won’t surprise anyone. Automating your network is hard. Accounting for all your different vendors and corner cases is hard. Hiring engineers that are able and willing to operate a network in this way is hard (and expensive). You get the point.
Traditionally, network vendors put little effort into making their products friendly to automation and scripting. At best, they provided you with access to their API, said "all yours," and marked a checkbox on their end. Over the past couple of years, however, vendors have finally started doing a lot of the integration themselves with various automation tools, helping to lower the barrier to entry for automating your network stack. Things have thankfully come to the point where, if vendor X doesn't have an Ansible module, they're at a huge disadvantage to their competitors. There are many reasons for this recent vendor rush to integrate, not the least of which is the bar that public cloud has set for infrastructure as code.
Fast forward to today and the topic at hand: How realistic is it in 2017 to fully manage your network infrastructure as code?
For the sake of this exercise, we want to avoid custom module development as much as possible, the kind of work that requires engineers or teams solely dedicated to maintaining that code. We also want to adopt many of the CI/CD principles that have proven successful in maintaining and deploying code in other parts of the infrastructure stack.
Day 0: Initial Configuration & Deployment
ZTP: Deploying new devices in a consistent and scalable manner will start with a Zero Touch Provisioning process. Most modern vendors already implement some version of ZTP today, commonly relying on DHCP options to point new devices to a ZTP/configuration server. For those not familiar, there are some good write-ups readily available covering both vendor-specific approaches and multi-vendor solutions. Since we'll be working with a multi-vendor environment, the logic and complexity of a vendor-specific ZTP approach will be kept to a minimum. Instead, the new device will pick up an IP address and just enough configuration to receive the rest from our centralized automation platform.
To start the provisioning process, we will connect our switch management interface to our infrastructure.
The interface will pull an IP address from our predefined DHCP range as well as the DHCP option pointing to a script file on our ZTP server.
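As a rough illustration, here is what that DHCP scope could look like with ISC dhcpd, using option 67 (bootfile-name) to hand the new device the URL of the bootstrap script. The subnet, address range, and ztp.example.com hostname are placeholders, and the exact option your platform honors may differ by vendor.

```
# dhcpd.conf (sketch): point newly booted devices at the ZTP script
subnet 10.0.100.0 netmask 255.255.255.0 {
  range 10.0.100.50 10.0.100.99;                                     # management DHCP pool
  option bootfile-name "http://ztp.example.com/initial_config.sh";   # DHCP option 67
}
```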
The “initial_config.sh” file will have two uses in our case. The first will be to copy our minimum viable config into flash. The second will be to upgrade our new switch to our standard version of code.
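A minimal sketch of what that script might contain is below. The flash and boot-config paths are Arista-flavored assumptions, and the server URL, config name, and image name are placeholders; other vendors will want different paths and commands.

```sh
#!/bin/bash
# initial_config.sh (sketch): executed by the device during ZTP
ZTP_SERVER="http://ztp.example.com"

# 1) Copy the minimum viable config into flash so the device comes up
#    reachable by our automation platform.
wget -O /mnt/flash/startup-config "${ZTP_SERVER}/configs/minimum-viable-config"

# 2) Pull down our standard image and point the boot loader at it
#    (file names and boot-config handling vary by vendor).
wget -O /mnt/flash/EOS-standard.swi "${ZTP_SERVER}/images/EOS-standard.swi"
echo "SWI=flash:/EOS-standard.swi" > /mnt/flash/boot-config
```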
Once the switch is ready to go and reachable over the network, we are ready to let our automation platform push the rest of the device configuration.
Automation Platform: To interact with our devices, we're going to use Ansible. Its agentless nature and ease of use compared to other tools have greatly helped Ansible gain momentum in terms of adoption by network vendors and operators alike. Even though you can use the platform to write very simple playbooks that execute command X on device Y, the ability to manage entire multi-vendor configurations at scale is the real draw for our exercise. We want to be able to specify human-readable, vendor-independent inputs to configure our entire fleet in a uniform way. Within Ansible, we're going to lean heavily on Jinja2 templates to help us accomplish that task and truly treat our network infrastructure as code.
To help us visualize the concept, let’s define some common parameters that all of our devices will be configured with, regardless of vendor.
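Something along these lines, for example. The values and file name are illustrative; the sFlow example later in this section assumes a collector and port defined here.

```yaml
---
# group_vars/all.yaml (example values): vendor-independent inputs for every device
ntp_servers:
  - 10.0.10.1
  - 10.0.10.2
name_servers:
  - 10.0.20.1
syslog_servers:
  - 10.0.30.1
sflow:
  collectors:
    - 10.0.40.1
  port: 1234
```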
We want to be able to read through this file and execute the necessary vendor specific commands on the back end to apply them to the configuration. This is where our use of Jinja2 templates comes in.
With the help of a for-loop, we're able to iterate through the inputs file and generate the “sflow destination w.x.y.z 1234” command for our EOS devices. If we want to account for another vendor, our input file stays identical but our Jinja2 template changes.
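A sketch of such a template for EOS might look like the following. The templates/eos.j2 file name is an assumption (it matches the directory layout shown later), and the variables come from the group_vars example above.

```jinja
{# templates/eos.j2 (sketch): EOS rendering of the vendor-neutral inputs #}
{% for server in ntp_servers %}
ntp server {{ server }}
{% endfor %}
{% for collector in sflow.collectors %}
sflow destination {{ collector }} {{ sflow.port }}
{% endfor %}
sflow run
```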
Device-specific configurations, like interface IP addresses, will be handled in an identical way.
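For instance, a per-device variables file and the matching template fragment could look roughly like this; the hostname, interface names, and addresses are made up for illustration.

```yaml
---
# host_vars/leaf1.example.com.yaml (example): per-device inputs
interfaces:
  - name: Ethernet1
    description: uplink-to-spine1
    ipv4: 10.1.1.1/31
  - name: Ethernet2
    description: uplink-to-spine2
    ipv4: 10.1.2.1/31
```

```jinja
{# interface section of templates/eos.j2 (sketch) #}
{% for intf in interfaces %}
interface {{ intf.name }}
   description {{ intf.description }}
   no switchport
   ip address {{ intf.ipv4 }}
{% endfor %}
```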
When using these types of template configs, the Ansible directory structure will need to be organized in a specific way.
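Laid out on disk, the project could look something like this; the file names match the components described next, and the junos.j2 template is just a stand-in for a second vendor.

```
.
├── prod          # production inventory
├── prod.yaml     # production playbook
├── dev           # development inventory
├── dev.yaml      # development playbook
├── group_vars/
│   └── all.yaml  # vendor-independent, fleet-wide inputs
├── host_vars/
│   └── leaf1.example.com.yaml  # per-device inputs (interface IPs, ...)
└── templates/
    ├── eos.j2    # EOS rendering of the inputs
    └── junos.j2  # second-vendor rendering (placeholder)
```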
Our Jinja2 templates will live in the templates directory. The common variables that all devices will be configured with are going to live in the group_vars/all.yaml file, while device-specific attributes like interface IP addresses will reside in the host_vars directory. Bringing it all together, we will add our production network device inventory into the prod file and tie all of our production components together via our playbook, prod.yaml. To keep the management of our production and development environments separate, our development inventory and playbook will be dev and dev.yaml (snippet below).
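A minimal version of that development inventory and playbook might look like the following. The group name, credential variables, connection settings, and use of the eos_config module are assumptions, and the exact pattern will vary with your Ansible version and vendor modules.

```
# dev (inventory)
[dev_eos]
leaf1.example.com
```

```yaml
---
# dev.yaml (sketch): render the template locally, then push the result.
# Credential handling is simplified; the configs/ directory is assumed to exist.
- name: Configure development EOS devices
  hosts: dev_eos
  connection: local
  gather_facts: false
  tasks:
    - name: Render the device configuration from the Jinja2 template
      template:
        src: templates/eos.j2
        dest: "configs/{{ inventory_hostname }}.conf"

    - name: Push the rendered configuration to the device
      eos_config:
        src: "configs/{{ inventory_hostname }}.conf"
        provider:
          host: "{{ inventory_hostname }}"
          username: "{{ ansible_user }}"
          password: "{{ ansible_password }}"
          transport: cli
```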
Finally, let's bring all of these pieces together to configure the new Arista switch that was previously brought onto the network via ZTP, by running our dev playbook. For simplicity's sake, before we involve our other automation components, let's push this initial config out to the switch natively via Ansible.
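Kicking it off is then a single command, assuming the dev inventory and playbook names used above; the --limit flag simply scopes the run to our one new switch.

```
$ ansible-playbook -i dev dev.yaml --limit leaf1.example.com
```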
To recap, in Part 1 we have demonstrated what a CLI-less provisioning and initial configuration process would look like. Building on this foundation, tune into Part 2 to see how we’ll handle daily operations, CI/CD and change management.