This guest blog post is by Nick Kephart, Director of Product Marketing at ThousandEyes. We thank ThousandEyes for being a sponsor.
You rely on a lot of service providers: DNS resolution services, ISP transit, co-lo facilities, CDNs and SaaS providers, to name a few. It’s likely you have dozens upon dozens of providers, and you probably spend more time dealing with them than you’d like to admit.
Some of these service providers have always been around. Your data center needs transit. Unless you’re a huge operation, you’re likely to outsource your DNS to a highly available service.
But other third-party providers (think CDNs and IaaS) are part of a trend in which your applications and networks are increasingly composed of many externally sourced parts. The center of application delivery is shifting from your data centers to the Internet.
What’s driving these shifts? There are a few key trends:
- Network Costs: A move away from MPLS to direct Internet links
- App Architectures: Increasing use of IaaS, hosting providers, and external APIs
- App Delivery Models: Ongoing shift to SaaS applications, from sales to procurement to R&D
Troubleshooting With More Cooks In The Kitchen
With this increase in third-party providers, troubleshooting and escalations are getting more complex. As application delivery spans more network segments (SIP trunks, VPN providers, SaaS vendors, numerous hosting environments and each of your ISPs), isolating root causes takes longer and longer. Managing the open tickets alone will bury you.
Plus, there are surprising dependencies. One of our customers recently discovered that their backup ISP had subcontracted to their primary ISP! There are also flame wars. Countless operations teams have stories of their ISP throwing complaints right back at them.
Being able to efficiently handle interactions with service providers has become a must-have skill for the modern network engineer.
Getting Your Ticket To The Front Of The Line
You know the drill when it comes to working tickets. You’ll put effort into tickets that are the most critical, as well as those most easily solved. Sometimes these overlap, sometimes they don’t. Support engineers like tickets that they can close quickly; give them one.
It reminds me of a law enforcement concept from one of my favorite TV series, The Wire. The police department is constantly obsessed with the ‘clearance rate’, or the proportion of solved crimes. For network engineers, it isn’t much different; everyone wants an ultimately solvable ticket. No one wants the ‘John Doe.’
So how can you use this to your advantage to skip ahead in your service provider’s endless ticket queue? It’s not just about paying for a gold-plated service contract. Provide a precise root cause and you significantly raise the probability of a timely resolution.
Doing The Diligence
Getting tickets worked quickly starts with collecting detailed information to better inform diagnostics and troubleshooting. You’ll usually want to address a few key decision steps:
What’s affected? Having a clear inventory of which services and applications are degraded or unavailable makes the rest of the process immensely easier.
When did it happen? Narrow down a timeframe. This takes a good monitoring strategy that ties together the performance of seemingly disjointed apps.
Network or not? This step has been referred to as improving ‘Mean Time to Innocence.’ Isolating the domain or team that needs to work on the issue (it’s a database issue, I swear!) is half the battle.
Whose network? Quickly zoom into which network and segment is at fault. An ISP? Your wireless AP? Narrow the problem down to a protocol, device, interface or network link. You’ll want a monitoring solution in place to drive these insights.
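One way to keep these decision steps honest is to capture the answers in a small structure before opening a ticket. The following is only a minimal sketch in Python; the `IncidentReport` class, its field names, and the sample values are hypothetical and are not part of any particular ticketing or monitoring product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class IncidentReport:
    """Hypothetical container for the answers to the decision steps above."""
    affected_services: List[str]   # What's affected?
    started_at: str                # Timeframe start (any consistent format)
    ended_at: str                  # Timeframe end, or "ongoing"
    is_network_issue: bool         # Network or not?
    suspected_segment: str         # Whose network, and which part of it?
    evidence: List[str] = field(default_factory=list)

    def to_ticket_text(self) -> str:
        """Render the report as plain text suitable for a ticket body."""
        domain = "network" if self.is_network_issue else "non-network"
        lines = [
            "Affected: " + ", ".join(self.affected_services),
            f"Timeframe: {self.started_at} to {self.ended_at}",
            f"Domain: {domain}",
            f"Suspected segment: {self.suspected_segment}",
        ]
        lines += [f"Evidence: {item}" for item in self.evidence]
        return "\n".join(lines)


# Illustrative values only; none of these come from a real incident.
report = IncidentReport(
    affected_services=["checkout API", "CRM (SaaS)"],
    started_at="09:15 UTC",
    ended_at="ongoing",
    is_network_issue=True,
    suspected_segment="ISP A, link between edge router and first upstream hop",
    evidence=["latency to first upstream hop rose sharply at 09:15 UTC"],
)
print(report.to_ticket_text())
```

If one of the fields cannot be filled in, that is usually a sign that more data is needed before escalating.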
Leveling The Playing Field
You’re often sitting on a goldmine of operational information. Find ways to share your insights with service providers so that they can help you faster. This may take the form of some latency measurements or a traceroute. The more specific and incontrovertible, the better.
I’ve heard countless stories of an operations team approaching their CDN or hosting provider with data on the exact network interfaces that are degrading their application performance. In each case, the response was an order of magnitude faster than a ‘spray and pray’ approach of opening tickets and hoping they get some attention.
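As a rough illustration of turning a traceroute into something specific, the sketch below scans per-hop round-trip times and reports the first hop where latency jumps sharply. It assumes the hops have already been parsed into a list. The `Hop` class, the `first_latency_jump` function, the 50 ms threshold, and the sample addresses (taken from the ranges reserved for documentation) are all assumptions made for this example. Per-hop timings can also mislead, since routers may deprioritize replies to probes, so treat a jump as a lead only if it persists at later hops.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class Hop:
    """One traceroute hop; rtts_ms is empty if the hop did not respond."""
    index: int
    address: str
    rtts_ms: List[float]


def first_latency_jump(hops: List[Hop], threshold_ms: float = 50.0) -> Optional[Hop]:
    """Return the first hop whose median RTT exceeds the previous
    responding hop's median by more than threshold_ms, or None."""
    previous = None
    for hop in hops:
        if not hop.rtts_ms:
            continue  # skip hops that gave no reply
        current = median(hop.rtts_ms)
        if previous is not None and current - previous > threshold_ms:
            return hop
        previous = current
    return None


# Illustrative, made-up measurements.
hops = [
    Hop(1, "192.0.2.1", [1.1, 1.2, 1.4]),
    Hop(2, "198.51.100.7", [7.9, 8.0, 8.3]),
    Hop(3, "*", []),
    Hop(4, "203.0.113.9", [94.0, 95.0, 99.0]),
    Hop(5, "203.0.113.20", [96.0, 97.0, 98.0]),
]
suspect = first_latency_jump(hops)
if suspect is not None:
    print(f"Latency jumps at hop {suspect.index} ({suspect.address})")
```

Using the median of several probes per hop, instead of a single sample, keeps one slow reply from pointing at the wrong interface.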
Always save a copy of your forensics data for later. The same issue may resurface in a week. Or a dozen times before your next contract renewal. Having evidence that links multiple events gives you the leverage to demand better service levels, support timelines, or contract terms. Find a logging solution, data store or monitoring product that can maintain event data. Not keeping evidence will leave you with a wall of silence.
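Keeping that evidence does not require anything elaborate. Here is a minimal sketch using Python's built-in `sqlite3` module. The table layout, the function names, and the provider and segment labels are hypothetical; a real deployment would more likely rely on whatever logging or monitoring product is already in place.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small event store; ':memory:' is for demos only."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS events (
               observed_at TEXT NOT NULL,
               provider    TEXT NOT NULL,
               segment     TEXT NOT NULL,
               evidence    TEXT NOT NULL
           )"""
    )
    return conn


def record_event(conn, provider, segment, evidence, observed_at=None):
    """Store one observation, timestamped in UTC unless a time is given."""
    observed_at = observed_at or datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO events VALUES (?, ?, ?, ?)",
        (observed_at, provider, segment, evidence),
    )
    conn.commit()


def recurring_issues(conn, min_count: int = 2):
    """List (provider, segment, count) for segments seen at least min_count times."""
    rows = conn.execute(
        """SELECT provider, segment, COUNT(*)
           FROM events
           GROUP BY provider, segment
           HAVING COUNT(*) >= ?
           ORDER BY COUNT(*) DESC""",
        (min_count,),
    )
    return rows.fetchall()


# Illustrative usage with made-up provider names.
conn = open_store()
for note in ("packet loss", "packet loss again", "high latency"):
    record_event(conn, "ExampleCDN", "edge-pop-1", note)
record_event(conn, "ExampleISP", "peering-link-2", "brief outage")
print(recurring_issues(conn))
```

A query like `recurring_issues` is the kind of summary that links multiple events together when it is time to discuss service levels or contract terms.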
Squeezing Better Performance From Service Providers
It is possible to get better service from your providers. In many cases, they want to provide it to you. But given that everyone is overworked and under-resourced, take a shot at sharing detailed forensics to accelerate the process. The squeaky wheel gets the grease, yes, but it’ll get it faster if you provide the can of grease.