The following is a transcript of the audio recording you can listen to in the player above.
Welcome to Briefings In Brief, an audio digest of IT news and information from the Packet Pushers, including vendor briefings, industry research, and commentary. I’m Ethan Banks, it’s December 6, 2018, and cloud visibility is on my mind.
Application Architecture Complexity
Imagine a complex application. There are multiple parts to it. A web farm behind a load balancer on the front end. A firewall or two. Probably some database calls. And then the stuff we tend to forget about like authentication and domain name services. Okay, you’re with me so far.
Now let’s make this more complex by splitting the web app into elastic microservices living in the public cloud. At least, part of the app lives in the public cloud, because an AWS bill shows up every month. Part of the app also lives on-premises. You think. Which is the problem. It’s actually getting hard to tell what is going on with this app, as the developers aren’t always in lockstep with the operations team about what they’ve deployed where, and the architecture team just points you to a reference document…that is full of lies.
You shake your head that no one seems to know what’s going on. Business as usual. And then on a fateful day, the help desk tickets start piling up. The app performance has gone down the toilet, much like your hopes for a lunch outside the office, and no one seems to know why. Must be the network. Or the cloud. Or that Kubernetes thing. Or something.
Data Visualization With Kentik
What’s an infrastructure engineer to do? You need visibility. A few weeks ago, I had a briefing with Kentik. Their mission in life is to collect infrastructure data and help you gain meaningful insights from it. I’m not talking about stacks of RRD graphs that look cool while communicating almost nothing. Rather, Kentik shows how data relates to other data in an intuitive way that helps you make decisions or solve problems.
Let me give you an example. One of their core use cases has been helping service providers and Internet exchange points understand how data is flowing through their network. Who sent them this data? Where is this data going next? Oh, AS 12345 is sending us data for AS 54321, but it’s costing us a ton of money because it’s traversing our link to AS 31416. Maybe we should create a peering relationship with AS 54321 directly and stop running up our bill to AS 31416.
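To make the peering example concrete, here is a minimal sketch in Python of the kind of aggregation involved. The record layout (`src_as`, `dst_as`, `next_hop_as`, `bytes`), the function name, and the sample numbers are all assumptions for illustration; this is not Kentik's data model or API.

```python
from collections import defaultdict

def transit_bytes_by_destination(flows, transit_as):
    """Sum bytes per destination AS for flows that leave via transit_as."""
    totals = defaultdict(int)
    for flow in flows:
        # Skip traffic actually destined for the transit AS itself.
        if flow["next_hop_as"] == transit_as and flow["dst_as"] != transit_as:
            totals[flow["dst_as"]] += flow["bytes"]
    # Largest totals first: these are the candidates for direct peering.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

# Hypothetical flow records using the AS numbers from the example above.
flows = [
    {"src_as": 12345, "dst_as": 54321, "next_hop_as": 31416, "bytes": 9_000_000},
    {"src_as": 12345, "dst_as": 54321, "next_hop_as": 31416, "bytes": 4_000_000},
    {"src_as": 12345, "dst_as": 64500, "next_hop_as": 31416, "bytes": 1_000_000},
    {"src_as": 12345, "dst_as": 31416, "next_hop_as": 31416, "bytes": 2_000_000},
]

ranked = transit_bytes_by_destination(flows, transit_as=31416)
```

In this made-up data, AS 54321 tops the list, which is the signal that a direct peering relationship might be worth pricing out.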
That’s just one example. In the latest demo I’ve seen, Kentik has applied their visualization and analysis to cloud traffic, helping IT teams understand the flows that are happening between services that make up an application.
Kentik Cloud Visibility
Kentik works by ingesting data: massive amounts of NetFlow and other sorts of records from your network and endpoints. For cloud visibility use cases, Kentik is able to absorb AWS and GCP flow logs, with Azure support coming soon. Kubernetes for container orchestration and Istio for service mesh control are also data providers to Kentik, among many others. These are added to the host-level instrumentation and network device data Kentik has been able to gather since it came on the scene a few years back.
In the briefing I attended, Crystal Li, Senior Product Marketing Manager with Kentik, pointed out, “We consume the tag and label information which contains the information about your infrastructure, your service mapping, and your user information.”
Which is quite granular indeed. Once Kentik has ingested and analyzed the information, the results, alarms, and actions can be handed off to third-party providers as complex as ServiceNow or PagerDuty, or as simple as JSON you bring into a tool of your choosing.
Let’s bring this back to our opening hypothetical situation of a distributed app that we need to understand in order to effectively troubleshoot. In a live demo, which you can watch yourself by searching YouTube for “Kentik Cloud Native,” Kentik had several folks log into a reference storefront application orchestrated by Kubernetes, and then click around.
Kentik monitored GCP flow logs, available after you complete an integration between GCP and Kentik. Those flow logs provide a lot of information about what’s going on inside a Google VPC. Note that this is also available with AWS; the demo just happened to use GCP.
Kentik’s analysis resulted in an easy-to-comprehend display of useful data. For example, it showed zone-to-zone latency based on actual data, and not just synthetic transactions. Inbound and outbound traffic by project and many other contextualized data points were also shown.
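As a rough illustration of deriving zone-to-zone latency from real traffic rather than synthetic probes, the sketch below takes the median round-trip time per zone pair. The flattened record shape and the field names (`src_zone`, `dest_zone`, `rtt_msec`) are simplifying assumptions, and the values are invented; it is not how Kentik computes its display.

```python
from collections import defaultdict
from statistics import median

def zone_latency(records):
    """Median observed round-trip time (ms) per (source zone, destination zone)."""
    samples = defaultdict(list)
    for rec in records:
        rtt = rec.get("rtt_msec")
        if rtt is None:  # not every flow carries a latency measurement
            continue
        samples[(rec["src_zone"], rec["dest_zone"])].append(rtt)
    return {pair: median(values) for pair, values in samples.items()}

# Hypothetical, already-flattened flow log records.
records = [
    {"src_zone": "us-east1-b", "dest_zone": "us-west1-a", "rtt_msec": 62},
    {"src_zone": "us-east1-b", "dest_zone": "us-west1-a", "rtt_msec": 70},
    {"src_zone": "us-east1-b", "dest_zone": "us-west1-a", "rtt_msec": 66},
    {"src_zone": "us-east1-b", "dest_zone": "us-east1-c", "rtt_msec": 1},
    {"src_zone": "us-east1-b", "dest_zone": "us-east1-c"},  # no RTT reported
]

latency = zone_latency(records)
```

A median is used here only because it is less sensitive to a single slow flow than a mean; a percentile would be an equally reasonable choice.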
The demo further showed how to pinpoint which applications are responsible for costly inter-region traffic. This started simply, showing flows between Kubernetes clusters and which regions they were traversing. But then, by checking a few more boxes to add other elements to the graph, the visualization showed Kubernetes node names and the flows between each node across the regions.
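The idea of checking a few more boxes can be thought of as adding dimensions to a group-by. The sketch below is one hypothetical way to express that in Python; the field names, node names, and byte counts are invented for illustration and do not reflect Kentik's schema.

```python
from collections import Counter

def bytes_by(flows, dimensions):
    """Total bytes for inter-region flows, grouped by the chosen dimensions."""
    totals = Counter()
    for flow in flows:
        if flow["src_region"] == flow["dst_region"]:
            continue  # only cross-region traffic is of interest here
        key = tuple(flow[d] for d in dimensions)
        totals[key] += flow["bytes"]
    return totals.most_common()  # heaviest groups first

# Hypothetical flow records between Kubernetes nodes.
flows = [
    {"src_region": "us-east1", "dst_region": "europe-west1",
     "src_node": "node-a", "dst_node": "node-x", "bytes": 500},
    {"src_region": "us-east1", "dst_region": "europe-west1",
     "src_node": "node-b", "dst_node": "node-x", "bytes": 300},
    {"src_region": "us-east1", "dst_region": "us-east1",
     "src_node": "node-a", "dst_node": "node-b", "bytes": 900},
]

# Coarse view first, then the same data with node names added.
by_region = bytes_by(flows, ["src_region", "dst_region"])
by_node = bytes_by(flows, ["src_region", "dst_region", "src_node", "dst_node"])
```

The coarse view shows one inter-region total; adding the node dimensions splits that same total into the individual node-to-node flows behind it.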
Of course, you can see the data in real time, but the data points are stored in a time-series database, allowing for history. Want to know what happened three hours ago? You can find out.
Using the Kubernetes API, Kentik was also able to correlate pod IPs to pod names and cluster namespaces. With that information, Kentik can visualize pod-to-pod and service-to-service traffic flows within a Kubernetes cluster.
I, personally, had never before seen such a clear view of how traffic is moving between components in a complex, orchestrated microservices environment that even leveraged Istio, the service mesh control plane we discussed on the Datanauts podcast in episode 145. The visualizations would be helpful for comprehending how application stack components communicate during a transaction. They would also bring to light where slowdowns are occurring due to high latency.
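The correlation step amounts to a lookup from IP address to pod identity. Here is a minimal sketch, assuming the pod inventory has already been fetched and reduced to a list of dicts; the names, IPs, and record shapes are hypothetical, and this is not Kentik's implementation.

```python
def label_flows(flows, pods):
    """Attach namespace/pod names to flows by looking up each endpoint IP."""
    by_ip = {pod["ip"]: f'{pod["namespace"]}/{pod["name"]}' for pod in pods}
    labeled = []
    for flow in flows:
        labeled.append({
            # Fall back to the raw IP when no pod is known for it.
            "src": by_ip.get(flow["src_ip"], flow["src_ip"]),
            "dst": by_ip.get(flow["dst_ip"], flow["dst_ip"]),
            "bytes": flow["bytes"],
        })
    return labeled

# Hypothetical pod inventory and flow records.
pods = [
    {"namespace": "shop", "name": "frontend-abc", "ip": "10.0.1.5"},
    {"namespace": "shop", "name": "cart-def", "ip": "10.0.2.7"},
]
flows = [
    {"src_ip": "10.0.1.5", "dst_ip": "10.0.2.7", "bytes": 1200},
    {"src_ip": "10.0.2.7", "dst_ip": "203.0.113.9", "bytes": 80},
]

labeled = label_flows(flows, pods)
```

Because pods are ephemeral and their IPs get reused, a real correlation would also need to account for when each mapping was valid; this sketch ignores that.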
Complex Applications Need Correlated Telemetry
To be clear, Kentik doesn’t understand your application architecture. But what it does do is shine a bright light on virtualized, ephemeral, cloud infrastructure. If you were used to hopping on a node, firing up tcpdump, and then going on a treasure hunt to sort out a performance issue no one else could resolve…forget about that. I don’t see that methodology as viable going forward, with application stacks spreading out.
Effective troubleshooting in the modern era benefits from an all-seeing tool that correlates infrastructure telemetry from a variety of data providers in a way that’s actionable. Kentik is one such product.
The cloud native Kentik demonstration I focused on here is only one use case of the Kentik product. If you’re intrigued, search for Kentik on PacketPushers.net, and get a more complete sense of what they can do.
Cloud visibility is a topic that will keep coming up on the Packet Pushers podcast network, so I hope you’ll keep listening to Datanauts, Full Stack Journey, the Weekly show, and all the rest of our podcasts for IT engineers. Just search for Packet Pushers anywhere you listen to podcasts, including Spotify. And if you’d like to support us, become a member at ignition.packetpushers.net. We’d appreciate it.