I ran into an issue of unexpectedly high CPU utilization on a Cisco ASA firewall running 8.4.x family code; the CPU was running above 90%, when less than 25% was normal. The culprit was the “Dispatch Unit”; a little googling suggests that the ASA dispatch unit is the process through which the majority of packets flow for inspection, and I’m not sure what else it does. (I did a lot of poking to find something that officially described the dispatch unit, but I came up empty. That said, let’s assume that the dispatch unit is more or less the ASA “big brother” process, monitoring traffic flowing through the ASA.)
The next step was to determine what might be pumping so much data into the ASA that the CPU was aggravated. I knew that it wasn’t…
- Volume related. The Mbps rates were normal.
- Connection related. I do historical graphing of connections transiting the firewall, and they were at normal levels as well. I did poke a bit deeper at connections using “sh local | in host|count/limit” per a recommendation I found on a Cisco forum, but that didn’t find anything unusual. Just the mail servers getting flogged, per normal.
- NAT related. The NAT translation count was at normal levels.
- Encryption related. This firewall didn’t handle any VPN traffic.
The next stop was the log server. I happen to have access to Cisco Security Manager’s Event Viewer at this particular site, which gives me functionality similar to Check Point’s SmartView Tracker. Doing a real-time dump of firewall logs, CSM quickly revealed that an old and untended Linux FTP host in a DMZ was absolutely pounding a pair of internal DNS servers with lookups (permitted via pinhole), and at the same rate trying to connect to an external host on tcp/3303 (denied). I also saw some attempts to send mail via SMTP, tcp/25. As DNS lookups (udp/53) are very short-lived, these didn’t build up in the ASA connection table, even though they were coming at a rate of hundreds per second.
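If you don't have a log viewer that surfaces the noisy host for you, the same "who is generating all these events" question can be answered with a simple tally. The sketch below is only an illustration: it assumes the log lines have already been parsed into dictionaries with hypothetical `src`, `dst`, `service`, and `action` keys (this is not CSM's export format), and the addresses and counts are invented sample data.

```python
from collections import Counter


def top_talkers(records, n=3):
    """Count events per (src, dst, service, action) and return the n busiest flows."""
    counts = Counter(
        (r["src"], r["dst"], r["service"], r["action"]) for r in records
    )
    return counts.most_common(n)


# Invented, already-parsed records; field names and addresses are assumptions.
sample = (
    [{"src": "192.0.2.10", "dst": "10.0.0.53", "service": "udp/53", "action": "permit"}] * 400
    + [{"src": "192.0.2.10", "dst": "203.0.113.7", "service": "tcp/3303", "action": "deny"}] * 380
    + [{"src": "192.0.2.25", "dst": "10.0.0.25", "service": "tcp/25", "action": "permit"}] * 12
)

for flow, hits in top_talkers(sample):
    print(hits, *flow)
```

A single source dominating both the permitted and the denied flows, as in the sample, is the kind of pattern that stood out here.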
I was curious as to what tcp/3303 was, and don’t have a strong conclusion as yet. Based on googling, it seems plausible that tcp/3303 could be used for a command/control network via a chat protocol. Considering the behavior of this DMZ box, it seems a reasonable conclusion that the system was trying to connect to home base where it would receive further instructions from the botnet overlords. The furious DNS lookups were for hosts in advertising-related domain names. I didn’t spend much more time on the specifics.
I notified the appropriate parties about the badly behaving box. While waiting for a resolution, I traced down the switchport it was hanging off of using its MAC and switch bridging tables. (I found the MAC on the ASA using “show arp”. On the switches, I used “show mac address-table address”.) Since it was a physical host and not a VM, I shut the switchport down. The ASA CPU returned to normal in seconds.
Baseline and historical data are helpful. You get a sense of what’s normal now in comparison to what’s been normal in the past. Without history, you’re guessing whether the current state of affairs is normal or not. It’s very hard to catch an anomaly if you don’t know normality. Ask any couch sitting on the highly improbable starship, Heart of Gold.
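One way to turn history into something usable is to derive a per-device threshold from past CPU samples. The following is a minimal sketch, not what my NMS does: the function name, the "mean plus k standard deviations" rule, and the `k` and `floor` defaults are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def cpu_threshold(history, k=3.0, floor=5.0):
    """Return an alert threshold (percent) derived from past CPU samples.

    The threshold is the historical mean plus k standard deviations.
    `floor` is a minimum margin above the mean, so a very flat history
    does not produce a hair-trigger threshold. The result is capped at 100.
    """
    avg = mean(history)
    margin = max(k * pstdev(history), floor)
    return min(avg + margin, 100.0)


# Invented samples for a firewall that normally idles in the low 20s.
print(cpu_threshold([20, 22, 24, 22, 22]))
```

Because the threshold comes from each device's own history, firewalls with different baseline utilization get different thresholds without hand-tuning each one.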
You have to be alerted when there are anomalies, like when your CPU is about to catch on fire. I’m embarrassed to admit that this firewall running so hot didn’t send up a flare on my NMS. The CPU issue was discovered by accident; we weren’t experiencing issues of any kind, as the firewall was working fine and not dropping packets unexpectedly. I’ve since configured an NMS alert that is triggered when the ASA CPU is running hotter than normal for longer than 10 minutes. The alert first logs into the firewall and runs a script that pulls connection count, xlate count, cpu-hog, and other possibly interesting stats; the script then e-mails that information to me. The only reason it hadn’t been done before is that all of the ASAs I manage at this site run at different baseline CPU utilization rates, and so I hadn’t taken the time to custom craft all of the alerts. Sometimes you have to take the time, even when you don’t have it.
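The "hotter than normal for longer than 10 minutes" condition can be sketched as a check over timestamped CPU samples. This is only an illustration of the logic, not my NMS configuration: the function name, the `(timestamp, cpu_percent)` sample format, and the 60-second polling interval in the example are assumptions.

```python
def sustained_breach(samples, threshold, min_seconds=600):
    """Return the timestamp at which CPU has stayed above `threshold`
    for at least `min_seconds` without a break, or None if it never does.

    `samples` is an iterable of (unix_timestamp, cpu_percent) in time order.
    """
    breach_start = None
    for ts, cpu in samples:
        if cpu > threshold:
            if breach_start is None:
                breach_start = ts  # first sample of a new hot streak
            if ts - breach_start >= min_seconds:
                return ts
        else:
            breach_start = None  # any normal sample resets the streak
    return None


# Invented data: one sample per minute, CPU pinned at 95% with a 30% threshold.
hot = [(t, 95) for t in range(0, 901, 60)]
print(sustained_breach(hot, threshold=30))
```

Requiring the breach to be continuous keeps a short spike from paging anyone, while a pegged CPU like the one in this incident still trips the alert after ten minutes.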
Logging is very helpful. If you lack log detail or are missing logs completely, a real-time packet capture might also reveal the issue. The ASA can do packet capture from the CLI or ASDM. The ASDM interface is my favorite choice here; ASDM allows you to capture traffic and download it to your workstation as a PCAP, which you can then examine in Wireshark.
Remember to limit what DMZ hosts have access to outside your network as well as inside it. It’s a bad design to allow DMZ hosts to reach anything out on the public Internet. If the DMZ host is compromised, you are protecting your company’s business most effectively by making sure that DMZ host can only talk to what is absolutely necessary, whether inside your network or outside. In this case, the host couldn’t get to much at all, and as such it was unable to connect to the presumably malicious tcp/3303 or deliver mail. Therefore, the damage was contained.
I was disappointed not to find a Cisco ASA architecture document on cisco.com that explained ASA packet flows inside the box or ASA processes and their use. I even have the Cisco Firewalls book put out by Cisco Press, and as far as I could find, this topic is not addressed there. If there is such an architecture document somewhere, please let me know. Such a reference would be invaluable, and it’s possible I just didn’t ask the search engines the right questions when looking for it.