I ran into an issue today where my network management station (NMS) was unable to manage SW1 in the picture above. SW1 was up and running; I could SSH to the SW1 CLI from hosts other than the NMS. Communication between SW1 and the NMS was broken, but there were no other issues reported at the site.
Normally, communication between SW1 and the NMS flows via R1 and across the WAN. R1 is SW1’s default gateway (not default route, as SW1 did not have “ip routing” enabled). To troubleshoot, I logged into SW1 and did a “show ip route”, which revealed a single entry: 10.0.2.100 (the NMS) would be reached via the firewall, 10.0.1.254. Aha: SW1 sending his traffic to FW1 was the cause of the NMS communications failure. FW1 should only have been used in an emergency, to bring up a VPN tunnel if the WAN went down. With the WAN up, traffic sent via FW1 would die due to an asymmetric route.
The source of the host route on SW1 was a puzzle. How did the switch learn this entry? Did the firewall advertise it? Conceptually this was possible, as the firewall had a backup IPsec tunnel back to the site where 10.0.2.100 lived, but I also knew that the firewall did not have reverse route injection enabled. Even if it had, it would not have been advertising host routes. And beyond that, the switch wasn’t running a routing protocol through which it could hear such advertisements.
What about a proxy ARP from FW1? FW1 would only have sent a proxy ARP reply if an ARP request had been placed on the wire by the switch. Since SW1 had a default gateway, this shouldn’t have happened. Even if it had, FW1 doesn’t respond to ARP requests for remote IPsec networks. It doesn’t even respond very well to ARP requests for NAT addresses it’s responsible for.
So here’s another important detail that puts all the pieces together. The R1 WAN link went down briefly overnight, and so all of the OSPF routes R1 knew disappeared while the circuit was dead. SW1 sends syslog messages and SNMP traps to the NMS at 10.0.2.100. So while the WAN link was down, SW1 originated some traffic for the NMS. This traffic arrived at R1, since SW1’s default gateway is R1. R1, with a dead WAN circuit, no longer had a route covering 10.0.2.100. R1 did have a static default route of his own, however, to 10.0.1.254 (FW1). R1, rather than hairpin route all of SW1’s traffic over to FW1 (inefficient), sent SW1 an ICMP redirect message, stating in effect that “the best place to forward traffic for 10.0.2.100 is 10.0.1.254”. SW1 installed that redirect as a host route and kept using it after the WAN circuit came back up.
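To make R1’s decision concrete, here is a minimal Python sketch of the redirect logic as I understand it: a router sends a redirect when the packet would leave via the same interface it arrived on and the source sits on the same subnet as the next hop. This is a simplified model, not IOS’s actual implementation. The /24 mask, the interface names “lan” and “wan”, and the WAN next hop 192.0.2.1 are assumptions for illustration; only the 10.0.x.x host addresses come from the story above.

```python
import ipaddress

def lookup(routes, dst):
    """Longest-prefix match over (prefix, next_hop, egress_iface) tuples."""
    dst = ipaddress.ip_address(dst)
    matches = [r for r in routes if dst in ipaddress.ip_network(r[0])]
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen, default=None)

def redirect_gateway(routes, ingress_iface, ingress_net, src, dst):
    """Return the gateway a router would advertise in a redirect, or None."""
    route = lookup(routes, dst)
    if route is None:
        return None
    _, next_hop, egress_iface = route
    lan = ipaddress.ip_network(ingress_net)
    # Condition 1: the packet would be routed out the interface it came in on.
    same_iface = egress_iface == ingress_iface
    # Condition 2: the sender and the next hop share that interface's subnet.
    same_subnet = (ipaddress.ip_address(src) in lan
                   and ipaddress.ip_address(next_hop) in lan)
    return next_hop if same_iface and same_subnet else None

LAN = "10.0.1.0/24"                                   # assumed mask
DEFAULT_VIA_FW1 = ("0.0.0.0/0", "10.0.1.254", "lan")  # R1's static default
OSPF_VIA_WAN = ("10.0.2.0/24", "192.0.2.1", "wan")    # hypothetical OSPF route

# WAN up: the OSPF route wins, traffic leaves via the WAN, no redirect.
wan_up = redirect_gateway([DEFAULT_VIA_FW1, OSPF_VIA_WAN],
                          "lan", LAN, "10.0.1.10", "10.0.2.100")
# WAN down: only the default remains, pointing back onto the LAN at FW1.
wan_down = redirect_gateway([DEFAULT_VIA_FW1],
                            "lan", LAN, "10.0.1.10", "10.0.2.100")
print(wan_up, wan_down)
```

With the OSPF route present the function returns nothing to advertise; once only the static default is left, it returns 10.0.1.254, which matches what SW1 ended up with.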
The solution might seem obvious. Just clear the route on SW1. And indeed, that is the answer, but surprisingly “clear ip route 10.0.2.100” didn’t work. Neither did “clear ip route *”. If the switch hadn’t been an ocean away and supporting an office full of people, I might have tried a few other, riskier things (like enabling “ip routing”) to see what happened. Since avoiding risk was important to me, I dug around on cisco.com, and discovered the “ip redirects” command. I also found the “show ip redirects” command. And where there’s a “show”, there’s quite often a corresponding “clear”…and so it was that “clear ip redirects” cleared out that pesky bogus route, and bidirectional communication between SW1 and the NMS was restored.
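A toy Python model may help show why the entry stuck around. It assumes, as the behavior above suggests, that a non-routing host keeps redirect-learned gateways in their own cache, consults that cache before the default gateway, and only forgets an entry when the cache itself is cleared. The class and method names are hypothetical and say nothing about how IOS stores this internally.

```python
class NonRoutingHost:
    """Toy host with a default gateway and an ICMP redirect cache."""

    def __init__(self, default_gateway):
        self.default_gateway = default_gateway
        self.redirect_cache = {}  # destination IP -> gateway learned via redirect

    def receive_redirect(self, destination, gateway):
        # Remember the "better" gateway the router told us about.
        self.redirect_cache[destination] = gateway

    def next_hop(self, destination):
        # A cached redirect overrides the default gateway for that destination.
        return self.redirect_cache.get(destination, self.default_gateway)

    def clear_redirects(self):
        # Rough analogue of clearing the redirect cache on the switch.
        self.redirect_cache.clear()

sw1 = NonRoutingHost(default_gateway="10.0.1.1")      # R1
sw1.receive_redirect("10.0.2.100", "10.0.1.254")      # sent while the WAN was down
stuck = sw1.next_hop("10.0.2.100")                    # still FW1 after the WAN recovers
sw1.clear_redirects()
fixed = sw1.next_hop("10.0.2.100")                    # back to R1
print(stuck, fixed)
```

Nothing in this model ages the entry out when the WAN recovers, which is the essence of the problem: R1 had no reason to send a second message telling SW1 to come back.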
WAN circuits do break periodically, so how could I prevent this ICMP redirect message from breaking SW1-to-NMS communication in the future? I could disable ICMP redirects on R1’s LAN interface via “no ip redirects”. One would need to understand one’s network topology well to know whether this was a good idea, although generally speaking, it’s a security best practice to disable ICMP redirects as a DoS mitigation technique. I suppose I could also filter ICMP redirect messages inbound to SW1 from R1 using an ACL with an ACE like “deny icmp host 10.0.1.1 host 10.0.1.10 redirect”. Note that fiddling with “ip icmp redirects” (hosts vs. subnets) might be appropriate in general, but wouldn’t have prevented the problem in this case.
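For a sense of what that ACE would be matching on, here is a rough Python sketch that packs the fixed part of an ICMP redirect (type 5, a code, a checksum, and the gateway address, per RFC 792) and tests a packet against the same source, destination, and message type. The helper names are hypothetical, the checksum is left at zero for brevity, and I’m assuming the “redirect” keyword matches on ICMP type 5 regardless of code.

```python
import socket
import struct

ICMP_REDIRECT = 5   # ICMP type for Redirect (RFC 792)
REDIRECT_HOST = 1   # code 1: redirect datagrams for the host

def build_redirect_header(gateway):
    """Pack type, code, checksum (left as 0 here), and gateway address."""
    return struct.pack("!BBH4s", ICMP_REDIRECT, REDIRECT_HOST, 0,
                       socket.inet_aton(gateway))

def ace_denies(src, dst, icmp_bytes):
    """Mimic 'deny icmp host 10.0.1.1 host 10.0.1.10 redirect'."""
    icmp_type = icmp_bytes[0]
    return src == "10.0.1.1" and dst == "10.0.1.10" and icmp_type == ICMP_REDIRECT

redirect = build_redirect_header("10.0.1.254")
echo_request = struct.pack("!BBHHH", 8, 0, 0, 1, 1)  # type 8, for contrast

advertised_gateway = socket.inet_ntoa(redirect[4:8])
print(advertised_gateway)
print(ace_denies("10.0.1.1", "10.0.1.10", redirect))
print(ace_denies("10.0.1.1", "10.0.1.10", echo_request))
```

The redirect from R1 is dropped while other ICMP from R1, such as a ping, falls through to whatever entries follow in the ACL.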
More ways to prevent this in the future? Perhaps…I’m out of time at the moment to think of more, but chime in with comments.
cisco.com: Cisco Guide To Harden IOS Devices
cisco.com: When Are ICMP Redirects Sent?