Five Things About Cisco Nexus 5K Control Plane Policing (CoPP)

Let’s take a quick look at the control-plane policing services on the Cisco Nexus 5000 series. Almost all of these notes are my interpretation of the Cisco official documentation, supplemented by my experience in resolving a problem with poorly responding traceroute traffic on a Cisco Nexus 5596UP with the N55-M160L3-V2 routing engine running NX-OS 5.2(1)N1(1).

1. What is control plane policing & why is it needed?

Control plane policing (CoPP) classifies and then rate-limits traffic being sent to the CPU of a switch. The rate limits are enforced by policing, which will drop traffic that exceeds the defined rate. This protects the switch CPU by discarding excessive traffic destined for the control-plane. You care about this because the switch control-plane & CPU handle tasks like maintaining layer two topologies (spanning-tree, UDLD, etc.) and layer three adjacencies (OSPF, BGP, EIGRP, etc.). System access via SSH or HTTP & SNMP management traffic are also handled by the system CPU.

A pegged CPU means that the switch might not be able to forward transit traffic because it has lost touch with its neighbors, that the layer two topology begins to break down with unpredictable forwarding results, and that other forwarding maladies are probable. Depending on where the victimized switch is located in the overall network design, the net result could be rack or pod isolation, campus segmentation, or complete data center network transport failure.

A denial of service attack against a switch control-plane & CPU, whether caused purposefully (hacker, disgruntled employee) or accidentally (topology loop, aggressive SNMP walk), is mitigated by CoPP.

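Remember that CoPP has no bearing on ordinary traffic going THROUGH the switch (transit traffic). CoPP only polices traffic that ends up at the switch CPU: chiefly traffic sent TO an address owned by the switch itself, plus exception traffic such as packets whose TTL expires at the switch.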
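To make the policing idea in this section concrete, here is a minimal Python sketch of a generic single-rate token-bucket policer. It is an illustrative model only, not the N5K hardware implementation; the class name, the simulate() helper, and the packet sizes and rates used are all assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Policer:
    """Generic single-rate token-bucket policer (illustrative model only)."""
    cir_bps: float                  # committed information rate, bits per second
    bc_bytes: float                 # committed burst size in bytes (bucket depth)
    tokens: Optional[float] = None  # bytes currently available in the bucket
    last: float = 0.0               # timestamp of the previous packet, seconds

    def __post_init__(self):
        if self.tokens is None:
            self.tokens = self.bc_bytes  # start with a full bucket

    def offer(self, now: float, size_bytes: int) -> bool:
        """Return True to transmit (conform) or False to drop (violate)."""
        # Refill at the CIR for the elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.bc_bytes, self.tokens + elapsed * self.cir_bps / 8)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


def simulate(pps, seconds, size_bytes=100, cir_bps=64_000, bc_bytes=8_000):
    """Offer evenly spaced packets; return (conformed_bytes, violated_bytes)."""
    policer = Policer(cir_bps, bc_bytes)
    conformed = violated = 0
    for i in range(pps * seconds):
        if policer.offer(i / pps, size_bytes):
            conformed += size_bytes
        else:
            violated += size_bytes
    return conformed, violated


if __name__ == "__main__":
    print("40 kbps offered: ", simulate(pps=50, seconds=10))
    print("400 kbps offered:", simulate(pps=500, seconds=10))
```

Traffic offered below the committed rate never drains the bucket, so nothing is dropped. Traffic offered well above it uses up the burst allowance and is then held to roughly the committed rate, with the excess counted as violations.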

2. What sort of traffic makes up “control-plane” traffic, anyway?

Cisco documentation features an excellent table that explains each class of traffic policed by N5K CoPP. Just to get your mind churning, my brief (not exhaustive) list of traffic that counts as control-plane traffic when the switch itself has to handle it includes ARP, HSRP, VRRP, ICMP echo, STP, UDLD, LACP, CDP, LLDP, DHCP, BGP, EIGRP, OSPF, RIP, FCIP, IGMP, PIM and SNMP.

3. There are four built-in policies, each with a slightly different emphasis.

  • The default policy should be fine for most situations.
  • The scaled layer 2 policy is similar to the default policy, but has higher rate policers for IGMP and ISIS.
  • The scaled layer 3 policy is similar to the default policy, but has higher rate policers for IGMP, ISIS, ICMP echo, multicast misses (packets unable to be forwarded by hardware), and gleans (IP packets with unknown MACs that force the CPU to ARP).
  • Out of the box, the customizable CoPP policy is identical to the default policy, but can be customized by you. Regarding the other three policies, official Cisco documentation states, “You cannot modify this policy or the class maps associated with it. In addition, you cannot modify the class map configurations in this policy.”

4. There’s a CoPP policy applied by default.

As implied by the name, the CoPP policy called copp-system-policy-default is applied to the N5K control-plane by default under NX-OS. This is a departure from the Catalyst switching line that, to the best of my knowledge, does not apply any sort of CoPP policy by default. This is an important point to grasp, as you need to be aware that data sent to the control-plane of your switch will be processed by CoPP before reaching your switch CPU. Therefore, when troubleshooting a peculiar behavior of some control-plane related traffic or other, CoPP’s potential role should not be overlooked.

For example, I found that performing a traceroute through my N5Ks equipped with the L3 engine showed inconsistent and sometimes high response times (as much as ~80ms) under the default CoPP policy. A switch local to you typically responds to traceroutes with a time of 1ms or less, so the high number reported by traceroute was disconcerting. The root issue was that some TTL=0 traffic, as classified by the copp-system-class-excp-ttl class, was being dropped by the CoPP policer. These drops show up as violations in the output of “show policy-map interface control-plane”, an excerpt of which is shown below.

class-map copp-system-class-excp-ttl (match-any)
match protocol ttl
police cir 64 kbps , bc 3200000 bytes
conformed 37723254968 bytes; action: transmit
  violated 694952261 bytes;
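
As a rough way to put those counters in perspective, the following Python sketch pulls the conformed and violated byte counts out of the excerpt above and computes what share of the class's bytes was dropped. The function name and regular expressions are my own assumptions based only on the lines shown here; other NX-OS releases may format this output differently.

```python
import re

# Excerpt copied from the "show policy-map interface control-plane" output above.
EXCERPT = """\
class-map copp-system-class-excp-ttl (match-any)
match protocol ttl
police cir 64 kbps , bc 3200000 bytes
conformed 37723254968 bytes; action: transmit
  violated 694952261 bytes;
"""


def policer_counters(text):
    """Return (conformed_bytes, violated_bytes) for one class's output."""
    conformed = int(re.search(r"conformed\s+(\d+)\s+bytes", text).group(1))
    violated = int(re.search(r"violated\s+(\d+)\s+bytes", text).group(1))
    return conformed, violated


conformed, violated = policer_counters(EXCERPT)
share = violated / (conformed + violated)
print(f"violated share: {share:.2%}")
```

For the counters shown, roughly 1.8% of the bytes matching this class were dropped. The counters are cumulative, so when troubleshooting, the more telling sign is whether the violated figure keeps climbing between two readings.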

5. You can customize the CoPP policy, but be careful.

When editing a CoPP policy, consider that if you get it wrong, you are putting the switch at risk. If you set policing values too low, the risk is that legitimate control-plane traffic will be dropped, which could cause routing adjacencies to time out, among other strangeness. If you set policing values too high, the risk is that the CPU could become overwhelmed during a DoS attack and be unable to process legitimate control-plane traffic. Therefore, my recommendation is that you stay as close as possible to the policing rates found in the Cisco default policy, customizing only with judicious, conservative changes.

Cisco documentation states that the CoPP policy to edit is copp-system-policy-customized. I have not tried to edit the other three policies, as the documentation as quoted above states that you can’t…which doesn’t necessarily mean it’s impossible, but still, I didn’t try. I like to have default policies to fall back on, in case my changes are disastrous. So, to resolve the traceroute issue I cited above, I increased the rate of the copp-system-class-excp-ttl class policer inside the copp-system-policy-customized policy. Then I applied this policy to the control-plane. After this, I found that the N5K stopped registering violations for this class, and nearly always responded to traceroutes in the expected time of 1ms or less. The code I used follows below.

N5K-SWITCH# conf t
Enter configuration commands, one per line. End with CNTL/Z.
N5K-SWITCH(config)# policy-map type control-plane copp-system-policy-customized
N5K-SWITCH(config-pmap)# class copp-system-class-excp-ttl
N5K-SWITCH(config-pmap-c)# police cir 128 bc 6400000
N5K-SWITCH(config-pmap-c)# exit
N5K-SWITCH(config-pmap)# exit
N5K-SWITCH(config)# control-plane
N5K-SWITCH(config-cp)# service-policy input copp-system-policy-customized
N5K-SWITCH(config-cp)# end
N5K-SWITCH# clear copp statistics
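
As a quick sanity check on the policer values in the configuration above, this small Python sketch compares the default and customized settings. It assumes the customized values use the same units as the default shown in the earlier show output (kbps for cir, bytes for bc); the command as entered does not state units, so treat that as an assumption. The function name is hypothetical.

```python
def burst_seconds_at_cir(cir_kbps, bc_bytes):
    """Seconds of traffic at the committed rate that one full burst represents."""
    return bc_bytes * 8 / (cir_kbps * 1000)


# Default policer for copp-system-class-excp-ttl, from the show output.
default = burst_seconds_at_cir(cir_kbps=64, bc_bytes=3_200_000)

# Customized policer; units assumed to match the default (kbps, bytes).
custom = burst_seconds_at_cir(cir_kbps=128, bc_bytes=6_400_000)

print(f"default: {default:.0f} s of burst at CIR")
print(f"custom:  {custom:.0f} s of burst at CIR")
```

Under that assumption, the change doubles both the committed rate and the burst size, so the ratio between them is unchanged. That fits the advice to keep customizations conservative and proportional to the Cisco defaults.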

Links

For more detail about Cisco Nexus 5000 series CoPP, including more code examples, go to the official Cisco documentation.

Ethan Banks
Ethan Banks, CCIE #20655, has been managing networks for higher ed, government, financials and high tech since 1995. Ethan co-hosts the Packet Pushers Podcast, which has seen over 2M downloads and reaches over 10K listeners. With whatever time is left, Ethan writes for fun & profit, studies for certifications, and enjoys science fiction. @ecbanks
  • sfprairie

    This caught me off guard, too. My pings to our Nexus 7Ks were sometimes showing high response times, but nothing past them had any delays. Eventually figured out it was the CoPP.