Using CBQoS and NetFlow to Solve Network Traffic Problems

This sponsored blog post is written by Patrick Hubbard, SolarWinds Head Geek.

Admit it – you have a problem. A bandwidth problem. It’s probably worst on WAN links at about 10am Tuesday mornings, when VoIP MOS score monitors turn orange, tickets start coming in about slow Exchange access and an exec calls in to say his smartphone is slow. It’s random human behavior, intertwined and competing for a million slices of your expensive WAN bandwidth. When you close your eyes, you can almost hear the burst-billing spinning at big-telco.com. Fortunately, there is a great solution to bring order to even the most oversaturated links.

Sure, you could just ACL out all the YouTube traffic, but who likes sitting alone at lunch? A better approach is to shape your traffic using services that are already inside your network gear. CBQoS (Class Based Quality of Service) is a Cisco feature introduced in IOS 12.4(4)T, and of course NetFlow goes back to the Cisco dark ages: 1996. The trick is that you must use them together to regularly monitor and tweak your traffic policies.

Trust But Verify

During the Cold War, U.S. President Ronald Reagan made “trust but verify” a signature expression, translating it from a Russian proverb – doveryai, no proveryai (Доверяй, но проверяй). You shouldn’t necessarily assume that your carefully planned-out QoS strategy won’t work, but there are more ways to flub a config than possible addresses in IPv6. Only by regularly observing the actual traffic before and after you make QoS policy changes can you ensure they actually work. Throw in changes in user behavior and other engineers adding new traffic, and it’s even more important. You should maintain a healthy distrust of policies – yes, even your own.

The good news is that even though the data collection process is a little convoluted, configuring a QoS traffic policy is straightforward. The better news is that there are some great products to take the pain out of data collection, flow analysis, visualization and reporting. In this article, I’ll walk you through an overview of how to set up CBQoS in your environment. Including bandwidth monitoring systems, you should only need a couple of hours to get to a live Hello Packet policy.

Eyes Wide Open

If you aren’t analyzing your network traffic with a flow analyzer, get one! Interface bandwidth charts won’t cut it. Make sure you select a solution that supports CBQoS monitoring in addition to NetFlow and combines both into a single view. SolarWinds Network Traffic Analyzer is a user-favorite network traffic monitor, and you can download a free trial to learn about CBQoS and traffic shaping. Of course, the human mind is the most powerful analytic tool for sifting through complex data, and you’ll be amazed at the insight from just 48 hours of NetFlow history.

Once your analyzer is running and ready to receive data, configuring NetFlow export on a source router is pretty easy:

# Configure flow monitoring on an interface
interface {interface name}
ip route-cache flow
exit

# Configure the destination and other details for the export
ip flow-export version 5
ip flow-export destination {flow Receiver IP} {port}

Note: This overview assumes some familiarity with IOS, and I’m using shorthand to keep this article compact.

Each device can export NetFlow records to up to two destinations, each usually called a Receiver. A Receiver can process NetFlow data from multiple devices. Let your NetFlow analyzer crunch the data for a few minutes, and you can start identifying your biggest bandwidth consumers. You’ll instinctively begin considering different traffic policy scenarios.
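
Before trusting the charts, it can help to confirm on the router itself that flows are being cached and exported. The following is a minimal sketch, assuming the classic IOS NetFlow commands used in this article; the exact output varies by platform and release.

```
# Show the export version, the configured destination(s) and export counters
show ip flow export
# Show the flow cache, including a per-protocol summary and active flows
show ip cache flow
```

If the export counters stay at zero, check the Receiver IP and port before blaming the analyzer.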

Shape and Iterate

Although that sounds like reps and pain at the gym, it’s not. I have more luck solving new challenges with an iterative plan than “big bang” projects. Rather than begin with an uber-policy map carefully designed over many days, get something online quickly and plan to replace it with something better. Think v1.0, v2.0, v2.1. Even if you already know everything about applying traffic policy, you won’t know exactly how the network, or your users, will react. Expect to make adjustments after any major re-config.

The configuration for CBQoS is amazingly flexible and easy to wrap your head around. Configuration combines three elements:

  • Traffic Classes with the rules about which traffic to control,
  • Policy Maps that determine what to do with each Traffic Class and
  • Configuration to collect metrics.

Config is via IOS CLI. If you can untangle ACLs, you’ll have no trouble.

First, you need to enable CBQoS; otherwise, it won’t store the data or provide MIBs and SNMP access to track its effectiveness.

# Enable ifMIB persistence
snmp-server ifindex persist

# Enable CBQoS MIB index persistence
snmp mib persist cbqos

Next, define Traffic Classes. You can have as many as you want, and because you use Policy Maps to link them to interfaces, they’re reusable. For example, you might create a Traffic Class to capture streaming application traffic, but prioritize it differently depending on the subnet. Throw guest network Skype traffic in the bit bucket, while ensuring seamless telepresence for the boardroom. In general, Traffic Classes look something like this:

# Define a Traffic Class named “Bronze”
class-map match-any Bronze
# Set a match condition linked to ACL group 11
match access-group 11
exit
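
The Traffic Class above points at ACL 11 but the article does not define it. As a minimal sketch, a standard ACL that matches traffic from a single subnet could look like this. The 10.10.20.0/24 guest subnet is an assumption for illustration only, so substitute your own addressing.

```
# Standard ACL 11: match traffic sourced from a hypothetical guest subnet
access-list 11 permit 10.10.20.0 0.0.0.255
```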

Where it’s really flexible is the list of possible match types beyond just ACLs. (Here’s a small subset):

  • match protocol protocol-name: Quickly group traffic with standard protocol identifiers like ftp, https, ldap, rip, smtp, ssh, etc.
  • match packet length {length details}: Ever wanted to optimize or temper certain chatty apps? Use the Layer 3 IP header length data.
  • match source-address mac MAC: Handy to control devices regardless of IP.
  • match cos cos-number: Reuse predefined Layer 2 class of service (CoS) markings.

There are many others. You can also specify match not to exclude traffic that would otherwise be captured, and nest class maps for even greater reusability.
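
To make the exclusion and nesting ideas concrete, here is a rough sketch. The class names Streaming and StreamingNotGuest are hypothetical, the protocol choices are only examples, and match protocol assumes your platform supports NBAR.

```
# Traffic Class for streaming traffic, matched by protocol
class-map match-any Streaming
match protocol rtp
match protocol rtsp
exit

# Nested Traffic Class: everything in Streaming except traffic matching ACL 11
class-map match-all StreamingNotGuest
match class-map Streaming
match not access-group 11
exit
```

Because StreamingNotGuest reuses Streaming, adding a protocol to the inner class updates both.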

Define your Policy Map(s). These will be assigned to interfaces, so again they’re reusable. Here’s a basic example for our “Bronze” Traffic:

# Create the map
policy-map DropBronzeOnFloor
# Link the Bronze Traffic Class
class Bronze
# Set a police action to drop the traffic exceeding a certain rate
police 64000 16000 16000 conform-action transmit exceed-action drop
exit

What’s really powerful here is that you actually have two options: “police”, which drops traffic that exceeds the rate, and “shape”, which enqueues packets to smooth their delivery. (Obviously, inbound traffic can only be policed, not shaped, as interface time machines haven’t been invented yet.) Here are just a few of the action options:

  • police bps [burst-normal] [burst-max] conform-action action exceed-action action [violate-action action]: Applies a specific traffic police action.
  • bandwidth {bandwidth-kbps | percent percent}: Reserves a certain amount or percentage of the available bandwidth for the Traffic Class.
  • shape {average | peak} mean-rate [burst-size [excess-burst-size]]: Shapes traffic to a specific rate using queues.
  • priority percent percent: Great for streaming protocols like video and VoIP.

And again, the available policy action list contains many other options.
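
As a rough illustration of the non-police actions, the sketch below shapes the Bronze class instead of dropping its excess, then combines a priority queue with a bandwidth reservation. The policy names ShapeBronze and WanEdge, the Voice class (which you would need to define with its own class-map) and the percentages are all assumptions for illustration, not recommendations.

```
# Shape Bronze to an average of 64 kbps rather than dropping the excess
policy-map ShapeBronze
class Bronze
shape average 64000
exit

# Hypothetical edge policy: priority queue for Voice, guaranteed share for Bronze
policy-map WanEdge
class Voice
priority percent 20
class Bronze
bandwidth percent 10
exit
```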

Last, map your Policy Map to an interface to activate it:

# Select an interface
interface FastEthernet0/1
# Map the DropBronzeOnFloor Policy Map to this interface
service-policy output DropBronzeOnFloor
exit
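
In the spirit of trust but verify, it is worth checking on the router that the policy is attached and its classes are matching traffic. A minimal sketch, using the interface and policy names from the example above:

```
# Show the policy definition as the router understands it
show policy-map DropBronzeOnFloor
# Show per-class counters for the policy attached to this interface
show policy-map interface FastEthernet0/1
```

For a police action, the interface output typically includes conformed and exceeded counters, which should start climbing once Bronze traffic is flowing.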

For extra credit, it can be handy to create a dedicated SNMP view and community to make monitoring easier.

# Create a view named 'cbqos' limited to the CBQoS MIBs (first line) or covering all MIBs (second line)
snmp-server view cbqos 1.3.6.1.4.1.9.9.166.* included
snmp-server view cbqos 1.* included

# Create community with access to that view
snmp-server community iso view cbqos

Take A Break

You’re done with the hard part. Make sure you are collecting both NetFlow data and CBQoS class map reports, then…go to lunch. Don’t look at the data for an hour. Let your analytics system begin to trend. NetFlow can help you out of an acute problem like a virus flood, or Legal asking questions about a user’s activity, but with CBQoS I find it best to look at performance over time. You’re not interested in the latest wave of clicks on a new Gangnam Style parody video – you’re watching to see how much traffic YouTube represents throughout the day and week.

What you’re watching for is the difference between pre-map and post-map traffic. If you see the changes you expect, then you know your maps are working. If you see no change, or other previously throttled traffic jumps in to fill the void, then explore the new traffic hog. Here are some example maps:

[Figure: CBQoS Pre-Policy Class Map] Here we see the matched traffic before the policies are applied. Note the dark khaki Best Effort traffic class is almost 25% of the bandwidth.

[Figure: CBQoS Post-Policy Class Map] After the application of policy for this interface, a police action is discarding traffic: most of the Best Effort traffic is being dropped.

(screenshots from SolarWinds NTA)

If you’re lucky, you may be able to get away with revving to policy 1.5 and being done, but more likely you’ll find enough areas for optimization to throw away the first maps and start over.  Aggressive policy map pruning also keeps your config tidy and easier to maintain. 1.0 was probably more democratic and utopian, while 2.0 will be more pragmatic and include features like prioritizing all the BYOD traffic on the exec subnet.

Become The Guru

With the approach above and a couple of slick reports for management, you can also raise awareness of your contribution to your organization. Go even further with good documentation. (Yes, I’m using the “D” word.)

Document your CBQoS policies or at least the overall strategy, so that others can understand how the invisible hand is guiding your network’s behavior. For extra credit, overlay major QoS policy details on your network diagrams, or record them in a spreadsheet. As you begin to see the benefits of policy maps, you’ll make them more specific over time.

Three final suggestions:

  1. Make reviewing your QoS effectiveness a part of your weekly or even morning walk-around.
  2. Tweak early and often, and don’t be afraid to experiment.
  3. Last, include programmatic IP SLA monitoring and solicit regular, subjective user feedback.

It’s been reported that Reagan said “Trust but Verify” so many times that Gorbachev finally responded, “You repeat that at every meeting.”  It’s only a matter of time before my network team says that to me. Trust your QoS technology and policy maps.  They will make your life easier and can make you an IT superstar. Just remember to keep a NetFlow spy satellite overhead.

  • Michele Bergonzoni

    Very good and correct. You might want to take note of the show-stopper that I usually stumble upon: the WAN line is attached to a device which has a serious lack of netflow and/or CBQoS features, like:
    – A catalyst switch, e.g. 3400 (it does something, but not so much and it’s not so easy)

    – A device from your carrier, where you can do almost nothing, and your carrier has no interest in helping (or even letting) you make the best use of your bandwidth

    More complicated show-stoppers include:
    – IPsec traffic terminated on a different device, incapable of matching / pre-classify / mark
    – jumbo email from poor soul attaching big file to N recipients, and small important email from management lagging behind (I don’t think “match protocol smtp” has the ability to distinguish email addresses)
    This is to say that ability to manage bandwidth should be part of the requirements in the design phase. Having your carrier manage bandwidth for you can work, but usually doesn’t, for the reasons you pointed out.

  • http://twitter.com/marcgq Marc Edwards

    I can’t emphasize enough the importance of spec’ing the right size application server and database (including RAID calculations of writes) if sampling isn’t used. While Orion licensing for NetFlow is grand and marketing boasts of the capacity the product can handle, if the back-end infrastructure isn’t up to the task, it will be just one more device needing troubleshooting.

    • http://twitter.com/FerventGeek Patrick Hubbard

      Marc, great point. NetFlow can be a fire-hose, and submerge even big collection hardware if it’s not tuned. Fortunately you have lots of configuration options for SolarWinds and most other flow analyzers. These range from configuring NetFlow exports for sampling (more like sFlow) and discarding selected traffic at the collector to distributed collector architectures and tweaking the data summarization windows. (That last one is my preferred method to make the most of existing hardware, because I can burst-collect more detail while troubleshooting, then summarize earlier for day-to-day.)

      Is that something you’d be interested in seeing in a follow-up? For example best practices for getting the most out of your flow analysis software?

  • Alex Korobok

    Wouch, a Russian phrase! Thnx from Moscow))

  • dude

    QoS? Pfft. I have 10 gig wan links you insensitive clod.

  • Graeme Danielson

    …don’t forget the egressing interface FIFO transmit buffer. 
    cbQos software won’t kick in until the interface transmit buffer (aka tx-ring) overflows; this is the outbound interface congestion you are attempting to control.  Every phys interface type is different, on some the tx-ring can be quite large.   Get to know your interface hardware with “show controller …”. 
    If you need to make your cbQos do more work you may (carefully) adjust the tx ring smaller with “tx-ring-limit …”  And as the software is doing more work keep an eye on CPU utilisation.

  • Nosy Ferret

    PENSIONERS GET LOST IN FESTIVE SEASON. Police and Social Security have warned that pensioners can get lost in the lead-up to Christmas after the discovery of a 75-year-old Auburn woman dead in her apartment on Thursday.
    Mary Lin Fuan of Mission Beach Road had recently turned on NetFlow and QoS while her gas had been disconnected due to unpaid bills. She had been dead for ten days when found.
    Father John Bailey of Angels of Mercy said she was found reading graphs from her SolarWinds Traffic Analyser.
    “People can be seduced by pretty graphs and statistics into thinking they know what is going on in the network. The aged and the lonely are particularly susceptible,” he said.
    “At this time of year it is particularly important that everyone take a moment to talk to their neighbour and get to know them” said Mr Bailey.

  • Prakash

    I am working with CBQoS and I have two policies configured on the router, one for policing and the other for marking. Below is the config:

    policy-map POLICE
    class SIGNALING
    police 10000000 conform-action transmit exceed-action drop
    policy-map MARKING
    class signal
    set ip precedence 6
    set qos-group 6
    class voice
    set ip precedence 5
    set qos-group 5
    class oam
    set ip precedence 1
    set qos-group 1
    class misc
    set ip precedence 0
    set qos-group 0

    Now the MARKING policy is applied to multiple interfaces of the router in the IN direction.

    When we look at the CBQoS actions in the tool, we see that the interface is doing both policing and marking, but in the config we see that only marking is applied.

    After some troubleshooting, I saw that the CBQoS object index, which should be unique, is not. Below is the walk that shows it …

    The same index value is being used for multiple running instances of service-policy MARKING:

    ciscoMgmt.166.1.5.1.1.2.16.1 :
    Unsigned32: 17223898

    ciscoMgmt.166.1.5.1.1.2.16.65536 :
    Unsigned32: 351686093

    ciscoMgmt.166.1.5.1.1.2.16.65537 :
    Unsigned32: 621087690

    ciscoMgmt.166.1.5.1.1.2.16.65538 :
    Unsigned32: 1939966601

    ciscoMgmt.166.1.5.1.1.2.16.131072 :
    Unsigned32: 351551825

    ciscoMgmt.166.1.5.1.1.2.16.131073 :
    Unsigned32: 651568212

    ciscoMgmt.166.1.5.1.1.2.16.131074 :
    Unsigned32: 1893000480

    ciscoMgmt.166.1.5.1.1.2.16.196608 :
    Unsigned32: 309953102

    ciscoMgmt.166.1.5.1.1.2.16.196609 :
    Unsigned32: 622917584

    ciscoMgmt.166.1.5.1.1.2.16.196610 :
    Unsigned32: 1895345379

    ciscoMgmt.166.1.5.1.1.2.16.262144 :
    Unsigned32: 378168697

    ciscoMgmt.166.1.5.1.1.2.16.262145 :
    Unsigned32: 572510174

    ciscoMgmt.166.1.5.1.1.2.16.262146 :
    Unsigned32: 1895460567

    ciscoMgmt.166.1.5.1.1.2.16.327680 :
    Unsigned32: 1593

    ciscoMgmt.166.1.5.1.1.2.16.327681 :
    Unsigned32: 1594

    ciscoMgmt.166.1.5.1.1.2.32.1 :
    Unsigned32: 17223898

    ciscoMgmt.166.1.5.1.1.2.32.65536 :
    Unsigned32: 351686093

    ciscoMgmt.166.1.5.1.1.2.32.65537 :
    Unsigned32: 621087690

    ciscoMgmt.166.1.5.1.1.2.32.65538 :
    Unsigned32: 1939966601

    ciscoMgmt.166.1.5.1.1.2.32.131072 :
    Unsigned32: 351551825

    ciscoMgmt.166.1.5.1.1.2.32.131073 :
    Unsigned32: 651568212

    ciscoMgmt.166.1.5.1.1.2.32.131074 :
    Unsigned32: 1893000480

    ciscoMgmt.166.1.5.1.1.2.32.196608 :
    Unsigned32: 309953102

    ciscoMgmt.166.1.5.1.1.2.32.196609 :
    Unsigned32: 622917584

    ciscoMgmt.166.1.5.1.1.2.32.196610 :
    Unsigned32: 1895345379

    ciscoMgmt.166.1.5.1.1.2.32.262144 :
    Unsigned32: 378168697

    ciscoMgmt.166.1.5.1.1.2.32.262145 :
    Unsigned32: 572510174

    ciscoMgmt.166.1.5.1.1.2.32.262146 :
    Unsigned32: 1895460567

    ciscoMgmt.166.1.5.1.1.2.32.327680 :
    Unsigned32: 1593

    ciscoMgmt.166.1.5.1.1.2.32.327681 :
    Unsigned32: 1594

    Can you help me out with where I need to look further to resolve it?

  • Sudhish

    Does CBQoS work on Juniper firewalls?