In the first five parts of this series we covered all the steps necessary to distribute QoS and monitoring to a large backbone. I guess at this point I should mention that this technology has a name (and an acronym, of course). Cisco calls it QoS Policy Propagation via BGP (QPPB). I hope these blog posts have helped in showing all the necessary steps from beginning to end to make something like QPPB work.
When we left off in Part 5 we had developed all the steps to apply the QoS and monitoring, but we assumed that somehow, somewhere in the network we had tagged all our BGP prefixes with the appropriate community string. Let’s dive in and show this last step.
Somewhere in your network you learn prefixes from a peer. If that peer is an organization outside of your administrative control, the best protocol will probably be BGP. BGP has many more dials and knobs to turn for this kind of arrangement. You can limit the number of prefixes you learn and you have a lot of control over misbehaving peers with route dampening. Some good examples would be a connection to an extranet partner or the Internet. But it doesn’t need to be a different organizational unit (OU). You can do this with your own prefixes if you like.
I worked at an ISP at one point and, as you would expect, we learned customer routes via BGP. In this Cisco code I define what prefixes I will learn from the customer and mark them with a community string. Your organization can set up whatever designations you like. Let’s assume we have 65000:9 set aside for “Gold QoS” traffic.
ip prefix-list CUSTOMER_492_PREFIXES seq 5 permit 192.0.2.0/24
route-map CUSTOMER_492_IN permit 10
 match ip address prefix-list CUSTOMER_492_PREFIXES
 set community 65000:9 additive
The route-map syntax is very flexible. You could mark some prefixes from the customer with the Gold community string while others could be marked as Scavenger Class (or anything in between). Your traffic classes can be set up for accounting purposes instead of QoS purposes, as we have seen with the traffic_index parameter.
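To make the first-match behavior concrete, here is a rough Python model of a route-map with two entries. It is only a sketch of the idea, not how IOS implements it. The Gold value 65000:9 and the 192.0.2.0/24 prefix come from the example above; the Scavenger community 65000:1 and the 203.0.113.0/24 prefix are made up for illustration. A route-map ends with an implicit deny, which the final `return None` stands in for.

```python
import ipaddress

# Hypothetical policy table modelled on the route-map above.
# Each entry is (sequence number, permitted prefixes, community to add).
# 65000:1 ("Scavenger") and 203.0.113.0/24 are illustrative assumptions.
ROUTE_MAP = [
    (10, [ipaddress.ip_network("192.0.2.0/24")], "65000:9"),
    (20, [ipaddress.ip_network("203.0.113.0/24")], "65000:1"),
]


def apply_route_map(prefix, communities):
    """Return the updated community set, or None if the prefix is denied."""
    net = ipaddress.ip_network(prefix)
    for _seq, permitted, community in ROUTE_MAP:
        # Exact match only, like a prefix-list entry without ge/le.
        if net in permitted:
            # "additive" keeps whatever communities were already attached.
            return set(communities) | {community}
    # Nothing matched: this models the implicit deny at the end of a route-map.
    return None
```

Calling `apply_route_map("198.51.100.0/24", set())` returns `None` here, because that prefix is not listed in either entry.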
Apply Neighbor Policy
From here we apply the route-map to all the prefixes we receive from the eBGP neighbor. At the end there is an implicit DENY which will drop all prefixes not specifically defined.
router bgp 65000
 bgp maxas-limit 40
 neighbor 198.51.100.2 remote-as 65030
 neighbor 198.51.100.2 description < Customer 492 >
 neighbor 198.51.100.2 password zkiXg9KGiD
 neighbor 198.51.100.2 ttl-security hops 1
 neighbor 198.51.100.2 route-map CUSTOMER_492_IN in
 neighbor 198.51.100.2 maximum-prefix 8
I’ve added some safety mechanisms here that are always good to have in place.
Max AS Path Length
If you are prepending your path more than a dozen times, put down the console cable, step away from the router and ask your mom or other qualified network engineer for help. Here I have a super conservative 40-hop limit (legit paths should really be shorter). There have been some bugs and DoS attacks against BGP using super-long BGP paths. Cymru.com reports the longest AS path as of March 2014 (not counting prepending hops) is 12, so setting a max of 40 seems like a good figure.
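The length check itself is simple enough to sketch in a few lines of Python. This is a simplified model that counts every AS in the path, prepends included, and it assumes a path is accepted when its length is at or below the limit. The function name is my own.

```python
MAXAS_LIMIT = 40  # mirrors "bgp maxas-limit 40" in the config above


def accept_as_path(as_path, limit=MAXAS_LIMIT):
    """Accept a route only if its AS path has no more than `limit` entries."""
    return len(as_path) <= limit


# A customer prepending their own AS three times is fine.
normal_path = [65030, 65030, 65030]
# A path padded out to 41 entries is discarded.
absurd_path = [65030] * 41
```

With these values `accept_as_path(normal_path)` is true and `accept_as_path(absurd_path)` is false.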
The TTL-Security syntax is a fantastic improvement over the old default of relying on TTL=1. With the old behavior the legitimate remote router would set the TTL to 1 and you would receive it as 1. This essentially bounded the diameter of how far the BGP messages could travel in the network. Also, senders starting from a large initial TTL such as 255 would not reach your router with a TTL=1 (unless they were actually that many hops away!). If you are the operator and you had some reason to connect to an eBGP peer that was more than one hop away you could widen the diameter with the neighbor ebgp-multihop command, but otherwise you were forced to peer with routers that were only one hop away (typically a good practice).
But bad actors are another story. If they found out they were exactly nine hops away they could set their TTL=9. When it got to your router the TTL would equal 1. They have now defeated this “security” mechanism and you have to burn CPU cycles deciding that they don’t have the proper MD5 hash (or they do and you are now completely pwned).
The ttl-security command solves the problem. With this setup both legitimate BGP peers send their packets with the maximum TTL of 255. With hops 1 configured, your router accepts a packet only if its TTL is still 254 or higher (255 minus the hop count). The MD5 password will then be checked and all is good.
A remote attacker cannot defeat this new system. Nobody can start with a TTL higher than 255, and it will be decremented at every router the packet crosses. Unless they also control all the routers in the path, each router will continue to decrement the TTL as the packet travels to its target. Once the packet has crossed more routers than your configured hop count allows, it arrives below the threshold, immediately fails the TTL test and is discarded. Since the simple arithmetic of “greater than or equal to 254” is much easier than computing an MD5 hash, your router can quickly deal with any kind of brute force attack without burning a lot of CPU cycles.
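The arithmetic is easy to play with in Python. This sketch assumes the sender uses the maximum TTL of 255, as the Generalized TTL Security Mechanism (RFC 5082) describes, and that the receiver's threshold is 255 minus the configured hop count, which gives 254 for hops 1. The function names and the idea of counting "routers crossed" are my own simplifications.

```python
MAX_TTL = 255  # the largest value the 8-bit TTL field can hold


def arriving_ttl(initial_ttl, routers_crossed):
    """TTL seen by the receiver; each forwarding router decrements it by one.

    A result of 0 stands for a packet that expired in transit.
    """
    return max(initial_ttl - routers_crossed, 0)


def ttl_security_accepts(ttl, hop_count=1):
    """Model of the receive-side check: accept only TTL >= 255 - hop_count."""
    return ttl >= MAX_TTL - hop_count


# Old-style trick: an attacker who crosses 8 routers sends TTL=9 so the
# packet arrives with TTL=1 and looks like a directly connected neighbor.
spoofed = arriving_ttl(9, 8)

# With the TTL security check the same attacker cannot do better than
# starting at 255, which still arrives well below the threshold.
best_attack = arriving_ttl(MAX_TTL, 8)

# A directly connected peer crosses no routers, so its packet arrives at 255.
direct_peer = arriving_ttl(MAX_TTL, 0)
```

Here `spoofed` is 1, `best_attack` is 247 and `direct_peer` is 255, so only the direct peer passes `ttl_security_accepts`.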
Moral of the story: use TTL Security on every single platform you have that supports it for every eBGP peer. There’s no downside except one line of code and lots of upside for your security posture.
You use an MD5 hash on all your BGP peers, right? Hopefully you are choosing a password with high entropy and not just “password.” In my code samples I always like using passwords that resemble what I would use in real life. Using “ABCD” in code samples just encourages bad behavior.
Max Number of Prefixes
I always specify the maximum-prefix syntax. It’s a safety valve in case something goes wrong. The real safety mechanism is the prefix-list but this serves as a nice check against human error. In this case, eight prefixes was about twice the number needed for the typical customer so I just use it everywhere.
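Here is a toy Python model of that safety valve. It assumes the default behavior where the session is torn down once the peer exceeds the limit. The class and its attribute names are hypothetical, and real routers offer extra options such as warning thresholds that this sketch ignores.

```python
class PeerSession:
    """Minimal model of a BGP session guarded by a prefix limit."""

    def __init__(self, max_prefixes):
        self.max_prefixes = max_prefixes
        self.prefixes = set()
        self.established = True

    def receive(self, prefix):
        """Record a prefix; tear the session down if the limit is exceeded."""
        if not self.established:
            return False
        self.prefixes.add(prefix)
        if len(self.prefixes) > self.max_prefixes:
            # Over the limit: drop the session and everything learned on it.
            self.established = False
            self.prefixes.clear()
            return False
        return True


session = PeerSession(max_prefixes=8)
# Eight distinct (made-up) prefixes fit within the limit.
results = [session.receive(f"10.0.{i}.0/24") for i in range(8)]
still_up_after_eight = session.established
# The ninth one trips the safety valve.
ninth = session.receive("10.0.8.0/24")
```

After the ninth prefix `session.established` is false, which is the whole point: a fat-fingered customer config takes down one session instead of polluting your table.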
Checking the AS Path?
What is missing is a check to make sure the peer is not advertising AS numbers in their path other than their own. I’ve read in many places that you should always check both the prefixes they are sending and the AS path they are sending. I don’t agree.
The prefix check is obvious and necessary. You don’t want your customer advertising a subnet that they do not own. If they announced IBM’s or Google’s or anyone’s prefix other than their own it would be a huge security threat and you would be complicit in a massive routing train-wreck. We cover ourselves with the prefix-list syntax.
But the AS path is different. An eBGP peer prepends its own AS number when it advertises to you, and Cisco IOS holds them to it: with bgp enforce-first-as (enabled by default), an update whose AS path does not begin with the AS number in your neighbor statement, 65030 here, is rejected. The customer cannot override this behavior. If they send their own AS number once, the path starts with 65030. If they prepend fifteen duplicates of their own AS number, the path still starts with 65030. If they try to lead with anyone else’s AS number, the update is dropped. They can never misidentify themselves (accidentally or maliciously) as anyone other than AS 65030.
What if they put a copy of Google’s AS number in the path they advertise, behind their own 65030? It will just make their AS path one hop longer and therefore less desirable. The data plane will always still route their packets to you first, and then to AS 65030. It will never route packets to Google because it will route them to you first. This might be a bizarre decision on their part, but it will not harm Google.
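The "one hop longer and therefore less desirable" point can be sketched in Python. This is a deliberately narrow model: it assumes every earlier tie-breaker (weight, local preference and so on) is equal, so only AS path length decides. AS 15169 is Google's AS number; AS 65040 is a made-up alternative path for illustration.

```python
def best_path(candidates):
    """Pick the shortest AS path, assuming all earlier tie-breakers are equal.

    On a tie in length, the first candidate in the list wins.
    """
    return min(candidates, key=len)


# The customer pads their advertisement with an extra AS number.
padded_via_customer = [65030, 15169]
# The same prefix heard over a hypothetical alternative path.
via_other_peer = [65040]

chosen = best_path([padded_via_customer, via_other_peer])
```

`chosen` ends up as the shorter `[65040]` path, so the extra AS number only hurts the customer's own reachability.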
Hey, maybe the information is actually correct. Maybe to get to prefix X it goes through the customer and then goes to some Google datacenter. Maybe the customer wanted to represent this logical topology with the AS path they are sending you. Trying to protect yourself from what you “think” is the proper internal structure of the customer’s network doesn’t always work out. Sometimes those customers try some crazy things! I’m interested to see if anyone disagrees, so feel free to leave a comment.
That’s it. We kind of poked around some BGP topics that had nothing to do with QPPB but we did end up marking the prefix with the community string of 65000:9. We used the additive keyword because we didn’t necessarily want to obliterate any of the preexisting community strings.
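If the additive keyword feels abstract, this small Python sketch shows the difference between replacing and adding. The function name is my own, and 65030:100 is a hypothetical community the customer might have attached before the route reached us.

```python
def set_community(existing, new, additive=False):
    """Model of "set community": replace by default, merge when additive."""
    if additive:
        return set(existing) | {new}
    return {new}


incoming = {"65030:100"}  # hypothetical community already on the route

replaced = set_community(incoming, "65000:9")
merged = set_community(incoming, "65000:9", additive=True)
```

`replaced` contains only 65000:9, while `merged` keeps 65030:100 alongside it.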
With the community string set we can now do all the fancy QoS and accounting that we discussed in the first five parts of this blog. It’s just as scalable as BGP itself so the QPPB policy can happen locally or across a backbone of thousands of routers. It would be tough to accomplish that kind of distributed QoS/accounting otherwise.
Hope you enjoyed this analysis of the qos-group, IP Precedence and traffic_index tags in CEF. If your organization needs any help implementing this kind of design please feel free to contact me. I’m most easily reached at [email protected]