Slow Down, It’s In The Details: A Story of BGP Peering Trauma

My new job sucked a little time away from me at the start, and now I am getting back on board with some nice new blog posts.

From what I hear, the CCIE, much like any exam, is all about the details. Here’s a real-world example. After a few short discussions and quite a few more emails with our ISP and account managers, we wanted to establish BGP peering between sites. This is a private inter-site BGP peering leveraging the ISP’s internal network. Nothing overly complex, just a simple BGP peering. The long email chain, which all parties had, outlined both the ISP-assigned ASNs and our assigned ASNs.

In the outage window we attempted to peer and saw the error below.

BGP: 192.168.10.54 open failed: Connection refused by remote host, open active delayed 14931ms (35000ms max, 60% jitter)

Initially we suspected an ACL blocking TCP port 179 or something else along those lines, so we looked into it. After checking our end thoroughly and confirming our configs were fine, we turned back to the ISP. (I bet I am not the first to have to do this. ;) ) It turned out that the ISP had configured the wrong ASN on their router because they hadn’t read their own supplied config diagram. The moment they corrected their ASN, peering came up, and we had connectivity.
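If you want to rule out the "is port 179 even reachable" question quickly, a rough probe like the one below can help. This is only a sketch in Python, not something we ran during the outage window: the function name probe_tcp and the 3-second timeout are my own choices. A "refused" result usually means a TCP reset came back, so the far end is reachable but not accepting the session, whereas a silently dropped packet tends to show up as a timeout.

```python
import errno
import socket


def probe_tcp(host: str, port: int = 179, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt (hypothetical helper, not a Cisco tool).

    Returns "open", "refused", "timeout", or "error: <errno name>".
    """
    try:
        # create_connection performs the full TCP handshake, then we close.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The remote side answered with a reset: reachable, but not accepting.
        return "refused"
    except socket.timeout:
        # Nothing came back in time: often a silent drop somewhere on the path.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. EHOSTUNREACH or ENETUNREACH.
        return f"error: {errno.errorcode.get(exc.errno, exc.errno)}"
```

Calling probe_tcp("192.168.10.54") would test the neighbor from the error above. Keep in mind that a router generally only accepts BGP sessions from addresses it has configured as neighbors, so a probe from any other source address is likely to be refused no matter what, and the result should be read with that in mind.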

Other reasons BGP peering can fail are listed below (taken from a Cisco document):

  • The neighbor statement is incorrect.
  • No routes to the neighbor address exist, or the default route (0.0.0.0/0) is being used to reach the peer.
  • The update-source command is missing under BGP.
  • A typing error results in the wrong IP address in the neighbor statement or the wrong autonomous system number.
  • Unicast is broken due to one of the following reasons:
    – Wrong virtual circuit (VC) mapping in an Asynchronous Transfer Mode (ATM) or Frame Relay environment in a highly redundant network.
    – Access list is blocking the unicast or TCP packet.
    – Network Address Translation (NAT) is running on the router and is translating the unicast packet.
    – Layer 2 is down.
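The wrong-ASN and wrong-IP items in that list are the kind of thing you can catch before the window by comparing what each side intends to configure. Below is a minimal Python sketch of that comparison. The names PeeringParams, mirror and mismatches are hypothetical, and the ASNs and the local address in the usage note are made-up private values, not the ones from our email chain.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PeeringParams:
    """One side's view of a BGP session (hypothetical structure)."""
    local_asn: int
    remote_asn: int
    local_ip: str
    neighbor_ip: str


def mirror(p: PeeringParams) -> PeeringParams:
    """Return the same session as the other side should see it."""
    return PeeringParams(p.remote_asn, p.local_asn, p.neighbor_ip, p.local_ip)


def mismatches(ours: PeeringParams, theirs: PeeringParams) -> list[str]:
    """List every field where their config disagrees with the mirror of ours."""
    expected = mirror(ours)
    return [
        f"{f.name}: expected {getattr(expected, f.name)}, got {getattr(theirs, f.name)}"
        for f in fields(PeeringParams)
        if getattr(expected, f.name) != getattr(theirs, f.name)
    ]
```

For example, if our side is PeeringParams(65001, 65002, "192.168.10.53", "192.168.10.54") and the ISP reports PeeringParams(65020, 65001, "192.168.10.54", "192.168.10.53"), mismatches flags local_asn, which is the sort of disagreement that bit us here.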

What I learned from this, and what we can take from it:

This goes to show the importance of reading all the information before running in with half a picture. I am of the strong belief that knowledge is power and allows you to make sound decisions. Slowing down and reading everyone’s input in this case would have meant a much smoother migration and things working the first time.

Anthony Burke
ABOUT ANTHONY - Network Engineer, blogger and CCIE wannabe. I am a guest blogger on PacketPushers; you can find my own content over at blog.ciscoinferno.net and on Twitter @pandom_