Anycast HSRP and Design Considerations

HSRP is Cisco’s proprietary first-hop redundancy protocol, which allows transparent failover of the first-hop gateway. Many technologies have been slightly modified to use it efficiently. In this article Anycast HSRP will be explained, but first I want to cover how HSRP basically works.

 

HSRP has two versions, 1 and 2. One difference between version 1 and version 2 is support for MD5 authentication. Version 1 and version 2 also use different virtual MAC address ranges, which can be important for OTV implementations. BFD support with version 2 is important as well, because it efficiently detects a failure of the active HSRP router, whereas tightening control-plane hello/hold timers can affect performance negatively. Many BFD implementations run in the data plane, so failure detection does not burden the control plane.
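For illustration, here is a minimal NX-OS sketch of these knobs, assuming a made-up SVI, addressing and key string (not taken from the article’s topology):

feature hsrp
feature bfd
feature interface-vlan

interface Vlan10
  ip address 192.0.2.2/24
  ! Let BFD detect a dead peer instead of aggressive hello/hold timers
  hsrp bfd
  ! Version 2 uses the 0000.0C9F.Fxxx virtual MAC range (v1 uses 0000.0C07.ACxx)
  hsrp version 2
  hsrp 10
    authentication md5 key-string ExampleKey
    ip 192.0.2.1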

 

Figure-1

In Figure-1, HSRP v2 is enabled between two gateway devices. HSRP uses a single virtual IP and a single virtual MAC (the same as VRRP, different from GLBP). In a distributed control-plane design like the one in Figure-1, one router is elected active and the other standby based on the priority value. The higher priority wins, since HSRP is a layer 3 protocol (yes, you can generalize: if it is layer 2, lower wins).
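To make the election concrete, here is a hedged sketch of the two gateways in Figure-1 with invented addresses and priorities; both share the virtual IP 192.0.2.1, and the gateway with the higher priority becomes active:

! Gateway-1: higher priority, becomes active
interface Vlan10
  ip address 192.0.2.2/24
  hsrp version 2
  hsrp 10
    priority 110
    preempt
    ip 192.0.2.1

! Gateway-2: lower priority, becomes standby
interface Vlan10
  ip address 192.0.2.3/24
  hsrp version 2
  hsrp 10
    priority 90
    ip 192.0.2.1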


If the control plane were centralized, as in Cisco VSS, you would not need to deploy HSRP, since the SVI would be hosted on both devices and both devices would still actively forward the traffic.

 

In the data center, since the idea was to implement FCoE on the Nexus 7000 and keep the fabric topologies separate, Cisco implemented MLAG in a slightly different way. On the Nexus 7000, vPC is used, and it is different from VSS. The difference is important from the first-hop redundancy protocol point of view, since vPC is limited to a maximum of two devices and each device has its own control plane.

 

With vPC the control planes and the data planes stay separate, so we still need a first-hop redundancy protocol. Unfortunately GLBP is not supported, but both HSRP and VRRP are supported with vPC.
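As a rough sketch of the vPC side (domain ID, keepalive addresses and port-channel numbers are placeholders), the two switches present one port channel to the downstream device while each keeps its own control plane; HSRP or VRRP is then configured on the SVIs as usual and both peers forward for the virtual MAC:

feature vpc

vpc domain 10
  ! Out-of-band heartbeat between the two vPC peers
  peer-keepalive destination 198.51.100.2 source 198.51.100.1

! Port channel between the two Nexus switches
interface port-channel1
  switchport mode trunk
  vpc peer-link

! Port channel toward the downstream access switch or host
interface port-channel20
  switchport mode trunk
  vpc 20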

 

Figure-2

 

Figure-2 shows a pair of Nexus devices in each data center. Although the picture shows the right Nexus as HSRP standby, it still forwards traffic, since the same virtual IP and virtual MAC are actively used by both devices. The picture also shows a PACL (Port ACL) between the data centers. First-hop redundancy protocol isolation is an important concept, so let me explain.

 

Assume north-bound traffic arrives at Site-1, but the destination is in Site-2, and nothing is implemented on the northern side of the network for optimal path selection (LISP, IP Mobility, DNS). Although the traffic passes over the data center interconnect link and reaches the destination in Site-2, we at least want to send the return traffic directly from Site-2 toward the north to prevent triangulation.
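A common way to implement the FHRP isolation shown in Figure-2 is to filter HSRP hellos on the DCI-facing ports, so that each site keeps an active gateway locally while both keep answering for the same virtual IP and MAC. A hedged sketch (ACL name and interface are placeholders; HSRPv1 hellos go to 224.0.0.2 and HSRPv2 hellos to 224.0.0.102, both on UDP port 1985):

ip access-list HSRP_ISOLATION
  10 deny udp any host 224.0.0.2 eq 1985
  20 deny udp any host 224.0.0.102 eq 1985
  30 permit ip any any

! Applied as a PACL inbound on the layer 2 interface facing the DCI
interface Ethernet1/10
  ip port access-group HSRP_ISOLATION in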

 

FHRP isolation might seem to solve every problem, but in real life the case may not be so easy. Assume you have stateful devices in front of the gateway devices: if the incoming traffic hits Site-1 and the Site-2 devices have no matching state in their tables, the traffic will be dropped. So either you accept triangulation and implement source-based NAT on those devices, or you carry the state information across sites (Cisco ASA clustering), although there are arguments that this is not a good idea if the data center interconnect link fails.

 

Let’s turn to our main topic. So far we have covered how HSRP interrelates with classical distributed control-plane switches without layer 2 multipath (MLAG in this case), then the centralized control plane with VSS, and lastly the distributed control plane with MLAG, which is vPC.

 

The limitation of vPC is that only two switches can act as one logical device. But if you want to get rid of spanning tree, at least in the core of the architecture, and want a more scalable design so that more host ports can be supported, then large-scale bridging can be an option, although since the arrival of network overlays such as VXLAN and NVGRE there is a lot of discussion about it. FabricPath is Cisco’s large-scale bridging solution.
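For reference, turning a Nexus into a FabricPath node is conceptually simple; here is a rough sketch with made-up switch ID, VLAN and interface (FabricPath IS-IS then builds the layer 2 topology by itself):

install feature-set fabricpath
feature-set fabricpath

! Every switch gets a FabricPath switch ID (auto-assigned if omitted)
fabricpath switch-id 11

vlan 10
  mode fabricpath

! Core-facing links carry FabricPath-encapsulated frames
interface Ethernet1/1
  switchport mode fabricpath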

 

Figure-3

Anycast HSRP is applicable to FabricPath. We are no longer limited to a maximum of two devices as in VSS or vPC. (Juniper and HP can also support more than two devices in their MLAG solutions, without large-scale bridging or a leaf/spine architecture.)

 

Beginning with release 6.2(2), Cisco supports anycast HSRP on the Nexus 7000, so the limit for layer 3 forwarding at the spine layer is no longer two. I know there is some discussion about FabricPath and its layer 3 forwarding limitations, so it is important to have this feature if you decide to implement a leaf/spine architecture with Cisco as the vendor.

 

First, all leaf and spine switches have to support the anycast HSRP feature in their software, so a code upgrade might be necessary. They also have to run HSRP version 2, since anycast HSRP works only with HSRP v2. The code supports a maximum of four devices as HSRP gateways for now.
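On each spine, an anycast bundle ties an anycast switch ID to a set of VLANs and, through the SVIs, to the HSRP groups. The sketch below loosely follows the 6.2(2) syntax; the bundle ID, switch ID, VLAN range and addresses are illustrative, so check the configuration guide in the references for the exact options:

feature-set fabricpath
feature hsrp

! The same anycast bundle is configured on up to four spine switches
hsrp anycast 1 ipv4
  switch-id 1100
  vlan 10-20
  no shutdown

interface Vlan10
  ip address 192.0.2.2/24
  hsrp version 2
  hsrp 10
    ip 192.0.2.1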

Behind the scenes, the anycast switch ID is advertised by the spine switches; FabricPath IS-IS runs SPF toward that switch ID and can use all four nodes, so layer 2 ECMP is achieved.

 

After you enable anycast HSRP, one device is elected active, one standby, and all the other devices stay in listen state. The difference is that all nodes respond with the same virtual MAC and forward actively.

 

If the devices that are not in the active state can forward traffic, why then is one device active and another standby, both in vPC and in FabricPath topologies?

 

Assume you have a device connected only to the standby HSRP switch as an orphan port, which means it is not connected through vPC. The traffic for that device will be handled by the active HSRP device, so the vPC peer link will be used for devices that are not connected through vPC.
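If you want to see which ports would be affected, NX-OS has a few knobs around orphan ports; the following is only a rough sketch of the related commands (the interface is a made-up example), not a recommendation:

! List single-attached (orphan) ports on this vPC peer
show vpc orphan-ports

interface Ethernet1/5
  ! Optionally suspend this orphan port if the vPC peer link fails,
  ! so a dual-homed host fails over to its other NIC
  vpc orphan-port suspend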

 

References:

http://www.cisco.com/c/en/us/td/docs/switches/datacenter/sw/6_x/nx-os/fabricpath/configuration/guide/fp_cli_Book/fp_interfaces.html

Orhan Ergun
Orhan Ergun, CCIE, CCDE, is a network architect mostly focused on service providers, data centers, virtualization and security. He has more than 10 years in IT and has worked on many network design and deployment projects. In addition, Orhan is a blogger at Network Computing, a blogger and podcaster at Packet Pushers, and the manager of the Google CCDE Group. He is on Twitter as @OrhanErgunCCDE.
  • dumlutimuralp

    Great article, and the summary at the beginning should be really helpful for most engineers.
    Disclaimer: I work for Brocade.
    This is not a marketing message; it is to let everyone know that there are alternatives worth mentioning regarding first-hop redundancy, end-node connectivity scalability and also the “planes” of operation.

    Within all the fabric offerings, the characteristics of Brocade VCS fabric (IETF TRILL based) are:

    – Control and data planes are totally distributed, while the management plane is centralized (Logical Chassis)
    – Up to 32 switches in one fabric
    – A VLAN can have four active-active default gateways simultaneously (supported since August 2013)
    – An end node can be connected to eight different fabric switches with up to 64 ports in total (call it Virtual Link Aggregation (vLAG), port channel, etc.)

    • Orhan Ergun

      Thanks for participating. Since in this post I wanted to explain the anycast technology from the HSRP point of view, I didn’t want to compare vendors’ fabric implementations. Yes, I am planning a podcast on data center fabrics, and in addition to Cisco, Juniper, HP, Arista and Brocade will be covered for sure. But since you have already mentioned it, let me briefly give my opinion specifically about VCS Fabric.

      More or less all of the vendors support layer 2 multipath and also layer 3 multipathing, up to four active-active forwarders for off-VLAN traffic. Four is not a technology limitation; it can be expanded. I believe L3 multipath support only came recently with new Brocade code; before that it was not there.

      Yes, Brocade has a TRILL-like data plane encapsulation and a proprietary control plane, FSPF (I will discuss this in the podcast for sure). Brocade’s VDX is very similar to the Nexus 7000/5000 in nature, and one thing that is very obvious is that Brocade has really nice load balancing on the trunks. Per-packet load balancing within the same flow might be possible since they measure RTT and can delay packets, so you do not have to reassemble packets at the remote site and elephant flows would not be an issue (it somehow always reminds me of OSPF overlapping relays for MANET networks), so this is interesting.

    • dumlutimuralp

      Hi Orhan, yeap, it makes sense. Since other vendors are mentioned, that was the reason I wanted to mention my employer.
      Yeap, four is totally not a technology limitation; it is more of a scalability practice. That is why all vendors are working on scaling further, either through proprietary solutions like Cisco DFA or through NSX’s distributed control plane implementation. What we do at Brocade is invest more and more in, and provide knowledge about, the same solution, which is called VCS Fabric.
      As I mentioned, L3 multipathing, and in fact the whole Layer 3 code, came on board back in Aug 2012.
      About Brocade’s TRILL and Cisco’s TRILL, I am sure you already know this: it is totally opposite between Cisco and Brocade. Cisco uses MAC-in-MAC in the data plane and IS-IS in the control plane. Brocade uses exactly the TRILL encapsulation in the data plane, not a TRILL-like encapsulation, and uses FSPF in the control plane; Brocade has been using this protocol since 1996, the year the company was founded. So it is a really mature protocol and has been adopted even by our competitor in their SAN switch space.
      About frame-based load balancing, it is not based on RTT; it is a chipset-level proprietary technology, as Ivan perfectly summarized in his “Brocade has almost perfect load balancing” blog post three years ago. When I read that blog the first time, I wondered why invest so much in this, but I didn’t know it had been supported since day one back in 1996. Like I said, this is chipset level and we assure in-order frame delivery. That is the reason even Ivan described it as almost perfect.
      Actually the attractive side is:
      in L1 multipathing we use ASIC-level technology;
      in L2 multipathing we use regular flow-based distribution, but we take the number of flows at any time and distribute them in proportion to the capacity of the ISL trunk (that is what we call Link Aggregation between the VDXs themselves);
      in L3 multipathing it is up to four gateways now, but hopefully it will scale to more, because eventually all technologies push the intelligence to the edge and require the characteristics of fully distributed first-hop gateway functionality.

      • Orhan Ergun

        I would rather talk about this further on the podcast, but just a couple of points; after the podcast I would like to see your comment again. Cisco is not the opposite, at least from the data plane point of view: they did not choose SPB, SPB-M or SPB-Q. Cisco’s data plane for large-scale bridging, FabricPath, is TRILL-like. It is not TRILL, since you don’t have a layer 3 header, so you cannot use classical switches between the leaf and spine nodes; Brocade, on the other hand, uses an at least slightly modified TRILL in the data plane, so you can use classical switches, but in both cases (Cisco, Brocade) you will need a new chipset.

        This brings us to a discussion: if you have already invested, would you change your gear? It may not make sense. For greenfield? My answer would be that any fabric-based architecture will not live long, as Gartner also says.

        Rick Mur and I will discuss virtual network overlays, MLAG, fabric-based architectures and OpenFlow/SDN in the podcast. Please follow me on Twitter @OrhanErgunCCDE. Thanks again, Dumlu, for your participation.

  • jsicuran

    Good article. For a client DC migration I actually tested this with 6.2(2a), FabricPath multi-topology, and an L3 F2-based VDC with SVIs as the anycast points. Works well.

    From the FabricPath packet format, 4.2.1 Endnode ID field: “As of NX-OS release 6.2(2), this field is not currently used by the FabricPath implementation. However, the presence of this field may provide the future capability for a FabricPath-enabled end station to uniquely identify itself, allowing FabricPath-based forwarding decisions down to the virtual or physical end-station level.” – now that would be cool.

    Kinda funny where this stuff goes.
    HSRP was Cisco proprietary at first, then came VRRP.
    FabricPath was Cisco proprietary at first, then full IETF TRILL soon.
    EIGRP was Cisco proprietary at first, now Open EIGRP?


    • dumlutimuralp

      There is a difference with TRILL: the main designer/pusher of TRILL is Radia Perlman (mother of the Internet, inventor of IEEE 802.1D STP and of many link-state routing approaches).

      She actually went to IEEE first, but they said they already had SPB, so TRILL came out of the IETF.