Resiliency is one of the most important criteria in network design. Packets need to reach their destination within the time expected by the application. Too much redundancy, however, directly affects the MTBF/MTTR curve and starts to increase the MTTR of the entire system; a carefully designed network topology plays a big role in striking the right balance between redundancy and resiliency.
If your family consists of six people, one bedroom will most probably not be enough; this is exactly how we should think while designing a network, whether for a physical or virtual data center on your own premises or in a cloud. On the other hand, if you are not building a service provider network, you would not build a house for twenty people either, so scaling is the key.
Network topology planning is like drawing a floor plan: it needs to be done carefully. Planning is essential for predictable traffic flow.
Business requirements need to be understood well, and traffic flows and applications should be identified. Design best practices such as modularity and hierarchy should then be applied for easier troubleshooting, simpler maintenance, and a predictable topology.
In the past, application servers were often placed on site or at the headquarters, but with the increase in bandwidth availability we now commonly find them in the data center. If the company has remote offices, the topology might be hub and spoke, partial mesh, or something else, depending on the applications in use. Alternatively, the control plane might be hub and spoke while the data plane is full mesh. For spoke-to-hub traffic you can always apply traffic-engineering techniques such as PBR, packet filters, metrics, or tunnels; traffic manipulation is not covered in this article.
For instance, the company may have a voice application: for call control, remote branches might reach the data center that houses the call manager/voice gateway, but for the media traffic you might reduce latency by building spoke-to-spoke connections.
This can be done either statically or dynamically; as you might guess, DMVPN (Phase 2 or 3) is a very good example of dynamic spoke-to-spoke tunnel creation.
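To make the latency argument concrete, here is a minimal Python sketch of the spoke-to-hub-to-spoke path versus a direct spoke-to-spoke tunnel. The one-way latency figures are purely illustrative assumptions, not measurements from any real network.

```python
# One-way latencies in milliseconds; purely illustrative assumptions.
one_way_ms = {
    ("spoke_a", "hub"): 40,      # branch A to the data center hub
    ("hub", "spoke_b"): 35,      # hub to branch B
    ("spoke_a", "spoke_b"): 25,  # direct tunnel built dynamically (DMVPN-style)
}

via_hub = one_way_ms[("spoke_a", "hub")] + one_way_ms[("hub", "spoke_b")]
direct = one_way_ms[("spoke_a", "spoke_b")]

print(f"A->B via hub: {via_hub} ms, direct spoke-to-spoke: {direct} ms")
# A->B via hub: 75 ms, direct spoke-to-spoke: 25 ms
```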
In this article I will cover highly scalable and resilient network design topologies. Let's briefly review the conventional and common core design topologies, and then jump to our main topic, which is the less common core topologies. Some of these topologies might be used in the data center or even on campus. My intent is to present as many alternative topologies as I can so that readers can match them to their networks based on business and technical requirements, applications and traffic flows, security, QoS, or multicast needs. I will not deep dive into these requirements or any specific technology; I will give a general description of each topology, followed by its strengths and weaknesses.
Ring Topologies:
Ring topologies are the simplest to deploy and the easiest to scale. Resiliency can be achieved at the physical layer. However, convergence of ring topologies is generally slow compared to alternatives such as partial mesh, full mesh, and diverse-plane topologies.
Figure-1 shows a traditional ring topology: adding a new node is fairly simple, traffic flow is predictable, and resiliency can be improved with dual-ring redundancy. On the other hand, convergence is fairly slow. Assume the A-B link fails and A sends the traffic to D: until the control plane converges, since D's primary path is through A, the packet is dropped with a distance vector protocol and looped with a link-state protocol. There are mechanisms to send the packets either to D or through a tunnel to node B while bypassing C; these are covered here.
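Here is a tiny Python sketch of that transient behavior, assuming the four-node ring of Figure-1 and hand-written next-hop tables; it illustrates the micro-loop, not a routing protocol implementation.

```python
# Nodes A-B-C-D in a ring (links A-B, B-C, C-D, D-A). Forwarding toward
# destination B is modeled with per-node next-hop tables (the FIBs).

# Steady-state next hops toward B; per the text, D's primary path is via A.
fib = {"A": "B", "C": "B", "D": "A"}

def trace(node, dest, ttl=8):
    """Follow next hops until the destination is reached or TTL expires."""
    path = [node]
    while node != dest and ttl > 0:
        node = fib[node]
        path.append(node)
        ttl -= 1
    return path

# The A-B link fails. A repairs locally (new next hop D), but D has not
# converged yet and still points back at A: the transient micro-loop.
fib["A"] = "D"
print(trace("A", "B"))  # ['A', 'D', 'A', 'D', ...] until TTL runs out

# Once D converges away from the failure, forwarding is clean again.
fib["D"] = "C"
print(trace("A", "B"))  # ['A', 'D', 'C', 'B']
```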
Full Mesh Topologies:
If every node in Figure-1 is connected to every other node, the topology is called full mesh. Since every node is only one hop away from any other, redundancy is very high and finding an alternate path is easy. However, this amount of redundancy can easily hurt overall resiliency and convergence; the ideal number of connections between two nodes, from the perspective of the higher layer protocols, is generally accepted to be two.
Full mesh topologies are the most expensive to deploy and the most difficult to scale. If we add a fifth node to the network in Figure-1, it needs a link to every existing node, so scaling is very difficult at both the physical and the network layer.
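The scaling problem is easy to quantify: the link count of a full mesh grows as n(n-1)/2. A quick Python illustration:

```python
# Link count in a full mesh of n nodes: every pair needs a link.
def full_mesh_links(n: int) -> int:
    return n * (n - 1) // 2

for n in (4, 5, 10, 20):
    print(f"{n} nodes -> {full_mesh_links(n)} links")
# 4 nodes -> 6 links
# 5 nodes -> 10 links
# 10 nodes -> 45 links
# 20 nodes -> 190 links
```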
Full mesh topologies can also be created only at the network layer, as an overlay on the physical layer. One example is BGP: iBGP needs to be fully meshed inside an AS (unless route reflectors or confederations are used), while the network might be a ring or a partial mesh at the physical layer.
When we create the full mesh at the network layer, the number of physical links and ports on the nodes is reduced, but maintenance, scalability, troubleshooting, and operational complexity remain an issue.
Partial Mesh Topologies:
A partial mesh can be thought of as a variant of the ring topology. In Figure-1, if an application has a low latency requirement between A and C, connecting a direct link between nodes A and C creates a partial mesh. Predictability decreases, but since the design turns into triangles, convergence improves compared to the ring.
Also, if there are different types of traffic in the network, path diversity might be necessary. Assume there are two types of traffic, high bandwidth and low latency: the high bandwidth traffic flows between nodes A and B, and the low latency traffic flows between A and C. In the ring design, if we send both types of traffic over the A-B link, the low latency traffic might suffer.
Instead, if we add a direct A-C link and send the low latency traffic over it, utilization on A-B does not affect the low latency application.
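A small Python sketch of this, assuming the four-node ring of Figure-1: plain BFS shows the A-to-C path shrinking from two hops to one once the direct link is added.

```python
from collections import deque

def shortest_path(adj, src, dst):
    """Breadth-first search: fewest-hop path from src to dst."""
    prev, frontier = {src: None}, deque([src])
    while frontier:
        node = frontier.popleft()
        if node == dst:
            break
        for nbr in adj[node]:
            if nbr not in prev:
                prev[nbr] = node
                frontier.append(nbr)
    path = []
    while dst is not None:
        path.append(dst)
        dst = prev[dst]
    return path[::-1]

ring = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C", "A"}}
print(shortest_path(ring, "A", "C"))  # two hops, via B or D

ring["A"].add("C"); ring["C"].add("A")  # partial mesh: direct A-C link
print(shortest_path(ring, "A", "C"))  # ['A', 'C'] -- one hop
```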
So far we have covered the common types of WAN or data center core topologies. But all of them are single-core, single-data-plane topologies: if there is an IGP, only one instance of the protocol is used, and if there is BGP, only one AS is used at the core. A control plane failure might therefore affect the entire network, and the whole network might fail. Another requirement might be software or hardware upgrades without impacting the network, and none of these topologies satisfies it.
Dual Plane – Disjoint Parallel Planes:
This topology is built from totally separate data planes. From an enterprise customer's point of view, it can be achieved by receiving two different Layer 2 services from two different service providers.
Shared risk link groups (SRLGs) need to be carefully identified across the providers, and if there are shared links, diverse links need to be demanded.
The control planes are also disjoint in dual plane topologies: a separate IGP and a separate BGP are used to protect each plane from a failure in the other. For example, OSPF can be used on one plane and IS-IS on the other.
Since with Layer 3 VPNs control of the core network depends entirely on the service provider, even two Layer 3 services from two different providers do not constitute a disjoint parallel plane design.
The same is not true for overlays. If the enterprise receives a Layer 2 service from one provider and Internet access from another, this can be a dual plane or disjoint parallel planes design: the enterprise can build its own VPN over the Internet. Data path separation still needs to be ensured, though.
Some networks also use equipment from different vendors on the two planes. Separation of the control planes is essential, and the protocols must not be redistributed anywhere in the network. Careful filtering along the edge is critical for dual plane topologies, as you will see below.
Routers A, B, C, D, and E form one core; Routers P, Q, R, S, and T form the second. The two cores are not connected in any way: there is no way to forward traffic from Router D to Router S, for instance, without passing through either Router Y or Router Z. Routers Y and Z, however, are configured so that routing information is not shared between the two cores, to prevent traffic from passing between the cores through them.
Filtering at Router Y and Router Z is critical to prevent routing between the planes.
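To illustrate the point, here is a minimal Python reachability check. The router names follow the text, but the specific links are my own assumptions for the sketch.

```python
# Two disjoint cores (A-E and P-T) that share no links; only the edge
# routers Y and Z attach to both planes. Links are illustrative.
links = [
    ("A", "B"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "A"),  # core 1
    ("P", "Q"), ("Q", "R"), ("R", "S"), ("S", "T"), ("T", "P"),  # core 2
    ("Y", "A"), ("Y", "P"), ("Z", "D"), ("Z", "S"),              # edge routers
]

def reachable(src, dst, links, blocked=frozenset()):
    """Depth-first search, skipping any node in `blocked`."""
    adj = {}
    for u, v in links:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = {src}, [src]
    while stack:
        for nbr in adj.get(stack.pop(), ()):
            if nbr == dst:
                return True
            if nbr not in seen and nbr not in blocked:
                seen.add(nbr)
                stack.append(nbr)
    return False

# Physically, D can reach S -- but only through Y or Z:
print(reachable("D", "S", links))                      # True
# With Y and Z refusing to carry inter-core traffic (the filtering),
# the planes are fully disjoint:
print(reachable("D", "S", links, blocked={"Y", "Z"}))  # False
```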
The same BGP AS can also be used with dual plane topologies, and a single AS is the recommended design, since running two can be challenging, especially in an MPLS environment. Let's assume we are using a dual plane design with a different BGP AS in each core.
Since the POP-to-core BGP sessions will then be eBGP, Inter-AS MPLS solutions might need to be used.
This brings operational complexity. Even using a single BGP AS with two IGPs is operationally complex; two BGP ASes and two IGPs would be a nightmare from an operational point of view: troubleshooting would be hard and MTTR would increase accordingly.
It may even require two different teams to operate the network. Within each plane, a ring or partial mesh topology can be used, so the convergence of a dual plane topology depends on the underlying topology of each plane; one plane can be a ring while the other is a partial mesh.
This actually creates an opportunity for segmentation: the low latency application can be sent through the partial mesh plane while the high bandwidth application is sent through the ring plane.
One very important benefit of dual plane designs is service deployment. Let's assume a new MPLS traffic engineering service is deployed on only one plane: if the new service causes a network failure or a service degradation, critical traffic can quickly be moved to the unmodified plane along the network edge. Service can be restored without backing the new service out of production, giving the network engineers time to troubleshoot the problem.
As we can see, if fast service adoption, high availability (five nines and more), and business continuity are necessary, dual plane is an attractive design.
Deploying different vendors' equipment on the two planes also prevents vendor lock-in. But all of this comes at a price: the cost of the second plane, operational complexity, suboptimal traffic flow, and convergence that depends on the underlying plane topologies.
Dual planes are a solid choice for situations where the network simply cannot fail, ever, and where cost is not as much of an issue. These are complex, difficult-to-manage topologies reserved for the largest of the large, where data processing is such an integral part of the business that a network failure costs more than deploying and managing this type of network.
MULTIPLANAR – DIVERSE DATA PLANES:
The multiplanar design has similar characteristics to the dual plane design. In a multiplanar design we have two or more disjoint data planes, but only one IGP and one BGP instance are used, and shunt links connect the planes. This addresses the main concerns with dual plane topologies: operational complexity and the difficulty of adopting some services, such as MPLS.
Since only one IGP is used at the core, convergence along the edge can be improved with IP/LDP-based fast reroute technologies.
As in the dual plane core, services along the edge are mapped to the appropriate plane based on the underlying plane topology: low latency applications can be sent through the partial mesh plane, and high bandwidth, high throughput traffic can be mapped to the ring-based plane.
Shunt links are configured with a high metric at the network layer to prevent inter-plane traffic in steady state; they are used only as a last resort. If a failure happens within one plane, it should first be recovered within that plane. Depending on the location of the failure, if the topology cannot be recovered within the plane, recovery should happen along the edge toward the core; only then should the shunt links be used as a last resort.
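Here is a minimal Python sketch of the shunt-link metric behavior, using plain Dijkstra over an assumed topology: two small triangle planes, an edge router E homed to both, and one shunt link with a deliberately high metric.

```python
import heapq

def dijkstra(adjacency, src, dst):
    """Return (cost, path) of the cheapest path from src to dst."""
    heap, seen = [(0, src, [src])], set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, w in adjacency.get(node, []):
            heapq.heappush(heap, (cost + w, nbr, path + [nbr]))
    return None

def build(edge_list):
    adj = {}
    for u, v, w in edge_list:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    return adj

edges = [
    ("1a", "1b", 10), ("1b", "1c", 10), ("1c", "1a", 10),  # plane 1
    ("2a", "2b", 10), ("2b", "2c", 10), ("2c", "2a", 10),  # plane 2
    ("E", "1a", 10), ("E", "2a", 10),                      # edge router
    ("1c", "2c", 1000),                                    # shunt, high metric
]

print(dijkstra(build(edges), "1b", "1c"))
# (10, ['1b', '1c'])                   -- steady state, stays in plane 1

edges = [e for e in edges if {e[0], e[1]} != {"1b", "1c"}]
print(dijkstra(build(edges), "1b", "1c"))
# (20, ['1b', '1a', '1c'])             -- recovered within the plane

edges = [e for e in edges if {e[0], e[1]} != {"1a", "1c"}]
print(dijkstra(build(edges), "1b", "1c"))
# (1040, ['1b', '1a', 'E', '2a', '2c', '1c']) -- shunt as last resort
```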
While in dual plane designs a control plane failure in one plane does not impact the other, the same is not true for multiplanar designs: since there is only one control plane for the entire core, a control plane failure might affect the entire network. The CAPEX and OPEX of both designs are quite high compared to a single plane design. Redundancy in both designs is achieved with redundant planes, so trying to build resiliency within a plane instead of between the planes removes value from the overall design.
Dual plane and multiplanar designs need to match the business requirements, and the network topology needs to be a fit for these types of designs. Multiplanar designs are very similar to cube or stacked cube designs, but in cube designs the shunt link metrics are normal and the shunt links can be used even in steady state. Cube, stacked cube, hypercube, and many more core design alternatives are explained in Russ White's new book.