Continued from Part 1…
RSVP-TE Hello Protocol (Hello, It’s ME..)
We discussed earlier that RSVP is a soft-state protocol and needs periodic refresh messages (PATH and RESV) to maintain session state. Now let's say a node goes down for whatever reason. We have to wait until the refresh messages time out before we can release the resources held by the sessions crossing that node. If we want a quicker resource release, we can decrease the refresh timer so that refresh messages are sent more frequently, and in that way a failure is detected sooner. However, this consumes a large amount of resources, because every session is refreshed more often. Essentially, with this approach we solve one problem but create another.
A better way to solve this particular problem was to learn from other protocols, such as the IGPs, and implement a Hello extension to detect neighbor router failures, and that's exactly what happened. The Hello extension introduced the concept of an RSVP adjacency, which didn't exist before. Earlier, routers were only aware of the status of the interfaces running RSVP (operationally up or down) and the state of the LSP-Path (or, more accurately, the state of the RSVP sessions). With the Hello extension, RSVP behaves like other protocols and detects neighbor failures by constantly sending HELLO messages carrying a REQUEST object and expecting an ACKNOWLEDGEMENT (ACK) object in return. When several consecutive HELLO messages go unanswered, the neighbor is declared down and all the RSVP sessions through it are cleared. A short Hello interval is affordable because only the two peering routers are involved, whereas reducing the RSVP refresh timer affects every RSVP session crossing the midpoint router. Keep in mind, though, that this only solves the problem of detecting a neighbor failure faster.
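The Hello-based failure detection described above can be sketched as a small per-neighbor counter. This is only an illustration: the class name, the interval, and the miss limit are assumptions made for the example, not protocol or vendor defaults.

```python
from dataclasses import dataclass


@dataclass
class HelloNeighbor:
    """Tracks one RSVP neighbor via Hello REQUEST/ACK exchanges (illustrative)."""
    hello_interval: float = 3.0  # seconds between Hello REQUESTs (assumed value)
    miss_limit: int = 3          # unanswered Hellos tolerated (assumed value)
    missed: int = 0
    up: bool = True

    def on_ack(self) -> None:
        """An ACK arrived for our REQUEST, so the neighbor is alive."""
        self.missed = 0
        self.up = True

    def on_interval_expired(self) -> bool:
        """A Hello interval passed with no ACK.

        Returns True only at the moment the neighbor is declared down,
        which is when the caller would clear the RSVP sessions through it.
        """
        self.missed += 1
        if self.up and self.missed >= self.miss_limit:
            self.up = False
            return True
        return False

    def detection_time(self) -> float:
        """Worst-case time to notice a dead neighbor under these settings."""
        return self.hello_interval * self.miss_limit
```

With these assumed numbers a dead neighbor is noticed after about nine seconds, and only the Hello exchange between the two peers gets more frequent. The refresh timers of the sessions themselves are left alone.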
Reducing RSVP Refresh Overhead:
If we have a large number (tens of thousands) of RSVP sessions at a midpoint router, we start facing scaling issues: the router has to handle tens of thousands or more PATH and RESV messages in steady state, during network failures, and during re-optimization. So many incoming and outgoing messages can cause the queues to overflow. If that delays or drops PathTear or ResvTear messages, the router may hold resources for unused sessions for a relatively long time, which prevents other legitimate LSP-Paths from using those resources.
(Note: From a CPU/memory perspective, we shouldn't have any issues in today's world.)
RFC 2961 describes a few RSVP-TE extensions to reduce the RSVP messaging overhead. Refresh reduction does not remove the requirement to refresh RSVP state, nor does it change the interval between refreshes. Instead, the focus is on reducing the amount of processing required by both the sender and the receiver of a state refresh message and on making the process more efficient. Below are the three major enhancements:
- RSVP message bundling: This allows multiple RSVP messages to be packed together as a bundle within a single IP packet. A new RSVP Bundle message is introduced, which consists of an RSVP message header followed by one or more RSVP messages.
- Reliable message delivery: Three new objects are defined to allow more efficient processing of unchanged refresh messages: MESSAGE_ID, MESSAGE_ID_ACK, and MESSAGE_ID_NACK. The reliable message delivery mechanism works on a per-hop basis.
- Summary-refresh: A new Summary-refresh message is introduced to allow partial transmission of the refresh message by encapsulating a list of message identifiers with the same values as the ones in the MESSAGE_ID object of the refresh messages.
The first and second are independent (although they may be used together), but the third extension builds on the second. Support for these new extensions is signaled by a new flag (Refresh-Reduction-Capable) in the Flags field of the common RSVP header.
RSVP Message Bundling
The RSVP Bundle message is a new type of RSVP message and is used to aggregate messages to the same neighbor into a single larger RSVP message, rather than sending them individually. The benefit of this approach is more efficient transmission: the bundled messages share one IP header and one set of lower-layer framing instead of each needing its own (so we save a few extra bytes and, more importantly, a number of packets).
Sample capture of RSVP Bundle message.
On the flip side, we have to keep in mind that in order to bundle multiple messages together, a sender must have at least two messages ready to be sent at the same time. This can be hard to implement, as you don't want to hold on to one message in the hope that another will need to be sent soon. Deliberately bunching refresh messages into a single Bundle message may also undermine the randomization of state refresh timing.
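As a rough sketch of the bundling idea, the snippet below groups messages that are already queued by neighbor and never holds a message back to wait for company. The function name and the `max_payload` size cap are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def bundle_pending(pending, max_payload=1400):
    """Group queued (neighbor, message_bytes) pairs into per-neighbor bundles.

    Only messages that are already waiting are bundled; nothing is delayed.
    max_payload is a hypothetical cap on the bytes carried by one bundle.
    Returns a list of (neighbor, [message_bytes, ...]) tuples.
    """
    by_neighbor = defaultdict(list)
    for neighbor, msg in pending:
        by_neighbor[neighbor].append(msg)

    bundles = []
    for neighbor, msgs in by_neighbor.items():
        current, size = [], 0
        for msg in msgs:
            # Start a new bundle when adding this message would exceed the cap.
            if current and size + len(msg) > max_payload:
                bundles.append((neighbor, current))
                current, size = [], 0
            current.append(msg)
            size += len(msg)
        bundles.append((neighbor, current))
    return bundles
```

For example, three queued messages for two neighbors become two IP packets instead of three. A neighbor with a single queued message simply gets a one-element group, which a real implementation would send as an ordinary message rather than a bundle.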
Reliable Message Delivery and Reduction in Overhead Processing
Here we will look at two problems with RSVP. One is the overhead of processing RSVP refresh messages, and the other is that RSVP has no way to deliver a message reliably. Let's take a look at the processing overhead problem first. When an RSVP node receives a PATH or RESV message, it has to distinguish between three cases:
- Is the message for a new flow?
- Is the message a change to an existing flow, that is, a modification?
- Or is the message a refresh message?
For #1, it's easy to tell whether it's a new flow, as there will be no matching stored PATH or RESV state for a new flow. For #2, the message will contain changes in one or more of its objects (e.g. SENDER_TSPEC) when compared to the previous message received (keep in mind that a PATH or RESV message contains various objects, and a modification can happen in any of them). The problem is that in order to differentiate between #2 and #3, every time a refresh message is received the router has to compare it with the previous message. Since the order of objects in a message may vary without affecting the meaning, a router cannot simply compare the whole message as a block of memory; it has to compare the objects one by one to decide whether the message is a refresh or a modification. This introduces an overhead in processing these messages.
Refresh reduction addresses this processing overhead by introducing a message identifier for each message. The idea is that we assign a numeric value to each message. If the value in the received RSVP message is the same as in the previous one, nothing has changed. If it has increased, something has probably changed. If it is lower than the previous value, it's an old message that probably got stuck somewhere in the network. The message identifier also has an epoch field (which can be just a random number) that is regenerated each time a node reboots or RSVP restarts. A changed epoch tells the neighbor that the node or process has restarted and may be reusing message identifier values. That's how we solve the problem of overhead associated with RSVP refresh messages.
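The comparison rules above fit in a few lines. This is a minimal sketch: the `MessageId` type and the `classify` function are names made up for the example, and plain integer comparison is assumed, so identifier wrap-around is deliberately ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MessageId:
    epoch: int       # regenerated when the sender reboots or RSVP restarts
    identifier: int  # numeric value the sender assigned to the message


def classify(previous: Optional[MessageId], received: MessageId) -> str:
    """Decide how to treat a message from its identifier alone.

    Assumes plain integer comparison; wrap-around is not handled here.
    """
    if previous is None:
        return "new"        # no stored state: a new flow
    if received.epoch != previous.epoch:
        return "restarted"  # sender restarted; old identifiers mean nothing now
    if received.identifier == previous.identifier:
        return "refresh"    # nothing changed, no object-by-object compare needed
    if received.identifier > previous.identifier:
        return "changed"    # probably a modification: process the full message
    return "stale"          # an old message that arrived late
```

Only the "changed", "new" and "restarted" outcomes require the router to parse the objects in the message. A "refresh" just resets the state timer.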
The other problem with the RSVP message delivery mechanism was that it had no way to ensure that messages are delivered successfully. The way to make it reliable is to have the receiver send some kind of ACK back to the sender once a message is received, but to do that you first need a way to uniquely identify a message, so that the sender knows which message an ACK is responding to. The message ID serves that purpose as well, in addition to reducing refresh overhead, since each message now carries a numeric value.
The MESSAGE_ID object has a flag that requests the receiver to acknowledge receipt. This acknowledgment is carried in a MESSAGE_ID_ACK object. If there is no message being sent in the opposite direction, the receiver must still acknowledge the received message identifier as soon as possible. It can do this by sending an Acknowledgement message that simply carries the acknowledged message identifiers. The sender of a message that requested acknowledgment retransmits the message periodically until it is acknowledged or until it decides that there is a problem with the link or with the receiving node. Retransmission is relatively frequent (roughly every half a second), so it is important not to swamp the system with retransmissions. RFC 2961 suggests that the sender should apply an exponential back-off, doubling the time between retransmissions at each attempt. It also suggests that a message should be transmitted a maximum of three times even if it is not acknowledged (that is, one transmission and two retransmissions).
Sample packet capture showing ACK Desired flag set for a RESV message and the ACK message sent in response to that.
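To make the back-off concrete, here is a small sketch that computes when an unacknowledged message would be sent, using the figures quoted above (a half-second initial interval, doubling each time, three transmissions in total). The function and parameter names are made up for the example.

```python
def transmission_times(initial_interval=0.5, max_transmissions=3, backoff=2.0):
    """Return send times, in seconds after the first transmission,
    for a message that is never acknowledged."""
    times = []
    now = 0.0
    interval = initial_interval
    for _ in range(max_transmissions):
        times.append(now)
        now += interval      # wait for an ACK before the next attempt
        interval *= backoff  # double the wait at each attempt
    return times


print(transmission_times())  # [0.0, 0.5, 1.5]
```

So the original goes out at 0 s and the two retransmissions follow at 0.5 s and 1.5 s, after which the sender stops retrying.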
Summary Refresh
The Summary-refresh extension builds on the message IDs introduced earlier. One of the biggest contributors to the RSVP signaling load is the use of PATH and RESV messages to refresh established LSPs. The Summary-refresh extension eliminates these refresh messages and instead refreshes all LSPs using the new, more efficient Summary-refresh messages. A Summary-refresh message carries only the message_identifier values previously received in the MESSAGE_ID objects of the RSVP messages. This significantly reduces the signaling cost compared to the traditional method of refreshing an LSP with the same PATH/RESV messages used to establish it. The message_identifier is only 4 bytes, whereas a PATH message is usually a few hundred bytes for each LSP-Path.
Upon receiving the Summary-refresh message, the router refreshes all states with matching ID values. The IDs are matched based on the source IP address of the message, the object type, and the message_identifier value. If there is no matching value in the LSP database, the receiver replies to the sender with a MESSAGE_ID_NACK to indicate that no matching entry was found in the state database. When the sender of the Summary-refresh message receives the MESSAGE_ID_NACK object, it checks its local state database against the listed message_identifier values. If there are any matching entries, it sends a regular PATH and/or RESV refresh message with the corresponding message ID value to refresh the state again.
Sample capture of RSVP Refresh message.
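The Summary-refresh exchange can be sketched as two small functions, one for each side. This is a simplified illustration: the function names and dictionary layouts are assumptions, and state is keyed only by source IP address and message identifier, leaving out the object type.

```python
def process_summary_refresh(state_db, source_ip, message_ids, now):
    """Receiver side: refresh every state that matches a listed identifier.

    state_db maps (source_ip, message_id) -> time of last refresh.
    Returns the identifiers that matched nothing and must be NACKed.
    """
    nacks = []
    for mid in message_ids:
        key = (source_ip, mid)
        if key in state_db:
            state_db[key] = now  # state refreshed without a full PATH/RESV
        else:
            nacks.append(mid)    # unknown ID: report it in a MESSAGE_ID_NACK
    return nacks


def full_refreshes_for_nacks(sent_db, nacked_ids):
    """Sender side: pick the full messages to resend for NACKed identifiers.

    sent_db maps message_id -> the PATH/RESV message last sent with that ID.
    Identifiers the sender no longer knows about are ignored.
    """
    return [sent_db[mid] for mid in nacked_ids if mid in sent_db]
```

In the normal case the NACK list is empty and the whole refresh costs one short message. The full PATH or RESV message is only sent again for the identifiers the neighbor did not recognize.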
So we started with some very basic definitions and concepts, and then looked into the LSP setup process. Then we looked at the various extensions introduced by RSVP Refresh reduction to solve pain points with RSVP-TE scaling. I hope this post was somewhat useful to you.
- Designing and Implementing IP/MPLS-Based Ethernet Layer 2 VPN Services: An Advanced Guide for VPLS and VLL by Zhuo Xu
- The Internet and Its Protocols by Adrian Farrel