This is Part 2 in a special series looking at the silicon and hardware inside your network device. Although software will be at the heart of network innovation, it will still run on hardware, so it’s time to expose the internals of our network hardware and understand the architecture inside a typical device. Many people are surprised to find that the CPUs, memory, storage and buses are similar to those in computers, while the forwarding engines are rather spectacularly different.
Thanks to our guests for working very hard to bring this show to you.
- Find the outbound interface – quickly
- CAM/TCAM – Look up the cells based on their content – a single lookup, but at many times the transistor count of DRAM. (One example: 16x transistor count, 90x the footprint)
- (For lots of switches out there today, the “CAM” for L2 is actually a hash table)
- TCAM for Longest prefix match routes. CAM for host routes. (does anyone actually do this?)
- Fast RAM and suitable data structure (eg some sort of tree for longest-prefix match) – scales to much bigger forwarding tables, but can be harder to design. (Lots of variation in what exactly “fast RAM” means – RLDRAM is one example today)
- Actually a lot of non-obvious art in a state-of-the-art FIB – things like fast reroute, prefix-independent convergence, multiple levels of load-balancing etc all have major effects on the silicon design
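
To make the "fast RAM and suitable data structure" option concrete, here is a minimal sketch of longest-prefix match using a one-bit-at-a-time binary trie. It is an illustration only, not how any particular ASIC does it: real designs typically use multi-bit strides and compressed nodes. The names `LpmTrie`, `insert` and `lookup` are made up for this example, and it handles IPv4 only.

```python
import ipaddress


class TrieNode:
    """One node per prefix bit; next_hop is set only where a route ends."""
    __slots__ = ("children", "next_hop")

    def __init__(self):
        self.children = [None, None]
        self.next_hop = None


class LpmTrie:
    """Illustrative IPv4 longest-prefix-match table (hypothetical API)."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, prefix, next_hop):
        net = ipaddress.IPv4Network(prefix)
        bits = int(net.network_address)
        node = self.root
        # Walk the first prefixlen bits, most significant bit first.
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, address):
        bits = int(ipaddress.IPv4Address(address))
        node = self.root
        best = node.next_hop  # default route, if one was installed
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break
            if node.next_hop is not None:
                best = node.next_hop  # remember the longest match so far
        return best
```

With `0.0.0.0/0`, `10.0.0.0/8` and `10.1.0.0/16` installed, a lookup for `10.1.2.3` walks past the /8 and returns the next hop of the /16. Note the cost: up to 32 dependent memory reads per packet in this naive form, which is exactly why a TCAM's single lookup is attractive and why RAM-based designs work hard to shorten the tree.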
- Where to store the packets, whilst their headers are being processed. (Lots of buffering required in distributed systems)
- Generally DRAM – ranges from massive dedicated buffers (~46MB on the N7K) down to 9MB of buffers for the 48-port Trident
- Normally carve packets into fixed-sized chunks and use hardware to maintain the linked lists – works well, but means you won’t get 100% usage
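
The fixed-size chunk scheme can be sketched in software as follows. This is a toy model under assumed names (`CellBuffer`, `store`, `release`); in a real device the free list and the per-cell "next" pointers are maintained by hardware, and the cell size here is arbitrary.

```python
class CellBuffer:
    """Packet memory split into fixed-size cells chained by next-pointers."""

    def __init__(self, num_cells, cell_size):
        self.cell_size = cell_size
        self.cells = [b""] * num_cells
        self.next_cell = [-1] * num_cells    # -1 marks the end of a chain
        self.free = list(range(num_cells))   # indices of unused cells

    def store(self, packet):
        """Store a packet; return the head cell index, or None if full."""
        needed = max(1, -(-len(packet) // self.cell_size))  # ceiling division
        if needed > len(self.free):
            return None  # not enough cells left: the packet is dropped
        chain = [self.free.pop() for _ in range(needed)]
        for n, idx in enumerate(chain):
            start = n * self.cell_size
            self.cells[idx] = packet[start:start + self.cell_size]
            self.next_cell[idx] = chain[n + 1] if n + 1 < needed else -1
        return chain[0]

    def release(self, head):
        """Follow the chain from head, free its cells, return the packet."""
        parts = []
        idx = head
        while idx != -1:
            parts.append(self.cells[idx])
            nxt = self.next_cell[idx]
            self.cells[idx] = b""
            self.next_cell[idx] = -1
            self.free.append(idx)
            idx = nxt
        return b"".join(parts)

    def cells_in_use(self):
        return len(self.cells) - len(self.free)
```

With 64-byte cells, a 65-byte packet occupies two cells, so 128 bytes of buffer hold 65 bytes of data. That rounding loss is the reason the buffer never reaches 100% usage.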
- Things like access lists, QoS operations, policy routing etc all require classification by more than just destination IP address – often done using TCAMs, but other approaches also used (eg compiled decision trees and fast RAM)
- Like graphics cards, packet forwarding is easiest when it’s embarrassingly parallel, so packets are independent (other than avoiding re-ordering of individual flows). Features that break that (eg stateful firewall, NAT) necessarily impose a performance burden (especially if you need the dynamic state syncing to a standby device)
- Shaping, especially complex hierarchical queuing, is one example of something that general purpose CPUs are especially poor at, and dedicated hardware is massively more performant.
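
As a rough picture of the TCAM-style classification mentioned in the list above, the sketch below models each entry as a value plus a mask, where masked-out bits are "don't care", and the first matching entry wins. The key layout (destination IP plus destination port) and all names are assumptions made for the example. A real TCAM compares every entry in parallel in a single lookup; the loop here is only a sequential stand-in.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class TcamEntry:
    value: int
    mask: int      # 1 bits are compared, 0 bits are "don't care"
    action: str


def make_key(dst_ip, dst_port):
    """Pack the fields into one 48-bit search key (hypothetical layout)."""
    return (int(ipaddress.IPv4Address(dst_ip)) << 16) | dst_port


def make_entry(dst_prefix, dst_port, action):
    """Build an entry; dst_port=None means any port."""
    net = ipaddress.IPv4Network(dst_prefix)
    port_value = 0 if dst_port is None else dst_port
    port_mask = 0 if dst_port is None else 0xFFFF
    value = (int(net.network_address) << 16) | port_value
    mask = (int(net.netmask) << 16) | port_mask
    return TcamEntry(value, mask, action)


def classify(table, key, default="deny"):
    """Return the action of the first (highest-priority) matching entry."""
    for entry in table:
        if (key & entry.mask) == (entry.value & entry.mask):
            return entry.action
    return default
```

For example, a table holding `make_entry("10.0.0.0/8", 22, "deny")` followed by `make_entry("10.0.0.0/8", None, "permit")` denies port 22 to `10.1.1.1` but permits port 443, because entry order encodes priority. The same value/mask mechanism gives longest-prefix match for routes when entries are sorted longest prefix first.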
Non-volatile memory – Flash (maybe removable) or internal USB.
- Holds your bootloader and OS; you need this to boot.
- SFP/QSFP Transceiver and Cages
- Signal conditioner chips. (Often called PHY or SerDes)
- Glue the serial signal from the transceiver to the MAC/PHY layer on the ASIC
- Potential for major tangent here on ‘What’s a PHY, what’s PHYless’