Show 187 – The Silicon Inside Your Network Device – Part 2

This is Part 2 in a special series looking at the silicon and hardware inside your network device. Although software will be at the heart of network innovation, it still has to run on hardware, so it’s time to expose the internals of our network hardware and understand the hardware architecture inside a typical device. Many people are surprised to find that the CPUs, memory, storage and buses are similar to those in a general-purpose computer, while the forwarding engines are rather spectacularly different.

Thanks to our guests for working very hard to bring this show to you.

Route Lookup

  • Find the outbound interface – quickly
  • CAM/TCAM – looks up entries based on their content in a single lookup, but costs many times the transistor count of DRAM (one example: 16x the transistor count, 90x the footprint)
  • (For lots of switches out there today, the “CAM” for L2 is actually a hash table)
  • TCAM for longest-prefix-match routes, CAM for host routes (does anyone actually do this?)
  • Fast RAM and a suitable data structure (eg some sort of tree for longest-prefix match; a toy version is sketched after this list) – scales to much bigger forwarding tables, but can be harder to design. (Lots of variation in what exactly “fast RAM” means – RLDRAM is one example today)
  • Actually a lot of non-obvious art in a state-of-the-art FIB – things like fast reroute, prefix-independent convergence, multiple levels of load-balancing etc all have major effects on the silicon design
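
To make the “tree in fast RAM” idea concrete, here’s a minimal software sketch of longest-prefix match using a one-bit-per-level trie. Real FIBs use compressed, multibit tries implemented in silicon; the class names and routes below are purely illustrative:

```python
import ipaddress

class TrieNode:
    __slots__ = ("children", "next_hop")
    def __init__(self):
        self.children = [None, None]   # one child per bit value: 0 and 1
        self.next_hop = None           # set if a prefix terminates here

class Fib:
    def __init__(self):
        self.root = TrieNode()

    def add_route(self, prefix, next_hop):
        """Insert a route, walking one trie level per prefix bit."""
        net = ipaddress.ip_network(prefix)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (addr >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = TrieNode()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, ip):
        """Walk the trie, remembering the deepest (longest) match seen."""
        addr = int(ipaddress.ip_address(ip))
        node, best = self.root, None
        for i in range(32):
            if node.next_hop is not None:
                best = node.next_hop
            node = node.children[(addr >> (31 - i)) & 1]
            if node is None:
                return best
        return node.next_hop if node.next_hop is not None else best

fib = Fib()
fib.add_route("0.0.0.0/0", "eth0")   # default route
fib.add_route("10.0.0.0/8", "eth1")
fib.add_route("10.1.0.0/16", "eth2")
print(fib.lookup("10.1.2.3"))        # eth2 -- the /16 beats the /8
print(fib.lookup("192.0.2.1"))       # eth0 -- falls through to the default
```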

Packet Buffers

  • Where to store the packets whilst their headers are being processed. (Lots of buffering required in distributed systems)
  • Generally DRAM – ranges from massive dedicated buffers (~46MB on the N7K) down to 9MB of shared buffer for the 48-port Trident
  • Normally carve packets into fixed-size chunks and use hardware to maintain the linked lists (sketched below) – works well, but means you won’t get 100% usage
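
A rough software sketch of that cell-carving scheme, to show where the lost capacity goes. The cell size, pool size and function names are invented for illustration; real hardware keeps the free list and cell chains in dedicated logic:

```python
CELL_SIZE = 256    # bytes per cell -- hypothetical, varies by ASIC
NUM_CELLS = 1024   # total cells in the shared pool -- also hypothetical

free_list = list(range(NUM_CELLS))   # indices of cells available for use
next_cell = [None] * NUM_CELLS       # per-cell "next" pointer for chaining

def store_packet(length):
    """Claim enough cells for `length` bytes; return the head cell index."""
    cells_needed = -(-length // CELL_SIZE)   # ceiling division
    if cells_needed > len(free_list):
        return None                          # pool exhausted: drop the packet
    head = prev = free_list.pop()
    for _ in range(cells_needed - 1):
        cell = free_list.pop()
        next_cell[prev] = cell               # chain the cells into a list
        prev = cell
    next_cell[prev] = None                   # terminate the chain
    return head

def free_packet(head):
    """Walk the chain and hand every cell back to the free list."""
    while head is not None:
        nxt = next_cell[head]
        next_cell[head] = None
        free_list.append(head)
        head = nxt

# A 65-byte packet still consumes a full 256-byte cell, which is why raw
# buffer memory never turns into 100% usable packet storage.
pkt = store_packet(65)
print(NUM_CELLS - len(free_list))   # 1 cell (256 bytes) spent on 65 bytes
free_packet(pkt)
print(len(free_list))               # 1024 -- all cells back in the pool
```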

Forwarding features

  • Things like access lists, QoS operations, policy routing etc all require classification by more than just the destination IP address – often done using TCAMs, but other approaches are also used (eg compiled decision trees and fast RAM; a software emulation of TCAM matching is sketched after this list)
  • Like graphics cards, packet forwarding is easiest when it’s embarrassingly parallel, so packets are treated as independent (other than avoiding re-ordering within individual flows). Features that break that assumption (eg stateful firewall, NAT) necessarily impose a performance burden (especially if you need to sync the dynamic state to a standby device)
  • Shaping, especially complex hierarchical queuing, is one example of something that general-purpose CPUs are especially poor at, and where dedicated hardware is massively more performant
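
To show what that classification step actually does, here’s a toy software emulation of TCAM matching: each entry is a (value, mask, action) triple, “don’t care” bits live in the mask, and the first matching entry wins. A real TCAM evaluates every entry in parallel in a single clock; the field layout and rules below are invented for illustration:

```python
import ipaddress

def make_key(src_ip, dst_ip, dport):
    """Pack source IP, dest IP and dest port into one 80-bit key (toy layout)."""
    return (int(ipaddress.ip_address(src_ip)) << 48
            | int(ipaddress.ip_address(dst_ip)) << 16
            | dport)

def make_rule(src, dst, dport, action):
    """Build a (value, mask, action) entry; None means don't-care (TCAM X bits)."""
    value = mask = 0
    if src is not None:
        net = ipaddress.ip_network(src)
        value |= int(net.network_address) << 48
        mask |= int(net.netmask) << 48
    if dst is not None:
        net = ipaddress.ip_network(dst)
        value |= int(net.network_address) << 16
        mask |= int(net.netmask) << 16
    if dport is not None:
        value |= dport
        mask |= 0xFFFF
    return (value, mask, action)

# Entries live in priority order; hardware checks them all at once.
tcam = [
    make_rule("10.0.0.0/8", None, 22, "deny"),        # block SSH from 10/8
    make_rule(None, "192.0.2.0/24", None, "dscp46"),  # mark traffic to 192.0.2/24
    make_rule(None, None, None, "permit"),            # catch-all
]

def classify(src_ip, dst_ip, dport):
    key = make_key(src_ip, dst_ip, dport)
    for value, mask, action in tcam:   # first match wins, as in TCAM priority
        if key & mask == value:
            return action

print(classify("10.1.2.3", "203.0.113.5", 22))   # deny
print(classify("172.16.0.1", "192.0.2.9", 443))  # dscp46
print(classify("172.16.0.1", "8.8.8.8", 80))     # permit
```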

Non-volatile memory – Flash (maybe removable) or internal USB.

  • Holds your bootloader and OS, need this to boot.
  • ONIE

Ports

  • SFP/QSFP Transceiver and Cages
  • Signal conditioner chips (often called PHY or SerDes)
  • Glue the serial signal from the transceiver to the MAC/PHY layer on the ASIC
  • Potential for major tangent here on ‘What’s a PHY, what’s PHYless’
Greg Ferro
Greg Ferro is a Network Engineer/Architect, mostly focussed on Data Centre, Security Infrastructure, and recently Virtualization. He has over 20 years in IT, working as a freelance consultant for a wide range of employers including Finance, Service Providers and Online Companies. He is CCIE#6920 and has a few ideas about the world, but not enough to really count. He is a host on the Packet Pushers Podcast, blogger at EtherealMind.com, and on Twitter @etherealmind and Google Plus.
  • J Max

    Greg, great shows. Would love to see a diagram of the flow of packets traversing chipsets. Also boxes with a brief name/description of each chip on the switch and router. Or something very basic to illustrate flows on the switch/router.

  • Peter Carstairs

    There are hundreds of networking blogs but so few have nitty-gritty content like this. Thanks to those involved in another awesome show.

  • Bubba

    Excellent show! I was a little surprised at the guy saying he’s never seen LRM in use. I see it all over the place where folks have older fiber plant connecting their IDFs to their MDF.

    • john harrington (http://thenetworksherpa.com/)

      Hey Bubba,
      My background is in data center networks, but I’m guessing that LRM was targeted more at the Campus / Enterprise networks, so perhaps I spoke too soon. Thanks for setting me straight though, great to get listener feedback.

      /John H

  • Duane Grant

    Enjoyed the show!

    For comparison, all the SP edge cards I use have ~100ms of buffer (ingress and egress) dedicated per port (7600 ES+, ASR9k, MX), which is more than the “massive dedicated buffers” for the 7k you mention above. The Cisco site has this for the ES+: “Packet memory: 512 MB; up to 200 ms combined bidirectional buffering (100 ms ingress and 100 ms egress) at 10 Gbps.”

    You also have a minor typo for the Trident port buffer: it’s 9MB rather than 9KB, which would be ~1 jumbo frame. ;-)

    Keep up the good work!

    • john harrington (http://thenetworksherpa.com/)

      Hey Duane,

      Thanks for the correction on the Trident shared memory, that was indeed my typo when preparing for the show, I’ll update. 9MB is small enough, no need to reduce it to 9KB.

      John H

  • Damian

    One of my favorite episodes