This article was originally posted on Packet Pushers Ignition on April 26, 2021.
Data center virtualization exacerbated problems for network security designs that relied on a handful of appliance-based (whether physical or virtual) control points, which typically focused on external threats. With advanced persistent threats (APTs) that focus on compromising internal systems, security strategies must evolve to identify internal threats using fine-grained detection, and contain the damage with robust segmentation.
Pushing security control points into the NIC provides the most detailed view of activity and application payloads, and it represents the next phase of defense-in-depth strategies.
DPUs and SmartNICs such as NVIDIA’s BlueField line (an evolution of the ConnectX products it acquired from Mellanox) offer the best platform for implementing interface-level security without adding processing overhead to the host CPU.
However, software development remains a challenge. This area is ripe for standards and frameworks that simplify the process. The DOCA and just-announced Morpheus SDKs represent NVIDIA’s approach to DPU programming, but they leave many open questions because neither has been generally released. In the absence of high-level tools like these, network security developers are left with more complicated and rudimentary alternatives based on eBPF (extended Berkeley Packet Filter) and XDP (eXpress Data Path).
Background And SmartNIC Programming Fundamentals
The proliferation of VMs, container clusters, and microservices has created an explosion of nodes, both on premises and in the cloud. Network engineers have been coping with significant increases and changes in data center traffic volume and connection patterns created by this explosion. As the volume of intra-data center (east-west) traffic mushroomed, organizations adopted flatter two-tier Clos (leaf-spine) designs built with ToR switches based on rapidly improving merchant silicon. While such designs handle the traffic load well, bottlenecks appear when security engineers insert control points like firewalls, malware scanners, and application proxies to provide security and metrics within the network fabric.
Programmable network processors such as DPUs represent the best approach for distributed security without adding to CPU overhead. A popular approach to SmartNIC software uses a standard Linux kernel facility as the hook to sidestep the host CPU. In contrast to CPU acceleration approaches like DPDK, which bypass the kernel to access CPU resources directly, programs using eBPF combined with XDP are sandboxed, run early in the Linux networking stack, and don’t require modifying the kernel or loading extra kernel modules.
eBPF combines the ability to see all system calls with packet- and socket-level access to networking operations in a way that allows offloading packet processing, monitoring, telemetry, and security functions to a secondary processor like a DPU, network processor (what Netronome calls an NFP) or FPGA.
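To make the XDP model concrete, here is a minimal Python sketch of the per-packet decision an XDP filter makes. It is a user-space simulation, not eBPF code: a real XDP program is typically written in restricted C, compiled to eBPF bytecode, and attached to a network interface. The verdict constants mirror the values of the kernel’s `XDP_DROP` and `XDP_PASS`; the blocklisted address and the helper names `xdp_filter` and `make_frame` are hypothetical.

```python
import ipaddress
import struct

# Verdict codes mirror the kernel's enum xdp_action (XDP_DROP = 1, XDP_PASS = 2).
XDP_DROP = 1
XDP_PASS = 2

ETH_HLEN = 14       # Ethernet header length in bytes
IPV4_MIN_HLEN = 20  # minimum IPv4 header length in bytes
ETH_P_IP = 0x0800   # EtherType value for IPv4

# Hypothetical blocklist; a real XDP program would look this up in a BPF map.
BLOCKED_SOURCES = {ipaddress.IPv4Address("203.0.113.7")}


def xdp_filter(frame: bytes) -> int:
    """Return an XDP-style verdict for one raw Ethernet frame."""
    # Check bounds before touching headers, as the eBPF verifier insists.
    if len(frame) < ETH_HLEN + IPV4_MIN_HLEN:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS  # not IPv4: hand it to the normal stack
    # The IPv4 source address occupies bytes 12-15 of the IP header.
    src = ipaddress.IPv4Address(frame[ETH_HLEN + 12:ETH_HLEN + 16])
    return XDP_DROP if src in BLOCKED_SOURCES else XDP_PASS


def make_frame(src: str, dst: str = "192.0.2.1") -> bytes:
    """Build a minimal Ethernet + IPv4 header for testing (hypothetical helper)."""
    eth = b"\xff" * 6 + b"\x00" * 6 + struct.pack("!H", ETH_P_IP)
    ip = bytearray(IPV4_MIN_HLEN)
    ip[0] = 0x45  # version 4, header length of 5 32-bit words
    ip[12:16] = ipaddress.IPv4Address(src).packed
    ip[16:20] = ipaddress.IPv4Address(dst).packed
    return eth + bytes(ip)


print(xdp_filter(make_frame("203.0.113.7")))    # 1 (XDP_DROP)
print(xdp_filter(make_frame("198.51.100.20")))  # 2 (XDP_PASS)
```

The early bounds check and the fall-through to `XDP_PASS` reflect how XDP programs are usually structured: anything not explicitly dropped continues to the kernel stack. When such a program is offloaded to a SmartNIC, dropped packets need not reach the host CPU at all.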

Source: NVIDIA. The BlueField-3 will include 22 billion transistors, 16 Arm cores, DDR5 memory and multiple 400 Gbps interfaces.
Smart network processors typically include multiple Arm or MIPS cores for general-purpose programs and invariably have an SDK for creating user-level network services or kernel enhancements. For example, the Marvell OCTEON SDK can compile any Linux or DPDK application, often without modification, and targets 5G baseband, vEPC, vRouter-switching, and security applications. In contrast, Netronome has a set of software products tailored to different situations, including OVS, vRouter, SSL/SSH visibility, virtual firewall, and eBPF programs.
Netronome’s Agilio products, for instance, include pre-built eBPF functions for XDP offload, TC (traffic control) offload, match/action, filtering, load balancing, DDoS mitigation, and chained filter functions. Its SDK includes a JIT (just-in-time) compiler that translates eBPF code into NFP machine code, greatly accelerating its execution on the SmartNIC. In one example, Netronome found that the throughput of an NFP-offloaded bpfilter was 5.5 times that of the same rules processed via iptables on the host CPU.

Source: Netronome product brief; Agilio eBPF execution stack
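The match/action model behind offloaded filters such as bpfilter (and behind iptables) can be illustrated with a small Python sketch: an ordered table of rules, each pairing match fields with an action, evaluated first-match-wins with a default action. The `Rule` fields, the action strings, and the example rules are assumptions for illustration; they are not Netronome’s API or the bpfilter rule format.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    src_net: ipaddress.IPv4Network  # source prefix to match
    dst_port: Optional[int]         # destination port; None matches any port
    action: str                     # hypothetical actions: "drop" or "pass"


def match_action(rules, src: str, dst_port: int, default: str = "pass") -> str:
    """Evaluate rules in order and return the action of the first match."""
    addr = ipaddress.IPv4Address(src)
    for rule in rules:
        if addr in rule.src_net and rule.dst_port in (None, dst_port):
            return rule.action
    return default  # no rule matched


RULES = [
    Rule(ipaddress.IPv4Network("10.1.0.0/16"), 22, "drop"),   # block SSH from one subnet
    Rule(ipaddress.IPv4Network("10.0.0.0/8"), None, "pass"),  # allow other internal traffic
    Rule(ipaddress.IPv4Network("0.0.0.0/0"), 23, "drop"),     # block Telnet from anywhere
]

print(match_action(RULES, "10.1.4.5", 22))      # drop
print(match_action(RULES, "10.2.4.5", 22))      # pass
print(match_action(RULES, "198.51.100.9", 23))  # drop
```

In this sketch each packet walks the rule list linearly, so per-packet cost grows with the size of the rule set; that per-packet work is what offload moves off the host CPU.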
NVIDIA’s AI-Infused Hardware And Frameworks
The GPU specialists at NVIDIA see an opportunity to improve security via deep learning and to parallelize packet processing, using DPUs embedded with Tensor Cores and other acceleration units to run inference on new detection models. Its recently announced Morpheus security framework exploits GPUs embedded in BlueField DPUs to process data streams and network telemetry. Morpheus will ship with pre-trained models for:
- Data classification to detect leaked sensitive data such as login credentials, keys, passwords, and financial account numbers.
- Anomaly detection to identify malicious code, misconfigurations, or unusual activity from log data.
- Phishing detection by using NLP to analyze email text and classify messages as desired, spam, ham (subscribed bulk mail), or phishing.
- Log error identification using a predictive NLP model to spot potential failures or warnings in security logs that wouldn’t be caught by traditional filtering.
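Morpheus’s data-classification models are trained neural networks, but the task they perform can be illustrated with a much simpler, rule-based baseline. The Python sketch below is that swapped-in technique, not Morpheus code: it flags payload text that appears to contain a password assignment, an AWS-style access key ID, or a card number that passes the Luhn checksum. The patterns and label names are assumptions chosen for illustration.

```python
import re

# Illustrative patterns only; production detectors use far richer rules or ML models.
PASSWORD_RE = re.compile(r"\b(?:password|passwd|pwd)\s*[=:]\s*\S+", re.IGNORECASE)
AWS_KEY_RE = re.compile(r"\bAKIA[0-9A-Z]{16}\b")  # AWS-style access key ID (AKIA prefix)
CARD_RE = re.compile(r"\b\d{13,19}\b")            # contiguous runs of 13-19 digits


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify_payload(text: str) -> set:
    """Return the set of sensitive-data labels detected in text."""
    labels = set()
    if PASSWORD_RE.search(text):
        labels.add("credential")
    if AWS_KEY_RE.search(text):
        labels.add("access_key")
    # The Luhn check filters out most random digit runs, such as order IDs.
    if any(luhn_ok(m.group()) for m in CARD_RE.finditer(text)):
        labels.add("card_number")
    return labels


print(sorted(classify_payload("user=bob password=hunter2 card=4111111111111111")))
# ['card_number', 'credential']
```

Rules like these are brittle against obfuscation and unfamiliar formats, which is the gap a trained classification model is meant to close.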
Morpheus also allows developers to build custom models using PyTorch, TensorFlow, TensorRT, or other tools supported by ONNX, the Open Neural Network Exchange.
Morpheus can also analyze network telemetry and PCAP packet data from a SmartNIC; the results can then be used to modify security policies, rewrite filtering rules, and change monitoring parameters on the NIC in real time.

Source: NVIDIA. Morpheus-BlueField security system: an EGX server with a BlueField DPU running the Morpheus stack; the BlueField DPU locally executes security models and DOCA policies.

Source: NVIDIA
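The feedback loop in which analysis of NIC telemetry rewrites the NIC’s own filtering rules can be sketched in a few lines. Morpheus would use trained models for the analysis step; this Python sketch substitutes a simple statistical outlier test (a z-score over per-source packet counts) so that the control loop itself is visible. The function name, the threshold `k`, and the in-memory blocklist are hypothetical stand-ins for real telemetry and rule-update interfaces.

```python
import statistics


def update_blocklist(counts: dict, blocklist: set, k: float = 3.0) -> set:
    """Add outlier sources to blocklist and return the newly blocked ones.

    counts maps source address -> packets seen in the last telemetry interval.
    A source is an outlier if its count is more than k population standard
    deviations above the mean count.
    """
    values = list(counts.values())
    if len(values) < 2:
        return set()
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return set()  # all sources look alike; nothing stands out
    new = {src for src, n in counts.items()
           if (n - mean) / stdev > k and src not in blocklist}
    blocklist |= new  # a real system would push the updated rules to the NIC here
    return new


# One telemetry interval: 19 ordinary hosts and one noisy source.
interval = {f"10.0.0.{i}": 100 for i in range(1, 20)}
interval["10.0.0.99"] = 5000
blocked = set()
print(update_blocklist(interval, blocked))  # {'10.0.0.99'}
```

One caveat of this particular test: with the population standard deviation, a single outlier among n sources can score at most √(n − 1), so with the default k = 3 at least 11 sources are needed before any source can be flagged.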
Alternatives: Pattern Matching Engines And VM Software Appliances
NVIDIA’s latest focus might be on using AI for network security, but GPUs aren’t the only accelerators it has planned for BlueField. Last spring, it acquired Titan IC, whose technology includes the RXP engine for offloading and accelerating the evaluation of regular expressions and other pattern-matching algorithms used in SPI (stateful packet inspection) and firewall rules.
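Pattern-matching engines like RXP take over the job of checking payloads against a set of regular expressions. One practical wrinkle in stateful inspection is that a signature can straddle two packets, so the scanner must carry some state per flow. The Python sketch below shows that idea with the standard `re` module. The signature names and patterns are hypothetical, and the fixed-size carry-over window is a simplifying assumption: a match that begins more than `window` bytes before a packet boundary can still be missed.

```python
import re

# Hypothetical signature set; names and patterns are illustrative only.
SIGNATURES = {
    "sqli_union": rb"union\s+select",
    "path_traversal": rb"\.\./\.\./",
    "shell_exec": rb"/bin/(?:ba)?sh",
}

# Combine all signatures into one case-insensitive pattern with named groups,
# so a single finditer call reports which signature matched.
COMBINED = re.compile(
    b"|".join(b"(?P<%s>%s)" % (name.encode(), pat) for name, pat in SIGNATURES.items()),
    re.IGNORECASE,
)


class FlowScanner:
    """Scan one flow's payloads in order, catching signatures split across packets."""

    def __init__(self, window: int = 64):
        self.window = window  # bytes of history carried into the next packet
        self.tail = b""

    def feed(self, payload: bytes) -> list:
        data = self.tail + payload
        boundary = len(self.tail)
        # Report only matches that end inside the new payload; earlier
        # matches were already reported on a previous call.
        hits = [m.lastgroup for m in COMBINED.finditer(data) if m.end() > boundary]
        self.tail = data[-self.window:] if self.window > 0 else b""
        return hits


flow = FlowScanner()
print(flow.feed(b"GET /search?q=1 UNION SE"))  # []
print(flow.feed(b"LECT * FROM users"))          # ['sqli_union']
```

Scanning packets independently would miss the split `UNION SELECT` above; the per-flow tail buffer is the minimal state that makes the inspection stateful.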
Porting virtual network security appliances to a DPU is another option for NIC-based control points. VMware’s Project Monterey will allow VM images and containers to run within a bare metal ESXi hypervisor running on the Arm cores in a DPU. When virtual appliances are modified with DOCA or an equivalent SDK to exploit the packet-processing hardware embedded in a DPU, localizing them on a SmartNIC allows for thousands of highly distributed control points without sacrificing CPU performance in host machines.
Edge-based security using programmable SmartNICs takes defense-in-depth to extreme granularity. Combining them with embedded GPUs and other acceleration modules will significantly improve performance and enable more accurate, timely, and subtle detection methods.
Unfortunately for network security professionals, the new capabilities also come with a steep learning curve, particularly for those wishing to develop custom deep learning models. The transition to programmable DPUs is an extension of broader trends in automation and monitoring that require network engineers to become fluent in programming models, scripting languages, and SDKs. Indeed, as software eats the data center, it turns every IT professional into a developer.