I just ran across a pointer to this research on Bruce Schneier’s blog:
Networking system components that are well-behaved in separation may create counter-intuitive emergent system behaviors, which are not well-behaved at all. For example, cooperative behavior might unexpectedly break down as the connectivity of interaction partners grows. “Applying this to the global network of banks, this might actually have caused the financial meltdown in 2008,” believes Helbing.
Globally networked risks are difficult to identify, map and understand, since there are often no evident, unique cause-effect relationships. Failure rates may change depending on the random path taken by the system, with the consequence of increasing risks as cascade failures progress, thereby decreasing the capacity of the system to recover. “In certain cases, cascade effects might reach any size, and the damage might be practically unbounded,” says Helbing. “This is quite disturbing and hard to imagine.” All of these features make strongly coupled, complex systems difficult to predict and control, such that our attempts to manage them go astray.
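To make the “random path” idea concrete, here is a toy cascade model in Python. This is my own illustration, not the model from the paper: every node carries a unit of load, tolerates a fixed overload, and dumps its load onto surviving neighbors when it fails. All the names and parameters are assumptions made for the sketch.

```python
import random

def random_graph(n, p, seed=42):
    """Build a simple Erdos-Renyi-style graph as an adjacency dict."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].add(v)
                adj[v].add(u)
    return adj

def cascade_size(adj, first_failure, tolerance=1.2):
    """Fail one node, push its load onto live neighbors, repeat."""
    load = {v: 1.0 for v in adj}        # every node starts at unit load
    failed = {first_failure}
    frontier = [first_failure]
    while frontier:
        next_frontier = []
        for f in frontier:
            alive = [v for v in adj[f] if v not in failed]
            if not alive:
                continue                # load is simply lost here
            share = load[f] / len(alive)
            for v in alive:
                load[v] += share
                if load[v] > tolerance: # neighbor overloads and fails too
                    failed.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(failed)

adj = random_graph(200, 0.03)
# Same network, same rules; only the starting point differs, yet the
# cascade size swings wildly -- the "random path" in the quote above.
for seed in range(5):
    start = random.Random(seed).randrange(200)
    print(f"failure at node {start}: cascade of {cascade_size(adj, start)}")
```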
The original research isn’t related to computer networks, but rather to people and financial networks. What caught my eye here is the idea of a failure that takes a “random path” through a networked system.
We spend a lot of time in network design thinking about failure modes, specifically in terms of making certain every failure mode we can think of has a deterministic outcome (if this link fails, traffic will flow through that link, so I need to make certain that link has enough bandwidth, and the right QoS “stuff,” to prevent a cascading failure onto another link…). In other words, we try to second-guess every possible “random failure path,” or make those paths “not so random,” by controlling the failure path through the control plane.
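As a sketch of what that second-guessing amounts to, here is the single-link-failure check in miniature. The topology, loads, capacities, and the one-backup-per-link assumption are all invented for the example:

```python
# Hypothetical plan: each link has one designated backup, and we verify
# the backup can absorb the failed link's load. All figures are made up.
links = {
    # name: (current_load_gbps, capacity_gbps, designated_backup)
    "core1-core2": (40, 100, "core1-agg1"),
    "core1-agg1":  (30, 100, "core1-core2"),
    "agg1-edge1":  (20, 40,  "agg1-edge2"),
    "agg1-edge2":  (25, 40,  "agg1-edge1"),
}

for name, (load, _cap, backup) in links.items():
    b_load, b_cap, _ = links[backup]
    status = "ok  " if b_load + load <= b_cap else "FAIL"
    print(f"{status} {name} -> {backup}: {b_load + load} of {b_cap} Gbps")
```

The catch, of course, is that a check like this only covers the failures we thought to enumerate; the research above is about the paths we didn’t.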
Are there any other solutions? The authors of this research say:
“For example, when systems become too complex, they cannot be effectively managed top-down,” explains Helbing. “Guided self-organization is a promising alternative to manage complex dynamical systems bottom-up, in a decentralized way.” The underlying idea is to exploit, rather than fight, the inherent tendency of complex systems to self-organize and thereby create a robust, ordered state.
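As a toy picture of what “bottom-up” might mean (my sketch of the flavor of the idea, not the authors’ method): every node acts on purely local information, shifting load toward its least-loaded neighbor, and a reasonably even global state emerges with no central controller anywhere.

```python
def balance_step(adj, load, rate=0.25):
    """Each node sees only its neighbors' load and sheds some excess
    toward the least-loaded one. No node sees the whole network."""
    moves = {}
    for v, neighbors in adj.items():
        if not neighbors:
            continue
        target = min(neighbors, key=lambda n: load[n])
        if load[v] > load[target]:
            moves[(v, target)] = rate * (load[v] - load[target]) / 2
    for (src, dst), amount in moves.items():
        load[src] -= amount
        load[dst] += amount
    return load

# A ring of 8 nodes with all the load piled on node 0.
adj = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
load = {i: 0.0 for i in range(8)}
load[0] = 8.0
for _ in range(200):
    load = balance_step(adj, load)
print({v: round(l, 2) for v, l in load.items()})  # roughly 1.0 everywhere
```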
There are some interesting thoughts here, but essentially the takeaway is that centrally managed, top-down networks will all eventually fail because we just can’t think of everything. Failure is too random, in essence.
So what’s the solution? The best solution I can think of already lies before us, in many ways: the opaque API. The layer boundary between the transport and the application is there for a reason, but how often do we actually design that way? I think Doc Searls and Ivan have hit on this in their recent posts about networks as a service.
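Roughly what I mean, in code; the class and method names here are invented for the example, not any standard API. The application gets one narrow, stable call, and everything about paths, reroutes, and failover stays on the far side of the boundary:

```python
class Path:
    """Stand-in for a real forwarding path; invisible to applications."""
    def __init__(self, name, up=True):
        self.name, self.up = name, up
    def healthy(self):
        return self.up
    def deliver(self, destination, payload):
        print(f"{self.name} -> {destination}: {len(payload)} bytes")

class TransportService:
    """All the application is allowed to know about the network."""
    def send(self, destination: str, payload: bytes) -> None:
        raise NotImplementedError

class ResilientTransport(TransportService):
    """Behind the boundary the operator reroutes, rate-limits, or fails
    over as needed; the application never finds out, and never should."""
    def __init__(self, paths):
        self._paths = paths  # operator-controlled, hidden from above

    def send(self, destination, payload):
        for path in self._paths:
            if path.healthy():
                path.deliver(destination, payload)
                return
        raise ConnectionError("no healthy path")

# Application code: one call, zero knowledge of the failover underneath.
transport = ResilientTransport([Path("primary", up=False), Path("backup")])
transport.send("10.0.0.5", b"hello")
```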
Here is a tradeoff we don’t often think about: services offered versus network stability. It’s one worth thinking about.