I’ll be live blogging from ONUG for this fall 2015 session. I’m in Manhattan already, looking forward to the event that starts on Wednesday. What is ONUG, you ask? The Open Networking User Group is a gathering of networking end users who are working to bring cutting edge networking techniques into their production environments.
You know how on the various Packet Pushers podcast channels we’ve talked quite a bit about software defined networking, whitebox switching, and so on? Well, we know that some of you don’t see SDN as applicable to your networks, and so you put these topics on mute. As the months — and now years — have gone by, I believe we’ve seen a slow but steady shift to something other than, “I’ll just buy what I’ve been buying for years, but only faster.” Fair enough if you haven’t noticed that, but it is happening.
The shift by those on the leading edge is driven in part by an integration of organizational networks into a larger IT operational whole. In other words, as IT silos break down, networking becomes merely one function that’s a part of an integrated whole. That whole is centered on application delivery, where individual technical disciplines must work together. While IT working together is not a new idea, operationalizing the network to fit into automated application delivery is. Networking has lagged behind other IT disciplines when it comes to automation.
Let’s be more specific with an example. Spinning up virtual machines, applications, and now containers in an automated way via scripts (or more sophisticated software) is commonplace. Scripts in whatever language — PowerShell, Python, etc. — are the tools infrastructure engineers use to make the instantiation of services easy and predictable. Networking is progressively fitting into that automated mix. And ONUG is where much of that progression is documented.
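As a toy illustration of that kind of scripting (the image name, ports, and tooling here are hypothetical; real shops usually bury this templating inside a tool like Chef or Vagrant), a few lines of Python can assemble a predictable container launch command:

```python
import shlex

def container_run_cmd(image, name, ports=None, env=None):
    # Assemble a 'docker run' command line for a service instance.
    # image/name/ports/env are illustrative parameters, not any real
    # shop's convention.
    cmd = ["docker", "run", "-d", "--name", name]
    for host_port, container_port in (ports or {}).items():
        cmd += ["-p", f"{host_port}:{container_port}"]
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]
    cmd.append(image)
    return cmd

# The same predictable command every time -- which is the whole point.
cmd = container_run_cmd("nginx:1.9", "web01", ports={8080: 80})
print(shlex.join(cmd))
```

The value isn’t the five lines of Python; it’s that the result is identical and reviewable every time it runs, which is what “easy and predictable” instantiation means in practice.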
What’s working? What’s not working? What use cases make sense? What use cases simply don’t? What results have been seen in testing? What vendors are contributing products and features that lend themselves to an open networking model? What are the results of end user working groups who have been testing products? All of these issues and more come to a head at ONUG meetings.
My live blog will cover sessions I am able to attend. I’ll do my best to pull out the key points of presentations and share them in real-time. To get the latest information, you’ll need to refresh this page periodically.
If you’re interested in the ONUG Fall 2015 schedule, click here. For Wednesday, 4-November, I’m planning to be in the following sessions.
- ONUG: Navigating the Sea Change
- Morgan Stanley and Open Frameworks – Why Do We Care?
- IPv6: An Open Cloud Infrastructure Enabler or Corporate Tax?
- The Future of Overlays and Container Networking (sorry, no link)
- The Great Debate: Will Software Make Everything Better?
In the evening, I’ll be co-hosting a live Packet Pushers podcast recording on operationalizing SD-WAN, hosted by Viptela. If you’re in NYC and wish to join, there’s still some room, last I knew. Register here. Greg and I will have Packet Pushers stickers, plus you get good food and the chance to network with a bunch of networking comrades-in-arms.
Come back to this page on Wednesday for the live blog, which should start up at roughly 9:30am ET/NYC.
Day 1: 4-November-2015
Good morning from New York. I’m ensconced along the wall of the 4th floor auditorium in NYU’s Kimmel Center, waiting for the morning’s opening address. I’m hoping this doesn’t tweet every time I do an update. That could get annoying. Hmm…
ONUG: Navigating the Sea Change (Nick Lippis)
3 main points…
- IT consumption models are changing across the industry.
- Exponentially increasing demand for new IT skill sets.
- SDN is important, but it’s part of a transition to open infrastructure and open frameworks.
Consumption models. ONUG defines open software-defined infrastructure as automatable networking, compute, and storage. Much of this leads to commodity hardware, where the hardware isn’t all that interesting compared to the software. There’s an ecosystem that’s built up around this consumption model, with companies & projects like Kubernetes, Chef, Vagrant, Mesos, Docker, etc. stepping up to help with orchestration & automation.
Many mainline vendors have added to the mix by allowing their higher-level infrastructure (firewalls, etc.) to fit into these frameworks. Other use-cases have formed their own industries, such as SD-WAN.
Underneath all of this is the physical network, and a number of companies are plugging into this open infrastructure model as well. Plexxi, Big Switch, NEC, Brocade, Juniper, Cisco, Cumulus, and many more offering APIs to integrate with orchestration platforms.
Nick points out that most of the companies in this open infrastructure ecosystem world are software companies. Software used to be the smallest part of IT infrastructure (hardware the largest), but that has been turned on its head. Thus, we’re in a new era where the cycle is ramping up: a new open software world driving innovation and operational changes in the practice of IT.
This is driving a shift in IT skills. There’s a conflict between the desire for infrastructure abstraction (we don’t want to know) and deep infrastructure knowledge (we must know everything). This conflict is complicating conversations between application builders & devops folks (I don’t want to know, and shouldn’t have to care) and infrastructure engineers (you can’t keep your head in the sand to build at scale). The reality is meeting in the middle. Both sides have valid points.
The convergence of skill sets between deep-knowledge experts and devops/application builders produces genuine computer scientists & full-stack engineers. I completely agree with Nick here. We’ve been talking about this on Packet Pushers for years, and it’s at the heart of the new Datanauts show I’m co-hosting with Chris Wahl.
Shifting back to consumption. Over 80% of ONUG members are either deploying (40%+) or evaluating (40%+) network virtualization. A very high percentage are also looking at broad SDN integrations across the network. Far less interest in SDN islands.
The key driver behind these stats is the desire to get away from proprietary stacks to solve business problems. “Not this time,” as Nick puts it. Thus, the IT stack is being reinvented in layers. Build a flexible infrastructure that does whatever is required now and in the future, tied to specific business value.
If this doesn’t seem interesting to you, Nick points out that we’re at (or nearly at) an inflection point where cloud is impacting the industry so completely that businesses must change to avoid falling behind other companies that adapt their IT consumption models. In other words, changing IT operations isn’t really a choice. My inferential comment here: too many have adopted or are adopting already to consider this a hipster trend or flash in the pan. Sure, those changing now are large organizations and agile leaders. Perhaps you’re not able to shift as readily. But the shift is here, and it is driving the sort of infrastructure that’s available from the open source community and from vendors.
Nick announces an ONUG IT Service Lifecycle Management framework. The intent is to help folks in an ITIL world by starting to build process around open infrastructure.
The next session is going to talk more about the open infrastructure and frameworks Nick highlighted as his point #3.
Morgan Stanley and Open Frameworks – Why Do We Care? (Tsvi Gal, Morgan Stanley)
Tsvi’s environment is sizeable: 71K physical servers, 300PB of raw storage (not all usable; some replicated, etc.), 900+ global locations, and so on. There are several complex environments with various requirements, such as low-latency trading, centralized IT, high-performance computing, being a high-value target for attackers, and so on.
Morgan Stanley has highly specialized goals: shaving seconds, milliseconds, microseconds, and even nanoseconds off transaction time. Constant vigilance and analysis go into achieving these evolving goals to maintain an advantage. Of course, this is all for naught if they don’t get the computed data wherever it needs to be in a timely fashion.
Morgan builds what they can in house, where building differentiates them, but buys commodity stuff (because why build?). Thus, Morgan is a technology company. And…they like open. They have for a long time, going back to early adoption of Linux.
Tsvi points out that open source is not equal to an open technology framework. And an open technology framework is where they see the benefit.
So…what’s an open technology framework?
- The ability to make independent decisions on components of technology. Choices at one layer shouldn’t dictate or limit choices at others.
- Loosely coupled systems without embedded logic or policy.
- Well defined, publicly documented, and (ideally) stateless interfaces between components.
- Good things can be made from creative combinations of simple re-composable technology.
We all say we don’t want to be locked into a specific vendor, but interestingly, building in-house custom systems can create a lock-in situation of its own. That’s a trap that’s easy to fall into, and it speaks to a cultural shift that might be required beyond the relationship with proprietary vendor systems. Morgan recognizes this, and is trying to make a shift.
IPv6 & devops requirements are examples of technologies that are hard to recruit for right now. For example, “I need an IPv6 network person to do some Erlang development for Linux system administration and application container management.” That person is very hard to come by, and yet represents a combination of skills that Morgan will (or does) actually need. (Tsvi shows a very complex graph of systems that will be impacted by their adoption of IPv6. Scary.)
Tsvi moves on and points out that open source is not free. Installing OpenStack took two days. Using OpenStack took far longer.
Okay. So…two questions Tsvi poses as he closes his talk.
- How do we better foster and encourage the use and development of Open Frameworks?
- How do we limit or decrease an enterprise’s investment and difficulty in leveraging and adopting open frameworks?
Question to Tsvi: does Morgan Stanley use OpenStack for dev or prod?
Answer: Dev, mostly. But OpenStack is getting more mature and is headed for production workloads eventually. It’s a matter of time.
IPv6: An Open Cloud Infrastructure Enabler or Corporate Tax? (panel moderated by John Curran of ARIN)
Jim Kyriannis, NYU. IPv6 useful for research and collaboration. This is key to connecting to international sites that do not have an IPv4 option. IPv4 NAT complexity is an IPv6 driver. RFC1918 address collisions are a driver. IoT also a driver, for example – NYU wants to IP-enable all power outlets.
John Burns, Wells Fargo. John argues that v4 is the corporate tax. v6 addresses real business problems — the ability to execute mergers, the ability for all customers to connect. They have built out their infrastructure for IPv6, and are insisting that applications have a road map to get to IPv6.
Samir Vaidya, Verizon Wireless. Launched LTE network greenfield, and ran IPv6 right from the beginning. Users get /64 address space. IPv4 address exhaustion was a big driver. NAT (CGN) is an application breaker. Facebook tests show that IPv6 was faster, which VZW confirmed. VZW is also in the mobile space, which is exploding in growth. v4 is not scalable. Over 50% of all traffic that leaves VZW is over v6.
John Curran, ARIN. There are 4B IPv4 addresses. There are 7B people. You all want a phone? And that’s just considering mobile. What about all the rest of the endpoints? ARIN is officially out of v4, and tells carriers that every day. To grow, SPs and content providers are turning on v6. Connections from IPv6 end users to v4 devices are often via translation devices, meaning you don’t have an end-to-end connection with your users if you are not offering a v6 native service.
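Curran’s arithmetic is easy to sanity-check (world population rounded to 7B, per the talk):

```python
# Total IPv4 address space vs. the world's population (rough 2015 figure).
ipv4_total = 2**32          # every possible IPv4 address
people = 7_000_000_000      # ~7B people, per Curran

print(ipv4_total)           # 4294967296
print(ipv4_total < people)  # True -- not even one address per person

# Meanwhile a single IPv6 /64 (what VZW hands each user) dwarfs it:
ipv6_per_user = 2**64
print(ipv6_per_user // ipv4_total)  # 4294967296 -- one /64 holds 2^32 IPv4-sized spaces
```

Which is the whole point of the panel: v4 cannot cover one phone per person, let alone the rest of the endpoints, while a single per-user v6 allocation is larger than the entire v4 Internet.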
Question: Is dual-stack a good idea?
- Depends on how much v4 address space you have. But dual-stack works better on some devices than others. (Samir Vaidya)
- Also, the public Internet IS MOVING to v6 whether you like it or not. (John Curran)
- As long as you have the v4 you’ve still got the liability of it. But it’s a big jump to make. (John Burns)
- Public facing infrastructure really must be dual-stack. Internal network doesn’t have to be dual-stack. Could remain v4. (John Curran).
- Happy Eyeballs can be a pain for users if a AAAA DNS record is returned when there is no end-to-end IPv6 connectivity, resulting in slow connections. (Jim Kyriannis)
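For context on that last point: Happy Eyeballs (RFC 6555) races an IPv6 connection attempt against a slightly delayed IPv4 attempt and keeps whichever completes first. This toy simulation (the delays stand in for real socket connects; no actual networking happens) shows both the healthy case and the broken-AAAA case the panel describes:

```python
import asyncio

async def attempt(family, delay):
    # Simulated connection attempt: 'delay' stands in for connect
    # latency (or a long hang, for a broken IPv6 path).
    await asyncio.sleep(delay)
    return family

async def happy_eyeballs(v6_delay, v4_delay, v4_head_start=0.3):
    # RFC 6555 idea: start IPv6 first, give IPv4 a short head-start
    # delay, then take whichever connection completes first.
    v6 = asyncio.ensure_future(attempt("IPv6", v6_delay))
    v4 = asyncio.ensure_future(attempt("IPv4", v4_head_start + v4_delay))
    done, pending = await asyncio.wait({v6, v4}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    return done.pop().result()

# Healthy IPv6 path: v6 wins, since it gets the head start.
print(asyncio.run(happy_eyeballs(v6_delay=0.1, v4_delay=0.1)))  # IPv6
# Broken IPv6 (AAAA exists but no end-to-end path): v4 rescues the user.
print(asyncio.run(happy_eyeballs(v6_delay=5.0, v4_delay=0.1)))  # IPv4
```

Without this racing behavior, a stray AAAA record means the user waits out a full connection timeout before falling back to v4, which is exactly the pain Kyriannis describes.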
Question: When is missing v6 so impactful that business needs can’t get met?
- The IPv4 brokerages out there can be a stop gap. But not a long-term solution. v6 always is the best (and easiest). (John Curran)
- Space squatting is a technique some are using, but not viable long-term, either. There is almost always a uniqueness requirement. (John Burns)
Question: Which IPv6 routing protocol are you deploying?
- OSPFv3 in core. MP-BGP at the edge. For multicast, it’s complex. Inter-domain especially hard. Have used v4 for a long time to share video. Not sure there is a future for this. (Jim Kyriannis)
- OSPFv3 in the core. Would like to converge v4 and v6 OSPF control planes, can’t yet. Recommends that networking vendors make their endpoints v6 addressable. (John Burns)
The Future of Overlays and Container Networking (Harmen Van der Linde, Citigroup)
This session considers what’s next, in the context of the results of an ONUG working group dealing with overlays. Where do we want to go from here? The format is a panel of six vendors experienced with orchestration frameworks and mature network virtualization (overlay) implementations.
Question: What is the future of overlays in your networking products?
- What is the business problem I’m trying to solve? That’s what is driving adoption. If this network segment doesn’t want to talk to that network segment, or if I don’t want to talk to corporate IT, tunnel over it.
- Next wave of innovation is technology and partnerships. First up, hardware VTEPs. Second, ecosystems that bring together the overlay network with higher-level services. (Cisco)
- No network vendor can presume that they have the one answer to protocols/control-planes/encapsulation type in your network. What is the source of truth in your data center? Where does the data map reside for your network topology? Must talk to the sysadmin teams to figure out what orchestration looks like. (Arista)
Question: Can we build public cloud integration?
- Focus used to be on building private cloud. No interest in public cloud back in the day. But 2 years later, we’re seeing a softening of the stance against public cloud by financial services. Most everyone is finding public cloud use cases for specific workloads. The trick is in integration. How do you make public cloud look like a seamlessly integrated resource in the context of the larger environment? Much of this is happening now via integration with open source projects like OpenStack, Kubernetes, Mesos, and Docker — helping those OSS projects to mature. (VMware)
Question: To what extent does overlay networking need to integrate into projects like Mesos and Kubernetes?
- These projects were intended to deliver microservices. Changing their architecture would be difficult. But as long as they are pluggable, we can move ahead with integration. But it’s also an issue of mutual education. Keep the vendors talking to the open source projects and vice-versa.
- Overlays hide data, and that’s a security risk. Visibility into the tunnel is exciting for customers. How do I see what’s going on? Therefore, adoption will depend on tools. These are things that will make the change happen.
- The networking solution needs to align with the different way that Kubernetes spawns compute jobs. Declarative model must work the same way for networking tools as it does for compute.
- Majority of encapsulation is happening often at the hypervisor or container layer. Not in hardware. It’s most compelling when it happens at the endpoint, because that gives the software the most flexibility and power. Hardware VTEPs are an exception, not the rule – got to be a specific use-case to drive that. So, networking vendors talking about this topic is a little odd, and presumes a wrong focus. Thus, security posture is driven at the endpoint, not down inside of the network transport layer, and maybe not managed by network infrastructure. (Arista)
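For reference, the encapsulation under discussion is typically VXLAN (RFC 7348), whose header is only eight bytes, which is part of why endpoint software can apply it cheaply. A quick sketch of packing and unpacking that header (the VNI value here is arbitrary):

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field carries a valid ID

def vxlan_header(vni):
    # RFC 7348: 8 bytes -- flags(8) | reserved(24) | VNI(24) | reserved(8)
    if not 0 <= vni < 2**24:
        raise ValueError("VNI is a 24-bit value")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def vxlan_vni(header):
    # Recover the VNI from the second 32-bit word.
    _, word2 = struct.unpack("!II", header)
    return word2 >> 8

hdr = vxlan_header(5001)  # e.g. segment 5001 for one tenant's virtual network
print(len(hdr))           # 8
print(vxlan_vni(hdr))     # 5001
```

The 24-bit VNI is also why overlays scale past the 4K limit of 802.1Q VLANs: roughly 16 million segments instead of 4,094.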
Question: Do we have a reasonable set of tools available to effectively support an overlay deployment?
- It’s coming along, but the general problem is that this is not just an application problem. Endpoint management is hard due to diversity. This is not just a server problem. Servers aren’t even a problem, as we have control over them. It’s mobile, laptop, executive exceptions, etc. (Pluribus)
- Working with an ecosystem of partners via an API makes the integrations happen. Build your own toolkit is also an option. (Cisco)
- Overlays as a whole are past “are we there yet” questions. Growing rapidly. For instance, the OVN project is seeing a lot of developers joining from a variety of companies. Maybe not ready for production as yet, but it is rapidly heading that way. Perhaps in the next release of OpenStack we’ll see OVN as an available option.
- Visibility is still an issue for lots of customers. For example, customers objected to IP-in-IP or MPLS implementations back in the day due to a lack of visibility, and that objection still exists today. But with most overlay implementations, there are tools available to assist. It’s a matter of taking what’s available and customizing it for consumption by the local organization.
The Great Debate: Will Software Make Everything Better?
Debaters: Dr. Douglas Comer (pro-hardware) vs. Dr. P. Brighton Godfrey (pro-software)
Pro-software position (Dr. Godfrey)
What are the grand challenges in networking?
- 7 years ago, an informal survey pointed to “my project,” “security/privacy, reliability, usability,” and “no specific challenge.”
- Move to today, we look at ONUG working groups. Virtualization, visibility, security, automation.
Looking at each of these…
- Virtualization is about turning hardware into software as much as possible. Open APIs are key.
- Visibility is about correlating data sources. A big data problem.
- Security is an issue of understanding applications and moving with applications, wherever they are. So, if we have to keep up with software applications, we need security to be as agile as software.
- Automation represents an important software potential. Today, human change management processes take the place of automatable processes. We need to shift forward using software.
Conclusion – software is our only hope. Thus, it better work! Two examples.
- Software defined data centers are virtualization, automation, and evolvability wins.
- WAN traffic engineering (Google, Microsoft) are wins for efficiency, automation, and visibility.
However, there are open questions. We must tease out the killer apps, as existing success stories don’t span all enterprise needs. Reliability remains a challenge. Etc. (I couldn’t type quickly enough.)
The only hope is a software-defined world. The remaining question is how narrow a scope can be defined for hardware.
Pro-hardware position (Dr. Comer)
Humorous note, we seem to be having software troubles bringing up Dr. Comer’s presentation.
Software is bug-ridden. Comer points out some stories from earlier in the summer of 2015 that major impacts to the NYSE and more were the result of bugs. He points out that software bugs are not merely annoying. Lives can even be lost.
Why is it that true hardware engineering expertise seems easier to come by than true software engineering expertise? Also note that hardware is legally sold with a warranty, while software can be sold as-is.
Comer pops a slide titled, “The Next Version Will Be Worse.” Each generation is more bloated, more overfeatured, and badly conceived. As Fred Brooks warns, “Beware the second system syndrome!”
Software engineers believe that if it ain’t broke, it doesn’t have enough features.
Okay, so based on all of that…are we really supposed to believe that if we put more software in the network, it will be better? REALLY? How does this make any sense at all?
Conclusion: Software is fraught with peril. We’ll have lots of problems. Everyone has gone crazy. Software is a drug: just say no!
Question: Not really a question, but a comment from a self-described “old hardware engineer” that software bugs indicate lousy engineering.
- Hardware mentality is that “we better get it right because it’s going on a chip.” Software mentality needs to get there because it’s going to be driving critical infrastructure. (Godfrey)
- Hardware engineers are likely to solve a given problem in the same way. Software engineers are likely to solve the problem in different ways. We need to stick to tried and true design rules. (Comer)
Question: We use AGILE & LEAN techniques in our software development process. Please comment.
- New techniques are coming out all the time. We go from one to another to another new technique. The flavor of the day is not going to be the solution. (Comer)
- In many cases, we’re not replacing hardware with software. We can take ideas from hardware design and apply them to software through unit testing and controller testing in advance of deploying. Is that AGILE? Will that move us to high reliability? We’ve got to have processes in place to build reliable software to support critical infrastructure. (Godfrey)
Question: Automation is misconfiguration at scale. Problems show up in hardware, too — sometimes only in weird scenarios. Then I have to upgrade it. But if I have a software problem, all I have to do is upgrade. Comments?
- What is even more insidious is intentionally introduced flaws in hardware due to a compromised supply chain. At least with software, we can fix it. (Godfrey)
- Sure, there are failures. Nothing is perfect. But compared to hardware, software is far more failure prone. (Comer)
Question: Software has enabled far more things than hardware by itself. The number of CS students is far higher than the number of EE students who want to do hardware.
- Oh, sure. They want to do software because it’s so much easier than hardware. Much laughter from audience. (Comer)
- We do have to get software right. We don’t have any choice. We can capitalize on a huge array of innovation opportunity. The bug discussion is pointing to being forced to build system (or system of systems) with software. We don’t have a choice. (Godfrey)
Question: Vendors seem to believe in ASICs in certain places.
- Yes, hardware is the performance winner. And easy point to make. (Comer)
- What functions do we need to hardware-ize? FPGAs to build hardware in switches, the emerging P4 language, etc. How little hardware we can get away with is the real issue. Of course we’ll always require physical hardware, but how much can we standardize it? That allows us to execute on modular design. Also, there’s more to performance than line rate on a box. In a network, there is a much larger topology to consider. To improve network performance, analysis of the entire system is required. Thus, the future of performance in broader systems will require software. (Godfrey)
Question: Automation is often about eliminating humans for certain routine tasks. Comments?
- Humans are even more buggy than software. (Godfrey)
- Humans are better than software. (Comer)
Question: Creating a new chip every time you want something new seems unrealistic. Comments?
- Well, it’s true with software that you can create something that will fail very fast. (Comer)
- With software, you end up with something that will change quickly and can evolve. Evolvability is key. (Godfrey)
Day 2: 5-November-2015
ONUG has morning meetings that are not open to media folks. As a result, I was not able to live blog. The afternoon sessions are commencing, and I’ll blog what I can on the following sessions.
- Why Computer Scientists are Running Open Cloud Infrastructure
- Town Hall Meeting: How to Use Hybrid Cloud Services with Split Application Architecture
Why Computer Scientists are Running Open Cloud Infrastructure
Moderator, Eric Hanselman (451 Research).
- Dr. Aditya Akella (University of Wisconsin)
- Pablo Espinosa (Intuit)
- Steve Russell (Morgan Stanley)
- Harmen Van der Linde (Citigroup)
Question: Is there a battle between increasing automation and maintaining deep system knowledge?
- I work across many teams. Going to cloud has meant an integration of many different technology silos. There is an evolution. (Van der Linde)
- We have over 10K switches and routers, so we’re doing “automation” in the form of expect scripts, which is a step in the right direction. There was a big desire for an SDN function coming from the server team. But when you boiled it down, they wanted to be able to add an “ip helper” — that was it. There’s not an inherent trust between the silos, and so it wasn’t easy to delegate that to them. All parties need to get comfortable around the table with each other. (Russell)
- Running over 1K change tasks per month. Many were already automated. So, how do we put those functions into the hands of others to help them achieve what they are trying to get done? (Espinosa)
- We need to eliminate the specialized skills. I need full stack engineers to be successful moving forward. (Van der Linde)
- Expressing intent at a higher level sometimes translates to actions that can be easily templated. At the same time, there are complex actions touching multiple devices that must be done manually. SDN can be helpful dealing with the more complex tasks. (Dr. Akella)
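The “ip helper” request mentioned above is a good example of easily templated intent. A minimal sketch (interface name and server addresses are made up) of rendering the config snippet the server team actually wants applied, whatever tool ultimately pushes it to the device:

```python
def ip_helper_config(interface, helpers):
    # Render an IOS-style DHCP relay snippet for one interface.
    # The delivery mechanism (expect script, API, config tool) is
    # deliberately out of scope -- this is only the templated intent.
    lines = [f"interface {interface}"]
    lines += [f" ip helper-address {addr}" for addr in helpers]
    return "\n".join(lines)

# e.g. point a user VLAN's DHCP relays at two (hypothetical) DHCP servers
snippet = ip_helper_config("Vlan100", ["10.1.1.5", "10.1.1.6"])
print(snippet)
```

Because the rendered output is deterministic and reviewable, this is the kind of change that can be safely delegated across silo boundaries, which speaks directly to the trust problem Russell raises.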
Question: It seems there need to be studies in specific areas to understand how they are impacted.
- Understanding complex processes is key to knowing what tools can be applied to the problem. Some experimentation can be done in clusters where you can keep the blast radius to a minimum. (Russell)
- There are some things that the application expects the network to do (fault tolerant routing), but other things that the application must do for itself. There needs to be an understanding of the boundary that separates what the application does and the network does. (Dr. Akella)
Question: What is the delineation point between what developers need to know about the network, and vice-versa?
- There must be community as well as cooperation. But there must also be clear lines drawn. From a skillset perspective, there’s always a lot of ramp up for all these teams. Basic software engineering and basic API design is often missing. Which is reasonable. Folks are good at what they do, but you need to beef up the skillset. (Van der Linde)
- Oddly, people who are doing ‘devops’ are doing horizontal tasks like moving IP addresses. Why? You do need to have exposure to other skillsets. But there’s a cultural challenge in understanding the flow itself. You want the guy that’s really good at networks, but you really want the guy that can diagram what happens when you do a vMotion and explain all the caveats throughout the system. (Russell)
- The idea of the integrator – focused on taking systems & staff and integrating applications into the infrastructure? You still need deep knowledge by vertical, but we’re seeing the need for that intermediate player who understands the whole stack – an interesting shift. (Espinosa)
- I would like to see people I’d consider “cloud architects.” They don’t have to be deep technology experts so much as deeply understand workflow. (Van der Linde)
- We haven’t created a cloud architect as such. We have a “five guys” problem, where it’s always the same five guys on the phone figuring out the problem. So, how do we make it six guys? Seven? And if we can grow those people internally, how do we feed them problems? We’ve chosen to feed them complex operational problems as a way to grow them. We want those people spread throughout the organization. We want them to be culture carriers that lift up the organization. You’ve got to find a way for them to be in the organization to affect it. (Russell)
- What kind of collaboration culture leads to less buggy code? (Dr. Akella)
Town Hall Meeting: How to Use Hybrid Cloud Services with Split Application Architecture
Moderator: Greg Lavender (Citigroup)
- Martin Casado (VMware)
- Tim Hockin (Google)
- Madhu Venugopal (Docker)
- Ken Duda (Arista)
- Tobias Knaup (Mesosphere)
- Dave Ward (Cisco)
Question: How can we do hybrid cloud — an extension of our private cloud data centers? Our WAN edge is key. The challenge is financial regulation. We need good audit controls, QoS controls, etc. We need visibility out to the cloud. How do we approach this problem?
- This is an SDN problem. Not simply lighting up a path. The big challenge is the telemetry coming back out of that path. It’s cross-domain (inter-domain). The choices that need to be made, and haven’t been addressed by SDN, stem from being at the whims of how that traffic gets delivered across the public Internet. The I/O guarantees from public cloud are not part of the application definition. We measure quality as cycle completion time. Yes, we can get to the public cloud, but that’s not the definition of a performance metric. End to end, all the different pieces need to line up to make the most of bursting and get your money’s worth. (Ward)
- No shortage of challenges for hybrid cloud. Public clouds were not designed that way (for hybrid). So maybe we start by redesigning applications to work in the public cloud environment. Now we can re-think what requirements are needed for hybrid cloud. Walk before you can run. Then work towards synchronizing state on the backend, etc. (Duda)
Question: How are containers impacting us?
- Containers are an interesting application abstraction. As infrastructure guys, we know how machine abstractions that have familiar interfaces can be managed. This is why existing tools can plug into these functions. (Casado)
- Containers are not just VMs – they represent flexibility. They can come and go. They are simple. A container encapsulates not just the namespace, but also the packaging – a lightweight set of packages. They can spin up in milliseconds. Not cattle, but ants. But some people are still thinking of containers as VMs, and as containerization scales, it isn’t manageable if you treat every container as precious. (Venugopal)
- As developers, containers play well with things we already understand. (Hockin)
- Because containers are so lightweight, we use a lot more of them. But we need to start thinking about things in the scheduler to be aware of things such as network latency between containers and take that into consideration when spinning them up. (Knaup)
Question: How do I keep my addressing, NAT, etc. straight when containers live a very short life?
- We absolutely must automate. The days of managing these functions manually are gone. Devops culture must go into every aspect of what we do. (Venugopal)
- Networking for containers is in its infancy. Insert your plugin here. Useful, but not anything like mature. Host container dependency on Linux kernel networking is a problem – I’d like to see that all get into user space. For the massive agility you want, where a certain amount of networking services live around the containers, it’s a huge challenge. But it’s a massive challenge in hypervisor networking as well. (Ward)
- We’ve been flubbing this problem as an industry for years. Somehow, we’re supposed to go from a high-level understanding of objects and systems to a low level. This is hard in networking. IP addresses are meaningless and topology dependent. Distributed state maintenance is really hard with people. We need to talk to devices, but the canonical addressing is different by vendor. Hard. We need an “erector set” that lets us build an abstraction that allows us to build a policy language that works across all network infrastructure. We’re in the chaos phase. And we’re not close yet. We’re in the golden era of distributed systems finding their way into networking. (Casado)
- Network virtualization is struggling because of all the competing views. You need a physical infrastructure that supports all the different approaches. We don’t need a vendor saying “we solved it.” Because they haven’t. (Duda)
- Do discovery through the RPC stack prescribed for the applications, and set policies using higher level primitives like layers or environments. That moves it closer to the application layer. Early companies of today are paving the way. (Knaup)
- Nothing is the same as it was ten seconds ago, and we need to accept this at the application level. This brings enormous complexity. But it's necessary, because almost all applications assume stability: that nothing changes (like an IP address). Converting to a model of thinking in terms of "I can look up a DNS name" will take a while. Bad thinking needs to evolve out. (Hockin)
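The addressing question above is answered in practice by automating it away. Here is a minimal, hypothetical sketch of an address allocator for short-lived containers, using only Python's standard `ipaddress` module; the subnet and container names are invented, and real IPAM systems add persistence, locking, and lease expiry.

```python
# Hypothetical sketch: programmatic address management for short-lived
# containers, so nobody tracks leases by hand. Subnet and names are invented.
import ipaddress

class Ipam:
    def __init__(self, cidr):
        net = ipaddress.ip_network(cidr)
        self.free = list(net.hosts())   # usable addresses, in order
        self.leases = {}                # container name -> address

    def allocate(self, name):
        addr = self.free.pop(0)
        self.leases[name] = addr
        return addr

    def release(self, name):
        # When the container dies, its address goes straight back to the pool.
        self.free.append(self.leases.pop(name))

ipam = Ipam("10.0.0.0/29")
a = ipam.allocate("web-1")   # 10.0.0.1
b = ipam.allocate("web-2")   # 10.0.0.2
ipam.release("web-1")        # 10.0.0.1 is immediately reusable
```

The point is not the allocator itself but the workflow: when containers live for seconds, address assignment has to be an API call inside the orchestration path, not a spreadsheet.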
Question: We need better tools & telemetry in a complex cloud stack. Yes?
- Yes, absolutely. At Google we have instrumentation of everything. We can see what changes have been made when there’s something wrong and reverse that change. In the Google world, everyone can see via public consoles everything that is going on. The first time you have an outage you can’t explain, you realize you should have logged that. (Hockin)
- The next step is building a good anomaly detection system to pick up out of spec telemetry faster than a human could. (Knaup)
- If you have a network and do an overlay, it’s all so confusing. When you build tools, you focus on the infrastructure first, and the operational tools later. It’s backwards, but that’s the reality. There is an extra addressing layer being introduced — which makes everyone nervous. The bottom line is that if you’re dealing with a distributed system, it’s really hard. Having the supporting tool chains is necessary, but will take a long time. (Casado)
- I would like developers to be "no stack" and not "full stack". They shouldn't have to know…but that's a couple of years from now. Finding an individual who is developing an app and still has enough cycles to keep up with everything going on in infrastructure? No way. We need to get to policy, or intent, or big data…whatever it is where developers don't have to worry about the specifics. (Ward)
- There is a huge amount of information to digest in networking. Getting data out of networking in general is hard. This must change. (Venugopal)
Question: Building an app cloud-native makes it much easier to influence the infrastructure. Yet, there’s not much development in this way. There aren’t great blueprints out there for how to do this (build a cloud app and horizontally scale it). We aren’t taking advantage of cloud infrastructure. Thoughts?
- The future is now. No reason to hold back on this. (Venugopal)
- We are at a point in our industry where we realize we've written our applications wrong. But we can't easily fix this. Docker makes it easier to bridge this architectural gap. Running multiple apps in a single container is not best practice, but it works. You can do it. That helps while microservices are peeled off one at a time into their own containers. Containerized VMs are a halfway step. (Hockin)
- We are in this awkward phase, but also a frustrating phase. NFV, for instance. It's a boring "lift and shift" of physical to virtual. That doesn't translate at all well to containers – NFV will need to change to take advantage of microservices architectures. Right now, it's mayhem out there. It would be nice if everything was cloud-native, and there's a strong advantage to this. But we aren't there yet. (Ward)
Question: Orchestration becomes key here, whether it's Kubernetes or something else.
- Distributed systems are a very hard problem to solve. To make it happen, you need really good people to properly drive cloud-native systems. (Casado)
There were some audience questions after the official moderated session with miscellaneous discussion, but they were hard to hear and I lost the thread of the conversation as a result. I'm calling an end to this live blog for ONUG Fall 2015.
I was going back through this post, and I wrote a lot. Perhaps not all of it was scintillating. Live blogging is a challenge – you're trying to capture important ideas coming from speakers in real time, and not all speakers are gifted with getting to the point. Other speakers have rich thoughts tumbling one over the other, which are hard to distill.
I think the main takeaway from this specific blog is that there are big ideas to find if you parse through what people are saying. Think through what the end users and vendors are seeing, consider your own business, and then jump from there into your own exploration.