Today on the Tech Bytes podcast we talk cloud storage. More specifically, we dive into why it’s time for NFS to sail off into the sunset, particularly for cloud datasets. Our guest is Tom Lyon, an industry legend who has delivered a talk entitled “NFS Must Die.” We talk with Tom about the strengths and weaknesses of NFS, the benefits of object storage, and a new storage project he’s working on. While our sponsor is MinIO, Tom is not an employee or representative; he’s here to share his insights on the best way forward for cloud storage.
Episode Guest: Tom Lyon
Tom Lyon is a computing systems architect, a serial entrepreneur and a kernel hacker. Prior to founding DriveScale (sold to Twitter in 2021), Tom was founder and Chief Scientist of Nuova Systems, a start-up that led a new architectural approach to systems and networking. Nuova was acquired in 2008 by Cisco, whose highly successful UCS servers and Nexus switches are based on Nuova’s technology. He was also founder and CTO of two other technology companies. Netillion, Inc. was an early promoter of memory-over-network technology. At Ipsilon Networks, Tom invented IP Switching. Ipsilon was acquired by Nokia and provided the IP routing technology for many mobile network backbones. As employee #8 at Sun Microsystems, Tom was there from the beginning, where he contributed to the UNIX kernel, created the SunLink product family, and was one of the NFS and SPARC architects. He started his Silicon Valley career at Amdahl Corp., where he was a software architect responsible for creating Amdahl’s UNIX for mainframes technology. Tom holds numerous U.S. patents in system interconnects, memory systems, and storage. He received a B.S. in Electrical Engineering and Computer Science from Princeton University.
Full Episode Sponsor: MinIO
MinIO is pioneering high-performance object storage for AI/ML and modern data lake workloads. The software-defined, Amazon S3-compatible object storage system is used by more than half of the Fortune 500. With 1.5B+ Docker downloads, MinIO is the fastest-growing cloud object storage company and is consistently ranked by industry analysts as a leader in object storage. Founded in November 2014, the company is backed by Intel Capital, Softbank Vision Fund 2, Dell Technologies Capital, Nexus Venture Partners, General Catalyst and key angel investors.
Episode Links:
Eminent Sun Alumnus Says NFS Must Die
NFS Must Die presentation on YouTube
Tom on Mastodon : @[email protected]
Tom’s Blog: https://akapugs.blog
Episode Transcript:
This episode was transcribed by AI and lightly formatted. We make these transcripts available to help with content accessibility and searchability but we can’t guarantee accuracy. There are likely to be errors and inaccuracies in the transcription.
Automatically Transcribed With Podsqueeze
Drew Conry-Murray 00:00:00 Today on the Tech Bytes podcast, we’re talking about cloud storage. More specifically, we’ll dig into maybe why it’s time for NFS to sail off into the sunset, particularly for cloud data sets. Our guest is Tom Lyon, a computing systems architect and entrepreneur. And I’m not going to read you Tom’s full bio here, but it’s impressive. He’s had a long career in the IT industry, including being the founder of multiple companies, an early employee at Sun Microsystems, and a contributor to the Unix kernel. Our sponsor today is MinIO, but Tom is not an employee of MinIO. He just wants to share his views on why object storage is a better option than NFS, particularly for data sets in the cloud. So, Tom, welcome to the podcast and before we dive into the it’s time for NFS to die, can you just give us a little bit of background on yourself and your career?
Tom Lyon 00:00:41 I’ve mostly been a networking guy since back at Princeton, when I was trying to get a PDP-11 to talk to an IBM mainframe. And, that was most of my career at Sun Microsystems, where I was one of the NFS contributors and started several companies after that. A couple of them have done well financially, but people haven’t heard of them these days, so we’ll skip that.
Drew Conry-Murray 00:01:09 Okay. Well, I guess it’s a good sign if you start a successful company and then it just gets absorbed into another company, is what I assume happened.
Tom Lyon 00:01:15 I am, by the way, retired for the third time, and my wife has promised to kill me if I start another company. So what I’m talking about today is an idea for more of an open source project.
Drew Conry-Murray 00:01:27 Okay, so let’s dive into that then, because sort of the theory of this recording is, NFS must die. So let’s start out. Why?
Tom Lyon 00:01:36 Well, NFS was a very carefully selected set of trade-offs chosen 40 years ago, and next year it will be officially 40 years since NFS shipped. And the whole paradigm doesn’t quite work in the cloud age. We’ve learned a lot about distributed systems since then. If you look at, you know, cloud storage, right, there are three types. There’s block storage, object storage and file storage. Object is really the only cloud native storage technique, pretty much invented by AWS so they could have something simple that scaled to a worldwide degree. You can have a browser in Botswana and access a server in Cydonia [Sedona], all of it residing on data centers in Virginia or whatever. But it’s a very impressively scalable system. Files just have never scaled to that extent. Blocks aren’t really there for scaling, but they work very, very well because they have trivial semantics. You can read and you can write, and that’s pretty much it. So if you look at files, what’s the problem with NFS? And it’s not really just NFS, it’s pretty much any approach in network file sharing. And so the first thing that’s wrong with it is the paradigm is just all wrong. It encourages multiple users to access shared writable data spaces.
Tom Lyon 00:03:04 And one of the most important lessons we’ve learned in distributed computing is that shared mutable data is a really bad idea. Things get out of sync. You can’t control, synchronize multiple actions. Everyone has a slightly different view of the data, so it’s a huge mess and you can’t really build strong semantics on top of that. And this is usually talked about in the context of programming languages, like when you’re fighting over shared memory. So we’ve had languages like Erlang and Go that discourage it. And languages like Rust that really prohibit it. And they have a much more successful approach to concurrency. The same thing happens when you have distributed systems only now you have a flaky network in the middle as well. So things get much, much harder.
Ethan Banks 00:03:53 Flaky network as in, you can end up with partitions and separation of network segments that cause challenges for the distributed system.
Tom Lyon 00:04:02 Right. Right. Exactly. And you’re never sure: did the network go down? Did the remote end go down? Did I go down? You know, who knows.
Tom Lyon 00:04:11 So another big problem is everybody wants slightly different semantics, right? And NFS has a subset of POSIX. And it was always, you know, criticized for not meeting the POSIX standard even though that is in fact a moving target. And NFS and POSIX came out about the same time anyway. So the semantics are iffy. But particularly for POSIX, if you look at the modern understanding of networking, the CAP theorem, you mentioned partitioning. The CAP theorem says you can’t achieve both consistency and availability in the presence of partitioning. And in the NFS days, we kind of discovered this the hard way. And we had to make the option for soft mounts and hard mounts. So hard mounts if you wanted consistency guaranteed, so you’re not going to screw up your data, or soft mounts if you want availability, meaning you can actually get work done when the server crashes.
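To make the hard-versus-soft trade-off concrete, here is a minimal Python sketch. It is not NFS client code: the toy server and the names `make_flaky_server`, `hard_read`, and `soft_read` are invented for illustration. It only assumes what Tom describes, that a hard mount keeps waiting for the server while a soft mount gives up and hands the caller an error.

```python
import itertools


class ServerDown(Exception):
    """Raised by the toy server while it is unreachable."""


class SoftMountTimeout(Exception):
    """Raised to the caller when a soft-mount style request gives up."""


def make_flaky_server(failures_before_recovery):
    """Return a read function that fails a fixed number of times, then succeeds."""
    attempts = itertools.count()

    def read_block():
        if next(attempts) < failures_before_recovery:
            raise ServerDown("no reply")
        return b"data"

    return read_block


def hard_read(read_block):
    """Hard-mount style: keep retrying until the server answers.

    The caller never sees a failure, but it is blocked for as long
    as the outage lasts.
    """
    retries = 0
    while True:
        try:
            return read_block(), retries
        except ServerDown:
            retries += 1  # a real client would wait or back off here


def soft_read(read_block, max_retries):
    """Soft-mount style: retry a bounded number of times, then report an error.

    The caller stays responsive, but has to cope with an operation
    that may not have happened.
    """
    for _ in range(max_retries + 1):
        try:
            return read_block()
        except ServerDown:
            continue
    raise SoftMountTimeout("server did not respond")
```

With a server that fails five times, `hard_read` eventually returns the data after five retries, while `soft_read(..., max_retries=2)` raises `SoftMountTimeout` and leaves the decision to the program.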
Ethan Banks 00:05:07 You’re arguing that NFS must die, but yet it has had an awfully long run. As we pointed out, I mean, NFS is everywhere. It’s baked into everything. It’s used, very commonly. So is there a . . hmm . . . okay . . .
Tom Lyon 00:05:21 But it’s never used at scale, right. And the trouble with NFS is you pretty much need a human babysitting things when things go wrong. Because when something goes wrong, you’re never quite sure what it affects. And that’s what gets me about cloud scale things: things have to be 100% reliable to be deployed at the scale that happens in the cloud.
Ethan Banks 00:05:48 Okay. So, so the challenge here and what you’re advocating for is when you need to operate at scale. NFS and similar file oriented storage protocols are not the way to go. There’s too many problems to be able to function at scale without inconsistencies coming up in the data and so on.
Tom Lyon 00:06:06 And the semantics have never been well defined. So you can end up in situations where you think your program is behaving in a way that’s going to work fine on NFS, but then you get an updated library from somebody who did something that uses some new file system feature that doesn’t work, and it all starts to crumble.
Tom Lyon 00:06:27 So how do you prevent that kind of thing? And there’s really no definition of what semantics work well with files. There’s the local file systems, even ext4 and NTFS differ in subtle ways. There’s the system call interface, there’s protocol standards, there’s the POSIX standards. So which semantics are you going for? Nobody knows. And so I argue that the only reasonable mainstream thing is the semantics of a local ext4 file system, because that’s what everybody has. Therefore, if you make that work, it’ll work pretty much everywhere. So the semantics are messed up. And then, of course, the security model of NFS has never been usable in a multi-administration way. So you can’t have NFS sharing across administrative domains.
Drew Conry-Murray 00:07:17 So when we’re talking about scale, what kind of scale are we talking about? When does that sort of switch happen between NFS being inappropriate because of scale versus object storage?
Tom Lyon 00:07:28 Well, the killer feature of NFS is you can access arbitrarily large data sets with network oblivious programs and kind of poke around in a data set without having to copy the whole thing first. So that’s the killer thing. So how do you get that in a cloud scale way where I can use grep on some data set that somebody made available to the public and it happens to reside in AWS in the US, but I’m grepping from China. How do you make that work? That’s the cloud scale I’m talking about. Okay. And how do you make it really easy? You know, I have an approach to all this. And I gave a talk at the Netherlands Unix User Group starting to outline all that. But it gets kind of hairy to get into. But basically if you look at cloud scale file sharing there’s two really vivid examples today. One is Git. So the Linux kernel people develop it all over the world, I don’t know, 30,000 developers and it scales. Docker files are kind of similar, it’s spread all over the place. And what those things have in common is they use layered immutable versioning so that anything you share is fundamentally immutable. And when the next version comes along, it’s a new layer on top of the old stuff, and you can never share anything until it becomes immutable. Because it’s immutable, it can be copied and cached arbitrarily for arbitrary amounts of time, and it never needs to be invalidated. So if you take the same approach in a way that’s much friendlier to arbitrary POSIX programs, you can build a much, much better approach.
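The point that immutable data can be cached indefinitely can be sketched in a few lines of Python. This is an illustration only, not code from Git, Docker, or Tom’s project. The class names `ImmutableStore` and `ForeverCache` are hypothetical, and keying each blob by the SHA-256 of its content is a choice made for this sketch so that a key can only ever refer to one piece of data.

```python
import hashlib


class ImmutableStore:
    """Toy origin store: each blob is keyed by the SHA-256 of its content."""

    def __init__(self):
        self._blobs = {}
        self.fetches = 0  # counts how often the origin is actually hit

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs[key] = data
        return key

    def get(self, key: str) -> bytes:
        self.fetches += 1
        return self._blobs[key]


class ForeverCache:
    """Cache in front of the store with no expiry and no invalidation.

    A key names exact content, so a cached entry can never go stale.
    A new version of the data simply gets a new key.
    """

    def __init__(self, origin: ImmutableStore):
        self._origin = origin
        self._local = {}

    def get(self, key: str) -> bytes:
        if key not in self._local:
            data = self._origin.get(key)
            # The key doubles as an integrity check on whatever was fetched.
            if hashlib.sha256(data).hexdigest() != key:
                raise ValueError("blob does not match its key")
            self._local[key] = data
        return self._local[key]
```

Reading the same key twice through `ForeverCache` touches the origin once. Publishing a changed blob produces a different key, so readers of the old version are unaffected.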
Ethan Banks 00:09:23 Caching being critical to the scale aspect of this. We can’t, we can’t scale big without the cache.
Tom Lyon 00:09:29 Plus, if you’re using a normal Linux file system, it expects to be able to cache blocks underneath and that’s really critical for performance. I mean, I use NFS at home, you know, because I’m one user, five machines or whatever. It’s still particularly slow every time I want to list a large directory. So there’s ways you can achieve it. What I have in mind is basically you access the data set. When you access it, you get your own writable layer on top of the immutable data set. So it looks like a writable file system. You can use whatever programs you want, but when it comes time to share that, that’s when you have to freeze it and make it immutable. So you have the equivalent of a git commit and at that point off you go.
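The idea of a private writable layer over immutable layers, frozen at share time, can be modelled with plain dictionaries. The sketch below is a toy, not an implementation of Tom’s project (he notes there is no code for it yet) and not a real file system. `LayeredDataset`, its methods, and the deletion marker are all hypothetical names for illustration.

```python
from types import MappingProxyType

_DELETED = object()  # marker recording that a path was removed in a layer


class LayeredDataset:
    """A private writable layer stacked on top of shared, read-only layers."""

    def __init__(self, frozen_layers=()):
        self._frozen = tuple(frozen_layers)  # oldest first, never modified
        self._upper = {}                     # this user's writable layer

    def write(self, path, data):
        self._upper[path] = data

    def delete(self, path):
        # Hide the path without touching the shared layers underneath.
        self._upper[path] = _DELETED

    def read(self, path):
        # Search the writable layer first, then frozen layers newest to oldest.
        for layer in (self._upper, *reversed(self._frozen)):
            if path in layer:
                value = layer[path]
                if value is _DELETED:
                    raise FileNotFoundError(path)
                return value
        raise FileNotFoundError(path)

    def freeze(self):
        """Rough equivalent of a commit: the writable layer becomes read-only.

        Returns the full stack of frozen layers, which can be shared freely
        because nothing in it can change any more.
        """
        committed = MappingProxyType(dict(self._upper))
        self._frozen = self._frozen + (committed,)
        self._upper = {}
        return self._frozen
```

Two users who start from the same frozen stack can each write and delete without seeing each other’s changes. Only the layers returned by `freeze()` are ever handed to anyone else.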
Ethan Banks 00:10:24 Okay. It’s very different from file locking let’s say.
Tom Lyon 00:10:28 Right. This is more, you know, we get rid of doing anything at the file level and do it at the file system layer level. So the unit of administration and tracking becomes a lot more granular. The thing that enables the layering is the overlay FS which is also used in Docker. And it’s a great grandchild of what we at Sun called the translucent file system, which I’m one of the patent holders on. This stuff has been living in my head ever since the NFS days. So how can we actually do this right?
Ethan Banks 00:11:01 So, so how close is what you’re describing to, if we were to examine S3 as a protocol, is it, is it quite similar, quite different?
Tom Lyon 00:11:11 Well it’s quite different. S3 is the object protocol, right? And what I’m proposing is not really about the protocols. It’s more about leveraging existing file systems and overlay FS with existing protocols. You need a block protocol for writable storage. But as soon as you freeze it, you can snapshot it to your object storage system. And that gives you worldwide access in a fairly trivial way. But you’re snapshotting again at the file system level. So it’s a much, much more granular thing. So you can totally use this with object storage, except for that first writable layer where you either want local writable storage or something coming over NVMe over Fabrics or something like that.
Drew Conry-Murray 00:12:05 So what is the name of this project you’re talking about?
Tom Lyon 00:12:08 I call it Beyond File Sharing, BYFS, for short. The downside of all this is I am not capable of engineering anymore. So there’s essentially zero actual progress on this. But that’s why I’m soliciting collaborators, especially people for whom this might be a real problem. We need the users to help drive an open source project.
Ethan Banks 00:12:31 So, Tom, you’re looking for a specific sort of a user that would be interested in this?
Tom Lyon 00:12:34 Somebody for whom, you know, cloud scale file sharing is a problem. And, I don’t know for sure, but I imagine a lot of the AI stuff where the data sets are gigantic. They want to distribute data to thousands of GPU nodes and do it in some consistent way. So they probably have a huge versioning problem. And there’s been lots of other layered immutable data facilities, like Delta Lake in the cloud, which is very popular in the, I forget the name of the company, but in the Java world. And there’s a system called Pachyderm, whose original tagline was Git for your data. So there’s a lot of similar approaches, but none of them accomplish the NFS feature of never having to copy stuff before you use it.
Ethan Banks 00:13:28 That is, being able to do, do it over the network and have it, have the network be transparent to that transaction.
Tom Lyon 00:13:34 Right. And because of the CAP theorem, it’ll never really be entirely transparent. But, again, in the cloud it’s not a human very often who’s doing this stuff. It’s deeply embedded. So a person using the system is now some kind of process or job or agent or something. And if something goes wrong because of the network, the right thing to do is just restart everything, restart that job. Humans don’t like to be restarted, but in the cloud it’s totally the right thing to do.
Drew Conry-Murray 00:14:05 So I’ve got a couple of takeaways here. One is you’ve got this idea Beyond FS, you’re looking for collaborators, but you’re also saying that this can tie into object storage underneath.
Tom Lyon 00:14:17 Absolutely. You know, object storage is the way to do cloud storage. You know, it’s cheap. It’s fast enough. File sharing in my mind in the cloud is just an abomination left over from the enterprise world. And block storage, it’s fine. You know, it has very well-defined semantics and is necessary for compatibility of all these whole operating systems.
Drew Conry-Murray 00:14:46 Okay, so what I’m taking away then is that, and you’re saying object storage is ideal for public and private cloud because of performance, scale and cost?
Tom Lyon 00:14:57 Right. Yeah. I mean I talk about cloud. There’s an awful lot of cloud native things happening on prem these days as well. And just the ratio of machine to user, you know, is a thousand times bigger than it used to be or at least a thousand times. So things have to work really, really well, in well-defined ways, in scalable ways, to have a place in the cloud native world.
Drew Conry-Murray 00:15:27 So is there a place folks can go if they’re interested in Beyond FS and want to take a look at it or see if they want to become a collaborator?
Tom Lyon 00:15:34 There are both slides and a video from the talk I gave. This was at the Netherlands Unix Users Group, which is a bunch of old Unix, old farts like myself, but you can find that on the web, and I think we’ll reference it in the show notes.
Drew Conry-Murray 00:15:52 Yeah. We’ll absolutely have those links in the show notes. Well, thank you, Tom, for joining us and we will have those links in the show notes if folks are interested in the Beyond FS project and want to get a look at it and see if they’re interested in collaborating. We’ll also have some ways to contact them to get in on this.
Drew Conry-Murray 00:16:07 We’d also like to thank MinIO for sponsoring this episode. If you would like to find out more about what MinIO is up to in regard to object storage, just head on over to min.io. You can get more details and read their technical blogs, or download it and try it for yourself. Thank you for listening. You can find this and many more fine free technical podcasts and our community blog, it’s all at packetpushers.net. You can find us on LinkedIn, you can hear us on Spotify and if you would, leave us a rating on Apple Podcasts. And last but not least, remember that too much networking would never be enough.