For some, storing data in the public cloud isn’t going to happen. You know who you are, and you have your reasons, most of them, I suspect, centered on the fact that your data is the focal point of your business. For instance, you work with devices that generate an enormous amount of sensor data. Or you’re in media streaming, where ever-increasing video resolutions drive up storage requirements. Or you’re in biotech, where scientific equipment costs less than the storage needed to handle the data it generates.
In all of these cases, the data in question is both voluminous and growing, and it is not going away. If these challenges sound familiar, Igneous Systems has built a storage platform for you to consider. Igneous bills themselves as “Infrastructure as a service for data you can’t or won’t move to the cloud.”
Highlights of the Igneous Systems storage platform.
Cloud services. Rather than buying an Igneous Systems platform up front, you pay via a subscription model. Igneous owns and manages the system, so you aren’t stuck with a depreciating asset or an operational burden. Igneous couples this idea with an S3 API-driven, developer-centric interface, and bills the platform as “Zero-Touch Infrastructure.” Igneous feels so strongly about “Zero-Touch Infrastructure” that they registered a trademark.
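Since the interface is S3-compatible, talking to the array should look much like talking to any other S3 endpoint. Here’s a minimal sketch in Python using boto3; the endpoint URL, bucket, key, and credentials are hypothetical placeholders of my own invention, not anything from Igneous documentation.

```python
import boto3

# Hypothetical endpoint and credentials for an on-premises Igneous array;
# the real values would come from your own deployment, not this sketch.
s3 = boto3.client(
    "s3",
    endpoint_url="https://igneous.example.internal",
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Ordinary S3 calls run against the local array instead of AWS.
s3.put_object(Bucket="sensor-data", Key="runs/run-042.bin", Body=b"...")
obj = s3.get_object(Bucket="sensor-data", Key="runs/run-042.bin")
print(obj["ContentLength"])
```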
On-premises & data-centric computing. Igneous is targeting folks with large amounts of data who need that data to be local. How much data? Igneous pricing is ~$40K per year for 212TB of storage, and storage is purchased in 212TB chunks. Therefore, if you need, say, 300TB, you’ll pay Igneous Systems $80K per year for a 424TB system (2 x 212TB).
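The chunked pricing is easy to model. A quick sketch of the arithmetic, assuming the ~$40K-per-212TB figure above holds at every scale:

```python
import math

CHUNK_TB = 212           # storage is sold in 212TB increments
CHUNK_PRICE = 40_000     # ~$40K per chunk, per year

def annual_cost(needed_tb: int) -> tuple[int, int]:
    """Return (provisioned TB, annual cost in dollars) for a given need."""
    chunks = math.ceil(needed_tb / CHUNK_TB)
    return chunks * CHUNK_TB, chunks * CHUNK_PRICE

print(annual_cost(300))  # (424, 80000): 300TB rounds up to two chunks
```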
What Igneous Systems storage looks like on the inside.
Arguably, you don’t need to know what an Igneous Systems drive array looks like on the inside. Igneous does the heavy lifting of owning and managing the box, leaving you with subscription-based, consumable storage that developers can get at with an API. If it performs and meets your needs, do you care what’s under the hood?
Sure you do. Let’s take a peek at just a few highlights.
Igneous acknowledges that there is a lot of open source storage software out there, and that existing software presents a choice: does it make more sense to use what’s already available, or to build something new?
Ceph caught Igneous Systems’ eye because the Ceph project had addressed some difficult scale-out problems. For instance, how do you lay data out across a distributed system with hundreds or thousands of nodes while still knowing where the data is? Ceph’s CRUSH algorithm is excellent at this. Ceph is also good at monitoring system health and data integrity right down to individual drives. And Ceph favors appending data over overwriting it, which appealed to Igneous for performance reasons.
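To make the CRUSH appeal concrete, here’s a toy stand-in for the core idea (mine, not Ceph’s actual algorithm, which also weights devices and respects failure domains): placement is computed from the object’s name rather than looked up in a central table, so any client can find data independently.

```python
import hashlib

NODES = [f"node-{i}" for i in range(100)]   # a pretend 100-node cluster

def place(object_name: str, replicas: int = 3) -> list[str]:
    """Compute placement from the object name alone. Any client running
    this function independently arrives at the same node list, so no
    central lookup table is needed."""
    chosen: list[str] = []
    attempt = 0
    while len(chosen) < replicas:
        digest = hashlib.sha256(f"{object_name}:{attempt}".encode()).hexdigest()
        node = NODES[int(digest, 16) % len(NODES)]
        if node not in chosen:              # one copy per node
            chosen.append(node)
        attempt += 1
    return chosen

print(place("sensor/2016/10/run-042.bin"))
```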
And yet, Igneous Systems is not employing Ceph. Why? Ceph doesn’t offer granular enough control over laying out data across the network. Also, Ceph is focused on scaling out block storage, while Igneous is actually an S3-oriented object store. OpenStack’s Swift might come to mind as an object-friendly alternative to Ceph, but Igneous rejected Swift as too inflexible for their design goals. Swift is also tuning-intensive, too painful to get down to the ultra-granular drive level that Igneous manages.
In the end, Igneous Systems built their own data path consisting of the following layers.
Object layer. This is the front end: how you’ll interface with the storage array. Access control, namespaces spanning the entirety of the infrastructure, metadata assigned to objects, and indexing of all the data live here.
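To make that list concrete, here’s a hypothetical sketch of what a record in such an object layer might track; the field names are mine, not Igneous’s schema.

```python
from dataclasses import dataclass, field

@dataclass
class ObjectRecord:
    """Hypothetical record shape; the fields mirror the responsibilities
    listed above, not any actual Igneous data structure."""
    namespace: str                                           # namespace spanning the infrastructure
    key: str                                                 # the object's key within that namespace
    acl: dict[str, str] = field(default_factory=dict)        # access-control entries
    metadata: dict[str, str] = field(default_factory=dict)   # metadata assigned to the object
    index_terms: list[str] = field(default_factory=list)     # terms feeding the data index
```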
Resilient store. Igneous points out that this layer is similar to what you find in Ceph. In the resilient store, Igneous decides where to put data so that in the event of a hard drive failure, the system can continue without data loss. This is done with erasure coding to spread data across disks. Data rebuilds also happen here.
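As a toy illustration of spreading data across disks with parity, here’s a single-parity 4+1 stripe in Python. Igneous uses stronger codes than plain XOR (more on that below), but the shape of the idea is the same: lose a shard, rebuild it from the survivors.

```python
from functools import reduce

def xor_bytes(*shards: bytes) -> bytes:
    """Byte-wise XOR across equal-length shards."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*shards))

def make_stripe(data: bytes, k: int = 4) -> list[bytes]:
    """Split data into k shards plus one XOR parity shard. Each of the
    k+1 shards would be written to a different disk."""
    shard_len = -(-len(data) // k)               # ceiling division
    padded = data.ljust(shard_len * k, b"\0")    # pad to an even split
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [xor_bytes(*shards)]

# Lose any single shard; XOR of the survivors rebuilds it exactly.
stripe = make_stripe(b"sixteen byte blk")
lost = stripe.pop(2)
assert xor_bytes(*stripe) == lost
```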
Disk layer. Igneous has granular control down to the disk itself. This is done with a nanoserver per disk: an inexpensive, dedicated ARM CPU hanging off the back of every spinning disk that translates Ethernet into SATA and back. The nanoservers are dual-plumbed into non-blocking Ethernet switches. The result is like a JBOD, but with dual Ethernet switches replacing the usual dual disk controllers. You end up with a 1:1 ratio of servers to drives, minimizing the fallout of a control node failure, and Igneous can control exactly how data is written to individual disks.
Yes, but…what about hardware failures in a turnkey, hands-off system?
Since you don’t own the drive array, an important question to answer is, “How are drive failures handled?” Normally, a disk failure is a big deal demanding the quick attention of an on-site operator. But in the case of Igneous Systems, there is no local operator managing the storage. The answer is that drive failures are not that big of a deal in an Igneous Systems array. Yes, they matter. Yes, they need to be addressed. But the impact of a drive failure on performance and data availability is minimal.
The reason for the low impact of failure is what I’ll term “the wide Igneous stripe.” Igneous’ specific layout scheme is 20+8.
Jeff Hughes, Igneous Systems CTO, summed up the scheme in a presentation to Tech Field Day this way.
For every 20 blocks out there, we generate 8 erasure encoded blocks, which are a combination of local repair codes, plus Reed Solomon. You could consider this like 20+3 if you just considered Reed Solomon, which is very similar to what either a RAID-DP uses or something like that, but then we also scatter in a number of these local parity blocks.
And this is something that we actually saw from folks like Microsoft and Facebook. You can go look at some papers around LRC. Which is this idea…if I lay out a big stripe like this, the problem comes when I need to repair. I have a huge I/O expansion. If I lose just one block, I have to read twenty blocks to replace that one block. And that becomes really expensive for just the usual case of losing a single device.
So, instead you chunk little sections of, in this case, 4+1, so you have an encoding block for every four blocks in this picture. And now, if I lose a single device which is the most common thing out there, I’m only having to read four other devices to reconstruct that data. And so the I/O expansion in those cases is greatly minimized.
The end result of this scheme is that background tasks don’t cripple array throughput. Rebuilding data around a failure doesn’t hurt much, because the amount of data that must be read to reconstruct what was lost is small, despite the wide stripe.
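To put numbers on the I/O expansion Hughes describes, here’s a back-of-the-envelope sketch based on his 20+8 description; the arithmetic is mine, not Igneous source code.

```python
# 20 data blocks per stripe, 8 parity blocks total (per Hughes).
DATA_BLOCKS = 20
GLOBAL_PARITY = 3                           # the Reed-Solomon-style blocks
LOCAL_GROUP = 4                             # each local group is 4+1
local_parity = DATA_BLOCKS // LOCAL_GROUP   # 5 local parity blocks
assert GLOBAL_PARITY + local_parity == 8    # the 20+8 layout checks out

# Repairing one lost block by reading the rest of its stripe or group:
plain_rs_reads = DATA_BLOCKS                # 20 reads under plain 20+3
lrc_reads = LOCAL_GROUP                     # 4 reads inside the 4+1 group

print(f"plain Reed-Solomon: {plain_rs_reads} reads per single-block repair")
print(f"with local repair codes: {lrc_reads} reads")
print(f"I/O expansion cut by {plain_rs_reads // lrc_reads}x")
```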
The view from the hot aisle.
Igneous Systems storage delivers scale-out storage on-premises as a turnkey, subscription-based service. The pricing model seems attractive, even if having to purchase in 212TB increments lacks the granularity their disk management capabilities boast. Then again, maybe it isn’t easy to deliver the performance benefits without a lot of disks to write that wide Igneous stripe across; perhaps the solution can’t meet all of its performance and capacity objectives at less than 212TB at a time. Although that’s a guess.