Who Cares About Storage?

Yesterday, a customer asked whether some of the future directions for GlusterFS might result in less need for storage or system administrators. In my reply I hit on a theme that seemed to resonate, well enough that it’s worth expanding on a bit more. By the way, it’s not an original theme. It’s just one that doesn’t seem to get enough attention.

The whole idea of a “storage administrator” is becoming outdated. “Storage” implies stasis. In common usage, when you store something it just sits there, not moving, often at some distance from its point of use, for some considerable time. In the technical world, “storage” is often used to mean moving data to and from one device, subject to the performance and other constraints of that one device. You might have many such devices, and they might use RAID or some other kind of redundancy internally, but this model of storage is still about data in repose and in isolation. The trouble is that all of the interesting problems nowadays are about data as it moves from place to place.

Storage as a specialty is about little islands of data; data as a specialty is about many heterogeneous islands and the transport between them. Storage performance is about feeds and speeds into and out of a box; data performance is about speed to and from whichever box is most convenient, accounting for all the replicas and caches within the data layer. As the gap between how much data we need and how fast we can move it keeps widening, a data administrator must make ever more sophisticated decisions about which data should move through which pipes to which locations, and when. Managing data integrity becomes inextricably entwined with managing consistency across multiple copies. That brings with it a whole host of difficult problems and tradeoffs, and I haven’t even gotten to security or format/protocol issues yet.

Just as there’s a difference between low-level networking knowledge (e.g., Ethernet/IP) and higher-level distributed-systems knowledge, there’s also a difference between storage expertise and data expertise. We need to make the storage part simpler precisely because a data administrator has to understand all of this other stuff as well. Humans should be handling the policies and exceptions that machines can’t, not getting bogged down in the mere mechanics of something the data layer should be able to do autonomously.

Say goodbye to the storage administrator. Say hello to the data administrator. What, you say they look the same? Exactly.