Over the weekend – and, obviously, a little bit today – I’ve been working on one of those projects that has been baking in the back of my mind for a long time. I always have a bunch of these queued up. Sometimes I use them as warmups or breaks when I’m feeling a bit stuck, much like a novelist in one genre might write a poem or a short story in a different genre to overcome writer’s block. Other times I use them to learn or refresh skills that I don’t otherwise get to exercise as part of my regular duties.
Anyway, in this case the project was spurred by my recent efforts to add a Python interface to Avati’s new “gfapi” GlusterFS library interface. Why stop there? Why not go all the way and provide glue to write actual translators in Python? Thus was born glupy (pronounced “gloopy” in my head because I find it amusing). With that in mind, I read the Python ctypes documentation more carefully, and the embedding documentation as well. Python extension (letting Python code call C) is pretty familiar territory to me, but I had never tried embedding (letting C call Python) before, and I get the impression that my experience mirrors that of the community in general, so this was a learning experience.
With all of that information semi-digested, I started hacking. After overcoming several issues typical of this kind of glue programming, I got to the point where I had something that basically worked, and decided to implement a version of my negative-lookup-caching translator using the new Python infrastructure. Having done that – results below – I now feel comfortable that glupy is “for real” enough to write about. I’m going to save the “how” for later, because it turns out that I might get the chance to write about that at greater length elsewhere, but let’s get some of the “why” out of the way.
The reasoning behind glupy is mostly the same as the reasoning behind FUSE itself, or the Python bindings for FUSE. I’m a firm believer that X functionality should be implemented in the X subsystem, where X in this case is storage. I’m frankly a bit tired of seeing people implement storage functionality as layer after uncoordinated layer on top of the storage subsystem, just because writing code within the storage system is too hard for them, so anything I can do to make it easier seems worthwhile. The simple fact is that higher-level languages reduce barriers to entry. Having access to sophisticated code and data structures with automatic memory management makes code easier to write. This effect tends to compound itself, as the higher-level-language libraries for any given task also tend to have more coherent and generally pleasant interfaces than their C counterparts, so the higher up the stack you go the more benefit you get. I know this approach works, because I’ve personally worked on a project (C3D at EMC) where just the conversion of a prototype implementation from Python to C took longer than getting the prototype working in the first place. If I’d had to debug the protocol and the language-specific implementation at the same time, in the less convenient language, I’m quite sure the overall project time would have tripled. Sometimes the storage subsystem is the right place to implement functionality but C is the wrong language.
The secondary questions have to do with my choice of higher-level language. Why Python instead of Ruby or Lua? Why CPython instead of PyPy? In both cases, my own familiarity was a factor. I learned Python back in the 1.5 days and, having learned it, never felt the others were different enough to justify an extended effort to learn them properly. Furthermore, I have experience integrating Python with C, so this probably took me half as long as the other integrations would have. Also, Python is the alternative people ask for. CPython in particular is the scripting language that’s most likely to be installed on GlusterFS users’ systems out of the box; it’s the only scripting language I’ve heard people ask for; UFO is written in Python; many parts of HekaFS were written in Python; and so on. Maybe I’ll stretch a little more and do one of the others some day, but I already have my work cut out for me so don’t hold your breath.
OK, enough justification. How about those performance results? What I expected was that the same performance benefit I showed in my Red Hat Summit slides would still exist, because – and I can’t stress this often or strongly enough – when you’re dealing with performance in a distributed system the first thing you should seek to minimize is network round trips and synchronization delays. Only then should you even worry about disk performance, let alone CPU overhead. The use of a higher-level language just shouldn’t matter for the case that negative lookup caching is meant to address. So, without further ado, here are the results for my “PHP simulation”, which measures the average time to do a thousand include-file lookups across ten directories (using a power-law distribution in which 80% of requests go to 10% of files).
- Vanilla configuration: 5.8ms
- Add Python-based negative lookup caching: 1.5ms
This is actually a better result than the 3x improvement I saw with the C implementation. I wouldn’t obsess over the precise numbers too much because this is just one run of a fairly small-scale synthetic benchmark, but it’s certainly enough to support my theory that language overhead doesn’t matter in this case. Also, the Python code (for this specific translator, not the infrastructure) is approximately half as long as the equivalent parts in C and I’d say it’s a lot more understandable as well.
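For anyone who hasn’t seen the idea before, here is a minimal sketch of negative lookup caching in plain Python. To be clear, this is not the glupy translator and does not use any GlusterFS or glupy API; `NegativeLookupCache`, `find_include` and the `dir0` to `dir9` layout are hypothetical names made up for illustration, and the “backend” is just a function standing in for a network round trip. It assumes a PHP-style include search that probes each directory in turn until one has the file.

```python
class NegativeLookupCache:
    """Remembers paths the backend reported as missing (illustrative only)."""

    def __init__(self, backend_lookup):
        # backend_lookup is a hypothetical callable: path -> bool.
        # Each call stands in for one network round trip.
        self._backend_lookup = backend_lookup
        self._missing = set()
        self.round_trips = 0

    def lookup(self, path):
        if path in self._missing:
            return False  # answered locally, no round trip
        self.round_trips += 1
        found = self._backend_lookup(path)
        if not found:
            self._missing.add(path)  # cache only the negative result
        return found

    def created(self, path):
        # A real cache has to forget a miss once the file is created.
        self._missing.discard(path)


def find_include(cache, dirs, name):
    """Probe each directory in order, like an include-path search."""
    for d in dirs:
        path = f"{d}/{name}"
        if cache.lookup(path):
            return path
    return None


# Hypothetical layout: ten directories, the file lives only in the last one.
existing = {"dir9/a.php"}
dirs = [f"dir{i}" for i in range(10)]
cache = NegativeLookupCache(lambda p: p in existing)

find_include(cache, dirs, "a.php")
first_pass = cache.round_trips                 # every directory is probed
find_include(cache, dirs, "a.php")
second_pass = cache.round_trips - first_pass   # only the real hit is probed
print(first_pass, second_pass)
```

In this toy setup the first search costs ten round trips and the repeat costs one, because the nine misses are answered from the local set. That is the whole effect: the savings come from skipped round trips, which is why the language the cache is written in hardly matters.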
This is still early days for glupy. There are literally dozens more functions I have to implement in addition to the two I needed for this test, and then there are all sorts of other infrastructure I need to add to make the Python environment as complete as that for C, but it’s a very auspicious start.