Through the Looking Glass

"Nebula: a prototype of some ideas on file stores"

Posted on April 7, 2015

I’ve been toying around lately with some ideas for a file store; I finally got around to implementing some of those ideas as a project called Nebula. There is source code available (written in Common Lisp) and a demo HTTP frontend. It’s a very rough idea demonstrator, and the HTTP frontend was mostly so I could test the idea faster. Caveat user.

The motivation section is where I step onto my soapbox and muse on the thoughts that led me to this. You can skip this section if you’re not fond of such diatribes and go directly to the Nebula section.

Motivation

Nebula was motivated by dissatisfaction with the current state of systems. Without going into too much detail on the subject, solutions like Docker[1] and container orchestration seem more primitive than what we’re capable of. I’m also of the heretical opinion that Unix isn’t the be-all and end-all of operating systems[2]. This isn’t to impugn the people working on these; right now, this is the best we have. I just think we can do better. But given the “worse is better” trend, I don’t expect useful improvements on the status quo any time soon (for a variety of reasons, primarily non-technical); we’ll probably keep adding more and more layers[3], as we do now.

I’d much rather see something more along the lines of the Mirage OS / unikernel approach. Packaging an application shouldn’t require bringing an entire Linux installation into a container[4], or using a container system with pseudo-isolation. Applications should be properly isolated, like those running as unikernels on Mirage OS.

In thinking about this problem, I was strongly motivated by the paper “A Security Kernel Based on the Lambda Calculus”[5]; I’m not arguing that the security kernel introduced there is a good solution to this problem, but it has a good discussion of the characteristics of a security kernel.

On Security Kernels

The core tenet of the paper is that a secure system provides three guarantees:

  1. Completely isolated environments[6]. Applications cannot access each other’s environments directly. This is a subtle guarantee: in a capability system[7], for example, this doesn’t necessarily guarantee that they can’t access the same memory. Environments may also have different scopes: they can apply to individual threads, or to application instances, depending on how the system is architected.

  2. Inter-environment communications. This is the environment analogue of IPC. Many of the same techniques apply: sockets, shared memory, and message passing, for example.

  3. Safe cooperation between environments provided by access mediation. The security kernel enforces the previous two guarantees and imposes the resource constraints needed to ensure that applications can run harmoniously.

As I envision it, an operating system for the future would employ a unikernel approach to running processes; a small supervisor or hypervisor would mediate access and resource control, and it would probably employ a capability system to control access. In considering this, I wondered how the filesystem would play into this.

On File Systems

Files are fundamental to the Unix way. However, the POSIX model doesn’t work well with a capability system: it employs the access control list security model, which is the opposite of the capability model. In a capability system, the user presents some capability[8] to the security kernel or file server, which translates that capability to some resource. If user G707[9] wants to share a file with G708, instead of adding G708 to an access roster (whether via a Unix group or by modifying the world permissions on the file), G707 supplies a capability to G708 that G708 can give to the file server to retrieve the file. In fact, any user with that capability can access the file. As I’ll discuss later, there are ways for G707 to grant a capability to G708 such that G708’s access can be restricted without restricting G707’s access.

This granting could be done via a message passing system, for example.

Imagine an editor and compiler running as unikernels; the editor has created some source file and wants to grant the compiler access to it. In this case, the editor sends the capability to the compiler, and the compiler can access the source file. Neither the editor nor the compiler would necessarily need to know about the file sharing mechanism; it’s likely that there is a filesystem driver included that knows how to translate between filenames and this capability-based access. It’s similar to how we don’t access files by their disk and inode; we have the abstraction of paths and hierarchical filesystems at our disposal.
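The following toy Common Lisp sketch makes the capability idea concrete. It shows a file server that hands out capabilities instead of consulting an access list. Everything here is hypothetical (the names `store-file` and `read-file`, the hex token format, the in-memory table), and it is not how Nebula or any real kernel does it. It illustrates one point: granting access is just passing a token along, and presenting the token is the whole access check.

```lisp
;;; Toy capability-based file server (hypothetical; not Nebula's code).
;;; There is no access list: whoever presents a valid token can read.

(defvar *capabilities* (make-hash-table :test 'equal)
  "Maps capability tokens (hex strings) to file contents.")

(defvar *token-random-state* (make-random-state t)
  "Seeded randomness for tokens. CL:RANDOM is not a cryptographic RNG;
a real file server would need an unforgeable token source.")

(defun make-token ()
  "Return a random 128-bit token as a 32-digit hex string."
  (format nil "~32,'0X" (random (expt 2 128) *token-random-state*)))

(defun store-file (contents)
  "Store CONTENTS and return a fresh capability for it."
  (let ((token (make-token)))
    (setf (gethash token *capabilities*) contents)
    token))

(defun read-file (capability)
  "Return the contents CAPABILITY refers to, or NIL if it refers to nothing."
  (values (gethash capability *capabilities*)))

;; The editor stores a source file and "sends" the capability to the
;; compiler; a plain list stands in for a message-passing channel.
(defvar *compiler-mailbox* '())

(let ((cap (store-file "int main(void) { return 0; }")))
  (push cap *compiler-mailbox*))

;; The compiler needs no account or ACL entry, only the token it received.
(read-file (first *compiler-mailbox*))
```

In this sketch, sharing with a third party would simply mean forwarding the same token. That is why the restricted-sharing question discussed later in the post matters.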

I basically got to this point thinking about the problem, then decided to play with what such a file server would look like.

Nebula

I like to start projects with a clearly defined problem statement[10]; what is it exactly that I’m trying to solve? Accordingly, the problem statement I came up with for Nebula is this:

Users have data: they want to be able to keep track of revisions to this data, and they would like to be able to share this data with other users. Users would also like the ability to cluster into groups to share and collaborate on this data.

Some secondary goals that I wanted to accomplish were

Characterising the solution

The next step after defining the problem statement is to figure out what the characteristics of this system are.

Towards a solution

I’ll be upfront: Nebula doesn’t meet all of these characteristics. It’s a playground to see how some of these characteristics turn out. Specifically, it runs as an HTTP API (that’s sort of REST-like, if you squint hard enough) on a Linux server; it can’t make guarantees about what happens to its data outside of the system. Internally, it’s consistent, but since it isn’t running on an isolated system, it can’t make strong guarantees[12].

Towards this, Nebula is built on two concepts:

A blob is some chunk of data. The server treats it as an octet string[13], and internally, each blob is identified by its hex-encoded SHA-256 hash.

Referring to data by blob is only useful for getting the contents of the data; it also poses potential information leaks.

The blob, then, is used for the immutable data: a piece of data as it exists at some point in time.

Entries contain metadata, and they have four main properties in this version of Nebula:

For this prototype, I used Postgres to store entries[15]. During some discussion on IRC, it was mentioned that the simpler storage option for entries would be to serialise them to disk. However, that approach makes garbage collection much more difficult: it’s much easier to query the database to see if there are any remaining entries with a certain target than to scan a bunch of files or try to ensure some in-memory data structure is consistently serialised to disk[16].
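The garbage-collection question boils down to “does any remaining entry still target this blob?” Against Postgres, that is a single query. The same logic can be sketched in memory as follows. The function names are hypothetical, and the entry slots are just the properties this post mentions (an ID, a timestamp, a target and a parent), not necessarily Nebula’s exact schema.

```lisp
;;; Hypothetical in-memory stand-in for the database query: a blob can be
;;; collected once no remaining entry targets it.

(defstruct entry id created target parent)

(defun collectable-blob-p (blob-id entries)
  "True when no entry in the hash table ENTRIES targets BLOB-ID."
  (loop for e being the hash-values of entries
        never (equal (entry-target e) blob-id)))

(defun collect-garbage (blobs entries)
  "Remove unreferenced blobs from the hash table BLOBS; return their IDs."
  ;; Gather the dead IDs first, so the table isn't modified mid-iteration.
  (let ((dead (loop for id being the hash-keys of blobs
                    when (collectable-blob-p id entries)
                      collect id)))
    (dolist (id dead dead)
      (remhash id blobs))))
```

With serialised entries on disk, `collectable-blob-p` would instead have to read and parse every entry file, which is the difficulty described above.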

I mentioned earlier that one user could grant access without jeopardising their own access, and the discussion of the parent property hints at the solution. In capability systems, this is known as the “proxy”[17] pattern: user G7926 proxies some entry E8057[18], creating a new entry E3002 whose target is set to E8057. When Nebula goes to resolve the target of E3002, it will see that it points to E8057, and will look up E8057’s target (some blob) and return this data. If it turns out that access to the data via E3002 is no longer needed or desired, it can be removed; E8057 is still accessible. If instead E8057 is removed, Nebula knows to also remove E3002 as it would point to an invalid target.

To prevent information leakage, proxying defaults to creating a new entry with no parent (but it will preserve the timestamp). It’s possible to walk back the list of parents of the original entry, building a history list (called a lineage here). This list can then be proxied in full, providing a proxied copy of the original data and its history.
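The following Common Lisp sketch models the proxy and lineage behaviour described above with an in-memory hash table. It is not Nebula’s code. The function names and the `E1`-style IDs are assumptions (the post uses gensyms for readability, and Nebula itself uses UUIDs). So is the rule that a target naming a known entry is followed, while any other target is treated as a blob ID. How a full lineage is proxied is also a guess: here each entry is proxied oldest first, and each proxy’s parent is set to the previous proxy so the copy keeps its history.

```lisp
;;; Hypothetical in-memory sketch of proxying and lineages; not Nebula's code.
;;; Assumption: a target that is a key in *ENTRIES* names another entry;
;;; any other target is taken to be a blob ID.

(defstruct entry
  id        ; identifier; presenting it grants read access
  created   ; timestamp (a universal time here)
  target    ; blob ID or the ID of another entry
  parent)   ; ID of the previous revision, or NIL

(defvar *entries* (make-hash-table :test 'equal))
(defvar *next-id* 0)

(defun new-id ()
  "Hypothetical ID generator standing in for UUIDs."
  (format nil "E~D" (incf *next-id*)))

(defun add-entry (target &key parent (created (get-universal-time)))
  "Store a new entry and return its ID."
  (let ((e (make-entry :id (new-id) :created created
                       :target target :parent parent)))
    (setf (gethash (entry-id e) *entries*) e)
    (entry-id e)))

(defun find-entry (id)
  (or (gethash id *entries*) (error "No such entry: ~A" id)))

(defun resolve (id)
  "Follow targets from entry ID until reaching a blob ID."
  (let ((target (entry-target (find-entry id))))
    (if (gethash target *entries*)
        (resolve target)
        target)))

(defun proxy (id)
  "New entry targeting ID: no parent, but the original's timestamp."
  (add-entry id :created (entry-created (find-entry id))))

(defun remove-entry (id)
  "Remove entry ID and, recursively, every entry that targets it."
  (remhash id *entries*)
  (let ((dependents (loop for e being the hash-values of *entries*
                          when (equal (entry-target e) id)
                            collect (entry-id e))))
    (mapc #'remove-entry dependents)
    id))

(defun lineage (id)
  "IDs from ID back through its parents, newest first."
  (loop for cur = id then (entry-parent (find-entry cur))
        while cur
        collect cur))

(defun proxy-lineage (id)
  "Proxy each entry in ID's lineage, oldest first, chaining the proxies'
parents so the copy keeps the history. Returns the newest proxy's ID."
  (let ((parent nil))
    (dolist (old (reverse (lineage id)) parent)
      (setf parent (add-entry old :parent parent
                                  :created (entry-created (find-entry old)))))))
```

In this model, removing a proxy leaves the original untouched. Removing the original cascades to every entry that proxies it, mirroring the E3002/E8057 example.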

A capability in Nebula, then, is an entry identifier[19]. Anyone who presents this identifier can read the associated data.

Conclusions

There’s plenty of room for more work to be done on this.

Testing Nebula

You’ll need

Quicklisp normally installs to $HOME/quicklisp and I’ll assume that’s the case.

Create the database credentials file in $HOME/.nebula.lisp with the following template (but be sure to fill in the right information):

;;; Example names taken from the postmodern docs.
((:DB-HOST "localhost")    ;; hostname of database
 (:DB-NAME "testdb")       ;; database name
 (:DB-PASS "surveiller")   ;; password
 (:DB-PORT 5432)           ;; port
 (:DB-USER "foucault"))    ;; username
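Because the credentials file is a single Lisp alist, the standard reader can read it. The post doesn’t say how Nebula loads it, so the helpers below (`read-config` and `config-value`) are hypothetical. Binding `*read-eval*` to `nil` stops a `#.` form in the file from executing code at read time.

```lisp
;;; Hypothetical helpers for reading ~/.nebula.lisp; not Nebula's loader.

(defun read-config (path)
  "Read a single alist from PATH without evaluating it."
  (with-open-file (in path)
    (let ((*read-eval* nil))   ; refuse #. forms in the file
      (read in))))

(defun config-value (key config)
  "Look up KEY (e.g. :DB-HOST) in CONFIG, an alist of (KEY VALUE) lists."
  (second (assoc key config)))
```

For example, `(config-value :db-port (read-config (merge-pathnames ".nebula.lisp" (user-homedir-pathname))))` would return the port from a file like the one above.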

The following sequence will get the project running:

$ cd ~/quicklisp/local-projects
$ git clone https://github.com/kisom/cl-nebula
$ git clone https://github.com/kisom/cl-nebula-www
$ sbcl
* (ql:quickload "nebula-www")
* (nebula-www:startup)

The API documentation contains a list of endpoints.

Going forward

The next step is to build something on top of Nebula. My first thought is to map a POSIX-like file system on top as a proof of concept. It would handle some notion of identity and groupings, and allow referring to entries by a friendly string identifier. Specifically, one thing I thought of was a collaborative sort of text editor built as a web interface. I’m not entirely sure about this, though; it still needs some thinking.

Feel free to send your comments to kyle (at) tyrfingr (dot) is, or see the contact links on the comms page.


  1. I run quite a few services for myself inside Docker containers, and we use them quite a bit at work. Most of my thoughts here are informed by this experience.

  2. That being said, I’m one of those Linux-on-the-desktop people. You’ll find me armed with a ThinkPad running Ubuntu server, NixOS, or OpenBSD, using StumpWM as my window manager and spending most of my time in Emacs or xterm. For those who point this out and say something to the effect of “just use OS X”, I reply: “been there, done that, lost enough time.” Having spent a considerable amount of time running OS X, I find it gets in the way of my usage patterns more often than not. Perhaps I’m not so much a user as a refugee. This may be a case of “worse is better”; it’s an actively maintained operating system that still allows me to be productive.

    An interesting argument has been made previously that I should adapt to the OS X usage model. I take the opposite stance: I prefer to adapt the computer to my usage model. This subject could make a blog post of its own; I touched on it some in a previous post.

  3. There’s an important difference between adding layers of abstraction, which might streamline certain things and allow higher-level reasoning about a system, and adding layers of obfuscation. For example, a half-truth is that the CPU doesn’t differentiate between data types (not strictly true, but generally true); in assembly, you deal mostly with the structure and interpretation of computer memory. In a higher-level language like C, it’s useful to differentiate between integers of various sizes and sign, or characters, or structs of other data. These data types are only abstractions of the programming language. The layering I see in solutions like Docker strikes me as more of the “yo dawg, I herd u liek to lunix while you lunix so i stuck some lunix on your lunix” variety.

  4. I am aware of RancherOS. It’s not something that I see many people using, and the fact that it is still a Linux distribution notwithstanding, my other arguments apply.

  5. J. Rees. “A Security Kernel Based on the Lambda Calculus”, A.I. Memo 1564, MIT, 1996. It’s available on the web, and I’ve also typeset the paper myself.

  6. The term environment here means the lexical environment; it shouldn’t be confused with the concept of Unix environments. A lexical environment is the mapping of names to their values.

  7. Wikipedia defines a capability as “a communicable, unforgeable token of authority. It refers to a value that references an object along with an associated set of access rights.” A capability system uses capabilities (instead of access control lists) as the primary mechanism for access control.

  8. Figuring out what the hell a capability actually is has been an interesting area of personal research and really one of the main motivators for Nebula.

  9. I’ll tend to gensym names in this post.

  10. This approach has been directly inspired by Rich Hickey’s “Hammock Driven Development” talk.

  11. Other feelings about VMS aside, this was a useful feature.

  12. It probably could, though this would be hacky and wouldn’t actually be useful in the service of prototyping the idea.

  13. A blob has the Lisp type (SIMPLE-ARRAY (UNSIGNED-BYTE 8)); in C, this would be a uint8_t[].

  14. A collision would be two distinct entries getting the same identifier, even if one of those entries no longer exists.

  15. There is no evidence that this choice of data store was primarily motivated by a desire to use postmodern. You can’t prove a thing.

  16. Postmodern turned out to be the perfect tool for this task. I’d originally written this in Clojure for reasons that aren’t germane to this discussion, and attempted to port it to Haskell. The Clojure version used SQLite, and the Haskell version used serialisation. As a totally arbitrary comparison, the Common Lisp entry and database code (i.e. entry.lisp and db.lisp) is 101 SLOC; the Clojure version (i.e. entry.clj and db.clj) is 213 SLOC; and the Haskell version (i.e. Entry.hs) is 111 SLOC, all as counted by cloc. I didn’t get garbage collection working in the Haskell version, which turned out to be a fatal flaw that led me to just use CL. Certainly, much of the Common Lisp code’s compactness is due to my deeper familiarity with the language.

  17. I think.

  18. To avoid entering full UUIDs here, I’m using modified gensyms for entry IDs.

  19. I think.