Vault Format 7: Behind the scenes

With Cryptomator 1.5.0 we introduced a new vault format. In this post I’d like to give a short explanation of why it was necessary.

Basics

cryptoFS

When I write “Cryptomator encrypts stuff”, I actually mean that its encryption library cryptoFS encrypts this stuff. No matter if it’s the content encryption, the file name encryption or the directory flattening, this library does all of it (and more) and can be seen as the core of Cryptomator.

Misc

To prevent confusion, I introduce some terms here and will (hopefully) use them consistently through the rest of this article.

The first thing one should know is the difference between the length of a filename and the length of a path. For example, the path /some/random/path/to/a_normal_file.name has a length of 39 characters, but the targeted file has a filename length of only 18 characters.
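To make the distinction concrete, here is a minimal sketch that measures both lengths for the example path using Python's standard pathlib module. The variable names are only illustrative.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/some/random/path/to/a_normal_file.name")

# Length of the whole path, separators included.
path_length = len(str(path))

# Length of the last path component only, i.e. the filename.
name_length = len(path.name)

print(path_length, name_length)
```

Filename limits of cloud providers and operating systems apply to the second number, which is why it is the one that matters for the rest of this article.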

Another point is the distinction between the decrypted file and the encrypted file. As the names suggest, the decrypted file is readable by everyone and is “located” inside the vault, while the encrypted file is … well, encrypted and stored on your hard drive.

I’d also like to stress the difference between a file’s content and its name.

Why introduce a new vault format at all?

Broken down, there were two motivations: improving robustness and performance, and fixing an issue with the Google Backup&Sync client.

It is a good start to know how it was in…

Vault format 6

As I already mentioned, cryptoFS encrypts not only file content, but also file names and the directory structure. Due to encoding, the resulting encrypted file name (EFN) is longer than the decrypted file name. But cloud providers and operating systems limit the length of filenames, so the EFN must be shortened if it exceeds a certain threshold. To preserve the original EFN and to indicate that the encrypted directory only contains a shortened version, the file in question got the ending .lng and was linked to a file in the m directory, which had the real EFN as its file content. In vault format 6 this threshold was set to a quite conservative value of 129 ASCII characters.

Of course this impacts performance: every time an *.lng file was encountered in a directory listing, displaying its real name required a detour via the m directory. So the lower the threshold, the more often this happens and the slower the directory listing gets. Robustness suffers from this design as well: what happens if the m directory is not properly synced or gets corrupted for whatever reason? This corrupts files all over the vault, not only in one specific directory, which makes it hard to recover.
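The format 6 mechanism can be sketched roughly as follows. This is an illustration under assumptions, not the cryptoFS code: the dictionary m_dir stands in for the vault-wide m directory, the function names are hypothetical, and the use of a Base32-encoded SHA-1 digest as the short name is an assumption made for the example.

```python
import base64
import hashlib

THRESHOLD = 129   # format 6 threshold mentioned in the text
m_dir = {}        # hypothetical stand-in for the vault-wide m directory


def shorten_if_needed(efn: str) -> str:
    """Return the name that ends up in the encrypted directory."""
    if len(efn) <= THRESHOLD:
        return efn
    # Assumption: the short name is a Base32-encoded SHA-1 digest of the EFN.
    digest = hashlib.sha1(efn.encode("ascii")).digest()
    short = base64.b32encode(digest).decode("ascii") + ".lng"
    m_dir[short] = efn   # the real EFN is kept as "file content" in m
    return short


def resolve(stored_name: str) -> str:
    """Recover the real EFN, taking the detour through m for *.lng names."""
    if stored_name.endswith(".lng"):
        return m_dir[stored_name]   # raises KeyError if m is missing the entry
    return stored_name
```

Both weaknesses show up in resolve: every *.lng entry costs an extra lookup, and a lost or stale entry in m_dir makes the real name unrecoverable, no matter which directory the file lives in.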

And lastly, the aforementioned issue with Google Backup&Sync comes into play.

As you can read in this comment on the corresponding ticket, this sync app does not allow changing the ending of a file. This was problematic in the following scenario:

  • Let’s say your vault contains a decrypted file with a very long name, so that its EFN gets shortened and the ending .lng is appended. If the decrypted file is then renamed to something shorter and its EFN length falls below the threshold, Cryptomator renames the .lng file and removes the file ending.

Vault Format 7

To tackle these two problems, we did three things:

  1. increase the shortening threshold
  2. unify the internal vault structure
  3. append the ending .c9r or .c9s to each file

Shortening Threshold

The threshold is now set to 220 characters, but can be flexibly adjusted to a lower value.

The idea came up in the summer of last year and was first proposed openly on our issue tracker in July.
To find out what a good threshold would be, we asked you, our community, for help and published the results one month later. Based on these results, we decided to increase the shortening threshold to 220 characters. But we were a little too hasty with that. Even though there had been an open beta for half a year, after releasing version 1.5.0 we received a lot of feedback that certain setups no longer worked. Since we are not so good at backpedaling, we introduced the ability to probe the capabilities of the local filesystem and flexibly set the shortening threshold when unlocking a vault. But this is only a local limit: files added to the vault from another setup (e.g. another device) may have longer names again.
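One way such a probe could work is sketched below. This is not how Cryptomator implements it; the function names are hypothetical, and the sketch assumes that if a name of some length is accepted, every shorter name is accepted too. It simply tries to create files with names of a given length and searches for the longest one that works, capped at the default of 220.

```python
import tempfile
from pathlib import Path

DEFAULT_THRESHOLD = 220   # format 7 default mentioned in the text


def can_create(directory: Path, length: int) -> bool:
    """Try to create (and remove) a file whose name has `length` characters."""
    candidate = directory / ("a" * length)
    try:
        candidate.touch()
    except OSError:
        return False
    candidate.unlink()
    return True


def probe_threshold(directory: Path, upper: int = DEFAULT_THRESHOLD) -> int:
    """Largest name length <= upper that this filesystem accepts."""
    if can_create(directory, upper):
        return upper
    low, high = 0, upper   # invariant: `high` fails, `low` works (or is 0)
    while high - low > 1:
        mid = (low + high) // 2
        if can_create(directory, mid):
            low = mid
        else:
            high = mid
    return low


with tempfile.TemporaryDirectory() as tmp:
    local_threshold = probe_threshold(Path(tmp))
```

The result only describes the directory that was probed, which matches the caveat above: another device syncing the same vault may accept, and therefore produce, longer names.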

Vault structure

We unified the vault structure, such that all first level content (files in the directory, the subdirectory “files” and symlinks) of a decrypted directory is now also part of its encrypted counterpart.

With this change, the indirection via the m directory is obsolete. Maintaining the code also gets easier, since everything follows the same scheme and is only split up when it comes to the actual file type.

The file ending .c9r

Every encrypted file or directory inside the vault storage location now has a file ending, to prevent the rename bug with Google Backup&Sync mentioned above.

Files and directories with unshortened names get the file ending .c9r; those with shortened names get the file ending .c9s.
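The naming rule can be sketched like this. It is only an illustration: the function name is hypothetical, and two details are assumptions made for the example, namely that the threshold is compared against the name including its ending, and that the short name is a URL-safe Base64 SHA-1 digest.

```python
import base64
import hashlib


def stored_name(efn: str, threshold: int = 220) -> str:
    """Pick the name used inside the vault storage location."""
    full = efn + ".c9r"
    # Assumption: the threshold applies to the name including ".c9r".
    if len(full) <= threshold:
        return full
    # Assumption: the short name is a URL-safe Base64 SHA-1 digest.
    digest = hashlib.sha1(full.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii") + ".c9s"
```

Whatever the length of the EFN, the returned name always carries one of the two endings, so a rename never leaves a file without an ending.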
