Handling of long filenames in Cryptomator 1.5.0

As previously announced, we will migrate to a new vault format in Cryptomator 1.5.0, which will be more efficient at handling long file names.

To support any filename length, Cryptomator internally stores the encrypted names of files with very long names in a special way that is less efficient than how it stores shorter names. With the new vault format we want to raise the threshold at which the less efficient storage takes effect. However, we cannot increase it to infinity and beyond, as third-party software (such as cloud sync software or your file system) wouldn’t be able to deal with it. Keeping this in mind, it is now our job to find the ideal threshold.
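To illustrate the general idea of threshold-based shortening, here is a minimal sketch. It is not Cryptomator's actual on-disk format: the function name `stored_name`, the `.short` suffix, the SHA-1/Base32 short name and the in-memory `long_names` table are all assumptions made for this example. It only shows why long names are more expensive: they need an extra lookup to recover the full encrypted name.

```python
import base64
import hashlib


def stored_name(encrypted_name: str, limit: int, long_names: dict) -> str:
    """Return the name to put on disk for an already-encrypted file name.

    Hypothetical scheme: names up to `limit` characters are stored as-is;
    longer names are replaced by a fixed-length hash, and the full name is
    kept in a side table (`long_names`) so it can be looked up again.
    """
    if len(encrypted_name) <= limit:
        return encrypted_name  # cheap path: no extra bookkeeping needed

    digest = hashlib.sha1(encrypted_name.encode("utf-8")).digest()  # 20 bytes
    short = base64.b32encode(digest).decode("ascii") + ".short"     # 32 chars + suffix
    long_names[short] = encrypted_name  # extra storage and an extra lookup later
    return short


table = {}
print(stored_name("SHORTNAME", 20, table))  # unchanged
print(stored_name("X" * 200, 20, table))    # replaced by a fixed-length name
```

Raising `limit` means fewer names take the second, more expensive branch.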

When you analyze the names of files on your file system, you’ll notice that shorter names are more likely than longer ones. The distribution of lengths seems to follow some logarithmic function. This means that by increasing the threshold by a factor of x, the number of affected files will decrease by approximately aˣ.

[Image: filename analysis]

Previously, Cryptomator considered filenames with more than 80 chars as “long”. In the upcoming version the threshold will be increased to 146.
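If you want to check your own files against the two thresholds, a rough sketch could look like this. The function name `share_of_long_names` is made up for this example, and it assumes that the length of the plain file name (in characters, without the directory part) is what gets compared; how the survey measured it exactly may differ.

```python
import os


def share_of_long_names(root, thresholds=(80, 146)):
    """Return {threshold: percentage of files whose name is longer}."""
    total = 0
    over = {t: 0 for t in thresholds}
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            total += 1
            for t in thresholds:
                if len(name) > t:  # compare the file name only, not the full path
                    over[t] += 1
    if total == 0:
        return {t: 0.0 for t in thresholds}
    return {t: 100.0 * count / total for t, count in over.items()}


if __name__ == "__main__":
    for threshold, percent in share_of_long_names(".").items():
        print(f"> {threshold} chars: {percent:.3f}%")
```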

We asked our users to tell us how many files are longer than the old vs. the new threshold and you submitted a total of 21,126,485 files! Wow! Thank you! :heart: Here are the results:

| | Min | Max | x̄ (mean) | x̃ (median) | σ² (variance) |
|---|---|---|---|---|---|
| > 80 chars | 0,030% | 26,415% | 2,936% | 1,220% | 0,266% |
| > 146 chars | 0,000% | 1,853% | 0,079% | 0,002% | 0,001% |

This means: for the average user, the likelihood of Cryptomator’s long filename handling kicking in will decrease from 1,220% to 0,002%. This is roughly 600 times less likely!

Thanks again to everyone who helped us gather this data. :raised_hands:


Could you please elaborate on why this is a necessary change? If this is changed, I was expecting it to go in the opposite direction, i.e. to spend some extra CPU time (which is becoming more abundant by the day) in order to lower the threshold for long file paths, shorten more file names/paths, and thus increase compatibility with systems like Windows.

The proposed change tries to save some CPU time while allowing much longer file paths, which is likely to increase problems that have already been reported here in the past (see accidental deletions of “invisible” files/folders, undeletable/unmovable files, etc.).

It is important to note that although the file path will be of acceptable/valid length at the time of file creation, the biggest problems usually arise from moving/rearranging folders later, including changing the path that leads to the vault (which is not handled by Cryptomator, but could still indirectly cause accidental loss of Cryptomator files).

Since our first priority is to preserve our most important files and to avoid costly accidents like the above, why would we want to save an extra tiny amount of time if that could end up costing us our most important files down the line?

(Please note that redundant backups do not protect against file loss IF the user is not even aware that he deleted “invisible” folders and so cannot restore them before the backups become old enough to be overwritten. Someone would have to keep versioned backups forever just to avoid this issue… Also, no matter what backups someone has, it doesn’t mean that we should make the software less compatible anyway.)

I would like to kindly request that this proposed change is discussed in much more detail before making any final decisions. Also, if I am missing something and this change indeed has to be done for other reasons, please let us know if it would be possible to make this threshold a user-adjusted setting (within a predefined range) so that we can lower the threshold instead of increasing it for those that need it.

Thank you

There are many false assumptions here: First of all, we save little to no CPU time. This is all about compatibility and I/O optimization. More importantly: We don’t get “more” compatible. We either are compatible or we are not. Any filename length below 259 chars is compatible with both Windows and OneDrive.

No, no, no. :grin:

This is the important misconception: Yes, long filenames can cause problems. To be more precise: Those filenames that were longer than the old threshold caused those problems because they were shortened.

I want to stress that vault format 7 does not simply increase the threshold to mitigate such problems; there are also some other changes to prevent them (explained on GitHub, would be off-topic here).

You probably mean issues such as files disappearing from GDrive-synced vaults if long files were renamed to shorter files…? Will be fixed, but again the fix is not related to the filename length per se.

The shortening is about individual file name length and unrelated to the whole path. We use directory flattening to make sure that even deep hierarchies will not exceed any limits (such as 32k chars on NTFS, …)
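A simplified sketch of what directory flattening means: every directory, no matter how deeply nested in the cleartext tree, maps to a storage path of the same fixed depth and length, like the `d\7K\57RUF…` part of the paths quoted in this thread. The code below is an illustration only. Hashing a per-directory ID with SHA-1 and Base32-encoding it is an assumption chosen to reproduce that shape, and `flattened_dir_path` is a made-up name; the real implementation involves encryption and is not shown here.

```python
import base64
import hashlib


def flattened_dir_path(dir_id: str) -> str:
    """Map a (hypothetical) directory ID to a fixed-depth storage path."""
    digest = hashlib.sha1(dir_id.encode("utf-8")).digest()  # always 20 bytes
    name = base64.b32encode(digest).decode("ascii")         # always 32 chars
    # Split into a 2-character and a 30-character component below "d".
    return "/".join(["d", name[:2], name[2:]])


# Shallow and deeply nested directories end up at the same depth on disk,
# because only the ID is hashed, not the cleartext path.
for dir_id in ("id-of-documents", "id-of-a-folder-twenty-levels-down"):
    print(flattened_dir_path(dir_id))
```

Because the storage depth is constant, nesting folders more deeply inside the vault does not make the ciphertext paths longer.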

No, it doesn’t. As long as the internals of the ciphertext directory are not touched, you can move it around as you like; it will not harm any contents.

We don’t save time…

Don’t know what you’re referring to. You express yourself as if the change would cause harm, while in reality it reduces I/O complexity and, due to reduced shortening, allows us to make a lot more checks atomic.

Deleting invisible folders…? What…? :thinking:

Changes have been discussed for months, some of them even for longer than a year on GitHub. Development is done on GitHub; these discussions don’t belong in this forum.

Thank you for clearing up the misconceptions and providing clarifications, looks like everything is under control. Just one minor clarification from my side, the invisible/disappearing files that I mentioned above that are caused by long file paths are these:

This is indirectly related to:

I.e. the above is true, as long as moving the vault does not inadvertently cause the path length to exceed 260 characters, in which case files could temporarily disappear from explorer and be accidentally deleted (as described in the link above).

That’s why I mentioned that doing anything to reduce path lengths (including any file names) seems to be a good thing. If we can avoid this scary issue then all is good. But is there any chance the proposed change allows longer file names that could make the disappearing file issue more frequent? The closer to 260 characters we get, the easier it is to move/rename a folder outside a vault and cause some vault files to exceed 260 characters and potentially disappear until the path is shortened.

Oh, I see! If I understood you correctly, the problem arises if the path inside the vault (!) exceeds 260 chars and only when using the default Windows WebDAV client. Apparently this client isn’t capable of handling longer paths.

We have to distinguish this (cleartext) path from the ciphertext path Cryptomator creates at the vault’s storage location. Regardless of how we store the ciphertext, the cleartext path will/must stay the same, so the WebDAV problem persists. The only solution in this case is using either Dokany or a different WebDAV client.

Also, the cleartext path doesn’t get longer when moving the vault directory to a more deeply nested location.

That said, of course we need to make sure not to exceed limits applying to the underlying file system where the vault directory is stored.

Actually I never had any issues with the files inside the vaults. It is Windows Explorer that is causing the issues with the encrypted files on disk IF those end up somewhere that exceeds 260 characters. Here is a more detailed example of what I mean, based on my actual situation:

Consider the below folder structure on disk. A dropbox folder on disk root (to minimize file paths), then a folder structure which cannot be shortened any more, then the actual placement of the vault with all the encrypted files in it:

E:\Dropbox\somefoldername111\somefoldername222\somefoldername333\MyVAULTname\d\7K\57RUFKAASNWHSPH5KX3FL2R3RIWRFA\QT2P3VC323BSE33BLFYH5ZSPLVBN455NYR6B34L2KF2LT3L45J2CFC7P5FMBRQPEXFTW5M7QH7QZSLMOR2LFJDJH5CLD4R7VWCCCV3CDLNLGATV74ORDBDANHZ5XWDP4

This file has a total path length of 242. The first 77 characters are the part of the path that leads up to the encrypted vault, and the remaining 164 characters of the path are taken by the encrypted vault file.

Cryptomator makes sure this second part of the path never exceeds a threshold by shortening the file/folder names where needed. However, the first part of the path that leads up to the encrypted vault on disk is outside of the control of Cryptomator (as it should be).

But if the user moves the encrypted vault (or renames one of the containing folders) and thereby lengthens the first part of the path, some of the encrypted files will inadvertently be pushed beyond the 260 character limit and will usually either:
a) disappear in Windows Explorer, which in this case (I just realized) shouldn’t be much of a problem, as a user should not even go there to view the encrypted files anyway.
b) become inaccessible to other programs, including backup or cloud sync programs. This could cause partial backups that will mess up the encrypted folder structure and cause files to be lost. The user will be unaware of these partial backups or partial cloud syncs until he tries to restore vaults and notices that files are missing.

For example, if in the above example path we moved Dropbox to its default location on the Windows desktop, the new longer path would be this (without changing anything in the vault itself):

C:\Users\someUsername\Desktop\Dropbox\somefoldername111\somefoldername222\somefoldername333\MyVAULTname\d\7K\57RUFKAASNWHSPH5KX3FL2R3RIWRFA\QT2P3VC323BSE33BLFYH5ZSPLVBN455NYR6B34L2KF2LT3L45J2CFC7P5FMBRQPEXFTW5M7QH7QZSLMOR2LFJDJH5CLD4R7VWCCCV3CDLNLGATV74ORDBDANHZ5XWDP4

This new path has a length of 269, which means this specific file along with many others will now become inaccessible by many programs as described above. These files can potentially be lost if/when the user restores a backup etc.
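The arithmetic behind this can be sketched as follows: the vault-internal part of the path stays constant, so the remaining headroom depends only on the prefix chosen by the user. `MAX_PATH`, `VAULT_PART` and `headroom` are names made up for this sketch, and the limit is treated as a plain character count, which is a simplification. Exact counts may differ slightly between tools depending on what they include, so the sketch only prints what `len()` reports.

```python
MAX_PATH = 260  # the limit discussed in this thread, treated as a character count

# The part of the path that lies inside the vault (copied from the example above).
VAULT_PART = (
    r"\d\7K\57RUFKAASNWHSPH5KX3FL2R3RIWRFA"
    r"\QT2P3VC323BSE33BLFYH5ZSPLVBN455NYR6B34L2KF2LT3L45J2CFC7P5FMBRQPEXFTW5M7QH7"
    r"QZSLMOR2LFJDJH5CLD4R7VWCCCV3CDLNLGATV74ORDBDANHZ5XWDP4"
)

SHORT_PREFIX = r"E:\Dropbox\somefoldername111\somefoldername222\somefoldername333\MyVAULTname"
LONG_PREFIX = (
    r"C:\Users\someUsername\Desktop\Dropbox"
    r"\somefoldername111\somefoldername222\somefoldername333\MyVAULTname"
)


def headroom(prefix: str, vault_part: str = VAULT_PART, limit: int = MAX_PATH) -> int:
    """Characters left before the full path hits the limit (negative = too long)."""
    return limit - len(prefix + vault_part)


for prefix in (SHORT_PREFIX, LONG_PREFIX):
    left = headroom(prefix)
    status = "ok" if left >= 0 else "too long"
    print(f"{prefix[:30]}...: {left} characters of headroom ({status})")
```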

For this reason I always assumed it is safer to shorten the encrypted file names as much as possible in order to provide more available path length to the user for naming folders, to avoid accidentally pushing the encrypted files over the 260 limit when moving stuff around or renaming folders.

I.e. this part of the path takes almost 2/3rds of the max path limit:
\d\7K\57RUFKAASNWHSPH5KX3FL2R3RIWRFA\QT2P3VC323BSE33BLFYH5ZSPLVBN455NYR6B34L2KF2LT3L45J2CFC7P5FMBRQPEXFTW5M7QH7QZSLMOR2LFJDJH5CLD4R7VWCCCV3CDLNLGATV74ORDBDANHZ5XWDP4

Will this become even longer in the new proposed file structure? If yes, that would leave much fewer characters for the user-defined part of the path and it would become very easy to push files over the limit of 260 characters when moving stuff around.

The apparent path length limit is due to backward compatibility with old DOS standards. This is why you see files disappearing in Windows Explorer when exceeding this limit. But as you already mentioned, people aren’t supposed to work on the encrypted files anyway.

Sync and backup programs on the other hand should be able to deal with it (as long as they don’t use old DOS APIs). Starting with Windows 10 version 1607 you can enable handling of paths of up to 32k chars via the registry. Note that this will still not make Explorer compatible with longer paths, though. :roll_eyes:
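To check whether that registry setting is currently switched on, something like the following read-only sketch should work. It reads the `LongPathsEnabled` value under `HKLM\SYSTEM\CurrentControlSet\Control\FileSystem`; the function name `long_paths_enabled` is made up for this example, and on non-Windows systems it simply returns `None`.

```python
try:
    import winreg  # part of the standard library, but only on Windows
except ImportError:
    winreg = None


def long_paths_enabled():
    """Return True/False on Windows, or None where the registry does not exist."""
    if winreg is None:
        return None
    try:
        with winreg.OpenKey(
            winreg.HKEY_LOCAL_MACHINE,
            r"SYSTEM\CurrentControlSet\Control\FileSystem",
        ) as key:
            value, _value_type = winreg.QueryValueEx(key, "LongPathsEnabled")
    except OSError:
        # Key or value missing: treat as "not enabled".
        return False
    return value == 1


print("Long paths enabled:", long_paths_enabled())
```

The sketch only reads the value; changing it requires administrator rights and is best done via the registry editor or group policy.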

As a last resort you can always map a deeply nested directory to a new drive letter. This will effectively make the path that leads to a vault short enough even for programs that can only handle 255 chars.
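One way to do such a mapping is the Windows `subst` command. The helper below is only a sketch: `subst_command` and `map_drive` are made-up names, the drive letter and directory in the usage line are placeholders, and the mapping is only attempted when the script actually runs on Windows.

```python
import string
import subprocess
import sys


def subst_command(drive_letter: str, target_dir: str) -> list:
    """Build the argument list for `subst <letter>: <directory>`."""
    letter = drive_letter.rstrip(":").upper()
    if len(letter) != 1 or letter not in string.ascii_uppercase:
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    return ["subst", f"{letter}:", target_dir]


def map_drive(drive_letter: str, target_dir: str) -> None:
    """Map target_dir to a drive letter (Windows only)."""
    if sys.platform != "win32":
        raise OSError("subst is only available on Windows")
    subprocess.run(subst_command(drive_letter, target_dir), check=True)


# Placeholder values; prints the command instead of running it.
print(subst_command("V", r"E:\Dropbox\somefoldername111\somefoldername222"))
```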

Yes, definitely.

We will of course check compatibility with the most common applications, but if you rely on software that can’t handle long paths, I can’t guarantee it will work correctly.

I’m using Synology with Google Drive, just updated to 1.5.x, and now it won’t migrate because it says my filenames are too long. I copied the vault to my local computer, mounted and migrated it, but when I go to copy it back to the Synology, Windows 10 File Explorer reports about 10 files out of 7k that are “too long” and won’t copy over. It’s sad that I’m going to be stuck having to install 1.4.x and stay there indefinitely, or until you make it work with older vaults that worked perfectly fine before.

We’re working on an update that tells you exactly which files are too long instead of blocking the whole vault. It will still take us a few days though, and it will probably not support automatic migration.

First of all I love Cryptomator. I suppose a second or third donation is in order.

I too have had an occurrence where I upload my encrypted content to PolarBackup and they have indicated I have a file name that is too long for upload. My challenge is I have no way of knowing which file name is creating the problem since the file name is encrypted which is what I want to occur and not be visible to others.

I suppose I could find some way to list all file names in some list structure, change the “longest” of the names, and see if the error messages go away (trial and error). Or has there been any progress on a tool where I can paste an encrypted file name and, after entering my password, have it decrypted so I know which file is the culprit?

Thanks in advance.

Not yet. There are no technical issues, it just needs to be done. We plan some more diagnostics features for 1.6.0, though.
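Until such diagnostics exist, the listing approach mentioned in the question can at least be sketched for the encrypted side. The function below (the name `longest_names` is made up for this example) walks a directory, such as the vault's storage location, and returns the entries with the longest names. It does not decrypt anything, so matching an encrypted name back to its cleartext file still has to be done by other means.

```python
import heapq
import os


def longest_names(root, count=10):
    """Return up to `count` (name_length, path) pairs, longest names first."""

    def all_names():
        for dirpath, dirnames, filenames in os.walk(root):
            for name in dirnames + filenames:
                yield len(name), os.path.join(dirpath, name)

    return heapq.nlargest(count, all_names())


if __name__ == "__main__":
    for length, path in longest_names("."):
        print(f"{length:4d}  {path}")
```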