DISCUSSION: down the rabbit hole of unicode characters in passwords

mods - The only real questions I have are: 1 - what category should this go in? and 2 - is this appropriate use of the forum? Feel free to move this to another more appropriate area of the forum. Or for that matter feel free to delete it if you consider it an inappropriate waste of time for the people that monitor the forum (you won’t hurt my feelings)

Background: Being new to cryptomator, I was surprised to see that unicode characters are allowed in passwords (that’s not very common). That intrigued me and I researched the subject on my own, with a view towards understanding the practical cost/benefit and strategies for using unicode characters in cryptomator passwords. After getting stuck in that rabbit hole for longer than I should’ve, I’d like to share what (I think) I’ve learned, since it might be of interest to a small subset of the users of cryptomator who (like me) don’t use password managers to access cryptomator (see further discussion of why no password manager for me at (*) ).

At first glance it seems like there is tremendous entropy to be gained by using unicode characters chosen from among ~140,000 unicode characters. But the entropy of such a unicode character (if randomly selected among those 140k characters) is ln140k/ln2 ≈ 17.1 bits, whereas the entropy from a random selection among, let’s say, the “normal” character space of 95 characters (upper case, lower case, digits, typical “special characters”) is ln95/ln2 ≈ 6.6 bits. So the entropy increase from adding a single random unicode character at the end of your password (17.1 bits) is less than you’d get by adding 3 “normal” characters (3x6.6 ≈ 19.7 bits). It seems underwhelming.
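
For anyone who wants to check the per-character numbers, here is a minimal Python sketch. The alphabet sizes (95 and 140,000) are the rough figures from the paragraph above, and the function name is just something I made up for illustration.

```python
import math

def bits_per_symbol(alphabet_size: int) -> float:
    """Entropy in bits of one symbol picked uniformly at random from the alphabet."""
    return math.log2(alphabet_size)

NORMAL = 95        # "normal" keyboard characters (assumed count from the post)
UNICODE = 140_000  # rough number of unicode characters (assumed count from the post)

normal_bits = bits_per_symbol(NORMAL)
unicode_bits = bits_per_symbol(UNICODE)

print(f"one normal character:    {normal_bits:.1f} bits")
print(f"one unicode character:   {unicode_bits:.1f} bits")
print(f"three normal characters: {3 * normal_bits:.1f} bits")
```

The whole comparison only holds if every character really is chosen uniformly at random, which is the same assumption made in the text.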

But that’s not fair, you might say: the attacker doesn’t know which position the unicode character will go into. Ok, let’s address that. Let’s say I have a password of 15 “normal” characters and I substitute a random unicode character for one of them. How much entropy is gained? First the substitution: I added 17.1 bits for the new character and took away 6.6 bits by removing the old character, for a net gain of 17.1-6.6=10.5 bits. But that substitution can be in any of 15 positions, so I have to add an entropy of ln15/ln2=3.9 bits, which brings the increase to 10.5+3.9 = 14.4 bits. Once again we could have achieved more than that by simply adding 3 normal characters (19.7 bits) to the end.

If you substitute 2 unicode characters among the 15 in predictable positions, then the entropy gain is 2x(17.1-6.6) ≈ 21.1 bits. If we put them in two random (unpredictable) positions, the number of position pairs is “15 choose 2” = 15!/(2!*13!) = 105. So the entropy from positioning those two characters is ln105/ln2=6.7 bits. The total entropy from random position substitution of 2 unicode characters is 21.1+6.7 = 27.8 bits. We could have achieved more than that by adding 5 normal characters at the end (about 33 bits). Still a bit underwhelming.
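
The substitution arithmetic can be sketched the same way. This is only a rough model under the assumptions used above: a 15-character password, 95 “normal” characters, 140,000 unicode characters, and everything chosen uniformly at random. The function name and parameters are mine, purely for illustration.

```python
import math

NORMAL = 95        # assumed size of the "normal" character set
UNICODE = 140_000  # assumed number of unicode characters
LENGTH = 15        # password length used in the example

def substitution_gain(k: int, random_positions: bool = True) -> float:
    """Bits gained by replacing k normal characters with k random unicode characters."""
    # Each swap adds one unicode character's entropy and removes one normal character's.
    gain = k * (math.log2(UNICODE) - math.log2(NORMAL))
    if random_positions:
        # The attacker also has to guess which k of the LENGTH positions were swapped.
        gain += math.log2(math.comb(LENGTH, k))
    return gain

normal_bits = math.log2(NORMAL)
for k in (1, 2):
    gain = substitution_gain(k)
    print(f"{k} substitution(s): {gain:.1f} bits, "
          f"about {gain / normal_bits:.1f} extra normal characters")
```

Expressing the gain as an equivalent number of extra normal characters makes the trade-off easy to eyeball.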

All of that at first glance seems to argue against going to the trouble of using unicode characters. But let’s look at some things outside of entropy that may be a consideration:

If you have trouble typing your long password blind, you might sometimes briefly display it on your screen. That represents a window of vulnerability in which the hacker may catch a screenshot. But it’s going to be a heckuva lot more difficult for an attacker to read an oddball unicode character from a screenshot. Take the unicode character hex 9977: 饷. If you have access to cut/paste, then it’s fairly easy. But if you have only a screen graphic, it’s more of a challenge. If you’re bored, go to https://shapecatcher.com/ and see if you can draw that shape to retrieve the correct hex code point 9977 (I couldn’t do it in two tries, although admittedly drawing with a mouse is tricky and maybe there are easier ways to do it).

Also, among people who debate the use of unicode in passwords in a more general context (passing across the internet), there seems to be a lot of consternation about possibly misinterpreting the character due to different encoding schemes (UTF-8, UTF-16, UTF-32 IIRC). Different browsers and applications can represent the same unicode character as different bytes, so there is a possibility of misinterpretation if the password is not properly normalized at the endpoint after having been transmitted through these different channels. That seems like a potential challenge to the hacker as well. (By the way, cryptomator seems to have handled it fine: I can access my vault with oddball unicode characters in the password from either android or pc.)
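
To make the encoding worry concrete, here is a small Python sketch. It shows two separate things: the same character turns into different bytes under UTF-8, UTF-16 and UTF-32, and a visually identical character can be built from different code points unless it is normalized first. The `canonical_bytes` helper is hypothetical; I am not claiming this is how cryptomator (or any other tool) actually processes passwords.

```python
import unicodedata

ch = "\u9977"  # the example character from above
for enc in ("utf-8", "utf-16-le", "utf-32-le"):
    # Same code point, different byte sequences depending on the encoding.
    print(f"{enc:10s} {ch.encode(enc).hex()}")

composed = "\u00e9"      # e with acute accent as a single code point
decomposed = "e\u0301"   # plain e followed by a combining acute accent

def canonical_bytes(password: str) -> bytes:
    """Hypothetical helper: normalize to NFC, then encode as UTF-8."""
    return unicodedata.normalize("NFC", password).encode("utf-8")

print(composed == decomposed)                                    # raw strings differ
print(canonical_bytes(composed) == canonical_bytes(decomposed))  # equal after NFC
```

If two platforms skip the normalization step or pick different encodings, the same-looking password ends up as different bytes, which is the failure people worry about.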

In the end it will come down to effort associated with the selected password for the hacker vs effort by the user…

About the hacker, we could speculate that they will try to optimize their strategy based on what they think you will most likely do, and if they don’t think generic unicode characters are likely, then they may not invest their computing resources in that direction. But that’s speculation. Let’s talk about the user side, which is more practical/productive…

On the user side there are issues of user-friendliness of remembering and entering the password. Those may be somewhat subjective and the remainder of this post will explore the various ways to enter unicode and how easy they are to perform on windows platform and on Android platform.

===== ACCESSING UNICODES ON WINDOWS PC =====

On pc, the most well known are the alt-codes, which are entered by holding down alt, typing the number on the numeric keypad and releasing alt. I like this link https://altcodeunicode.com/. On the left are 3-digit alt-codes, and there’s 256 of them. So if you select from these that’s 8 bits of entropy per character, not a heckuva lot more than the 6.6 bits per normal character. On the right are 4-digit alt codes, starting with a 0. Ordinarily adding a 0 in front of a number doesn’t change the meaning, but in this case it does (alt 131 does not give the same character as alt 0131). There is a lot of overlap between the two lists. I estimate that between the two lists, discarding the non-printable characters in red, there may be 350 unique characters, so maybe 8.5 bits of entropy per character if you select randomly from the two lists.
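
As far as I can tell, the reason alt 131 and alt 0131 differ is that the two kinds of code are looked up in different legacy code pages. The Python sketch below assumes a US-English Windows setup where 3-digit codes use code page 437 and 0-prefixed codes use Windows-1252; other locales may use different code pages, so treat the mapping as an assumption. The function names are mine.

```python
def alt_3digit(n: int) -> str:
    """Character for Alt+n, assuming the OEM code page is 437."""
    return bytes([n]).decode("cp437", errors="ignore")

def alt_0prefixed(n: int) -> str:
    """Character for Alt+0n, assuming the ANSI code page is Windows-1252."""
    return bytes([n]).decode("cp1252", errors="ignore")

print(alt_3digit(131), alt_0prefixed(131))  # two different characters

# Count distinct printable characters across both lists. Codes below 32 are
# skipped because Python decodes them as control characters, not the symbols
# Windows shows for them, so this undercounts the real lists somewhat.
union = set()
for n in range(32, 256):
    for ch in (alt_3digit(n), alt_0prefixed(n)):
        if ch and ch.isprintable():
            union.add(ch)
print(len(union), "distinct printable characters")
```

The count this prints is only a lower bound for comparison with my eyeball estimate, because of the skipped low codes.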

We might be interested in a broader selection of unicode characters than just those few. If you google how to enter unicode on pc you will probably land on windows character map (it is an application built into windows 10). It may serve a good purpose, but it doesn’t have all the unicode characters in there, and if you’re searching by unicode number you have to know the proper “font” or whatever first. So I don’t like that.

Myself, I’d prefer to deal directly with the codepoint number in hex. In MS Word, MS notepad, LibreOffice writer, or LibreOffice calc, the following approach works: type the hex code and THEN press alt-X. It’s a strange sequence, but it works. Easy peasy!

If you’re like me, at some point you’ll probably browse the vast collections of different types of unicode characters on the interwebs. You may find some that you want to use because they are memorable for one reason or another (maybe a given collection is memorable to you). But there’s a lot of ways to get confused in that process. One is that many unicode characters that look the same are actually different (or the same character can look different on different platforms). Another is that the codepoint for the character is expressed in different ways… there is hex and there is decimal, and I even saw a few other ones.

So imo it’s essential to have a 2-way conversion tool between unicode characters and their codes to be able to nail down your characters and not make mistakes. It’s easy to build in a spreadsheet, here is mine on google sheets.

You can paste a character (from either Android or PC) into a green cell in column G to learn its unicode codepoint. Or if you have your own codepoint you can create the character (to cut/paste elsewhere in either android or pc) by entering its codepoint into a green cell in column A. I also list the decimal codepoints, but personally I ignore those and stick with the hex codepoint to avoid confusing myself.
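
The same two-way conversion is easy to do outside a spreadsheet. Here is a minimal Python sketch of the idea (not a copy of what my spreadsheet does internally); the function names are made up for illustration.

```python
import unicodedata

def char_from_hex(hex_code: str) -> str:
    """Turn a hex code point such as '9977' into the character itself."""
    return chr(int(hex_code, 16))

def hex_from_text(text: str) -> list[str]:
    """List the hex code point of every character in a pasted string."""
    return [f"{ord(ch):04X}" for ch in text]

def describe(text: str) -> None:
    """Print code point (hex and decimal) and official name for each character."""
    for ch in text:
        name = unicodedata.name(ch, "<no name>")
        print(f"U+{ord(ch):04X}  decimal {ord(ch)}  {name}")

print(char_from_hex("9977"))
print(hex_from_text("e\u0301"))  # a look-alike can hide more than one code point
describe(char_from_hex("9977"))
```

Running `hex_from_text` on something you pasted is a quick way to catch look-alike characters, and emojis that are really several code points glued together.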

So that spreadsheet gives us another way to create characters (we can cut/paste from that spreadsheet to anywhere else).

I also included in that spreadsheet a few examples of code numbers that might be memorable to you for various reasons (these are not codes I use myself, I keep my memorable codes elsewhere). As you can see, hex codes can be associated with any of the following: ACE-2 Receptor; 4th of July; the current year; the current year by the Jewish calendar; even numbers; odd numbers; Christmas; small capital H, small capital I. I’m sure anyone can come up with more that would be memorable to them.

===== ACCESSING UNICODES ON ANDROID =====

You’ve probably found a symbols screen on your android keyboard. Most android keyboards can generate additional characters by long-pressing on a letter to see choices. In gboard you will see choices by long-pressing on any vowel, and also by long-pressing on any of the three consonants c, n, and s (those are the only consonants that work, and by coincidence they’re easy to remember because they are the first three consonants in the word… consonant!). I’m pretty sure all of these choices map to unicode characters that are also available by pc alt-code, so if you’re looking for the easy way then maybe you can restrict yourself to these (maybe just switch out all the vowels in the first word of your passphrase with accented vowels). As above, this limited choice is not going to gain a lot of entropy, but it’s an option.

Then there are emojis. They have found their way to every android keyboard out there. It seems there are around 3600 of them. That’s about 11.8 bits of entropy (just shy of twice the 6.6 bits for the normal characters and well over half the 17.1 bits of a generic unicode character… assuming in all cases random selection, which is probably debatable with the emoji keyboard). They are somewhat easy to insert, to the extent you’re familiar with them and where to find them. They may have a personality/meaning that’s easy to remember and a logical tie to words within a passphrase. If you are using the vault mostly with android it might make good sense to use these. If you want to use the same emojis on pc, then find the unicode code point (pasting them into my spreadsheet is one way) and record / remember that in whatever way you record / remember the rest of your password. In fact I might suggest it’s a good idea to keep track of those code numbers even if you aren’t going to use pc, just to make sure you don’t get fooled by a change in your keyboard configuration or by look-alike emojis.

If you want to create unicode characters in the wider 140k code space, you could use the same spreadsheet that I linked above (you should be able to access it for cut/paste just as easily on android as pc).

Another option I’d recommend is the free and ad-free app Unicode Keyboard by Tim Wunderlich. It is NOT open source, but it doesn’t request any internet permissions. (I can tell that by looking in my netguard app… since the permission manager built into android settings doesn’t seem to think internet permissions are a permission worthy of displaying, and I’m not sure I trust the play store to tell me the truth, since google seems to be trying to conceal internet permissions.) To use the app you type in the hex code and it displays the character on the keyboard itself, then you tap on that character to insert it into your text. On my Samsung phone there is an option to “show keyboard button” so that a keyboard switcher appears in the lower right area of the screen (in the navbar) whenever a keyboard is open. If you have that enabled, it’s super-easy to switch keyboards on the fly when typing.

=============== WRAP UP =============

I’m sure it’s a personal decision whether it makes sense to include those unicode characters. The easier it is to enter them, the more sense it makes.

( * ) I know some of you are dying to tell me to use a password manager rather than worrying about this silly stuff. I have a few responses already lined up:

  • 1 - I might be using cryptomator to store my passwords. It would be a home-made password manager, which is not as obvious of a target to attackers. Also it may end up with more security for entry than a typical password manager (you have to get into both my 2FA-protected cloud account and my cryptomator password). And it may offer a more convenient interface (some people prefer spreadsheets to the interface offered by password managers like bitwarden).
  • 2 - I might want to separate my password manager on pc from my TOTP app on phone, and if I want to keep these on separate devices (outside of emergency use) then I can’t routinely use password manager to get into cryptomator on my phone.
  • 3 - there were a few other security benefits mentioned above, in terms of the hacker seeing a screenshot or having trouble decoding a unicode character across platforms.
  • 4 - Maybe I just enjoy the process of trying to come up with ways to create multiple very long randomish passwords that I can remember in my head, even if others consider it a fool’s errand.

#4 definitely applies to me. I’m in the habit of trying to create multiple long randomish passwords that I can remember. As part of that strategy I try to incorporate a variety of transformations and “tricks” into a single password. So unicode presents a whole new set of tools to work with in that effort, memorable in new and different ways. I think if someone else is working primarily on android and familiar with emojis, they also could perhaps easily get a lot of “bang for the buck” (password entropy per password entry effort) with an emoji-laden password that they find easy to type (assuming they also don’t want to use a password manager). What I still wonder about is custom keyboard maps, where we could maybe map our own favorite obscure set of unicode characters to be easily accessible from our keyboards on pc or android.

@overheadhunter once posted this image, and from my point of view it describes exactly why having complex passwords does not gain security as much as adding length to a password will do.

And he also did some math on the effort of bruteforcing a cryptomator vault.

So from my point of view it would be the much easier and safer way to create a long password based for example on a sentence than trying to use Unicode characters in the password to make it more complex.
In addition to the user experience issue you are describing, there’s also a technical one. Although cryptomator does allow Unicode chars, what if you want to access your vault with third party tools, like cyberduck (or any other tool that can handle cryptomator vaults)? I’m not sure whether they also allow Unicode chars.
(The last argument is just a minor one)

describes exactly why having complex passwords does not gain security as much as adding length to a password will do

I am in full agreement with respect to entropy. I did provide some entropy calculations comparing adding unicode characters vs increasing the password length with “normal” characters, and I characterized the results for adding 1 or 2 unicode characters as “underwhelming”, which I think is similar to what you’re saying. I went on to discuss some factors outside of entropy.

So from my point of view it would be the much easier and safer way to create a long password based for example on a sentence than trying to use Unicode characters in the password to make it more complex.

I can appreciate that, and I’m definitely not suggesting what anyone else should do.

To my thinking it can be helpful to introduce complexity into the mix to reduce the number of characters that have to be typed. The number of characters is a lot more of an issue on mobile platforms than on pc because it’s hard to type long things on a mobile keyboard (my phone disables swiping mode for passwords). For me it’s an interesting mental challenge to come up with memorable and unique ways to increase entropy without raising the number of characters too high. The practicality is certainly subject to debate and is in part tied to our ability to remember these things and our ability to enter them quickly. The emoji keyboard does indeed provide for rapid entry if we can remember where to find the emojis and where to plug them into our password. I don’t rule out that there may be even easier ways to enter unicode characters that I haven’t fully explored, especially custom mapping of keyboards.

And he also did some math on the effort of bruteforcing a cryptomator vault.

EDIT - I missed that the first time around. There was a very interesting thing in your link about the “key derivation function”. The crackability of the password is not just dependent upon the entropy of the password itself, but also upon the way the password is processed (that key derivation function). Cryptomator apparently includes a key derivation function that makes a given password a lot less crackable than it would be when used in other applications. I’ll have to read up on that, could make most of my post irrelevant.

I guess this might interest you as well, if you haven’t found it already.
https://docs.cryptomator.org/en/latest/security/architecture/#

Thanks. I see on that page it says: “Both keys are encrypted using RFC 3394 key wrapping with a KEK derived from the user’s password using scrypt.”

More about scrypt (pronounced ess crypt) here. There’s a lot I don’t understand, but my summary is that this algorithm slows things down, which I gather is also the objective of a lot of other key derivation functions. But I think they’re saying what’s unique about this one is that it also increases the memory requirement, in order to guard against the attacker leveraging parallel computing power.
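
To get a feel for what a key derivation function does to each password guess, here is a minimal Python sketch using `hashlib.scrypt` from the standard library. The cost parameters (n, r, p), the salt size and the NFC normalization step are illustrative assumptions on my part, not cryptomator’s actual settings, and `derive_key` is a made-up name.

```python
import hashlib
import os
import time
import unicodedata

def derive_key(password: str, salt: bytes, n: int = 2**14, r: int = 8, p: int = 1) -> bytes:
    """Derive a 32-byte key from a password with scrypt (illustrative parameters)."""
    # Normalizing first is an assumption, so that look-alike input gives the same bytes.
    pw = unicodedata.normalize("NFC", password).encode("utf-8")
    # Memory use is roughly 128 * n * r bytes (about 16 MiB with these values).
    return hashlib.scrypt(pw, salt=salt, n=n, r=r, p=p,
                          maxmem=64 * 1024 * 1024, dklen=32)

salt = os.urandom(16)  # random per-vault salt, stored alongside the wrapped keys
start = time.perf_counter()
key = derive_key("correct horse \u9977 staple", salt)
elapsed = time.perf_counter() - start
print(f"derived {len(key)} bytes in {elapsed * 1000:.0f} ms")
```

Every guess costs the attacker that same time and memory, so the KDF multiplies the cost of a brute-force search on top of whatever entropy the password itself has.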

As relates to my original post, I’m thinking now I need to guard against making my password more cumbersome than reasonably necessary. That could be counterproductive from a security standpoint, if it encourages me to leave my vault unlocked for longer periods than I otherwise would.