[Owncloud] special characters in filenames

Evert Pot evert at rooftopsolutions.nl
Thu Aug 2 14:52:38 UTC 2012


> My two cents as a user: Why is this still a problem? I am using Linux & Windows and all my filenames are fine with some accented & special characters which are not present in English alphabet.  Where's the real issue with UTF8, why do we need to convert it to anything else? Isn't UTF8 the same for all OS and filesystems and databases etc?

I can chime in here!

Just the 3 main differences between the operating systems:


1. Linux does not encode filenames. Any byte sequence is allowed in a filename except 0x00 and the slash (/). This means you can create filenames containing backspaces, bells or other crazy stuff that's not valid in most encodings.

2. Windows internally uses a type of UTF-16 (not exactly, I forgot the precise name). This does indeed support most characters and I'm not aware of any direct issues with it.
However! If you run owncloud on a Windows machine, you cannot make use of this. On an English Windows server all the PHP filesystem APIs talk CP1252 (which is kind of a superset of latin1). This means that if owncloud on Windows is the server, you cannot store most characters.

3. OS X uses UTF-8, BUT! It normalizes to Unicode normalization form D (kind of, mostly; not exactly the standard normalization form). In a nutshell this means that a character like ü (u-umlaut) is stored as 2 Unicode codepoints (the u and the combining ¨ separately). Windows is more likely to combine them into a single codepoint.

Because Windows doesn't normalize, two files with different (but very similar) names on Windows will be normalized to a single filename on HFS+ filesystems. Lastly, filenames in the normalization form OS X uses actually behaved buggily on Windows when I checked (granted, this was Windows XP).
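
To make points 2 and 3 a bit more concrete, here is a rough Python sketch (not owncloud or PHP code). The helper names fits_cp1252 and same_after_normalizing are made up for this example, and it uses the standard NFD form, which as noted above is only approximately what OS X does:

```python
import unicodedata


def fits_cp1252(name: str) -> bool:
    """True if every character of `name` has a CP1252 byte (point 2)."""
    try:
        name.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


def same_after_normalizing(a: str, b: str, form: str = "NFD") -> bool:
    """True if two names collapse to one name under a normalization form (point 3).

    Assumption: standard NFD stands in for the HFS+ variant here.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


composed = "\u00fc.txt"     # ü as a single code point
decomposed = "u\u0308.txt"  # u followed by U+0308 COMBINING DIAERESIS

print(fits_cp1252(composed))             # True: ü exists in CP1252
print(fits_cp1252("\u65e5\u672c.txt"))   # False: CJK characters do not
print(fits_cp1252(decomposed))           # False: the combining mark does not either
print(composed == decomposed)            # False: different code points
print(composed.encode("utf-8"))          # b'\xc3\xbc.txt'
print(decomposed.encode("utf-8"))        # b'u\xcc\x88.txt'
print(same_after_normalizing(composed, decomposed))  # True: one name after NFD
```

The two names look identical on screen but are different byte sequences, so a filesystem that does not normalize keeps them as two files, while one that normalizes sees a single name.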


If you want the details, I wrote a blog post about this a few years ago:
http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php

Evert



