[Owncloud] special characters in filenames

Emre Erenoglu erenoglu at gmail.com
Thu Aug 2 15:00:49 UTC 2012


On Thu, Aug 2, 2012 at 6:52 PM, Evert Pot <evert at rooftopsolutions.nl> wrote:

> > My two cents as a user: Why is this still a problem? I am using Linux &
> > Windows and all my filenames are fine with some accented & special
> > characters which are not present in the English alphabet.  Where's the
> > real issue with UTF8, why do we need to convert it to anything else?
> > Isn't UTF8 the same for all OS and filesystems and databases etc?
>
> I can chime in here!
>
> Just the 3 main differences between the operating systems:
>
> 1. Linux does not encode filenames. Any byte sequence is allowed for
> filenames except 0x00 and the slash (/). This implies that you can create
> filenames with backspaces, bells or other crazy stuff that's not valid in
> most encodings.
>

OK, we just limit the characters that can be used in filenames, and the
client or web interface should give an error if the user tries to upload
something with a strange filename.
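
For illustration, a check like that could be sketched as follows. This is
Python rather than PHP, and the function name and the policy itself (valid
UTF-8, no slash, no NUL, no control characters) are assumptions made for the
example, not ownCloud code:

```python
import unicodedata


def is_acceptable_filename(raw: bytes) -> bool:
    """Return True if raw passes this hypothetical upload policy."""
    # Empty names, the path separator and NUL are never acceptable.
    if not raw or b"/" in raw or b"\x00" in raw:
        return False
    try:
        # Strict decoding rejects byte sequences that are not valid UTF-8.
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    # Reject control characters such as backspace (0x08) or bell (0x07).
    return not any(unicodedata.category(ch) == "Cc" for ch in name)
```

A server would run such a check on the raw bytes it receives, before the name
ever reaches the filesystem.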

> 2. Windows internally uses a type of UTF-16 (not exactly; I forgot the
> precise name). This does indeed support most characters and I'm not aware
> of any direct issues with this.
> However! If you run ownCloud on a Windows machine, you cannot make use of
> this. On an English Windows server all the PHP filesystem APIs talk CP1252
> (which is kind of a superset of latin1). This means that if ownCloud on
> Windows is the server, you cannot store most characters.
>

This issue with PHP on Windows is not nice, and I have nothing to add on it
since ownCloud depends heavily on PHP.  Maybe we can consider dropping
server support for Windows (or use some other API than the PHP one).
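
To make the CP1252 limitation concrete, here is a small sketch in Python
(not PHP) that tests whether a name survives encoding to CP1252. The helper
name is made up for the example; only the standard "cp1252" codec is used:

```python
def fits_cp1252(name: str) -> bool:
    """Return True if every character of name exists in CP1252."""
    try:
        name.encode("cp1252")
    except UnicodeEncodeError:
        return False
    return True


# Western European accents are fine, most other scripts are not.
examples = ["café.txt", "€uro.txt", "ğ.txt", "日本語.txt"]
storable = [n for n in examples if fits_cp1252(n)]
```

Names outside that set are the ones a CP1252-only filesystem API could not
store faithfully.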

>
> 3. OS/X uses UTF-8, BUT! They normalize to Unicode normalization form D
> (kind of, mostly; not exactly the standard normalization form). In a
> nutshell this means that a character like ü (u-umlaut) is stored as 2
> Unicode codepoints (the ¨ and the u separately). Windows is more likely to
> combine them into a single codepoint.
>

Since OS/X is generally not used as a server, can't the client or web
interface handle this when detected?
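
As a sketch of what handling it could look like, the snippet below uses
Python's standard unicodedata module to show the two spellings of ü and to
fold names into one form. It uses the standard NFD and NFC forms, which,
according to the quoted text, only approximate what OS/X actually does; the
helper name to_nfc is an assumption for the example:

```python
import unicodedata

composed = "\u00fc"                                  # ü as a single codepoint
decomposed = unicodedata.normalize("NFD", composed)  # u followed by U+0308

# The two strings render the same but are different codepoint sequences.
same_string = composed == decomposed
lengths = (len(composed), len(decomposed))


def to_nfc(name: str) -> str:
    """Fold a filename to composed form before storing or comparing it."""
    return unicodedata.normalize("NFC", name)
```

Comparing to_nfc(a) == to_nfc(b) instead of a == b is what would let a client
or web interface treat both spellings as the same name.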


> Because Windows doesn't normalize, it means that two files with different
> (but very similar) names will be normalized to a single filename on HTFS+
> filesystems. Lastly, the normalization form OS/X uses actually behaved
> buggily on Windows when I checked it (granted, this was Windows XP).
>

I've never faced such an issue in Windows. (I assume you mean NTFS). Maybe
this is too remote a possibility to consider?
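
Whether the case is remote or not, it is cheap to detect. A minimal sketch in
Python, assuming standard NFC as the comparison form and a hypothetical
helper name, that groups names which would collapse into one after
normalization:

```python
import unicodedata
from collections import defaultdict


def find_normalization_collisions(names):
    """Return groups of distinct names that share one NFC-normalized form."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize("NFC", name)].append(name)
    # Only groups with more than one member are real collisions.
    return [group for group in groups.values() if len(group) > 1]
```

Running this over a directory listing before syncing would flag the pairs
that a normalizing filesystem would merge.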


> If you want the details, I wrote a blog post about this a few years ago:
> http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php


Thanks, I'll definitely read it. :)

-- 
Emre