<div dir="ltr"><div class="gmail_quote">On Thu, Aug 2, 2012 at 6:52 PM, Evert Pot <span dir="ltr"><<a href="mailto:evert@rooftopsolutions.nl" target="_blank">evert@rooftopsolutions.nl</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">> My two cents as a user: Why is this still a problem? I am using Linux & Windows and all my filenames are fine with some accented & special characters that are not present in the English alphabet. Where's the real issue with UTF-8, and why do we need to convert it to anything else? Isn't UTF-8 the same for all OSes, filesystems, databases, etc.?<br>
<br>
</div>I can chime in here!<br>
<br>
Just the 3 main differences between the operating systems:<br>
<br>
1. Linux does not encode filenames. Any byte sequence is allowed for filenames except 0x00 and the slash (/). This implies that you can create filenames with backspaces, bells or other crazy stuff that's not valid in most encodings.<br>
</blockquote><div><br></div><div>OK, then we could just limit the characters that can be used in filenames, and the client or web interface should give an error if a user tries to upload something with a strange filename.</div><div><br>
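<div>A minimal sketch of the kind of check suggested above, written in Python for illustration only (ownCloud itself is PHP). The function name and the exact policy (valid UTF-8, no control characters, no slash) are assumptions, not anything ownCloud implements:</div>

```python
import unicodedata


def is_acceptable_filename(raw):
    """Hypothetical policy check on a filename given as raw bytes.

    Accepts the name only if it is non-empty, decodes as UTF-8, and
    contains neither a slash nor any control character.
    """
    if not raw:
        return False
    try:
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Linux allows arbitrary bytes, so the name may not be text at all.
        return False
    for ch in name:
        # Category "Cc" covers control characters such as NUL, bell and backspace.
        if ch == "/" or unicodedata.category(ch) == "Cc":
            return False
    return True
```

<div>A client or web interface could run a check like this before upload and reject the file with an error message when it returns False.</div>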
</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">2. Windows internally uses a type of UTF-16 (not exactly, forgot the precise name). This does indeed support most characters and I'm not aware of any direct issues with this.<br>
However! If you run ownCloud on a Windows machine, you cannot make use of this. On an English Windows server all the PHP filesystem APIs talk CP1252 (which is kind of a superset of latin1). This means that if ownCloud on Windows is the server, you cannot store most characters.<br>
</blockquote><div><br></div><div>This issue with PHP on Windows is unfortunate, and I have no fix to suggest since ownCloud depends heavily on PHP. Maybe we could consider dropping server support for Windows (or using some API other than the PHP one).</div>
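<div>To illustrate the CP1252 limitation described in the quoted text, here is a small Python sketch that tests whether a name survives a round trip through that code page. The helper name is hypothetical, and Python's "cp1252" codec is used only as a stand-in for what the PHP filesystem functions are reported to do on an English Windows server:</div>

```python
def fits_in_cp1252(name):
    """Return True if every character of name can be encoded in CP1252."""
    try:
        name.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


# Western European accents are representable, most other scripts are not.
western = fits_in_cp1252("über.txt")
cyrillic = fits_in_cp1252("файл.txt")
japanese = fits_in_cp1252("日本語.txt")
```

<div>Under this assumption, names in Cyrillic, Japanese and most other non-Western scripts could not be stored by such a server, while accented Western European names could.</div>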
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
3. OS/X uses UTF-8, BUT! They normalize to Unicode normalization form D (kind of, mostly... not exactly the standard normalization form). In a nutshell this means that a character like ü (u-umlaut) is stored as 2 Unicode codepoints (the ¨ and the u separately). Windows is more likely to combine them into a single codepoint.<br>
</blockquote><div><br></div><div>Since OS/X is generally not used as a server, can't the client or web interface handle this when detected?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Because Windows doesn't normalize, two files with different (but very similar) names on Windows will be normalized to a single filename on HTFS+ filesystems. Lastly, the normalization form OS/X uses actually behaved buggily on Windows when I checked it (granted, this was Windows XP).<br>
</blockquote><div><br></div><div>I've never faced such an issue in Windows. (I assume you mean NTFS). Maybe this is too remote a possibility to consider?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
If you want the details, I wrote a blog post about this a few years ago:<br>
<a href="http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php" target="_blank">http://www.rooftopsolutions.nl/blog/filesystem-encoding-and-php</a></blockquote><div><br></div><div>Thanks, I'll definitely read it. :)</div>
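<div>For reference, the normalization point from item 3 can be shown with a short Python sketch. It uses the standard NFC and NFD forms from the unicodedata module; the quoted text notes that OS/X's variant is not exactly standard NFD, so this is only an approximation, and the helper name is hypothetical:</div>

```python
import unicodedata

composed = "\u00fc"      # ü as a single code point
decomposed = "u\u0308"   # u followed by a combining diaeresis

# The two strings look identical but differ as code points and as UTF-8 bytes.
composed_bytes = composed.encode("utf-8")
decomposed_bytes = decomposed.encode("utf-8")


def same_name(a, b):
    """Compare two filenames after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

<div>A client or web interface could normalize names to one form before comparing or storing them, so that both spellings of ü map to the same file.</div>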
<div><br></div></div>-- <br>Emre<br>
</div>