Character encodings (UTF16)

Andras Mantia amantia at kde.org
Wed Feb 9 16:21:56 GMT 2005


On Wednesday 09 February 2005 18:01, Waldo Bastian wrote:
> I assume that the LE designation stands for "little endian" 
Yes.

> and that 
> Qt defaults to "big endian". 
I'm not sure. If I save a file in utf16 from Kate and test with the file 
utility, it will say:
utf.txt: Little-endian UTF-16 Unicode English character data


> I believe one is supposed to insert a 
> BOM (byte order mark) so that applications can guess correctly
> between utf16LE and utf16BE. 
The question is whether the BOM is mandatory or not.
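
To illustrate the BOM check, here is a minimal sketch in Python (not Qt code). The helper name sniff_utf16_bom is made up for this example. It only looks at the first two bytes and returns nothing when the file has no BOM, which is exactly the case being discussed here.

```python
import codecs

def sniff_utf16_bom(data: bytes):
    """Return the byte order announced by a leading UTF-16 BOM, or None.

    Hypothetical helper for illustration. Note that FF FE is also the start
    of a UTF-32 little-endian BOM, which this sketch does not try to rule out.
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # bytes FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # bytes FE FF
        return "utf-16-be"
    return None                                # no BOM: the caller must guess or be told
```

A reader that gets None back still has to pick a byte order some other way.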

> The spaces that you see in utf8 mode are 
> the NUL values from the high-bytes.
Yes, I know.
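
For reference, a small Python sketch of where those NUL bytes come from. The sample string is arbitrary. Every ASCII character takes two bytes in UTF-16, one of which is zero, and a viewer that treats the bytes as UTF-8 passes each zero byte through as a NUL character.

```python
text = "<html>"

# Little endian: the low byte comes first, so the NUL follows each character.
le = text.encode("utf-16-le")

# Big endian: the high byte comes first, so the NUL precedes each character.
be = text.encode("utf-16-be")

# Reading the little-endian bytes as UTF-8 keeps every NUL as its own character.
misread = le.decode("utf-8")
```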

> I think it would be possible for konqueror to detect LE and BE by
> looking for "<NUL" versus "NUL<" and adjust accordingly. Would be
> easier if there was a separate "utf16le" codec.

Yes. Auto-detection would be fine, but being able to specify the
endianness explicitly would also be a solution.
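
The "<NUL" versus "NUL<" check could look roughly like the Python sketch below. The function name guess_utf16_order and the 512-byte limit are assumptions for this example, not anything from Konqueror. It assumes the data is aligned on two-byte units from the start and contains a "<" early on, as HTML normally does.

```python
def guess_utf16_order(data: bytes, limit: int = 512):
    """Guess the UTF-16 byte order from the first '<' in the data.

    Hypothetical heuristic: walk the first `limit` bytes in two-byte steps
    and report which half of the pair holds the NUL next to '<'.
    """
    head = data[:limit]
    for i in range(0, len(head) - 1, 2):
        pair = head[i:i + 2]
        if pair == b"<\x00":      # '<' then NUL: little endian
            return "utf-16-le"
        if pair == b"\x00<":      # NUL then '<': big endian
            return "utf-16-be"
    return None                   # no '<' found, cannot tell
```

A leading two-byte BOM does not disturb the alignment, so the same loop works with or without one.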
As far as I can see, QTextStream is able to read files without byte
order marks:
void QTextStream::setEncoding ( Encoding e ):
  UnicodeNetworkOrder
      Uses network order Unicode (utf16) for input and output. Useful
      when reading Unicode data that does not start with the byte order
      marker.
  UnicodeReverse
      Uses reverse network order Unicode (utf16) for input and output.
      Useful when reading Unicode data that does not start with the byte
      order marker or when writing data that should be read by buggy
      Windows applications.
  
Now we would just have to use these encodings when that is the case.
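
To show what choosing the byte order explicitly amounts to, here is a short Python sketch. It is not the Qt API: the names "network" and "reverse" only mirror the two QTextStream encodings quoted above, and decode_utf16 is a hypothetical helper. Network order is big endian, reverse network order is little endian.

```python
# Hypothetical mapping that mirrors the two QTextStream encodings by name.
ORDER_TO_CODEC = {
    "network": "utf-16-be",   # big endian, like UnicodeNetworkOrder
    "reverse": "utf-16-le",   # little endian, like UnicodeReverse
}

def decode_utf16(data: bytes, order: str) -> str:
    """Decode BOM-less UTF-16 data using the byte order the caller names."""
    return data.decode(ORDER_TO_CODEC[order])

sample = "<p>".encode("utf-16-le")        # BOM-less little-endian data
right = decode_utf16(sample, "reverse")   # matches the data
wrong = decode_utf16(sample, "network")   # bytes swapped: unrelated characters
```

Decoding with the wrong order does not raise an error here. It silently produces different characters, which is why the order has to come from a BOM, a heuristic, or the user.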

Andras

>
> Cheers,
> Waldo

-- 
Quanta Plus developer - http://quanta.sourceforge.net
K Desktop Environment - http://www.kde.org