Fwd: "International Domain Names" support in KDE

Marc Mutz Marc.Mutz at uni-bielefeld.de
Fri Jan 24 13:05:06 GMT 2003

On Friday 24 January 2003 23:20, Martin Konold wrote:
> The encoding used is going to be the well-known escape character
> approach.

Not at all. It basically first removes all [^A-Z0-9] characters from the 
KC-normalized and casemapped Unicode string and then outputs the 
remaining characters with a prefix in front (currently referred to as 
"zz-", which is probably what you meant by "escape character") and a 
hyphen appended.

www.müller.com -> zz-www.mller.com-
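
As a rough sketch of that first step only, in Python: strip_to_basic and 
its prefix parameter are made-up names, NFKC plus lower() is just a 
stand-in for the real normalization and casemapping, and the whole name 
is treated as one string, as in the example above.

```python
import unicodedata


def strip_to_basic(name, prefix="zz-"):
    """Illustrative only: keep the ASCII characters, add prefix and hyphen."""
    # Simplified stand-in for the normalization step: NFKC, then lowercase.
    normalized = unicodedata.normalize("NFKC", name).lower()
    # Drop every non-ASCII code point; the increments re-insert them later.
    basic = "".join(ch for ch in normalized if ord(ch) < 128)
    return prefix + basic + "-"


print(strip_to_basic("www.müller.com"))  # zz-www.mller.com-
```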

To get back the original string, increments are appended. The increments 
encode both the char to insert and its position, AFAIR:

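curstring = basic_chars;            // the stripped string from above
curposition = 0;
char_to_insert = toUnicode( 128 );  // first non-ASCII code point
foreach( increment ) {
  curposition += increment;
  // quotient advances the code point, remainder is the insert position
  char_to_insert = toUnicode( (int)char_to_insert
                       + curposition / ( curstring.length() + 1 ) );
  curposition = curposition % ( curstring.length() + 1 );
  curstring.insert( curposition, char_to_insert );
  curposition++;
}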
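
A minimal Python sketch of the insertion step, following the decoding 
procedure in RFC 3492 (Punycode), so details may differ from the 
recollection above. apply_deltas is a made-up name, and the increments 
are assumed to be decoded into plain integers already.

```python
def apply_deltas(basic, deltas, initial_n=128):
    """Re-insert non-ASCII characters into `basic` from decoded increments."""
    out = list(basic)
    n = initial_n  # current code point; starts at the first non-ASCII value
    i = 0          # current insertion position
    for delta in deltas:
        i += delta
        # One increment carries both values: the quotient advances the
        # code point, the remainder selects the position.
        n += i // (len(out) + 1)
        i %= len(out) + 1
        out.insert(i, chr(n))
        i += 1
    return "".join(out)


# 1741 = (0xFC - 128) * 14 + 5: insert U+00FC at index 5 of a 13-char string.
print(apply_deltas("www.mller.com", [1741]))  # www.müller.com
```

Note that the real thing works per label rather than on the whole name; 
for the single label "mller" the same character needs the increment 745.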

The increments are encoded as a "generalized variable length integer" 
that employs a variable base and uses the [0-9A-Z] alphabet and the 
result is appended to the encoded string.
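
To give an idea of the integer encoding, here is a hedged Python sketch 
of decoding the increments, using the parameters from RFC 3492 (base 36, 
tmin 1, tmax 26, skew 38, damp 700, initial bias 72; digits a-z are 0-25 
and 0-9 are 26-35). The names decode_deltas, digit_value and adapt are 
illustrative, not taken from any library.

```python
BASE, TMIN, TMAX, SKEW, DAMP, INITIAL_BIAS = 36, 1, 26, 38, 700, 72


def digit_value(ch):
    """Map a digit character to its value: a-z -> 0..25, 0-9 -> 26..35."""
    ch = ch.lower()
    if "a" <= ch <= "z":
        return ord(ch) - ord("a")
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 26
    raise ValueError("not a base-36 digit: %r" % ch)


def adapt(delta, numpoints, first):
    """Bias adaptation as described in RFC 3492."""
    delta = delta // DAMP if first else delta // 2
    delta += delta // numpoints
    k = 0
    while delta > ((BASE - TMIN) * TMAX) // 2:
        delta //= BASE - TMIN
        k += BASE
    return k + ((BASE - TMIN + 1) * delta) // (delta + SKEW)


def decode_deltas(encoded, basic_len):
    """Decode the digit string after the basic characters into increments."""
    deltas, bias, length, pos = [], INITIAL_BIAS, basic_len, 0
    while pos < len(encoded):
        delta, w, k = 0, 1, BASE
        while True:
            if pos >= len(encoded):
                raise ValueError("truncated integer")
            d = digit_value(encoded[pos])
            pos += 1
            delta += d * w
            # The threshold depends on the digit position and current bias.
            t = TMIN if k <= bias else TMAX if k >= bias + TMAX else k - bias
            if d < t:  # a digit below the threshold ends this integer
                break
            w *= BASE - t
            k += BASE
        length += 1
        bias = adapt(delta, length, not deltas)
        deltas.append(delta)
    return deltas


print(decode_deltas("kva", 5))  # [745]
```

Python also ships a "punycode" codec that can serve as a cross-check, 
e.g. b"mller-kva".decode("punycode").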

It is _really_ ugly. Thank god they publish reference source in the RFC:



Never is there so much lying as before an election, during a war, and
after a hunt                                      -- Otto von Bismarck

More information about the kde-core-devel mailing list