Handling of const strings/char arrays, e.g. with KActionCollection
Thiago Macieira
thiago at kde.org
Tue Apr 22 18:50:42 BST 2008
On Tuesday 22 April 2008 18:58:35 Friedrich W. H. Kossebau wrote:
> Question 2:
> All KDE source files are in UTF-8 AFAIK. So if someone puts non-latin1
> chars in a string, e.g.
> const char identifier[] = "strânge ïdēntìfĩȩr <JAPANESE chars>";
> the C++ compiler will create a char array which matches the UTF-8
> representation in bytes, so sizeof(identifier) > number of chars. Right?
Let's define "number of chars" here.
sizeof(char) == 1 by definition, and sizeof(identifier) is the number of bytes
in that UTF-8 string plus one for the terminating NUL. Each byte is a "char".
However, in UTF-8 the equation 1 byte = 1 character does not hold, so
strlen(identifier) == sizeof(identifier) - 1 is a byte count, not the number
of Unicode codepoints. Each codepoint takes anywhere from 1 to 4 bytes.
(In fact, it's the multiple equality 1 byte = 1 character = 1 cell of
advancing that doesn't hold.)
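To make the byte/codepoint difference concrete, here is a minimal sketch in
plain C++ (no Qt needed). countCodePoints is a hypothetical helper name, and
it assumes the input is well-formed UTF-8; it simply skips continuation bytes.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: counts Unicode codepoints in a well-formed,
// NUL-terminated UTF-8 string by skipping continuation bytes (10xxxxxx).
std::size_t countCodePoints(const char *utf8)
{
    std::size_t count = 0;
    for (const char *p = utf8; *p != '\0'; ++p) {
        if ((static_cast<unsigned char>(*p) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// "strânge ïdēntìfĩȩr" spelled as explicit UTF-8 bytes, so the example does
// not depend on the encoding of the source file. The literal is split where
// a hex escape would otherwise swallow a following 'd' or 'f'.
const char identifier[] =
    "str\xC3\xA2nge \xC3\xAF" "d\xC4\x93nt\xC3\xAC" "f\xC4\xA9\xC8\xA9r";
```

With these bytes, sizeof(identifier) is 25, strlen(identifier) is 24 and
countCodePoints(identifier) is 18.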
> And the content of QString( identifier ) or QLatin1String( identifier ) will
> not be the original string as in the source file, but the bytes encoded in
> Latin1 (if no other code uses QTextCodec::setCodecForCStrings(), do we
> catch this?). Right?
Hmm... no. Each one will be a different thing.
// -*- encoding: utf-8 -*-
const char identifier[] = "strânge ïdēntìfĩȩr";
QString str(QLatin1String(identifier));
// str == "strÃ¢nge Ã¯dÄ\x93ntÃ¬fÄ©È©r"
// (each UTF-8 byte is read as one Latin 1 character; \x93 is a control code)
QString str(identifier), on the other hand, is the same as
QString::fromAscii(identifier), and "fromAscii" is a misnomer. In the strict
sense, ASCII is a subset of Latin 1, so it should either have the same effect
as above or produce the string "str??nge ??d??nt??f????r".
In practice, QString::fromAscii uses the codec returned by
QTextCodec::codecForCStrings(). By default that's Latin 1, but the
application could override it with anything at all.
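The difference between a Latin 1 reading and a strict ASCII reading of the
same bytes can be sketched without Qt. This is only an illustration of the
two decodings, not how Qt implements them; latin1ToUtf8 and strictAscii are
hypothetical helper names.

```cpp
#include <string>

// Hypothetical helper, not Qt API: decodes the bytes as Latin 1 (each byte
// is the codepoint with the same value) and returns the result re-encoded
// as UTF-8, which is what you would see on a UTF-8 terminal.
std::string latin1ToUtf8(const std::string &bytes)
{
    std::string out;
    for (unsigned char b : bytes) {
        if (b < 0x80) {
            out += static_cast<char>(b);
        } else {
            out += static_cast<char>(0xC0 | (b >> 6));   // lead byte
            out += static_cast<char>(0x80 | (b & 0x3F)); // continuation byte
        }
    }
    return out;
}

// Hypothetical helper: a strict ASCII decoder that replaces every byte
// outside the 7-bit range with '?'.
std::string strictAscii(const std::string &bytes)
{
    std::string out;
    for (unsigned char b : bytes)
        out += (b < 0x80) ? static_cast<char>(b) : '?';
    return out;
}
```

For the two UTF-8 bytes of "â" (C3 A2), latin1ToUtf8 returns the four bytes
C3 83 C2 A2, which display as "Ã¢", while strictAscii("str\xC3\xA2nge")
returns "str??nge".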
> Still the identifiers from the rc file are read as UTF-8 strings and
> contained as such in QString.
> So this restricts all action identifiers to be latin1 chars. Right?
No. You can use QString::fromUtf8(identifier).
I don't see why you would, though. Identifiers should be simple, short
strings that are easy to write and to read.
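If an identifier sticks to 7-bit ASCII, the Latin 1 and UTF-8
interpretations of its bytes are identical, so the question never comes up.
A check along these lines would catch anything else; isPureAscii is a
hypothetical helper, not something that exists in Qt or kdelibs.

```cpp
// Hypothetical helper: returns true if every byte of the NUL-terminated
// string is 7-bit ASCII, i.e. decodes identically as Latin 1 and as UTF-8.
bool isPureAscii(const char *s)
{
    for (; *s != '\0'; ++s) {
        if (static_cast<unsigned char>(*s) >= 0x80)
            return false;
    }
    return true;
}
```

For example, isPureAscii("file_open") is true, while a string containing
the UTF-8 bytes of "â" is rejected.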
> Then this should be noted with the API Dox of KActionCollection. I would
> prepare a patch if the above is correct.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
PGP/GPG: 0x6EF45358; fingerprint:
E067 918B B660 DBD1 105C 966C 33F5 F005 6EF4 5358