[Kde-perl] Perlqt unicode(utf-8) problems

Germain Garand kde-perl@mail.kde.org
Sat, 21 Dec 2002 21:59:19 +0000


On Saturday 21 December 2002 14:24, Michael Traxler wrote:
> Hi!
>
> Is there a possibility to switch off the new unicode-way of handling
> strings in PerlQt?

Hello Michael,
No, there isn't, and the reason is exactly the one you describe: Unicode
is intended to be transparent (at least in Perl).
Introducing a switch is possible, but it would go against Perl's design,
would probably be bad design in general, and would have a performance
cost (which would need to be measured).
However, reading what you say, I think there is a misunderstanding: Qt
strings aren't just marshalled back as normal Perl strings holding UTF-8
bytes; they are marshalled back as *true* Perl UTF8 strings.
This means that everything Perl does under the hood to handle utf8
transparently also applies to Qt strings.
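
A minimal sketch of that difference, assuming Perl 5.8 or later and the
core Encode module (no Qt involved; the variable names are only
illustrative):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $chars = "a\x{e4}";                  # character string: 'a', U+00E4
my $bytes = encode('UTF-8', $chars);    # byte string: 0x61 0xC3 0xA4

# A normal string that merely *holds* UTF-8 is three units long and does
# not match the character U+00E4.
printf "bytes: length %d, matches: %s\n",
    length($bytes), ($bytes =~ /a\x{e4}/ ? 'yes' : 'no');

# Decoding yields a true character string: two characters, and the
# regular expression matches.
my $decoded = decode('UTF-8', $bytes);
printf "chars: length %d, matches: %s\n",
    length($decoded), ($decoded =~ /a\x{e4}/ ? 'yes' : 'no');
```

The second kind of string is what comes back from Qt, which is why length,
regular expressions and printing behave as they do for any other Perl
string.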

For instance, let's take your two examples:
1) > perl -e '$a="a\x{e4}\n"; print $a'
2) > perl -e 'binmode(STDOUT, ":utf8"); $a="a\x{e4}\n"; print $a'

You'll get the very same output by saying:

1)
perl -e '
use Qt;
$a = Qt::Application(\@ARGV);
$b = Qt::Label("a\x{e4}\n", undef);
print $b->text . "\n"'

2)
perl -e '
binmode(STDOUT, ":utf8");
use Qt;
$a = Qt::Application(\@ARGV);
$b = Qt::Label("a\x{e4}\n", undef);
print $b->text . "\n"'

It will also work if you construct the Label above as
Qt::Label("aä\n", undef).
Even (as of 5.8.0): $b->text =~ /aä/ is true.
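
The effect of the ":utf8" layer in example 2 can be seen without a
terminal by printing into in-memory filehandles. This is only a sketch,
assuming Perl 5.8 or later; the helper name bytes_written is made up for
illustration and is not part of PerlQt:

```perl
use strict;
use warnings;

# Hypothetical helper: print $text through the given layer into an
# in-memory file and return the raw bytes that were written.
sub bytes_written {
    my ($layer, $text) = @_;
    my $buffer = '';
    open(my $fh, ">$layer", \$buffer)
        or die "cannot open in-memory file: $!";
    print {$fh} $text;
    close($fh);
    return $buffer;
}

# Render a byte string as space-separated hex values.
sub hex_dump { join ' ', map { sprintf '%02X', ord } split //, shift }

my $text = "a\x{e4}";                       # 'a' followed by U+00E4
my $raw  = bytes_written('',      $text);   # no layer, as in example 1
my $utf8 = bytes_written(':utf8', $text);   # ":utf8" layer, as in example 2

print "no layer: ", hex_dump($raw),  "\n";
print ":utf8:    ", hex_dump($utf8), "\n";
```

Without a layer the character U+00E4 goes out as the single byte E4; with
":utf8" it goes out as the two bytes C3 A4.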

Maybe you were confused by what I explained about using TextCodec to
print utf8 strings as localised strings: this is only needed for
applications that are intended to work with both 5.6.1 and 5.8.0.
In 5.6.1, transparent conversion of utf8 strings did not work properly
(at least, that's what I see here, but maybe there are some compile-time
settings, I don't know):
> perl -e '$a="a\x{e4}\n"; print $a'
5.6.1: aÃ¤
5.8.0: aä

> If the whole world was using unicode, then this would not be a problem,
> but my application is connected to mysql, which doesn't know unicode.

This is unfortunate, but it isn't a whole-world problem; rather, it is a
problem of Perl vs. external applications.

If a Perl module doesn't handle utf8 correctly, then it is broken and
should be fixed ASAP.
If it handles utf8 correctly (as is the case here), but the external
application it drives doesn't, you just need to do some conversion for
that application, not for your whole application.

> So, this would mean to me, that I have to go through all my application
> and convert all sql-commands sent to DBI with
> $sql = pack("C*", unpack("U*", $sqp));

Well, TIMTOWTDI; you can also define:

sub loc($) { Qt::TextCodec::codecForLocale()->fromUnicode( shift ) }

and then:
$dbh->prepare( loc $sqp );
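
The same conversion at the database boundary can be sketched without Qt,
using the core Encode module (Perl 5.8 or later). This is an illustration
under assumptions: the helper name to_latin1, the table and column names,
and the choice of ISO-8859-1 as the target charset are all made up here,
whereas codecForLocale() would use whatever encoding the current locale
specifies.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: turn a Perl character string into ISO-8859-1
# bytes just before it is handed to a non-Unicode-aware application.
sub to_latin1 {
    my ($text) = @_;
    return encode('iso-8859-1', $text);
}

# Illustrative statement; the table and column names are invented.
my $sql = "SELECT id FROM people WHERE name = 'J\x{e4}ger'";
utf8::upgrade($sql);    # stored internally as utf8, like a string from Qt

my $bytes = to_latin1($sql);

printf "characters: %d, bytes: %d\n", length($sql), length($bytes);
printf "byte for U+00E4: %02X\n", ord(substr($bytes, index($sql, "\x{e4}"), 1));
```

With DBI this would be used as $dbh->prepare( to_latin1($sql) ), so the
conversion stays in one place instead of being spread over the whole
application.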

> Ok, this would work, but is inconvenient and error-prone because I also
> have to test all other "world"-interactions, if they are handled
> correctly.

Really, there shouldn't be any problem other than non-utf8-aware
applications, and that's not Perl's fault.
I understand that this change is upsetting, but that's what PerlQt should
have done in the first place, given that Qt is consistently utf8 aware
and so is Perl.
I'm not fundamentally against introducing a switch (Ashley, what do you
think?), but are there more arguments in favor of it than just MySQL?

Germain