implicit QChar constructors

Dirk Mueller kde-optimize@mail.kde.org
Wed, 22 Jan 2003 00:15:35 +0100


On Tue, 21 Jan 2003, Jesse Yurkovich wrote:

> The second piece of code was, of course, much much faster than the first
> because of all the constructor calls going on (even for small values of N).

Much faster?

The QChar() constructors are inlined, so there is no actual "work" to do.

To prove my point, I've analyzed the assembly output of:

#include <qstring.h>
int main()
{
    QChar* txt = new QChar[10];
    QChar space(' ');
    for ( unsigned int z = 0; z < 10; ++z)
        txt[z] = ' ';

    return 0;
}

I used g++ 3.2.0 with no switches other than -O2.

The "implicit constructor" variant compiled down to this code in the loop:

.L21:
        movw    $32, -24(%ebp)
        movw    $32, (%eax,%edx,2)
        incl    %edx
        cmpl    $9, %edx
        jbe     .L21

As you can see, gcc optimizes the repeated assignment quite well: it maps
it to the optimal x86 instruction:

 movw    $32, (%eax,%edx,2)

However, for reasons beyond me, gcc pessimized the code by not
hoisting this instruction out of the loop:

 movw    $32, -24(%ebp)

This is the implicit QChar constructor. There is absolutely no need to have
it in the loop. Appears to be a bug in gcc to me.


So I was curious and used "= space" instead. Here is the output:

.L21:
        movl    -24(%ebp), %eax
        movw    %ax, (%ecx,%edx,2)
        incl    %edx
        cmpl    $9, %edx
        jbe     .L21


As you can see, the actual code is theoretically worse, because every
iteration ends up as a memory read followed by a memory write. This can
hardly be faster in the average case.

Well, gcc could have been intelligent enough to use a register for the load,
for example %ax, instead of re-reading memory on every iteration, but
apparently it's not. Again, I suspect a bug in gcc.
I'd be interested in analyzing gcc's internal parse tree to understand
why it thought it was unable to do this trivial optimization, but I don't
have much knowledge in that area.
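
For anyone who wants to repeat the comparison without a Qt installation,
here is a minimal sketch of the two loop variants. Char16 is a hypothetical
stand-in I made up for illustration (a 16-bit value with inline, implicit
constructors); it is not the real QChar, and other compilers or gcc versions
may well generate different code than the listings above.

```cpp
#include <cstddef>

// Hypothetical stand-in for QChar (assumption: NOT the real Qt class).
// It only mimics the relevant property: small, inline, implicit constructors.
class Char16 {
public:
    Char16() : ucs(0) {}
    Char16(char c) : ucs(static_cast<unsigned char>(c)) {}  // implicit on purpose
    unsigned short unicode() const { return ucs; }
private:
    unsigned short ucs;
};

// Variant 1: every assignment goes through the implicit Char16(char)
// constructor, creating a temporary that is then copied into the array.
void fill_implicit(Char16* txt, std::size_t n) {
    for (std::size_t z = 0; z < n; ++z)
        txt[z] = ' ';
}

// Variant 2: the object is constructed once before the loop and only
// copied inside it (the "= space" case).
void fill_prebuilt(Char16* txt, std::size_t n) {
    Char16 space(' ');
    for (std::size_t z = 0; z < n; ++z)
        txt[z] = space;
}
```

Compiling this with -O2 -S and comparing the two loop bodies should show
whether a given compiler treats the variants differently.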

Then, I was curious to see if specifying __attribute__((const)) on all
relevant QChar and QString methods and constructors would help.
Unfortunately, it made no difference at all for this testcase. This looks
like a bug in gcc to me, although the documentation is quite unclear on
whether it can optimize C++ methods at all (because of the implicit this
pointer).
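
For reference, a minimal sketch of where the attribute clearly applies: a
free function whose result depends only on its argument values. The names
to_code_unit and fill are hypothetical, chosen for illustration. The gcc
documentation says a function that examines data through a pointer argument
must not be declared const, so a member function that reads through this
may simply fall outside the attribute's contract; that is my reading, not
something verified against this testcase.

```cpp
// Hypothetical free function, not part of Qt. __attribute__((const)) is a
// GCC extension (also accepted by Clang): it promises that the result
// depends only on the argument values and that there are no side effects.
__attribute__((const)) unsigned short to_code_unit(char c);

unsigned short to_code_unit(char c) {
    return static_cast<unsigned char>(c);
}

// With that promise the compiler is allowed to evaluate to_code_unit(' ')
// once and reuse the value in every iteration. Whether it actually does so
// is up to the optimizer.
void fill(unsigned short* txt, unsigned int n) {
    for (unsigned int z = 0; z < n; ++z)
        txt[z] = to_code_unit(' ');
}
```

The weaker __attribute__((pure)) allows the function to read memory, which
is closer to what a method accessing its own members does.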


-- 
Dirk (received 5 mails today)