return
David Leimbach
kde-optimize@mail.kde.org
Mon, 3 Feb 2003 07:30:25 -0600
[LONG]
This is one of the topics of Efficient C++... one of the books on the
list of good books to optimize C++ with.
Basically you can have named or unnamed RVO [return value
optimization]. I am not sure it's really a big deal with this example,
since you use a built-in, simple type like bool. If these were objects
that used constructors, then RVO would become a lot more important due
to the need to copy-construct and default-construct objects. Neither of
those really happens in your example, so to me it seems either way is
fine.
Knowing what to worry about is the biggest part of optimization... and
if you don't know what to look for as a candidate for optimization you
should use a profiler or some other measurement tool to find out.
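As a rough illustration of the "measure first" advice, here is a minimal timing sketch for the two advance() variants from the original question. The Player struct, its field values, and the time_calls and report_timings helpers are assumptions made up for this example; variant B is written with "return true;" at the end, and std::chrono stands in for a real profiler. Whatever numbers it prints depend on the compiler, the flags, and the machine.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical holder for the state used by advance(); not from the original code.
struct Player {
    int current_frame = 0;
    int max_frame = 9;
    bool cyclic = true;

    // Variant A: single exit, result kept in a local bool.
    bool advance_a() {
        bool result = true;
        if (current_frame == max_frame) {
            if (cyclic) current_frame = 0;
            else result = false;
        } else {
            ++current_frame;
        }
        return result;
    }

    // Variant B: early return, no local bool.
    bool advance_b() {
        if (current_frame == max_frame) {
            if (cyclic) current_frame = 0;
            else return false;
        } else {
            ++current_frame;
        }
        return true;
    }
};

// Calls `step` on a fresh Player `n` times and returns the elapsed
// milliseconds. `successes` receives the number of calls that returned
// true, which also gives the loop an observable result.
template <typename Step>
double time_calls(Step step, long n, long& successes) {
    Player p;
    successes = 0;
    auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < n; ++i) {
        if (step(p)) ++successes;
    }
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times both variants and prints the wall-clock results.
void report_timings(long n) {
    long ok_a = 0, ok_b = 0;
    double ms_a = time_calls([](Player& p) { return p.advance_a(); }, n, ok_a);
    double ms_b = time_calls([](Player& p) { return p.advance_b(); }, n, ok_b);
    std::printf("A: %.3f ms, B: %.3f ms\n", ms_a, ms_b);
    assert(ok_a == ok_b);  // both variants must behave identically
}
```

Calling report_timings(100000000) from main, built with the same optimization flags as the real code, gives a first impression; a proper profiler is still the better tool for anything non-trivial.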
That said... let's pretend bool is an object in this case, and take
this example [from Efficient C++]:
Complex operator + (const Complex & a, const Complex & b)
{
    Complex retval;
    retval.imag = a.imag + b.imag;
    retval.real = a.real + b.real;
    return retval;
}
now if you have 3 complex objects [c1, c2, c3]:
c1 = c2 + c3; //execute the above operator
Compilers might create a temporary object named __tempResult and pass it
in to operator + as an extra argument, by reference, basically rewriting
your function to:
void Complex_Add (Complex & __tempResult, const Complex & a,
const Complex & b)...
now "c1 = c2 + c3;" becomes something like
" Complex __tempResult;
Complex_Add(__tempResult, c2, c3);
c1 = __tempResult;"
The return value optimization done above eliminates the local variable
"retval" and allocates space for the result outside the call to the
function. Note there is still a copy outside of the Complex add
operator.
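To make the rewrite concrete, here is a hand-written sketch of what such a transformation amounts to. It is not actual compiler output: the Complex definition, the add_via_out_param wrapper, and the name tempResult (without the leading double underscore, which is reserved for the implementation) are assumptions for illustration only.

```cpp
#include <cassert>

// Minimal stand-in for the Complex class discussed above (assumed layout).
struct Complex {
    double real;
    double imag;
    Complex() : real(0.0), imag(0.0) {}
    Complex(double r, double i) : real(r), imag(i) {}
};

// Hand-written equivalent of the rewritten operator +: the result object
// lives in the caller and is filled in through a non-const reference.
void Complex_Add(Complex& result, const Complex& a, const Complex& b) {
    result.real = a.real + b.real;
    result.imag = a.imag + b.imag;
}

// Equivalent of "c1 = c2 + c3;" after the rewrite.
Complex add_via_out_param(const Complex& c2, const Complex& c3) {
    Complex c1;
    Complex tempResult;               // storage allocated outside the call
    Complex_Add(tempResult, c2, c3);  // no local "retval" inside the callee
    c1 = tempResult;                  // the remaining copy (an assignment)
    return c1;
}
```

The assignment into c1 is the copy that stays outside the add operator; only the local object inside the function has gone away.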
The book actually gets a little weird here in the Return Value
Optimization chapter where it claims
the compiler generated function is written like this:
void Complex_Add (const Complex & __tempResult, const Complex & c1,
const Complex & c2)
{
__tempResult.Complex::Complex(); // Construct __tempResult
....
}
I disagree that the function needs to construct the __tempResult
variable, as it's declared outside the function and passed in by
reference... why construct it twice with the default constructor?
Anyway... bottom line is they did some measurements with two different
versions of the Complex add operator:
Complex operator + (const Complex & a, const Complex & b)
{
    Complex retVal;
    retVal.real = a.real + b.real;
    retVal.imag = a.imag + b.imag;
    return retVal;
}
"One compiler refused to apply RVO to the above version"
however...
Complex operator + (const Complex & a, const Complex & b)
{
    double r = a.real + b.real;
    double i = a.imag + b.imag;
    return Complex(r, i);
}
The same compiler optimized the above function.
On their test platform, running these operations 1 million times was
about 0.59 seconds faster (wall-clock time) for the return value
optimized version.
They decided that the optimization may not have worked due to naming
the temporary that was getting
returned by the function so they further experimented with two more
versions:
// version 3
Complex operator + (const Complex & a, const Complex & b)
{
    Complex retVal(a.real + b.real, a.imag + b.imag);
    return retVal;
}
and
// version 4
Complex operator + (const Complex & a, const Complex & b)
{
    return Complex(a.real + b.real, a.imag + b.imag);
}
and they found that version 4 got optimized while version 3 did not....
Sometimes a named return value won't get optimized by some compilers...
I don't know if that is the case with gcc or not.
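One way to check what a particular compiler does with versions 3 and 4 is to count copy-constructor calls. The sketch below is only that: the static copies counter, the add_v3/add_v4 free functions (used here in place of two competing operator + overloads), and the copies_made_by and report_copies helpers are assumptions for illustration. The counts depend on compiler, language standard, and flags; with gcc, building with -fno-elide-constructors is one way to see what happens when optional elision is switched off.

```cpp
#include <cassert>
#include <cstdio>

// Complex with an instrumented copy constructor (assumed layout).
struct Complex {
    static int copies;  // number of copy-constructor calls so far
    double real;
    double imag;
    Complex(double r, double i) : real(r), imag(i) {}
    Complex(const Complex& other) : real(other.real), imag(other.imag) {
        ++copies;
    }
};
int Complex::copies = 0;

// Version 3: returns a named local object.
Complex add_v3(const Complex& a, const Complex& b) {
    Complex retVal(a.real + b.real, a.imag + b.imag);
    return retVal;
}

// Version 4: returns an unnamed temporary.
Complex add_v4(const Complex& a, const Complex& b) {
    return Complex(a.real + b.real, a.imag + b.imag);
}

// Resets the counter, performs one addition, and returns how many
// copy-constructor calls it took to initialize the result.
int copies_made_by(Complex (*add)(const Complex&, const Complex&)) {
    Complex a(1.0, 2.0);
    Complex b(3.0, 4.0);
    Complex::copies = 0;
    Complex c = add(a, b);
    (void)c;  // silence unused-variable warnings
    return Complex::copies;
}

// Prints the copy counts for both versions.
void report_copies() {
    std::printf("version 3 copies: %d\n", copies_made_by(add_v3));
    std::printf("version 4 copies: %d\n", copies_made_by(add_v4));
}
```

If both lines report the same count, the compiler treated the named and unnamed versions alike on that build; a higher count for version 3 would match what the book observed.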
The authors also claim that you must define a copy constructor to "turn
on" RVO.
It's a pretty darned good book ... you should check it out ... but the
key is to always measure to try to find out what is going on. :)
"Efficient C++", Dov Bulka and David Mayhew.
Dave
On Monday, February 3, 2003, at 05:59 AM, Jordi wrote:
>
> A lot of methods, at least in my code, end up returning true or
> false to tell whether they were able to perform a small operation. In
> those cases, which will be faster:
>
> A:
> bool advance()
> {
> bool result=true;
> if (current_frame== max_frame)
> if (cyclic)
> current_frame=0;
> else
> result=false;
> else
> ++current_frame;
> return result;
> }
>
> or
>
> B:
> bool advance()
> {
> if (current_frame== max_frame)
> if (cyclic)
> current_frame=0;
> else
> return false;
> else
> ++current_frame;
> return true;
> }
>
> In case A a bool has to be allocated in the method, but gcc is
> supposed to optimize that case.
> In case B the bool is gotten rid of, but gcc is supposed to not be
> able to optimize that well....
>
> What's faster A or B?
>
> --
> Jordi
> http://development.bluesock.net