return

David Leimbach kde-optimize@mail.kde.org
Mon, 3 Feb 2003 07:30:25 -0600


[LONG]
This is one of the topics of Efficient C++... one of the books on the 
list of good books to optimize C++ with.

Basically you can have named or unnamed RVO [return value 
optimization].  I am not sure it's really a big deal with this example, 
since you use a simple built-in type like bool.  If these were objects 
that used constructors, then RVO would become a lot more important due 
to the need to default-construct and copy-construct objects.  Neither 
of those really happens in your example, so to me it seems either way 
is fine.

Knowing what to worry about is the biggest part of optimization... and 
if you don't know what to look for as a candidate for optimization you 
should use a profiler or some other measurement tool to find out.
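
As a rough sketch of what "measure it" can look like for the two advance() 
variants in the quoted question: the Animation struct, its default values, 
and the helper names time_ms and report_timings are all made up for 
illustration, and a real profiler will tell you far more than a loop timer 
like this one.

```cpp
#include <cassert>
#include <chrono>
#include <cstdio>

// Hypothetical stand-in for the class in the quoted question.
struct Animation {
    int current_frame = 0;
    int max_frame = 9;
    bool cyclic = true;

    // Variant A: single exit, result kept in a local bool.
    bool advance_a() {
        bool result = true;
        if (current_frame == max_frame) {
            if (cyclic) current_frame = 0;
            else result = false;
        } else {
            ++current_frame;
        }
        return result;
    }

    // Variant B: early return, no local bool.
    bool advance_b() {
        if (current_frame == max_frame) {
            if (cyclic) current_frame = 0;
            else return false;
        } else {
            ++current_frame;
        }
        return true;
    }
};

// Calls fn() `calls` times and returns the elapsed wall-clock milliseconds.
// The number of true results is written to *successes so the loop has an
// observable effect.
template <typename Fn>
double time_ms(Fn fn, long calls, long* successes) {
    auto start = std::chrono::steady_clock::now();
    long ok = 0;
    for (long i = 0; i < calls; ++i) ok += fn() ? 1 : 0;
    auto stop = std::chrono::steady_clock::now();
    *successes = ok;
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Times both variants on separate objects and prints the results.
void report_timings(long calls) {
    Animation a, b;
    long ok_a = 0, ok_b = 0;
    double ms_a = time_ms([&] { return a.advance_a(); }, calls, &ok_a);
    double ms_b = time_ms([&] { return b.advance_b(); }, calls, &ok_b);
    assert(ok_a == ok_b);  // timings only mean something if both agree
    std::printf("A: %.3f ms (%ld ok)\nB: %.3f ms (%ld ok)\n",
                ms_a, ok_a, ms_b, ok_b);
}
```

Calling report_timings(1000000) from main() and building with the 
optimization flags you actually ship with gives a first impression; with a 
function this small, expect the two numbers to be within noise of each other.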

That said... let's pretend bool is an object in this case, and look at 
this example [from Efficient C++]:

Complex operator + (const Complex & a, const Complex & b)
{
   Complex retval;
   retval.imag = a.imag + b.imag;
   retval.real = a.real + b.real;
   return retval;
}

Now if you have three Complex objects [c1, c2, c3]:

c3 = c1 + c2;  // execute the above operator

Compilers might create a temporary object and pass it in to operator + 
as a third argument by reference (called __result inside the function), 
basically rewriting your function to:

void Complex_Add (Complex & __result, const Complex & c1, const 
Complex & c2)...

Now "c3 = c1 + c2;" becomes something like
" Complex __tempResult;
   Complex_Add(__tempResult, c1, c2);
   c3 = __tempResult;"

The return value optimization done above eliminates the internal value 
"retval" and allocates
space for it outside the call to the function.  Note there is still a 
copy outside of the Complex add
operator.
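
To make that rewrite concrete, here is a hand-written approximation of it in 
legal C++. This is only a sketch of the idea, not what any particular 
compiler emits: the Complex struct is a minimal stand-in, tempResult replaces 
__tempResult because double-underscore names are reserved for the 
implementation, and same_result is a helper invented for the comparison.

```cpp
#include <cassert>

// Minimal stand-in for the Complex class discussed above.
struct Complex {
    double real;
    double imag;
    Complex(double r = 0.0, double i = 0.0) : real(r), imag(i) {}
};

// Source-level form: builds a local and returns it by value.
Complex operator+(const Complex& a, const Complex& b) {
    Complex retval;
    retval.real = a.real + b.real;
    retval.imag = a.imag + b.imag;
    return retval;
}

// Approximation of the transformed form: the caller owns the result
// object and the function writes straight into it.
void Complex_Add(Complex& result, const Complex& a, const Complex& b) {
    result.real = a.real + b.real;
    result.imag = a.imag + b.imag;
}

// Runs "c3 = c1 + c2" both ways and reports whether the results match.
bool same_result(const Complex& c1, const Complex& c2) {
    Complex c3;
    c3 = c1 + c2;                      // what you wrote

    Complex tempResult;                // space allocated by the caller
    Complex_Add(tempResult, c1, c2);   // no local retval inside the call
    Complex c3_rewritten;
    c3_rewritten = tempResult;         // the copy that remains outside

    return c3.real == c3_rewritten.real && c3.imag == c3_rewritten.imag;
}
```

The last assignment in same_result is the copy the text refers to: the 
rewrite removes the local inside the function, not the assignment into c3.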

The book actually gets a little weird here in the Return Value 
Optimization chapter where it claims
the compiler generated function is written like this:

void Complex_Add (const Complex & __tempResult, const Complex & c1, 
const Complex & c2)
{
   __tempResult.Complex::Complex();  //  Construct __tempResult
....
}

I disagree that the function needs to construct the __tempResult 
variable, as it's declared outside the function and passed in by 
reference... why construct it twice with the default constructor?
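
One possible reading of the book's pseudocode, offered here as an assumption 
rather than something the authors state: the caller only reserves raw, 
unconstructed storage, in which case the constructor call inside the function 
would be the only one. A small sketch of that reading using placement new 
(the construction counter, the void* parameter, and 
constructions_for_one_add are invented for illustration):

```cpp
#include <cassert>
#include <new>

// Stand-in Complex that counts how many times a constructor runs.
struct Complex {
    static int constructions;
    double real;
    double imag;
    Complex(double r, double i) : real(r), imag(i) { ++constructions; }
};
int Complex::constructions = 0;

// The callee constructs the result in storage supplied by the caller.
Complex* Complex_Add(void* result_storage, const Complex& a, const Complex& b) {
    return new (result_storage) Complex(a.real + b.real, a.imag + b.imag);
}

// Performs one addition and returns how many Complex objects were
// constructed while producing the result.
int constructions_for_one_add() {
    Complex c1(1.0, 2.0), c2(3.0, 4.0);
    Complex::constructions = 0;        // ignore the two operands

    // Raw, suitably aligned bytes: reserving them runs no constructor.
    alignas(Complex) unsigned char storage[sizeof(Complex)];

    Complex* result = Complex_Add(storage, c1, c2);
    assert(result->real == 4.0 && result->imag == 6.0);

    int count = Complex::constructions;
    result->~Complex();                // end the lifetime explicitly
    return count;
}
```

Under this reading the result is constructed once, inside the function. If the 
caller had written "Complex tempResult;" first, as in the hand-expanded version 
earlier, there would indeed be two constructions, which is the objection above.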


Anyway... bottom line is they did some measurements with two different 
versions of the Complex add operator:

Complex operator + (const Complex & a, const Complex & b)
{
   Complex retVal;
   retVal.real = a.real + b.real;
   retVal.imag = a.imag + b.imag;
   return retVal;
}

"One compiler refused to apply RVO to the above version"

however...

Complex operator + (const Complex & a, const Complex & b)
{
   double r = a.real + b.real;
   double i = a.imag + b.imag;
   return Complex(r, i);
}
The same compiler optimized the above function.

On their test platform, running these operations 1 million times was 
about 0.59 seconds faster (wall-clock time) with the 
return-value-optimized version.

They suspected that the optimization had not been applied because the 
object being returned was named, so they experimented further with two 
more versions:

//version 3
Complex operator +  (const Complex & a, const Complex & b)
{
    Complex retVal(a.real+b.real, a.imag + b.imag);
    return retVal;
}

and

//version 4
Complex operator + (const Complex & a, const Complex & b)
{
   return Complex(a.real+b.real, a.imag + b.imag);
}

and they found that version 4 was optimized while version 3 was not.

Sometimes a named return value won't get optimized by some compilers... 
I don't know if that
is the case with gcc or not.
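
Whether a particular gcc elides the named case is something you can check 
directly rather than guess. A minimal sketch, assuming a Complex whose copy 
constructor counts its calls: the counter and the names add_named, 
add_unnamed, copies_made, and report_copies are invented for this 
illustration, with add_named and add_unnamed mirroring versions 3 and 4.

```cpp
#include <cassert>
#include <cstdio>

// Stand-in Complex that counts copy constructions.
struct Complex {
    static int copies;
    double real;
    double imag;
    Complex(double r = 0.0, double i = 0.0) : real(r), imag(i) {}
    Complex(const Complex& o) : real(o.real), imag(o.imag) { ++copies; }
};
int Complex::copies = 0;

// Named return value, like version 3.
Complex add_named(const Complex& a, const Complex& b) {
    Complex retVal(a.real + b.real, a.imag + b.imag);
    return retVal;
}

// Unnamed return value, like version 4.
Complex add_unnamed(const Complex& a, const Complex& b) {
    return Complex(a.real + b.real, a.imag + b.imag);
}

// Runs one addition through `add` and returns how many copy
// constructions it took to get the result into c3.
template <typename Add>
int copies_made(Add add) {
    Complex c1(1.0, 2.0), c2(3.0, 4.0);
    Complex::copies = 0;
    Complex c3 = add(c1, c2);
    assert(c3.real == 4.0 && c3.imag == 6.0);
    return Complex::copies;
}

// Prints the copy counts for both forms.
void report_copies() {
    std::printf("named:   %d copies\n", copies_made(add_named));
    std::printf("unnamed: %d copies\n", copies_made(add_unnamed));
}
```

Under C++17 and later the unnamed form cannot copy at all, since eliding a 
returned temporary is mandatory there. The named form is still at the 
compiler's discretion, so the count printed for it depends on the compiler 
and flags; with gcc, building once normally and once with 
-fno-elide-constructors is one way to see the difference.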

The authors also claim that you must define a copy constructor to "turn 
on" RVO.

It's a pretty darned good book ... you should check it out ... but the 
key is to always measure to try to find out what is going on. :)

"Efficient C++", Dov Bulka and David Mayhew.

Dave
On Monday, February 3, 2003, at 05:59 AM, Jordi wrote:

>
> There are a lot of methods, at least in my code, that end up returning
> true or false to tell whether they were able to perform a small
> operation. In those cases, which will be faster:
>
> A:
> bool advance()
> {
>   bool result=true;
>   if (current_frame== max_frame)
>     if (cyclic)
>       current_frame=0;
>     else
>       result=false;
>   else
>     ++current_frame;
>   return result;
> }
>
> or
>
> B:
> bool advance()
> {
>   if (current_frame== max_frame)
>     if (cyclic)
>       current_frame=0;
>     else
>       return false;
>   else
>     ++current_frame;
>   return true;
> }
>
> In case A a bool has to be allocated in that method, but gcc is
> supposed to optimize that case.
> In case B the bool is gotten rid of, and gcc is supposedly not able to
> optimize that as well....
>
> What's faster, A or B?
>
> --
> Jordi
> http://development.bluesock.net
> _______________________________________________
> Kde-optimize mailing list
> Kde-optimize@mail.kde.org
> http://mail.kde.org/mailman/listinfo/kde-optimize
>