KHTML-Measurement (from my talk at aKademy)

Thu Aug 26 22:43:30 CEST 2004

On Thursday August 26 2004 12:15, Josef Weidendorfer wrote:
> But I never suggested optimizing the given function e.g. by doing strange
> things like inline assembler in KDE code. I simply stated the only
> explanation I can imagine for the difference of simulation results to
> reality.

Sorry, I misunderstood. I thought you had tried to modify the function so that 
it results in less mispredicted branches.

> > By this I also suggest that Calltree/Callgrind should not be used to
> > profile CPU-limited program, only for memory-limited programs. But if you
>
> Hmm... How will you know before actually seeing measurements?

That's where programmer instinct plays a part. A typical program is rarely 
memory-limited, in my own experience. And a memory-limited program is harder 
to optimize than a CPU-limited one.

> The advantage of calltree is that it is easier to setup and get some
> results, in contrast to e.g. OProfile (where you need root access).

Sure, that's a nice advantage, among many others.

> Additionally, you get call-graph and call counts, which is currently not
> possible with OProfile. This alone will make results easier to understand.

This is the main reason why I constantly rely on it.

> And if results suggest that there are unneeded function calls, that's fine.
> Get rid of them, and it will make your program faster, regardless whether
> it is memory-bound or CPU-bound.

Sure, but if the measurement of the time spent in the functions is not 
accurate how can you know which function calls should be investigated to 
determine which are unneeded?

> As I said, I plan to add a heuristic to KCachegrind to extract the
> callgraph from a simulation run, and use this with sampling data from
> OProfile to get inclusive costs.

Oh, that would be a dream come true! Please please please do it!

> > really want to implement some sort of branch prediction algorithm
> > simulation, you could simply use the statistical results of such an
> > algorithm. On current processors, branch prediction is correct 90-95% of
> > the time. So I imagine you could adjust the simulation using these
> > figures, rather than implementing an algorithm directly.
>
> Adding statistics gives you no more results at all. The only thing
> interesting here is to see if there are functions where the misprediction
> ratio is not in the 90%, and for this you need real simulation.

Well, it would at least add some weight to functions that contain a lot of 
branching. You don't need the real ratio. In reality, most successful 
predictions are in the loops and most unsuccessful predictions are in 
branches that are more rarely tested, like if clauses, where the ratio is 
near 50%. Adding a realistic algorithm would remove some weight from the 
functions that contain a lot of branching and redistribute it to those that 
contain less branches. It wouldn't move the bottleneck around. The function 
that is slowed down because of branch misprediction is the one that contains 
a lot of branching, not the one that has a high misprediction ratio.

-- 
Simon Perreault <nomis80 at nomis80.org> -- http://nomis80.org