KHTML-Measurement (from my talk at aKademy)
Josef Weidendorfer
Josef.Weidendorfer at gmx.de
Fri Aug 27 00:04:58 CEST 2004
On Thursday 26 August 2004 22:43, Simon Perreault wrote:
> > Hmm... How will you know before actually seeing measurements?
>
> That's where programmer instinct plays a part. A typical program is rarely
> memory-limited, in my own experience. And a memory-limited program is
> harder to optimize than a CPU-limited one.
Yes and yes. Although I still have hope that there are ways for profiling and
visualization tools to point more precisely at the problems of a memory-bound
program, e.g. by relating events to data, reuse distances and so on. The P4 and
Itanium can do sampling on memory access events and give you the data addresses
(on the P4 this is called PEBS). Unfortunately, OProfile does not support this
at the moment.
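
To illustrate what I mean by reuse distances: the reuse distance of an access
is the number of distinct addresses touched since the previous access to the
same address. A minimal standalone C++ sketch (names invented by me, and
quadratic in the worst case; a real tool would use cleverer data structures):

#include <cstdint>
#include <iostream>
#include <iterator>
#include <map>
#include <unordered_map>
#include <vector>

// Reuse distance of an access: the number of distinct addresses touched
// since the previous access to the same address (-1 for a first access).
// Small distances mean cache-friendly reuse; distances larger than the
// cache size mean the access has to miss.
std::vector<long> reuseDistances(const std::vector<uint64_t>& trace)
{
    std::map<uint64_t, uint64_t> byLastUse;         // last-use time -> address
    std::unordered_map<uint64_t, uint64_t> lastUse; // address -> last-use time
    std::vector<long> dists;

    for (uint64_t t = 0; t < trace.size(); ++t) {
        uint64_t addr = trace[t];
        auto it = lastUse.find(addr);
        if (it == lastUse.end()) {
            dists.push_back(-1);                    // never seen before
        } else {
            // The distinct addresses used since the previous access are
            // exactly those with a last-use time newer than it->second.
            dists.push_back(static_cast<long>(
                std::distance(byLastUse.upper_bound(it->second),
                              byLastUse.end())));
            byLastUse.erase(it->second);
        }
        byLastUse[t] = addr;
        lastUse[addr] = t;
    }
    return dists;
}

int main()
{
    // Trace A B C A: the second access to A has reuse distance 2 (B, C).
    std::vector<uint64_t> trace = { 0x10, 0x20, 0x30, 0x10 };
    for (long d : reuseDistances(trace))
        std::cout << d << ' ';
    std::cout << '\n';                              // prints: -1 -1 -1 2
}

A histogram of these distances, compared against the cache size, directly
shows which accesses have to miss.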
> > And if results suggest that there are unneeded function calls, that's
> > fine. Get rid of them, and it will make your program faster, regardless
> > of whether it is memory-bound or CPU-bound.
>
> Sure, but if the measurement of the time spent in the functions is not
> accurate, how can you know which function calls should be investigated to
> determine which are unneeded?
You have a point ;-)
Still I would say that simulation results often are not that far off.
It's still worth looking at the sorted function list. Of course, the ordering
from simulation differs from the ordering of any real measurement. Perhaps
it is best to start with functions where some behaviour wasn't expected
at all (e.g. a much too high call count).
> > As I said, I plan to add a heuristic to KCachegrind to extract the
> > callgraph from a simulation run, and use this with sampling data from
> > OProfile to get inclusive costs.
>
> Oh, that would be a dream come true! Please please please do it!
Wow. You think this could be a killer feature for KCachegrind?
Currently I am improving support for more general formulas for
inherited event types (e.g. "FLOP / Time", to be able to show FLOP rates).
I hope to get to a point where KCachegrind can be used for test coverage
visualization, too.
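
To make clearer what such a formula means: a derived ("inherited") event type
is not stored in the profile data, it is evaluated on demand from the base
event types. A tiny sketch, with function names and numbers invented purely
for illustration:

#include <iostream>
#include <map>
#include <string>

// Hypothetical per-function base event counts, roughly as a profiling
// front-end would hold them after reading a trace file.
struct Costs {
    double flop;   // floating point operations (from simulation)
    double time;   // e.g. seconds or cycles (from sampling)
};

int main()
{
    // Invented example data.
    std::map<std::string, Costs> profile = {
        { "matmul", { 2.0e9, 0.8 } },
        { "paint",  { 1.0e6, 0.5 } },
    };

    // A derived event type like "FLOP / Time" is computed per function
    // from the stored base events, never stored itself.
    for (const auto& entry : profile) {
        const Costs& c = entry.second;
        double rate = (c.time > 0) ? c.flop / c.time : 0;
        std::cout << entry.first << ": " << rate << " FLOP/s\n";
    }
}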
> > Adding statistics gives you no additional insight at all. The only
> > interesting thing here is to see whether there are functions where the
> > misprediction ratio is not in the 90% range, and for this you need real
> > simulation.
>
> Well, it would at least add some weight to functions that contain a lot of
> branching. You don't need the real ratio. In reality, most successful
> predictions are in the loops, and most unsuccessful predictions are in
> branches that are exercised more rarely, like if clauses, where the ratio
> is near 50%. Adding a realistic algorithm would remove some weight from the
> functions that contain a lot of branching and redistribute it to those that
> contain fewer branches. It wouldn't move the bottleneck around. The function
> that is slowed down by branch misprediction is the one that contains a lot
> of branching, not the one that has a high misprediction ratio.
But if there is no misprediction in a function with lots of branches, there is
no penalty from these branches either. And I do not think that adding a simple
branch predictor will add much overhead: I already have data structures for
each branch, and storing an additional "most probable target" at every branch
site, together with a small saturating counter, should be easy.
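
Roughly this is what I have in mind; the names and data layout are invented,
but it shows the "most probable target" plus saturating counter idea:

#include <cstdint>
#include <unordered_map>

// Per branch site: remember the most probable target plus a small
// saturating counter that gives the stored target some hysteresis.
struct BranchSite {
    uint64_t predictedTarget = 0;
    uint8_t  counter = 0;          // 2-bit saturating counter, 0..3
};

class SimplePredictor {
public:
    // Returns true if the jump at 'site' to 'target' would have been
    // predicted correctly, then updates the predictor state.
    bool predictAndUpdate(uint64_t site, uint64_t target)
    {
        BranchSite& b = m_sites[site];
        bool hit = (b.predictedTarget == target);
        if (hit) {
            if (b.counter < 3)
                ++b.counter;                 // strengthen confidence
        } else if (b.counter > 0) {
            --b.counter;                     // weaken, keep old target
        } else {
            b.predictedTarget = target;      // give up, replace target
            b.counter = 1;
        }
        return hit;
    }

private:
    std::unordered_map<uint64_t, BranchSite> m_sites;
};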
Anyway, I am actually not sure whether it's worth improving the simulation at
all...
Josef