KHTML-Measurement (from my talk at aKademy)

Simon Perreault nomis80 at nomis80.org
Thu Aug 26 15:32:58 CEST 2004


On Thursday August 26 2004 6:08, Josef Weidendorfer wrote:
> Another source for stall time (wasted time) on modern processors is branch
> misprediction. Work is done in a pipelined manner. And if there is a
> mispredicted jump target, the pipeline has to be flushed. If I remember
> right, an Athlon or P-III has a pipeline length of around 15, a P4 has 21,
> and a P4 Prescott has 30. So every mispredicted branch costs around 15
> cycles of wasted time on my notebook.

I have been doing a lot of optimization of numerical software lately. None of 
it is related to KDE, but it is heavy optimization nevertheless. Even for 
numerical computations, optimizing for branch prediction works at far too low 
a level: the effort is huge for almost no gain. And, as you say, different 
CPUs use different prediction algorithms, so this almost always negates any 
gain you might have. You are better off leaving that optimization to compilers 
that feature profile-guided optimization, like Intel's. Never optimize at a 
level below the compiler's.

By this I also mean that Calltree/Callgrind should not be used to profile 
CPU-limited programs, only memory-limited ones. But if you really want to 
implement some sort of branch prediction simulation, you could simply use the 
statistical results of such algorithms instead. On current processors, branch 
prediction is correct 90-95% of the time, so I imagine you could adjust the 
simulation using these figures rather than implementing a prediction 
algorithm directly.
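A minimal sketch of what "using the statistical results" could look like, 
assuming the simulator already counts conditional branches executed: 
estimated stall cycles = branches x (1 - accuracy) x penalty. The 90-95% 
accuracy range is the one quoted above, the penalty of about 15 cycles comes 
from the Athlon/P-III figure in the quoted message, and the helper name 
`estimate_mispredict_cycles` is hypothetical, not part of Calltree/Callgrind.

```python
def estimate_mispredict_cycles(branches: int, accuracy: float, penalty: float) -> float:
    """Hypothetical helper: estimate cycles lost to branch mispredictions
    from an assumed prediction accuracy instead of simulating a predictor.

    branches -- conditional branches executed (assumed to come from the simulator)
    accuracy -- assumed fraction of branches predicted correctly (e.g. 0.90-0.95)
    penalty  -- assumed cycles lost per misprediction (roughly the pipeline length)
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be a fraction between 0 and 1")
    mispredicted = branches * (1.0 - accuracy)  # expected number of misses
    return mispredicted * penalty               # each miss flushes the pipeline


if __name__ == "__main__":
    branches = 1_000_000  # illustrative count, not a measurement
    penalty = 15          # ~15 cycles per miss, as quoted for an Athlon/P-III
    best = estimate_mispredict_cycles(branches, 0.95, penalty)
    worst = estimate_mispredict_cycles(branches, 0.90, penalty)
    print(f"estimated stall: {best:,.0f} to {worst:,.0f} cycles")
```

With these assumed figures the script prints a range of 750,000 to 1,500,000 
cycles, i.e. 0.75 to 1.5 stall cycles per branch on average. The point of the 
design is that only a per-CPU accuracy and penalty need to be tuned, rather 
than modelling each processor's prediction algorithm.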

-- 
Simon Perreault <nomis80 at nomis80.org> -- http://nomis80.org


More information about the Kde-optimize mailing list