changes to duchain lock: can anybody reproduce this speedup?

Thu Aug 18 09:56:55 UTC 2016

On Saturday, July 16, 2016 1:20:53 PM CEST Sven Brauch wrote:
> Hi,
> 
> thanks for trying, Kevin.
> 
> On 07/16/2016 01:02 PM, Kevin Funk wrote:
> > Note: sched_yield() is not cross-platform. We can't use that directly.
> 
> Cool, I would have built something with #ifdef Q_OS_LINUX, but the
> QThread API is much nicer of course. Thanks for the hint.
> 
> > - CPUs utilized from ~2.0 to ~3.2
> > - Time down from ~170s to ~140s.
> 
> Ok, nice. I think the reason you see less speedup is because your CPU
> only has 2 real cores (mine has 4).
> 
> > Main question is: Is KDevelop still functioning without noticeable lags,
> > e.g. during typing? All fine?
> 
> Didn't do extensive testing. However while we're at this, we should fix
> that anyways: if the foreground thread is waiting for a duchain lock,
> background threads should just call yield() as well after _releasing_ a
> lock. Then you'd basically be guaranteed to always get the lock in the
> foreground within at most a few ms (much unlike now, where while a large
> project is being parsed in the background, you sometimes even hit the
> 300ms timeout). There is even a TODO in duchainlock.cpp that something
> like this should be done. Of course it might hurt the background parser
> performance a bit, but I think that is always a worthy tradeoff for
> better UI responsiveness.
> 
> I will look further into this if Milian doesn't find a reason why the
> whole thing is nonsense anyways :)
> 
> > QThread::usleep() just puts the *current* thread to sleep, though.
> 
> Yeah, but it sets the flag which prevents anything from acquiring a
> write lock before. So all other threads needing a write lock can't do
> anything either in those 500us, and lots of things in the duchain
> builders need write locks.
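
For illustration, the "yield after releasing" idea quoted above could look
roughly like the following. This is only a sketch under assumptions: it uses
plain standard C++ instead of Qt, the class YieldingLock and all of its
method names are hypothetical, and it is not the code in duchainlock.cpp.
Note that yield() is only a hint to the scheduler, so this improves the odds
that the foreground gets the lock quickly but does not strictly guarantee it.

```cpp
#include <atomic>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the duchain lock; not KDevelop's actual code.
class YieldingLock {
public:
    // Foreground: announce that we are waiting, then block on the lock.
    void lockForeground() {
        m_foregroundWaiting.fetch_add(1);
        m_mutex.lock();
        m_foregroundWaiting.fetch_sub(1);
    }
    void unlockForeground() { m_mutex.unlock(); }

    void lockBackground() { m_mutex.lock(); }

    // Background: release first, then give up the time slice if the
    // foreground thread is waiting, so it has a chance to grab the lock.
    void unlockBackground() {
        m_mutex.unlock();
        if (m_foregroundWaiting.load() > 0)
            std::this_thread::yield();
    }

    bool foregroundWaiting() const { return m_foregroundWaiting.load() > 0; }

private:
    std::mutex m_mutex;
    std::atomic<int> m_foregroundWaiting{0};
};
```

With Qt, QThread::yieldCurrentThread() would take the place of
std::this_thread::yield().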

Finally had a look at something related, i.e. replacing the DUChainLock
internals with QReadWriteLock, which became much faster in Qt 5.7:

https://woboq.com/blog/qreadwritelock-gets-faster-in-qt57.html
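
To make the idea concrete, here is a rough sketch of a lock with timed
read/write acquisition built on a ready-made reader-writer lock. It is an
assumption-laden illustration, not the patch itself: it uses
std::shared_timed_mutex as a stand-in so that it compiles without Qt, and the
class TimedRwLock and its method names are hypothetical. In Qt the
corresponding calls would be QReadWriteLock::tryLockForRead(int timeout) and
tryLockForWrite(int timeout).

```cpp
#include <chrono>
#include <shared_mutex>

// Hypothetical wrapper; not the actual DUChainLock implementation.
class TimedRwLock {
public:
    // Returns true if the shared (read) lock was acquired within the timeout.
    bool lockForRead(std::chrono::milliseconds timeout) {
        return m_mutex.try_lock_shared_for(timeout);
    }
    void releaseReadLock() { m_mutex.unlock_shared(); }

    // Returns true if the exclusive (write) lock was acquired within the timeout.
    bool lockForWrite(std::chrono::milliseconds timeout) {
        return m_mutex.try_lock_for(timeout);
    }
    void releaseWriteLock() { m_mutex.unlock(); }

private:
    std::shared_timed_mutex m_mutex;
};
```

A caller would check the return value and bail out on timeout, e.g. after the
300ms mentioned above, instead of blocking indefinitely.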

Parsing heaptrack with duchainify (relwithdebinfo for kdevplatform + 
kdevelop):

 Performance counter stats for 'duchainify -t 8 .' (5 runs):

      21352.404811      task-clock (msec)         #    2.454 CPUs utilized            ( +-  3.65% )
            16,850      context-switches          #    0.789 K/sec                    ( +-  9.65% )
             1,147      cpu-migrations            #    0.054 K/sec                    ( +- 15.94% )
           188,113      page-faults               #    0.009 M/sec                    ( +-  1.39% )
    78,779,737,477      cycles                    #    3.690 GHz                      ( +-  3.63% )
    66,084,778,246      instructions              #    0.84  insn per cycle           ( +-  1.14% )
    14,272,490,148      branches                  #  668.425 M/sec                    ( +-  1.12% )
       240,278,194      branch-misses             #    1.68% of all branches          ( +-  2.34% )

       8.701377847 seconds time elapsed                                          ( +-  4.10% )

Note how poor the CPU utilization is on my machine: only about 2.5 of 8 cores
are in use (nproc == 8).

Applying this patch: https://pastebin.kdab.com/m10c065de

 Performance counter stats for 'duchainify -t 8 .' (5 runs):

      23720.891477      task-clock (msec)         #    2.979 CPUs utilized            ( +-  0.46% )
            32,629      context-switches          #    0.001 M/sec                    ( +-  7.98% )
               997      cpu-migrations            #    0.042 K/sec                    ( +- 11.20% )
           198,436      page-faults               #    0.008 M/sec                    ( +-  2.10% )
    87,645,125,683      cycles                    #    3.695 GHz                      ( +-  0.45% )
    67,272,691,473      instructions              #    0.77  insn per cycle           ( +-  0.98% )
    14,515,423,390      branches                  #  611.926 M/sec                    ( +-  0.97% )
       256,262,860      branch-misses             #    1.77% of all branches          ( +-  0.46% )

       7.962761391 seconds time elapsed                                          ( +-  2.39% )

So this seems to be a bit better (~8.0s instead of ~8.7s elapsed, ~3.0
instead of ~2.5 CPUs utilized), but far from the gain you guys saw with
Sven's patch. I'll try to profile this a bit more now and see whether I can
figure out what is blocking us here. Note that libclang should be able to use
more than 3 cores on this system, hmmm...

Bye

-- 
Milian Wolff
mail at milianw.de
http://milianw.de