changes to duchain lock: can anybody reproduce this speedup?
Milian Wolff
mail at milianw.de
Thu Aug 18 09:56:55 UTC 2016
On Saturday, July 16, 2016 1:20:53 PM CEST Sven Brauch wrote:
> Hi,
>
> thanks for trying, Kevin.
>
> On 07/16/2016 01:02 PM, Kevin Funk wrote:
> > Note: sched_yield() is not cross-platform. We can't use that directly.
>
> Cool, I would have built something with #ifdef Q_OS_LINUX, but the
> QThread API is much nicer of course. Thanks for the hint.
>
> > - CPUs utilized from ~2.0 to ~3.2
> > - Time down from ~170s to ~140s.
>
> Ok, nice. I think the reason you see less speedup is because your CPU
> only has 2 real cores (mine has 4).
>
> > Main question is: Is KDevelop still functioning without noticable lags,
> > e.g. during typing? All fine?
>
> Didn't do extensive testing. However while we're at this, we should fix
> that anyways: if the foreground thread is waiting for a duchain lock,
> background threads should just call yield() as well after _releasing_ a
> lock. Then you'd basically be guaranteed to always get the lock in the
> foreground within at most a few ms (much unlike now, where while a large
> project is being parsed in the background, you sometimes even hit the
> 300ms timeout). There is even a TODO in duchainlock.cpp that something
> like this should be done. Of course it might hurt the background parser
> performance a bit, but I think that is always a worthy tradeoff for
> better UI responsibility.
>
> I will look further into this if Milian doesn't find a reason why the
> whole thing is nonsense anyways :)
>
> > QThread::usleep() just puts the *current* thread to sleep, though.
>
> Yeah, but it sets the flag which prevents anything from acquiring a
> write lock before. So all other threads needing a write lock can't do
> anything either in those 500us, and lots of things in the duchain
> builders need write locks.
Finally had a look at something related, i.e. replacing DUChainLock internals
with QReadWriteLock, which became fast for Qt 5.7+:
https://woboq.com/blog/qreadwritelock-gets-faster-in-qt57.html
Parsing heaptrack with duchainify (relwithdebinfo for kdevplatform +
kdevelop):
Performance counter stats for 'duchainify -t 8 .' (5 runs):
21352.404811 task-clock (msec) # 2.454 CPUs utilized
( +- 3.65% )
16,850 context-switches # 0.789 K/sec
( +- 9.65% )
1,147 cpu-migrations # 0.054 K/sec
( +- 15.94% )
188,113 page-faults # 0.009 M/sec
( +- 1.39% )
78,779,737,477 cycles # 3.690 GHz
( +- 3.63% )
66,084,778,246 instructions # 0.84 insn per cycle
( +- 1.14% )
14,272,490,148 branches # 668.425 M/sec
( +- 1.12% )
240,278,194 branch-misses # 1.68% of all branches
( +- 2.34% )
8.701377847 seconds time elapsed
( +- 4.10% )
Note how this is horrible CPU utilization on my machine (nproc == 8).
Applying this patch: https://pastebin.kdab.com/m10c065de
Performance counter stats for 'duchainify -t 8 .' (5 runs):
23720.891477 task-clock (msec) # 2.979 CPUs utilized
( +- 0.46% )
32,629 context-switches # 0.001 M/sec
( +- 7.98% )
997 cpu-migrations # 0.042 K/sec
( +- 11.20% )
198,436 page-faults # 0.008 M/sec
( +- 2.10% )
87,645,125,683 cycles # 3.695 GHz
( +- 0.45% )
67,272,691,473 instructions # 0.77 insn per cycle
( +- 0.98% )
14,515,423,390 branches # 611.926 M/sec
( +- 0.97% )
256,262,860 branch-misses # 1.77% of all branches
( +- 0.46% )
7.962761391 seconds time elapsed
( +- 2.39% )
So this seems to be a bit better, but far from the gain you guys saw with
Sven's patch. I'll try to profile this a bit more now and see whether I can
figure out what is blocking us here. Note that libclang should be able to use
more than 3 cores on this system, hmmm...
Bye
--
Milian Wolff
mail at milianw.de
http://milianw.de
More information about the KDevelop-devel
mailing list