changes to duchain lock: can anybody reproduce this speedup?
Milian Wolff
mail at milianw.de
Thu Aug 18 09:57:00 UTC 2016
On Wednesday, August 17, 2016 6:48:56 PM CEST Milian Wolff wrote:
> On Saturday, July 16, 2016 1:20:53 PM CEST Sven Brauch wrote:
> > Hi,
> >
> > thanks for trying, Kevin.
> >
> > On 07/16/2016 01:02 PM, Kevin Funk wrote:
> > > Note: sched_yield() is not cross-platform. We can't use that directly.
> >
> > Cool, I would have built something with #ifdef Q_OS_LINUX, but the
> > QThread API is much nicer of course. Thanks for the hint.
> >
> > > - CPUs utilized from ~2.0 to ~3.2
> > > - Time down from ~170s to ~140s.
> >
> > Ok, nice. I think the reason you see less speedup is because your CPU
> > only has 2 real cores (mine has 4).
> >
> > > Main question is: Is KDevelop still functioning without noticable lags,
> > > e.g. during typing? All fine?
> >
> > Didn't do extensive testing. However while we're at this, we should fix
> > that anyways: if the foreground thread is waiting for a duchain lock,
> > background threads should just call yield() as well after _releasing_ a
> > lock. Then you'd basically be guaranteed to always get the lock in the
> > foreground within at most a few ms (much unlike now, where while a large
> > project is being parsed in the background, you sometimes even hit the
> > 300ms timeout). There is even a TODO in duchainlock.cpp that something
> > like this should be done. Of course it might hurt the background parser
> > performance a bit, but I think that is always a worthy tradeoff for
> > better UI responsibility.
> >
> > I will look further into this if Milian doesn't find a reason why the
> > whole thing is nonsense anyways :)
> >
> > > QThread::usleep() just puts the *current* thread to sleep, though.
> >
> > Yeah, but it sets the flag which prevents anything from acquiring a
> > write lock before. So all other threads needing a write lock can't do
> > anything either in those 500us, and lots of things in the duchain
> > builders need write locks.
>
> Finally had a look at something related, i.e. replacing DUChainLock
> internals with QReadWriteLock, which became fast for Qt 5.7+:
>
> https://woboq.com/blog/qreadwritelock-gets-faster-in-qt57.html
>
> Parsing heaptrack with duchainify (relwithdebinfo for kdevplatform +
> kdevelop):
>
> Performance counter stats for 'duchainify -t 8 .' (5 runs):
>
> 21352.404811 task-clock (msec) # 2.454 CPUs utilized
> ( +- 3.65% )
> 16,850 context-switches # 0.789 K/sec
> ( +- 9.65% )
> 1,147 cpu-migrations # 0.054 K/sec
> ( +- 15.94% )
> 188,113 page-faults # 0.009 M/sec
> ( +- 1.39% )
> 78,779,737,477 cycles # 3.690 GHz
> ( +- 3.63% )
> 66,084,778,246 instructions # 0.84 insn per cycle
> ( +- 1.14% )
> 14,272,490,148 branches # 668.425 M/sec
> ( +- 1.12% )
> 240,278,194 branch-misses # 1.68% of all branches
> ( +- 2.34% )
>
> 8.701377847 seconds time elapsed
> ( +- 4.10% )
>
> Note how this is horrible CPU utilization on my machine (nproc == 8).
>
> Applying this patch: https://pastebin.kdab.com/m10c065de
>
> Performance counter stats for 'duchainify -t 8 .' (5 runs):
>
> 23720.891477 task-clock (msec) # 2.979 CPUs utilized
> ( +- 0.46% )
> 32,629 context-switches # 0.001 M/sec
> ( +- 7.98% )
> 997 cpu-migrations # 0.042 K/sec
> ( +- 11.20% )
> 198,436 page-faults # 0.008 M/sec
> ( +- 2.10% )
> 87,645,125,683 cycles # 3.695 GHz
> ( +- 0.45% )
> 67,272,691,473 instructions # 0.77 insn per cycle
> ( +- 0.98% )
> 14,515,423,390 branches # 611.926 M/sec
> ( +- 0.97% )
> 256,262,860 branch-misses # 1.77% of all branches
> ( +- 0.46% )
>
> 7.962761391 seconds time elapsed
> ( +- 2.39% )
>
> So this seems to be a bit better, but far from the gain you guys saw with
> Sven's patch. I'll try to profile this a bit more now and see whether I can
> figure out what is blocking us here. Note that libclang should be able to
> use more than 3 cores on this system, hmmm...
Ouch.
- The UrlParseLock accumulates ~13s of wait time across all 8 parse threads
(i.e. ~1.6s per thread) for the case above. Looking at UrlParseLock... it
sleeps for 1s straight. Zomg, this is so insanely bad... I'll try to rewrite
this as well now.
- The parse threads wait a lot for work being assigned (~25s). This is also a
bad sign, and probably comes from the way we publish work to the threads. I.e.
instead of creating tons of parse jobs, we only create N (with N being number
of background threads), and then create new ones as existing ones finish. This
is a serialization point that easily leads to starvation. We could try to
optimize this, somehow. Could be tricky to keep the existing functionality
though.
Bye
--
Milian Wolff
mail at milianw.de
http://milianw.de
More information about the KDevelop-devel
mailing list