Patch: wake up duchainlock writers
David Nolden
zwabel at googlemail.com
Tue Dec 15 14:50:01 UTC 2009
On Tuesday 15 December 2009 14:58:51, Hamish Rodda wrote:
> On Tue, 15 Dec 2009 08:47:44 pm David Nolden wrote:
> > On Tuesday 15 December 2009 02:04:53, Hamish Rodda wrote:
> > > When I said it was slower, I meant it seemed like the background
> > > parsing was slower, but I didn't measure it. Given you've found it's
> > > faster, that's most likely the case. I didn't try to determine the UI
> > > responsiveness. The lock still prefers waiting readers over writers,
> > > so the UI should still be as fast (given the main thread should only
> > > ever use readers).
> > >
> > > If the user time is increased, that just means we were better at
> > > utilising the multiple CPUs, right? Ideally we want utilisation at
> > > 100% x all cpus, which should result in much better wall clock time but
> > > higher user time.
> >
> > That time should count the 'overall' CPU usage, and if it's higher, it
> > means that we've burnt more CPU cycles to get the same result.
>
> Well, having parsing finish earlier is a better result, isn't it? See
> results below, anyway.
>
> > > > Due to the central nature of the duchain lock, I'm actually thinking
> > > > of replacing all the mutexes in there with QAtomicInt-based
> > > > spin-locks, instead of mutexes and wait conditions, to make the
> > > > whole thing more efficient.
> > >
> > > What are the performance differences with multiple threads in release
> > > mode? I think that is what we should be targeting, as it is our core
> > > audience (developers usually have decent machines).
> >
> > I've implemented my idea now, and it is much faster. Locking the duchain
> > now amounts to little more than incrementing one counter, plus possibly
> > waiting.
>
> Here are my test results:
> Test: clean .kdevduchain, hot disk cache, 'time duchainify kdevplatform'
> Test run on a Core 2 Quad running at 3.57 GHz, with 4 GB RAM
> Results not conforming to the pattern were re-run multiple times to get the best time
>
> Spinlock, debugfull build:
> Thread count   Real time   User time
>       1         41.14s      38.73s
>       2         46.97s      48.13s
>       4         45.54s      47.92s
>       8         69.37s      70.64s
>
> Waitcondition, debugfull build:
> Thread count   Real time   User time
>       1         40.83s      37.92s
>       2         45.75s      49.05s
>       4         46.79s      55.55s
>       8         47.28s      54.64s
>
> Spinlock, release build:
> Thread count   Real time   User time
>       1         21.35s      18.64s
>       2         23.85s      22.48s
>       4         31.63s      30.55s
>       8         39.74s      37.58s
>
> Waitcondition, release build:
> Thread count   Real time   User time
>       1         22.81s      20.31s
>       2         20.82s      21.39s
>       4         20.73s      22.75s
>       8         23.25s      25.87s
>
> In conclusion:
> 1) Release builds are fast :) I might have to start using them...
> 2) Spinlock does not scale to multiple threads, as I suspected; it can't
> handle situations of high lock contention efficiently.
> 3) Waitcondition does scale while the thread count stays at or below the
> number of CPUs, but does not yet offer a significant improvement with
> multithreading. User time is only slightly worse with waitcondition.
>
> Last night, as I was developing the patch, I found a great improvement with
> waitcondition, but that was when I had accidentally allowed write locks to
> be acquired while read locks were already held. That's why the patch didn't
> quite perform as well as I found last night (where multithreaded parsing was
> ~30% faster in debug mode).
>
> Given that I still think we can decrease the amount of time spent in write
> locks (by rewriting code to do its calculations under a read lock, and only
> take a write lock if changes turn out to be required), continuing to work
> with the waitcondition lock seems better to me, possibly with the spinlock
> used when the background parser is running only one thread.
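
A minimal sketch of the read-then-write pattern Hamish describes, assuming
the usual KDevelop::DUChainReadLocker/DUChainWriteLocker RAII helpers;
needsUpdate() and applyUpdate() are hypothetical placeholders for whatever
check and mutation the real code performs:

    #include <language/duchain/duchain.h>
    #include <language/duchain/duchainlock.h>
    #include <language/duchain/topducontext.h>

    // Hypothetical helpers standing in for the real check and mutation.
    bool needsUpdate(KDevelop::TopDUContext* context);
    void applyUpdate(KDevelop::TopDUContext* context);

    bool maybeUpdate(KDevelop::TopDUContext* context)
    {
        {
            KDevelop::DUChainReadLocker readLock(KDevelop::DUChain::lock());
            if (!needsUpdate(context))   // cheap check under the shared lock
                return false;            // nothing to write
        }   // read lock released before the write lock is requested

        KDevelop::DUChainWriteLocker writeLock(KDevelop::DUChain::lock());
        if (!needsUpdate(context))       // re-check: another thread may have
            return false;                // updated between the two locks
        applyUpdate(context);
        return true;
    }

The re-check after acquiring the write lock matters: the state can change in
the window between releasing the read lock and acquiring the write lock.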
There are some oddities in the results, though. I've done some similar, less
systematic tests, where the spin-locks were always significantly faster than
the wait-conditions with 1 or 2 threads. And from profiling I know that the
whole old duchain-locking mechanism with mutexes, a reader-map, etc. was not
efficient enough for the frequency with which it was called, while the
spin-lock is inherently more efficient, at least in the 1-thread case.
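
For illustration, a minimal sketch of what such a QAtomicInt-based
reader/writer spin lock could look like; the state encoding and the yielding
strategy are assumptions, not the actual patch. The uncontended read lock is
a single compare-and-swap, matching the "incrementing one counter"
description above:

    #include <QAtomicInt>
    #include <QThread>

    // Sketch only. Assumed state encoding:
    // 0 = free, n > 0 = n active readers, -1 = one active writer.
    class SpinRwLock
    {
    public:
        SpinRwLock() : m_state(0) {}

        void lockForRead()
        {
            forever {
                const int v = m_state;              // snapshot the counter
                if (v >= 0 && m_state.testAndSetOrdered(v, v + 1))
                    return;                         // registered as a reader
                QThread::yieldCurrentThread();      // writer active, retry
            }
        }

        void unlockRead() { m_state.fetchAndAddOrdered(-1); }

        void lockForWrite()
        {
            // Succeeds only when no readers and no writer are registered.
            while (!m_state.testAndSetOrdered(0, -1))
                QThread::yieldCurrentThread();
        }

        void unlockWrite() { m_state.testAndSetOrdered(-1, 0); }

    private:
        QAtomicInt m_state;
    };

Under contention every waiter keeps retrying and burning CPU, which is
consistent with the poor 4- and 8-thread spinlock numbers above.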
I think we can combine the advantages of both approaches by adding
wait-conditions to the spin-lock approach, and letting waiters block on the
wait-conditions instead of just sleeping.
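
A rough sketch of that combination, under the same assumed state encoding as
above: waiters first spin briefly for the fast path, then block on a
QWaitCondition until an unlock wakes them. SpinRounds is an arbitrary
placeholder:

    #include <QAtomicInt>
    #include <QMutex>
    #include <QWaitCondition>

    // Sketch only: after a short spinning phase, waiters block on a wait
    // condition instead of sleeping or busy-looping.
    class HybridRwLock
    {
        enum { SpinRounds = 100 };  // arbitrary placeholder value

    public:
        HybridRwLock() : m_state(0) {}

        void lockForRead()
        {
            for (int spin = 0; spin < SpinRounds; ++spin) {
                const int v = m_state;
                if (v >= 0 && m_state.testAndSetOrdered(v, v + 1))
                    return;                         // fast path: reader added
            }
            QMutexLocker locker(&m_waitMutex);
            forever {
                const int v = m_state;
                if (v >= 0 && m_state.testAndSetOrdered(v, v + 1))
                    return;
                m_released.wait(&m_waitMutex);      // block until a wake-up
            }
        }

        void unlockRead()
        {
            if (m_state.fetchAndAddOrdered(-1) == 1) {  // last reader left
                QMutexLocker locker(&m_waitMutex);
                m_released.wakeAll();               // a writer may proceed
            }
        }

        void lockForWrite()
        {
            for (int spin = 0; spin < SpinRounds; ++spin)
                if (m_state.testAndSetOrdered(0, -1))
                    return;                         // fast path: lock taken
            QMutexLocker locker(&m_waitMutex);
            while (!m_state.testAndSetOrdered(0, -1))
                m_released.wait(&m_waitMutex);
        }

        void unlockWrite()
        {
            m_state.testAndSetOrdered(-1, 0);
            QMutexLocker locker(&m_waitMutex);      // serialize with waiters
            m_released.wakeAll();
        }

    private:
        QAtomicInt m_state;
        QMutex m_waitMutex;
        QWaitCondition m_released;
    };

Because a waiter re-checks the atomic state while holding m_waitMutex, and an
unlocker takes the same mutex before calling wakeAll(), a wake-up cannot be
lost between a failed test-and-set and the subsequent wait().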
Greetings, David