Patch: wake up duchainlock writers

David Nolden zwabel at googlemail.com
Tue Dec 15 14:50:01 UTC 2009


On Tuesday, 15 December 2009 14:58:51, Hamish Rodda wrote:
> On Tue, 15 Dec 2009 08:47:44 pm David Nolden wrote:
> > On Tuesday, 15 December 2009 02:04:53, Hamish Rodda wrote:
> > > When I said it was slower, I meant it seemed like the background
> > > parsing was slower, but I didn't measure it.  Given you've found it's
> > > faster, that's most likely the case.  I didn't try to determine the UI
> > > responsiveness.  The lock still prefers waiting readers over writers,
> > > so the UI should still be as fast (given the main thread should only
> > > ever use readers).
> > >
> > > If the user time is increased, that just means we were better at
> > > utilising the multiple CPUs, right?  Ideally we want utilisation at
> > > 100% across all CPUs, which should result in much better wall clock
> > > time but higher user time.
> >
> > That time should count the 'overall' CPU usage, and if it's higher, it
> >  means that we've burnt more CPU cycles to get the same result.
> 
> Well, having parsing finish earlier is a better result, isn't it? See
>  results below, anyway.
> 
> > > > Due to the central nature of the duchain lock, I'm actually thinking
> > > > of replacing all the mutexes and wait conditions in there with
> > > > spin-locks based on QAtomicInt, to make the whole thing more
> > > > efficient.
> > >
> > > What are the performance differences with multiple threads in release
> > > mode? I think that is what we should be targeting, as it is our core
> > > audience (developers usually have decent machines).
> >
> > I've implemented my idea now, and it is much faster. Locking the duchain
> >  now amounts to incrementing one counter and, if necessary, waiting.
> 
> Here are my test results:
> Test: clean .kdevduchain, hot disk cache, 'time duchainify kdevplatform'
> Test run on a Core 2 Quad at 3.57GHz with 4GB RAM
> Results that didn't fit the pattern were re-run several times and the best
>  time was taken
> 
> Spinlock, debugfull build:
> Thread count    Real time    User time
> 1               41.14s       38.73s
> 2               46.97s       48.13s
> 4               45.54s       47.92s
> 8               69.37s       70.64s
> 
> Waitcondition, debugfull build:
> Thread count    Real time    User time
> 1               40.83s       37.92s
> 2               45.75s       49.05s
> 4               46.79s       55.55s
> 8               47.28s       54.64s
> 
> Spinlock, release build:
> Thread count    Real time    User time
> 1               21.35s       18.64s
> 2               23.85s       22.48s
> 4               31.63s       30.55s
> 8               39.74s       37.58s
> 
> Waitcondition, release build:
> Thread count    Real time    User time
> 1               22.81s       20.31s
> 2               20.82s       21.39s
> 4               20.73s       22.75s
> 8               23.25s       25.87s
> 
> In conclusion:
> 1) Release builds are fast :)  I might have to start using them...
> 2) Spinlock does not scale to multiple threads, as I suspected: it can't
>  efficiently handle situations of high lock contention.
> 3) Waitcondition does scale up to number of threads == number of CPUs, but
>  does not yet offer a significant improvement from multithreading.  User
>  time is only slightly worse with waitcondition.
> 
> Last night, as I was developing the patch, I found a great improvement with
>  waitcondition, but that was when I had accidentally allowed write locks to
>  be acquired while read locks were already held.  That's why the patch
>  didn't quite perform as I found last night (where multithreaded parsing
>  was ~30% faster in debug mode).
> 
> Given that I still think we can decrease the amount of time spent in write
>  locks (by rewriting code to do its calculations under a read lock and only
>  acquire a write lock if changes are actually required), continuing to work
>  with the waitcondition lock seems better to me, possibly with the spinlock
>  being used when the background parser is only using one thread.
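
For illustration, that read-then-write pattern could look roughly like this
(the locker classes follow the usual kdevplatform naming; the item and its
value()/setValue() accessors are made up for the example):

void updateItem(Item* item, const Value& newValue)
{
    {
        // Do the expensive check under the cheap read lock only.
        KDevelop::DUChainReadLocker readLock(KDevelop::DUChain::lock());
        if (item->value() == newValue)
            return; // nothing to change, the write lock is never taken
    }
    // Pay for the write lock only when a change is really needed; re-check,
    // because the state may have changed between releasing the read lock
    // and acquiring the write lock.
    KDevelop::DUChainWriteLocker writeLock(KDevelop::DUChain::lock());
    if (item->value() != newValue)
        item->setValue(newValue);
}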

There are some strangenesses in the results, though. I've done some similar,
less systematic tests, where the spin-locks were always significantly faster
with 1 or 2 threads than the wait-conditions. And from profiling I know that
the whole old duchain-locking mechanism with mutexes, a reader-map, etc. was
not efficient enough for the frequency at which it was called, while the
spin-lock is inherently more efficient, at least in the 1-thread case.
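
Just to illustrate what "incrementing one counter" means here, a stripped-down
sketch of such a counter-based read/write spin lock could look like the
following (this is only the idea, not the actual duchainlock code; the 500µs
back-off is an arbitrary value for the example):

#include <QAtomicInt>
#include <unistd.h> // usleep() for the back-off

// State encoding: 0 = free, n > 0 = n active readers, -1 = a writer holds it.
class SpinRwLock {
    QAtomicInt m_state;
public:
    void lockForRead() {
        for (;;) {
            int current = m_state.fetchAndAddOrdered(0);    // atomic load
            if (current >= 0 && m_state.testAndSetOrdered(current, current + 1))
                return;      // uncontended case: one successful compare-and-swap
            usleep(500);     // a writer is active: back off briefly and retry
        }
    }
    void unlockRead()  { m_state.fetchAndAddOrdered(-1); }

    void lockForWrite() {
        while (!m_state.testAndSetOrdered(0, -1))           // only when free
            usleep(500);
    }
    void unlockWrite() { m_state.testAndSetOrdered(-1, 0); }
};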

I think we can combine the advantages of both approaches by adding wait-
conditions to the spin-lock approach, and letting waiters block on the wait-
conditions instead of just sleeping.
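
Something along these lines, assuming the counter-based lock sketched above
(write side only for brevity; the 100-iteration spin and the 10ms timeout are
arbitrary values for the example):

#include <QAtomicInt>
#include <QMutex>
#include <QWaitCondition>

// Keep the atomic counter as the fast path, but let contended callers block
// on a QWaitCondition that unlock() signals, instead of sleeping blindly.
// State: 0 = free, -1 = writer (reader handling omitted).
class HybridWriteLock {
    QAtomicInt m_state;
    QMutex m_waiterMutex;        // only touched on the contended path
    QWaitCondition m_released;
public:
    void lockForWrite() {
        for (int spin = 0; spin < 100; ++spin)       // cheap spinning fast path
            if (m_state.testAndSetOrdered(0, -1))
                return;
        QMutexLocker locker(&m_waiterMutex);         // slow path: really wait
        while (!m_state.testAndSetOrdered(0, -1))
            m_released.wait(&m_waiterMutex, 10);     // timed, as a safety net
    }
    void unlockWrite() {
        m_state.testAndSetOrdered(-1, 0);
        m_released.wakeAll();                        // wake anyone blocked above
    }
};

The timed wait is there because unlockWrite() signals without holding
m_waiterMutex, so in principle a wake-up could be missed; the timeout just
bounds how long a waiter can oversleep in that case.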

Greetings, David



