Patch: wake up duchainlock writers

Tue Dec 15 13:58:51 UTC 2009

On Tue, 15 Dec 2009 08:47:44 pm David Nolden wrote:
> Am Dienstag 15 Dezember 2009 02:04:53 schrieb Hamish Rodda:
> > When I said it was slower, I meant it seemed like the background parsing
> >  was slower, but I didn't measure it.  Given you've found it's faster,
> >  that's most likely the case.  I didn't try to determine the UI
> >  responsiveness.  The lock still prefers waiting readers over writers, so
> >  the UI should still be as fast (given the main thread should only ever
> > use readers).
> >
> > If the user time is increased, that just means we were better at
> > utilising the multiple CPUs, right?  Ideally we want utilisation at 100%
> > x all cpus, which should result in much better wall clock time but higher
> > user time.
> 
> That time should count the 'overall' CPU usage, and if it's higher, it
>  means that we've burnt more CPU cycles to get the same result.

Well, having parsing finish earlier is a better result, isn't it? See results 
below, anyway.

> > > Due to the central nature of the duchain lock, I'm actually thinking of
> > > replacing all the mutexes in there with spin-locks, using QAtomicInt
> > >  instead of all the mutexes and wait conditions, to make the whole
> > > thing more efficient.
> >
> > What are the performance differences with multiple threads in release
> > mode? I think that is what we should be targeting, as it is our core
> > audience (developers usually have decent machines).
> 
> I've implemented my idea now, and it is much faster. Locking the duchain
>  now approximately equals increasing one counter, and eventually waiting.

Here are my test results:
Test: clean .kdevduchain, hot disk cache, 'time duchainify kdevplatform'
Test run on a Core 2 Quad running at 3.57 GHz with 4 GB RAM
Results that did not fit the pattern were rerun several times to get the
best time

Spinlock, debugfull build:
Thread count    Real time    User time
1               41.14s       38.73s
2               46.97s       48.13s
4               45.54s       47.92s
8               69.37s       70.64s

Waitcondition, debugfull build:
Thread count    Real time    User time
1               40.83s       37.92s
2               45.75s       49.05s
4               46.79s       55.55s
8               47.28s       54.64s

Spinlock, release build:
Thread count    Real time    User time
1               21.35s       18.64s
2               23.85s       22.48s
4               31.63s       30.55s
8               39.74s       37.58s

Waitcondition, release build:
Thread count    Real time    User time
1               22.81s       20.31s
2               20.82s       21.39s
4               20.73s       22.75s
8               23.25s       25.87s

In conclusion,
1) Release builds are fast :)  I might have to start using them...
2) Spinlock does not scale to multiple threads, as I suspected, because it
can't efficiently handle high lock contention (release build: 21.35s real
time with 1 thread, 39.74s with 8).
3) Waitcondition does scale up to number of threads == number of CPUs, but
does not yet offer a significant improvement with multithreading (release
build: 22.81s real time with 1 thread, 20.73s with 4).  User time is only
slightly worse with waitcondition.
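
For anyone following along, here is a rough sketch of the two waiting
strategies being compared.  It is not the DUChainLock code: SpinLock,
WaitLock and countWithThreads are made-up names, it uses plain standard C++
rather than the Qt classes, and it only models an exclusive lock (no
reader/writer distinction).  The point is just that a contended spinlock keeps
polling, while a waitcondition-style lock puts waiters to sleep until unlock.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Spinning strategy: a waiter keeps polling the flag until it is free,
// so contended waiters stay runnable and keep using CPU time.
class SpinLock {
public:
    void lock() {
        while (m_held.exchange(true, std::memory_order_acquire))
            std::this_thread::yield();
    }
    void unlock() { m_held.store(false, std::memory_order_release); }
private:
    std::atomic<bool> m_held{false};
};

// Wait-condition strategy: a waiter sleeps until unlock() wakes it up.
class WaitLock {
public:
    void lock() {
        std::unique_lock<std::mutex> guard(m_mutex);
        m_released.wait(guard, [this] { return !m_held; });
        m_held = true;
    }
    void unlock() {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            m_held = false;
        }
        m_released.notify_one();
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_released;
    bool m_held = false;
};

// Hypothetical helper: every thread increments a shared counter under the
// given lock type; returns the final counter value.
template <typename Lock>
int countWithThreads(int threads, int incrementsPerThread) {
    assert(threads > 0 && incrementsPerThread >= 0);
    Lock lock;
    int counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&lock, &counter, incrementsPerThread] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (std::thread& worker : pool)
        worker.join();
    return counter;
}
```

Both variants give the same result (countWithThreads<SpinLock>(4, 1000) and
countWithThreads<WaitLock>(4, 1000) each return 4000); what differs is how the
waiting threads spend their time while the lock is held by someone else.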

Last night as I was developing the patch I found a great improvement with
waitcondition, but that was when I had accidentally allowed write locks to be
acquired while read locks were already held.  That's why the patch doesn't
quite perform as it did last night (when multithreaded parsing was ~30%
faster in debug mode).

Given I still think we can decrease the amount of time spent in write locks 
(by rewriting code to do calculations in read locks, and then get a write lock 
if changes are required), I would think continuing to work with the 
waitcondition lock would be better, possibly with spinlock being used when the 
background parser is only using one thread.
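
To illustrate the "calculate in a read lock, write only if needed" idea, here
is a minimal sketch.  It uses std::shared_mutex as a stand-in for the duchain
lock, and SymbolTable, setIfChanged and writeCount are hypothetical names, not
KDevPlatform API.  Since this kind of lock can't be upgraded in place, the
read lock is dropped first and the check is repeated once the write lock is
held.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>

// Hypothetical stand-in for shared, lock-protected data.
class SymbolTable {
public:
    // Returns true if the stored data was actually modified.
    bool setIfChanged(const std::string& name, int value) {
        {
            // Cheap path: many threads can do this check concurrently.
            std::shared_lock<std::shared_mutex> readLock(m_lock);
            if (holds(name, value))
                return false;  // nothing to change, no write lock needed
        }
        // The read lock is released here, so another writer may run
        // before we get the write lock.
        std::unique_lock<std::shared_mutex> writeLock(m_lock);
        if (holds(name, value))
            return false;  // someone else already made the change
        m_values[name] = value;
        ++m_writes;
        assert(holds(name, value));
        return true;
    }

    // Number of modifications made under the write lock.
    int writeCount() const {
        std::shared_lock<std::shared_mutex> readLock(m_lock);
        return m_writes;
    }

private:
    // Caller must hold m_lock (shared or exclusive).
    bool holds(const std::string& name, int value) const {
        auto it = m_values.find(name);
        return it != m_values.end() && it->second == value;
    }

    mutable std::shared_mutex m_lock;
    std::map<std::string, int> m_values;
    int m_writes = 0;
};
```

With this shape, repeated updates that change nothing only ever take the read
lock, so they don't block other readers.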

Cheers,
Hamish.