lock contention - what can we do about it?

Tue Mar 8 21:58:58 UTC 2011

2011/3/7 Milian Wolff <mail at milianw.de>:
> On 03/07/11 13:13, David Nolden wrote:
>>
>> I'm just saying that the <100% is due to I/O. There is no other possible
>> explanation anyway, unless some thread takes a lock and starts
>> sleeping while holding it.
>
> Ah, now I finally get what you are hinting at: in our code at least
> *something* must be running (theoretically), since not all threads should be
> waiting, so we should see at least 100%.
>
> But I wonder why the iowait (wa) time in top is not higher; instead the CPU
> seems to be idle (sleeping?)... I just remembered, maybe power
> management or something like that is to blame here...
>
> I'll test by disabling power management when I get around to it.
>
>> The fact that we don't use all cores is due to lock contention of course.
>> Using SpinLock instead of QMutex in the whole duchain would have many
>> advantages:
>> a) It can be debugged
>
> QMutex as well, no?

Not really, because its implementation is private. I often felt the
need to, for example, debug the mutex by storing which thread is
currently holding it, or similar, but that's not possible with QMutex.
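To illustrate point a), here is a minimal C++11 sketch of a spin lock that records which thread holds it. It is not KDevelop's actual SpinLock. A debugger or an assertion can inspect that owner, which QMutex's private implementation does not expose. The class name OwnerTrackingSpinLock is hypothetical, and std::atomic stands in for whatever atomics the real code uses.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical debug-friendly spin lock: unlike an opaque mutex, it remembers
// which thread currently holds it, so a debugger or assertion can inspect it.
class OwnerTrackingSpinLock {
public:
    void lock() {
        bool expected = false;
        // Spin until we atomically flip the flag from false to true.
        while (!m_locked.compare_exchange_weak(expected, true,
                                               std::memory_order_acquire)) {
            expected = false;           // compare_exchange overwrote it on failure
            std::this_thread::yield();  // give other threads a chance while spinning
        }
        m_owner.store(std::this_thread::get_id(), std::memory_order_relaxed);
    }

    void unlock() {
        // Catch unlocks from a thread that does not own the lock (debug builds).
        assert(m_owner.load(std::memory_order_relaxed) == std::this_thread::get_id());
        m_owner.store(std::thread::id(), std::memory_order_relaxed);
        m_locked.store(false, std::memory_order_release);
    }

    // Which thread holds the lock; a default-constructed id means nobody.
    std::thread::id owner() const { return m_owner.load(std::memory_order_relaxed); }

    bool heldByCurrentThread() const { return owner() == std::this_thread::get_id(); }

private:
    std::atomic<bool> m_locked{false};
    std::atomic<std::thread::id> m_owner{std::thread::id()};
};
```

In a debugger, printing owner() shows which thread is holding the lock at that moment.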

>> b) It is less buggy (100 LOC instead of a few thousand demonstrably buggy
>> LOC...)
>
> Hehe, isn't that a bit harsh? On one hand the bug was fixed; on the other
> hand SpinLock might have its shortcomings as well which we are not aware of yet.

Have you looked at its internal code? It is HUGE...

>> c) It is probably faster, at least when the protected region is very
>> small, as is usually the case in the duchain
>
> For me personally, "probably" is not enough. But I'd be very interested in
> hard numbers showing how that performs in relation to our current codebase.

I'm honest enough to say "probably", but I'd add a "very" in front of
it. A function call is quite a large overhead when it could be replaced
by simply checking and increasing a counter.
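The "checking and increasing a counter" argument can be sketched in C++11 under some assumptions. FastPathSpinLock is a hypothetical name, and the state word is flipped with a compare-and-swap rather than literally incremented. The uncontended lock() is a single atomic operation the compiler can inline at the call site. Only a contended acquire takes the out-of-line slow path, which also bumps a per-lock contention counter. Whether this actually beats QMutex would still need the hard numbers asked for above.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical minimal spin lock: the uncontended case is one atomic
// compare-and-swap that can be inlined; only contention reaches lockSlow().
class FastPathSpinLock {
public:
    void lock() {
        int expected = 0;
        // Fast path: flip 0 -> 1 in a single atomic instruction.
        if (m_state.compare_exchange_strong(expected, 1, std::memory_order_acquire))
            return;
        lockSlow();
    }

    bool try_lock() {
        int expected = 0;
        return m_state.compare_exchange_strong(expected, 1, std::memory_order_acquire);
    }

    void unlock() { m_state.store(0, std::memory_order_release); }

    // How many acquisitions had to wait: a cheap per-lock contention counter.
    std::uint64_t contendedAcquisitions() const {
        return m_contended.load(std::memory_order_relaxed);
    }

private:
    void lockSlow();

    std::atomic<int> m_state{0};
    std::atomic<std::uint64_t> m_contended{0};
};

// Slow path, kept out of line so it does not bloat every call site.
void FastPathSpinLock::lockSlow() {
    m_contended.fetch_add(1, std::memory_order_relaxed);
    for (;;) {
        // Spin on a plain load first (test-and-test-and-set) so waiters do not
        // hammer the cache line with failing compare-and-swap attempts.
        while (m_state.load(std::memory_order_relaxed) != 0)
            std::this_thread::yield();
        if (try_lock())
            return;
    }
}
```

Because it provides lock(), try_lock() and unlock(), it also works with std::lock_guard.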

>> d) It would allow us to analyze lock contention using valgrind, by
>> disabling all sleeping in the spinlock and in the duchain lock
>>
>> Anyway, personally I don't care so much about the lock contention. What I
>> see, though, is that all the statistics, analyses and hypotheses that have
>> been produced regarding this issue simply are not useful for doing anything
>> about it. If you want to fix something, then you need to know: Which exact
>> lock is responsible for most of the contention, and from where is it called?
>
> Yes, that is exactly what I think as well! I wrote these emails mostly to
> find out whether someone has experience with such tools and could point me to
> the proper usage and workflow for getting at this actually meaningful information.
>
>> This could be answered by valgrind with spin-locks.
>
> In theory it should also be reported by tools like mutrace, drd, oprofile,
> ... I just have to understand how to use them correctly.

Would be nice if those could answer it, but I haven't found a good way yet either.
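Here is a rough sketch of the valgrind idea from point d), assuming C++11 and a hypothetical ProfilableSpinLock template. With the template flag set, waiting threads busy-spin instead of sleeping. An instruction-counting tool such as callgrind should then roughly attribute the waiting cost to the code path that called lock(). The spin threshold and sleep interval in the normal build are arbitrary placeholders.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical switch: when SpinWithoutSleeping is true, waiters never sleep,
// so time spent waiting shows up as executed instructions of the caller.
template <bool SpinWithoutSleeping>
class ProfilableSpinLock {
public:
    void lock() {
        unsigned attempts = 0;
        while (m_locked.exchange(true, std::memory_order_acquire)) {
            if (SpinWithoutSleeping)
                continue;  // profiling build: pure busy-wait, never sleep
            // Normal build: after some spinning (threshold is arbitrary),
            // back off by sleeping so we don't burn CPU.
            if (++attempts > 100)
                std::this_thread::sleep_for(std::chrono::microseconds(50));
        }
    }

    bool try_lock() { return !m_locked.exchange(true, std::memory_order_acquire); }

    void unlock() { m_locked.store(false, std::memory_order_release); }

private:
    std::atomic<bool> m_locked{false};
};

// Hypothetical aliases: pick one per build configuration.
using ProfilingLock = ProfilableSpinLock<true>;
using ProductionLock = ProfilableSpinLock<false>;
```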

>> There would be even more useful statistics, like "Who is holding the lock
>> while we're waiting for it, and what is he doing?", but that would be
>> pretty hard to answer.
>
> drd tells you which lock was held for what amount of time, and gives you
> backtraces. Isn't that what you want?

No, because we need pairs of information: A) Who tried to acquire it
but was blocked, and for how long? And B) Who was holding it during that
time? To gather this information nicely, we might also think
about creating a special "debugging" version of SpinLock which would
record all this info.
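Such a debugging lock might look roughly like the following sketch. DebugSpinLock, ContentionRecord and the call-site string passed to lock() are all hypothetical. A blocked waiter snapshots the current holder and its call site, measures how long it waited, and appends one record per contention event. That covers both A) and B). As a limitation, it keeps only the first holder it saw, and it may see an empty holder if it races with an acquire or release.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// One contention event: who waited, from where, for how long, and who held
// the lock (and from where) when the waiter first got blocked.
struct ContentionRecord {
    std::thread::id waiter;
    const char* waiterSite;
    std::thread::id holder;
    const char* holderSite;
    std::chrono::nanoseconds waited;
};

// Hypothetical debugging spin lock; "site" is a static string describing the
// caller (e.g. __FUNCTION__), an illustration rather than an existing API.
class DebugSpinLock {
public:
    void lock(const char* site) {
        if (!tryAcquire()) {
            // Blocked: snapshot the holder before waiting (best effort only).
            std::thread::id holder = m_holder.load();
            const char* holderSite = m_holderSite.load();
            auto start = std::chrono::steady_clock::now();
            while (!tryAcquire())
                std::this_thread::yield();
            auto waited = std::chrono::duration_cast<std::chrono::nanoseconds>(
                std::chrono::steady_clock::now() - start);
            std::lock_guard<std::mutex> guard(m_logMutex);
            m_log.push_back({std::this_thread::get_id(), site, holder, holderSite, waited});
        }
        // We own the lock now; publish ourselves as the holder.
        m_holder.store(std::this_thread::get_id());
        m_holderSite.store(site);
    }

    void unlock() {
        m_holder.store(std::thread::id());
        m_holderSite.store(nullptr);
        m_locked.store(false, std::memory_order_release);
    }

    // Copy of all contention events recorded so far.
    std::vector<ContentionRecord> records() const {
        std::lock_guard<std::mutex> guard(m_logMutex);
        return m_log;
    }

private:
    bool tryAcquire() { return !m_locked.exchange(true, std::memory_order_acquire); }

    std::atomic<bool> m_locked{false};
    std::atomic<std::thread::id> m_holder{std::thread::id()};
    std::atomic<const char*> m_holderSite{nullptr};
    mutable std::mutex m_logMutex;
    std::vector<ContentionRecord> m_log;
};
```

Sorting the records by waited and grouping them by (waiterSite, holderSite) would then answer which lock, called from where, causes most of the contention.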

Greetings, David