lock contention - what can we do about it?

Mon Mar 7 19:06:54 UTC 2011

On 03/07/11 13:13, David Nolden wrote:
> I'm just saying that the <100% is due to I/O. There is no other possible
> explanation anyway, unless some thread takes a lock and starts
> sleeping while holding it.

Ah, now I finally get what you are hinting at: in our code, at least 
*something* must (theoretically) be running at any time, since not all 
threads can be waiting at once, so CPU usage should be at least 100%.

But I wonder why the iowait (wa) time in top is not higher; instead, the 
CPU seems to be idle (sleeping?)... I just remembered: maybe power 
management or something like that is to blame here...

I'll test by disabling power management when I get around to it.

> The fact that we don't use all cores is due to lock contention of course.
> Using SpinLock instead of QMutex in the whole duchain would have many
> advantages:
> a) It can be debugged

QMutex as well, no?

> b) It is less buggy (100 LOC instead of a few thousand LOC proven to be buggy..)

Hehe, isn't that a bit harsh? On the one hand, the bug was fixed; on the 
other hand, SpinLock might have its own shortcomings that we are not 
aware of yet.

> c) It is probably faster, at least when the protected region is very small,
> as is usually the case in the duchain

For me personally, "probably" is not enough. But I'd be very interested 
in hard numbers showing how that performs in relation to our current 
codebase.
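
To make "hard numbers" a bit more concrete, here is a rough sketch of 
the kind of micro-benchmark I have in mind. Everything in it is an 
assumption for illustration: SketchSpinLock is a made-up minimal spin 
lock (not KDevelop's SpinLock), std::mutex stands in for QMutex, and 
timeTinyCriticalSection is a hypothetical helper. It only measures a 
tiny protected region, which is the case David describes, and says 
nothing about longer critical sections.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spin lock for illustration only.
class SketchSpinLock {
public:
    void lock() {
        // Spin until the flag was clear before our test_and_set.
        while (flag_.test_and_set(std::memory_order_acquire)) {
            std::this_thread::yield();  // give up the time slice, never sleep
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

struct BenchResult {
    std::int64_t counter;       // final value of the shared counter
    std::int64_t microseconds;  // elapsed wall-clock time
};

// Starts `threads` workers; each increments a shared counter `iterations`
// times while holding a lock of type Lock. Returns the counter and the
// elapsed wall-clock time.
template <typename Lock>
BenchResult timeTinyCriticalSection(int threads, int iterations) {
    Lock lock;
    std::int64_t counter = 0;
    const auto start = std::chrono::steady_clock::now();

    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<Lock> guard(lock);
                ++counter;  // the "very small protected region"
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();
    }

    const auto end = std::chrono::steady_clock::now();
    // Sanity check: no increment may be lost if the lock is correct.
    assert(counter == static_cast<std::int64_t>(threads) * iterations);
    const auto elapsed =
        std::chrono::duration_cast<std::chrono::microseconds>(end - start);
    return BenchResult{counter, elapsed.count()};
}
```

Calling timeTinyCriticalSection<SketchSpinLock>(4, 100000) and 
timeTinyCriticalSection<std::mutex>(4, 100000) and comparing the 
microseconds fields would give a first impression. The outcome depends 
heavily on core count, scheduler and how long the lock is held, so 
numbers from the real duchain would still be needed.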

> d) It would allow us to analyze lock contention using valgrind, by disabling
> all sleeping in the spinlock and in the duchain lock
>
> Anyway, personally I don't care so much about the lock contention. What I
> see, though, is that all the statistics, analyses and hypotheses that have
> been made regarding this issue simply are not useful for doing anything about
> it. If you want to fix something, then you need to know: Which exact lock is
> responsible for most of the contention, and from where is it called?

Yes, that is exactly what I think as well! I wrote these emails mostly 
to find out whether someone has experience with such tools and could 
point me to the proper usage and workflow for getting at this actually 
meaningful information.
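
Just to illustrate what "which lock, called from where" could look 
like if we had to instrument it by hand, here is a minimal sketch. It 
is purely hypothetical: ContentionCountingMutex and demoContention are 
names I made up, std::mutex stands in for whatever lock is really used, 
and the call site is passed as a plain string label rather than a 
backtrace. A real tool would do this far better.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical wrapper, not part of Qt or KDevelop: records, per call-site
// label, how often lock() could not acquire the mutex immediately.
class ContentionCountingMutex {
public:
    void lock(const std::string& callSite) {
        if (mutex_.try_lock()) {
            return;  // acquired without waiting: not contended
        }
        {
            // try_lock failed, so count this call site as contended.
            // (try_lock may in principle fail spuriously, so treat the
            // counts as approximate.)
            std::lock_guard<std::mutex> statsGuard(statsMutex_);
            ++contended_[callSite];
        }
        mutex_.lock();  // now block until the lock becomes free
    }

    void unlock() { mutex_.unlock(); }

    // Returns a snapshot copy of the per-call-site contention counts.
    std::map<std::string, long> contendedCounts() const {
        std::lock_guard<std::mutex> statsGuard(statsMutex_);
        return contended_;
    }

private:
    std::mutex mutex_;
    mutable std::mutex statsMutex_;
    std::map<std::string, long> contended_;
};

// Usage sketch: the main thread holds the lock while a worker tries to
// take it, so the worker's call site is recorded as contended.
long demoContention() {
    ContentionCountingMutex m;
    m.lock("main");
    std::thread worker([&] {
        m.lock("worker");
        m.unlock();
    });
    // Wait until the worker has been recorded as blocked, then release.
    while (m.contendedCounts()["worker"] == 0) {
        std::this_thread::yield();
    }
    m.unlock();
    worker.join();
    assert(m.contendedCounts().count("worker") == 1);
    return m.contendedCounts()["worker"];
}
```

Sorting the resulting map by count would show which call sites wait 
most often. It would not show how long they wait or who holds the lock 
in the meantime.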

> This could be answered by valgrind with spin-locks.

It should, in theory, also be reported by tools like mutrace, drd, 
oprofile, ... I just have to understand how to use them correctly.

> There would be even more useful statistics, like "Who is holding the lock
> while we're waiting for it, and what is he doing?", but that would be pretty
> hard to answer.

drd tells you which lock was held for what amount of time, and gives you 
backtraces. Isn't that what you want?

-- 
Milian Wolff
http://milianw.de