[Nepomuk] High mutex contention in Nepomuk2::Resource

David Faure faure at kde.org
Mon Mar 11 17:39:08 UTC 2013


On Friday 08 March 2013 21:52:58 Simeon Bird wrote:
> I think the answer to my question is "no" because all the places in
> ResourceManager that access the ResourceWatcher were already protected by
> the mutex. So it wasn't that.
> 
> The crash occurs in qt code that looks like:
> 
> if(pending){
>   dbus_unref(pending)
> }
> and asserts complaining that pending == NULL, so something odd is
> definitely happening.

This sounds to me like DBus usage in multiple threads without using a dbus 
connection per thread.
WRONG: QDBusConnection::sessionBus()
RIGHT: DBusConnection::threadConnection()
(nepomuk-core has such a class; akonadi too; KF5 will provide it in the 
future)

Oh..... well, you found it: resourcewatcher.cpp uses 
QDBusConnection::sessionBus()!

Can you reproduce that crash, i.e. can you test if that fixes it?

> I just ran kactivities under helgrind, but all I got so far was lots of
> warnings from deep inside Qt, which I think must be false positives...

Yes, I need to take the time to write a blog post on how to set up Qt for 
helgrind (and helgrind for Qt). Ping me back about this in a few days if I 
don't move forward on it :)

> The crash is connected to the ResourceWatcher used inside
> ResourceManagerPrivate, and I have to say that code makes no sense to me...

Sounds like my feeling whenever I look at the multithread usage in that code 
:-)

> It creates a new object, then calls QObject->moveToThread()  on it. Then
> some (but not all) of the methods calls on that object are sent using
> QMetaObject. And I think the variable is better protected by just requiring
> caller take the lock, which it already does most of the time.

Which caller, and which lock? I see no lock in the watcher -- and if you mean 
the ResourceManager lock, that one is already too much of a contention point, 
we should definitely not re-use it for objects that are independent from 
ResourceManager.
[But of course one could say ResourceWatcher is not really independent --- I'm 
lacking a big picture design about this... I should go back in time and ask 
Vishesh in Berlin...]

My point is: every time an update comes in, it would be good if this didn't 
lock the "big mutex" (resource manager).

> Incidentally, shouldn't the m_dataMutex also be held for
> Nepomuk2::ResourceData::watchEnabled()?

True, feel free to add that. Not that anyone is calling that method :)

> Ok! I think that is why bug 305024 happens. We create a ResourceWatcher,
> and call *most* of its methods via a QAutoConnection, but sometimes call
> addResource and removeResource directly. In any cases where the
> moveToThread makes a difference, this will surely cause a crash, right?

moveToThread means that the events (including slots) are called in a different 
thread. So all the slotFoo() methods are called from a different thread, but to 
be fair, none of them seems to access member variables, they only emit 
signals.
Except start() .... which is also called as a slot, and which sets 
m_connectionInterface.

removeResource(), on the other hand, accesses member vars and makes a dbus 
call ......... ok, from the wrong thread, then. With libdbus not being 
threadsafe, we have to make the dbus calls in the same thread as the one where 
start() is called.

> I guess there are two possible fixes:
> 1) Just protect the ResourceWatcher with the rm mutex.
> 2) Call all methods via a QAutoConnection
> 
> I like the first, because I don't really understand what is happening with
> the moveToThread stuff, but is there some reason I am missing why the mutex
> is not enough and this is really necessary?

I'd prefer solution 2, so that the watcher lives in a single thread entirely, 
which is important for libdbus usage -- and which will give better performance 
than locking the big mutex even more than today.

Can I let you make and test these changes?
On my side my TODO is: unlocking the RM mutex somehow during determineUri()
(i.e. during the blocking call to Soprano executeQuery), otherwise kmail 
blocks for too long when simply creating or destroying Resource instances 
(which happens quite a lot).

-- 
David Faure, faure at kde.org, http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5



More information about the Nepomuk mailing list