[Nepomuk] High mutex contention in Nepomuk2::Resource
David Faure
faure at kde.org
Mon Mar 11 17:39:08 UTC 2013
On Friday 08 March 2013 21:52:58 Simeon Bird wrote:
> I think the answer to my question is "no" because all the places in
> ResourceManager that access the ResourceWatcher were already protected by
> the mutex. So it wasn't that.
>
> The crash occurs in qt code that looks like:
>
> if(pending){
> dbus_unref(pending)
> }
> and asserts complaining that pending == NULL, so something odd is
> definitely happening.
This sounds to me like DBus usage in multiple threads without using a dbus
connection per thread.
WRONG: QDBusConnection::sessionBus()
RIGHT: DBusConnection::threadConnection()
(nepomuk-core has such a class; akonadi too; KF5 will provide it in the
future)
Oh..... well, you found it: resourcewatcher.cpp uses
QDBusConnection::sessionBus()!
Can you reproduce that crash, i.e. can you test if that fixes it?
> I just ran kactivities under helgrind, but all I got so far was lots of
> warnings from deep inside Qt, which I think must be false positives...
Yes, I need to take the time to write a blog post on how to set up Qt for
helgrind (and helgrind for Qt). Ping me back about this in a few days if I
don't move forward on it :)
> The crash is connected to the ResourceWatcher used inside
> ResourceManagerPrivate, and I have to say that code makes no sense to me...
Sounds like my feeling whenever I look at the multithread usage in that code
:-)
> It creates a new object, then calls QObject->moveToThread() on it. Then
> some (but not all) of the methods calls on that object are sent using
> QMetaObject. And I think the variable is better protected by just requiring
> caller take the lock, which it already does most of the time.
Which caller, and which lock? I see no lock in the watcher -- and if you mean
the ResourceManager lock, that one is already too much of a contention point,
we should definitely not re-use it for objects that are independent from
ResourceManager.
[But of course one could say ResourceWatcher is not really independent --- I'm
lacking a big picture design about this... I should go back in time and ask
Vishesh in Berlin...]
My point is: every time an update comes in, it would be good if this didn't
lock the "big mutex" (resource manager).
> Incidentally, shouldn't the m_dataMutex also be held for
> Nepomuk2::ResourceData::watchEnabled()?
True, feel free to add that. Not that anyone is calling that method :)
> Ok! I think that is why bug 305024 happens. We create a ResourceWatcher,
> and call *most* of its methods via a QAutoConnection, but sometimes call
> addResource and removeResource directly. In any cases where the
> moveToThread makes a difference, this will surely cause a crash, right?
moveToThread means that the events (including slots) are called in a different
thread. So all the slotFoo() methods are called from a different thread, but to
be fair, none of them seems to access member variables, they only emit
signals.
Except start() .... which is also called as a slot, and which sets
m_connectionInterface.
removeResource(), on the other hand, accesses member vars and makes a dbus
call ......... ok, from the wrong thread, then. With libdbus not being
threadsafe, we have to make the dbus calls in the same thread as the one where
start() is called.
> I guess there are two possible fixes:
> 1) Just protect the ResourceWatcher with the rm mutex.
> 2) Call all methods via a QAutoConnection
>
> I like the first, because I don't really understand what is happening with
> the moveToThread stuff, but is there some reason I am missing why the mutex
> is not enough and this is really necessary?
I'd prefer solution 2, so that the watcher lives in a single thread entirely,
which is important for libdbus usage -- and which will give better performance
than locking the big mutex even more than today.
Can I let you make and test these changes?
On my side my TODO is: unlocking the RM mutex somehow during determineUri()
(i.e. during the blocking call to Soprano executeQuery), otherwise kmail
blocks for too long when simply creating or destroying Resource instances
(which happens quite a lot).
--
David Faure, faure at kde.org, http://www.davidfaure.fr
Working on KDE, in particular KDE Frameworks 5
More information about the Nepomuk
mailing list