deadlock protection

Waldo Bastian bastian-RoXCvvDuEio at public.gmane.org
Wed Oct 20 11:47:06 BST 2004


On Monday 18 October 2004 22:28, Havoc Pennington wrote:
>  - in item 3, "while the current incoming message is being processed"
>    isn't a concept we have right now ... messages are just popped
>    off the queue and never put back, there's no "end" of the
>    processing, other than perhaps the message getting unref'd.
>    In libdbus that is, the bindings may have a "current message"
>    concept.

But you do know when a message needs a reply, I assume? So when that reply gets 
sent, that's the end of the processing. If the message doesn't need a reply, 
the call stack ID is not needed either, because in that case you can never 
deadlock.

You still haven't captured the concept of "current message" with that, only 
"all messages that still need processing", so it doesn't help you too much.

So I guess it's indeed mainly a task for the bindings. If there is a way to 
say "this thread is currently processing this message", then bindings that map 
to actual (synchronous) function calls can call that automatically; otherwise 
the application will need to take care of it. But see below.
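
To illustrate the "this thread is currently processing this message" idea, 
here is a minimal sketch of how a binding might track it. It assumes messages 
are plain dicts carrying a "call_stack_id" field; the names processing(), 
current_call_stack_id() and make_outgoing_call() are hypothetical and are not 
part of libdbus or of any existing binding.

```python
import threading
import uuid
from contextlib import contextmanager

# Per-thread slot holding the message this thread is handling (hypothetical).
_state = threading.local()


@contextmanager
def processing(message):
    """Mark `message` as the one the calling thread is currently processing."""
    previous = getattr(_state, "current", None)
    _state.current = message
    try:
        yield
    finally:
        # End of processing: the reply was sent or the handler returned.
        _state.current = previous


def current_call_stack_id():
    """Return the call stack ID an outgoing call should carry."""
    message = getattr(_state, "current", None)
    if message is not None:
        # Inherit the ID of the incoming call we are serving.
        return message["call_stack_id"]
    # Not serving any call: this outgoing call starts a new call stack.
    return uuid.uuid4().hex


def make_outgoing_call(method):
    """Build an outgoing call that propagates the current call stack ID."""
    return {"method": method, "call_stack_id": current_call_stack_id()}
```

A binding that maps messages onto synchronous function calls could wrap each 
handler invocation in processing(); an application using a lower-level API 
would have to do so by hand.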

> A couple thoughts on alternate approaches, not sure these are going to
> be useful, but noting them:
>
>  - we could punt most of this to the bindings; i.e. introduce
>    call stack ID to the protocol and libdbus, but require bindings
>    to figure out how to conveniently track and propagate it.
>    Disadvantage of course is that the app you're talking to may
>    lose track of the call stack if its binding doesn't support it.
>
>  - rather than jumping the would-deadlock incoming method call to the
>    front of the queue, we could return an error "EWOULDDEADLOCK" sort
>    of thing. the advantage is not having to worry about semantics
>    of message reordering, or how to invoke the main loop.
>    Deadlocks would still need debugging (same as if they in fact
>    deadlocked), but they would not lock up the apps which
>    would be nice for users.
>
>    This potentially solves more deadlock cases, however,
>    in that we could have an app mark "will block for reply"
>    on outgoing calls, and then the bus can know when a
>    client is blocking and which app it is blocking on.
>    So in the "apps send a call to each other simultaneously"
>    case we could detect the deadlock and return an error.
>    Maybe worth doing anyway.

I don't see how it could solve _more_ deadlock cases. In particular, I don't 
see how it would detect "apps send a call to each other simultaneously".
I tried to add something like that to DCOP over the weekend, but then realized 
that
a) It will never be able to catch complicated cases (if the call sequence 
A->B->C is started in parallel with C->D->A it's impossible to detect the 
deadlock, because the call that arrives at C could theoretically contain 
information about A and B, but C doesn't know that D has called A as a result 
of C calling D)
b) In the simple case (A->B started in parallel with B->A) it is possible to 
detect that e.g. A is waiting for a response from B when A gets the call from 
B, but deciding on that information alone creates the risk of false 
positives. After all, maybe B still handles incoming calls, either in a 
separate thread or in the same thread, while waiting for the answer from A, so 
there may not be a deadlock at all. It's possible of course to indicate in the 
message sent by A that A is still able to process incoming messages, but that 
starts to become a bit hairy.
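
For reference, the bus-side variant from the quoted suggestion could look 
roughly like the sketch below. It assumes a central bus that is told about 
every "will block for reply" call and that serializes them; the Bus class, 
its methods and the `reentrant` flag are hypothetical. The flag stands for 
the "still able to process incoming messages" indication from (b), which is 
what avoids the false positive.

```python
class Bus:
    """Hypothetical central bus that records which client blocks on which."""

    def __init__(self):
        # caller -> callee that the caller is blocked on
        self.blocked_on = {}

    def would_deadlock(self, caller, callee):
        """True if callee is, directly or transitively, blocked on caller."""
        node, seen = callee, set()
        while node in self.blocked_on and node not in seen:
            seen.add(node)
            node = self.blocked_on[node]
            if node == caller:
                return True
        return False

    def blocking_call(self, caller, callee, reentrant=False):
        """Register a call that the caller will wait on for a reply."""
        if self.would_deadlock(caller, callee):
            return "EWOULDDEADLOCK"
        if not reentrant:
            # A caller that keeps serving incoming calls never counts as blocked.
            self.blocked_on[caller] = callee
        return "OK"

    def reply(self, caller):
        """The caller received its reply and is no longer blocked."""
        self.blocked_on.pop(caller, None)


bus = Bus()
first = bus.blocking_call("A", "B")   # A blocks on B
second = bus.blocking_call("B", "A")  # B calling A would now close the cycle
```

This only works because the bus sees every blocking call in one place; a 
client that sees only its own incoming messages lacks that information, as 
in case (a).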

Overall I'm starting to become less and less impressed with the automatic 
deadlock detection in DCOP. In particular because it detects and handles the 
simple cases, but those are exactly the cases that you would notice right 
away anyway because they would deadlock each and every time otherwise.

The hard cases (A calls B in parallel with B calling A, KDE BR69346) aren't 
detected, and those are exactly the cases where detection would be most 
valuable, because they are the ones that are timing dependent, so they are 
most likely to be missed while developing and, thanks to Murphy, only show up 
after release in builds without debug information ;-)

(In my mail from 2004-10-03, "Reentrancy (Was: RFC: DBUS & KDE 4)" I wasn't 
aware that DCOP failed to handle exactly those cases that would have been the 
most valuable to handle)

Then there are also the risks associated with unwanted/unexpected recursion. 
In the trivial cases the automatic deadlock detection replaces a deadlock 
with recursion, but the developer will hardly know because it happens behind 
his/her back.

So at this point in time, I'm starting to think that IMHO the right solution 
is to give application developers more control over the recursion behavior 
(which is a responsibility of the bindings). So far I have thought about that 
in terms of some sort of flag each time an outgoing call is made, that 
indicates whether incoming calls should be processed during this time.
KDE BR69346 made me realize that another option is to flag methods in the call 
interface as "recursion safe", e.g. calls that only query some attribute 
value can most likely be processed while an outgoing call is in progress, 
since they don't have side effects anyway. That would basically be the 
equivalent of C++'s const on methods. (This would all be up to the bindings; 
I don't think it needs any particular information from libdbus.)
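
A minimal sketch of what such a "recursion safe" flag might look like in a 
binding follows. Everything in it is hypothetical: the recursion_safe 
decorator, the Service class and the choice to defer unsafe calls until the 
outgoing call has finished (a binding could just as well return an error 
instead).

```python
from collections import deque

DEFERRED = object()  # sentinel: the call was queued, not executed


def recursion_safe(func):
    """Mark a method as safe to run while an outgoing call is in progress."""
    func.recursion_safe = True
    return func


class Service:
    def __init__(self):
        self.title = "untitled"
        self.waiting = False     # True while an outgoing call is in progress
        self.deferred = deque()  # unsafe incoming calls held back meanwhile

    @recursion_safe
    def get_title(self):
        # Pure query without side effects, comparable to a C++ const method.
        return self.title

    def set_title(self, value):
        # Mutates state, so it is not flagged as recursion safe.
        self.title = value

    def dispatch(self, name, *args):
        """Handle an incoming call, honoring the recursion-safe flag."""
        method = getattr(self, name)
        if self.waiting and not getattr(method, "recursion_safe", False):
            self.deferred.append((name, args))
            return DEFERRED
        return method(*args)

    def finish_outgoing_call(self):
        """The outgoing call returned: run the deferred calls in order."""
        self.waiting = False
        while self.deferred:
            name, args = self.deferred.popleft()
            getattr(self, name)(*args)
```

While `waiting` is set, dispatch("get_title") is answered immediately, whereas 
dispatch("set_title", ...) is queued and only applied once 
finish_outgoing_call() runs.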

As for DCOP compatibility, given the IMHO limited practical use of automatic 
deadlock protection and the more flexible design of DBUS wrt async message 
handling and thread support, I'm inclined to think that it would be enough if 
DBUS provides a way to let the binding that provides DCOP compatibility 
map DCOP's automatic deadlock protection onto DBUS, but that other bindings 
wouldn't need to bother with this (otoh, if a binding wants to, it could 
support it and use it for limited deadlock detection). The goal is then to 
provide backwards compatibility for existing DCOP applications, but new 
(DBUS) applications should then rely on the recursion control features of 
their bindings to prevent deadlock and not on automatic deadlock detection.

Cheers,
Waldo
-- 
bastian-RoXCvvDuEio at public.gmane.org   |   SUSE LINUX 9.2: Order now!   |   bastian-IBi9RG/b67k at public.gmane.org
  http://www.suse.de/us/private/products/suse_linux/preview/index.html