kdeinit freezes on Wayland in OOM protection

Michael Pyne mpyne at kde.org
Tue Dec 15 02:20:09 UTC 2015


On Mon, December 14, 2015 16:07:38 Martin Graesslin wrote:
> On Friday, November 27, 2015 1:05:26 PM CET Michael Pyne wrote:
> > On Thu, November 26, 2015 13:16:04 Martin Graesslin wrote:
> > > we are facing a problem during the startup of Plasma on Wayland. If OOM
> > > protection is enabled for kdeinit and we already have a running X
> > > server, kdeinit freezes dead.
> > > 
> > > I'm sorry for having ignored the issue for too long; I had just disabled
> > > OOM protection on my system, so I never hit it. Now I enabled it again to
> > > reproduce the problem. On my system I now have two frozen kdeinit
> > > processes:
> > > 
> > > martin    1960  1956  0 77832 26448   1 13:05 ?        00:00:00 /opt/kf5/bin/kdeinit5 --oom-pipe 4 --kded +kcminit_startup
> > > martin    1961  1960  0 77832  2816   3 13:05 ?        00:00:00 /opt/kf5/bin/kdeinit5 --oom-pipe 4 --kded +kcminit_startup
> > > 
> > > One has the following stacktrace:
> > > It's frozen in this line of code:
> > > sigsuspend(&oldsigs);   // wait for the signal to come
> > > 
> > > The other one has the following stacktrace:
> > > which is:
> > > d.n = read(d.fd[0], &d.result, 1);
> > > 
> > > Given that, it looks to me like these two processes deadlock. I do not
> > > understand why, why it only happens on Wayland, why the fact that an X
> > > server must already be running is relevant, and what the OOM protection
> > > has to do with it.
> > 
> > I don't have the answer, but I think I can help explain the deadlock
> > better.
> 
> thanks for your input. It helped me understand quite a bit.
> 
> Some more testing results:
> Weston+Xwayland: doesn't show the problem
> Weston without Xwayland (and DISPLAY=$WAYLAND_DISPLAY): doesn't show the
> problem.
> 
> What I absolutely do not understand is how KWin could influence it. In all
> the backtraces I see, it always freezes before interacting with the
> windowing system.
> 
> Any more ideas to test and investigate are highly appreciated. I have
> received a rather high number of complaints about this problem; it's a
> showstopper and I'm lost with it.

Did you add an error check around the set_protection call in start_kdeinit.c 
and see if that call is failing? (i.e. does "kill(pid, SIGUSR1)" ever 
execute?).

If the kill() call *is* reached then perhaps SIGUSR1 is unintentionally masked 
in the 'grandchild' process (the child of kdeinit about to be exec()'d). 
Perhaps something in the wayland/kwin/weston/x11 library interaction blocks 
SIGUSR1 from being received in that case?
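
To test the masking theory, something like the following could be called in 
the grandchild just before the wait. This is only a minimal sketch: the helper 
names are made up for illustration and are not taken from the kdeinit sources.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 if SIGUSR1 is in the calling thread's
 * blocked-signal mask, 0 otherwise. */
static int sigusr1_is_blocked(void)
{
    sigset_t current;

    /* With a NULL new set, sigprocmask() only reports the current mask. */
    int rc = sigprocmask(SIG_BLOCK, NULL, &current);
    assert(rc == 0);
    (void)rc;

    return sigismember(&current, SIGUSR1) == 1;
}

/* Hypothetical helper: returns 1 if the disposition of SIGUSR1 is SIG_IGN,
 * in which case the signal is discarded instead of being delivered. */
static int sigusr1_is_ignored(void)
{
    struct sigaction sa;

    /* With a NULL new action, sigaction() only reports the current one. */
    int rc = sigaction(SIGUSR1, NULL, &sa);
    assert(rc == 0);
    (void)rc;

    return !(sa.sa_flags & SA_SIGINFO) && sa.sa_handler == SIG_IGN;
}

/* Print both facts to stderr so they show up in the session log. */
static void report_sigusr1_state(void)
{
    fprintf(stderr, "SIGUSR1 blocked: %d, ignored: %d\n",
            sigusr1_is_blocked(), sigusr1_is_ignored());
}
```

If either value comes back as 1 only in the failing setup, that would point at 
whatever ran earlier in the process changing the signal state.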

I think the easiest possible fix is to replace the sigsuspend call with a 
sigtimedwait() call, constructed to wait for SIGUSR1 alone, but with a short 
timeout. In the event the timeout is reached, continue with the exec() as 
normal, possibly after leaving a noisy warning. It's probably a good idea to 
do this anyway, since library code shouldn't wait indefinitely just because 
OOM protection is enabled, but you're the one best positioned to reproduce at 
this point :)
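
Roughly what I have in mind, as an untested sketch rather than a patch: the 
function name, the return convention and the warning text are all invented 
here, and the real code would need to fit into whatever kdeinit already does 
around the sigsuspend() call. It assumes a platform that provides 
sigtimedwait() (e.g. Linux).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not from the kdeinit sources): wait for SIGUSR1 for
 * at most timeout_sec seconds.
 * Returns 1 if SIGUSR1 arrived, 0 on timeout, -1 on any other error.
 */
static int wait_for_sigusr1(time_t timeout_sec)
{
    sigset_t waitset, oldset;
    struct timespec timeout;
    int result;

    assert(timeout_sec >= 0);
    timeout.tv_sec = timeout_sec;
    timeout.tv_nsec = 0;

    sigemptyset(&waitset);
    sigaddset(&waitset, SIGUSR1);

    /* sigtimedwait() expects the awaited signal to be blocked, so block
     * SIGUSR1 here and remember the previous mask. */
    if (sigprocmask(SIG_BLOCK, &waitset, &oldset) == -1)
        return -1;

    for (;;) {
        int sig = sigtimedwait(&waitset, NULL, &timeout);
        if (sig == SIGUSR1) {
            result = 1;
            break;
        }
        if (sig == -1 && errno == EAGAIN) {
            /* Timed out: warn loudly and let the caller carry on. */
            fprintf(stderr, "warning: timed out waiting for SIGUSR1\n");
            result = 0;
            break;
        }
        if (sig == -1 && errno == EINTR)
            continue; /* interrupted by another signal; retry (full timeout) */
        result = -1;
        break;
    }

    /* Restore the old mask so a blocked SIGUSR1 is not inherited by exec(). */
    sigprocmask(SIG_SETMASK, &oldset, NULL);
    return result;
}
```

The caller would then go on to exec() whether the result is 1 or 0, so a lost 
signal costs a few seconds instead of hanging the session.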

Regards,
 - Michael Pyne


More information about the Kde-frameworks-devel mailing list