kdeinit freezes on Wayland in OOM protection

Martin Gräßlin mgraesslin at kde.org
Tue Dec 15 08:05:52 UTC 2015


On 2015-12-15 08:33, Martin Gräßlin wrote:
> On 2015-12-15 03:20, Michael Pyne wrote:
>> On Mon, December 14, 2015 16:07:38 Martin Graesslin wrote:
>>> On Friday, November 27, 2015 1:05:26 PM CET Michael Pyne wrote:
>>> > On Thu, November 26, 2015 13:16:04 Martin Graesslin wrote:
>>> > > we are facing a problem during the startup of Plasma on Wayland. If OOM
>>> > > protection is enabled for kdeinit and we already have a running X
>>> > > server, kdeinit freezes dead.
>>> > >
>>> > > I'm sorry for having ignored the issue for too long; I had just
>>> > > disabled OOM protection on my system, so I never hit it. Now I have
>>> > > enabled it again to reproduce the problem. On my system I now have two
>>> > > frozen kdeinit processes:
>>> > >
>>> > > martin    1960  1956  0 77832 26448   1 13:05 ?        00:00:00
>>> > > /opt/kf5/bin/kdeinit5 --oom-pipe 4 --kded +kcminit_startup
>>> > > martin    1961  1960  0 77832  2816   3 13:05 ?        00:00:00
>>> > > /opt/kf5/bin/kdeinit5 --oom-pipe 4 --kded +kcminit_startup
>>> > >
>>> > > One has the following stacktrace:
>>> > > It's frozen in this line of code:
>>> > > sigsuspend(&oldsigs);   // wait for the signal to come
>>> > >
>>> > > The other one has the following stacktrace:
>>> > > which is:
>>> > > d.n = read(d.fd[0], &d.result, 1);
>>> > >
>>> > > Given that, it looks to me like these two processes deadlock. I do not
>>> > > understand why it happens, why it only happens on Wayland, why it
>>> > > matters that an X server is already running, and what the OOM
>>> > > protection has to do with it.
>>> >
>>> > I don't have the answer, but I think I can help explain the deadlock
>>> > better.
>>> 
>>> thanks for your input. It helped me understand quite a bit.
>>> 
>>> Some more testing results:
>>> Weston+Xwayland: doesn't show the problem
>>> Weston without Xwayland (and DISPLAY=$WAYLAND_DISPLAY): doesn't show the
>>> problem.
>>> 
>>> What I absolutely do not understand is how KWin could influence it. From
>>> all the backtraces I see, it always freezes before interacting with the
>>> windowing system.
>>> 
>>> Any more ideas to test and investigate are highly appreciated. I have
>>> received a rather high number of complaints about this problem; it's a
>>> showstopper and I'm lost with it.
>> 
>> Did you add an error check around the set_protection call in
>> start_kdeinit.c and see if that call is failing? (i.e. does
>> "kill(pid, SIGUSR1)" ever execute?)
> 
> yep I added it, but I'm not sure whether it changed anything. When I
> gdb'ed into the process it was hanging in the read in the for loop. So
> it might or might not have proceeded to the set_protection call.
> 
>> 
>> If the kill() call *is* reached then perhaps SIGUSR1 is unintentionally
>> masked in the 'grandchild' process (the child of kdeinit about to be
>> exec()'d). Perhaps something in the wayland/kwin/weston/x11 library
>> interaction blocks SIGUSR1 from being received in that case?

Good news: I found the reason. KWin was blocking SIGUSR1 through
pthread_sigmask, and the blocked signal mask was passed on to the child
processes created through QProcess. By reimplementing setupChildProcess I
was able to fix the problem.
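
For the archive, here is a minimal sketch of the mechanism in plain C. It is
not the KWin or kdeinit code. It relies only on the POSIX rule that a blocked
signal mask is inherited across fork() (and kept across exec()), so a signal
sent to such a child stays pending until the child unblocks it. The helper
names unblock_sigusr1 and child_receives_sigusr1 are made up for this
example, and the sketch polls in the forked child instead of exec()'ing a
real program.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t got_usr1 = 0;

static void on_usr1(int signo)
{
    (void)signo;
    got_usr1 = 1;
}

/* Hypothetical helper: what a child could call between fork() and exec(). */
static int unblock_sigusr1(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/*
 * Hypothetical demo function. Blocks SIGUSR1 in the caller, forks a child
 * and sends it SIGUSR1. Returns 1 if the child's handler ran, 0 if the
 * signal stayed pending behind the inherited mask, -1 on error.
 */
static int child_receives_sigusr1(int unblock_in_child)
{
    sigset_t block, old, pending;
    struct sigaction sa;
    struct timespec tick = { 0, 10 * 1000 * 1000 };  /* 10 ms */
    int ready[2], status, i, code = -1;
    char byte = 0;
    pid_t pid;

    /* The parent blocks SIGUSR1; every child forked from here inherits it. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    if (sigprocmask(SIG_BLOCK, &block, &old) != 0)
        return -1;
    if (pipe(ready) != 0) {
        sigprocmask(SIG_SETMASK, &old, NULL);
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        close(ready[0]);
        sa.sa_handler = on_usr1;
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = 0;
        sigaction(SIGUSR1, &sa, NULL);
        if (unblock_in_child)
            unblock_sigusr1();
        if (write(ready[1], "r", 1) != 1)   /* tell the parent we are set up */
            _exit(2);
        for (i = 0; i < 500; i++) {         /* poll for up to about 5 s */
            if (got_usr1)
                _exit(1);                   /* signal was delivered */
            sigpending(&pending);
            if (sigismember(&pending, SIGUSR1))
                _exit(0);                   /* signal is stuck behind the mask */
            nanosleep(&tick, NULL);
        }
        _exit(2);                           /* no signal seen at all */
    }

    close(ready[1]);
    if (pid > 0) {
        if (read(ready[0], &byte, 1) == 1)  /* wait until the child is ready */
            kill(pid, SIGUSR1);
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status)
            && WEXITSTATUS(status) <= 1)
            code = WEXITSTATUS(status);
    }
    close(ready[0]);
    sigprocmask(SIG_SETMASK, &old, NULL);   /* restore the caller's mask */
    return code;
}
```

With this sketch, child_receives_sigusr1(0) should return 0 (the signal
never reaches the handler, which matches a process sitting in sigsuspend()
forever), and child_receives_sigusr1(1) should return 1.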

Thanks a lot for pointing me in the right direction!

And yes, I'll still look into changing to the wait variant.

Cheers
Martin


More information about the Kde-frameworks-devel mailing list