kdeinit freezes on Wayland in OOM protection
Martin Gräßlin
mgraesslin at kde.org
Tue Dec 15 07:33:31 UTC 2015
On 2015-12-15 03:20, Michael Pyne wrote:
> On Mon, December 14, 2015 16:07:38 Martin Graesslin wrote:
>> On Friday, November 27, 2015 1:05:26 PM CET Michael Pyne wrote:
>> > On Thu, November 26, 2015 13:16:04 Martin Graesslin wrote:
>> > > We are facing a problem during the startup of Plasma on Wayland. If OOM
>> > > protection is enabled for kdeinit and we already have a running X server,
>> > > kdeinit freezes dead.
>> > >
>> > > I'm sorry for having ignored the issue for too long; I had just disabled
>> > > OOM protection on my system, so I never hit it. Now I have enabled it
>> > > again to reproduce the problem. On my system I now have two frozen
>> > > kdeinit processes:
>> > >
>> > > martin 1960 1956 0 77832 26448 1 13:05 ? 00:00:00 /opt/kf5/bin/kdeinit5 --oom-pipe 4 --kded +kcminit_startup
>> > > martin 1961 1960 0 77832 2816 3 13:05 ? 00:00:00 /opt/kf5/bin/kdeinit5 --oom-pipe 4 --kded +kcminit_startup
>> > >
>> > > One has the following stacktrace:
>> > > It's frozen in this line of code:
>> > > sigsuspend(&oldsigs); // wait for the signal to come
>> > >
>> > > The other one has the following stacktrace:
>> > > which is:
>> > > d.n = read(d.fd[0], &d.result, 1);
>> > >
>> > > Given that, it looks to me like these two processes deadlock. I do not
>> > > understand why, why it only happens on Wayland, why it matters that an
>> > > X server is already running, and what the OOM protection has to do
>> > > with it.
>> >
>> > I don't have the answer, but I think I can help explain the deadlock
>> > better.
>>
>> Thanks for your input. It helped me understand quite a bit.
>>
>> Some more testing results:
>> Weston+Xwayland: doesn't show the problem
>> Weston without Xwayland (and DISPLAY=$WAYLAND_DISPLAY): doesn't show the
>> problem
>>
>> What I absolutely do not understand is how KWin could influence it. In
>> all the backtraces I see, it always freezes before interacting with the
>> windowing system.
>>
>> Any further ideas on what to test and investigate are highly appreciated.
>> I got a rather high number of complaints about this problem; it's a
>> showstopper and I'm lost with it.
>
> Did you add an error check around the set_protection call in start_kdeinit.c
> and see if that call is failing? (i.e. does "kill(pid, SIGUSR1)" ever
> execute?).
Yep, I added it, but I'm not sure whether it changed anything. When I attached
gdb to the process, it was hanging in the read() in the for loop, so it might
or might not have proceeded to the set_protection call.
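To answer the "did it get that far?" question, one option is to log every step of the helper loop. The sketch below is hypothetical: it assumes the loop reads raw pid_t values from the --oom-pipe fd, and it takes the protection step as a function pointer (protect_fn, which may be NULL) because the real set_protection() signature is not shown here. It retries short and interrupted reads, and it still sends SIGUSR1 when protection fails, since a child that never gets the signal stays in sigsuspend(); whether start_kdeinit.c behaves the same way is an open question.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical signature for the protection step; returns 0 on success.
 * The real set_protection() in start_kdeinit.c may look different. */
typedef int (*protect_fn)(pid_t pid);

/* Read one whole pid_t, retrying on EINTR and short reads.
 * Returns 1 on success, 0 on clean EOF, -1 on error. */
static int read_pid(int fd, pid_t *out)
{
    char *buf = (char *)out;
    size_t got = 0;

    while (got < sizeof *out) {
        ssize_t n = read(fd, buf + got, sizeof *out - got);
        if (n > 0) {
            got += (size_t)n;
        } else if (n == 0) {    /* writer closed the pipe */
            if (got == 0)
                return 0;       /* EOF between records */
            errno = EIO;        /* EOF in the middle of a record */
            return -1;
        } else if (errno != EINTR) {
            return -1;          /* real read error */
        }                       /* EINTR: retry */
    }
    return 1;
}

/* Helper loop with every step logged. Returns how many pids were sent
 * SIGUSR1 successfully. */
static int serve_oom_pipe(int fd, protect_fn protect)
{
    int released = 0, rc;
    pid_t pid;

    while ((rc = read_pid(fd, &pid)) == 1) {
        fprintf(stderr, "oom helper: got pid %ld\n", (long)pid);
        if (protect && protect(pid) != 0)
            fprintf(stderr, "oom helper: set_protection(%ld) failed: %s\n",
                    (long)pid, strerror(errno));
        /* Signal even if protection failed: the child waits for SIGUSR1. */
        if (kill(pid, SIGUSR1) == -1)
            fprintf(stderr, "oom helper: kill(%ld, SIGUSR1) failed: %s\n",
                    (long)pid, strerror(errno));
        else
            released++;
    }
    if (rc < 0)
        fprintf(stderr, "oom helper: read failed: %s\n", strerror(errno));
    return released;
}
```

If "got pid" never shows up while the child sits in sigsuspend(), the pid never reached the helper, which points at the pipe rather than at the signal.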
>
> If the kill() call *is* reached then perhaps SIGUSR1 is unintentionally masked
> in the 'grandchild' process (the child of kdeinit about to be exec()'d).
> Perhaps something in the wayland/kwin/weston/x11 library interaction blocks
> SIGUSR1 from being received in that case?
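One way to test the masking theory is to have the child log its SIGUSR1 state right before it waits. A forked child inherits its parent's signal mask and dispositions, and execve() keeps both the mask and any SIG_IGN disposition (only caught signals are reset to the default), so a blocked or ignored SIGUSR1 can leak in from whatever started the session. Note also that sigsuspend(&oldsigs) installs oldsigs as the mask for the duration of the wait: if, as the name suggests, oldsigs is the mask saved before SIGUSR1 was blocked, an inherited block on SIGUSR1 would still be in effect and the signal could never wake the process. The helper names below (sigusr1_state, sigusr1_wait_mask, log_sigusr1_state) are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>

#define SIGUSR1_BLOCKED 1 /* SIGUSR1 is in the calling thread's signal mask */
#define SIGUSR1_IGNORED 2 /* SIGUSR1's disposition is SIG_IGN */

/* Query, without changing anything, how SIGUSR1 is currently treated. */
static int sigusr1_state(void)
{
    int state = 0;
    sigset_t mask;
    struct sigaction sa;

    if (sigprocmask(SIG_BLOCK, NULL, &mask) == 0 &&
        sigismember(&mask, SIGUSR1) == 1)
        state |= SIGUSR1_BLOCKED;
    if (sigaction(SIGUSR1, NULL, &sa) == 0 &&
        !(sa.sa_flags & SA_SIGINFO) && sa.sa_handler == SIG_IGN)
        state |= SIGUSR1_IGNORED;
    return state;
}

/* Mask to hand to sigsuspend(): the saved mask with SIGUSR1 removed, so an
 * inherited block on SIGUSR1 cannot keep the wake-up signal out. */
static void sigusr1_wait_mask(const sigset_t *saved, sigset_t *out)
{
    *out = *saved;
    sigdelset(out, SIGUSR1);
}

/* Diagnostic to drop in right before the wait. */
static void log_sigusr1_state(const char *where)
{
    int s = sigusr1_state();
    fprintf(stderr, "%s: SIGUSR1 %s, %s\n", where,
            (s & SIGUSR1_BLOCKED) ? "blocked" : "not blocked",
            (s & SIGUSR1_IGNORED) ? "ignored" : "not ignored");
}
```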
Possible. I saw that weston has the following in the Xwayland process fork:
/* Ignore SIGUSR1 in the child, which will make the X
* server send SIGUSR1 to the parent (weston) when
* it's done with initialization. During
* initialization the X server will round trip and
* block on the wayland compositor, so avoid making
* blocking requests (like xcb_connect_to_fd) until
* it's done with that. */
signal(SIGUSR1, SIG_IGN);
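That comment depends on an ignored disposition surviving fork() and execve(), so the X server starts with SIGUSR1 ignored, which is what makes it signal its parent when ready. A blocked signal mask is inherited in the same way. The sketch below demonstrates the inheritance with a forked child and shows one hypothetical defensive option: a launcher resetting SIGUSR1 to the default and unblocking it between fork() and exec(). Since Weston+Xwayland does not reproduce the problem, this is only a way to rule the mechanism in or out, not a diagnosis; reset_sigusr1_for_exec and child_sigusr1_state are illustrative names.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical launcher helper: call in the child after fork() and before
 * exec(), so SIGUSR1 state set up by the launcher does not leak through. */
static int reset_sigusr1_for_exec(void)
{
    struct sigaction dfl;
    sigset_t usr1;

    memset(&dfl, 0, sizeof dfl);
    dfl.sa_handler = SIG_DFL;
    sigemptyset(&dfl.sa_mask);
    if (sigaction(SIGUSR1, &dfl, NULL) == -1)
        return -1;

    sigemptyset(&usr1);
    sigaddset(&usr1, SIGUSR1);
    return sigprocmask(SIG_UNBLOCK, &usr1, NULL);
}

/* Fork a child that optionally resets SIGUSR1, then reports what it sees via
 * its exit status: bit 0 = still blocked, bit 1 = still ignored.
 * Returns that status, or -1 on error. */
static int child_sigusr1_state(int reset)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        sigset_t mask;
        struct sigaction sa;
        int state = 0;

        if (reset && reset_sigusr1_for_exec() == -1)
            _exit(100);
        if (sigprocmask(SIG_BLOCK, NULL, &mask) == 0 &&
            sigismember(&mask, SIGUSR1) == 1)
            state |= 1;
        if (sigaction(SIGUSR1, NULL, &sa) == 0 && sa.sa_handler == SIG_IGN)
            state |= 2;
        _exit(state); /* a real launcher would exec() here instead */
    }

    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

With SIGUSR1 ignored and blocked in the parent, child_sigusr1_state(0) reports both (3), while child_sigusr1_state(1) reports neither (0).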
>
> I think the easiest possible fix is to replace the sigsuspend call with a
> sigtimedwait() call, constructed to wait for SIGUSR1 alone, but with a short
> timeout. In the event the timeout is reached, continue with the exec() as
> normal, possibly after leaving a noisy warning. It's probably a good idea to
> do this anyway since library code shouldn't wait indefinitely just because
> OOM is enabled, but you're the one best positioned to reproduce at this
> point :)
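A minimal sketch of that suggestion, assuming SIGUSR1 is already blocked at the point where sigsuspend() is called today (the usual reason for the sigsuspend(&oldsigs) pattern). Because sigtimedwait() collects the pending signal directly, no handler is needed. The function name wait_for_sigusr1 and the warning text are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <time.h>

/* Wait up to timeout_ms for SIGUSR1. The caller must already have SIGUSR1
 * blocked so it stays pending until sigtimedwait() collects it.
 * Returns 1 if SIGUSR1 arrived, 0 on timeout (after a warning), -1 on error. */
static int wait_for_sigusr1(long timeout_ms)
{
    sigset_t want;
    struct timespec ts;

    sigemptyset(&want);
    sigaddset(&want, SIGUSR1);
    ts.tv_sec = timeout_ms / 1000;
    ts.tv_nsec = (timeout_ms % 1000) * 1000000L;

    for (;;) {
        int sig = sigtimedwait(&want, NULL, &ts);
        if (sig == SIGUSR1)
            return 1;
        if (sig == -1 && errno == EAGAIN) {
            fprintf(stderr, "kdeinit: no SIGUSR1 from the OOM helper after "
                            "%ld ms, continuing with exec()\n", timeout_ms);
            return 0;
        }
        if (sig == -1 && errno == EINTR)
            continue; /* another signal's handler ran; wait again (full timeout) */
        return -1;
    }
}
```

After it returns, either way, the child would restore its old mask and exec() as before. One caveat worth weighing: if the helper's kill() arrives only after the timeout and the exec(), it hits the new program, where SIGUSR1's default action is to terminate.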
That is a good suggestion. I'll give it a try!
Cheers,
Martin
More information about the Kde-frameworks-devel mailing list