The curious case of stuck systemd poweroff

Thu Jul 14 10:11:26 BST 2016

Hola!

Ever since systemd and/or sddm stopped killing all our session
processes, we have had problems with poweroff/reboot hanging while
waiting for processes to quit.
Recently systemd then started sending them TERM by default, which in
theory should make things behave as before, but more often than not it
doesn't.

The reason for this is a pain to debug and altogether somewhat
convoluted, so all that follows was partially inferred from numerous
logging attempts.
It all comes down to a simple fact: ksmserver is rubbish at its job and
only terminates half the stuff in the session before handing over to
the outside, expecting the outside to deal with the rest.

I found two likely holdup scenarios caused by this:

a) procfoo is still running -> ksmserver hands over to systemd ->
systemd stops sddm -> X server stops -> procfoo now crashes because it
does X things (pretty sure [1] is an instance of this) -> kcrash jumps
in -> drkonqi -> gdb -> procfoo won't react to anything but KILL now

b) procfoo is still running -> ksmserver hands over to systemd ->
procfoo survives without X (e.g. kio slave) -> procfoo crashes for
(maybe unrelated) reasons such as a Qt bug because the network is down ->
kcrash gets hung up on recursive crash handling for kdeinit5 or some
other nonsense

Long story short: if things crash, usually the TERM from systemd won't
do anything.

The way I see it, ksmserver needs to properly TERM everything to
protect against a). Kcrash additionally ought to not do anything when
its session is in shutdown, to guard against both a) and b) AND to allow
core dumps to be collected instead, so there actually can be a trace of
something having gone wrong.

Thoughts?

I have no clue how we'd implement the kcrash changes, since the crash
handler would somehow have to know whether the session is still active
without doing business on the heap. For ksmserver we could probably lean
on systemd to give us a process list of the session.

[1] https://bugs.kde.org/show_bug.cgi?id=364340