libksane seems to break QProcess::start calls
Thiago Macieira
thiago at kde.org
Fri Mar 4 17:35:23 GMT 2022
On Friday, 4 March 2022 09:02:37 PST Thiago Macieira wrote:
> > I uploaded the traces here:
> > https://l3u.de/tmp/strace_brscan.txt.xz
> > https://l3u.de/tmp/strace_plustek.txt.xz
> >
> > Thanks again for all help!
>
> I'll take a look. Let's see... the brscan trace doesn't have any use of
> PIDFD.
Let's see the plustek scan. It also has zero uses of PIDFD (probably libksane
is using KProcess, which triggers the QProcess pidfd protection in Qt 5). So
PIDFD is not the issue.
Let's see what it's doing. The first mention of SIGCHLD is here:
[pid 20601] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|
CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb873fff910) = 20602
[pid 20601] rt_sigaction(SIGCHLD, {sa_handler=0x7fb894726350, sa_mask=[CHLD],
sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fb8cd33a5a0},
{sa_handler=SIG_DFL, sa_mask=[], sa_flags=0}, 8) = 0
That's NOT QProcess. I can tell because the sequence is wrong, possibly buggy:
it's a bad idea to install the SIGCHLD handler *after* you've fork()ed your
child, because the child might die before you get there (unless you construct
the child code so that it can't). QProcess also doesn't set the sa_mask, which
this code did.
And there's no execve() in PID 20602, so it definitely isn't QProcess. This is
just something using fork() to run some code that may or may not crash, may or
may not hang so it may need to be kill()ed.
The second mention of SIGCHLD is this child dying:
[pid 20601] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=20602,
si_uid=1000, si_status=0, si_utime=3, si_stime=31} ---
[pid 20601] rt_sigreturn({mask=[ALRM]}) = 0
I don't know what the signal handler did, because it made no system calls
before returning. That indicates a poorly written signal handler (it probably
wrote to a volatile variable). That's also probably buggy.
The third mention of SIGCHLD is QProcess/forkfd:
[pid 20579] rt_sigaction(SIGCHLD, {sa_handler=0x7fb8cda513e0, sa_mask=[],
sa_flags=SA_RESTORER|SA_SIGINFO|SA_NOCLDSTOP, sa_restorer=0x7fb8cd33a5a0},
{sa_handler=0x7fb894726350, sa_mask=[CHLD], sa_flags=SA_RESTORER|SA_RESTART,
sa_restorer=0x7fb8cd33a5a0}, 8) = 0
[pid 20579] rt_sigaction(SIGPIPE, {sa_handler=SIG_IGN, sa_mask=[],
sa_flags=SA_RESTORER, sa_restorer=0x7fb8cd33a5a0}, NULL, 8) = 0
[pid 20579] futex(0x7fb8cdd7f3e8, FUTEX_WAKE_PRIVATE, 2147483647) = 0
[pid 20579] pipe2([24, 25], O_CLOEXEC) = 0
[pid 20579] eventfd2(0, EFD_CLOEXEC) = 26
[pid 20579] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|
CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb8ca53ba90) = 20605
[pid 20579] write(26, "*\0\0\0\0\0\0\0", 8) = 8
Note the same sequence as the previous commit, with a pipe and an eventfd,
which gets value 42. Note also how the handler for SIGCHLD is installed before
we fork(), with no mask.
This child's death is properly reported in the fourth mention of SIGCHLD:
[pid 20579] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=20605,
si_uid=1000, si_status=0, si_utime=25, si_stime=0} ---
[pid 20579] waitid(P_ALL, 0, {si_signo=SIGCHLD, si_code=CLD_EXITED,
si_pid=20605, si_uid=1000, si_status=0, si_utime=0, si_stime=0}, WNOHANG|
WEXITED|WNOWAIT, NULL) = 0
[pid 20579] wait4(20605, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG,
{ru_utime={tv_sec=0, tv_usec=248891}, ru_stime={tv_sec=0, tv_usec=3966}, ...})
= 20605
[pid 20579] write(25, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0;
\314\3\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 152) = 152
[pid 20579] close(25) = 0
[pid 20579] waitid(P_ALL, 0, 0x7ffc6caac790, WNOHANG|WEXITED|WNOWAIT, NULL) =
-1 ECHILD (Keine Kind-Prozesse)
[pid 20579] rt_sigreturn({mask=[]}) = 2
And here you see a proper SIGCHLD handler, that actually does some work before
returning.
The fifth mention of SIGCHLD is the same as the first:
[pid 20636] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|
CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb8720ab910) = 20637
[pid 20636] rt_sigaction(SIGCHLD, {sa_handler=0x7fb894726350, sa_mask=[CHLD],
sa_flags=SA_RESTORER|SA_RESTART, sa_restorer=0x7fb8cd33a5a0},
{sa_handler=0x7fb8cda513e0, sa_mask=[], sa_flags=SA_RESTORER|SA_SIGINFO|
SA_NOCLDSTOP, sa_restorer=0x7fb8cd33a5a0}, 8) = 0
This installs *again* the same SIGCHLD handler at pointer address
0x7fb894726350 after fork(). This is the first certain bug in this code, third
possible bug overall. Since it never uninstalled its handler, it shouldn't
attempt to install it again. There's NO condition under doing that is a
correct thing to do in a multithreaded application. Zero.
The sixth mention is the same as the second.
The seventh mention is forkfd again, but like a proper SIGCHLD citizen, it's
different from the fourth mention by not installing the handler again. It just
starts the child via fork():
[pid 20579] pipe2([24, 25], O_CLOEXEC) = 0
[pid 20579] eventfd2(0, EFD_CLOEXEC) = 26
[pid 20579] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|
CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fb8ca53ba90) = 20641
[pid 20579] write(26, "*\0\0\0\0\0\0\0", 8) = 8
The eighth mention when child PID 20641 dies:
[pid 20579] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=20641,
si_uid=1000, si_status=0, si_utime=23, si_stime=0} ---
[pid 20579] rt_sigreturn({mask=[]}) = 2
And this is the reason why you got the symptom of QProcess breaking. We'd have
expected that the eight mention be the same as the fifth, to keep the pattern
going, but it isn't. It's the same as the second and sixth mentions. The
SIGCHLD handler that ran is that stupidly short one, which we now know to be
definitely buggy.
Unix signals are a very fragile thing that must be used with extreme care,
especially in multithreaded applications. That's true for the signals that can
only have one purpose and one source application-wide, but it's even more so
for the cooperative ones like SIGCHLD. That's the whole reason why pidfd now
exists: to remove this wart.
This other SIGCHLD-using code isn't cooperating at all. We can easily tell in
that last mention: it simply returned, without calling the handler that was
there before it was installed. Any and all SIGCHLD handlers MUST, without
exception, call the previous handler that was there, in a multithreaded
application. It failed to call the one that forkfd installs, so forkfd &
QProcess never found out its own children had died.
--
Thiago Macieira - thiago (AT) macieira.info - thiago (AT) kde.org
Software Architect - Intel DPG Cloud Engineering
More information about the kde-devel
mailing list