KCrash crash racing

Albert Astals Cid aacid at kde.org
Mon Aug 5 21:58:56 BST 2019


On Wednesday, 31 July 2019, at 12:26:23 CEST, Harald Sitter wrote:
> Moin Moin!
> 
> I've been hunting down a nasty backtrace problem in drkonqi where it
> entirely fails to create a backtrace and am now fairly confident this
> is in fact a design flaw with kcrash, but I have no awesome ideas on
> how to solve this properly.
> 
> Long story short: there is a space of time between SEGV occurring and
> drkonqi stopping the threads. This causes (e.g.) GIO threads to
> actively unavoidably crash the process. Most recently this could/can
> be observed with plasmashell which has a GIO thread sitting around
> when (I think) flatpak updates are being checked. The result is that
> the crash cannot be traced because the process dies before drkonqi has
> a chance to deal with it.
> 
> If you have ever seen a warning or error of the kind "XCB connection
> lost" or something similar it is in fact the very same problem, albeit
> usually not fatal.
> 
> When a process crashes, SEGV is delivered to just one thread (the one
> that faulted). The other threads continue to run!
> When the SEGV arrives the standard handler will possibly restart the
> process, then close all open file descriptors, potentially start (and
> wait for) drkonqi and when drkonqi has worked its magic raise itself
> to a core pattern process if applicable [1].
> The threads have still not been suspended!
> When drkonqi starts, it sends STOP to the crashed process. STOP is
> delivered to every thread, thus stopping everything this time around.
> Only now is the process "safe" from crashing while crashing.
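To make the "other threads continue to run" point concrete, here is a minimal C++ sketch, assuming Linux/POSIX. It is not KCrash code: `fake_crash_handler`, `ticks_while_handling` and the use of SIGUSR1 in place of a real SEGV are all stand-ins invented for illustration. A worker thread keeps incrementing a counter while the handler is busy on the signalled thread.

```cpp
#include <atomic>
#include <cassert>
#include <signal.h>
#include <thread>
#include <time.h>

// Hypothetical names throughout; this is not KCrash code.
static std::atomic<unsigned long> g_ticks{0};                 // bumped by the worker
static std::atomic<bool> g_stop{false};                       // tells the worker to exit
static std::atomic<unsigned long> g_ticks_during_handler{0};  // result of the measurement

// Stand-in for the crash handler: it only measures how far the worker
// got while the handler was busy. nanosleep() is async-signal-safe.
extern "C" void fake_crash_handler(int)
{
    const unsigned long before = g_ticks.load();
    timespec delay{0, 50 * 1000 * 1000}; // 50 ms of pretend "handler work"
    nanosleep(&delay, nullptr);
    g_ticks_during_handler.store(g_ticks.load() - before);
}

// Returns how many times the worker ticked while the handler was running.
unsigned long ticks_while_handling()
{
    g_stop = false;
    std::thread worker([] {
        while (!g_stop.load()) {
            g_ticks.fetch_add(1);
        }
    });

    struct sigaction sa {};
    sa.sa_handler = fake_crash_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, nullptr);

    raise(SIGUSR1); // SIGUSR1 stands in for SEGV; the handler runs on this thread
    g_stop = true;
    worker.join();
    return g_ticks_during_handler.load();
}
```

On a typical system the returned count is well above zero. That is the window in which a thread such as GIO's can still touch file descriptors.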
> 
> And that's the race right there. Between the file descriptors getting
> closed and the STOP arriving, the threads that aren't handling the
> signal continue to run and can access the now-closed file
> descriptors. In GIO's case it can try to read inotify events and run
> into an error (e.g. in ik_source_read_some_events) and g_error, which
> as far as I can tell will result in a TRAP because g_error almost
> always(?) ends in g_abort.
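The failure a still-running thread sees can be sketched in a few lines of C++, assuming POSIX. The helper `errno_after_reading_closed_fd` is hypothetical and uses a pipe rather than an inotify descriptor. It only shows that reading from a descriptor that was closed underneath the reader fails with EBADF, which is the kind of error a library may then escalate to an abort.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Hypothetical helper, not GIO or KCrash code: closes the read end of a
// fresh pipe behind the "reader's" back and reports the errno that the
// subsequent read() fails with (0 if the read unexpectedly succeeded).
int errno_after_reading_closed_fd()
{
    int fds[2];
    if (pipe(fds) != 0) {
        return errno; // could not even create the pipe
    }
    const int reader_fd = fds[0];
    close(reader_fd); // stands in for the handler closing all open FDs

    char buffer[16];
    errno = 0;
    const ssize_t n = read(reader_fd, buffer, sizeof buffer);
    const int saved_errno = (n == -1) ? errno : 0;

    close(fds[1]); // tidy up the write end
    return saved_errno;
}
```

In a real multi-threaded process the descriptor number could also be reused by another open() in the meantime, so EBADF is the benign case of this race.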
> 
> The solution is simply: we shouldn't close FDs before all threads are stopped.
> 
> Practically I can't think of a way to actually pull this off though.
> We'd need to close the FDs *at* STOP. But STOP like KILL cannot be
> handled.
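That STOP and KILL cannot be handled is easy to check. A small C++ sketch, assuming POSIX; `noop_handler` and `errno_from_installing_handler` are hypothetical names. It tries to install a handler with sigaction() and reports the resulting errno.

```cpp
#include <cassert>
#include <cerrno>
#include <signal.h>

// Handler that does nothing; only used to have something to install.
extern "C" void noop_handler(int) {}

// Returns the errno from trying to install a handler for `signo`,
// or 0 if the handler was installed.
int errno_from_installing_handler(int signo)
{
    struct sigaction sa {};
    sa.sa_handler = noop_handler;
    sigemptyset(&sa.sa_mask);
    errno = 0;
    return sigaction(signo, &sa, nullptr) == 0 ? 0 : errno;
}
```

For SIGSTOP and SIGKILL the call fails with EINVAL, while an ordinary signal such as SIGUSR2 is accepted, so there is no hook on which to hang "close the FDs at STOP".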
> 
> I think the actual solution here would need to be that kcrash stops
> invoking drkonqi and instead defers to a core handler through which
> drkonqi can get access to the core.
> Trouble is that there can only be one core handler and there are more
> software providers on a system than just us, so I guess this isn't
> really a viable solution :/
> Also the core stuff isn't too portable I think.
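For context on the "only one core handler" point: on Linux, core(5) describes a single system-wide setting in /proc/sys/kernel/core_pattern, and a value starting with `|` means cores are piped to a helper program. A rough C++ sketch of checking that, with hypothetical helper names and Linux assumed:

```cpp
#include <cassert>
#include <fstream>
#include <string>

// True if a core_pattern value pipes cores to a helper program,
// i.e. its first character is '|' as described in core(5).
bool pattern_pipes_to_handler(const std::string &pattern)
{
    return !pattern.empty() && pattern.front() == '|';
}

// Reads the system-wide pattern; returns an empty string if the file
// is unavailable (e.g. on a non-Linux system).
std::string read_core_pattern()
{
    std::ifstream file("/proc/sys/kernel/core_pattern");
    std::string pattern;
    std::getline(file, pattern);
    return pattern;
}
```

Since there is only one such value per system, whichever handler is configured there would have to be the one that passes the core on to drkonqi.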
> 
> I am fairly out of ideas :/

Tried looking at what breakpad does?

Cheers,
  Albert

P.S.: I've no idea if I'm saying something stupid, sorry if I am ^_^

> 
> [1] http://man7.org/linux/man-pages/man5/core.5.html
> 






More information about the Kde-frameworks-devel mailing list