Use abort() instead of exit() in xio error handler?

Thu Jun 4 03:23:48 BST 2009

On Monday 01 June 2009 22:02:04, Lubos Lunak wrote:
> On Monday 01 of June 2009, Christoph Feck wrote:
> > Hi,
> >
> > I am currently looking at https://bugs.kde.org/show_bug.cgi?id=194750 and
> > was wondering if it would make sense to change xio error handler to use
> > abort() instead of exit(), effectively bypassing the static cleanup
> > functions that some KDE components register. Those cleanup functions seem
> > to cause this and other weird crashes (just grep bko for "exit()" :)
> >
> > If X11 decides to assert (maybe because of the evil XCB bug), it should
> > not create backtraces that suggest completely unrelated bugs.
>
>  The XIO error handler is not an assert, it is called when the connection
> to the X server is reset (X crashes/terminates, there's a network problem
> when using remote X, etc.). That is not a bug, it is a situation that can
> normally happen, although there isn't really much to do about it, since if
> nothing else Xlib requires that the XIO handler terminates the application.

Sorry, I probably used the wrong word there, I was thinking of "signalling a 
bug condition" when I used the word "assert"...

> Still, it is not something exceptional and therefore I don't think it should
> lead to abort() (note that with some setups, that would make every
> Ctrl+Alt+Backspace create a bunch of core files). If there are global
> objects, they should be able to survive the XIO handler being called.

You are right about the core dump issue. My primary question was about "global 
objects should survive XIO errors". As I see it now, Qt does not "signal" the 
application that the connection to the display server is lost, but simply 
expects the application to handle that error itself.

I think glibc recently got a new low-level exit call that does not dump core 
but avoids calling the exit handlers. Maybe that is an option?
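
To illustrate the difference in question, here is a minimal sketch, assuming 
a POSIX system. It uses _exit(), which terminates without a core dump and 
without running atexit() handlers or static destructors. Whether the new 
glibc call meant above is quick_exit() is an assumption; the sketch sticks to 
_exit() because it is available everywhere. The names Termination, 
cleanup_marker() and cleanup_ran() are hypothetical and exist only for this 
demonstration. The child process registers a cleanup handler that reports, 
via its exit status, whether it was run.

```cpp
#include <cstdlib>      // std::exit, std::atexit
#include <stdexcept>    // std::runtime_error
#include <sys/types.h>  // pid_t
#include <sys/wait.h>   // waitpid, WIFEXITED, WEXITSTATUS
#include <unistd.h>     // fork, _exit

// Hypothetical names, for illustration only.
enum class Termination { Exit, UnderscoreExit };

// Stand-in for the static cleanup functions that components register.
// It ends the process with status 42 so the parent can tell that it ran.
static void cleanup_marker()
{
    _exit(42);
}

// Forks a child that registers cleanup_marker() and then terminates in the
// requested way. Returns true if the cleanup handler ran in the child.
bool cleanup_ran(Termination how)
{
    pid_t pid = fork();
    if (pid < 0)
        throw std::runtime_error("fork() failed");

    if (pid == 0) {
        std::atexit(cleanup_marker);
        if (how == Termination::Exit)
            std::exit(0);   // runs atexit handlers and static destructors
        _exit(0);           // skips them, and does not dump core
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        throw std::runtime_error("waitpid() failed");
    return WIFEXITED(status) && WEXITSTATUS(status) == 42;
}
```

With this, cleanup_ran(Termination::Exit) reports that the handler ran, while 
cleanup_ran(Termination::UnderscoreExit) reports that it was skipped. An XIO 
error handler installed with XSetIOErrorHandler() could presumably end the 
process with _exit() in the same way, at the cost of also skipping any 
cleanup that is actually wanted (flushing files, removing temporary data).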

>  In this specific bugreport, there's indeed the interesting detail that the
> report is from DrKonqi, so it doesn't look like X actually went away. The
> other assumption that appears to be wrong is that the problem has to do
> with global objects, as the backtrace with the XIO handler looks harmless -
> it is just sleeping. The crash is in thread #2, in what looks like glibc's
> free().
>
>  My guess is that there is a memory corruption, which would explain the crash,
> and it could also explain the XIO handler called (either it got somehow
> triggered by the corrupted heap, or maybe somebody played with thread
> safety and lost, IIRC that can lead to XIO handler called too). I expect
> the other similar bugreports are similar problems.
>
>
>  PS: What is "the evil XCB bug"? A vast majority of asserts in XCB are
> actually application bugs.

I was referring to https://bugs.kde.org/show_bug.cgi?id=178844 alias 
https://bugs.freedesktop.org/show_bug.cgi?id=14211 alias 
https://bugs.launchpad.net/xorg-server/%2Bbug/185311, but that seems to be 
fixed now, although the upstream bug report has not been closed.

I have seen some other backtraces that looked quite different, but all had the 
same problem: a crash appeared somewhere after exit() was called because of 
the XIO error, while global objects were still referencing stuff; see my 
comment #2 in bug 178844.

As far as I understand it, XCB had issues with improper locking, causing race 
conditions that eventually raised the XIO errors, but that may have been 
fixed now.

Bug 194750 could be unrelated, it just reminded me of this issue.

Christoph Feck (kdepepo)