[FreeNX-kNX] Endless loop in _XWaitForReadable when user's x server dies

Mario Becroft mb at gem.win.co.nz
Tue Jan 6 12:20:51 UTC 2009


I reported this bug on this list a while ago, but I now have more
information. I am running the latest NX 3.3.0 libraries, retrieved from
NoMachine's web site today.

The problem is very hard to reproduce. If you resume a session while the
Linux host that nxclient is running on is very short of memory, the X
server may be killed by the OOM killer when nxagent starts recreating
the windows and so on.

Often this causes no lasting problem (if you log in again, the session
resumes as expected), but sometimes it leaves nxagent hung in such a way
that the session can never be resumed.

When this happens, nxagent is stuck inside _XWaitForReadable() while
making an X protocol call, in the for loop that calls Select() with a
timeout. Select() returns periodically, but the loop just retries it
forever.

I haven't quite figured out what this code is meant to do. Clearly it is
meant to loop repeatedly under some conditions.

In this case, when Select() returns, result is 0 (the timeout expired)
and _NXDisplayErrorFunction is not NULL. The call to
_NXDisplayErrorFunction returns false, which is why the loop continues
rather than returning -1.

--8<---------------cut here---------------start------------->8---
        if (result <= 0) {
            if ((result == -1 && !ECHECK(EINTR)) ||
                    (_NXDisplayErrorFunction != NULL &&
                        (*_NXDisplayErrorFunction)(dpy, _XGetIOError(dpy)))) {
                _XIOError(dpy);
                return -1;
            }
            continue;
        }
--8<---------------cut here---------------end--------------->8---

It can get to this point through various call paths. I have seen it
happen inside XCreateWindow() and inside XQueryExtension(), for example,
both ultimately called from within nxagentReconnectAllWindows().

_NXDisplayErrorFunction was introduced in nxagent-2.0.0-14, according to
the CHANGELOG. It is set by calling NXSetDisplayErrorPredicate(), which
is done in nxagent/Display.c. There is a rather dense comment explaining
this:

  /*
   * Let Xlib become aware of our interrupts. In theory
   * we don't need to have the error handler installed
   * during the normal operations and could simply let
   * the dispatcher handle the interrupts. In practice
   * it's better to have Xlib invalidating the display
   * as soon as possible rather than incurring in the
   * risk of entering a loop that doesn't care checking
   * the display errors explicitly.
   */

Unfortunately I am none the wiser after reading that and looking at the
nxagentDisplayErrorPredicate() function itself.

It seems to me that if the X server goes away, it would be much better
if, rather than hanging, nxagent either noticed immediately and resumed
accepting new connections, or at least timed out after a reasonable
length of time.

Surely someone at NoMachine must have some idea of what is going on. I
really could do with some help here, please!

Stack trace showing an example of how it came to be inside
_XWaitForReadable():

--8<---------------cut here---------------start------------->8---
(gdb) bt
#0  0x00007f45168052e3 in __select_nocancel () from /lib64/libc.so.6
#1  0x00007f4517b05fb8 in NXTransSelect () from /usr/NX/lib/libXcomp.so.3
#2  0x00007f451849d81e in _XSelect (maxfds=<value optimized out>,
    readfds=0x7fff20dfae10, writefds=0x7fff20dfac90,
    exceptfds=<value optimized out>, timeout=0x7f451874ab30) at XlibInt.c:333
#3  0x00007f451849dbd1 in _XWaitForReadable (dpy=0x3d4a8f0) at XlibInt.c:791
#4  0x00007f451849e0fc in _XRead (dpy=0x3d4a8f0, data=0x7fff20dfaf60 "\226",
    size=32) at XlibInt.c:1510
#5  0x00007f451849eeff in _XReply (dpy=0x3d4a8f0, rep=0x7fff20dfaf60, extra=0,
    discard=1) at XlibInt.c:2276
#6  0x00007f4518493092 in XQueryExtension (dpy=0x3d4a8f0,
    name=0x7f451875bfff "SHAPE", major_opcode=0x7fff20dfafc4,
    first_event=0x7fff20dfafc8, first_error=0x7fff20dfafcc) at QuExt.c:51
#7  0x00007f4518488774 in XInitExtension (dpy=0x26, name=0x7fff20dfae10 "")
    at InitExt.c:49
#8  0x00007f451874f65e in XextAddDisplay (extinfo=0x7f451895ea10,
    dpy=0x3d4a8f0, ext_name=0x7f451875bfff "SHAPE", hooks=0x7f451895e500,
    nevents=1, data=0x0) at extutil.c:108
#9  0x00007f4518752be2 in XShapeCombineRegion (dpy=0x3d4a8f0, dest=14681853,
    destKind=0, xOff=0, yOff=0, r=0x5fc5e90, op=0) at XShape.c:74
#10 0x00000000004936fc in nxagentShapeWindow (pWin=0xe1e560) at Window.c:2236
#11 0x0000000000494290 in nxagentReconfigureWindow (
    param0=<value optimized out>, param1=<value optimized out>,
    data_buffer=<value optimized out>) at Window.c:3029
#12 0x0000000000494c3c in nxagentTraverseWindow (pWin=0xe1e560,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2709
#13 0x0000000000494c90 in nxagentTraverseWindow (pWin=0x176b8e0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2718
#14 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1a9e570,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#15 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1ba2090,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#16 0x0000000000494d8c in nxagentTraverseWindow (pWin=0x1bbd7c0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#17 0x0000000000494c90 in nxagentTraverseWindow (pWin=0x15422e0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2718
#18 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x390f760,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#19 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1faaf50,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#20 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x21a4ba0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#21 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x39dffa0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#22 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x447d040,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#23 0x0000000000494c7c in nxagentTraverseWindow (pWin=0xfd0f50,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#24 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1704290,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#25 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x17ecea0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#26 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x18123c0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#27 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1871330,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#28 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1856c10,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#29 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x18fa8e0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#30 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x18f6a70,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#31 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x36c1f90,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#32 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1596a50,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#33 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x158ee90,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#34 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x19a8610,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#35 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x142d1f0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#36 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1c653a0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#37 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1f4d770,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#38 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x19335d0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#39 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1933f30,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#40 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x36a4c30,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#41 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1f03ec0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#42 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x38c4890,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#43 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x35c4750,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#44 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x38dde40,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#45 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1fb69c0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#46 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x36611e0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#47 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1468e70,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#48 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x194a920,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#49 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1583b30,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#50 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x150e650,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#51 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x2626070,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#52 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x26388e0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#53 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x2638ad0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#54 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x2623e00,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#55 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x243a560,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#56 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1a4d240,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#57 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x377d030,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#58 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1295460,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#59 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x2207680,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#60 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1a4d060,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#61 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x38cbb60,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#62 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x243a2e0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#63 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x2599dd0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#64 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x2237c20,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#65 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x243a060,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#66 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x228cb20,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#67 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x243b120,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#68 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x1a70ec0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#69 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x2117c20,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#70 0x0000000000494c7c in nxagentTraverseWindow (pWin=0xedcf00,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#71 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x3906cb0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#72 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x19d50c0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#73 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x23c48d0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#74 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x3847790,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#75 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x39cf6c0,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#76 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x18ed340,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#77 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x3d2ab60,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#78 0x0000000000494c7c in nxagentTraverseWindow (pWin=0x3191b60,
    pF=0x4941f0 <nxagentReconfigureWindow>, p=0x7fff20dfbde4) at Window.c:2713
#79 0x0000000000494fd5 in nxagentReconnectAllWindows (p0=<value optimized out>)
    at Window.c:2718
#80 0x00000000004a0239 in nxagentReconnectSession () at Reconnect.c:505
#81 0x00000000004a04f0 in nxagentHandleConnectionChanges () at Reconnect.c:782
#82 0x00000000004a067f in nxagentHandleConnectionStates () at Reconnect.c:189
#83 0x0000000000483055 in nxagentWakeupHandler (data=0x26, count=0,
    mask=0xaf2140) at Handlers.c:565
#84 0x000000000044b47e in WakeupHandler (result=0, pReadmask=0xaf2140)
    at dixutils.c:472
#85 0x0000000000456f95 in WaitForSomething (pClientsReady=0x7fff20dfc140)
    at WaitFor.c:389
#86 0x0000000000427071 in Dispatch () at X/NXdispatch.c:610
#87 0x000000000045043c in main (argc=13, argv=0x7fff20dfc7a8,
    envp=<value optimized out>) at main.c:450
--8<---------------cut here---------------end--------------->8---

-- 
Mario Becroft <mb at gem.win.co.nz>


