[FreeNX-kNX] nxagent hang in XCreateWindow when resuming session

Mario Becroft mb at gem.win.co.nz
Thu Oct 16 05:29:30 UTC 2008


Today I tried to resume an NX session, but something went wrong and the
machine on which I was running nxclient hung. Never mind what caused
that, but afterwards I could not log in to my session at all (from any
host), even though nxagent was still running.

Attaching to the running nxagent with gdb, I got the following stack
trace:

#0  0x00002b8330bd22e3 in __select_nocancel () from /lib64/libc.so.6
#1  0x00002b833011cb47 in NXTransSelect () from /usr/lib/NX/lib/libXcomp.so.3
#2  0x00002b832f9795ce in _XSelect () from /usr/lib/NX/lib/libX11.so.6
#3  0x00002b832f97a43f in _XRead32 () from /usr/lib/NX/lib/libX11.so.6
#4  0x00002b832f97ab31 in _XSend () from /usr/lib/NX/lib/libX11.so.6
#5  0x00002b832f97739c in XCreateWindow () from /usr/lib/NX/lib/libX11.so.6
#6  0x0000000000490d69 in _XdmcpAuthDoIt ()
...

Unfortunately, gdb did not correctly resolve the names of functions
within nxagent itself, only within the libraries (so the name shown in
frame #6 is presumably wrong), but clearly it is waiting for a response
inside XCreateWindow.

strace showed that select() was periodically timing out, some data was
being written out, and then select was being called again (excerpt from
strace output included below for reference).

XCreateWindow is called from a few places in nxagent. My guess is that
this call came from nxagentOpenScreen(), where nxagent first tries to
open a window on the NX client.

Presumably when the nx client machine hung, it did not respond to the
XCreateWindow request, but for some reason instead of giving up, nxagent
(or libX11 or libXcomp) just kept trying again and again.
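
To illustrate the difference between retrying forever and giving up,
here is a minimal Python sketch. It is not the libX11 or libXcomp code;
the function name wait_for_reply and the timeout values are hypothetical.
It waits on a socket with select() in short slices, much like the
5-second timeouts seen in the strace output, but enforces an overall
deadline so that a silent peer eventually produces an error instead of
an endless loop.

```python
import select
import socket
import time


def wait_for_reply(sock, slice_timeout=5.0, overall_deadline=30.0):
    """Wait for data on sock, giving up after overall_deadline seconds.

    Hypothetical helper for illustration only; not part of libX11 or
    libXcomp.
    """
    give_up_at = time.monotonic() + overall_deadline
    while True:
        remaining = give_up_at - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("peer did not reply before the deadline")
        # Wait at most one slice, but never past the overall deadline.
        readable, _, _ = select.select(
            [sock], [], [], min(slice_timeout, remaining)
        )
        if readable:
            return sock.recv(65536)
        # The slice timed out: loop again until the deadline expires.


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"reply")
    print(wait_for_reply(a, slice_timeout=0.1, overall_deadline=0.5))
    try:
        # Nothing more is sent, so this call should give up.
        wait_for_reply(a, slice_timeout=0.1, overall_deadline=0.3)
    except TimeoutError as exc:
        print("gave up:", exc)
```

Without the overall deadline check, the loop would behave like the
hung nxagent: each select() times out and the wait simply starts again.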

Looking at the code in XlibInt.c, the NX-related changes seem pretty
dense to me. I don't know if there is a bug in the NX-related code which
prevented the call from just failing or timing out in some way.

Maybe someone from NoMachine has an idea about what could cause this?

Excerpt from strace output; the following sequence repeated again and
again:

--8<---------------cut here---------------start------------->8---
gettimeofday({1224129443, 202637}, NULL) = 0
gettimeofday({1224129443, 202800}, NULL) = 0
gettimeofday({1224129443, 202922}, NULL) = 0
ioctl(83, FIONREAD, [0])                = 0
gettimeofday({1224129443, 203106}, NULL) = 0
ioctl(83, FIONREAD, [0])                = 0
gettimeofday({1224129443, 203346}, NULL) = 0
select(84, [83], [], NULL, {5, 0})      = 0 (Timeout)
gettimeofday({1224129448, 201281}, NULL) = 0
gettimeofday({1224129448, 201476}, NULL) = 0
ioctl(83, FIONREAD, [0])                = 0
gettimeofday({1224129448, 201907}, NULL) = 0
gettimeofday({1224129448, 202051}, NULL) = 0
write(83, "\2\"\0\0\0\0\377\377", 8)    = 8
gettimeofday({1224129448, 202380}, NULL) = 0
gettimeofday({1224129448, 202474}, NULL) = 0
gettimeofday({1224129448, 202569}, NULL) = 0
gettimeofday({1224129448, 202686}, NULL) = 0
ioctl(83, FIONREAD, [8])                = 0
ioctl(83, FIONREAD, [8])                = 0
read(83, "\2\"\0\0\0\0\377\377", 65536) = 8
gettimeofday({1224129448, 203039}, NULL) = 0
gettimeofday({1224129448, 203151}, NULL) = 0
gettimeofday({1224129448, 203245}, NULL) = 0
ioctl(83, FIONREAD, [0])                = 0
gettimeofday({1224129448, 203533}, NULL) = 0
ioctl(83, FIONREAD, [0])                = 0
gettimeofday({1224129448, 203813}, NULL) = 0
select(84, [83], [], NULL, {5, 0})      = 0 (Timeout)
gettimeofday({1224129453, 201534}, NULL) = 0
gettimeofday({1224129453, 201694}, NULL) = 0
ioctl(83, FIONREAD, [0])                = 0
gettimeofday({1224129453, 201942}, NULL) = 0
gettimeofday({1224129453, 202093}, NULL) = 0
write(83, "\2\"\0\0\0\0\377\377", 8)    = 8
gettimeofday({1224129453, 202365}, NULL) = 0
gettimeofday({1224129453, 202497}, NULL) = 0
gettimeofday({1224129453, 202619}, NULL) = 0
gettimeofday({1224129453, 202750}, NULL) = 0
ioctl(83, FIONREAD, [8])                = 0
ioctl(83, FIONREAD, [8])                = 0
read(83, "\2\"\0\0\0\0\377\377", 65536) = 8
gettimeofday({1224129453, 203230}, NULL) = 0
gettimeofday({1224129453, 203360}, NULL) = 0
gettimeofday({1224129453, 203484}, NULL) = 0
ioctl(83, FIONREAD, [0])                = 0
gettimeofday({1224129453, 203735}, NULL) = 0
ioctl(83, FIONREAD, [0])                = 0
gettimeofday({1224129453, 203967}, NULL) = 0
--8<---------------cut here---------------end--------------->8---
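
The payload and the timing in the excerpt can be checked with a few
lines of Python. This is only a sketch over values copied from the
strace lines above (the helper name to_usec is hypothetical). It shows
the 8-byte buffer in hex and measures the gap between the two write()
calls, using the gettimeofday() result immediately before each one. It
makes no claim about what the 8 bytes mean in the NX protocol.

```python
# The 8-byte buffer as strace printed it: "\2\"\0\0\0\0\377\377"
payload = b"\x02\"\x00\x00\x00\x00\xff\xff"
print(len(payload))       # 8
print(payload.hex(" "))   # 02 22 00 00 00 00 ff ff


def to_usec(sec, usec):
    """Convert a gettimeofday() pair to integer microseconds."""
    return sec * 1_000_000 + usec


# gettimeofday() values taken just before each write(83, ...) call.
first_write = to_usec(1224129448, 202051)
second_write = to_usec(1224129453, 202093)

gap_usec = second_write - first_write
print(gap_usec)           # 5000042
```

The gap of roughly 5 seconds matches the {5, 0} timeout passed to
select(), so each iteration of the loop appears to be driven by that
timeout expiring.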

Periodically (I think every 10 to 20 seconds) there was a call to stat:

stat("/mnt/home/mb/.nx/C-nxhost-1032-83D2284640F84823CC32063BC1FDD97F/errors",
{st_mode=S_IFREG|0600, st_size=139, ...}) = 0

Note that nothing was being written to the errors file at any time, and
there were no other useful messages in any of the logs.

Perhaps someone familiar with the code can understand from this what was
going on.

-- 
Mario Becroft <mb at gem.win.co.nz>


