[FreeNX-kNX] nxagent session gets lost, user gets new session even though one already exists

Mario Becroft mb at gem.win.co.nz
Sat Jan 24 04:08:30 UTC 2009


I have been working on this problem today.

It is a bit intermittent, but at the moment I can reproduce it reliably
from a Windows XP client by starting a session, then killing nxssh. The
problem does not seem to have anything to do with the client; it just
happens to be triggered by certain clients. It manifests in two main
ways. The
first is where the session immediately goes to closed state, even though
nxagent is still running (in suspended state). The second is where the
session remains in running state, even though it is now suspended. This
leads to a secondary problem next time you try to suspend and resume the
session. The end result of both cases is that when you try to resume the
session, you instead get a new session and your old nxagent is
orphaned. (If you edit the session database files in /usr/NX/var/db to
reinstate the orphaned nxagent, it continues to work fine.)

The problem is quite mind-bending.

The issue perhaps has to do with how nxserver invokes nxnode and
monitors the output from nxnode.

In the fault condition, the output from nxnode is either somehow
garbled, or not interpreted correctly. So far it does not make much
sense. Here is an excerpt from nxserver.log from when I kill the nxssh
process on the Windows XP client:

--8<---------------cut here---------------start------------->8---
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) Info: Closing connection to slave with pid 6221.
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) nxnode_reader: 1001 Bye.
1001 Bye.
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) nxnode_reader: NX> 596 Error: Session  failed. Reason was: Session: Display failure detected at 'Sat Jan 24 16:28:03 2009'.
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) AAAA cmd=(1001 Bye.)
NX> 596 Error: Session  failed. Reason was: Session: Display failure detected at 'Sat Jan 24 16:28:03 2009'.
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) nxnode_reader: NX> 1009 Session status: suspending
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) nxnode_reader: NX> 1005 Session status: suspended
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) AAAA method is su, done nxnode_login
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) AAAA end of server_nxnode_start
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) AAAA server_nxnode_start_wait about to close session
(/usr/NX/bin/nxserver)(6076)(Sat Jan 24 16:28:03 NZDT 2009) Info: Closing connection to slave with pid 6221.
--8<---------------cut here---------------end--------------->8---

The parenthesised prefix on each line and the lines with AAAA are my
instrumentation. (The prefix is "($0)($$)($(date))".)
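
For anyone wanting to add similar instrumentation, here is a minimal
sketch of a helper that produces the same prefix. The names debug_log
and NX_DEBUG_LOG are placeholders made up for illustration and are not
part of FreeNX; only the prefix format comes from the log above.

```shell
#!/bin/bash
# Hypothetical helper: debug_log and NX_DEBUG_LOG are illustrative names,
# not FreeNX functions or settings.
debug_log() {
    # $0 = script path, $$ = PID of the script's shell, $(date) = current time
    if [ -n "$NX_DEBUG_LOG" ]; then
        # Append to the file named by NX_DEBUG_LOG when it is set
        printf '%s\n' "($0)($$)($(date)) $*" >> "$NX_DEBUG_LOG"
    else
        # Otherwise write to stderr so stdout stays untouched
        printf '%s\n' "($0)($$)($(date)) $*" >&2
    fi
}
```

A call such as `debug_log "AAAA end of server_nxnode_start"` then yields
a line in the same shape as the AAAA lines in the excerpt.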

I am still not 100% sure how nxnode works, but a few things don't make
any sense here.

Looking at the "nxnode_reader:" messages, we read "1001 Bye" yet nxnode
never emits that message; the closest thing it emits is "NX> 1001
Bye". Apparently the first four characters are getting lost somehow.
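
One way to catch such lines as they arrive is to test each line for the
"NX> " prefix before acting on it. The following is only a sketch: the
function names has_nx_prefix and report_line are invented for
illustration, and it assumes the reader already has each line in a shell
variable.

```shell
#!/bin/bash
# Hypothetical check: has_nx_prefix and report_line are illustrative names,
# not FreeNX functions.
has_nx_prefix() {
    # Succeeds only if the line starts with the four characters "NX> "
    case "$1" in
        "NX> "*) return 0 ;;
        *)       return 1 ;;
    esac
}

report_line() {
    # Print a verdict followed by the line in square brackets, so that
    # leading or trailing whitespace is easy to see
    if has_nx_prefix "$1"; then
        printf 'ok: [%s]\n' "$1"
    else
        printf 'missing prefix: [%s]\n' "$1"
    fi
}
```

Called on "1001 Bye." this reports a missing prefix, while
"NX> 1001 Bye." passes.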

Then we receive two more lines from nxnode (suspending and suspended)
*after* the "1001 Bye", which should be impossible because "1001 Bye" is
emitted at the end of nxnode right before it exits.
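
All of the log lines above carry the same one-second timestamp, so the
timestamps cannot show in which order the lines were actually written.
A possible way to check is to number the lines on the writing side and
compare the numbers with the order in which the reader logs them. This
is a sketch only: number_lines is an invented name, and where such a
filter would be hooked into nxnode is left open.

```shell
#!/bin/bash
# Hypothetical filter: number_lines is an illustrative name, not part of FreeNX.
# The body runs in a subshell, so the variables n and line stay private.
number_lines() (
    n=0
    # Read stdin line by line without trimming whitespace or
    # interpreting backslashes
    while IFS= read -r line; do
        n=$((n + 1))
        # Prefix each line with a zero-padded sequence number
        printf '%04d %s\n' "$n" "$line"
    done
)
```

If the numbers show up out of sequence in the reader's log, the lines
were reordered after being written; if they are in sequence, the writer
really did emit them in that order.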

Then the nxnode process invoked by nxnode_login() exits (or at least our
end of the pipe closes).

I am using su authentication, which may have some bearing on this. For
others with this problem: what authentication mode are you using?

-- 
Mario Becroft <mb at gem.win.co.nz>


