[FreeNX-kNX] Sanity check...
Ed Warnicke
eaw at cisco.com
Sun Apr 24 18:46:33 UTC 2005
On Sunday 24 April 2005 09:40 am, Fabian Franz wrote:
> On Saturday, 23 April 2005 23:04, Ed Warnicke wrote:
> > Wouldn't it be a better solution to move the:
> >
> > NX> 710 Session status: running
> > into the monitor if clause:
> >
> > if stringinstring "Info: Waiting for connection from" "$line"
> >
> > That way you would never get the nxclient sending
> > "bye" before the nxagent was actually up and awaiting
> > connections? This *should* also take care of the spurious
> > "NX>" problem as well.
>
> I'm sorry to disappoint you, but I tested it out. The timing of the 710
> opcode has _no_ effect at all.
>
> The bye command is just sent after the "Commit, running" 1002 and 1006
> messages.
Hmm... that is quite strange... I have found that if I omit the 710 opcode
then the client never initiates the 'bye' process, even if the 1002 and 1006
opcodes are sent. However, if I *have* the 710 opcode and the 1002 and
1006 aren't sent then the client initiates the 'bye' process. Are you sure
the opcode 710 is actually being sent to the client (i.e. stdout hasn't
been reassigned in some way that prevents it from getting to the client)?
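
To make the proposal quoted above concrete, here is a minimal sketch of
emitting the 710 line from inside the monitor clause. It is only an
illustration: the stringinstring below is an assumed stand-in, not the real
nxnode helper, and monitor_agent is a hypothetical name for the loop that
reads the nxagent output.

```shell
# Assumed stand-in for nxnode's stringinstring helper:
# succeeds when $1 occurs anywhere inside $2.
stringinstring() {
    case "$2" in
        *"$1"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical monitor loop: reads nxagent output lines on stdin and
# prints the 710 status line only once the agent says it is waiting
# for connections.
monitor_agent() {
    while IFS= read -r line; do
        if stringinstring "Info: Waiting for connection from" "$line"; then
            echo "NX> 710 Session status: running"
            break
        fi
    done
}
```

With this ordering the 710 line cannot reach stdout before the agent has
reported that it is waiting for connections.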
>
> So why does it not have any effect and still help? I feel quite puzzled
> about that.
>
> Let's go over the facts again:
>
> - netcat does sometimes die with a connection refused. -> This seems to be
> a race condition.
Yes
>
> - sleep 3; netcat _helps_ to prevent the race condition
Yes
>
> - NX Client does send the bye command just after the 1002 Commit, 1006
> running messages. (This is OK, as the session could still fail in between)
Actually, in my original (unaltered) version the 1002 and 1006 opcodes were often
sent after the initial 'bye' sequence, leading to a second (with different
opcodes) 'bye' sequence in the non-ssl case and errors in the ssl case.
I see sending the 'bye' command after the 710 opcode in all cases.
>
> - nxagent does send the "Waiting for Connection from" just after it's ready
> and this triggers the 1002 Commit and 1006 running messages. -> There
> should be no race condition.
I would agree, except that the 710 opcode is triggering the bye sequence :)
(FYI I'm using the 1.4.0-91 version of nxclient).
>
> =>
>
> I think at the moment we do not have any other choice than to try to send
> the netcat command again and again (3 times max as we then also have waited
> 3 secs) until it works.
I think there is still some misunderstanding (either yours or mine) about
which opcodes trigger the bye command. I suspect that we can resolve
that root issue and solve the problem without the netcat retries.
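
For reference while testing, the retry workaround described above could be
sketched roughly as follows. This is not the code going into CVS; retry is a
hypothetical helper, and the host and port in the usage line are
placeholders.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2 seconds
# after each failed attempt. Returns 0 on the first success, 1 if every
# attempt failed.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$attempts" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Usage (placeholder host and port variable, not taken from nxnode):
# retry 3 1 nc localhost "$AGENT_PORT"
```

Note that this retries on any non-zero exit, so it only makes sense while
the failure is an immediate "connection refused" and not a relay that
connected and later broke.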
>
> I'll add that workaround to CVS. Please test it.
>
> I also will add some more sanity checks in nxnode in general as I now
> finally know how to get exit codes from wait (duh!).
>
> -> You propose starting netcat first and then sending the bye-command. What
> would that help?
My thinking was this: nxclient expects nxagent to take over the pipe after
the completion of the bye sequence. By sending the 'bye' from nxserver
we are indicating that we have nxagent all hooked up. My thought
was that we want to make sure we actually *have* nxagent all hooked
up (which includes the connection of our netcat) before we tell
the client we do. It is conceivable (although extremely unlikely) that
in the space between the 'bye' we send and successfully hooking up netcat
the nxclient might send something to the nxagent that would then not be
delivered. I see it as a potential race condition (albeit a VERY unlikely
one). So my thinking was, if we can preclude it easily (by waiting to send
'bye' from the server until we have the netcat hooked up), we should.
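
A rough sketch of that ordering, under loud assumptions: send_bye and
relay_connected are placeholders (the real checks would live in
nxserver/nxnode), and RELAY_MARKER and POLL_DELAY are made-up variables used
only to keep the example self-contained.

```shell
# Placeholder: stands in for whatever the server does to send 'bye'.
send_bye() {
    echo "bye"
}

# Placeholder probe: succeeds once the netcat relay to nxagent is up.
# Here it only checks for a marker file that the relay step would create.
relay_connected() {
    [ -e "$RELAY_MARKER" ]
}

# Send 'bye' only after the relay is confirmed; give up after $1 polls.
bye_when_connected() {
    tries=$1
    while [ "$tries" -gt 0 ]; do
        if relay_connected; then
            send_bye
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_DELAY:-1}"
    done
    return 1
}
```

The point is only the order of operations: the client is never told that
nxagent is hooked up until the probe has succeeded.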
Ed