[FreeNX-kNX] Fw: More Trouble with NX 4 Client -> FreeNX

Nick Couchman Nick.Couchman at seakr.com
Tue Oct 1 19:04:26 UTC 2013


> 
> I'm not sure where you got all this "trace" from, but as a start you really need to compare a failed session with a successful session to 
> see what is differing between the two "error" files

The trace comes from the nx library code - at this point, more specifically, nxcomp.  If you look toward the top of the source files in the nxcomp directory, there are lines like this:
//
// Set the verbosity level.
//

#define PANIC
#define WARNING
#undef  TEST
#undef  DEBUG
#undef DUMP

Throughout the source code, you'll see references to "#ifdef TEST" and then some debug code.  Changing these lines to #define TEST, for example, allows that debug code to run, which you gives you more output.  The unfortunate part is that these are all individually defined in the files, which means that you have to change the #undef to #define in each of the files you want to trace.  I concentrated on Loop.cpp, first, but later (after sending my original e-mail), used a "sed" command to go through and change it across the board to be lots more output.  I'm still not very good at debugging it, but I figure more info is better :-).

> 
> line 376 looks like it might be "interesting" tho . . .
> 
>         Loop: WARNING! Executing the select with requested timeout of 0 S 
> and 0 Ms
> 
> 
> Well, a 0 timeout mostly means
> 
>         "Do not time out, keep waiting"
> 
> but
> this may mean
> 
>         "if the 'select' ( whatever that means here) doesn't work first 
> time, then give up without waiting for another attempt"
> 
> looking a bit further
> 
> followed by line 378
> 
>         Loop: Select result is [0] error is [0] at 19:27:08:773.178 after 
> 30 Ms.
> 
> Hmmmm
> 
> 
> 
> and lines 380 and 381
> 
>         Loop: Out of select after 33 Ms at 19:27:08:776.057 with result 0.
>         Loop: Closing multimedia listener in process with pid '2005'.
> 
> 
> and maybe 386
> 
>         Loop: Closing agent FD in process with pid '2005'.
> 
> 
> 388
> 
>         Loop: Freeing up proxy in process with pid '2005'.
> 
> 
> 404  - 407
> 
> 
>         Loop: Going to flush any data to the proxy.
>         Loop: WARNING! Flushing the proxy link on behalf of the agent.
>         Loop: Reporting a flush request at 19:27:08:791.442 with 0 bytes 
> written.
>         Loop: Reporting a flush request at 19:27:08:791.659 with 62 bytes 
> written.
> 
> 
> 410
> 
>         Loop: Freeing up agent in process with pid '2005'
> 
> 
> 413
> 
>         NXTransClose: Closing down all the X connections.
> 
> 
> 415
> 
> 
>         Loop: Closing any remaining X connections.
> 
> 
> 419
> 
> 
>         Loop: Closing any remaining listener.
> 
> 
> 433  - 436
> 
>         NXTransDestroy: Closing down all the X connections.
>         Loop: Closing any remaining X connections.
>         Loop: Closing any remaining listener.
>         NXTransDestroy: Waiting for the NX transport to terminate.
> 
> 
> So . . . .
> 
> Do you see the same sort of things appearing for a successful connection 
> ????
> 
> 

I'll have to check one of the success sessions, but I actually think most of that is "normal."  I'll go back and look closer.  The lines that are standing out to me start around line 645:

    645 NXTransExecute: Going to execute I/O on the NX descriptors.
    646 Loop: Setting proxy and local agent descriptors.
    647 Loop: Values were resultFDs 1 errorFDs 0 setFDs 10.
    648 Loop: Setting proxy descriptor FD#9 as ready to read.
    649 Loop: Values are now resultFDs 1 errorFDs 0 setFDs 10.
    650 Loop: Mark - 1 - at 19:27:13:865.859 with 0 Ms elapsed.
    651 Loop: Going to rotate channels for proxy FD#9.
    652 Loop: Going to check the writable descriptors.
    653 Loop: Mark - 2 - at 19:27:13:865.970 with 0 Ms elapsed.
    654 Loop: Going to check the readable descriptors.
    655 Loop: Failure reading from descriptors for proxy FD#9.
    656 Loop: PANIC! No shutdown of proxy link performed by remote proxy.
    657 Loop: Closing proxy FD in process with pid '1958'.
    658 Loop: Closing agent FD in process with pid '1958'.
    659 Loop: Bytes received so far are 177.
    660 Loop: Going to clean up system resources in process '1958'.

You see this loop with the "Mark - #" entries occur several times, but during the failures, you get this PANIC message, preceded by the "Failure reading from descriptors for proxy FD#9."  With even more debug output (different than the one I sent before) you see:

   1008 Loop: Mark - 2 - at 16:49:35:110.529 with 1 Ms elapsed.
   1009 Loop: Going to check the readable descriptors.
   1010 Proxy: Checking descriptors selected for read.
   1011 Proxy: Looping with current channel 1.
   1012 Proxy: Going to read messages from proxy FD#9.
   1013 Proxy: Decoding data from proxy FD#9.
   1014 Socket: Returning 0 bytes readable on FD#9.
   1015 Socket: Returning 0 bytes readable on FD#9.
   1016 ReadBuffer: Requested 0 bytes for FD#9 with readable 0 remaining 0 pending 0.
   1017 ReadBuffer: Buffer size is 65536 length 0 and start 3.
   1018 ReadBuffer: Buffer size is now 65536 length is 0 and start is 0.
   1019 ReadBuffer: Going to read 65536 bytes from FD#9.
   1020 Transport: Going to read 65536 bytes from FD#9.
   1021 Timestamp: Updating the current timestamp at 16:49:35:110.954.
   1022 Transport: No data read from FD#9.
   1023 Transport: Resetting buffer for FD#9 with size 16384 and capacity 16384.
   1024 Transport: Resetting buffer for FD#9 with size 16384 and capacity 16384.
   1025 ReadBuffer: Error detected reading from FD#9.
   1026 Proxy: Read result on proxy FD#9 is -1.
   1027 Proxy: Failure reading from proxy FD#9.
   1028 Loop: Failure reading from descriptors for proxy FD#9.
   1029 Loop: PANIC! No shutdown of proxy link performed by remote proxy.
   1030 Loop: Closing proxy FD in process with pid '3546'.
   1031 Loop: Closing agent FD in process with pid '3546'.
   1032 Loop: Bytes received so far are 3.
   1033 Loop: Going to clean up system resources in process '3546'.

What's interesting to me here is the Socket and ReadBuffer debug output - Socket says (line 1014) returning 0 bytes readable and ReadBuffer says it's requesting 0 bytes (1016), length 0 (1017 & 1018), but then it's going to try to read 65536 bytes (1019) when it's just said the length is 0?!  Then Transport tries to read 65536 bytes, but fails, saying No data read, then resets the buffer, then ReadBuffer detects an error, which causes Proxy to error out, which causes the Loop to error and start closing down.

My unfamiliarity with this code means that I could be chasing something completely unrelated, but my knowledge of software tells me that trying to read 65536 bytes from a buffer that has 0 bytes in it causes problems, and perhaps NX4 clients are causing the code to hit a corner case where it's reading data that shouldn't be there and this condition is unhandled in the code?

Thanks - Nick


--------
This e-mail may contain SEAKR Engineering (SEAKR) Confidential and Proprietary Information.  If this message is not intended for you, you are strictly prohibited from using this message, its contents or attachments in any way.  If you have received this message in error, please delete the message from your mailbox.  This e-mail may contain export-controlled material and should be handled accordingly.



More information about the FreeNX-kNX mailing list