[FreeNX-kNX] NX performance issue

Tue Mar 20 07:39:39 UTC 2007

> terminal1 $ LD_LIBRARY_PATH=nxcomp/ 
> DISPLAY="nx/nx,link=lan,type=unix-raw,pack=no-pack:2" 
> nxproxy/nxproxy -C :2
> 
> terminal 2 $ LD_LIBRARY_PATH=/usr/NX/lib /usr/NX/bin/nxproxy 
> -S localhost:2
> 
> terminal 3 $ DISPLAY=127.0.0.1:2 vglrun -c 0 -dl +v glxgears
> 
> (Here was something that is strange. Even with -c 1 there was 
> lots of traffic on NX channel and bitrate was going up to 
> around 7 MB/s ...)

I get about 5 fps with a 1280x1024 display using this mode.  That is still
quite slow, considering it should simply be moving the images through shared
memory or a local socket.

> And now how to measure bitrate with nxcomp:
> 
> apply the following patch and you'll get in errors file the 
> following logs:

Bitrate is very consistently about 25 million bytes/sec:

Loop: Bitrate is 25792295 B/s and 22160022 B/s in 5/30 seconds timeframes.
Loop: CPU time is 52844 Ms in select and 18355 in loop.

This is consistent with ~5 fps * 1280*1024 pixels/frame * 4 bytes/pixel,
which works out to about 26 million bytes/sec.
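
To make that arithmetic easy to re-check, here is a small Python sketch.  The
variable names are mine, and it assumes 32-bit pixels go over the wire
uncompressed; the numbers are the ones quoted above.

```python
# Sanity check: does ~5 fps of uncompressed 1280x1024 frames explain ~25 MB/s?
WIDTH, HEIGHT = 1280, 1024      # display size used in this test
BYTES_PER_PIXEL = 4             # assumption: 32-bit pixels, no compression
MEASURED_BPS = 25792295         # 5-second bitrate from the nxcomp log above

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
expected_bps_at_5fps = 5 * bytes_per_frame
implied_fps = MEASURED_BPS / bytes_per_frame

print(f"bytes per frame:    {bytes_per_frame}")
print(f"expected at 5 fps:  {expected_bps_at_5fps} B/s")
print(f"fps implied by log: {implied_fps:.2f}")
```

The frame rate implied by the logged bitrate comes out just under 5 fps,
which matches what the application reports.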

CPU usage would be about 42% based on the above, if I calculated correctly.
That seems rather high for only 5 fps of output and no compression.  By
comparison, VirtualGL *with* compression could generate 10 fps of output
with that same level of CPU usage.

> Can you try to set the token size + limit higher:
> 
> int SetLinkLan()
> {
> [...]
>   control -> TokenSize  = 1536;
>   control -> TokenLimit = 24;

I tried TokenSize = 65536 and TokenLimit = 256, but saw no perceptible
change.

> What method do you use to measure the bandwidth?

I run a simple OpenGL app (the old SphereMark demo from NVIDIA) using vglrun
-c 0 -sp {app} and look at the frame rate output.  glxgears demonstrates the
problem just as well, but the images that glxgears generates are easier to
compress, and thus the frame rate is artificially high (generally by a few
fps).  SphereMark is more representative of the compression workload
that a real application might generate.

> fbxtest gives sometimes somehow strange results ...

Yeah, this is part of my confusion.  fbxtest consistently gives 25
Megapixels/sec on my remote connection, which should translate to about 20
fps on a full-screen display, but I'm not sure why VGL can't achieve that,
since it's drawing images using fbx.
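
For reference, the conversion from fbxtest throughput to a full-screen frame
rate can be sketched as below.  The function name is hypothetical, and I am
not certain whether fbxtest counts a megapixel as 10^6 or 2^20 pixels, so
both are computed.

```python
# Convert an fbxtest-style throughput figure into a full-screen frame rate.
def megapixels_to_fps(mpixels_per_sec, width, height, pixels_per_mp=1_000_000):
    """Frames/sec if every frame redraws the whole width x height display."""
    return mpixels_per_sec * pixels_per_mp / (width * height)

# 25 Mpixels/sec on a 1280x1024 display, under both megapixel definitions.
fps_decimal = megapixels_to_fps(25, 1280, 1024)            # 1 MP = 10^6 pixels
fps_binary = megapixels_to_fps(25, 1280, 1024, 2 ** 20)    # 1 MP = 2^20 pixels

print(f"decimal megapixels: {fps_decimal:.1f} fps")
print(f"binary megapixels:  {fps_binary:.1f} fps")
```

Either way the result is roughly 19 to 20 fps, about four times what VGL
actually achieves through nxproxy.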

> Anyway as direct method is faster anyway, it can be combined 
> with nxagent. I tried it and it works.
> 
> The only thing that needs to be done (and I did it manually 
> for now) is the translation of internal nxagent windows to 
> external windows, but it should be really easy to have an 
> interface to get this - even hacky over XGetAtom ;-).
> 
> That works then in both modes rootless (seamless) and normal.
> 
> That would be the combined solution you thought of in the README ;-).

This idea intrigues me, because it would potentially allow me to implement
quad-buffered stereo as I currently do with Direct Mode.  The real question,
though, is how to make it work with the Windows NX client (the "free as in
beer" client from NoMachine.)  Setting a window property containing the
Win32 window handle would be easy enough, but NXWin doesn't appear to allow
any outside X server connections, so I'm not sure how VGLclient could talk
to it.  Apart from that, though, this seems like the right idea -- probably
easier than diagnosing nxproxy performance issues.

> My further (far, far, away) idea was even to stream the 
> images via UDP / RTP as an image can get lost without 
> problems and this opengl-stream has much more characteristics 
> of video than of anything else.

One of the other developers had this idea as well.  But my only experience
with trying to implement a streaming video protocol over UDP was a bad one,
so any assistance with this would be greatly appreciated.
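
As a starting point for discussion, here is a minimal sketch of the framing
side of such a scheme: each image is split into datagram-sized chunks, and
the receiver simply abandons an incomplete frame when a newer one starts
arriving.  Everything here (the header layout, the 1400-byte payload size,
the names packetize and Reassembler) is an assumption of mine and not
anything VirtualGL or NX does today.  It ignores frame id wraparound and
congestion control, and it is not RTP.

```python
import struct

# Hypothetical header: frame id (uint32), chunk index and chunk count (uint16).
HEADER = struct.Struct("!IHH")
MAX_PAYLOAD = 1400   # assumed to fit within a typical Ethernet MTU

def packetize(frame_id, data, max_payload=MAX_PAYLOAD):
    """Split one frame into datagram payloads, each prefixed with a header."""
    chunks = [data[i:i + max_payload]
              for i in range(0, len(data), max_payload)] or [b""]
    return [HEADER.pack(frame_id, idx, len(chunks)) + chunk
            for idx, chunk in enumerate(chunks)]

class Reassembler:
    """Collects chunks and returns a frame once all of its chunks arrived."""

    def __init__(self):
        self.frame_id = None
        self.chunks = {}

    def feed(self, packet):
        frame_id, idx, count = HEADER.unpack_from(packet)
        if self.frame_id is not None and frame_id < self.frame_id:
            return None   # late packet from an older frame: ignore it
        if frame_id != self.frame_id:
            # A newer frame started: abandon whatever was incomplete.
            self.frame_id, self.chunks = frame_id, {}
        self.chunks[idx] = packet[HEADER.size:]
        if len(self.chunks) == count:
            data = b"".join(self.chunks[i] for i in range(count))
            self.chunks = {}
            return data
        return None

# Simulate a lossy link: frame 1 loses one packet, frame 2 arrives intact.
frame = bytes(range(256)) * 20
rx = Reassembler()
lossy = packetize(1, frame)[:-1]
results_lost = [rx.feed(p) for p in lossy]
results_ok = [rx.feed(p) for p in packetize(2, frame)]
late = rx.feed(packetize(1, frame)[-1])
```

In a real sender, each packet would go out through a UDP socket, and the
receiver would call feed() on every datagram it reads.  The damaged frame is
never displayed, and the stream recovers on the next complete frame without
any retransmission.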