[FreeNX-kNX] Parallelize the calling of nxcheckload?

Matthew Richardson M.Richardson at ed.ac.uk
Wed Sep 2 15:03:34 UTC 2009


I'm looking at using the 'load' algorithm, and have looked at how
nxserver calls nxcheckload in the server_loadbalance_load() function.
At present it calls it in serial.  Our pool consists of around 50
servers - assuming that we used something simple like netcat to query
the servers, taking 1 second per server this would result in a 50 second
delay - longer if the nxcheckload processes hang for any reason.

I've included a sample bash script that does this kind of thing in
parallel - I thought I'd rather show what kind of action /could/ happen,
rather than try to patch it outright.  If people think this is a useful
change, then I can create a real patch for the function in nxserver.  I
obviously don't know the freenx code that well, e.g. whether SIGALRM is
already in use, hence my hesitance.

The script calls a command (returnvalue.sh), standing in for
nxcheckload, which simply sleeps for, then echoes back, whatever number
it is given.

The main script spawns a child process for each entry in $SERVERS, with
a random sleep amount on each one.  It then records the pids of the
child processes for later.  Each child writes its output to its own
temporary file in $TMPFOLDER.  The parent waits for the child processes
to exit, then analyses their output (i.e. the load result for each
server).

However, the script also starts a timer in a subshell, which sleeps for
$ALARMTIME seconds, then signals the parent with SIGALRM.  When the
parent gets SIGALRM, it kills all the child processes.  This prevents
children from hanging or taking longer than a chosen time to respond.

You can then decide from the results returned which server has the
lowest load.  If a server took too long to respond, it's safe to assume
that it's probably overloaded.  It might be necessary to adjust this to
repeat the alarm timer if, at the point of SIGALRM, no results files
had been written at all.
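
As a rough illustration of that last step, picking the lowest load from
the results files could look something like the sketch below.  The
function name pick_lowest_load is made up for this example, and it
assumes one file per server, each holding a line such as "SLEEP:3" (the
format the sample script writes) with an integer value.  Servers whose
child was killed leave no usable file and are simply skipped.

```shell
#!/bin/bash
# Hypothetical helper: print the name of the server whose results file
# holds the lowest value.  Assumes one file per server in the directory,
# named after the server, containing e.g. "SLEEP:3" (integer values only).
pick_lowest_load() {
    local dir=$1 best_server="" best_load="" file value
    for file in "$dir"/*; do
        [ -s "$file" ] || continue      # skip missing or empty results
        value=$(head -n 1 "$file")
        value=${value##*:}              # strip the "SLEEP:" prefix
        case $value in
            ''|*[!0-9]*) continue ;;    # ignore non-numeric output
        esac
        if [ -z "$best_load" ] || [ "$value" -lt "$best_load" ]; then
            best_load=$value
            best_server=$(basename "$file")
        fi
    done
    # Prints nothing (and returns non-zero) if no usable result was found
    [ -n "$best_server" ] && echo "$best_server"
}
```

Calling pick_lowest_load "$TMPFOLDER" after parse_results would then
give the candidate server, and an empty answer could be the cue to
repeat the alarm timer.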

Comments etc appreciated!

Thanks,

Matthew

#############################

#!/bin/bash

COMMAND="/tmp/returnvalue.sh"
SERVERS="a b c d"
TMPFOLDER="/tmp/server"
rm -rf "$TMPFOLDER"
mkdir -p "$TMPFOLDER"
#Timer length in seconds
export ALARMTIME=5
echo "Alarm timer set to $ALARMTIME seconds:"

PARENTPID=$$

exit_timeout() {
    echo "Alarm signal: killing children"
    for pid in "${CHILDPIDS[@]}"; do
        kill "$pid" >/dev/null 2>&1
    done
    parse_results
    exit
}

parse_results() {
    echo "Parsing the results"

    for file in "$TMPFOLDER"/*
    do
        #Skip the unmatched glob when no results files were written
        [ -e "$file" ] || continue
        result=$(cat "$file")
        server=$(basename "$file")
        echo "Server: $server, result: $result"
    done
}

#trap SIGALRM, call exit_timeout function
trap exit_timeout SIGALRM

#Counter for incrementing CHILDPIDS array
CHILDCOUNT=0

#Launch parallel commands
for i in $SERVERS
do
    CHILDCOUNT=$((CHILDCOUNT+1))

    #Generate a random number between 1 and 10
    RND=$((RANDOM%10+1))

    echo "$i sleeping for $RND seconds"

    #Run the cmd in the background, recording the output and its PID
    ( echo "SLEEP:$("$COMMAND" "$RND")" >>"$TMPFOLDER/$i" ) &
    CHILDPIDS[$CHILDCOUNT]=$!
done

#Alarm in a subprocess, then signal parent with SIGALRM
(sleep $ALARMTIME; kill -SIGALRM $PARENTPID) &
TPID=$!

#Wait for scripts to complete
wait "${CHILDPIDS[@]}"

echo "Alarm never reached, children exited by themselves."

#Tidy up the Alarm subprocess
kill $TPID

parse_results



########################

-- 

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

