looping reading files on NFS systems

holle at almaden.ibm.com
Thu Oct 14 06:09:27 BST 1999



Stale NFS filehandle: you have bumped into the "not-so-nice" part of NFS.
NFS works statelessly and uses UDP messages to communicate with the server. NFS
clients do a lot of work to keep the network from going down under too many
messages like "have I still got the right filehandle for
...../yadayada/foobar.cc ?": they cache data, or read it ahead of time and store
it on the client side. Every now and then a UDP message gets lost or times out
for whatever reason. UDP is the "unreliable" datagram protocol, which means
*nobody* makes sure that the message reaches its destination. If your NFS client
discovers that it has lost the connection to the server, it has to "reconnect".
That does not mean it was connected the whole time (no, NFS is stateless), but
your client might try to reconnect using the same filehandle and just ask for
more data from the same file. Once the connection has been interrupted, though,
that filehandle is invalid, meaning the client has to ask for a new one.
That is all you are seeing. Reopening the file should yield a new file
descriptor and a new filehandle, so the error message should go away. Still, it
could be a really bad connection, in which case you will keep getting the same
message :-)
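
Here is a rough sketch of the "reopen and try again" idea, in plain C++ rather
than Qt. The names read_lines_once and read_with_reopen are made up for
illustration, and the retry limit is whatever the caller picks; the only system
behaviour relied on is that fopen/fgets report failures through errno and that
a stale handle shows up there as ESTALE on Linux. The read loop stops on a read
error as well as at end of file, so it cannot spin forever on a stale handle.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// One attempt: fills `lines` and returns 0 on success, otherwise an errno value.
using ReadAttempt = std::function<int(std::vector<std::string>&)>;

// Hypothetical helper: opens the file afresh and reads it in chunks of at most
// 4095 characters (longer lines are split across several entries).
int read_lines_once(const std::string& path, std::vector<std::string>& lines) {
    errno = 0;
    std::FILE* f = std::fopen(path.c_str(), "r");
    if (!f) return errno ? errno : EIO;
    errno = 0;
    char buf[4096];
    // fgets returns NULL at end of file AND on a read error, so the loop ends
    // in both cases.
    while (std::fgets(buf, sizeof buf, f)) {
        lines.push_back(buf);
    }
    int err = std::ferror(f) ? (errno ? errno : EIO) : 0;
    std::fclose(f);
    return err;
}

// Hypothetical helper: repeats the attempt while it fails with ESTALE, up to
// max_attempts times. Any other result (success or a different error) is
// returned immediately.
int read_with_reopen(const ReadAttempt& attempt, int max_attempts,
                     std::vector<std::string>& lines) {
    assert(max_attempts > 0);
    int err = 0;
    for (int i = 0; i < max_attempts; ++i) {
        lines.clear();            // discard partial data from a failed attempt
        err = attempt(lines);
        if (err != ESTALE) return err;
    }
    return err;                   // still stale after all attempts
}
```

A caller would wrap read_lines_once in a lambda, e.g.
read_with_reopen([&](std::vector<std::string>& l) { return read_lines_once(path, l); }, 3, lines),
and treat a non-zero result as "give up and tell the user".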

In your case I would check the connection to the NFS server. Is it highly
loaded? Do you have enough space in your current working directory (where the
process runs)? NFS clients tend to create "cache" files called .nfs* in the cwd
(yes, with a leading dot, so use ls -A to see them). When you are out of space
in the cwd, the NFS client cannot create these cache files and has to re-request
the data from the server every time, which adds lag and "instability".
Furthermore, does the NFS server get enough processing time? What do the
logfiles on both machines say about NFS timeouts or UDP errors?
Run /sbin/ifconfig <networkingdevice> (e.g. eth0) on both the client and the
server and check the fields "errors", "dropped" and "overruns"; they should all
read 0 (zero). A high number of errors could indicate a bad connection or lots
of collisions on the network (too many hosts talking on the same "wire").
"dropped" generally indicates that the local side is out of kernel memory
(oops), and at the moment I do not know what "overruns" stands for, maybe
internal buffer overruns or so ...
Check the version of your kernel with "uname -r" and see whether there is a more
recent release with the same major and minor version number, e.g. you are using
2.0.35 and 2.0.38 is out (see www.kernel.org for details). If so, try upgrading
to the latest kernel with the same major/minor version number and a higher
patchlevel. Do not upgrade to another minor or major version number unless you
want to spend a lot of time just upgrading tools and tools and tools and tools
.... Do not upgrade to an odd minor version number, since these are development
kernels.
Often problems disappear once the kernel is upgraded to the latest (same
major/minor) version, sometimes not.
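
The upgrade rule above (same major/minor, higher patchlevel, no odd minor
numbers) can be written down as a small check. This is only a sketch: the
struct and function names are invented for illustration, and it assumes the
release string starts with three dot-separated numbers, as "uname -r" output
such as 2.0.35 or 2.2.10ac3 does.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical holder for the three leading numbers of a kernel release.
struct KernelVersion {
    int major_no;
    int minor_no;
    int patch;
};

// Parses the leading "major.minor.patch" part; trailing text such as "ac3"
// is ignored. Returns false if three numbers could not be read.
bool parse_kernel_version(const std::string& release, KernelVersion& out) {
    return std::sscanf(release.c_str(), "%d.%d.%d",
                       &out.major_no, &out.minor_no, &out.patch) == 3;
}

// Odd minor number means a development series, per the rule in this message.
bool is_development_series(const KernelVersion& v) {
    return v.minor_no % 2 != 0;
}

// True only for a higher patchlevel within the same major/minor series.
bool is_recommended_upgrade(const KernelVersion& current,
                            const KernelVersion& candidate) {
    assert(current.patch >= 0 && candidate.patch >= 0);
    return candidate.major_no == current.major_no
        && candidate.minor_no == current.minor_no
        && candidate.patch > current.patch;
}
```

With these, 2.0.35 to 2.0.38 counts as a recommended upgrade, while 2.0.35 to
2.2.10 does not, and anything in a 2.1.x series is flagged as development.
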
Try installing "ethereal", **the** best gtk packet sniffer available (beats
ksniffer hands down :-) ), and record your connection until it fails. Then see
whether you can spot things like packets that get sent more than once. If that
happens very often, one of the machines either has hardware problems or simply
does not have enough processing power for the job....
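
Spotting "packets that get sent more than once" by eye gets tedious, so here is
a sketch of how one might count them. NFS rides on Sun RPC, every RPC call
carries a transaction id (XID), and a client that retransmits a call over UDP
reuses the XID of the original. The sketch assumes you have somehow extracted
the XIDs of the call packets from the capture into a list; how to export them
from the sniffer is left out, and the function names are made up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// For every XID that appears more than once, returns how many extra times
// it was sent (appearances minus one).
std::map<std::uint32_t, std::size_t>
count_resends(const std::vector<std::uint32_t>& request_xids) {
    std::map<std::uint32_t, std::size_t> times_seen;
    for (std::uint32_t xid : request_xids) {
        ++times_seen[xid];
    }
    std::map<std::uint32_t, std::size_t> resends;
    for (const auto& entry : times_seen) {
        if (entry.second > 1) {
            resends[entry.first] = entry.second - 1;
        }
    }
    return resends;
}

// Fraction of all captured call packets that were repeats of an earlier call.
double resend_ratio(const std::vector<std::uint32_t>& request_xids) {
    if (request_xids.empty()) return 0.0;
    std::size_t extra = 0;
    for (const auto& entry : count_resends(request_xids)) {
        extra += entry.second;
    }
    // At least one packet per XID is an original, so repeats are a strict subset.
    assert(extra < request_xids.size());
    return static_cast<double>(extra) / static_cast<double>(request_xids.size());
}
```

What ratio counts as "very often" is a judgment call; the point is only to
turn the capture into a number you can compare between a good and a bad run.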

- Holger

PS: Yes, I love networking stuff :-))



Massimo Morin <mmorin at schedsys.com> on 14/10/99 04:12:03

Please respond to kdevelop at barney.cs.uni-potsdam.de

To:   kdevelop <kdevelop at barney.cs.uni-potsdam.de>
cc:
Subject:  looping reading files on NFS systems




Hi,
        this is the second time this has happened and I don't know how to fix
it.
I created a project. While it is loading the template file for creating the
project stuff, it loops in the following while loop.

in cproject.cpp
    void CProject::setKDevelopWriteArea(QString makefile){
      QString abs_filename = getProjectDir() + makefile;
      QFile file(abs_filename);
      QStrList list;
      bool found = false;
      QTextStream stream(&file);
      QString str;

      if(file.open(IO_ReadOnly)){ // read the makefileam
        while(!stream.eof()){
          list.append(stream.readLine());
        }
      }
      file.close();

abs_filename does contain the right filename (cpnet/cpnet/Makefile.am).
The source file is on NFS, served by a Linux box (RH5.0, kernel 2.2.10ac3).
If I cat the file while the program is in the loop, I get:
cat: /home/mmorin/Prg/Porting/cpnet/cpnet/Makefile.am: Stale NFS file handle

so????
My system is RH5.1 and I'm using KDE 1.1.2 and Qt 1.44.

Cheers
        Massimo


--
                                               _...__..-'
Massimo Morin                                .'
mmorin at schedsys.com                        .'
+1 (617) 484 2999                        .'
                                       .'
            .------._                 ;
      .-"""`-.<')    `-._           .'
     (.--. _   `._       `'---.__.-'
      `   `;'-.-'         '-    ._     Scheduling Systems Inc.
        .--'``  '._      - '   .       Three University Office Park
         `""'-.    `---'    ,          95 Sawyer Road
 ''--..__      `\                      Waltham, 02453 Massachusetts USA
         ``''---'`\      .'            +1 (781) 893-0390 x 126
                   `'. '               http://www.schedsys.com
                     `'.





