How to handle a hung NFS mount


Hi,
  I have an application that reads from an NFS-mounted directory. My
program opens the files and reads them one by one. This works as long
as NFS is working.

If NFS stops responding because of a network problem or something
similar, my program simply hangs. One solution is to use an NFS soft
mount so that I get an NFS timeout; however, I want my application to
handle this itself.

If NFS goes down while a particular file is being read, I want to skip
that file and proceed with the next one. The problem is that read()
blocks when it touches the stale NFS directory.

If I set an alarm(), how do I make my application come out of the
blocked read() when the alarm() timeout expires?

TIA

Anand
Reply anand_ka_2000 9/17/2003 2:45:51 PM

On Wed, 17 Sep 2003 07:45:51 -0700, Anand wrote:

> Hi,
>   I have an application which reads from the nfs mounted directory. My
> program opens the file and read all the files one by one. This works fine
> if the nfs is working fine.
> 
> Now if nfs does not work due to network problem or something my program
> simply
> hangs. One of the solution is to use nfs soft mount so that i get nfs
> timeout
> however i want my application to handle this.
> 
> if during read of a particular file, if nfs goes down, i want to skip this
> particular file and proceed with the next file. Problem here is read()
> block if it tries to stale nfs directory.
> 
> If i set an alarm()  , how will i make my application to come out of
> blocked
> read() if the timeout happens set by alarm()

From the read(2) manpage:

RETURN VALUE
       On success, the number of bytes read is returned (zero indicates end of
       file), and the file position is advanced by this number.  It is not  an
       error  if  this  number  is smaller than the number of bytes requested;
       this may happen for example because fewer bytes are actually  available
       right  now  (maybe  because we were close to end‐of‐file, or because we
       are reading from a pipe, or from a terminal),  or  because  read()  was
       interrupted  by  a  signal.  On error, -1 is returned, and errno is set
       appropriately. In this case it is left  unspecified  whether  the  file
       position (if any) changes.

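Thus, at least in theory, a signal should interrupt the read(2) call,
provided its handler was installed without SA_RESTART (see sigaction(2)).
Alternatively, you could try non-blocking I/O (see the read(2) and fcntl(2)
manpages), although O_NONBLOCK generally has no effect on regular files,
so it may not help here.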
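
A minimal sketch of the alarm() approach, assuming the kernel actually lets
the signal interrupt the read (which, per the mount options, is not the case
for a hard NFS mount without intr). The helper name read_with_alarm and the
empty handler are hypothetical, not taken from any library. The handler is
installed with sigaction() and without SA_RESTART, so a blocked read()
returns -1 with errno set to EINTR instead of being restarted:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Empty handler: its only job is to make the blocked call fail with EINTR. */
static void on_alarm(int signo)
{
    (void)signo;
}

/*
 * Hypothetical helper: read() with a timeout in whole seconds.
 * Returns the number of bytes read, or -1 with errno == EINTR if the
 * alarm fired before any data arrived.
 */
ssize_t read_with_alarm(int fd, void *buf, size_t len, unsigned timeout_sec)
{
    struct sigaction sa, old;
    ssize_t n;
    int saved_errno;

    assert(timeout_sec > 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;                /* deliberately no SA_RESTART */
    if (sigaction(SIGALRM, &sa, &old) == -1)
        return -1;

    alarm(timeout_sec);
    n = read(fd, buf, len);
    saved_errno = errno;

    alarm(0);                       /* cancel the alarm if still pending */
    sigaction(SIGALRM, &old, NULL); /* restore the previous handler */
    errno = saved_errno;
    return n;
}
```

The caller would treat a return of -1 with errno == EINTR as "timed out",
close the descriptor and move on to the next file.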

 - Aaron Isotton
-- 
http://www.isotton.com/

Reply Aaron 9/17/2003 3:53:58 PM


Aaron Isotton <aaron@isotton.com> wrote in message news:<pan.2003.09.17.15.53.55.368501@isotton.com>...
> On Wed, 17 Sep 2003 07:45:51 -0700, Anand wrote:
> 
> > Hi,
> >   I have an application which reads from the nfs mounted directory. My
> > program opens the file and read all the files one by one. This works fine
> > if the nfs is working fine.
> > 
> > Now if nfs does not work due to network problem or something my program
> > simply
> > hangs. One of the solution is to use nfs soft mount so that i get nfs
> > timeout
> > however i want my application to handle this.
> > 
> > if during read of a particular file, if nfs goes down, i want to skip this
> > particular file and proceed with the next file. Problem here is read()
> > block if it tries to stale nfs directory.
> > 
> > If i set an alarm()  , how will i make my application to come out of
> > blocked
> > read() if the timeout happens set by alarm()
> 
> From the read(2) manpage:
> 
> RETURN VALUE
>        On success, the number of bytes read is returned (zero indicates end of
>        file), and the file position is advanced by this number.  It is not  an
>        error  if  this  number  is smaller than the number of bytes requested;
>        this may happen for example because fewer bytes are actually  available
>        right  now  (maybe  because we were close to end‐of‐file, or because we
>        are reading from a pipe, or from a terminal),  or  because  read()  was
>        interrupted  by  a  signal.  On error, -1 is returned, and errno is set
>        appropriately. In this case it is left  unspecified  whether  the  file
>        position (if any) changes.
> 
> Thus, at least in theory, a signal should interrupt the read(2) call. 
> Alternatively, you could use non-blocking IO (see the read(2) and fcntl(2)
> manpages).
> 
>  - Aaron Isotton


Hi,
  If a read() blocks because of a problem in NFS (i.e. the NFS server is
not responding), the entire process hangs. Even an alarm() set before
the read() does not work: the SIGALRM signal is not delivered, so read()
keeps sleeping. Is there any way I can get out of the blocked read()?

TIA
Anand
Reply anand_ka_2000 9/18/2003 8:56:30 AM

On Thu, 18 Sep 2003 01:56:30 -0700, Anand wrote:

> Hi,
>   If a read is blocking due the PROBLEM in NFS( ie NFS server is not
> responding), the entire process hangs. Even alarm() set before the read()
> does not work. The SIGALRM signal is not delivered so read() keeps
> sleeping....Is there any way i can come out of the block read()?

Hmm.  mount(1) says:

       hard   The  program  accessing a file on a NFS mounted file system will
              hang when the server crashes. The process cannot be  interrupted
              or  killed unless you also specify intr.  When the NFS server is
              back online the program will continue undisturbed from where  it
              was. This is probably what you want.
                                                                                
       soft   This  option  allows the kernel to time out if the nfs server is
              not responding for some time. The time  can  be  specified  with
              timeo=time.   This  option  might  be  useful if your nfs server
              sometimes doesn’t respond or will be rebooted while some process
              tries  to  get  a  file from the server.  Usually it just causes
              lots of trouble.

It looks as if you can't do anything when a hard-mounted NFS server
hangs, unless the mount also has intr.  Try experimenting with hard, soft
and intr.

 - Aaron Isotton
-- 
http://www.isotton.com/

Reply Aaron 9/18/2003 9:32:01 AM

Hi,
  Thanks for your help. It seems the process itself cannot do
anything. I was able to send an interrupt from another process and had
some success. However, it is not feasible for me to modify the entire
code to create another process to interrupt it.
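
For readers who can afford a second process, here is a rough sketch of a
related variation (not the exact method described above): the blocking
open() and read() happen in a child, and the parent waits on a pipe with
poll() and a timeout. The function name read_file_in_child and its return
codes are hypothetical. If the child is stuck in an uninterruptible NFS
wait, the parent cannot make it go away, but the parent itself is free to
skip the file and continue.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <poll.h>
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: read up to len bytes of path via a child process.
 * Returns the byte count, -1 on error, or -2 if the child delivered
 * nothing for timeout_ms milliseconds (the caller then skips the file).
 */
ssize_t read_file_in_child(const char *path, char *buf, size_t len,
                           int timeout_ms)
{
    int pfd[2], status = 0, timed_out = 0, failed = 0;
    size_t total = 0;
    pid_t pid;

    assert(timeout_ms > 0);
    if (pipe(pfd) == -1)
        return -1;
    pid = fork();
    if (pid == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    if (pid == 0) {                 /* child: the only one touching the file */
        char chunk[4096];
        ssize_t n;
        int fd;

        close(pfd[0]);
        fd = open(path, O_RDONLY);
        if (fd == -1)
            _exit(1);
        while ((n = read(fd, chunk, sizeof chunk)) > 0)
            if (write(pfd[1], chunk, (size_t)n) != n)
                _exit(1);
        _exit(n == 0 ? 0 : 1);
    }

    close(pfd[1]);                  /* parent keeps only the read end */
    while (total < len) {
        struct pollfd p = { .fd = pfd[0], .events = POLLIN };
        int r = poll(&p, 1, timeout_ms);
        ssize_t n;

        if (r == 0) { timed_out = 1; break; }
        if (r < 0)  { failed = 1; break; }
        n = read(pfd[0], buf + total, len - total);
        if (n < 0)  { failed = 1; break; }
        if (n == 0)
            break;                  /* child closed the pipe: end of data */
        total += (size_t)n;
    }
    close(pfd[0]);

    if (timed_out || failed || total == len) {
        /* The child may be stuck or still writing: kill it, never block on
         * it. A child in uninterruptible sleep only dies once the server
         * answers again, so it may linger (and later become a zombie). */
        kill(pid, SIGKILL);
        waitpid(pid, NULL, WNOHANG);
        if (timed_out)
            return -2;
        return failed ? -1 : (ssize_t)total;
    }
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0)
               ? (ssize_t)total : -1;
}
```

Note that timeout_ms is an inactivity timeout per chunk, not a limit on the
whole transfer, and that a long-running program would also want to reap
killed children later (for example from a SIGCHLD handler).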

I guess I have no option other than to do a soft mount. This will not
need any changes in the code.
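
That holds only if the code already checks the return values of open() and
read(). A minimal sketch of such a loop follows, under the assumption that
on a soft mount a timed-out request is reported to the application as an
ordinary error (typically EIO on Linux; check your platform). The function
name read_all_skipping_errors is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: read every file in paths[], skipping any file whose
 * open() or read() fails. Returns the number of files read to end-of-file.
 */
size_t read_all_skipping_errors(const char *const paths[], size_t count)
{
    char buf[4096];
    size_t ok = 0;

    assert(paths != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        ssize_t n;
        int fd = open(paths[i], O_RDONLY);

        if (fd == -1) {
            fprintf(stderr, "skip %s: open: %s\n", paths[i], strerror(errno));
            continue;
        }
        for (;;) {
            n = read(fd, buf, sizeof buf);
            if (n > 0)
                continue;           /* process n bytes of buf here */
            if (n == -1 && errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            break;                  /* n == 0 (EOF) or a real error */
        }
        if (n == -1)
            fprintf(stderr, "skip %s: read: %s\n", paths[i], strerror(errno));
        else
            ok++;
        close(fd);
    }
    return ok;
}
```

With a soft mount, how long a dead server stalls each file before the error
comes back depends on the mount's timeout settings (such as timeo).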

However, I am still wondering whether there is any other way...

Thanks,
Anand
Reply anand_ka_2000 9/18/2003 3:06:33 PM
