NFS getattr failed for server

	Hello,

	I run several diskless Solaris 10 servers. All of them use an NFS root
	partition exported by two Linux servers. They run fine for many hours,
	but occasionally one of them writes on the console:

	"NFS getattr failed for server..."

	and the system hangs. I have no explanation. I am sure the NFS
	server is running, because the other servers are not affected. The
	failing server varies at random among the diskless servers.

	On the NFS server (Linux/SPARC T1000) I have done no special
	configuration, but on the client side I have forced NFSv2 (early boot
	stage) and then remount all partitions as NFSv3/UDP:

/ on 192.168.1.254:/export/home/srv/solaris10-dvorak/
remote/read/write/setuid/devices/vers=3/proto=udp/llock/xattr/dev=50c0001
on mar mars 22 08:07:58 2011
/usr/dvorak-apps on 192.168.1.254:/export/home/dvorak/
remote/read/write/setuid/devices/vers=3/proto=udp/xattr/dev=50c0002
....

	In /etc/default/nfs, I have set NFS_CLIENT_VERSMAX to 3.

	Any explanation?

	Best regards,

	JB
Reply JKB 3/22/2011 1:24:21 PM

	The console output contains:

NFS getattr failed for server 192.168.1.254: error 2 (RPC: Can't decode
result)
NFS lookup failed for server 192.168.1.254: error 2 (RPC: Can't decode
result)
Mar 23 07:48:58 dvorak last message repeated 22 times
Mar 23 07:48:58 dvorak nfs: NFS lookup failed for server 192.168.1.254:
error 2
(RPC: Can't decode result)

Mar 23 07:48:58 dvorak nfs: NFS getattr failed for server 192.168.1.254:
error 2
(RPC: Can't decode result)

Mar 23 07:49:22 dvorak last message repeated 17 times
Mar 23 07:49:23 dvorak nfs: NFS fsstat failed for server 192.168.1.254:
error 2
(RPC: Can't decode result)
Mar 23 07:49:23 dvorak sendmail[430]: filesys_update failed: Permission
denied,
fs=., avail=-1, blocksize=1163300
Mar 23 07:49:23 dvorak nfs: NFS getattr failed for server 192.168.1.254:
error 2
(RPC: Can't decode result)
Mar 23 07:49:23 dvorak nfs: NFS fsstat failed for server 192.168.1.254:
error 2
(RPC: Can't decode result)
Mar 23 07:49:23 dvorak nfs: NFS getattr failed for server 192.168.1.254:
error 2
(RPC: Can't decode result)
Mar 23 07:50:22 dvorak last message repeated 41 times

	The only solution: reboot Solaris 10.

	JKB

-- 
If your request reaches me on punched cards, I will very gladly
cobble together a reply...
=> http://grincheux.de-charybde-en-scylla.fr
Reply JKB 3/23/2011 8:44:32 AM


JKB <jkb@koenigsberg.invalid> wrote:

> 	Only solution : reboot Solaris 10.


Which I think is the only solution.

Have you checked if statd and lockd are still running (ps -ef) on the 
clients when this happens?
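
A minimal sketch of that check, assuming `ps -ef` is still usable on the client and that the daemons show up as commands whose base name is `statd` or `lockd` (on Solaris 10 they normally live under /usr/lib/nfs, but treat that as an assumption). The helper names `missing_daemons` and `current_ps_output` are made up for this example.

```python
import subprocess

# Daemon base names to look for; adjust if your system names them differently.
DAEMONS = ("statd", "lockd")


def missing_daemons(ps_output, daemons=DAEMONS):
    """Return the daemons that do not appear anywhere in `ps -ef` output."""
    seen = set()
    for line in ps_output.splitlines():
        for token in line.split():
            # "/usr/lib/nfs/statd" -> "statd"
            name = token.rsplit("/", 1)[-1]
            if name in daemons:
                seen.add(name)
    return [d for d in daemons if d not in seen]


def current_ps_output():
    """Run `ps -ef` on the local machine and return its text output."""
    result = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

`missing_daemons(current_ps_output())` returns an empty list when both daemons are present. It only shows whether they exist, not whether one of them is stuck in a loop; for that, the TIME column of `ps -ef` still has to be read by hand.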

My two cents: you said you are using a Linux box to export the file
systems. That is probably the major flaw. If Linux ever got NFS to work
right, it would surprise me. Linux to Linux is one thing, but in the past we
tried using it to export to FreeBSD, Solaris, OS X and probably other
operating systems, and we always ran into mystery problems.

I haven't seen those console messages in years, but we used to have a number
of Dell Unix machines (an SVR4-based Unix, early to mid 90s) that
constantly had NFS breakdowns like that, or similar anyway. It was just crap
NFS support.

On those it usually meant lockd had failed on the clients: if not totally
absent (not running), it was running but stuck in a loop, accumulating a lot
of CPU time. Killing off the process (if there) and manually starting it up
again fixed it for a while, but usually it meant scheduling the machine for
a reboot "soon".

I haven't seen it come up in years, but there used to be a taboo of sorts
against mixing different Unix families for NFS servers and clients.
Read-only is one thing; file systems mounted read/write were the
culprits.

-bruce
bje@ripco.com
Reply Bruce 3/23/2011 11:47:42 AM

On Wed, 23 Mar 2011 11:47:42 +0000 (UTC),
Bruce Esquibel <bje@ripco.com> wrote:
> JKB <jkb@koenigsberg.invalid> wrote:
>
>> 	Only solution : reboot Solaris 10.
>
>
> Which I think is the only solution.
>
> Have you checked if statd and lockd are still running (ps -ef) on the 
> clients when this happens?

	All clients are diskless, so I cannot run ps once NFS has stopped
	working.

> My two-cent comment is you said you are using a linux box to export the
> file systems. This probably is the major flaw.

	Maybe, but the same configuration ran fine for more than four years
	with the same Solaris 10 and Linux flavors.

	Regards,

	JKB

Reply JKB 3/23/2011 12:29:29 PM

JKB <jkb@koenigsberg.invalid> wrote:
>        Only solution : reboot Solaris 10.

That is not the solution. That is just a temporary fix. The solution
is to replace the Linux NFS servers with working NFS servers.

I've yet to see a working Linux NFS server and likely never will.

Sami
Reply Sami 3/23/2011 2:51:02 PM

On Wed, 23 Mar 2011 14:51:02 -0000,
Sami Ketola <Sami.Ketola@iki.finland.invalid> wrote:
> JKB <jkb@koenigsberg.invalid> wrote:
>>        Only solution : reboot Solaris 10.
>
> That is not the solution. That is just a temporary fix. The solution
> is to replace the Linux NFS servers with working NFS servers.

	I can't.

> I've yet to see a working Linux NFS server and likely never will.

	Then why did these NFS servers run fine for four years? (NFSv3)

	JKB

Reply JKB 3/23/2011 3:00:52 PM

JKB <jkb@koenigsberg.invalid> wrote:
> On Wed, 23 Mar 2011 14:51:02 -0000,
> Sami Ketola <Sami.Ketola@iki.finland.invalid> wrote:
>> JKB <jkb@koenigsberg.invalid> wrote:
>>>        Only solution : reboot Solaris 10.
>>
>> That is not the solution. That is just a temporary fix. The solution
>> is to replace the Linux NFS servers with working NFS servers.
> 
>        I can't.

Why?

>> I've yet to see a working Linux NFS server and likely never will.
> 
>        Then why did these NFS servers run fine for four years? (NFSv3)

I've seen many systems run fine for years until they hit a bug.
Have you really not updated the systems for four years?

Maybe you should log a support call with your Linux vendor and Oracle.
They should be able to help you. At least Oracle has folks who can
analyze the NFS traffic for the problem.

Sami
Reply Sami 3/23/2011 7:18:29 PM

On Wed, 23 Mar 2011 19:18:29 -0000,
Sami Ketola <Sami.Ketola@iki.finland.invalid> wrote:
> JKB <jkb@koenigsberg.invalid> wrote:
>> On Wed, 23 Mar 2011 14:51:02 -0000,
>> Sami Ketola <Sami.Ketola@iki.finland.invalid> wrote:
>>> JKB <jkb@koenigsberg.invalid> wrote:
>>>>        Only solution : reboot Solaris 10.
>>>
>>> That is not the solution. That is just a temporary fix. The solution
>>> is to replace the Linux NFS servers with working NFS servers.
>> 
>>        I can't.
>
> Why?

	Because I have two redundant T1000s that run as advanced routers with
	QoS (and virtual IP interfaces and round-robin default routes...) and
	I have never been able to stabilize this configuration with Solaris.

>>> I've yet to see a working Linux NFS server and likely never will.
>> 
>>        Then why did these NFS servers run fine for four years? (NFSv3)
>
> I've seen many systems run fine for years until they hit a bug.
> Have you really not updated the systems for four years?

	I have not updated Solaris for more than a year, as these servers
	are not connected to the Internet. As the NFS servers are not
	directly connected to the Internet either (I have installed a
	firewall between them and the Internet), I haven't updated them.

> Maybe you should log a support call with your Linux vendor and Oracle.
> They should be able to help you. At least Oracle has folks who can
> analyze the NFS traffic for the problem.

	JKB

Reply JKB 3/24/2011 8:29:05 AM
