Our V880 locked up for no apparent reason a couple of days ago, where
locked up means it would not respond on the network or the serial
console and had to be reset. There was nothing in the system logs to
indicate a problem, nor were there any power events logged on the UPS
(or by other machines in the same room), prtdiag showed the hardware it
tested to be ok, and SMART analysis of the disks showed no issues.
So today I am running the SunVTS diagnostics. Everything passed except
nettest, which reported this for eri0:
05/29/08 10:27:36 gec SunVTS5.1ps13: VTSID 6002 nettest.
ERROR eri0(/pci@9,700000/network@1,1): "No ICMP echo reply from 131.215.XX.YY."
Probable_Cause(s):
(1)bad network board
(2)system load too heavy
(3)no cable connection
(4)target machine too busy
Recommended_Actions:
(1)replace network board
(2)reduce system load or increase timeout time
(3)check cable connection
(4)reduce target machine load
We can rule out causes 2, 3, and 4. If this is a bad network board, it seems
to fail only for nettest. Despite what nettest says, the network
seems to be working just fine on the V880 (for instance, I ran vts
remotely over that very network). Nettest chose XX.YY, the address one
below the V880's, as its target, and that happened to be a printer. That
printer answers pings just fine, not only from other machines but also from the V880.
% ping -I 1 131.215.XX.YY
It does not drop a single packet in 100 tries. Thinking that nettest might
send packets faster than 1 per second, I tried it from a Linux host
using
ping -i .01 131.215.XX.YY
and the printer was able to return pings that fast. (Ping on Solaris
did not allow this test.) netstat -i shows a few errors (3 output
errors on eri0):
Name  Mtu   Net/Dest  Address    Ipkts     Ierrs  Opkts     Oerrs  Collis  Queue
lo0   8232  loopback  localhost  14562131  0      14562131  0      0       0
eri0  1500  gec       gec        785446    0      723980    3      0       0
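To put "a few errors" in proportion, here is a minimal sketch that turns netstat -i output into per-interface error rates. It assumes the 10-column layout shown above (Ipkts, Ierrs, Opkts, Oerrs in fields 5 to 8); other releases may lay the columns out differently. The function name netstat_error_rates is hypothetical, not part of Solaris or SunVTS, and on Solaris nawk may be needed if the stock /usr/bin/awk balks.

```sh
#!/bin/sh
# Hypothetical helper (not part of SunVTS or Solaris): read "netstat -i"
# output on stdin and print input/output error rates per interface.
# Assumes the column layout shown above:
#   Name Mtu Net/Dest Address Ipkts Ierrs Opkts Oerrs Collis Queue
# i.e. Ipkts=$5, Ierrs=$6, Opkts=$7, Oerrs=$8. Check this on your release.
netstat_error_rates() {
    awk '
    # Only data rows: skip the header and anything without numeric counts.
    $5 ~ /^[0-9]+$/ && $7 ~ /^[0-9]+$/ {
        irate = 0
        orate = 0
        if ($5 > 0) irate = $6 / $5
        if ($7 > 0) orate = $8 / $7
        printf "%s Ierrs=%d/%d (%.1e) Oerrs=%d/%d (%.1e)\n", $1, $6, $5, irate, $8, $7, orate
    }'
}

# Usage: netstat -i | netstat_error_rates
```

Fed the eri0 row above, it reports Oerrs=3/723980 (4.1e-06), roughly 4 output errors per million packets sent.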
There were also a couple of funky messages in /var/adm/messages after
the reboot yesterday that I don't recall seeing before:
May 28 11:27:31 gec inetd[187]: [ID 965992 daemon.error] 100229/1-2/tcp: unknown service
May 28 11:27:31 gec inetd[187]: [ID 965992 daemon.error] 100230/1/tcp: unknown service
May 28 11:28:33 gec inetd[187]: [ID 965992 daemon.error] 100229/1-2/tcp: unknown service
May 28 11:28:33 gec inetd[187]: [ID 965992 daemon.error] 100230/1/tcp: unknown service
Is nettest right and the network board is going, or is nettest itself
messed up? I don't have a network loopback connector, so cannot run
netlbtest.
Opinions?
Thanks,
David Mathog
Posted by David, 5/29/2008 5:56:47 PM
David Mathog wrote:
> Is nettest right and the network board is going, or is nettest itself
> messed up? I don't have a network loopback connector, so cannot run
> netlbtest.
I made a network loopback connector (two, actually, since I had to cut up a
patch cable to do it) and then ran netlbtest for a long time; it
logged no errors. Then I went back to the regular nettest and
configured it to target the local network switch instead of the printer
it had picked by itself. I ran nettest 25 times with no errors. My best guess is
that even though the printer can normally handle a high rate of pings,
it was perhaps accepting a print job or performing some other task
when the errors occurred, and so could not keep up with nettest.
Also, for future reference: to get "sunvts -t" to work properly on the
serial console through TeraTerm (and probably any other VT100
emulator), I first had to run:
export TERM=vt100
Otherwise TERM defaulted to "sun" and "sunvts -t" produced an
unusable mess of an interface.
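The one-liner above works in ksh or bash. A minimal sketch of a more portable form, assuming (as described) that the serial console's TERM defaults to "sun"; the function name fix_console_term is hypothetical. Assignment and export are split into two steps because the traditional Bourne shell rejects "export TERM=vt100".

```sh
#!/bin/sh
# Sketch: switch the console terminal type to vt100 before running
# "sunvts -t" from a VT100 emulator such as TeraTerm.
# Assignment and export are done as two steps because the traditional
# Bourne shell rejects "export TERM=vt100"; ksh and bash accept both forms.
fix_console_term() {
    # Only override the Sun console default (or an unset TERM).
    if [ "${TERM:-sun}" = "sun" ]; then
        TERM=vt100
        export TERM
    fi
}

fix_console_term
# sunvts -t    # then start the TTY interface as usual
```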
Regards,
David Mathog
Posted by David, 5/30/2008 5:23:12 PM
"David Mathog" <mathog@caltech.edu> wrote in message
news:g1mqp5$1ee$1@naig.caltech.edu...
> Our V880 locked up for no apparent reason a couple of days ago, where
> locked up means it would not respond on the network or the serial
> console and had to be reset. There was nothing in the system logs to
> indicate a problem, nor were there any power events logged on the UPS
> (or by other machines in the same room), prtdiag showed the hardware it
> tested to be ok, and SMART analysis of the disks showed no issues.
Probably not what you want to hear, but it could be that the motherboard is bad,
especially if there is no way to send a break and collect a core file
when it locks up.
All the CPU/memory boards plug into it, and the Ethernet is onboard.
Depending on the P/N or rev level of your motherboard, there are some known
issues with the onboard Ethernet.
The info is here:
http://www.sunsolve.sun.com/search/document.do?assetkey=I1195
although access requires a SunSolve account.
Hopefully you have a Sun service contract.
I have no idea what such a part is worth, but even on eBay people are asking
a lot.
Trinean
Posted by Trinean, 5/31/2008 10:51:51 PM
Trinean wrote:
> "David Mathog" <mathog@caltech.edu> wrote in message
> news:g1mqp5$1ee$1@naig.caltech.edu...
>> Our V880 locked up for no apparent reason a couple of days ago, where
>> locked up means it would not respond on the network or the serial
>> console and had to be reset. There was nothing in the system logs to
>> indicate a problem, nor were there any power events logged on the UPS
>> (or by other machines in the same room), prtdiag showed the hardware it
>> tested to be ok, and SMART analysis of the disks showed no issues.
>
> Probably not what you want to hear, but it could be the motherboard is bad.
> Especially if when it locks up there is no way to send a break and collect a
> core file.
Yeah, I wasn't happy that it wouldn't respond to a break. If the system
does it again fairly soon, I'll have to put in a service call. In the
meantime, the Ethernet (re)tested clean.
> Depending on the P/N or rev level of your motherboard there are some known
> issues with the onboard ethernet.
> The info is here:
>
> http://www.sunsolve.sun.com/search/document.do?assetkey=I1195
Even using an account linked to a service contract, I couldn't follow
that link. Is the link current?
Thanks,
David Mathog
Posted by David, 6/2/2008 5:34:55 PM