HPTX: Watchdog ReInit Memory, Allocb Failure in ReInitMemory

I've been trying to help someone out with a problem they are having
with an SCO server freezing up. I would like to describe the problem
and what I've done. Looking for any suggestions...

HP Netserver LC3, SCO 5.0.5
HP PCI 10/100 Network Adapter D5013A (intel nic)

Server has been running for about 10 years. No changes have been made
to the server, either hardware-wise or software-wise, in the last few
years. The system is being wound down, and if anything, use of this
server has decreased over the last year. Lately, the server has
started freezing every few hours. The error repeatedly displayed on the
console is:

HPTX: Watchdog ReInit Memory 6 board 0 hptx - Warning: hptx: Allocb
Failure in ReInitMemory

The system does not panic, but the console is unresponsive and all telnet
sessions are hung. The first thing we did was replace the network card
with an identical model. No help. I researched the issue online and
found a variety of possible causes/fixes.

We then went back and did the following:

1) Reseated the memory and all boards, checked all cables, cleaned out
   dust, etc.
2) Ran netstat -m and found no failures being reported.
3) Increased the amount of STREAMS memory by raising the NSTRPAGES
   kernel parameter.
4) Checked the switches and found a loop between two switches (two
   uplinks). Someone had reported that exact scenario as the cause of
   their problem, so I was hopeful. I also moved the server's network
   cable from a gigabit switch to a 10/100 switch, and have tried it in
   all 3 of their switches.
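
One gap in the netstat -m check is that it only shows the state at the moment it was run, not the state just before a freeze. A minimal sketch, assuming Python 3.7+ is available on the server (not a given on 5.0.5) and using a hypothetical log path, that appends timestamped snapshots of a command's output to a file so the last one before a hang can be read after the reboot:

```python
import os
import subprocess
import time
from datetime import datetime

# Sketch only: assumes Python 3.7+ on the watched host. The log path in
# the usage comment at the bottom is hypothetical.


def snapshot(cmd, log_path):
    """Run cmd once and append its output, under a timestamped header, to log_path."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as log:
        log.write(f"=== {stamp} {' '.join(cmd)} (exit {result.returncode})\n")
        log.write(result.stdout)
        log.write(result.stderr)
        log.flush()
        os.fsync(log.fileno())  # get it onto disk before a possible hard reset
    return result.returncode


def watch(cmd, log_path, interval_s=60, count=None):
    """Call snapshot() every interval_s seconds; count=None means run until killed."""
    taken = 0
    while count is None or taken < count:
        snapshot(cmd, log_path)
        taken += 1
        if count is None or taken < count:
            time.sleep(interval_s)


# Usage (runs until interrupted):
#   watch(["netstat", "-m"], "/usr/tmp/netstat-m.log", interval_s=60)
```

After a freeze and reboot, the last few snapshots in the log show whether any failures were being reported in the minutes before the hang.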

After doing these things, it seemed to be fixed (I really thought it
was the switch loop). It worked fine for 4-5 days and then the freezes
started again, less often at first, but they seem to be getting more frequent.
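
To check whether the freezes really are getting more frequent, the gaps between warnings can be measured from a log. A minimal sketch in Python, assuming the hptx warnings end up in a log whose lines begin with a syslog-style "Mon DD HH:MM:SS" stamp. Syslog omits the year, so it is passed in, and a December-to-January rollover is not handled. The function names and the first-half/second-half comparison are illustrative, not an established tool:

```python
from datetime import datetime, timedelta

# Sketch only: assumes one warning per log line, prefixed with a
# syslog-style "Mon DD HH:MM:SS" stamp. The year must be supplied, and
# a year rollover inside the log is not handled.
PATTERN = "Allocb Failure in ReInitMemory"


def occurrence_times(lines, year):
    """Return the timestamps of the lines that contain PATTERN."""
    times = []
    for line in lines:
        if PATTERN in line:
            stamp = " ".join(line.split()[:3])  # e.g. "Sep 22 13:29:16"
            times.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    return times


def gaps(times):
    """Return the interval between each pair of consecutive occurrences."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def getting_more_frequent(gap_list):
    """Crude heuristic: is the mean gap in the second half shorter than in the first?"""
    if len(gap_list) < 2:
        return False
    half = len(gap_list) // 2

    def mean(chunk):
        return sum(chunk, timedelta()) / len(chunk)

    return mean(gap_list[half:]) < mean(gap_list[:half])
```

With a copy of the log under a hypothetical name, gaps(occurrence_times(open("syslog.copy"), 2009)) lists the intervals in order, which is easier to judge than an impression from the console.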

I am going there tomorrow to run hardware diagnostics. Does anyone
have any suggestions of other things I could try?
albracco, 9/22/2009 5:29:16 PM

AB wrote:
> I am going there tomorrow to run hardware diagnostics. Does anyone
> have any suggestions of other things I could try?

I'm leaning towards hardware myself, especially with netstat -m not
reporting any failures.

I would replace the memory right off the bat, after thoroughly blowing
out the RAM sockets. Be sure to use the same type of ECC RAM.

Check the operation of ALL the fan(s) - make sure they are not running 
too slowly or not at all. Replace the fans as needed.

Consider replacing the power supply - many strange problems on older
machines can be traced to internal power supplies drifting out of
spec after so many years of service.

Also check the UPS, and replace it if it is as old as the machine. Use
the blunt-force approach to UPS testing:

1) Take the machine down into single user mode, or make sure no one is 
on the system and console is at a login.

2) Unplug the UPS from the wall plug or strip.

If it does not keep the system up for at least 10 minutes, replace the unit.
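
To time the unplug test without watching the console, one option is a heartbeat file. A minimal sketch in Python, assuming Python 3 is on the machine and using hypothetical helper names: start beat() just before unplugging the UPS, and after power is back, runtime_seconds() reports the span between the first and last heartbeat that reached the disk:

```python
import os
import time

# Sketch only: beat(), runtime_seconds() and ups_passes() are hypothetical
# helpers, not part of any SCO tool.


def beat(path, interval_s=5, count=None):
    """Append the current Unix time to path every interval_s seconds."""
    written = 0
    while count is None or written < count:
        with open(path, "a") as f:
            f.write(f"{time.time():.0f}\n")
            f.flush()
            os.fsync(f.fileno())  # push each beat to disk so it survives power loss
        written += 1
        if count is None or written < count:
            time.sleep(interval_s)


def runtime_seconds(path):
    """Seconds between the first and last complete heartbeat recorded in path."""
    with open(path) as f:
        stamps = [int(line) for line in f if line.strip().isdigit()]
    return stamps[-1] - stamps[0] if stamps else 0


def ups_passes(path, minimum_s=10 * 60):
    """Apply the 10-minute rule of thumb from the UPS test."""
    return runtime_seconds(path) >= minimum_s
```

If the UPS holds past the 10 minutes, shut the machine down cleanly; the heartbeat file will still show at least 600 seconds.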

Outside chance: replace the NIC driver with a newer one from SCO's web
site. Perhaps new machines on your network are sending out query
packets (from SNMP software, say) that the old driver does not
understand or handles poorly.

Good luck.

-- 
----------------------------------------------------
Pat Welch, UBB Computer Services, a WCS Affiliate
            SCO Authorized Partner
            Microlite BackupEdge Certified Reseller
            Unix/Linux/Windows/Hardware Sales/Support
            (209) 745-1401 Cell: (209) 251-9120
            E-mail: patubb@inreach.com
----------------------------------------------------
Pat, 9/23/2009 12:16:14 AM


On 22 Sep, 19:29, AB <albra...@gmail.com> wrote:
> I am going there tomorrow to run hardware diagnostics. Does anyone
> have any suggestions of other things I could try?

The issue here is most likely with the NIC.

The issue could be caused by:

1) Some change on the LAN that is disagreeing with the NIC

or

2) Failing NIC hardware

For 1) you could try the latest driver (if you are not already
using it). I can see a driver at:

ftp://ftp.sco.com/pub/openserver5/drivers/OSR505/network/eeE/

from 2000 that may be the latest.

For 2) you could try replacing the card.

John
John, 9/23/2009 7:37:32 AM

On Sep 22, 1:29 pm, AB <albra...@gmail.com> wrote:
> I am going there tomorrow to run hardware diagnostics. Does anyone
> have any suggestions of other things I could try?

This sounds like a thermal problem. It's hard to diagnose more precisely
remotely, but I assume the fans were clear? Can you run it with the
lid off, maybe with an additional fan blowing on it, and keep it alive
long enough to do a complete system dump and transfer your license to
a new installation? You should upgrade to at least 5.0.7 anyway, for
a whole raft of other reasons, including a good SSH implementation
and the gnutools toolchain for running software compatible with more
modern operating systems.

Also, strongly consider virtualizing this thing: 10-year-old hardware
is doubtless less capable than a virtual instance on a more modern
server, and VMware works very well now and has SCO OpenServer as a
supported operating system. (I begged them for this last year, and
there are good websites with the details on how to do it, some of
which I've contributed to.) That lets you use a more robust operating
system from a company still in business as your hardware control
layer, and keep the OpenServer in a portable instance that can be
moved to new hardware as needed.
Nico, 9/23/2009 11:59:33 AM
