mc/serviceguard cluster fails once


Hi there,

What if the HP guy who configured my company's HP-UX Itanium 2 machines
(HP-UX 11i v2, 11.23) mistakenly used the same LAN that carries the regular
network traffic for the first heartbeat card? Is that too bad?

We are moving some network links these days, and on Friday we had a very
heavy broadcast storm for about 1 hour. The main LAN switch went down, and
the cluster failed.

When I noticed the Oracle DB was down, I did the usual things. I connected
first to the primary node: the cluster was up and running, but there was no
package running. I started the package and it failed, because the lvol on FC
was mounted on the secondary node. I shut down the package (just in case),
unmounted the lvol, and restarted the package.

I should mention first that I'm a newbie on HP-UX. I was chosen for this job
to keep an eye on the machines; currently I work only with Linux, so this is
my first time with a 'real' Unix.

Where can I start looking? I checked the cluster configuration and noticed
the heartbeat on the LAN connected to the switch.

From /etc/cmcluster/cmclconfig.ascii:

NODE_NAME fns1
NETWORK_INTERFACE lan5
#Mylan
HEARTBEAT_IP 172.18.1.31
#My lan connected to the net/clients
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan4
HEARTBEAT_IP 30.30.30.1
NETWORK_INTERFACE lan8
HEARTBEAT_IP 20.20.20.1

###

Should lan5 be a STATIONARY_IP?

What happens if a heartbeat fails? Does the cluster send an 'are_you_there'
ping using the next heartbeat?

Sorry for the long post.

Álvaro.




Reply Alvaro 5/18/2004 11:21:09 PM

Hi,

The best solution is to send heartbeats on all interfaces. Two is a minimum,
preferably on different subnets (or VLANs). Stationary (= an interface not
sending heartbeats) is not a good idea.
The heartbeat must be able to reach the other node in any failure scenario;
otherwise the isolated node will reboot (disk lock lost).
For example, don't put all interfaces on the same switch.

Alain.
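
To see at a glance which interfaces carry heartbeat, a small awk filter over the cluster ASCII file may help. This is only a sketch: the function name list_interfaces and the "no-ip" label are made up for illustration, and it assumes the file follows the NODE_NAME / NETWORK_INTERFACE / HEARTBEAT_IP / STATIONARY_IP layout shown in the first post.

```shell
# Sketch: print one line per NETWORK_INTERFACE with the kind of IP it carries.
# An interface with no IP line (typically a standby LAN) is reported as "no-ip".
# list_interfaces is a hypothetical helper name, not a Serviceguard command.
list_interfaces() {
    awk '
        function emit_pending() {
            # Flush an interface that had no HEARTBEAT_IP/STATIONARY_IP line.
            if (iface != "") print iface, "no-ip", "-"
            iface = ""
        }
        $1 == "NODE_NAME" || $1 == "NETWORK_INTERFACE" { emit_pending() }
        $1 == "NETWORK_INTERFACE" { iface = $2 }
        $1 == "HEARTBEAT_IP" || $1 == "STATIONARY_IP" {
            print iface, $1, $2
            iface = ""
        }
        END { emit_pending() }
    ' "$1"
}

# Demo on the excerpt from the first post (comment lines left out).
sample=$(mktemp)
cat > "$sample" <<'EOF'
NODE_NAME fns1
NETWORK_INTERFACE lan5
HEARTBEAT_IP 172.18.1.31
NETWORK_INTERFACE lan1
NETWORK_INTERFACE lan4
HEARTBEAT_IP 30.30.30.1
NETWORK_INTERFACE lan8
HEARTBEAT_IP 20.20.20.1
EOF
list_interfaces "$sample"
rm -f "$sample"
```

On the real system the call would be list_interfaces /etc/cmcluster/cmclconfig.ascii. For the excerpt it reports lan5, lan4 and lan8 as HEARTBEAT_IP and lan1 as no-ip, so heartbeat is already configured on three subnets; what it cannot show is whether those LANs share a switch.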


Reply nissan350z 5/19/2004 8:38:16 AM


Hi,

Thanks for reply.

What do you recommend I check first? I want to find out why the cluster
failed over when the main switch went down. The cluster is using 3 interfaces
for heartbeat; I checked recently and ping worked fine.

Thanks for any help.

Álvaro.


Reply Alvaro 5/19/2004 12:36:45 PM

"Alvaro Miranda" <acmirand@puc.cl> writes:

> Hi there,
> 
> What if the HP guy who configured my company's HP-UX Itanium 2 machines
> (HP-UX 11i v2, 11.23) mistakenly used the same LAN that carries the regular
> network traffic for the first heartbeat card? Is that too bad?
> 
> We are moving some network links these days, and on Friday we had a very
> heavy broadcast storm for about 1 hour. The main LAN switch went down, and
> the cluster failed.

Welcome to the club! ;-)

Actually: do you have two independent LANs to connect your cluster? We
do, and when one LAN goes mad (broadcast storm or netbuilder
module rebooting), the cluster won't go down, but packages that depend
on both LANs will be shut down. That is the way it's intended to be.

The only feature we would require is a "grace period" after a LAN
failure: A package that takes several minutes to shut down and restart
should not be switched if the LAN is down only for 10 seconds. HP told
us this is not possible at the moment, and it doesn't seem to be
planned for a future version.

> 
> When I noticed the Oracle DB was down, I did the usual things. I connected
> first to the primary node: the cluster was up and running, but there was
> no

So your cluster wasn't down.

> package running. I started the package and it failed, because the lvol on
> FC was mounted on the secondary node. I shut down the package (just in
> case), unmounted the lvol, and restarted the package.

See the package logs in /etc/cmcluster (or similar), and syslog.

> 
> I should mention first that I'm a newbie on HP-UX. I was chosen for this job
> to keep an eye on the machines; currently I work only with Linux, so this is
> my first time with a 'real' Unix.

ServiceGuard definitely isn't for newbies.

> 
> Where can I start looking? I checked the cluster configuration and noticed
> the heartbeat on the LAN connected to the switch.

Package configuration, not cluster configuration.

Regards,
Ulrich

Reply Ulrich 5/19/2004 12:57:35 PM

> What do you recommend I check first? I want to find out why the cluster
> failed over when the main switch went down. The cluster is using 3

The reason for your fail-over event is logged in /var/adm/syslog/syslog.log
and OLDsyslog.log.
Also check /etc/cmcluster/pkg_name/*logfile.
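
As a rough sketch of that first check, the cluster daemon's messages can be pulled from both syslog files in one go. The helper name cmcld_messages is invented here, and the only assumption about the log format is that the relevant lines contain the string "cmcld".

```shell
# Sketch: print lines mentioning cmcld from each given log file, in the order
# the files are listed. Unreadable or missing files are skipped.
# cmcld_messages is a hypothetical helper name.
cmcld_messages() {
    for f in "$@"; do
        if [ -r "$f" ]; then
            # grep exits non-zero when nothing matches; that is not an error here.
            grep 'cmcld' "$f" || true
        fi
    done
}

# Older log first, so the output reads roughly in time order.
cmcld_messages /var/adm/syslog/OLDsyslog.log /var/adm/syslog/syslog.log
```

Piping the result through more, or through a second grep for the date of the outage, keeps the output manageable.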

Usually you need to increase the default timeout (only 3 sec). Here at my
work, we have set it to 30 sec.
You can verify your settings with cmgetconf -v.
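
To read the timing values out of a saved cluster ASCII file, something like the following may do. It is a sketch under two assumptions that should be checked against your own file: that the parameters are named HEARTBEAT_INTERVAL and NODE_TIMEOUT, and that their values are in microseconds. The function name show_timing is made up.

```shell
# Sketch: print the heartbeat timing parameters from a cluster ASCII file,
# converted from microseconds (assumed unit) to seconds.
# show_timing is a hypothetical helper name.
#
# Possible usage on a cluster node, assuming cmgetconf writes the cluster
# configuration to the file named as its last argument:
#   cmgetconf -v /tmp/cluster.ascii
#   show_timing /tmp/cluster.ascii
show_timing() {
    awk '
        $1 == "HEARTBEAT_INTERVAL" || $1 == "NODE_TIMEOUT" {
            print $1, $2 / 1000000, "s"
        }
    ' "$1"
}
```

With the 30 sec setting mentioned above, the NODE_TIMEOUT line in the file would read 30000000 and the function would print "NODE_TIMEOUT 30 s".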

Also, when you reconnect a LAN cable during your move, give the cluster time
to recognize it before removing the next one.

A broadcast storm can easily cause a node to reboot unless a heartbeat
LAN is still alive.




Reply Nissan 5/19/2004 2:43:05 PM

"Alvaro Miranda" <acmirand@puc.cl> writes:

> Hi,
> 
> Thanks for reply.
> 
> What do you recommend I check first? I want to find out why the cluster
> failed over when the main switch went down. The cluster is using 3
> interfaces for heartbeat; I checked recently and ping worked fine.

more $(find /etc/cmcluster -name '*.log')
grep cmcld: /var/adm/syslog/syslog.log
cmviewcl -v

Regards,
Ulrich
Reply Ulrich 5/25/2004 1:47:48 PM
