Hi all. I've got a Netra 440 with IPMP set up (config below). Every so
often I get messages that the interfaces have failed and that the group
has failed. The messages come only from IPMP; I don't get anything from
the physical layer saying there is a problem with the connection. Also,
I have other connections on this server going to the same switch, and
those appear to be fine.

I am doing something I think is a bit unorthodox. The multipathing is
over a 16-bit subnet (netmask ffff0000) with no router. The two
individual addresses are 10.20.10.61 and 10.20.11.61, and they share
10.20.10.60. Could that be the cause of my problem? I've never seen
that done before, but I don't really know why it couldn't work.

I'm running Solaris 8. I've read that 108528 is a necessary patch for an
issue similar to this, and I'm currently running 108528-22.

Thanks.

ce1: flags=78040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,NOFAILOVER,FAILED,STANDBY,INACTIVE> mtu 1500
        inet 10.20.10.61 netmask ffff0000 broadcast 10.20.255.255
        groupname ipmp
        ether 0:14:4f:25:dd:f8
ce5: flags=18040843<UP,BROADCAST,RUNNING,MULTICAST,DEPRECATED,NOFAILOVER,FAILED> mtu 1500
        inet 10.20.11.61 netmask ffff0000 broadcast 10.20.255.255
        groupname ipmp
        ether 0:14:4f:25:dc:c4
ce5:1: flags=10000843<UP,BROADCAST,RUNNING,MULTICAST,FAILED> mtu 1500
        inet 10.20.10.60 netmask ffff0000 broadcast 10.20.255.255
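
For anyone reading the flags words in the output above, here is a minimal Python sketch that decodes them. The bit values are inferred from the hex words and flag names shown in this post (they match the Solaris IFF_* layout as far as I know, but treat them as assumptions), and the "test address" / "data address" labels are my reading of NOFAILOVER, not something ifconfig prints.

```python
# Bit values inferred from the ifconfig output in the post; assumptions,
# not copied from a Solaris header.
IFF_BITS = {
    0x00000001: "UP",
    0x00000002: "BROADCAST",
    0x00000040: "RUNNING",
    0x00000800: "MULTICAST",
    0x00040000: "DEPRECATED",
    0x08000000: "NOFAILOVER",
    0x10000000: "FAILED",
    0x20000000: "STANDBY",
    0x40000000: "INACTIVE",
}


def decode_flags(hex_flags):
    """Return the flag names set in an ifconfig flags word, lowest bit first."""
    value = int(hex_flags, 16)
    return [name for bit, name in sorted(IFF_BITS.items()) if value & bit]


# The three flags words from the post.
for ifname, flags in (("ce1", "78040843"), ("ce5", "18040843"), ("ce5:1", "10000843")):
    names = decode_flags(flags)
    # Assumption: NOFAILOVER marks an IPMP test address, anything else is data.
    role = "test address" if "NOFAILOVER" in names else "data address"
    print(f"{ifname:6} {role:13} {','.join(names)}")
```

Read this way, all three addresses carry FAILED, which fits the "group failed" messages: no interface in the group is currently considered usable.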
bozothedeathmachine16 (49), 4/17/2007 9:08:26 PM

Probe-based IPMP requires ping partners, which by default are the
default routers. A ping partner must reply consistently to the ICMP
requests from in.mpathd. If several consecutive requests go unanswered,
the interface is considered failed. You can specify ping partners by
defining static routes.
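
A rough sketch of the "consecutive requests without replies" rule described above, in Python. This is an illustration only, not in.mpathd's code: the function name is made up, and the threshold of 5 consecutive misses is an assumption chosen for the example.

```python
def first_failure(replies, max_misses=5):
    """Return the index of the probe at which the interface would be
    declared failed, or None if it never is.

    replies    -- iterable of booleans, True when the ping partner answered
    max_misses -- consecutive unanswered probes tolerated (assumed value)
    """
    misses = 0
    for index, got_reply in enumerate(replies):
        # Any reply resets the counter; only an unbroken run counts.
        misses = 0 if got_reply else misses + 1
        if misses >= max_misses:
            return index
    return None


# Scattered losses never trip the detector...
print(first_failure([True, False] * 10))
# ...but a silent ping partner does, even when the link itself is fine.
print(first_failure([True] + [False] * 5))
```

The point of the sketch is that a target which stops answering looks exactly like a dead interface to the prober, which is why nothing shows up at the physical layer.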
Adam, 4/18/2007 5:04:58 AM

On Apr 18, 12:04 am, Adam Sanders <sanders.a...@gmail.com> wrote:
> Probe based IPMP requires ping partners, which by default is the
> default router. The ping partner must reply consistently to ICMP
> requests from mpathd. If there are consecutive requests without
> replies, it is considered failed. You can specify ping partners by
> defining static routes.
I did a "route add net 10.20.0.0/16 10.20.10.50" on both machines.
Let's see if it works. Thanks.
bozothedeathmachine, 4/18/2007 12:59:31 PM

On 2007-04-18, bozothedeathmachine <bozothedeathmachine@gmail.com> wrote:
> I did a "route add net 10.20.0.0/16 10.20.10.50" on both machines.
> Let's see if it works. Thanks.
I *think* that they need to be static host routes, so if what you've
done doesn't work, try that (and let me know either way if you don't
mind).
Cheers,
Ceri
--
That must be wonderful! I don't understand it at all.
-- Moliere
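
To make the host-route suggestion concrete, here is a small Python sketch that prints one host route per probe target after checking that the target sits on the IPMP subnet. The subnet 10.20.0.0/16 and the target 10.20.10.50 come from this thread; the `route add -host <target> <target> -static` form is the one I recall from Sun's IPMP documentation, so treat it as an assumption and check route(1M) on your release before running anything.

```python
import ipaddress


def host_route_commands(targets, subnet="10.20.0.0/16"):
    """Build one static host-route command per probe target.

    Raises ValueError if a target is not on the given subnet, since an
    off-link target could not be reached without a router.
    """
    net = ipaddress.ip_network(subnet)
    commands = []
    for target in targets:
        addr = ipaddress.ip_address(target)
        if addr not in net:
            raise ValueError(f"{target} is not on {subnet}")
        # Assumed command form; destination and gateway are the same host.
        commands.append(f"route add -host {addr} {addr} -static")
    return commands


for command in host_route_commands(["10.20.10.50"]):
    print(command)
```

The same membership check also shows that 10.20.10.61, 10.20.11.61 and 10.20.10.60 all fall inside 10.20.0.0/16, so the addressing itself is consistent with a single on-link subnet.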
Ceri, 4/18/2007 5:11:15 PM

Ceri Davies wrote:
> On 2007-04-18, bozothedeathmachine <bozothedeathmachine@gmail.com> wrote:
>> I did a "route add net 10.20.0.0/16 10.20.10.50" on both machines.
>> Let's see if it works. Thanks.
>
> I *think* that they need to be static host routes, so if what you've
> done doesn't work, try that (and let me know either way if you don't
> mind).
In the past, I have dealt with default routes that would fail over in
case one of the routers went down. I was unable to get a guarantee from
the network group that the failover would happen within the default
mpathd failure detection time of 10 seconds.

To get around the situation, like you suggested, I added static host
routes to the NAS head in that subnet, so if the routers were failing
over, the IPMP interfaces always had a fallback host route that they
could ping. The likelihood of the routers failing over and the NAS being
unavailable at the same time was considered low enough for this solution
to be deemed acceptable.

The OP could also just use link-based failure detection rather than
probe-based detection
(http://docs.sun.com/app/docs/doc/816-4554/6maoq027k?a=view#emqra), thus
not requiring the use of static host routes.

Kind Regards,
Nathan Dietsch
Nathan, 4/22/2007 11:28:53 AM

I'm getting all sorts of errors in the messages log on one of the
servers (below). I'm not quite sure whether adding the static route
worked.

Could those errors be indicative of a hardware problem?

Thanks again,
Ben

Apr 23 11:56:58 utspptslee1 in.mpathd[38]: [ID 122137 daemon.error] Improved failure detection time 23646 ms
Apr 23 11:56:59 utspptslee1 in.mpathd[38]: [ID 122137 daemon.error] Improved failure detection time 11823 ms
Apr 23 11:56:59 utspptslee1 in.mpathd[38]: [ID 122137 daemon.error] Improved failure detection time 10000 ms
Apr 23 12:01:59 utspptslee1 in.mpathd[38]: [ID 398532 daemon.error] Cannot meet requested failure detection time of 10000 ms on (inet ce1) new failure detection time is 63042 ms
Apr 23 12:03:00 utspptslee1 in.mpathd[38]: [ID 122137 daemon.error] Improved failure detection time 31521 ms
Apr 23 12:03:00 utspptslee1 in.mpathd[38]: [ID 122137 daemon.error] Improved failure detection time 15760 ms
Apr 23 12:03:00 utspptslee1 in.mpathd[38]: [ID 122137 daemon.error] Improved failure detection time 10000 ms
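
As I read in.mpathd(1M), "Cannot meet requested failure detection time" means the probe round trips were too slow for the requested value, so the daemon raised it, and "Improved failure detection time" means it is stepping back down. The numbers in the log above look like repeated halving, floored at the requested 10000 ms. The Python sketch below only reproduces that pattern from the logged values; it is not taken from the daemon's source, and the function name is made up.

```python
def recovery_steps(current_ms, requested_ms=10000):
    """List the values the detection time passes through when it is halved
    (integer division) until it reaches the requested value again."""
    steps = []
    while current_ms > requested_ms:
        # Halve, but never go below what the administrator asked for.
        current_ms = max(requested_ms, current_ms // 2)
        steps.append(current_ms)
    return steps


print(recovery_steps(63042))  # values logged between 12:01:59 and 12:03:00
print(recovery_steps(23646))  # values logged after the first line at 11:56:58
```

If that reading is right, these lines point at slow or missing probe replies rather than at the hardware as such.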
bozothedeathmachine, 4/23/2007 6:17:18 PM

In article <1177352237.982077.310430@y5g2000hsa.googlegroups.com>,
bozothedeathmachine <bozothedeathmachine@gmail.com> wrote:
>Could those errors be indicative of a hardware problem?
Silly question, but: does the router you added respond to pings, or does
any device on your segment respond to broadcast pings? Is the system
you're having issues with relatively idle on the network? I've noticed
issues like the ones you describe mostly on systems that carry little
traffic on a given IPMP group.
-tom
--
"You can only be -so- accurate with a claw-hammer." --me
ferric, 4/26/2007 7:26:00 PM