software RAID: how do I know which HD has failed?


Hi, I got a message that one of the 3 drives in my RAID 5 array has
failed. Specifically,

*************(from webmin alert) 

RAID device options
Device file             /dev/md3
RAID level              Redundant (RAID5)
Status                  Active and mounted on /
Persistent superblock?  Yes
Parity algorithm        left-symmetric
Chunk size              32 kB
Partitions in RAID      IDE device A partition 3 (Down)
                        IDE device E partition 3
                        IDE device G partition 3

1) How do I know which hard drive inside my server is hard drive A?
(If I replace the wrong hard drive, doesn't that screw up the RAID
array permanently?)

2) I actually have 3 different arrays: md1 (/boot, RAID 1), md3 (/,
RAID 5, the problem array) and md4 (/home, RAID 5). Curiously, only
hda3 on md3 seems to be down. Why would only one of the partitions
on hda be down? (Wouldn't it affect the whole disk?) Does that imply
that even in the event of catastrophic failure, md4 (/home
directories) would be perfectly safe?


Robert Nagle, idiotprogrammer, Texas
http://www.imaginaryplanet.net/weblogs/idiotprogrammer/index.php



 # cat /proc/mdstat
Personalities : [raid1] [raid5]
read_ahead 1024 sectors
md1 : active raid1 ide/host2/bus1/target0/lun0/part1[2]
ide/host2/bus0/target0/lun0/part1[1]
ide/host0/bus0/target0/lun0/part1[0]
      96256 blocks [2/2] [UU]

md3 : active raid5 ide/host2/bus1/target0/lun0/part3[2]
ide/host2/bus0/target0/lun0/part3[1]
      58588800 blocks level 5, 32k chunk, algorithm 2 [3/2] [_UU]

md4 : active raid5 ide/host2/bus1/target0/lun0/part4[2]
ide/host2/bus0/target0/lun0/part4[1]
ide/host0/bus0/target0/lun0/part4[0]
      55512448 blocks level 5, 32k chunk, algorithm 2 [3/3] [UUU]




www root # lsraid -a /dev/md3
lsraid: Unable to allocate memory while querying md device
[dev   9,   3] /dev/md3: Cannot allocate memory




Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md3              58587008   7710480  50876528  14% /
tmpfs                 58587008   7710480  50876528  14% /var/lib/init.d
/dev/md4              55510748  53822932   1687816  97% /home
none                    451716         0    451716   0% /dev/shm

/dev/md3 on / type reiserfs (rw,notail)
proc on /proc type proc (rw)
none on /dev type devfs (rw)
tmpfs on /var/lib/init.d type tmpfs (rw,mode=0644,size=2048k)
/dev/md4 on /home type reiserfs (rw,notail)
none on /dev/shm type tmpfs (rw)


#raid5 for 3 60gig hd
#/boot
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 1
chunk-size 32 
persistent-superblock 1
device /dev/hda1
raid-disk 0
device /dev/hde1
raid-disk 1
device /dev/hdg1
spare-disk 0

#note: swap is on hda2, hde2, hdg2
#/ partition
raiddev /dev/md3
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
chunk-size 32
persistent-superblock 1
parity-algorithm left-symmetric
device /dev/hda3
raid-disk 0
device /dev/hde3
raid-disk 1
device /dev/hdg3
raid-disk 2

# /home partition raid5
raiddev /dev/md4
raid-level 5
nr-raid-disks 3
nr-spare-disks 0
persistent-superblock 1
parity-algorithm left-symmetric
chunk-size 32
device /dev/hda4
raid-disk 0
device /dev/hde4
raid-disk 1
device /dev/hdg4
raid-disk 2
Reply idiotprogrammer (15) 12/11/2003 6:43:40 PM

idiotprogrammer@yahoo.com (Robert Nagle) writes:

[...]

>Partitions in RAID IDE device A partition 3 (Down) 

[...]

That would be /dev/hda3, most probably.
Your /etc/raidtab file shows that /dev/md3 consists of
/dev/hda3, /dev/hde3 and /dev/hdg3. /proc/mdstat
shows [_UU], which is consistent with the output
from webmin: the first partition of the array,
/dev/hda3, has failed.
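
The reasoning above (match the position of the "_" in the mdstat status flags against the raid-disk order in the raidtab) can be sketched in Python. This is only an illustration, assuming the flags are listed in raid-disk order as described; the function name failed_slots and the list md3_devices are made up for the example, and the device list is copied by hand from the /dev/md3 stanza of the posted raidtab.

```python
import re

def failed_slots(status_line):
    """Return the zero-based raid-disk slots marked down ('_') in an mdstat status line."""
    # The last bracketed group holds one flag per slot: 'U' is up, '_' is down.
    match = re.search(r"\[([U_]+)\]\s*$", status_line)
    if match is None:
        return []
    return [i for i, flag in enumerate(match.group(1)) if flag == "_"]

# raid-disk order copied by hand from the /dev/md3 stanza of the raidtab
md3_devices = ["/dev/hda3", "/dev/hde3", "/dev/hdg3"]

line = "58588800 blocks level 5, 32k chunk, algorithm 2 [3/2] [_UU]"
for slot in failed_slots(line):
    print(slot, md3_devices[slot])
```

For the md3 line posted above this reports slot 0, which is /dev/hda3 in the raidtab.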


>2)I have actually 3 different arrays: md1 (boot--raid1), md3 (/ raid
>5, the problem RAID) and md4 (/home raid5). Curiously, only hda3 on
>md3 seems to be down. 

That's how it starts.

>Why would only one of the partitions
>on hda be down? (Wouldn't it affect the whole disk?). 

It will, sooner or later. Your /dev/hda drive is dying,
but read/write operations on the other partitions are
still unaffected. I just had the same symptoms on a
RAID1 setup: at first it would simply exclude a single
partition, but some days later the whole disk became
unreadable. There was no data loss, since a replacement
drive was already in place, but it was an interesting
experience.

>Does that imply
>that even in the event of catastrophic failure, md4 (/home
>directories) would be perfectly safe?)

See above. Not for very much longer.

Michael

-- 
Michael Buchenrieder * mibu@scrum.greenie.muc.de * http://www.muc.de/~mibu
          Lumber Cartel Unit #456 (TINLC) & Official Netscum
    Note: If you want me to send you email, don't munge your address.
Reply mibu (101) 12/12/2003 6:17:22 AM


Now a stupid question, how do I know which is HDA and which is HDE?  

I have identical drives in hda and hde, and those two are connected to
IDE. (the other is connected to a PCI IDE).

I was told that if one drive is already down and you remove another
one, you make the RAID array permanently unusable.

That's why I want to be 100% sure.

Are there any logs or commands I can run for better information?


rj 
Robert Nagle, Houston, Texas
www.idiotprogrammer.com
Reply idiotprogrammer (15) 12/12/2003 8:35:41 PM

On 12 Dec 2003 12:35:41 -0800, Robert Nagle <idiotprogrammer@yahoo.com> wrote:
> Now a stupid question, how do I know which is HDA and which is HDE?  

hda: first IDE channel, master
hdb: first IDE channel, slave
hdc: second IDE channel, master
hdd: second IDE channel, slave
hde: third IDE channel, master
hdf: third IDE channel, slave
etc.
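
The naming rule in the list above can be written down as a small Python sketch. It only restates the convention (two drive letters per channel, master before slave); the function name ide_device_name and the 0-based channel numbering are choices made for this example, and the result still has to be matched to a physical drive by following the cable to the connector it is plugged into.

```python
def ide_device_name(channel, position):
    """Return the legacy hdX name for a 0-based IDE channel and 'master' or 'slave'."""
    offsets = {"master": 0, "slave": 1}
    if position not in offsets:
        raise ValueError("position must be 'master' or 'slave'")
    if not 0 <= channel <= 12:
        raise ValueError("channel out of range for single-letter hdX names")
    # Each channel carries two drives, so the letter advances by two per channel.
    return "hd" + chr(ord("a") + 2 * channel + offsets[position])

# The three drives named in the raidtab are all masters.
for channel in (0, 2, 3):
    print(channel, ide_device_name(channel, "master"))
```

Under this rule hda, hde and hdg are the masters on the first, third and fourth channels.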



> 
> I have identical drives in hda and hde, and those two are connected to
> IDE. (the other is connected to a PCI IDE).
> 
> I was told that if one drive is already down and you remove another
> one, you make the RAID array permanently unusable.
> 
> That's why I want to be 100% sure.
> 
> Are there any logs or commands I can run for better information?
> 
> 
> rj 
> Robert Nagle, Houston, Texas
> www.idiotprogrammer.com
Reply TCS 12/13/2003 11:19:00 PM

On 12 Dec 2003 12:35:41 -0800, Robert Nagle <idiotprogrammer@yahoo.com> wrote:
> Now a stupid question, how do I know which is HDA and which is HDE?  
> 

also:  /proc/ide/hd*/model

Reply TCS 12/13/2003 11:20:26 PM

TCS <The-Central-Scrutinizer@p.o.b.o.x.com> writes:

> On 12 Dec 2003 12:35:41 -0800, Robert Nagle <idiotprogrammer@yahoo.com> wrote:
>> Now a stupid question, how do I know which is HDA and which is HDE?  
>> 
>
> also:  /proc/ide/hd*/model

That won't help if there are many of the same model, which is rather
common with RAID.

-- 
Måns Rullgård
mru@kth.se
Reply mru6 (328) 12/13/2003 11:21:42 PM

On Sun, 14 Dec 2003 00:21:42 +0100, Måns Rullgård <mru@kth.se> wrote:
> TCS <The-Central-Scrutinizer@p.o.b.o.x.com> writes:
> 
>> On 12 Dec 2003 12:35:41 -0800, Robert Nagle <idiotprogrammer@yahoo.com> wrote:
>>> Now a stupid question, how do I know which is HDA and which is HDE?  
>>> 
>>
>> also:  /proc/ide/hd*/model
> 
> That won't help if there are many of the same model, which is rather
> common with RAID.
> 
You can still peruse the /proc/ide tree to see which drive letters
are assigned to which controllers.
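
One way to do that walk is sketched below in Python. It assumes the layout of the old IDE driver, where each /proc/ide/hdX entry is a symlink into its interface directory (for example hda pointing at ide0/hda); that layout is an assumption about the running kernel, and the function name drive_interfaces is made up for the example. On a system without /proc/ide it simply returns nothing.

```python
import os

def drive_interfaces(proc_ide="/proc/ide"):
    """Return {drive: interface} by resolving the hdX symlinks under proc_ide."""
    mapping = {}
    if not os.path.isdir(proc_ide):
        return mapping  # no legacy IDE driver tree on this system
    for entry in sorted(os.listdir(proc_ide)):
        path = os.path.join(proc_ide, entry)
        if entry.startswith("hd") and os.path.islink(path):
            # A link target such as "ide0/hda" means hda sits on interface ide0.
            mapping[entry] = os.path.dirname(os.readlink(path))
    return mapping

for drive, interface in drive_interfaces().items():
    print(drive, "->", interface)
```

When the drives are the same model, the serial number is the more reliable identifier: it is printed on the drive label and is reported by `hdparm -i /dev/hda`, so it can be compared before anything is unplugged.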


Reply TCS 12/14/2003 12:20:17 AM
