What to expect with out-of-sync RAID devices?

Let a RAID-1 consist of two devices, set for autodetect (type 
'fd'). Assume that they get out-of-sync, for instance when the 
system is booted with only one of the devices connected, and then 
the other device is written to. Now the system is booted with 
both devices connected again. Then a degraded array is assembled 
at boot time. So much I found out in experiments.

The question remains _which_ of the two devices is chosen for
the degraded array. I observed different behavior, which seems
partly systematic and partly random. Maybe someone can explain
what the general principle is.
Reply Snyder 1/30/2010 10:18:23 PM

On Jan 30, 5:18 pm, "Snyder" <inva...@invalid.invalid> wrote:
> Let a RAID-1 consist of two devices, set for autodetect (type
> 'fd'). Assume that they get out-of-sync, for instance when the
> system is booted with only one of the devices connected, and then
> the other device is written to. Now the system is booted with
> both devices connected again. Then a degraded array is assembled
> at boot time. So much I found out in experiments.
>
> The question remains _which_ of the two devices is chosen for
> the degraded array. I observed different behavior, which seems
> partly systematic and partly random. Maybe someone can explain
> what the general principle is.

If the RAID1 is configured correctly, it should never write to the
"degraded" part of the array. This is one of the tricky parts of
software RAID: it still allows direct access to that part of the array
from the normal operating system tools. If you corrupt it behind the
back of software RAID, well, re-assembling it is going to be a problem.
Normally that "disconnected" drive would be marked as out of sync at
boot time, and restoring the array would cause the active disk to be
mirrored to the second disk. That's why restoring the array takes so
long: it has to read all of one disk, and verify and potentially write
to all of the second one. But that kind of problem is inevitable if
you have removable drives in RAID1, such as USB drives.

Why did the drive go offline, and when? And are you using software or
hardware RAID? How did the second drive get written to?
Reply Nico 1/31/2010 2:45:48 PM


Nico Kadel-Garcia <nkadel@gmail.com> writes:
 
> If the RAID1 is configured correctly, it should never write to the
> "degraded" part of the array. This is one of the tricky parts of
> software RAID: it still allows direct access to that part of the array
> from the normal operating system tools. If you corrupt it behind the
> back of software RAID, well, re-assembling it is going to be a problem.
> Normally that "disconnected" drive would be marked as out of sync at
> boot time, and restoring the array would cause the active disk to be
> mirrored to the second disk. That's why restoring the array takes so
> long: it has to read all of one disk, and verify and potentially write
> to all of the second one. But that kind of problem is inevitable if
> you have removable drives in RAID1, such as USB drives.
 
> Why did the drive go offline, and when? And are you using software or
> hardware RAID? How did the second drive get written to?

I am using software RAID on two USB drives. I know that
re-syncing can take ages, but I am prepared for this. What I must
prevent is that one drive gets written to and then the other one
gets written to as well, so that divergent versions emerge and
neither of the two drives is the "old" one which can be safely
overwritten with a mirror of the "current" one.

In other words: At each point in time, both drives must have the 
same content, or one of them must have only obsolete content.

Let's call the drives A and B. Assume that I remove drive B by
pulling the USB plug. Then I do "touch current-drive" to mark
the remaining drive. Then I shut down the system, re-connect drive
B and boot again. In all my experiments, this led to a degraded
array being assembled with partitions from drive A. So far this
is what I needed. I can then re-add the partitions from drive B
with something like "mdadm /dev/mdX -a /dev/sdXX".
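
The re-add step can be guarded so that it only runs when the array
really is degraded. This is a rough sketch under assumptions: the
helper name is_degraded is made up for illustration, it relies on the
usual /proc/mdstat layout in which a status field such as [U_] marks a
missing member with an underscore, and mdX / sdXX are placeholders.

```shell
# Succeed if array $1 (e.g. "md0") shows a missing member in
# /proc/mdstat-style text read on stdin, i.e. its status field
# (such as [U_]) contains an underscore.
is_degraded() {
    awk -v md="$1" '
        $1 == md && $2 == ":" { in_array = 1; next }  # header line of our array
        in_array && /\[[U_]+\]/ {                      # status line of that array
            if ($0 ~ /\[[U_]*_[U_]*\]/) found = 1
            exit
        }
        in_array && NF == 0 { exit }                   # blank line ends the entry
        END { exit(found ? 0 : 1) }
    '
}

# Usage (as root; mdX and sdXX are placeholders, not real names):
#   if is_degraded mdX < /proc/mdstat; then
#       mdadm /dev/mdX -a /dev/sdXX
#   fi
```

The check only says that a member is missing; it does not say whether
the member that was kept is the one with the current data.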

However, I also did the following experiment: after pulling the
plug on B, writing the file "current-drive" to A and finally
shutting down, I booted with only B connected. The system came up
and did its fsck (as expected, since the filesystems on B were
not cleanly unmounted before). I then shut the system down,
re-connected drive A and booted again.

In some cases, drive A was used to build the degraded array, and
in some cases drive B was used. I did not detect a pattern here.
This is not very reassuring. One must keep in mind that this
series of events may also occur unprovoked: just think of an
unreliable USB hub.

You wrote that the "disconnected" drive would be marked as out of
sync at boot time. I presume the result looks like this:

md: kicking non-fresh sda1 from array!

But by what criteria is a drive categorized as "non-fresh"?
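
A minimal sketch for investigating this, assuming the deciding
criterion is the event counter that md stores in each member's
superblock, with the member holding the lower count being the one that
is kicked. That is an assumption here, not something established in
this thread. The helper names events_of and fresher are made up, the
sketch assumes "mdadm --examine" prints a line of the form
"Events : N" (or "Events : 0.N" for older versions), and sdX1 / sdY1
are placeholders.

```shell
# Print the event counter found in `mdadm --examine` output read on stdin.
# Assumption: the output has a line "Events : N"; older mdadm versions may
# print "Events : 0.N", in which case only the part after the last dot is used.
events_of() {
    awk -F: '/^[ \t]*Events[ \t]*:/ {
        gsub(/[ \t]/, "", $2)        # strip padding around the value
        n = split($2, parts, ".")    # handle both "N" and "0.N"
        print parts[n]
        exit
    }'
}

# Name the member with the higher event count, or "equal" if they match.
# usage: fresher NAME_A EVENTS_A NAME_B EVENTS_B
fresher() {
    if [ "$2" -gt "$4" ]; then
        echo "$1"
    elif [ "$4" -gt "$2" ]; then
        echo "$3"
    else
        echo "equal"
    fi
}

# Usage (as root; sdX1 and sdY1 are placeholders for the real members):
#   a=$(mdadm --examine /dev/sdX1 | events_of)
#   b=$(mdadm --examine /dev/sdY1 | events_of)
#   fresher sdX1 "$a" sdY1 "$b"
```

If this assumption holds, it might also account for the seemingly
random choice in the second experiment: booting with only B connected
would advance B's counter as well, so either drive could end up with
the higher value.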
Reply Snyder 1/31/2010 3:29:36 PM
