What happens if raid gets broken?

Heinz Mauelshagen mauelshagen at redhat.com
Tue Mar 29 15:21:56 UTC 2005


On Fri, Mar 25, 2005 at 02:12:33AM -0800, Molle Bestefich wrote:
> Roland wrote:
> > What happens if, for example, one disk of a RAID mirror breaks or develops bad sectors?
> > Will there be an appropriate error message?
> 
> Probably up to the device mapper mirror target (dm-mirror.(k)o), not
> dmraid.  Hopefully you will get an error message in the syslog, which
> you can use your favorite syslog daemon to direct to somewhere useful.
>  Not sure what HighPoint's proprietary driver does.  I think it tries
> to write the faulted sector back from the working drive immediately,
> thus freezing the operating system meanwhile.  All personal guesswork,
> though.  If you can't get an answer here, you could try the
> device-mapper mailing list.

That's basically right.
The mirror target in device-mapper needs some enhancements still
(ie, status reports on io failure, read balancing) and dmraid needs
a daemon, which monitors device failures at the device-mapper status
interface and handles those (eg, fire up a spare drive and rebuild the
set; try to store status information about the failure in the vendor
specific metadata).
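
To illustrate the monitoring idea, here is a minimal Python sketch of the parsing step such a daemon might perform on a mirror status line. It assumes the status layout that later device-mapper kernels print for the mirror target (device count, devices, in-sync/total regions, then one health character per device, with "A" meaning alive); that layout is an assumption here, not something the mirror target reports at the time of this message. The names MirrorStatus and parse_mirror_status are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MirrorStatus:
    devices: List[str]      # major:minor of each mirror leg
    in_sync_regions: int
    total_regions: int
    health: str             # one character per device, "A" = alive (assumed)

    def failed_devices(self) -> List[str]:
        # Any health character other than "A" is treated as a failure.
        return [d for d, h in zip(self.devices, self.health) if h != "A"]


def parse_mirror_status(line: str) -> MirrorStatus:
    """Parse one status line for a single mirror device.

    Assumed layout (no leading device name):
      <start> <length> mirror <ndevs> <dev>... <in_sync>/<total> 1 <health> ...
    """
    fields = line.split()
    if len(fields) < 4 or fields[2] != "mirror":
        raise ValueError("not a mirror target status line")
    ndevs = int(fields[3])
    devices = fields[4:4 + ndevs]
    rest = fields[4 + ndevs:]
    if len(devices) != ndevs or len(rest) < 3:
        raise ValueError("truncated mirror status line")
    in_sync, total = (int(x) for x in rest[0].split("/"))
    health = rest[2]
    if len(health) != ndevs:
        raise ValueError("health field does not match device count")
    return MirrorStatus(devices, in_sync, total, health)
```

With the made-up sample line "0 409600 mirror 2 8:16 8:32 400/400 1 AD 1 core", failed_devices() returns ['8:32']. A real daemon would poll or wait for device-mapper events and then act on the result (spare activation, metadata update); none of that is shown here.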

> 
> > and will the metadata be changed?

Not yet. The dmraid daemon I'm working on will do that.

> Probably ought to be, but I think it won't.
> AFAICT, dmraid currently only tells the device-mapper how to assemble
> RAID arrays, it doesn't stay alive in any way in order to reflect
> drive status to array metadata or such.

Correct for now.
I wanted to give users access to their data in the first development
step and add monitoring in a second one, which I'm working on now.

> And I'm pretty sure that
> dm-mirror doesn't do it.  As I remember it, dmraid comes with good
> concise documentation, should be mentioned there.
> 
> > I am running dmraid with a hpt37x mirror (raid 1) on 2.6.10 debian amd64.
> 
> > When I copy some large files onto the raid, my computer "freezes" and I don't
> > get any message in syslog or dmesg. I load dmraid in verbose mode and also
> > have enabled debug symbols but don't see any
> > error message.
> 
> > What's wrong?
> Not sure.  I've seen the exact same thing happen with HPT37x's with
> proprietary drivers, so perhaps it's a hardware kink that occurs under
> specific circumstances.  Then again, maybe it's not, I've also seen
> numerous bugs in the Linux IDE layer.

And there are recent bug fixes to device-mapper as well.

> 
> > Is this a problem of the device mapper?
> Could be.  That or the HighPoint driver.  How reproducible is the
> problem?  If you have a backup or your data is expendable, you could
> try running parallel dd's to write out a large amount of data to each
> drive in parallel.  If it still freezes, it's not the device-mapper
> ;-).

Yes, move layers out of the test configuration and nail the drives,
interfaces, cables, ...
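
The parallel-write test suggested above could be sketched roughly like this in Python, as a stand-in for running several dd processes by hand. The function names are hypothetical, and the targets are whatever paths you pass in. Pointing this at a raw drive overwrites it, so only do that with expendable data, as noted above.

```python
import os
import threading


def write_pattern(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of a fixed pattern to path and return bytes written."""
    block = b"\xa5" * block_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[: min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the page cache
    return written


def parallel_write(paths, total_bytes):
    """Write to all paths at once; map each path to a byte count or an OSError."""
    results = {}

    def worker(p):
        try:
            results[p] = write_pattern(p, total_bytes)
        except OSError as exc:
            results[p] = exc

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Running it against the underlying drives directly, with no device-mapper table loaded on top of them, takes the mirror target out of the picture: a freeze that still happens then points at the driver, controller, cabling or the drives themselves.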

> 
> > Any idea?
>   Try upgrading to kernel 2.6.11, and upgrade the device-mapper too.
>   I think the next step then is probably to enable SysRq support in
> your kernel, read a kernel debugging tutorial and see if you can find
> out where it's frozen / deadlocked / infinite-looped / whatnot.

Definitely needed to make this transparent.
Hammering as close to the HW as possible is a good point to start with.
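
For the SysRq suggestion quoted above, a small Python sketch of the two procfs writes involved, assuming a kernel built with CONFIG_MAGIC_SYSRQ and a root shell. The function names are hypothetical; the paths are parameters only so the sketch can be tried against ordinary files first.

```python
def enable_sysrq(control_path="/proc/sys/kernel/sysrq"):
    """Enable all SysRq functions by writing 1 to the sysctl file."""
    with open(control_path, "w") as f:
        f.write("1\n")


def trigger_sysrq(key, trigger_path="/proc/sysrq-trigger"):
    """Trigger one SysRq command, e.g. 't' to dump the task list to the kernel log."""
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    with open(trigger_path, "w") as f:
        f.write(key)
```

Triggering from a script only helps while the machine still runs userspace. Once it is frozen, the keyboard combination Alt+SysRq+T on the console is the way in, so SysRq has to be enabled before reproducing the freeze.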

>   If you really want to know what's happened, in order to make 100%
> sure that it doesn't occur again, you should of course debug against
> your current kernel version.  Find the bug, and check for its
> existence in newer versions of kernel / whatever.  But if you go this
> path, you probably can't expect any help whatsoever from the kernel
> hackers or any such.
> 
> HTH...
> 
> _______________________________________________
> Ataraid-list mailing list
> Ataraid-list at redhat.com
> https://www.redhat.com/mailman/listinfo/ataraid-list

Regards,
Heinz    -- The LVM Guy --

*** Software bugs are stupid.
    Nevertheless it needs not so stupid people to solve them ***

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

Heinz Mauelshagen                                 Red Hat GmbH
Consulting Development Engineer                   Am Sonnenhang 11
                                                  56242 Marienrachdorf
                                                  Germany
Mauelshagen at RedHat.com                            +49 2626 141200
                                                       FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-



