What happens if raid gets broken?
Heinz Mauelshagen
mauelshagen at redhat.com
Tue Mar 29 15:21:56 UTC 2005
Heinz.Mauelshagen at T-Online.de

On Fri, Mar 25, 2005 at 02:12:33AM -0800, Molle Bestefich wrote:
> Roland wrote:
> > What happens if, for example, 1 disk of a mirror raid gets broken or has bad sectors?
> > Will there be an appropriate error message?
>
> Probably up to the device mapper mirror target (dm-mirror.(k)o), not
> dmraid. Hopefully you will get an error message in the syslog, which
> you can use your favorite syslog daemon to direct to somewhere useful.
> Not sure what HighPoint's proprietary driver does. I think it tries
> to write the faulted sector back from the working drive immediately,
> thus freezing the operating system meanwhile. All personal guesswork,
> though. If you can't get an answer here, you could try the
> device-mapper mailing list.
That's basically right.
The mirror target in device-mapper still needs some enhancements
(i.e., status reports on I/O failure and read balancing), and dmraid needs
a daemon that monitors device failures at the device-mapper status
interface and handles them (e.g., fires up a spare drive and rebuilds the
set; tries to store status information about the failure in the
vendor-specific metadata).
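
To illustrate the kind of check such a monitoring daemon could make, here is a minimal Python sketch that parses one mirror status line as printed by `dmsetup status`. It is not the dmraid daemon. It assumes the status layout of later device-mapper mirror targets, which append one health character per device (`A` for alive, other letters for failures); the mirror target as of this writing does not report that yet. The function names `parse_mirror_status` and `failed_devices` are made up for this example.

```python
# Sketch only. Assumed status layout (later dm-mirror versions), without the
# "name: " prefix that dmsetup prints when no device is given:
#   <start> <length> mirror <#devs> <dev1> ... <devN> <synced>/<total> 1 <health> <log...>
# Example:
#   0 8192 mirror 2 253:1 253:2 8/8 1 AA 3 disk 253:0 A


def parse_mirror_status(line):
    """Split one mirror status line into devices, health and sync state."""
    fields = line.split()
    if len(fields) < 4 or fields[2] != "mirror":
        raise ValueError("not a mirror target status line")
    ndevs = int(fields[3])
    if len(fields) < 7 + ndevs:
        raise ValueError("status line too short for assumed layout")
    devs = fields[4:4 + ndevs]
    synced, total = (int(x) for x in fields[4 + ndevs].split("/"))
    # fields[5 + ndevs] is the argument count of the health string (1).
    health = fields[6 + ndevs]
    if len(health) != ndevs:
        raise ValueError("health string does not match device count")
    return {
        "devices": dict(zip(devs, health)),
        "synced_regions": synced,
        "total_regions": total,
        "in_sync": synced == total,
    }


def failed_devices(line):
    """Return the devices whose health character is anything but 'A'."""
    status = parse_mirror_status(line)
    return [dev for dev, h in status["devices"].items() if h != "A"]
```

A daemon would poll or wait for device-mapper events, feed each status line through something like `failed_devices()`, and then decide whether to activate a spare and update the vendor metadata.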
>
> > and will the metadata be changed?
Not yet. The dmraid daemon I'm working on will do that.
> Probably ought to be, but I think it won't.
> AFAICT, dmraid currently only tells the device-mapper how to assemble
> RAID arrays, it doesn't stay alive in any way in order to reflect
> drive status to array metadata or such.
Correct for now.
I wanted to give users access to their data in the first development
step and add monitoring in a second one, which I'm working on now.
> And I'm pretty sure that
> dm-mirror doesn't do it. As I remember it, dmraid comes with good
> concise documentation, should be mentioned there.
>
> > I am running dmraid with a hpt37x mirror (raid 1) on 2.6.10 debian amd64.
>
> > When I copy some large files onto the raid, my computer "freezes" and I don't
> > get any message in syslog or dmesg. I load dmraid in verbose mode and also
> > have enabled debug symbols but don't see any
> > error message.
>
> > What's wrong?
> Not sure. I've seen the exact same thing happen with HPT37x's with
> proprietary drivers, so perhaps it's a hardware kink that occurs under
> specific circumstances. Then again, maybe it's not, I've also seen
> numerous bugs in the Linux IDE layer.
And there are recent bug fixes to device-mapper as well.
>
> > Is this a problem of the device mapper?
> Could be. That or the HighPoint driver. How reproducible is the
> problem? If you have a backup or your data is expendable, you could
> try running parallel dd's to write out a large amount of data to each
> drive in parallel. If it still freezes, it's not the device-mapper
> ;-).
Yes, move layers out of the test configuration and nail the drives,
interfaces, cables, ...
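
The parallel-dd test suggested above can be sketched in Python as follows. This is only an illustration of writing to several targets at once with device-mapper out of the picture; the helper names `write_stream` and `parallel_write` are made up for this example. Pointing it at raw disks destroys their contents, so use it only on drives whose data is expendable.

```python
import os
import threading


def write_stream(path, total_bytes, block_size=1024 * 1024):
    """Write total_bytes of zeros to path in block_size chunks, then fsync."""
    block = b"\0" * block_size
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = block[:min(block_size, total_bytes - written)]
            f.write(chunk)
            written += len(chunk)
        f.flush()
        os.fsync(f.fileno())  # push the data down to the device
    return written


def parallel_write(paths, total_bytes):
    """Write to every path at the same time, one thread per path.

    Returns a dict mapping each path to the byte count written,
    or to the OSError raised while writing to it.
    """
    results = {}

    def worker(path):
        try:
            results[path] = write_stream(path, total_bytes)
        except OSError as exc:
            results[path] = exc

    threads = [threading.Thread(target=worker, args=(p,)) for p in paths]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For a real test you would pass the underlying member disks (for instance `parallel_write(["/dev/hde", "/dev/hdg"], 4 * 1024**3)`, where the device names are placeholders for your own setup). If the machine still freezes, the problem sits below device-mapper.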
>
> > Any Idea?
> Try upgrading to kernel 2.6.11, and upgrade the device-mapper too.
> I think the next step then is probably to enable SysRq support in
> your kernel, read a kernel debugging tutorial and see if you can find
> out where it's frozen / deadlocked / infinite-loop'ed / what not.
Definitely needed to make this transparent.
Hammering as close to the HW as possible is a good point to start with.
> If you really want to know what's happened, in order to make 100%
> sure that it doesn't occur again, you should of course debug against
> your current kernel version. Find the bug, and check for its
> existence in newer versions of kernel / whatever. But if you go this
> path, you probably can't expect any help whatsoever from the kernel
> hackers or any such.
>
> HTH...
Regards,
Heinz -- The LVM Guy --
*** Software bugs are stupid.
Nevertheless it needs not so stupid people to solve them ***
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen Red Hat GmbH
Consulting Development Engineer Am Sonnenhang 11
56242 Marienrachdorf
Germany
Mauelshagen at RedHat.com +49 2626 141200
FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-