[dm-devel] RAID5 support ?
Heinz Mauelshagen
mauelshagen at redhat.com
Tue Oct 25 14:41:07 UTC 2005
On Mon, Oct 24, 2005 at 09:54:12AM +1000, Neil Brown wrote:
> On Saturday October 22, alanh at fairlite.demon.co.uk wrote:
> > > More usefully though, I'd be very happy to talk about how md/raid5 can
> > > be made to be sufficient. I'd be happy for it to integrate more
> > > closely with dm, if that was seen to be of value.
> >
> > That'd be useful Neil.
> >
> > I'll explain the problem.
> >
> > I've got a SIL3114 controller with 4 x 200GB drives attached. Now that
> > SIL controller supports RAID5. Given that I set the RAID support up in
> > the BIOS I can now boot from the array.
> >
> > If one of those disks die, I understand that the BIOS will still allow
> > me to boot from the array, even though the primary disk may have died.
> >
> > In the md/raid5 setup, I'm not sure that's the case and if you lose the
> > primary you have to muck about with your bootloader to fix things up.
>
> It seems the core problem here is that you need soft-raid5 in Linux
> which can work with the metadata that is stored by the BIOS on the SIL
> controller.
> This shouldn't be too hard to do, providing it is reasonably
> documented.
> 'md' has all the meta-data operations reasonably well factored out, so
> working with new formats shouldn't be difficult.
>
> I suspect that it would be best to have the code for understanding the
> metadata run in user-space rather than in the kernel - I gather that
> is what dmraid does.
Correct. It uses device-mapper, which lacks RAID4 + 5 mappings so far, but I'm
working on this. Having those, we can cover the RAID5 ATARAID case for
many different ATARAID solutions in the given device-mapper/dmraid framework.
Once I have first presentable code for a device-mapper RAID4 + 5 target
(hopefully next week after my return from the US), I'd appreciate your
help on it.
>
> For raid5, we really need synchronous metadata updates when a device
> fails, as it is not really safe to write anything after the decision
> to fail a device, and before the metadata has been updated.
Yes, we need to persistently store the information about which device
failed in order to identify it after a crash. In device-mapper, we have
IO suspend support to make that happen.
FYI: we keep information about which regions (arbitrarily sized segments
of the address space) of the set are dirty with the
device-mapper dirty-log so that we can resynchronize those at set startup.
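
To illustrate the region idea, here is a minimal sketch in Python. It is a
toy model only and not the device-mapper dirty-log code: the class name
DirtyLog, its methods, and the use of sectors as the unit are all
assumptions made for this example.

```python
class DirtyLog:
    """Toy region-based dirty log (hypothetical, not the dm dirty-log API)."""

    def __init__(self, set_size, region_size):
        # Sizes are in sectors here; that unit is an assumption of this sketch.
        if set_size <= 0 or region_size <= 0:
            raise ValueError("sizes must be positive")
        self.set_size = set_size
        self.region_size = region_size
        # Round up so a partial last region is still tracked.
        self.num_regions = -(-set_size // region_size)
        self.dirty = [False] * self.num_regions

    def mark_dirty(self, offset, length):
        """Mark every region touched by a write of `length` at `offset`."""
        if length <= 0 or offset < 0 or offset + length > self.set_size:
            raise ValueError("write outside the set")
        first = offset // self.region_size
        last = (offset + length - 1) // self.region_size
        for region in range(first, last + 1):
            self.dirty[region] = True

    def mark_clean(self, region):
        """Clear a region once it has been resynchronized."""
        self.dirty[region] = False

    def regions_to_resync(self):
        """Regions that would need resynchronizing at set startup."""
        return [r for r, is_dirty in enumerate(self.dirty) if is_dirty]
```

A write that straddles a region boundary dirties both regions, so only
those two need to be resynchronized after a crash, not the whole set.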
>
> I am currently working on adding sysfs support to md and raid5 and
> would prefer to use this as the interface between md and a user-space
> metadata handler (though I could probably be convinced to work under
> the dm ioctls as well if that was important).
>
> So the enhancements that seem to be needed to md/raid5 would include:
>
> 1/ Introduce a new metadata type which the kernel doesn't read or
> write at all. When a write is required, it signals userspace
> somehow, and blocks writes until it is told to continue.
That's the default with device-mapper, which doesn't read/write any metadata
but leaves it to userspace.
>
> 2/ Allow all config information to be provided by userspace. The
> current SET_ARRAY_INFO is not quite up to the task. e.g. you
> cannot give a device offset through that interface.
>
>
> I plan to do (2) anyway, probably through sysfs, but maybe configfs -
> I'm not sure yet.
>
> (1) probably needs a bit more thought and some understanding on what
> the userspace metadata tool would require.
> I imagine having an event counter which is updated whenever a
> metadata update is required.
> The userspace tool would
> - read a number from the event-counter file
> - extract all the metadata information needed from sysfs
> - write it to the devices
> - write the original event-count to some other sysfs file.
We do have a dmeventd in libdevmapper already, which can be used to
cover this. Applications can register any mapped device with dmeventd
to be monitored. dmeventd will call into a shared library on any device
event (e.g., failure). The library can carry out arbitrary scenarios
such as yours above.
>
> The kernel would not allow further writes until the number written
> to the second file matches the most current event counter, thus if
> multiple events happened while the metadata was being updated, we
> still wouldn't get out of sync.
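
The handshake described above can be sketched as a toy model in Python.
This is only an illustration of the proposed protocol, not md or
device-mapper code: MetadataGate, update_metadata and the write_metadata
callback are hypothetical names, and the two counters stand in for the
two sysfs files.

```python
import threading


class MetadataGate:
    """Toy model of the kernel side: writes are blocked until userspace
    acknowledges the most recent event count (hypothetical names)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.event_count = 0   # bumped whenever a metadata update is required
        self.acked_count = 0   # last count written back by userspace

    def raise_event(self):
        # E.g. a device failed: metadata on disk is now stale.
        with self._lock:
            self.event_count += 1
            return self.event_count

    def read_event_count(self):
        with self._lock:
            return self.event_count

    def acknowledge(self, count):
        # Userspace writes back the count it read before updating metadata.
        with self._lock:
            self.acked_count = count

    def writes_allowed(self):
        with self._lock:
            return self.acked_count == self.event_count


def update_metadata(gate, write_metadata):
    """One pass of the userspace tool: read the counter, write the
    metadata, then acknowledge the ORIGINAL counter value."""
    seen = gate.read_event_count()
    write_metadata(seen)       # stands in for reading state and writing devices
    gate.acknowledge(seen)
    return gate.writes_allowed()
```

If a second event arrives while the metadata is being written, the
acknowledged value no longer matches the current counter, so
update_metadata returns False and the tool has to run another pass before
writes resume.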
>
> Of course, we wouldn't want to have to poll the event-counter
> file. We would need some more direct notification of change. As
> I am using sysfs, maybe some sort of hot-plug event... but I'll
> have to learn more about hot plug events first.
>
>
> Does any of this sound useful?
> Any other suggestions?
>
> NeilBrown
--
Regards,
Heinz -- The LVM Guy --
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Heinz Mauelshagen Red Hat GmbH
Consulting Development Engineer Am Sonnenhang 11
Cluster and Storage Development 56242 Marienrachdorf
Germany
Mauelshagen at RedHat.com +49 2626 141200
FAX 924446
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-