[linux-lvm] Bypassing LVM Restrictions - RAID6 With Less Than 5 Disks
john at stoffel.org
Mon May 9 00:18:57 UTC 2022
>>>>> "Alex" == Alex Lieflander <atlief at icloud.com> writes:
>> On May 7, 2022, at 4:41 PM, Stuart D Gathman wrote:
>>> On Fri, 6 May 2022, Alex Lieflander wrote:
>>> Thanks. I really don’t want to give up the DM-Integrity management. Less complexity is just a bonus.
>> What are you trying to get out of RAID6? If redundancy and integrity
>> are already managed at another layer, then just use RAID0 for striping.
>> I like to use RAID10 for mirror + striping, but I understand parity disks give redundancy without halving capacity. Parity means RMW cycles of
>> largish blocks, whereas straight mirroring (RAID1, RAID10) can write
>> single sectors without a RMW cycle.
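The read-modify-write cost mentioned above can be sketched for the
simple XOR (single parity) case. This is only an illustration: the
function names and the toy two-byte chunks are made up here, and the
second RAID6 syndrome (Q) uses Galois-field arithmetic that this
sketch does not cover.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity_small_write(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: derive new parity from old chunk, old parity, new chunk."""
    return xor_blocks(old_parity, old_data, new_data)


# A toy stripe: three data chunks plus their XOR parity.
stripe = [b"\x01\x02", b"\x10\x20", b"\xaa\xbb"]
parity = xor_blocks(*stripe)

# Overwriting chunk 1 needs two reads (old chunk, old parity)
# before the two writes (new chunk, new parity).
new_chunk = b"\xff\x00"
new_parity = parity_small_write(stripe[1], parity, new_chunk)
stripe[1] = new_chunk

# The shortcut agrees with recomputing parity over the whole stripe.
assert new_parity == xor_blocks(*stripe)
```

A plain mirror would simply write new_chunk to each copy, with no
reads needed first.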
Alex> I don’t trust the hardware I’m running on very much, but it’s
Alex> all I have to work with at the moment; it’s important that the
Alex> array is resilient to *any* (and multiple) single chunk
Alex> corruptions because such corruptions are likely to happen in the
Ouch! I hope you have good backups somewhere, because I suspect
you're going to suffer a complete failure at some point.
Alex> For the last several months I’ve periodically been seeing
Alex> (DM-Integrity) checksum mismatch warnings at various locations
Alex> on all of my disks. I stopped using a few SATA ports that were
Alex> explicitly throwing SATA errors, but I suspect that the
Alex> remaining connections are unpredictably (albeit infrequently)
Alex> corrupting data in ways that are more difficult to detect.
This is interesting. And worrisome, because I would not expect moving
from one SATA port to another to cure problems, unless it was A)
moving to a different controller, or B) you changed/reseated the SATA
But I also wonder about your power supply and what it's rated for.
You might just be hitting the ragged edge of what it can supply, and
so you're running into problems with voltage dropping just enough to
make things slightly flaky.
Alex> I’ve tried to “check” and “repair” my array on multiple kernel
Alex> versions and live recovery USB sticks, but the “check" always
Alex> seems to freeze and all subsequent IO to the array hangs until
Alex> reboot; at the moment, a chunk is only ever made consistent when
Alex> its data is overwritten, so it needs to survive periodic, random
Alex> corruption for as long as possible.
This is also a warning to me that maybe you have power supply issues.
Can you give a summary of your hardware configuration and model
numbers? If you're running a smallish power supply, maybe look for a
replacement which can get you more power. Go from a 430W one to 600W,
or 500W to 750W and see if that makes a difference.
Looking at your data from before, I see you have 12 disks on the
system, 11 spinning disks and one nvme device. So I *really* suspect
you have an overloaded power supply.
Are you also using a disk controller? And which version of linux?
Alex> I also have a disk that infrequently fails to read from a
Alex> particular area, but the rest of the disk is fine. I wouldn’t
Alex> trust that disk with valuable data, but it seems like a perfect
Alex> candidate to hold additional parity (raid6_ls_6) that I
Alex> hopefully never need.
This is not how RAID6 parity works. The entire disk (or partition) is
used to write data and/or parity. It's RAID4 which dedicates a single
disk to parity duties. So thinking that a known flaky disk will be ok
for just parity use isn't really a good idea.
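The difference between a dedicated parity disk and rotating parity
can be sketched as below. This is a rough illustration for single
parity only, assuming the left-symmetric rotation as I understand
Linux md to use it for RAID5; the function names are made up, and
other layouts place parity differently.

```python
def raid4_parity_disk(stripe: int, disks: int) -> int:
    """RAID4: parity always lives on the last disk."""
    return disks - 1


def raid5_ls_parity_disk(stripe: int, disks: int) -> int:
    """Left-symmetric RAID5 (assumed): parity starts on the last disk
    and moves one disk to the left with each stripe."""
    return (disks - 1) - (stripe % disks)


# Show which disk holds parity for the first few stripes of a 4-disk array.
for stripe in range(4):
    print(stripe, raid4_parity_disk(stripe, 4), raid5_ls_parity_disk(stripe, 4))
```

With rotation, every disk ends up holding a mix of data and parity
chunks, so a bad region on any one disk can land under either.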
I'd also look at the output of 'smartctl --all /dev/sd<letter>' for
all your disks and see what the numbers say. But honestly, it sounds
like you have some serious hardware issues which you're trying to
paper over with DM-Integrity and RAID5. And I suspect it will all end
in tears sooner or later.
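For going through the smartctl output on a dozen disks, something
like the following rough sketch might save some scrolling. It assumes
the usual ATA attribute table layout (raw value in the tenth column);
the attribute names picked and the helper names are my own choices,
and NVMe devices report in a different format that this does not
parse.

```python
import subprocess

# Attributes that tend to be worth a look for failing media or bad links.
WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "UDMA_CRC_Error_Count")


def smartctl_command(device: str) -> list[str]:
    """Build the command line for a full SMART report on one device."""
    return ["smartctl", "--all", device]


def watched_attributes(report: str) -> dict[str, str]:
    """Pick the raw values of the watched attributes out of a report."""
    found = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows: ID, name, flag, value, worst, thresh,
        # type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[1] in WATCHED:
            found[fields[1]] = fields[9]
    return found


def check(devices: list[str]) -> None:
    """Run smartctl on each device and print the watched raw values."""
    for dev in devices:
        # smartctl uses its exit status as a bitmask, so no check=True here.
        result = subprocess.run(smartctl_command(dev), capture_output=True, text=True)
        print(dev, watched_attributes(result.stdout))


# Example (hypothetical device names; needs root and smartmontools):
# check(["/dev/sda", "/dev/sdb"])
```

A climbing UDMA_CRC_Error_Count usually points at the cable or port
rather than the platters, which would fit the SATA errors described
above.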
You do have backups of your data, right? Even onto a single new 10TB
disk that's now connected to the system all the time?