[rhelv6-list] Nasty bug with writing to resyncing RAID-5 Array

Daryl Herzmann akrherz at iastate.edu
Mon Sep 3 14:28:38 UTC 2012


On Thu, Aug 16, 2012 at 1:08 PM, David C. Miller
<millerdc at fusion.gat.com> wrote:
>
>
> ----- Original Message -----
>> From: "Daryl Herzmann" <akrherz at iastate.edu>
>> To: "Red Hat Enterprise Linux 6 (Santiago) discussion mailing-list" <rhelv6-list at redhat.com>
>> Sent: Wednesday, August 15, 2012 7:32:15 AM
>> Subject: Re: [rhelv6-list] Nasty bug with writing to resyncing RAID-5 Array
>>
>> On Sun, Jun 24, 2012 at 12:48 PM, Stephen John Smoogen
>> <smooge at gmail.com> wrote:
>> > On 23 June 2012 11:04, Daryl Herzmann <akrherz at iastate.edu> wrote:
>> >> On Fri, Jun 22, 2012 at 4:03 PM, Stephen John Smoogen
>> >> <smooge at gmail.com> wrote:
>> >>> On 22 June 2012 14:10, daryl herzmann <akrherz at iastate.edu>
>> >>> wrote:
>> >>>> Howdy,
>> >>>>
>> >>>> The RHEL6.3 release notes have a curious entry:
>> >>>>
>> >>>> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/6.3_Technical_Notes/kernel_issues.html
>> >>>>
>> >>>>  kernel component
>> >>>>
>> >>>>  Due to a race condition, in certain cases, writes to RAID4/5/6
>> >>>>  while the
>> >>>>  array is reconstructing could hang the system
>> >>>>
>> >>>> Wow, I am reproducing it frequently here.  Simply have a RAID-5
>> >>>> software
>> >>>> array and do some write IO to it, eventually things start
>> >>>> hanging and the
>> >>>> power button needs to be pressed.
>> >>>>
>> >>>> Oh man.
>> >>>
>> >>> Well the race condition they are mentioning should only happen
>> >>> when
>> >>> the RAID array is reconstructing. This sounds like a different
>> >>> bug/problem. What kind of disks, type of RAID etc.
>> >>
>> >> Thanks for the response.  I am not sure of the difference between
>> >> 'reconstructing' and 'resyncing' and/or 'syncing'.  The
>> >> reproducing
>> >> case was quite easy for me.
>> >>
>> >> 1. Create a software raid5
>> >> 2. Immediately then create a filesystem on this raid5, while init
>> >> sync underway
>> >> 3. IO to the RAID device eventually stops, even for the software
>> >> raid5 sync
>> >
>> > Ok reconstructing is where the initial RAID drives pair up with
>> > each
>> > other. Resyncing I believe is where a RAID which has been created
>> > is
>> > putting the data across its raid. A basic cat /proc/mdstat will
>> > show which: if there is a progress line like ====> then you are
>> > reconstructing the disk array. In the example you give above,
>> > the disks would be reconstructing.
>> >
>> > So the next thing to do is figure out why you are able to
>> > trigger it constantly.
>> > That may be due to
>> > CPU Type:
>> > RAM Amount:
>> > Disk controllers:
>> > Disk types (SATA, SAS, SCSI, PATA):
>> > RAID type:
>> > RAID layout (same controller, different controller, etc):
>>
>> I don't seem to have much issue reproducing, I just had another
>> machine do it this morning.  Nehalem processor, 12 GB ram, Dell
>> PowerEdge T400, Perc 6i controller, software raid 5, Seagate 2 TB
>> Barracuda drives...
>>
>> Does anybody have the bugzilla ticket associated with this or perhaps
>> a knowledge base article on it?
>>
>> daryl
>>
>
> I would like to know too. I have not seen this issue yet but I do have some large RAID6 arrays.

The private bugzilla tracking this is:

https://bugzilla.redhat.com/show_bug.cgi?id=828065

It appears the hope is to resolve this for the RHEL6.4 release.

daryl
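
Since the hang is tied to writes landing while the array is still
syncing, one cautious workaround is to check /proc/mdstat before starting
heavy write IO. The following is only a minimal sketch of that check, not
anything from Red Hat or the bugzilla: the function names, the regular
expressions, and the sample text are illustrative assumptions, and the
sample numbers are made up rather than captured from a real machine.

```python
import re

# An array header line in /proc/mdstat looks like "md0 : active raid5 ...".
ARRAY = re.compile(r"^(md\S*)\s*:")
# While a sync action runs, md prints a progress line such as
# "[==>......]  recovery = 12.6% (...)". Action names assumed here:
# resync, recovery, reshape, check.
PROGRESS = re.compile(r"\b(resync|recovery|reshape|check)\s*=\s*([\d.]+)%")

# Hypothetical sample, for illustration only (not real output).
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      3906764800 blocks level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [==>..................]  recovery = 12.6% (246591744/1953382400)

md1 : active raid1 sde1[1] sda1[0]
      524224 blocks [2/2] [UU]

unused devices: <none>
"""


def arrays_in_progress(mdstat_text):
    """Return {array: (action, percent)} for arrays with a sync action running."""
    busy = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY.match(line)
        if header:
            current = header.group(1)
            continue
        progress = PROGRESS.search(line)
        if progress and current is not None:
            busy[current] = (progress.group(1), float(progress.group(2)))
    return busy


def read_mdstat(path="/proc/mdstat"):
    """Read the live mdstat file; only meaningful on a Linux host with md loaded."""
    with open(path) as handle:
        return handle.read()
```

On a real system, something like arrays_in_progress(read_mdstat()) could
gate mkfs or a large copy until the returned dictionary is empty. This
only avoids the trigger described in the thread; it does not fix the
underlying kernel race.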



