[rhelv6-list] Nasty bug with writing to resyncing RAID-5 Array

Daryl Herzmann akrherz at iastate.edu
Wed Aug 15 14:32:15 UTC 2012


On Sun, Jun 24, 2012 at 12:48 PM, Stephen John Smoogen <smooge at gmail.com> wrote:
> On 23 June 2012 11:04, Daryl Herzmann <akrherz at iastate.edu> wrote:
>> On Fri, Jun 22, 2012 at 4:03 PM, Stephen John Smoogen <smooge at gmail.com> wrote:
>>> On 22 June 2012 14:10, daryl herzmann <akrherz at iastate.edu> wrote:
>>>> Howdy,
>>>>
>>>> The RHEL6.3 release notes have a curious entry:
>>>>
>>>> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/6.3_Technical_Notes/kernel_issues.html
>>>>
>>>>  kernel component
>>>>
>>>>  Due to a race condition, in certain cases, writes to RAID4/5/6 while the
>>>>  array is reconstructing could hang the system
>>>>
>>>> Wow, I am reproducing it frequently here.  Simply have a RAID-5 software
>>>> array and do some write IO to it, eventually things start hanging and the
>>>> power button needs to be pressed.
>>>>
>>>> Oh man.
>>>
>>> Well the race condition they are mentioning should only happen when
>>> the RAID array is reconstructing. This sounds like a different
>>> bug/problem. What kind of disks, type of RAID etc.
>>
>> Thanks for the response.  I am not sure of the difference between
>> 'reconstructing' and 'resyncing' and/or 'syncing'.  The reproducing
>> case was quite easy for me.
>>
>> 1. Create a software RAID-5 array
>> 2. Immediately create a filesystem on it, while the initial sync is underway
>> 3. IO to the RAID device eventually stops, including the RAID-5 sync itself
>
> Ok reconstructing is where the initial RAID drives pair up with each
> other. Resyncing I believe is where a RAID which has been created is
> putting the data across its raid. A basic check is cat /proc/mdstat: if
> there is a progress line like ====> then you are reconstructing the disk
> array. In the example you give above, the disks would be reconstructing.
>
> So the next thing to do is to work out why you are able to trigger it
> constantly. That may be due to:
> CPU Type:
> RAM Amount:
> Disk controllers:
> Disk types (SATA, SAS, SCSI, PATA):
> RAID type:
> RAID layout (same controller, different controller, etc):

I don't seem to have much trouble reproducing it; I just had another
machine do it this morning.  Nehalem processor, 12 GB RAM, Dell
PowerEdge T400, PERC 6/i controller, software RAID-5, Seagate 2 TB
Barracuda drives...
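
On the reconstructing vs. resyncing question quoted above: as far as I
know, the progress line in /proc/mdstat names the operation itself
(resync, recovery, reshape or check), so a small script can report which
one is running when the hang happens.  A minimal sketch in Python,
assuming the usual md progress-line layout; the function name
sync_activity is made up for illustration:

```python
import re

# Matches a progress line such as:
#   [==>......]  recovery = 12.6% (37043392/292945152) finish=127.5min ...
PROGRESS = re.compile(
    r"\[[=>.]+\]\s+(resync|recovery|reshape|check)\s*=\s*([\d.]+)%"
)
# Matches the first line of an array stanza, e.g. "md0 : active raid5 ..."
DEVICE = re.compile(r"^(md\S*)\s*:")


def sync_activity(mdstat_text):
    """Return {array_name: (operation, percent_done)} for every array
    whose stanza contains a progress line."""
    active = {}
    current = None
    for line in mdstat_text.splitlines():
        dev = DEVICE.match(line)
        if dev:
            current = dev.group(1)
            continue
        hit = PROGRESS.search(line)
        if hit and current:
            active[current] = (hit.group(1), float(hit.group(2)))
    return active
```

On a live system it could be called as
sync_activity(open("/proc/mdstat").read()); an empty result means no
array is showing a progress line at that moment.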

Does anybody have the bugzilla ticket associated with this or perhaps
a knowledge base article on it?

daryl



