[rhelv6-list] Nasty bug with writing to resyncing RAID-5 Array

Stephen John Smoogen smooge at gmail.com
Sun Jun 24 17:48:49 UTC 2012


On 23 June 2012 11:04, Daryl Herzmann <akrherz at iastate.edu> wrote:
> On Fri, Jun 22, 2012 at 4:03 PM, Stephen John Smoogen <smooge at gmail.com> wrote:
>> On 22 June 2012 14:10, daryl herzmann <akrherz at iastate.edu> wrote:
>>> Howdy,
>>>
>>> The RHEL6.3 release notes have a curious entry:
>>>
>>> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/6.3_Technical_Notes/kernel_issues.html
>>>
>>>  kernel component
>>>
>>>  Due to a race condition, in certain cases, writes to RAID4/5/6 while the
>>>  array is reconstructing could hang the system
>>>
>>> Wow, I am reproducing it frequently here.  Simply have a RAID-5 software
>>> array and do some write IO to it, eventually things start hanging and the
>>> power button needs to be pressed.
>>>
>>> Oh man.
>>
>> Well the race condition they are mentioning should only happen when
>> the RAID array is reconstructing. This sounds like a different
>> bug/problem. What kind of disks, type of RAID etc.
>
> Thanks for the response.  I am not sure of the difference between
> 'reconstructing' and 'resyncing' and/or 'syncing'.  The reproducing
> case was quite easy for me.
>
> 1. Create a software raid5
> 2. Immediately then create a filesystem on this raid5, while init sync underway
> 3. IO to the RAID device eventually stops, even for the software raid5 sync

OK, reconstructing is where the initial RAID drives pair up with each
other. Resyncing, I believe, is where a RAID which has already been
created is putting the data across its drives. The basic check is
cat /proc/mdstat: if there is a progress line like ====> then you are
reconstructing the disk array. In the example you give above, the
disks would be reconstructing.
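
If you want something more precise than eyeballing the arrow, the
progress line in /proc/mdstat also names the operation next to the bar
(for example resync, recovery or check). Here is a rough sketch that
pulls that word out for each array. The function name sync_activity and
the regular expressions are my own, and they only assume the usual
"[==>....]  action = NN.N%" layout of the progress line:

```python
import re

# Progress line printed under a busy array, for example:
#   [==>..................]  recovery = 12.6% (123/456) finish=127.5min
PROGRESS = re.compile(r"\[[=>.]+\]\s+(\w+)\s*=\s*([\d.]+)%")
# Array header line, for example: "md0 : active raid5 sdd1[3] sdc1[2]"
ARRAY = re.compile(r"^(md\w+)\s*:")


def sync_activity(mdstat_text):
    """Return {array: (action, percent)} for arrays showing a progress bar."""
    busy = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY.match(line)
        if header:
            # Remember which array the following indented lines belong to.
            current = header.group(1)
            continue
        progress = PROGRESS.search(line)
        if progress and current:
            busy[current] = (progress.group(1), float(progress.group(2)))
    return busy
```

Feed it the contents of the file, e.g.
sync_activity(open("/proc/mdstat").read()). Arrays with no progress bar
are simply left out of the result.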

So the next thing to do is figure out why you are able to trigger it
so consistently. That may be due to any of the following:
CPU Type:
RAM Amount:
Disk controllers:
Disk types (SATA, SAS, SCSI, PATA):
RAID type:
RAID layout (same controller, different controller, etc):


> or another reproducer, which is more concerning:
>
> 1. Start a verify on a previously clean raid5
> 2. Do some write IO to the mounted device
> 3. Processes accessing that mount point lock up
> 4. Push the power button :(

That sounds like a separate issue (or the same one, but they didn't
label it). In the above example your RAID array is past
construction/reconstruction and is ready.
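
For anyone who wants to try that second reproducer on a scratch box, a
verify can be requested by writing "check" to the array's
md/sync_action file in sysfs. A minimal sketch follows. The helper
names and the sysfs_root parameter are mine (the parameter is only
there so the code can be pointed at a test directory), and going by
the reports in this thread it should not be run on a machine you are
not prepared to power-cycle:

```python
import os


def sync_action_path(array, sysfs_root="/sys"):
    """Path of the md sync_action control file for an array such as 'md0'."""
    return os.path.join(sysfs_root, "block", array, "md", "sync_action")


def read_sync_action(array, sysfs_root="/sys"):
    """Return the current action, e.g. 'idle', 'resync', 'recover', 'check'."""
    with open(sync_action_path(array, sysfs_root)) as f:
        return f.read().strip()


def start_check(array, sysfs_root="/sys"):
    """Request a verify, but only if the array is currently idle.

    Returns True if the request was written, False if the array was busy.
    """
    if read_sync_action(array, sysfs_root) != "idle":
        return False
    with open(sync_action_path(array, sysfs_root), "w") as f:
        f.write("check\n")
    return True
```

Calling start_check("md0") as root asks for the verify. The idle test
is there so that a resync or recovery already in progress is left
alone.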

> I wonder how many people will hit this, once the first Sunday of July
> rolls around and software raid5's are auto-verified.
>
> daryl



-- 
Stephen J Smoogen.
"The core skill of innovators is error recovery, not failure avoidance."
Randy Nelson, President of Pixar University.
"Years ago my mother used to say to me,... Elwood, you must be oh
so smart or oh so pleasant. Well, for years I was smart. I
recommend pleasant. You may quote me."  —James Stewart as Elwood P. Dowd



