RAID5 gets a bad rap

Gordon Messmer yinyang at eburg.com
Fri Jan 2 08:06:00 UTC 2009


Bill Davidsen wrote:
> Gordon Messmer wrote:
...
> No. Even in the worst case it would read N-2 blocks (you are writing a 
> new data block and calculating new parity), and two writes.

Let's just say that I've seen controllers behave in ways that I don't
understand.  That said, I agree that the cost should not be as great as
I previously estimated.
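A rough sketch of the I/O counts being argued about, assuming a single-block write to an N-disk RAID5 stripe with one parity block and no controller cache. The function names are hypothetical. It compares the two usual strategies: read-modify-write (read old data and old parity) and reconstruct-write (read the other N-2 data blocks and recompute parity), the latter being the N-2 case mentioned above.

```python
def _check(n_disks: int) -> None:
    # RAID5 needs at least two data disks plus one parity disk per stripe.
    if n_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")

def read_modify_write_ios(n_disks: int) -> tuple[int, int]:
    """(reads, writes): read old data + old parity, write new data + new parity."""
    _check(n_disks)
    return (2, 2)

def reconstruct_write_ios(n_disks: int) -> tuple[int, int]:
    """(reads, writes): read the other N-2 data blocks in the stripe,
    recompute parity from scratch, write new data + new parity."""
    _check(n_disks)
    return (n_disks - 2, 2)

def cheapest_small_write(n_disks: int) -> tuple[int, int]:
    """Strategy with fewer total I/Os; ties go to read-modify-write."""
    return min(read_modify_write_ios(n_disks), reconstruct_write_ios(n_disks), key=sum)

for n in range(3, 9):
    print(n, read_modify_write_ios(n), reconstruct_write_ios(n), cheapest_small_write(n))
```

Under these assumptions, reconstruct-write is cheaper only on a three-disk array (1 read + 2 writes); the two tie at four disks, and from five disks up read-modify-write wins at a fixed 2 reads + 2 writes. So a small random write costs roughly four I/Os however wide the array is.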

>> It doesn't matter whether you're writing new files or modifying 
>> existing files, because all of this happens at the block level.  It's 
>> especially bad on journalled filesystems, where writing to a file will 
>> update the file's blocks, plus the filesystem journal's blocks, and 
>> finally the filesystem's blocks.
>>
> No again. You read the parity block and the old data block, XOR the old 
> data and then the new data into the parity block, and write the new 
> data and parity.
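A minimal sketch of the parity update described above, using tiny 4-byte blocks in a hypothetical four-data-disk stripe (block contents and helper names are made up for illustration). It checks that XORing the old data out of the parity and the new data in gives the same result as recomputing parity over the whole stripe, which is why only two reads are needed.

```python
from functools import reduce

def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))

def full_parity(data_blocks: list[bytes]) -> bytes:
    """Parity of a stripe: XOR of every data block (what reconstruct-write computes)."""
    return reduce(xor_blocks, data_blocks)

def rmw_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Read-modify-write: XOR the old data out of the parity, then the new data in."""
    return xor_blocks(xor_blocks(old_parity, old_data), new_data)

# Hypothetical stripe: four data blocks of 4 bytes each.
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd", b"\x00\xff\x00\xff"]
parity = full_parity(stripe)

# Overwrite block 2 reading only the old data block and the old parity block.
new_block = b"\xde\xad\xbe\xef"
updated_parity = rmw_parity(parity, stripe[2], new_block)
stripe[2] = new_block

# The incremental result matches a full recomputation over all data blocks.
assert updated_parity == full_parity(stripe)

# A lost block (here block 1) is rebuilt from the parity and the surviving blocks.
rebuilt = reduce(xor_blocks, [updated_parity] + stripe[:1] + stripe[2:])
assert rebuilt == stripe[1]
```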

Yes, I understand what you're saying, but that in no way contradicts 
what I wrote there.  Regardless of whether you create a new file or 
modify an existing file, there will be changes made to the filesystem to 
reflect the fact that changes have been made.  If you modify a file, the 
inode's mtime is updated.  If you create a new file, then a new inode is 
written, and the directory entry is modified.  In both cases, the blocks 
which hold the file's data are written, the journal is written before 
the filesystem is updated, the filesystem is updated with the changes in 
the journal, and then the journal is modified again to mark it complete.
We can argue about how much overhead RAID5 has, but I don't think you
can argue either that there is *no* overhead or that the filesystem is
not a database.  Any given write to the disk will involve updating the
journal twice and the filesystem once, which produces more or less
exactly the "small random writes" that RAID5 handles so poorly.
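A toy model of that argument, assuming each of the four writes in the sequence above (data, journal, in-place filesystem update, journal completion) is a single-block, partial-stripe write to a different stripe, with no write-back cache or coalescing. Real journalling filesystems batch many updates into one transaction, so this is a worst case, not a benchmark; the names are hypothetical.

```python
# Hypothetical block writes for one small file update, in the order described above.
FILE_UPDATE_WRITES = ["data", "journal", "metadata", "journal_commit"]

def physical_ios(logical_writes: int, layout: str) -> int:
    """Count physical I/Os (not latency) for independent single-block writes."""
    if layout == "single":
        return logical_writes          # one write per block
    if layout == "raid1":
        return 2 * logical_writes      # one write to each mirror
    if layout == "raid5":
        return 4 * logical_writes      # read-modify-write: 2 reads + 2 writes
    raise ValueError(f"unknown layout: {layout}")

for layout in ("single", "raid1", "raid5"):
    print(layout, physical_ios(len(FILE_UPDATE_WRITES), layout))
```

This prints 4, 8 and 16 I/Os respectively. Half of the RAID5 I/Os are reads that must complete before the parity write can be issued, which is where small random writes hurt most.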

More information about the fedora-list mailing list