barrier and commit options?

Ric Wheeler rwheeler at redhat.com
Sat Jan 31 12:45:06 UTC 2009


Theodore Tso wrote:
>>>> - If I remember the details correctly, Chris Mason has demonstrated a 
>>>> 50% chance of corrupting directory entries in ext3, for example.
>>>>         
>
> Chris Mason has a script which forces the system to be under a lot of
> memory pressure, and in that scenario, it is highly likely that
> without barriers, there will be filesystem corruptions if the system
> is abruptly turned off while his script is running.
>
> Andrew Morton has been resistant to making barrier=1 the default
> for ext3 because (as I understand it) he disbelieves that this is an
> adequate real-world example, and there is a real performance hit to
> running with barriers.
>
>   
>>>> If you have a battery backed write cache (say, in a high end array) 
>>>> barriers can be ignored since the storage can effectively make that 
>>>> write cache non-volatile, but otherwise, this is pretty key for 
>>>> anyone wanting to maintain data integrity,
>>>>
>>>>         
>>> That's what I'm getting at, array controllers with a battery backed
>>> write cache (BBWC). We disable the write cache on the physical
>>> disks and provide no mechanism to re-enable the cache except in
>>> some SATA configurations.
>>>       
>
> Well, we still need the barrier on the block I/O elevator side to
> make sure that requests don't get reordered in the block layer.  But
> what you're saying is that once the write is posted to the array, it
> is guaranteed that it is on "stable storage" (even if it is BBWC) such
> that if someone hits the Big Red Switch at the exit to the data
> center, and power is forcibly cut from the entire data center in case
> of a fire, the battery will still keep the cache alive, at least until
> the sprinklers go off, anyway, right?  :-)
>   

Yes, true....

> In that case, I suspect the right thing for the cciss array to do is
> to ignore the barrier, but not to return an error.  If you return an
> error, and refuse the write with barrier operation (which is what the
> cciss driver seems to be doing starting in 2.6.29-rcX), ext4 will
> retry the write without the barrier, at which point we are vulnerable
> to the block layer reordering things at the I/O scheduler layer.  In
> effect, you're claiming that every single write to cciss is implicitly
> a "barrier write" in that once it is received by the device, it is
> guaranteed not to be lost even if the power to the entire system is
> forcibly removed.
>
> 						- Ted
>
>
>   
Aren't barriers still tied to the state of the write cache on the target 
drive? In other words, if the write cache is off, we disable barriers 
automatically. I think that this happens for scsi in sd_revalidate_disk().
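
As a rough way to see what the kernel currently believes about a SCSI 
disk's cache, the sketch below reads the cache_type attribute under 
/sys/class/scsi_disk. The helper names are made up for illustration, and 
this only covers disks handled by sd; as far as I know, cciss logical 
drives are not SCSI disks and will not show up here.

```python
from pathlib import Path


def write_cache_enabled(cache_type: str) -> bool:
    """Return True if a cache_type string reports a write-back cache.

    Values seen in sysfs include "write through", "none", "write back"
    and "write back, no read (daft)".
    """
    return cache_type.strip().lower().startswith("write back")


def scsi_disk_cache_types(sysfs_root: str = "/sys/class/scsi_disk"):
    """Map each SCSI disk address (H:C:T:L) to its cache_type string.

    Returns an empty dict if the sysfs directory does not exist.
    """
    result = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return result
    for entry in sorted(root.iterdir()):
        try:
            result[entry.name] = (entry / "cache_type").read_text().strip()
        except OSError:
            # Attribute missing or unreadable: skip this device.
            continue
    return result


if __name__ == "__main__":
    for addr, ctype in scsi_disk_cache_types().items():
        if write_cache_enabled(ctype):
            state = "write-back (volatile unless battery backed)"
        else:
            state = "write-through or no cache"
        print(f"{addr}: {ctype!r} -> {state}")
```

This reports only the cache mode; it does not say whether the filesystem 
is actually issuing barriers.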

In this case, it sounds like we have tangled the need to flush a drive's 
write cache with the need to not reorder I/O in the elevator code.
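
To make the distinction concrete, here is a toy model (not kernel code; 
the function names and block labels are invented for illustration). It 
assumes a battery backed cache, so every write the device has 
acknowledged is durable and no flush is needed, and it models reordering 
as the worst case of dispatching requests in reverse order.

```python
def durable_blocks(submitted, reordered, acked_before_crash):
    """Return the blocks that are durable when power is cut.

    submitted: blocks in the order the filesystem issued them.
    reordered: if True, the elevator dispatches them in reverse order.
    acked_before_crash: how many dispatched writes the device had
        acknowledged (and, with a non-volatile cache, made durable).
    """
    dispatched = list(reversed(submitted)) if reordered else list(submitted)
    return dispatched[:acked_before_crash]


def journal_consistent(durable):
    """A commit record is only valid if the journal data is durable too."""
    return "commit" not in durable or "journal-data" in durable


if __name__ == "__main__":
    transaction = ["journal-data", "commit"]
    for reordered in (False, True):
        durable = durable_blocks(transaction, reordered, acked_before_crash=1)
        print(f"reordered={reordered}: durable={durable}, "
              f"consistent={journal_consistent(durable)}")
```

Even with a cache that never loses an acknowledged write, the reordered 
case leaves a commit record on disk without its journal data, which is 
why the ordering requirement does not go away when the flush does.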

Ric





More information about the Ext3-users mailing list