barrier and commit options?

Theodore Tso tytso at mit.edu
Fri Jan 30 22:02:45 UTC 2009


>>> - If I remember the details correctly, Chris Mason has demonstrated a
>>> 50% chance of corrupting directory entries in ext3 for example.

Chris Mason has a script which forces the system to be under a lot of
memory pressure, and in that scenario, it is highly likely that
without barriers, there will be filesystem corruptions if the system
is abruptly turned off while his script is running.

Andrew Morton has been resistant to making barrier=1 the default
for ext3 because (as I understand it) he does not believe that this is
an adequate real-world example, and there is a real performance hit to
running with barriers.

>>> If you have a battery backed write cache (say, in a high end array) 
>>> barriers can be ignored since the storage can effectively make that 
>>> write cache non-volatile, but otherwise, this is pretty key for 
>>> anyone wanting to maintain data integrity,
>>>
>> That's what I'm getting at, array controllers with a battery backed
>> write cache (BBWC). We disable the write cache on the physical
>> disks and provide no mechanism to re-enable the cache except in
>> some SATA configurations.

Well, we still need the barrier on the block I/O elevator side to
make sure that requests don't get reordered in the block layer.  But
what you're saying is that once the write is posted to the array, it
is guaranteed that it is on "stable storage" (even if it is BBWC) such
that if someone hits the Big Red Switch at the exit to the data
center, and power is forcibly cut from the entire data center in case
of a fire, the battery will still keep the cache alive, at least until
the sprinklers go off, anyway, right?  :-)
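
To illustrate the reordering concern, here is a toy model of an
elevator, not the kernel's actual I/O scheduler.  The Request type,
the elevator_order() function, and the sector numbers are all made up
for this sketch; the only assumption is that the elevator sorts
pending requests by sector and never moves a request across a barrier.

```python
from typing import NamedTuple


class Request(NamedTuple):
    sector: int
    label: str
    barrier: bool = False  # hypothetical flag: request must not be reordered


def elevator_order(queue):
    """Toy elevator: sort by sector, but never move a request across a barrier."""
    dispatched, batch = [], []
    for req in queue:
        if req.barrier:
            # Flush everything queued before the barrier, then the barrier itself.
            dispatched.extend(sorted(batch, key=lambda r: r.sector))
            dispatched.append(req)
            batch = []
        else:
            batch.append(req)
    dispatched.extend(sorted(batch, key=lambda r: r.sector))
    return dispatched


# Same two writes, submitted in the same order, with and without a barrier.
with_barrier = [Request(900, "journal data"),
                Request(100, "commit record", barrier=True)]
without_barrier = [Request(900, "journal data"),
                   Request(100, "commit record")]

print([r.label for r in elevator_order(with_barrier)])
# ['journal data', 'commit record']
print([r.label for r in elevator_order(without_barrier)])
# ['commit record', 'journal data']
```

Without the barrier, the commit record is dispatched ahead of the data
it is supposed to commit, and that happens before the request ever
reaches the array, no matter how safe the array's cache is.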

In that case, I suspect the right thing for the cciss array to do is
to ignore the barrier, but not to return an error.  If you return an
error, and refuse the write with barrier operation (which is what the
cciss driver seems to be doing starting in 2.6.29-rcX), ext4 will
retry the write without the barrier, at which point we are vulnerable
to the block layer reordering things at the I/O scheduler layer.  In
effect, you're claiming that every single write to cciss is implicitly
a "barrier write" in that once it is received by the device, it is
guaranteed not to be lost even if the power to the entire system is
forcibly removed.
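
The difference between the two driver behaviors can be sketched
roughly as follows.  This is a simplified model of the fallback
described above, not the real ext4 or cciss code; ToyArrayDriver,
ToyJournal, and the use of EOPNOTSUPP as the error code are
assumptions made for illustration.

```python
import errno


class ToyArrayDriver:
    """Hypothetical stand-in for an array driver with a battery-backed cache."""

    def __init__(self, reject_barriers):
        self.reject_barriers = reject_barriers
        self.log = []

    def write(self, block, barrier=False):
        if barrier and self.reject_barriers:
            raise OSError(errno.EOPNOTSUPP, "barrier writes not supported")
        # A driver that ignores barriers treats this as an ordinary write.
        self.log.append(block)


class ToyJournal:
    """Hypothetical filesystem side: falls back once a barrier write fails."""

    def __init__(self, driver):
        self.driver = driver
        self.use_barriers = True

    def commit(self, block):
        if self.use_barriers:
            try:
                self.driver.write(block, barrier=True)
                return
            except OSError as err:
                if err.errno != errno.EOPNOTSUPP:
                    raise
                # Barrier refused: stop using barriers from now on.
                self.use_barriers = False
        self.driver.write(block, barrier=False)


rejecting = ToyJournal(ToyArrayDriver(reject_barriers=True))
ignoring = ToyJournal(ToyArrayDriver(reject_barriers=False))
rejecting.commit("commit-1")
ignoring.commit("commit-1")

print(rejecting.use_barriers, ignoring.use_barriers)  # False True
```

Both drivers end up storing the block, but only with the driver that
quietly ignores the barrier does the filesystem keep issuing barrier
writes, so only in that case does the block layer above the driver
keep enforcing the ordering.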

						- Ted




