When is a block free?

Wed Oct 1 18:59:09 UTC 2008

On Wed, Oct 01, 2008 at 12:18:21PM -0600, Chris Worley wrote:
> 
> I was perusing David Woodhouse's 2.6.27-rc2 kernel at
> git://git.infradead.org/users/drzeus/discard-2.6.git, and noticed he
> has the discard built-in to where I was talking about for ext2... so I
> coded our driver to handle discards, and it works very nicely!!!

I'm not sure what you mean by "our driver"?

> The journaling issue you raise is not a show-stopper on the block
> device side: if the block device has to maintain a couple of blocks
> that are not really in use, it's no big deal (eventually the blocks
> will be re-written and the universe will be in order again)... for the
> users, I can understand if the discard is preserved on the block
> device, while the fs still thinks there's good data in there (we'll
> give you back all zeros on read).

It's no issue on the block device side at all, but from the user's
point of view it can be quite disastrous.  Consider the following
shell script:

	cp /etc/passwd /etc/passwd.vipw
	vi /etc/passwd.vipw
	<sanity check /etc/passwd.vipw for correctness>
	# atomically update /etc/passwd
	mv /etc/passwd.vipw /etc/passwd

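Now assume that we crash right after the "mv" command, but before the
transaction has committed.  After journal recovery the rename is
rolled back, so /etc/passwd once again refers to its old blocks; but
if those blocks were already discarded when the "mv" freed them, the
net result will be that the contents of the /etc/passwd file will be
all zeros, which some might consider... unfortunate.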

This is exactly the same reason why we can't just zero data blocks
at unlink time, but instead have to wait until the unlink operation
has actually been committed in the journal.
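
To make the ordering problem concrete, here is a toy model in Python.
It is only a sketch and not kernel code: ToyDisk, ToyFS,
passwd_after_crash and the 4-byte block size are all hypothetical
names invented for illustration, and each "file" is a single block.
The only thing it tries to show is the difference between discarding
a block as soon as it is freed and holding the discard back until the
journal commit.

```python
BLOCK_SIZE = 4  # hypothetical tiny block size, for illustration only


class ToyDisk:
    """Toy block device: a discarded block reads back as all zeros."""

    def __init__(self):
        self.blocks = {}

    def write(self, n, data):
        self.blocks[n] = data

    def discard(self, n):
        self.blocks.pop(n, None)

    def read(self, n):
        return self.blocks.get(n, b"\0" * BLOCK_SIZE)


class ToyFS:
    """Toy journaled namespace: one block per file, metadata only."""

    def __init__(self, disk, defer_discard):
        self.disk = disk
        self.defer_discard = defer_discard
        self.committed = {}         # name -> block, as of the last commit
        self.running = {}           # name -> block, uncommitted view
        self.pending_discards = []  # freed blocks waiting for a commit
        self.next_block = 0

    def create(self, name, data):
        n = self.next_block
        self.next_block += 1
        self.disk.write(n, data)
        self.running[name] = n

    def rename(self, src, dst):
        old = self.running.get(dst)
        self.running[dst] = self.running.pop(src)
        if old is not None:
            self._free(old)  # the replaced file's block is now "free"

    def _free(self, n):
        if self.defer_discard:
            self.pending_discards.append(n)  # wait for the commit
        else:
            self.disk.discard(n)             # discard immediately

    def commit(self):
        self.committed = dict(self.running)
        for n in self.pending_discards:
            self.disk.discard(n)  # safe: nothing committed refers to it
        self.pending_discards.clear()

    def crash_and_recover(self):
        # Uncommitted metadata changes are lost; so are queued discards.
        self.running = dict(self.committed)
        self.pending_discards.clear()

    def read(self, name):
        return self.disk.read(self.running[name])


def passwd_after_crash(defer_discard):
    """Replay the cp/vi/mv sequence, crashing before the commit."""
    fs = ToyFS(ToyDisk(), defer_discard)
    fs.create("passwd", b"root")
    fs.commit()
    fs.create("passwd.vipw", b"ROOT")   # stands in for cp + vi
    fs.rename("passwd.vipw", "passwd")  # the "mv"
    fs.crash_and_recover()              # crash before commit
    return fs.read("passwd")
```

In this model passwd_after_crash(False) hands back a block of zeros,
because the old block was discarded while the committed metadata
still pointed at it, whereas passwd_after_crash(True) still returns
the old contents.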

						- Ted