[Cluster-devel] [Lsf-pc] [LSF/MM ATTEND] [TOPIC] fs/block interface discussions

Steven Whitehouse swhiteho at redhat.com
Wed Dec 10 14:13:02 UTC 2014


Hi,

On 10/12/14 12:48, Jan Kara wrote:
>    Hi,
>
> On Wed 10-12-14 11:49:48, Steven Whitehouse wrote:
>> I'm interested generally in topics related to integration between
>> components, one example being snapshots. We have snapshots at
>> various different layers (can be done at array level or dm/lvm level
>> and also we have filesystem support in the form of fs freezing).
>    Well, usually snapshots at LVM layer are using fs freezing to get a
> consistent image of a filesystem. So these two are integrated AFAICS.
>
>> There are a few thoughts that spring to mind - one being how this
>> should integrate with applications - in order to make it easier to
>> use, and another being whether we could introduce snapshots which do
>> not require freezing the fs (as per btrfs) for other filesystems too
>> - possibly by passing down a special kind of flush from the
>> filesystem layer.
>    So btrfs is special in its COW nature. For filesystems which do updates
> in place you can do COW in the block layer (after all that's what
> dm snapshotting does) but you still have to get fs into consistent state
> (that's fsfreeze), then take snapshot of the device (by setting up proper
> COW structures), and only then you can allow further modifications of the
> filesystem by unfreezing it. I don't see a way around that...
Well, I think it should be possible to get the fs into a consistent state
without needing the freeze/snapshot/unfreeze procedure. Instead we might
have (there are no doubt other solutions too, so this is just an example
to get discussion started) an extra flag on a bio, which would only be
valid in combination with certain flush flags. Then it is just a case of
telling the block layer that we want a snapshot. It would spot the marked
bio when the fs sends it down, and know that everything up to and
including that bio belongs in the snapshot, and everything after it does
not. So the fs would basically do a special form of sync, setting the
flag on the bio at the point where it is consistent. The open question is
how that should be triggered. A side benefit is that there is no longer
any possibility of a problem if the unfreeze does not happen for any
reason.
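
To make the idea a bit more concrete, here is a toy model in Python (not
kernel code) of a block layer that has been told a snapshot is wanted and
then waits for the marked bio. The flag names REQ_FLUSH and
REQ_SNAP_MARKER, the Bio and ToyBlockDevice classes and the
request_snapshot() call are all made up for illustration; a real
implementation would set up COW structures rather than copy the whole
device.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical flag values; the proposal does not name or number them.
REQ_FLUSH = 0x1         # stands in for an existing flush flag
REQ_SNAP_MARKER = 0x2   # the proposed "snapshot point" flag


@dataclass
class Bio:
    sector: int
    data: bytes
    flags: int = 0


@dataclass
class ToyBlockDevice:
    blocks: dict = field(default_factory=dict)  # live sector -> data
    snapshot: Optional[dict] = None             # image captured at the marker
    snapshot_armed: bool = False                # a snapshot has been requested

    def request_snapshot(self):
        """Tell the block layer a snapshot is wanted; it waits for the marker."""
        self.snapshot_armed = True

    def submit(self, bio):
        marker = bool(bio.flags & REQ_SNAP_MARKER)
        # The marker is only valid together with a flush flag.
        if marker and not bio.flags & REQ_FLUSH:
            raise ValueError("snapshot marker requires a flush flag")
        self.blocks[bio.sector] = bio.data
        if marker and self.snapshot_armed:
            # Everything up to and including this bio is in the snapshot.
            # (A full copy keeps the toy simple; real code would use COW.)
            self.snapshot = dict(self.blocks)
            self.snapshot_armed = False


dev = ToyBlockDevice()
dev.submit(Bio(0, b"old"))
dev.request_snapshot()
dev.submit(Bio(1, b"journal commit"))   # before the marker: in the snapshot
dev.submit(Bio(2, b"sync point", REQ_FLUSH | REQ_SNAP_MARKER))
dev.submit(Bio(0, b"new"))              # after the marker: not in the snapshot
```

Note that writes never stop in this model: the fs keeps submitting bios
throughout, and there is no unfreeze step that could be missed.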

Perhaps the more important question, though, is how this could be
integrated with applications. The ultimate goal I had in mind is a tool
that creates a snapshot with a single command, dealing with all three
layers (application/fs/block). It should not matter whether the snapshot
is done by a particular fs or block device; it should work the same way.
So how could we send a message to a process to say that a snapshot is
about to be taken, get a message back when the app has produced a
consistent set of data, and coordinate between multiple applications
using the same block device, or even across multiple block devices used
by a single app?
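
One possible shape for that coordination, purely as a sketch to hang the
discussion on: notify every app that uses any of the devices, wait for
each to say its data is consistent, take the snapshots, then release the
apps. Everything here (the App class, prepare_snapshot(),
snapshot_done(), coordinated_snapshot()) is hypothetical, and the actual
messaging mechanism is left open, since that is the question being asked.

```python
class App:
    """Hypothetical application endpoint that can be asked to quiesce."""

    def __init__(self, name, devices):
        self.name = name
        self.devices = set(devices)
        self.quiesced = False

    def prepare_snapshot(self):
        # A real app would flush buffers or finish transactions here.
        self.quiesced = True
        return True          # reply: "my data is consistent"

    def snapshot_done(self):
        self.quiesced = False  # free to modify data again


def coordinated_snapshot(apps, devices, take_snapshot):
    """Quiesce every app using any of `devices`, snapshot each device,
    then release the apps. Returns the names of the apps involved."""
    devices = set(devices)
    involved = [a for a in apps if a.devices & devices]
    acked = []
    try:
        for app in involved:
            if not app.prepare_snapshot():
                raise RuntimeError(f"{app.name} could not become consistent")
            acked.append(app)
        for dev in sorted(devices):
            take_snapshot(dev)
    finally:
        # Always release the apps, even if a snapshot step failed.
        for app in acked:
            app.snapshot_done()
    return [a.name for a in involved]


apps = [App("db", ["sda", "sdb"]), App("mail", ["sdb"]), App("idle", ["sdc"])]
log = []


def take(dev):
    # Record which apps were quiesced at the moment of each snapshot.
    log.append((dev, sorted(a.name for a in apps if a.quiesced)))


names = coordinated_snapshot(apps, ["sda", "sdb"], take)
```

Here "db" spans two devices and "mail" shares one of them, so both are
quiesced before either device is snapshotted, while "idle" is never
disturbed.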


>> A more general topic is proposed changes to the fs/block interface,
>> of which the above may possibly be one example. There are a number
>> of proposals for new classes of block device, and new features which
>> will potentially require a different (or extended) interface at the
>> fs/block layer. These have largely been discussed to date as
>> individual features, and I wonder whether it might be useful to try
>> and bring together the various proposals to see if there is
>> commonality between at least some of them at the fs/block interface
>> level. I know that there have been discussions going on relating to
>> the individual proposals, so the idea I had was to try and look at
>> them from a slightly different angle by bringing as many of them as
>> possible together and concentrating on how they would be used from a
>> filesystem perspective,
>    Could you elaborate on which combination of features you'd like to
> discuss?
> 								Honza
Well, there are a number that I'm aware of currently in development,
though I suspect this list is not complete:
  - SMR drives
  - persistent memory (various different types)
  - Hinting from fs to block layer for various different reasons
    (layout, compression, snapshots, anything else?)
  - Better i/o error reporting/recovery
  - copy offload
  - anything I forgot?

Steve.



