[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: Extended Attributes and Access Control Lists

On Oct 30, 2001  01:27 -0500, Theodore Tso wrote:
> This is certainly an improvement over the existing scheme, although it
> does make the recalculation of the header hash slower than it is
> today, because you do end up having to calculate the hash on more stuff
> than is currently being used.

Yes, but it doesn't increase the amount of data to be hashed by much -
only a few extra words for each EA name.

> This is also effectively an on-disk format change, which means it does
> impact the currently deployed base (although granted if we do want to
> make format changes, better to make them now rather than later).

I think Andreas G. said he was going to make an on-disk format change
relatively soon anyway, so this could go in at the same time.

> A simple one which I've been meaning to propose for some time is a
> very simple and straightforward compression technique.  Given that for
> "system attributes" such as ACL EA's, you don't want the user-mode
> code to be able to directly modify such EA's, anyway, there's no real
> need to give it an ASCII name such as "$acl" or "$defacl".  Why not
> just simply adopt a specialized convention where if e_name_len is
> zero, then e_name_index is treated as an 8 bit code value indicating
> the type of system attribute being used.  This implies a central
> registry for assigning values to names (so for example, someone would
> have to keep a central registry of 1 meaning $acl, and 2 meaning
> $defacl, etc., but we do that today with the filesystem feature flags,
> and it's really not all *that* onerous).

Sounds OK to me, but this would "save" us only a few bytes per EA at
most.  Given that we only allocate full disk blocks to EA storage, it
will at most allow us to store a few more bytes in the EA data.  If it
also means we don't have to parse the on-disk EA name for each
transaction, though, the change may be more worthwhile.

> The other change, which is not quite so fully fleshed out yet, is a
> change which Stephen and I have been privately discussing, but which
> we haven't publicly floated just yet, mainly because it's not fully
> baked yet.  The basic observation here is that the current EA sharing
> system works mainly because the main user of the EA's is the ACL
> subsystem, and most ACL's are the same across files.  However, if
> other things are stored in the EA, such as security ID's for SE Linux,
> or DMAPI tags, the ability to share may go down to zero, in which
> case you end up using an extra disk block for every single inode, and
> that gets rather painful.

When Andreas G. and I were discussing shared EA data a long time ago
(I will have to dig it out and re-read my arguments, also on acl-devel
mailing list) I was advocating something like this also.  At the time
I was worried about snapfs snapshot data, which is pretty much always
going to be unique for each inode.

I believe it went something like (when adding a new EA, or changing an
existing EA, exit when one of the conditions is true):
1) try to find an existing EA block with the same header hash value (of
   the current EA data + new EA data); if found, share the entire block
2) if the current EA block is shared, and we are adding a new EA, add a
   new EA block with only this entry, and make an "EA block pointer" to
   the common EA block
3) if the current EA block is shared, and we are changing an EA, try to
   find an EA block with only the other entries; if found, do (2)
4) if the current EA block is unshared, and we cannot find any matching
   entries, create a unique EA block.

The important change is the use of EA "pointers", which allow you to
have a 2-stage EA entry for a given inode.  At the time we discussed
this (probably still true now), it was possible to chain EA blocks
so that we have (roughly):

  inode -> private EA block -> shared (common) EA block

Which means if we have some inode-specific EA data, and some common EA
data, we can share the common stuff, but hold unique stuff separately.
The real tradeoff comes whether we will normally need EA larger than a
single disk block per inode.  If this is common (I don't know the
expected sizes of the EA attributes you refer to above, Ted) then we
will save space by sharing the common EA blocks.  If the average size
is <= one disk block, then this is false sharing, since we will always
need one EA block for the inode-unique data, so we may as well store
the "shared" (common) EA data there as well and avoid an extra I/O.

Andreas G., does the current EA implementation allow "indirect" EA
blocks, or is it limited to at most a single EA block per inode?

Cheers, Andreas
Andreas Dilger
