[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: Extended Attributes and Access Control Lists

On Sun, 28 Oct 2001, Andreas Dilger wrote:

> > > - It seems that there are not enough files modified in the ext3 tree
> > >   to implement EAs properly (at least compared to the number of files
> > >   changed in the ext2 tree.  Is part of the patch missing?
> > 
> > I don't think so. It's seven files modified in ext2 as well as in
> > ext3. Which patches are you comparing?
> The Linus-kernel file (first one listed above).  It only includes the
> patch for the new file ext3/ext_attr.c.

I see...

This resulted from back-porting from linux-2.4.13-ac3 to linux-2.4.13.

> > > - In ext3_ext_attr_cmp() shouldn't it be enough to compare only the
> > >   hash values of each entry (or even better - only the hash value in
> > >   the header)?  Maybe the entry hash should include the name as well?
> > >   I don't think you will ever want to merge EAs with identical data
> > >   but different names.  It seems like we are comparing too many things.
> > 
> > The hash values in the header only differ if a hash collision occurs,
> > which hopefully is a very rare event. Therefore I'm not doing this check
> > at all, and directly go to the more detailed checks.
> I'm not sure I follow you.  I haven't looked at the code closesly, but I
> would think that the hash value in the EA block header was some product*
> of the hash values of each of the individual entries.

Yes it is.

> This allows you to compare two blocks for sharing simply by comparing
> the two header hash values.

The block hash is the hash key in the cache, so the cache lookup only
returns blocks with equal block hashes, or blocks whose block hashes

> If they are identical, only then do you need to compare each
> entry hash value separately.  The chances of collisions in both the
> header hash and all of the entry hashes is vanishingly small if there
> is more than a single EA in the block.

True. It's still necessary to compare the names and values, though. You
cannot rule out hash collisions.

> (*) the header hash could be some product of the entry hashes + each
>     entry name.  This would ensure that a header hash collision with
>     two blocks each with only one entry would not also be a collision
>     at the entry level.

Each entry hash hashes the entry's name and value. The header hash is a
combination of all entry hashes.

You seem to be proposing a two-level hashing scheme here. That would be an
improvement, as it would decrease the probability of collision. It would
also not allow to re-hash a block as it's done now: When a single entry
changes, only that entry's hash is recomputed, and then the block hash is
computed from al the entry hashes. With a two-level scheme the whole block
hash would have to be re-computed.


[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]