[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: Extended Attributes and Access Control Lists

On Oct 29, 2001  09:41 +0100, Andreas Gruenbacher wrote:
> > I'm not sure I follow you.  I haven't looked at the code closesly, but I
> > would think that the hash value in the EA block header was some product*
> > of the hash values of each of the individual entries.
> Yes it is.
> > This allows you to compare two blocks for sharing simply by comparing
> > the two header hash values.
> The block hash is the hash key in the cache, so the cache lookup only
> returns blocks with equal block hashes, or blocks whose block hashes
> collide.


> > If they are identical, only then do you need to compare each
> > entry hash value separately.  The chances of collisions in both the
> > header hash and all of the entry hashes is vanishingly small if there
> > is more than a single EA in the block.
> True. It's still necessary to compare the names and values, though. You
> cannot rule out hash collisions.
> > (*) the header hash could be some product of the entry hashes + each
> >     entry name.  This would ensure that a header hash collision with
> >     two blocks each with only one entry would not also be a collision
> >     at the entry level.
> Each entry hash hashes the entry's name and value. The header hash is a
> combination of all entry hashes.

OK, I think that my solution will fix this need to compare all of the entries.
I assume that you get an EA block back which matches the header hash we are
looking for.  Now we want to see if there is a collision between the header
hashes.  We _should_ be able to do this by only comparing the hashes of the
entries themselves, but if the header hash is only a function of the entry
hashes, and there is only one entry, a collision of entry hashes would give
us a collision at the header hash as well, which is undesirable.

To avoid this problem, when generating the header hash, you make it a function
of the entry hash and the entry name.  The entry hash itself hashes the
entry name and contents, right?  So identical entry hashes either mean
identical names and contents, or different names and contents.  If we re-hash
the entry name as part of the header hash (in addition to the entry hash)
the only chance of identical header hashes is if we have identical entry
hash + entry name, or a different entry hash + different entry name.

> You seem to be proposing a two-level hashing scheme here. That would be an
> improvement, as it would decrease the probability of collision. It would
> also not allow to re-hash a block as it's done now: When a single entry
> changes, only that entry's hash is recomputed, and then the block hash is
> computed from al the entry hashes. With a two-level scheme the whole block
> hash would have to be re-computed.

My idea IS a two-level hash, but it takes advantage of the fact that we
have already hashed the entry contents previously.  This still makes
calculating the header hash relatively quick - we don't need to re-hash
all the contents, only the content of the entry that has changed (we do
this already to generate the entry hash), the names of all the entries,
and the entry hashes.

This also ensures that we only need to compare the hashes to ensure
uniqueness even with only a single entry, because if the header hash
matches and the entry hash matches, we know that the two entries have
the same name.

Cheers, Andreas
Andreas Dilger

[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]