From vcaron at bearstech.com  Fri Aug 19 15:02:24 2016
From: vcaron at bearstech.com (Vincent Caron)
Date: Fri, 19 Aug 2016 17:02:24 +0200
Subject: e2find: new ext2/3/4 tool for fast directory entry iterations
Message-ID: <c426f837-b21f-d96f-aec4-d232ae169f2b@bearstech.com>

Hello ext users,

  Out of a recurring need to traverse large filesystems (10-350M
inodes) backed by spindle-based RAID arrays, I tried several approaches
(intercepting readdir and sorting entries by inode number, playing with
cache hints, and so on), to no avail.
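The readdir trick mentioned above usually means visiting a directory's entries in inode-number order, so that the follow-up stat() calls walk the inode tables roughly in on-disk order instead of seeking at random. A minimal Python sketch of the idea, using the standard `os.scandir`; the function names are hypothetical and this is not the author's code:

```python
import os


def entries_by_inode(path):
    """List one directory as (inode, name) pairs, sorted by inode number."""
    with os.scandir(path) as it:
        pairs = [(entry.inode(), entry.name) for entry in it]
    pairs.sort()
    return pairs


def lstat_in_inode_order(path):
    """lstat() every entry of `path`, visiting entries in inode order.

    On ext2/3/4 an inode number determines its block group and slot in
    the inode table, so ascending inode numbers roughly follow the
    on-disk order of the inode tables.
    """
    results = {}
    for _ino, name in entries_by_inode(path):
        try:
            results[name] = os.lstat(os.path.join(path, name))
        except FileNotFoundError:
            pass  # deleted between the listing and the lstat()
    return results
```

It only reorders accesses within one directory at a time; entries that vanish between the listing and the lstat() are skipped.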

  Since I'm mostly dealing with ext3 and ext4 filesystems, I wrote a tool
based on libext2fs which replaces the 'find /' step by reading the ext
on-disk data structures directly. It has been working great for me for
several months, and I've published it at https://github.com/bearstech/e2find
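One way to picture "going to the ext data structures" is a two-phase pass: an enumeration phase that reads directory entries in whatever order they come off the disk and records only parent links, and a resolution phase that builds full paths from those links afterwards. This is a hedged sketch of that general idea, not e2find's actual implementation; the `dir_blocks` dict is a hypothetical in-memory stand-in for directory blocks parsed through libext2fs, and all function names are illustrative:

```python
ROOT_INO = 2  # the root directory is inode 2 on ext2/3/4


def enumerate_entries(dir_blocks):
    """Phase 1: collect entries and parent links without building paths.

    dir_blocks maps a directory inode to its (name, child_ino, is_dir)
    entries. Returns (parent_of, entries), where parent_of maps a
    directory inode to (parent_ino, name) and entries lists every
    (parent_ino, name, ino).
    """
    parent_of = {}
    entries = []
    for dir_ino, dirents in dir_blocks.items():
        for name, child_ino, is_dir in dirents:
            if name in (".", ".."):
                continue
            entries.append((dir_ino, name, child_ino))
            if is_dir:
                parent_of[child_ino] = (dir_ino, name)
    return parent_of, entries


def resolve_path(parent_of, dir_ino):
    """Build a directory's path by following parent links up to the root."""
    parts = []
    seen = set()
    while dir_ino != ROOT_INO:
        if dir_ino in seen or dir_ino not in parent_of:
            return None  # dangling or looping link: inconsistent snapshot
        seen.add(dir_ino)
        dir_ino, name = parent_of[dir_ino]
        parts.append(name)
    return "/" + "/".join(reversed(parts))


def resolve_all(dir_blocks):
    """Phase 2: turn every enumerated entry into a full path."""
    parent_of, entries = enumerate_entries(dir_blocks)
    paths = []
    for parent_ino, name, _ino in entries:
        base = resolve_path(parent_of, parent_ino)
        if base is not None:
            paths.append(base.rstrip("/") + "/" + name)
    return sorted(paths)
```

For example, a root (inode 2) holding `etc` (inode 12) and `README`, with `passwd` inside `etc`, resolves to `["/README", "/etc", "/etc/passwd"]`. Entries whose parent chain cannot be followed back to the root are dropped rather than guessed.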

  There's one data safety issue I'm not quite sure about. I'm using
libext2fs to open, read-only, block devices which are mounted and
actively written to. This is obviously unsafe (not from a data-loss
point of view, but from a data-coherence point of view), but I've never
encountered a situation where e2find output incoherent information: all
retrieved filenames exist as seen from the VFS, except those deleted
between the enumeration and resolution phases of e2sync. Since I'm in
userland and not locking any on-disk data structure I'm reading, I
wonder what kind of surprises I should expect in the retrieved data.
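The "exist as seen from the VFS" check can be made explicit by comparing each name from the raw scan against the live filesystem. A minimal sketch, assuming the scan yields (path, inode) pairs; `still_matches` is a hypothetical helper, not part of e2find:

```python
import os


def still_matches(path, expected_ino):
    """Return True if `path` currently exists and names inode `expected_ino`.

    A raw on-disk scan of a mounted, actively written filesystem can
    return stale names; comparing against the live VFS with lstat()
    filters out names that were deleted since the scan, or that now
    point to a different inode.
    """
    try:
        return os.lstat(path).st_ino == expected_ino
    except (FileNotFoundError, NotADirectoryError):
        return False
```

This cannot catch a name whose inode number was freed and reused by a new file between the scan and the check, and the answer can go stale as soon as lstat() returns.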



From bothie at gmx.de  Sun Aug 21 11:31:12 2016
From: bothie at gmx.de (Bodo Thiesen)
Date: Sun, 21 Aug 2016 13:31:12 +0200
Subject: e2find: new ext2/3/4 tool for fast directory entry iterations
In-Reply-To: <c426f837-b21f-d96f-aec4-d232ae169f2b@bearstech.com>
References: <c426f837-b21f-d96f-aec4-d232ae169f2b@bearstech.com>
Message-ID: <20160821133112.3781364d@phenom>

* Vincent Caron <vcaron at bearstech.com> hat geschrieben:

> Since I'm in userland and not locking
> any on-disk data structure I'm reading, I wonder what kind of surprises I
> should expect in the retrieved data.

Exactly the kind of surprises you're already expecting: old data where
newer data has already been committed (to the journal) but not yet
written to its target location, or data that has been overwritten in
the meantime. Since there is almost no restriction on what the "wrong"
data can be (it could be an mp3, or part of an ext2 image file looking
exactly like the data you're expecting to see, with no way to know for
sure), you can see *anything*.
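Some trash can at least be detected. A minimal sketch of a plausibility check on one directory block, assuming the linear (non-htree) ext2/3/4 directory entry layout (little-endian `inode`, `rec_len`, `name_len`, `file_type`, then the name, with `rec_len` a multiple of 4) and a block size under 64 KiB; `parse_dir_block` and `max_inode` (for instance the superblock's inode count) are hypothetical names. As said above, stale but well-formed data still passes:

```python
import struct

# ext2/3/4 linear directory entry header: inode, rec_len, name_len, file_type
DIRENT_HEADER = struct.Struct("<IHBB")


def parse_dir_block(block, max_inode):
    """Parse one linear directory block, rejecting implausible contents.

    Returns a list of (inode, name). Raises ValueError if the bytes cannot
    be a well-formed directory block. Passing does NOT prove the data is
    current: an old, well-formed block looks just as valid.
    """
    entries = []
    offset = 0
    size = len(block)
    hdr = DIRENT_HEADER.size
    while offset < size:
        if size - offset < hdr:
            raise ValueError(f"truncated entry header at offset {offset}")
        inode, rec_len, name_len, _ftype = DIRENT_HEADER.unpack_from(block, offset)
        min_len = (hdr + name_len + 3) & ~3  # header + name, rounded up to 4
        if rec_len % 4 or rec_len < min_len or offset + rec_len > size:
            raise ValueError(f"bad rec_len {rec_len} at offset {offset}")
        if inode > max_inode:
            raise ValueError(f"inode {inode} out of range at offset {offset}")
        if inode:  # inode 0 marks an unused slot
            name = block[offset + hdr: offset + hdr + name_len]
            if b"/" in name or b"\0" in name:
                raise ValueError(f"illegal byte in name at offset {offset}")
            entries.append((inode, name.decode("utf-8", "surrogateescape")))
        offset += rec_len
    return entries
```

A rejected block is worth re-reading later; an accepted one still needs the VFS-side check or the journal approach below.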

For an ext2 fs with a journal, you could try interpreting the journal and
fix up your cache to bring it up to date like this:

1. Get a copy of the journal.
2. Read the blocks you're interested in (i.e. do the normal traversing
   step).
	-> From time to time, get a new copy of the journal, check what
	changed, and process the changes. This also means you need to keep
	some metadata about when and where you got your data from, so you
	can actually fix things up. Remember: while traversing, you can read
	any kind of trash.
3. Upon completion of step 2, get a final copy of the journal to bring
   your cache up to date.
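The steps above can be modelled in a few lines, under heavy simplifying assumptions: a journal copy is reduced to a list of committed transactions `(seq, {block_number: data})`, and real JBD2 parsing (descriptor, commit and revoke blocks, wraparound) is left out. The "when" metadata is the highest transaction sequence known at the time each block was read. Class and method names are hypothetical:

```python
class JournalFixupCache:
    """Cache of blocks read during a traversal, fixed up from journal copies."""

    def __init__(self):
        self.data = {}        # block number -> cached contents
        self.read_after = {}  # block number -> last seq known when cached
        self.journaled = {}   # block number -> (seq, data) in the latest copy
        self.last_seq = 0     # highest committed transaction seen so far

    def apply_journal(self, transactions):
        """Steps 1 and 3, and the periodic refresh during step 2.

        Returns the set of cached block numbers whose contents changed.
        """
        changed = set()
        self.journaled = {}
        for seq, blocks in sorted(transactions, key=lambda t: t[0]):
            for blk, new in blocks.items():
                self.journaled[blk] = (seq, new)
                # Only transactions newer than our read can override it.
                if blk in self.data and seq > self.read_after[blk]:
                    if self.data[blk] != new:
                        changed.add(blk)
                    self.data[blk] = new
                    self.read_after[blk] = seq
            self.last_seq = max(self.last_seq, seq)
        return changed

    def read(self, blk, disk):
        """Step 2: read a block, preferring its committed journal version,
        since the on-disk copy may not have been written back yet."""
        if blk in self.journaled:
            self.data[blk] = self.journaled[blk][1]
        else:
            self.data[blk] = disk[blk]
        self.read_after[blk] = self.last_seq
        return self.data[blk]
```

Here `disk` is any mapping from block numbers to contents, a stand-in for reads from the device.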

The funny thing about this approach: by repeating step 3 you could keep
your cache up to date without ever having to retraverse the file system,
as long as your check interval is short enough that you don't miss any
journal updates.
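Whether an update was missed can be detected rather than hoped for. A hedged sketch, assuming transaction IDs grow by exactly one per commit (32-bit wraparound ignored) and that each journal copy tells us the oldest transaction still in the log (for JBD2, to my understanding, the journal superblock's `s_sequence` field). If that oldest transaction is newer than the last one we folded in plus one, the ones in between were written back and their journal space reused before we looked, so a full retraversal is needed. Function names are hypothetical:

```python
def copy_is_contiguous(last_seen_seq, first_seq_in_log):
    """True if no committed transaction fell out of the log unseen."""
    return first_seq_in_log <= last_seen_seq + 1


def poll(copies, last_seen_seq):
    """Walk successive journal copies, each given as
    (first_seq_in_log, [committed seqs in that copy]).

    Returns the index of the first copy that reveals a gap, or None if
    the cache stayed coherent across all copies.
    """
    for i, (first_seq, seqs) in enumerate(copies):
        if not copy_is_contiguous(last_seen_seq, first_seq):
            return i
        if seqs:
            last_seen_seq = max(last_seen_seq, max(seqs))
    return None
```

The polling interval then only has to be shorter than the time the log takes to recycle its oldest transaction under the current write load.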

I'll leave the details to your implementation skills, since I don't know
what strategies e2find uses.

Regards, Bodo



From vcaron at bearstech.com  Tue Aug 23 09:14:46 2016
From: vcaron at bearstech.com (Vincent Caron)
Date: Tue, 23 Aug 2016 11:14:46 +0200
Subject: e2find: new ext2/3/4 tool for fast directory entry iterations
In-Reply-To: <20160821133112.3781364d@phenom>
References: <c426f837-b21f-d96f-aec4-d232ae169f2b@bearstech.com>
	<20160821133112.3781364d@phenom>
Message-ID: <37f001f2-81a6-2ce7-336a-bb7afa0a8824@bearstech.com>

Hello,

  thanks for your feedback!

On 21/08/16 13:31, Bodo Thiesen wrote:
> For an ext2 fs with a journal, you could try interpreting the journal and
> fix up your cache to bring it up to date like this:
> 
> 1. Get a copy of the journal.
> 2. Read the blocks you're interested in (i.e. do the normal traversing
>    step).
> 	-> From time to time, get a new copy of the journal, check what
> 	changed, and process the changes. This also means you need to keep
> 	some metadata about when and where you got your data from, so you
> 	can actually fix things up. Remember: while traversing, you can read
> 	any kind of trash.
> 3. Upon completion of step 2, get a final copy of the journal to bring
>    your cache up to date.

  I see. However, while libext2fs has some support for creating journals,
it does not seem to have an API to read and interpret the journal
inode's data, and implementing that myself would be much more complex.
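For a sense of what interpreting the journal involves, here is a minimal sketch of only the very first step: decoding the JBD2 journal superblock from the first block of the journal's data (for an internal journal, usually inode 8, which could be extracted for example with debugfs's `dump` command). The field layout below (big-endian `journal_header_t` followed by `s_blocksize`, `s_maxlen`, `s_first`, `s_sequence`, `s_start`, `s_errno`) is my understanding of the kernel's jbd2 headers; the function name is hypothetical. Walking the log itself (descriptor, commit and revoke blocks, wraparound at `maxlen`) is where the real work would be:

```python
import struct

JBD2_MAGIC = 0xC03B3998
BLOCKTYPE_NAMES = {1: "descriptor", 2: "commit", 3: "superblock v1",
                   4: "superblock v2", 5: "revoke"}

# journal_header_t (magic, blocktype, sequence) + first superblock fields.
# Journal structures are big-endian, unlike the rest of ext2/3/4.
JSB = struct.Struct(">8Ii")


def parse_journal_superblock(block):
    """Decode the leading fields of a JBD2 journal superblock."""
    (magic, blocktype, _hdr_seq, blocksize, maxlen, first,
     sequence, start, errno) = JSB.unpack_from(block, 0)
    if magic != JBD2_MAGIC:
        raise ValueError(f"bad journal magic {magic:#x}")
    if blocktype not in (3, 4):
        kind = BLOCKTYPE_NAMES.get(blocktype, blocktype)
        raise ValueError(f"not a journal superblock: {kind}")
    return {
        "blocksize": blocksize,  # journal block size in bytes
        "maxlen": maxlen,        # total journal length in blocks
        "first": first,          # first block of log information
        "sequence": sequence,    # first commit ID expected in the log
        "start": start,          # block where the log starts; 0 = empty log
        "errno": errno,
    }
```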


> I'll leave the details to your implementation skills, since I don't know
> what strategies e2find uses.

  e2find does not track any file<->block relationship, which I guess
would be needed to map journal data back to files, and implementing it
would take more I/O and way more memory. I think I'm going to give up on
the journal idea and add a clear warning about this limitation to
e2find's documentation...
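The missing piece is essentially a reverse map from block numbers to the directories they were read from, filled in during the traversal. A minimal sketch of that bookkeeping (class and method names hypothetical), which also shows where the memory goes: one entry per directory block kept for the whole run:

```python
from collections import defaultdict


class DirBlockIndex:
    """Reverse map from directory block numbers to directory inodes."""

    def __init__(self):
        self.owner = {}                    # block number -> directory inode
        self.blocks_of = defaultdict(set)  # directory inode -> its blocks

    def record(self, dir_ino, blocks):
        """Remember which blocks a directory's entries were read from."""
        for blk in blocks:
            self.owner[blk] = dir_ino
            self.blocks_of[dir_ino].add(blk)

    def dirs_to_reread(self, journaled_blocks):
        """Directories whose cached entries a journal copy may invalidate."""
        return {self.owner[b] for b in journaled_blocks if b in self.owner}
```

Journaled blocks that are not in the map (for instance a block newly allocated to a growing directory) would only be noticed through the inode-table block holding that directory's inode, so those blocks would need mapping as well.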