From vcaron at bearstech.com  Fri Aug 19 15:02:24 2016
From: vcaron at bearstech.com (Vincent Caron)
Date: Fri, 19 Aug 2016 17:02:24 +0200
Subject: e2find: new ext2/3/4 tool for fast directory entry iterations
Message-ID:

Hello ext users,

out of a recurring need to traverse large filesystems (10-350M inodes)
backed by spindle-based RAID arrays, I tried several solutions (like
intercepting readdir and sorting by inode, playing with cache hints, and
such), to no avail.

Since I'm mostly facing ext3 and ext4 filesystems, I wrote a tool based
on libext2fs which replaces the 'find /' part by reading the ext data
structures directly. It has been working great for me for several
months, and I've published it at https://github.com/bearstech/e2find

There's a data safety issue I'm not quite sure about. I'm using
libext2fs to open, read-only, block devices which are mounted and
actively written to. It's obviously unsafe (not from the data-loss
p.o.v., but from the coherent-retrieved-data p.o.v.), but I've never
encountered a situation where e2find would spit out incoherent
information: all retrieved filenames exist as seen from the VFS, except
those deleted between the enumeration and resolution phases of e2sync.

Since I'm in userland and not locking any on-disk data structure I'm
reading, I wonder what kind of surprises I should expect in the
retrieved data.

From bothie at gmx.de  Sun Aug 21 11:31:12 2016
From: bothie at gmx.de (Bodo Thiesen)
Date: Sun, 21 Aug 2016 13:31:12 +0200
Subject: e2find: new ext2/3/4 tool for fast directory entry iterations
In-Reply-To:
References:
Message-ID: <20160821133112.3781364d@phenom>

* Vincent Caron wrote:

> Since I'm in userland and not locking
> any on-disk data structure I'm reading, I wonder what kind of surprises I
> should expect in the retrieved data.

Exactly the kind of surprises you're expecting anyway: old data where
committed data exists but has not yet been written to its target
location, or data that has been overwritten in the meantime. Since
there is almost no restriction on the "wrong" data (could be mp3, could
be part of an ext2 image file looking exactly like the data you're
expecting to see - no way to know for sure), you can see *anything*.

For an ext2 fs with journal, you could try interpreting the journal and
fixing up your cache to bring it up to date like this:

1. Get a copy of the journal.
2. Read the blocks you're interested in (i.e. do the normal traversing
   step).
   -> From time to time, get a new copy of the journal, check what
      changed, and process the changes. This also means you need to
      keep some metadata about when and where you got your data from,
      so you can actually fix up stuff. Remember: while traversing, you
      can read any kind of trash.
3. Upon completion of 2., get a final copy of the journal to bring
   your cache up to date.

The funny thing about this approach: by repeating step 3 you could keep
your cache up to date without ever needing to retraverse the file
system, as long as your check interval is short enough that you don't
miss any journal updates.

I leave the details to your implementation skills, since I don't know
what your strategies in e2find are.

Regards, Bodo

From vcaron at bearstech.com  Tue Aug 23 09:14:46 2016
From: vcaron at bearstech.com (Vincent Caron)
Date: Tue, 23 Aug 2016 11:14:46 +0200
Subject: e2find: new ext2/3/4 tool for fast directory entry iterations
In-Reply-To: <20160821133112.3781364d@phenom>
References: <20160821133112.3781364d@phenom>
Message-ID: <37f001f2-81a6-2ce7-336a-bb7afa0a8824@bearstech.com>

Hello,

thanks for your feedback!

On 21/08/16 13:31, Bodo Thiesen wrote:
> For an ext2 fs with journal, you could try interpreting the journal and
> fixup your cache to bring it up to date like this:
>
> 1. Get a copy of the journal
> 2. Read the blocks you're interested in (i.e. do the normal traversing
>    step).
>    -> from time to time, get a new copy of the journal, check what
>       changed, process the changes. This also means, you need to keep
>       some meta data about when and where you got your data from, so you
>       can actually fixup stuff. Remember: While traversing, you can read
>       any kind of trash.
> 3. Upon completion of 2. get a final copy of the journal to bring
>    your cache up-to-date.

I see. However, libext2fs has some support for creating journals but
does not seem to have an API to read and interpret the journal inode
data, so that would be much more complex for me to implement.

> I leave the details to your implementation skills, since I don't know
> what your strategies in e2find are.

e2find does not track any file<->block relationship, which I guess is
needed in my case to map journal data back to files, and I would need
more I/O and way more memory to implement this.

I feel I'm going to give up on the journal idea and add a clear warning
about this limitation of e2find in its documentation ...
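
For reference, the fix-up scheme quoted above (cache the blocks you
traverse, remember where each one came from, then replay newer journal
entries over the cache) can be sketched as a toy in-memory model. This
is only an illustration under loud assumptions: JournalEntry,
CachedBlock and replay() are hypothetical names, the "journal" is a
plain Python list rather than the real on-disk journal format, and
parsing the actual journal inode (the hard part mentioned above) is
left out entirely.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class JournalEntry:
    seq: int      # hypothetical transaction sequence number
    block: int    # block number this entry targets
    data: bytes   # committed contents of that block


@dataclass
class CachedBlock:
    data: bytes
    seq: int = 0  # 0 = read straight from the device, age unknown


def replay(cache: dict[int, CachedBlock], journal: list[JournalEntry]) -> int:
    """Overwrite cached blocks with any newer journal contents.

    Returns the highest sequence number seen (0 for an empty journal).
    """
    highest = 0
    for entry in sorted(journal, key=lambda e: e.seq):
        highest = max(highest, entry.seq)
        cached = cache.get(entry.block)
        # Only blocks that were actually traversed are tracked.
        if cached is not None and entry.seq > cached.seq:
            cache[entry.block] = CachedBlock(entry.data, entry.seq)
    return highest


# Step 2: blocks read during traversal, possibly stale.
cache = {10: CachedBlock(b"old dir block"), 11: CachedBlock(b"inode table")}
# Step 3: a later copy of the (toy) journal.
journal = [JournalEntry(1, 10, b"new dir block"),
           JournalEntry(2, 99, b"untracked block")]
highest = replay(cache, journal)
```

Calling replay() again with a fresh journal copy corresponds to
repeating step 3. Entries already applied are skipped because their
sequence number is not newer than the one stored with the cached block,
which is the "when and where" metadata the scheme requires.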