From andre.nitschke at versanet.de Mon Mar 3 06:32:44 2008 From: andre.nitschke at versanet.de (Andre Nitschke) Date: Mon, 3 Mar 2008 07:32:44 +0100 Subject: h-tree for ext2 Message-ID: <20080303073244.xbd2fs8b7o0800w8@webmail.versatel.de> Hello, with tune2fs -O dir_index i activate the h-tree function for ext3 to improve performance. now i am not interested in the journaling function, but the journal makes the system a little bit slower. is it possible to use ext2 (also ext3 - journal) with a h-tree index to improve the speed? or must the filesystem be ext3 for these feature? greetings Andre From adilger at sun.com Mon Mar 3 15:52:04 2008 From: adilger at sun.com (Andreas Dilger) Date: Mon, 03 Mar 2008 08:52:04 -0700 Subject: h-tree for ext2 In-Reply-To: <20080303073244.xbd2fs8b7o0800w8@webmail.versatel.de> References: <20080303073244.xbd2fs8b7o0800w8@webmail.versatel.de> Message-ID: <20080303155204.GA3616@webber.adilger.int> On Mar 03, 2008 07:32 +0100, Andre Nitschke wrote: > with tune2fs -O dir_index i activate the h-tree function for ext3 to improve > performance. now i am not interested in the journaling function, but the > journal makes the system a little bit slower. is it possible to use ext2 (also > ext3 - journal) with a h-tree index to improve the speed? > or must the filesystem be ext3 for these feature? The filesystem must be ext3, because the dir_index (htree) feature was not ported to ext2. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From articpenguin3800 at gmail.com Tue Mar 4 23:12:08 2008 From: articpenguin3800 at gmail.com (John Nelson) Date: Tue, 04 Mar 2008 18:12:08 -0500 Subject: Filefrag Message-ID: <47CDD748.6070002@gmail.com> hi I have a virtualbox image of ubuntu hardy. I did filefrag and i got this hardy.vdi: 73 extents found, perfection would be 69 extents Why does it say perfection would be 69 extents. Shouldnt it be 1 extent? From ling at fnal.gov Tue Mar 4 23:18:44 2008 From: ling at fnal.gov (Ling C. Ho) Date: Tue, 04 Mar 2008 17:18:44 -0600 Subject: Filefrag In-Reply-To: <47CDD748.6070002@gmail.com> References: <47CDD748.6070002@gmail.com> Message-ID: <47CDD8D4.6070409@fnal.gov> If your blocksize is 4k, there are 32k blocks in a group, and therefore about 128MB per group. So, your file size must be slightly less than 69 * 128MB, correct? ... ling John Nelson wrote: > hi > I have a virtualbox image of ubuntu hardy. I did filefrag and i got this > > > hardy.vdi: 73 extents found, perfection would be 69 extents > > > Why does it say perfection would be 69 extents. Shouldnt it be 1 extent? > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From sandeen at redhat.com Wed Mar 5 02:41:37 2008 From: sandeen at redhat.com (Eric Sandeen) Date: Tue, 04 Mar 2008 20:41:37 -0600 Subject: Filefrag In-Reply-To: <47CDD748.6070002@gmail.com> References: <47CDD748.6070002@gmail.com> Message-ID: <47CE0861.7030809@redhat.com> John Nelson wrote: > hi > I have a virtualbox image of ubuntu hardy. I did filefrag and i got this > > > hardy.vdi: 73 extents found, perfection would be 69 extents > > > Why does it say perfection would be 69 extents. Shouldnt it be 1 extent? Not if it's sparse. As your fs image almost certainly is. 
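This is easy to check: compare the file's logical size with the blocks actually allocated, since VirtualBox images are normally created with holes. (Ling's arithmetic above, 4k blocks and 32k blocks per group for roughly 128MB per group, is also why the "perfect" layout is counted in tens of extents rather than one.) A minimal sketch of the sparseness check; the file name is only an example:

/* sparsecheck.c - compare a file's logical size with its allocated blocks.
 * Build: gcc -o sparsecheck sparsecheck.c
 * Usage: ./sparsecheck hardy.vdi
 */
#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    struct stat st;

    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }
    if (stat(argv[1], &st) != 0) {
        perror("stat");
        return 1;
    }

    /* st_blocks is counted in 512-byte units regardless of the fs block size */
    long long allocated = (long long)st.st_blocks * 512;

    printf("logical size : %lld bytes\n", (long long)st.st_size);
    printf("allocated    : %lld bytes\n", allocated);
    if (allocated < (long long)st.st_size)
        printf("file is sparse (contains holes)\n");
    return 0;
}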
-Eric From articpenguin3800 at gmail.com Thu Mar 6 00:54:57 2008 From: articpenguin3800 at gmail.com (John Nelson) Date: Wed, 05 Mar 2008 19:54:57 -0500 Subject: Journal questions Message-ID: <47CF40E1.9000303@gmail.com> hi i have a couple questions about the journal in ext3. 1. Will there be performance lose with a smaller journal say 32MB instead of 128MB? 2. Is there a way to see free space left in the journal or is it cleared at each mount? 3. Is journal_data_ordered atomic like reiser4 where either a transaction will happen or it wont happen? From adilger at sun.com Thu Mar 6 06:03:11 2008 From: adilger at sun.com (Andreas Dilger) Date: Wed, 05 Mar 2008 23:03:11 -0700 Subject: Journal questions In-Reply-To: <47CF40E1.9000303@gmail.com> References: <47CF40E1.9000303@gmail.com> Message-ID: <20080306060311.GM3616@webber.adilger.int> On Mar 05, 2008 19:54 -0500, John Nelson wrote: > 1. Will there be performance lose with a smaller journal say 32MB instead > of 128MB? Depends on how high an IO/metadata rate you have. If you are just doing light desktop IO it won't make any difference. > 2. Is there a way to see free space left in the journal or is it cleared at > each mount? The journal is a circular buffer, so this is hard to determine exactly. > 3. Is journal_data_ordered atomic like reiser4 where either a transaction > will happen or it wont happen? I'm not sure what you mean - there is data=journal and data=ordered mode. data=journal means all data and metadata changes are atomic. data=ordered (the default) means that data is written to disk before metadata so if there is a crash that you don't get garbage in your files. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From mrunal.gawade at gmail.com Sun Mar 9 19:55:39 2008 From: mrunal.gawade at gmail.com (Mrunal Gawade) Date: Sun, 9 Mar 2008 11:55:39 -0800 Subject: Disk hash table Message-ID: Hi, I need information on ext3 representation of disk based and memory hash tables. I browsed through the code but could not understand much. Could you point me in the right direction. If not ext3 hash table then any disk based hash table implementation example. Thank you, Mrunal -------------- next part -------------- An HTML attachment was scrubbed... URL: From val.henson at gmail.com Sun Mar 9 22:50:28 2008 From: val.henson at gmail.com (Valerie Henson) Date: Sun, 9 Mar 2008 15:50:28 -0700 Subject: Avoid Fragmentation of ext3 In-Reply-To: <20080228142221.o6xo2cl0bkg80g0k@webmail.versatel.de> References: <20080228142221.o6xo2cl0bkg80g0k@webmail.versatel.de> Message-ID: <70b6f0bf0803091550v54e282d1kbc8212f8fec8917c@mail.gmail.com> On Thu, Feb 28, 2008 at 6:22 AM, Andre Nitschke wrote: > Hello, > i just want to know, how ext3 avoids fragmentation. Well, i think it works like > this (but i dont know...): > When the OS says to the filesystem, save the file, the file system looks, where > are free sectors laying together to use. when there is enough place the > filesystem try's to write the file without fragments. is there not enough > place, the fs wrote the file in the way, to create less fragemnts. some file > systems keep space after the file, for when the file grows. i dont know, works > ext3 in this way? > maybe somebody can explain it shortly. Yes, that's the basic theory. Various file systems execute it more or less successfully. I'd say ext3 is about average and XFS is quite good at it. There was a paper comparing file system fragmentation at OLS a few years ago. 
"The Effects of File System Fragmentation" by Ard Biesheuvel, et. al.: http://www.kernel.org/doc/ols/2006/ols2006v1-pages-193-208.pdf -VAL From carlo at alinoe.com Tue Mar 11 19:23:15 2008 From: carlo at alinoe.com (Carlo Wood) Date: Tue, 11 Mar 2008 20:23:15 +0100 Subject: New undelete tool for ext3 Message-ID: <20080311192315.GA27329@alinoe.com> Hi all, I developed a tool to undelete files and directories. I did this after I accidently deleted 3 GB of my home directory. I have been able to successfully recover all 50,000 files. Note that this works WITHOUT prior installed patches or changes (like giis). I have sent a mail to Juri Haberland, asking him to change the FAQ entry that claims that it is impossible to undelete files on ext3, but he has not replied to my mail at all. I still hope that the FAQ can be changed to point to the HOWTO that I have just written: http://www.xs4all.nl/~carlo17/howto/undelete_ext3.html -- Carlo Wood From mike at doubleplum.net Tue Mar 11 19:30:12 2008 From: mike at doubleplum.net (Michael Biggs) Date: Tue, 11 Mar 2008 15:30:12 -0400 (EDT) Subject: New undelete tool for ext3 In-Reply-To: <20080311192315.GA27329@alinoe.com> References: <20080311192315.GA27329@alinoe.com> Message-ID: On Tue, 11 Mar 2008, Carlo Wood wrote: > I developed a tool to undelete files and directories. > I did this after I accidently deleted 3 GB of my home > directory. Sounds good to me. > http://www.xs4all.nl/~carlo17/howto/undelete_ext3.html Why isn't the source available? What license is it under / do you plan to release it under, and when? Just wondering. __ Michael Biggs From carlo at alinoe.com Tue Mar 11 20:16:06 2008 From: carlo at alinoe.com (Carlo Wood) Date: Tue, 11 Mar 2008 21:16:06 +0100 Subject: New undelete tool for ext3 In-Reply-To: References: <20080311192315.GA27329@alinoe.com> Message-ID: <20080311201606.GA30848@alinoe.com> On Tue, Mar 11, 2008 at 03:30:12PM -0400, Michael Biggs wrote: > Why isn't the source available? What license is it under / do you plan to > release it under, and when? > Just wondering. I'll release it under the GPL version 3. I still need to make a package of it, ... I'm not REALLY in a hurry to that though, because even though I'm not asking money for this tool, I'd very much like it to hear from people who use it, what their experiences are - and hopefully hear from them about success. I've written quite some howto's and usually I never get ANY mail about them. That kinda sucks. People should realize that I'm doing this as a volunteer and that it costs me considerable amount of time. A 'thank you' would be nice every now and then ;) -- Carlo Wood From Mike.Miller at hp.com Tue Mar 11 20:17:54 2008 From: Mike.Miller at hp.com (Miller, Mike (OS Dev)) Date: Tue, 11 Mar 2008 20:17:54 +0000 Subject: New undelete tool for ext3 In-Reply-To: <20080311201606.GA30848@alinoe.com> References: <20080311192315.GA27329@alinoe.com> <20080311201606.GA30848@alinoe.com> Message-ID: > -----Original Message----- > From: ext3-users-bounces at redhat.com > [mailto:ext3-users-bounces at redhat.com] On Behalf Of Carlo Wood > Sent: Tuesday, March 11, 2008 3:16 PM > To: Michael Biggs > Cc: ext3-users at redhat.com > Subject: Re: New undelete tool for ext3 > > On Tue, Mar 11, 2008 at 03:30:12PM -0400, Michael Biggs wrote: > > Why isn't the source available? What license is it under / do you > > plan to release it under, and when? > > Just wondering. > > I'll release it under the GPL version 3. > > I still need to make a package of it, ... 
I'm not REALLY in a > hurry to that though, because even though I'm not asking > money for this tool, I'd very much like it to hear from > people who use it, what their experiences are - and hopefully > hear from them about success. > > I've written quite some howto's and usually I never get ANY > mail about them. That kinda sucks. People should realize that > I'm doing this as a volunteer and that it costs me > considerable amount of time. A 'thank you' would be nice > every now and then ;) > Thank you, Carlo. > -- > Carlo Wood > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > From tytso at mit.edu Tue Mar 11 20:24:23 2008 From: tytso at mit.edu (Theodore Tso) Date: Tue, 11 Mar 2008 16:24:23 -0400 Subject: New undelete tool for ext3 In-Reply-To: <20080311192315.GA27329@alinoe.com> References: <20080311192315.GA27329@alinoe.com> Message-ID: <20080311202423.GL15804@mit.edu> On Tue, Mar 11, 2008 at 08:23:15PM +0100, Carlo Wood wrote: > > I developed a tool to undelete files and directories. > I did this after I accidently deleted 3 GB of my home > directory. > > I have been able to successfully recover all 50,000 files. > > Note that this works WITHOUT prior installed patches or > changes (like giis). > > I have sent a mail to Juri Haberland, asking him to > change the FAQ entry that claims that it is impossible > to undelete files on ext3, but he has not replied to > my mail at all. I still hope that the FAQ can be changed > to point to the HOWTO that I have just written: > > http://www.xs4all.nl/~carlo17/howto/undelete_ext3.html That's a clever technique. It only works so long as the journal blocks haven't been reused, so you would need to use your tool very *quickly* after the files had been deleted. If the inode table block hadn't been modified before the deletion, it might also not appear in the journal, so it's also not guaranteed to work. But certainly it's a better shot than no chance at all..... - Ted From keld at dkuug.dk Tue Mar 11 20:42:42 2008 From: keld at dkuug.dk (Keld =?iso-8859-1?Q?J=F8rn?= Simonsen) Date: Tue, 11 Mar 2008 21:42:42 +0100 Subject: New undelete tool for ext3 In-Reply-To: <20080311201606.GA30848@alinoe.com> References: <20080311192315.GA27329@alinoe.com> <20080311201606.GA30848@alinoe.com> Message-ID: <20080311204242.GB4312@rap.rap.dk> On Tue, Mar 11, 2008 at 09:16:06PM +0100, Carlo Wood wrote: > On Tue, Mar 11, 2008 at 03:30:12PM -0400, Michael Biggs wrote: > > Why isn't the source available? What license is it under / do you plan to > > release it under, and when? > > Just wondering. > > I'll release it under the GPL version 3. > > I still need to make a package of it, ... I'm not REALLY in a hurry > to that though, because even though I'm not asking money for this tool, > I'd very much like it to hear from people who use it, what their > experiences are - and hopefully hear from them about success. > > I've written quite some howto's and usually I never get ANY mail > about them. That kinda sucks. People should realize that I'm > doing this as a volunteer and that it costs me considerable > amount of time. A 'thank you' would be nice every now and then ;) I just want to point out that I have also made a tool for undeleting files on ext2/3, but it works in another way. 
Available at: http://std.dkuug.dk/keld/readme-salvage.html best regards keld From tpo2 at sourcepole.ch Tue Mar 11 22:56:07 2008 From: tpo2 at sourcepole.ch (Tomas Pospisek's Mailing Lists) Date: Tue, 11 Mar 2008 23:56:07 +0100 (CET) Subject: New undelete tool for ext3 In-Reply-To: <20080311201606.GA30848@alinoe.com> References: <20080311192315.GA27329@alinoe.com> <20080311201606.GA30848@alinoe.com> Message-ID: On Tue, 11 Mar 2008, Carlo Wood wrote: > On Tue, Mar 11, 2008 at 03:30:12PM -0400, Michael Biggs wrote: >> Why isn't the source available? What license is it under / do you plan to >> release it under, and when? >> Just wondering. > > I'll release it under the GPL version 3. > > I still need to make a package of it, ... I'm not REALLY in a hurry > to that though, because even though I'm not asking money for this tool, > I'd very much like it to hear from people who use it, what their > experiences are - and hopefully hear from them about success. > > I've written quite some howto's and usually I never get ANY mail > about them. That kinda sucks. People should realize that I'm > doing this as a volunteer and that it costs me considerable > amount of time. A 'thank you' would be nice every now and then ;) I understand your frustration. The fact that people use your stuff but won't come around to say thank you will probably not change. It's however possible to shift your perspective and that has the potential to reduce the frustration. Consider that with your work you will give other people the impetus or the energy or the excitement, that will make them or let them do their little contribution to the common wealth of open source. Someone will google desperately, discover your tool, rescue her FS, be thrilled, and contribute back their enthusiasm by writing a HOWTO about her favoured tool - say Gimp f.ex. Some other person much later on will go off and write a filesystem instead - say ext3. You and me - maybe everybody in the OSS world are really standing on the shoulders of *giants* and we are only able to do what we want/need because there were so many others that added their little or big piece to the base we're using - and hey, have we gone out and thanked all those people? $ dpkg --get-selections|wc -l 2840 # roughly the number of installed packages on my sys Anyway, I was thrilled by your detailed description of the on-disk ext3 structures. Brilliant! Two months ago this would have been *exaclty* what I had needed, I'm sure somebody will be *very happy* to find all this information, nicely tended and comprehensible in one place. Thanks, very cool! *t -- ----------------------------------------------------------- Tomas Pospisek http://sourcepole.com - Linux & Open Source Solutions ----------------------------------------------------------- From tytso at mit.edu Tue Mar 11 23:37:57 2008 From: tytso at mit.edu (Theodore Tso) Date: Tue, 11 Mar 2008 19:37:57 -0400 Subject: New undelete tool for ext3 In-Reply-To: References: <20080311192315.GA27329@alinoe.com> <20080311201606.GA30848@alinoe.com> Message-ID: <20080311233757.GQ15804@mit.edu> On Tue, Mar 11, 2008 at 11:56:07PM +0100, Tomas Pospisek's Mailing Lists wrote: > I understand your frustration. The fact that people use your stuff but > won't come around to say thank you will probably not change. It's however > possible to shift your perspective and that has the potential to reduce the > frustration. Carlo, Absolutely. I'm not sure how many people use e2fsprogs, but granted it's "somewhat large", yes? 
(As in every single Linux computer out there. :-) I get a thank you note about maybe once a year. I think *once* a grateful user sent me a paypal payment of $20, but that was extremely rare in the over 12 or so years that e2fsprogs has been in existence, and I was completely (but pleasantly) surprised when it happened. If you're in this business to get thank you notes, or virtual beers, you're in the wrong business. :-) Now, what you *could* get out of it if you are willing to write a paper and submit it to OLS, or Linux.conf.au, or some other conference, might be an invitation to tell others about your cool tool. And if you are one of those who get satisfaction at download statistics, that can be good too. BTW, if you are willing to relicense your code to GPLv2, I would be interested in reworking bits of your tool into e2fsprogs's debugfs. Or if you'd like to keep it as a standalone tool, that's cool as well. Regards, - Ted From bruno at wolff.to Wed Mar 12 03:03:26 2008 From: bruno at wolff.to (Bruno Wolff III) Date: Tue, 11 Mar 2008 22:03:26 -0500 Subject: New undelete tool for ext3 In-Reply-To: References: <20080311192315.GA27329@alinoe.com> <20080311201606.GA30848@alinoe.com> Message-ID: <20080312030326.GD15182@wolff.to> On Tue, Mar 11, 2008 at 23:56:07 +0100, Tomas Pospisek's Mailing Lists wrote: > > I understand your frustration. The fact that people use your stuff but > won't come around to say thank you will probably not change. It's > however possible to shift your perspective and that has the potential to > reduce the frustration. If your software becomes popular enough you may not really want everyone who uses it personally thanking you. From lm at bitmover.com Wed Mar 12 03:24:52 2008 From: lm at bitmover.com (Larry McVoy) Date: Tue, 11 Mar 2008 20:24:52 -0700 Subject: New undelete tool for ext3 In-Reply-To: <20080312030326.GD15182@wolff.to> References: <20080311192315.GA27329@alinoe.com> <20080311201606.GA30848@alinoe.com> <20080312030326.GD15182@wolff.to> Message-ID: <20080312032452.GA823@bitmover.com> On Tue, Mar 11, 2008 at 10:03:26PM -0500, Bruno Wolff III wrote: > On Tue, Mar 11, 2008 at 23:56:07 +0100, > Tomas Pospisek's Mailing Lists wrote: > > > > I understand your frustration. The fact that people use your stuff but > > won't come around to say thank you will probably not change. It's > > however possible to shift your perspective and that has the potential to > > reduce the frustration. > > If your software becomes popular enough you may not really want everyone > who uses it personally thanking you. Actually, I think you should let the author decide that. -- --- Larry McVoy lm at bitmover.com http://www.bitkeeper.com From lm at bitmover.com Wed Mar 12 03:27:03 2008 From: lm at bitmover.com (Larry McVoy) Date: Tue, 11 Mar 2008 20:27:03 -0700 Subject: New undelete tool for ext3 In-Reply-To: <20080311233757.GQ15804@mit.edu> References: <20080311192315.GA27329@alinoe.com> <20080311201606.GA30848@alinoe.com> <20080311233757.GQ15804@mit.edu> Message-ID: <20080312032703.GA1621@bitmover.com> On Tue, Mar 11, 2008 at 07:37:57PM -0400, Theodore Tso wrote: > On Tue, Mar 11, 2008 at 11:56:07PM +0100, Tomas Pospisek's Mailing Lists wrote: > > I understand your frustration. The fact that people use your stuff but > > won't come around to say thank you will probably not change. It's however > > possible to shift your perspective and that has the potential to reduce the > > frustration. 
> > Carlo, > > If you're in this business to get thank you notes, or virtual beers, > you're in the wrong business. :-) Amen to that. > BTW, if you are willing to relicense your code to GPLv2 I'd be another one who would thank you for doing that. GPLv2 is free. v3 has an agenda which is not that in line with freeness. IMO. -- --- Larry McVoy lm at bitmover.com http://www.bitkeeper.com From carlo at alinoe.com Wed Mar 12 05:46:21 2008 From: carlo at alinoe.com (Carlo Wood) Date: Wed, 12 Mar 2008 06:46:21 +0100 Subject: New undelete tool for ext3 In-Reply-To: References: <20080311192315.GA27329@alinoe.com> <20080311201606.GA30848@alinoe.com> Message-ID: <20080312054621.GA30974@alinoe.com> On Tue, Mar 11, 2008 at 08:17:54PM +0000, Miller, Mike (OS Dev) wrote: > Thank you, Carlo. Thanks Mike ;). This is a reply also to the others who replied to this thread (I read all of them that were sent after this one, too); there is definitely no need to thank me unless you actually used my software and/or HOWTO and found it helpful, heheh. Also, reading all those replies I feel that I have to straighten something out: I am NOT mad or even demotivated because I get so little "thank you" mails. And I'm certainly not in this business to get pats on my shoulder. I believe that my philosophy matches the comment of Tomas. Let me elaborate: 1) I'm no different than most humans in that I would like, that when I die I can look back at my life and say: I made a difference. Most people (the masses) think that getting a child is the answer to that, they hope that their son or daughter won't make the same mistakes; even do something great-- so that their own life won't have been meaningless. Note that I do not have offspring and never will have. I'll have to get my self-esteem from my work. 2) I believe in the theory that the universe, and the existance of life might be a chance of 1 in , but we're here to think about that anyway because we are thinking about it. It DOES make it rather important to me to get the most out of this evolution though, and it seems unacceptable to me that humanity will become extinct before we explored every corner of the universe. The current situation is a VERY critical stage: the energy we have, the minerals and raw material we need to run our current civilisation of technology is a once-in-an-evolution chance. If we can't break free from this planet THIS time, before the next World War, or before we run out of minerals/materials and energy (no doubt leading to a world war anyway), it will be too late. In fact, I think we're not going to make it, UNLESS we can bootstrap artificial intelligence, soon. 3) I believe that next step in evolution (if at all, thus) is what is called the 'Singularity' (you already know what that is: A.I's making A.I's, giving rise to an exponential growth of technological advancement). Whether or not humanity survives that doesn't even really interest me. As long as something we created will expore the universe, then it wasn't all for nothing (and who knows, in the end a civiliation of A.I. that grows exponentially smarter towards the end of time might become what now we call God; ascent to a high level of existance and recreate the Big Bang in such a way that we exist(ed) at all (in which case I wouldn't have to worry that this will happen, but ok). 4) Contrary to most believers in the Singularity, I don't believe it will happen during my time. 
However, I DO embrace the idea of exponential growth: The *ONLY* work that is really significant is work that *amplifies* the development. If I can put in a factor of 1.000001, then that might JUST be enough to get us there, because 1.000001 to the power N will be JUST large enough to prevent the extinction of mankind before we can leave this planet. Thus: _productivity_ increasing software has my interest (as opposed to, say 3D game engines). Software that leads to FASTER development of the next generation of development software. Well, ... to make a long story short, as you see, I'm not driven by "thank you" mails, but by, well, "something else" ;) Regards, -- Carlo Wood From jprats at cesca.es Wed Mar 12 07:56:44 2008 From: jprats at cesca.es (Jordi Prats) Date: Wed, 12 Mar 2008 08:56:44 +0100 Subject: error reading block Message-ID: <47D78CBC.9040901@cesca.es> Hi all, I'm getting this error using fsck on my fs: Error reading block 35979726 (Attempt to read block from filesystem resulted in short read) while getting next inode from scan. Ignore error? Anyone can explain me what exactly does it mean? cheers! Jordi From sandeen at redhat.com Wed Mar 12 12:35:24 2008 From: sandeen at redhat.com (Eric Sandeen) Date: Wed, 12 Mar 2008 07:35:24 -0500 Subject: error reading block In-Reply-To: <47D78CBC.9040901@cesca.es> References: <47D78CBC.9040901@cesca.es> Message-ID: <47D7CE0C.1030006@redhat.com> Jordi Prats wrote: > Hi all, > I'm getting this error using fsck on my fs: > > Error reading block 35979726 (Attempt to read block from filesystem > resulted in short read) while getting next inode from scan. Ignore error? > > Anyone can explain me what exactly does it mean? It means it could not read block 35979726 ... Is your disk healthy? Were there any IO errors from the kernel? Is your filesystem reall (35979726 * blocksize) bytes long? -Eric From articpenguin3800 at gmail.com Thu Mar 13 02:35:57 2008 From: articpenguin3800 at gmail.com (John Nelson) Date: Wed, 12 Mar 2008 22:35:57 -0400 Subject: indirect blocks Message-ID: <47D8930D.6010606@gmail.com> what are indirects blocks? LIke double indirect triple indirect? From davids at webmaster.com Thu Mar 13 02:44:28 2008 From: davids at webmaster.com (David Schwartz) Date: Wed, 12 Mar 2008 19:44:28 -0700 Subject: indirect blocks In-Reply-To: <47D8930D.6010606@gmail.com> Message-ID: > what are indirects blocks? LIke double indirect triple indirect? Indirect blocks are blocks that point to (contain the address of) other blocks (which hold data). Double indirect blocks point to indirect blocks. Triple indirect blocks point to double indirect blocks. DS From sandeen at redhat.com Thu Mar 13 02:49:08 2008 From: sandeen at redhat.com (Eric Sandeen) Date: Wed, 12 Mar 2008 21:49:08 -0500 Subject: indirect blocks In-Reply-To: References: Message-ID: <47D89624.1000002@redhat.com> David Schwartz wrote: >> what are indirects blocks? LIke double indirect triple indirect? > > Indirect blocks are blocks that point to (contain the address of) other > blocks (which hold data). Double indirect blocks point to indirect blocks. > Triple indirect blocks point to double indirect blocks. And there's a nice picture at http://web.mit.edu/tytso/www/linux/ext2intro.html -Eric From ianbrn at gmail.com Thu Mar 13 13:54:08 2008 From: ianbrn at gmail.com (Ian Brown) Date: Thu, 13 Mar 2008 15:54:08 +0200 Subject: The maximum number of files under a folder Message-ID: Hello, In an ext3-based file system, what is the maximum number of files I can create under a folder ? 
Is it configurable somehow ? Regards, Ian From articpenguin3800 at gmail.com Thu Mar 13 16:48:50 2008 From: articpenguin3800 at gmail.com (John Nelson) Date: Thu, 13 Mar 2008 12:48:50 -0400 Subject: The maximum number of files under a folder Message-ID: <47D95AF2.6030301@gmail.com> i think not more than 5k files without dir_index on. The max limit of subfolders is 32k From tytso at mit.edu Thu Mar 13 17:23:18 2008 From: tytso at mit.edu (Theodore Tso) Date: Thu, 13 Mar 2008 13:23:18 -0400 Subject: The maximum number of files under a folder In-Reply-To: <47D95AF2.6030301@gmail.com> References: <47D95AF2.6030301@gmail.com> Message-ID: <20080313172318.GB31653@mit.edu> On Thu, Mar 13, 2008 at 12:48:50PM -0400, John Nelson wrote: > i think not more than 5k files without dir_index on. The max limit of > subfolders is 32k There is no limit to the number of files in a folder, except for the fact that the directory itself can't be bigger than 2GB, and the number of inodes that the entire filesystem has available to it. Of course, if you don't have directory indexing turned on, you may not like the performance of doing directory lookups, but that's a different story. - Ted From articpenguin3800 at gmail.com Thu Mar 13 17:57:18 2008 From: articpenguin3800 at gmail.com (John Nelson) Date: Thu, 13 Mar 2008 13:57:18 -0400 Subject: The maximum number of files under a folder In-Reply-To: <20080313172318.GB31653@mit.edu> References: <47D95AF2.6030301@gmail.com> <20080313172318.GB31653@mit.edu> Message-ID: <47D96AFE.4020701@gmail.com> is an h-tree the same thing as a b+ tree? From adilger at sun.com Thu Mar 13 18:26:31 2008 From: adilger at sun.com (Andreas Dilger) Date: Thu, 13 Mar 2008 11:26:31 -0700 Subject: The maximum number of files under a folder In-Reply-To: <20080313172318.GB31653@mit.edu> References: <47D95AF2.6030301@gmail.com> <20080313172318.GB31653@mit.edu> Message-ID: <20080313182631.GE3217@webber.adilger.int> On Mar 13, 2008 13:23 -0400, Theodore Ts'o wrote: > There is no limit to the number of files in a folder, except for the > fact that the directory itself can't be bigger than 2GB, and the > number of inodes that the entire filesystem has available to it. Of > course, if you don't have directory indexing turned on, you may not > like the performance of doing directory lookups, but that's a > different story. There is also a limit in the current ext3 htree code to be only 2 levels deep. Along with the 2GB limit you hit problems around 15M files, depending on the length of the filenames. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From carlo at alinoe.com Sat Mar 15 02:20:35 2008 From: carlo at alinoe.com (Carlo Wood) Date: Sat, 15 Mar 2008 03:20:35 +0100 Subject: Kernel header vs libext2fs headers Message-ID: <20080315022035.GA28894@alinoe.com> The kernel headers define EXT3_ORPHAN_FS while libext2fs header defines EXT4_ORPHAN_FS This means that one of the two is wrong. Does ext3 use/have EXT3_ORPHAN_FS, or that something that is new in ext4? 
-- Carlo Wood From adilger at sun.com Sat Mar 15 03:26:37 2008 From: adilger at sun.com (Andreas Dilger) Date: Sat, 15 Mar 2008 11:26:37 +0800 Subject: Kernel header vs libext2fs headers In-Reply-To: <20080315022035.GA28894@alinoe.com> References: <20080315022035.GA28894@alinoe.com> Message-ID: <20080315032637.GO3542@webber.adilger.int> On Mar 15, 2008 03:20 +0100, Carlo Wood wrote: > The kernel headers define EXT3_ORPHAN_FS > while libext2fs header defines EXT4_ORPHAN_FS > > This means that one of the two is wrong. That isn't necessarily a correct assumption. All of the definitions in the fs/ext3 code are EXT3_*, and similarly, all of the definitions in fs/ext2 are EXT2_*, and in fs/ext4 they are EXT4_*. This avoids name conflicts. Conversely (though I don't necessarily agree with this) the definitions in libext2fs declare these flags depending on what "version" of extN the feature was first added (EXT2_*, EXT3_*, EXT4_*). That makes it easier to see what kernel is using which feature, but isn't always 100% accurate or correct. > Does ext3 use/have EXT3_ORPHAN_FS, or that > something that is new in ext4? Note that EXT3_ORPHAN_FS isn't an on disk format or feature at all, but just an in-memory state flag to convey the fact that the filesystem is just being mounted and orphans are being cleaned up down to lower levels of the code that are reading the inodes from disk. Otherwise, the low level ext3_read_inode() will consider inodes with i_nlink == 0 to be unlinked and return a bad inode to the caller, to avoid issues with NFS trying to access inodes that were deleted. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From carlo at alinoe.com Sat Mar 15 04:17:02 2008 From: carlo at alinoe.com (Carlo Wood) Date: Sat, 15 Mar 2008 05:17:02 +0100 Subject: Kernel header vs libext2fs headers In-Reply-To: <20080315032637.GO3542@webber.adilger.int> References: <20080315022035.GA28894@alinoe.com> <20080315032637.GO3542@webber.adilger.int> Message-ID: <20080315041702.GA2172@alinoe.com> On Sat, Mar 15, 2008 at 11:26:37AM +0800, Andreas Dilger wrote: > the fs/ext3 code are EXT3_*, and similarly, all of the definitions in > fs/ext2 are EXT2_*, and in fs/ext4 they are EXT4_*. This avoids name > conflicts. > > Conversely (though I don't necessarily agree with this) the definitions > in libext2fs declare these flags depending on what "version" of extN > the feature was first added (EXT2_*, EXT3_*, EXT4_*). That makes it > easier to see what kernel is using which feature, but isn't always 100% > accurate or correct. But if EXT4_ORPHAN_FS is defined, then you imply that ext4 is the first version of ext that has implemented it; however, the ext3 kernel header defines it, so you should use EXT3_ORPHAN_FS in e2fsprogs. Or am I missing something? If ORPHAN_FS was truely new since ext4, shouldn't it be missing in /usr/include/linux/ext3_fs.h ? 
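For what it's worth, whichever prefix the header uses, the flag is just one bit in the superblock state word alongside the VALID/ERROR bits, so only the numeric value matters. A small illustration; the values are copied from a 2.6-era linux/ext3_fs.h and the MY_ names are invented here, so double-check them against your own tree:

/* statebits.c - sketch: the orphan-recovery flag is one bit in the same
 * s_state / s_mount_state word as the VALID and ERROR bits, regardless of
 * whether a header spells it EXT3_ORPHAN_FS or EXT4_ORPHAN_FS.
 * Values below are from a 2.6-era ext3_fs.h (verify against your headers).
 */
#include <stdio.h>

#define MY_VALID_FS   0x0001   /* cleanly unmounted */
#define MY_ERROR_FS   0x0002   /* errors detected */
#define MY_ORPHAN_FS  0x0004   /* orphans being recovered (in-memory only) */

static void show_state(unsigned int s_state)
{
    printf("s_state = 0x%04x:%s%s%s\n", s_state,
           (s_state & MY_VALID_FS)  ? " VALID"  : "",
           (s_state & MY_ERROR_FS)  ? " ERROR"  : "",
           (s_state & MY_ORPHAN_FS) ? " ORPHAN" : "");
}

int main(void)
{
    show_state(MY_VALID_FS);                /* a cleanly unmounted filesystem */
    show_state(MY_VALID_FS | MY_ORPHAN_FS); /* orphan cleanup in progress */
    return 0;
}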
-- Carlo Wood From adilger at sun.com Sat Mar 15 04:27:38 2008 From: adilger at sun.com (Andreas Dilger) Date: Sat, 15 Mar 2008 12:27:38 +0800 Subject: Kernel header vs libext2fs headers In-Reply-To: <20080315041702.GA2172@alinoe.com> References: <20080315022035.GA28894@alinoe.com> <20080315032637.GO3542@webber.adilger.int> <20080315041702.GA2172@alinoe.com> Message-ID: <20080315042738.GQ3542@webber.adilger.int> On Mar 15, 2008 05:17 +0100, Carlo Wood wrote: > On Sat, Mar 15, 2008 at 11:26:37AM +0800, Andreas Dilger wrote: > > the fs/ext3 code are EXT3_*, and similarly, all of the definitions in > > fs/ext2 are EXT2_*, and in fs/ext4 they are EXT4_*. This avoids name > > conflicts. > > > > Conversely (though I don't necessarily agree with this) the definitions > > in libext2fs declare these flags depending on what "version" of extN > > the feature was first added (EXT2_*, EXT3_*, EXT4_*). That makes it > > easier to see what kernel is using which feature, but isn't always 100% > > accurate or correct. > > But if EXT4_ORPHAN_FS is defined, then you imply that ext4 is the > first version of ext that has implemented it; however, the ext3 kernel > header defines it, so you should use EXT3_ORPHAN_FS in e2fsprogs. > Or am I missing something? If ORPHAN_FS was truely new since ext4, > shouldn't it be missing in /usr/include/linux/ext3_fs.h ? Actually, I'm not sure what is going on there. In lib/ext2fs/ext2_fs.h it is in fact defined as EXT4_ORPHAN_FS, but this has been in use on ext3 for a long time, so you are right - there is a bug in the e2fsprogs version of ext2_fs.h. Can you please submit a patch to Ted with this change. It is probably also worth noting that this flag is only used in memory and not on disk. Since it shares the same in-memory variable with EXT2_ERROR_FS it needs to be declared in e2fsprogs to avoid conflict, but otherwise has no meaning. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From carlo at alinoe.com Sat Mar 15 04:32:04 2008 From: carlo at alinoe.com (Carlo Wood) Date: Sat, 15 Mar 2008 05:32:04 +0100 Subject: Can journal_revoke_header_s::r_count be changed from int to __s32 please? Message-ID: <20080315043204.GB2172@alinoe.com> I find that linux/jbd.h defines: typedef struct journal_revoke_header_s { journal_header_t r_header; __be32 r_count; /* Count of bytes used in the block */ } journal_revoke_header_t; thus, sizeof(r_count) == 4 However, in e2progs, in kernel-jbd.h I find: typedef struct journal_revoke_header_s { journal_header_t r_header; int r_count; /* Count of bytes used in the block */ } journal_revoke_header_t; and this sizeof(r_count) depends on the architecture. Using e2fslibs this is probably not a problem because all current OS have sizeof(int) >= 4, and r_count is assigned rather than mapped to the disk image (even on big endian machines?). Nevertheless, since I believe that kernel-jbd.h should be made public (installed along with the other header files) in order to make at least journal_superblock_t available to user programs, I'd like to request to change this int into __s32. That simply makes more sense as journal_revoke_header_t represents a data structure on disk and sizeof(journal_revoke_header_s) might be used somewhere. 
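For illustration, here is a fixed-width version of the structure together with a compile-time check that the layout stays at 16 bytes (three 32-bit words in journal_header_t plus r_count). This is only a sketch using C99 types, not a patch against the real kernel-jbd.h, and it leaves out the kernel's big-endian annotation:

/* revoke_hdr.c - sketch of the revoke header with fixed-width fields.
 * Field names follow the jbd structures quoted above; uint32_t stands in
 * for the kernel's __be32 purely for illustration (byte order not shown).
 */
#include <stdint.h>
#include <stdio.h>

typedef struct journal_header_s {
    uint32_t h_magic;
    uint32_t h_blocktype;
    uint32_t h_sequence;
} journal_header_t;

typedef struct journal_revoke_header_s {
    journal_header_t r_header;
    uint32_t         r_count;    /* count of bytes used in the block */
} journal_revoke_header_t;

/* With a plain 'int' the size could drift on an unusual ABI; a fixed-width
 * type pins the on-disk layout. This line fails to compile if it changes. */
typedef char revoke_header_size_check[sizeof(journal_revoke_header_t) == 16 ? 1 : -1];

int main(void)
{
    printf("sizeof(journal_revoke_header_t) = %zu bytes\n",
           sizeof(journal_revoke_header_t));
    return 0;
}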
-- Carlo Wood

From tambewilliam at gmail.com Sun Mar 16 20:13:47 2008 From: tambewilliam at gmail.com (William Tambe) Date: Sun, 16 Mar 2008 15:13:47 -0500 Subject: Filesystem fragmentation and scatter-gather DMA Message-ID: When designing a filesystem, is fragmentation really an issue if access to the disk can be done using scatter-gather DMA techniques?

From adilger at sun.com Sun Mar 16 22:29:03 2008 From: adilger at sun.com (Andreas Dilger) Date: Mon, 17 Mar 2008 06:29:03 +0800 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: References: Message-ID: <20080316222903.GC3542@webber.adilger.int> On Mar 16, 2008 15:13 -0500, William Tambe wrote: > When designing a filesystem, is fragmentation really an issue if > access to the disk can be done using scatter-gather DMA techniques? Yes!!! Scatter-gather only handles "fragmentation" in memory, where seek time is zero. If there is fragmentation on disk you pay 8ms for each fragment in the read. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc.

From tambewilliam at gmail.com Sun Mar 16 23:56:27 2008 From: tambewilliam at gmail.com (William Tambe) Date: Sun, 16 Mar 2008 18:56:27 -0500 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: <20080316222903.GC3542@webber.adilger.int> References: <20080316222903.GC3542@webber.adilger.int> Message-ID: Is the delay due to mechanical parts or the electronics gathering the fragments? Would that same delay still apply to a solid state drive? Since a solid state drive is really just a slower version of system memory (Please correct me if I am wrong). On Sun, Mar 16, 2008 at 5:29 PM, Andreas Dilger wrote: > On Mar 16, 2008 15:13 -0500, William Tambe wrote: > > When designing a filesystem, is fragmentation really an issue if > > access to the disk can be done using scatter-gather DMA techniques? > > Yes!!! Scatter-gather only handles "fragmentation" in memory, where > seek time is zero. If there is fragmentation on disk you pay 8ms > for each fragment in the read. > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > >

From jlforrest at berkeley.edu Mon Mar 17 01:40:19 2008 From: jlforrest at berkeley.edu (Jon Forrest) Date: Sun, 16 Mar 2008 18:40:19 -0700 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: References: Message-ID: <47DDCC03.1060002@berkeley.edu> The following is a short note I wrote a while back, mainly in response to a discussion of filesystem fragmentation in Windows operating systems. Most of what I saw also applies to *nix systems.

Jon Forrest

----------------
Why PC Disk Fragmentation Doesn't Matter (much)

Jon Forrest (jlforrest at berkeley.edu)

[The following is a hypothesis. I don't have any real data to back this up. I'd like to know if I'm overlooking any technical details.]

Disk fragmentation can mean several things. On one hand it can mean that the disk blocks that a file occupies aren't right next to each other physically. The more pieces that make up a file, the more fragmented the file is. Or, it can mean that the unused blocks on a disk aren't all right next to each other. Win9X, Windows 2000, and Windows XP come with defragmentation programs. Such programs are also available for other Microsoft and non-Microsoft operating systems from commercial vendors.

The question of whether a fragmented disk really results in anything bad has always been a topic of heated discussion.
On one side of the issue the vendors of disk defragmentation programs can always be found. The other side is usually occupied by skeptical system managers, such as yours truly.

For example, the following claim is made by the vendor of one commercial product:

"Disk fragmentation can cripple performance even worse than running with insufficient memory. Eliminate it and you've eliminated the primary performance bottleneck plaguing even the best-equipped systems." But can it, and does it? The user's guide for this product spends some 60 pages describing how to run the product but never justifies this claim.

I'm not saying that fragmentation is good. That's one reason why you can't buy a product whose purpose is to fragment a disk. But, it's hard to imagine how fragmentation can cause any noticeable performance problems. Here's why:

1) The greatest benefit from having a contiguous file would be when the whole file is read (let's stick with reads) in one I/O operation. That would result in the minimal amount of disk arm movement, which is the slowest part of a disk I/O operation. But, this isn't the way most I/Os take place. Instead, most I/Os are fairly small. Plus, and this is the kicker, on a modern multitasking operating system, those small I/Os are coming from different processes reading from different files. Assuming that the data to be read isn't in a memory cache, this means that the disk arm is going to be flying all over the place, trying to satisfy all the seek operations being issued by the operating system. Sure, the operating system, and maybe even the disk controller, might be trying to re-order I/Os but there's only so much of this that can be done. A contiguous file doesn't really help much because there's a very good chance that the disk arm is going to have to move elsewhere on the disk between the time that pieces of a file are read.

2) The metadata for managing a filesystem is probably cached in RAM. This means when a file is created, or extended, the necessary metadata updates are done at memory speed, not at disk speed. So, the overhead of allocating multiple pieces for a new file is probably in the noise. Of course, the in-memory metadata eventually has to be flushed to disk but this is usually done after the original I/O completes, so there won't be any visible slowdown in the program that issued the I/O.

3) Modern disks do all kinds of internal block remapping so there's no guarantee that what appears to be contiguous to the operating system is actually really and truly contiguous on the disk. I have no idea how often this possibility occurs, or how bad the skew is between "fake" blocks and "real" blocks. But, it could happen.

So, go ahead and run your favorite disk defragmenter. I know I do. Now that W2K and later have an official API for moving files in an atomic operation, such programs probably can't cause any harm. But don't be surprised if you don't see any noticeable performance improvements.

The mystery that really puzzles and sometimes frightens me is why an NTFS file system becomes fragmented so easily in the first place. Let's say I'm installing Windows 2000 on a newly formatted 20GB disk. Let's say that the total amount of space used by the new installation is 600MB. Why should I see any fragmented files, other than registry files, after such an installation? I have no idea. My thinking is that all files that aren't created and then later extended should be able to be created contiguously to begin with.
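To put rough numbers on this debate, here is the back-of-envelope model implied by Andreas's earlier "8ms per fragment" remark: total read time is streaming time plus one positioning delay per extent. The figures below are assumptions (60 MB/s sequential, 8 ms per extent), not measurements, but they show why a handful of extents is noise while thousands of them dominate the read time:

/* frag_model.c - back-of-envelope effective read rate vs. extent count.
 * Assumed figures, not measurements: 60 MB/s streaming, 8 ms per extent.
 */
#include <stdio.h>

int main(void)
{
    const double size_mb   = 1024.0;   /* a 1 GB file */
    const double stream_mb = 60.0;     /* MB/s when purely sequential */
    const double seek_s    = 0.008;    /* positioning cost per extent */
    const int extents[]    = { 1, 10, 69, 1000, 10000 };

    for (unsigned i = 0; i < sizeof(extents) / sizeof(extents[0]); i++) {
        double t = size_mb / stream_mb + extents[i] * seek_s;
        printf("%6d extents: %6.1f s  -> %5.1f MB/s effective\n",
               extents[i], t, size_mb / t);
    }
    return 0;
}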
From ling at aliko.com Mon Mar 17 05:48:03 2008 From: ling at aliko.com (Ling C. Ho) Date: Mon, 17 Mar 2008 00:48:03 -0500 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: <47DDCC03.1060002@berkeley.edu> References: <47DDCC03.1060002@berkeley.edu> Message-ID: <47DE0613.4040907@aliko.com> I have this experience a couple of years ago. Under some version of Redhat Linux Enterprise 3 using kernel 2.4x, I tested scping two files slightly over 1Gig to a freshly formated ext3 filesystems simultaneously. It turned out the version of ext3 did not have reservation implemented, and we ended up with 2 files with more than 10,000 non-contiguous fragments. Even though the two files sat physically very close together on disk, the fragmentation was so bad that instead of getting over 50MB/s read we were expecting from reading a file at a time, we were getting about 10MB/s. It's not day to day usage pattern on many desktop or servers, but unfortunately for us, that's what hundreds of our servers were set up to do. That is to run 2 jobs at a time, where they would first copy the data files from some where else, read them and then analyze the data, and write some result onto another file systems. So fragmentation could be very bad, but fortunately the later versions of ext3 have done much better in preventing just that. ... ling Jon Forrest wrote: > The following is a short note I wrote a while back, > mainly in response to a discussion of filesystem > fragmentation in Windows operating systems. Most > of what I saw also applies to *nix systems. > > Jon Forrest > > ---------------- > Why PC Disk Fragmentation Doesn't Matter (much) > > Jon Forrest (jlforrest at berkeley.edu) > > [The following is an hypothesis. I don't have > any real data to back this up. I'd like to know > if I'm overlooking any technical details.] > > Disk fragmentation can mean several things. > On one hand it can mean that the disk blocks > that a file occupies aren't right next to each > other physically. The more pieces that make up a file, the > more fragmented the file is. Or, it can mean > that the unused blocks on a disk aren't all right > next to each other. Win9X, Windows 2000, and Windows XP > come with defragmentation programs. Such programs > are also available for other Microsoft and non-Microsoft > operating systems from commercial vendors. > > The question of whether a fragmented disk really > results in anything bad has always been a topic > of heated discussion. On one side of the issue > the vendors of disk defragmentation programs can > always be found. The other side is usually occupied > by skeptical system managers, such as yours truly. > > For example, the following claim is made by the > vendor of one commercial vendor: > > "Disk fragmentation can cripple performance even worse > than running with insufficient memory. Eliminate it > and you've eliminated the primary performance bottleneck > plaguing even the best-equipped systems." But can it, and > does it? The user's guide for this product spends some 60 pages > describing how to run the product but never justifies this > claim. > > I'm not saying that fragmentation is good. That's one reason > why you can't buy a product whose purpose is to fragment a disk. > But, it's hard to imagine how fragmentation can cause any noticeable > performance problems. Here's why: > > 1) The greatest benefit from having a contiguous file would > be when the whole file is read (let's stick with reads) in > one I/O operation. 
The would result in the minimal amount of > disk arm movement, which is the slowest part of a disk I/O > operation. But, this isn't the way most I/Os take place. Instead, > most I/Os are fairly small. Plus, and this is the kicker, on > a modern multitasking operating system, those small I/Os are coming > from different processes reading from different files. Assuming that the > data to be read isn't in a memory cache, this means that the disk arm is > going to be flying all over the place, trying to satisfy all > the seek operations being issued by the operating system. > Sure, the operating system, and maybe even the disk controller, > might be trying to re-order I/Os but there's only so much of > this that can be done. A contiguous file doesn't really help > much because there's a very good change that the disk arm is > going to have to move elsewhere on the disk between the time > that pieces of a file are read. > > 2) The metadata for managing a filesystem is probably > cached in RAM. This means when a file is created, or > extended, the necessary metadata updates are done at memory > speed, not at disk speed. So, the overhead of allocating > multiple pieces for a new file is probably in the noise. > Of course, the in-memory metadata eventually has to be flushed > to disk but this is usually done after the original I/O completes, > so there won't be any visible slowdown in the program that issued > the I/O. > > 3) Modern disks do all kind of internal block remapping so there's > no guarantee that what appears to be contiguous to the operating > system is actually really and truly contiguous on the disk. I have > no idea how often this possibility occurs, or how bad the skew is > between "fake" blocks and "real" blocks. But, it could happen. > > So, go ahead and run your favorite disk defragmenter. I know I do. > Now that W2K and later have an official API for moving files in an > atomic operation, such programs probably can't cause any harm. But > don't be surprised if you don't see any noticeable performance > improvements. > > The mystery that really puzzles and sometimes frightens me is > why an NTFS file system becomes fragmented so easily in the first > place. Let's say I'm installing Windows 2000 on a newly formatted > 20GB disk. Let's say that the total amount of space used by the > new installation is 600MB. Why should I see any fragmented files, > other than registry files, after such an installation? I have no > idea. My thinking is that all files that aren't created and then > later extended should be able to be created contiguously to begin with. > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From davids at webmaster.com Mon Mar 17 06:05:48 2008 From: davids at webmaster.com (David Schwartz) Date: Sun, 16 Mar 2008 23:05:48 -0700 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: <47DDCC03.1060002@berkeley.edu> Message-ID: Jon Forrest wrote: > 1) The greatest benefit from having a contiguous file would > be when the whole file is read (let's stick with reads) in > one I/O operation. The would result in the minimal amount of > disk arm movement, which is the slowest part of a disk I/O > operation. But, this isn't the way most I/Os take place. Instead, > most I/Os are fairly small. Plus, and this is the kicker, on > a modern multitasking operating system, those small I/Os are coming > from different processes reading from different files. 
Assuming that the > data to be read isn't in a memory cache, this means that the disk arm is > going to be flying all over the place, trying to satisfy all > the seek operations being issued by the operating system. > Sure, the operating system, and maybe even the disk controller, > might be trying to re-order I/Os but there's only so much of > this that can be done. A contiguous file doesn't really help > much because there's a very good chance that the disk arm is > going to have to move elsewhere on the disk between the time > that pieces of a file are read. That's not really the issue. The issue is whether a read of a chunk of a file can take place without any extra seeks or whether it does require extra seeks. Further, for the vast majority of cases, there is only one I/O stream going on at a time. The disk will read ahead. If that can satisfy even a small fraction of the subsequent I/Os the OS issues, that's a big win. > 3) Modern disks do all kinds of internal block remapping so there's > no guarantee that what appears to be contiguous to the operating > system is actually really and truly contiguous on the disk. I have > no idea how often this possibility occurs, or how bad the skew is > between "fake" blocks and "real" blocks. But, it could happen. Not bad enough to make a significant difference on any but a nearly-failing drive. > The mystery that really puzzles and sometimes frightens me is > why an NTFS file system becomes fragmented so easily in the first > place. Let's say I'm installing Windows 2000 on a newly formatted > 20GB disk. Let's say that the total amount of space used by the > new installation is 600MB. Why should I see any fragmented files, > other than registry files, after such an installation? I have no > idea. My thinking is that all files that aren't created and then > later extended should be able to be created contiguously to begin with. Only if you're willing to leave big holes behind, which will rapidly lead to a full disk and massive fragmentation. As files are being created, files are also being deleted. There is no way for the OS to know ahead of time which files are going to be around for a long time, so it has to mix the short-term files with the long-term files. But, of course, once you defragment a large chunk of non-changing files, they should stay that way. DS

From liuyue at ncic.ac.cn Mon Mar 17 07:29:32 2008 From: liuyue at ncic.ac.cn (liuyue) Date: Mon, 17 Mar 2008 15:29:32 +0800 Subject: The maximum number of files under a folder Message-ID: <20080317071048.402B21368F7@ncic.ac.cn> John Nelson, I see that EXT3_LINK_MAX was set to 32000. Will we run into problems if we change this limit to 65000? Thanks! >i think not more than 5k files without dir_index on. The max limit of >subfolders is 32k > >_______________________________________________ >Ext3-users mailing list >Ext3-users at redhat.com >https://www.redhat.com/mailman/listinfo/ext3-users

From liuyue at ncic.ac.cn Mon Mar 17 07:40:36 2008 From: liuyue at ncic.ac.cn (liuyue) Date: Mon, 17 Mar 2008 15:40:36 +0800 Subject: The maximum number of files under a folder Message-ID: <20080317072152.05DB51369B8@ncic.ac.cn> Theodore Tso, In 64bit system, directory size can not be bigger than 2GB? ======= On 2008-03-14 01:23:18, you wrote: ======= >On Thu, Mar 13, 2008 at 12:48:50PM -0400, John Nelson wrote: >> i think not more than 5k files without dir_index on. The max limit of >> subfolders is 32k > >There is no limit to the number of files in a folder, except for the >fact that the directory itself can't be bigger than 2GB, and the >number of inodes that the entire filesystem has available to it. Of >course, if you don't have directory indexing turned on, you may not >like the performance of doing directory lookups, but that's a >different story. > > - Ted > >_______________________________________________ >Ext3-users mailing list >Ext3-users at redhat.com >https://www.redhat.com/mailman/listinfo/ext3-users

From tytso at mit.edu Mon Mar 17 13:32:07 2008 From: tytso at mit.edu (Theodore Tso) Date: Mon, 17 Mar 2008 09:32:07 -0400 Subject: The maximum number of files under a folder In-Reply-To: <20080317072152.05DB51369B8@ncic.ac.cn> References: <20080317072152.05DB51369B8@ncic.ac.cn> Message-ID: <20080317133207.GB8368@mit.edu> On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote: > Theodore Tso, > > In 64bit system, directory size can not be bigger than 2GB? No, because the high 32-bits for i_size are overloaded to store the directory creation acl. In practice, you really don't want to have a directory that huge anyway. Iterating through it all with readdir() gets horribly slow, and applications that try to do anything with really huge directories would be well advised to use a database, because they will get *much* better performance that way.... - Ted

From ric at emc.com Mon Mar 17 13:45:54 2008 From: ric at emc.com (Ric Wheeler) Date: Mon, 17 Mar 2008 09:45:54 -0400 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: References: <20080316222903.GC3542@webber.adilger.int> Message-ID: <47DE7612.2040306@emc.com> William Tambe wrote: > Is the delay due to mechanical parts or the electronics gathering the fragments? > > Would that same delay still apply to a solid state drive? Since a > solid state drive is really just a slower version of system memory > (Please correct me if I am wrong). > With spinning media, the big cost is moving the physical heads of the drive. With an SSD FLASH-based device, you might also prefer having contiguous writes since flash needs to be erased before the write can happen (and that occurs in chunks). Non-contiguous writes of single sectors would have a high chance of causing extra erasures & read-modify-writes... ric

From jlforrest at berkeley.edu Mon Mar 17 16:52:04 2008 From: jlforrest at berkeley.edu (Jon Forrest) Date: Mon, 17 Mar 2008 09:52:04 -0700 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: References: Message-ID: <47DEA1B4.70204@berkeley.edu> David Schwartz wrote: > That's not really the issue. The issue is whether a read of a chunk of a > file can take place without any extra seeks or whether it does require extra > seeks. Further, for the vast majority of cases, there is only one I/O stream > going on at a time. The disk will read ahead. If that can satisfy even a > small fraction of the subsequent I/Os the OS issues, that's a big win. Maybe on a single user PC, some of the time there is only one I/O stream going on at a time. But, once you start doing anything in parallel, or have multiple users, the number of sources (and destinations) of I/O goes way up. Thus, the arm is going to have to be moving around randomly even if the files involved aren't fragmented. Some (most?) OSs sort I/Os so that the movement is minimized but it still occurs.
>> 3) Modern disks do all kind of internal block remapping so there's >> no guarantee that what appears to be contiguous to the operating >> system is actually really and truly contiguous on the disk. I have >> no idea how often this possibility occurs, or how bad the skew is >> between "fake" blocks and "real" blocks. But, it could happen. > > Not bad enough to make a significant difference on any but a nearly-failing > drive. It would be interesting to see what I'm calling the skew between the true sector layout and what an O/S sees on modern SATA drives. I'm not aware of any way to see this. Does anybody know? I stand by my assertion that while disk fragmentation is in no way a good thing, it isn't something to fear, at least not in the way shown in the advertisements for defragmentation products. -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From ric at emc.com Mon Mar 17 17:11:24 2008 From: ric at emc.com (Ric Wheeler) Date: Mon, 17 Mar 2008 13:11:24 -0400 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: <47DEA1B4.70204@berkeley.edu> References: <47DEA1B4.70204@berkeley.edu> Message-ID: <47DEA63C.2010305@emc.com> Jon Forrest wrote: > David Schwartz wrote: > >> That's not really the issue. The issue is whether a read of a chunk of a >> file can take place without any extra seeks or whether it does require >> extra >> seeks. Further, for the vast majority of cases, there is only one I/O >> stream >> going on at a time. The disk will read ahead. If that can satisfy even a >> small fraction of the subsequent I/Os the OS issues, that's a big win. > > Maybe on a single user PC, some of the time there is only one I/O > stream going on a time. But, once you start doing anything in parallel, > or have multiple users, the number of sources (and destinations) of I/O > goes way up. This, the arm is going to have to be moving around randomly > even if the files involved aren't fragmented. Some (most?) OSs sort > I/Os so that the movement is minimized but it still occurs. You should keep in mind that big servers also have higher end storage systems (or at least multiple devices). Heads don't tend to move about randomly - they will normally try to read (or write) in a specific order. Normally, that order is in increasing sector order. Every level of the the system tries to guess how to combine and read ahead, all the way from the file system down to the internal firmware in the storage. The best way to get read-ahead to work is to use really obvious patterns - sequential, increasing and large IO's work best ;-) > >>> 3) Modern disks do all kind of internal block remapping so there's >>> no guarantee that what appears to be contiguous to the operating >>> system is actually really and truly contiguous on the disk. I have >>> no idea how often this possibility occurs, or how bad the skew is >>> between "fake" blocks and "real" blocks. But, it could happen. >> >> Not bad enough to make a significant difference on any but a >> nearly-failing >> drive. > > It would be interesting to see what I'm calling the skew between > the true sector layout and what an O/S sees on modern SATA drives. > I'm not aware of any way to see this. Does anybody know? I would not spend any time worrying about the sector remapping. 
SMART can tell you how many sectors have been remapped, but even with a really large disk the maximum number of remapped sectors is tiny (say 2000 or so for a 500GB disk). Your chances of hitting them are tiny, especially since most drives end up with very, very few remapped sectors before they get tossed. Those with more than 100 sectors, for example, tend to complain a lot. The short answer is to look at the sector level order of your file and assume (pretend) that it reflects the media layout as well. Note that the whole deal changes when you have multi-drive RAID devices (software or hardware). > I stand by my assertion that while disk fragmentation is in no way > a good thing, it isn't something to fear, at least not in the way > shown in the advertisements for defragmentation products. > I think that fragmentation is a bad performance hit, but that we actually do relatively well in keeping our files contiguous in normal cases. I have a simple bit of c code that uses fibmap to dump the sectors/blocks for a specific file. If you like, I can send it over to you. Regards, Ric From jlforrest at berkeley.edu Mon Mar 17 17:24:56 2008 From: jlforrest at berkeley.edu (Jon Forrest) Date: Mon, 17 Mar 2008 10:24:56 -0700 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: <47DEA63C.2010305@emc.com> References: <47DEA1B4.70204@berkeley.edu> <47DEA63C.2010305@emc.com> Message-ID: <47DEA968.5080700@berkeley.edu> Ric Wheeler wrote: > Every level of the the system tries to guess how to combine and read > ahead, all the way from the file system down to the internal firmware in > the storage. I remember Kirk McKusick once complaining about how hard it was to write a file system when so many other levels in a system try to second guess what he was trying to do. I've also heard disk engineers complain about the same thing, except they complain about the OS people not leaving optimization techniques to them. Go figure. > I think that fragmentation is a bad performance hit, but that we > actually do relatively well in keeping our files contiguous in normal > cases. We might disagree on how bad the performance hit is, but I'm really trying to prevent non-technical people from panicking when they see a fragmented filesystem (or file). > I have a simple bit of c code that uses fibmap to dump the > sectors/blocks for a specific file. If you like, I can send it over to you. Sure. Thanks. -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforrest at berkeley.edu From ric at emc.com Mon Mar 17 17:29:58 2008 From: ric at emc.com (Ric Wheeler) Date: Mon, 17 Mar 2008 13:29:58 -0400 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: <47DEA968.5080700@berkeley.edu> References: <47DEA1B4.70204@berkeley.edu> <47DEA63C.2010305@emc.com> <47DEA968.5080700@berkeley.edu> Message-ID: <47DEAA96.6070005@emc.com> Jon Forrest wrote: > Ric Wheeler wrote: > >> Every level of the the system tries to guess how to combine and read >> ahead, all the way from the file system down to the internal firmware >> in the storage. > > I remember Kirk McKusick once complaining about how hard it was to write > a file system when so many other levels in a system try to second guess > what he was trying to do. I've also heard disk engineers complain about > the same thing, except they complain about the OS people not leaving > optimization techniques to them. Go figure. 
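For anyone who wants to look at a file's block layout themselves while waiting for that tool: the FIBMAP and FIGETBSZ ioctls it is built on can be driven from a few lines of userspace C. The sketch below is an untested approximation of such a block dumper, not the actual program being offered above, and it needs root to issue FIBMAP.

/* fibmap-dump.c - rough sketch of a FIBMAP-based block dumper.
 * Untested illustration, not the tool referred to above. Needs root.
 * Build: cc -o fibmap-dump fibmap-dump.c ; run: ./fibmap-dump <file>
 */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <linux/fs.h>           /* FIBMAP, FIGETBSZ */

int main(int argc, char **argv)
{
        int fd, bsz;
        struct stat st;
        unsigned long i, nblocks, last = 0, runs = 0;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <file>\n", argv[0]);
                return 1;
        }
        fd = open(argv[1], O_RDONLY);
        if (fd < 0 || ioctl(fd, FIGETBSZ, &bsz) < 0 || fstat(fd, &st) < 0) {
                perror(argv[1]);
                return 1;
        }
        nblocks = (st.st_size + bsz - 1) / bsz;
        for (i = 0; i < nblocks; i++) {
                int blk = i;    /* in: logical block, out: physical block */
                if (ioctl(fd, FIBMAP, &blk) < 0) {
                        perror("FIBMAP");
                        return 1;
                }
                if (i == 0 || (unsigned long)blk != last + 1)
                        runs++; /* discontinuity (a hole shows up as 0) */
                printf("logical %lu -> physical %d\n", i, blk);
                last = blk;
        }
        printf("%lu blocks, ~%lu contiguous runs, block size %d\n",
               nblocks, runs, bsz);
        close(fd);
        return 0;
}

The "physical" numbers are in units of the filesystem block size, so multiply by the reported block size to get byte offsets on the partition.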
The trick is just to do the obvious thing (big, sequential IO's) from the application to give the various layers the easiest job of second guessing ;-) There are certainly advantages to doing the read ahead (and coalescing) at the different layers. For example, a file system can do predictive read ahead across the non-contiguous chunks of a single file while the IO layer can coalesce multiple write or read commands on the same host and a multi-ported drive can do the same for multiple hosts. > >> I think that fragmentation is a bad performance hit, but that we >> actually do relatively well in keeping our files contiguous in normal >> cases. > > We might disagree on how bad the performance hit is, but I'm really > trying to prevent non-technical people from panicking when they see > a fragmented filesystem (or file). I agree - most casual users will never see anything close to a performance issue until they have completely filled the file system. In that case, defragmentation will not be the real help. > >> I have a simple bit of c code that uses fibmap to dump the >> sectors/blocks for a specific file. If you like, I can send it over to >> you. > > Sure. Thanks. I will send it to you out of band. Mark Lord had some tweaks to this that I have not rolled in, let me know if it is useful. ric From davids at webmaster.com Mon Mar 17 22:20:29 2008 From: davids at webmaster.com (David Schwartz) Date: Mon, 17 Mar 2008 15:20:29 -0700 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: <47DEAA96.6070005@emc.com> Message-ID: Ric Wheeler wrote: > There are certainly advantages to doing the read ahead (and coalescing) > at the different layers. For example, a file system can do predictive > read ahead across the non-contiguous chunks of a single file while the > IO layer can coalesce multiple write or read commands on the same host > and a multi-ported drive can do the same for multiple hosts. If the file system does predictive read-ahead, and the data is not used, the penalty will be *much* larger if the predictive read-ahead required an extra seek than if it didn't. This is one of the biggest ways that fragmentation hurts performance. The other is if the disk does read-ahead and the next chunk of data in the file was needed, but wasn't read by the disk because of fragmentation. > > We might disagree on how bad the performance hit is, but I'm really > > trying to prevent non-technical people from panicking when they see > > a fragmented filesystem (or file). > I agree - most casual users will never see anything close to a > performance issue until they have completely filled the file system. In > that case, defragmentation will not be the real help. I agree with this as well. The only significant differences I've seen with disk defragmenters were in two cases: 1) The filesystem was close to full, and the defragmenter bought a bit of extra time before something had to be done. 2) The defragmenter was smart enough to move frequenty-accessed files to the fastest parts of the disk, and the disk had a large (20%) difference between its fastest and slowest tracks. Otherwise, it's a miniscule difference. I'd love to see smarter disks with much larger caches so that the OS could say to the disk "here's the data I need now, and here's what I might need later". 
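For what it's worth, an application can already say roughly that to the OS (though not to the drive itself) with posix_fadvise(); the kernel is free to ignore the hints, but they are cheap to add. A minimal, untested sketch:

/* Sketch: hinting the kernel about access patterns with posix_fadvise().
 * Untested illustration; the calls are advisory and may be ignored.
 */
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        static char buf[1 << 20];       /* 1MB reads: big and sequential */
        ssize_t n;
        int fd;

        if (argc != 2)
                return 1;
        fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
                perror("open");
                return 1;
        }
        /* "I will read this file sequentially" - ramps up OS read-ahead. */
        posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);

        /* "I will want this region soon" - kicks off read-ahead in the
         * background so the data may already be cached when read() runs. */
        posix_fadvise(fd, 0, 64 << 20, POSIX_FADV_WILLNEED);

        while ((n = read(fd, buf, sizeof(buf))) > 0)
                ;       /* ... process buf here ... */

        close(fd);
        return 0;
}

None of this changes what the drive's own firmware prefetches, but it does hand the block layer bigger, earlier, in-order requests to work with.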
DS From adilger at sun.com Tue Mar 18 01:14:12 2008 From: adilger at sun.com (Andreas Dilger) Date: Tue, 18 Mar 2008 09:14:12 +0800 Subject: Filesystem fragmentation and scatter-gather DMA In-Reply-To: <47DEA63C.2010305@emc.com> References: <47DEA1B4.70204@berkeley.edu> <47DEA63C.2010305@emc.com> Message-ID: <20080318011342.GH3542@webber.adilger.int> On Mar 17, 2008 13:11 -0400, Ric Wheeler wrote: > I have a simple bit of c code that uses fibmap to dump the sectors/blocks > for a specific file. If you like, I can send it over to you. Hmm, I could have sworn "filefrag" did this, but it doesn't have any mode that actually prints out a list of blocks, only the discontinuities in the file... We are adding a new "extents" output mode to filefrag which prints block mappings in a more useful manner, but it isn't in upstream e2fsprogs yet. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From adilger at sun.com Tue Mar 18 22:56:58 2008 From: adilger at sun.com (Andreas Dilger) Date: Wed, 19 Mar 2008 06:56:58 +0800 Subject: The maximum number of files under a folder In-Reply-To: <20080317133207.GB8368@mit.edu> References: <20080317072152.05DB51369B8@ncic.ac.cn> <20080317133207.GB8368@mit.edu> Message-ID: <20080318225658.GA2971@webber.adilger.int> On Mar 17, 2008 09:32 -0400, Theodore Ts'o wrote: > On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote: > > Theodore Tso, > > > > In 64bit system, directory size can not be bigger than 2GB? > > No, because the high 32-bits for i_size are overloaded to store the > directory creation acl. I think we should change the code (kernel and e2fsprogs) to allow i_size_high for directories also. > In practice, you really don't want to have a directory that huge > anyway. Iterating through it all with readdir() gets horribly slow, > and applications that try do anything with really huge directories > would be well advised to use a database, because they will get *much* > better performance that way.... Actually, for many HPC applications they never do readdir at all. The job creates 1 file/process and always uses a predefined filename like {job}-{timestamp}-{process} that it will directly look up. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From darkonc at gmail.com Wed Mar 19 06:35:03 2008 From: darkonc at gmail.com (Stephen Samuel) Date: Tue, 18 Mar 2008 23:35:03 -0700 Subject: The maximum number of files under a folder In-Reply-To: <20080318225658.GA2971@webber.adilger.int> References: <20080317072152.05DB51369B8@ncic.ac.cn> <20080317133207.GB8368@mit.edu> <20080318225658.GA2971@webber.adilger.int> Message-ID: <6cd50f9f0803182335o45fa23echd8128cd6ddd2216e@mail.gmail.com> The OS will have to search the directory to see if the file already exists before creating it. Well, if you hash it such that it splits up something like: jobid(upper part)/jobid(lower- part)[/-]timestamp-process, you'll find that your access times will be must faster (especially if you don't use H-Trees). This also applies if you're just creating a file, because you'll have to search the entire directory to see if that filename exists With regular directories, searching through them to see if a file already exist increases linearly with the number of entries. If you hash on 3 levels with 8-bits per level, you'll have to open 2 or 3 extra inodes, but you'll cut your directory search times down by a factor of 20000-1. You'll also skip having to deal with any sort of directory-size limit. 
(=2^24/256/3) I did something similar on a Solaris box which had 200000 emails in the /var/spool/mqueue directory. That many messages was slowing the system to a crawl. I hashed it into 100 directories with 2000 entries each, it sped things up *enormously.* On Tue, Mar 18, 2008 at 3:56 PM, Andreas Dilger wrote: > On Mar 17, 2008 09:32 -0400, Theodore Ts'o wrote: > > On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote: > > > Theodore Tso, > > > > > > In 64bit system, directory size can not be bigger than 2GB? > > > > No, because the high 32-bits for i_size are overloaded to store the > > directory creation acl. > > I think we should change the code (kernel and e2fsprogs) to allow > i_size_high for directories also. > > > In practice, you really don't want to have a directory that huge > > anyway. Iterating through it all with readdir() gets horribly slow, > > and applications that try do anything with really huge directories > > would be well advised to use a database, because they will get *much* > > better performance that way.... > > Actually, for many HPC applications they never do readdir at all. > The job creates 1 file/process and always uses a predefined filename > like {job}-{timestamp}-{process} that it will directly look up. > > Cheers, Andreas > -- Stephen Samuel http://www.bcgreen.com 778-861-7641 -------------- next part -------------- An HTML attachment was scrubbed... URL: From articpenguin3800 at gmail.com Wed Mar 19 12:16:15 2008 From: articpenguin3800 at gmail.com (John Nelson) Date: Wed, 19 Mar 2008 08:16:15 -0400 Subject: The maximum number of files under a folder In-Reply-To: <6cd50f9f0803182335o45fa23echd8128cd6ddd2216e@mail.gmail.com> References: <20080317072152.05DB51369B8@ncic.ac.cn> <20080317133207.GB8368@mit.edu> <20080318225658.GA2971@webber.adilger.int> <6cd50f9f0803182335o45fa23echd8128cd6ddd2216e@mail.gmail.com> Message-ID: <47E1040F.5060408@gmail.com> What does what does the h stand for in h-tree? Like the b in btree is binary Tree Stephen Samuel wrote: > The OS will have to search the directory to see if the file already > exists before creating it. > > Well, if you hash it such that it splits up something like: > jobid(upper part)/jobid(lower- part)[/-]timestamp-process, > you'll find that your access times will be must faster (especially if > you don't use H-Trees). This also applies if you're just creating a > file, because you'll have to search the entire directory to see if > that filename exists > > With regular directories, searching through them to see if a file > already exist increases linearly with the number of entries. If you > hash on 3 levels with 8-bits per level, you'll have to open 2 or 3 > extra inodes, but you'll cut your directory search times down by a > factor of 20000-1. You'll also skip having to deal with any sort of > directory-size limit. (=2^24/256/3) > > I did something similar on a Solaris box which had 200000 emails in > the /var/spool/mqueue directory. That many messages was slowing the > system to a crawl. I hashed it into 100 directories with 2000 > entries each, it sped things up *enormously.* > > On Tue, Mar 18, 2008 at 3:56 PM, Andreas Dilger > wrote: > > On Mar 17, 2008 09:32 -0400, Theodore Ts'o wrote: > > On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote: > > > Theodore Tso, > > > > > > In 64bit system, directory size can not be bigger than 2GB? > > > > No, because the high 32-bits for i_size are overloaded to store the > > directory creation acl. 
> > I think we should change the code (kernel and e2fsprogs) to allow > i_size_high for directories also. > > > In practice, you really don't want to have a directory that huge > > anyway. Iterating through it all with readdir() gets horribly slow, > > and applications that try do anything with really huge directories > > would be well advised to use a database, because they will get > *much* > > better performance that way.... > > Actually, for many HPC applications they never do readdir at all. > The job creates 1 file/process and always uses a predefined filename > like {job}-{timestamp}-{process} that it will directly look up. > > Cheers, Andreas > > > > > -- > Stephen Samuel http://www.bcgreen.com > 778-861-7641 From tytso at MIT.EDU Wed Mar 19 16:01:51 2008 From: tytso at MIT.EDU (Theodore Tso) Date: Wed, 19 Mar 2008 12:01:51 -0400 Subject: The maximum number of files under a folder In-Reply-To: <47E1040F.5060408@gmail.com> References: <20080317072152.05DB51369B8@ncic.ac.cn> <20080317133207.GB8368@mit.edu> <20080318225658.GA2971@webber.adilger.int> <6cd50f9f0803182335o45fa23echd8128cd6ddd2216e@mail.gmail.com> <47E1040F.5060408@gmail.com> Message-ID: <20080319160151.GK3158@mit.edu> On Wed, Mar 19, 2008 at 08:16:15AM -0400, John Nelson wrote: > What does what does the h stand for in h-tree? Like the b in btree is > binary Tree Hash-tree. (And the 'b' in btree usually standards for balanced tree). What we do is we hash the directory name, and use the hashed name to put into the tree. For simplicity's sake, we don't do balancing in ext3's htree implementation. - Ted From ashitpro at yahoo.co.in Thu Mar 20 10:51:04 2008 From: ashitpro at yahoo.co.in (ashish mahamuni) Date: Thu, 20 Mar 2008 16:21:04 +0530 (IST) Subject: How to get device name with device id? Message-ID: <558394.76006.qm@web94612.mail.in2.yahoo.com> Hi all, I want to open a device(/dev/sda1, /dev/hda2 etc) in which my file exists. I've used 'stat' system call to get the device id. But now I want the device name from this id(st_dev). How to get that one? Or Do you have any other method to know the device name where my file resides? Thanks Bollywood, fun, friendship, sports and more. You name it, we have it on http://in.promos.yahoo.com/groups From liuyue at ncic.ac.cn Thu Mar 20 10:59:59 2008 From: liuyue at ncic.ac.cn (liuyue) Date: Thu, 20 Mar 2008 18:59:59 +0800 Subject: The maximum number of files under a folder Message-ID: <20080320104039.F2DFB13692A@ncic.ac.cn> Thank you all. Now I find a patch which can extend ext3 subdirectory limit. http://osdir.com/ml/file-systems.ext2.devel/2004-12/msg00026.html ======= 2008-03-19 06:56:58 ????????======= >On Mar 17, 2008 09:32 -0400, Theodore Ts'o wrote: >> On Mon, Mar 17, 2008 at 03:40:36PM +0800, liuyue wrote: >> > Theodore Tso, >> > >> > In 64bit system, directory size can not be bigger than 2GB? >> >> No, because the high 32-bits for i_size are overloaded to store the >> directory creation acl. > >I think we should change the code (kernel and e2fsprogs) to allow >i_size_high for directories also. > >> In practice, you really don't want to have a directory that huge >> anyway. Iterating through it all with readdir() gets horribly slow, >> and applications that try do anything with really huge directories >> would be well advised to use a database, because they will get *much* >> better performance that way.... > >Actually, for many HPC applications they never do readdir at all. 
>The job creates 1 file/process and always uses a predefined filename >like {job}-{timestamp}-{process} that it will directly look up. > >Cheers, Andreas >-- >Andreas Dilger >Sr. Staff Engineer, Lustre Group >Sun Microsystems of Canada, Inc. > > > = = = = = = = = = = = = = = = = = = = = ????????? ?? ????????liuyue ????????liuyue at ncic.ac.cn ??????????2008-03-20 From liuyue at ncic.ac.cn Thu Mar 20 11:04:51 2008 From: liuyue at ncic.ac.cn (liuyue) Date: Thu, 20 Mar 2008 19:04:51 +0800 Subject: How to get device name with device id? Message-ID: <20080320104532.1EB38136845@ncic.ac.cn> ashish mahamuni, I guess maybe the following function does what you want. But it is a kernel function, sorry :( int __file_to_disk (char * file_name, char *disk_name) { int err = 0; struct nameidata nd; struct super_block * sb; struct vfsmount *mnt; err = path_lookup(file_name, LOOKUP_FOLLOW, &nd); if(err){ DCFS3_ERROR("error to parse the file name, %s\n", file_name); goto exit; } mnt = nd.mnt; sb = mnt->mnt_sb; strcpy (disk_name, sb->s_bdev->bd_disk->disk_name); path_release(&nd); exit: return err; } ======= 2008-03-20 19:21:04 ????????======= >Hi all, > >I want to open a device(/dev/sda1, /dev/hda2 etc) in which my file exists. >I've used 'stat' system call to get the device id. > >But now I want the device name from this id(st_dev). >How to get that one? >Or >Do you have any other method to know the device name where my file resides? > >Thanks > > > Bollywood, fun, friendship, sports and more. You name it, we have it on http://in.promos.yahoo.com/groups > > >_______________________________________________ >Ext3-users mailing list >Ext3-users at redhat.com >https://www.redhat.com/mailman/listinfo/ext3-users > > = = = = = = = = = = = = = = = = = = = = ????????? ?? ????????liuyue ????????liuyue at ncic.ac.cn ??????????2008-03-20 From ashitpro at yahoo.co.in Thu Mar 20 11:13:09 2008 From: ashitpro at yahoo.co.in (ashish mahamuni) Date: Thu, 20 Mar 2008 16:43:09 +0530 (IST) Subject: How to get device name with device id? In-Reply-To: <20080320104532.1EB38136845@ncic.ac.cn> Message-ID: <171853.43026.qm@web94601.mail.in2.yahoo.com> Can you suggest any other method(in user space) for this? --- On Thu, 20/3/08, liuyue wrote: > From: liuyue > Subject: Re: How to get device name with device id? > To: "ashitpro at yahoo.co.in" , "ext3-users at redhat.com" > Date: Thursday, 20 March, 2008, 4:34 PM > ashish mahamuni, > > I guess maybe the following function does what you want. > But it is a kernel function, sorry :( > > int __file_to_disk (char * file_name, char *disk_name) { > int err = 0; > struct nameidata nd; > struct super_block * sb; > struct vfsmount *mnt; > err = path_lookup(file_name, LOOKUP_FOLLOW, > &nd); > if(err){ > DCFS3_ERROR("error to parse the file > name, %s\n", file_name); > goto exit; > } > mnt = nd.mnt; > sb = mnt->mnt_sb; > strcpy (disk_name, > sb->s_bdev->bd_disk->disk_name); > path_release(&nd); > exit: > return err; > } > > ======= 2008-03-20 19:21:04 > ????????======= > > >Hi all, > > > >I want to open a device(/dev/sda1, /dev/hda2 etc) in > which my file exists. > >I've used 'stat' system call to get the > device id. > > > >But now I want the device name from this id(st_dev). > >How to get that one? > >Or > >Do you have any other method to know the device name > where my file resides? > > > >Thanks > > > > > > Bollywood, fun, friendship, sports and more. 
You > name it, we have it on http://in.promos.yahoo.com/groups > > > > > >_______________________________________________ > >Ext3-users mailing list > >Ext3-users at redhat.com > >https://www.redhat.com/mailman/listinfo/ext3-users > > > > > > = = = = = = = = = = = = = = = = = = = = > > > ????????? > ?? > > > ????????liuyue > ????????liuyue at ncic.ac.cn > ??????????2008-03-20 Chat on a cool, new interface. No download required. Go to http://in.messenger.yahoo.com/webmessengerpromo.php From tytso at MIT.EDU Thu Mar 20 11:28:49 2008 From: tytso at MIT.EDU (Theodore Tso) Date: Thu, 20 Mar 2008 07:28:49 -0400 Subject: The maximum number of files under a folder In-Reply-To: <20080320104039.F2DFB13692A@ncic.ac.cn> References: <20080320104039.F2DFB13692A@ncic.ac.cn> Message-ID: <20080320112849.GU3158@mit.edu> On Thu, Mar 20, 2008 at 06:59:59PM +0800, liuyue wrote: > Thank you all. > > Now I find a patch which can extend ext3 subdirectory limit. > http://osdir.com/ml/file-systems.ext2.devel/2004-12/msg00026.html That's *subdirectories*, not files. The maximum number of files per directory are basically limited as discussed in this thread. The number of subdirectories was limited by the 16-bit i_nlink field. Andreas' idea for extending this limit, as described above, is in ext4. Regards, - Ted From ashitpro at yahoo.co.in Fri Mar 21 12:16:57 2008 From: ashitpro at yahoo.co.in (ashish mahamuni) Date: Fri, 21 Mar 2008 17:46:57 +0530 (IST) Subject: Unable to change the 'name' field from 'ext3_dir_entry_2' structure. Message-ID: <688934.98172.qm@web94601.mail.in2.yahoo.com> Hello everybody, I am trying to rename the file/directory by renaming the 'name' field from ext3_dir_entry_2 structure. I can easily do it for directories. I am reading the structure then I change this field, and writing it back as it is. New file name length will be similar as the old(just for simplicity). But whenever I do this for file. It doesn't do any thing. 'write' sys call gets execute properly. Next time if I read dir entry for this file it shows me older one. Am I doing anything wrong? Chat on a cool, new interface. No download required. Go to http://in.messenger.yahoo.com/webmessengerpromo.php From tytso at MIT.EDU Fri Mar 21 12:38:46 2008 From: tytso at MIT.EDU (Theodore Tso) Date: Fri, 21 Mar 2008 08:38:46 -0400 Subject: Unable to change the 'name' field from 'ext3_dir_entry_2' structure. In-Reply-To: <688934.98172.qm@web94601.mail.in2.yahoo.com> References: <688934.98172.qm@web94601.mail.in2.yahoo.com> Message-ID: <20080321123846.GF7991@mit.edu> On Fri, Mar 21, 2008 at 05:46:57PM +0530, ashish mahamuni wrote: > Hello everybody, > > I am trying to rename the file/directory by renaming the 'name' field from ext3_dir_entry_2 structure. > > I can easily do it for directories. > > I am reading the structure then I change this field, and writing it back as it is. > > New file name length will be similar as the old(just for simplicity). > > But whenever I do this for file. It doesn't do any thing. > > 'write' sys call gets execute properly. Next time if I read dir entry for this file it shows me older one. > > Am I doing anything wrong? #1. *Why* are you trying to do this? #2. Are you doing this on an unmounted filesystem? Or is the filesystem mounted when you tried to modify the filesystem directly using the write system call? 
- Ted From htmldeveloper at gmail.com Sat Mar 22 04:39:32 2008 From: htmldeveloper at gmail.com (Peter Teoh) Date: Sat, 22 Mar 2008 12:39:32 +0800 Subject: "Write once only but read many" filesystem In-Reply-To: <20080314232403.GI3542@webber.adilger.int> References: <804dabb00803140917o2abebd2dh12c77b21a48094c4@mail.gmail.com> <20080314232403.GI3542@webber.adilger.int> Message-ID: <47E48D84.7070701@gmail.com> For reasons of auditability/accountability, I would like a filesystem such that I can write to it only ONCE, subsequently not modifiable/deletable, but always readable. Kind of a database journal logs - it is continuously being written, sequentiall appending, but not circular buffer based, so that upon running out of space, logging will be paused in memory, and after new storage devices added to it, it will continue to flush out whatever is outstanding in memory. Can ext3 / ext4 or current jbd2 be easily configured to serve this purpose? Thanks. From ashitpro at yahoo.co.in Sat Mar 22 07:47:04 2008 From: ashitpro at yahoo.co.in (ashish mahamuni) Date: Sat, 22 Mar 2008 13:17:04 +0530 (IST) Subject: Unable to change the 'name' field from 'ext3_dir_entry_2' structure. In-Reply-To: <20080321123846.GF7991@mit.edu> Message-ID: <464689.50372.qm@web94615.mail.in2.yahoo.com> 1: I am trying to write a tool to hide a file/directory. So I am changing the 'name' field to NULL. Directories get hide properly. But nothing for file(Unable to change the 'name' field) 2: Of course filesystem is mounted. --- On Fri, 21/3/08, Theodore Tso wrote: > From: Theodore Tso > Subject: Re: Unable to change the 'name' field from 'ext3_dir_entry_2' structure. > To: "ashish mahamuni" > Cc: ext3-users at redhat.com > Date: Friday, 21 March, 2008, 6:08 PM > On Fri, Mar 21, 2008 at 05:46:57PM +0530, ashish mahamuni > wrote: > > Hello everybody, > > > > I am trying to rename the file/directory by renaming > the 'name' field from ext3_dir_entry_2 structure. > > > > I can easily do it for directories. > > > > I am reading the structure then I change this field, > and writing it back as it is. > > > > New file name length will be similar as the old(just > for simplicity). > > > > But whenever I do this for file. It doesn't do any > thing. > > > > 'write' sys call gets execute properly. Next > time if I read dir entry for this file it shows me older > one. > > > > Am I doing anything wrong? > > #1. *Why* are you trying to do this? > > #2. Are you doing this on an unmounted filesystem? Or is > the > filesystem mounted when you tried to modify the > filesystem directly > using the write system call? > > - Ted Unlimited freedom, unlimited storage. Get it now, on http://help.yahoo.com/l/in/yahoo/mail/yahoomail/tools/tools-08.html/ From tytso at MIT.EDU Sat Mar 22 12:29:33 2008 From: tytso at MIT.EDU (Theodore Tso) Date: Sat, 22 Mar 2008 08:29:33 -0400 Subject: Unable to change the 'name' field from 'ext3_dir_entry_2' structure. In-Reply-To: <464689.50372.qm@web94615.mail.in2.yahoo.com> References: <20080321123846.GF7991@mit.edu> <464689.50372.qm@web94615.mail.in2.yahoo.com> Message-ID: <20080322122933.GQ7991@mit.edu> On Sat, Mar 22, 2008 at 01:17:04PM +0530, ashish mahamuni wrote: > 1: I am trying to write a tool to hide a file/directory. > So I am changing the 'name' field to NULL. > Directories get hide properly. But nothing for file(Unable to change the 'name' field) So you're deliberately corrupting the filesystem. This wouldn't be for some university class assignment, would it? > 2: Of course filesystem is mounted. 
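For readers following along, the on-disk record being edited here is tiny; it is reproduced roughly below from the kernel's ext3_fs.h of this era (worth double-checking against your own tree):

#define EXT3_NAME_LEN 255

struct ext3_dir_entry_2 {
        __le32  inode;                  /* inode number; 0 means unused entry   */
        __le16  rec_len;                /* length of this record, incl. padding */
        __u8    name_len;               /* length of the name actually stored   */
        __u8    file_type;
        char    name[EXT3_NAME_LEN];    /* file name, not NUL-terminated        */
};

When ext3 itself removes a name it does not blank the name bytes; it clears the inode field or folds the record into the previous entry by growing that entry's rec_len. And, as Ted points out below, whatever gets written to the block device is shadowed by the kernel's cached copy of the directory block and by the dentry cache for as long as the filesystem is mounted.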
Well, there's your problem. The name is cached in the kernel's dentry cache. It won't necessarily work for directories, either, BTW. I think you've just been getting lucky. - Ted From htmldeveloper at gmail.com Sat Mar 22 15:55:53 2008 From: htmldeveloper at gmail.com (Peter Teoh) Date: Sat, 22 Mar 2008 23:55:53 +0800 Subject: "Write once only but read many" filesystem In-Reply-To: <20080322150626.GB19347@logfs.org> References: <804dabb00803140917o2abebd2dh12c77b21a48094c4@mail.gmail.com> <20080314232403.GI3542@webber.adilger.int> <47E48D84.7070701@gmail.com> <20080322102331.GA19347@logfs.org> <804dabb00803220752h670757d8o9c1b7fa3696467bc@mail.gmail.com> <20080322150626.GB19347@logfs.org> Message-ID: <804dabb00803220855q1aa41fc7mc30c7ce7951fe98@mail.gmail.com> Thank you for your reply :-). On Sat, Mar 22, 2008 at 11:06 PM, J?rn Engel wrote: > On Sat, 22 March 2008 22:52:12 +0800, Peter Teoh wrote: > > > > what are the difference in terms of final features provided by these > > two different filesystem? what is this "garbage collection"? u > > still have features like creating different directories, and creating > > different files, and writing the files? How about setting the file > > attributes...it should be set before writing right (so that after > > writing and handle is closed it becomes permanently not > > modifiable)..but creating a subdirectory below the current dir should > > be possible right (even after closing the previous directory)? > > Your requirements aren't quite clear to me. Do you want the complete > filesystem to be read-only after being written once? YES.... > Or do you want individual files/directories to be immutable - chattr? chattr is not good enough, as root can still modify it. So if current feature is not there, then some small development may be needed. > And in either case, what problem do you want to solve with a read-only filesystem? Simple: i want to record down everything that a user does, or a database does, or any applications running - just record down its state permanently securely into the filesystem, knowing that for sure, there is not way to modify the data, short of recreating the filesystem again. Sound logical? Or is there any loophole in this concept? In summary, are there any strong demand for such a concept/filesystem? I may take the plunge to implementing it, if justfiable and everybody is interested..:-)... -- Regards, Peter Teoh From ashitpro at yahoo.co.in Sun Mar 23 18:13:02 2008 From: ashitpro at yahoo.co.in (ashish mahamuni) Date: Sun, 23 Mar 2008 23:43:02 +0530 (IST) Subject: Unable to change the 'name' field from 'ext3_dir_entry_2' structure. In-Reply-To: <20080322122933.GQ7991@mit.edu> Message-ID: <707992.29162.qm@web94603.mail.in2.yahoo.com> ok.. I'll find some other way to hide the file/directory.. Can you suggest me the better and secure way to modify the dentry? I mean, which one should I modify? On disk structure or kernel cache(I guess this is what we called as memory data structure). Certainly this question is not only for dentry. The case should be common while modifying other data structures also. --- On Sat, 22/3/08, Theodore Tso wrote: > From: Theodore Tso > Subject: Re: Unable to change the 'name' field from 'ext3_dir_entry_2' structure. > To: "ashish mahamuni" > Cc: ext3-users at redhat.com > Date: Saturday, 22 March, 2008, 5:59 PM > On Sat, Mar 22, 2008 at 01:17:04PM +0530, ashish mahamuni > wrote: > > 1: I am trying to write a tool to hide a > file/directory. > > So I am changing the 'name' field to NULL. 
> > Directories get hide properly. But nothing for > file(Unable to change the 'name' field) > > So you're deliberately corrupting the filesystem. This > wouldn't be > for some university class assignment, would it? > > > 2: Of course filesystem is mounted. > > Well, there's your problem. The name is cached in the > kernel's dentry > cache. It won't necessarily work for directories, > either, BTW. I > think you've just been getting lucky. > > - Ted Save all your chat conversations. Find them online at http://in.messenger.yahoo.com/webmessengerpromo.php From tytso at MIT.EDU Mon Mar 24 00:19:16 2008 From: tytso at MIT.EDU (Theodore Tso) Date: Sun, 23 Mar 2008 20:19:16 -0400 Subject: Unable to change the 'name' field from 'ext3_dir_entry_2' structure. In-Reply-To: <707992.29162.qm@web94603.mail.in2.yahoo.com> References: <20080322122933.GQ7991@mit.edu> <707992.29162.qm@web94603.mail.in2.yahoo.com> Message-ID: <20080324001916.GC24943@mit.edu> On Sun, Mar 23, 2008 at 11:43:02PM +0530, ashish mahamuni wrote: > > ok.. > I'll find some other way to hide the file/directory.. > Can you suggest me the better and secure way to modify the dentry? > I mean, which one should I modify? On disk structure or kernel cache(I guess this is what we called as memory data structure). > Certainly this question is not only for dentry. The case should be common while modifying other data structures also. So what's the high level problem? *Why* are you trying to hide file names or directories? I repeat, is this for a university problem set or project? Or is there a practical real-life use for it. If so, *what* is the practical real-life use? What are you trying accomplish at the high level, and why is it useful to try to hide filenames or directories? Is this for a root kit, where you are trying to write malware? - Ted From scott.lovenberg at gmail.com Mon Mar 24 04:49:17 2008 From: scott.lovenberg at gmail.com (Scott Lovenberg) Date: Mon, 24 Mar 2008 00:49:17 -0400 Subject: "Write once only but read many" filesystem In-Reply-To: <20080322165906.GC19347@logfs.org> References: <804dabb00803140917o2abebd2dh12c77b21a48094c4@mail.gmail.com> <20080314232403.GI3542@webber.adilger.int> <47E48D84.7070701@gmail.com> <20080322102331.GA19347@logfs.org> <804dabb00803220752h670757d8o9c1b7fa3696467bc@mail.gmail.com> <20080322150626.GB19347@logfs.org> <804dabb00803220855q1aa41fc7mc30c7ce7951fe98@mail.gmail.com> <20080322165906.GC19347@logfs.org> Message-ID: <47E732CD.3070202@gmail.com> J?rn Engel wrote: > On Sat, 22 March 2008 23:55:53 +0800, Peter Teoh wrote: >>> Or do you want individual files/directories to be immutable - chattr? >> chattr is not good enough, as root can still modify it. So if >> current feature is not there, then some small development may be >> needed. >> >>> And in either case, what problem do you want to solve with a read-only filesystem? >> Simple: i want to record down everything that a user does, or a >> database does, or any applications running - just record down its >> state permanently securely into the filesystem, knowing that for sure, >> there is not way to modify the data, short of recreating the >> filesystem again. Sound logical? Or is there any loophole in this >> concept? > > The loophole is called root. In a normal setup, root can do anything, > including writing directly to the device your filesystem resides in, > writing to kernel memory, etc. > > It may be rather inconvenient to change a filesystem by writing to the > block device, but far from impossible. 
If you want to make such changes > impossible, you are facing an uphill battle that I personally don't care > about. And if inconvenience is good enough, wouldn't chattr be > sufficiently inconvenient? > > Jörn >

How about mounting an isofs via loopback? This has the added benefit of being ready to be exported to disc. You can make it with mkisofs on a directory structure and mount it to the tree with a normal mount(1). If it asks for fs type on mount, I think it's 'iso9660'.

From htmldeveloper at gmail.com Mon Mar 24 06:35:46 2008
From: htmldeveloper at gmail.com (Peter Teoh)
Date: Mon, 24 Mar 2008 14:35:46 +0800
Subject: "Write once only but read many" filesystem
In-Reply-To: <47E732CD.3070202@gmail.com>
References: <804dabb00803140917o2abebd2dh12c77b21a48094c4@mail.gmail.com> <20080314232403.GI3542@webber.adilger.int> <47E48D84.7070701@gmail.com> <20080322102331.GA19347@logfs.org> <804dabb00803220752h670757d8o9c1b7fa3696467bc@mail.gmail.com> <20080322150626.GB19347@logfs.org> <804dabb00803220855q1aa41fc7mc30c7ce7951fe98@mail.gmail.com> <20080322165906.GC19347@logfs.org> <47E732CD.3070202@gmail.com>
Message-ID: <47E74BC2.7040408@gmail.com>

An HTML attachment was scrubbed... URL:

From ashitpro at yahoo.co.in Mon Mar 24 06:42:57 2008
From: ashitpro at yahoo.co.in (ashish mahamuni)
Date: Mon, 24 Mar 2008 12:12:57 +0530 (IST)
Subject: Unable to change the 'name' field from 'ext3_dir_entry_2' structure.
In-Reply-To: <20080324001916.GC24943@mit.edu>
Message-ID: <888119.83086.qm@web94606.mail.in2.yahoo.com>

Oh sir, this is not any university problem set or project. It really doesn't have any practical real-life use. This is not a root kit or any malware. I just want to learn the file system (ext2/ext3). I know there are a number of books on filesystems, but my way of learning is a bit different. I don't like theoretical ways; I like practical implementations. So I thought, why not start with some little tool, like hiding a file. If you don't like my idea, then suggest something different which has some practical use.

Thanks
Ashish

--- On Mon, 24/3/08, Theodore Tso wrote: > From: Theodore Tso > Subject: Re: Unable to change the 'name' field from 'ext3_dir_entry_2' structure. > To: "ashish mahamuni" > Cc: ext3-users at redhat.com > Date: Monday, 24 March, 2008, 5:49 AM > On Sun, Mar 23, 2008 at 11:43:02PM +0530, ashish mahamuni > wrote: > > > > ok.. > > I'll find some other way to hide the > file/directory.. > > Can you suggest me the better and secure way to modify > the dentry? > > I mean, which one should I modify? On disk structure > or kernel cache(I guess this is what we called as memory > data structure). > > Certainly this question is not only for dentry. The > case should be common while modifying other data > structures also. > > So what's the high level problem? *Why* are you trying > to hide file > names or directories? > > I repeat, is this for a university problem set or project? > > Or is there a practical real-life use for it. If so, > *what* is the > practical real-life use? What are you trying accomplish at > the high > level, and why is it useful to try to hide filenames or > directories? > > Is this for a root kit, where you are trying to write > malware? > > - Ted

Did you know? You can CHAT without downloading messenger.
Go to http://in.messenger.yahoo.com/webmessengerpromo.php/

From articpenguin3800 at gmail.com Mon Mar 24 19:48:04 2008
From: articpenguin3800 at gmail.com (John Nelson)
Date: Mon, 24 Mar 2008 15:48:04 -0400
Subject: resize2fs
Message-ID: <47E80574.3090400@gmail.com>

hi,
Why does resize2fs have to scan the whole partition when expanding? It doesn't do this when it shrinks.

From tytso at MIT.EDU Mon Mar 24 22:23:08 2008
From: tytso at MIT.EDU (Theodore Tso)
Date: Mon, 24 Mar 2008 18:23:08 -0400
Subject: resize2fs
In-Reply-To: <47E80574.3090400@gmail.com>
References: <47E80574.3090400@gmail.com>
Message-ID: <20080324222308.GD30110@mit.edu>

On Mon, Mar 24, 2008 at 03:48:04PM -0400, John Nelson wrote: > hi, > Why does resize2fs have to scan the whole partition when expanding? It > doesn't do this when it shrinks.

Resize2fs sometimes, when either expanding or shrinking a partition, will need to scan the inode table so it can move blocks. It may need to do this if it is shrinking a partition and there are files which are using blocks at the end of the partition which will no longer be available at the end of the shrink operation, so it needs to scan the inode tables to determine which inodes need to be updated as part of moving the data blocks. When resize2fs is expanding the filesystem, if the filesystem grows enough that more blocks need to be reserved for the block group descriptors, then similarly it will need to scan the inode table to determine which inodes will need to be updated when moving blocks out of the way so the block group descriptors can be expanded.

Regards,

- Ted

From sebastia at l00-bugdead-prods.de Mon Mar 31 06:36:45 2008
From: sebastia at l00-bugdead-prods.de (Sebastian Reitenbach)
Date: Mon, 31 Mar 2008 08:36:45 +0200
Subject: with dir_index ls is slower than without?
Message-ID: <20080331063645.F1A3AD13DA@smtp.l00-bugdead-prods.de>

Hi, I am trying to tune an ext3 filesystem. I have heard that when the dir_index option is enabled, ls -l or find will be a lot faster than before. So I did.
I created 2 partition on the harddisc, each 20GB: installhost2:~ # fdisk -l /dev/sda Disk /dev/sda: 80.0 GB, 80026361856 bytes 255 heads, 63 sectors/track, 9729 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/sda1 1 192 1542208+ 82 Linux swap / Solaris /dev/sda2 * 193 2803 20972857+ 83 Linux /dev/sda3 2804 5236 19543072+ 83 Linux /dev/sda4 5237 7669 19543072+ 83 Linux /dev/sda3 was formatted with the dir_index option enabled, /dev/sda4 with dir_index disabled: installhost2:/ # tune2fs -l /dev/sda3 tune2fs 1.38 (30-Jun-2005) Filesystem volume name: Last mounted on: Filesystem UUID: d90ccbb9-f45a-4304-87d8-805fce775c23 Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal dir_index filetype needs_recovery sparse_super Default mount options: (none) Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 2443200 Block count: 4885768 Reserved block count: 244288 Free blocks: 4273422 Free inodes: 1943188 First block: 0 Block size: 4096 Fragment size: 4096 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 16288 Inode blocks per group: 509 Filesystem created: Thu Mar 27 17:14:40 2008 Last mount time: Fri Mar 28 11:39:47 2008 Last write time: Fri Mar 28 11:39:47 2008 Mount count: 7 Maximum mount count: 28 Last checked: Thu Mar 27 17:14:40 2008 Check interval: 15552000 (6 months) Next check after: Tue Sep 23 18:14:40 2008 Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 128 Journal inode: 8 Default directory hash: tea Directory Hash Seed: 306a3c58-3cbb-4f4a-856a-e48ae3006a07 Journal backup: inode blocks installhost2:/ # tune2fs -l /dev/sda4 tune2fs 1.38 (30-Jun-2005) Filesystem volume name: Last mounted on: Filesystem UUID: 2bb124a4-f7c7-4cac-b0c1-16aa8afc67eb Filesystem magic number: 0xEF53 Filesystem revision #: 1 (dynamic) Filesystem features: has_journal filetype needs_recovery sparse_super Default mount options: (none) Filesystem state: clean Errors behavior: Continue Filesystem OS type: Linux Inode count: 2443200 Block count: 4885768 Reserved block count: 244288 Free blocks: 4274331 Free inodes: 1943188 First block: 0 Block size: 4096 Fragment size: 4096 Blocks per group: 32768 Fragments per group: 32768 Inodes per group: 16288 Inode blocks per group: 509 Filesystem created: Thu Mar 27 17:15:03 2008 Last mount time: Fri Mar 28 11:39:47 2008 Last write time: Fri Mar 28 11:39:47 2008 Mount count: 7 Maximum mount count: 23 Last checked: Thu Mar 27 17:15:03 2008 Check interval: 15552000 (6 months) Next check after: Tue Sep 23 18:15:03 2008 Reserved blocks uid: 0 (user root) Reserved blocks gid: 0 (group root) First inode: 11 Inode size: 128 Journal inode: 8 Default directory hash: tea Directory Hash Seed: 1cfc2290-e289-4c49-a57f-2b2e3b9e91c4 Journal backup: inode blocks The partitions are mounted: /dev/sda3 on /mnt/index type ext3 (rw) /dev/sda4 on /mnt/noindex type ext3 (rw) If I create 500000 files, each 1kB from /dev/urandom, the ls -la command needs a lot of time on the partition with dir_index enabled (the wc -l is to eleminate the slow terminal :), the files were created on one partition and rsynced to the other: installhost2:~ # time ls -la /mnt/index/ | wc -l 500005 real 2m41.015s user 0m4.568s sys 0m6.520s installhost2:~ # time ls -la /mnt/noindex/ | wc -l 500005 real 0m10.792s user 0m3.172s sys 0m6.000s I expected the dir_index should speedup this a little bit? 
I assume I'm still missing something? I'm on SLES10sp1, kernel 2.6.16.46 x86_64.

kind regards
Sebastian

From niko at petole.dyndns.org Mon Mar 31 08:36:46 2008
From: niko at petole.dyndns.org (Nicolas KOWALSKI)
Date: Mon, 31 Mar 2008 10:36:46 +0200
Subject: with dir_index ls is slower than without?
In-Reply-To: <20080331063645.F1A3AD13DA@smtp.l00-bugdead-prods.de>
References: <20080331063645.F1A3AD13DA@smtp.l00-bugdead-prods.de>
Message-ID: <874panl275.fsf@petole.dyndns.org>

"Sebastian Reitenbach" writes: > installhost2:~ # time ls -la /mnt/index/ | wc -l > 500005 > > real 2m41.015s > user 0m4.568s > sys 0m6.520s > > > installhost2:~ # time ls -la /mnt/noindex/ | wc -l > 500005 > > real 0m10.792s > user 0m3.172s > sys 0m6.000s > > I expected the dir_index should speedup this a little bit? > I assume I'm still missing something?

I think the point of dir_index is "only" to quickly find a file in a large directory when you _already_ have its name. The performance of listing is not its purpose and, as you noted, it even makes performance worse.

--
Nicolas

From sebastia at l00-bugdead-prods.de Mon Mar 31 11:18:06 2008
From: sebastia at l00-bugdead-prods.de (Sebastian Reitenbach)
Date: Mon, 31 Mar 2008 13:18:06 +0200
Subject: with dir_index ls is slower than without?
Message-ID: <20080331111807.6A293D148D@smtp.l00-bugdead-prods.de>

Hi Nicolas,

Nicolas KOWALSKI wrote: > "Sebastian Reitenbach" writes: > > > installhost2:~ # time ls -la /mnt/index/ | wc -l > > 500005 > > > > real 2m41.015s > > user 0m4.568s > > sys 0m6.520s > > > > > > installhost2:~ # time ls -la /mnt/noindex/ | wc -l > > 500005 > > > > real 0m10.792s > > user 0m3.172s > > sys 0m6.000s > > > > I expected the dir_index should speedup this a little bit? > > I assume I'm still missing something? > > I think the point of dir_index is "only" to quickly find a file in a large > directory when you _already_ have its name. > > The performance of listing is not its purpose and, as you noted, it > even makes performance worse.

Ah, that would explain what I have seen here. After reading your answer, I found this older mail in the archives: http://osdir.com/ml/file-systems.ext2.devel/2004-09/msg00029.html

So everything seems to depend on how the application uses the filesystem. Picking a single given file might be faster than with plain ext3, but scanning and opening all files in a directory might become slower. I wanted to use dir_index for some partitions, like for a cyrus imap server, and for some other applications. I think I have to benchmark the applications to see whether they get a speed gain from dir_index or not.

kind regards
Sebastian
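One practical mitigation, if an application really does have to stat everything in a huge indexed directory, is the trick discussed in that older thread: read all the names first, sort them by inode number, and only then stat them, so the inode tables are visited roughly in on-disk order instead of in hash order. A rough, untested sketch is below (plain ls does not do this, which is part of why the dir_index partition looks so much worse in the test above even though individual name lookups are faster):

/* lsinode.c - list a directory, stat'ing entries in inode order.
 * Untested sketch; error handling kept minimal on purpose.
 */
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

struct ent {
        char name[256];
        ino_t ino;
};

static int by_inode(const void *a, const void *b)
{
        const struct ent *x = a, *y = b;
        return (x->ino > y->ino) - (x->ino < y->ino);
}

int main(int argc, char **argv)
{
        const char *path = argc > 1 ? argv[1] : ".";
        DIR *d = opendir(path);
        struct dirent *de;
        struct ent *v = NULL;
        struct stat st;
        char full[4352];
        size_t n = 0, cap = 0, i;

        if (!d) {
                perror("opendir");
                return 1;
        }
        while ((de = readdir(d)) != NULL) {
                if (n == cap)
                        v = realloc(v, (cap = cap ? cap * 2 : 1024) * sizeof(*v));
                snprintf(v[n].name, sizeof(v[n].name), "%s", de->d_name);
                v[n].ino = de->d_ino;   /* inode number straight from the dirent */
                n++;
        }
        closedir(d);

        qsort(v, n, sizeof(*v), by_inode);      /* inode order ~ on-disk order */

        for (i = 0; i < n; i++) {
                snprintf(full, sizeof(full), "%s/%s", path, v[i].name);
                if (lstat(full, &st) == 0)
                        printf("%10lu %12lu %s\n", (unsigned long)st.st_ino,
                               (unsigned long)st.st_size, v[i].name);
        }
        free(v);
        return 0;
}

Whether this (or a patched ls) helps enough for something like a cyrus spool is exactly the kind of thing the application benchmarks mentioned above would show.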