From Sinha_Himanshu at emc.com  Thu Jul  6 18:02:16 2006
From: Sinha_Himanshu at emc.com (Sinha_Himanshu at emc.com)
Date: Thu, 6 Jul 2006 14:02:16 -0400
Subject: Limited write bandwidth from ext3
Message-ID: <7E76AE153FDC3240BA7E82E23972F9FE01B6037D@CORPUSMX30B.corp.emc.com>

We tried the extents+mballoc+delalloc patches suggested by Andreas and
found that they made a significant improvement in our benchmark - write
bandwidth increased from 144 MBps to 214 MBps.

We are at about 85% of the bandwidth that one can get writing to an ext2
file, which in turn is about 82% of the bandwidth one can get writing to
the block device. We are analyzing our traces to determine the cause of
these differences. So far, we see that during writes to the ext3 file,
LUN writes periodically wait for 5 reads, while in the case of writes to
the ext2 file, LUN writes periodically wait for only one read.

Workload: Single threaded 512 KB writes to a new file.

                           RedHat 4 U1             2.6.16.8 kernel
                           (2.6.9 based kernel)
Block Device               308 MBps                306 MBps
Ext2 file                  267                     255
Ext3                       138                     144
Ext3 with patches          N/A                     216
Ext3 with patches,
journal on separate LUN                            215

Himanshu

-----Original Message-----
From: Andreas Dilger [mailto:adilger at clusterfs.com]
Sent: Wednesday, June 21, 2006 4:54 PM
To: Sinha, Himanshu
Cc: ext3-users at redhat.com
Subject: Re: Limited write bandwidth from ext3

On Jun 19, 2006  14:18 -0400, Sinha_Himanshu at emc.com wrote:
> We measured the write bandwidth for writes to the block device
> corresponding to the lun (e.g. /dev/sdb), a file in an ext2 filesystem
> and to a file in an ext3 file system.
> Write b/w for 512 KB writes
> Block device 312 MBps
> Ext2 file 247 MBps
> Ext3 file 130 MBps
>
> We are looking for ways to improve the ext3 file write bandwidth.

Have a look at the extents+mballoc+delalloc patches from Alex Tomas:
ftp://ftp.lustre.org/pub/people/alex/2.6.16.8/

Mount the filesystem with "-o extents,mballoc,delalloc" to enable this.
They noticeably improve IO performance while also reducing the CPU load
for ext3. The extent patches are approved by all of the ext3 developers
and will be supported upstream fairly soon (in the kernel and e2fsprogs),
and mballoc+delalloc will follow on afterward.

NOTE: the extents on-disk format is incompatible with older kernels, so
at this stage consider it "for benchmarking only".

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From herta.vandeneynde at cc.kuleuven.be  Mon Jul 10 10:29:50 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Mon, 10 Jul 2006 12:29:50 +0200
Subject: chattr +T not implemented?
Message-ID: <44B22C1E.6050908@cc.kuleuven.be>

We run a third party application that creates an inordinate amount of
subdirectories in a single directory. To speed up I/O, I wanted to set
the T attribute on the directory that will hold the subdirectories. The
"chattr +T /usr/local/lepus-bb/a-0607" command returns status 0, but
when I verify the setting, the attribute isn't there:

  # lsattr -d /usr/local/lepus-bb/a-0607
  ------------- /usr/local/lepus-bb/a-0607

Is this attribute implemented? The manual pages entry for chattr
suggests it is, but when I check the chattr usage, "T" isn't listed:

  #chattr -v
  usage: chattr [-RV] [-+=AacDdijsSu] [-v version] files...

FWIIW
  # cat /proc/version
  Linux version 2.4.21-40.ELsmp (bhcompile at hs20-bc1-7.build.redhat.com)
  (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-54)) #1 SMP Thu Feb 2
  22:22:39 EST 2006

Kind regards,

Herta

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

From tytso at mit.edu  Mon Jul 10 18:08:00 2006
From: tytso at mit.edu (Theodore Tso)
Date: Mon, 10 Jul 2006 14:08:00 -0400
Subject: chattr +T not implemented?
In-Reply-To: <44B22C1E.6050908@cc.kuleuven.be>
References: <44B22C1E.6050908@cc.kuleuven.be>
Message-ID: <20060710180800.GB16137@thunk.org>

On Mon, Jul 10, 2006 at 12:29:50PM +0200, Herta Van den Eynde wrote:
> Is this attribute implemented? The manual pages entry for chattr
> suggests it is, but when I check the chattr usage, "T" isn't listed:
>
>   #chattr -v
>   usage: chattr [-RV] [-+=AacDdijsSu] [-v version] files...
>
> FWIIW
>   # cat /proc/version
>   Linux version 2.4.21-40.ELsmp (bhcompile at hs20-bc1-7.build.redhat.com)
>   (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-54)) #1 SMP Thu Feb 2
>   22:22:39 EST 2006

To quote from the man page:

    A directory with the 'T' attribute will be deemed to be the top of
    directory hierarchies for the purposes of the Orlov block allocator
    (which is used on systems with Linux 2.5.46 or later).

You're using Linux version 2.4.21....

                                                - Ted

From adilger at clusterfs.com  Mon Jul 10 18:37:02 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Mon, 10 Jul 2006 12:37:02 -0600
Subject: chattr +T not implemented?
In-Reply-To: <44B22C1E.6050908@cc.kuleuven.be>
References: <44B22C1E.6050908@cc.kuleuven.be>
Message-ID: <20060710183702.GF15380@schatzie.adilger.int>

On Jul 10, 2006  12:29 +0200, Herta Van den Eynde wrote:
> We run a third party application that creates an inordinate amount of
> subdirectories in a single directory. To speed up I/O, I wanted to set
> the T attribute on the directory that will hold the subdirectories. The
> "chattr +T /usr/local/lepus-bb/a-0607" command returns status 0, but
> when I verify the setting, the attribute isn't there:
>
>   # lsattr -d /usr/local/lepus-bb/a-0607
>   ------------- /usr/local/lepus-bb/a-0607
>
> Is this attribute implemented? The manual pages entry for chattr
> suggests it is, but when I check the chattr usage, "T" isn't listed:
>
>   #chattr -v
>   usage: chattr [-RV] [-+=AacDdijsSu] [-v version] files...

man chattr(1) reports:
A directory with the 'T'
attribute will be deemed to be the top of directory hierarchies for the
purposes of the Orlov block allocator (which is used on systems with
Linux 2.5.46 or later).

You can also check with "debugfs -c -R 'stat lepus-bb/a-0607' /dev/XXXX"
(assuming /usr/local/ is the mountpoint). It may be that the kernel is
not allowing the T attribute in the EXT3_FL_USER_VISIBLE mask, though it
does show correctly in my kernel.

#define EXT3_TOPDIR_FL          0x00020000 /* Top of directory hierarchies*/
#define EXT3_FL_USER_VISIBLE    0x0003DFFF /* User visible flags */

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From herta.vandeneynde at cc.kuleuven.be  Mon Jul 10 22:11:54 2006
From: herta.vandeneynde at cc.kuleuven.be (Herta Van den Eynde)
Date: Tue, 11 Jul 2006 00:11:54 +0200
Subject: chattr +T not implemented?
In-Reply-To: <20060710180800.GB16137@thunk.org>
References: <44B22C1E.6050908@cc.kuleuven.be> <20060710180800.GB16137@thunk.org>
Message-ID: <44B2D0AA.7000809@cc.kuleuven.be>

Theodore Tso wrote:
> On Mon, Jul 10, 2006 at 12:29:50PM +0200, Herta Van den Eynde wrote:
>
>>Is this attribute implemented? The manual pages entry for chattr
>>suggests it is, but when I check the chattr usage, "T" isn't listed:
>>
>>  #chattr -v
>>  usage: chattr [-RV] [-+=AacDdijsSu] [-v version] files...
>>
>>FWIIW
>>  # cat /proc/version
>>  Linux version 2.4.21-40.ELsmp (bhcompile at hs20-bc1-7.build.redhat.com)
>>  (gcc version 3.2.3 20030502 (Red Hat Linux 3.2.3-54)) #1 SMP Thu Feb 2
>>  22:22:39 EST 2006
>
>
> To quote from the man page:
>
>     A directory with the 'T' attribute will be deemed to be the top of
>     directory hierarchies for the purposes of the Orlov block allocator
>     (which is used on systems with Linux 2.5.46 or later).
>
> You're using Linux version 2.4.21....
>

Ouch. Missed that. Thanks for pointing it out, Theodore.

Kind regards,

Herta

Disclaimer: http://www.kuleuven.be/cwis/email_disclaimer.htm

From zeremski.boris at nsinfo.co.yu  Thu Jul 13 07:15:47 2006
From: zeremski.boris at nsinfo.co.yu (Zeremski Boris)
Date: Thu, 13 Jul 2006 09:15:47 +0200
Subject: detail explain of file creation process
Message-ID: <200607130800.k6D80n6u024206@mx1.redhat.com>

Hi,

Could someone point me to documentation or explain
in detail, the process of creating a file (space reservation, inode....)?
What happens at a low level?

Thanks

From evilninja at gmx.net  Thu Jul 13 17:24:34 2006
From: evilninja at gmx.net (christian)
Date: Thu, 13 Jul 2006 18:24:34 +0100 (BST)
Subject: detail explain of file creation process
In-Reply-To: <200607130800.k6D80n6u024206@mx1.redhat.com>
References: <200607130800.k6D80n6u024206@mx1.redhat.com>
Message-ID:

On Thu, 13 Jul 2006, Zeremski Boris wrote:
> Could someone point me to documentation or explain
> in detail, the process of creating a file (space reservation, inode....)?
> What happens at a low level?

does this: http://e2fsprogs.sourceforge.net/ext2intro.html suffice?

--
BOFH excuse #332:

suboptimal routing experience

From zeremski.boris at nsinfo.co.yu  Fri Jul 14 05:09:57 2006
From: zeremski.boris at nsinfo.co.yu (Zeremski Boris)
Date: Fri, 14 Jul 2006 07:09:57 +0200
Subject: detail explain of file creation process
In-Reply-To:
Message-ID: <200607140510.k6E5AGnC008664@mx1.redhat.com>

Hi, this link is great, it explains the basic concepts of the ext2/3 file
system (inode, directory, soft/hard links...).

What I am interested in is a more detailed process of creating a file.
What is going on when, for example, I run 'touch test.file', until that
file really starts existing on the file system?

Where can I find this kind of information?

>
> > Could someone point me to documentation or explain
> > in detail, the process of creating a file (space reservation, inode....)?
> > What happens at a low level?
>
> does this: http://e2fsprogs.sourceforge.net/ext2intro.html suffice?
>

From adilger at clusterfs.com  Fri Jul 14 05:20:57 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 13 Jul 2006 23:20:57 -0600
Subject: detail explain of file creation process
In-Reply-To: <200607140510.k6E5AGnC008664@mx1.redhat.com>
References: <200607140510.k6E5AGnC008664@mx1.redhat.com>
Message-ID: <20060714052057.GL15380@schatzie.adilger.int>

On Jul 14, 2006  07:09 +0200, Zeremski Boris wrote:
> Hi, this link is great, it explains the basic concepts of the ext2/3 file
> system (inode, directory, soft/hard links...).
>
> What I am interested in is a more detailed process of creating a file.
> What is going on when, for example, I run 'touch test.file', until that
> file really starts existing on the file system?
>
> Where can I find this kind of information?

If you run UML with GDB, you can set a breakpoint at "sys_open" and follow
it around from there. Also of interest are ext3_lookup, ext3_create.

If you don't find any documentation, you might consider writing a wiki
page for this as you figure it out.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From ling at aliko.com  Thu Jul 13 20:13:07 2006
From: ling at aliko.com (Ling C. Ho)
Date: Thu, 13 Jul 2006 15:13:07 -0500
Subject: Ext3 overhead vs Raw
Message-ID: <44B6A953.7010209@aliko.com>

Hi,

I am trying to find a way to speed up read access on an ext3 filesystem.
I did some tests using dd, with different block sizes, directio and
none, etc. The test file is about 1Gig in size, and spread across 25
fragments (found using filefrag). Block size is 4k. I have also tried
setting the readahead buffer using blockdev, from 256 to 32767.

time /root/dd conv=nocreat ibs=4096 obs=4096 if=/sam/cache/test/test3 of=/dev/null

The best real elapsed time I get is about 23.5s.

If I dd the same amount of data from the disk device itself, I get about
18.5s, which matches what hdparm -tT gives me.

Comparing strace outputs, I can see the read system calls reading from
ext3 take 30-35% longer to complete compared to the raw device.

Is this something expected or can I expect better performance? I am
running kernel.org kernel 2.6.12.

Thanks,
...
ling

From Martin at lichtvoll.de  Fri Jul 14 15:47:22 2006
From: Martin at lichtvoll.de (Martin Steigerwald)
Date: Fri, 14 Jul 2006 17:47:22 +0200
Subject: Write barrier support in ext3
Message-ID: <200607141747.22884.Martin@lichtvoll.de>

Hello ext3 users and developers,

I am gathering information for an article about journal filesystems with
emphasis on write barrier functionality, how it works, why journalling
filesystems need write barriers and the current implementation of write
barrier support for different filesystems.

Background of this is my own experience of three XFS crashes in one week:

http://bugzilla.kernel.org/show_bug.cgi?id=6380

With 2.6.17.1 XFS seems to work stably with write caches after applying a
(write cache unrelated) fix:

http://bugzilla.kernel.org/show_bug.cgi?id=6757

But I like to provide information on ext3, jfs, reiserfs 3 and reiser 4
as well.

I like to ask you:

1) Since which kernel release are write barriers officially supported and
   stable in ext3? Is it 2.6.16? I found the barrier option in
   filesystem/ext3.txt in my 2.6.17 kernel.

2) Since which kernel release are write barriers enabled by default in
   ext3, if any?

3) Are there any performance measurements for ext3? It is expected that
   write barriers will be slower than no write barriers but faster than
   disabled write caches.

4) Have there been any issues regarding write barrier support in ext3
   that are worth mentioning in the article?

If you have any links to relevant pieces of information, please share
them with me. Nonetheless I will continue grepping kernel changelogs and
the internet until I have the information I want for that article.

Please CC to me personally as I am not subscribed to the list...

Regards,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

From sct at redhat.com  Fri Jul 14 21:47:51 2006
From: sct at redhat.com (Stephen C. Tweedie)
Date: Fri, 14 Jul 2006 22:47:51 +0100
Subject: Ext3 overhead vs Raw
In-Reply-To: <44B6A953.7010209@aliko.com>
References: <44B6A953.7010209@aliko.com>
Message-ID: <1152913671.13275.38.camel@sisko.sctweedie.blueyonder.co.uk>

Hi,

On Thu, 2006-07-13 at 15:13 -0500, Ling C. Ho wrote:

> If I dd the same amount of data from the disk device itself, I get about
> 18.5s, which matches what hdparm -tT gives me.

Be aware, disks typically have different performance depending on where
the data is, with data on the outermost cylinders getting higher
throughput than data on innermost cylinders (there's constant rotational
velocity for the surface, but the outer tracks are longer so each
rotation carries more data past the heads.)

So all sorts of things like the exact data placement can come into
effect. Are you sure you're using the same bits of the disk for the raw
and filesystem cases?

--Stephen

From ling at aliko.com  Fri Jul 14 22:51:10 2006
From: ling at aliko.com (Ling C. Ho)
Date: Fri, 14 Jul 2006 17:51:10 -0500
Subject: Ext3 overhead vs Raw
In-Reply-To: <1152913671.13275.38.camel@sisko.sctweedie.blueyonder.co.uk>
References: <44B6A953.7010209@aliko.com> <1152913671.13275.38.camel@sisko.sctweedie.blueyonder.co.uk>
Message-ID: <44B81FDE.9020906@aliko.com>

Hi Stephen,

That's a great point. I recreated the filesystem again, using default
options, and then created a directory. Here is some info:

1864 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group

# mount /dev/hdb /sam/cache
# ls -ldi /sam/cache/test
11485185 drwxr-xr-x 2 root root 4096 Jul 14 17:39 /sam/cache/test

The directory inode is in the ~701st block group if not mistaken, which
is nowhere near the beginning of the filesystem.
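
[Editor's sketch] As a rough check of the arithmetic above: ext2/ext3
inode numbers start at 1 and are assigned to block groups in runs of
"inodes per group", so the group index is (inode - 1) / inodes_per_group.
The Python below applies that to the figures quoted in this message. The
function and constant names are made up for illustration and do not come
from the thread.

```python
# Figures quoted in the message above (mke2fs summary and `ls -ldi` output).
INODES_PER_GROUP = 16384
BLOCK_GROUPS = 1864
DIR_INODE = 11485185


def inode_to_group(inode: int, inodes_per_group: int) -> int:
    """Return the 0-based block group that holds `inode`.

    Inode numbers are 1-based, hence the `- 1` before dividing.
    """
    return (inode - 1) // inodes_per_group


group = inode_to_group(DIR_INODE, INODES_PER_GROUP)
fraction = group / BLOCK_GROUPS
print(f"block group {group} of {BLOCK_GROUPS}, {fraction:.1%} into the device")
```

With these numbers the directory's inode falls in group 701 (counting
from 0) out of 1864, a bit over a third of the way into the device,
which agrees with the estimate given in the message.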

This looks to me like it has changed since kernel 2.4. But is it still
true that any files being created under the directory will still have
the data written into free space in the same block group as the
directory, and onwards?

So, how does it work now? Is a directory randomly placed now even on an
empty file system? Is there any way to force it to be created near the
beginning of the file system, thus towards the outermost cylinders?

The application I am working with only uses one directory on a file
system, so I don't really care where it is placed. But for performance
tests, like the one versus raw access, it would be nice to test against
a file written at the beginning of the filesystem.

Thanks,
...
ling

Stephen C. Tweedie wrote:

>Hi,
>
>On Thu, 2006-07-13 at 15:13 -0500, Ling C. Ho wrote:
>
>
>
>>If I dd the same amount of data from the disk device itself, I get about
>>18.5s, which matches what hdparm -tT gives me.
>>
>>
>
>Be aware, disks typically have different performance depending on where
>the data is, with data on the outermost cylinders getting higher
>throughput than data on innermost cylinders (there's constant rotational
>velocity for the surface, but the outer tracks are longer so each
>rotation carries more data past the heads.)
>
>So all sorts of things like the exact data placement can come into
>effect. Are you sure you're using the same bits of the disk for the raw
>and filesystem cases?
>
>--Stephen
>
>
>
>

From zeremski.boris at nsinfo.co.yu  Mon Jul 17 05:13:43 2006
From: zeremski.boris at nsinfo.co.yu (Zeremski Boris)
Date: Mon, 17 Jul 2006 07:13:43 +0200
Subject: detail explain of file creation process
In-Reply-To: <20060714052057.GL15380@schatzie.adilger.int>
Message-ID: <200607170514.k6H5E4FQ031473@mx1.redhat.com>

Thanks,

I will try to spend some time to solve this problem,....

If you have any suggestions, please feel free to tell me. Any help is
welcome.

Bye

> -----Original Message-----
> From: Andreas Dilger [mailto:adilger at clusterfs.com]
> Sent: Friday, July 14, 2006 7:21 AM
> To: Zeremski Boris
> Cc: 'christian'; Ext3-users at redhat.com
> Subject: Re: detail explain of file creation process
>
> On Jul 14, 2006 07:09 +0200, Zeremski Boris wrote:
> > Hi, this link is great, it explains the basic concepts of the ext2/3 file
> > system (inode, directory, soft/hard links...).
> >
> > What I am interested in is a more detailed process of creating a file.
> > What is going on when, for example, I run 'touch test.file', until that
> > file really starts existing on the file system?
> >
> > Where can I find this kind of information?
>
> If you run UML with GDB, you can set a breakpoint at "sys_open" and follow
> it around from there. Also of interest are ext3_lookup, ext3_create.
>
> If you don't find any documentation, you might consider writing a wiki
> page for this as you figure it out.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.

From mfaine at knology.net  Wed Jul 19 12:10:56 2006
From: mfaine at knology.net (Mark F)
Date: Wed, 19 Jul 2006 07:10:56 -0500
Subject: create very large file system
Message-ID:

Suse Linux Enterprise Server 9 SP3

I've tried to create a large 5TB file system using both reiserfs and
ext3 and both have failed.

I end up with only a 1.5TB file system. Does anyone know why this
doesn't work, what to do to fix it?

Others have suggested that only XFS or JFS will work. Is this so?

Thanks,
-Mark

From ulf at autotradecenter.com  Thu Jul 20 00:00:19 2006
From: ulf at autotradecenter.com (Ulf Zimmermann)
Date: Wed, 19 Jul 2006 17:00:19 -0700
Subject: Problems under Redhat EL3 and ext3
Message-ID: <5DE4B7D3E79067418154C49A739C1251D81503@msmpk01.corp.autc.com>

I am running into performance issues with ext3.
Historically we had our image files (pictures of cars, currently 5.3
million) subdivided into a directory structure
[0-9]/[0-9]/[0-9]/[0-9], where we would take the first 4 letters/numbers
of the file name and use that to put it into this structure. Letters
[a-cA-C] would become a 0, [d-fD-F] a 1, etc. As the file names used to
be based on VIN numbers of vehicles, that wasn't a problem. But then our
developers changed the image file names to use a vehicle ID from the
database. And as we rolled over 1,000,000 in vehicle ids we would get
large numbers of files in some directories. And files do not get well
distributed.

So we changed the method to use [0-9a-f]/[0-9a-f]/[0-9a-f] and md5 on
the file name, then using the first 3 letters/numbers to file it away.
On initial testing this worked well, with a nice distribution across the
directories, so we could split this onto separate file systems or disks.

When we actually got to do this, a decision was made to use hard links
from the old structure to the new structure for backward compatibility.
And this turned into a disaster. Rsync or find on the new structure
takes dramatically longer, talking about 5 minutes for a find on the old
structure and hours on the new structure. Using strace I tracked it down
to lstat64. On the old structure lstat64 takes on average 37 usecs/call
while on the new structure it is over 2,400 usecs/call.

EL4 does not seem to have this problem; unfortunately I can't just
upgrade, for other reasons. So does anyone have ideas why lstat64 would
be so much slower on the new structure? Any help, hints, suggestions
would be great.

Regards, Ulf.

---------------------------------------------------------------------
Autotradecenter.com Inc,           T: 650-532-6382, F: 650-532-6441
4600 Bohannon Drive, Suite 100, Menlo Park, CA 94025
---------------------------------------------------------------------

From adilger at clusterfs.com  Thu Jul 20 06:26:46 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 20 Jul 2006 02:26:46 -0400
Subject: create very large file system
In-Reply-To: <200607191657.38644.zam@namesys.com>
References: <200607191657.38644.zam@namesys.com>
Message-ID: <20060720062646.GA6174@schatzie.adilger.int>

On Jul 19, 2006  16:57 +0400, Alexander Zarochentsev wrote:
> On Wednesday 19 July 2006 16:10, Mark F wrote:
> > I've tried to create a large 5TB file system using both reiserfs and
> > ext3 and both have failed.
>
> you might need to convert the partition table to GPT format for
> supporting 2TB+ partitions. it can be done by the gnu parted tool.

Or, for that matter, don't use a partition table at all, since this
adds an unhelpful offset to all the filesystem structures and can
hurt performance on RAID where the filesystem is trying to align IO
to RAID stripe boundaries.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From adilger at clusterfs.com  Thu Jul 20 07:17:44 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 20 Jul 2006 03:17:44 -0400
Subject: Problems under Redhat EL3 and ext3
In-Reply-To: <5DE4B7D3E79067418154C49A739C1251D81503@msmpk01.corp.autc.com>
References: <5DE4B7D3E79067418154C49A739C1251D81503@msmpk01.corp.autc.com>
Message-ID: <20060720071744.GE6174@schatzie.adilger.int>

On Jul 19, 2006  17:00 -0700, Ulf Zimmermann wrote:
> I am running into performance issues with ext3.
> Historically we had our
> image files (pictures of cars, currently 5.3 million) sub divided into a
> directory structure [0-9]/[0-9]/[0-9]/[0-9], where we would take the
> first 4 letters/numbers of the file name and use that to put it into
> this structure. Letters [a-cA-C] would become a 0, [d-fD-F] a 1, etc. As
> the file names used to be based on VIN numbers of vehicles, that wasn't
> a problem. But then our developers changed the image file names using a
> vehicle ID from the database. And as we rolled over 1,000,000 in vehicle
> ids we would get large numbers of files into directories. And files do
> not get well distributed.
>
> So we changed the method using [0-9a-f]/[0-9a-f]/[0-9a-f] and md5 on the
> file name, using then the first 3 letters/numbers to file it away. On
> initial testing this worked well, distribution nice across the
> directories, so we could split this on separate file systems or disks.
>
> When we actually got to do this, a decision was made to use hard links
> from the old structure to the new structure for backward compatibility.
> And this turned into a disaster. Rsync or find on the new structure takes
> dramatically longer, talking about 5 minutes for a find on the old
> structure and hours on the new structure. Using strace I tracked it down
> to lstat64. On the old structure lstat64 takes on average 37 usecs/call
> while on the new structure it is over 2,400 usecs/call.
>
> EL4 does not seem to have this problem, unfortunately I can't just
> upgrade, out of other reasons. So anyone have ideas why lstat64 would be
> so much slower on the new structure? Any help, hints, suggestions would
> be great.

Do you have directories with more than, say, 10-15,000 entries?
Do you have dir_index (directory indexing) feature enabled on your
filesystem? This is done with "tune2fs -O dir_index" (even while
mounted) but only affects new directories.
I believe the RHEL3 code has this functionality, but it isn't enabled by
default like I suspect it is on FC4.

Once you have enabled this, then an OFFLINE run of "e2fsck -fD {dev}"
will rebuild the directory indexes for existing directories.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From ulf at autotradecenter.com  Thu Jul 20 07:24:41 2006
From: ulf at autotradecenter.com (Ulf Zimmermann)
Date: Thu, 20 Jul 2006 00:24:41 -0700
Subject: Problems under Redhat EL3 and ext3
Message-ID: <5DE4B7D3E79067418154C49A739C1251D8150F@msmpk01.corp.autc.com>

> -----Original Message-----
> From: Andreas Dilger [mailto:adilger at clusterfs.com]
> Sent: 07/20/2006 12:18 AM
> To: Ulf Zimmermann
> Cc: ext3-users at redhat.com
> Subject: Re: Problems under Redhat EL3 and ext3
>
> On Jul 19, 2006 17:00 -0700, Ulf Zimmermann wrote:
> > I am running into performance issues with ext3. Historically we had our
> > image files (pictures of cars, currently 5.3 million) sub divided into a
> > directory structure [0-9]/[0-9]/[0-9]/[0-9], where we would take the
> > first 4 letters/numbers of the file name and use that to put it into
> > this structure. Letters [a-cA-C] would become a 0, [d-fD-F] a 1, etc. As
> > the file names used to be based on VIN numbers of vehicles, that wasn't
> > a problem. But then our developers changed the image file names using a
> > vehicle ID from the database. And as we rolled over 1,000,000 in vehicle
> > ids we would get large numbers of files into directories. And files do
> > not get well distributed.
> >
> > So we changed the method using [0-9a-f]/[0-9a-f]/[0-9a-f] and md5 on the
> > file name, using then the first 3 letters/numbers to file it away. On
> > initial testing this worked well, distribution nice across the
> > directories, so we could split this on separate file systems or disks.
> >
> > When we actually got to do this, a decision was made to use hard links
> > from the old structure to the new structure for backward compatibility.
> > And this turned into a disaster. Rsync or find on the new structure takes
> > dramatically longer, talking about 5 minutes for a find on the old
> > structure and hours on the new structure. Using strace I tracked it down
> > to lstat64. On the old structure lstat64 takes on average 37 usecs/call
> > while on the new structure it is over 2,400 usecs/call.
> >
> > EL4 does not seem to have this problem, unfortunately I can't just
> > upgrade, out of other reasons. So anyone have ideas why lstat64 would be
> > so much slower on the new structure? Any help, hints, suggestions would
> > be great.
>
> Do you have directories with more than, say, 10-15,000 entries?
> Do you have dir_index (directory indexing) feature enabled on your
> filesystem? This is done with "tune2fs -O dir_index" (even while
> mounted) but only affects new directories. I believe the RHEL3 code
> has this functionality, but it isn't enabled by default like I
> suspect it is on FC4.

The filesystem was created under EL3. I am currently copying everything
in the new structure into a new directory and it seems to be fast. My
plan at this point is to rename the hard linked new structure at the
end, and use that copy. I did run e2fsck -D on one of the nodes but that
did not help.

Hmmm, I just ran "tune2fs -O dir_index" on one node, and tune2fs -l does
show dir_index enabled now. But I am not sure if that will help, as
getdents64 wasn't showing much difference in a strace -c; lstat64 on the
other hand did.

>
> Once you have enabled this, then an OFFLINE run of "e2fsck -fD {dev}"
> will rebuild the directory indexes for existing directories.
>
> Cheers, Andreas
> --
> Andreas Dilger
> Principal Software Engineer
> Cluster File Systems, Inc.
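
[Editor's sketch] For readers following the thread, the md5-based layout
described earlier (three single-character hex directory levels taken
from the start of the md5 of the file name) could look roughly like the
Python below. It illustrates the idea only and is not the poster's code:
the function name, the UTF-8 encoding of the name, and hashing the full
file name including its extension are all assumptions.

```python
import hashlib


def shard_path(filename: str, depth: int = 3) -> str:
    """Return a relative path such as 'a/3/f/<filename>'.

    The directory levels are the first `depth` hex characters of the
    md5 digest of the file name (assumed here to be UTF-8 encoded).
    """
    digest = hashlib.md5(filename.encode("utf-8")).hexdigest()
    levels = list(digest[:depth])  # one directory level per hex character
    return "/".join(levels + [filename])


# Hypothetical file name, used only to show the call.
print(shard_path("1000001.jpg"))
```

Three hex levels give 16^3 = 4096 leaf directories, so 5.3 million files
spread evenly would come to roughly 1,300 files per directory.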

From tytso at mit.edu  Thu Jul 20 18:25:29 2006
From: tytso at mit.edu (Theodore Tso)
Date: Thu, 20 Jul 2006 14:25:29 -0400
Subject: Problems under Redhat EL3 and ext3
In-Reply-To: <5DE4B7D3E79067418154C49A739C1251D8150F@msmpk01.corp.autc.com>
References: <5DE4B7D3E79067418154C49A739C1251D8150F@msmpk01.corp.autc.com>
Message-ID: <20060720182529.GB6634@thunk.org>

On Thu, Jul 20, 2006 at 12:24:41AM -0700, Ulf Zimmermann wrote:
> The filesystem was created under EL3. I am currently copying everything
> in the new structure into a new directory and it seems to be fast. My
> plan at this point is to rename the hard linked new structure at the
> end, and use that copy. I did run on one of the nodes e2fsck -D but that
> did not help.

e2fsck -D, or e2fsck -fD? You need the -f option in order to force
e2fsck to scan the whole filesystem and optimize all filesystems.

                                                - Ted

From ulf at autotradecenter.com  Thu Jul 20 19:07:23 2006
From: ulf at autotradecenter.com (Ulf Zimmermann)
Date: Thu, 20 Jul 2006 12:07:23 -0700
Subject: Problems under Redhat EL3 and ext3
Message-ID: <5DE4B7D3E79067418154C49A739C1251D81515@msmpk01.corp.autc.com>

> -----Original Message-----
> From: Theodore Tso [mailto:tytso at mit.edu]
> Sent: 07/20/2006 11:25 AM
> To: Ulf Zimmermann
> Cc: Andreas Dilger; ext3-users at redhat.com
> Subject: Re: Problems under Redhat EL3 and ext3
>
> On Thu, Jul 20, 2006 at 12:24:41AM -0700, Ulf Zimmermann wrote:
> > The filesystem was created under EL3. I am currently copying everything
> > in the new structure into a new directory and it seems to be fast. My
> > plan at this point is to rename the hard linked new structure at the
> > end, and use that copy. I did run on one of the nodes e2fsck -D but that
> > did not help.
>
> e2fsck -D, or e2fsck -fD? You need the -f option in order to force
> e2fsck to scan the whole filesystem and optimize all filesystems.
>
> - Ted

On the one node I did, it was -D, which did do a forced check, not
because I specified -f, but because the file system hadn't been checked
in > 192 days.

Ulf.

From ulf at autotradecenter.com  Thu Jul 20 19:10:25 2006
From: ulf at autotradecenter.com (Ulf Zimmermann)
Date: Thu, 20 Jul 2006 12:10:25 -0700
Subject: Problems under Redhat EL3 and ext3
Message-ID: <5DE4B7D3E79067418154C49A739C1251D81516@msmpk01.corp.autc.com>

> -----Original Message-----
> From: ext3-users-bounces at redhat.com [mailto:ext3-users-bounces at redhat.com]
> On Behalf Of Ulf Zimmermann
> Sent: 07/20/2006 12:07 PM
> To: Theodore Tso
> Cc: Andreas Dilger; ext3-users at redhat.com
> Subject: RE: Problems under Redhat EL3 and ext3
>
> > -----Original Message-----
> > From: Theodore Tso [mailto:tytso at mit.edu]
> > Sent: 07/20/2006 11:25 AM
> > To: Ulf Zimmermann
> > Cc: Andreas Dilger; ext3-users at redhat.com
> > Subject: Re: Problems under Redhat EL3 and ext3
> >
> > On Thu, Jul 20, 2006 at 12:24:41AM -0700, Ulf Zimmermann wrote:
> > > The filesystem was created under EL3. I am currently copying everything
> > > in the new structure into a new directory and it seems to be fast. My
> > > plan at this point is to rename the hard linked new structure at the
> > > end, and use that copy. I did run on one of the nodes e2fsck -D but that
> > > did not help.
> >
> > e2fsck -D, or e2fsck -fD? You need the -f option in order to force
> > e2fsck to scan the whole filesystem and optimize all filesystems.
> >
> > - Ted
>
> On the one node I did, it was -D, which did do a forced check, not
> because I specified -f, but because the file system hadn't been checked
> in > 192 days.
>
> Ulf.

The one other thing I hadn't answered before: each directory has on
average 1,293 files, with a deviation of less than 100 in each
direction. In the old structure some directories had over 50,000 files
and it didn't seem to slow it down.
Dir_index was not enabled on the systems, so I enabled it on one node,
waiting for something to finish before I can unmount it and run e2fsck
-fD on it.

Ulf.

From adilger at clusterfs.com  Thu Jul 20 15:02:09 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 20 Jul 2006 11:02:09 -0400
Subject: create very large file system
In-Reply-To: <200607201317.54566.chrivers@iversen-net.dk>
References: <200607191657.38644.zam@namesys.com> <20060720062646.GA6174@schatzie.adilger.int> <200607201317.54566.chrivers@iversen-net.dk>
Message-ID: <20060720150209.GA5299@schatzie.adilger.int>

On Jul 20, 2006  13:17 +0200, Christian Iversen wrote:
> On Thursday 20 July 2006 08:26, Andreas Dilger wrote:
> > On Jul 19, 2006 16:57 +0400, Alexander Zarochentsev wrote:
> > > On Wednesday 19 July 2006 16:10, Mark F wrote:
> > > > I've tried to create a large 5TB file system using both reiserfs and
> > > > ext3 and both have failed.
> > >
> > > you might need to convert the partition table to GPT format for
> > > supporting 2TB+ partitions. it can be done by the gnu parted tool.
> >
> > Or, for that matter, don't use a partition table at all, since this
> > adds an unhelpful offset to all the filesystem structures and can
> > hurt performance on RAID where the filesystem is trying to align IO
> > to RAID stripe boundaries.
>
> Can linux still auto-detect raid volumes if there's no partition table?

Hmm, that I'm not sure of - we mostly deal with external RAID devices.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From zam at namesys.com  Wed Jul 19 12:57:38 2006
From: zam at namesys.com (Alexander Zarochentsev)
Date: Wed, 19 Jul 2006 16:57:38 +0400
Subject: create very large file system
In-Reply-To:
References:
Message-ID: <200607191657.38644.zam@namesys.com>

On Wednesday 19 July 2006 16:10, Mark F wrote:
> Suse Linux Enterprise Server 9 SP3
>
> I've tried to create a large 5TB file system using both reiserfs and
> ext3 and both have failed.
how did they fail? > > I end up with only a 1.5TB file system. Does anyone know why this > doesn't work, what to do to fix it? you have a single 5TB device? h/w raid, I think ? you might need to convert the partition table to GPT format for supporting 2TB+ partitions. it can be done by the gnu parted tool. > Others have suggested that only XFS or JFS will work. Is this so? > > Thanks, > -Mark -- Alex. From mcguire at lzu.edu.cn Thu Jul 20 00:30:12 2006 From: mcguire at lzu.edu.cn (mcguire at lzu.edu.cn) Date: Thu, 20 Jul 2006 08:30:12 +0800 Subject: [RTLWS8-CFP] Eighth Real-Time Linux Workshop 2nd CFP Message-ID: <200607200030.k6K0UCLq021220@opentech.lzu.edu.cn> We apologize for multiple receipts. -------------------------------------------------------------------------------- Eighth Real-Time Linux Workshop October 12-15, 2006 Lanzhou University - SISE Tianshui South Road 222 Lanzhou, Gansu 730000 P.R.China General Following the meetings of developers and users at the previous 7 successful real-time Linux workshops held in Vienna, Orlando, Milano, Boston, and Valencia, Singapore, Lille, the Real-Time Linux Workshop for 2006 will come back to Asia again, to be held at the School for Information Science and Engineering, Lanzhou University, in Lanzhou China. Embedded and real-time Linux is rapidly gaining traction in the Asia Pacific region. Embedded systems in both automation/control and entertainment moving to 32/64bit systems, opening the door for the use of full featured OS like GNU/Linux on COTS based systems. With real-time capabilities being a common demand for embedded systems the soft and hard real-time variants are an important extension to the versatile GNU/Linux GPOS. Authors are invited to submit original work dealing with general topics related to real-time Linux research, experiments and case studies, as well as issues of integration of real-time and embedded Linux. A special focus will be on industrial case studies. 
Topics of interest include, but are not limited to: * Modifications and variants of the GNU/Linux operating system extending its real-time capabilities, * Contributions to real-time Linux variants, drivers and extensions, * User-mode real-time concepts, implementation and experience, * Real-time Linux applications, in academia, research and industry, * Work in progress reports, covering recent developments, * Educational material on real-time Linux, * Tools for embedding Linux or real-time Linux and embedded real-time Linux applications, * RTOS core concepts, RT-safe synchronization mechanisms, * RT-safe interaction of RT and non RT components, * IPC mechanisms in RTOS, * Analysis and Benchmarking methods and results of real-time GNU/Linux variants, * Debugging techniques and tools, both for code and temporal debugging of core RTOS components, drivers and real-time applications, * Real-time related extensions to development environments. Further information: EN: http://www.realtimelinuxfoundation.org/events/rtlws-2006/ws.html CN: http://dslab.lzu.edu.cn/rtlws8/index.html Awarded papers The Programme Committee will award a best paper in the category Real-Time Systems Theory. This best paper will be invited for publication to the Real-Time Systems Journal, RTSJ. The Programme Committee will award a best paper in the category Real-Time Systems Application. This best paper will be invited for publication to the Dr Dobbs Journal. Moreover, the publication of the other papers in a special issue of Dr Dobbs Journal is under discussion. Abstract submission In order to register an abstract, please go to: http://www.realtimelinuxfoundation.org/rtlf/register-abstract.html Venue Lanzhou University Information Building, School of Information Science and Engineering, Lanzhou University, http://www.lzu.edu.cn/.
Registration In order to participate in the workshop, please register on the registration page at: http://www.realtimelinuxfoundation.org/rtlf/register-participant.html Accommodation Please refer to the Lanzhou hotel page for accommodation at http://dslab.lzu.edu.cn/rtlws8/hotels/hotels.htm Travel information For travel information and directions on how to get to Lanzhou from an international airport in China please refer to: http://www.realtimelinuxfoundation.org/events/rtlws-2006/ Important dates August 28: Abstract submission September 15: Notification of acceptance September 29: Final paper Panel Participants: o Roberto Bucher - Scuola Universitaria Professionale della Svizzera Italiana, Switzerland, RTAI/ADEOS/RTAI-Lab. o Alfons Crespo Lorente - University of Valencia, Spain, Departament d'Informàtica de Sistemes i Computadors, XtratuM. o Herman Haertig - Technical University Dresden, Germany, Institute for System Architecture, L4/Fiasco/L4Linux. o Nicholas Mc Guire - Lanzhou University, P.R. China, Distributed and Embedded Systems Lab, RTLinux/GPL. o Douglas Niehaus - University of Kansas, USA, Information and Telecommunication Technology Center, RT-preempt. Organization committee: * Prof. Li LIAN (Co-Chair), (SISE, Lanzhou University, CHINA) * Xiaoping ZHANG, LZU, CHINA * Jiming WANG, PKU, CHINA * Zhibing LI, ECNU, China * Prof. Nicholas MCGUIRE (Co-Chair), Real Time Linux Foundation (RTLF) * Dr. Peter WURMSDOBLER, Real Time Linux Foundation (RTLF) * Dr. Qingguo ZHOU, (Distributed and Embedded Systems Lab, Lanzhou University, CHINA) Program committee: * Prof. Li Xing (Co-Chair), (Tsinghua University, CHINA) * Dr. Zhang Yunquan, (Institute of Software, Chinese Academy of Science, CHINA) * Dr. Chen Yu, (Tsinghua University, CHINA) * Dr. Chen Maoke, (Tsinghua University, CHINA) * Dr. Yu Guanghui, (Dalian University of Technology, CHINA) * Prof. Dr. Paolo Mantegazza, (Dipartimento di Ingegneria Aerospaziale, ITALY) * Prof. Dr.
Bernhard Zagar, (Johannes Kepler Universität Linz, AUSTRIA) * Prof. Dr. Hermann Härtig, (Technische Universität Dresden, Fakultät Informatik, GERMANY) * Prof. Tei-Wei Kuo, (National Taiwan University, Department of Computer Science and Information Engineering, TAIWAN) * Anthony Skjellum, (Mississippi State University, USA) * Ing. Pavel Pisa, (Czech Technical University, CZECH REPUBLIC) * Prof. Alfons Crespo, (Universidad Politécnica de Valencia, SPAIN) * Dr. Qingguo Zhou, (Lanzhou University, CHINA) * PhD. Jaesoon Choi, (National Cancer Center, KOREA) * Prof. Douglas Niehaus, (Kansas University, USA) * Dr. Michael Hohmuth, (Technische Universität Dresden, GERMANY) * Prof. Thambipillai Srikanthan, (Nanyang Technological University, SINGAPORE) * Zhengting He, (University of Texas, USA) * Martin Terbuc, (University of Maribor, SLOVENIA) * Yoshinori Sato, (the H8/300 project, JAPAN) * Yuqing Lan, (China Standard Software Co., LTD, CHINA) * Dr. Peter Wurmsdobler, (Real Time Linux Foundation, USA) * Prof. Nicholas Mc Guire (Co-Chair), (Lanzhou University, CHINA) Workshop organizers: * School for Information Science and Engineering (SISE), Lanzhou University, CHINA * IBM China, Xi'an Branch, China * Haag Embedded Systems, Austria Peter Wurmsdobler Nicholas Mc Guire Zhou Qingguo From chrivers at iversen-net.dk Thu Jul 20 11:17:54 2006 From: chrivers at iversen-net.dk (Christian Iversen) Date: Thu, 20 Jul 2006 13:17:54 +0200 Subject: create very large file system In-Reply-To: <20060720062646.GA6174@schatzie.adilger.int> References: <200607191657.38644.zam@namesys.com> <20060720062646.GA6174@schatzie.adilger.int> Message-ID: <200607201317.54566.chrivers@iversen-net.dk> On Thursday 20 July 2006 08:26, Andreas Dilger wrote: > On Jul 19, 2006 16:57 +0400, Alexander Zarochentsev wrote: > > On Wednesday 19 July 2006 16:10, Mark F wrote: > > > I've tried to create a large 5TB file system using both reiserfs and > > > ext3 and both have failed.
> > > > you might need to convert the partition table to GPT format for > > supporting 2TB+ partitions. it can be done by the gnu parted tool. > > Or, for that matter, don't use a partition table at all, since this > adds an unhelpful offset to all the filesystem structures and can > hurt performance on RAID where the filesystem is trying to align IO > to RAID stripe boundaries. Can linux still auto-detect raid volumes if there's no partition table? -- Regards, Christian Iversen From jbriggs at esoft.com Thu Jul 20 16:22:55 2006 From: jbriggs at esoft.com (Jonathan Briggs) Date: Thu, 20 Jul 2006 10:22:55 -0600 Subject: create very large file system In-Reply-To: <200607201317.54566.chrivers@iversen-net.dk> References: <200607191657.38644.zam@namesys.com> <20060720062646.GA6174@schatzie.adilger.int> <200607201317.54566.chrivers@iversen-net.dk> Message-ID: <1153412575.9802.7.camel@localhost> On Thu, 2006-07-20 at 13:17 +0200, Christian Iversen wrote: > On Thursday 20 July 2006 08:26, Andreas Dilger wrote: > > On Jul 19, 2006 16:57 +0400, Alexander Zarochentsev wrote: > > > On Wednesday 19 July 2006 16:10, Mark F wrote: > > > > I've tried to create a large 5TB file system using both reiserfs and > > > > ext3 and both have failed. > > > > > > you might need to convert the partition table to GPT format for > > > supporting 2TB+ partitions. it can be done by the gnu parted tool. > > > > Or, for that matter, don't use a partition table at all, since this > > adds an unhelpful offset to all the filesystem structures and can > > hurt performance on RAID where the filesystem is trying to align IO > > to RAID stripe boundaries. > > Can linux still auto-detect raid volumes if there's no partition table? You're not supposed to be doing it that way these days. RAID autodetect is getting tossed out of the kernel in the future (probably still many versions away though), and RAID, DM, LVM, and maybe even regular partition setup is going to be done in initramfs / initrd. 
At least, that is what I read. -- Jonathan Briggs eSoft, Inc. From avuton at gmail.com Thu Jul 20 16:56:03 2006 From: avuton at gmail.com (Avuton Olrich) Date: Thu, 20 Jul 2006 09:56:03 -0700 Subject: create very large file system In-Reply-To: <1153412575.9802.7.camel@localhost> References: <200607191657.38644.zam@namesys.com> <20060720062646.GA6174@schatzie.adilger.int> <200607201317.54566.chrivers@iversen-net.dk> <1153412575.9802.7.camel@localhost> Message-ID: <3aa654a40607200956j37aeed0o66218c7ff94d815a@mail.gmail.com> On 7/20/06, Jonathan Briggs wrote: > You're not supposed to be doing it that way these days. RAID autodetect > is getting tossed out of the kernel in the future (probably still many Bit OT, but is there something that is supposed to replace RAID autodetect, or we're just supposed to make initscripts to run mdadm? -- avuton -- Anyone who quotes me in their sig is an idiot. -- Rusty Russell. From sct at redhat.com Fri Jul 21 15:37:22 2006 From: sct at redhat.com (Stephen Tweedie) Date: Fri, 21 Jul 2006 11:37:22 -0400 Subject: create very large file system In-Reply-To: References: Message-ID: <20060721153722.GA20270@devserv.devel.redhat.com> Hi, On Wed, Jul 19, 2006 at 07:10:56AM -0500, Mark F wrote: > Suse Linux Enterprise Server 9 SP3 > > I've tried to create a large 5TB file system using both reiserfs and ext3 > and both have failed. > > I end up with only a 1.5TB file system. Does anyone know why this doesn't > work, what to do to fix it? I fixed a bug in mke2fs that had this result over a year ago, so a recent e2fsprogs should fix it. Failing that, there's a workaround: use "mke2fs -b 4096" to prevent mke2fs from trying to work out the device size in units of 1k blocks. Counting in 4k blocks prevents a 32-bit overflow. --Stephen From mfaine at knology.net Fri Jul 21 15:58:44 2006 From: mfaine at knology.net (Mark F) Date: Fri, 21 Jul 2006 10:58:44 -0500 Subject: create very large file system In-Reply-To: <20060721153722.GA20270@devserv.devel.redhat.com> References: <20060721153722.GA20270@devserv.devel.redhat.com> Message-ID: Stephen Tweedie wrote: > Hi, > > On Wed, Jul 19, 2006 at 07:10:56AM -0500, Mark F wrote: >> Suse Linux Enterprise Server 9 SP3 >> >> I've tried to create a large 5TB file system using both reiserfs and ext3 >> and both have failed. >> >> I end up with only a 1.5TB file system. Does anyone know why this doesn't >> work, what to do to fix it? > > I fixed a bug in mke2fs that had this result over a year ago, so a > recent e2fsprogs should fix it.
> > Failing that, there's a workaround: use "mke2fs -b 4096" to prevent > mke2fs from trying to work out the device size in units of 1k blocks. > Counting in 4k blocks prevents a 32-bit overflow. > > --Stephen Thanks, I finally got it formated using the GPT label with parted. I formatted it reiserfs, it takes a few seconds to mount but seems to work fine and shows up at full size. -Mark From bandurin at fnal.gov Wed Jul 26 00:27:11 2006 From: bandurin at fnal.gov (Dmitry Bandurin) Date: Tue, 25 Jul 2006 19:27:11 -0500 Subject: data recovering in EXT3 Message-ID: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> Hello, We have run and stopped by chance command "fsck -y" on one of our raid disks (with ext3 file system). After that we have found that SOME files disappeared (they are not seen in the directories where they have been before). The data are extremely important and contain a lot of programs, scripts for some data analysis and very hard to recover by hands. I have run ''fsck -y" once more and it recovered just few files.. Is there any way, any tool that would allow to recover the data? Probably there is some specific options for the recovery ralated with journaling in ext3? I have used debugfs, it produced following, if it helps: debugfs: open -f -w /dev/sdb1 debugfs: features Filesystem features: has_journal resize_inode filetype needs_recovery sparse_super large_file thanks, Dmitry From jlb17 at duke.edu Thu Jul 27 10:58:01 2006 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Thu, 27 Jul 2006 06:58:01 -0400 (EDT) Subject: data recovering in EXT3 In-Reply-To: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> References: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> Message-ID: On Tue, 25 Jul 2006 at 7:27pm, Dmitry Bandurin wrote > We have run and stopped by chance command "fsck -y" on one of our raid disks > (with ext3 file system). 
After that we have found that SOME files disappeared > (they are not seen in the directories where they have been before). > The data are extremely important and contain a lot of programs, > scripts for some data analysis and very hard to recover by hands. > I have run ''fsck -y" once more and it recovered just few files.. > Is there any way, any tool that would allow to recover the data? > Probably there is some specific options for the recovery ralated with > journaling in ext3? Have you looked in the lost+found directory? That's my only idea, other than recovering the files from your backups. -- Joshua Baker-LePain Department of Biomedical Engineering Duke University From mr._x at shaw.ca Thu Jul 27 14:36:16 2006 From: mr._x at shaw.ca (..:::BeOS Mr. X:::..) Date: Thu, 27 Jul 2006 07:36:16 -0700 Subject: data recovering in EXT3 In-Reply-To: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> References: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> Message-ID: <44C8CF60.2090805@shaw.ca> http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html ---- Q: How can I recover (undelete) deleted files from my ext3 partition? Actually, you can't! This is what one of the developers, Andreas Dilger, said about it: In order to ensure that ext3 can safely resume an unlink after a crash, it actually zeros out the block pointers in the inode, whereas ext2 just marks these blocks as unused in the block bitmaps and marks the inode as "deleted" and leaves the block pointers alone. Your only hope is to "grep" for parts of your files that have been deleted and hope for the best. ---- You can try to contact Andreas Dilger and maybe he can help. adilger at clusterfs.com Mr. X Dmitry Bandurin wrote: > Hello, > > We have run and stopped by chance command "fsck -y" on one of our raid > disks > (with ext3 file system). After that we have found that SOME files > disappeared > (they are not seen in the directories where they have been before). 
> The data are extremely important and contain a lot of programs, > scripts for some data analysis and very hard to recover by hands. > I have run ''fsck -y" once more and it recovered just few files.. > Is there any way, any tool that would allow to recover the data? > Probably there is some specific options for the recovery ralated with > journaling in ext3? > > I have used debugfs, it produced following, if it helps: > debugfs: open -f -w /dev/sdb1 > debugfs: features > Filesystem features: has_journal resize_inode filetype needs_recovery > sparse_super large_file > > > thanks, > Dmitry > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > From gelma_mailinglist at gelma.net Thu Jul 27 15:15:38 2006 From: gelma_mailinglist at gelma.net (Andrea Gelmini) Date: Thu, 27 Jul 2006 17:15:38 +0200 Subject: data recovering in EXT3 In-Reply-To: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> References: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> Message-ID: <20060727151538.GF13602@jnb.gelma.net> On Tue, Jul 25, 2006 at 07:27:11PM -0500, Dmitry Bandurin wrote: > The data are extremely important and contain a lot of programs, > scripts for some data analysis and very hard to recover by hands. maybe it could help: http://dirk.eddelbuettel.com/blog/2006/07/20#ext3_undelete ciao, gelma From bandurin at fnal.gov Thu Jul 27 19:04:19 2006 From: bandurin at fnal.gov (Dmitry Bandurin) Date: Thu, 27 Jul 2006 14:04:19 -0500 Subject: data recovering in EXT3 In-Reply-To: <20060727151538.GF13602@jnb.gelma.net> References: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> <20060727151538.GF13602@jnb.gelma.net> Message-ID: <22a6e1680607271204l70329230h79ce53e78aa08852@mail.gmail.com> Thanks a lot! It looks as a really useful tool. I'll try... 
On 7/27/06, Andrea Gelmini wrote: > On Tue, Jul 25, 2006 at 07:27:11PM -0500, Dmitry Bandurin wrote: > > The data are extremely important and contain a lot of programs, > > scripts for some data analysis and very hard to recover by hands. > > maybe it could help: > http://dirk.eddelbuettel.com/blog/2006/07/20#ext3_undelete > > ciao, > gelma > From ext3 at jks.tupari.net Thu Jul 27 22:28:56 2006 From: ext3 at jks.tupari.net (Joseph Shraibman) Date: Thu, 27 Jul 2006 18:28:56 -0400 (EDT) Subject: maximums of ext3? Message-ID: Where can I find the maximums of ext3? Today I ran into trouble after a directory had 31999 subdirectories in it (not including . or ..). I know that ext3 can hold many more regular files than that, but where are the limits defined? From adilger at clusterfs.com Fri Jul 28 00:07:20 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Thu, 27 Jul 2006 18:07:20 -0600 Subject: maximums of ext3? In-Reply-To: References: Message-ID: <20060728000720.GP6452@schatzie.adilger.int> On Jul 27, 2006 18:28 -0400, Joseph Shraibman wrote: > Where can I find the maximums of ext3? Today I ran into trouble after a > directory had 31999 subdirectories in it (not including . or ..). I know > that ext3 can hold many more regular files than that, but where are the > limits defined? linux/Documentation/filesystems/ext2.txt is a good starting place, or wikipedia. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. 
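For readers who want to check how close a directory is to the limit Joseph ran into, here is a minimal sketch in Python. It assumes the commonly documented ext3 cap of 32000 links per inode. A directory starts with two links, and each subdirectory adds one to its parent. The function name `subdir_headroom` and the constant `ASSUMED_LINK_MAX` are made up for this illustration and do not come from the thread or from any kernel interface. Check the real limit against the kernel documentation Andreas points to.

```python
import os

# Assumption for illustration: ext3 is commonly documented as allowing at
# most 32000 links per inode. Verify against your kernel's documentation.
ASSUMED_LINK_MAX = 32000


def subdir_headroom(path, link_max=ASSUMED_LINK_MAX):
    """Estimate how many more subdirectories `path` could hold.

    A directory starts with 2 links ("." and its entry in the parent),
    and the ".." entry of every subdirectory adds one more link.
    """
    with os.scandir(path) as entries:
        subdirs = sum(1 for e in entries if e.is_dir(follow_symlinks=False))
    return link_max - 2 - subdirs


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(f"{target}: room for about {subdir_headroom(target)} more subdirectories")
```

Regular files do not count against this limit, since they add no link to the parent directory. That would be consistent with Joseph's observation that a directory can hold many more files than subdirectories.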
From bandurin at fnal.gov Fri Jul 28 01:26:48 2006 From: bandurin at fnal.gov (Dmitry Bandurin) Date: Thu, 27 Jul 2006 20:26:48 -0500 Subject: data recovering in EXT3 In-Reply-To: <20060727151538.GF13602@jnb.gelma.net> References: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> <20060727151538.GF13602@jnb.gelma.net> Message-ID: <22a6e1680607271826w30dede03u90bd820320f38058@mail.gmail.com> Hello, magicrescue is aimed at recovering many file types, but unfortunately it does not seem to contain recipes for recovering the most common kind, plain ASCII (text) files. Does anybody have such extensions to the standard magicrescue for restoring text files? Dmitry On 7/27/06, Andrea Gelmini wrote: > On Tue, Jul 25, 2006 at 07:27:11PM -0500, Dmitry Bandurin wrote: > > The data are extremely important and contain a lot of programs, > > scripts for some data analysis and very hard to recover by hands. > > maybe it could help: > http://dirk.eddelbuettel.com/blog/2006/07/20#ext3_undelete > > ciao, > gelma > From keld at dkuug.dk Fri Jul 28 06:30:53 2006 From: keld at dkuug.dk (Keld Jørn Simonsen) Date: Fri, 28 Jul 2006 08:30:53 +0200 Subject: data recovering in EXT3 In-Reply-To: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> References: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> Message-ID: <20060728063053.GA16598@rap.rap.dk> On Tue, Jul 25, 2006 at 07:27:11PM -0500, Dmitry Bandurin wrote: > Hello, > > We have run and stopped by chance command "fsck -y" on one of our raid disks > (with ext3 file system). After that we have found that SOME files > disappeared > (they are not seen in the directories where they have been before). > The data are extremely important and contain a lot of programs, > scripts for some data analysis and very hard to recover by hands. > I have run ''fsck -y" once more and it recovered just few files.. > Is there any way, any tool that would allow to recover the data?
> Probably there is some specific options for the recovery ralated with > journaling in ext3? > > I have used debugfs, it produced following, if it helps: > debugfs: open -f -w /dev/sdb1 > debugfs: features > Filesystem features: has_journal resize_inode filetype needs_recovery > sparse_super large_file You could look at my patched version of debugfs - http://std.dkuug.dk/keld/readme-salvage.html Best regards Keld From keld at dkuug.dk Fri Jul 28 11:39:02 2006 From: keld at dkuug.dk (Keld Jørn Simonsen) Date: Fri, 28 Jul 2006 13:39:02 +0200 Subject: data recovering in EXT3 In-Reply-To: <20060728063053.GA16598@rap.rap.dk> References: <22a6e1680607251727g2f11ad62g53e92dca8d042967@mail.gmail.com> <20060728063053.GA16598@rap.rap.dk> Message-ID: <20060728113902.GA22518@rap.rap.dk> On Fri, Jul 28, 2006 at 08:30:53AM +0200, Keld Jørn Simonsen wrote: > On Tue, Jul 25, 2006 at 07:27:11PM -0500, Dmitry Bandurin wrote: > > Hello, > > > > We have run and stopped by chance command "fsck -y" on one of our raid disks > > (with ext3 file system). After that we have found that SOME files > > disappeared > > (they are not seen in the directories where they have been before). > > The data are extremely important and contain a lot of programs, > > scripts for some data analysis and very hard to recover by hands. > > I have run ''fsck -y" once more and it recovered just few files.. > > Is there any way, any tool that would allow to recover the data? > > Probably there is some specific options for the recovery ralated with > > journaling in ext3? > > > > I have used debugfs, it produced following, if it helps: > > debugfs: open -f -w /dev/sdb1 > > debugfs: features > > Filesystem features: has_journal resize_inode filetype needs_recovery > > sparse_super large_file > > You could look at my patched version of debugfs - > http://std.dkuug.dk/keld/readme-salvage.html It saves files in the system by looking at the data blocks only.
It does not need a directory structure at all. So you could have deleted all files on the disk, like when you have reformatted it, and still you can salvage most of the files. Best regards Keld
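The FAQ advice quoted earlier in this thread, to "grep" for parts of your deleted files, can be tried without any special tool. The sketch below is only an illustration of that idea. It is not how the patched debugfs or magicrescue work internally. It scans a raw image for a byte string you remember from a lost file and reports the byte offset and block number of each hit. The function name `find_pattern`, the 4096-byte block size, and the chunk size are assumptions chosen for the example. Work on a read-only copy of the device (for instance an image made with dd), never on the mounted filesystem.

```python
def find_pattern(image_path, pattern, block_size=4096, chunk_size=1 << 20):
    """Yield (byte_offset, block_number) for every occurrence of `pattern`.

    The image is read in chunks; the last len(pattern) - 1 bytes of each
    buffer are carried over so matches spanning a chunk boundary are found.
    """
    if not pattern:
        raise ValueError("pattern must not be empty")
    overlap = len(pattern) - 1
    with open(image_path, "rb") as image:
        tail = b""
        base = 0  # offset within the image of the first byte of tail + chunk
        while True:
            chunk = image.read(chunk_size)
            if not chunk:
                break
            buf = tail + chunk
            start = 0
            while True:
                hit = buf.find(pattern, start)
                if hit < 0:
                    break
                yield base + hit, (base + hit) // block_size
                start = hit + 1
            # The carried-over tail is shorter than the pattern, so a match
            # can never be reported twice.
            tail = buf[-overlap:] if overlap else b""
            base += len(buf) - len(tail)


if __name__ == "__main__":
    import sys

    # Usage: python find_pattern.py IMAGE "text you remember from the file"
    for offset, block in find_pattern(sys.argv[1], sys.argv[2].encode()):
        print(f"offset {offset} (block {block})")
```

Once a block number is known, the surrounding blocks can be copied out of the image and inspected by hand. This only helps for files whose contents you can recognise, such as the scripts and programs Dmitry mentions.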