From mirjafarali at gmail.com Sat Mar 6 16:07:42 2010 From: mirjafarali at gmail.com (MirJafar Ali) Date: Sat, 6 Mar 2010 10:07:42 -0600 Subject: data block timestamp ? Message-ID: Hello I am interested in knowing the sequence of datablocks request/serviced by ext2 filesystem to analyse the disk IO pattern. Can someone suggest if there are some software that can do this ? Thanks Mir -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Mon Mar 8 17:31:03 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Mon, 08 Mar 2010 11:31:03 -0600 Subject: data block timestamp ? In-Reply-To: References: Message-ID: <4B953457.1050904@redhat.com> MirJafar Ali wrote: > Hello > > I am interested in knowing the sequence of datablocks request/serviced > by ext2 filesystem to analyse > the disk IO pattern. > > Can someone suggest if there are some software that can do this ? Depending on what you want, blktrace may be helpful, if you want to know when IO happens and to which blocks. -Eric > Thanks > > Mir From mjtrac at gmail.com Tue Mar 9 01:23:12 2010 From: mjtrac at gmail.com (Mitch Trachtenberg) Date: Mon, 8 Mar 2010 17:23:12 -0800 Subject: problems with large directories? Message-ID: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> Hi, I have an application that deals with 100,000 to 1,000,000 image files. I initially structured it to use multiple directories, so that file 123456 would be stored in /12/34/123456. I'm now wondering if that's pointless, as it would simplify things to simply store the file in /123456. Can anyone indicate whether I'm gaining anything by using smaller directories in ext3/ext4? Thanks. Mitch -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwheeler at redhat.com Tue Mar 9 03:14:03 2010 From: rwheeler at redhat.com (Ric Wheeler) Date: Mon, 08 Mar 2010 22:14:03 -0500 Subject: problems with large directories? In-Reply-To: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> Message-ID: <4B95BCFB.9010202@redhat.com> On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote: > Hi, > > I have an application that deals with 100,000 to 1,000,000 image files. > > I initially structured it to use multiple directories, so that file > 123456 would be stored in /12/34/123456. I'm now wondering if that's > pointless, as it would simplify things to simply store the file in /123456. > > Can anyone indicate whether I'm gaining anything by using smaller > directories in ext3/ext4? Thanks. > > Mitch > I think that breaking up your files into subdirectories makes it easier to navigate the tree and find files from a human point of view. Even better if the bytes reflect something like year/month/day/hour/min (assuming your pathname has a date based guid or similar encoding). You can have a million files in one large directory, but be careful to iterate and copy them in a sorted order (sorted by inode) to avoid nasty performance issues that are side effects of the way we hash file names in ext3/4. Good luck! Ric From sandeen at redhat.com Tue Mar 9 03:55:12 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Mon, 08 Mar 2010 21:55:12 -0600 Subject: problems with large directories? 
In-Reply-To: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> Message-ID: <4B95C6A0.8090502@redhat.com> Mitch Trachtenberg wrote: > Hi, > > I have an application that deals with 100,000 to 1,000,000 image files. > > I initially structured it to use multiple directories, so that file > 123456 would be stored in /12/34/123456. I'm now wondering if that's > pointless, as it would simplify things to simply store the file in /123456. > > Can anyone indicate whether I'm gaining anything by using smaller > directories in ext3/ext4? Thanks. > > Mitch > If you have one file per dir, that's a lot of dirs, and the time to search for new dir inode locations can get rather expensive as the fs fills, in my experience. You may also want to toy with setting the "topdir" flag on a dir; new directories -under- that topdir get spread around the block groups. New dirs under a non-topdir tend to stay closer to the parent. Finally, remember that ext2/3 has a limit of 32000 or so files per dir. ext4 lifts this restriction. -Eric From alex at alex.org.uk Tue Mar 9 07:11:32 2010 From: alex at alex.org.uk (Alex Bligh) Date: Tue, 09 Mar 2010 07:11:32 +0000 Subject: problems with large directories? In-Reply-To: <4B95C6A0.8090502@redhat.com> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> Message-ID: <4CD1DE054DB6ECC9FD485DD6@nimrod.local> --On 8 March 2010 21:55:12 -0600 Eric Sandeen wrote: > Finally, remember that ext2/3 has a limit of 32000 or so files per dir. My IMAP spool suggests this is false w.r.t. ext3 -- Alex Bligh From lists at nerdbynature.de Tue Mar 9 09:14:40 2010 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 9 Mar 2010 01:14:40 -0800 (PST) Subject: problems with large directories? In-Reply-To: <4CD1DE054DB6ECC9FD485DD6@nimrod.local> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> <4CD1DE054DB6ECC9FD485DD6@nimrod.local> Message-ID: On Tue, 9 Mar 2010 at 07:11, Alex Bligh wrote: > > Finally, remember that ext2/3 has a limit of 32000 or so files per dir. > My IMAP spool suggests this is false w.r.t. ext3 Did you use any special options when creating this ext3 filesystem? I've just tried but was only able to create 31998 directories in one directory. Christian. -- BOFH excuse #443: Zombie processes detected, machine is haunted. From bruno at wolff.to Tue Mar 9 13:16:46 2010 From: bruno at wolff.to (Bruno Wolff III) Date: Tue, 9 Mar 2010 07:16:46 -0600 Subject: problems with large directories? In-Reply-To: References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> <4CD1DE054DB6ECC9FD485DD6@nimrod.local> Message-ID: <20100309131646.GA17912@wolff.to> On Tue, Mar 09, 2010 at 01:14:40 -0800, Christian Kujau wrote: > On Tue, 9 Mar 2010 at 07:11, Alex Bligh wrote: > > > Finally, remember that ext2/3 has a limit of 32000 or so files per dir. > > My IMAP spool suggests this is false w.r.t. ext3 > > Did you use any special options when creating this ext3 filesystem? I've > just tried but was only able to create 31998 directories in one directory. You can create a lot of files (though things work slowly). I think there was a typo above and that it should have said dirs per dir, not files per dir. From alex at alex.org.uk Tue Mar 9 13:17:59 2010 From: alex at alex.org.uk (Alex Bligh) Date: Tue, 09 Mar 2010 13:17:59 +0000 Subject: problems with large directories? 
In-Reply-To: References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> <4CD1DE054DB6ECC9FD485DD6@nimrod.local> Message-ID: <74F787A596EC01458B9FB816@host122.msm.che.vodafone> --On 9 March 2010 01:14:40 -0800 Christian Kujau wrote: >> > Finally, remember that ext2/3 has a limit of 32000 or so files per dir. >> My IMAP spool suggests this is false w.r.t. ext3 > > Did you use any special options when creating this ext3 filesystem? I've > just tried but was only able to create 31998 directories in one directory. As Ted pointed out off list, the limit is the number of subdirectories in a directory, not the number of files in a directory. -- Alex Bligh From criley at erad.com Tue Mar 9 14:36:42 2010 From: criley at erad.com (Charles Riley) Date: Tue, 9 Mar 2010 09:36:42 -0500 (EST) Subject: Fwd: problems with large directories? In-Reply-To: <1741306.16141268145265613.JavaMail.root@boardwalk2.erad.com> Message-ID: <800886.16211268145402647.JavaMail.root@boardwalk2.erad.com> Sorry, I meant to send this to the list, not just Ric. ----- Forwarded Message ----- From: "Charles Riley" To: "Ric Wheeler" Sent: Tuesday, March 9, 2010 9:34:25 AM GMT -05:00 US/Canada Eastern Subject: Re: problems with large directories? ----- "Ric Wheeler" wrote: > On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote: > > Hi, > > > > I have an application that deals with 100,000 to 1,000,000 image > files. > > > > I initially structured it to use multiple directories, so that file > > 123456 would be stored in /12/34/123456. I'm now wondering if > that's > > pointless, as it would simplify things to simply store the file in > /123456. > > > > Can anyone indicate whether I'm gaining anything by using smaller > > directories in ext3/ext4? Thanks. > > > > Mitch > > > > I think that breaking up your files into subdirectories makes it > easier to > navigate the tree and find files from a human point of view. Even > better if the > bytes reflect something like year/month/day/hour/min (assuming your > pathname has > a date based guid or similar encoding). > > You can have a million files in one large directory, but be careful to > iterate > and copy them in a sorted order (sorted by inode) to avoid nasty > performance > issues that are side effects of the way we hash file names in ext3/4. > > Good luck! > > Ric > Hi Ric, Can you elaborate on the performance issues you mention above? We use rhel4/ext3 on our pacs (medical imaging) servers. We ran into the 32k limit a couple of years back when our first customer hit the 31,999th study, at which point we implemented a directory hashing algorithm. Now we store images for a given patient's study in a path something like: aa/ab/ac/1.2.3/ where 1.2.3 is the dicom study instance uid (a wwuid for a medical study) and aa/ab/ac/ is the directory hash we derived from that study instance uid. The above is a simplified example for illustration purposes only, 1.2.3 does not really hash to aa/ab/ac/. Within aa/ab/ac/1.2.3/ there can be anywhere from three to a couple of thousand DICOM object files. Images are initially created in a non-hashed temporary directory and then copied to their permanent home in e.g. aa/ab/ac/1.2.3/ In this context, would we gain filesystem performance by sorting by inode before copying? Do the performance issues you refer to only apply to the copy process itself or do they contribute to long term filesystem performance? 
Thanks for any insight you can provide, Charles From sandeen at redhat.com Tue Mar 9 16:32:19 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Tue, 09 Mar 2010 10:32:19 -0600 Subject: problems with large directories? In-Reply-To: <74F787A596EC01458B9FB816@host122.msm.che.vodafone> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> <4CD1DE054DB6ECC9FD485DD6@nimrod.local> <74F787A596EC01458B9FB816@host122.msm.che.vodafone> Message-ID: <4B967813.8010601@redhat.com> Alex Bligh wrote: > > --On 9 March 2010 01:14:40 -0800 Christian Kujau > wrote: > >>>> Finally, remember that ext2/3 has a limit of 32000 or so files per dir. >>> My IMAP spool suggests this is false w.r.t. ext3 >> Did you use any special options when creating this ext3 filesystem? I've >> just tried but was only able to create 31998 directories in one directory. > > As Ted pointed out off list, the limit is the number of subdirectories > in a directory, not the number of files in a directory. Argh you are right, sorry, brain burp. :) -Eric From kyle at kbrandt.com Tue Mar 9 17:10:16 2010 From: kyle at kbrandt.com (Kyle Brandt) Date: Tue, 9 Mar 2010 12:10:16 -0500 Subject: fstab Pass Column and forced disk checks Message-ID: <9ee385321003090910r6fb0d682p6f6edc18f32ed5b8@mail.gmail.com> If I have the 6th column in fstab (the pass column) set to 0, does that mean disk checks will never be forced at boot regardless of anything like File System State, Mount Count, and Check Interval on the file system itself, or are there exceptions to this? I know `man fstab` says: If the sixth field is not present or zero, a value of zero is returned and fsck will assume that the filesystem does not need to be checked. But I wasn't sure if the fsck might be triggered in other ways during boot. Thank you, Kyle Brandt http://www.kbrandt.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From criley at erad.com Tue Mar 9 17:53:50 2010 From: criley at erad.com (Charles Riley) Date: Tue, 9 Mar 2010 12:53:50 -0500 (EST) Subject: fstab Pass Column and forced disk checks In-Reply-To: <24688412.17501268157223648.JavaMail.root@boardwalk2.erad.com> Message-ID: <23454625.17521268157230637.JavaMail.root@boardwalk2.erad.com> If the pass column is 0, no automatic check is done. It's been my experience that setting it that way is a bad idea though, unless you plan on periodic manual fscks. ----- "Kyle Brandt" wrote: > If I have the 6th column in fstab (the pass column) set to 0, does > that mean disk checks will never be forced at boot regardless of > anything like File System State, Mount Count, and Check Interval on > the file system itself, or are there exceptions to this? > > I know `man fstab` says: > If the sixth field is not present or zero, a value of zero is returned > and fsck will assume that the filesystem does not need to be checked. > > But I wasn't sure if the fsck might be triggered in other ways during > boot. 
> > Thank you, > Kyle Brandt > http://www.kbrandt.com > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From adilger at sun.com Tue Mar 9 20:54:18 2010 From: adilger at sun.com (Andreas Dilger) Date: Tue, 09 Mar 2010 13:54:18 -0700 Subject: fstab Pass Column and forced disk checks In-Reply-To: <9ee385321003090910r6fb0d682p6f6edc18f32ed5b8@mail.gmail.com> References: <9ee385321003090910r6fb0d682p6f6edc18f32ed5b8@mail.gmail.com> Message-ID: On 2010-03-09, at 10:10, Kyle Brandt wrote: > If I have the 6th column in fstab (the pass column) set to 0, does > that mean disk checks will never be forced at boot regardless of > anything like File System State, Mount Count, and Check Interval on > the file system itself, or are there exceptions to this? No, there are many filesystems which don't have/allow checking so the top-level fsck tool needs to honor this. I would never recommend disabling e2fsck on a system, unless you are running in an HA environment where it is not safe to do automated checks at startup time. I also do not recommend that people disable the periodic e2fsck checks, because people forget to check their filesystems, and the kernel can sometimes spread corruption further if it reads garbage from the disk. If you dislike the periodic (time/mount count) checks that e2fsck forces at boot, I would suggest using the "lvcheck" script I posted to linux-ext4 some months ago (assuming you are using LVM, which most people are these days), and will attach here again. That allows you to periodically check the filesystem in the background to detect corruptions on disk, without any concern that the next reboot will take a long time. It would be great to get these included as part of the lvm2 package, and have lvcheck installed in /etc/cron.weekly to automatically check all the LVs configured on the system, and solve the "we don't like periodic checks at boot" problem in a way that is still robust to the errors that will undoubtably appear on disk at one point or another. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. -------------- next part -------------- A non-text attachment was scrubbed... Name: lvcheck Type: application/octet-stream Size: 10785 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: lvcheck.conf Type: application/octet-stream Size: 1242 bytes Desc: not available URL: From rwheeler at redhat.com Wed Mar 10 01:51:20 2010 From: rwheeler at redhat.com (Ric Wheeler) Date: Tue, 09 Mar 2010 20:51:20 -0500 Subject: Fwd: problems with large directories? In-Reply-To: <800886.16211268145402647.JavaMail.root@boardwalk2.erad.com> References: <800886.16211268145402647.JavaMail.root@boardwalk2.erad.com> Message-ID: <4B96FB18.20300@redhat.com> On 03/09/2010 09:36 AM, Charles Riley wrote: > Sorry, I meant to send this to the list, not just Ric. > > > ----- Forwarded Message ----- > From: "Charles Riley" > To: "Ric Wheeler" > Sent: Tuesday, March 9, 2010 9:34:25 AM GMT -05:00 US/Canada Eastern > Subject: Re: problems with large directories? > > > > > ----- "Ric Wheeler" wrote: > >> On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote: >>> Hi, >>> >>> I have an application that deals with 100,000 to 1,000,000 image >> files. >>> >>> I initially structured it to use multiple directories, so that file >>> 123456 would be stored in /12/34/123456. 
I'm now wondering if >> that's >>> pointless, as it would simplify things to simply store the file in >> /123456. >>> >>> Can anyone indicate whether I'm gaining anything by using smaller >>> directories in ext3/ext4? Thanks. >>> >>> Mitch >>> >> >> I think that breaking up your files into subdirectories makes it >> easier to >> navigate the tree and find files from a human point of view. Even >> better if the >> bytes reflect something like year/month/day/hour/min (assuming your >> pathname has >> a date based guid or similar encoding). >> >> You can have a million files in one large directory, but be careful to >> iterate >> and copy them in a sorted order (sorted by inode) to avoid nasty >> performance >> issues that are side effects of the way we hash file names in ext3/4. >> >> Good luck! >> >> Ric >> > > Hi Ric, > > Can you elaborate on the performance issues you mention above? > > We use rhel4/ext3 on our pacs (medical imaging) servers. > We ran into the 32k limit a couple of years back when our first customer hit the 31,999th study, at which point we implemented a directory hashing algorithm. Now we store images for a given patient's study in a path something like: > aa/ab/ac/1.2.3/ > > where 1.2.3 is the dicom study instance uid (a wwuid for a medical study) > and aa/ab/ac/ is the directory hash we derived from that study instance uid. > > The above is a simplified example for illustration purposes only, 1.2.3 does not really hash to aa/ab/ac/. > Within aa/ab/ac/1.2.3/ there can be anywhere from three to a couple of thousand DICOM object files. > Images are initially created in a non-hashed temporary directory and then copied to their permanent home in e.g. aa/ab/ac/1.2.3/ > > In this context, would we gain filesystem performance by sorting by inode before copying? > Do the performance issues you refer to only apply to the copy process itself or do they contribute to long term filesystem performance? > > Thanks for any insight you can provide, > > Charles > Hi Charles, The big issue with touching a lot of files (reading, stating, unlinking them) in ext3/4 is that readdir gives us back a list in effectively random order. This makes the accesses very seeky. Not an issue with a handful of files (say a couple of hundred), but when you get to thousands (or millions) of files, performance really tanks. To avoid that, you can sort the list returned by readdir() into ascending order by inode in reasonably large batches and get your performance up. Several core tools have been looking at doing this automatically, but it is important for any home grown applications as well. In your scenario with the directory hierarchy, I suspect that you won't hit this. If you had one very large directory, you certainly would. Best regards, Ric From kyle at kbrandt.com Wed Mar 10 12:43:54 2010 From: kyle at kbrandt.com (Kyle Brandt) Date: Wed, 10 Mar 2010 07:43:54 -0500 Subject: fstab Pass Column and forced disk checks In-Reply-To: References: <9ee385321003090910r6fb0d682p6f6edc18f32ed5b8@mail.gmail.com> Message-ID: <9ee385321003100443vb02284ay5fcba96363e31aec@mail.gmail.com> Thank you everyone for your responses. I agree with Andreas about not disabling the checks in general, but in this case I don't have the final word. I will look into the lvm script, is that limited to ext4 or does it work with ext3 as well? 
I cross posted this question at http://serverfault.com/questions/120804/pass-column-of-fstab/120815#120815and someone noticed that there is one exception (not a fstab exception though) on some distributions (RHEL5). That is if /forcefsck file system exists the check will still happen because of /etc/rc.d/rc.sysinit if [ -f /forcefsck ] || strstr "$cmdline" forcefsck ; then fsckoptions="-f $fsckoptions" Thanks! Kyle On 3/9/10, Andreas Dilger wrote: > > On 2010-03-09, at 10:10, Kyle Brandt wrote: > >> If I have the 6th column in fstab (the pass column) set to 0, does that >> mean disk checks will never be forced at boot regardless of anything like >> File System State, Mount Count, and Check Interval on the file system >> itself, or are there exceptions to this? >> > > No, there are many filesystems which don't have/allow checking so the > top-level fsck tool needs to honor this. I would never recommend disabling > e2fsck on a system, unless you are running in an HA environment where it is > not safe to do automated checks at startup time. I also do not recommend > that people disable the periodic e2fsck checks, because people forget to > check their filesystems, and the kernel can sometimes spread corruption > further if it reads garbage from the disk. > > If you dislike the periodic (time/mount count) checks that e2fsck forces at > boot, I would suggest using the "lvcheck" script I posted to linux-ext4 some > months ago (assuming you are using LVM, which most people are these days), > and will attach here again. That allows you to periodically check the > filesystem in the background to detect corruptions on disk, without any > concern that the next reboot will take a long time. > > It would be great to get these included as part of the lvm2 package, and > have lvcheck installed in /etc/cron.weekly to automatically check all the > LVs configured on the system, and solve the "we don't like periodic checks > at boot" problem in a way that is still robust to the errors that will > undoubtably appear on disk at one point or another. > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sean.D.McCauliff at nasa.gov Wed Mar 10 19:23:25 2010 From: Sean.D.McCauliff at nasa.gov (Sean McCauliff) Date: Wed, 10 Mar 2010 11:23:25 -0800 Subject: Finding the holes in sparse files. Message-ID: <4B97F1AD.2060804@nasa.gov> Is there a way to find the holes in sparse files, other than assuming contiguous blocks of zeroes are holes? Thanks, Sean From sandeen at redhat.com Wed Mar 10 19:43:09 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Wed, 10 Mar 2010 13:43:09 -0600 Subject: Finding the holes in sparse files. In-Reply-To: <4B97F1AD.2060804@nasa.gov> References: <4B97F1AD.2060804@nasa.gov> Message-ID: <4B97F64D.3000202@redhat.com> Sean McCauliff wrote: > Is there a way to find the holes in sparse files, other than assuming > contiguous blocks of zeroes are holes? yes, programatically you can use a couple ioctls: fibmap (block-at-a-time) or fiemap in newer kernels. If you want a commandline, try filefrag -v. For ioctl usage examples, take a look at how filefrag is implemented. 
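As a rough sketch (not the actual filefrag source, and with only minimal error handling), calling the FIEMAP ioctl yourself from C looks something like the below. It assumes a kernel new enough to ship linux/fiemap.h (2.6.28 or so) and only asks for the first 32 extents, so a real tool would keep calling it, advancing fm_start, until an extent comes back flagged FIEMAP_EXTENT_LAST:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, struct fiemap_extent */

int main(int argc, char **argv)
{
	/* room for 32 extents after the fixed fiemap header */
	size_t sz = sizeof(struct fiemap) + 32 * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);
	unsigned int i;
	int fd;

	if (argc != 2 || !fm)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	fm->fm_start = 0;			/* map from the start of the file... */
	fm->fm_length = ~0ULL;			/* ...through to EOF */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush delalloc so extents are real */
	fm->fm_extent_count = 32;		/* how many fm_extents[] we allocated */

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FS_IOC_FIEMAP");
		return 1;
	}

	for (i = 0; i < fm->fm_mapped_extents; i++) {
		struct fiemap_extent *fe = &fm->fm_extents[i];
		printf("logical %llu physical %llu length %llu flags 0x%x\n",
		       (unsigned long long)fe->fe_logical,
		       (unsigned long long)fe->fe_physical,
		       (unsigned long long)fe->fe_length,
		       fe->fe_flags);
	}

	free(fm);
	close(fd);
	return 0;
}

Any gap between one extent's fe_logical + fe_length and the next extent's fe_logical (or EOF) is a hole.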
# dd if=/dev/zero of=testfile bs=4k count=1; dd if=/dev/zero of=testfile conv=notrunc bs=4k seek=4 count=1 # sync # filefrag -v testfile Filesystem type is: ef53 File size of testfile is 20480 (5 blocks, blocksize 4096) ext logical physical expected length flags 0 0 1829913 1 1 4 1802777 1829913 1 eof testfile: 2 extents found the logical+length gap shows you that there was a hole in there Andreas has patches to make it still clearer in the table output. -Eric > Thanks, > Sean > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From cax0cn at gmail.com Thu Mar 11 02:47:23 2010 From: cax0cn at gmail.com (Joseph Chen) Date: Thu, 11 Mar 2010 10:47:23 +0800 Subject: Finding the holes in sparse files. In-Reply-To: <4B97F64D.3000202@redhat.com> References: <4B97F1AD.2060804@nasa.gov> <4B97F64D.3000202@redhat.com> Message-ID: <8d423b321003101847t49566946lc87110e82bb5e81f@mail.gmail.com> Check my post here How to Check Sparse Files with Perl For any issues plesae let me know :) J On Thu, Mar 11, 2010 at 3:43 AM, Eric Sandeen wrote: > Sean McCauliff wrote: > > Is there a way to find the holes in sparse files, other than assuming > > contiguous blocks of zeroes are holes? > > yes, programatically you can use a couple ioctls: > fibmap (block-at-a-time) or fiemap in newer kernels. > > If you want a commandline, try filefrag -v. > > For ioctl usage examples, take a look at how filefrag is implemented. > > # dd if=/dev/zero of=testfile bs=4k count=1; dd if=/dev/zero of=testfile > conv=notrunc bs=4k seek=4 count=1 > # sync > # filefrag -v testfile > Filesystem type is: ef53 > File size of testfile is 20480 (5 blocks, blocksize 4096) > ext logical physical expected length flags > 0 0 1829913 1 > 1 4 1802777 1829913 1 eof > testfile: 2 extents found > > the logical+length gap shows you that there was a hole in there > > Andreas has patches to make it still clearer in the table output. > > -Eric > > > Thanks, > > Sean > > > > _______________________________________________ > > Ext3-users mailing list > > Ext3-users at redhat.com > > https://www.redhat.com/mailman/listinfo/ext3-users > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > -- Sponser and operater: Linux monitoring solution: http://www.admon.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mirjafarali at gmail.com Tue Mar 16 20:43:43 2010 From: mirjafarali at gmail.com (MirJafar Ali) Date: Tue, 16 Mar 2010 15:43:43 -0500 Subject: Ext4 File System: newbee question Message-ID: Hello, I have installed Ubuntu 9.10 and it has ext4 filesystem. I have one very elementary question. When I say my filesystem is "ext4", which directories are part of it. I mean from the root I can see some directories such as /proc, /tmp, /dev etc. Are they store on the disk which have formatted with ext4, of certain files resides somewhere else. I am only sure only about /home directory because I keep my disk mobile and data goes with me all the time. Please execuse me for such a simple question. Mir -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adilger at sun.com Wed Mar 17 18:45:45 2010 From: adilger at sun.com (Andreas Dilger) Date: Wed, 17 Mar 2010 12:45:45 -0600 Subject: Ext4 File System: newbee question In-Reply-To: References: Message-ID: <2C26629B-2AB8-4955-823F-F6D46A25C7BA@sun.com> On 2010-03-16, at 14:43, MirJafar Ali wrote: > I have installed Ubuntu 9.10 and it has ext4 filesystem. I have one > very elementary question. When I say my filesystem is "ext4", which > directories are part of it. I mean from the root I can see some > directories such as /proc, /tmp, /dev etc. Are they store on the > disk which have formatted with ext4, of certain files resides > somewhere else. > > I am only sure only about /home directory because I keep my disk > mobile and data goes with me all the time. Please run "mount" and "df", which show the mountpoints for each filesystem. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From pg_ext3 at ext3.for.sabi.co.UK Wed Mar 17 20:56:15 2010 From: pg_ext3 at ext3.for.sabi.co.UK (Peter Grandi) Date: Wed, 17 Mar 2010 20:56:15 +0000 Subject: Ext4 File System: newbee question In-Reply-To: References: Message-ID: <19361.16879.463876.940171@tree.ty.sabi.co.uk> > [ ... ] my filesystem is "ext4", which directories are part of > it. 'ext4' is a file system *type*. You can have many filesystems of that type, each with its own tree of directories and files etc. Each filesystem of type 'ext4' will be stored on a particular storage device or a subsection of one, and will have some kind of indentifying label. Usually each filesystem tree will be stored in a partition on some disk, and will be "mounted" on (that is, its directories and files will appear under) some directory. You can see a list of those by reading the file '/proc/mounts'; in a terminal shell the command 'grep ext4 /proc/mounts' will print a list of all the currently active ("mounted") devices containing an 'ext4' filesystem. For a list of the more important ones in a more readable format run in a terminal shell the command 'df -T -BG -a'. > I mean from the root I can see some directories such as /proc, > /tmp, /dev etc. Those 3 directories are usually the mount points for special file system types, and almost never 'ext4' type. > Are they store on the disk which have formatted with ext4, of > certain files resides somewhere else. Some filesystems are stored only in memory as they are not persistent, as they represent temporary entities. > I am only sure only about /home directory because I keep my > disk mobile and data goes with me all the time. Most likely both the devices mounted on the "/" and "/home" directories contain filesystems of type 'ext4'. 
There are several tutorials online and in print that explain what is a file system type, a filesystem instance of a type, and the storage ("block device") holding that instance. From mirjafarali at gmail.com Thu Mar 18 22:48:45 2010 From: mirjafarali at gmail.com (MirJafar Ali) Date: Thu, 18 Mar 2010 17:48:45 -0500 Subject: DataBlock information Help Message-ID: Hello, I am using e2fsprogs and found it very nice. I want to know datablocks for a given a given file. I was going through the document and did lots of google search, but I am not sure what is the best way to get this information. Which "e2fsprogs" function can give all the datablock IDs. There is one function i.e. ext2fs_block_iterate, but I am not sure how it works. It wasn't clear from the document. Can someone please help without getting angry on this simple question ? Mir -------------- next part -------------- An HTML attachment was scrubbed... URL: From adilger at sun.com Thu Mar 18 23:13:35 2010 From: adilger at sun.com (Andreas Dilger) Date: Thu, 18 Mar 2010 17:13:35 -0600 Subject: DataBlock information Help In-Reply-To: References: Message-ID: <7F078568-B44E-4AEE-8DB3-D4BB40C0B5D2@sun.com> On 2010-03-18, at 16:48, MirJafar Ali wrote: > I am using e2fsprogs and found it very nice. I want to know > datablocks for a given a given file. > I was going through the document and did lots of google search, but > I am not sure what is the best way to get this information. Which > "e2fsprogs" function can give all the datablock IDs. There is one > function i.e. ext2fs_block_iterate, but I am not sure how it works. > It wasn't clear from the > document. If you use "dumpe2fs -c -R 'stat /path/to/file' /dev/XXX", where /path/ to/file is the filesystem relative pathname, that will dump all of the blocks. On newer kernels you can also use "filefrag -v" to list the blocks, though the output format is less than ideal right now. Programatically, on a newer kernel you can use the fiemap() API to get the list of all blocks for any file, regardless of the filesystem type. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From sandeen at redhat.com Fri Mar 19 02:16:28 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Thu, 18 Mar 2010 21:16:28 -0500 Subject: DataBlock information Help In-Reply-To: References: Message-ID: <4BA2DE7C.4020105@redhat.com> MirJafar Ali wrote: > Hello, > > I am using e2fsprogs and found it very nice. I want to know datablocks > for a given a given file. > I was going through the document and did lots of google search, but I am > not sure what is the > best way to get this information. Which "e2fsprogs" function can give > all the datablock IDs. There > is one function i.e. ext2fs_block_iterate, but I am not sure how it > works. It wasn't clear from the > document. > > Can someone please help without getting angry on this simple question ? > > Mir > >From the commandline, you can just use filefrag (-v) If you want to do it programatically, you can look at how filefrag uses the FIBMAP and/or FIEMAP ioctls. If you want to do it with the filesystem unmounted, you can look at how the debugfs "stat" command shows you the blocks. -Eric From mirjafarali at gmail.com Fri Mar 19 02:55:41 2010 From: mirjafarali at gmail.com (MirJafar Ali) Date: Thu, 18 Mar 2010 21:55:41 -0500 Subject: File Age emulation Message-ID: Hello, I have a new hard drive and need to do some study on filesystem aging. 
Is there any simulator which can fill the drive with some realistic file system behaviour ( size, age etc) ? Thanks. Mir -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Fri Mar 19 04:39:37 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Thu, 18 Mar 2010 23:39:37 -0500 Subject: File Age emulation In-Reply-To: References: Message-ID: <4BA30009.1070503@redhat.com> MirJafar Ali wrote: > Hello, > > I have a new hard drive and need to do some study on filesystem aging. > Is there any > simulator which can fill the drive with some realistic file system > behaviour ( size, age etc) ? > > Thanks. > > Mir > Are these still questions for a class you are taking? Ted asked you that earlier, but I see there was no reply... You have had a very interesting collection of questions for the list, and I wonder what the goal might be for you... rather than asking seemingly random questions, is there a larger goal you are trying to accomplish here, or are we possibly just doing your homework? -Eric From sandeen at redhat.com Fri Mar 19 13:57:27 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Fri, 19 Mar 2010 08:57:27 -0500 Subject: ext2 IF windows Xp Pro with Ubuntu 9.10 64amd In-Reply-To: References: Message-ID: <4BA382C7.9090308@redhat.com> Chris Taylor wrote: > Hi To all > I have just built a new System > 3.4Gb AMD Athlon 64bit > 1GB RAM > 500Gb SATA HDD > Disk /dev/sda: 500.1 GB, 500107862016 bytes > 255 heads, 63 sectors/track, 60801 cylinders > Units = cylinders of 16065 * 512 = 8225280 bytes > Disk identifier: 0x00e600e6 > > Device Boot Start End Blocks Id System > /dev/sda1 * 1 1912 15358108+ 7 HPFS/NTFS > /dev/sda2 1913 3824 15358140 83 Linux > /dev/sda3 3825 60801 457667752+ 5 Extended > /dev/sda5 60194 60801 4883760 82 Linux swap / > Solaris > /dev/sda6 3825 60193 452783929+ 83 Linux > > Partition table entries are not in disk order. > On sda1 I have windows Xp Pro sp2 > on sda2 I have Ubuntu 9.10 64bit just upgraded via web > On sda6 I have my home partition (according to gparted) > I have installed EXT2IFS so I can have XP and Ubuntu use the same place > for files. > Every time I try to access F: drive from Windows I get "do you want to > format the drive " I'm thinking that I have a Inodes problem, Thinking > they are 256 not 128. I have tried to format the drive with Gparted to > EXT3 a few times and get the same problem still > " Large inodes > The current version of Ext2 IFS only mounts volumes with an inode size of > 128 like old Linux kernels have. A word of warning, at least one windows driver for extN has been known in the past to corrupt filesytems. Since it's not open source, we can't debug or fix it. Maybe it's fixed now, but I don't know. > Some very new Linux distributions create an Ext3 file systems with inodes > of 256 bytes. Ext2 IFS 1.11 is not able to access them. > > Currently there is only one workaround: Please back up the files and > create the Ext3 file system again. Give the mkfs.ext3 tool the -I 128 > switch. Finally, restore all files with the backup. " > > If I'm write I need to unmount the /Home partition but I don't know how > to do this :-( > > Please if you would be so kind as to help me with any info > Chris It's not really an ext3-specific question, but you'll need to unbusy the /home mountpoint to unmount it to reformat it; booting into single-user mode would allow you to do that. 
-Eric From jcubedla at gmail.com Sat Mar 20 05:04:55 2010 From: jcubedla at gmail.com (John) Date: Fri, 19 Mar 2010 22:04:55 -0700 Subject: ext2 IF windows Xp Pro with Ubuntu 9.10 64amd In-Reply-To: References: Message-ID: Hi Chris, On Fri, Feb 26, 2010 at 9:13 PM, Chris Taylor < chris.j.taylor at optusnet.com.au> wrote: > Some very new Linux distributions create an Ext3 file systems with inodes > of 256 bytes. Ext2 IFS 1.11 is not able to access them. > You may want to look at Ext2 Fsd which doesn't have that limitation. John -------------- next part -------------- An HTML attachment was scrubbed... URL: From michaelm at plumbersstock.com Tue Mar 23 18:16:01 2010 From: michaelm at plumbersstock.com (Michael McGlothlin) Date: Tue, 23 Mar 2010 12:16:01 -0600 Subject: File caching? Message-ID: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> I've been asked to cache some high traffic files on one of our server. Is there an easy way to get ext3/ext4 filesystems to cache several GB of files in memory at once? I'd like writes to happen normally but reads to happen from RAM. (We have plenty of RAM so that isn't an issue.) If that isn't possible I can cache the files myself. Does the filesystem keep a cache in memory of the file attributes such as modification time? So if I check for a change will the disk actually have to physically move to check the mod time? Thanks, Michael McGlothlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From davids at webmaster.com Tue Mar 23 18:52:53 2010 From: davids at webmaster.com (David Schwartz) Date: Tue, 23 Mar 2010 11:52:53 -0700 Subject: File caching? In-Reply-To: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> References: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> Message-ID: <005c01cacaba$0c7dcfa0$25796ee0$@com> Michael McGlothlin wrote: > I've been asked to cache some high traffic files on one of our server. > Is there an easy way to get ext3/ext4 filesystems to cache several GB > of files in memory at once? I'd like writes to happen normally but reads > to happen from RAM. (We have plenty of RAM so that isn't an issue.) > If that isn't possible I can cache the files myself. Does the filesystem > keep a cache in memory of the file attributes such as modification time? > So if I check for a change will the disk actually have to physically move > to check the mod time? I would first investigate whether your web server has some specific way to do this. Failing that, I strongly recommend just letting the disk cache do its job. If they really are frequently-accessed, they will stay in cache if sufficient RAM is available anyway. I would only suggest going further if you have specific latency requirements. If you do, I'd recommend simply using a separate program to map the files and then lock the pages in memory. The 'memlockd' program can do this. I'm not sure how well it handles file changes, but it shouldn't be difficult to modify it to restart if any file changes. The other possibility is to put the files on a ramdisk. You can use a scheduled script to update them from an on-disk copy if needed. Linux has good stat caching, so the need to move the disk to check the modification time will only occur if that information was pushed out of cache. DS From michaelm at plumbersstock.com Tue Mar 23 19:16:57 2010 From: michaelm at plumbersstock.com (Michael McGlothlin) Date: Tue, 23 Mar 2010 13:16:57 -0600 Subject: File caching? 
In-Reply-To: <005c01cacaba$0c7dcfa0$25796ee0$@com> References: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> <005c01cacaba$0c7dcfa0$25796ee0$@com> Message-ID: <5790d5b71003231216p6ee172dcx2ae66e2cbd53f1af@mail.gmail.com> This isn't for a web server although we might apply the same approach to that if this speeds things up a lot. I was going to store the cached copy on a RAM disk as many of the files are larger than the 1MB limit of memlockd and I don't feel like coming up with my own solution if I can avoid it. Is there a way to know how much RAM is being used for file cache or to tell it to use more? If the server has 128GB of RAM and typically uses half of that for it's actual work will it use the rest as file cache? Likewise is there a way to track/test if file stats are being pushed out of cache a lot? We've been considering switching to SSD or RAM drives but it seems they'd always be slower than system RAM and we haven't found a product that can affordably store sufficient data. I couldn't find a product that just sits between the disk and controller, or a controller that does this itself, and adds a large RAM-based file cache either. Thanks, Michael McGlothlin On Tue, Mar 23, 2010 at 12:52 PM, David Schwartz wrote: > > Michael McGlothlin wrote: > > > I've been asked to cache some high traffic files on one of our server. > > Is there an easy way to get ext3/ext4 filesystems to cache several GB > > of files in memory at once? I'd like writes to happen normally but reads > > to happen from RAM. (We have plenty of RAM so that isn't an issue.) > > > If that isn't possible I can cache the files myself. Does the filesystem > > keep a cache in memory of the file attributes such as modification time? > > So if I check for a change will the disk actually have to physically move > > to check the mod time? > > I would first investigate whether your web server has some specific way to > do this. Failing that, I strongly recommend just letting the disk cache do > its job. If they really are frequently-accessed, they will stay in cache if > sufficient RAM is available anyway. I would only suggest going further if > you have specific latency requirements. > > If you do, I'd recommend simply using a separate program to map the files > and then lock the pages in memory. The 'memlockd' program can do this. I'm > not sure how well it handles file changes, but it shouldn't be difficult to > modify it to restart if any file changes. > > The other possibility is to put the files on a ramdisk. You can use a > scheduled script to update them from an on-disk copy if needed. > > Linux has good stat caching, so the need to move the disk to check the > modification time will only occur if that information was pushed out of > cache. > > DS > > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balu.manyam at gmail.com Thu Mar 25 14:20:27 2010 From: balu.manyam at gmail.com (Balu manyam) Date: Thu, 25 Mar 2010 19:50:27 +0530 Subject: ext3 corruption Message-ID: <995392221003250720j75281ebvdff04cd826c0cea4@mail.gmail.com> hey ext3 gurus - i am desperately looking for some help on a problem where my ext3 filesystem got corrupted my beyond repair the issue i saw the FS size was much less the logical volume size - when we tried to mount it .....so we attempted an fsck - we lost all the data these are the messages we got before the data corruption happened the hal.hotplug seems to have triggered something - the filesystem is on an LVM2 volume with multipathing managing the hba paths to EMC does this ring a bell with anyone ? Feb 24 05:25:27 hal.hotplug[17949]: timout(10000 ms) waiting for /block/dm-24 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=17334189504, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=134217688, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=17586972680, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=8777633304, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=10317744712, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=19100288824, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=18694536512, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=25796951120, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=18425691224, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=15698503632, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device here are some errors from fsck in messages file Feb 24 08:31:02 kernel: EXT3-fs error (device dm-18): ext3_readdir: bad entry in directory #2: inode out of bounds - offset =44, inode=59047937, rec_len=16, name_len=8 thanks!!! Balu -------------- next part -------------- An HTML attachment was scrubbed... URL: From arun at bvinetworks.com Fri Mar 26 18:52:05 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 11:52:05 -0700 Subject: Ext4 and large (>8TB) files Message-ID: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> Hi - (I apologize for the ext4 question in an ext3 mailer, but I couldn't find a user list for ext4.) 
Per my understanding, ext4 can support file sizes upto 16 TiB if you use 4k blocks. I have a logical volume which uses ext4 with a 4k block size but I am unable to create files that are 8TiB (8796093022208 bytes) or larger. [root at camanoe] ls -l total 8589935388 -rw-rw---- 1 root root 8796093022207 2010-03-26 11:43 bigfile [root at camanoe] echo x >> bigfile -bash: echo: write error: File too large [root at camanoe] df -h . Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg-mysql_vol 15T 8.1T 5.6T 60% /mysql [root at camanoe]# tune2fs -l /dev/vg/mysql_vol | grep "Block size" Block size: 4096 [root at camanoe]# uname -a Linux camanoe 2.6.29.4-167.fc11.i686.PAE #1 SMP Wed May 27 17:28:22 EDT 2009 i686 i686 i386 GNU/Linux I'm probably doing something wrong here, but can't figure it out. Any ideas guys? Thanks in advance. Arun -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Fri Mar 26 19:16:24 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Fri, 26 Mar 2010 14:16:24 -0500 Subject: Ext4 and large (>8TB) files In-Reply-To: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> Message-ID: <4BAD0808.1040407@redhat.com> On 03/26/2010 01:52 PM, Arun Nair wrote: > Hi - > > (I apologize for the ext4 question in an ext3 mailer, but I couldn't > find a user list for ext4.) linux-ext4 at vger.kernel.org :) but that's ok. > Per my understanding, ext4 can support file sizes upto 16 TiB if you use > 4k blocks. I have a logical volume which uses ext4 with a 4k block size > but I am unable to create files that are 8TiB (8796093022208 bytes) or > larger. > > [root at camanoe] ls -l > total 8589935388 > -rw-rw---- 1 root root 8796093022207 2010-03-26 11:43 bigfile > > [root at camanoe] echo x >> bigfile > -bash: echo: write error: File too large Perhaps echo isn't using largefile semantics? Is this the first test you did, or is echo the simple testcase, and something else failed? It works for me on rawhide x86_64: create a file with blocks past 8T: # xfs_io -F -f -c "pwrite 8T 1M" bigfile wrote 1048576/1048576 bytes at offset 8796093022208 1 MiB, 256 ops; 0.0000 sec (206.313 MiB/sec and 52816.1750 ops/sec) echo more into it: # echo x >> bigfile it really is that big: # ls -lh bigfile -rw-------. 1 root root 8.1T Mar 26 14:13 bigfile I don't have an x86 box to test quickly; try something besides echo, is what I'd suggest - xfs_io would work, or probably dd (with conv=notrunc if you want to append) -Eric From arun at bvinetworks.com Fri Mar 26 20:50:55 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 13:50:55 -0700 Subject: Ext4 and large (>8TB) files In-Reply-To: <4BAD0808.1040407@redhat.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> Message-ID: <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> Eric, Thanks for the quick reply... see my responses inline... On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen wrote: > On 03/26/2010 01:52 PM, Arun Nair wrote: > > Hi - > > > > (I apologize for the ext4 question in an ext3 mailer, but I couldn't > > find a user list for ext4.) > > linux-ext4 at vger.kernel.org :) but that's ok. > Saw that but thought it was a dev-only list, sorry. Next time :) > > > Per my understanding, ext4 can support file sizes upto 16 TiB if you use > > 4k blocks. 
I have a logical volume which uses ext4 with a 4k block size > > but I am unable to create files that are 8TiB (8796093022208 bytes) or > > larger. > > > > [root at camanoe] ls -l > > total 8589935388 > > -rw-rw---- 1 root root 8796093022207 2010-03-26 11:43 bigfile > > > > [root at camanoe] echo x >> bigfile > > -bash: echo: write error: File too large > > Perhaps echo isn't using largefile semantics? Is this the first > test you did, or is echo the simple testcase, and something else > failed? > It's the simple test case. We found the problem when MySQL failed to expand its ibdata file beyond 8 TB. I then tried dd as well with notrunc like you mentioned, same error: [root at camanoe]# dd oflag=append conv=notrunc if=/dev/zero of=bigfile bs=1 count=1 dd: writing `bigfile': File too large 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000234712 s, 0.0 kB/s > It works for me on rawhide x86_64: > > create a file with blocks past 8T: > # xfs_io -F -f -c "pwrite 8T 1M" bigfile > wrote 1048576/1048576 bytes at offset 8796093022208 > 1 MiB, 256 ops; 0.0000 sec (206.313 MiB/sec and 52816.1750 ops/sec) > > echo more into it: > # echo x >> bigfile > > it really is that big: > # ls -lh bigfile > -rw-------. 1 root root 8.1T Mar 26 14:13 bigfile > > I don't have an x86 box to test quickly; try something besides echo, > is what I'd suggest - xfs_io would work, or probably dd (with > conv=notrunc if you want to append) > dd fails as mentioned above. xfs_io errors too: [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 pwrite64: File too large > -Eric > > BTW, my system is NOT 64-bit but my guess is this doesn't affect max file size? -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Fri Mar 26 21:10:03 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Fri, 26 Mar 2010 16:10:03 -0500 Subject: Ext4 and large (>8TB) files In-Reply-To: <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> Message-ID: <4BAD22AB.8050105@redhat.com> On 03/26/2010 03:50 PM, Arun Nair wrote: > Eric, > > Thanks for the quick reply... see my responses inline... > > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > wrote: > > On 03/26/2010 01:52 PM, Arun Nair wrote: > > Hi - > > > > (I apologize for the ext4 question in an ext3 mailer, but I couldn't > > find a user list for ext4.) > > linux-ext4 at vger.kernel.org :) > but that's ok. > > > Saw that but thought it was a dev-only list, sorry. Next time :) *shrug* I think user questions are welcome too. At least I don't mind. ... > dd fails as mentioned above. xfs_io errors too: > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 > pwrite64: File too large Oh. Well, then! Must be something else. oh, ok: sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); static loff_t ext4_max_size(int blkbits, int has_huge_files) { loff_t res; loff_t upper_limit = MAX_LFS_FILESIZE; /* Sanity check against vm- & vfs- imposed limits */ if (res > upper_limit) res = upper_limit; return res; } and: /* Page cache limit. The filesystems should put that into their s_maxbytes limits, otherwise bad things can happen in VM. */ #if BITS_PER_LONG==32 #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) so it's only giving us 31 bits of pages, not 32. This limits it to 8T on a 32-bit machine with 4k pages. 
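Working out the arithmetic: with 4k pages PAGE_CACHE_SIZE is 4096 (2^12) and BITS_PER_LONG is 32, so MAX_LFS_FILESIZE works out to (2^12 << 31) - 1 = 2^43 - 1 bytes, i.e. one byte short of 8 TiB, which matches the 8796093022207-byte file you were able to create before the writes started returning "File too large".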
I'm not honestly sure if there is anything in the vm that can't actually cope with a 32-bit offset... but until proven otherwise, probably not going to change this without a lot of testing & inspection. -Eric From arun at bvinetworks.com Fri Mar 26 22:05:52 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 15:05:52 -0700 Subject: Ext4 and large (>8TB) files In-Reply-To: <4BAD22AB.8050105@redhat.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> Message-ID: <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> Eric - So I'm guessing switching the system to 64-bit would fix this for us. How about increasing the block size from the current 4k? Would that be an option too? Thanks much, Arun On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen wrote: > On 03/26/2010 03:50 PM, Arun Nair wrote: > > Eric, > > > > Thanks for the quick reply... see my responses inline... > > > > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > wrote: > > > > On 03/26/2010 01:52 PM, Arun Nair wrote: > > > Hi - > > > > > > (I apologize for the ext4 question in an ext3 mailer, but I > couldn't > > > find a user list for ext4.) > > > > linux-ext4 at vger.kernel.org :) > > but that's ok. > > > > > > Saw that but thought it was a dev-only list, sorry. Next time :) > > *shrug* I think user questions are welcome too. At least I don't mind. > > ... > > > dd fails as mentioned above. xfs_io errors too: > > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 > > pwrite64: File too large > > Oh. Well, then! Must be something else. > > oh, ok: > > sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); > > static loff_t ext4_max_size(int blkbits, int has_huge_files) > { > loff_t res; > loff_t upper_limit = MAX_LFS_FILESIZE; > > > > /* Sanity check against vm- & vfs- imposed limits */ > if (res > upper_limit) > res = upper_limit; > > return res; > } > > and: > > /* Page cache limit. The filesystems should put that into their s_maxbytes > limits, otherwise bad things can happen in VM. */ > #if BITS_PER_LONG==32 > #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) > > so it's only giving us 31 bits of pages, not 32. This limits it to 8T > on a 32-bit machine with 4k pages. > > I'm not honestly sure if there is anything in the vm that can't actually > cope with a 32-bit offset... but until proven otherwise, probably not > going to change this without a lot of testing & inspection. > > -Eric > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arun at bvinetworks.com Sat Mar 27 01:32:20 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 18:32:20 -0700 Subject: Ext4 and large (>8TB) files In-Reply-To: <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> Message-ID: <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> Ok, so I guess ext4 with 64-bit, or another filesystem for us. Thanks Andreas & Eric for all the help. 
On Fri, Mar 26, 2010 at 3:38 PM, Andreas Dilger wrote: > On 2010-03-26, at 16:05, Arun Nair wrote: > >> So I'm guessing switching the system to 64-bit would fix this for us. How >> about increasing the block size from the current 4k? Would that be an option >> too? >> > > Not in the near future, unless you are running on PPC/ARM/SPARC that can > also handle large pages. > > On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen wrote: >> On 03/26/2010 03:50 PM, Arun Nair wrote: >> > Eric, >> > >> > Thanks for the quick reply... see my responses inline... >> > >> > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > > wrote: >> > >> > On 03/26/2010 01:52 PM, Arun Nair wrote: >> > > Hi - >> > > >> > > (I apologize for the ext4 question in an ext3 mailer, but I >> couldn't >> > > find a user list for ext4.) >> > >> > linux-ext4 at vger.kernel.org :) >> > but that's ok. >> > >> > >> > Saw that but thought it was a dev-only list, sorry. Next time :) >> >> *shrug* I think user questions are welcome too. At least I don't mind. >> >> ... >> >> > dd fails as mentioned above. xfs_io errors too: >> > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 >> > pwrite64: File too large >> >> Oh. Well, then! Must be something else. >> >> oh, ok: >> >> sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); >> >> static loff_t ext4_max_size(int blkbits, int has_huge_files) >> { >> loff_t res; >> loff_t upper_limit = MAX_LFS_FILESIZE; >> >> >> >> /* Sanity check against vm- & vfs- imposed limits */ >> if (res > upper_limit) >> res = upper_limit; >> >> return res; >> } >> >> and: >> >> /* Page cache limit. The filesystems should put that into their s_maxbytes >> limits, otherwise bad things can happen in VM. */ >> #if BITS_PER_LONG==32 >> #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) >> >> so it's only giving us 31 bits of pages, not 32. This limits it to 8T >> on a 32-bit machine with 4k pages. >> >> I'm not honestly sure if there is anything in the vm that can't actually >> cope with a 32-bit offset... but until proven otherwise, probably not >> going to change this without a lot of testing & inspection. >> >> -Eric >> >> _______________________________________________ >> Ext3-users mailing list >> Ext3-users at redhat.com >> https://www.redhat.com/mailman/listinfo/ext3-users >> > > > Cheers, Andreas > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arun at bvinetworks.com Sat Mar 27 04:16:41 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 21:16:41 -0700 Subject: Ext4 and large (>8TB) files In-Reply-To: <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> Message-ID: <941b09771003262116t2f8fec71rda6cf7ff6b7b72f5@mail.gmail.com> Ah. Got it, thanks. On Fri, Mar 26, 2010 at 9:04 PM, Andreas Dilger wrote: > On 2010-03-26, at 19:32, Arun Nair wrote: > >> Ok, so I guess ext4 with 64-bit, or another filesystem for us. Thanks >> Andreas & Eric for all the help. >> > > No, I don't think another filesystem will help, on a 32-bit host. 
The > limit that ext4 is reporting is the VM page cache limit for a single file, > and has nothing to do with ext4 itself. > > > On Fri, Mar 26, 2010 at 3:38 PM, Andreas Dilger >> wrote: >> On 2010-03-26, at 16:05, Arun Nair wrote: >> So I'm guessing switching the system to 64-bit would fix this for us. How >> about increasing the block size from the current 4k? Would that be an option >> too? >> >> Not in the near future, unless you are running on PPC/ARM/SPARC that can >> also handle large pages. >> >> On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen wrote: >> On 03/26/2010 03:50 PM, Arun Nair wrote: >> > Eric, >> > >> > Thanks for the quick reply... see my responses inline... >> > >> > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > > wrote: >> > >> > On 03/26/2010 01:52 PM, Arun Nair wrote: >> > > Hi - >> > > >> > > (I apologize for the ext4 question in an ext3 mailer, but I >> couldn't >> > > find a user list for ext4.) >> > >> > linux-ext4 at vger.kernel.org :) >> > but that's ok. >> > >> > >> > Saw that but thought it was a dev-only list, sorry. Next time :) >> >> *shrug* I think user questions are welcome too. At least I don't mind. >> >> ... >> >> > dd fails as mentioned above. xfs_io errors too: >> > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 >> > pwrite64: File too large >> >> Oh. Well, then! Must be something else. >> >> oh, ok: >> >> sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); >> >> static loff_t ext4_max_size(int blkbits, int has_huge_files) >> { >> loff_t res; >> loff_t upper_limit = MAX_LFS_FILESIZE; >> >> >> >> /* Sanity check against vm- & vfs- imposed limits */ >> if (res > upper_limit) >> res = upper_limit; >> >> return res; >> } >> >> and: >> >> /* Page cache limit. The filesystems should put that into their s_maxbytes >> limits, otherwise bad things can happen in VM. */ >> #if BITS_PER_LONG==32 >> #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) >> >> so it's only giving us 31 bits of pages, not 32. This limits it to 8T >> on a 32-bit machine with 4k pages. >> >> I'm not honestly sure if there is anything in the vm that can't actually >> cope with a 32-bit offset... but until proven otherwise, probably not >> going to change this without a lot of testing & inspection. >> >> -Eric >> >> _______________________________________________ >> Ext3-users mailing list >> Ext3-users at redhat.com >> https://www.redhat.com/mailman/listinfo/ext3-users >> >> >> Cheers, Andreas >> >> >> >> >> >> >> > > Cheers, Andreas > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Mon Mar 29 18:06:12 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Mon, 29 Mar 2010 13:06:12 -0500 Subject: Ext4 and large (>8TB) files In-Reply-To: <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> Message-ID: <4BB0EC14.6040309@redhat.com> Andreas Dilger wrote: > On 2010-03-26, at 19:32, Arun Nair wrote: >> Ok, so I guess ext4 with 64-bit, or another filesystem for us. Thanks >> Andreas & Eric for all the help. > > No, I don't think another filesystem will help, on a 32-bit host. 
The > limit that ext4 is reporting is the VM page cache limit for a single > file, and has nothing to do with ext4 itself. Well, for what it's worth, xfs doesn't use MAX_LFS_FILESIZE for s_maxbytes, and: # mkfs.xfs -dfile,name=fsfile,size=5g ... # mount -o loop fsfile mnt/ # cd mnt/ # truncate --size 17592186044415 bigfile # ls -lh bigfile -rw-r--r--. 1 root root 16T Mar 29 14:03 bigfile # uname -m i686 it is possible to create a > 8T file offset. Now, whether the vm is really happy with this probably remains to be seen; this is the sort of thing that breaks without constant testing, IMHO. I'd certainly suggest that a 64-bit box is the best way to go if at all possible. -Eric From mnalis-ml at voyager.hr Mon Mar 29 18:55:58 2010 From: mnalis-ml at voyager.hr (Matija Nalis) Date: Mon, 29 Mar 2010 20:55:58 +0200 Subject: File caching? In-Reply-To: <5790d5b71003231216p6ee172dcx2ae66e2cbd53f1af@mail.gmail.com> References: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> <005c01cacaba$0c7dcfa0$25796ee0$@com> <5790d5b71003231216p6ee172dcx2ae66e2cbd53f1af@mail.gmail.com> Message-ID: <20100329185558.GA4060@eagle102.home.lan> On Tue, Mar 23, 2010 at 01:16:57PM -0600, Michael McGlothlin wrote: > I was going to store the cached copy on a RAM disk as many of the files are Note that it probably won't help you that much, the kernel usually does quite good job at caching reads (but, specific situations like extensive writes can starve it - which ramdisk solution would avoid as it is completely manually controlled. Only way to know it is to test it) > larger than the 1MB limit of memlockd and I don't feel like coming up with > my own solution if I can avoid it. > > Is there a way to know how much RAM is being used for file cache or to tell free(1) will tell you (look at "cached" column). Note that programs you "load" from disk are also actually executed directly from that page cache (without making any separate copy). Also, the writes (unless being done with O_DIRECT or such) will go to that same cache before they're flushed to disk (which is usually what you want, as subsequent reads can then be satisfied from cache, and the application issuing writes gets control much sooner than if it would wait for writes to complete to disk). > it to use more? If the server has 128GB of RAM and typically uses half of > that for it's actual work will it use the rest as file cache? Likewise is Yes, it will use (almost) ALL otherwise unused RAM (the "free" column in free(1) means "unused" or "wasted" if you like) for cache. The "almost" is because there is some very small amount reserved by /proc/sys/vm/min_free_kbytes (but you don't want to touch it, it is too small to give you any benefit, and your kernel might die if you set it too low). > there a way to track/test if file stats are being pushed out of cache a lot? Uh, dunno for something elegant. You can track /proc/meminfo and /proc/slabinfo (for example by free(1) and slabtop(1) or manually of course) and look how they change. Note that there are other users of memory (see "slabtop -s c"). For example, especially if you have lots of small files and/or directories, then dentry, inode and related fs cache structures can eat significant amounts of RAM. 
You can tune priority of expiration of those with /proc/sys/vm/vfs_cache_pressure (you might also want to look in your kernel docs for rest of /proc/sys/vm) There are also vmstat(8), iostat(1) and blktrace(8) which might help you track actual I/O, and you can compare that with actual data read from your program logs for example to see how well the cache performs. -- Opinions above are GNU-copylefted. From adilger at dilger.ca Fri Mar 26 22:39:24 2010 From: adilger at dilger.ca (Andreas Dilger) Date: Fri, 26 Mar 2010 22:39:24 -0000 Subject: Ext4 and large (>8TB) files In-Reply-To: <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> Message-ID: <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> On 2010-03-26, at 16:05, Arun Nair wrote: > So I'm guessing switching the system to 64-bit would fix this for > us. How about increasing the block size from the current 4k? Would > that be an option too? Not in the near future, unless you are running on PPC/ARM/SPARC that can also handle large pages. > On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen > wrote: > On 03/26/2010 03:50 PM, Arun Nair wrote: > > Eric, > > > > Thanks for the quick reply... see my responses inline... > > > > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > wrote: > > > > On 03/26/2010 01:52 PM, Arun Nair wrote: > > > Hi - > > > > > > (I apologize for the ext4 question in an ext3 mailer, but I > couldn't > > > find a user list for ext4.) > > > > linux-ext4 at vger.kernel.org ext4 at vger.kernel.org> :) > > but that's ok. > > > > > > Saw that but thought it was a dev-only list, sorry. Next time :) > > *shrug* I think user questions are welcome too. At least I don't > mind. > > ... > > > dd fails as mentioned above. xfs_io errors too: > > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 > > pwrite64: File too large > > Oh. Well, then! Must be something else. > > oh, ok: > > sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); > > static loff_t ext4_max_size(int blkbits, int has_huge_files) > { > loff_t res; > loff_t upper_limit = MAX_LFS_FILESIZE; > > > > /* Sanity check against vm- & vfs- imposed limits */ > if (res > upper_limit) > res = upper_limit; > > return res; > } > > and: > > /* Page cache limit. The filesystems should put that into their > s_maxbytes > limits, otherwise bad things can happen in VM. */ > #if BITS_PER_LONG==32 > #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << > (BITS_PER_LONG-1))-1) > > so it's only giving us 31 bits of pages, not 32. This limits it to 8T > on a 32-bit machine with 4k pages. > > I'm not honestly sure if there is anything in the vm that can't > actually > cope with a 32-bit offset... but until proven otherwise, probably not > going to change this without a lot of testing & inspection. 
> > -Eric > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users Cheers, Andreas From adilger at dilger.ca Sat Mar 27 04:04:32 2010 From: adilger at dilger.ca (Andreas Dilger) Date: Sat, 27 Mar 2010 04:04:32 -0000 Subject: Ext4 and large (>8TB) files In-Reply-To: <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> Message-ID: <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> On 2010-03-26, at 19:32, Arun Nair wrote: > Ok, so I guess ext4 with 64-bit, or another filesystem for us. > Thanks Andreas & Eric for all the help. No, I don't think another filesystem will help, on a 32-bit host. The limit that ext4 is reporting is the VM page cache limit for a single file, and has nothing to do with ext4 itself. > On Fri, Mar 26, 2010 at 3:38 PM, Andreas Dilger > wrote: > On 2010-03-26, at 16:05, Arun Nair wrote: > So I'm guessing switching the system to 64-bit would fix this for > us. How about increasing the block size from the current 4k? Would > that be an option too? > > Not in the near future, unless you are running on PPC/ARM/SPARC that > can also handle large pages. > > On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen > wrote: > On 03/26/2010 03:50 PM, Arun Nair wrote: > > Eric, > > > > Thanks for the quick reply... see my responses inline... > > > > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > wrote: > > > > On 03/26/2010 01:52 PM, Arun Nair wrote: > > > Hi - > > > > > > (I apologize for the ext4 question in an ext3 mailer, but I > couldn't > > > find a user list for ext4.) > > > > linux-ext4 at vger.kernel.org ext4 at vger.kernel.org> :) > > but that's ok. > > > > > > Saw that but thought it was a dev-only list, sorry. Next time :) > > *shrug* I think user questions are welcome too. At least I don't > mind. > > ... > > > dd fails as mentioned above. xfs_io errors too: > > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 > > pwrite64: File too large > > Oh. Well, then! Must be something else. > > oh, ok: > > sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); > > static loff_t ext4_max_size(int blkbits, int has_huge_files) > { > loff_t res; > loff_t upper_limit = MAX_LFS_FILESIZE; > > > > /* Sanity check against vm- & vfs- imposed limits */ > if (res > upper_limit) > res = upper_limit; > > return res; > } > > and: > > /* Page cache limit. The filesystems should put that into their > s_maxbytes > limits, otherwise bad things can happen in VM. */ > #if BITS_PER_LONG==32 > #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << > (BITS_PER_LONG-1))-1) > > so it's only giving us 31 bits of pages, not 32. This limits it to 8T > on a 32-bit machine with 4k pages. > > I'm not honestly sure if there is anything in the vm that can't > actually > cope with a 32-bit offset... but until proven otherwise, probably not > going to change this without a lot of testing & inspection. > > -Eric > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > > > Cheers, Andreas > > > > > > Cheers, Andreas
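For completeness, a small probe in the spirit of the xfs_io test used earlier in the thread (a sketch only; the file name is just an example, and it assumes it is built with -D_FILE_OFFSET_BITS=64 so that off_t is 64 bits on a 32-bit host). On the 32-bit ext4 setup discussed above it should fail with EFBIG, because ext4 clamps s_maxbytes to the MAX_LFS_FILESIZE page cache limit; on a 64-bit kernel the same write should simply create a sparse file slightly over 8 TiB:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const char *path = "bigfile-probe";        /* example name */
    const off_t offset = 8796093022208LL;      /* 8 TiB */
    char byte = 'x';

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Try to write a single byte at the 8 TiB offset, mirroring
     * 'xfs_io -c "pwrite 8T 1M"' and 'echo x >> bigfile' above. */
    if (pwrite(fd, &byte, 1, offset) < 0)
        printf("pwrite at 8 TiB failed: %s\n", strerror(errno));
    else
        printf("pwrite at 8 TiB succeeded\n");

    close(fd);
    unlink(path);   /* clean up the (possibly sparse) test file */
    return 0;
}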