From mirjafarali at gmail.com Sat Mar 6 16:07:42 2010 From: mirjafarali at gmail.com (MirJafar Ali) Date: Sat, 6 Mar 2010 10:07:42 -0600 Subject: data block timestamp ? Message-ID: Hello I am interested in knowing the sequence of datablocks request/serviced by ext2 filesystem to analyse the disk IO pattern. Can someone suggest if there are some software that can do this ? Thanks Mir -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Mon Mar 8 17:31:03 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Mon, 08 Mar 2010 11:31:03 -0600 Subject: data block timestamp ? In-Reply-To: References: Message-ID: <4B953457.1050904@redhat.com> MirJafar Ali wrote: > Hello > > I am interested in knowing the sequence of datablocks request/serviced > by ext2 filesystem to analyse > the disk IO pattern. > > Can someone suggest if there are some software that can do this ? Depending on what you want, blktrace may be helpful, if you want to know when IO happens and to which blocks. -Eric > Thanks > > Mir From mjtrac at gmail.com Tue Mar 9 01:23:12 2010 From: mjtrac at gmail.com (Mitch Trachtenberg) Date: Mon, 8 Mar 2010 17:23:12 -0800 Subject: problems with large directories? Message-ID: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> Hi, I have an application that deals with 100,000 to 1,000,000 image files. I initially structured it to use multiple directories, so that file 123456 would be stored in /12/34/123456. I'm now wondering if that's pointless, as it would simplify things to simply store the file in /123456. Can anyone indicate whether I'm gaining anything by using smaller directories in ext3/ext4? Thanks. Mitch -------------- next part -------------- An HTML attachment was scrubbed... URL: From rwheeler at redhat.com Tue Mar 9 03:14:03 2010 From: rwheeler at redhat.com (Ric Wheeler) Date: Mon, 08 Mar 2010 22:14:03 -0500 Subject: problems with large directories? In-Reply-To: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> Message-ID: <4B95BCFB.9010202@redhat.com> On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote: > Hi, > > I have an application that deals with 100,000 to 1,000,000 image files. > > I initially structured it to use multiple directories, so that file > 123456 would be stored in /12/34/123456. I'm now wondering if that's > pointless, as it would simplify things to simply store the file in /123456. > > Can anyone indicate whether I'm gaining anything by using smaller > directories in ext3/ext4? Thanks. > > Mitch > I think that breaking up your files into subdirectories makes it easier to navigate the tree and find files from a human point of view. Even better if the bytes reflect something like year/month/day/hour/min (assuming your pathname has a date based guid or similar encoding). You can have a million files in one large directory, but be careful to iterate and copy them in a sorted order (sorted by inode) to avoid nasty performance issues that are side effects of the way we hash file names in ext3/4. Good luck! Ric From sandeen at redhat.com Tue Mar 9 03:55:12 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Mon, 08 Mar 2010 21:55:12 -0600 Subject: problems with large directories? 
In-Reply-To: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> Message-ID: <4B95C6A0.8090502@redhat.com> Mitch Trachtenberg wrote: > Hi, > > I have an application that deals with 100,000 to 1,000,000 image files. > > I initially structured it to use multiple directories, so that file > 123456 would be stored in /12/34/123456. I'm now wondering if that's > pointless, as it would simplify things to simply store the file in /123456. > > Can anyone indicate whether I'm gaining anything by using smaller > directories in ext3/ext4? Thanks. > > Mitch > If you have one file per dir, that's a lot of dirs, and the time to search for new dir inode locations can get rather expensive as the fs fills, in my experience. You may also want to toy with setting the "topdir" flag on a dir; new directories -under- that topdir get spread around the block groups. New dirs under a non-topdir tend to stay closer to the parent. Finally, remember that ext2/3 has a limit of 32000 or so files per dir. ext4 lifts this restriction. -Eric From alex at alex.org.uk Tue Mar 9 07:11:32 2010 From: alex at alex.org.uk (Alex Bligh) Date: Tue, 09 Mar 2010 07:11:32 +0000 Subject: problems with large directories? In-Reply-To: <4B95C6A0.8090502@redhat.com> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> Message-ID: <4CD1DE054DB6ECC9FD485DD6@nimrod.local> --On 8 March 2010 21:55:12 -0600 Eric Sandeen wrote: > Finally, remember that ext2/3 has a limit of 32000 or so files per dir. My IMAP spool suggests this is false w.r.t. ext3 -- Alex Bligh From lists at nerdbynature.de Tue Mar 9 09:14:40 2010 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 9 Mar 2010 01:14:40 -0800 (PST) Subject: problems with large directories? In-Reply-To: <4CD1DE054DB6ECC9FD485DD6@nimrod.local> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> <4CD1DE054DB6ECC9FD485DD6@nimrod.local> Message-ID: On Tue, 9 Mar 2010 at 07:11, Alex Bligh wrote: > > Finally, remember that ext2/3 has a limit of 32000 or so files per dir. > My IMAP spool suggests this is false w.r.t. ext3 Did you use any special options when creating this ext3 filesystem? I've just tried but was only able to create 31998 directories in one directory. Christian. -- BOFH excuse #443: Zombie processes detected, machine is haunted. From bruno at wolff.to Tue Mar 9 13:16:46 2010 From: bruno at wolff.to (Bruno Wolff III) Date: Tue, 9 Mar 2010 07:16:46 -0600 Subject: problems with large directories? In-Reply-To: References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> <4CD1DE054DB6ECC9FD485DD6@nimrod.local> Message-ID: <20100309131646.GA17912@wolff.to> On Tue, Mar 09, 2010 at 01:14:40 -0800, Christian Kujau wrote: > On Tue, 9 Mar 2010 at 07:11, Alex Bligh wrote: > > > Finally, remember that ext2/3 has a limit of 32000 or so files per dir. > > My IMAP spool suggests this is false w.r.t. ext3 > > Did you use any special options when creating this ext3 filesystem? I've > just tried but was only able to create 31998 directories in one directory. You can create a lot of files (though things work slowly). I think there was a typo above and that it should have said dirs per dir, not files per dir. From alex at alex.org.uk Tue Mar 9 13:17:59 2010 From: alex at alex.org.uk (Alex Bligh) Date: Tue, 09 Mar 2010 13:17:59 +0000 Subject: problems with large directories? 
In-Reply-To: References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> <4CD1DE054DB6ECC9FD485DD6@nimrod.local> Message-ID: <74F787A596EC01458B9FB816@host122.msm.che.vodafone> --On 9 March 2010 01:14:40 -0800 Christian Kujau wrote: >> > Finally, remember that ext2/3 has a limit of 32000 or so files per dir. >> My IMAP spool suggests this is false w.r.t. ext3 > > Did you use any special options when creating this ext3 filesystem? I've > just tried but was only able to create 31998 directories in one directory. As Ted pointed out off list, the limit is the number of subdirectories in a directory, not the number of files in a directory. -- Alex Bligh From criley at erad.com Tue Mar 9 14:36:42 2010 From: criley at erad.com (Charles Riley) Date: Tue, 9 Mar 2010 09:36:42 -0500 (EST) Subject: Fwd: problems with large directories? In-Reply-To: <1741306.16141268145265613.JavaMail.root@boardwalk2.erad.com> Message-ID: <800886.16211268145402647.JavaMail.root@boardwalk2.erad.com> Sorry, I meant to send this to the list, not just Ric. ----- Forwarded Message ----- From: "Charles Riley" To: "Ric Wheeler" Sent: Tuesday, March 9, 2010 9:34:25 AM GMT -05:00 US/Canada Eastern Subject: Re: problems with large directories? ----- "Ric Wheeler" wrote: > On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote: > > Hi, > > > > I have an application that deals with 100,000 to 1,000,000 image > files. > > > > I initially structured it to use multiple directories, so that file > > 123456 would be stored in /12/34/123456. I'm now wondering if > that's > > pointless, as it would simplify things to simply store the file in > /123456. > > > > Can anyone indicate whether I'm gaining anything by using smaller > > directories in ext3/ext4? Thanks. > > > > Mitch > > > > I think that breaking up your files into subdirectories makes it > easier to > navigate the tree and find files from a human point of view. Even > better if the > bytes reflect something like year/month/day/hour/min (assuming your > pathname has > a date based guid or similar encoding). > > You can have a million files in one large directory, but be careful to > iterate > and copy them in a sorted order (sorted by inode) to avoid nasty > performance > issues that are side effects of the way we hash file names in ext3/4. > > Good luck! > > Ric > Hi Ric, Can you elaborate on the performance issues you mention above? We use rhel4/ext3 on our pacs (medical imaging) servers. We ran into the 32k limit a couple of years back when our first customer hit the 31,999th study, at which point we implemented a directory hashing algorithm. Now we store images for a given patient's study in a path something like: aa/ab/ac/1.2.3/ where 1.2.3 is the dicom study instance uid (a wwuid for a medical study) and aa/ab/ac/ is the directory hash we derived from that study instance uid. The above is a simplified example for illustration purposes only, 1.2.3 does not really hash to aa/ab/ac/. Within aa/ab/ac/1.2.3/ there can be anywhere from three to a couple of thousand DICOM object files. Images are initially created in a non-hashed temporary directory and then copied to their permanent home in e.g. aa/ab/ac/1.2.3/ In this context, would we gain filesystem performance by sorting by inode before copying? Do the performance issues you refer to only apply to the copy process itself or do they contribute to long term filesystem performance? 
Thanks for any insight you can provide, Charles From sandeen at redhat.com Tue Mar 9 16:32:19 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Tue, 09 Mar 2010 10:32:19 -0600 Subject: problems with large directories? In-Reply-To: <74F787A596EC01458B9FB816@host122.msm.che.vodafone> References: <6c52fee1003081723l59fdf52fr1b6a508f06d43c4e@mail.gmail.com> <4B95C6A0.8090502@redhat.com> <4CD1DE054DB6ECC9FD485DD6@nimrod.local> <74F787A596EC01458B9FB816@host122.msm.che.vodafone> Message-ID: <4B967813.8010601@redhat.com> Alex Bligh wrote: > > --On 9 March 2010 01:14:40 -0800 Christian Kujau > wrote: > >>>> Finally, remember that ext2/3 has a limit of 32000 or so files per dir. >>> My IMAP spool suggests this is false w.r.t. ext3 >> Did you use any special options when creating this ext3 filesystem? I've >> just tried but was only able to create 31998 directories in one directory. > > As Ted pointed out off list, the limit is the number of subdirectories > in a directory, not the number of files in a directory. Argh you are right, sorry, brain burp. :) -Eric From kyle at kbrandt.com Tue Mar 9 17:10:16 2010 From: kyle at kbrandt.com (Kyle Brandt) Date: Tue, 9 Mar 2010 12:10:16 -0500 Subject: fstab Pass Column and forced disk checks Message-ID: <9ee385321003090910r6fb0d682p6f6edc18f32ed5b8@mail.gmail.com> If I have the 6th column in fstab (the pass column) set to 0, does that mean disk checks will never be forced at boot regardless of anything like File System State, Mount Count, and Check Interval on the file system itself, or are there exceptions to this? I know `man fstab` says: If the sixth field is not present or zero, a value of zero is returned and fsck will assume that the filesystem does not need to be checked. But I wasn't sure if the fsck might be triggered in other ways during boot. Thank you, Kyle Brandt http://www.kbrandt.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From criley at erad.com Tue Mar 9 17:53:50 2010 From: criley at erad.com (Charles Riley) Date: Tue, 9 Mar 2010 12:53:50 -0500 (EST) Subject: fstab Pass Column and forced disk checks In-Reply-To: <24688412.17501268157223648.JavaMail.root@boardwalk2.erad.com> Message-ID: <23454625.17521268157230637.JavaMail.root@boardwalk2.erad.com> If the pass column is 0, no automatic check is done. It's been my experience that setting it that way is a bad idea though, unless you plan on periodic manual fscks. ----- "Kyle Brandt" wrote: > If I have the 6th column in fstab (the pass column) set to 0, does > that mean disk checks will never be forced at boot regardless of > anything like File System State, Mount Count, and Check Interval on > the file system itself, or are there exceptions to this? > > I know `man fstab` says: > If the sixth field is not present or zero, a value of zero is returned > and fsck will assume that the filesystem does not need to be checked. > > But I wasn't sure if the fsck might be triggered in other ways during > boot. 
> > Thank you, > Kyle Brandt > http://www.kbrandt.com > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From adilger at sun.com Tue Mar 9 20:54:18 2010 From: adilger at sun.com (Andreas Dilger) Date: Tue, 09 Mar 2010 13:54:18 -0700 Subject: fstab Pass Column and forced disk checks In-Reply-To: <9ee385321003090910r6fb0d682p6f6edc18f32ed5b8@mail.gmail.com> References: <9ee385321003090910r6fb0d682p6f6edc18f32ed5b8@mail.gmail.com> Message-ID: On 2010-03-09, at 10:10, Kyle Brandt wrote: > If I have the 6th column in fstab (the pass column) set to 0, does > that mean disk checks will never be forced at boot regardless of > anything like File System State, Mount Count, and Check Interval on > the file system itself, or are there exceptions to this? No, there are many filesystems which don't have/allow checking so the top-level fsck tool needs to honor this. I would never recommend disabling e2fsck on a system, unless you are running in an HA environment where it is not safe to do automated checks at startup time. I also do not recommend that people disable the periodic e2fsck checks, because people forget to check their filesystems, and the kernel can sometimes spread corruption further if it reads garbage from the disk. If you dislike the periodic (time/mount count) checks that e2fsck forces at boot, I would suggest using the "lvcheck" script I posted to linux-ext4 some months ago (assuming you are using LVM, which most people are these days), and will attach here again. That allows you to periodically check the filesystem in the background to detect corruptions on disk, without any concern that the next reboot will take a long time. It would be great to get these included as part of the lvm2 package, and have lvcheck installed in /etc/cron.weekly to automatically check all the LVs configured on the system, and solve the "we don't like periodic checks at boot" problem in a way that is still robust to the errors that will undoubtably appear on disk at one point or another. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. -------------- next part -------------- A non-text attachment was scrubbed... Name: lvcheck Type: application/octet-stream Size: 10785 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: lvcheck.conf Type: application/octet-stream Size: 1242 bytes Desc: not available URL: From rwheeler at redhat.com Wed Mar 10 01:51:20 2010 From: rwheeler at redhat.com (Ric Wheeler) Date: Tue, 09 Mar 2010 20:51:20 -0500 Subject: Fwd: problems with large directories? In-Reply-To: <800886.16211268145402647.JavaMail.root@boardwalk2.erad.com> References: <800886.16211268145402647.JavaMail.root@boardwalk2.erad.com> Message-ID: <4B96FB18.20300@redhat.com> On 03/09/2010 09:36 AM, Charles Riley wrote: > Sorry, I meant to send this to the list, not just Ric. > > > ----- Forwarded Message ----- > From: "Charles Riley" > To: "Ric Wheeler" > Sent: Tuesday, March 9, 2010 9:34:25 AM GMT -05:00 US/Canada Eastern > Subject: Re: problems with large directories? > > > > > ----- "Ric Wheeler" wrote: > >> On 03/08/2010 08:23 PM, Mitch Trachtenberg wrote: >>> Hi, >>> >>> I have an application that deals with 100,000 to 1,000,000 image >> files. >>> >>> I initially structured it to use multiple directories, so that file >>> 123456 would be stored in /12/34/123456. 
I'm now wondering if >> that's >>> pointless, as it would simplify things to simply store the file in >> /123456. >>> >>> Can anyone indicate whether I'm gaining anything by using smaller >>> directories in ext3/ext4? Thanks. >>> >>> Mitch >>> >> >> I think that breaking up your files into subdirectories makes it >> easier to >> navigate the tree and find files from a human point of view. Even >> better if the >> bytes reflect something like year/month/day/hour/min (assuming your >> pathname has >> a date based guid or similar encoding). >> >> You can have a million files in one large directory, but be careful to >> iterate >> and copy them in a sorted order (sorted by inode) to avoid nasty >> performance >> issues that are side effects of the way we hash file names in ext3/4. >> >> Good luck! >> >> Ric >> > > Hi Ric, > > Can you elaborate on the performance issues you mention above? > > We use rhel4/ext3 on our pacs (medical imaging) servers. > We ran into the 32k limit a couple of years back when our first customer hit the 31,999th study, at which point we implemented a directory hashing algorithm. Now we store images for a given patient's study in a path something like: > aa/ab/ac/1.2.3/ > > where 1.2.3 is the dicom study instance uid (a wwuid for a medical study) > and aa/ab/ac/ is the directory hash we derived from that study instance uid. > > The above is a simplified example for illustration purposes only, 1.2.3 does not really hash to aa/ab/ac/. > Within aa/ab/ac/1.2.3/ there can be anywhere from three to a couple of thousand DICOM object files. > Images are initially created in a non-hashed temporary directory and then copied to their permanent home in e.g. aa/ab/ac/1.2.3/ > > In this context, would we gain filesystem performance by sorting by inode before copying? > Do the performance issues you refer to only apply to the copy process itself or do they contribute to long term filesystem performance? > > Thanks for any insight you can provide, > > Charles > Hi Charles, The big issue with touching a lot of files (reading, stating, unlinking them) in ext3/4 is that readdir gives us back a list in effectively random order. This makes the accesses very seeky. Not an issue with a handful of files (say a couple of hundred), but when you get to thousands (or millions) of files, performance really tanks. To avoid that, you can sort the list returned by readdir() into ascending order by inode in reasonably large batches and get your performance up. Several core tools have been looking at doing this automatically, but it is important for any home grown applications as well. In your scenario with the directory hierarchy, I suspect that you won't hit this. If you had one very large directory, you certainly would. Best regards, Ric From kyle at kbrandt.com Wed Mar 10 12:43:54 2010 From: kyle at kbrandt.com (Kyle Brandt) Date: Wed, 10 Mar 2010 07:43:54 -0500 Subject: fstab Pass Column and forced disk checks In-Reply-To: References: <9ee385321003090910r6fb0d682p6f6edc18f32ed5b8@mail.gmail.com> Message-ID: <9ee385321003100443vb02284ay5fcba96363e31aec@mail.gmail.com> Thank you everyone for your responses. I agree with Andreas about not disabling the checks in general, but in this case I don't have the final word. I will look into the lvm script, is that limited to ext4 or does it work with ext3 as well? 
I cross posted this question at http://serverfault.com/questions/120804/pass-column-of-fstab/120815#120815and someone noticed that there is one exception (not a fstab exception though) on some distributions (RHEL5). That is if /forcefsck file system exists the check will still happen because of /etc/rc.d/rc.sysinit if [ -f /forcefsck ] || strstr "$cmdline" forcefsck ; then fsckoptions="-f $fsckoptions" Thanks! Kyle On 3/9/10, Andreas Dilger wrote: > > On 2010-03-09, at 10:10, Kyle Brandt wrote: > >> If I have the 6th column in fstab (the pass column) set to 0, does that >> mean disk checks will never be forced at boot regardless of anything like >> File System State, Mount Count, and Check Interval on the file system >> itself, or are there exceptions to this? >> > > No, there are many filesystems which don't have/allow checking so the > top-level fsck tool needs to honor this. I would never recommend disabling > e2fsck on a system, unless you are running in an HA environment where it is > not safe to do automated checks at startup time. I also do not recommend > that people disable the periodic e2fsck checks, because people forget to > check their filesystems, and the kernel can sometimes spread corruption > further if it reads garbage from the disk. > > If you dislike the periodic (time/mount count) checks that e2fsck forces at > boot, I would suggest using the "lvcheck" script I posted to linux-ext4 some > months ago (assuming you are using LVM, which most people are these days), > and will attach here again. That allows you to periodically check the > filesystem in the background to detect corruptions on disk, without any > concern that the next reboot will take a long time. > > It would be great to get these included as part of the lvm2 package, and > have lvcheck installed in /etc/cron.weekly to automatically check all the > LVs configured on the system, and solve the "we don't like periodic checks > at boot" problem in a way that is still robust to the errors that will > undoubtably appear on disk at one point or another. > > Cheers, Andreas > -- > Andreas Dilger > Sr. Staff Engineer, Lustre Group > Sun Microsystems of Canada, Inc. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Sean.D.McCauliff at nasa.gov Wed Mar 10 19:23:25 2010 From: Sean.D.McCauliff at nasa.gov (Sean McCauliff) Date: Wed, 10 Mar 2010 11:23:25 -0800 Subject: Finding the holes in sparse files. Message-ID: <4B97F1AD.2060804@nasa.gov> Is there a way to find the holes in sparse files, other than assuming contiguous blocks of zeroes are holes? Thanks, Sean From sandeen at redhat.com Wed Mar 10 19:43:09 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Wed, 10 Mar 2010 13:43:09 -0600 Subject: Finding the holes in sparse files. In-Reply-To: <4B97F1AD.2060804@nasa.gov> References: <4B97F1AD.2060804@nasa.gov> Message-ID: <4B97F64D.3000202@redhat.com> Sean McCauliff wrote: > Is there a way to find the holes in sparse files, other than assuming > contiguous blocks of zeroes are holes? yes, programatically you can use a couple ioctls: fibmap (block-at-a-time) or fiemap in newer kernels. If you want a commandline, try filefrag -v. For ioctl usage examples, take a look at how filefrag is implemented. 
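As a rough sketch (not the actual filefrag source, and with only minimal error handling), calling the FIEMAP ioctl yourself from C looks something like the below. It assumes a kernel new enough to ship linux/fiemap.h (2.6.28 or so) and only asks for the first 32 extents, so a real tool would keep calling it, advancing fm_start, until an extent comes back flagged FIEMAP_EXTENT_LAST:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>       /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>   /* struct fiemap, struct fiemap_extent */

int main(int argc, char **argv)
{
	/* room for 32 extents after the fixed fiemap header */
	size_t sz = sizeof(struct fiemap) + 32 * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);
	unsigned int i;
	int fd;

	if (argc != 2 || !fm)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	fm->fm_start = 0;			/* map from the start of the file... */
	fm->fm_length = ~0ULL;			/* ...through to EOF */
	fm->fm_flags = FIEMAP_FLAG_SYNC;	/* flush delalloc so extents are real */
	fm->fm_extent_count = 32;		/* how many fm_extents[] we allocated */

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FS_IOC_FIEMAP");
		return 1;
	}

	for (i = 0; i < fm->fm_mapped_extents; i++) {
		struct fiemap_extent *fe = &fm->fm_extents[i];
		printf("logical %llu physical %llu length %llu flags 0x%x\n",
		       (unsigned long long)fe->fe_logical,
		       (unsigned long long)fe->fe_physical,
		       (unsigned long long)fe->fe_length,
		       fe->fe_flags);
	}

	free(fm);
	close(fd);
	return 0;
}

Any gap between one extent's fe_logical + fe_length and the next extent's fe_logical (or EOF) is a hole.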
# dd if=/dev/zero of=testfile bs=4k count=1; dd if=/dev/zero of=testfile conv=notrunc bs=4k seek=4 count=1 # sync # filefrag -v testfile Filesystem type is: ef53 File size of testfile is 20480 (5 blocks, blocksize 4096) ext logical physical expected length flags 0 0 1829913 1 1 4 1802777 1829913 1 eof testfile: 2 extents found the logical+length gap shows you that there was a hole in there Andreas has patches to make it still clearer in the table output. -Eric > Thanks, > Sean > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users From cax0cn at gmail.com Thu Mar 11 02:47:23 2010 From: cax0cn at gmail.com (Joseph Chen) Date: Thu, 11 Mar 2010 10:47:23 +0800 Subject: Finding the holes in sparse files. In-Reply-To: <4B97F64D.3000202@redhat.com> References: <4B97F1AD.2060804@nasa.gov> <4B97F64D.3000202@redhat.com> Message-ID: <8d423b321003101847t49566946lc87110e82bb5e81f@mail.gmail.com> Check my post here How to Check Sparse Files with Perl For any issues plesae let me know :) J On Thu, Mar 11, 2010 at 3:43 AM, Eric Sandeen wrote: > Sean McCauliff wrote: > > Is there a way to find the holes in sparse files, other than assuming > > contiguous blocks of zeroes are holes? > > yes, programatically you can use a couple ioctls: > fibmap (block-at-a-time) or fiemap in newer kernels. > > If you want a commandline, try filefrag -v. > > For ioctl usage examples, take a look at how filefrag is implemented. > > # dd if=/dev/zero of=testfile bs=4k count=1; dd if=/dev/zero of=testfile > conv=notrunc bs=4k seek=4 count=1 > # sync > # filefrag -v testfile > Filesystem type is: ef53 > File size of testfile is 20480 (5 blocks, blocksize 4096) > ext logical physical expected length flags > 0 0 1829913 1 > 1 4 1802777 1829913 1 eof > testfile: 2 extents found > > the logical+length gap shows you that there was a hole in there > > Andreas has patches to make it still clearer in the table output. > > -Eric > > > Thanks, > > Sean > > > > _______________________________________________ > > Ext3-users mailing list > > Ext3-users at redhat.com > > https://www.redhat.com/mailman/listinfo/ext3-users > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > -- Sponser and operater: Linux monitoring solution: http://www.admon.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From mirjafarali at gmail.com Tue Mar 16 20:43:43 2010 From: mirjafarali at gmail.com (MirJafar Ali) Date: Tue, 16 Mar 2010 15:43:43 -0500 Subject: Ext4 File System: newbee question Message-ID: Hello, I have installed Ubuntu 9.10 and it has ext4 filesystem. I have one very elementary question. When I say my filesystem is "ext4", which directories are part of it. I mean from the root I can see some directories such as /proc, /tmp, /dev etc. Are they store on the disk which have formatted with ext4, of certain files resides somewhere else. I am only sure only about /home directory because I keep my disk mobile and data goes with me all the time. Please execuse me for such a simple question. Mir -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From adilger at sun.com Wed Mar 17 18:45:45 2010 From: adilger at sun.com (Andreas Dilger) Date: Wed, 17 Mar 2010 12:45:45 -0600 Subject: Ext4 File System: newbee question In-Reply-To: References: Message-ID: <2C26629B-2AB8-4955-823F-F6D46A25C7BA@sun.com> On 2010-03-16, at 14:43, MirJafar Ali wrote: > I have installed Ubuntu 9.10 and it has ext4 filesystem. I have one > very elementary question. When I say my filesystem is "ext4", which > directories are part of it. I mean from the root I can see some > directories such as /proc, /tmp, /dev etc. Are they store on the > disk which have formatted with ext4, of certain files resides > somewhere else. > > I am only sure only about /home directory because I keep my disk > mobile and data goes with me all the time. Please run "mount" and "df", which show the mountpoints for each filesystem. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From pg_ext3 at ext3.for.sabi.co.UK Wed Mar 17 20:56:15 2010 From: pg_ext3 at ext3.for.sabi.co.UK (Peter Grandi) Date: Wed, 17 Mar 2010 20:56:15 +0000 Subject: Ext4 File System: newbee question In-Reply-To: References: Message-ID: <19361.16879.463876.940171@tree.ty.sabi.co.uk> > [ ... ] my filesystem is "ext4", which directories are part of > it. 'ext4' is a file system *type*. You can have many filesystems of that type, each with its own tree of directories and files etc. Each filesystem of type 'ext4' will be stored on a particular storage device or a subsection of one, and will have some kind of indentifying label. Usually each filesystem tree will be stored in a partition on some disk, and will be "mounted" on (that is, its directories and files will appear under) some directory. You can see a list of those by reading the file '/proc/mounts'; in a terminal shell the command 'grep ext4 /proc/mounts' will print a list of all the currently active ("mounted") devices containing an 'ext4' filesystem. For a list of the more important ones in a more readable format run in a terminal shell the command 'df -T -BG -a'. > I mean from the root I can see some directories such as /proc, > /tmp, /dev etc. Those 3 directories are usually the mount points for special file system types, and almost never 'ext4' type. > Are they store on the disk which have formatted with ext4, of > certain files resides somewhere else. Some filesystems are stored only in memory as they are not persistent, as they represent temporary entities. > I am only sure only about /home directory because I keep my > disk mobile and data goes with me all the time. Most likely both the devices mounted on the "/" and "/home" directories contain filesystems of type 'ext4'. 
There are several tutorials online and in print that explain what is a file system type, a filesystem instance of a type, and the storage ("block device") holding that instance. From mirjafarali at gmail.com Thu Mar 18 22:48:45 2010 From: mirjafarali at gmail.com (MirJafar Ali) Date: Thu, 18 Mar 2010 17:48:45 -0500 Subject: DataBlock information Help Message-ID: Hello, I am using e2fsprogs and found it very nice. I want to know datablocks for a given a given file. I was going through the document and did lots of google search, but I am not sure what is the best way to get this information. Which "e2fsprogs" function can give all the datablock IDs. There is one function i.e. ext2fs_block_iterate, but I am not sure how it works. It wasn't clear from the document. Can someone please help without getting angry on this simple question ? Mir -------------- next part -------------- An HTML attachment was scrubbed... URL: From adilger at sun.com Thu Mar 18 23:13:35 2010 From: adilger at sun.com (Andreas Dilger) Date: Thu, 18 Mar 2010 17:13:35 -0600 Subject: DataBlock information Help In-Reply-To: References: Message-ID: <7F078568-B44E-4AEE-8DB3-D4BB40C0B5D2@sun.com> On 2010-03-18, at 16:48, MirJafar Ali wrote: > I am using e2fsprogs and found it very nice. I want to know > datablocks for a given a given file. > I was going through the document and did lots of google search, but > I am not sure what is the best way to get this information. Which > "e2fsprogs" function can give all the datablock IDs. There is one > function i.e. ext2fs_block_iterate, but I am not sure how it works. > It wasn't clear from the > document. If you use "dumpe2fs -c -R 'stat /path/to/file' /dev/XXX", where /path/ to/file is the filesystem relative pathname, that will dump all of the blocks. On newer kernels you can also use "filefrag -v" to list the blocks, though the output format is less than ideal right now. Programatically, on a newer kernel you can use the fiemap() API to get the list of all blocks for any file, regardless of the filesystem type. Cheers, Andreas -- Andreas Dilger Sr. Staff Engineer, Lustre Group Sun Microsystems of Canada, Inc. From sandeen at redhat.com Fri Mar 19 02:16:28 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Thu, 18 Mar 2010 21:16:28 -0500 Subject: DataBlock information Help In-Reply-To: References: Message-ID: <4BA2DE7C.4020105@redhat.com> MirJafar Ali wrote: > Hello, > > I am using e2fsprogs and found it very nice. I want to know datablocks > for a given a given file. > I was going through the document and did lots of google search, but I am > not sure what is the > best way to get this information. Which "e2fsprogs" function can give > all the datablock IDs. There > is one function i.e. ext2fs_block_iterate, but I am not sure how it > works. It wasn't clear from the > document. > > Can someone please help without getting angry on this simple question ? > > Mir > >From the commandline, you can just use filefrag (-v) If you want to do it programatically, you can look at how filefrag uses the FIBMAP and/or FIEMAP ioctls. If you want to do it with the filesystem unmounted, you can look at how the debugfs "stat" command shows you the blocks. -Eric From mirjafarali at gmail.com Fri Mar 19 02:55:41 2010 From: mirjafarali at gmail.com (MirJafar Ali) Date: Thu, 18 Mar 2010 21:55:41 -0500 Subject: File Age emulation Message-ID: Hello, I have a new hard drive and need to do some study on filesystem aging. 
Is there any simulator which can fill the drive with some realistic file system behaviour ( size, age etc) ? Thanks. Mir -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Fri Mar 19 04:39:37 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Thu, 18 Mar 2010 23:39:37 -0500 Subject: File Age emulation In-Reply-To: References: Message-ID: <4BA30009.1070503@redhat.com> MirJafar Ali wrote: > Hello, > > I have a new hard drive and need to do some study on filesystem aging. > Is there any > simulator which can fill the drive with some realistic file system > behaviour ( size, age etc) ? > > Thanks. > > Mir > Are these still questions for a class you are taking? Ted asked you that earlier, but I see there was no reply... You have had a very interesting collection of questions for the list, and I wonder what the goal might be for you... rather than asking seemingly random questions, is there a larger goal you are trying to accomplish here, or are we possibly just doing your homework? -Eric From sandeen at redhat.com Fri Mar 19 13:57:27 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Fri, 19 Mar 2010 08:57:27 -0500 Subject: ext2 IF windows Xp Pro with Ubuntu 9.10 64amd In-Reply-To: References: Message-ID: <4BA382C7.9090308@redhat.com> Chris Taylor wrote: > Hi To all > I have just built a new System > 3.4Gb AMD Athlon 64bit > 1GB RAM > 500Gb SATA HDD > Disk /dev/sda: 500.1 GB, 500107862016 bytes > 255 heads, 63 sectors/track, 60801 cylinders > Units = cylinders of 16065 * 512 = 8225280 bytes > Disk identifier: 0x00e600e6 > > Device Boot Start End Blocks Id System > /dev/sda1 * 1 1912 15358108+ 7 HPFS/NTFS > /dev/sda2 1913 3824 15358140 83 Linux > /dev/sda3 3825 60801 457667752+ 5 Extended > /dev/sda5 60194 60801 4883760 82 Linux swap / > Solaris > /dev/sda6 3825 60193 452783929+ 83 Linux > > Partition table entries are not in disk order. > On sda1 I have windows Xp Pro sp2 > on sda2 I have Ubuntu 9.10 64bit just upgraded via web > On sda6 I have my home partition (according to gparted) > I have installed EXT2IFS so I can have XP and Ubuntu use the same place > for files. > Every time I try to access F: drive from Windows I get "do you want to > format the drive " I'm thinking that I have a Inodes problem, Thinking > they are 256 not 128. I have tried to format the drive with Gparted to > EXT3 a few times and get the same problem still > " Large inodes > The current version of Ext2 IFS only mounts volumes with an inode size of > 128 like old Linux kernels have. A word of warning, at least one windows driver for extN has been known in the past to corrupt filesytems. Since it's not open source, we can't debug or fix it. Maybe it's fixed now, but I don't know. > Some very new Linux distributions create an Ext3 file systems with inodes > of 256 bytes. Ext2 IFS 1.11 is not able to access them. > > Currently there is only one workaround: Please back up the files and > create the Ext3 file system again. Give the mkfs.ext3 tool the -I 128 > switch. Finally, restore all files with the backup. " > > If I'm write I need to unmount the /Home partition but I don't know how > to do this :-( > > Please if you would be so kind as to help me with any info > Chris It's not really an ext3-specific question, but you'll need to unbusy the /home mountpoint to unmount it to reformat it; booting into single-user mode would allow you to do that. 
-Eric From jcubedla at gmail.com Sat Mar 20 05:04:55 2010 From: jcubedla at gmail.com (John) Date: Fri, 19 Mar 2010 22:04:55 -0700 Subject: ext2 IF windows Xp Pro with Ubuntu 9.10 64amd In-Reply-To: References: Message-ID: Hi Chris, On Fri, Feb 26, 2010 at 9:13 PM, Chris Taylor < chris.j.taylor at optusnet.com.au> wrote: > Some very new Linux distributions create an Ext3 file systems with inodes > of 256 bytes. Ext2 IFS 1.11 is not able to access them. > You may want to look at Ext2 Fsd which doesn't have that limitation. John -------------- next part -------------- An HTML attachment was scrubbed... URL: From michaelm at plumbersstock.com Tue Mar 23 18:16:01 2010 From: michaelm at plumbersstock.com (Michael McGlothlin) Date: Tue, 23 Mar 2010 12:16:01 -0600 Subject: File caching? Message-ID: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> I've been asked to cache some high traffic files on one of our server. Is there an easy way to get ext3/ext4 filesystems to cache several GB of files in memory at once? I'd like writes to happen normally but reads to happen from RAM. (We have plenty of RAM so that isn't an issue.) If that isn't possible I can cache the files myself. Does the filesystem keep a cache in memory of the file attributes such as modification time? So if I check for a change will the disk actually have to physically move to check the mod time? Thanks, Michael McGlothlin -------------- next part -------------- An HTML attachment was scrubbed... URL: From davids at webmaster.com Tue Mar 23 18:52:53 2010 From: davids at webmaster.com (David Schwartz) Date: Tue, 23 Mar 2010 11:52:53 -0700 Subject: File caching? In-Reply-To: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> References: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> Message-ID: <005c01cacaba$0c7dcfa0$25796ee0$@com> Michael McGlothlin wrote: > I've been asked to cache some high traffic files on one of our server. > Is there an easy way to get ext3/ext4 filesystems to cache several GB > of files in memory at once? I'd like writes to happen normally but reads > to happen from RAM. (We have plenty of RAM so that isn't an issue.) > If that isn't possible I can cache the files myself. Does the filesystem > keep a cache in memory of the file attributes such as modification time? > So if I check for a change will the disk actually have to physically move > to check the mod time? I would first investigate whether your web server has some specific way to do this. Failing that, I strongly recommend just letting the disk cache do its job. If they really are frequently-accessed, they will stay in cache if sufficient RAM is available anyway. I would only suggest going further if you have specific latency requirements. If you do, I'd recommend simply using a separate program to map the files and then lock the pages in memory. The 'memlockd' program can do this. I'm not sure how well it handles file changes, but it shouldn't be difficult to modify it to restart if any file changes. The other possibility is to put the files on a ramdisk. You can use a scheduled script to update them from an on-disk copy if needed. Linux has good stat caching, so the need to move the disk to check the modification time will only occur if that information was pushed out of cache. DS From michaelm at plumbersstock.com Tue Mar 23 19:16:57 2010 From: michaelm at plumbersstock.com (Michael McGlothlin) Date: Tue, 23 Mar 2010 13:16:57 -0600 Subject: File caching? 
In-Reply-To: <005c01cacaba$0c7dcfa0$25796ee0$@com> References: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> <005c01cacaba$0c7dcfa0$25796ee0$@com> Message-ID: <5790d5b71003231216p6ee172dcx2ae66e2cbd53f1af@mail.gmail.com> This isn't for a web server although we might apply the same approach to that if this speeds things up a lot. I was going to store the cached copy on a RAM disk as many of the files are larger than the 1MB limit of memlockd and I don't feel like coming up with my own solution if I can avoid it. Is there a way to know how much RAM is being used for file cache or to tell it to use more? If the server has 128GB of RAM and typically uses half of that for it's actual work will it use the rest as file cache? Likewise is there a way to track/test if file stats are being pushed out of cache a lot? We've been considering switching to SSD or RAM drives but it seems they'd always be slower than system RAM and we haven't found a product that can affordably store sufficient data. I couldn't find a product that just sits between the disk and controller, or a controller that does this itself, and adds a large RAM-based file cache either. Thanks, Michael McGlothlin On Tue, Mar 23, 2010 at 12:52 PM, David Schwartz wrote: > > Michael McGlothlin wrote: > > > I've been asked to cache some high traffic files on one of our server. > > Is there an easy way to get ext3/ext4 filesystems to cache several GB > > of files in memory at once? I'd like writes to happen normally but reads > > to happen from RAM. (We have plenty of RAM so that isn't an issue.) > > > If that isn't possible I can cache the files myself. Does the filesystem > > keep a cache in memory of the file attributes such as modification time? > > So if I check for a change will the disk actually have to physically move > > to check the mod time? > > I would first investigate whether your web server has some specific way to > do this. Failing that, I strongly recommend just letting the disk cache do > its job. If they really are frequently-accessed, they will stay in cache if > sufficient RAM is available anyway. I would only suggest going further if > you have specific latency requirements. > > If you do, I'd recommend simply using a separate program to map the files > and then lock the pages in memory. The 'memlockd' program can do this. I'm > not sure how well it handles file changes, but it shouldn't be difficult to > modify it to restart if any file changes. > > The other possibility is to put the files on a ramdisk. You can use a > scheduled script to update them from an on-disk copy if needed. > > Linux has good stat caching, so the need to move the disk to check the > modification time will only occur if that information was pushed out of > cache. > > DS > > > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From balu.manyam at gmail.com Thu Mar 25 14:20:27 2010 From: balu.manyam at gmail.com (Balu manyam) Date: Thu, 25 Mar 2010 19:50:27 +0530 Subject: ext3 corruption Message-ID: <995392221003250720j75281ebvdff04cd826c0cea4@mail.gmail.com> hey ext3 gurus - i am desperately looking for some help on a problem where my ext3 filesystem got corrupted my beyond repair the issue i saw the FS size was much less the logical volume size - when we tried to mount it .....so we attempted an fsck - we lost all the data these are the messages we got before the data corruption happened the hal.hotplug seems to have triggered something - the filesystem is on an LVM2 volume with multipathing managing the hba paths to EMC does this ring a bell with anyone ? Feb 24 05:25:27 hal.hotplug[17949]: timout(10000 ms) waiting for /block/dm-24 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=17334189504, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=134217688, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=17586972680, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=8777633304, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=10317744712, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=19100288824, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=18694536512, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=25796951120, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=18425691224, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=15698503632, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device Feb 24 05:26:09 kernel: dm-19: rw=0, want=12067501136, limit=31457280 Feb 24 05:26:09 kernel: attempt to access beyond end of device here are some errors from fsck in messages file Feb 24 08:31:02 kernel: EXT3-fs error (device dm-18): ext3_readdir: bad entry in directory #2: inode out of bounds - offset =44, inode=59047937, rec_len=16, name_len=8 thanks!!! Balu -------------- next part -------------- An HTML attachment was scrubbed... URL: From arun at bvinetworks.com Fri Mar 26 18:52:05 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 11:52:05 -0700 Subject: Ext4 and large (>8TB) files Message-ID: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> Hi - (I apologize for the ext4 question in an ext3 mailer, but I couldn't find a user list for ext4.) 
Per my understanding, ext4 can support file sizes upto 16 TiB if you use 4k blocks. I have a logical volume which uses ext4 with a 4k block size but I am unable to create files that are 8TiB (8796093022208 bytes) or larger. [root at camanoe] ls -l total 8589935388 -rw-rw---- 1 root root 8796093022207 2010-03-26 11:43 bigfile [root at camanoe] echo x >> bigfile -bash: echo: write error: File too large [root at camanoe] df -h . Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg-mysql_vol 15T 8.1T 5.6T 60% /mysql [root at camanoe]# tune2fs -l /dev/vg/mysql_vol | grep "Block size" Block size: 4096 [root at camanoe]# uname -a Linux camanoe 2.6.29.4-167.fc11.i686.PAE #1 SMP Wed May 27 17:28:22 EDT 2009 i686 i686 i386 GNU/Linux I'm probably doing something wrong here, but can't figure it out. Any ideas guys? Thanks in advance. Arun -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Fri Mar 26 19:16:24 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Fri, 26 Mar 2010 14:16:24 -0500 Subject: Ext4 and large (>8TB) files In-Reply-To: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> Message-ID: <4BAD0808.1040407@redhat.com> On 03/26/2010 01:52 PM, Arun Nair wrote: > Hi - > > (I apologize for the ext4 question in an ext3 mailer, but I couldn't > find a user list for ext4.) linux-ext4 at vger.kernel.org :) but that's ok. > Per my understanding, ext4 can support file sizes upto 16 TiB if you use > 4k blocks. I have a logical volume which uses ext4 with a 4k block size > but I am unable to create files that are 8TiB (8796093022208 bytes) or > larger. > > [root at camanoe] ls -l > total 8589935388 > -rw-rw---- 1 root root 8796093022207 2010-03-26 11:43 bigfile > > [root at camanoe] echo x >> bigfile > -bash: echo: write error: File too large Perhaps echo isn't using largefile semantics? Is this the first test you did, or is echo the simple testcase, and something else failed? It works for me on rawhide x86_64: create a file with blocks past 8T: # xfs_io -F -f -c "pwrite 8T 1M" bigfile wrote 1048576/1048576 bytes at offset 8796093022208 1 MiB, 256 ops; 0.0000 sec (206.313 MiB/sec and 52816.1750 ops/sec) echo more into it: # echo x >> bigfile it really is that big: # ls -lh bigfile -rw-------. 1 root root 8.1T Mar 26 14:13 bigfile I don't have an x86 box to test quickly; try something besides echo, is what I'd suggest - xfs_io would work, or probably dd (with conv=notrunc if you want to append) -Eric From arun at bvinetworks.com Fri Mar 26 20:50:55 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 13:50:55 -0700 Subject: Ext4 and large (>8TB) files In-Reply-To: <4BAD0808.1040407@redhat.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> Message-ID: <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> Eric, Thanks for the quick reply... see my responses inline... On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen wrote: > On 03/26/2010 01:52 PM, Arun Nair wrote: > > Hi - > > > > (I apologize for the ext4 question in an ext3 mailer, but I couldn't > > find a user list for ext4.) > > linux-ext4 at vger.kernel.org :) but that's ok. > Saw that but thought it was a dev-only list, sorry. Next time :) > > > Per my understanding, ext4 can support file sizes upto 16 TiB if you use > > 4k blocks. 
I have a logical volume which uses ext4 with a 4k block size > > but I am unable to create files that are 8TiB (8796093022208 bytes) or > > larger. > > > > [root at camanoe] ls -l > > total 8589935388 > > -rw-rw---- 1 root root 8796093022207 2010-03-26 11:43 bigfile > > > > [root at camanoe] echo x >> bigfile > > -bash: echo: write error: File too large > > Perhaps echo isn't using largefile semantics? Is this the first > test you did, or is echo the simple testcase, and something else > failed? > It's the simple test case. We found the problem when MySQL failed to expand its ibdata file beyond 8 TB. I then tried dd as well with notrunc like you mentioned, same error: [root at camanoe]# dd oflag=append conv=notrunc if=/dev/zero of=bigfile bs=1 count=1 dd: writing `bigfile': File too large 1+0 records in 0+0 records out 0 bytes (0 B) copied, 0.000234712 s, 0.0 kB/s > It works for me on rawhide x86_64: > > create a file with blocks past 8T: > # xfs_io -F -f -c "pwrite 8T 1M" bigfile > wrote 1048576/1048576 bytes at offset 8796093022208 > 1 MiB, 256 ops; 0.0000 sec (206.313 MiB/sec and 52816.1750 ops/sec) > > echo more into it: > # echo x >> bigfile > > it really is that big: > # ls -lh bigfile > -rw-------. 1 root root 8.1T Mar 26 14:13 bigfile > > I don't have an x86 box to test quickly; try something besides echo, > is what I'd suggest - xfs_io would work, or probably dd (with > conv=notrunc if you want to append) > dd fails as mentioned above. xfs_io errors too: [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 pwrite64: File too large > -Eric > > BTW, my system is NOT 64-bit but my guess is this doesn't affect max file size? -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Fri Mar 26 21:10:03 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Fri, 26 Mar 2010 16:10:03 -0500 Subject: Ext4 and large (>8TB) files In-Reply-To: <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> Message-ID: <4BAD22AB.8050105@redhat.com> On 03/26/2010 03:50 PM, Arun Nair wrote: > Eric, > > Thanks for the quick reply... see my responses inline... > > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > wrote: > > On 03/26/2010 01:52 PM, Arun Nair wrote: > > Hi - > > > > (I apologize for the ext4 question in an ext3 mailer, but I couldn't > > find a user list for ext4.) > > linux-ext4 at vger.kernel.org :) > but that's ok. > > > Saw that but thought it was a dev-only list, sorry. Next time :) *shrug* I think user questions are welcome too. At least I don't mind. ... > dd fails as mentioned above. xfs_io errors too: > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 > pwrite64: File too large Oh. Well, then! Must be something else. oh, ok: sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); static loff_t ext4_max_size(int blkbits, int has_huge_files) { loff_t res; loff_t upper_limit = MAX_LFS_FILESIZE; /* Sanity check against vm- & vfs- imposed limits */ if (res > upper_limit) res = upper_limit; return res; } and: /* Page cache limit. The filesystems should put that into their s_maxbytes limits, otherwise bad things can happen in VM. */ #if BITS_PER_LONG==32 #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) so it's only giving us 31 bits of pages, not 32. This limits it to 8T on a 32-bit machine with 4k pages. 
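Working out the arithmetic: with 4k pages PAGE_CACHE_SIZE is 4096 (2^12) and BITS_PER_LONG is 32, so MAX_LFS_FILESIZE works out to (2^12 << 31) - 1 = 2^43 - 1 bytes, i.e. one byte short of 8 TiB, which matches the 8796093022207-byte file you were able to create before the writes started returning "File too large".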
I'm not honestly sure if there is anything in the vm that can't actually cope with a 32-bit offset... but until proven otherwise, probably not going to change this without a lot of testing & inspection. -Eric From arun at bvinetworks.com Fri Mar 26 22:05:52 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 15:05:52 -0700 Subject: Ext4 and large (>8TB) files In-Reply-To: <4BAD22AB.8050105@redhat.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> Message-ID: <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> Eric - So I'm guessing switching the system to 64-bit would fix this for us. How about increasing the block size from the current 4k? Would that be an option too? Thanks much, Arun On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen wrote: > On 03/26/2010 03:50 PM, Arun Nair wrote: > > Eric, > > > > Thanks for the quick reply... see my responses inline... > > > > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > wrote: > > > > On 03/26/2010 01:52 PM, Arun Nair wrote: > > > Hi - > > > > > > (I apologize for the ext4 question in an ext3 mailer, but I > couldn't > > > find a user list for ext4.) > > > > linux-ext4 at vger.kernel.org :) > > but that's ok. > > > > > > Saw that but thought it was a dev-only list, sorry. Next time :) > > *shrug* I think user questions are welcome too. At least I don't mind. > > ... > > > dd fails as mentioned above. xfs_io errors too: > > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 > > pwrite64: File too large > > Oh. Well, then! Must be something else. > > oh, ok: > > sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); > > static loff_t ext4_max_size(int blkbits, int has_huge_files) > { > loff_t res; > loff_t upper_limit = MAX_LFS_FILESIZE; > > > > /* Sanity check against vm- & vfs- imposed limits */ > if (res > upper_limit) > res = upper_limit; > > return res; > } > > and: > > /* Page cache limit. The filesystems should put that into their s_maxbytes > limits, otherwise bad things can happen in VM. */ > #if BITS_PER_LONG==32 > #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) > > so it's only giving us 31 bits of pages, not 32. This limits it to 8T > on a 32-bit machine with 4k pages. > > I'm not honestly sure if there is anything in the vm that can't actually > cope with a 32-bit offset... but until proven otherwise, probably not > going to change this without a lot of testing & inspection. > > -Eric > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arun at bvinetworks.com Sat Mar 27 01:32:20 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 18:32:20 -0700 Subject: Ext4 and large (>8TB) files In-Reply-To: <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> Message-ID: <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> Ok, so I guess ext4 with 64-bit, or another filesystem for us. Thanks Andreas & Eric for all the help. 
On Fri, Mar 26, 2010 at 3:38 PM, Andreas Dilger wrote: > On 2010-03-26, at 16:05, Arun Nair wrote: > >> So I'm guessing switching the system to 64-bit would fix this for us. How >> about increasing the block size from the current 4k? Would that be an option >> too? >> > > Not in the near future, unless you are running on PPC/ARM/SPARC that can > also handle large pages. > > On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen wrote: >> On 03/26/2010 03:50 PM, Arun Nair wrote: >> > Eric, >> > >> > Thanks for the quick reply... see my responses inline... >> > >> > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > > wrote: >> > >> > On 03/26/2010 01:52 PM, Arun Nair wrote: >> > > Hi - >> > > >> > > (I apologize for the ext4 question in an ext3 mailer, but I >> couldn't >> > > find a user list for ext4.) >> > >> > linux-ext4 at vger.kernel.org :) >> > but that's ok. >> > >> > >> > Saw that but thought it was a dev-only list, sorry. Next time :) >> >> *shrug* I think user questions are welcome too. At least I don't mind. >> >> ... >> >> > dd fails as mentioned above. xfs_io errors too: >> > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 >> > pwrite64: File too large >> >> Oh. Well, then! Must be something else. >> >> oh, ok: >> >> sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); >> >> static loff_t ext4_max_size(int blkbits, int has_huge_files) >> { >> loff_t res; >> loff_t upper_limit = MAX_LFS_FILESIZE; >> >> >> >> /* Sanity check against vm- & vfs- imposed limits */ >> if (res > upper_limit) >> res = upper_limit; >> >> return res; >> } >> >> and: >> >> /* Page cache limit. The filesystems should put that into their s_maxbytes >> limits, otherwise bad things can happen in VM. */ >> #if BITS_PER_LONG==32 >> #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) >> >> so it's only giving us 31 bits of pages, not 32. This limits it to 8T >> on a 32-bit machine with 4k pages. >> >> I'm not honestly sure if there is anything in the vm that can't actually >> cope with a 32-bit offset... but until proven otherwise, probably not >> going to change this without a lot of testing & inspection. >> >> -Eric >> >> _______________________________________________ >> Ext3-users mailing list >> Ext3-users at redhat.com >> https://www.redhat.com/mailman/listinfo/ext3-users >> > > > Cheers, Andreas > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From arun at bvinetworks.com Sat Mar 27 04:16:41 2010 From: arun at bvinetworks.com (Arun Nair) Date: Fri, 26 Mar 2010 21:16:41 -0700 Subject: Ext4 and large (>8TB) files In-Reply-To: <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> Message-ID: <941b09771003262116t2f8fec71rda6cf7ff6b7b72f5@mail.gmail.com> Ah. Got it, thanks. On Fri, Mar 26, 2010 at 9:04 PM, Andreas Dilger wrote: > On 2010-03-26, at 19:32, Arun Nair wrote: > >> Ok, so I guess ext4 with 64-bit, or another filesystem for us. Thanks >> Andreas & Eric for all the help. >> > > No, I don't think another filesystem will help, on a 32-bit host. 
The > limit that ext4 is reporting is the VM page cache limit for a single file, > and has nothing to do with ext4 itself. > > > On Fri, Mar 26, 2010 at 3:38 PM, Andreas Dilger >> wrote: >> On 2010-03-26, at 16:05, Arun Nair wrote: >> So I'm guessing switching the system to 64-bit would fix this for us. How >> about increasing the block size from the current 4k? Would that be an option >> too? >> >> Not in the near future, unless you are running on PPC/ARM/SPARC that can >> also handle large pages. >> >> On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen wrote: >> On 03/26/2010 03:50 PM, Arun Nair wrote: >> > Eric, >> > >> > Thanks for the quick reply... see my responses inline... >> > >> > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > > wrote: >> > >> > On 03/26/2010 01:52 PM, Arun Nair wrote: >> > > Hi - >> > > >> > > (I apologize for the ext4 question in an ext3 mailer, but I >> couldn't >> > > find a user list for ext4.) >> > >> > linux-ext4 at vger.kernel.org :) >> > but that's ok. >> > >> > >> > Saw that but thought it was a dev-only list, sorry. Next time :) >> >> *shrug* I think user questions are welcome too. At least I don't mind. >> >> ... >> >> > dd fails as mentioned above. xfs_io errors too: >> > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 >> > pwrite64: File too large >> >> Oh. Well, then! Must be something else. >> >> oh, ok: >> >> sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); >> >> static loff_t ext4_max_size(int blkbits, int has_huge_files) >> { >> loff_t res; >> loff_t upper_limit = MAX_LFS_FILESIZE; >> >> >> >> /* Sanity check against vm- & vfs- imposed limits */ >> if (res > upper_limit) >> res = upper_limit; >> >> return res; >> } >> >> and: >> >> /* Page cache limit. The filesystems should put that into their s_maxbytes >> limits, otherwise bad things can happen in VM. */ >> #if BITS_PER_LONG==32 >> #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << (BITS_PER_LONG-1))-1) >> >> so it's only giving us 31 bits of pages, not 32. This limits it to 8T >> on a 32-bit machine with 4k pages. >> >> I'm not honestly sure if there is anything in the vm that can't actually >> cope with a 32-bit offset... but until proven otherwise, probably not >> going to change this without a lot of testing & inspection. >> >> -Eric >> >> _______________________________________________ >> Ext3-users mailing list >> Ext3-users at redhat.com >> https://www.redhat.com/mailman/listinfo/ext3-users >> >> >> Cheers, Andreas >> >> >> >> >> >> >> > > Cheers, Andreas > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sandeen at redhat.com Mon Mar 29 18:06:12 2010 From: sandeen at redhat.com (Eric Sandeen) Date: Mon, 29 Mar 2010 13:06:12 -0500 Subject: Ext4 and large (>8TB) files In-Reply-To: <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> Message-ID: <4BB0EC14.6040309@redhat.com> Andreas Dilger wrote: > On 2010-03-26, at 19:32, Arun Nair wrote: >> Ok, so I guess ext4 with 64-bit, or another filesystem for us. Thanks >> Andreas & Eric for all the help. > > No, I don't think another filesystem will help, on a 32-bit host. 
The > limit that ext4 is reporting is the VM page cache limit for a single > file, and has nothing to do with ext4 itself. Well, for what it's worth, xfs doesn't use MAX_LFS_FILESIZE for s_maxbytes, and: # mkfs.xfs -dfile,name=fsfile,size=5g ... # mount -o loop fsfile mnt/ # cd mnt/ # truncate --size 17592186044415 bigfile # ls -lh bigfile -rw-r--r--. 1 root root 16T Mar 29 14:03 bigfile # uname -m i686 it is possible to create a > 8T file offset. Now, whether the vm is really happy with this probably remains to be seen; this is the sort of thing that breaks without constant testing, IMHO. I'd certainly suggest that a 64-bit box is the best way to go if at all possible. -Eric From mnalis-ml at voyager.hr Mon Mar 29 18:55:58 2010 From: mnalis-ml at voyager.hr (Matija Nalis) Date: Mon, 29 Mar 2010 20:55:58 +0200 Subject: File caching? In-Reply-To: <5790d5b71003231216p6ee172dcx2ae66e2cbd53f1af@mail.gmail.com> References: <5790d5b71003231116p6c098aebl66ba79d36ff1ca7e@mail.gmail.com> <005c01cacaba$0c7dcfa0$25796ee0$@com> <5790d5b71003231216p6ee172dcx2ae66e2cbd53f1af@mail.gmail.com> Message-ID: <20100329185558.GA4060@eagle102.home.lan> On Tue, Mar 23, 2010 at 01:16:57PM -0600, Michael McGlothlin wrote: > I was going to store the cached copy on a RAM disk as many of the files are Note that it probably won't help you that much, the kernel usually does quite good job at caching reads (but, specific situations like extensive writes can starve it - which ramdisk solution would avoid as it is completely manually controlled. Only way to know it is to test it) > larger than the 1MB limit of memlockd and I don't feel like coming up with > my own solution if I can avoid it. > > Is there a way to know how much RAM is being used for file cache or to tell free(1) will tell you (look at "cached" column). Note that programs you "load" from disk are also actually executed directly from that page cache (without making any separate copy). Also, the writes (unless being done with O_DIRECT or such) will go to that same cache before they're flushed to disk (which is usually what you want, as subsequent reads can then be satisfied from cache, and the application issuing writes gets control much sooner than if it would wait for writes to complete to disk). > it to use more? If the server has 128GB of RAM and typically uses half of > that for it's actual work will it use the rest as file cache? Likewise is Yes, it will use (almost) ALL otherwise unused RAM (the "free" column in free(1) means "unused" or "wasted" if you like) for cache. The "almost" is because there is some very small amount reserved by /proc/sys/vm/min_free_kbytes (but you don't want to touch it, it is too small to give you any benefit, and your kernel might die if you set it too low). > there a way to track/test if file stats are being pushed out of cache a lot? Uh, dunno for something elegant. You can track /proc/meminfo and /proc/slabinfo (for example by free(1) and slabtop(1) or manually of course) and look how they change. Note that there are other users of memory (see "slabtop -s c"). For example, especially if you have lots of small files and/or directories, then dentry, inode and related fs cache structures can eat significant amounts of RAM. 
You can tune priority of expiration of those with /proc/sys/vm/vfs_cache_pressure (you might also want to look in your kernel docs for rest of /proc/sys/vm) There are also vmstat(8), iostat(1) and blktrace(8) which might help you track actual I/O, and you can compare that with actual data read from your program logs for example to see how well the cache performs. -- Opinions above are GNU-copylefted. From adilger at dilger.ca Fri Mar 26 22:39:24 2010 From: adilger at dilger.ca (Andreas Dilger) Date: Fri, 26 Mar 2010 22:39:24 -0000 Subject: Ext4 and large (>8TB) files In-Reply-To: <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> Message-ID: <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> On 2010-03-26, at 16:05, Arun Nair wrote: > So I'm guessing switching the system to 64-bit would fix this for > us. How about increasing the block size from the current 4k? Would > that be an option too? Not in the near future, unless you are running on PPC/ARM/SPARC that can also handle large pages. > On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen > wrote: > On 03/26/2010 03:50 PM, Arun Nair wrote: > > Eric, > > > > Thanks for the quick reply... see my responses inline... > > > > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > wrote: > > > > On 03/26/2010 01:52 PM, Arun Nair wrote: > > > Hi - > > > > > > (I apologize for the ext4 question in an ext3 mailer, but I > couldn't > > > find a user list for ext4.) > > > > linux-ext4 at vger.kernel.org ext4 at vger.kernel.org> :) > > but that's ok. > > > > > > Saw that but thought it was a dev-only list, sorry. Next time :) > > *shrug* I think user questions are welcome too. At least I don't > mind. > > ... > > > dd fails as mentioned above. xfs_io errors too: > > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 > > pwrite64: File too large > > Oh. Well, then! Must be something else. > > oh, ok: > > sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); > > static loff_t ext4_max_size(int blkbits, int has_huge_files) > { > loff_t res; > loff_t upper_limit = MAX_LFS_FILESIZE; > > > > /* Sanity check against vm- & vfs- imposed limits */ > if (res > upper_limit) > res = upper_limit; > > return res; > } > > and: > > /* Page cache limit. The filesystems should put that into their > s_maxbytes > limits, otherwise bad things can happen in VM. */ > #if BITS_PER_LONG==32 > #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << > (BITS_PER_LONG-1))-1) > > so it's only giving us 31 bits of pages, not 32. This limits it to 8T > on a 32-bit machine with 4k pages. > > I'm not honestly sure if there is anything in the vm that can't > actually > cope with a 32-bit offset... but until proven otherwise, probably not > going to change this without a lot of testing & inspection. 
> > -Eric > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users Cheers, Andreas From adilger at dilger.ca Sat Mar 27 04:04:32 2010 From: adilger at dilger.ca (Andreas Dilger) Date: Sat, 27 Mar 2010 04:04:32 -0000 Subject: Ext4 and large (>8TB) files In-Reply-To: <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> References: <941b09771003261152p6408b4cfle907389f81270f15@mail.gmail.com> <4BAD0808.1040407@redhat.com> <941b09771003261350l535f35der63e1ecfe860b747b@mail.gmail.com> <4BAD22AB.8050105@redhat.com> <941b09771003261505h66a21030i17bcad852373542e@mail.gmail.com> <5FF9A9E7-B005-4AD7-9923-3493B126EE53@dilger.ca> <941b09771003261832g17dfc8f9k575bfbbf370a9cba@mail.gmail.com> Message-ID: <468CD0F9-7EA3-4196-B662-050BABDB566E@dilger.ca> On 2010-03-26, at 19:32, Arun Nair wrote: > Ok, so I guess ext4 with 64-bit, or another filesystem for us. > Thanks Andreas & Eric for all the help. No, I don't think another filesystem will help, on a 32-bit host. The limit that ext4 is reporting is the VM page cache limit for a single file, and has nothing to do with ext4 itself. > On Fri, Mar 26, 2010 at 3:38 PM, Andreas Dilger > wrote: > On 2010-03-26, at 16:05, Arun Nair wrote: > So I'm guessing switching the system to 64-bit would fix this for > us. How about increasing the block size from the current 4k? Would > that be an option too? > > Not in the near future, unless you are running on PPC/ARM/SPARC that > can also handle large pages. > > On Fri, Mar 26, 2010 at 2:10 PM, Eric Sandeen > wrote: > On 03/26/2010 03:50 PM, Arun Nair wrote: > > Eric, > > > > Thanks for the quick reply... see my responses inline... > > > > On Fri, Mar 26, 2010 at 12:16 PM, Eric Sandeen > > wrote: > > > > On 03/26/2010 01:52 PM, Arun Nair wrote: > > > Hi - > > > > > > (I apologize for the ext4 question in an ext3 mailer, but I > couldn't > > > find a user list for ext4.) > > > > linux-ext4 at vger.kernel.org ext4 at vger.kernel.org> :) > > but that's ok. > > > > > > Saw that but thought it was a dev-only list, sorry. Next time :) > > *shrug* I think user questions are welcome too. At least I don't > mind. > > ... > > > dd fails as mentioned above. xfs_io errors too: > > [root at camanoe]# xfs_io -F -f -c "pwrite 8T 1M" bigfile2 > > pwrite64: File too large > > Oh. Well, then! Must be something else. > > oh, ok: > > sbi->s_bitmap_maxbytes = ext4_max_bitmap_size(); > > static loff_t ext4_max_size(int blkbits, int has_huge_files) > { > loff_t res; > loff_t upper_limit = MAX_LFS_FILESIZE; > > > > /* Sanity check against vm- & vfs- imposed limits */ > if (res > upper_limit) > res = upper_limit; > > return res; > } > > and: > > /* Page cache limit. The filesystems should put that into their > s_maxbytes > limits, otherwise bad things can happen in VM. */ > #if BITS_PER_LONG==32 > #define MAX_LFS_FILESIZE (((u64)PAGE_CACHE_SIZE << > (BITS_PER_LONG-1))-1) > > so it's only giving us 31 bits of pages, not 32. This limits it to 8T > on a 32-bit machine with 4k pages. > > I'm not honestly sure if there is anything in the vm that can't > actually > cope with a 32-bit offset... but until proven otherwise, probably not > going to change this without a lot of testing & inspection. > > -Eric > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > > > Cheers, Andreas > > > > > > Cheers, Andreas
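For completeness, a small probe in the spirit of the xfs_io test used earlier in the thread (a sketch only; the file name is just an example, and it assumes it is built with -D_FILE_OFFSET_BITS=64 so that off_t is 64 bits on a 32-bit host). On the 32-bit ext4 setup discussed above it should fail with EFBIG, because ext4 clamps s_maxbytes to the MAX_LFS_FILESIZE page cache limit; on a 64-bit kernel the same write should simply create a sparse file slightly over 8 TiB:

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    const char *path = "bigfile-probe";        /* example name */
    const off_t offset = 8796093022208LL;      /* 8 TiB */
    char byte = 'x';

    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Try to write a single byte at the 8 TiB offset, mirroring
     * 'xfs_io -c "pwrite 8T 1M"' and 'echo x >> bigfile' above. */
    if (pwrite(fd, &byte, 1, offset) < 0)
        printf("pwrite at 8 TiB failed: %s\n", strerror(errno));
    else
        printf("pwrite at 8 TiB succeeded\n");

    close(fd);
    unlink(path);   /* clean up the (possibly sparse) test file */
    return 0;
}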