From y-takahashi at gmo-hs.com Fri Mar 4 06:52:01 2011
From: y-takahashi at gmo-hs.com (GMO-HS Yoichi Takahashi)
Date: Fri, 04 Mar 2011 15:52:01 +0900
Subject: minus disk usage
Message-ID: <20110304155201.895B.A9C031E0@gmo-hs.com>

Hi, this is Yoichi Takahashi.

I have a problem with an ext3 filesystem.
The output changes every time the df command is executed (at short intervals).
For /dev/sda2, the Used value is sometimes shown as negative and sometimes as normal; see below.

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              97G -345M   93G   0% /
/dev/sda1              99M   15M   80M  16% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda3             803G  2.5G  759G   1% /home

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              97G  1.2G   91G   2% /
/dev/sda1              99M   15M   80M  16% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda3             803G  2.5G  759G   1% /home

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              97G  448M   92G   1% /
/dev/sda1              99M   15M   80M  16% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda3             803G  2.5G  759G   1% /home

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              97G -109M   92G   0% /
/dev/sda1              99M   15M   80M  16% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda3             803G  2.5G  759G   1% /home

The server is always under high load.
Its load average is always 3 or 4.

Does anyone know why this happened?
Any ideas would be appreciated.

Youichi Takahashi
Cerulean Tower 26-1 Sakuragaoka-cho, Shibuya-ku, Tokyo (150-8512) Japan
TEL +81-3-6415-7075  FAX +81-3-6415-6108
E-MAIL y-takahashi at gmo-hs.com
URL http://www.gmo-hs.com
STOCK CODE 3788

From scerveau at awox.com Fri Mar 4 10:52:35 2011
From: scerveau at awox.com (Stephane Cerveau)
Date: Fri, 4 Mar 2011 11:52:35 +0100
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To:
References:
Message-ID:

Dear all,

I'm formatting a specific MSC key in ext3. The key is 4GB in size.
When I copy a file larger than 1GB (1024MB) to it, I get many errors of the form
"Ext3-fs error (device sda1): ext3_free_blocks_sb: bit already cleared for block xxxx"
when I try to remove the file after having synced the copy.

Do you know why I could get this kind of error?
Why does the problem appear only on this kind of key?
What can I do to identify the problem?
It seems that there is no error in the FS (e2fsck completed successfully) before removing the key.

Best regards.

From scerveau at awox.com Fri Mar 4 15:45:39 2011
From: scerveau at awox.com (Stephane Cerveau)
Date: Fri, 4 Mar 2011 16:45:39 +0100
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To:
References:
Message-ID:

Dear all,

It seems that if I change the block size to 2048 with mkfs.ext3 -b 2048 /dev/sda1, the problem does not appear.

Is there a way to know in advance what the best block size is for an external device?

BR
Stephane

__________ Information from ESET NOD32 Antivirus, version of virus signature database 5924 (20110303) __________

The message was checked by ESET NOD32 Antivirus.
http://www.eset.com

From sandeen at redhat.com Fri Mar 4 16:26:12 2011
From: sandeen at redhat.com (Eric Sandeen)
Date: Fri, 04 Mar 2011 10:26:12 -0600
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To:
References:
Message-ID: <4D7112A4.6050209@redhat.com>

On 3/4/11 9:45 AM, Stephane Cerveau wrote:
> Dear all,
>
> It seems that if I change the size of blocks to 2048 by mkfs.ext3 -b
> 2048 /dev/sda1, the problem does not appear.
>
> Is there a way to know by advance what is the best block size for an
> external device ?
>
> BR

Sounds like a storage problem; not a filesystem problem, something to do
with the flash behaving badly.

-Eric

From scerveau at awox.com Fri Mar 4 17:33:23 2011
From: scerveau at awox.com (Stephane Cerveau)
Date: Fri, 4 Mar 2011 18:33:23 +0100
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To: <4D7112A4.6050209@redhat.com>
References: <4D7112A4.6050209@redhat.com>
Message-ID:

Hello,

I checked the storage, which seems to be OK (bad block check), and I still have the problem.
I ran the same test on vfat and I don't have the problem.
I'm using a 2.6.23 kernel. Since when is the ext3 fs considered stable?

BR

From sandeen at redhat.com Fri Mar 4 17:44:38 2011
From: sandeen at redhat.com (Eric Sandeen)
Date: Fri, 04 Mar 2011 11:44:38 -0600
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To:
References: <4D7112A4.6050209@redhat.com>
Message-ID: <4D712506.4070405@redhat.com>

On 3/4/11 11:33 AM, Stephane Cerveau wrote:
> Hello,
>
> I checked the storage that seems to be ok ( check bad block) and I
> still have the problem. I did the test on vfat and I don't have the
> problem. I'm using a 2.6.23 kernel ? When the ext3 fs is considered
> as stable ?

I don't mean that it's an IO error or a bad block, but perhaps a
behavioral problem with the flash; maybe not syncing properly before
it's powered off, etc.

If it only happens with some USB drives, I do not think it is an ext3 issue.

Your original problem report may not have been totally clear so maybe I
misunderstand. Can you show exactly what you did, and exactly what error
messages you received, keeping in mind that any IO type errors that don't
say "ext3" are also relevant?

-Eric

From scerveau at awox.com Fri Mar 4 17:54:23 2011
From: scerveau at awox.com (Stephane Cerveau)
Date: Fri, 4 Mar 2011 18:54:23 +0100
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To: <4D712506.4070405@redhat.com>
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com>
Message-ID:

Hi,

Thanks for your answer.
Here are my steps:

- mkfs.ext3 /dev/sda1
- mount /dev/sda1 /mnt/usb
- dd if=/dev/zero of=/mnt/usb/test_file bs=1M count=1025 (the size is important)
- sync
- rm /mnt/usb/test_file

Then many errors appear:
"Ext3-fs error (device sda1): ext3_free_blocks_sb: bit already cleared for block xxxx"

I tried to umount/mount the storage, but that does not help either.
I tried to check the device before removing the file; that does not help either.
With another USB key, however, it works.
I'm using kernel 2.6.23.

The problem does NOT appear if I run mkfs.ext2 /dev/sda1 beforehand instead.

What do you advise?

BR

Stephane.
From sandeen at redhat.com Fri Mar 4 17:55:52 2011
From: sandeen at redhat.com (Eric Sandeen)
Date: Fri, 04 Mar 2011 11:55:52 -0600
Subject: minus disk usage
In-Reply-To: <20110304155201.895B.A9C031E0@gmo-hs.com>
References: <20110304155201.895B.A9C031E0@gmo-hs.com>
Message-ID: <4D7127A8.6070302@redhat.com>

On 3/4/11 12:52 AM, GMO-HS Yoichi Takahashi wrote:
> Hi,This is Yoichi Takahashi
>
> I have a trouble on the ext3 filesystem.
> The display changes whenever the df command is executed. (at short intervals)
> It is a minus display for the following, and normal displays.
> see below /dev/sda2
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda2              97G -345M   93G   0% /
> /dev/sda1              99M   15M   80M  16% /boot
> tmpfs                 2.0G     0  2.0G   0% /dev/shm
> /dev/sda3             803G  2.5G  759G   1% /home
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda2              97G  1.2G   91G   2% /
> /dev/sda1              99M   15M   80M  16% /boot
> tmpfs                 2.0G     0  2.0G   0% /dev/shm
> /dev/sda3             803G  2.5G  759G   1% /home
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda2              97G  448M   92G   1% /
> /dev/sda1              99M   15M   80M  16% /boot
> tmpfs                 2.0G     0  2.0G   0% /dev/shm
> /dev/sda3             803G  2.5G  759G   1% /home
>
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sda2              97G -109M   92G   0% /
> /dev/sda1              99M   15M   80M  16% /boot
> tmpfs                 2.0G     0  2.0G   0% /dev/shm
> /dev/sda3             803G  2.5G  759G   1% /home
>
> The load is always a high server.
> LoadAverage is always 3or4 in the server
>
> Dose anyone know why this happned ?
> Any ideas be appreciated.
>

It may be changing because files are being added & removed
at the time? As for the negative...

What kernel and what coreutils are you using?

stat -f / would let us know what the kernel is returning; I am guessing
that this is a bug in coreutils related to the handling of reserved-for-root
space...
If you can try the test again, but do:

# df -B 4096 /; stat -f /

(assuming 4k fs blocksize) a few times, and see what those return.

-Eric

From sandeen at redhat.com Fri Mar 4 17:59:08 2011
From: sandeen at redhat.com (Eric Sandeen)
Date: Fri, 04 Mar 2011 11:59:08 -0600
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To:
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com>
Message-ID: <4D71286C.2080408@redhat.com>

On 3/4/11 11:54 AM, Stephane Cerveau wrote:
> Hi,
>
> Thanks for your answer.
> Here is my steps:
>
> - mkfs.ext3 /dev/sda1
> - mount /dev/sda1 /mnt/usb
> - dd if=/dev/zero of=/mnt/usb/test_file bs=1M count=1025 ( the size is important)
> - sync
> - rm /mnt/usb/test_file

Ok, I had the impression that you were removing the usb key at
some point in the test, but I guess not.

> Then many errors appears "Ext3-fs error ( device sda1): ext3_free_blocks_sb: bit already cleared for block xxxx"
>
> I tried to umount/mount the storage but its not working also.
> I tried to check the device before removing the file, not working also.

you mean that umount/mount/rm gives the same error? As does umount/fsck/mount/rm ?

> Indeed with another usb key it's working...
> I'm using a kernel 2.6.23
>
> The problem does NOT appear with mkfs.ext2 /dev/sda1 before
>
> What do you advise to do ?

Try a much newer kernel, first of all, to see if it's a known, fixed bug.

But since it works on another usb key, I still tend to blame the hardware.
"bit already cleared" makes it sound like it is reading zeros when it
should not be.

-Eric

> BR
>
> Stephane.
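
For the df / stat -f comparison suggested in the "minus disk usage" thread, the relationship between the two outputs can be sketched as below. This is only a minimal sketch, assuming GNU coreutils stat (format sequences %b, %f and %a) and assuming that df derives its Used column as total blocks minus free blocks. The helper names used_blocks and report are hypothetical, and the script does not diagnose the negative value; it only shows which raw counters would produce one.

```shell
#!/bin/sh
# Sketch only: helper names are illustrative and not part of coreutils.

# Used blocks as df would show them, assuming used = total - free.
used_blocks() {
    echo $(( $1 - $2 ))
}

# Print the raw statfs counters for a mount point and flag a negative result.
report() {
    mnt=$1
    # %b = total data blocks, %f = free blocks, %a = blocks available to non-root
    if out=$(stat -f -c '%b %f %a' "$mnt" 2>/dev/null); then
        set -- $out
        used=$(used_blocks "$1" "$2")
        echo "total=$1 free=$2 avail=$3 used=$used"
        if [ "$used" -lt 0 ]; then
            echo "free exceeds total: the raw counters themselves are inconsistent"
        fi
    else
        echo "stat -f -c is not supported on this system" >&2
    fi
}

report /
```

With the figures posted later in this thread (Total: 25393143, Free: 22707604), used_blocks 25393143 22707604 prints 2685539, which matches the Used column of the df -B 4096 output. Running the script a few times under load, as suggested above, would show whether the free counter ever exceeds the total.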
From scerveau at awox.com Fri Mar 4 18:07:34 2011
From: scerveau at awox.com (Stephane Cerveau)
Date: Fri, 4 Mar 2011 19:07:34 +0100
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To: <4D71286C.2080408@redhat.com>
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com> <4D71286C.2080408@redhat.com>
Message-ID:

:) I'm not removing the key during the test.

Yes, "umount/mount/rm", "umount/fsck/mount/rm" and plain "rm" all give the same error...

I have several keys of the same brand and model, and they all show the same issue.

When I said a different key, I meant a different brand.

In the end, it seems that ext2 is working fine!

So maybe a problem in ext3 in the 2.6.23 kernel?!
I tried on 2.6.32_27 and did not manage to reproduce the issue.

Do you know since when ext3 is supposed to be stable?

BR

Stephane

From sandeen at redhat.com Fri Mar 4 18:13:45 2011
From: sandeen at redhat.com (Eric Sandeen)
Date: Fri, 04 Mar 2011 12:13:45 -0600
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To:
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com> <4D71286C.2080408@redhat.com>
Message-ID: <4D712BD9.7090000@redhat.com>

On 3/4/11 12:07 PM, Stephane Cerveau wrote:
> :) I'm not removing the key during the test.
>
> Yes "umount/mount/rm", "umount/fsck/mount/rm" and "rm" give the same error...
>
> I have several keys from the same brand, model and I have the same issue.
>
> When I said, a different key, it was a different brand.
>
> At the end, it seems that ext2 is working fine!

It may well have different IO patterns.

> So maybe a problem in ext3 in 2.6.23 kernel ?!?
> I had a try on 2.6.32_27, I did not succeed to reproduce the issue.
>
> Do you know when ext3 is supposed to be stable ?

heh, 2.4.x or so.

You could bisect kernel versions and see if you can arrive at a change
that fixed it.

I'd also find some tools to do more extensive IO testing on your usb key,
I still think it might be a hardware problem since only some brand/models
are affected.

-Eric

From adilger at dilger.ca Fri Mar 4 22:17:55 2011
From: adilger at dilger.ca (Andreas Dilger)
Date: Fri, 4 Mar 2011 15:17:55 -0700
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To:
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com> <4D71286C.2080408@redhat.com>
Message-ID: <38CFAFAE-F09D-4FE7-AFD1-C79D2844144C@dilger.ca>

On 2011-03-04, at 11:07 AM, Stephane Cerveau wrote:
> I have several keys from the same brand, model and I have the same issue.
>
> When I said, a different key, it was a different brand.

I would typically blame the USB key. Some cheap vendors use unreliable
chips, and sometimes even mis-label e.g. 1GB flash as 2GB.

> At the end, it seems that ext2 is working fine!

Except I don't think ext2 is doing this bitmap validation at runtime, like
ext3/4 is doing.

I'm not sure whether "badblocks" is verifying that the storage is behaving
correctly (i.e. correct block addressing), or only whether it is able to
write/read a particular sector on disk.

You could use a more advanced block device verification tool, like llverdev
from Lustre, which writes a unique test pattern to every block, and then
reads it back afterward.

> So maybe a problem in ext3 in 2.6.23 kernel ?!?
> I had a try on 2.6.32_27, I did not succeed to reproduce the issue.
>
> Do you know when ext3 is supposed to be stable ?

For 10+ years already.

Cheers, Andreas

From alex at alex.org.uk Sat Mar 5 09:21:38 2011
From: alex at alex.org.uk (Alex Bligh)
Date: Sat, 05 Mar 2011 09:21:38 +0000
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To:
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com>
Message-ID: <026CC28AC9A32E0A7FD2D20F@nimrod.local>

--On 4 March 2011 18:54:23 +0100 Stephane Cerveau wrote:

> Then many errors appears "Ext3-fs error ( device sda1):
> ext3_free_blocks_sb: bit already cleared for block xxxx"
>
> I tried to umount/mount the storage but its not working also.
> I tried to check the device before removing the file, not working also.
> Indeed with another usb key it's working...
> I'm using a kernel 2.6.23

If it's that old, perhaps it is
  http://lkml.org/lkml/2008/11/14/121
fixed by
  http://www.kernel.org/pub/linux/kernel/v2.6/ChangeLog-2.6.29
in 2.6.29 commit 7ef0d7377cb287e08f3ae94cebc919448e1f5dff I think.

I am interested in this particular error. We see it very occasionally
on 2.6.31 in an environment where we can be sure no underlying I/O error
occurred (because it's on a VM whose dom0 uses iSCSI mapped to the domU's
disk) and we would see error logging. It is normally during intense disk
activity (unlike the OP), such as running "aptitude update", often
while unlinking a file. It does not appear to happen on ext4. Unfortunately
the result is that the disk goes readonly.

Our current theory is that the disk got damaged in some way during a
previous unclean shutdown that fsck did not fix. Is that possible?
-- 
Alex Bligh

From scerveau at awox.com Sat Mar 5 15:52:33 2011
From: scerveau at awox.com (Stephane Cerveau)
Date: Sat, 5 Mar 2011 16:52:33 +0100
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To: <38CFAFAE-F09D-4FE7-AFD1-C79D2844144C@dilger.ca>
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com> <4D71286C.2080408@redhat.com> <38CFAFAE-F09D-4FE7-AFD1-C79D2844144C@dilger.ca>
Message-ID:

Hello,

Thanks for your answer; I will have a look at this tool and test my hardware...

I have many similar keys and the problem appears systematically on these devices...
So yes, I could blame the hardware, but it is supposedly validated by the provider, and when I tried on a desktop Linux (the problem is on an embedded system) with a 2.6.32 kernel, I don't have the issue.

I don't have the problem either when I change the block size of ext3.
But I think that performance may decrease (4096 to 2048).

BR

Stephane

From scerveau at awox.com Sat Mar 5 15:52:50 2011
From: scerveau at awox.com (Stephane Cerveau)
Date: Sat, 5 Mar 2011 16:52:50 +0100
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To: <026CC28AC9A32E0A7FD2D20F@nimrod.local>
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com> <026CC28AC9A32E0A7FD2D20F@nimrod.local>
Message-ID:

Hello Alex,

With a brand new key, I had the issue after formatting it, copying the file and erasing the file, without any shutdown or other trouble...
I will have a look at the commit you mentioned in your email and let you know...

Stephane

From samuel at bcgreen.com Sat Mar 5 18:50:43 2011
From: samuel at bcgreen.com (Stephen Samuel)
Date: Sat, 5 Mar 2011 10:50:43 -0800
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To: <38CFAFAE-F09D-4FE7-AFD1-C79D2844144C@dilger.ca>
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com> <4D71286C.2080408@redhat.com> <38CFAFAE-F09D-4FE7-AFD1-C79D2844144C@dilger.ca>
Message-ID:

On Fri, Mar 4, 2011 at 2:17 PM, Andreas Dilger wrote:

> On 2011-03-04, at 11:07 AM, Stephane Cerveau wrote:
> > I have several keys from the same brand, model and I have the same issue.
> >
> > When I said, a different key, it was a different brand.
>
> I would typically blame the USB key. Some cheap vendors use unreliable
> chips, and sometimes even mis-label e.g. 1GB flash as 2GB.
>
> > At the end, it seems that ext2 is working fine!
>
> Except I don't think ext2 is doing this bitmap validation at runtime, like
> ext3/4 is doing.
>
> I'm not sure whether "badblocks" is verifying that the storage is behaving
> correctly (i.e.
correct block addressing), or only whether it is able to > write/read a particular sector on disk. > You could use a more advanced block device verification tool, like llverdev > from Lustre, which writes a unique test pattern to every block, and then > reads it back afterward. > Quick test, in the meantime: badblocks -n -t0xffff /dev/the_thumb_drive -n is non-destructive. -w is destructive of data. then I'd try '-n -trandom -p5' If you don't mind losing the data (I don't think you do), then use -w, rather than -n. -- Stephen Samuel http://www.bcgreen.com Software, like love, 778-861-7641 grows when you give it away -------------- next part -------------- An HTML attachment was scrubbed... URL: From y-takahashi at gmo-hs.com Mon Mar 7 12:00:33 2011 From: y-takahashi at gmo-hs.com (GMO-HS Yoichi Takahashi) Date: Mon, 07 Mar 2011 21:00:33 +0900 Subject: minus disk usage In-Reply-To: <4D7127A8.6070302@redhat.com> References: <20110304155201.895B.A9C031E0@gmo-hs.com> <4D7127A8.6070302@redhat.com> Message-ID: <20110307210032.01AA.A9C031E0@gmo-hs.com> Hi Eric Thank you for your prompt reply. > It may be changing because files are being added & removed > at the time? As for the negative... I imagine you're right > What kernel and what coreutils are you using? 2.6.18-8.el5PAE #1 SMP Thu Mar 15 20:29:51 EDT 2007 i686 i686 i386 GNU/Linux Name : coreutils Relocations: (not relocatable) Version : 5.97 Vendor: CentOS Release : 23.el5_4.1 Build Date: Tue Oct 27 11:12:41 2009 Install Date: Wed Feb 9 17:41:18 2011 Build Host: builder16.centos.org Group : System Environment/Base Source RPM: coreutils-5.97-23.el5_4.1.src.rpm Size : 9053932 License: GPLv2+ Signature : DSA/SHA1, Tue Oct 27 23:47:24 2009, Key ID a8a447dce8562897 URL : http://www.gnu.org/software/coreutils/ Summary : The GNU core utilities: a set of tools commonly used in shell scripts Description : These are the GNU core utilities. 
This package is the combination of the old GNU fileutils, sh-utils, and textutils packages.

Filesystem           4K-blocks      Used Available Use% Mounted on
/dev/sda2             25393143   2685539  21396901  12% /

  File: "/"
    ID: 0        Namelen: 255     Type: ext2/ext3
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 25393143   Free: 22707604   Available: 21396901
Inodes: Total: 26214400   Free: 26037500

I did many things, and the server has recovered. Something about this bothers me. Let's bring this matter to a close. You will hear from me again.

From scerveau at awox.com Mon Mar 7 15:05:12 2011
From: scerveau at awox.com (Stephane Cerveau)
Date: Mon, 7 Mar 2011 16:05:12 +0100
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To: <026CC28AC9A32E0A7FD2D20F@nimrod.local>
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com> <026CC28AC9A32E0A7FD2D20F@nimrod.local>
Message-ID:

I tried to integrate this patch but it still does not work.

http://gitorious.org/opensuse/kernel-source/commit/9f62d21f70e77298018f63c72e6d10a621ee6dcf

I don't know how to debug it and don't understand why it happens only with large files.

Is there anyone who can help me or advise me on how I could debug it? Get some log or anything... :)

I have to say that I'm working on an embedded system with an SH4 processor...

Thanks
Stéphane.

From y-takahashi at gmo-hs.com Thu Mar 3 05:53:16 2011
From: y-takahashi at gmo-hs.com (GMO-HS Yoichi Takahashi)
Date: Thu, 03 Mar 2011 14:53:16 +0900
Subject: minus disk usage
Message-ID: <20110303145315.3883.A9C031E0@gmo-hs.com>

Hi, this is Yoichi Takahashi.

I have a problem with an ext3 filesystem. The display changes whenever the df command is executed (at short intervals). It shows a minus value for the following, and also normal values.
See below (/dev/sda2):

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              97G -345M   93G   0% /
/dev/sda1              99M   15M   80M  16% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda3             803G  2.5G  759G   1% /home

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              97G  1.2G   91G   2% /
/dev/sda1              99M   15M   80M  16% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda3             803G  2.5G  759G   1% /home

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              97G  448M   92G   1% /
/dev/sda1              99M   15M   80M  16% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda3             803G  2.5G  759G   1% /home

Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2              97G -109M   92G   0% /
/dev/sda1              99M   15M   80M  16% /boot
tmpfs                 2.0G     0  2.0G   0% /dev/shm
/dev/sda3             803G  2.5G  759G   1% /home

The load on the server is always high. The load average is always 3 or 4.

Does anyone know why this happened? Any ideas would be appreciated.

Youichi Takahashi
Cerulean Tower 26-1 Sakuragaoka-cho,Shibuya-ku,Tokyo (150-8512) Japan
TEL +81-3-6415-7075 FAX +81-3-6415-6108
E-MAIL y-takahashi at gmo-hs.com
URL http://www.gmo-hs.com
STOCK CODE 3788

From scerveau at awox.com Tue Mar 8 15:24:46 2011
From: scerveau at awox.com (Stephane Cerveau)
Date: Tue, 8 Mar 2011 16:24:46 +0100
Subject: ext3_free_blocks_sb when removing a more than 1GB file
In-Reply-To:
References: <4D7112A4.6050209@redhat.com> <4D712506.4070405@redhat.com> <026CC28AC9A32E0A7FD2D20F@nimrod.local>
Message-ID:

Dear all,

First of all, it seems that I don't have any trouble with a 2048 block size. I did a test with random sizes from 1024 to 2048 MB and I did not have the issue.

Do you know what drawbacks I can expect from using this size (except the speed)?

Concerning the key using 4096, it seems that I also have some trouble on a regular desktop with the 2.6.23 and 2.6.28 kernels from the Ubuntu distribution (live CD).
But I don't have the issue on an Ubuntu 2.6.32.27 generic kernel.

Best regards.
Stephane

From dshaw at JABBERWOCKY.COM Tue Mar 15 22:42:35 2011
From: dshaw at JABBERWOCKY.COM (David Shaw)
Date: Tue, 15 Mar 2011 18:42:35 -0400
Subject: Using stride on non-RAID
Message-ID:

Hello,

I understand the need for a proper stride setting when formatting a filesystem on a RAID device. However, is there any problem in using a stride setting when formatting a filesystem on a regular non-RAID, non-SSD, just plain-vanilla-single-disk block device? I'm sure there isn't any benefit to it, but I'm curious if there is any harm.

The reason I ask is I'm looking at some code here that can be used on either RAID or non-RAID devices.
The stride setting it has is correct for the particular RAID setup it is intended for, but it also uses those settings when formatting a non-RAID device.

David

From sandeen at redhat.com Tue Mar 15 22:53:55 2011
From: sandeen at redhat.com (Eric Sandeen)
Date: Tue, 15 Mar 2011 17:53:55 -0500
Subject: Using stride on non-RAID
In-Reply-To:
References:
Message-ID: <4D7FEE03.3020809@redhat.com>

On 3/15/11 5:42 PM, David Shaw wrote:
> Hello,
>
> I understand the need for a proper stride setting when formatting a
> filesystem on a RAID device. However, is there any problem in using
> a stride setting when formatting a filesystem on a regular non-RAID,
> non-SSD, just plain-vanilla-single-disk block device? I'm sure there
> isn't any benefit to it, but I'm curious if there is any harm.
>
> The reason I ask is I'm looking at some code here that can be used on
> either RAID or non-RAID devices. The stride setting it has is
> correct for the particular RAID setup it is intended for, but it also
> uses those settings when formatting a non-RAID device.
>
> David

just FWIW, recent kernels & e2fsprogs will just automatically pick stride based on storage geometry - for md/lvm at least, and for scsi devices that export this geometry as well.

ext4 has a little stripe-awareness in its allocator; otherwise, stride just staggers bitmap starts so they don't all end up on the same spindle; [1]

Offhand I don't think it'd cause any harm to set stride on non-raid.

-Eric

[1] ext2fs_allocate_group_table() in lib/ext2fs/alloc_tables.c

From dshaw at jabberwocky.com Wed Mar 16 18:02:34 2011
From: dshaw at jabberwocky.com (David Shaw)
Date: Wed, 16 Mar 2011 14:02:34 -0400
Subject: Using stride on non-RAID
In-Reply-To: <4D7FEE03.3020809@redhat.com>
References: <4D7FEE03.3020809@redhat.com>
Message-ID: <84874DA1-AB26-413B-9496-D1CD7986FDAE@jabberwocky.com>

On Mar 15, 2011, at 6:53 PM, Eric Sandeen wrote:

> On 3/15/11 5:42 PM, David Shaw wrote:
>> Hello,
>>
>> I understand the need for a proper stride setting when formatting a
>> filesystem on a RAID device. However, is there any problem in using
>> a stride setting when formatting a filesystem on a regular non-RAID,
>> non-SSD, just plain-vanilla-single-disk block device? I'm sure there
>> isn't any benefit to it, but I'm curious if there is any harm.
>>
>> The reason I ask is I'm looking at some code here that can be used on
>> either RAID or non-RAID devices. The stride setting it has is
>> correct for the particular RAID setup it is intended for, but it also
>> uses those settings when formatting a non-RAID device.
>>
>> David
>
> just FWIW, recent kernels & e2fsprogs will just automatically pick
> stride based on storage geometry - for md/lvm at least, and for
> scsi devices that export this geometry as well.
>
> ext4 has a little stripe-awareness in its allocator; otherwise, stride
> just staggers bitmap starts so they don't all end up on the same spindle; [1]
> Offhand I don't think it'd cause any harm to set stride on non-raid.

Thanks very much for your pointers. It's a nice enhancement that this is done automatically now.
David

From jidong.xiao at gmail.com Sat Mar 26 23:20:08 2011
From: jidong.xiao at gmail.com (Jidong Xiao)
Date: Sat, 26 Mar 2011 19:20:08 -0400
Subject: Ext3: Why data=journal is better than data=ordered when data needs to be read from and written to disk at the same time
Message-ID:

Hi,

I have seen this mentioned in many places, but I have never seen anyone explain it in detail. (Although this link gives the original story: http://lkml.indiana.edu/hypermail//linux/kernel/0107.1/0364.html)

"Journal mode: This mode is the slowest except when data needs to be read from and written to disk at the same time where it outperform all others mode."

Since this is pretty counter-intuitive, I believe many people are not aware of the root cause, so I won't be the last one to ask this question. Can anyone kindly explain it, to make it clearer?

Thank you!

Regards
Jidong

From tytso at mit.edu Sat Mar 26 23:53:11 2011
From: tytso at mit.edu (Ted Ts'o)
Date: Sat, 26 Mar 2011 19:53:11 -0400
Subject: Ext3: Why data=journal is better than data=ordered when data needs to be read from and written to disk at the same time
In-Reply-To:
References:
Message-ID: <20110326235311.GB21075@thunk.org>

On Sat, Mar 26, 2011 at 07:20:08PM -0400, Jidong Xiao wrote:
> Hi,
>
> I have seen this mentioned in many places, but I have never seen anyone
> explain it in detail. (Although this link gives the original story:
> http://lkml.indiana.edu/hypermail//linux/kernel/0107.1/0364.html)
>
> "Journal mode: This mode is the slowest except when data needs to be
> read from and written to disk at the same time where it outperform all
> others mode."

I didn't see any reference to that in that mail thread (which seemed to be mostly about reiserfs). It is true that if you have a bursty, fsync-heavy workload, you can reduce latency by using data=journal mode, because it avoids seeks --- the data and metadata blocks are written into the journal, and this allows the fsync() to finish more quickly.
There are some applications where this might be useful, such as NFS file serving, where the NFS server is not allowed to send an acknowledgement back to the client until the data is written to stable store.

- Ted

From jidong.xiao at gmail.com Sun Mar 27 00:25:23 2011
From: jidong.xiao at gmail.com (Jidong Xiao)
Date: Sat, 26 Mar 2011 20:25:23 -0400
Subject: Ext3: Why data=journal is better than data=ordered when data needs to be read from and written to disk at the same time
In-Reply-To: <20110326235311.GB21075@thunk.org>
References: <20110326235311.GB21075@thunk.org>
Message-ID:

On Sat, Mar 26, 2011 at 7:53 PM, Ted Ts'o wrote:
> On Sat, Mar 26, 2011 at 07:20:08PM -0400, Jidong Xiao wrote:
>> Hi,
>>
>> I have seen this mentioned in many places, but I have never seen anyone
>> explain it in detail. (Although this link gives the original story:
>> http://lkml.indiana.edu/hypermail//linux/kernel/0107.1/0364.html)
>>
>> "Journal mode: This mode is the slowest except when data needs to be
>> read from and written to disk at the same time where it outperform all
>> others mode."
>
> I didn't see any reference to that in that mail thread (which seemed
> to be mostly about reiserfs). It is true that if you have a bursty,
> fsync-heavy workload, you can reduce latency by using data=journal
> mode, because it avoids seeks --- the data and metadata blocks are
> written into the journal, and this allows the fsync() to finish more
> quickly. There are some applications where this might be useful, such
> as NFS file serving, where the NFS server is not allowed to send an
> acknowledgement back to the client until the data is written to stable
> store.
>
> - Ted
>

Well, the first time Andrew Morton claimed that data=journal is better than data=ordered under certain conditions was when he announced the release of ext3-2.4-0.9.4:

http://www.redhat.com/archives/ext3-users/2001-July/msg00169.html

And the link I provided in the original email is actually the source or background of this story. This release came immediately after that discussion.

But my question is, why could data=journal outperform data=ordered? For the data=journal mode, you have to write the data and metadata blocks into the journal, but for the data=ordered mode, you only have to write the metadata blocks into the journal. If, in certain cases, the former mode can avoid seeks, then the same behavior should apply to the latter mode. So it's really odd that the former mode can outperform the latter mode.

Regards
Jidong

From tytso at mit.edu Sun Mar 27 02:44:10 2011
From: tytso at mit.edu (Ted Ts'o)
Date: Sat, 26 Mar 2011 22:44:10 -0400
Subject: Ext3: Why data=journal is better than data=ordered when data needs to be read from and written to disk at the same time
In-Reply-To:
References: <20110326235311.GB21075@thunk.org>
Message-ID: <20110327024410.GC21075@thunk.org>

On Sat, Mar 26, 2011 at 08:25:23PM -0400, Jidong Xiao wrote:
>
> But my question is, why could data=journal outperform data=ordered?
> For the data=journal mode, you have to write the data and metadata
> blocks into the journal, but for the data=ordered mode, you only have
> to write the metadata blocks into the journal. If, in certain
> cases, the former mode can avoid seeks, then the same behavior should
> apply to the latter mode. So it's really odd that the former mode can
> outperform the latter mode.

When executing an fsync(), in data=ordered mode you have to write the metadata blocks into the journal and wait for the data blocks to be written. This will generally require extra seeks.
In data=journal mode, the data blocks can be written directly into the journal without needing to seek.

Of course eventually the data and metadata blocks will need to be written to their permanent locations before the journal space can be reused. But for short bursty write patterns, the fsync() latency will be much smaller in data=journal mode.

Regards,

- Ted

From jidong.xiao at gmail.com Sun Mar 27 04:52:21 2011
From: jidong.xiao at gmail.com (Jidong Xiao)
Date: Sun, 27 Mar 2011 00:52:21 -0400
Subject: Ext3: Why data=journal is better than data=ordered when data needs to be read from and written to disk at the same time
In-Reply-To: <20110327024410.GC21075@thunk.org>
References: <20110326235311.GB21075@thunk.org> <20110327024410.GC21075@thunk.org>
Message-ID:

On Sat, Mar 26, 2011 at 10:44 PM, Ted Ts'o wrote:
> On Sat, Mar 26, 2011 at 08:25:23PM -0400, Jidong Xiao wrote:
>>
>> But my question is, why could data=journal outperform data=ordered?
>> For the data=journal mode, you have to write the data and metadata
>> blocks into the journal, but for the data=ordered mode, you only have
>> to write the metadata blocks into the journal. If, in certain
>> cases, the former mode can avoid seeks, then the same behavior should
>> apply to the latter mode. So it's really odd that the former mode can
>> outperform the latter mode.
>
> When executing an fsync(), in data=ordered mode you have to write the
> metadata blocks into the journal and wait for the data blocks to be
> written. This will generally require extra seeks. In
> data=journal mode, the data blocks can be written directly into the
> journal without needing to seek.
>
> Of course eventually the data and metadata blocks will need to be
> written to their permanent locations before the journal space can be
> reused. But for short bursty write patterns, the fsync() latency will
> be much smaller in data=journal mode.
>

Thank you Ted, it is really helpful!
So the difference is:

data=ordered mode: fsync() returns only once the metadata blocks have been written into the journal and the data blocks have been written to the disk.

data=journal mode: fsync() returns once the metadata and data have been written into the journal. The journal is contiguous, so data=journal mode needs no seeking; therefore, fsync() returns more quickly.

Suppose we read from and write to the disk simultaneously, as in the following example.

First, write data to the filesystem as quickly as possible:

Rapid writing

while true
do
    dd if=/dev/zero of=largefile bs=16384 count=131072
done

While data is being written to the test filesystem, read 16MB of data from the same filesystem on the same disk, timing the results:

Reading a 16MB file

time cat 16-meg-file > /dev/null

In this case, if we conduct the experiment in data=journal mode and data=ordered mode respectively, since write latency is much smaller in data=journal mode, the disk will focus more on the read operation, hence, the read operation will also finish earlier than it does in the data=ordered mode. Am I understanding correctly?

Regards
Jidong

From pg_ext3 at ext3.for.sabi.co.UK Mon Mar 28 16:43:20 2011
From: pg_ext3 at ext3.for.sabi.co.UK (Peter Grandi)
Date: Mon, 28 Mar 2011 17:43:20 +0100
Subject: Ext3: Why data=journal is better than data=ordered when data needs to be read from and written to disk at the same time
In-Reply-To:
References: <20110326235311.GB21075@thunk.org> <20110327024410.GC21075@thunk.org>
Message-ID: <19856.47784.421703.81840@tree.ty.sabi.co.UK>

[ ... ]

>> When executing an fsync(), in data=ordered mode you have to
>> write the metadata blocks into the journal and wait for the
>> data blocks to be written. This will generally
>> require extra seeks. In data=journal mode, the data blocks
>> can be written directly into the journal without needing
>> to seek.
>> Of course eventually the data and metadata blocks will need
>> to be written to their permanent locations before the journal
>> space can be reused. But for short bursty write patterns,
>> the fsync() latency will be much smaller in data=journal
>> mode.

> [ ... ]
> In this case, if we conduct the experiment in data=journal
> mode and data=ordered mode respectively,

That experiment is not necessarily demonstrative; it depends on RAM caching, the elevator, ...

> since write latency is much smaller in data=journal mode,

Write latency is actually much longer, because it requires *two* writes instead of one. It is *fsync* latency, as mentioned above, that is smaller, because it depends only on the first write to what is in effect a small log-based filesystem.

This distinction matters a great deal, because it is the reason why "short bursty write patterns" is the qualification above. For long write patterns things are very different, as the journal eventually fills up. For any given size it will also fill up a lot faster with 'data=journal'.

Ahhh, while writing that I have just realized that large journals can be a bad idea, especially for metadata operations. Will have to think more about that.

> the disk will focus more on the read operation, hence, the
> read operation will also finish earlier than it does in the
> data=ordered mode. Am I understanding correctly?

That again depends on a lot of things, including caching, the elevator, flusher behaviour, exactly where the files are...

Also, whether the journal is on the same drive as the filesystem or another drive can matter enormously; also whether for example the journal is on SSD or battery-backed RAM.

There are reasons why 'ext2' still quite outperforms 'ext3' on simple tests.
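
The distinction drawn in this thread between write latency and fsync() latency can be checked directly. What follows is only a minimal sketch, not something posted by anyone in the thread: the function name fsync_latencies, the file name burst.dat and the default burst of 20 writes of 4096 bytes are all hypothetical choices. The script does not select a journaling mode itself; the assumption is that you run it once on a filesystem mounted with data=ordered and once on one mounted with data=journal, and compare the numbers, keeping in mind the caveats above about caching, the elevator and journal placement.

```python
import os
import sys
import tempfile
import time


def fsync_latencies(path, writes=20, size=4096):
    """Append `writes` chunks of `size` bytes to `path`, calling fsync()
    after each chunk, and return the per-call fsync latencies in seconds."""
    latencies = []
    chunk = b"\0" * size
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for _ in range(writes):
            os.write(fd, chunk)
            start = time.perf_counter()
            os.fsync(fd)  # blocks until the filesystem reports the data as stable
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies


if __name__ == "__main__":
    # Hypothetical usage: pass a directory on the filesystem under test;
    # without an argument the system temporary directory is used.
    target_dir = sys.argv[1] if len(sys.argv) > 1 else None
    with tempfile.TemporaryDirectory(dir=target_dir) as d:
        lat = fsync_latencies(os.path.join(d, "burst.dat"))
        print("mean fsync latency: %.3f ms" % (1000 * sum(lat) / len(lat)))
        print("max  fsync latency: %.3f ms" % (1000 * max(lat)))
```

Only the fsync() call is timed, not the write() before it, because the thread's claim is about how long fsync() takes to return. A short burst like this one never fills the journal; raising `writes` far enough should eventually show the effect described above, where the journal fills up and the blocks have to be written a second time to their permanent locations.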