From Vince.McIntyre at atnf.csiro.au Tue Nov 1 01:48:10 2005
From: Vince.McIntyre at atnf.csiro.au (Vincent McIntyre)
Date: Tue, 1 Nov 2005 12:48:10 +1100 (EST)
Subject: ext3 + fs > 2Tbyte
In-Reply-To: <20051031220648.GC31368@schatzie.adilger.int>
References: <20051031220648.GC31368@schatzie.adilger.int>
Message-ID:

thanks for your response, Andreas.

> It sounds like you have overflowed the end of the 2TB device limit and
> clobbered the beginning of your filesystem. This can happen if the
> SCSI driver, kernel, or even ext3 isn't handling offsets > 2^31 properly.
> I know RH has only recently started supporting ext3 filesystems > 2TB,
> and it isn't clear that all drivers handle this properly yet.

This box is using the fusion mpt drivers as in 2.6.7 - mptbase,mptscsih
etc. Do you recall any >2Tb issue being fixed in later kernels?

When the machine was last in a good state, the filesystem had 1.5Tbyte
used, ie as far as I can tell nothing would have written past 2Tb,
although I suppose there is no guarantee the space is used up in order
of increasing offset.

The filesystem was exported over NFS, and was being written to by
client machines. It is using NFSv3 (nfs-kernel-server 1.0-2woody3).
Worked great for several months.

> Please update your e2fsprogs to the latest. You also need to use
> "e2fsck -b 32768" (or multiple thereof) for such large filesystems.
> I think newer e2fsprogs will print this message properly in that case.
>

I downloaded 1.38 from sourceforge and built it. No change in behaviour.
I tried e2fsck with block offsets from 1025 to 4194305 in steps of 1024.
I also tried dumpe2fs with the same range of offsets, also nothing.
I've attached an strace of dumpe2fs, perhaps it is helpful?

Another question. The e2fsck(8) manpage says the superblocks are at -

   Blocksize   -b
     1k         8193
     2k        16384
     4k        32768

Why is the superblock offset for 1k at 8193, not 8192?
Is that an error in the manpage?
Or should it be that the 2k, 4k block offsets should be odd, ie 16385,
32769? This article suggests the latter -
http://www2.linuxjournal.com/article/0193
-------------- next part --------------
A non-text attachment was scrubbed...
Name: log.dumpe2fs.gz
Type: application/octet-stream
Size: 1027 bytes
Desc:
URL:

From tytso at mit.edu Tue Nov 1 04:46:58 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Mon, 31 Oct 2005 23:46:58 -0500
Subject: What is the history of CONFIG_EXT{2,3}_CHECK?
In-Reply-To: <20051031212503.GY31368@schatzie.adilger.int>
References: <20051031001334.GP4180@stusta.de>
 <20051031212503.GY31368@schatzie.adilger.int>
Message-ID: <20051101044658.GA7500@thunk.org>

On Mon, Oct 31, 2005 at 02:25:03PM -0700, Andreas Dilger wrote:
> On Oct 31, 2005 01:13 +0100, Adrian Bunk wrote:
> > Can anyone tell me the history of CONFIG_EXT{2,3}_CHECK?
> >
> > There is code for a "check" option for mount if these options are
> > enabled, but there's no way to enable them.
>
> These are expensive debugging options, which walk the inode/block bitmaps
> for getting the group inode/block usage instead of using the group
> summary data. Not used very often but I suspect occasionally useful for
> developers mucking with ext[23] internals. Since it is developer-only
> code it needs to be enabled with #define CONFIG_EXT[23]_CHECK in a
> header or compile option.

It's basically a stripped down version of e2fsck pass #5, though. Is
there any reason why this needs to be in the kernel? If it would be
useful I could easily make a userspace implementation of these checks.
- Ted

From adilger at clusterfs.com Tue Nov 1 06:08:32 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Mon, 31 Oct 2005 23:08:32 -0700
Subject: ext3 + fs > 2Tbyte
In-Reply-To:
References: <20051031220648.GC31368@schatzie.adilger.int>
Message-ID: <20051101060832.GK31368@schatzie.adilger.int>

On Nov 01, 2005 12:45 +1100, Vincent.McIntyre at csiro.au wrote:
> >It sounds like you have overflowed the end of the 2TB device limit and
> >clobbered the beginning of your filesystem. This can happen if the
> >SCSI driver, kernel, or even ext3 isn't handling offsets > 2^31 properly.
> >I know RH has only recently started supporting ext3 filesystems > 2TB,
> >and it isn't clear that all drivers handle this properly yet.
>
> This box is using the fusion mpt drivers as in 2.6.7 - mptbase,mptscsih
> etc. Do you recall any >2Tb issue being fixed in later kernels?

Sorry, I don't know, I've just heard of occasional problems in this area
and very few people reporting success.

> When the machine was last in a good state, the filesystem had 1.5Tbyte
> used, ie as far as I can tell nothing would have written past 2Tb,
> although I suppose there is no guarantee the space is used up in order
> of increasing offset.

No, it is "kind of" used in increasing offset, but not strictly so.

> >Please update your e2fsprogs to the latest. You also need to use
> >"e2fsck -b 32768" (or multiple thereof) for such large filesystems.
> >I think newer e2fsprogs will print this message properly in that case.

You might also need to add "-B 4096".

> I downloaded 1.38 from sourceforge and built it. No change in behaviour.
> I tried e2fsck with block offsets from 1025 to 4194305 in steps of 1024.
> I also tried dumpe2fs with the same range of offsets, also nothing.
>
>
> Another question. The e2fsck(8) manpage says the superblocks are at -
>    Blocksize   -b
>      1k         8193
>      2k        16384
>      4k        32768
> Why is the superblock offset for 1k at 8193, not 8192?

Because the ext[23] superblock is at 1024 bytes offset from the
beginning of the device. For 1kB blocksize this is a whole block
so the filesystem starts at block 1, while for larger blocksize
this is still in block 0. Backup superblocks are at block offsets:

(blocksize * 8) * {3,5,7}^n, n={0,1,2,3...}

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From Vince.McIntyre at atnf.csiro.au Tue Nov 1 13:38:45 2005
From: Vince.McIntyre at atnf.csiro.au (Vincent McIntyre)
Date: Wed, 2 Nov 2005 00:38:45 +1100 (EST)
Subject: ext3 + fs > 2Tbyte
In-Reply-To:
References: <20051031220648.GC31368@schatzie.adilger.int>
 <20051101060832.GK31368@schatzie.adilger.int>
Message-ID:

>>> Please update your e2fsprogs to the latest. You also need to use
>>> "e2fsck -b 32768" (or multiple thereof) for such large filesystems.
>>> I think newer e2fsprogs will print this message properly in that case.
>
> You might also need to add "-B 4096".

I gave that a try as well (and -B 8192), with the same results.

I tried to make a copy of the first part of the filesystem with dd;

# dd if=/dev/sdb1 of=/tmp/sdb1.dd bs=1 count=16384 \
     conv=noerror,sync,notrunc

This returned a file supposedly 16384 bytes long, but it didn't make
much sense - looking at it with 'od' or 'hexdump' I get only 17 lines
of output, not the roughly 178 I get for the same exercise with a good
ext3 filesystem. (The /tmp filesystem has 128-byte inodes.)
The output appears to be just the EFI GPT partition label.

I'm starting to suspect something in the raid device is in a strange
state. Or that the whole filesystem has just totally disappeared. :(

A bit more digging in the logs found this, from the first boot when
power was reapplied

sdb : very big device. try to use READ CAPACITY(16).
kernel: SCSI device sdb: 4688461824 512-byte hdwr sectors (2400492 MB)
kernel: SCSI device sdb: drive cache: write back
kernel: /dev/scsi/host2/bus0/target0/lun0: p1
kernel: Attached scsi disk sdb at scsi2, channel 0, id 0, lun 0

so far so good - and then (eek)

kernel: VFS: Can't find ext3 filesystem on dev sdb1.

when kjournald attempts to take a peek at the journal.

>> I downloaded 1.38 from sourceforge and built it. No change in behaviour.
>> I tried e2fsck with block offsets from 1025 to 4194305 in steps of 1024.
>> I also tried dumpe2fs with the same range of offsets, also nothing.
>>
>>
>> Another question. The e2fsck(8) manpage says the superblocks are at -
>>    Blocksize   -b
>>      1k         8193
>>      2k        16384
>>      4k        32768
>> Why is the superblock offset for 1k at 8193, not 8192?
>
> Because the ext[23] superblock is at 1024 bytes offset from the
> beginning of the device. For 1kB blocksize this is a whole block
> so the filesystem starts at block 1, while for larger blocksize
> this is still in block 0. Backup superblocks are at block offsets:
>
> (blocksize * 8) * {3,5,7}^n, n={0,1,2,3...}

I'm starting to get this, thanks for your patience.
I tried all the feasible values of -b less than 2147483647, as I
mention above. I did not try larger block sizes than 8192.

I since found these links which fill out the picture a bit more.

http://web.mit.edu/tytso/www/linux/ext2intro.html
http://homepage.smc.edu/morgan_david/cs40/analyze-ext2.htm
http://uranus.it.swin.edu.au/~jn/explore2fs/es2fs.htm
http://www.unixwiz.net/techtips/recovering-ext2.html
http://nepto.atomicpile.sk/mix/articles/ext2-superblock/ext2-superblock-notes.txt

Any further thoughts appreciated.

Cheers
Vince

From bloch at verdurin.com Tue Nov 1 15:59:31 2005
From: bloch at verdurin.com (bloch at verdurin.com)
Date: Tue, 1 Nov 2005 15:59:31 +0000
Subject: Recover original superblock on corrupted filesystem?
In-Reply-To: <20051025220521.GB17476@bloch.smith.man.ac.uk>
References: <20051021145114.GA432@bloch.smith.man.ac.uk>
 <1130265842.4965.21.camel@orbit.scot.redhat.com>
 <20051025220521.GB17476@bloch.smith.man.ac.uk>
Message-ID: <20051101155931.GA1256@bloch.smith.man.ac.uk>

On Tue, 25 Oct 2005, bloch at verdurin.com wrote:
> On Tue, 25 Oct 2005, Stephen C. Tweedie wrote:
>
> > Hi,
> >
> > On Fri, 2005-10-21 at 15:51 +0100, bloch at verdurin.com wrote:
> >
> > > It appears the original superblock is corrupted too, as it has an inode
> > > count of 0. When I start fsck with -b 32760, it uses the alternate
> > > superblock and proceeds. However, it restarts from the beginning a
> > > couple of times and after the second restart it doesn't use the
> > > alternate superblock, stopping instead as it can't find the original
> > > one.
> >
> > Do you have a log of the fsck output, and which e2fsprogs version is
> > this? Sounds like it may be an e2fsck bug if we don't honour the backup
> > superblock flag on subsequent passes.
> >
>
> I do have a log, yes. It's rather large...
>
> It's version 1.38
>
> > > Is there a way around this, such as using one of the alternate
> > > superblocks to replace the broken one
> >
> > Yes, "dd" of the appropriate block should work... but do this with
> > extreme care, as getting it slightly wrong will cause major havoc.
> >
> > "debugfs" may be a better bet.
> >
> > # debugfs -w -b$BLOCKSIZE -s$SUPERBLOCK /dev/$DEV
> >
> > will tell debugfs to read the specified superblock. If you dirty the
> > superblock (eg. with the "dirty" command) then quit, it will write back
> > the backup superblock to the home location too.
> >
>

As an update to this, the problem seems to have re-occurred. Here are
the relevant error messages:

EXT3-fs error (device sdb1): ext3_new_block: Allocating block in system zone - block = 41484288
Aborting journal on device sdb1.
EXT3-fs error (device sdb1) in ext3_new_block: Journal has aborted
ext3_abort called.
EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has aborted
__journal_remove_journal_head: freeing b_committed_data

Is there anything you can suggest to look at before I run fsck on this?

Thanks,
Adam

From adilger at clusterfs.com Tue Nov 1 18:09:12 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 1 Nov 2005 11:09:12 -0700
Subject: ext3 + fs > 2Tbyte
In-Reply-To:
References: <20051031220648.GC31368@schatzie.adilger.int>
 <20051101060832.GK31368@schatzie.adilger.int>
Message-ID: <20051101180912.GN31368@schatzie.adilger.int>

On Nov 02, 2005 00:37 +1100, Vincent.McIntyre at csiro.au wrote:
> I tried to make a copy of the first part of the filesystem with dd;
>
> # dd if=/dev/sdb1 of=/tmp/sdb1.dd bs=1 count=16384 \
>      conv=noerror,sync,notrunc
>
> This returned a file supposedly 16384 bytes long , but it didn't make
> much sense - looking at it with 'od' or 'hexdump' I get only 17 lines
> of output, not the roughly 178 I get for the same exercise with a good
> ext3 filesystem. (The /tmp filesystem has 128-byte inodes.)

"od" will compress lines that are identical (usually all-zero) as "*".
If you want all the output, use -v.

> The output appears to be just the EFI GPT partition label.

The EFI GPT label can be restored from the backup (which is located at
the end of the device) so that might have happened.

> I'm starting to suspect something in the raid device is in a strange
> state. Or that the whole filesystem has just totally disappeared. :(

od -Ax -tx4 /dev/sdb1 | grep "^[0-9a-f]*30 [0-9a-f]* [0-9a-f]* 000[1-3]ef53 "

should locate the ext2 superblock magic number(s) eventually. There is
also a utility in e2fsprogs source (misc/findsuper) that is not installed
that you could build that does this more efficiently.

If those don't appear anywhere, then something dramatically bad has
happened to your filesystem.
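
The same search can be sketched in Python. This is only an illustration
of what the od/grep pipeline above looks for, not how findsuper is
implemented. It assumes the usual ext2 layout, in which the 16-bit
s_magic field (0xEF53, little-endian) sits 56 bytes into the superblock,
and it steps through a buffer in 512-byte increments. The function name
and the in-memory test image are made up for the example.

```python
import struct

EXT2_SUPER_MAGIC = 0xEF53  # value of s_magic in an ext2/ext3 superblock
S_MAGIC_OFFSET = 56        # s_magic is 56 bytes into the superblock


def find_superblock_candidates(data, step=512):
    """Return byte offsets in data where an ext2/ext3 superblock may start."""
    hits = []
    for start in range(0, len(data) - S_MAGIC_OFFSET - 1, step):
        (magic,) = struct.unpack_from("<H", data, start + S_MAGIC_OFFSET)
        if magic == EXT2_SUPER_MAGIC:
            hits.append(start)
    return hits


# Made-up 8 kB image with one magic number planted where the primary
# superblock of a filesystem starting at byte 0 would keep it.
image = bytearray(8192)
struct.pack_into("<H", image, 1024 + S_MAGIC_OFFSET, EXT2_SUPER_MAGIC)

for start in find_superblock_candidates(image):
    # od -tx4 prints 16 bytes per line, so the magic appears on the line
    # whose address is 48 bytes past the start of the superblock.
    print(start, hex(start + 48))
```

A magic number alone is a weak test, so every hit is only a candidate.
On a real device one would read the data in chunks rather than all at
once.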
Aliasing would only damage at most (if you did
"dd if=/dev/zero" into a file at the end of the filesystem) the first
300GB of your device, and there _should_ be a backup super somewhere
beyond that (haven't done math to confirm).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From bloch at verdurin.com Wed Nov 2 13:09:57 2005
From: bloch at verdurin.com (bloch at verdurin.com)
Date: Wed, 2 Nov 2005 13:09:57 +0000
Subject: Recover original superblock on corrupted filesystem?
In-Reply-To: <20051101155931.GA1256@bloch.smith.man.ac.uk>
References: <20051021145114.GA432@bloch.smith.man.ac.uk>
 <1130265842.4965.21.camel@orbit.scot.redhat.com>
 <20051025220521.GB17476@bloch.smith.man.ac.uk>
 <20051101155931.GA1256@bloch.smith.man.ac.uk>
Message-ID: <20051102130956.GA16564@bloch.smith.man.ac.uk>

On Tue, 01 Nov 2005, bloch at verdurin.com wrote:
>
> As an update to this, the problem seems to have re-occurred. Here are
> the relevant error messages:
>
> EXT3-fs error (device sdb1): ext3_new_block: Allocating block in system
> zone - block = 41484288
> Aborting journal on device sdb1.
> EXT3-fs error (device sdb1) in ext3_new_block: Journal has aborted
> ext3_abort called.
> EXT3-fs error (device sdb1): ext3_journal_start_sb: Detected aborted
> journal
> Remounting filesystem read-only
> EXT3-fs error (device sdb1) in ext3_reserve_inode_write: Journal has
> aborted
> __journal_remove_journal_head: freeing b_committed_data
>

Another update - exactly the same problem has occurred on an identical
machine. The disks are on a Megaraid RAID1 array. Two other machines
which only differ from the problem ones in that they have 4G RAM instead
of 8G have not shown any such symptoms.
Adam

From kent at cpttm.org.mo Thu Nov 3 01:35:04 2005
From: kent at cpttm.org.mo (Kent Tong)
Date: Thu, 3 Nov 2005 01:35:04 +0000 (UTC)
Subject: filesystem remounted as read only
Message-ID:

Hi,

I'm running kernel 2.6.8-15, lvm2 v2.01.04-5 and acl v2.2.23-1 on a
Sunblade 100 (sparc). In a few months we have experienced for several
times that an ext3 filesystem is remounted as read-only (this is due
to the option "errors=remount-ro" in /etc/fstab). Sometimes there is
no error in log files but sometimes we see:

kernel: init_special_inode: bogus i_mode (3016)
kernel: init_special_inode: bogus i_mode (3125)
kernel: init_special_inode: bogus i_mode (3144)
kernel: init_special_inode: bogus i_mode (3231)
kernel: init_special_inode: bogus i_mode (3423)
kernel: init_special_inode: bogus i_mode (3452)

In the former case (no error in the logs), then running fsck will find
no error. In the latter case, it may find some errors and fix them.

I've run smartmontools to check the disks but no errors are found.

I've run "fsck -c" to look up bad blocks but nothing is found.

What else can I do to troubleshoot the problem? In particular, the
most strange is if it is remounting as read-only, why there is no
error in the logs? Could remounting as read-only prevent it from
writing to the logs?

Thanks!

From adilger at clusterfs.com Thu Nov 3 17:40:40 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 3 Nov 2005 10:40:40 -0700
Subject: filesystem remounted as read only
In-Reply-To:
References:
Message-ID: <20051103174040.GM31368@schatzie.adilger.int>

On Nov 03, 2005 01:35 +0000, Kent Tong wrote:
> I'm running kernel 2.6.8-15, lvm2 v2.01.04-5 and acl v2.2.23-1 on a
> Sunblade 100 (sparc). In a few months we have experienced for several
> times that an ext3 filesystem is remounted as read-only (this is due
> to the option "errors=remount-ro" in /etc/fstab).
Sometimes there is
> no error in log files but sometimes we see:
>
> kernel: init_special_inode: bogus i_mode (3016)
> kernel: init_special_inode: bogus i_mode (3125)
> kernel: init_special_inode: bogus i_mode (3144)
> kernel: init_special_inode: bogus i_mode (3231)
> kernel: init_special_inode: bogus i_mode (3423)
> kernel: init_special_inode: bogus i_mode (3452)
>
> In the former case (no error in the logs), then running fsck will find
> no error. In the latter case, it may find some errors and fix them.
>
> I've run smartmontools to check the disks but no errors are found.
>
> I've run "fsck -c" to look up bad blocks but nothing is found.
>
> What else can I do to troubleshoot the problem? In particular, the
> most strange is if it is remounting as read-only, why there is no
> error in the logs? Could remounting as read-only prevent it from
> writing to the logs?

Remounting read-only should only happen in the context of "ext3_error".
The init_special_inode() code does not return an error to the caller
so in some cases this error may go unnoticed.

In cases where there is a runtime error but no problem is found on disk,
it is usually a memory error. It is also possible there is a cable error
or similar.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From jeff at jettis.com Thu Nov 3 21:18:40 2005
From: jeff at jettis.com (Jeff Dinisco)
Date: Thu, 3 Nov 2005 13:18:40 -0800
Subject: mount r/w and r/o
Message-ID:

I have an ext3 filesystem mounted r/w on 1 host and r/o on multiple
hosts. Dangerous but cost effective. I recently implemented some
protection through a fc switch that restricts some hosts to r/o access
to the data luns. So if someone types mount -o rw or something, all is
not lost.

The issue occurs when it's mounted r/w on 1 host and another host
attempts to mount it r/o. The mount command takes about a minute to
complete, it successfully mounts, and several error messages are
reported...

Nov 3 12:52:26 lax kernel: EXT3-fs: INFO: recovery required on readonly filesystem.
Nov 3 12:52:26 lax kernel: EXT3-fs: write access will be enabled during recovery.
Nov 3 12:52:27 lax kernel: cfq: depth 4 reached, tagging now on

...reports this for about 260 different sectors (makes sense, fc switch
is preventing write access)...

Nov 3 12:52:27 lax kernel: SCSI error : <494 0 0 1> return code = 0x8000002
Nov 3 12:52:27 lax kernel: sdl: Current: sense key: Data Protect
Nov 3 12:52:27 lax kernel: Additional sense: Logical unit software write protected
Nov 3 12:52:27 lax kernel: end_request: I/O error, dev sdl, sector 496
Nov 3 12:52:27 lax kernel: Buffer I/O error on device sdl, logical block 62
Nov 3 12:52:27 lax kernel: lost page write due to I/O error on sdl

then completes...

Nov 3 12:52:44 laxl kernel: EXT3-fs: recovery complete. (how???)
Nov 3 12:52:44 laxl kernel: EXT3-fs: mounted filesystem with ordered data mode.

This also happens on other filesystems and other devices under the same
circumstances. When the filesystem is umounted from the r/w host, it
mounts w/ out error on r/o host. It's interesting to note that after
that's done, you can remount the filesystem on the r/w host, and then
mount it on the r/o w/ just a few errors and w/ in seconds.

My questions are...
Should I be concerned by this?
Is there a way to automatically skip the recovery attempt, and if so,
should I use it?
Am I going about this all wrong, is there a better way to do this (other
than GFS)?

Thanks.

- Jeff
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From menscher at uiuc.edu Thu Nov 3 21:37:37 2005
From: menscher at uiuc.edu (Damian Menscher)
Date: Thu, 3 Nov 2005 15:37:37 -0600 (CST)
Subject: mount r/w and r/o
In-Reply-To:
References:
Message-ID:

On Thu, 3 Nov 2005, Jeff Dinisco wrote:

> I have an ext3 filesystem mounted r/w on 1 host and r/o on multiple
> hosts. Dangerous but cost effective.
>
> My questions are...
> Should I be concerned by this?
> Is there a way to automatically skip the recovery attempt, and if so,
> should I use it?
> Am I going about this all wrong, is there a better way to do this (other
> than GFS)?

Sorry to ask the obvious question, but why not just use NFS?

Damian Menscher
--
-=#| www.uiuc.edu/~menscher/ Ofc:(650)273-2757 |#=-
-=#| The above opinions are not necessarily those of my employers. |#=-

From jeff at jettis.com Thu Nov 3 21:58:24 2005
From: jeff at jettis.com (Jeff Dinisco)
Date: Thu, 3 Nov 2005 13:58:24 -0800
Subject: mount r/w and r/o
Message-ID:

Performance is the answer. This is streaming media and the throughput
is very high.

-----Original Message-----
From: Wolber, Richard C [mailto:richard.c.wolber at boeing.com]
Sent: Thursday, November 03, 2005 5:01 PM
To: Damian Menscher; Jeff Dinisco
Cc: ext3-users at redhat.com
Subject: RE: mount r/w and r/o

> > My questions are...
> > Should I be concerned by this?
> > Is there a way to automatically skip the recovery attempt, and if so,
> > should I use it?
> > Am I going about this all wrong, is there a better way to do this
> > (other than GFS)?
>
> Sorry to ask the obvious question, but why not just use NFS?

Performance? NFS is a lot of overhead to consider using on something
like FC. Mounting r/o seems (and I await the experts' opinion) at first
glance to be a very effective way of doing this.

..Chuck..

From adilger at clusterfs.com Thu Nov 3 22:07:35 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 3 Nov 2005 15:07:35 -0700
Subject: mount r/w and r/o
In-Reply-To:
References:
Message-ID: <20051103220735.GW31368@schatzie.adilger.int>

On Nov 03, 2005 13:18 -0800, Jeff Dinisco wrote:
> I have an ext3 filesystem mounted r/w on 1 host and r/o on multiple
> hosts. Dangerous but cost effective. I recently implemented some
> protection through a fc switch that restricts some hosts to r/o access
> to the data luns. So if someone types mount -o rw or something, all is
> not lost.

This is completely dangerous and should not be done. The FC switch is
preventing potentially serious corruption to your filesystem, but is
not preventing the r/o clients from getting corrupt/stale data and
possibly crashing. There is nothing on those clients to keep their
cache up-to-date with what is happening on the r/w server.

> Is there a way to automatically skip the recovery attempt, and if so,
> should I use it?

No.

> Am I going about this all wrong, is there a better way to do this (other
> than GFS)?

As another person suggested, NFS is fine for small-scale usage like this.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From Vince.McIntyre at atnf.csiro.au Fri Nov 4 01:17:16 2005
From: Vince.McIntyre at atnf.csiro.au (Vincent McIntyre)
Date: Fri, 4 Nov 2005 12:17:16 +1100 (EST)
Subject: ext3 + fs > 2Tbyte
In-Reply-To: <20051101180912.GN31368@schatzie.adilger.int>
References: <20051031220648.GC31368@schatzie.adilger.int>
 <20051101060832.GK31368@schatzie.adilger.int>
 <20051101180912.GN31368@schatzie.adilger.int>
Message-ID:

Hi again

I unplugged the original xraid and did some tests on a non-production
one, building larger and larger filesystems, mounting, & dismounting.

I can reproduce the problem with this sequence:

  * boot with xraid device plugged in, kernel 2.6.7-1-686-smp
    (packaged as 2.6.7-1.backports.org.1)
  * install a gpt disklabel with parted (-1.6.24 rather than 1.6.19)
  * make an ext2 filesystem as big as the disk with parted
  * mount - it mounts ok
  * umount
  * tune2fs -j (-1.38)
  * mount - it mounts ok (-2.12)
  * umount (-2.12)
  * reboot
  * try to mount - it fails.
    (the filesystem is not mentioned in /etc/fstab, the system should
    not be attempting to mount it or fsck it at boot time)

No files were written to the filesystem during the test sequence.

I have not yet tried filesystems smaller than 2Tb across reboots.
I expect it will work, but I will try that shortly to check.
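
For reference, the 2 TiB boundary in this thread is where a 512-byte
sector number stops fitting in 32 bits. A rough sketch of the
arithmetic follows. It assumes, hypothetically, a driver that truncates
sector numbers to 32 bits, so that high sectors alias onto the start of
the device; nothing in the thread confirms that this is what the mpt
driver does. The sector count is the one from the kernel log quoted
earlier in the thread.

```python
SECTOR_SIZE = 512
SECTOR_LIMIT = 2 ** 32  # first sector number that needs more than 32 bits


def aliased_sector(sector):
    """Sector actually hit if the number is truncated to 32 bits (assumed bug)."""
    return sector % SECTOR_LIMIT


# "SCSI device sdb: 4688461824 512-byte hdwr sectors (2400492 MB)"
total_sectors = 4688461824
boundary_bytes = SECTOR_LIMIT * SECTOR_SIZE
overflow_sectors = total_sectors - SECTOR_LIMIT

print(boundary_bytes)                    # bytes addressable with 32-bit sectors
print(overflow_sectors * SECTOR_SIZE)    # bytes of the device beyond that
print(aliased_sector(SECTOR_LIMIT + 2))  # a high sector wrapping to sector 2
```

Under that assumption only writes beyond the boundary can wrap, and they
land within the first overflow_sectors sectors of the device.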

findsuper tells me there are superblocks, but fs_blk_sz changes (!?)

# /root/e2fsprogs-1.38/misc/findsuper /dev/sdb1
starting at 0, with 512 byte increments
    thisoff     block  fs_blk_sz  blksz  grp  last_mount
      17920        17  586057719   4096    0  Thu Jan 1 10:00:00 1970
  134234624    131088  586057719   4096    1  Thu Jan 1 10:00:00 1970
  134235648    131089  586057719   4096    1  Thu Jan 1 10:00:00 1970
  209733120    204817    1023983   1024   25  Thu Jan 1 10:00:00 1970
  226510336    221201    1023983   1024   27  Thu Jan 1 10:00:00 1970
  402670080    393232  586057719   4096    3  Thu Jan 1 10:00:00 1970
  402671104    393233  586057719   4096    3  Thu Jan 1 10:00:00 1970
  411059712    401425    1023983   1024   49  Thu Jan 1 10:00:00 1970
  671105536    655376  586057719   4096    5  Thu Jan 1 10:00:00 1970
  671106560    655377  586057719   4096    5  Thu Jan 1 10:00:00 1970
  679495168    663569    1023983   1024   81  Thu Jan 1 10:00:00 1970
  939540992    917520  586057719   4096    7  Thu Jan 1 10:00:00 1970
  939542016    917521  586057719   4096    7  Thu Jan 1 10:00:00 1970
 1207976448   1179664  586057719   4096    9  Thu Jan 1 10:00:00 1970
 1207977472   1179665  586057719   4096    9  Thu Jan 1 10:00:00 1970
 3355460096   3276816  586057719   4096   25  Thu Jan 1 10:00:00 1970
 3355461120   3276817  586057719   4096   25  Thu Jan 1 10:00:00 1970
 3623895552   3538960  586057719   4096   27  Thu Jan 1 10:00:00 1970
 3623896576   3538961  586057719   4096   27  Thu Jan 1 10:00:00 1970
 6576685568   6422544  586057719   4096   49  Thu Jan 1 10:00:00 1970
 6576686592   6422545  586057719   4096   49  Thu Jan 1 10:00:00 1970
10871652864  10616848  586057719   4096   81  Thu Jan 1 10:00:00 1970
10871653888  10616849  586057719   4096   81  Thu Jan 1 10:00:00 1970
16777232896  16384016  586057719   4096  125  Thu Jan 1 10:00:00 1970
16777233920  16384017  586057719   4096  125  Thu Jan 1 10:00:00 1970
^C

This is not looking good...
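
For comparison with the listing above, the backup locations predicted by
the formula quoted earlier in the thread can be worked out with a short
sketch. It assumes the sparse_super layout (copies in groups 0, 1 and
powers of 3, 5 and 7), 8 * blocksize blocks per group, and a first data
block of 1 for 1 kB blocks and 0 otherwise. The function names are made
up for the example, and the block counts are the fs_blk_sz values that
findsuper printed.

```python
def backup_groups(group_count):
    """Groups holding a superblock copy under sparse_super (assumed layout)."""
    groups = {1}
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(g for g in groups if g < group_count)


def backup_superblocks(block_size, total_blocks):
    """Return (group, block) pairs usable as e2fsck -b values with -B block_size."""
    blocks_per_group = block_size * 8
    first_data_block = 1 if block_size == 1024 else 0
    # Round up: a partial last group still counts as a group.
    group_count = -(-(total_blocks - first_data_block) // blocks_per_group)
    return [(g, g * blocks_per_group + first_data_block)
            for g in backup_groups(group_count)]


for group, block in backup_superblocks(4096, 586057719)[:6]:
    print(group, block)

for group, block in backup_superblocks(1024, 1023983)[:3]:
    print(group, block)
```

The first 4 kB value is 32768 and the first 1 kB value is 8193, as in
the manpage table, and the group numbers line up with the grp column
above. The block numbers are relative to the start of the filesystem,
which need not be the start of the partition being scanned.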

Your nice od trick tells me slightly different locations for the
superblock signatures -

# od -Ax -tx4 /dev/sdb1 | \
  grep "^[0-9a-f]*30 [0-9a-f]* [0-9a-f]* 000[1-3]ef53 "
004630 436a93dd 001e0000 0001ef53 00000001
8004630 00000000 001e0000 0001ef53 00000001
c804630 00000000 001e0000 0001ef53 00000001
d804630 00000000 001e0000 0001ef53 00000001
18004630 00000000 001e0000 0001ef53 00000001
18804630 00000000 001e0000 0001ef53 00000001
28004630 00000000 001e0000 0001ef53 00000001
28804630 00000000 001e0000 0001ef53 00000001
38004630 00000000 001e0000 0001ef53 00000001
48004630 00000000 001e0000 0001ef53 00000001
c8004630 00000000 001e0000 0001ef53 00000001
d8004630 00000000 001e0000 0001ef53 00000001
88004630 00000000 001e0000 0001ef53 00000001
^C

0x004630 corresponds to byte offset 17968, 48 bytes away.
Is this explainable by the position of the superblock signature
within the disk block?
0x8004630 corresponds to 134220222, delta=14400. This is confusing me.

So I tried a few e2fsck runs. I know I'm probably being dense but
none of these worked:

e2fsck -n -b 16 -B 4096 /dev/sdb1
e2fsck -n -b 17 -B 4096 /dev/sdb1
e2fsck -n -b 18 -B 4096 /dev/sdb1
e2fsck -n -b 204816 -B 1024 /dev/sdb1
e2fsck -n -b 204817 -B 1024 /dev/sdb1
e2fsck -n -b 204818 -B 1024 /dev/sdb1
e2fsck -n -b 221200 -B 1024 /dev/sdb1
e2fsck -n -b 221201 -B 1024 /dev/sdb1
e2fsck -n -b 221202 -B 1024 /dev/sdb1
e2fsck -n -b 1179664 -B 4096 /dev/sdb1
e2fsck -n -b 1179665 -B 4096 /dev/sdb1
e2fsck -n -b 6422544 -B 4096 /dev/sdb1
e2fsck -n -b 6422545 -B 4096 /dev/sdb1
e2fsck -n -b 10616848 -B 4096 /dev/sdb1
e2fsck -n -b 10616849 -B 4096 /dev/sdb1

(The e2fsck manpage could be a tiny bit clearer in that - I think -
it means you to use -b , not -b )

oh, and just trying to mount does not work, as one might expect.
# mount -text2 /dev/sdb1 /tmp/a
mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
       or too many mounted file systems
       (aren't you trying to mount an extended partition,
       instead of some logical partition inside?)

I did straces of the e2fsck before and after the reboot; would it
help to send those?

Thanks again
Vince

From adilger at clusterfs.com Fri Nov 4 02:35:47 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 3 Nov 2005 19:35:47 -0700
Subject: ext3 + fs > 2Tbyte
In-Reply-To:
References: <20051031220648.GC31368@schatzie.adilger.int> <20051101060832.GK31368@schatzie.adilger.int> <20051101180912.GN31368@schatzie.adilger.int>
Message-ID: <20051104023547.GY31368@schatzie.adilger.int>

On Nov 04, 2005 12:17 +1100, Vincent McIntyre wrote:
> * boot with xraid device plugged in, kernel 2.6.7-1-686-smp
>   (packaged as 2.6.7-1.backports.org.1)
> * install a gpt disklabel with parted (-1.6.24 rather than 1.6.19)
> * make an ext2 filesystem as big as the disk with parted
> * mount - it mounts ok
> * umount
> * tune2fs -j (-1.38)
> * mount - it mounts ok (-2.12)
> * umount (-2.12)
> * reboot
> * try to mount - it fails.
>   (the filesystem is not mentioned in /etc/fstab, the system should
>   not be attempting to mount it or fsck it at boot time)
>
> No files were written to the filesystem during the test sequence.

Hmm, I would expect at least the need to write something to the
filesystem, unless you are unlucky enough that the last group(s) aliases
exactly over the first superblock on disk, but is kept in the cache
enough to remount it before you reboot.

If you just do the mke2fs + reboot + mount does that fail? Same with
just the tune2fs -j + reboot + remount?

Do you only use the parted "mkfs" or do you actually use the mke2fs
from e2fsprogs?

> I have not yet tried filesystems smaller than 2Tb across reboots.
> I expect it will work, but I will try that shortly to check.
>
> > findsuper tells me there are superblocks, but fs_blk_sz changes (!?)
These are remnants of previous filesystems on the device, each with
slightly different offsets (maybe with and without a partition table,
or with different partition types). In one case there was a small
1kB block filesystem on the disk in the past.

> # /root/e2fsprogs-1.38/misc/findsuper /dev/sdb1
> starting at 0, with 512 byte increments
> thisoff block fs_blk_sz blksz grp last_mount
> 17920 17 586057719 4096 0 Thu Jan 1 10:00:00 1970

What is missing is the superblock at offset "1024". What this tool
_should_ also print out is part of the superblock UUID so it is possible
to say which superblocks belong to a single filesystem.

With an ext3 filesystem you will also find copies of the superblock in
the journal, they will all be marked "grp 0" and are not valid backups.

> 134234624 131088 586057719 4096 1 Thu Jan 1 10:00:00 1970
> 134235648 131089 586057719 4096 1 Thu Jan 1 10:00:00 1970
> 209733120 204817 1023983 1024 25 Thu Jan 1 10:00:00 1970
> 226510336 221201 1023983 1024 27 Thu Jan 1 10:00:00 1970
> 402670080 393232 586057719 4096 3 Thu Jan 1 10:00:00 1970
> 402671104 393233 586057719 4096 3 Thu Jan 1 10:00:00 1970
> 411059712 401425 1023983 1024 49 Thu Jan 1 10:00:00 1970
> 671105536 655376 586057719 4096 5 Thu Jan 1 10:00:00 1970
> 671106560 655377 586057719 4096 5 Thu Jan 1 10:00:00 1970
> 679495168 663569 1023983 1024 81 Thu Jan 1 10:00:00 1970
> 939540992 917520 586057719 4096 7 Thu Jan 1 10:00:00 1970
> 939542016 917521 586057719 4096 7 Thu Jan 1 10:00:00 1970
> 1207976448 1179664 586057719 4096 9 Thu Jan 1 10:00:00 1970
> 1207977472 1179665 586057719 4096 9 Thu Jan 1 10:00:00 1970
> 3355460096 3276816 586057719 4096 25 Thu Jan 1 10:00:00 1970
> 3355461120 3276817 586057719 4096 25 Thu Jan 1 10:00:00 1970
> 3623895552 3538960 586057719 4096 27 Thu Jan 1 10:00:00 1970
> 3623896576 3538961 586057719 4096 27 Thu Jan 1 10:00:00 1970
> 6576685568 6422544 586057719 4096 49 Thu Jan 1 10:00:00 1970
> 6576686592 6422545 586057719 4096 49 Thu Jan 1 10:00:00 1970
>
> 10871652864 10616848 586057719 4096 81 Thu Jan 1 10:00:00 1970
> 10871653888 10616849 586057719 4096 81 Thu Jan 1 10:00:00 1970
> 16777232896 16384016 586057719 4096 125 Thu Jan 1 10:00:00 1970
> 16777233920 16384017 586057719 4096 125 Thu Jan 1 10:00:00 1970
> ^C
> This is not looking good...

There appear to be 2 filesystems of interest. One has offset
0x4200 = 16896, but is missing the primary superblock. The other has
offset 0x4600 = 17920. Neither of these would allow you to mount the
filesystem as-is, because the superblock is not aligned at 1024 bytes
from the start of the device.

I would suspect something wacky with the partitioning and/or the way
that parted is making the filesystem.

> Your nice od trick tells me slightly different locations for the
> superblock signatures -
> # od -Ax -tx4 /dev/sdb1 | \
>     grep "^[0-9a-f]*30 [0-9a-f]* [0-9a-f]* 000[1-3]ef53 "
> 004630 436a93dd 001e0000 0001ef53 00000001
> 8004630 00000000 001e0000 0001ef53 00000001
> c804630 00000000 001e0000 0001ef53 00000001
> d804630 00000000 001e0000 0001ef53 00000001
> 18004630 00000000 001e0000 0001ef53 00000001
> ^C
>
> 0x004630 corresponds to byte offset 17968, 48 bytes away.
> Is this explainable by the position of the superblock signature within
> the disk block?

Yes, this hack is only looking for the ext[23] magic number, which is
not at the start of the superblock (0x30 = 48 bytes offset).

> So I tried a few e2fsck runs.
> I know I'm probably being dense but none
> of these worked:
> e2fsck -n -b 16 -B 4096 /dev/sdb1
> e2fsck -n -b 17 -B 4096 /dev/sdb1
> e2fsck -n -b 18 -B 4096 /dev/sdb1
> e2fsck -n -b 204816 -B 1024 /dev/sdb1
> e2fsck -n -b 204817 -B 1024 /dev/sdb1
> e2fsck -n -b 204818 -B 1024 /dev/sdb1
> e2fsck -n -b 221200 -B 1024 /dev/sdb1
> e2fsck -n -b 221201 -B 1024 /dev/sdb1
> e2fsck -n -b 221202 -B 1024 /dev/sdb1
> e2fsck -n -b 1179664 -B 4096 /dev/sdb1
> e2fsck -n -b 1179665 -B 4096 /dev/sdb1
> e2fsck -n -b 6422544 -B 4096 /dev/sdb1
> e2fsck -n -b 6422545 -B 4096 /dev/sdb1
> e2fsck -n -b 10616848 -B 4096 /dev/sdb1
> e2fsck -n -b 10616849 -B 4096 /dev/sdb1

No, I'd expect you need to do something with the device partitioning
to get the filesystem aligned properly. They aren't even aligned on
a block boundary, there is a 512-byte offset.

> (The e2fsck manpage could be a tiny bit clearer in that - I think -
> it means you to use -b , not -b )

Send a patch to Ted.

I would recommend to do the following:
- make a partition
- reboot the system
- use mke2fs -j to make the filesystem
- test mount, unmount, reboot at this point

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From Vincent.McIntyre at csiro.au Fri Nov 4 05:19:00 2005
From: Vincent.McIntyre at csiro.au (Vincent.McIntyre at csiro.au)
Date: Fri, 4 Nov 2005 16:19:00 +1100 (EST)
Subject: ext3 + fs > 2Tbyte
In-Reply-To: <20051104023547.GY31368@schatzie.adilger.int>
References: <20051031220648.GC31368@schatzie.adilger.int> <20051101060832.GK31368@schatzi <20051104023547.GY31368@schatzie.adilger.int>
Message-ID:

>> No files were written to the filesystem during the test sequence.
>
> Hmm, I would expect at least the need to write something to the filesystem,
> unless you are unlucky enough that the last group(s) aliases exactly over
> the first superblock on disk, but is kept in the cache enough to remount
> it before you reboot.
ok, I can add that to the scripts in my next round of tests.

> Do you only use the parted "mkfs" or do you actually use the mke2fs
> from e2fsprogs?

The script does this
  parted -s /dev/sdb1 print
  parted -s /dev/sdb1 mklabel gpt
  parted -s /dev/sdb1 print
  parted -s /dev/sdb1 mkpart primary 0 10
  parted -s /dev/sdb1 print
  parted -s /dev/sdb1 mke2fs 1 ext2
  parted -s /dev/sdb1 print

I did not try mke2fs before now because I don't think it worked
when I was trying to make FS larger than 2Tb. Can't recall now.

> If you just do the mke2fs + reboot + mount does that fail?

Yes. While you were typing,
  * I made a teeny 10 Mbyte filesystem (using parted, as above)
  * mounted
  * umounted
  * ran findsuper and od
  * reboot
  * ran parted /dev/sdb1 print
    (repeated, using strace)
  * ran an straced e2fsck /dev/sdb1
    and got the same error.

I couldn't quite believe this so I tried it again. Same result.
Post reboot, I did things in slightly different order:

  * strace e2fsck -n /dev/sdb1
    e2fsck 1.38 (30-Jun-2005)
    Couldn't find ext2 superblock, trying backup blocks...
    /local/sbin/e2fsck: Bad magic number in super-block while trying
    to open /dev/sdb1

    The superblock could not be read or does not describe a correct
    ext2 filesystem. If the device is valid and it really contains an
    ext2 filesystem (and not swap or ufs or something else), then the
    superblock is corrupt, and you might try running e2fsck with an
    alternate superblock:
        e2fsck -b 8193

  * /local/sbin/parted /dev/sdb print
    Disk geometry for /dev/sdb: 0.000-2289288.000 megabytes
    Disk label type: gpt
    Minor Start End Filesystem Name Flags
    1 0.017 10.000 ext2
    Information: Don't forget to update /etc/fstab, if necessary.

> Same with just the tune2fs -j + reboot + remount?
I switched to using mke2fs to create the filesystem, ie
  * I made a teeny 10 Mbyte partition (using parted)
  * mke2fs /dev/sdb1
  * mounted
  * umounted
  * ran findsuper and od
  * reboot
  * strace -o strace.e2fsck.postboot /local/sbin/e2fsck -n /dev/sdb1
    e2fsck 1.38 (30-Jun-2005)
    Couldn't find ext2 superblock, trying backup blocks...
    /local/sbin/e2fsck: Bad magic number in super-block while trying
    to open /dev/sdb1

    The superblock could not be read or does not describe a correct
    ext2 filesystem. If the device is valid and it really contains an
    ext2 filesystem (and not swap or ufs or something else), then the
    superblock is corrupt, and you might try running e2fsck with an
    alternate superblock:
        e2fsck -b 8193

So it is starting to look like the GPT disklabel is causing a problem.
I switched to having parted make a msdos disklabel but kept everything
else the same - it worked fine.

# strace -o strace.e2fsck.postboot /local/sbin/e2fsck -n /dev/sdb1
e2fsck 1.38 (30-Jun-2005)
/dev/sdb1: clean, 11/2000 files, 268/8000 blocks
#

>> findsuper tells me there are superblocks, but fs_blk_sz changes (!?)
>
> These are remnants of previous filesystems on the device, each with
> slightly different offsets (maybe with and without a partition table,
> or with different partition types). In one case there was a small
> 1kB block filesystem on the disk in the past.

ah, of course. I thought findsuper would respect the partition
boundaries and stop at the "end" of the filesystem. It did that
pre-reboot, e.g.
my 10Mbyte test above

starting at 0, with 512 byte increments
thisoff block fs_blk_sz blksz grp last_mount
1024 1 10223 1024 0 Thu Jan 1 10:00:00 1970
8389632 8193 10223 1024 1 Thu Jan 1 10:00:00 1970

10468864: finished with errno 0

Post-reboot, I get this:

starting at 0, with 512 byte increments
thisoff block fs_blk_sz blksz grp last_mount
17920 17 10223 1024 0 Thu Jan 1 10:00:00 1970
8406528 8209 10223 1024 1 Thu Jan 1 10:00:00 1970
134235648 131089 511999995 4096 1 Thu Jan 1 10:00:00 1970
209733120 204817 1023983 1024 25 Thu Jan 1 10:00:00 1970
226510336 221201 1023983 1024 27 Thu Jan 1 10:00:00 1970

To clean things up, I suppose I could dd /dev/zero into /dev/sdb?
It'll only take about 10 hours..

>> # /root/e2fsprogs-1.38/misc/findsuper /dev/sdb1
>> starting at 0, with 512 byte increments
>> thisoff block fs_blk_sz blksz grp last_mount
>> 17920 17 586057719 4096 0 Thu Jan 1 10:00:00 1970
>
> What is missing is the superblock at offset "1024". What this tool
> _should_ also print out is part of the superblock UUID so it is possible
> to say which superblocks belong to a single filesystem.
>
> With an ext3 filesystem you will also find copies of the superblock in
> the journal, they will all be marked "grp 0" and are not valid backups.

ok, thanks for explaining this.

> There appear to be 2 filesystems of interest. One has offset 0x4200 = 16896,
> but is missing the primary superblock. The other has offset 0x4600 = 17920.
> Neither of these would allow you to mount the filesystem as-is, because the
> superblock is not aligned at 1024 bytes from the start of the device.
>
> I would suspect something wacky with the partitioning and/or the way that
> parted is making the filesystem.

Most of this is just the history of the fs creation tests I did, I guess.
Remember all these are just test filesystems on separate hardware.
I have not dared to run findsuper on the filesystem of interest yet,
I want to make sure I can actually recover a test FS first.
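
The two offsets quoted above (0x4200 and 0x4600) can be reproduced from the findsuper rows with a little arithmetic. The following is only a sketch, not part of the original exchange. It assumes 4096-byte blocks, 32768 blocks per group, and a backup superblock sitting at the very first byte of each listed group; the function names are made up for illustration.

```python
# Assumption: 4 KiB blocks and 32768 blocks per group, matching the
# rows that report blksz 4096 in the findsuper listing.
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
GROUP_BYTES = BLOCK_SIZE * BLOCKS_PER_GROUP  # 134217728 bytes per group

# (thisoff, grp) pairs copied from the findsuper output in this thread.
ROWS = [
    (134234624, 1), (134235648, 1),
    (402670080, 3), (402671104, 3),
    (671105536, 5), (671106560, 5),
    (16777232896, 125), (16777233920, 125),
]


def implied_start(thisoff, grp):
    """Byte offset at which a filesystem would have to begin for the
    backup superblock of group `grp` to be found at `thisoff`."""
    return thisoff - grp * GROUP_BYTES


def summarize(rows):
    """Map each implied start offset to the groups that agree on it."""
    starts = {}
    for thisoff, grp in rows:
        starts.setdefault(implied_start(thisoff, grp), []).append(grp)
    return starts


if __name__ == "__main__":
    for start, groups in sorted(summarize(ROWS).items()):
        # start % 1024 shows how far the start is from a 1 KiB boundary
        print(f"start={start} (0x{start:x}) "
              f"start%1024={start % 1024} groups={groups}")
```

Under those assumptions every row collapses onto one of two candidate starts, and both are 512 bytes past a 1024-byte boundary, which is the misalignment discussed in the thread.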
>> So I tried a few e2fsck runs. I know I'm probably being dense but none
>> of these worked:
>> e2fsck -n -b 16 -B 4096 /dev/sdb1
>> e2fsck -n -b 17 -B 4096 /dev/sdb1
....
>
> No, I'd expect you need to do something with the device partitioning
> to get the filesystem aligned properly. They aren't even aligned on
> a block boundary, there is a 512-byte offset.

I noticed that when computing thisoff/blksz, but didn't make much of it.
Thanks for clearing that up. I'll take a look at the manuals to see if
I can force things to be on a block boundary.

> I would recommend to do the following:
> - make a partition
> - reboot the system
> - use mke2fs -j to make the filesystem
> - test mount, unmount, reboot at this point

This reboot-after-partition thing is foreign to me (coming from
solaris); it seems quite a poor design to need this. But let's run
with it.

  parted -s /dev/sdb1 print
  parted -s /dev/sdb1 mklabel gpt
  parted -s /dev/sdb1 print
  parted -s /dev/sdb1 mkpart primary 0 10
  parted -s /dev/sdb1 print
  sleep 60
  reboot
  parted -s /dev/sdb1 print
  mke2fs -n -v /dev/sdb1
  mke2fs -q /dev/sdb1

mke2fs gets stuck... I have to ^C it.

# fdisk -l /dev/sdb
You must set cylinders.
You can do this from the extra functions menu.

Disk /dev/sdb: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 267350 2147483647+ ee EFI GPT
Partition 1 has different physical/logical beginnings (non-Linux?):
     phys=(0, 0, 1) logical=(0, 0, 2)
Partition 1 has different physical/logical endings:
     phys=(1023, 254, 63) logical=(267349, 89, 4)

# /local/sbin/parted /dev/sdb print
Error: The primary GPT table is corrupt, but the backup appears ok, so
that will be used.
OK/Cancel? C
Information: Don't forget to update /etc/fstab, if necessary.

# /local/sbin/parted /dev/sdb print
Error: The primary GPT table is corrupt, but the backup appears ok, so
that will be used.
OK/Cancel?
OK
Disk geometry for /dev/sdb: 0.000-2289288.000 megabytes
Disk label type: gpt
Minor Start End Filesystem Name Flags
1 0.017 10.000 ext2
Information: Don't forget to update /etc/fstab, if necessary.

# strace -o strace.e2fsck.post-parted /local/sbin/e2fsck -n /dev/sdb1
e2fsck 1.38 (30-Jun-2005)
Couldn't find ext2 superblock, trying backup blocks...
/local/sbin/e2fsck: Bad magic number in super-block while trying to open
/dev/sdb1

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193

So it appears that support is lacking for GPT disklabels in e2fsprogs
and possibly the kernel as well.

I ran one more time,
  partition with parted, gpt label.
  reboot
  make 10Mbyte ext2 fs with parted
  mount, umount, findsuper, od - all this seems to work ok.
  reboot
  attempt to mount

  mount -text2 /dev/sdb1 /tmp/a
  mount: wrong fs type, bad option, bad superblock on /dev/sdb1,
         or too many mounted file systems
         (aren't you trying to mount an extended partition,
         instead of some logical partition inside?)

I think this says there is something funky with the GPT disklabelling.

Thanks for your help,
Vince

From adilger at clusterfs.com Fri Nov 4 07:37:44 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Fri, 4 Nov 2005 00:37:44 -0700
Subject: ext3 + fs > 2Tbyte
In-Reply-To:
References: <20051031220648.GC31368@schatzie.adilger.int> <20051104023547.GY31368@schatzie.adilger.int>
Message-ID: <20051104073744.GZ31368@schatzie.adilger.int>

On Nov 04, 2005 16:19 +1100, Vincent.McIntyre at csiro.au wrote:
> >Do you only use the parted "mkfs" or do you actually use the mke2fs
> >from e2fsprogs?
> The script does this
> parted -s /dev/sdb1 print
> parted -s /dev/sdb1 mklabel gpt
> parted -s /dev/sdb1 print
> parted -s /dev/sdb1 mkpart primary 0 10
> parted -s /dev/sdb1 print
> parted -s /dev/sdb1 mke2fs 1 ext2
> parted -s /dev/sdb1 print

Hmm, I don't use parted often, but does it make sense to be making a GPT
disklabel on /dev/sdb1 instead of making it on /dev/sdb?

Note also that there is actually no need to make a partition at all if
you are just going to use the whole device for the filesystem. This
is particularly interesting with some RAID hardware, since the partition
table adds a 512-byte offset to every single IO, and this can cause
some noticeable performance problems.

Just do "mke2fs -j /dev/sdb" and be happy.

> Yes. While you were typing,
> * I made a teeny 10 Mbyte filesystem (using parted, as above)
> * mounted
> * umounted
> * ran findsuper and od
> * reboot
> * ran parted /dev/sdb1 print
>   (repeated, using strace)
> * ran an straced e2fsck /dev/sdb1
>   and got the same error.
>
> I couldn't quite believe this so I tried it again. Same result.

This sounds like parted isn't doing what you want, and ext3 is not the
source of the problem at all.

> So it is starting to look like the GPT disklabel is causing a problem.

I agree.

> ah, of course. I thought findsuper would respect the partition boundaries
> and stop at the "end" of the filesystem. It did that pre-reboot, e.g. my
> 10Mbyte test above

It DOES respect the partition boundaries, actually. In fact, if you
point it at a partition (instead of the parent device) it should not
be POSSIBLE for it to read beyond the end of the partition, and the
kernel should prevent it.
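
For reference, the kind of scan findsuper performs can be sketched in a few lines. This is not the e2fsprogs source, only an illustration under stated assumptions: it checks the ext2/ext3 magic number at byte 56 (0x38) of each 512-byte-aligned candidate (which is why the od filter earlier in the thread matches the third 32-bit word on lines whose address ends in 30), and it prints part of the UUID as suggested above. `make_superblock` is a hypothetical helper used only to build a demo image, and the sanity limit on the block-size field is an arbitrary choice for this sketch.

```python
import io
import struct
import uuid

EXT2_MAGIC = 0xEF53   # s_magic, stored little-endian at byte 56 (0x38)
STEP = 512            # scan increment, as in the findsuper banner
SB_SIZE = 1024        # size of the on-disk superblock


def parse_superblock(buf):
    """Return a few superblock fields, or None if buf does not look like one."""
    if len(buf) < SB_SIZE:
        return None
    (magic,) = struct.unpack_from("<H", buf, 56)
    if magic != EXT2_MAGIC:
        return None
    (blocks,) = struct.unpack_from("<I", buf, 4)            # s_blocks_count
    (log_block_size,) = struct.unpack_from("<I", buf, 24)   # s_log_block_size
    if log_block_size > 6:      # sanity limit chosen for this sketch
        return None
    (group,) = struct.unpack_from("<H", buf, 90)            # s_block_group_nr
    return {
        "blocks": blocks,
        "block_size": 1024 << log_block_size,
        "group": group,
        "uuid": uuid.UUID(bytes=bytes(buf[104:120])),       # s_uuid
    }


def scan(stream, step=STEP):
    """Yield (offset, fields) for each offset that parses as a superblock."""
    offset = 0
    while True:
        stream.seek(offset)
        buf = stream.read(SB_SIZE)
        if len(buf) < SB_SIZE:  # ran off the end of the device or image
            return
        fields = parse_superblock(buf)
        if fields is not None:
            yield offset, fields
        offset += step


def make_superblock(blocks, log_block_size, group, fs_uuid):
    """Hypothetical helper for the demo: a mostly-empty fake superblock."""
    sb = bytearray(SB_SIZE)
    struct.pack_into("<I", sb, 4, blocks)
    struct.pack_into("<I", sb, 24, log_block_size)
    struct.pack_into("<H", sb, 56, EXT2_MAGIC)
    struct.pack_into("<H", sb, 90, group)
    sb[104:120] = fs_uuid.bytes
    return bytes(sb)


if __name__ == "__main__":
    fs_uuid = uuid.uuid4()
    image = bytearray(8192)
    image[1024:2048] = make_superblock(10223, 0, 0, fs_uuid)  # aligned primary
    image[5632:6656] = make_superblock(10223, 0, 1, fs_uuid)  # copy shifted by 512
    for offset, sb in scan(io.BytesIO(bytes(image))):
        print(offset, offset // 1024, sb["blocks"], sb["block_size"],
              sb["group"], str(sb["uuid"])[:8])
```

Because `scan` only needs `seek` and `read`, the same function could be pointed at a file opened with `open(path, "rb")`; on a partition device the kernel, not the script, is what stops reads at the partition end.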
> starting at 0, with 512 byte increments
> thisoff block fs_blk_sz blksz grp last_mount
> 1024 1 10223 1024 0 Thu Jan 1 10:00:00 1970
> 8389632 8193 10223 1024 1 Thu Jan 1 10:00:00 1970
>
> 10468864: finished with errno 0
>
> Post-reboot, I get this:
> starting at 0, with 512 byte increments
> thisoff block fs_blk_sz blksz grp last_mount
> 17920 17 10223 1024 0 Thu Jan 1 10:00:00 1970
> 8406528 8209 10223 1024 1 Thu Jan 1 10:00:00 1970
> 134235648 131089 511999995 4096 1 Thu Jan 1 10:00:00 1970
> 209733120 204817 1023983 1024 25 Thu Jan 1 10:00:00 1970
> 226510336 221201 1023983 1024 27 Thu Jan 1 10:00:00 1970

This would seem to indicate your partition table is being corrupted.

> # /local/sbin/parted /dev/sdb print
> Error: The primary GPT table is corrupt, but the backup appears ok, so
> that will be used.
> OK/Cancel? OK
> Disk geometry for /dev/sdb: 0.000-2289288.000 megabytes
> Disk label type: gpt
> Minor Start End Filesystem Name Flags
> 1 0.017 10.000 ext2
> Information: Don't forget to update /etc/fstab, if necessary.

I suspect this is part of the problem. The GPT disk label is being
written into /dev/sdb1 (which isn't really valid) and upon reboot the
"backup" is being found at the end of the device and doesn't match
the existing partition table on /dev/sdb.

> # strace -o strace.e2fsck.post-parted /local/sbin/e2fsck -n /dev/sdb1
> e2fsck 1.38 (30-Jun-2005)
> Couldn't find ext2 superblock, trying backup blocks...
> /local/sbin/e2fsck: Bad magic number in super-block while trying to open
> /dev/sdb1

At this point, you are trying to access a filesystem with an offset from
the start of the partition.
If you want to recover from this (your real filesystem), what you
should probably do is locate the expected start of the filesystem
using findsuper and then copy it onto your backup device:

    dd if=/dev/orig of=/dev/backup bs=offset skip=1

The backup superblocks should have a byte offset of
{1,3,5,...} * 32768 * 4096 from the start of the device, so subtracting
this from the actual offsets found will tell you where the filesystem is
supposed to start. Checking the first few (non group = 0) backup
superblocks should make it pretty clear where the filesystem is supposed
to start.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From Vincent.McIntyre at csiro.au Fri Nov 4 11:30:26 2005
From: Vincent.McIntyre at csiro.au (Vincent.McIntyre at csiro.au)
Date: Fri, 4 Nov 2005 22:30:26 +1100 (EST)
Subject: ext3 + fs > 2Tbyte
In-Reply-To: <20051104073744.GZ31368@schatzie.adilger.int>
References: <20051031220648.GC31368@schatzie.adilger.int> <20051104023547.GY31368@schatzie.adilger.int> <20051104073744.GZ31368@schatzie.adilger.int>
Message-ID:

>>> Do you only use the parted "mkfs" or do you actually use the mke2fs
>>> from e2fsprogs?
>> The script does this
>> parted -s /dev/sdb1 print
>> parted -s /dev/sdb1 mklabel gpt
>> parted -s /dev/sdb1 print
>> parted -s /dev/sdb1 mkpart primary 0 10
>> parted -s /dev/sdb1 print
>> parted -s /dev/sdb1 mke2fs 1 ext2
>> parted -s /dev/sdb1 print
>
> Hmm, I don't use parted often, but does it make sense to be making a GPT
> disklabel on /dev/sdb1 instead of making it on /dev/sdb?

ooops - misquote on my part. I was indeed using /dev/sdb for this.
I was translating from a shell script that uses a variable for the
disk device and the partition, and confused the two when translating.

> Note also that there is actually no need to make a partition at all if
> you are just going to use the whole device for the filesystem.
> This is particularly interesting with some RAID hardware, since the
> partition table adds a 512-byte offset to every single IO, and this
> can cause some noticeable performance problems.
>
> Just do "mke2fs -j /dev/sdb" and be happy.

ok, I'll give that a whirl.

>> ah, of course. I thought findsuper would respect the partition boundaries
>> and stop at the "end" of the filesystem. It did that pre-reboot, e.g. my
>> 10Mbyte test above
>
> It DOES respect the partition boundaries, actually. In fact, if you
> point it at a partition (instead of the parent device) it should not
> be POSSIBLE for it to read beyond the end of the partition, and the
> kernel should prevent it.
>
>> starting at 0, with 512 byte increments
>> thisoff block fs_blk_sz blksz grp last_mount
>> 1024 1 10223 1024 0 Thu Jan 1 10:00:00 1970
>> 8389632 8193 10223 1024 1 Thu Jan 1 10:00:00 1970
>>
>> 10468864: finished with errno 0
>>
>> Post-reboot, I get this:
>> starting at 0, with 512 byte increments
>> thisoff block fs_blk_sz blksz grp last_mount
>> 17920 17 10223 1024 0 Thu Jan 1 10:00:00 1970
>> 8406528 8209 10223 1024 1 Thu Jan 1 10:00:00 1970
>> 134235648 131089 511999995 4096 1 Thu Jan 1 10:00:00 1970
>> 209733120 204817 1023983 1024 25 Thu Jan 1 10:00:00 1970
>> 226510336 221201 1023983 1024 27 Thu Jan 1 10:00:00 1970
>
> This would seem to indicate your partition table is being corrupted.

right.

>
>> # /local/sbin/parted /dev/sdb print
>> Error: The primary GPT table is corrupt, but the backup appears ok, so
>> that will be used.
>> OK/Cancel? OK
>> Disk geometry for /dev/sdb: 0.000-2289288.000 megabytes
>> Disk label type: gpt
>> Minor Start End Filesystem Name Flags
>> 1 0.017 10.000 ext2
>> Information: Don't forget to update /etc/fstab, if necessary.
>
> I suspect this is part of the problem.
> The GPT disk label is being written into /dev/sdb1 (which isn't
> really valid) and upon reboot the "backup" is being found at the end
> of the device and doesn't match the existing partition table on
> /dev/sdb.

Does your reasoning change given my silly mistake above, ie I was
running parted on /dev/sdb not /dev/sdb1?

>> # strace -o strace.e2fsck.post-parted /local/sbin/e2fsck -n /dev/sdb1
>> e2fsck 1.38 (30-Jun-2005)
>> Couldn't find ext2 superblock, trying backup blocks...
>> /local/sbin/e2fsck: Bad magic number in super-block while trying to open
>> /dev/sdb1
>
> At this point, you are trying to access a filesystem with an offset from
> the start of the partition. If you want to recover from this (your real
> filesystem), what you should probably do is locate the expected start of
> the filesystem using findsuper and then copy it onto your backup device:
>
> dd if=/dev/orig of=/dev/backup bs=offset skip=1
>
> The backup superblocks should have a byte offset of {1,3,5,...} * 32768 * 4096
> from the start of the device, so subtracting this from the actual offsets
> found will tell you where the filesystem is supposed to start. Checking the
> first few (non group = 0) backup superblocks should make it pretty clear
> where the filesystem is supposed to start.

I'll take a poke at this.

Assuming there is a problem with GPT labels, can you advise where to
report this? parted-bug, or bugzilla.kernel.org? Or both?

Cheers
Vince

From jp at enix.org Fri Nov 4 17:11:04 2005
From: jp at enix.org (Jérôme Petazzoni)
Date: Fri, 04 Nov 2005 18:11:04 +0100
Subject: mount r/w and r/o
In-Reply-To:
References:
Message-ID: <436B9628.1010102@enix.org>

[one r-w mount, multiple r-o mounts shared thru FC switch]

>>>should I use it?
>>>Am I going about this all wrong, is there a better way to do this
>>>(other than GFS)?
>>>
>>>
I once heard about someone doing something like that for a video farm,
intermixing solaris and freebsd servers (so as far as he, and I, knew,
there was no easy sharing solution). He did the following :
- create the filesystem on the solaris box
- create many 1 GB files, with a specific byte pattern (512 bytes
  sectors iirc)
- the freebsd box would read the raw device, detect the byte patterns
  and build an internal lookup table, to know that file F, offset O was
  located on physical sector S
- the solaris box would then write data to the 1 GB files, and the
  freebsd box could then read back the data, thanks to the previously
  built lookup table (the 1 GB files would only be rewritten to, never
  truncated or extended, AFAIK)

IIRC, there were 2 solaris boxen using some HA solution, and many
freebsd boxen accessing the data. This worked because the files were
smaller than 1 GB (to be honest, I don't know the exact size he used),
and the very impressive performance of the solution balanced the
hassle involved in setting up the whole thing.

Now, I would not ask "why not NFS?", but "why not GFS?" (and apologies
if the answer is obvious...)

From jeff at jettis.com Fri Nov 4 17:40:20 2005
From: jeff at jettis.com (Jeff Dinisco)
Date: Fri, 4 Nov 2005 09:40:20 -0800
Subject: mount r/w and r/o
Message-ID:

Thanks for the reply. Very interesting. Could you explain how the bsd
box read the raw device and built the internal lookup table?

The main reason I wrote "not GFS" is because I'm aware of it and that
it would take a bit of work to implement. I'm currently looking for a
quick fix to give me some time to implement a more robust solution.
Also, realizing I had some definite issues w/ my current config, I
researched GFS a little while back. It's my understanding that total
storage in a GFS cluster cannot exceed 8TB and we have > 12TB. I didn't
investigate too much further for a work-around.

Andreas suggested lustre which on the surface appears to be viable.
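
The lookup-table idea described above can be sketched as follows. Nothing in the thread gives the actual on-disk pattern or the reader's code, so everything concrete here is an assumption made for illustration: the `TAG` marker, the record layout (tag, file number, sector index within the file, zero padding) and all function names are hypothetical.

```python
import struct

SECTOR = 512
TAG = b"MARK"  # hypothetical marker; the real byte pattern is not given


def make_sector(file_no, index):
    """Build one pre-formatted sector: tag, file number, index in file,
    then zero padding up to the sector size (assumed layout)."""
    return (TAG + struct.pack("<II", file_no, index)).ljust(SECTOR, b"\0")


def build_lookup(raw):
    """Scan a raw device image sector by sector and map
    (file_no, index) -> physical sector number."""
    table = {}
    for phys in range(len(raw) // SECTOR):
        sector = raw[phys * SECTOR:(phys + 1) * SECTOR]
        if sector[:4] == TAG:
            file_no, index = struct.unpack_from("<II", sector, 4)
            table[(file_no, index)] = phys
    return table


def locate(table, file_no, offset):
    """Translate byte `offset` of file `file_no` into a byte offset on
    the raw device, using a previously built table."""
    phys = table[(file_no, offset // SECTOR)]
    return phys * SECTOR + offset % SECTOR


if __name__ == "__main__":
    # One unrelated sector, then three pattern sectors in shuffled order,
    # standing in for however the writer's filesystem laid the files out.
    raw = bytes(SECTOR) + make_sector(1, 1) + make_sector(0, 0) + make_sector(1, 0)
    table = build_lookup(raw)
    print(table)
    print(locate(table, 1, 10))
```

The table has to be built before the writer overwrites the pattern with real data, and it stays valid only as long as the files are rewritten in place, which matches the constraints described in the message.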

From jp at enix.org Fri Nov 4 18:03:57 2005
From: jp at enix.org (Jérôme Petazzoni)
Date: Fri, 04 Nov 2005 19:03:57 +0100
Subject: mount r/w and r/o
In-Reply-To:
References:
Message-ID: <436BA28D.9070808@enix.org>

>Thanks for the reply. Very interesting. Could you explain how the bsd box read the raw device and built the internal lookup table?
>
>
I suppose the BSD box was just accessing the device "raw" (like in
"/dev/sdX" ; I don't know the exact syntax for BSD, tho), bypassing
even the partition scheme. I also guess that the big files created thru
the Solaris box were a succession of 512-byte records, each with 4
bytes for the file number, then 4 bytes of sector number, the rest
being some magic padding. The BSD box just had to scan all the sectors
and build a kind of hash map.

Sorry for the lack of details and accuracy, but this was more the kind
of "around a beer" discussion rather than a formal report ... And this
was, as I understood, a long-term solution, which required a bit of
hacking before being ready for production (modifying the code of the
streaming video server running on the BSD boxen, I assume).

>The main reason I wrote "not GFS" is because I'm aware of it and that it would take a bit of work to implement. I'm currently looking for a quick fix to give me some time to implement a more robust solution. Also, realizing I had some definite issues w/ my current config, I researched GFS a little while back. It's my understanding that total storage in a GFS cluster cannot exceed 8TB and we have > 12TB. I didn't investigate too much further for a work-around.
>
>Andreas suggested lustre which on the surface appears to be viable.
>
>
Let us know your findings ;-)

From adilger at clusterfs.com Fri Nov 4 21:27:51 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Fri, 4 Nov 2005 14:27:51 -0700
Subject: mount r/w and r/o
In-Reply-To: <436B9628.1010102@enix.org>
References: <436B9628.1010102@enix.org>
Message-ID: <20051104212751.GD31368@schatzie.adilger.int>

On Nov 04, 2005 18:11 +0100, Jérôme Petazzoni wrote:
> [one r-w mount, multiple r-o mounts shared thru FC switch]
>
> I once heard about someone doing something like that for a video farm,
> intermixing solaris and freebsd servers (so as far as he, and I, knew,
> there was no easy sharing solution).
> He did the following :
> - create the filesystem on the solaris box
> - create many 1 GB files, with a specific byte pattern (512 bytes
> sectors iirc)

Actually, if this was the case (files were never extended or truncated)
and the clients always used O_DIRECT to prevent caching of the data
then this would also work with an ext2 mount (or a modified ext3 that
had a mount option to disable journal recovery, only in conjunction
with read-only mounting). It wouldn't really be a "normal" filesystem
but for a specialized app environment it might work.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From richard.c.wolber at boeing.com Thu Nov 3 22:01:05 2005
From: richard.c.wolber at boeing.com (Wolber, Richard C)
Date: Thu, 3 Nov 2005 14:01:05 -0800
Subject: mount r/w and r/o
Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB992005B4@XCH-NW-5V2.nw.nos.boeing.com>

> > My questions are...
> > Should I be concerned by this?
> > Is there a way to automatically skip the recovery attempt, and if so,
> > should I use it?
> > Am I going about this all wrong, is there a better way to do this
> > (other than GFS)?
>
> Sorry to ask the obvious question, but why not just use NFS?

Performance? NFS is a lot of overhead to consider using on something
like FC. Mounting r/o seems (and I await the experts' opinion) at first
glance to be a very effective way of doing this.

..Chuck..

From arnd at arndb.de Sat Nov 5 16:27:00 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Sat, 05 Nov 2005 17:27:00 +0100
Subject: [PATCH 10/25] fs: move ext2 ioctl32 handlers into file systems
References: <20051105162650.620266000@b551138y.boeblingen.de.ibm.com>
Message-ID: <20051105162714.555612000@b551138y.boeblingen.de.ibm.com>

An embedded and charset-unspecified text was scrubbed...
Name: ext2-ioctl.diff
URL:

From arnd at arndb.de Sat Nov 5 16:26:50 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Sat, 05 Nov 2005 17:26:50 +0100
Subject: [PATCH 00/25] reduce code in fs/compat_ioctl.c
Message-ID: <20051105162650.620266000@b551138y.boeblingen.de.ibm.com>

On Sünnavend 05 November 2005 00:51, Christoph Hellwig wrote:
> On Sat, Nov 05, 2005 at 12:10:46AM +0100, Arnd Bergmann wrote:
> >
> > BTW, I now have a set of 25 patches that moves all handlers from
> > fs/compat_ioctl.c over to the respective drivers and subsystems,
> > but I'm not sure how to best test that.
> > I intend to at least give it a test run on my Opteron for the whatever
> > ioctls I normally use, but the rest is just guesswork. Christoph,
> > can you review those patches?
>
> I'm not sure moving everything from fs/compat_ioctl.c is a good idea.
> Everything that is just in a single driver or subsystem that has
> common ioctl code - sure. else it doesn't make a lot of sense.

Ok, here is my full set of patches, let's see which ones are sensible and which ones we are better off without.

Getting rid of fs/compat_ioctl.c completely could at least simplify the compat_sys_ioctl() code a bit and would also make sure that we only build the handlers into the kernel that can be used potentially, which reduces the binary size.

The patch set is still largely untested, except for a single compile test, but at least some of the patches are very simple, so maybe I can get a quick ack or nack on them.

In general, I'm just moving over the handlers to the respective subsystem without changing the logic, so the patch should not have any effect on the ioctl operation itself, but it also means that the handlers still use compat_alloc_user_space or get_fs/set_fs when it's not really necessary.
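
As background on why compat handlers exist at all, here is a minimal userspace sketch, not taken from the patch set. It assumes the Linux `_IOR`/`_IOC` macros from <linux/ioctl.h>. The helper names and the choice of type 'f', number 1 as the example command are illustrative assumptions, and int32_t/int64_t merely stand in for "long" on a 32-bit and a 64-bit ABI. Because the macro folds sizeof(argument type) into the command number, the two ABIs end up with different numbers for what is logically the same command.

```c
#include <stdint.h>
#include <linux/ioctl.h>   /* Linux only: _IOR, _IOC, _IOC_DIR, _IOC_TYPE, _IOC_NR, _IOC_SIZE */

/* Hypothetical example command: type 'f', number 1, read direction.
 * int32_t / int64_t model the width of "long" on 32-bit and 64-bit ABIs. */
static unsigned int example_cmd_32(void) { return _IOR('f', 1, int32_t); }
static unsigned int example_cmd_64(void) { return _IOR('f', 1, int64_t); }

/* Rebuild a command number with a different argument size, keeping the
 * direction, type and number fields unchanged. */
static unsigned int with_arg_size(unsigned int cmd, unsigned int size)
{
    return _IOC(_IOC_DIR(cmd), _IOC_TYPE(cmd), _IOC_NR(cmd), size);
}
```

With these helpers, `with_arg_size(example_cmd_32(), 8)` yields the same value as `example_cmd_64()`, which is roughly the renumbering step a trivial conversion handler would perform before calling the native handler; whether the argument itself also needs converting depends on the individual ioctl.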
	Arnd <><

 drivers/block/ioctl.c | 549 +++++
 drivers/block/loop.c | 76
 drivers/block/paride/pcd.c | 1
 drivers/block/paride/pd.c | 1
 drivers/block/paride/pt.c | 1
 drivers/block/pktcdvd.c | 20
 drivers/bluetooth/hci_ldisc.c | 22
 drivers/cdrom/Makefile | 2
 drivers/cdrom/aztcd.c | 1
 drivers/cdrom/cdu31a.c | 1
 drivers/cdrom/cm206.c | 1
 drivers/cdrom/compat.c | 163 +
 drivers/cdrom/gscd.c | 1
 drivers/cdrom/mcdx.c | 1
 drivers/cdrom/optcd.c | 1
 drivers/cdrom/sbpcd.c | 1
 drivers/cdrom/sjcd.c | 1
 drivers/cdrom/sonycd535.c | 2
 drivers/char/Makefile | 1
 drivers/char/compat_mtio.c | 81
 drivers/char/ftape/zftape/zftape-init.c | 1
 drivers/char/n_tty.c | 1
 drivers/char/raw.c | 91
 drivers/char/tty_io.c | 191 +
 drivers/char/viotape.c | 1
 drivers/char/vt.c | 3
 drivers/char/vt_ioctl.c | 195 +
 drivers/i2c/i2c-dev.c | 141 +
 drivers/ide/ide-cd.c | 1
 drivers/ide/ide-floppy.c | 1
 drivers/ide/ide-tape.c | 1
 drivers/media/radio/miropcm20-radio.c | 1
 drivers/media/radio/radio-aimslab.c | 1
 drivers/media/radio/radio-aztech.c | 1
 drivers/media/radio/radio-cadet.c | 1
 drivers/media/radio/radio-gemtek-pci.c | 1
 drivers/media/radio/radio-gemtek.c | 1
 drivers/media/radio/radio-maestro.c | 1
 drivers/media/radio/radio-maxiradio.c | 1
 drivers/media/radio/radio-rtrack2.c | 1
 drivers/media/radio/radio-sf16fmi.c | 1
 drivers/media/radio/radio-sf16fmr2.c | 1
 drivers/media/radio/radio-terratec.c | 1
 drivers/media/radio/radio-trust.c | 1
 drivers/media/radio/radio-typhoon.c | 1
 drivers/media/radio/radio-zoltrix.c | 1
 drivers/media/video/Makefile | 2
 drivers/media/video/arv.c | 1
 drivers/media/video/bttv-driver.c | 1
 drivers/media/video/bw-qcam.c | 1
 drivers/media/video/c-qcam.c | 1
 drivers/media/video/compat_ioctl.c | 318 +++
 drivers/media/video/cpia.c | 1
 drivers/media/video/cx88/cx88-video.c | 2
 drivers/media/video/meye.c | 1
 drivers/media/video/pms.c | 1
 drivers/media/video/saa5249.c | 1
 drivers/media/video/saa7134/saa7134-video.c | 2
 drivers/media/video/stradis.c | 1
 drivers/media/video/w9966.c | 1
 drivers/media/video/zoran_driver.c | 1
 drivers/media/video/zr36120.c | 1
 drivers/mtd/mtdchar.c | 94
 drivers/net/ppp_generic.c | 179 +
 drivers/s390/char/tape_char.c | 1
 drivers/scsi/osst.c | 2
 drivers/scsi/sg.c | 154 +
 drivers/scsi/sr.c | 1
 drivers/scsi/st.c | 2
 drivers/usb/core/devio.c | 139 +
 drivers/usb/media/dsbr100.c | 1
 drivers/usb/media/ov511.c | 1
 drivers/usb/media/pwc/pwc-if.c | 1
 drivers/usb/media/se401.c | 1
 drivers/usb/media/stv680.c | 1
 drivers/usb/media/usbvideo.c | 1
 drivers/usb/media/vicam.c | 1
 drivers/usb/media/w9968cf.c | 1
 drivers/video/fbmem.c | 147 +
 fs/autofs/root.c | 35
 fs/autofs4/root.c | 41
 fs/block_dev.c | 10
 fs/cifs/cifsfs.c | 10
 fs/cifs/cifsfs.h | 2
 fs/cifs/ioctl.c | 29
 fs/compat.c | 27
 fs/compat_ioctl.c | 2918 ----------------------------
 fs/ext2/dir.c | 3
 fs/ext2/ext2.h | 1
 fs/ext2/file.c | 6
 fs/ext2/ioctl.c | 31
 fs/ext3/dir.c | 3
 fs/ext3/file.c | 3
 fs/ext3/ioctl.c | 66
 fs/fat/dir.c | 54
 fs/hfsplus/dir.c | 4
 fs/hfsplus/hfsplus_fs.h | 4
 fs/hfsplus/inode.c | 4
 fs/hfsplus/ioctl.c | 29
 fs/ncpfs/dir.c | 3
 fs/ncpfs/file.c | 4
 fs/ncpfs/ioctl.c | 241 ++
 fs/reiserfs/dir.c | 3
 fs/reiserfs/file.c | 4
 fs/reiserfs/ioctl.c | 36
 fs/smbfs/dir.c | 4
 fs/smbfs/file.c | 4
 fs/smbfs/ioctl.c | 16
 fs/smbfs/proto.h | 1
 fs/xfs/linux-2.6/xfs_ioctl32.c | 15
 include/linux/cdrom.h | 2
 include/linux/compat_ioctl.h | 387 ---
 include/linux/ext2_fs.h | 7
 include/linux/ext3_fs.h | 1
 include/linux/fs.h | 3
 include/linux/ioctl32.h | 2
 include/linux/mtio.h | 12
 include/linux/ncp_fs.h | 1
 include/linux/net.h | 2
 include/linux/reiserfs_fs.h | 9
 include/linux/socket.h | 4
 include/linux/tty.h | 2
 include/linux/tty_driver.h | 4
 include/linux/tty_ldisc.h | 2
 include/linux/videodev.h | 2
 include/net/sock.h | 9
 net/atm/common.h | 1
 net/atm/ioctl.c | 167 +
 net/atm/pvc.c | 3
 net/atm/svc.c | 3
 net/bluetooth/bnep/sock.c | 1
 net/bluetooth/cmtp/sock.c | 1
 net/bluetooth/hci_sock.c | 1
 net/bluetooth/hidp/sock.c | 1
 net/bluetooth/rfcomm/sock.c | 1
 net/compat.c | 1456 +++++++++----
 net/socket.c | 7
 137 files changed, 4527 insertions(+), 3807 deletions(-)

From hch at lst.de Sun Nov 6 04:39:42 2005
From: hch at lst.de (Christoph Hellwig)
Date: Sun, 6 Nov 2005 05:39:42 +0100
Subject: [PATCH 10/25] fs: move ext2 ioctl32 handlers into file systems
In-Reply-To: <20051105162714.555612000@b551138y.boeblingen.de.ibm.com>
References: <20051105162650.620266000@b551138y.boeblingen.de.ibm.com> <20051105162714.555612000@b551138y.boeblingen.de.ibm.com>
Message-ID: <20051106043942.GA31343@lst.de>

On Sat, Nov 05, 2005 at 05:27:00PM +0100, Arnd Bergmann wrote:
> The same ioctls (originally from ext2) are used on ext2, ext3,
> hfsplus, cifs, reiserfs and xfs. Since they are really compatible
> between 32 and 64 bit except for the ioctl number, the conversion
> handler is trivial and I copy it to each of these file systems
> in order to eventually get rid of fs/compat_ioctl.c completely.

NACK, this is completely idiotic. Duplicating handlers is the very last thing we want. I actually have patches to move handling some of those ioctls into generic code, but that's a different story.

From arnd at arndb.de Mon Nov 7 10:24:47 2005
From: arnd at arndb.de (Arnd Bergmann)
Date: Mon, 7 Nov 2005 11:24:47 +0100
Subject: [PATCH 10/25] fs: move ext2 ioctl32 handlers into file systems
In-Reply-To: <20051106043942.GA31343@lst.de>
References: <20051105162650.620266000@b551138y.boeblingen.de.ibm.com> <20051105162714.555612000@b551138y.boeblingen.de.ibm.com> <20051106043942.GA31343@lst.de>
Message-ID: <200511071124.49467.arnd@arndb.de>

On Sünndag 06 November 2005 05:39, Christoph Hellwig wrote:
> NACK, this is completely idiotic. Duplicating handlers is the very
> last thing we want. I actually have patches to move handling some
> of those ioctls into generic code, but that's a different story.

Ok, I'll drop this patch then, except for the ext3 parts that fix an actual problem of missing conversion handlers.

What is your opinion on the xfs bit?
The current code is somewhat broken, since XFS_IOC_{GET,SET}{VERSION,XFLAGS} are not really compatible. Should those three lines simply be removed?

	Arnd <><

--- linux-cg.orig/fs/xfs/linux-2.6/xfs_ioctl32.c 2005-11-05 02:44:55.000000000 +0100
+++ linux-cg/fs/xfs/linux-2.6/xfs_ioctl32.c 2005-11-05 02:45:35.000000000 +0100
@@ -34,6 +34,11 @@
 #define _NATIVE_IOC(cmd, type) \
 	  _IOC(_IOC_DIR(cmd), _IOC_TYPE(cmd), _IOC_NR(cmd), sizeof(type))
 
+/* broken ext2 ioctl numbers */
+#define XFS_IOC_GETVERSION32	_IOR('v', 1, int)
+#define XFS_IOC_GETXFLAGS32	_IOR('f', 1, int)
+#define XFS_IOC_SETXFLAGS32	_IOW('f', 2, int)
+
 #if defined(CONFIG_IA64) || defined(CONFIG_X86_64)
 #define BROKEN_X86_ALIGNMENT
 /* on ia32 l_start is on a 32-bit boundary */
@@ -115,12 +120,16 @@
 	vnode_t		*vp = LINVFS_GET_VP(inode);
 
 	switch (cmd) {
+	/* these take an int as their argument, not a long */
+	case XFS_IOC_GETVERSION32:
+	case XFS_IOC_GETXFLAGS32:
+	case XFS_IOC_SETXFLAGS32:
+		cmd = _NATIVE_IOC(cmd, long);
+		break;
+
 	case XFS_IOC_DIOINFO:
 	case XFS_IOC_FSGEOMETRY_V1:
 	case XFS_IOC_FSGEOMETRY:
-	case XFS_IOC_GETVERSION:
-	case XFS_IOC_GETXFLAGS:
-	case XFS_IOC_SETXFLAGS:
 	case XFS_IOC_FSGETXATTR:
 	case XFS_IOC_FSSETXATTR:
 	case XFS_IOC_FSGETXATTRA:

From dev at sw.ru Mon Nov 7 13:41:40 2005
From: dev at sw.ru (Kirill Korotaev)
Date: Mon, 07 Nov 2005 16:41:40 +0300
Subject: [PATCH] ext3: journal handling on error path in ext3_journalled_writepage()
Message-ID: <436F5994.2070703@sw.ru>

Forwarded original patch from Denis Lunev:

This patch fixes a lost reference on the ext3 current handle in ext3_journalled_writepage()

Signed-Off-By: Denis Lunev

P.S. against 2.6.14
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: diff-ms-ext3handle-20051031
URL:

From bunk at stusta.de Mon Nov 7 21:18:50 2005
From: bunk at stusta.de (Adrian Bunk)
Date: Mon, 7 Nov 2005 22:18:50 +0100
Subject: [2.6 patch] remove CONFIG_EXT{2,3}_CHECK
In-Reply-To: <20051101044658.GA7500@thunk.org>
References: <20051031001334.GP4180@stusta.de> <20051031212503.GY31368@schatzie.adilger.int> <20051101044658.GA7500@thunk.org>
Message-ID: <20051107211850.GZ3847@stusta.de>

On Mon, Oct 31, 2005 at 11:46:58PM -0500, Theodore Ts'o wrote:
> On Mon, Oct 31, 2005 at 02:25:03PM -0700, Andreas Dilger wrote:
> > On Oct 31, 2005 01:13 +0100, Adrian Bunk wrote:
> > > Can anyone tell me the history of CONFIG_EXT{2,3}_CHECK?
> > >
> > > There is code for a "check" option for mount if these options are
> > > enabled, but there's no way to enable them.
> >
> > These are expensive debugging options, which walk the inode/block bitmaps
> > for getting the group inode/block usage instead of using the group
> > summary data. Not used very often but I suspect occasionally useful for
> > developers mucking with ext[23] internals. Since it is developer-only
> > code it needs to be enabled with #define CONFIG_EXT[23]_CHECK in a
> > header or compile option.
>
> It's basically a stripped down version of e2fsck pass #5, though. Is
> there any reason why this needs to be in the kernel? If it would be
> useful I could easily make a userspace implementation of these checks.

This code was introduced with kernel 2.4, but as far as I can see there was never an option for enabling it.

Unless someone can give a strong reason for keeping it, I'd suggest the patch below.

> - Ted

cu
Adrian


<-- snip -->


The CONFIG_EXT{2,3}_CHECK options were never available, and all they did was to implement a subset of e2fsck in the kernel.
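
For readers who have not looked at that code: the core of the check is to count the clear bits in each group's bitmap and compare the result with the free count stored in the group descriptor. A minimal userspace sketch of that comparison follows, assuming a plain in-memory bitmap. `count_free_bits` and `free_count_matches` are hypothetical names used only for illustration; they are not the kernel's ext2_count_free() and do no on-disk I/O.

```c
#include <stddef.h>

/* Count the clear (free) bits in the first nbytes of a bitmap. */
static unsigned long count_free_bits(const unsigned char *map, size_t nbytes)
{
    unsigned long free_bits = 0;

    for (size_t i = 0; i < nbytes; i++)
        for (int bit = 0; bit < 8; bit++)
            if (!(map[i] & (1u << bit)))
                free_bits++;
    return free_bits;
}

/* Return 1 if the stored free count agrees with the bitmap, 0 otherwise. */
static int free_count_matches(const unsigned char *map, size_t nbytes,
                              unsigned long stored_free)
{
    return count_free_bits(map, nbytes) == stored_free;
}
```

A mismatch between the two numbers is the condition the removed functions reported through ext2_error()/ext3_error(); a userspace checker can run the same comparison without holding the superblock lock at mount time.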
Signed-off-by: Adrian Bunk --- Documentation/filesystems/ext2.txt | 2 fs/ext2/balloc.c | 73 ----------------------------- fs/ext2/ialloc.c | 40 --------------- fs/ext2/super.c | 16 ------ fs/ext3/balloc.c | 73 ----------------------------- fs/ext3/ialloc.c | 41 ---------------- fs/ext3/super.c | 17 ------ 7 files changed, 2 insertions(+), 260 deletions(-) --- linux-2.6.14-mm1-full/Documentation/filesystems/ext2.txt.old 2005-11-07 21:22:25.000000000 +0100 +++ linux-2.6.14-mm1-full/Documentation/filesystems/ext2.txt 2005-11-07 21:22:36.000000000 +0100 @@ -17,8 +17,6 @@ bsddf (*) Makes `df' act like BSD. minixdf Makes `df' act like Minix. -check Check block and inode bitmaps at mount time - (requires CONFIG_EXT2_CHECK). check=none, nocheck (*) Don't do extra checking of bitmaps on mount (check=normal and check=strict options removed) --- linux-2.6.14-mm1-full/fs/ext2/balloc.c.old 2005-11-07 21:22:43.000000000 +0100 +++ linux-2.6.14-mm1-full/fs/ext2/balloc.c 2005-11-07 21:22:56.000000000 +0100 @@ -624,76 +624,3 @@ return EXT2_SB(sb)->s_gdb_count; } -#ifdef CONFIG_EXT2_CHECK -/* Called at mount-time, super-block is locked */ -void ext2_check_blocks_bitmap (struct super_block * sb) -{ - struct buffer_head *bitmap_bh = NULL; - struct ext2_super_block * es; - unsigned long desc_count, bitmap_count, x, j; - unsigned long desc_blocks; - struct ext2_group_desc * desc; - int i; - - es = EXT2_SB(sb)->s_es; - desc_count = 0; - bitmap_count = 0; - desc = NULL; - for (i = 0; i < EXT2_SB(sb)->s_groups_count; i++) { - desc = ext2_get_group_desc (sb, i, NULL); - if (!desc) - continue; - desc_count += le16_to_cpu(desc->bg_free_blocks_count); - brelse(bitmap_bh); - bitmap_bh = read_block_bitmap(sb, i); - if (!bitmap_bh) - continue; - - if (ext2_bg_has_super(sb, i) && - !ext2_test_bit(0, bitmap_bh->b_data)) - ext2_error(sb, __FUNCTION__, - "Superblock in group %d is marked free", i); - - desc_blocks = ext2_bg_num_gdb(sb, i); - for (j = 0; j < desc_blocks; j++) - if (!ext2_test_bit(j + 
1, bitmap_bh->b_data)) - ext2_error(sb, __FUNCTION__, - "Descriptor block #%ld in group " - "%d is marked free", j, i); - - if (!block_in_use(le32_to_cpu(desc->bg_block_bitmap), - sb, bitmap_bh->b_data)) - ext2_error(sb, "ext2_check_blocks_bitmap", - "Block bitmap for group %d is marked free", - i); - - if (!block_in_use(le32_to_cpu(desc->bg_inode_bitmap), - sb, bitmap_bh->b_data)) - ext2_error(sb, "ext2_check_blocks_bitmap", - "Inode bitmap for group %d is marked free", - i); - - for (j = 0; j < EXT2_SB(sb)->s_itb_per_group; j++) - if (!block_in_use(le32_to_cpu(desc->bg_inode_table) + j, - sb, bitmap_bh->b_data)) - ext2_error (sb, "ext2_check_blocks_bitmap", - "Block #%ld of the inode table in " - "group %d is marked free", j, i); - - x = ext2_count_free(bitmap_bh, sb->s_blocksize); - if (le16_to_cpu(desc->bg_free_blocks_count) != x) - ext2_error (sb, "ext2_check_blocks_bitmap", - "Wrong free blocks count for group %d, " - "stored = %d, counted = %lu", i, - le16_to_cpu(desc->bg_free_blocks_count), x); - bitmap_count += x; - } - if (le32_to_cpu(es->s_free_blocks_count) != bitmap_count) - ext2_error (sb, "ext2_check_blocks_bitmap", - "Wrong free blocks count in super block, " - "stored = %lu, counted = %lu", - (unsigned long)le32_to_cpu(es->s_free_blocks_count), - bitmap_count); - brelse(bitmap_bh); -} -#endif --- linux-2.6.14-mm1-full/fs/ext2/ialloc.c.old 2005-11-07 21:23:04.000000000 +0100 +++ linux-2.6.14-mm1-full/fs/ext2/ialloc.c 2005-11-07 21:23:13.000000000 +0100 @@ -700,43 +700,3 @@ return count; } -#ifdef CONFIG_EXT2_CHECK -/* Called at mount-time, super-block is locked */ -void ext2_check_inodes_bitmap (struct super_block * sb) -{ - struct ext2_super_block * es = EXT2_SB(sb)->s_es; - unsigned long desc_count = 0, bitmap_count = 0; - struct buffer_head *bitmap_bh = NULL; - int i; - - for (i = 0; i < EXT2_SB(sb)->s_groups_count; i++) { - struct ext2_group_desc *desc; - unsigned x; - - desc = ext2_get_group_desc(sb, i, NULL); - if (!desc) - continue; - 
desc_count += le16_to_cpu(desc->bg_free_inodes_count); - brelse(bitmap_bh); - bitmap_bh = read_inode_bitmap(sb, i); - if (!bitmap_bh) - continue; - - x = ext2_count_free(bitmap_bh, EXT2_INODES_PER_GROUP(sb) / 8); - if (le16_to_cpu(desc->bg_free_inodes_count) != x) - ext2_error (sb, "ext2_check_inodes_bitmap", - "Wrong free inodes count in group %d, " - "stored = %d, counted = %lu", i, - le16_to_cpu(desc->bg_free_inodes_count), x); - bitmap_count += x; - } - brelse(bitmap_bh); - if (percpu_counter_read(&EXT2_SB(sb)->s_freeinodes_counter) != - bitmap_count) - ext2_error(sb, "ext2_check_inodes_bitmap", - "Wrong free inodes count in super block, " - "stored = %lu, counted = %lu", - (unsigned long)le32_to_cpu(es->s_free_inodes_count), - bitmap_count); -} -#endif --- linux-2.6.14-mm1-full/fs/ext2/super.c.old 2005-11-07 21:23:21.000000000 +0100 +++ linux-2.6.14-mm1-full/fs/ext2/super.c 2005-11-07 21:23:56.000000000 +0100 @@ -281,7 +281,7 @@ enum { Opt_bsd_df, Opt_minix_df, Opt_grpid, Opt_nogrpid, Opt_resgid, Opt_resuid, Opt_sb, Opt_err_cont, Opt_err_panic, - Opt_err_ro, Opt_nouid32, Opt_check, Opt_nocheck, Opt_debug, + Opt_err_ro, Opt_nouid32, Opt_nocheck, Opt_debug, Opt_oldalloc, Opt_orlov, Opt_nobh, Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl, Opt_xip, Opt_ignore, Opt_err, Opt_quota, Opt_usrquota, Opt_grpquota @@ -303,7 +303,6 @@ {Opt_nouid32, "nouid32"}, {Opt_nocheck, "check=none"}, {Opt_nocheck, "nocheck"}, - {Opt_check, "check"}, {Opt_debug, "debug"}, {Opt_oldalloc, "oldalloc"}, {Opt_orlov, "orlov"}, @@ -376,13 +375,6 @@ case Opt_nouid32: set_opt (sbi->s_mount_opt, NO_UID32); break; - case Opt_check: -#ifdef CONFIG_EXT2_CHECK - set_opt (sbi->s_mount_opt, CHECK); -#else - printk("EXT2 Check option not supported\n"); -#endif - break; case Opt_nocheck: clear_opt (sbi->s_mount_opt, CHECK); break; @@ -503,12 +495,6 @@ EXT2_BLOCKS_PER_GROUP(sb), EXT2_INODES_PER_GROUP(sb), sbi->s_mount_opt); -#ifdef CONFIG_EXT2_CHECK - if (test_opt (sb, CHECK)) { - 
ext2_check_blocks_bitmap (sb); - ext2_check_inodes_bitmap (sb); - } -#endif return res; } --- linux-2.6.14-mm1-full/fs/ext3/balloc.c.old 2005-11-07 21:24:04.000000000 +0100 +++ linux-2.6.14-mm1-full/fs/ext3/balloc.c 2005-11-07 21:26:53.000000000 +0100 @@ -1517,76 +1517,3 @@ return EXT3_SB(sb)->s_gdb_count; } -#ifdef CONFIG_EXT3_CHECK -/* Called at mount-time, super-block is locked */ -void ext3_check_blocks_bitmap (struct super_block * sb) -{ - struct ext3_super_block *es; - unsigned long desc_count, bitmap_count, x, j; - unsigned long desc_blocks; - struct buffer_head *bitmap_bh = NULL; - struct ext3_group_desc *gdp; - int i; - - es = EXT3_SB(sb)->s_es; - desc_count = 0; - bitmap_count = 0; - gdp = NULL; - for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) { - gdp = ext3_get_group_desc (sb, i, NULL); - if (!gdp) - continue; - desc_count += le16_to_cpu(gdp->bg_free_blocks_count); - brelse(bitmap_bh); - bitmap_bh = read_block_bitmap(sb, i); - if (bitmap_bh == NULL) - continue; - - if (ext3_bg_has_super(sb, i) && - !ext3_test_bit(0, bitmap_bh->b_data)) - ext3_error(sb, __FUNCTION__, - "Superblock in group %d is marked free", i); - - desc_blocks = ext3_bg_num_gdb(sb, i); - for (j = 0; j < desc_blocks; j++) - if (!ext3_test_bit(j + 1, bitmap_bh->b_data)) - ext3_error(sb, __FUNCTION__, - "Descriptor block #%ld in group " - "%d is marked free", j, i); - - if (!block_in_use (le32_to_cpu(gdp->bg_block_bitmap), - sb, bitmap_bh->b_data)) - ext3_error (sb, "ext3_check_blocks_bitmap", - "Block bitmap for group %d is marked free", - i); - - if (!block_in_use (le32_to_cpu(gdp->bg_inode_bitmap), - sb, bitmap_bh->b_data)) - ext3_error (sb, "ext3_check_blocks_bitmap", - "Inode bitmap for group %d is marked free", - i); - - for (j = 0; j < EXT3_SB(sb)->s_itb_per_group; j++) - if (!block_in_use (le32_to_cpu(gdp->bg_inode_table) + j, - sb, bitmap_bh->b_data)) - ext3_error (sb, "ext3_check_blocks_bitmap", - "Block #%d of the inode table in " - "group %d is marked free", j, i); - - x = 
ext3_count_free(bitmap_bh, sb->s_blocksize); - if (le16_to_cpu(gdp->bg_free_blocks_count) != x) - ext3_error (sb, "ext3_check_blocks_bitmap", - "Wrong free blocks count for group %d, " - "stored = %d, counted = %lu", i, - le16_to_cpu(gdp->bg_free_blocks_count), x); - bitmap_count += x; - } - brelse(bitmap_bh); - if (le32_to_cpu(es->s_free_blocks_count) != bitmap_count) - ext3_error (sb, "ext3_check_blocks_bitmap", - "Wrong free blocks count in super block, " - "stored = %lu, counted = %lu", - (unsigned long)le32_to_cpu(es->s_free_blocks_count), - bitmap_count); -} -#endif --- linux-2.6.14-mm1-full/fs/ext3/ialloc.c.old 2005-11-07 21:27:02.000000000 +0100 +++ linux-2.6.14-mm1-full/fs/ext3/ialloc.c 2005-11-07 21:27:09.000000000 +0100 @@ -756,44 +756,3 @@ return count; } -#ifdef CONFIG_EXT3_CHECK -/* Called at mount-time, super-block is locked */ -void ext3_check_inodes_bitmap (struct super_block * sb) -{ - struct ext3_super_block * es; - unsigned long desc_count, bitmap_count, x; - struct buffer_head *bitmap_bh = NULL; - struct ext3_group_desc * gdp; - int i; - - es = EXT3_SB(sb)->s_es; - desc_count = 0; - bitmap_count = 0; - gdp = NULL; - for (i = 0; i < EXT3_SB(sb)->s_groups_count; i++) { - gdp = ext3_get_group_desc (sb, i, NULL); - if (!gdp) - continue; - desc_count += le16_to_cpu(gdp->bg_free_inodes_count); - brelse(bitmap_bh); - bitmap_bh = read_inode_bitmap(sb, i); - if (!bitmap_bh) - continue; - - x = ext3_count_free(bitmap_bh, EXT3_INODES_PER_GROUP(sb) / 8); - if (le16_to_cpu(gdp->bg_free_inodes_count) != x) - ext3_error (sb, "ext3_check_inodes_bitmap", - "Wrong free inodes count in group %d, " - "stored = %d, counted = %lu", i, - le16_to_cpu(gdp->bg_free_inodes_count), x); - bitmap_count += x; - } - brelse(bitmap_bh); - if (le32_to_cpu(es->s_free_inodes_count) != bitmap_count) - ext3_error (sb, "ext3_check_inodes_bitmap", - "Wrong free inodes count in super block, " - "stored = %lu, counted = %lu", - (unsigned long)le32_to_cpu(es->s_free_inodes_count), - 
bitmap_count); -} -#endif --- linux-2.6.14-mm1-full/fs/ext3/super.c.old 2005-11-07 21:27:17.000000000 +0100 +++ linux-2.6.14-mm1-full/fs/ext3/super.c 2005-11-07 21:27:48.000000000 +0100 @@ -625,7 +625,7 @@ enum { Opt_bsd_df, Opt_minix_df, Opt_grpid, Opt_nogrpid, Opt_resgid, Opt_resuid, Opt_sb, Opt_err_cont, Opt_err_panic, Opt_err_ro, - Opt_nouid32, Opt_check, Opt_nocheck, Opt_debug, Opt_oldalloc, Opt_orlov, + Opt_nouid32, Opt_nocheck, Opt_debug, Opt_oldalloc, Opt_orlov, Opt_user_xattr, Opt_nouser_xattr, Opt_acl, Opt_noacl, Opt_reservation, Opt_noreservation, Opt_noload, Opt_nobh, Opt_commit, Opt_journal_update, Opt_journal_inum, @@ -652,7 +652,6 @@ {Opt_nouid32, "nouid32"}, {Opt_nocheck, "nocheck"}, {Opt_nocheck, "check=none"}, - {Opt_check, "check"}, {Opt_debug, "debug"}, {Opt_oldalloc, "oldalloc"}, {Opt_orlov, "orlov"}, @@ -773,14 +772,6 @@ case Opt_nouid32: set_opt (sbi->s_mount_opt, NO_UID32); break; - case Opt_check: -#ifdef CONFIG_EXT3_CHECK - set_opt (sbi->s_mount_opt, CHECK); -#else - printk(KERN_ERR - "EXT3 Check option not supported\n"); -#endif - break; case Opt_nocheck: clear_opt (sbi->s_mount_opt, CHECK); break; @@ -1115,12 +1106,6 @@ } else { printk("internal journal\n"); } -#ifdef CONFIG_EXT3_CHECK - if (test_opt (sb, CHECK)) { - ext3_check_blocks_bitmap (sb); - ext3_check_inodes_bitmap (sb); - } -#endif return res; } From brice+ext3 at daysofwonder.com Tue Nov 8 15:14:44 2005 From: brice+ext3 at daysofwonder.com (Brice Figureau) Date: Tue, 08 Nov 2005 16:14:44 +0100 Subject: EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal... Message-ID: <1131462884.7659.31.camel@localhost.localdomain> Hi, I'm running a production server (Debian Sarge install) whose root filesystem (a software raid 1 array of 2 partitions of IDE drive) exhibited the following problem: Oct 28 06:00:06 server2 kernel: attempt to access beyond end of device Oct 28 06:00:06 server2 kernel: md2: rw=1, want=3050401328, limit=16353920 [...] 
a few of the above lines snipped, want is different each time
Oct 28 06:00:06 server2 kernel: md2: rw=1, want=2323778952, limit=16353920
Oct 28 06:00:06 server2 kernel: printk: 2 messages suppressed.
Oct 28 06:00:06 server2 kernel: Buffer I/O error on device md2, logical block 3511697840
Oct 28 06:00:06 server2 kernel: lost page write due to I/O error on md2
Oct 28 06:00:06 server2 kernel: Aborting journal on device md2.
Oct 28 06:05:01 server2 kernel: ext3_abort called.
Oct 28 06:05:01 server2 kernel: EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal
Oct 28 06:05:01 server2 kernel: Remounting filesystem read-only

Now the root filesystem is remounted read-only. Running fsck on it produces the following:

server2:~# e2fsck /dev/md2
e2fsck 1.37 (21-Mar-2005)
Pass 1: Checking inodes, blocks, and sizes
Inode 8 has illegal block(s). Clear? yes

Illegal block #2371 (3939553560) in inode 8. CLEARED.
Illegal block #2372 (2534662274) in inode 8. CLEARED.
Illegal block #2373 (860109200) in inode 8. CLEARED.
Illegal block #2374 (3289467369) in inode 8. CLEARED.
Illegal block #2375 (3883044785) in inode 8. CLEARED.
Illegal block #2376 (819724782) in inode 8. CLEARED.
Illegal block #2377 (2957378758) in inode 8. CLEARED.
Illegal block #2378 (1131441392) in inode 8. CLEARED.
Illegal block #2379 (1473257247) in inode 8. CLEARED.
Illegal block #2380 (2359314433) in inode 8. CLEARED.
Illegal block #2381 (448867375) in inode 8. CLEARED.
Too many illegal blocks in inode 8.
Clear inode? yes

Restarting e2fsck from the beginning...
Pass 1: Checking inodes, blocks, and sizes
Inode 8 has illegal block(s). Clear?

and loops forever.

I know inode 8 is the journal inode.

I didn't try to reboot the server as I fear the recovery process would not work and would need a human presence to force the fsck (see question #1)

What can I do to remotely repair this root filesystem (as the server is in a datacenter from which I'm far at the moment), and remount it rw ?
Thank you,
--
Brice Figureau

From adilger at clusterfs.com Tue Nov 8 17:31:50 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 8 Nov 2005 10:31:50 -0700
Subject: EXT3-fs error (device md2): ext3_journal_start_sb: Detected aborted journal...
In-Reply-To: <1131462884.7659.31.camel@localhost.localdomain>
References: <1131462884.7659.31.camel@localhost.localdomain>
Message-ID: <20051108173150.GF12862@schatzie.adilger.int>

On Nov 08, 2005 16:14 +0100, Brice Figureau wrote:
> Restarting e2fsck from the beginning...
> Pass 1: Checking inodes, blocks, and sizes
> Inode 8 has illegal block(s). Clear?
>
> and loops forever.
>
> I know inode 8 is the journal inode.
>
> What can I do to remotely repair this root filesystem (as the server is
> in a datacenter from which I'm far at the moment), and remount it rw ?

Running 'debugfs -w -R "feature ^has_journal,^needs_recovery" /dev/md2' should remove the journal from the filesystem, and then your e2fsck may work. Don't forget to add it back afterward with "tune2fs -j /dev/md2".

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From rmunk at quake.Stanford.EDU Wed Nov 9 02:09:47 2005
From: rmunk at quake.Stanford.EDU (Rasmus Munk Larsen)
Date: Tue, 08 Nov 2005 18:09:47 -0800
Subject: smarter sparse files?
Message-ID: <1131502187.12313.70.camel@akhenaten.Stanford.EDU>

Question: Does ext2/3 (or any other filesystem you know of) support a system call turning blocks within a file back into "sparse zeros", i.e. giving the blocks back to the filesystem?

Background: I am working on a slotted file format where internal fragmentation occurs. One such occurrence is growth of the data in a given slot, which currently requires me to handle the fragmentation explicitly. For example:

  ...==|== slot 1 ==|=== slot 2 ===|==...

Now assume that the contents of slot 1 are replaced with a larger chunk of data. I must either append additional data e.g.
at the end of the file

  ...==|== slot 1a =|=== slot2 ===|==...==|=== slot1b ==|

(and add my own data structures and code infrastructure to read fragmented slots) or leave the old (defunct) slot 1 data in place and garbage collect it later:

  ...==|= deadbeef =|=== slot2 ===|==...==|====== slot1' =====|

It's my impression that mechanisms for handling similar types of fragmentation are already implemented quite well in most modern filesystems, and hence I was wondering:

Question: Does ext2/3 (or any other filesystem you know of) support turning blocks within a file back into "sparse zeros", i.e. giving the blocks back to the filesystem?

If that was the case I could simply free the disk blocks belonging entirely to slot1 (as in turning it into zeros in a sparse file) and append the new data at the end:

  ...==|XXXXXXXXXXXX|=== slot2 ===|==...==|===== slot1' =====|

and in effect have the file system do my garbage collection for me. I would (probably very naively) think that this should be possible and cheap since it only involves manipulating trees/freelists/whatever and perhaps "massaging" a small number of actual data blocks. (I should mention that my slots are typically much larger than a disk block, 100kB-1MB, say.)

Simple example: On a file system supporting sparse files, the following

  fd = open("sparse1", O_WRONLY | O_CREAT, 0644);
  write(fd, buf, 10);
  lseek(fd, 1000000, SEEK_CUR);
  write(fd, buf, 10);

will create a file occupying a small number of disk blocks. Ideally I would like to be able to do the following

  fd = open("sparse2", O_WRONLY | O_CREAT, 0644);
  write(fd, buf, 1000020);
  lseek(fd, 10, SEEK_SET);
  giveback(fd, 1000000, SEEK_CUR);
  lseek(fd, 1000010, SEEK_SET);
  write(fd, buf, 10);

and end up with a sparse2 not much larger than sparse1. "giveback" is my imaginary system call that tells the file system that n bytes starting at a given offset should no longer be considered part of the file and the associated blocks given back to the file system's freelist.
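
The first fragment above (the "sparse1" case) can be written out as compilable C roughly as follows. This is only a sketch: `make_sparse` and its path argument are illustrative names, and whether the skipped range really stays unallocated depends on the filesystem holding the file.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Write 10 bytes, seek forward over a 1000000-byte gap, write 10 more.
 * Returns 0 on success and fills *st with the stat data of the result. */
static int make_sparse(const char *path, struct stat *st)
{
    static const char buf[10] = "123456789";   /* 9 characters plus NUL */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (write(fd, buf, sizeof buf) != (ssize_t)sizeof buf ||
        lseek(fd, 1000000, SEEK_CUR) == (off_t)-1 ||
        write(fd, buf, sizeof buf) != (ssize_t)sizeof buf ||
        fstat(fd, st) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

After a successful call, st_size holds the logical length (1000020 bytes), while st_blocks (counted in 512-byte units) shows how much space was actually allocated; on a filesystem with sparse-file support the second figure should be far smaller than the first. The "sparse2" case has no portable equivalent here, since "giveback" is the hypothetical call being asked about.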
I realize that this might not be possible currently if the chunk you wish to free is not aligned with the start of a block etc. etc. A "cruder" interface, just giving whole disk blocks back would be acceptable, though.

Your comments would be much appreciated,

Rasmus Munk Larsen, Stanford University.

From adilger at clusterfs.com Tue Nov 15 19:28:59 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Tue, 15 Nov 2005 12:28:59 -0700
Subject: smarter sparse files?
In-Reply-To: <1131502187.12313.70.camel@akhenaten.Stanford.EDU>
References: <1131502187.12313.70.camel@akhenaten.Stanford.EDU>
Message-ID: <20051115192859.GB5831@schatzie.adilger.int>

On Nov 08, 2005 18:09 -0800, Rasmus Munk Larsen wrote:
> Question: Does ext2/3 (or any other filesystem you know of) support
> a system call turning blocks within a file back into "sparse zeros",
> i.e. giving the blocks back to the filesystem?

This is something that was implemented a long time ago, called "punch" but never integrated into the core kernel. It is essentially a form of truncate that has an "end" parameter instead of removing all blocks until EOF. Implementing this is quite complex and I imagine it is much more complex now than when we did it (maybe 1.2.x kernel days). However, I believe it is a useful interface and I think it would be used if it were available.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From mb/ext3 at dcs.qmul.ac.uk Wed Nov 16 08:52:44 2005
From: mb/ext3 at dcs.qmul.ac.uk (Matt Bernstein)
Date: Wed, 16 Nov 2005 08:52:44 +0000
Subject: (large, external) data journal BUG (Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:626: "transaction->t_forget == NULL")
Message-ID: <437AF35C.3010106@dcs.qmul.ac.uk>

Hi,

A couple of our important servers, both running FC4 but one i386 and one x86_64, have been crashing recently. They both are running ext3 data=journal with large external journals and high commit intervals.
Both machines use the gdth driver for their hardware RAID sets, if that's of any use. I think the hardware is good in both cases. I hope someone finds this data useful enough to be able to fix the bug. IMAP server crash (once only, thus far): Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:626: "transaction->t_forget == NULL" ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at "fs/jbd/checkpoint.c":626 invalid operand: 0000 [1] SMP CPU 0 Modules linked in: loop iptable_nat ip_conntrack_amanda ipt_ULOG ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables w83627hf eeprom lm85 i2c_sensor i2c_isa md5 ipv6 video button battery ac ohci_hcd i2c_amd8111 i2c_amd756 i2c_core shpchp e100 mii tg3 floppy sg dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod gdth sata_sil libata sd_mod scsi_mod Pid: 1485, comm: kjournald Not tainted 2.6.12-1.1398_FC4smp RIP: 0010:[] {:jbd:__journal_drop_transaction+319} RSP: 0018:ffff8100fade9de8 EFLAGS: 00010292 RAX: 0000000000000074 RBX: ffff8100c5f0ea80 RCX: ffffffff8042d908 RDX: ffffffff8042d908 RSI: 0000000000000296 RDI: ffffffff8042d900 RBP: ffff8100f8b55000 R08: ffff81008234c040 R09: 0000000000000030 R10: 0000000000000000 R11: ffffffff8011d680 R12: ffff81003b333080 R13: ffff8100c5f0ea80 R14: ffff8100f8b55000 R15: 0000000000000000 FS: 00002aaaaadfcf00(0000) GS:ffffffff8050d780(0000) knlGS:00000000f7ff16c0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00002aaab51a0000 CR3: 00000000e2980000 CR4: 00000000000006e0 Process kjournald (pid: 1485, threadinfo ffff8100fade8000, task ffff8100fb9be880) Stack: ffff8100020ba898 ffff81008caebce8 0000000000000000 ffffffff8807c9d2 ffff8100f8b55024 0000000000000cf7 ffff8100f8b5515c 0000000000000000 0000000000000000 0000000000000000 Call Trace:{:jbd:journal_commit_transaction+4194} {del_timer+113} {:jbd:kjournald+275} {:jbd:commit_timeout+0} {autoremove_wake_function+0} {child_rip+8} {:jbd:kjournald+0} {child_rip+0} Code: 0f 0b fe 15 08 88 ff 
ff ff ff 72 02 48 83 7b 50 00 74 34 49 RIP {:jbd:__journal_drop_transaction+319} RSP <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43 in_atomic():0, irqs_disabled():1 Call Trace:{profile_task_exit+21} {do_exit+34} {vgacon_cursor+221} {die+77} {do_invalid_op+163} {:jbd:__journal_drop_transaction+319} {error_exit+0} {flat_send_IPI_mask+0} {:jbd:__journal_drop_transaction+319} {:jbd:__journal_drop_transaction+319} {:jbd:journal_commit_transaction+4194} {del_timer+113} {:jbd:kjournald+275} {:jbd:commit_timeout+0} {autoremove_wake_function+0} {child_rip+8} {:jbd:kjournald+0} {child_rip+0} File server crash (has happened a few times now): Assertion failure in __journal_drop_transaction() at fs/jbd/checkpoint.c:626: "transaction->t_forget == NULL" ------------[ cut here ]------------ kernel BUG at fs/jbd/checkpoint.c:626! invalid operand: 0000 [#1] SMP Modules linked in: loop nfsd exportfs lockd nfs_acl sunrpc autofs4 ipv6 ip_conntrack_amanda ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables dm_mod video button battery ac ohci_hcd i2c_amd756 i2c_core 3c59x mii ns83820 floppy sg ext3 jbd gdth sd_mod scsi_mod CPU: 0 EIP: 0060:[] Not tainted VLI EFLAGS: 00010296 (2.6.13-1.1526_FC4smp) EIP is at __journal_drop_transaction+0x117/0x2fa [jbd] eax: 00000074 ebx: f064d2e0 ecx: c036fbf4 edx: 00000286 esi: f699a200 edi: c2f50000 ebp: e775df84 esp: c2f50ec4 ds: 007b es: 007b ss: 0068 Process kjournald (pid: 1168, threadinfo=c2f50000 task=c2e64020) Stack: f88acfa8 f88b2e92 f88ada14 00000272 f88ada7c f064d2e0 f699a200 f88a9781 c2f50000 d142414c e775df84 f88a8f61 e775df84 f88a9700 c2f50000 ecb98e60 f064d2e0 000000f5 e85cc160 defc4598 f699a200 00000000 defc4560 f88a7846 Call Trace: [] __journal_remove_checkpoint+0x56/0x75 [jbd] [] __try_to_free_cp_buf+0x31/0x68 [jbd] [] __journal_clean_checkpoint_list+0x6f/0x9a [jbd] [] journal_commit_transaction+0x147/0xff1 [jbd] [] lock_timer_base+0x15/0x2f [] try_to_del_timer_sync+0x45/0x4d [] 
kjournald+0xc5/0x20d [jbd] [] commit_timeout+0x0/0x5 [jbd] [] autoremove_wake_function+0x0/0x37 [] kjournald+0x0/0x20d [jbd] [] kernel_thread_helper+0x5/0xb Code: 44 24 10 7c da 8a f8 c7 44 24 0c 72 02 00 00 c7 44 24 08 14 da 8a f8 c7 44 24 04 92 2e 8b f8 c7 04 24 a8 cf 8a f8 e8 cb 7c 87 c7 <0f> 0b 72 02 14 da 8a f8 8b 4b 2c 85 c9 74 34 c7 44 24 10 c4 d0 From tobias.orlamuende at googlemail.com Thu Nov 17 15:27:53 2005 From: tobias.orlamuende at googlemail.com (=?ISO-8859-1?Q?Tobias_Orlam=FCnde?=) Date: Thu, 17 Nov 2005 16:27:53 +0100 Subject: ext3-image doesn't mount anymore and reports errors Message-ID: Hi folks, we made an image of a partition by using dd. Original filesystem is ext3 (4k block-size). My colleague was able to mount this image once (using mount with "-o loop"). Since then anytime we try to mount it, it ends in the following error-message: ioctl: LOOP_CLR_FD: Device or resource busy mount: you must specify the filesystem type We also tried to mount it on another system than our backup-machine - without success but with the same error. Fsck.ext3 ends in lots of inode-errors and the following one: Error while iterating over blocks in inode 131736: Illegal triply indirect block found e2fsck: aborted Using an alternative superblock (32768) results in the same error. Due to the fact that this is our only backup of this machine it is really important for us at least to read this image and get some data off it. We are also willing to pay a fair amount of money for recovery. Is somebody able to help out quickly in this situation? Regards Tobias PS: Please don't blame me for this backup-strategy! :-) From tobias.orlamuende at googlemail.com Thu Nov 17 15:38:42 2005 From: tobias.orlamuende at googlemail.com (=?ISO-8859-1?Q?Tobias_Orlam=FCnde?=) Date: Thu, 17 Nov 2005 16:38:42 +0100 Subject: ext3-image doesn't mount anymore and reports errors Message-ID: Hi folks, please excuse, if this message come through twice. 
Seems like I have some troubles with this gmail-account. We made an image of a partition by using dd. Original filesystem is ext3 (4k block-size). My colleague was able to mount this image once (using mount with "-o loop"). Since then anytime we try to mount it, it ends in the following error-message: ioctl: LOOP_CLR_FD: Device or resource busy mount: you must specify the filesystem type We also tried to mount it on another system than our backup-machine - without success but with the same error. Fsck.ext3 ends in lots of inode-errors and the following one: Error while iterating over blocks in inode 131736: Illegal triply indirect block found e2fsck: aborted Using an alternative superblock (32768) results in the same error. Due to the fact that this is our only backup of this machine it is really important for us at least to read this image and get some data off it. We are also willing to pay a fair amount of money for recovery. Is somebody able to help out quickly in this situation? Regards Tobias PS: Please don't blame me for this backup-strategy! :-) From evilninja at gmx.net Thu Nov 17 16:26:13 2005 From: evilninja at gmx.net (evilninja at gmx.net) Date: Thu, 17 Nov 2005 17:26:13 +0100 Subject: ext3-image doesn't mount anymore and reports errors In-Reply-To: References: Message-ID: <437CAF25.8060606@gmx.net> Tobias Orlam?nde schrieb: > We made an image of a partition by using dd. Original filesystem is > ext3 (4k block-size). was it really a backup of a partition or did you backup a whole disk? (dd if=/dev/hda1 vs. dd if=/dev/hda) > Fsck.ext3 ends in lots of inode-errors and the following one: > > Error while iterating over blocks in inode 131736: Illegal triply > indirect block found > e2fsck: aborted please make sure to use a current version of e2fsprogs and a current kernel. > We are also willing to pay a fair amount of money for recovery. 
hm, a couple of weeks ago i too was in the need of ext3-recovery and some *really* famous recovery-specialist told me: "ext3? no, not possible at all." (except for grep'ing through the fs as "usual") Christian. -- BOFH excuse #416: We're out of slots on the server From dahernemtallah at hotmail.com Thu Nov 17 21:43:53 2005 From: dahernemtallah at hotmail.com (Nemtallah Daher) Date: Thu, 17 Nov 2005 16:43:53 -0500 Subject: Ext3 bad magic after upgrage FC1 to FC4 Message-ID: Dear All, I have server with a 160G disk with one partition /dev/hde1. The PC had FC1 and everything was working fine. I decided to do an upgrade to FC4. Now I can no longer mount that partition. I don't think anything happened to the file system, but changes in kernel and modules due to the upgrade is now making it inaccessible. I tried: mk2fs -n /dev/hde1 got a list of superblocks and tried all with no luck e2fsck -b xxxxxx /dev/hde1 bad magic or superblock I am sick over this and would appreciate any advice and guidance. Thank you. From evilninja at gmx.net Fri Nov 18 01:16:44 2005 From: evilninja at gmx.net (evilninja at gmx.net) Date: Fri, 18 Nov 2005 02:16:44 +0100 Subject: Ext3 bad magic after upgrage FC1 to FC4 In-Reply-To: References: Message-ID: <437D2B7C.2070603@gmx.net> Nemtallah Daher schrieb: > I have server with a 160G disk with one partition /dev/hde1. The PC had > FC1 and everything was working fine. I decided to do an upgrade to > FC4. Now I can no longer mount that partition. I don't think anything if it really is a kernel problem: can you downgrade to a previous kernel then? eg. FC1's or FC3's kernel. are there any errors in the syslog? with a new kernel, the disk-driver probably got upgraded too... 
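Before trying more repair tools, a read-only look at the raw bytes can show whether /dev/hde1 still presents an ext2/ext3 superblock at all. A minimal sketch, assuming the standard on-disk layout (primary superblock at byte offset 1024, 16-bit little-endian magic 0xEF53 at offset 56 inside it); the function name `has_ext_magic` is made up for this example:

```python
import struct

SUPERBLOCK_OFFSET = 1024   # primary superblock starts 1024 bytes into the device
MAGIC_OFFSET = 56          # s_magic field, relative to the superblock start
EXT2_SUPER_MAGIC = 0xEF53  # shared by ext2 and ext3


def has_ext_magic(path, sb_offset=SUPERBLOCK_OFFSET):
    """Return True if the 16-bit magic at sb_offset + 56 is 0xEF53.

    The file or block device is opened read-only, so nothing is modified.
    """
    with open(path, "rb") as dev:
        dev.seek(sb_offset + MAGIC_OFFSET)
        raw = dev.read(2)
    if len(raw) < 2:
        return False  # device or image is too short to hold a superblock here
    (magic,) = struct.unpack("<H", raw)
    return magic == EXT2_SUPER_MAGIC
```

Run as root against the partition, a False result would suggest that the bytes the new kernel presents at that offset are not a superblock, which is more likely a partition-mapping or driver question than an ext3 one. To probe a backup superblock, pass `sb_offset` as the backup block number multiplied by the block size.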
-- BOFH excuse #313: your process is not ISO 9000 compliant From bunk at stusta.de Fri Nov 18 03:34:00 2005 From: bunk at stusta.de (Adrian Bunk) Date: Fri, 18 Nov 2005 04:34:00 +0100 Subject: [2.6 patch] fs/ext3/: small cleanups Message-ID: <20051118033359.GX11494@stusta.de> This patch contains the following cleanups: - there's no need for ext3_count_free() #ifndef EXT3FS_DEBUG - having prototypes for ext3_count_free() in two different headers is nonsense Signed-off-by: Adrian Bunk --- fs/ext3/balloc.c | 2 -- fs/ext3/bitmap.c | 8 +++++++- fs/ext3/bitmap.h | 8 -------- fs/ext3/ialloc.c | 1 - 4 files changed, 7 insertions(+), 12 deletions(-) --- linux-2.6.15-rc1-mm1-full/fs/ext3/bitmap.c.old 2005-11-18 02:52:02.000000000 +0100 +++ linux-2.6.15-rc1-mm1-full/fs/ext3/bitmap.c 2005-11-18 02:54:14.000000000 +0100 @@ -7,8 +7,11 @@ * Universite Pierre et Marie Curie (Paris VI) */ +#ifdef EXT3FS_DEBUG + #include -#include "bitmap.h" + +#include "ext3_fs.h" static int nibblemap[] = {4, 3, 3, 2, 3, 2, 2, 1, 3, 2, 2, 1, 2, 1, 1, 0}; @@ -24,3 +27,6 @@ nibblemap[(map->b_data[i] >> 4) & 0xf]; return (sum); } + +#endif /* EXT3FS_DEBUG */ + --- linux-2.6.15-rc1-mm1-full/fs/ext3/balloc.c.old 2005-11-18 02:52:55.000000000 +0100 +++ linux-2.6.15-rc1-mm1-full/fs/ext3/balloc.c 2005-11-18 02:53:02.000000000 +0100 @@ -20,8 +20,6 @@ #include #include -#include "bitmap.h" - /* * balloc.c contains the blocks allocation and deallocation routines */ --- linux-2.6.15-rc1-mm1-full/fs/ext3/ialloc.c.old 2005-11-18 02:53:26.000000000 +0100 +++ linux-2.6.15-rc1-mm1-full/fs/ext3/ialloc.c 2005-11-18 02:53:31.000000000 +0100 @@ -26,7 +26,6 @@ #include -#include "bitmap.h" #include "xattr.h" #include "acl.h" --- linux-2.6.15-rc1-mm1-full/fs/ext3/bitmap.h 2005-11-17 21:30:48.000000000 +0100 +++ /dev/null 2005-11-08 19:07:57.000000000 +0100 @@ -1,8 +0,0 @@ -/* linux/fs/ext3/bitmap.c - * - * Copyright (C) 2005 Simtec Electronics - * Ben Dooks - * -*/ - -extern unsigned long ext3_count_free (struct 
buffer_head *, unsigned int );

From jt at domainfactory.de Fri Nov 18 16:03:25 2005
From: jt at domainfactory.de (Jochen Tuchbreiter)
Date: Fri, 18 Nov 2005 17:03:25 +0100
Subject: e2fsck not detecting corrupt file?
Message-ID: <02a801c5ec59$9b4a27e0$6e0aa8c0@buero3>

Hello,

on my ext3 fs I have a file that I can not modify anymore:

$ who am i
root pts/0 Nov 18 19:42 (192.168.10.110)
$ ls -al /mnt/path/usage_200306.html
-rw-r-xrw- 1 50946 nobody 99935 Jul 1 2003 /mnt/path/usage_200306.html
$ rm /mnt/path/usage_200306.html
rm: remove regular file `/mnt/path/usage_200306.html'? y
rm: cannot remove `/mnt/path/usage_200306.html': Operation not permitted
$ chmod a+x /mnt/path/usage_200306.html
chmod: changing permissions of `/mnt/path/usage_200306.html': Operation not permitted

The fs is NOT mounted readonly, I can move around / change other files on
the partition. However running a full e2fsck -f on the fs does not find any
problem.

$ ./e2fsck/e2fsck -f /dev/sdb6
e2fsck 1.38 (30-Jun-2005)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
$ ./e2fsck/e2fsck -V
e2fsck 1.38 (30-Jun-2005)
Using EXT2FS Library version 1.38, 30-Jun-2005
$
$ stat /mnt/path/usage_200306.html
File: `/mnt/path/usage_200306.html'
Size: 99935 Blocks: 160 IO Block: 4096 regular file
Device: 816h/2070d Inode: 2149222 Links: 1
Access: (0656/-rw-r-xrw-) Uid: (50946/ UNKNOWN) Gid: ( 99/ nobody)
Access: 2005-11-19 00:36:35.000000000 +0100
Modify: 2003-07-01 04:55:16.000000000 +0200
Change: 2004-02-14 02:35:36.000000000 +0100
$ uname -a
Linux machine 2.4.29-grsec #10 SMP Mon Jul 4 14:26:46 CEST 2005 i686 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux

This is not a hard disk problem: dd'ed the whole partition from one disk to
a brand new one and had the same problem on the new disk.

Do you guys have any suggestions on how to further diagnose or fix this?
I also tried it on a 2.6 kernel without grsec, same result.

best regards,
Jochen

From alex at alex.org.uk Fri Nov 18 16:25:44 2005
From: alex at alex.org.uk (Alex Bligh)
Date: Fri, 18 Nov 2005 16:25:44 +0000
Subject: e2fsck not detecting corrupt file?
In-Reply-To: <02a801c5ec59$9b4a27e0$6e0aa8c0@buero3>
References: <02a801c5ec59$9b4a27e0$6e0aa8c0@buero3>
Message-ID:

--On 18 November 2005 17:03 +0100 Jochen Tuchbreiter wrote:

> $ chmod a+x /mnt/path/usage_200306.html
> chmod: changing permissions of `/mnt/path/usage_200306.html': Operation
> not permitted
>
> The fs is NOT mounted readonly, I can move around / change other files on
> the partition.

Is it marked as immutable (somehow)? As root, try:
chattr -i

Alex

From jt at domainfactory.de Fri Nov 18 16:33:00 2005
From: jt at domainfactory.de (Jochen Tuchbreiter)
Date: Fri, 18 Nov 2005 17:33:00 +0100
Subject: e2fsck not detecting corrupt file?
In-Reply-To:
Message-ID: <02b801c5ec5d$bd4ed120$6e0aa8c0@buero3>

Hello,

> > The fs is NOT mounted readonly, I can move around / change other files on
> > the partition.
>
> Is it marked as immutable (somehow)? As root, try:
> chattr -i

That's it: It had really strange attr-settings (+a +c). After removing the
"append only" it now works. I wonder how this happened, maybe I had a
hardware problem on the disk some time ago.

Thank you very much Alex!

regards,
Jochen

From puhuri at iki.fi Tue Nov 22 11:08:00 2005
From: puhuri at iki.fi (Markus Peuhkuri)
Date: Tue, 22 Nov 2005 13:08:00 +0200
Subject: Doing fsck on shutdown
Message-ID: <4382FC10.2080606@iki.fi>

I usually shutdown computer for night (could probably use software
shutdown, but have not yet studied it). In that case, the disk is
checked quite often with default settings, usually in those cases I am
in hurry and want computer to start up fast :-).

One alternative would be trying to run fsck at shutdown if fsck is due in a few mounts.
One could abort that if one wants computer to shutdown fast, but in normal
case one could just allow it and then computer would later shut itself down.

Has anyone designed initscripts for that?

ps. another issue regarding mount counts is automounting USB disks with
ext3 file system. If one uses automounter, then one rapidly accumulates
mount count for fsck. Of course, it is possible to set counter to zero and
make fsck only based on time. Any opinions on that?

From tytso at mit.edu Tue Nov 22 15:21:19 2005
From: tytso at mit.edu (Theodore Ts'o)
Date: Tue, 22 Nov 2005 10:21:19 -0500
Subject: Doing fsck on shutdown
In-Reply-To: <4382FC10.2080606@iki.fi>
References: <4382FC10.2080606@iki.fi>
Message-ID: <20051122152119.GD29179@thunk.org>

On Tue, Nov 22, 2005 at 01:08:00PM +0200, Markus Peuhkuri wrote:
> I usually shutdown computer for night (could probably use software
> shutdown, but have not yet studied it). In that case, the disk is
> checked quite often with default settings, usually in those cases I am
> in hurry and want computer to start up fast :-).
>
> One alternative would be trying to run fsck at shutdown if fsck is due
> in a few mounts. One could abort that if one wants computer to shutdown
> fast, but in normal case one could just allow it and then computer would
> later shut itself down.
>
> Has anyone designed initscripts for that?

That's a good/interesting idea. One suggestion: if you do this, make
sure you check to see if you are running on batteries; if you are,
it's likely that you might be in a situation such as a laptop on an
airplane and the airline attendant has just told you to shut down all
electronics in preparation for landing --- or the laptop has just
reported that you only have 3% battery life left, and please shut down
now. Sometimes doing a 3-5 minute FSCK run at shutdown isn't always
the right thing....
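The two checks discussed in this exchange (is a mount-count fsck due within a few mounts, and is the machine on mains power) could be sketched roughly as follows. This is an illustration under stated assumptions, not an existing initscript: it parses the "Mount count" and "Maximum mount count" lines of `tune2fs -l` output, and it reads the `type` and `online` files under /sys/class/power_supply, a sysfs interface that assumes a newer kernel than the ones in this thread. The function names and the threshold of 3 mounts are invented for the example.

```python
import glob
import re


def fsck_due_within(tune2fs_output, mounts=3):
    """True if the mount-count check will trigger within `mounts` more mounts."""
    count = re.search(r"^Mount count:\s*(-?\d+)", tune2fs_output, re.M)
    limit = re.search(r"^Maximum mount count:\s*(-?\d+)", tune2fs_output, re.M)
    if not count or not limit:
        return False
    maximum = int(limit.group(1))
    if maximum <= 0:
        return False  # mount-count based checking is disabled
    return maximum - int(count.group(1)) <= mounts


def on_mains_power(pattern="/sys/class/power_supply/*/"):
    """True if any 'Mains' supply reports online.

    Also returns True when no mains supply is listed at all; the assumption
    is that such a machine is a desktop without power-supply information.
    """
    mains = []
    for supply in glob.glob(pattern):
        try:
            with open(supply + "type") as f:
                if f.read().strip() != "Mains":
                    continue
            with open(supply + "online") as f:
                mains.append(f.read().strip() == "1")
        except OSError:
            continue
    return any(mains) if mains else True
```

A shutdown script built on this would start fsck only when both functions return True, and skip it otherwise.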
- Ted

From bryan at kadzban.is-a-geek.net Thu Nov 17 17:53:22 2005
From: bryan at kadzban.is-a-geek.net (Bryan Kadzban)
Date: Thu, 17 Nov 2005 12:53:22 -0500
Subject: ext3-image doesn't mount anymore and reports errors
In-Reply-To:
References:
Message-ID: <20051117175322.GA5511@kadzban.is-a-geek.net>

On Thu, Nov 17, 2005 at 04:27:53PM +0100, Tobias Orlamünde wrote:
> My colleague was able to mount this image once (using mount with "-o loop").
> Since then anytime we try to mount it, it ends in the following error-message:
>
> ioctl: LOOP_CLR_FD: Device or resource busy
> mount: you must specify the filesystem type

Looking at the loop driver (drivers/block/loop.c), the handler for
LOOP_CLR_FD checks a ref-count on the loop device. If the ref-count is
bigger than 1 (the ioctl call holds a reference), it returns -EBUSY,
which corresponds to the error you're getting (device or resource busy).

Does anything else on the system have a handle open to the loop device
file? What about the image file? Do you have any other loopback-mounts
running at the time?

Does it help to manually do the losetup operation on a known-free loop
device, then mount the loop device itself (without -o loop)? (You will
have to "losetup -d" the device after you unmount it, also -- normally
umount handles that.)

What does "losetup -f" say?

From evil at g-house.de Thu Nov 24 03:00:52 2005
From: evil at g-house.de (Christian)
Date: Thu, 24 Nov 2005 04:00:52 +0100
Subject: Doing fsck on shutdown
In-Reply-To: <4382FC10.2080606@iki.fi>
References: <4382FC10.2080606@iki.fi>
Message-ID: <43852CE4.6040202@g-house.de>

Markus Peuhkuri wrote:
> I usually shutdown computer for night (could probably use software
> shutdown, but have not yet studied it).
In that case, the disk is > checked quite often with default settings, usually in those cases I am > in hurry and want computer to start up fast :-). for often rebooted desktop systems i'd just tune2fs(8) the filesystem to fsck based on a given interval of time rather than on the count of the mounts: % tune2fs -i 1m /dev/sda1 ....will check sda1 every month. Christian. -- BOFH excuse #341: HTTPD Error 666 : BOFH was here From linux at horizon.com Thu Nov 24 21:42:57 2005 From: linux at horizon.com (linux at horizon.com) Date: 24 Nov 2005 16:42:57 -0500 Subject: Assertion failure in ext3_sync_file() at fs/ext3/fsync.c:50: "ext3_journal_current_handle() == 0" Message-ID: <20051124214257.4673.qmail@science.horizon.com> ------------[ cut here ]------------ kernel BUG at fs/ext3/fsync.c:50! invalid operand: 0000 [#1] CPU: 0 EIP: 0060:[] Not tainted VLI EFLAGS: 00010296 (2.6.13.1) EIP is at ext3_sync_file+0x58/0xf0 eax: 00000068 ebx: bf4a479c ecx: b03cffac edx: b03cffac esi: b0398cfc edi: b2b8f1c8 ebp: c13bcf60 esp: c13bcf18 ds: 007b es: 007b ss: 0068 Process aptitude (pid: 26952, threadinfo=c13bc000 task=d99cca80) Stack: b0398afc b0383f40 b0395746 00000032 b0398cfc 00000000 00000000 e84824c0 ca281dc0 bf4a483c c13bcf60 b01317c2 bf4a483c 00000000 00000000 e84824c0 ffffffe4 bf4a483c c13bcf80 b01458ce e84824c0 dce74d9c 00000001 a7004000 Call Trace: [] show_stack+0xab/0xf0 [] show_registers+0x164/0x200 [] die+0xc8/0x150 [] do_trap+0x89/0xd0 [] do_invalid_op+0xaa/0xc0 [] error_code+0x4f/0x54 [] msync_interval+0x8e/0xd0 [] sys_msync+0x15f/0x171 [] syscall_call+0x7/0xb Code: ba 46 57 39 b0 be fc 8c 39 b0 b8 40 3f 38 b0 89 74 24 10 89 4c 24 0c 89 54 24 08 89 44 24 04 c7 04 24 fc 8a 39 b0 e8 08 10 f9 ff <0f> 0b 32 00 46 57 39 b0 0f b7 43 28 25 00 f0 00 00 3d 00 80 00 x86, uniprocessor, 2.6.13.1, ext3 file system, data=ordered, 6-way RAID-1. Kernel is stock except for ppskit-lite patches. 
This is the usually-not-mounted emergency rescue partition which contains
disaster recovery tools. Thus, the somewhat paranoid data integrity
settings.

The FS just filled up as I was doing the every-few-months update.

Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/md1                432312    425732         0 100% /boot

I'm currently copying a raw device snapshot which I can make available to
anyone who promises not to go grepping for secrets on it. I don't think
there are any, but hunting through the whole image and maybe zeroing a few
data blocks is a bit of a PITA.

Anyway, thanks for what has usually been a very reliable file system!
I hope there's enough info here to find the problem.

Here's the tune2fs -l output. No idea why it says "clean"; it is still
mounted read/write.

tune2fs 1.38 (30-Jun-2005)
Filesystem volume name:
Last mounted on:          /boot
Filesystem UUID:          ad036960-f1df-4c5e-9240-4e917527f20c
Filesystem magic number:  0xEF53
Filesystem revision #:    1 (dynamic)
Filesystem features:      has_journal filetype needs_recovery sparse_super
Default mount options:    (none)
Filesystem state:         clean
Errors behavior:          Remount read-only
Filesystem OS type:       Linux
Inode count:              55296
Block count:              439360
Reserved block count:     21968
Free blocks:              91538
Free inodes:              30975
First block:              1
Block size:               1024
Fragment size:            1024
Blocks per group:         8192
Fragments per group:      8192
Inodes per group:         1024
Inode blocks per group:   128
Last mount time:          Thu Nov 24 20:52:16 2005
Last write time:          Thu Nov 24 20:52:16 2005
Mount count:              6
Maximum mount count:      34
Last checked:             Sat Aug 6 04:39:08 2005
Check interval:           15552000 (6 months)
Next check after:         Thu Feb 2 04:39:08 2006
Reserved blocks uid:      0 (user root)
Reserved blocks gid:      0 (group root)
First inode:              11
Inode size:               128
Journal inode:            8
Journal backup:           inode blocks

From evilninja at gmx.net Sat Nov 26 22:44:42 2005
From: evilninja at gmx.net (Christian)
Date: Sat, 26 Nov 2005 23:44:42 +0100 (CET)
Subject: Assertion failure in ext3_sync_file() at fs/ext3/fsync.c:50: "ext3_journal_current_handle() == 0"
In-Reply-To: <20051124214257.4673.qmail@science.horizon.com>
References: <20051124214257.4673.qmail@science.horizon.com>
Message-ID: <26063.195.126.66.126.1133045082.squirrel@housecafe.dyndns.org>

On Thu, November 24, 2005 22:42, linux at horizon.com wrote:
> ------------[ cut here ]------------
> kernel BUG at fs/ext3/fsync.c:50!
> invalid operand: 0000 [#1]
> CPU: 0

is this error reproducible?

> Here's the tune2fs -l output. No idea why it says "clean"; it is still
> mounted read/write.

i don't know if this is "ok" (don't have the docs atm) but does e2fsck
report anything to worry about?

thanks,
Christian
--
make bzImage, not war

From linux at horizon.com Sun Nov 27 01:26:51 2005
From: linux at horizon.com (linux at horizon.com)
Date: 26 Nov 2005 20:26:51 -0500
Subject: Assertion failure in ext3_sync_file() at fs/ext3/fsync.c:50: "ext3_journal_current_handle() == 0"
In-Reply-To: <26063.195.126.66.126.1133045082.squirrel@housecafe.dyndns.org>
Message-ID: <20051127012651.30628.qmail@science.horizon.com>

> is this error reproducible?

Sorry, I didn't try; it was a production server I already had one
short-notice reboot on, and I didn't feel like trying for two.

(Although you're right, I should have thought of leaving it like that
until a good maintenance window instead of immediately e2fscking and
cleaning up.)

>> Here's the tune2fs -l output. No idea why it says "clean"; it is still
>> mounted read/write.

> i don't know if this is "ok" (don't have the docs atm) but does e2fsck
> report anything to worry about?

No, it didn't. I moved half of the .deb files and completed the update
by halves, and all was well.
From adilger at clusterfs.com Sun Nov 27 09:00:12 2005
From: adilger at clusterfs.com (Andreas Dilger)
Date: Sun, 27 Nov 2005 02:00:12 -0700
Subject: Assertion failure in ext3_sync_file() at fs/ext3/fsync.c:50: "ext3_journal_current_handle() == 0"
In-Reply-To: <20051124214257.4673.qmail@science.horizon.com>
References: <20051124214257.4673.qmail@science.horizon.com>
Message-ID: <20051127090012.GU14509@schatzie.adilger.int>

On Nov 24, 2005 16:42 -0500, linux at horizon.com wrote:
> ------------[ cut here ]------------
> kernel BUG at fs/ext3/fsync.c:50!
> Process aptitude (pid: 26952, threadinfo=c13bc000 task=d99cca80)
> Call Trace:
> [] msync_interval+0x8e/0xd0
> [] sys_msync+0x15f/0x171
> [] syscall_call+0x7/0xb

This BUG is:

J_ASSERT(ext3_journal_current_handle() == 0);

which means that somehow the aptitude process struct had a journal handle
still active when it shouldn't have. Are there any console messages or
before the BUG, or just ENOSPC from the program? Either way, I'd suspect
a bug in the error handling code not doing a journal_stop() before exiting
a function somewhere...

> Here's the tune2fs -l output. No idea why it says "clean"; it is still
> mounted read/write.
>
> Filesystem features: has_journal filetype needs_recovery sparse_super
> Filesystem state: clean

FYI - all ext3 filesystems say "clean" all the time, because when the
journal replay is completed (note "needs_recovery" flag above) the
filesystem will in fact be clean (i.e. not needing an e2fsck). If this
were "error" (after the kernel detected some on-disk error) then you'd
get a full e2fsck on boot regardless of ext3 recovery or not.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
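The combination described in this reply (state "clean" together with the needs_recovery flag) comes from two separate superblock fields. A small sketch, assuming the standard ext2/ext3 superblock layout (16-bit s_state at offset 58 and 32-bit s_feature_incompat at offset 96, both little-endian); `describe_state` is a made-up helper that only looks at bytes handed to it and never touches a device:

```python
import struct

EXT2_VALID_FS = 0x0001                  # s_state bit reported as "clean"
EXT2_ERROR_FS = 0x0002                  # s_state bit set after an on-disk error
EXT3_FEATURE_INCOMPAT_RECOVER = 0x0004  # journal needs replay ("needs_recovery")


def describe_state(superblock):
    """Summarise s_state and the needs_recovery flag from raw superblock bytes."""
    (state,) = struct.unpack_from("<H", superblock, 58)
    (incompat,) = struct.unpack_from("<I", superblock, 96)
    return {
        "clean": bool(state & EXT2_VALID_FS),
        "errors": bool(state & EXT2_ERROR_FS),
        "needs_recovery": bool(incompat & EXT3_FEATURE_INCOMPAT_RECOVER),
    }
```

For a mounted ext3 filesystem like the one in this thread, the expected result would be clean and needs_recovery both true with errors false, which matches the quoted tune2fs lines.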
From linux at horizon.com Tue Nov 29 02:36:30 2005
From: linux at horizon.com (linux at horizon.com)
Date: 28 Nov 2005 21:36:30 -0500
Subject: Assertion failure in ext3_sync_file() at fs/ext3/fsync.c:50: "ext3_journal_current_handle() == 0"
In-Reply-To: <20051127090012.GU14509@schatzie.adilger.int>
Message-ID: <20051129023630.8145.qmail@science.horizon.com>

> which means that somehow the aptitude process struct had a journal handle
> still active when it shouldn't have. Are there any console messages or
> before the BUG, or just ENOSPC from the program? Either way, I'd suspect
> a bug in the error handling code not doing a journal_stop() before exiting
> a function somewhere...

Sorry, nothing for 5 minutes, and that's just a martian packet. :-(

>> Filesystem state: clean

> FYI - all ext3 filesystems say "clean" all the time, because when the
> journal replay is completed (note "needs_recovery" flag above) the
> filesystem will in fact be clean (i.e. not needing an e2fsck). If this
> were "error" (after the kernel detected some on-disk error) then you'd
> get a full e2fsck on boot regardless of ext3 recovery or not.

Neat, thanks!
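The failure mode suspected in the quoted text (an error path that returns without calling journal_stop(), so the task still carries a handle when it later reaches the sync path) can be modelled in user space. This is a toy illustration only and not kernel code: the `Task` class, `write_leaky`, `write_fixed` and `sync_file` are invented for the example, and a `disk_full` flag stands in for the ENOSPC condition.

```python
import errno


class Task:
    """Stand-in for a process; journal_info mimics the per-task current handle."""
    journal_info = None


def journal_start(task):
    task.journal_info = object()
    return task.journal_info


def journal_stop(task):
    task.journal_info = None


def write_leaky(task, disk_full):
    journal_start(task)
    if disk_full:
        # Bug being modelled: the early return skips journal_stop().
        return -errno.ENOSPC
    journal_stop(task)
    return 0


def write_fixed(task, disk_full):
    journal_start(task)
    try:
        if disk_full:
            return -errno.ENOSPC
        return 0
    finally:
        journal_stop(task)  # runs on every exit path


def sync_file(task):
    # Same shape as J_ASSERT(ext3_journal_current_handle() == 0).
    assert task.journal_info is None, "handle still active on entry to sync"
    return 0
```

Calling `write_leaky(task, True)` followed by `sync_file(task)` trips the assertion, while the same sequence with `write_fixed` does not. Whether the real bug has this exact shape is not established in the thread.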