From cajun at cajuninc.com Wed Nov 1 01:15:30 2006 From: cajun at cajuninc.com (M. Lewis) Date: Tue, 31 Oct 2006 20:15:30 -0500 Subject: e2fsck: Bad magic number in super-block Message-ID: <4547F532.3030504@cajuninc.com> I posted this to the Fedora-list, but thought I might get some additional information here as well. I have a HD that refuses to mount with a 'bad magic number in super-block'. I'm running FedoraCore 6 x86_64. [root at moe ~]# fdisk -l /dev/hdc Disk /dev/hdc: 250.0 GB, 250059350016 bytes 255 heads, 63 sectors/track, 30401 cylinders Units = cylinders of 16065 * 512 = 8225280 bytes Device Boot Start End Blocks Id System /dev/hdc1 * 1 13 104391 83 Linux /dev/hdc2 14 9729 78043770 8e Linux LVM [root at moe ~]# mount -t ext3 /dev/hdc2 /Big-Drive/ mount: wrong fs type, bad option, bad superblock on /dev/hdc2, missing codepage or other error In some cases useful info is found in syslog - try dmesg | tail or so [root at moe ~]# e2fsck -b 11239425 /dev/hdc2 e2fsck 1.39 (29-May-2006) e2fsck: Invalid argument while trying to open /dev/hdc2 The superblock could not be read or does not describe a correct ext2 filesystem. If the device is valid and it really contains an ext2 filesystem (and not swap or ufs or something else), then the superblock is corrupt, and you might try running e2fsck with an alternate superblock: e2fsck -b 8193 [root at moe ~]# !624 mke2fs -n /dev/hdc2 mke2fs 1.39 (29-May-2006) Filesystem label= OS type: Linux Block size=4096 (log=2) Fragment size=4096 (log=2) 9764864 inodes, 19510942 blocks 975547 blocks (5.00%) reserved for the super user First data block=0 Maximum filesystem blocks=4294967296 596 block groups 32768 blocks per group, 32768 fragments per group 16384 inodes per group Superblock backups stored on blocks: 32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 4096000, 7962624, 11239424 I've tried 'e2fsck -b (superblock) /dev/hdc2 on all the superblocks listed above to no avail. 
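The backup list printed by mke2fs -n above is deterministic: with the sparse_super feature, superblock copies live only in block groups 1 and in groups whose number is a power of 3, 5 or 7. A small sketch of that rule (a reconstruction from the on-disk layout, not e2fsprogs code) reproduces the exact list above for a 19510942-block filesystem with 32768 blocks per group:

```python
def backup_superblocks(blocks_per_group, total_blocks, first_data_block=0):
    """Block numbers holding superblock backups under sparse_super.

    Backups sit in group 1 and in every group numbered a power of 3, 5 or 7.
    (Group 0 holds the primary superblock, so it is not a "backup".)
    """
    groups = total_blocks // blocks_per_group + (1 if total_blocks % blocks_per_group else 0)
    wanted = {1}
    for base in (3, 5, 7):
        p = base
        while p < groups:
            wanted.add(p)
            p *= base
    return sorted(first_data_block + g * blocks_per_group for g in wanted if g < groups)

# Matches the "Superblock backups stored on blocks:" list from mke2fs -n above:
print(backup_superblocks(32768, 19510942))
```

If e2fsck -b fails with "invalid argument" at every one of these locations, the numbers are not the problem; the device may simply not contain a bare ext3 filesystem at those offsets at all (note that the fdisk output above types /dev/hdc2 as Linux LVM, not Linux).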
I've read about 'mke2fs -S' as being a possible solution, however I see that it is recommended as a last resort. Therefore I have held off on trying that method. I'm afraid I'm toasted, however I'm still hopeful that I might recover some (or all) of my data. Have I overlooked something?

Thanks,
Mike

--
IBM: Insanely Better Marketing
 18:20:01 up 1 day, 4:08, 0 users, load average: 0.12, 0.27, 0.25
Linux Registered User #241685 http://counter.li.org

From mnalis-ml at voyager.hr Wed Nov 1 11:58:37 2006
From: mnalis-ml at voyager.hr (Matija Nalis)
Date: Wed, 1 Nov 2006 12:58:37 +0100
Subject: e2fsck: Bad magic number in super-block
In-Reply-To: <4547F532.3030504@cajuninc.com>
References: <4547F532.3030504@cajuninc.com>
Message-ID: <20061101115837.GA3046@eagle102.home.lan>

On Tue, Oct 31, 2006 at 08:15:30PM -0500, M. Lewis wrote:
> I posted this to the Fedora-list, but thought I might get some
> additional information here as well.
>
> I have a HD that refuses to mount with a 'bad magic number in
> super-block'. I'm running FedoraCore 6 x86_64.
>
> [root at moe ~]# fdisk -l /dev/hdc
>
> Disk /dev/hdc: 250.0 GB, 250059350016 bytes
> 255 heads, 63 sectors/track, 30401 cylinders
> Units = cylinders of 16065 * 512 = 8225280 bytes
>
>    Device Boot    Start       End    Blocks   Id  System
> /dev/hdc1   *         1        13    104391   83  Linux
> /dev/hdc2            14      9729  78043770   8e  Linux LVM

Are you ABSOLUTELY SURE that /dev/hdc2 really contains DIRECTLY the ext3
filesystem?

By the fdisk output, it looks like it is not an ext3 partition but a
physical volume controlled by LVM, so you should use the LVM tools to find
the real data (vgscan, vgdisplay, pvscan, lvscan, ...).

The ext3 partition you are looking for is probably on a logical volume on
that LVM...

--
Opinions above are GNU-copylefted.

From cajun at cajuninc.com Wed Nov 1 19:27:16 2006
From: cajun at cajuninc.com (M.
Lewis) Date: Wed, 01 Nov 2006 14:27:16 -0500 Subject: e2fsck: Bad magic number in super-block In-Reply-To: <20061101115837.GA3046@eagle102.home.lan> References: <4547F532.3030504@cajuninc.com> <20061101115837.GA3046@eagle102.home.lan> Message-ID: <4548F514.60702@cajuninc.com> Matija Nalis wrote: > On Tue, Oct 31, 2006 at 08:15:30PM -0500, M. Lewis wrote: >> I posted this to the Fedora-list, but thought I might get some >> additional information here as well. >> >> I have a HD that refuses to mount with a 'bad magic number in >> super-block'. I'm running FedoraCore 6 x86_64. >> >> [root at moe ~]# fdisk -l /dev/hdc >> >> Disk /dev/hdc: 250.0 GB, 250059350016 bytes >> 255 heads, 63 sectors/track, 30401 cylinders >> Units = cylinders of 16065 * 512 = 8225280 bytes >> >> Device Boot Start End Blocks Id System >> /dev/hdc1 * 1 13 104391 83 Linux >> /dev/hdc2 14 9729 78043770 8e Linux LVM > > are you ABSOLUTELY SURE that /dev/hdc2 really containt DIRECTLY the ext3 > filesystem ? > > By FDISK output, it looks like it is not ext3 partition, but a physical > volume controlled by LVM, so you should use LVM tools to find real data > (vgscan, vgdisplay, pvscan, lvscan, ...) > > the ext3 partition you are looking for is probably on logical volume on that > LVM... > Thanks Matija. No, at this point the only thing I am sure of is I can't mount the drive with my data. I'm not familiar with any of the LVM tools. Here's the output of the tools you suggested: [root at moe ~]# vgscan Reading all physical volumes. This may take a while... 
Found volume group "VolGroup00" using metadata type lvm2 [root at moe ~]# vgdisplay --- Volume group --- VG Name VolGroup00 System ID Format lvm2 Metadata Areas 1 Metadata Sequence No 3 VG Access read/write VG Status resizable MAX LV 0 Cur LV 2 Open LV 2 Max PV 0 Cur PV 1 Act PV 1 VG Size 74.41 GB PE Size 32.00 MB Total PE 2381 Alloc PE / Size 2380 / 74.38 GB Free PE / Size 1 / 32.00 MB VG UUID vXWCaM-XkRG-l28x-tyH1-Rrf4-KjXF-AfASLY [root at moe ~]# pvscan PV /dev/hda2 VG VolGroup00 lvm2 [74.41 GB / 32.00 MB free] Total: 1 [74.41 GB] / in use: 1 [74.41 GB] / in no VG: 0 [0 ] [root at moe ~]# lvscan ACTIVE '/dev/VolGroup00/LogVol00' [72.44 GB] inherit ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit -- Dreams are free, but you get soaked on the connect time. 14:25:01 up 2 days, 13 min, 0 users, load average: 0.09, 0.28, 0.25 Linux Registered User #241685 http://counter.li.org From bryan at kadzban.is-a-geek.net Wed Nov 1 22:36:34 2006 From: bryan at kadzban.is-a-geek.net (Bryan Kadzban) Date: Wed, 01 Nov 2006 17:36:34 -0500 Subject: e2fsck: Bad magic number in super-block In-Reply-To: <4548F514.60702@cajuninc.com> References: <4547F532.3030504@cajuninc.com> <20061101115837.GA3046@eagle102.home.lan> <4548F514.60702@cajuninc.com> Message-ID: <45492172.3050000@kadzban.is-a-geek.net> I likewise know very little about the LVM tools, but this: M. Lewis wrote: > [root at moe ~]# lvscan > ACTIVE '/dev/VolGroup00/LogVol00' [72.44 GB] inherit > ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit implies to me that you might try fdisk'ing /dev/VolGroup00/LogVol00 and /dev/VolGroup00/LogVol01 instead. Perhaps one of those has a partition table that you could fsck. (Or perhaps both of those *are* partitions that you can fsck.) -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc
Type: application/pgp-signature
Size: 258 bytes
Desc: OpenPGP digital signature
URL: 

From haven at thehavennet.org.uk Thu Nov 2 16:13:34 2006
From: haven at thehavennet.org.uk (Simon Alman)
Date: Thu, 2 Nov 2006 16:13:34 -0000 (UTC)
Subject: RHEL conundrum with df and du
Message-ID: <40732.84.12.36.186.1162484014.squirrel@saratoga.thehavennet.org.uk>

Hi All

I am having an issue with disk space, and since it is happening on an
ext3-formatted partition, I felt that this would be the most appropriate
list to post to. The problem is this: I have access to two RHEL systems
(one running RHEL3 and one running RHEL4). Both are in the same company and
both exhibit the same problem, which I have not seen anywhere else before.

Both have full / partitions. df shows this to be the case, so it must be
true ... right? Well, du disagrees: on one system with a full 20GB /
partition that df shows to be full, du can only find 3.6GB of files.

So off to the Red Hat FAQ I went and found this:
http://kbase.redhat.com/faq/FAQ_35_5209.shtm

Great, that explains the problem precisely ... except it didn't help. No
processes were holding large deleted files in a locked state. So I looked
at inodes, thinking that they may have all been used up ... they haven't;
df shows only 5% inode usage. I forced an fsck run on the partition on
reboot, and neither this nor the reboot helped; fsck shows clean, and after
the reboot things were still broken.

So I'm sat here trying to explain to the client why their 5k worth of Dell
server is currently not much use to them (I can't install anything on it
due to the space issue ...).

Has anyone come across anything similar before? I've worked with Linux for
six years and this is a new one on me ... and twice in the same company. I
suspect major kernel b0rkage, since both systems use Dell's RHEL build for
the specific model of server, but proving it is beyond me right now.

Any help/advice would be very gratefully appreciated.
Kind regards

Simon Alman

From neilb at cse.unsw.edu.au Fri Nov 3 01:09:09 2006
From: neilb at cse.unsw.edu.au (Neil Brown)
Date: Fri, 3 Nov 2006 12:09:09 +1100
Subject: question about exact behaviour with data=ordered.
Message-ID: <17738.38581.928915.23988@cse.unsw.edu.au>

Suppose I have a large machine with 24Gig of memory, and I write a 2 gig
file. This is below 10% and so background_dirty_threshold won't cause any
write out. Suppose that a regular journal commit is then triggered.

Am I correct in thinking this will flush out the full 2Gig, causing the
commit to take about 30 seconds if the drive sustains 60Meg/second?

If so, what other operations will be blocked while the commit happens? I
assume sync updates (rm, chmod, mkdir etc) will block? Is it safe to assume
that normally async writes won't block? What about if they extend the file
and so change the file size? What about atime updates? Could they ever
block for the full 30 seconds?

Supposing lots of stuff would block for 30 seconds, is there anything that
could be done to improve this? Would it be possible (easy?) to modify the
commit process to flush out 'ordered' data without locking the journal?

As you might guess, we have a situation where writing large files on a
large-memory machine is causing occasional bad fs delays, and I'm trying
to understand what is going on.

Thanks for any input,
NeilBrown

From keld at dkuug.dk Sun Nov 5 00:25:22 2006
From: keld at dkuug.dk (Keld Jørn Simonsen)
Date: Sun, 5 Nov 2006 01:25:22 +0100
Subject: compressed read-only ext3 file system
Message-ID: <20061105002522.GA6981@rap.rap.dk>

Hi

I am looking for a compressed ext3 file system, for a read-only purpose.
The idea is that I would like to make a live-cd that is fast to install.
The install should be almost just a raw copy of what is on the cd,
uncompressed. In that way I should be able to make an install of a full
Linux system of, say, 3 GB in under 2 minutes.
The fs type I would like to unpack is ext3 - but other fs types should be doable as well. Then I would like to run the cd from the cdrom drive, so some kind of live-cd running code should also be available. Has this been done before? Is the idea feasible? best regards keld From adilger at clusterfs.com Tue Nov 7 00:06:02 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Mon, 6 Nov 2006 17:06:02 -0700 Subject: e2defrag - Unable to allocate buffer for inode priorities In-Reply-To: <87iri0ma8s.fsf@informatik.uni-tuebingen.de> References: <20061031171050.GG5655@schatzie.adilger.int> <20061031192947.GA12277@thunk.org> <87iri0ma8s.fsf@informatik.uni-tuebingen.de> Message-ID: <20061107000602.GE6012@schatzie.adilger.int> On Oct 31, 2006 22:44 +0100, Goswin von Brederlow wrote: > It should be doing that (checking for ext3 I can confirm) as of > > It doesn't handle ext3 right and does know so: > > # mke2fs -j /dev/ram0 > # e2defrag -r /dev/ram0 > > e2defrag (/dev/ram0): ext3 filesystems not (yet) supported > > It hapily defrags a filesystem with resize_inode. Is it destroying > resize capability or directly destroying data? It is destroying the resize capability. The primary issue here is that tools which manipulate the filesystem directly (e.g. e2fsprogs) have to understand ALL of the *COMPAT flags, and not just the INCOMPAT flags. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. 
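The point about *COMPAT flags can be made concrete. The ext2/ext3 superblock (1024 bytes into the volume) carries three little-endian 32-bit feature bitmaps, and a tool that rewrites the filesystem directly should refuse to touch it on ANY bit it does not recognise, in all three sets, not just the incompat one. A hedged sketch, assuming the classic superblock field offsets; the flag tables below list only the common flags, not a complete set:

```python
import struct

EXT2_MAGIC = 0xEF53  # s_magic, at offset 56 within the superblock

# Feature bits this (hypothetical) tool claims to understand.
KNOWN_COMPAT   = {0x0001: "dir_prealloc", 0x0004: "has_journal",
                  0x0008: "ext_attr", 0x0010: "resize_inode",
                  0x0020: "dir_index"}
KNOWN_INCOMPAT = {0x0001: "compression", 0x0002: "filetype",
                  0x0004: "recover", 0x0008: "journal_dev"}
KNOWN_RO       = {0x0001: "sparse_super", 0x0002: "large_file"}

def check_features(sb):
    """Return the unknown bits per feature set of a raw 1024-byte superblock."""
    magic, = struct.unpack_from("<H", sb, 56)
    if magic != EXT2_MAGIC:
        raise ValueError("bad magic number in super-block")
    # s_feature_compat, s_feature_incompat, s_feature_ro_compat at 92/96/100
    compat, incompat, ro = struct.unpack_from("<III", sb, 92)
    unknown = {}
    for name, bits, known in (("compat", compat, KNOWN_COMPAT),
                              ("incompat", incompat, KNOWN_INCOMPAT),
                              ("ro_compat", ro, KNOWN_RO)):
        extra = bits & ~sum(known)   # bits set that we have no name for
        if extra:
            unknown[name] = extra
    return unknown  # a rewriting tool should bail out unless this is empty
```

For a read-only mount, unknown COMPAT and even RO_COMPAT bits are tolerable; for a tool like e2defrag that rewrites block allocations, any unknown bit in any set means "stop", which is exactly the resize_inode failure described above.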
From brederlo at informatik.uni-tuebingen.de Tue Nov 7 03:36:17 2006 From: brederlo at informatik.uni-tuebingen.de (Goswin von Brederlow) Date: Tue, 07 Nov 2006 04:36:17 +0100 Subject: e2defrag - Unable to allocate buffer for inode priorities In-Reply-To: <20061107000602.GE6012@schatzie.adilger.int> (Andreas Dilger's message of "Mon, 6 Nov 2006 17:06:02 -0700") References: <20061031171050.GG5655@schatzie.adilger.int> <20061031192947.GA12277@thunk.org> <87iri0ma8s.fsf@informatik.uni-tuebingen.de> <20061107000602.GE6012@schatzie.adilger.int> Message-ID: <87bqnknd1q.fsf@informatik.uni-tuebingen.de> Andreas Dilger writes: > On Oct 31, 2006 22:44 +0100, Goswin von Brederlow wrote: >> It should be doing that (checking for ext3 I can confirm) as of >> >> It doesn't handle ext3 right and does know so: >> >> # mke2fs -j /dev/ram0 >> # e2defrag -r /dev/ram0 >> >> e2defrag (/dev/ram0): ext3 filesystems not (yet) supported >> >> It hapily defrags a filesystem with resize_inode. Is it destroying >> resize capability or directly destroying data? > > It is destroying the resize capability. The primary issue here is > that tools which manipulate the filesystem directly (e.g. e2fsprogs) > have to understand ALL of the *COMPAT flags, and not just the INCOMPAT > flags. > > Cheers, Andreas Defrag should leave special inodes well enough alone (and I know it does not, hence the ext3 incompatibiliy) and then it should preserve all compat features. Time for some more fixing. 
MfG Goswin From adilger at clusterfs.com Tue Nov 7 17:30:51 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 7 Nov 2006 10:30:51 -0700 Subject: e2defrag - Unable to allocate buffer for inode priorities In-Reply-To: <87bqnknd1q.fsf@informatik.uni-tuebingen.de> References: <20061031171050.GG5655@schatzie.adilger.int> <20061031192947.GA12277@thunk.org> <87iri0ma8s.fsf@informatik.uni-tuebingen.de> <20061107000602.GE6012@schatzie.adilger.int> <87bqnknd1q.fsf@informatik.uni-tuebingen.de> Message-ID: <20061107173051.GH6012@schatzie.adilger.int> On Nov 07, 2006 04:36 +0100, Goswin von Brederlow wrote: > Andreas Dilger writes: > > The primary issue here is > > that tools which manipulate the filesystem directly (e.g. e2fsprogs) > > have to understand ALL of the *COMPAT flags, and not just the INCOMPAT > > flags. > > Defrag should leave special inodes well enough alone (and I know it > does not, hence the ext3 incompatibiliy) and then it should preserve > all compat features. Compat features are not always related to special inodes. For example, the COMPAT_DIR_INDEX feature is for the directory indexing and has nothing to do with special inodes. In this case, defrag shouldn't break the indexing, but it depends on each specific feature. Hence my assertion that defrag needs to understand EVERY feature flag in the filesystem before it touches the filesystem. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From raghuni at cossindia.org Thu Nov 9 06:44:34 2006 From: raghuni at cossindia.org (Raghu Ni) Date: Thu, 9 Nov 2006 12:14:34 +0530 Subject: How to create a huge file system - 3-4TB? Message-ID: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com> We have a server with about 6x750Gb SATA drives setup on a hardware RAID controller. We created hardware RAID 5 on these 6x750GB HDDs. The effective size after RAID 5 implementation is 3.4TB. This server we want to use it as a data backup server. 
Here is the problem we are stuck with: when we use fdisk -l, we can see the
drive specs and its size as 3.4TB. But when we want to create two different
partitions of 1.7TB each, then we get the error "out of range" while
specifying cylinders.

And if we go for one single partition of 3.4TB, mke2fs returns an error
when we format the partition for the ext3 file system, and after some
specific duration it exits with an error "Inodes not found... " and
similar errors.

Any help / suggestions / ideas to get around this problem are highly
appreciated.

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From adilger at clusterfs.com Thu Nov 9 08:24:19 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Thu, 9 Nov 2006 01:24:19 -0700
Subject: How to create a huge file system - 3-4TB?
In-Reply-To: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
References: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
Message-ID: <20061109082419.GC6012@schatzie.adilger.int>

On Nov 09, 2006 12:14 +0530, Raghu Ni wrote:
> We have a server with about 6x750Gb SATA drives setup on a hardware RAID
> controller. We created hardware RAID 5 on these 6x750GB HDDs. The
> effective size after RAID 5 implementation is 3.4TB. This server we want
> to use as a data backup server.
>
> Here is the problem we are stuck with: when we use fdisk -l, we can see
> the drive specs and its size as 3.4TB. But when we want to create two
> different partitions of 1.7TB each, then we get the error "out of range"
> while specifying cylinders.
>
> And if we go for one single partition of 3.4TB, mke2fs returns an error
> when we format the partition for the ext3 file system, and after some
> specific duration it exits with an error "Inodes not found... " and
> similar errors.

Don't use a partition at all. Just make the filesystem directly on the
whole device (e.g. mke2fs /dev/sda).

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.
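The "out of range" error while specifying cylinders has a concrete cause worth spelling out: a DOS/MBR partition entry stores the start sector and the sector count as 32-bit values, so with 512-byte sectors nothing beyond 2 TiB is addressable, and a 3.4 TB disk cannot be fully partitioned that way (the second 1.7 TB partition would have to start past the limit). A quick arithmetic sketch:

```python
SECTOR = 512                   # bytes; the classic MBR assumption
MBR_LIMIT = 2**32 * SECTOR     # largest byte range a 32-bit LBA can address

disk_bytes = 3_400_000_000_000  # the ~3.4 TB array described above
half_bytes = 1_700_000_000_000  # each intended partition

print(MBR_LIMIT)                 # 2199023255552 bytes, i.e. 2 TiB
print(disk_bytes > MBR_LIMIT)    # True: the tail of the disk is unreachable
print(half_bytes < MBR_LIMIT)    # True: each half fits, but the second one
                                 # would have to *start* beyond the limit
```

A GPT disklabel stores LBAs as 64-bit values instead, which is why it does not hit this wall; skipping the partition table entirely, as suggested above, sidesteps it as well.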
From raghuni at cossindia.org Thu Nov 9 09:12:00 2006 From: raghuni at cossindia.org (Raghu Ni) Date: Thu, 9 Nov 2006 14:42:00 +0530 Subject: How to create a huge file system - 3-4TB? In-Reply-To: <20061109082419.GC6012@schatzie.adilger.int> References: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com> <20061109082419.GC6012@schatzie.adilger.int> Message-ID: <6b16facb0611090112n3a2d59edl9177833953def72b@mail.gmail.com> Can also use this technique for md device ? On 11/9/06, Andreas Dilger wrote: > > On Nov 09, 2006 12:14 +0530, Raghu Ni wrote: > > We have a server with about 6x750Gb SATA drives setup on a hardware RAID > > controller. We created hardware RAID 5 on these 6x750GB HDDs. The > effective > > size after RAID 5 implementation is 3.4TB. This server we want to use it > as > > a data backup server. > > > > Here is the problem we are stuck with, when we use fdisk -l, we can see > the > > drive specs and its size as 3.4TB. But when we want to create two > different > > partitions of 1.7TB each, then we get the error "out of range" while > > specifying cylinders. > > > > And if we go for one single partition of 3.4TB, mke2fs returns error > when we > > format the partition for ext3 file system and after some specific > duration > > it exits with a error "Inodes not found... " similar errors. > > Don't use a partition at all. Just make the filesystem directly on the > whole > device (e.g. mke2fs /dev/sda). > > Cheers, Andreas > -- > Andreas Dilger > Principal Software Engineer > Cluster File Systems, Inc. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jlb17 at duke.edu Thu Nov 9 13:03:03 2006 From: jlb17 at duke.edu (Joshua Baker-LePain) Date: Thu, 9 Nov 2006 08:03:03 -0500 (EST) Subject: How to create a huge file system - 3-4TB? 
In-Reply-To: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
References: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
Message-ID: 

On Thu, 9 Nov 2006 at 12:14pm, Raghu Ni wrote

> Here is the problem we are stuck with: when we use fdisk -l, we can see
> the drive specs and its size as 3.4TB. But when we want to create two
> different partitions of 1.7TB each, then we get the error "out of range"
> while specifying cylinders.
>
> And if we go for one single partition of 3.4TB, mke2fs returns an error
> when we format the partition for the ext3 file system, and after some
> specific duration it exits with an error "Inodes not found... " and
> similar errors.
>
> Any help / suggestions / ideas to get around this problem are highly
> appreciated.

fdisk can't handle devices larger than 2TiB. If you really want to use
partitions, use parted and create a gpt disklabel (the standard msdos
won't work either). Note that you won't be able to boot from this disk.

--
Joshua Baker-LePain
Department of Biomedical Engineering
Duke University

From raghuni at cossindia.org Fri Nov 10 06:07:28 2006
From: raghuni at cossindia.org (Raghu Ni)
Date: Fri, 10 Nov 2006 11:37:28 +0530
Subject: How to create a huge file system - 3-4TB?
In-Reply-To: 
References: <6b16facb0611082244w2d355f3ka9a55e9ea31471c0@mail.gmail.com>
Message-ID: <6b16facb0611092207p2f090f1ch4d34465ac9172e14@mail.gmail.com>

Thanks for all your inputs. We tried parted and succeeded in creating two
1.7 TB partitions.

RaghuNi

On 11/9/06, Joshua Baker-LePain wrote:
>
> On Thu, 9 Nov 2006 at 12:14pm, Raghu Ni wrote
>
> > Here is the problem we are stuck with: when we use fdisk -l, we can see
> > the drive specs and its size as 3.4TB. But when we want to create two
> > different partitions of 1.7TB each, then we get the error "out of range"
> > while specifying cylinders.
> > And if we go for one single partition of 3.4TB, mke2fs returns an error
> > when we format the partition for the ext3 file system, and after some
> > specific duration it exits with an error "Inodes not found... " and
> > similar errors.
> >
> > Any help / suggestions / ideas to get around this problem are highly
> > appreciated.
>
> fdisk can't handle devices larger than 2TiB. If you really want to use
> partitions, use parted and create a gpt disklabel (the standard msdos
> won't work either). Note that you won't be able to boot from this disk.
>
> --
> Joshua Baker-LePain
> Department of Biomedical Engineering
> Duke University

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From scinteeb at yahoo.com Thu Nov 16 01:52:50 2006
From: scinteeb at yahoo.com (Bogdan Scintee)
Date: Wed, 15 Nov 2006 17:52:50 -0800 (PST)
Subject: ext3 corrupted
Message-ID: <20061116015250.76652.qmail@web56503.mail.re3.yahoo.com>

Hi there,

For years I've been using the ext3 file system without ever thinking it
could get broken this badly. That was until last week, when a box I have
running Linux from a SanDisk CF went down. Since then I have been
struggling with this CF, trying to understand what is happening.

The CF is a SanDisk Ultra II 1GB. On this I have 4 partitions, all of them
with ext3:
boot
/
swap
data

On the "data" partition I am doing a kind of logging. On power failure the
box went down and never came back. The problem is of course the data
partition.

Running e2fsck /dev/sdc8 returns:

e2fsck 1.38 (30-Jun-2005)
/dev/sdc8: Attempt to read block from filesystem resulted in short read while reading block 275
/dev/sdc8: Attempt to read block from filesystem resulted in short read reading journal superblock
e2fsck: Attempt to read block from filesystem resulted in short read while checking ext3 journal for /dev/sdc8

which looks like the result of the write access on power failure.
Then I did mke2fs -n /dev/sdc8:

mke2fs 1.38 (30-Jun-2005)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
108544 inodes, 433940 blocks
21697 blocks (5.00%) reserved for the super user
First data block=1
53 block groups
8192 blocks per group, 8192 fragments per group
2048 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

I then performed fsck.ext3 using the backup superblocks
(e2fsck -c -b /dev/sdc8), but I constantly got the same result:

e2fsck 1.38 (30-Jun-2005)
/dev/sdc8: Attempt to read block from filesystem resulted in short read while reading block 275
/dev/sdc8: Attempt to read block from filesystem resulted in short read reading journal superblock
e2fsck: Attempt to read block from filesystem resulted in short read while checking ext3 journal for /dev/sdc8

The disappointing thing is that a Windows application (Nucleus Kernel
Linux) shows the file system and recovers the partition completely in a
very short time.

Is there any suggestion about what I should do?

Best regards,
Bogdan.

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From lists at nerdbynature.de Thu Nov 16 07:45:27 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Thu, 16 Nov 2006 07:45:27 +0000 (GMT)
Subject: ext3 corrupted
In-Reply-To: <20061116015250.76652.qmail@web56503.mail.re3.yahoo.com>
References: <20061116015250.76652.qmail@web56503.mail.re3.yahoo.com>
Message-ID: 

On Wed, 15 Nov 2006, Bogdan Scintee wrote:
> e2fsck 1.38 (30-Jun-2005)

If you can, please upgrade to a current version of e2fsprogs:
http://sourceforge.net/project/showfiles.php?group_id=2406

> /dev/sdc8: Attempt to read block from filesystem resulted in short read while reading block 275
> /dev/sdc8: Attempt to read block from filesystem resulted in short read reading journal superblock
> e2fsck: Attempt to read block from file-system resulted in short read while checking ext3 journal for /dev/sdc8

Please check your system logfiles/dmesg for disk/IO errors. If there are
any, your best bet is to save the raw data to a safe place (with "dd", or
better "dd_rescue") and try e2fsck again on the saved image, e.g.:

# dd_rescue if=/dev/your_CF_disk of=~/not-on-your-CF-disk/CF.img
# e2fsck ~/not-on-your-CF-disk/CF.img

...and see what it gives.

Christian.
--
BOFH excuse #413: Cow-tippers tipped a cow onto the server.
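The reason to prefer dd_rescue over plain dd here is its error handling: plain dd aborts on the first unreadable sector, while dd_rescue skips the bad region and keeps copying, so one bad spot doesn't cost you the rest of the image. A toy sketch of that idea (a hypothetical helper for illustration, not dd_rescue's actual algorithm, which also shrinks the block size near errors and can copy backwards):

```python
def rescue_copy(src, dst, blocksize=4096):
    """Copy src to dst; zero-fill blocks that fail to read instead of aborting.

    src/dst are seekable binary file objects. On a read error we write
    blocksize zeros and move on, which may overshoot slightly at end-of-device.
    """
    offset, bad = 0, 0
    while True:
        src.seek(offset)
        try:
            block = src.read(blocksize)
        except OSError:              # e.g. EIO from a failing sector
            block = b"\0" * blocksize
            bad += 1
        if not block:                # clean EOF
            break
        dst.seek(offset)
        dst.write(block)
        offset += len(block)
    return offset, bad               # bytes copied, unreadable blocks skipped
```

The resulting image then gives e2fsck something it can actually read end to end, with the unreadable regions showing up as zeroed blocks rather than I/O errors.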
From lists at nerdbynature.de Thu Nov 16 19:59:46 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Thu, 16 Nov 2006 19:59:46 +0000 (GMT) Subject: ext3 corrupted In-Reply-To: <20061116191824.4948.qmail@web56503.mail.re3.yahoo.com> References: <20061116191824.4948.qmail@web56503.mail.re3.yahoo.com> Message-ID: [ please reply on-list, so that others can help too ] On Thu, 16 Nov 2006, Bogdan Scintee wrote: > Buffer I/O error on device sdc8, logical block 512 > sd 4:0:0:0: SCSI error: return code = 0x8000002 > sdc: Current: sense key: Medium Error > Additional sense: Unrecovered read error > end_request: I/O error, dev sdc, sector 1134514 So, it really is the device generating the errors, the filesystem can't do much here :( > I got finally a img file. Running the e2fsck on this image I got > > e2fsprogs-1.39/e2fsck/e2fsck -v -d sdc8.img Very good, but I forgot one thing: if you have enough space, make a backup of sdc8.img, then try e2fsck. If e2fsck screws up, you still have the backup, even if the device (your CF card) fails completly. > e2fsck 1.39 (29-May-2006) > Superblock has an invalid ext3 journal (inode 8). > Clear? yes > *** ext3 journal has been deleted - filesystem is now ext2 only *** > Superblock doesn't have has_journal flag, but has ext3 journal inode. > Clear? yes > sdc8.img was not cleanly unmounted, check forced. > Pass 1: Checking inodes, blocks, and sizes > Journal inode is not in use, but contains data. Clear? yes > Pass 2: Checking directory structure > Directory inode 2, block 0, offset 0: directory corrupted > Salvage? yes > Missing '.' in directory inode 2. > Fix? yes > Setting filetype for entry '.' in ??? (2) to 2. > Missing '..' in directory inode 2. > Fix? yes > Setting filetype for entry '..' in ??? (2) to 2. > Pass 3: Checking directory connectivity > '..' in / (2) is (0), should be / (2). > Fix? yes > Unconnected directory inode 4097 (/???) > Connect to /lost+found? yes > /lost+found not found. Create? 
yes
> Unconnected directory inode 61441 (/???)
> Connect to /lost+found? yes
> Unconnected directory inode 8193 (/???)
> Connect to /lost+found? yes
> Unconnected directory inode 28673 (/???)
> Connect to /lost+found? yes
> Unconnected directory inode 30721 (/???)
> Connect to /lost+found? yes
> Unconnected directory inode 57345 (/???)
> Connect to /lost+found? yes
> Pass 4: Checking reference counts
> Inode 4097 ref count is 3, should be 2. Fix? yes
> Inode 8193 ref count is 5, should be 4. Fix? yes
> Inode 28673 ref count is 3, should be 2. Fix? yes
> Inode 30721 ref count is 7, should be 6. Fix? yes
> Inode 57345 ref count is 3, should be 2. Fix? yes
> Inode 61441 ref count is 9, should be 8. Fix? yes
> Pass 5: Checking group summary information
> Block bitmap differences: -(276--8192) -(8454--8761)
> Fix? yes
> Free blocks count wrong for group #0 (65535, counted=7917).
> Fix? yes
> Free blocks count wrong for group #1 (4763, counted=5071).
> Fix? yes
> Free blocks count wrong (285521, counted=293747).
> Fix? yes
>
> sdc8.img: ***** FILE SYSTEM WAS MODIFIED *****
> 1051 inodes used (0%)
> 125 non-contiguous inodes (11.9%)
> # of inodes with ind/dind/tind blocks: 405/68/0
> 140196 blocks used (32%)
> 0 bad blocks
> 0 large files
> 967 regular files
> 74 directories
> 0 character device files
> 0 block device files
> 0 fifos
> -6 links
> 0 symbolic links (0 fast symbolic links)
> 0 sockets
> --------
> 1035 files
>
> I tried different options during the recovery of the image, but
> unfortunately the result wasn't as expected. In the end part of the files
> were recovered, but the parent folder names were replaced by the inode
> number (I guess) and attached to the lost+found directory.
Yes, when you had a lot of small files, even 1 GB of data in lost+found is a pain to reconstruct :( > As I said in my previous e-mail the windoze application (Nucleus Kernel Linux) is very fast and seems to recover > more information from CF. I too came across some win32 tools to recover b0rked filesystems: one is called "R-Linux", the other one "Stellar Phoenix Linux", demos available. > I have a question now: The journal is kept only on the primary superblock or it has also copies on every alternative > superblock? I don't know... > My feeling is that the CF got a badblock exactly on the journal and the e2fsck can't correct the information, therefore > can't complete the job. > > Do you have any knowledge about a application which is able to handle such situation? Dunno, maybe somebody else will have a look on this... Christian. -- BOFH excuse #453: Spider infestation in warm case parts From oliver.hookins at anchor.com.au Thu Nov 16 01:42:09 2006 From: oliver.hookins at anchor.com.au (Oliver Hookins) Date: Thu, 16 Nov 2006 12:42:09 +1100 Subject: Online filesystem check Message-ID: <20061116014209.GA10244@captain.bridge.anchor.net.au> Hi all, a while ago one of my work colleagues attended a seminar with Ted Tso, and at the time Ted was talking about a way to perform online ext3 filesystem checks. I can't find any information on it by googling, is it implemented in current versions of the filesystem utilities? Either way, does anyone know where I can find more information on it? 
-- Regards, Oliver Hookins Anchor Systems From oliver.hookins at anchor.com.au Mon Nov 20 06:51:41 2006 From: oliver.hookins at anchor.com.au (Oliver Hookins) Date: Mon, 20 Nov 2006 17:51:41 +1100 Subject: Online filesystem check In-Reply-To: References: <20061116014209.GA10244@captain.bridge.anchor.net.au> Message-ID: <20061120065141.GA5612@captain.bridge.anchor.net.au> On Mon Nov 20, 2006 at 10:35:09 +0530, Saswat Praharaj wrote: >If you are talking about checking filesystem from the web , then it should >be straight forward. >You need to write a simple web server (free C source should be available in >the net). > >On "check file system" event, just invoke e2fsck and return the status to >browser. >You need to write an IPC to communicate between your webserver and e2fsck. > >Best, >-Saswat Sorry maybe you misunderstand. By "online" I am referring to "while the ext3 filesystem is mounted r/w". > >On 11/16/06, Oliver Hookins wrote: >> >>Hi all, >> >>a while ago one of my work colleagues attended a seminar with Ted Tso, and >>at the time Ted was talking about a way to perform online ext3 filesystem >>checks. I can't find any information on it by googling, is it implemented >>in >>current versions of the filesystem utilities? Either way, does anyone know >>where I can find more information on it? 
>> -- >>Regards, >>Oliver Hookins >>Anchor Systems >> >>_______________________________________________ >>Ext3-users mailing list >>Ext3-users at redhat.com >>https://www.redhat.com/mailman/listinfo/ext3-users >> -- Regards, Oliver Hookins Anchor Systems From lists at nerdbynature.de Tue Nov 21 19:09:41 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 21 Nov 2006 19:09:41 +0000 (GMT) Subject: 2.6.19-rc5-git4 benchmarks Message-ID: Apologies for the wide alias, but as it may interest several fs groups, here it is: In the everlasting search for the best fs for my shiny new disks, I was interested in some numbers, here're the results: http://nerdbynature.de/bench/amd64/2.6.19-rc5-git4/test-3/dm-crypt-3.html details: http://nerdbynature.de/wp/?cat=4 (in short: ext3 pretty fast in all operations. then again, the numbers suggest that sometimes a crypto-fs is faster than without crypto, e.g. 'ext3_no-cipher' vs. 'ext3_aes-cbc-essiv:md5'...that's strange, no?) Thanks, Christian. -- BOFH excuse #11: magnetic interference from money/credit cards From lists at nerdbynature.de Tue Nov 21 21:31:22 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 21 Nov 2006 21:31:22 +0000 (GMT) Subject: Online filesystem check In-Reply-To: <20061116014209.GA10244@captain.bridge.anchor.net.au> References: <20061116014209.GA10244@captain.bridge.anchor.net.au> Message-ID: On Thu, 16 Nov 2006, Oliver Hookins wrote: > checks. I can't find any information on it by googling, is it implemented in > current versions of the filesystem utilities? No, current versions of e2fsprogs do not support online fsck. The most "online" you can get is trying to remount ro, then fsck the ro device. > Either way, does anyone know where I can find more information on it? I'm not aware that this was a planned feature of ext2/3 fs. I've heard about the *BSD guys trying to get this done for UFS, but it's not implemented there either, AFAIK.
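[Editorial note: the "remount ro, then fsck" workaround mentioned above can be rehearsed without touching a real device, since e2fsprogs operates on ordinary files as well. A minimal, hedged sketch — the image path and sizes are made up, and the e2fsprogs tools (mke2fs, e2fsck) are assumed to be installed:]

```shell
# Create a small file-backed ext3 image, then check it read-only with e2fsck -n.
PATH="$PATH:/sbin:/usr/sbin"                 # e2fsprogs binaries often live in /sbin
dd if=/dev/zero of=/tmp/ext3-demo.img bs=1024 count=16384 2>/dev/null
mke2fs -q -F -j /tmp/ext3-demo.img           # -j adds the ext3 journal; -F: target is a file
e2fsck -fn /tmp/ext3-demo.img                # -n: open read-only, answer "no" to all fixes

# On a live server the same idea, with a real device and mount point, would be:
#   mount -o remount,ro /data
#   e2fsck -n /dev/sdb1
#   mount -o remount,rw /data
```

Note that `e2fsck -n` against a filesystem still mounted read-write can report spurious errors, which is exactly why a true online check is hard.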
Not being an fs guru, online fsck really sounds difficult and I'm not sure if it's worth the battle.... Christian. -- BOFH excuse #72: Satan did it From lists at nerdbynature.de Thu Nov 23 00:58:18 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Thu, 23 Nov 2006 00:58:18 +0000 (GMT) Subject: BUG: warning at kernel/softirq.c:141 Message-ID: Hello ext3-users, we have an oopsy situation here: we have 4 machines: 3 client nodes, 1 master: the master holds a fairly big repository of small files. The repo's current size is ~40GB with ~1.2 M files in ~100 directories. Now, we like to rsync changes from the master to the client nodes, which is working perfectly for 2 nodes, but our 3rd node oopses "sometimes", rendering the machine unusable and we are forced to reboot the box (no serial console, no sysrq possible). Below is the oops and a few details; more details (.config, dmesg, tune2fs -l) are here: http://nerdbynature.de/bits/2.6.18-debian Yes, it's a debian kernel, 2.6.18-2-k7 to be specific; it happened with 2.6.17-2-k7 too. We haven't tried vanilla yet. All boxen are the same hardware (amd64, 32bit kernel+userland (debian/unstable), 1GB ram). The filesystem is residing on a raid0-md, consisting of 2 sata-disks. Any ideas what could cause this? Thanks, Christian.
f8836d37 Modules linked in: ipt_TCPMSS xt_tcpudp xt_state iptable_filter ip_conntrack_ftp ip_conntrack_irc ip_conntrack nfnetlink ip_tables x_tables ipv6 ipip tunnel4 dm_snapshot dm_mirror dm_mod shpchp pci_hotplug i2c_viapro psmouse i2c_core serio_raw pcspkr evdev amd64_agp agpgart parport_pc parport rtc floppy ide_generic r8169 uhci_hcd ehci_hcd usbcore thermal processor fan raid0 raid1 md_mod sata_via sd_mod libata scsi_mod via82cxxx ide_core ext3 jbd mbcache EIP: 0060:[] Not tainted VLI EFLAGS: 00010283 (2.6.18-2-k7 #1) [] journal_try_to_free_buffers+0x59/0x13a [jbd] [] ext3_releasepage+0x0/0x61 [ext3] [] try_to_release_page+0x34/0x46 [] shrink_inactive_list+0x44b/0x71c [] do_IRQ+0x48/0x52 [] common_interrupt+0x1a/0x20 [] dput+0x1a/0x119 [] prune_one_dentry+0x68/0x74 [] mb_cache_shrink_fn+0x1d/0xb5 [mbcache] [] shrink_zone+0xaf/0xd0 [] kswapd+0x295/0x399 [] autoremove_wake_function+0x0/0x2d [] kswapd+0x0/0x399 [] kthread+0xc2/0xef [] kthread+0x0/0xef [] kernel_thread_helper+0x5/0xb and for 2.6.17-2-k7: f0872d73 Modules linked in: ipv6 ipip tunnel4 dm_snapshot dm_mirror dm_mod shpchp pci_hotplug floppy i2c_viapro parport_pc i2c_core psmouse parport 8250_pnp serio_raw evdev amd64_agp agpgart pcspkr rtc raid10 raid6 raid5 xor multipath linear ide_generic r8169 uhci_hcd ehci_hcd usbcore thermal processor fan raid0 raid1 md_mod sata_via sd_mod libata scsi_mod via82cxxx ide_core ext3 jbd mbcache EIP: 0060:[] Not tainted VLI EFLAGS: 00210246 (2.6.17-2-k7 #1) BUG: warning at kernel/softirq.c:141/local_bh_enable() local_bh_enable+0x25/0x64 lock_sock+0x85/0x8d sock_fasync+0x5c/0x111 sock_close+0x1e/0x2a __fput+0x87/0x13c filp_close+0x4e/0x54 put_files_struct+0x64/0xa6 do_exit+0x1b0/0x6be bust_spinlocks+0x3a/0x43 die+0x1d3/0x288 die+0x263/0x288 do_page_fault+0x441/0x526 do_page_fault+0x0/0x526 error_code+0x4f/0x54 ext3_xattr_delete_inode+0x5/0xab [ext3] ext3_free_inode+0x92/0x2c7 [ext3] ext3_mark_inode_dirty+0x20/0x27 [ext3] ext3_delete_inode+0xa3/0xba [ext3] 
ext3_delete_inode+0x0/0xba [ext3] generic_delete_inode+0x9e/0x101 iput+0x5e/0x60 dput+0xfe/0x116 sys_renameat+0x15f/0x1b9 _atomic_dec_and_lock+0x2a/0x44 sys_rename+0x11/0x15 sysenter_past_esp+0x54/0x75 BUG: warning at kernel/softirq.c:141/local_bh_enable() local_bh_enable+0x25/0x64 sock_fasync+0x105/0x111 sock_close+0x1e/0x2a __fput+0x87/0x13c filp_close+0x4e/0x54 put_files_struct+0x64/0xa6 do_exit+0x1b0/0x6be bust_spinlocks+0x3a/0x43 die+0x1d3/0x288 die+0x263/0x288 do_page_fault+0x441/0x526 do_page_fault+0x0/0x526 error_code+0x4f/0x54 ext3_xattr_delete_inode+0x5/0xab [ext3] ext3_free_inode+0x92/0x2c7 [ext3] ext3_mark_inode_dirty+0x20/0x27 [ext3] ext3_delete_inode+0xa3/0xba [ext3] ext3_delete_inode+0x0/0xba [ext3] generic_delete_inode+0x9e/0x101 iput+0x5e/0x60 dput+0xfe/0x116 sys_renameat+0x15f/0x1b9 _atomic_dec_and_lock+0x2a/0x44 sys_rename+0x11/0x15 sysenter_past_esp+0x54/0x75 BUG: warning at kernel/softirq.c:141/local_bh_enable() local_bh_enable+0x25/0x64 unix_release_sock+0x5c/0x1bf sock_release+0x11/0x85 sock_close+0x26/0x2a __fput+0x87/0x13c filp_close+0x4e/0x54 put_files_struct+0x64/0xa6 do_exit+0x1b0/0x6be bust_spinlocks+0x3a/0x43 die+0x1d3/0x288 die+0x263/0x288 do_page_fault+0x441/0x526 do_page_fault+0x0/0x526 error_code+0x4f/0x54 ext3_xattr_delete_inode+0x5/0xab [ext3] ext3_free_inode+0x92/0x2c7 [ext3] ext3_mark_inode_dirty+0x20/0x27 [ext3] ext3_delete_inode+0xa3/0xba [ext3] ext3_delete_inode+0x0/0xba [ext3] generic_delete_inode+0x9e/0x101 iput+0x5e/0x60 dput+0xfe/0x116 sys_renameat+0x15f/0x1b9 _atomic_dec_and_lock+0x2a/0x44 sys_rename+0x11/0x15 sysenter_past_esp+0x54/0x75 -- BOFH excuse #324: Your packets were eaten by the terminator From coywolf at sosdg.org Thu Nov 23 09:06:08 2006 From: coywolf at sosdg.org (Coywolf Qi Hunt) Date: Thu, 23 Nov 2006 04:06:08 -0500 Subject: how does ext3 handle no communication to storage In-Reply-To: <20060829170351.GA30599@thunk.org> References: <44F33E3A.8020805@bnl.gov> <20060828205822.GB4944@thunk.org> 
<44F37285.8000104@bnl.gov> <20060829082003.GM20105@schatzie.adilger.int> <44F458AF.7040506@bnl.gov> <20060829170351.GA30599@thunk.org> Message-ID: <20061123090608.GA17728@everest.sosdg.org> On Tue, Aug 29, 2006 at 01:03:51PM -0400, Theodore Tso wrote: > On Tue, Aug 29, 2006 at 11:09:35AM -0400, Sev Binello wrote: > > From a strictly practical and immediate stand point, > > what is the best way to handle this situation if it should occur again in > > the near future ? > > Without any kernel patches, the best thing to do is, (a) don't restore > the path to the device, (b) unmount the filesystem, (c) Compile the > enclosed flushb program (also found in the e2fsprogs sources, but not > compiled by most or all distributions), and run it: "flushb > /dev/hdXX", and only after completing all of these steps, you can > restore the path and do fsck of the filesystem if you are feeling > sufficiently paranoid, and then remount it. option (d) run blockdev --flushbufs /dev/hdXX Ted, you may drop flushb. blockdev from util-linux can do it. - coywolf > > I wish I could offer you something better, but that's what we have at > the moment. > > - Ted > > /* > * flushb.c --- This routine flushes the disk buffers for a disk > * > * Copyright 1997, 2000, by Theodore Ts'o. > * > * WARNING: use of flushb on some older 2.2 kernels on a heavily loaded > * system will corrupt filesystems. This program is not really useful > * beyond for benchmarking scripts. > * > * %Begin-Header% > * This file may be redistributed under the terms of the GNU Public > * License. 
> * %End-Header% > */ > > #include <stdio.h> > #include <stdlib.h> > #include <unistd.h> > #include <fcntl.h> > #include <errno.h> > #include <sys/ioctl.h> > #include <sys/mount.h> > > /* For Linux, define BLKFLSBUF if necessary */ > #if (!defined(BLKFLSBUF) && defined(__linux__)) > #define BLKFLSBUF _IO(0x12,97) /* flush buffer cache */ > #endif > > const char *progname; > > static void usage(void) > { > fprintf(stderr, "Usage: %s disk\n", progname); > exit(1); > } > > int main(int argc, char **argv) > { > int fd; > > progname = argv[0]; > if (argc != 2) > usage(); > > fd = open(argv[1], O_RDONLY, 0); > if (fd < 0) { > perror("open"); > exit(1); > } > /* > * Note: to reread the partition table, use the ioctl > * BLKRRPART instead of BLKFLSBUF. > */ > if (ioctl(fd, BLKFLSBUF, 0) < 0) { > perror("ioctl BLKFLSBUF"); > exit(1); > } > return 0; > } -- Coywolf Qi Hunt From lists at nerdbynature.de Sun Nov 26 01:58:19 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Sun, 26 Nov 2006 01:58:19 +0000 (GMT) Subject: BUG: warning at kernel/softirq.c:141 [SOLVED] In-Reply-To: References: Message-ID: On Thu, 23 Nov 2006, Christian Kujau wrote: > in ~100 directories. Now, we like to rsync changes from the master to the > client nodes, which is working perfectly for 2 nodes, but our 3rd node oopses > "sometimes", rendering the machine unusable and we are forced to reboot the running memtest86 did not reveal anything but the hosting company was kind enough to replace a few parts of the box and the oopses seem to be gone now. We suspect a faulty DIMM... sorry for the noise, Christian. -- BOFH excuse #169: broadcast packets on wrong frequency From Ralf-Lists at ralfgross.de Sun Nov 26 09:49:59 2006 From: Ralf-Lists at ralfgross.de (Ralf Gross) Date: Sun, 26 Nov 2006 10:49:59 +0100 (CET) Subject: ext3 4TB fs limit on amd64 (FAQ?) Message-ID: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de> Hi, I've a question about the max. ext3 FS size. The ext3 FAQ explains that the limit is 4TB.
http://batleth.sapienti-sat.org/projects/FAQs/ext3-faq.html | Ext3 can support files up to 1TB. With a 2.4 kernel the filesystem size is | limited by the maximal block device size, which is 2TB. In 2.6 the maximum | (32-bit CPU) limit of block devices is 16TB, but ext3 supports only up | to 4TB. Other sources claim that the limit is 8TB (RedHat ES). I'm using ubuntu dapper drake 6.06 with kernel 2.6.15. At the moment I'm running the amd64 port. I successfully created the ext3 FS on a 4,5TB lvm partition (standard ext3 fs options) and was able to fill the whole fs with data. Afterwards I checked the data with md5sum and did a fsck; everything seems to be fine so far. Was it just luck that I didn't see any data corruption? Can I use ext3 for fs >4TB<8TB on amd64 these days? I also tried xfs, but unlike ext3 it repeatedly froze the system when I ran the tiobench benchmark. Ralf From witscher at kulturbeutel.org Thu Nov 9 12:35:44 2006 From: witscher at kulturbeutel.org (witscher) Date: Thu, 09 Nov 2006 12:35:44 -0000 Subject: Ext3 - which blocksize for small files? Message-ID: <7257363.post@talk.nabble.com> Hi, I want to use an ext3 Partition (~1TB) for Mail Storage, which means tons of small files. Does anyone have recommendations about blocksize, inodes, etc. for mkfs.ext3? Thanks in advance, David -- View this message in context: http://www.nabble.com/Ext3----which-blocksize-for-small-files--tf2601442.html#a7257363 Sent from the Ext3 - User mailing list archive at Nabble.com.
With a 2.4 kernel the filesystem size is | limited by the maximal block device size, which is 2TB. In 2.6 the maximum | (32-bit CPU) limit of block devices is 16TB, but ext3 supports only up | to 4TB. Other sources claim that the limit is 8TB (RedHat ES). I'm using ubuntu dapper drake 6.06 with kernel 2.6.15. At the moment I'm running the i386 port. I successfully created the ext3 FS on a 4,5TB lvm partition (standard options) and was able to fill the whole FS with data. Afterwards I checked the data with md5sum and everything seems to be fine so far. Was it just luck that I didn't see any errors? Can one use ext3 for FS >4TB<8TB these days? Would it make any difference using the amd64/x86-64 port (Xeon, Core 2 Duo CPUs)? Ralf From lists at nerdbynature.de Tue Nov 28 01:34:38 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 28 Nov 2006 01:34:38 +0000 (GMT) Subject: Ext3 - which blocksize for small files? In-Reply-To: <7257363.post@talk.nabble.com> References: <7257363.post@talk.nabble.com> Message-ID: On Thu, 9 Nov 2006, witscher wrote: > I want to use an ext3 Partition (~1TB) for Mail Storage, this means tons of > small files. > Has anyone recommendations about blocksize, inodes, etc. for mkfs.ext3 ? from a recent mkfs.ext3 manpage: -T fs-type Specify how the filesystem is going to be used, so that mke2fs can choose optimal filesystem parameters for that use. The filesystem types that can be supported are defined in the configuration file /etc/mke2fs.conf(5). The default configuration file contains definitions for the filesystem types: small, floppy, news, largefile, and largefile4.
and a /etc/mke2fs.conf on a debian system reveals:

[defaults]
	base_features = sparse_super,filetype,resize_inode,dir_index
	blocksize = 4096
	inode_ratio = 8192

[fs_types]
	small = {
		blocksize = 1024
		inode_ratio = 4096
	}
	floppy = {
		blocksize = 1024
	}
	news = {
		inode_ratio = 4096
	}
	largefile = {
		inode_ratio = 1048576
	}
	largefile4 = {
		inode_ratio = 4194304
	}

-- BOFH excuse #433: error: one bad user found in front of screen From lists at nerdbynature.de Tue Nov 28 07:16:57 2006 From: lists at nerdbynature.de (Christian Kujau) Date: Tue, 28 Nov 2006 07:16:57 +0000 (GMT) Subject: ext3 4TB fs limit on amd64 (FAQ?) In-Reply-To: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de> References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de> Message-ID: On Sun, 26 Nov 2006, Ralf Gross wrote: > I've a question about the max. ext3 FS size. The ext3 FAQ explains that > the limit is 4TB. Hm, strange: I'm pretty sure that mkfs.ext3 has understood bigger blocksizes for quite a while now. Then again, the FAQ says "Version: 2004-10-14"... So, although I'd really love to have this information (and the FAQ!) on http://e2fsprogs.sf.net/ this is what I found:

blocksize   file size limit         filesystem size limit
1 KiB       16448 MiB (~ 16 GiB)    2048 GiB  (= 2 TiB)
2 KiB       256 GiB                 8192 GiB  (= 8 TiB)
4 KiB       2048 GiB (= 2 TiB)      16384 GiB (= 16 TiB)
8 KiB       65568 GiB (~ 64 TiB)    32768 GiB (= 32 TiB)

Note that an 8 KiB blocksize is only supported on systems with 8 KiB pagesize (i.e. linux/alpha). So, it really looks like 16TiB shouldn't be a problem...but I just stumbled over this: https://www.redhat.com/archives/ext3-users/2006-October/msg00000.html While the ext* gurus are busy on the ext4 list, I too would appreciate a comment on the current limitations of ext2/ext3/ext4, so that we can update the FAQ. These questions really come up way too often... Thanks, Christian. -- BOFH excuse #387: Your computer's union contract is set to expire at midnight.
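[Editorial note: the inode_ratio values quoted from mke2fs.conf above map directly onto inode counts — mke2fs creates roughly one inode per inode_ratio bytes of filesystem. For a ~1 TiB mail partition, the "news" profile (inode_ratio = 4096) doubles the inode count relative to the defaults. A quick sanity check; sizes and the device name are illustrative:]

```shell
# One inode per inode_ratio bytes: estimated inode counts for a 1 TiB filesystem.
FS_BYTES=$(( 1024 * 1024 * 1024 * 1024 ))                 # 1 TiB
echo "defaults (ratio 8192): $(( FS_BYTES / 8192 )) inodes"
echo "news     (ratio 4096): $(( FS_BYTES / 4096 )) inodes"

# An illustrative mkfs invocation for a small-file mail store (device name made up):
#   mkfs.ext3 -T news /dev/sdb1
```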
From Ralf-Lists at ralfgross.de Tue Nov 28 09:05:48 2006 From: Ralf-Lists at ralfgross.de (Ralf Gross) Date: Tue, 28 Nov 2006 10:05:48 +0100 (CET) Subject: ext3 4TB fs limit on amd64 (FAQ?) In-Reply-To: References: <38749.85.220.132.176.1164534599.squirrel@www.stz-softwaretechnik.de> Message-ID: <4281.141.113.101.32.1164704748.squirrel@www.stz-softwaretechnik.com> Christian Kujau said: > On Sun, 26 Nov 2006, Ralf Gross wrote: >> I've a question about the max. ext3 FS size. The ext3 FAQ explains that >> the limit is 4TB. > > Hm, strange: I'm pretty sure that mkfs.ext3 understands bigger > blocksizes for quite a while now. Then again, the FAQ says > "Version: 2004-10-14"... > > So, although I'd really love to have this information (and the FAQ!) on > http://e2fsprogs.sf.net/ this is what I found:
>
> blocksize   file size limit         filesystem size limit
> 1 KiB       16448 MiB (~ 16 GiB)    2048 GiB  (= 2 TiB)
> 2 KiB       256 GiB                 8192 GiB  (= 8 TiB)
> 4 KiB       2048 GiB (= 2 TiB)      16384 GiB (= 16 TiB)
> 8 KiB       65568 GiB (~ 64 TiB)    32768 GiB (= 32 TiB)
>
> Note that an 8 KiB blocksize is only supported on systems with 8 KiB > pagesize (i.e. linux/alpha). > > So, it really looks like 16TiB shouldn't be a problem...but I just > stumbled over this: > https://www.redhat.com/archives/ext3-users/2006-October/msg00000.html Thus with 4 KiB blocksize and a < 8TiB fs I should be on the safe side. > While the ext* gurus are busy on the ext4 list, I too would appreciate > a comment on the current limitation of ext2/ext3/ext4, so that we can > update the FAQ. These questions really come up way too often... Yes, the FAQ is a bit misleading. ralf From itlistuser at rapideye.de Tue Nov 28 14:05:51 2006 From: itlistuser at rapideye.de (Sebastian Reitenbach) Date: Tue, 28 Nov 2006 14:05:51 -0000 Subject: how to prevent filesystem check Message-ID: <20061128140551.A110E4087B2@ogo.rapideye.de> Hi all, I want to set up a RAID storage system, where I have two systems connected to it.
the filesystems are mapped out to both connectors. I want the master host to mount them read-write, and the slave read-only. In my fstab on the slave I have a line like the following: /dev/sdb1 /mount ext3 acl,noauto,user_xattr,nosuid,ro 0 0 In man 5 fstab it is written that when the 6th field is 0, no filesystem check will be done at mount time. And in man mount I read that the nocheck parameter is the default, meaning that no filesystem checks should be performed when the partition is mounted. But when I mount the filesystem on the slave, I see the following messages in /var/log/messages: EXT3-fs: mounted filesystem with ordered data mode. EXT3-fs: INFO: recovery required on readonly filesystem. EXT3-fs: write access will be enabled during recovery. (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, recovered transactions 99610 to 100072 (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 12498 and revoked 589/918 blocks kjournald starting. Commit interval 5 seconds EXT3-fs: recovery complete. Since I started testing this, the master server has had occasional problems with the filesystem, remounting it read-only, and I had to fsck it. I think the filesystem got destroyed by the filesystem checks while mounting it read-only on the second server. I googled around, and found a similar message from someone mounting an XFS file system. So I am not sure whether this is a mount or an ext3 problem. My kernel is a 2.6.12.6-bigsmp, on a SuSE 10.1. Is there any other way to prevent the slave server from doing any filesystem checks? kind regards Sebastian From tweeks at rackspace.com Tue Nov 28 16:58:09 2006 From: tweeks at rackspace.com (Thomas Weeks) Date: Tue, 28 Nov 2006 10:58:09 -0600 Subject: Best Practices for Data Recovery for corrupted EXT2/3? Message-ID: <200611281058.09288.tweeks@rackspace.com> Hey all.. I had a bad IDE controller that hosed my EXT3 filesystems.
A resulting fsck damaged part of the filesystem and the root inode is gone (on my main drive AND the backup drive). I immediately DD'd the main drive over to an identical drive that I have been working on. But every time.. a fsck destroys all the data (moves everything to lost+found) and nothing that I've found is able to restore the dir structure... or allow me to reposition any of the subdirs (such as /home/*). I've tried testdisk, dd_recover, and Autopsy.. mounting and fscking with alternate superblocks, all with no success. I would like to retain file names.. as I see that SOME filename/dir structure is intact when the fsck starts nuking all my files that don't have a parent dir (e.g. ../home/user/file1 --> lost+found). Is there a way that this information can be salvaged? Or a new fake root inode be slid into place and all the links associated? My last ditch effort will be to allow the migration to lost+found and then try to copy off files based on UID/GID/date, but I would really like to retain file names. Any related info would be useful... but my hopes are not high. Tweeks From richard.c.wolber at boeing.com Tue Nov 28 20:33:42 2006 From: richard.c.wolber at boeing.com (Wolber, Richard C) Date: Tue, 28 Nov 2006 12:33:42 -0800 Subject: how to prevent filesystem check In-Reply-To: <20061128140551.A110E4087B2@ogo.rapideye.de> Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com> Running the following command on your slave server should do the trick: echo "AUTOFSCK_DEF_CHECK=\"no\"" >> /etc/sysconfig/autofsck ..Chuck.. > -----Original Message----- > From: Sebastian Reitenbach [mailto:itlistuser at rapideye.de] > Sent: Tuesday, November 28, 2006 6:06 AM > To: ext3-users at redhat.com > Subject: how to prevent filesystem check > > Hi all, > > I want to setup a RAID storage system, where i have two > systems connected to it. the filesystems are mapped out to > both connectors.
I want the master host mount them read > write, and the slave read only. > > in my fstab on the slave I have a line like the following: > /dev/sdb1 /mount ext3 acl,noauto,user_xattr,nosuid,ro 0 0 > > so in man 5 fstab, it is written, that when the 6. field is > 0, no filesystem check will be done at mount time. > > and in man mount, I read that, the nocheck parameter is the > default, that means, that no filesystem checks should be > performed when the partition is mounted. > > but when I mount the filesystem on the slave, I see the > following messages in /var/log/messages: > EXT3-fs: mounted filesystem with ordered data mode. > EXT3-fs: INFO: recovery required on readonly filesystem. > EXT3-fs: write access will be enabled during recovery. > (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, > exit status 0, recovered transactions 99610 to 100072 > (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed > 12498 and revoked > 589/918 blocks > kjournald starting. Commit interval 5 seconds > EXT3-fs: recovery complete. > > > since I test this, the master server had occassional problems > with the filesystem, so he decided to mount these read-only, > and I had to fsck it. > I think the filesystem got destroyed because of the > filesystem ckecks, while mounting it readonly on the second server. > > I googled around, and found a similar message from someone > mounting a XFS file system. So I am not sure, whether this is > a mount or a ext3 problem. > > my kernel is a 2.6.12.6-bigsmp, on a SuSE 10.1. > > Is there any other way to prevent the slave server from doing > any filesystem checks? 
> > kind regards > Sebastian From adilger at clusterfs.com Wed Nov 29 05:20:26 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 28 Nov 2006 21:20:26 -0800 Subject: how to prevent filesystem check In-Reply-To: <8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com> References: <20061128140551.A110E4087B2@ogo.rapideye.de> <8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com> Message-ID: <20061129052026.GA6429@schatzie.adilger.int> On Nov 28, 2006 12:33 -0800, Wolber, Richard C wrote: > Running the following command on your slave server should do the trick: > > echo "AUTOFSCK_DEF_CHECK=\"no\"" >> /etc/sysconfig/autofsck This is incorrect. As soon as the ext3 code mounts the filesystem it will do journal recovery and potentially corrupt the filesystem. Then, the read-only copy will become out-of-date in the cache of that client and it will get bogus data back, eventually deciding that the filesystem is corrupt (whether it is or not). You should just mount the filesystem on the client via NFS, that's what it's SUPPOSED to do. This is a good reason for the multi-mount protection feature that I proposed previously. It would mark the filesystem as in-use on one node and the filesystem itself would refuse to mount on the second node. Unfortunately, this idea met resistance from some of the other ext3 developers to merging it upstream. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc.
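[Editorial note: Andreas's warning — that merely mounting an ext3 filesystem can replay the journal and write to the device — can at least be detected beforehand: a needs_recovery flag in the dumpe2fs -h feature list means the journal has pending transactions, so even mount -o ro will write. A hedged sketch; the sample feature line below is illustrative, in real use it would come from dumpe2fs against a real device:]

```shell
# Decide from the superblock feature flags whether a mount would replay the journal.
# Real use (device name made up):
#   features=$(dumpe2fs -h /dev/sdb1 2>/dev/null | grep '^Filesystem features')
features="Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super"

case "$features" in
    *needs_recovery*) verdict="journal replay pending: even 'mount -o ro' will write" ;;
    *)                verdict="clean: a read-only mount should not write" ;;
esac
echo "$verdict"
```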
From adilger at clusterfs.com Tue Nov 28 21:35:56 2006 From: adilger at clusterfs.com (Andreas Dilger) Date: Tue, 28 Nov 2006 13:35:56 -0800 Subject: how to prevent filesystem check In-Reply-To: <20061128140551.A110E4087B2@ogo.rapideye.de> References: <20061128140551.A110E4087B2@ogo.rapideye.de> Message-ID: <20061128213556.GA5673@schatzie.adilger.int> On Nov 28, 2006 14:05 -0000, Sebastian Reitenbach wrote: > I want to setup a RAID storage system, where i have two systems connected to > it. the filesystems are mapped out to both connectors. I want the master host > mount them read write, and the slave read only. This is NOT possible with ext2 or ext3 and can result in filesystem corruption. Cheers, Andreas -- Andreas Dilger Principal Software Engineer Cluster File Systems, Inc. From pengchengzou at gmail.com Wed Nov 29 05:58:58 2006 From: pengchengzou at gmail.com (Pengcheng Zou) Date: Wed, 29 Nov 2006 13:58:58 +0800 Subject: how to prevent filesystem check In-Reply-To: <20061128140551.A110E4087B2@ogo.rapideye.de> References: <20061128140551.A110E4087B2@ogo.rapideye.de> Message-ID: <24a313060611282158i593de443iac831ad668900ba2@mail.gmail.com> Use the 'noload' option to mount the readonly ext3 filesystem on the slave host, so the journal will not be loaded. BTW, this kind of setting could have cache-coherence problems. Why not do it the correct way by using some kind of network filesystem (NFS) or clustering filesystem (GFS, Lustre)? On 11/28/06, Sebastian Reitenbach wrote: > Hi all, > > I want to setup a RAID storage system, where i have two systems connected to > it. the filesystems are mapped out to both connectors. I want the master host > mount them read write, and the slave read only. > > in my fstab on the slave I have a line like the following: > /dev/sdb1 /mount ext3 acl,noauto,user_xattr,nosuid,ro 0 0 > > so in man 5 fstab, it is written, that when the 6. field is 0, no filesystem > check will be done at mount time.
> > and in man mount, I read that, the nocheck parameter is the default, that > means, that no filesystem checks should be performed when the partition is > mounted. > > but when I mount the filesystem on the slave, I see the following messages > in /var/log/messages: > EXT3-fs: mounted filesystem with ordered data mode. > EXT3-fs: INFO: recovery required on readonly filesystem. > EXT3-fs: write access will be enabled during recovery. > (fs/jbd/recovery.c, 255): journal_recover: JBD: recovery, exit status 0, > recovered transactions 99610 to 100072 > (fs/jbd/recovery.c, 257): journal_recover: JBD: Replayed 12498 and revoked > 589/918 blocks > kjournald starting. Commit interval 5 seconds > EXT3-fs: recovery complete. > > > since I test this, the master server had occassional problems with the > filesystem, so he decided to mount these read-only, and I had to fsck it. > I think the filesystem got destroyed because of the filesystem ckecks, while > mounting it readonly on the second server. > > I googled around, and found a similar message from someone mounting a XFS file > system. So I am not sure, whether this is a mount or a ext3 problem. > > my kernel is a 2.6.12.6-bigsmp, on a SuSE 10.1. > > Is there any other way to prevent the slave server from doing any filesystem > checks? > > > kind regards > Sebastian > > _______________________________________________ > Ext3-users mailing list > Ext3-users at redhat.com > https://www.redhat.com/mailman/listinfo/ext3-users > From itlistuser at rapideye.de Wed Nov 29 08:29:16 2006 From: itlistuser at rapideye.de (Sebastian Reitenbach) Date: Wed, 29 Nov 2006 08:29:16 -0000 Subject: how to prevent filesystem check Message-ID: <20061129082916.E2A2A78C056@ogo.rapideye.de> Hi, "Pengcheng Zou" wrote: > use 'noload' option to mount the readonly ext3 filesystem on the slave > host, so the journal will not be loaded. 
With the noload option, the following output is shown by the mount command: mount: wrong fs type, bad option, bad superblock on /dev/sdb2, missing codepage or other error In some cases useful info is found in syslog - try dmesg | tail or so and in /var/log/messages: Nov 29 09:17:17 srv3 kernel: ext3: No journal on filesystem on sdb2 As this is a valid option and the fs type is the same, it must be a bad superblock? Is there anything I can do about it? > > BTW, this kind of setting could have some cache-coherence problem. why > not do it correct way by using some kind of network filesystem (NFS) > or clustering filesystem (GFS,Lustre)? > yes, I need to take a look at these file systems.
> > Sebastian From tytso at mit.edu Wed Nov 29 14:02:45 2006 From: tytso at mit.edu (Theodore Tso) Date: Wed, 29 Nov 2006 09:02:45 -0500 Subject: how to prevent filesystem check In-Reply-To: <20061129052026.GA6429@schatzie.adilger.int> References: <20061128140551.A110E4087B2@ogo.rapideye.de> <8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com> <20061129052026.GA6429@schatzie.adilger.int> Message-ID: <20061129140245.GB5771@thunk.org> On Tue, Nov 28, 2006 at 09:20:26PM -0800, Andreas Dilger wrote: > This is a good reason for the multi-mount protection feature that I > proposed previously. It would mark the filesystem as in-use on one > node and the filesystem itself would refuse to mount on the second > node. Unfortunately, this idea met resistance from some of the > other ext3 developers from merging it upstream. The resistance was because it means we have to put in what is effectively a cluster filesystem's distributed lock manager (DLM) just to tell users that "News flash! ext3 isn't a cluster filesystem" and then error-out the mount. Granted, it was a relatively simple cluster DLM, but that's what you effectively need, complete with issues surrounding heartbeats for liveness detection --- and since it was a simple cluster DLM, it didn't handle temporary connectivity failure since there was no STONITH (shoot-the-other-node-in-the-head) functionality. So it didn't even solve the problem completely. Still, if a lot of users are making this fundamental mistake of trying to use ext3 as a cluster filesystem, maybe we need to revisit this question, since hopefully once the user sees the error message they won't keep doing this. It doesn't stop them from wasting a lot of time trying to set up such a system only to discover that they used the wrong tool in the first place, though.
So this feels more like a documentation problem; but maybe it's worth
it just as a backup to some kind of documentation telling users that
they really want to be using OCFS2, GFS, GPFS, or some other cluster
filesystem if they want to do something like this.

- Ted

From tytso at mit.edu  Wed Nov 29 14:12:11 2006
From: tytso at mit.edu (Theodore Tso)
Date: Wed, 29 Nov 2006 09:12:11 -0500
Subject: Best Practices for Data Recovery for corrupted EXT2/3?
In-Reply-To: <200611281058.09288.tweeks@rackspace.com>
References: <200611281058.09288.tweeks@rackspace.com>
Message-ID: <20061129141211.GC5771@thunk.org>

On Tue, Nov 28, 2006 at 10:58:09AM -0600, Thomas Weeks wrote:
>
> I had a bad IDE controller that hosed my EXT3 filesystems. A resulting fsck
> damaged part of the filesystem and the root inode is gone (on my main drive
> AND the backup drive). I immediately DD'd the main drive over to an
> identical drive that I have been working on. But every time.. a fsck
> destroys all the data (moves everything to lost+found) and nothing that I've
> found is able to restore the dir structure... or allow me to reposition
> any of the subdirs (such as /home/*).

Unfortunately, if the root inode is gone, you've lost the names of the
inodes in the root directory. Usually, though, most of the inodes in
the root directory are directories, and so the directory hierarchy is
moved to lost+found. So if you see a directory /lost+found/#5612 that
has files such as /lost+found/#5612/passwd and /lost+found/#5612/motd,
then you could probably guess that #5612 was /etc, and you could then
just move /lost+found/#5612 to /etc.

Of course, if more of the filesystem than just the root directory is
gone, then you may have lost more directory information, and so things
might not be quite that simple to recover from.

> I would like to retain file names.. as I see that SOME filename/dir structure
> is intact when the fsck starts nuking all my files that don't have a parent
> dir (e.g. ../home/user/file1 --> lost+found). Is there a way that this
> information can be salvaged? Or a new fake root inode be slid into place and
> all the links associated?

It's not that fsck is nuking the names --- the names were already gone
due to your hardware corruption. Fsck is seeing that an inode doesn't
have a name, which is why it is moving it to /lost+found so you at
least don't lose the data. Please don't blame fsck; it's doing the
best job that it can!

> My last ditch effort will be to allow the migration to lost+found and then try
> to copy off files based on UID/GID/date, but I would really like to retain
> file names.

Sorry, the file names are gone; if they were there, fsck would have
used them. If you have a valid locatedb database, you might be able to
use that to help reconstruct the filenames, and of course as a
responsible sysadmin you have been doing regular backups (RIGHT? :-),
so you could use that information as well.

Regards and good luck,

- Ted

From lists at nerdbynature.de  Wed Nov 29 22:56:49 2006
From: lists at nerdbynature.de (Christian Kujau)
Date: Wed, 29 Nov 2006 22:56:49 +0000 (GMT)
Subject: how to prevent filesystem check
In-Reply-To: <20061129052026.GA6429@schatzie.adilger.int>
References: <20061128140551.A110E4087B2@ogo.rapideye.de>
	<8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com>
	<20061129052026.GA6429@schatzie.adilger.int>
Message-ID:

On Tue, 28 Nov 2006, Andreas Dilger wrote:
> You should just mount the filesystem on the client via NFS, that's
> what it's SUPPOSED to do.

Would a "-o bind" mount suffice too? That way one does not need to set
up NFS, not to mention the network overhead this solution might have.
Like:

# mount -t ext3 /dev/sdb1 /home
# mount -o bind,ro /home /mnt/

I just tested this: /dev/sdb1 did not get altered with the bind-mount.

C.
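Ted's guess-the-directory heuristic for lost+found can be mechanized in a crude way. The sketch below is illustrative only: the marker table and function name are invented here, and no real recovery tool works exactly like this.

```python
import os

# Files whose presence strongly suggests what an orphaned
# lost+found directory originally was. Illustrative, not exhaustive.
MARKERS = {
    "/etc": {"passwd", "fstab", "motd"},
    "/var/log": {"messages", "lastlog", "wtmp"},
}

def guess_original_path(entry_path):
    """Guess where a /lost+found/#NNNN directory came from by
    intersecting its contents with the marker table."""
    contents = set(os.listdir(entry_path))
    best, best_hits = None, 0
    for original, markers in MARKERS.items():
        hits = len(contents & markers)
        if hits > best_hits:
            best, best_hits = original, hits
    return best   # None means "no confident guess"
```

For example, a recovered directory containing passwd and motd would be reported as /etc, matching Ted's #5612 example; anything unmatched is left for manual inspection.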
--
BOFH excuse #201: RPC_PMAP_FAILURE

From richard.c.wolber at boeing.com  Thu Nov 30 00:41:38 2006
From: richard.c.wolber at boeing.com (Wolber, Richard C)
Date: Wed, 29 Nov 2006 16:41:38 -0800
Subject: how to prevent filesystem check
In-Reply-To: <20061129052026.GA6429@schatzie.adilger.int>
Message-ID: <8C7C41A176AC0B468BEFB2EFD9BDAB9902426714@XCH-NW-5V2.nw.nos.boeing.com>

On Nov 28, 2006 9:20pm Andreas Dilger wrote:
>
> On Nov 28, 2006 12:33 -0800, Wolber, Richard C wrote:
> > Running the following command on your slave server should
> > do the trick:
> >
> > echo "AUTOFSCK_DEF_CHECK=\"no\"" >> /etc/sysconfig/autofsck
>
> This is incorrect. As soon as the ext3 code mounts the
> filesystem it will do journal recovery and potentially
> corrupt the filesystem.
> Then, the read-only copy will become out-of-date in the cache
> of that client and it will get bogus data back, eventually
> deciding that the filesystem is corrupt (whether it is or not).

Even if you "mount -oro -text2 $DEV $DIR"?

..Chuck..

From adilger at clusterfs.com  Thu Nov 30 05:53:34 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 29 Nov 2006 21:53:34 -0800
Subject: how to prevent filesystem check
In-Reply-To: <8C7C41A176AC0B468BEFB2EFD9BDAB9902426714@XCH-NW-5V2.nw.nos.boeing.com>
References: <20061129052026.GA6429@schatzie.adilger.int>
	<8C7C41A176AC0B468BEFB2EFD9BDAB9902426714@XCH-NW-5V2.nw.nos.boeing.com>
Message-ID: <20061130055334.GF6429@schatzie.adilger.int>

On Nov 29, 2006 16:41 -0800, Wolber, Richard C wrote:
> On Nov 28, 2006 9:20pm Andreas Dilger wrote:
> > On Nov 28, 2006 12:33 -0800, Wolber, Richard C wrote:
> > > Running the following command on your slave server should
> > > do the trick:
> > >
> > > echo "AUTOFSCK_DEF_CHECK=\"no\"" >> /etc/sysconfig/autofsck
> >
> > This is incorrect. As soon as the ext3 code mounts the
> > filesystem it will do journal recovery and potentially
> > corrupt the filesystem.
> > Then, the read-only copy will become out-of-date in the cache
> > of that client and it will get bogus data back, eventually
> > deciding that the filesystem is corrupt (whether it is or not).
>
> Even if you "mount -oro -text2 $DEV $DIR"?

Even then, yes. It is NOT SAFE to access the same block device on
multiple nodes at one time. Even with "-o ro" the mount will cause
the journal to be recovered.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From adilger at clusterfs.com  Thu Nov 30 06:54:01 2006
From: adilger at clusterfs.com (Andreas Dilger)
Date: Wed, 29 Nov 2006 22:54:01 -0800
Subject: how to prevent filesystem check
In-Reply-To: <20061129140245.GB5771@thunk.org>
References: <20061128140551.A110E4087B2@ogo.rapideye.de>
	<8C7C41A176AC0B468BEFB2EFD9BDAB9902426708@XCH-NW-5V2.nw.nos.boeing.com>
	<20061129052026.GA6429@schatzie.adilger.int>
	<20061129140245.GB5771@thunk.org>
Message-ID: <20061130065401.GH6429@schatzie.adilger.int>

On Nov 29, 2006 09:02 -0500, Theodore Tso wrote:
> On Tue, Nov 28, 2006 at 09:20:26PM -0800, Andreas Dilger wrote:
> > This is a good reason for the multi-mount protection feature that I
> > proposed previously. It would mark the filesystem as in-use on one
> > node and the filesystem itself would refuse to mount on the second
> > node. Unfortunately, this idea met resistance from some of the
> > other ext3 developers from merging it upstream.
>
> The resistance was because it means we have to put in what is effectively
> a cluster filesystem's distributed lock manager (DLM) just to tell
> users that "News flash! ext3 isn't a cluster filesystem" and then
> error-out the mount.
> Granted, it was a relatively simple cluster DLM,
> but that's what you effectively need, complete with issues surrounding
> heartbeats for liveness detection --- and since it was a simple
> cluster DLM, it didn't handle temporary connectivity failure since
> there was no STONITH (shoot-the-other-node-in-the-head) functionality.
> So it didn't even solve the problem completely.

I agree that the proposed MMP code is by no means a 100% solution, and
is not intended to replace HA + STONITH. Rather, it is intended to
handle the "oops, HA is broken, admin set it up incorrectly, FC
routing broke, SCSI devices were renamed, etc" kind of issues.

> Still, if a lot of users are making this fundamental mistake of trying
> to use ext3 as a cluster filesystem, maybe we need to revisit this
> question, since hopefully once the user sees the error message they
> won't keep doing this.

The only reason I raised this again was because this "mount ext2/3 on
two nodes, one being read-only" is a fairly common thing for users to
try, and it really deserves some kind of attention. The ability to
have multi-host block devices is only increasing, I think, especially
in server-type environments with FC, IB, iSCSI, etc.

Cheers, Andreas
--
Andreas Dilger
Principal Software Engineer
Cluster File Systems, Inc.

From tytso at mit.edu  Thu Nov 30 20:36:45 2006
From: tytso at mit.edu (Theodore Tso)
Date: Thu, 30 Nov 2006 15:36:45 -0500
Subject: how to prevent filesystem check
In-Reply-To: <20061130055334.GF6429@schatzie.adilger.int>
References: <20061129052026.GA6429@schatzie.adilger.int>
	<8C7C41A176AC0B468BEFB2EFD9BDAB9902426714@XCH-NW-5V2.nw.nos.boeing.com>
	<20061130055334.GF6429@schatzie.adilger.int>
Message-ID: <20061130203645.GA24959@thunk.org>

On Wed, Nov 29, 2006 at 09:53:34PM -0800, Andreas Dilger wrote:
> > Even if you "mount -oro -text2 $DEV $DIR"?
>
> Even then, yes. It is NOT SAFE to access the same block device on
> multiple nodes at one time.
> Even with "-o ro" the mount will cause
> the journal to be recovered.

And even with an ext2 filesystem, as the filesystem changes out from
under the kernel (as the system that has the filesystem mounted
read/write makes changes), the system that has the filesystem mounted
read-only will see inconsistencies caused by some blocks being cached
and some blocks not being cached, and this could result in security
violations (when blocks containing one user's data are read by other,
non-privileged users) and possibly kernel panics.

The only safe way to mount a block device in a shared mode, where one
or more of the systems have the shared block device mounted
read/write, is to use a cluster-aware filesystem, such as GFS, OCFS2,
or GPFS.

- Ted
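The failure mode Ted describes, stale cached blocks mixed with freshly read ones, can be shown with a toy model. All names here are invented for the illustration; this is not how the kernel's buffer cache is structured, only the coherence problem it runs into.

```python
class BlockDevice:
    """A shared 'disk': a flat list of block contents."""
    def __init__(self, nblocks):
        self.blocks = ["v1"] * nblocks

class ReadOnlyMounter:
    """A node that mounted the device read-only and caches every
    block it reads, assuming the device cannot change underneath it."""
    def __init__(self, dev):
        self.dev = dev
        self.cache = {}

    def read(self, n):
        if n not in self.cache:
            self.cache[n] = self.dev.blocks[n]
        return self.cache[n]

dev = BlockDevice(4)
ro = ReadOnlyMounter(dev)
ro.read(0)                        # block 0 is now cached as "v1"

# The read/write node rewrites the whole device behind the cache's back.
dev.blocks = ["v2"] * 4

# The read-only node now sees an inconsistent mix: stale cached data
# next to fresh data -- exactly the inconsistency Ted warns about.
print(ro.read(0), ro.read(1))     # v1 v2
```

The same mix-and-match of stale and fresh blocks, applied to directory blocks and inode tables instead of toy strings, is what produces the security violations and panics described above.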