<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=ISO-8859-1" http-equiv="Content-Type">
<title></title>
</head>
<body bgcolor="#ffffff" text="#000000">
Hi -<br>
<br>
I did answer but forgot to cc all, apologies.<br>
Here's the communication I had with Andreas Dilger:<br>
<blockquote>
<blockquote> Andreas Dilger wrote:
<blockquote cite="mid20060411185950.GD17364@schatzie.adilger.int"
type="cite">
<pre wrap="">On Apr 11, 2006 14:25 -0400, Sev Binello wrote:
</pre>
<blockquote type="cite">
<blockquote type="cite">
<pre wrap="">Does this imply you have a 6TB ext3 filesystem?
</pre>
</blockquote>
<pre wrap=""> No, it is divided into 6 filesystems, the largest ~ 1.8TB
</pre>
</blockquote>
<pre wrap="">
I wouldn't exactly trust the 2.4 kernel for devices larger than 2TB.
Some SCSI drivers also had problems over 2TB due to signed/unsigned
issues. Is the 6TB of storage split into &lt; 2TB LUNs by the hardware,
or is it a single 6TB block device (with CONFIG_LBD) that is partitioned
by Linux? The latter case would be in "not very well tested" waters.
</pre>
</blockquote>
The 6TB is split by the RAID hardware into 6 LUNs,<br>
so Linux sees them as devices smaller than 2TB.<br>
</blockquote>
</blockquote>
<br>
Would it be a problem if the two 1.8TB filesystems appeared on one host?<br>
<br>
Thanks<br>
-Sev<br>
<br>
<br>
Damian Menscher wrote:
<blockquote
cite="midPine.LNX.4.63.0604121904140.13237@zeus.itg.uiuc.edu"
type="cite">I've seen similar errors when attempting to have a >2TB
filesystem on a 32-bit RHEL3 machine. We have since implemented a
3.5TB filesystem on a 64-bit RHEL4 machine.
<br>
<br>
It would help if you could answer the question Andreas Dilger posed:
<br>
<br>
"Does this imply you have a 6TB ext3 filesystem?"
<br>
<br>
Damian
<br>
<br>
On Wed, 12 Apr 2006, Sev Binello wrote:
<br>
<br>
<blockquote type="cite"><br>
Hi -
<br>
<br>
In case this helps,
<br>
we got the following messages from EXT3 before the filesystem went
<br>
Does anyone recognize these?
<br>
<br>
//seems to mount okay
<br>
Mar 25 17:52:30 acnlin82 kernel: EXT3 FS 2.4-0.9.19, 19 August 2002
on sd(8,33),
<br>
internal journal
<br>
Mar 25 17:52:30 acnlin82 kernel: EXT3-fs: recovery complete.
<br>
Mar 26 00:04:01 acnlin82 kernel: EXT3-fs: mounted filesystem with
ordered data
<br>
mode.
<br>
<br>
//as soon as the NFS clients start, we get a TON of errors like this
<br>
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks:
<br>
Freeing blocks not in datazone - block = 3443589120, count = 1
<br>
Mar 26 00:07:19 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks:
<br>
Freeing blocks not in datazone - block = 2113834232, count = 1
<br>
Mar 26 00:07:22 acnlin82 kernel: EXT3-fs error (device sd(8,49)):
ext3_free_blocks:
<br>
bit already cleared for block 49125
<br>
<br>
//interspersed with some of these
<br>
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
<br>
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1891463980,
limit=1722264358
<br>
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
<br>
Mar 26 00:10:56 acnlin82 kernel: 08:31: rw=0, want=1824250576,
limit=1722264358
<br>
Mar 26 00:10:56 acnlin82 kernel: attempt to access beyond end of device
<br>
<br>
Then we had to reboot, and the filesystem is basically shot.
<br>
<br>
Thanks
<br>
-Sev
<br>
<br>
Sev Binello wrote:
<br>
Hi -
<br>
<br>
We have had 3 rather major occurrences of ext3 filesystem
corruption
<br>
lately,
<br>
i.e. so bad we couldn't even mount, and fsck didn't help.
<br>
<br>
I am looking for pointers that could help us investigate the
root
<br>
cause.
<br>
<br>
In general...
<br>
We are running RedHat WS 3 Update 6, 2.4.21-40.2.ELsmp or
<br>
2.4.21-37.ELsmp
<br>
<br>
We have a small SAN system that looks like this
<br>
3 NFS servers, each containing 2 QLogic HBAs
connected to 2
<br>
QLogic switches
<br>
connected to an nStor (now Xyratex) 6TB RAID system
containing 2
<br>
(active-active) controllers.
<br>
<br>
On the first 2 occasions one of the controllers was failed over.
<br>
On the 3rd occasion both SAN switches lost power, and the hosts
and RAID
<br>
lost communication.
<br>
<br>
<br>
On all occasions the QLogic failover driver tried to start up on
the
<br>
alternate HBA.
<br>
<br>
On the first 2 instances we sort of tried to blame the
controller.
<br>
On the 3rd, that was harder to do since the RAID system and the
hosts
<br>
stayed up
<br>
but lost communication.
<br>
<br>
I can provide more detail if anyone has any info on how to
proceed.
<br>
<br>
Thanks
<br>
-Sev
<br>
<br>
<br>
<br>
-- <br>
Sev Binello
<br>
Brookhaven National Laboratory
<br>
Upton, New York
<br>
631-344-5647
<br>
<a class="moz-txt-link-abbreviated" href="mailto:sev@bnl.gov">sev@bnl.gov</a>
<br>
<br>
<br>
</blockquote>
<br>
Damian Menscher
<br>
</blockquote>
<br>
<br>
<pre class="moz-signature" cols="100">--
Sev Binello
Brookhaven National Laboratory
Upton, New York
631-344-5647
<a class="moz-txt-link-abbreviated" href="mailto:sev@bnl.gov">sev@bnl.gov</a>
</pre>
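<br>
As a rough sanity check on the kernel messages quoted above, the logged numbers can be compared against the device size with a little arithmetic. This is only a sketch under two assumptions that the thread does not state: that the 2.4 kernel prints "want" and "limit" in 1 KiB units, and that the ext3 filesystem uses 4 KiB blocks. The helper names are made up for illustration.<br>
<pre wrap="">
```python
# Sanity-check the numbers from the quoted kernel log.
# ASSUMPTIONS (not stated in the thread):
#   - "want=" and "limit=" are reported in 1 KiB units
#   - the ext3 filesystem uses 4 KiB blocks

KIB = 1024
ASSUMED_FS_BLOCK_SIZE = 4 * KIB  # assumption, see above

limit_kib = 1722264358                    # "limit=" from the log
wanted_kib = [1891463980, 1824250576]     # "want=" values from the log
freed_blocks = [3443589120, 2113834232]   # "block =" values from the log


def device_bytes(limit_in_kib):
    """Device size in bytes, given its size in 1 KiB units."""
    return limit_in_kib * KIB


def blocks_on_device(limit_in_kib, block_size):
    """Number of whole filesystem blocks that fit on the device."""
    return device_bytes(limit_in_kib) // block_size


def beyond(values, maximum):
    """Return the values that are larger than the given maximum."""
    return [v for v in values if v > maximum]


total_blocks = blocks_on_device(limit_kib, ASSUMED_FS_BLOCK_SIZE)

print("device size: %.2f TB" % (device_bytes(limit_kib) / 1e12))
print("filesystem blocks (assumed 4 KiB):", total_blocks)
print("freed block numbers past the end:", beyond(freed_blocks, total_blocks))
print("reads past the end (KiB):", beyond(wanted_kib, limit_kib))
print("freed block numbers above 2**31:", beyond(freed_blocks, 2**31))
```
</pre>
Under those assumptions the device works out to about 1.76 TB, which matches the "~ 1.8TB" figure, and both logged block numbers and both read offsets fall past the end of it. That suggests the block pointers themselves are garbage; it does not say how they got that way.<br>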
</body>
</html>