ext3 +2TB fs

Andreas Dilger adilger at clusterfs.com
Sat Feb 26 02:40:59 UTC 2005

On Feb 25, 2005  18:58 -0600, Damian Menscher wrote:
> On Fri, 25 Feb 2005, Andreas Dilger wrote:
> >I would start by testing whether the large device works properly by
> >writing some pattern (e.g. 64-bit byte offset) to the start of each
> >4k block on disk, and then read them back to verify nothing has been
> >overwritten.
> Out of curiosity, how would one do this?  All I can think of is to 
> script something to call dd with the seek/skip argument.  But making 
> 3.5TB/4k = a billion calls to dd out a shell seems kinda silly.  What do 
> you suggest?

I'd say a simple C program like the following (not much error checking,
untested, be careful ;-):

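#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

#define BUFSZ 4096

int main(int argc, char *argv[])
{
	/* long long array keeps the 4k buffer 8-byte aligned */
	long long buf[BUFSZ / sizeof(long long)] = { 0 };
	long long offset, *bufp = buf;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <device>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDWR | O_LARGEFILE);
	if (fd < 0) {
		fprintf(stderr, "open %s: %s\n", argv[1], strerror(errno));
		return 1;
	}

	/* stamp each 4k block with its own byte offset until the device is full */
	while (write(fd, buf, sizeof(buf)) == sizeof(buf))
		*bufp += sizeof(buf);

	printf("end at %llu: %s\n",
	       (unsigned long long)lseek64(fd, 0, SEEK_CUR), strerror(errno));
	fsync(fd);	/* push the data out to the device before reading it back */

	/* read every block back and report any whose stamp does not match */
	offset = lseek64(fd, 0, SEEK_SET);
	errno = 0;
	while (read(fd, buf, sizeof(buf)) == sizeof(buf)) {
		if (*bufp != offset)
			fprintf(stderr, "offset %lld data is %lld\n",
				offset, *bufp);
		offset += sizeof(buf);
	}

	printf("end at %llu: %s\n",
	       (unsigned long long)lseek64(fd, 0, SEEK_CUR), strerror(errno));

	close(fd);
	return 0;
}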

> >Next, create directories (you may need as many as 16k) to get one that
> >is in the >2TB part of the disk.  You can tell by the inode number and
> >the output from dumpe2fs.  If you write a file in that directory it
> >should allocate space at > 2TB on the disk, and debugfs "stat file" will
> >tell you the block layout of the file.
> As I understand it, the first test is to identify if the flaw exists in 
> the kernel block-device code, and the second test whether the bug is in 
> the ext2 code?


> Anyone out there actually using a >2TB filesystem on a 32-bit machine?

I've heard sporadic reports about it, but there is definitely a problem
somewhere after 2TB.  For Lustre we don't care so much about gigantic
individual filesystems because we aggregate them at a higher level (100's
of TB) and having "smaller" (i.e. 2TB) filesystems allows more parallelism
for IO, e2fsck, etc.

Cheers, Andreas
Andreas Dilger
http://members.shaw.ca/adilger/             http://members.shaw.ca/golinux/


More information about the Ext3-users mailing list