[linux-lvm] Data corruption on large, multi-device filesystem
Randall A. Jones
rajones at svs.gsfc.nasa.gov
Thu Jan 20 14:06:44 UTC 2005
joe at eiler.net wrote:
>I have recently run into this problem also. I have seen it happen on SuSE 9.2,
>Fedora Core 2 and 3, and vanilla kernels 18.104.22.168, 2.6.9, and 2.6.10.
>All of my tests were using xfs.
>It happens whenever 2 or more devices are striped together with a total volume
>size greater than 2TB. I have played with a single 4TB raid (12x 400GB RAID5)
>and did not see any corruption (but I did not fill the disk either).
>I initially saw the problem running video files over Samba, but have recreated
>the problem by simply copying some large (5GB+) files and then checking them.
>I don't see any corruption on the files unless I specify the -i option to
>lvcreate. I usually see data corruption within an hour using my current tests.
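For anyone who wants to repeat the copy-and-check test described above, here is a minimal sketch in Python. The function names and the choice of MD5 are my own assumptions, not the exact procedure used in the report. Note that reading a file back right after copying it may be served from the page cache rather than from disk, so the files need to be large relative to RAM for the comparison to mean much.

```python
import hashlib
import shutil


def file_digest(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source, destination):
    """Copy source to destination; return True if both digests match."""
    shutil.copyfile(source, destination)
    return file_digest(source) == file_digest(destination)
```

Calling copy_and_verify() with a source file on a known-good disk and a destination on the striped LV returns False when the copy does not hash the same as the original.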
To verify: the corruption you are seeing only happens when you have an
LV larger than 2TB and when you use striping, specifically with
lvcreate -i?
Has anyone experienced data corruption with >2TB LV and no striping?
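As an aside on why the 2TB figure stands out: with 512-byte sectors, 2 TiB is exactly the size at which sector numbers no longer fit in 32 bits. The sketch below only shows that arithmetic. It does not show that a 32-bit truncation is the cause here. The device size of 1240 GB is taken from the report quoted below and assumed to be decimal gigabytes.

```python
SECTOR_SIZE = 512  # bytes; the conventional block-layer sector size


def max_addressable_bytes(sector_bits, sector_size=SECTOR_SIZE):
    """Largest volume size addressable with sector numbers of this width."""
    return (2 ** sector_bits) * sector_size


def fits_in_bits(volume_bytes, sector_bits, sector_size=SECTOR_SIZE):
    """True if every sector of the volume has a number below 2**sector_bits."""
    sectors = -(-volume_bytes // sector_size)  # ceiling division
    return sectors <= 2 ** sector_bits


DEVICE_BYTES = 1240 * 10 ** 9  # assumption: 1240 GB in decimal units

for devices in (1, 2, 3):
    size = devices * DEVICE_BYTES
    print(devices, "device(s):", size, "bytes, fits in 32 bits:",
          fits_in_bits(size, 32))
```

One device stays under the 32-bit limit, while two or three devices combined go past it, which at least matches the sizes at which corruption has been reported.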
>Let me know if I can be of any assistance.
>Quoting Jens Beyer <jbe at webde-ag.de>:
>>I get severe data corruption using a logical volume larger
>>than 2 TB. Finally I was able to track down device mapper or
>>lvm as the last suspects.
>>My first guess was problems with the filesystems, but recently
>>I tried using md / RAID0 and didn't have any errors of any
>>kind. I would prefer using LVM since we want to use snapshots
>>to simplify backup, but I have no clue how to debug further.
>>On a system with 3 devices, each larger than 1 TB, and a logical
>>volume striped over all devices, some data gets corrupted while being
>>written to (or read from?) disk. This shows up as md5 or crc sums
>>changing on successive reads of files when the file cache is not
>>involved (the cache is flushed by reading a lot of other data).
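To narrow down where in a file two reads disagree, per-chunk checksums can be compared between passes. This is a rough sketch under my own assumptions (the function names and the 1 MiB chunk size are arbitrary). As noted above, the file cache has to be out of the picture between the two passes, or the second read will not touch the disk.

```python
import zlib


def chunk_checksums(path, chunk_size=1024 * 1024):
    """Return a list with the CRC32 of each consecutive chunk of the file."""
    sums = []
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sums.append(zlib.crc32(chunk))
    return sums


def differing_offsets(first, second, chunk_size=1024 * 1024):
    """Byte offsets of chunks whose checksums differ between two passes.

    Only the chunks present in both lists are compared.
    """
    return [index * chunk_size
            for index, (a, b) in enumerate(zip(first, second))
            if a != b]
```

Running chunk_checksums() once, flushing the cache, running it again and passing both lists to differing_offsets() gives file offsets that can then be checked against stripe boundaries.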
>>On ext2fs there are errors while writing data (kernel: EXT2-fs error
>>(device dm-0): ext2_new_block: Allocating block in system zone -
>> block = 722239884); on other filesystems successive fsck/repair runs
>>show corrupted metadata.
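It may be worth working out where the reported block sits on the volume. The sketch below assumes a 4096-byte ext2 block size, which the report does not state, so treat the result as indicative only. A block number that appears in an error message could also be a product of the corruption itself.

```python
BLOCK_SIZE = 4096   # assumption: ext2 block size, not given in the report
SECTOR_SIZE = 512   # bytes per sector


def block_to_byte_offset(block, block_size=BLOCK_SIZE):
    """Byte offset of the start of a filesystem block."""
    return block * block_size


def block_to_sector(block, block_size=BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """First 512-byte sector covered by a filesystem block."""
    return block * (block_size // sector_size)


reported_block = 722239884  # from the EXT2-fs error message above
offset = block_to_byte_offset(reported_block)
sector = block_to_sector(reported_block)

print("byte offset:", offset)
print("beyond 2 TiB:", offset >= 2 ** 41)
print("sector number needs more than 32 bits:", sector >= 2 ** 32)
```

Under that block-size assumption the block lies past the 2 TiB mark. With 1 KiB blocks it would not, so the actual block size of the filesystem should be confirmed before reading anything into this.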
>>The system setup is
>>- Three Adaptec 29160B SCSI controllers, each with one
>> ATA-disk RAID sized 1240 GB (dual PIII, HP DL360 G2, 2 GB RAM)
>>- Volume group over all three devices, logical volume striped
>> full size (3.7 TB)
>>- Filesystem either ext2fs/ext3fs (1.34), reiserfs (3.6.13) or
>> xfs (2.6.25)
>>- host:~ # lvm version
>> LVM version: 2.00.33 (2005-01-07)
>> Library version: 1.00.21-ioctl (2005-01-07)
>> Driver version: 4.3.0
>>- 2.6.10 vanilla + 2.6.10-udm1 patches
>>The problems were initially discovered on 2.6.8, tracked on 2.6.9-udm,
>>and also occur if only 2 devices (sum 2.4 TB) are used.
>>For a limited time I will be able to further debug the system, though
>>it takes some time to generate more than 2 TB of data
>>(max seq read/write rate is ~80 MB/s).
>>Only dead fish swim with the current
Randall Jones GST NASA Goddard Space Flight Center
HPC Visualization Support http://hpcvis.gsfc.nasa.gov
Scientific Visualization Studio http://svs.gsfc.nasa.gov
rajones at svs.gsfc.nasa.gov Code 610.3 301-286-2239