(LONG) Delay when writing to ext4 LVM after boot
adilger at dilger.ca
Wed Apr 24 15:58:21 UTC 2013
On 2013-04-23, at 18:14, Ken Bass <daytooner at gmail.com> wrote:
> I have a large LV, about 6.5T, consisting of 4 physical drives of various sizes. The LV is formatted as ext4. There is no raid involved (hardware or software).
> After I first boot, if I try to write a large file (>~ 80M) to this LV, the write hangs for about 1 minute or more, then continues on at full speed and finishes successfully. Writes of small files don't show this delay. After that first write and delay, all subsequent writes to other large files proceed at full speed.
This is a problem that I am very familiar with for large filesystems. The issue is that if the filesystem is relatively full, the first write needs to load and search a lot of the block bitmaps to try and find enough space to allocate blocks for the write. Depending on how it was formatted, each block bitmap read needs a seek.
> I am currently running Fedora 17 64bit (kernel 3.8.4-102.fc17.x86_64) but have noticed this also in previous systems (both 64 and 32bit). With smaller file systems ( < 1T ), there was a delay, but it was small, and it increased significantly as I increased the LV size.
Might I guess that this filesystem was formatted as ext3 and not as ext4? In particular, is the "flex_bg" option missing from the Features line in the "dumpe2fs -h /dev/XXX" output? This feature is enabled by default if formatting as ext4, but not as ext3.
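For reference, a minimal sketch of that check. The helper name has_flex_bg is only illustrative, and it assumes the header output contains a "Filesystem features:" line, as current e2fsprogs prints it:

```shell
#!/bin/sh
# has_flex_bg is a hypothetical helper name, not part of e2fsprogs.
# It reads "dumpe2fs -h" output on stdin and succeeds only if the
# "Filesystem features:" line lists flex_bg as a whole word.
has_flex_bg() {
    grep -i '^Filesystem features:' | grep -qw 'flex_bg'
}

# Usage (needs root; /dev/XXX is the placeholder device name used above):
#   dumpe2fs -h /dev/XXX 2>/dev/null | has_flex_bg \
#       && echo "flex_bg enabled" || echo "flex_bg missing"
```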
The flex_bg feature will allocate the block bitmaps in large chunks on the disk so that they can be loaded quickly at mount and e2fsck time. On a 16TB filesystem with 10 ms seek time, in the worst case without flex_bg it could take up to 20 minutes to load all of the bitmaps at boot time...
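As a rough check on that figure, here is the arithmetic as a small sketch. It assumes the default 4 KiB block size (so 32768 blocks, or 128 MiB, per block group), treats 16TB as 16 TiB, and counts one 10 ms seek per block bitmap:

```shell
#!/bin/sh
# Worst case without flex_bg: one seek per block group to read its bitmap.
# Assumed values (not measured): 4 KiB blocks, 32768 blocks per group.
fs_bytes=$((16 * 1024 * 1024 * 1024 * 1024))   # 16 TiB filesystem
group_bytes=$((32768 * 4096))                   # 128 MiB per block group
seek_ms=10                                      # seek time from the text

groups=$((fs_bytes / group_bytes))
total_s=$((groups * seek_ms / 1000))

echo "block groups: $groups"
echo "worst-case load time: $total_s s (~$((total_s / 60)) min)"
# prints:
#   block groups: 131072
#   worst-case load time: 1310 s (~21 min)
```

That is in line with the "up to 20 minutes" estimate.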
> I have run e2fsck with the -D option (before attempting a write), which made no difference. Also, fwiw, I am mounting this with the default options. I've tried other options that were suggested to tweak ext4, but, again, no effect. This LV is also not my system (root) partition - that is on a separate physical drive.
> Any ideas? Suggestions?
Unfortunately, flex_bg is a format-time option, so you would need a full backup-restore to benefit from it for your filesystem.
If there is a delay between mounting and the first write, you could prefetch the bitmaps with "dumpe2fs /dev/XXX > /dev/null" so that it loads all of the bitmaps before they are needed. Some people do this in a startup script as a workaround for the initial write slowness.
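A sketch of such a startup helper might look like the following. The function name prefetch_bitmaps and the DUMPE2FS override variable are hypothetical, added here only for illustration and testing, and where the call goes (rc.local, a systemd unit, etc.) depends on the distribution:

```shell
#!/bin/sh
# prefetch_bitmaps is a hypothetical helper: it runs dumpe2fs over the
# whole device and discards the output, so the bitmaps are read from
# disk before the first large write needs them.
prefetch_bitmaps() {
    dev=$1
    # DUMPE2FS is an assumed override hook for testing; by default the
    # dumpe2fs binary found in PATH is used.
    "${DUMPE2FS:-dumpe2fs}" "$dev" > /dev/null 2>&1
}

# Example call (needs root; /dev/XXX is the placeholder device name
# used above). Backgrounding it keeps the boot from waiting on the scan:
#   prefetch_bitmaps /dev/XXX &
```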
Changing the allocation policy would not help in your case, since the large file would need more blocks than could be satisfied by the early groups. That is why you don't see a slowdown for small files.
In theory, it would be possible to modify resize2fs to co-locate the bitmaps on disk to enable flex_bg, in the same manner as it currently moves the inode table to add group descriptor blocks, but that would need some non-trivial development.