Unlink performance

Andreas Dilger adilger at sun.com
Mon Oct 27 19:51:23 UTC 2008

On Oct 27, 2008  10:30 +0100, Alex Bligh wrote:
> --On 27 October 2008 11:40:21 +0200 Markus Peuhkuri <puhuri at iki.fi> wrote:
>> However, my delete script malfunctioned, and at one point it had
>> 2x100 GB of files to delete, so it ran 'rm file' one after another for
>> those 400 files, about 500 MB each.  The result was that the
>> real-time data processing became too slow and buffers overflowed.
> Are all the files in the same directory? Even with HTREE there seem
> to be cases where this is surprisingly slow. Look into using nested
> directories (e.g. A/B/C/D/foo where A, B, C, D are truncated hashes
> of the file name).
> Or, if you don't mind losing data in a power off and the job suits,
> unlink the file name immediately your processor has opened it. Then
> it will be deleted on close.

No, the problem is more likely the ext3 indirect block pointer
updates for large files.  These also put a lot of blocks into the
journal, and if the journal fills up it can block all other operations.

If you run with ext4 extents the unlink time is much shorter, though
you should test ext4 yourself before putting it into production.

Doing "unlink; sleep 1" between files will keep the traffic to the
journal lower, as would deleting fewer files more often, so that you
never delete 200GB of data at one time when you have real-time
requirements.  If you are not creating files faster than 1/s, the
unlinks should be able to keep up.
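
As a rough sketch of the "unlink; sleep 1" idea, assuming the delete
script can be written in Python: the function name throttled_unlink and
its parameters are made up for illustration and are not part of any
existing tool.  The 1 second default is just the figure from above, not
a tuned value.

```python
import os
import time


def throttled_unlink(paths, delay_s=1.0, sleep=time.sleep):
    """Unlink each path in turn, pausing delay_s seconds between files.

    Returns the list of paths that were actually removed.
    """
    removed = []
    for i, path in enumerate(paths):
        if i > 0:
            # Space the unlinks out so they do not all hit the journal at once.
            sleep(delay_s)
        try:
            os.unlink(path)
        except FileNotFoundError:
            continue  # already gone; nothing to do
        removed.append(path)
    return removed
```

Called as throttled_unlink(list_of_old_files), this removes one file per
second, which should be fine as long as new files arrive more slowly
than that.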

Cheers, Andreas
Andreas Dilger
Sr. Staff Engineer, Lustre Group
Sun Microsystems of Canada, Inc.

More information about the Ext3-users mailing list