Problem backing up millions of files

Michal Ludvig mludvig at logix.net.nz
Wed Apr 1 00:36:29 UTC 2009


Rangi, Jai wrote:
> I have a Linux FTP server that I need to back up. It has 4.5 million
> files and 900GB of disk space.
> The backup client takes around 35 hours to do an incremental backup. A
> simple rsync over an NFS mount (on a Data Domain) takes around 25 hours.

Is the NFS mount tuned? E.g. use TCP instead of UDP, increase the
read/write buffers (the rsize and wsize mount parameters), enable jumbo
frames if possible, etc.
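
Something along these lines on the client, for example (the server name,
export path and buffer sizes below are only placeholders for whatever
your Data Domain actually exports):

  mount -t nfs -o proto=tcp,rsize=32768,wsize=32768,hard,intr \
      datadomain:/backup /mnt/backup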

> The system takes
> way too long just to count the files in the directory and to start
> rsync, even when not many files have changed.

Are you using rsync 3.x? It's not the default on RHEL5; you'd have to
compile it yourself or grab an RPM from elsewhere.

From http://www.samba.org/ftp/rsync/src/rsync-3.0.0-NEWS ...

ENHANCEMENTS:
- A new incremental-recursion algorithm is now used when rsync is
  talking to another 3.x version.  This starts the transfer going more
  quickly (before all the files have been found), and requires much less
  memory.
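
You can check what you are running with rsync --version. A sample
invocation might look like this (the source and destination paths are
only placeholders for your FTP tree and the NFS mount); for a local copy
onto the mount both "ends" of the transfer are the same binary, so a 3.x
rsync on the server should be enough to get the incremental recursion:

  rsync --version | head -1
  # -a = archive mode, -H = preserve hard links, --stats = print a summary
  rsync -aH --stats /srv/ftp/ /mnt/backup/ftp/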

> Does anyone have an idea of a better way to back up 4.5 million
> files? I am looking for any option in rsync that can speed up
> the process.

Try splitting the backup into smaller chunks. For instance, instead of
backing up /home in a single run, do /home/[a-d]* in one run,
/home/[e-h]* in the next, and so on. That will have a positive effect on
memory consumption on both sides of the transfer and perhaps on the
overall performance as well.
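
A rough sketch of how that could look, using the same placeholder paths
as above (adjust the ranges to match how your directories are actually
named):

  # back up /home in alphabetical chunks to keep each file list small
  for chunk in '[a-d]' '[e-h]' '[i-m]' '[n-r]' '[s-z]'; do
      rsync -aH /home/${chunk}* /mnt/backup/home/
  done

One thing to watch out for: if you also use --delete, a directory that
disappears from /home entirely won't be cleaned up on the destination,
because no chunk ever mentions it.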

Michal
--
* http://s3tools.org/s3cmd - Amazon S3 backup project (GPL)