[linux-lvm] Data deduplication in LVM?

Stuart D. Gathman stuart at bmsi.com
Wed Jun 10 22:30:52 UTC 2009


On Wed, 10 Jun 2009, Roy Sigurd Karlsbakk wrote:

> Is this nonsense, or might it be an idea?

It's an idea.  With loosely coupled distributed computing, deduplication on
the nodes is not all that helpful, since each node needs its own copy anyway.
However, it is very helpful for backup.  One OSS backup product that does
deduplication is BackupPC (written in Perl).  On the backup server, every file
is hard-linked to a name in a special pool directory, and that name is the
file's MD5 checksum (plus some fiddly logic to handle metadata), so identical
files are stored only once.  Handling the metadata separately also lets the
backup repository run as an ordinary user while still using the OS filesystem
to store the files.
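
To make the hard-link idea concrete, here is a rough sketch in Python.  It is
not BackupPC's actual code (that is Perl, and it also deals with collisions
and metadata, which this ignores).  The names md5_of_file, store_deduplicated
and pool_dir are made up for illustration.

```python
import hashlib
import os
import shutil


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's contents, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_deduplicated(src, pool_dir, dest):
    """Store src at dest as a hard link to a pool entry named by its MD5.

    Hypothetical helper: ignores hash collisions and file metadata.
    """
    os.makedirs(pool_dir, exist_ok=True)
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    pool_path = os.path.join(pool_dir, md5_of_file(src))
    if not os.path.exists(pool_path):
        # First time this content is seen: copy it into the pool.
        shutil.copyfile(src, pool_path)
    # Every backup name is just another hard link to the pooled copy.
    os.link(pool_path, dest)
    return pool_path
```

Since hard links cannot cross filesystems, the pool and the backup trees have
to live on the same filesystem for this to work.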

Another product that implements its own datastore is Box Backup (written in C++).

-- 
	      Stuart D. Gathman <stuart at bmsi.com>
    Business Management Systems Inc.  Phone: 703 591-0911 Fax: 703 591-6154
"Confutatis maledictis, flammis acribus addictis" - background song for
a Microsoft sponsored "Where do you want to go from here?" commercial.



