[linux-lvm] Data deduplication in LVM?
Les Mikesell
lesmikesell at gmail.com
Thu Jun 11 15:35:09 UTC 2009
Les Mikesell wrote:
> Roy Sigurd Karlsbakk wrote:
>> On 11. juni. 2009, at 00.30, Stuart D. Gathman wrote:
>>
>>> One OSS backup product that does deduplication is BackupPC (written
>>> in Perl). In the backup server, every file gets hard linked to a name
>>> in a special directory that is its md5 checksum (plus some fiddly
>>> logic to handle metadata).
>>
>>
>> This sounds like file-level deduplication. Most storage systems using
>> dedup use block-level dedup. NetApp is one example; they dedup
>> everything with 4k blocks, doing the actual deduplication at night.
>
> Yes, it is a different concept. However it does work very well when you
> are storing your backups on a filesystem without block-level dedup. And
> that is probably the place where you have the most redundancy - or if
> you don't already, you'll be able to store a much longer history.
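
For anyone who has not seen the pooling scheme quoted above, here is a
minimal Python sketch of file-level dedup through hard links. It is not
BackupPC's code (BackupPC is Perl and has extra logic for metadata); the
names file_md5 and pool_file and the pool layout are made up for
illustration, and it assumes the pool lives on the same filesystem as
the files. It also trusts the checksum alone, which a real tool should
not do without comparing contents.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file's full contents."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def pool_file(path, pool_dir):
    """Hard link `path` into `pool_dir` under its checksum (hypothetical layout).

    Returns True if identical content was already pooled, False otherwise.
    """
    os.makedirs(pool_dir, exist_ok=True)
    pooled = os.path.join(pool_dir, file_md5(path))
    if not os.path.exists(pooled):
        os.link(path, pooled)  # first copy seen: it becomes the pool entry
        return False
    if os.path.samefile(path, pooled):
        return True  # already linked to the pool entry, nothing to do
    # Duplicate content: swap the file for a hard link to the pool entry.
    # Assumption: matching MD5 means matching content (no byte comparison).
    tmp = path + ".dedup-tmp"
    os.link(pooled, tmp)
    os.replace(tmp, path)
    return True
```

Every duplicate costs one directory entry instead of a second copy of
the data, which is also why the archive ends up with so many hard links.
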
Apologies for following up my own post, but this does remind me of a
slightly related problem that someone here might have solved. The
BackupPC archive ends up containing such a large number of directory
entries and hard links that it is typically impractical to copy by any
file-oriented means or even rsync. A recurring topic on the BackupPC
mailing list is how to make a copy for offsite storage.

Personally I use a RAID1 created with 3 mirror members and periodically
swap one out and resync, but that's not very elegant. Is there a better
way, or one that could be incrementally updated across a WAN? Does LVM
have a mechanism like ZFS's incremental snapshot send/receive? (I'm not
sure that would work either, but it sounds promising.) Is there any
other way to do a block-oriented remote copy? Would LVM mirroring work
as well as or better than md-device RAID? The partition can stay mounted
while the RAID rebuilds, but realistically not much else can be
happening because of the performance impact, and I unmount momentarily
while removing the member to get a clean filesystem.

Are there tricks with DRBD or perhaps RAID over iSCSI that would let a
periodic sync work incrementally, well enough to use over a WAN?
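
To make "block-oriented incremental copy" concrete, here is a rough
Python sketch of the idea: hash fixed-size blocks, compare against the
hashes from the previous run, and send only the blocks that differ. This
is not a feature of LVM, md or DRBD; the function names and the 4k block
size are assumptions for illustration. It assumes the source is
unmounted or a snapshot while it is read, and that the target already
exists and is not larger than the source.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, borrowed from the 4k example above


def block_digests(path, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of a file or device."""
    digests = []
    with open(path, "rb") as device:
        for block in iter(lambda: device.read(block_size), b""):
            digests.append(hashlib.sha256(block).digest())
    return digests


def changed_blocks(old_digests, new_digests):
    """Return indexes of blocks that are new or differ from the last run."""
    changed = []
    for index, digest in enumerate(new_digests):
        if index >= len(old_digests) or old_digests[index] != digest:
            changed.append(index)
    return changed


def apply_blocks(source_path, target_path, indexes, block_size=BLOCK_SIZE):
    """Copy only the listed blocks from source to an existing target."""
    with open(source_path, "rb") as source, open(target_path, "r+b") as target:
        for index in indexes:
            source.seek(index * block_size)
            target.seek(index * block_size)
            target.write(source.read(block_size))
```

In a WAN setting only the digest list and the changed blocks would need
to cross the link. The source still has to be read in full on every
pass, so this reduces transfer, not local I/O.
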
--
Les Mikesell
lesmikesell at gmail.com