Eliminating duplicate photos

Nifty Fedora Mitch niftyfedora at niftyegg.com
Mon Sep 29 22:18:33 UTC 2008


On Mon, Sep 29, 2008 at 02:09:05PM -0430, Patrick O'Callaghan wrote:
> On Mon, 2008-09-29 at 14:00 -0400, Trapper wrote:
> > Itamar - IspBrasil wrote:
> > > create a list of md5 of all files,
> > >
> > > with md5 you will find duplicated files.
> > >
> > > On 9/29/2008 9:04 AM, Timothy Murphy wrote:
> > >> What is the best way of eliminating duplicate photos
> > >> on a number of machines, all running Fedora or CentOS?
> > >>
> > >> I suppose one could ask the same question about files generally;
> > >> how to tag or delete duplicates.
> > >>
> > >>    
> > I have a problem similar to Timothy's. If I run "md5sum *" on a folder, 
> > in a terminal,  it lists all the sums. My problem is that I have several 
> > thousand files. Is there some way I can output the results to a text 
> > file? Can't copy and paste unless there's some way for me to adjust the 
> > terminal to allow the last several thousand lines to display. Then I'm 
> > also going to have to sort all those lines into some alphabetical order 
> > to reasonably detect duplicate sums. Any ideas?
> 
> You're using Linux here. Anything that outputs text to a terminal can
> send it to a file or to another program. You need to read up on Shell
> redirection and filters, e.g.:
> 
> md5sum * > sums
> 
> or
> 
> md5sum * | sort > sorted_sums
> 
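Putting Patrick's two commands together gives a minimal sketch of the whole job Trapper describes: save the sorted sums to a file, then let uniq print only the lines whose first 32 characters (the MD5 hash) repeat. It assumes GNU coreutils, where `-w` is short for `--check-chars` and `-D` for `--all-repeated`. The file names are made up for the demonstration, and the glob is narrowed to `*.jpg` because with a bare `*` the shell may create the output file before the glob is expanded, so the list can end up checksumming itself.

```shell
#!/bin/bash
# Throwaway directory with two identical "photos" and one unique one
# (hypothetical file names, just for the demonstration).
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'same bytes\n' > a.jpg
printf 'same bytes\n' > b.jpg
printf 'different\n'  > c.jpg

# Checksum the photos and sort, so equal hashes end up on adjacent lines.
md5sum *.jpg | sort > sorted_sums

# Compare only the 32-character hash and print every repeated line.
uniq -w 32 -D sorted_sums
```

Only the a.jpg and b.jpg lines are printed; c.jpg has a unique hash and is skipped.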

The script below is not very general, but it can be edited to suit
your needs. The SIZER value makes it easy to look only at big files,
such as duplicate ISO images. The odd md5sum value
d41d8cd98f00b204e9800998ecf8427e is the checksum of an empty file; it
pops up often, so it is excluded.

============================================================
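#!/bin/bash
# Copyright (C) 1985-2008 by Tom Mitchell
#
# This program is free software, licensed under the GNU GPL, >=2.0. http://www.gnu.org/.
# This software comes with absolutely NO WARRANTY. Use at your own risk!
#
# Size filter passed to find: the commented-out line only looks at files
# over 10240 KiB (handy for duplicate ISO images); +0 skips empty files.
#SIZER=' -size +10240k'
SIZER=' -size +0'
#
# Directories to search, separated by spaces. DIRLIST and SIZER are left
# unquoted below on purpose so the shell splits them into arguments.
DIRLIST=". "
# Checksum every matching file, drop the empty-file checksum and any line
# containing LemonGrassWigs, then sort so equal checksums are adjacent.
find $DIRLIST -type f $SIZER -print0 | xargs -0 md5sum |
	grep -E -v "d41d8cd98f00b204e9800998ecf8427e|LemonGrassWigs" |
	sort > /tmp/looking4duplicates
# Ring the terminal bell three times to announce that the scan has finished.
tput bel; sleep 2
tput bel; sleep 2
tput bel; sleep 2
# Compare only the first 32 characters (the MD5 hash) and show every group
# of repeated lines, each group preceded by a blank line.
uniq --check-chars=32 --all-repeated=prepend < /tmp/looking4duplicates | less
============================================================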
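The excluded value d41d8cd98f00b204e9800998ecf8427e is the MD5 of zero bytes, so without the filter every empty file would be reported as a duplicate of every other empty file. A small sketch to check that, and to see what the default SIZER test keeps, assuming GNU find and coreutils; the file names are made up.

```shell
#!/bin/bash
# The MD5 of empty input is the "odd" value the script filters out.
printf '' | md5sum
# prints: d41d8cd98f00b204e9800998ecf8427e  -

# Hypothetical files: two empty ones and one with content.
dir=$(mktemp -d)
: > "$dir/empty1"
: > "$dir/empty2"
printf 'photo data\n' > "$dir/full"

# Without a size test, both empty files share that checksum ...
find "$dir" -type f -print0 | xargs -0 md5sum | sort

# ... while '-size +0' (more than zero 512-byte blocks, rounded up)
# already skips them, so only "full" is checksummed.
find "$dir" -type f -size +0 -print0 | xargs -0 md5sum
```

With the default SIZER of +0, find already skips empty files, so the egrep exclusion mostly acts as a safety net for when SIZER is edited or removed.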

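Timothy also asked how to tag or delete the duplicates. One minimal approach, assuming sorted md5sum output such as /tmp/looking4duplicates, is to keep the first file of each group of equal hashes and print the rest as removal candidates. The helper name dup_candidates and the sample data are hypothetical; the sketch assumes GNU md5sum's "HASH  NAME" layout and file names without newlines or backslashes (md5sum escapes those). It only prints names and deletes nothing.

```shell
#!/bin/bash
# dup_candidates (hypothetical helper): print every file whose hash equals
# the line before it, i.e. all but the first file of each group.
# Input: sorted md5sum output, one "HASH  NAME" per line.
# Characters 1-32 hold the hash, 33-34 the separator, 35 onward the name.
dup_candidates() {
    awk '{
        hash = substr($0, 1, 32)
        if (hash == prev) print substr($0, 35)
        prev = hash
    }' "$@"
}

# Demonstration on made-up data; on real data run
#   dup_candidates /tmp/looking4duplicates
sample=$(mktemp)
cat > "$sample" <<'EOF'
11111111111111111111111111111111  photos/a.jpg
11111111111111111111111111111111  backup/a copy.jpg
22222222222222222222222222222222  photos/b.jpg
EOF
dup_candidates "$sample"
```

This prints `backup/a copy.jpg`; names containing spaces survive because the name is taken by column rather than by field. Review the list before removing anything; moving candidates into a holding directory is easier to undo than rm.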

-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?




More information about the fedora-list mailing list