Find duplicated files

Marcin Struzak marcin-list at struzak.com
Wed Oct 5 19:28:42 UTC 2005


> > > Is there any way I can know which files are
> > > duplicated in some directories?
> >
> > One approach is to use find, then use md5sum, then use
> > sort on the output of md5sum, then look for duplicate
> > md5's.
> >
> >   $ find /home -type f -print0 \
> >     | xargs -0 md5sum \
> >     | sort ... \
> >     | less
>
> Not a bad idea.
>
> The only thing with this good idea is that it needs more of a script to
> actually look for the duplicate md5sums. A huge directory will most
> definitely have an issue.

In a two-step process, you could do:

  find . -type f -print0 | \
  xargs -0 md5sum | \
  sort | \
  cut -c1-32 | \
  uniq -d

(cut strips the path/file names, which will most probably differ, at least
at the path level, and uniq -d lists only the md5sums that appeared more
than once.)  Now, for each checksum returned, run the find again, this time
with a grep; using egrep you could even check them all at once:

  find . -type f -print0 | \
  xargs -0 md5sum | \
  sort | \
  egrep '(<val1>|<val2>|...)'

--Marcin



