Bash script to delete duplicate files in system


The command below finds duplicate files by comparing file sizes first and then MD5 hashes. It does not delete anything; it only lists the duplicates so that you can decide which ones to remove.

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -d" " -f3

Example:

$ find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
ae474fbc97b9d6ff6e1fb37c2b5c0a1d  ./abc
ae474fbc97b9d6ff6e1fb37c2b5c0a1d  ./abc2

$ find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -d" " -f3
./abc
./abc2
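
The stages of this pipeline can be easier to follow when written one per line. Below is a minimal sketch under stated assumptions, not a drop-in replacement: the function name `list_dups` is hypothetical, GNU find, xargs, and md5sum are assumed, and file names are assumed to contain no newlines or backslashes. It uses `cut -c 35-` for the last step so that paths containing spaces are printed whole, which `cut -d" " -f3` would truncate.

```shell
#!/bin/bash
# Hypothetical helper: list duplicate files under a directory (default: .).
# Each output line is a path; groups of identical files are separated by a blank line.
list_dups() {
    local dir="${1:-.}"
    find "$dir" -not -empty -type f -printf '%s\n' |        # 1. print the size of every non-empty file
        sort -rn | uniq -d |                                # 2. keep only sizes that occur more than once
        xargs -I{} find "$dir" -type f -size {}c -print0 |  # 3. list the files that have those sizes
        xargs -0 -r md5sum |                                # 4. hash only those candidate files
        sort | uniq -w32 --all-repeated=separate |          # 5. keep lines whose 32-character hash repeats
        cut -c 35-                                          # 6. drop the hash and separator, keep the path
}
```

Calling `list_dups ~/Downloads` would list the duplicates under that directory without removing anything.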

Find Duplicate Files (based on MD5 hash)

The command below lists duplicate files using only their MD5 hashes, so that you can later delete the ones you choose. It calculates the md5sum of every file, sorts the output, uses uniq to keep only the lines whose hash is repeated, and uses cut to remove the hash from the result.

find -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-

Example:

$ find -type f -exec md5sum '{}' ';' | sort | uniq --all-repeated=separate -w 33 | cut -c 35-
./abc
./abc2
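
The column numbers passed to uniq and cut follow from the layout of an md5sum output line: a 32-character hash, two separator characters, then the file name starting at column 35. The short sketch below checks that layout on data read from standard input, which md5sum reports under the name `-`. The variable names are only for illustration.

```shell
#!/bin/bash
# Hash some text from stdin; md5sum prints "<hash>  -" for standard input.
line=$(printf 'hello\n' | md5sum)

# Columns 1-32 hold the hash, columns 33-34 the separator, column 35 onward the name.
hash=$(printf '%s\n' "$line" | cut -c 1-32)
name=$(printf '%s\n' "$line" | cut -c 35-)

echo "hash length: ${#hash}"
echo "name: $name"
```

Since every hash is exactly 32 characters long, comparing 32 or 33 characters with `uniq -w` groups the same lines.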

The script below finds and removes ALL duplicate files. It does not keep one copy from each set of duplicates; it deletes every file that has a duplicate. Please test it on sample files before real use.

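#!/bin/bash
#Filename: rmdups.sh
#Description: Find and remove duplicate files in the current directory.
#Note: $8 is assumed to be the file name column of ls -l; adjust it if your ls prints the name in another column.
# List files sorted by size so that files of equal size are adjacent.
ls -lS | awk 'BEGIN {
# Skip the "total" line, then read the first file entry.
getline;getline;
name1=$8; size=$5
}
{ name2=$8;
if (size==$5)
{
# Same size as the previous file: compare the MD5 checksums.
"md5sum "name1 | getline; csum1=$1;
"md5sum "name2 | getline; csum2=$1;
if ( csum1==csum2 )
{print name1; print name2 }
};
size=$5; name1=name2;
}' | sort -u > duplicate_files
# duplicate_sample gets one line per checksum, with the name written as ^name$.
cat duplicate_files | xargs -I {} md5sum {} | sort | uniq -w 32 | awk '{ print "^"$2"$" }' | sort -u > duplicate_sample
echo Removing..
# Remove the lines found only in duplicate_files. Since the names in duplicate_sample
# are wrapped in ^ and $, none of them match, so every listed duplicate is deleted.
comm duplicate_files duplicate_sample -2 -3 | tee /dev/stderr | xargs rm
echo Removed duplicate files successfully.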
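
If you would rather keep one copy from each set of duplicates, the selection step could be sketched as follows. This is a minimal sketch under stated assumptions and not the script above: the function name `extra_copies` is hypothetical, GNU find and md5sum are assumed, only the top level of the given directory is examined, and file names are assumed to contain no newlines or backslashes. The function only prints the files it would remove.

```shell
#!/bin/bash
# Hypothetical helper: print every non-empty file in the top level of a directory
# whose MD5 hash was already seen on an earlier file (in sorted order).
# The first file of each group of identical files is not printed, so it is kept.
extra_copies() {
    local dir="${1:-.}"
    find "$dir" -maxdepth 1 -type f -not -empty -exec md5sum {} + |
        sort |
        awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }'
}
```

After reviewing the printed list, it could be passed to rm, for example with `extra_copies . | xargs -r -d '\n' rm --` on a system with GNU xargs.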


9 Responses

  1. Pentarson says:

    Very very useful post. After searching for several concepts I’ve just found from your blog post what I was looking for. Bash script to remove the duplicate files in system is actually the thing which I was needed. Thanks for your great help.

  2. Polskii says:

    I was searching for a free tool which finds and eliminates duplicate photos,files.In my search around i got one Duplicate File Finder it is totally free and beneficial.

  3. rusl says:

    This script does not work! Beware! It is overzealous. I tested it with a sample of 6 files of which there were 4 unique but it only left 2. The files are all close. Perhaps the MD5 was identical? Anyway, be careful! This script is not safe! Test before use. Maybe for different sorts of files it would work? (I was using simple txt files)

    Cheers,
    rusl

  4. rusl says:

    OK, the problem is this script deletes ALL duplicate files and does not leave a single sample of each duplicate (which the comments say it is doing in the code)

    I did a little more digging and came across the free program fdupes (in the ubuntu main repo no less) which does this properly. I just ran fdupes -dN and it did what I wanted. See the fdupes manpage for an explanation

    Hope this helps someone. I only used this script because I was being lazy and it was the first hit on google. It is dangerous because it doesn’t work as advertised. And it is silly because you might as well just install and use fdupes if it works on your system.

    Cheers,
    reusl

  5. admin says:

    Thanks reusl, good catch. It indeed removes all duplicate files; I have updated the comment in the script.

  6. olibre says:

    Remove all the unwanted, duplicated files from your machine. Software name is DuplicateFilesDeleter.

  7. Gary1991 says:

    Delete duplicate files with ease!
    Try DuplicateFilesDeleter program and get rid of duplicate files.
    Thank you!
