How to Delete Duplicate Files in Linux

Files on a Linux dedicated server are constantly moved, deleted, copied, and renamed. Over time, you often end up with duplicate files or even entire duplicate directories. Disk space matters on any server, and unnecessary files take up valuable space in a business where space equals money.

Fortunately, there is a Linux tool for those of us who like our servers organized and efficient. It is called fdupes, and it is available in most Linux software repositories. On Red Hat/CentOS/Fedora systems, first add the RPMforge repository, and then install it with:

yum install fdupes

On Debian-based systems, use:

apt-get install fdupes

Once fdupes is installed, finding duplicate files is easy. For example, to find duplicates in the /var/www directory, enter the following at the command line:

fdupes /var/www

If you would like fdupes to prompt you for each set of duplicates it finds and ask which files to preserve, deleting the rest, enter:

fdupes -d /var/www

In both cases, fdupes reads the files in the directory and lists only those with identical contents. If two files differ by even a single byte, fdupes does not consider them duplicates, so you can be sure the files it finds are true duplicates.
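To illustrate what comparing by content means, here is a minimal sketch that groups files in a single directory by checksum. It is not how fdupes itself is implemented; the function name list_dupes_by_hash is made up for this example, and it assumes GNU find, sha256sum, and file names without newlines or backslashes.

```shell
#!/bin/sh
# Hypothetical helper (not part of fdupes): print groups of files in one
# directory whose SHA-256 checksums match. Groups are separated by a blank line.
list_dupes_by_hash() {
    dir=$1
    # Hash every regular file directly inside $dir (no subdirectories),
    # then sort so that identical checksums end up on adjacent lines.
    find "$dir" -maxdepth 1 -type f -exec sha256sum {} + \
        | sort \
        | awk '{
            hash = $1
            name = substr($0, 67)   # 64 hex digits plus two separator characters
            if (hash == prev) {
                # Same checksum as the previous line: part of a duplicate group.
                if (!shown) { print prevname; shown = 1 }
                print name
            } else {
                # New checksum: close the previous group, if one was printed.
                if (shown) print ""
                shown = 0
            }
            prev = hash
            prevname = name
        }'
}
```

Running list_dupes_by_hash /var/www should list roughly the same groups that fdupes /var/www reports, which can be a useful cross-check before deleting anything.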

To scan files in subdirectories, add the “-r” flag:

fdupes -r /var/www

This checks the specified directory and every directory beneath it, which is useful if you have copied files from one directory to another and do not remember which directories contain the identical files. Just be careful not to delete duplicates you need, such as multiple installations of Joomla. For more information on fdupes, type “man fdupes” to read the documentation, or view it online.
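One cautious way to avoid deleting duplicates you need is to review a list of candidates before removing anything. The sketch below relies on the fact that fdupes prints each set of duplicates one file per line, with a blank line between sets. The function name deletion_candidates is made up for this example; it only prints file names and deletes nothing.

```shell
#!/bin/sh
# Hypothetical helper (not part of fdupes): read fdupes-style output on stdin
# and print every file except the first one in each set of duplicates.
# The first file in each set is treated as the copy to keep.
deletion_candidates() {
    awk 'BEGIN { first = 1 }
         $0 == "" { first = 1; next }   # a blank line ends the current set
         first    { first = 0; next }   # skip the first file of each set
                  { print }             # every other file is a candidate'
}
```

For example, fdupes -r /var/www | deletion_candidates > candidates.txt writes the candidates to a file (candidates.txt is an arbitrary name) that you can read through, and edit if necessary, before deciding what to remove.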