I have folders that contain backup files and configs. Many of these backups are taken daily, and most of them don't change from day to day. The folder looks like this:
config_A_2021-01-19.txt
config_A_2021-01-18.txt
config_A_2021-01-17.txt
config_B_2021-01-19.txt
config_B_2021-01-18.txt
config_B_2021-01-17.txt
It is possible that the first three files are identical to each other, and likewise the last three. I needed a script to clean them up by keeping only the most recent copy of each set of duplicates:
config_A_2021-01-19.txt
config_B_2021-01-19.txt
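Before deleting anything, it is easy to confirm which files really are byte-for-byte identical by grouping them by content hash; this is the same check the script below relies on (sha256sum and sort are standard tools):

# Identical files end up adjacent, sharing the same first column
sha256sum config_*.txt | sort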
So, here we go:
declare -A hashes   # maps content hash -> the file we are keeping (requires bash 4+)

for file1 in *
do
    # Hash the file contents; identical files produce the same hash
    sha256_output=$(sha256sum "$file1" | awk '{print $1}')
    if [ -z "${hashes[$sha256_output]}" ]; then
        # First file with this hash: remember it
        hashes[$sha256_output]=$file1
    else
        echo "Duplicate with files:"
        file2="${hashes[$sha256_output]}"
        echo "$file1"
        echo "$file2"
        echo '---------------------'
        # -nt compares modification times; keep the newer file, delete the older
        if [ "$file1" -nt "$file2" ]; then
            printf '%s\n' "$file1 is newer than $file2"
            rm "$file2"
            hashes[$sha256_output]=$file1
        else
            printf '%s\n' "$file2 is newer than $file1"
            rm "$file1"
            hashes[$sha256_output]=$file2
        fi
    fi
done
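To try it out, save the script as, say, cleanup_backups.sh (the filename is just an example) and run it from inside the backup folder. If you want a harmless dry run first, replace both rm commands with echo rm so the script only prints what it would delete:

cd /path/to/backups
bash cleanup_backups.sh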
TODO: The only current problem with the script is that if it finds a directory in the folder, it will error out. To be fixed later.
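Until then, a minimal guard (my suggestion, not part of the original script) is to skip anything that is not a regular file at the top of the loop, right before the sha256sum call:

    # Skip directories and anything else that is not a regular file
    [ -f "$file1" ] || continue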
My name is Omar Qunsul. I write these articles mainly as a future reference for myself, so I dedicate some time to making them look shiny and sharing them with the public.
You can find me on Twitter @OmarQunsul, and on LinkedIn.