Bitrot, Part 2

This article links to a simple script I’ve used for over a decade to detect corrupted files. It detects and reports files that have been changed, added, deleted, or possibly moved within the same directory structure.

Obviously, this is mostly for “read-only” or “read-mostly” files that won’t normally change. Detecting corruption in files that are actively updated (e.g. database files) is an entirely different can of worms and beyond the scope of this article.

However, this script does help identify a large class of corruption in (relatively) static files, and hopefully lets you recover if you have redundant copies. It helps most where read-only files have no built-in mechanism for detecting corruption, which is the usual case.
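To make the idea concrete, here is a rough sketch of the general technique: keep a manifest of checksums, build a fresh one later, and compare the two. This is not the checkfiles script itself. The function names `make_manifest` and `compare_manifests` are made up for illustration, and the sketch assumes GNU md5sum output ("hash, two spaces, path") and paths without newlines.

```shell
# Hypothetical sketch, NOT the checkfiles script itself.

# make_manifest DIR: hash every regular file under DIR, sorted by path.
# Files named .md5* are skipped so the bookkeeping files do not hash themselves.
make_manifest() {
    (cd "$1" && find . -type f ! -name '.md5*' -exec md5sum {} + | sort -k 2)
}

# compare_manifests OLD NEW: both files hold md5sum-style lines.
# Differences are printed one per line.
compare_manifests() {
    awk '
        # Old manifest: remember the hash recorded for each path.
        FILENAME == ARGV[1] { h = $1; sub(/^[^ ]+ +/, ""); old[$0] = h; next }
        # New manifest: classify each path against the old one.
        {
            h = $1; sub(/^[^ ]+ +/, "")
            if (!($0 in old))      { print "NewFile:  " h " " $0; newpath[h] = $0 }
            else if (old[$0] != h) { print "Changed:  " old[$0] " " h " " $0 }
            seen[$0] = 1
        }
        # Paths from the old manifest that never reappeared. If the same hash
        # showed up under a new path, the file was probably moved.
        END {
            for (p in old) if (!(p in seen)) {
                line = "Missing:  " old[p] " " p
                if (old[p] in newpath) line = line "   MOVED:" newpath[old[p]]
                print line
            }
        }
    ' "$1" "$2"
}

# Example use (file names are placeholders):
#   make_manifest /data/photos > /tmp/new.md5
#   compare_manifests /data/photos.md5 /tmp/new.md5
```

A "Changed" entry for a file you know you never edited is the interesting case: that is the candidate for silent corruption, and the one to restore from a redundant copy.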

An IDS (Intrusion Detection System) that detects file changes, such as AIDE or Tripwire, can also be used (and will detect other changes in metadata), but I find them less convenient for this purpose.

If you’re using Windows, you’ll need something like Cygwin to be able to run the script. (Sorry, I haven’t tested that configuration as I don’t do Windows.)

If you have macOS, you’ll need to build the coreutils package that includes md5sum, or change the script to use shasum. (I should make that an option anyway; however, after checking millions of files, I’ve never run into a collision with md5.) Also note that I don’t know of an equivalent command to “chattr” for macOS, so if you’re using rsync to sync directories, make sure that you use the --exclude='.md5*' option! (This will also reduce noise on Linux systems.)
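A minimal sketch of both points, under some assumptions: `pick_hash_cmd` is a hypothetical helper (it is not part of checkfiles) that falls back to shasum when md5sum is missing, and the rsync paths in the comment are placeholders. Both md5sum and shasum print "hash, two spaces, path", so the rest of a script can usually stay the same.

```shell
# Hypothetical helper, not part of checkfiles.
# pick_hash_cmd: print the first checksum command found on this system.
pick_hash_cmd() {
    if command -v md5sum >/dev/null 2>&1; then
        echo "md5sum"
    elif command -v shasum >/dev/null 2>&1; then
        # shasum ships with Perl; -a 256 selects SHA-256.
        echo "shasum -a 256"
    else
        echo "no checksum tool found" >&2
        return 1
    fi
}

# Syncing while leaving each side's .md5* bookkeeping files alone
# (source and destination paths are placeholders):
#   rsync -a --exclude='.md5*' /path/to/source/ /path/to/backup/
```

Keep in mind that hashes from different tools are not comparable, so switching from md5sum to shasum means rebuilding the stored checksums once.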

On the upside, if you regularly maintain redundant copies of files with rsync or something similar, this utility might save you from silent corruption some day! I’ve used this method in maintaining terabytes of documentation, photo, video and other files and it’s saved me several times.

The GitHub repo can be found at: checkfiles GitHub repo

Feel free to send me updates. I’ve wrapped it into scripts that run from crontab and notify me (via email) of file changes, but perhaps that would be better built in? Use shasum? Any other suggestions?
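For what it’s worth, a wrapper along those lines might look something like the sketch below. This is not my actual wrapper, just an illustration. It assumes the log lives at DIR/.md5.log, that every run ends with a summary line of the form "Missing=N Changed=N NewFile=N" (zeros included), and that a working `mail` command is installed. The names `has_changes`, `notify_if_changed` and `checkfiles-notify` are made up.

```shell
# Hypothetical wrapper sketch; names and log assumptions are illustrative.

# has_changes LOGFILE: succeed if the final summary line has a nonzero count.
has_changes() {
    tail -n 1 "$1" | grep -Eq '(Missing|Changed|NewFile)=[1-9]'
}

# notify_if_changed DIR RECIPIENT: run checkfiles, mail the log tail on changes.
notify_if_changed() {
    dir=$1
    rcpt=$2
    checkfiles "$dir" >/dev/null
    if has_changes "$dir/.md5.log"; then
        tail -n 20 "$dir/.md5.log" | mail -s "checkfiles: changes in $dir" "$rcpt"
    fi
}

# Example crontab entry (script path, directory and address are placeholders):
#   0 3 * * * /usr/local/bin/checkfiles-notify /data/photos you@example.com
```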

 

For example, create half a dozen files, then remove, rename, change and add a file. All are detected.


macbook:~/tmp$ sudo bash
macbook:~/tmp# for m in {1..6} ; do cal $m 2018 > 2018.$m ; done
macbook:~/tmp# checkfiles .
feb3ff7825f538a7e38ecdf5567a03e1  ./2018.1
351bdc5da7a1c7f0bbac3c33b2447dbe  ./2018.2
b0a82cedf56f728228b3c3249b69bc80  ./2018.3
050b074dfc9d9e0949527918cc9969fd  ./2018.4
d2cc04f1a316069b0dc519be4b8a6260  ./2018.5
add3b65e2658b7c61e5af4e819b47fae  ./2018.6
macbook:~/tmp# ls -l 2018.?
-rw-r--r-- 1 root staff 135 Jun 20 00:27 2018.1
-rw-r--r-- 1 root staff 135 Jun 20 00:27 2018.2
-rw-r--r-- 1 root staff 143 Jun 20 00:27 2018.3
-rw-r--r-- 1 root staff 128 Jun 20 00:27 2018.4
-rw-r--r-- 1 root staff 136 Jun 20 00:27 2018.5
-rw-r--r-- 1 root staff 142 Jun 20 00:27 2018.6
macbook:~/tmp# rm 2018.1 
macbook:~/tmp# mv 2018.2 shortmonth
macbook:~/tmp# echo >> 2018.3
macbook:~/tmp# cal 7 2018 > 2018.7
macbook:~/tmp# 

macbook:~/tmp# checkfiles .
b0a82cedf56f728228b3c3249b69bc80  ./2018.3
050b074dfc9d9e0949527918cc9969fd  ./2018.4
d2cc04f1a316069b0dc519be4b8a6260  ./2018.5
add3b65e2658b7c61e5af4e819b47fae  ./2018.6
a5e692a057b5e4adc5a775d245ab51e8  ./2018.7
351bdc5da7a1c7f0bbac3c33b2447dbe  ./shortmonth

macbook:~/tmp# cat .md5.log 
# 2018.06.20-00:29:21 Start checkfiles 1.29
Changed:  b0a82cedf56f728228b3c3249b69bc80 b6a838650afec911f6eb837d72d169d7 ./2018.3
NewFile:  a5e692a057b5e4adc5a775d245ab51e8 ./2018.7
NewFile:  351bdc5da7a1c7f0bbac3c33b2447dbe ./shortmonth
Missing:  feb3ff7825f538a7e38ecdf5567a03e1 ./2018.1   MOVED
Missing:  351bdc5da7a1c7f0bbac3c33b2447dbe ./2018.2   MOVED:./shortmonth
# 2018.06.20-00:29:21 Missing=2 Changed=1 NewFile=2
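
When the log gets long, the "Changed:" lines are usually the ones worth a second look, since an unexpected change to a static file is what corruption looks like. A small sketch for pulling out just those paths, assuming the log format shown above ("Changed:", old hash, new hash, path); `changed_paths` is a made-up name, not something checkfiles provides.

```shell
# Hypothetical helper, assuming the .md5.log format shown above.
# changed_paths LOGFILE: print the path of every "Changed:" entry.
changed_paths() {
    awk '$1 == "Changed:" {
        # Drop the tag and both hashes, leaving the path (spaces included).
        sub(/^Changed:[ ]+[^ ]+ [^ ]+ /, "")
        print
    }' "$1"
}

# Example use:
#   changed_paths .md5.log
```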