A colleague of mine once posted a hiring question to ask prospective developers: “What is the least significant 10 digits of the series: .. ?”
In previous posts, I’ve emphasized that averages are particularly bad at characterizing most things that you might be looking for. However, storing aggregated data of any type can limit your ability to analyze data later.
Many people use awk for one-liners; picking out fields from logs, doing pattern matching. It’s capable of so much more however. IMO, the “littleness” of the language is one of it’s strengths.
This article has a link to a simple script I’ve used for over a decade to detect corrupted files. It will detect and report on files that have changed, been added, deleted or possibly moved within the same directory structure.
Your systems have drives set up in RAID configurations and besides, you have data copied to redundant systems and backups, right? Safe? Maybe not. I recently found corruption in a quarter of a million files that had not previously been detected, for years!
Do you have insatiable curiosity and are driven by a relentless pursuit of the truth? You might make a great problem solver, but be careful how you deal with your findings!
If you’ve been around systems long enough, you know that opportunity for performance gains goes up dramatically, the further up the stack you look..