Over the recent holidays I visited some friends who are avid Scrabble players. After the visit, I thought it would be fun to write a simple anagram script.
There is not much to it. The script takes whatever dictionary of words you have around, filters out any word that contains an apostrophe, and generates a “signature” for each word, which is just the word's characters in sorted order. Words that share a signature are anagrams of each other.
The interesting part of the script is calling split() with an empty separator string, which splits a word into an array of single characters. The script then sorts that array with gawk's asort() function, joins the sorted characters into the word's signature, and appends the word to the entry for that signature in an associative array.
#!/bin/bash
apostrophe="'"
if [ -r /etc/dictionaries-common/words ] ; then
    words=/etc/dictionaries-common/words
elif [ -r "$(dirname "$0")/words" ] ; then
    words="$(dirname "$0")/words"
elif [ -n "$1" ] && [ -r "$1" ] ; then
    words="$1"
else
    echo "Cannot find a dictionary" >&2
    exit 1
fi
gawk '
/'$apostrophe'/ { next }        # no words with an apostrophe
!NF { next }                    # no blank lines
{ $1 = tolower($1) }            # same case in everything
$1 in words { next }            # eliminate dup words
{ words[$1]++ }                 # store the key
{
    # create the signature from a sorted array of characters in t
    nf = split($1, t, "")
    asort(t)
    signature = ""
    for (i = 1; i <= nf; i++)
        signature = signature t[i]
    if (signature in sigs)      # more than one with this signature
        dups[signature]++       # mark it for inclusion in output
    sigs[signature] = sigs[signature] " " $1   # add to list
}
END {
    # just the signatures with >1 word
    for (signature in dups)
        printf "%-16s %s\n", signature, sigs[signature]
}
' "$words"
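To try the signature step on its own, here is a minimal sketch. The function name signature is made up for illustration and is not part of the script. It also swaps gawk's asort() for a hand-written insertion sort, so it should run under other awks that accept an empty separator in split() (that behaviour is a common extension, not something POSIX guarantees).

```shell
#!/bin/sh
# signature WORD: print the letters of WORD, lowercased and sorted.
# Hypothetical helper for illustration; not part of the original script.
signature() {
    printf '%s\n' "$1" | awk '{
        # empty separator: one array element per character
        n = split(tolower($1), t, "")
        # insertion sort, standing in for the asort(t) call in gawk
        for (i = 2; i <= n; i++) {
            c = t[i]
            for (j = i - 1; j >= 1 && t[j] > c; j--)
                t[j + 1] = t[j]
            t[j + 1] = c
        }
        # join the sorted characters back into one string
        s = ""
        for (i = 1; i <= n; i++)
            s = s t[i]
        print s
    }'
}

signature listen    # prints eilnst
signature Silent    # prints eilnst
```

Both calls print the same string, which is exactly why the two words end up in the same set.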
Using this script (invoked below as anagrams) and some simple command line tools, we can find out some interesting things about the anagrams in a given dictionary. The script prints one line per set of anagrams: the signature, followed by the words that share it.
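For example, how many sets of anagrams are there? Which set of characters has the most anagrams? Which anagrams are the longest? Note that the count in the second listing is NF, which includes the signature field, so each set has one word fewer than the number shown.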
twiggy:~$ anagrams | wc -l
4478
twiggy:~$ anagrams | awk '{print NF,$0}' | sort -nr | head | cat -n
1 9 aelst stael tesla least slate stale steal tales teals
2 8 aeprs pares parse pears rapes reaps spare spear
3 8 acerst carets caster caters crates reacts recast traces
4 7 opst post opts pots spot stop tops
5 7 aerst aster rates stare tares taser tears
6 7 aers ares arse ears eras sear sera
7 7 aels elsa lesa ales leas sale seal
8 7 aelpst palest pastel petals plates pleats staple
9 7 aelps lapse leaps pales peals pleas sepal
10 7 aekst keats skate stake steak takes teaks
twiggy:~$ anagrams | awk '{print length($1),$0}' | sort -nr | head | cat -n
1 14 eeeiimnprssssv impressiveness permissiveness
2 14 accefiiinorstt certifications rectifications
3 14 abefllnoopsstu tablespoonfuls tablespoonsful
4 13 ceefiimnoprst imperfections perfectionism
5 13 accefiiinortt certification rectification
6 13 aaceiilnprstt antiparticles paternalistic
7 12 ehhiilooppss philosophies philosophise
8 12 eeiilmprssvy impressively permissively
9 12 eehloprrsstu reupholsters upholsterers
10 12 eeeinprrrstt interpreters reinterprets
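Another query along the same lines is looking up the set that a particular word belongs to. The sketch below is only one way to do it: lookup is a hypothetical helper name, it assumes plain ASCII words, and it rebuilds the signature with fold and sort rather than gawk. It reads the script's output on standard input and prints the line whose first field matches the word's signature.

```shell
#!/bin/sh
# lookup WORD: read anagram-script output on stdin and print the set
# that WORD belongs to. Hypothetical helper; assumes ASCII words.
lookup() {
    # one character per line, sorted, then glued back together
    sig=$(printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' |
          fold -w1 | LC_ALL=C sort | tr -d '\n')
    # keep only the line whose signature field matches
    awk -v sig="$sig" '$1 == sig'
}

# Demo on one line taken from the output above.
printf '%s\n' 'opst post opts pots spot stop tops' | lookup stop
# prints: opst post opts pots spot stop tops
```

With the real script this would be run as anagrams | lookup stop. Because the script only prints signatures shared by more than one word, a word with no anagrams in the dictionary produces no output.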
How would you do it with your favourite language?
Have fun with it!