During the recent Holidays I visited some friends who are avid Scrabble players. After the visit, I thought it would be fun to write a simple anagram script.
There is not much to it. The script uses whatever dictionary of words you have around, filters out any words containing an apostrophe, and generates a “signature” for each word, which is just the word's characters in sorted order.
The interesting part of the script is the use of “split()” with an empty separator string to split a word into an array of characters. It then uses gawk's asort() function to sort that array and build the signature for each word, collecting the words that share a signature under one key in an associative array.
#!/bin/bash
apostrophe="'"

# Pick a dictionary: system word list, one next to the script, or $1
if [ -r /etc/dictionaries-common/words ] ; then
    words=/etc/dictionaries-common/words
elif [ -r "$(dirname "$0")/words" ] ; then
    words="$(dirname "$0")/words"
elif [ -n "$1" ] && [ -r "$1" ] ; then
    words="$1"
else
    echo "Cannot find a dictionary" >&2
    exit 1
fi

gawk '
/'$apostrophe'/ { next }        # no words with apostrophe
!NF { next }                    # no blank lines
{ $1=tolower($1) }              # same case in everything
$1 in words { next }            # eliminate dup words
{ words[$1]++ }                 # store the key
{
    # create the signature, from a sorted array of characters in t
    nf = split($1,t,"")
    asort(t)
    signature = ""
    for (i=1;i <= nf; i++)
        signature = signature t[i]
    if (signature in sigs)      # more than one with this signature
        dups[signature]++       # mark it for inclusion in output
    sigs[signature] = sigs[signature] " " $1    # add to list
}
END {
    # just the signatures with >1 word
    for (signature in dups)
        printf "%-16s %s\n", signature, sigs[signature]
}
' "$words"
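To see the signature step in isolation, here is a minimal sketch. The `signature` shell function is a hypothetical helper, not part of the script above. It also swaps gawk's asort() for a hand-written insertion sort, on the assumption that you may want to try it with an awk that lacks asort(). It still relies on split() accepting an empty separator, which gawk and most other current awks support but POSIX leaves unspecified.

```shell
# Hypothetical helper (not part of the original script): print the
# signature and the lowercased word for each argument.
signature() {
    printf '%s\n' "$@" | awk '
    {
        word = tolower($1)
        n = split(word, t, "")      # empty separator: one character per element
        # insertion sort of t[1..n], standing in for gawk asort(t)
        for (i = 2; i <= n; i++) {
            c = t[i]
            for (j = i - 1; j >= 1 && t[j] > c; j--)
                t[j + 1] = t[j]
            t[j + 1] = c
        }
        sig = ""
        for (i = 1; i <= n; i++)
            sig = sig t[i]
        print sig, word
    }'
}

signature stale least Tesla
```

All three words should come out with the signature aelst, which is why the full script groups them on one line.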
Using this script and some simple command-line tools, we can find out some interesting things about the anagrams in a given dictionary. Each line of output holds the signature for a set of anagrams followed by the words that share it, so the field count of a line is one more than the number of words.
For example, how many sets of anagrams are there? Which set of characters has the most anagrams? Which anagrams have the most characters?
twiggy:~$ anagrams | wc -l
4478
twiggy:~$ anagrams | awk '{print NF,$0}' | sort -nr | head | cat -n
     1  9 aelst stael tesla least slate stale steal tales teals
     2  8 aeprs pares parse pears rapes reaps spare spear
     3  8 acerst carets caster caters crates reacts recast traces
     4  7 opst post opts pots spot stop tops
     5  7 aerst aster rates stare tares taser tears
     6  7 aers ares arse ears eras sear sera
     7  7 aels elsa lesa ales leas sale seal
     8  7 aelpst palest pastel petals plates pleats staple
     9  7 aelps lapse leaps pales peals pleas sepal
    10  7 aekst keats skate stake steak takes teaks
twiggy:~$ anagrams | awk '{print length($1),$0}' | sort -nr | head | cat -n
     1  14 eeeiimnprssssv impressiveness permissiveness
     2  14 accefiiinorstt certifications rectifications
     3  14 abefllnoopsstu tablespoonfuls tablespoonsful
     4  13 ceefiimnoprst imperfections perfectionism
     5  13 accefiiinortt certification rectification
     6  13 aaceiilnprstt antiparticles paternalistic
     7  12 ehhiilooppss philosophies philosophise
     8  12 eeiilmprssvy impressively permissively
     9  12 eehloprrsstu reupholsters upholsterers
    10  12 eeeinprrrstt interpreters reinterprets
How would you do it with your favourite language?
Have fun with it!