Anagrams for Fun

During the recent holidays I visited some friends who are avid Scrabble players. After the visit, I thought it would be fun to write a simple anagram script.

There is not much to it. The script uses whatever dictionary of words you might have around, filters out any word that has an apostrophe, and generates a “signature” for each word (which is just all of its characters, sorted). Words that share a signature are anagrams of each other.

The interesting part of the script is using the “split()” function with an empty separator string to split a word into an array of characters. It then uses gawk's asort() function to sort that array and joins the sorted characters into the word's signature. Words with the same signature are collected in an associative array, keyed by signature, that holds the list of words.

 

#!/bin/bash

apostrophe="'"


if   [ -r /etc/dictionaries-common/words ] ; then
  words=/etc/dictionaries-common/words
elif [ -r "$(dirname "$0")/words" ] ; then
  words="$(dirname "$0")/words"
elif [ -n "$1" ] && [ -r "$1" ] ; then
  words="$1"
else
  echo "Cannot find a dictionary" >&2
  exit 1
fi


gawk '
  /'$apostrophe'/  { next }            # no words with an apostrophe
  !NF              { next }            # no blank lines
                   { $1=tolower($1) }  # same case in everything
  $1 in words      { next }            # eliminate dup words
                   { words[$1]++ }     # store the key

  {
    # create the signature, from a sorted array of characters in t
    nf = split($1,t,"")
    asort(t)
    signature = ""
    for (i=1;i <= nf; i++)
      signature = signature t[i]

    if (signature in sigs) # more than one with this signature
      dups[signature]++    # mark it for inclusion in output

    sigs[signature] = sigs[signature] " " $1 # add to list
  }
  END {
    # just the signatures with >1 word
    for (signature in dups)
      printf "%-16s %s\n", signature, sigs[signature]
  }
' "$words"

 

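Using this script and some simple command-line tools, we can find out some interesting things about the anagrams in a given dictionary. Each line of output is the signature for a set of anagrams followed by the words in that set, so the number of fields on a line is one more than the number of words.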

For example, how many sets of anagrams are there? Which set of characters has the largest number of anagrams? Which anagrams are the longest?

 

twiggy:~$ anagrams | wc -l
4478

twiggy:~$ anagrams | awk '{print NF,$0}' | sort -nr | head | cat -n
     1	9 aelst             stael tesla least slate stale steal tales teals
     2	8 aeprs             pares parse pears rapes reaps spare spear
     3	8 acerst            carets caster caters crates reacts recast traces
     4	7 opst              post opts pots spot stop tops
     5	7 aerst             aster rates stare tares taser tears
     6	7 aers              ares arse ears eras sear sera
     7	7 aels              elsa lesa ales leas sale seal
     8	7 aelpst            palest pastel petals plates pleats staple
     9	7 aelps             lapse leaps pales peals pleas sepal
    10	7 aekst             keats skate stake steak takes teaks

twiggy:~$ anagrams | awk '{print length($1),$0}' | sort -nr | head | cat -n
     1	14 eeeiimnprssssv    impressiveness permissiveness
     2	14 accefiiinorstt    certifications rectifications
     3	14 abefllnoopsstu    tablespoonfuls tablespoonsful
     4	13 ceefiimnoprst     imperfections perfectionism
     5	13 accefiiinortt     certification rectification
     6	13 aaceiilnprstt     antiparticles paternalistic
     7	12 ehhiilooppss      philosophies philosophise
     8	12 eeiilmprssvy      impressively permissively
     9	12 eehloprrsstu      reupholsters upholsterers
    10	12 eeeinprrrstt      interpreters reinterprets

How would you do it with your favourite language?

Have fun with it!