Hiring Questions, Problem 3

This was an interesting question, so I thought I’d share it here..

The challenge was to create a rate limiting function which would accept connection requests up to a limit of 5 per minute per ip address. The function would look something like this:

bool ratelimit( ip, timestamp )

I rejected a fixed window design as there was a significant probability that it would allow more than acceptable limit. For example, if 5 requests came in at 10:00:59 and another 5 came in at 10:01:01 all ten would get through with only 2 seconds difference.

I wanted a sliding window. The design I came up with used ring buffer based on the modulus of the timestamp. Researching this after the interview, I found out that it was pretty unique and probably would perform well and have a smaller memory footprint compared with other designs.

The basic design in an awk-ish pseudo-code:

THRESHOLD = 5
WINDOW    = 60
iplist[     WINDOW ]
timestamps[ WINDOW ]
ipcounts[ hash ] 

function ratelimit( ip, ts ) {
  uint bucket = ts % window

  if (timestamps[bucket] != ts) {
    // invalidate all IPs in this bucket as 
    // well as others which may have expired
    for (thisbucket in timestamps) {
      // check all buckets to see if timestamps are within window
      if (timestamps[thisbucket] <= (ts-WINDOW)) {
        for (thisip in iplist[thisbucket]) {
          if (--ipcounts[thisip] == 0) {
            delete ipcounts[thisip]
          }
          delete iplist[thisbucket]
          delete timestamps[thisbucket]
        }
      }
    }
    timestamps[bucket] = ts
  }
  if (ipcounts[ip] >= THRESHOLD) {
    return DROP
  }
  else {
    ++ipcounts[ip]
    iplist[bucket] = iplist[bucket] " " ip  # add to list
    return OK
  }
}

Obviously in a real system, the variables would be replaced with something like a redis store and the code would be in a language that would perform well. But for the purposes of demonstrating the concept, I’ve created a simple script that tests the logic and it’s available here:  github link

If you’re interested in other techniques for rate limiting, here are some resources I found after the interview:

System Design — Rate limiter and Data modelling

An alternative approach to rate limiting

Wikipedia article on Rate Limiting

The last link also links to token bucket and leaky bucket algorithms which are commonly used.