Tuesday, September 18, 2012

Implementing a ReDiS COUNT command in Lua

I have recently been implementing a master/slave redis cache for our enterprise application and at certain points during the development/test phases, have found it useful to be able to query redis for the # of keys that have been inserted that match a certain regex.
redis has the 'keys' command that returns all keys that match a certain pattern, so for example, if I wanted to see all the keys that match the pattern '*:D' I would simply issue the command :

keys *:D

and redis would return a list of all keys that meet that pattern. This is all very well when the number of keys is small, but when you are handling millions of keys, it quickly becomes unmanageable. I've seen a lot of requests on the redis message boards requesting that the development team implement the COUNT command and at first thought it does seem intuitive, however its implementation would violate the O(1) performance requirements of basic commands that the developers have (rightly, in my NSHO) imposed upon themselves.

Thankfully, the redis lua integration comes quickly to your rescue. I'm not suggesting that the lua implementation of this command achieves O(1), far from it, in fact, but I'm pointing out that the functionality of a COUNT command is already easily achievable with the following simple command issued in the client :

eval "return #redis.pcall('keys', '*')" 0

Very smartly, the developers have implemented lua bindings so that if you need a command that would violate their very stringent requirements for performance, you can create as much hanging rope as you need.

Smart.

2 comments:

Itamar Haber said...

As of Redis 2.8 you should always use SCAN instead of KEYS where applicable. While the usual argument against using KEYS (that it is blocking) doesn't apply in Lua's case (because it is also blocking by nature), SCAN is still a better choice because - unlike KEYS - it will not bloat the script's runtime heap.

Explanation: KEYS is an atomic operation and the reply to it consumes RAM (the reply buffer) - the bigger your keyspace, the more RAM Redis will need just for storing the reply, which in turn will be copied back to Lua. SCAN, on the other hand, iterates over the keyspace and returns only a handful (100 by default) keys with each call, hence lessening the requirement for RAM. On the other hand, since there are multiple calls to SCAN, using it may prove less performant due to the additional context switches.

So, without further ado, here's an updated version of the above that uses SCAN instead of KEYS:

local cursor = "0"
local count = 0

repeat
local r = redis.call("SCAN", cursor, "MATCH", "*")
cursor = r[1]
count = count + #r[2]
until cursor == "0"

return count

Itamar Haber said...

(no editing of comments allowed so here's an update)

@antirez reminded me that SCAN can return duplicates, effectively bombing my previous suggestion :) Perhaps it can be used as an inaccurate COUNT variation... or maybe one should actually use HyperLogLog to store the keys returned by SCAN to count uniqueness... but because SCAN is non-deterministic you can't do that...

...or as long as you're matching '*' just use DBSIZE :P