Thursday, August 25, 2011

Compare-And-Set in Memcache

With the most recent release (1.5.3, last week) App Engine's Python API for Memcache has added a new feature, Compare-And-Set. This feature (with a different API) was already available in Java; it has also been available in the non-App-Engine pure-Python memcache client. In fact, I designed the App Engine Python API for this feature to be compatible with the latter, since most of the rest of the App Engine Python API also strives to be at least a superset of that package.

But what is it? There seems to be little information on how to use Compare-And-Set with memcache. It is also sometimes (incorrectly) referred to as Compare-And-Swap -- incorrect, because the cas() operation does not actually "swap" anything. The first response when we closed the bug requesting this feature was "Some examples of usage are appreciated." So here goes.

The basic use case for Compare-And-Set is when multiple requests that are being handled concurrently need to update the same memcache key in an atomic fashion. Let's assume you are managing a counter in memcache. (Actually, you could use the incr() and decr() operations to update 64-bit integer counters atomically, but just for argument's sake assume you cannot use those -- there are other data types for which the memcache service does not have built-in support.)

The naive code to update a counter would be something like this:

def init_counter(key):
. memcache.set(key, 0)

def bump_counter(key):
. counter = memcache.get(key)
. assert counter is not None, 'Uninitialized counter'
. memcache.set(key, counter+1)

(Aside: I don't want to have to think about how to get blogger to properly format Python code. I really don't. So just bear with the dots I use for indentation. Okay? Comments pointing me to solutions will be DELETED.)

(Aside 2: The assert is kind of naive; in practice you'll have to somehow deal with counter initialization. You also should implement a backup for your counter using the App Engine datastore, so that it can survive eviction by the memcache service. However interesting these details are on their own, I leave them for another time.)

Hopefully you can spot the problem in this version of bump_counter(): if two requests execute concurrently (on different instances of the same app), the sequence of operations might be as follows, labeling the two requests as A and B:

A: counter = memcache.get(key) # Reads 42
B: counter = memcache.get(key) # Reads 42
A: memcache.set(key, counter+1) # Writes 43
B: memcache.set(key, counter+1) # Writes 43

So even though two requests were executed, the counter only gets incremented by one. This is called a race condition. Various interlacings of these lines can have the same effect; a race condition occurs whenever B reads the counter before A has written it (or vice versa).

You could try to guard against this by reading the counter value back and checking that it was incremented by one; however this solution still has a race condition (see if you can figure it out for yourself). There are other solutions possible involving a separate "lock" variable, managed using the add() and delete() operations. However these in general require more server roundtrips, and it is pretty hard to manufacture a decent lock out of the basic memcache operations (try for yourself -- think about what would happen if your request was somehow aborted after acquiring the lock, without having a chance of releasing it).

Using the Compare-And-Set operation, writing a reliable bump_counter() function is a cinch:

def bump_counter(key):
. client = memcache.Client()
. while True: # Retry loop
. . counter = client.gets(key)
. . assert counter is not None, 'Uninitialized counter'
. . if client.cas(key, counter+1):
. . . break

I've highlighted the changes from the previous version. There are several essential differences:
The Client object is required because the gets() operation actually squirrels away some hidden information that is used by the subsequent cas() operation. Because the memcache functions are stateless (meaning they don't alter any global values), these operations are only available as methods on the Client object, not as functions in the memcache module. (Apart from these two, the methods on the Client object are exactly the same as the functions in the module, as you can tell by comparing the documentation.)

The retry loop is necessary because this code doesn't actually avoid race conditions -- it just detects them! The memcache service guarantees that when used in the pattern shown here (i.e. using gets() instead of get() and cas() instead of set()), if two (or more) different client instances happen to be involved a race condition like I showed earlier, only the first one to execute the cas() operation will succeed (return True), while the second one (and later ones) will fail (return False). Let's spell out the events that happen when a race condition occurs:

A: counter = memcache.gets(key) # Reads 42
B: counter = memcache.gets(key) # Reads 42
A: memcache.cas(key, counter+1) # Writes 43, returns True
B: memcache.cas(key, counter+1) # Returns False
B: counter = memcache.gets(key) # Reads 43
B: memcache.cas(key, counter+1) # Writes 44, returns True

Another refinement I've left out here for brevity is to set a limit on the number of retries, to avoid an infinite loop in worst-case scenarios where there is a lot of contention for the same counter (meaning more requests are trying to update the counter than the memcache service can process in real time). You can figure out how to code this for yourself. (UPDATE: see comments #1 and #2 for a note about busy-waiting.)

Now let me explain roughly how this actually works. For some people that helps understanding how to use it: this is often the case for me when I am trying to understand some new concept.

The gets() operation internally receives two values from the memcache service: the value stored for the key (in our example the counter value), and a timestamp (also known as the cas_id). The timestamp is an opaque number; only the memcache service knows what it means. The important thing is that each time the value associated with a memcache key is updated, the associated timestamp is changed. The gets() operation stores this timestamp in a Python dict on the Client object, using the key passed to gets() as the dict key.

The cas() operation internally adds the timestamp to the request it sends to the memcache service. The service then compares the timestamp received with a cas() operation to the timestamp currently associated with the key. If they match, it updates the value and the timestamp, and returns success. If they don't match, it leaves the value and timestamp alone, and returns failure. (By the way, it does not send the new timestamp back with a successful response. The only way to retrieve the timestamp is to call gets().)

Of course, there's one more important ingredient: the App Engine memcache service itself behaves atomically. That is, when two concurrent requests (for the same app id) use memcache, they will go to the same memcache service instance (for historic reasons called a shard), and the memcache service has enough internal locking so that concurrent requests for the same key are properly serialized. In particular this means that two cas() requests for the same key do not actually run in parallel -- the service handles the first request that came in until completion (i.e., updating the value and timestamp) before it starts handling the second request.

And that's Compare-And-Set in a nutshell. If you have questions please don't hesitate to ask!

(UPDATE: The memcache API defines batch versions of most of its APIs. For example, to get multiple keys in a single call, there is get_multi(); to set multiple keys, there is set_multi(). Corresponding to cas(), there is cas_multi(). But there is no gets_multi(): instead, you can use get_multi(keys, for_cas=True). Finally, there's cas_reset(), which clears the dict used to store timestamps. But I haven't figured out what to do with it yet. :-)

15 comments:

Filip GruszczyƄski said...

Isn't code snippet for increasing counter using active waiting? Isn't it bad?

Guido van Rossum said...

@Filip: You're right it is active waiting, although it is moderated by the time required for memcache server roundtrips (typically 1-5 msec). That's why there should be a retry limit (e.g. max 3 tries).

You could also use exponential back-off, but very quickly you'd hit the point where it is probably more important to get on with handling the user's request than trying to bump the counter; while you're waiting to bump the counter you're not doing anything else.

In the end this is still just a simplified example to illustrate what cas() does rather than a recommendation on how to implement counters; cas() really is a low-level tool for building more reliable caches.

Dustin said...

Thanks for the update. I've done some fun things with memcached CAS, but it seems to be difficult to explain to people how to use it correctly. :)

In most languages where I've implemented memcached CAS, I've found it useful to provide a functional abstraction over it similar to this (which I'm kind of sketching in this comment thing, so it quite possibly has a bug):

def do_cas(k, f, initial=None):
. client = memcache.Client()
. while True:
. . ob = client.gets(k)
. . if ob:
. . if client.cas(k, f(ob)):
. . . break
. . elif initial:
. . . if client.add(k, initial):
. . . . break
. . else:
. . . raise Something("Uninitialized...")

Filip GruszczyƄski said...

@Guido: Yeah, I bet in real life it's not that bad. I just wanted to know from theoretical point of view :-) Anyway, this reminds me of an university course about synchronized programming and using cas operations to synchronize some tasks. Ah, I'm getting sentimental :-)

mmocny said...

Does the library handle the ABA problem, or is that left to the user?

Guido van Rossum said...

Yes, it solves the ABA problem (that's why I explained that it works by using timestamps).

(For readers like me who had never heard of that problem before: http://en.wikipedia.org/wiki/ABA_problem )

Chris said...

This is (strong) load-linked/store-conditional. CAS is weaker. Too late to rename your API?

Araminos said...

Waiting for java verison!

Guido van Rossum said...

Yes, years too late.

It does exist in Java, I even gave a link to the docs: http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/memcache/MemcacheService.html#putIfUntouched%28java.lang.Object,%20com.google.appengine.api.memcache.MemcacheService.IdentifiableValue,%20java.lang.Object%29 . You also need to use http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/memcache/MemcacheService.html#getIdentifiable%28java.lang.Object%29 .

Greg Fawcett said...

I was obsessing over this a few weeks back when I was wondering about implementing transactional writes to memcache. Another simpler solution presented itself.

Also a small typo - there's a "gest()" that should be a "gets()".

amazing said...

Hi Guido,

Remembers me (kind of sentimental) of the MC68000 Test-And-Set instruction back in the 80-ies:
MC 68000 Advanced instructions which should be used for the same purpose.

cheers,
Fred

Michael said...

Isn't a major problem with using the memcache for this that the memcache can be cleared at any point in time? If we test and then set a value, what's to stop the memcache from being cleared?

Michael said...

Another problem I see with this. How do we set counter to 0 the first time? cas only works if we've set the memcache value once before. What happens if we want to increment the counter, but we're not sure if there is anything in the memcache. We want to do something like this:

A: if counter is in the memcache, then reset=True
A: if reset, set counter to 0
A: cas(counter+1)

But if we have two requests running, something like this could occur


A: if counter is in the memcache, then reset=True
B: if counter is in the memcache, then reset=True
A: if reset, set counter to 0
A: cas(counter+1)
B: if reset, set counter to 0
B: cas(counter+1)

Now counter will be 1 instead of 2

Michael said...

There was a typo in my previous comment. I meant "if counter is *not* in the memcache, then reset=True"

Steven Prosser said...

Late to the comment party, but wanted to say thanks for a very excellent explanation of a very important topic. Not sure if this has made it into the GAE docs, but it really should. Could this somehow be leveraged in a way that would support the holy grail of a monotonically increasing app counter? Everything seems to be here except for the issue of memcache ephemerality. Again, thanks. -stevep