Thursday, November 14, 2013

non-blocking cache with guava and listenable futures (or why I hate ehcache some days)

trying to scale? just throw more cache at the problem!

Yes, I'm going to use the term "cache" in lots of ironic and financially metaphoric ways in this post. My apologies.

Caching makes a lot of things possible in everything we do on the Internet and on computer systems in general. That said, caching can also get you into trouble for a variety of reasons: how wisely you use memory, how your cache performs under contention, and how effective your cache is (i.e., your cache-hit ratio).

If you're using Java, chances are you've heard of ehcache at some point. While there's a lot that ehcache does well, there's a particular aspect of it that in my experience doesn't scale well, and under certain conditions can take down your application. In fact, part of the reason I'm writing this blog post is the aftermath of raising my arms in the air and screaming after examining a performance issue related to ehcache which caused a failed load test today.

mo' (eh)cache, mo' problems

When reading data from ehcache, you end up blocking until a result can be returned (more on that here). While this is necessary the first time you fetch data from the cache, since something has to be loaded in order to be returned, you probably don't need to block on subsequent requests. To put it concretely: if you're caching something for 1 hour and it takes 5 seconds to load, you probably don't mind serving data that's 1 hour and 5 seconds old, especially when the alternative is blocking the current request to your application for 5 seconds, along with every other request trying to load that data.

Unfortunately, and if I'm wrong here I hope someone will call me out in the comments, ehcache blocks every time the data needs to be reloaded. Furthermore, it uses a ReadWriteLock for all reads, along with a fixed number of mutexes (2048 by default), so given enough load you can end up with read contention too. While I understand the decisions that were made and why, there are cases where they aren't ideal and you don't want to grab any locks or create blocking conditions.

making your cache work for you

To be fair, this problem really manifests itself when you have high contention on specific keys, particularly when reloading events occur. In most cases ehcache performs perfectly fine; this post isn't meant to be a general condemnation of a very popular and useful library. That said, in order to solve the problem we don't really want to block on reads or writes; we want to refresh our data in the background, and only update what consumers of the cache see when we have that refreshed data.

This can be accomplished by having a thread go and reload the data in the background while the cache returns stale entries until new data is available. This achieves both goals: no read or write locks are required outside of the initial population of the cache (which is unavoidable), and consumers only see a new value once the refreshed data is ready. Even better, Guava Cache has all of this functionality baked in.

refresh after write and listenable futures in guava cache

Guava Cache can handle this by telling the CacheBuilder via refreshAfterWrite to refresh entries by calling the reload method in the CacheLoader instance used to construct your LoadingCache instance. The reload method returns a ListenableFuture, which is the same as a regular Future but exposes a method to register a callback. In this case, the callback is used to update the value in the cache once we've finished retrieving it.

Here's an example of this in action:
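A minimal sketch of the setup: the class name, the slowLoad helper standing in for a slow backing store, the "greeting" key, and the specific timings are all illustrative choices, not anything prescribed by Guava.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListenableFutureTask;
import com.google.common.util.concurrent.ThreadFactoryBuilder;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonBlockingCacheExample {

    // stands in for a slow backing store (database, web service, etc.)
    static String slowLoad(String key, long latencyMs) throws InterruptedException {
        Thread.sleep(latencyMs);
        return key + "-" + System.currentTimeMillis();
    }

    /** Cache that reloads entries asynchronously after the refresh interval. */
    static LoadingCache<String, String> buildCache(
            final ExecutorService refreshPool, final long loadLatencyMs, long refreshAfterMs) {
        return CacheBuilder.newBuilder()
                .refreshAfterWrite(refreshAfterMs, TimeUnit.MILLISECONDS)
                .build(new CacheLoader<String, String>() {
                    @Override
                    public String load(String key) throws Exception {
                        // only the initial population blocks; this is unavoidable
                        return slowLoad(key, loadLatencyMs);
                    }

                    @Override
                    public ListenableFuture<String> reload(final String key, String oldValue) {
                        // refresh in the background; readers keep getting oldValue
                        // until the future completes and the entry is swapped in
                        ListenableFutureTask<String> task =
                                ListenableFutureTask.create(() -> slowLoad(key, loadLatencyMs));
                        refreshPool.execute(task);
                        return task;
                    }
                });
    }

    public static void main(String[] args) throws Exception {
        // daemon threads let the JVM exit at the end; naming the threads
        // makes debugging easier
        ExecutorService refreshPool = Executors.newFixedThreadPool(1,
                new ThreadFactoryBuilder()
                        .setNameFormat("cache-refresh-%d")
                        .setDaemon(true)
                        .build());

        LoadingCache<String, String> cache = buildCache(refreshPool, 5000, 10000);

        // the first read blocks for ~5 seconds while the value loads
        System.out.println("first read: " + cache.get("greeting"));

        // later reads return immediately; once the refresh interval passes,
        // a read kicks off an async reload but still returns the stale value
        for (int i = 0; i < 20; i++) {
            long start = System.nanoTime();
            String value = cache.get("greeting");
            System.out.printf("%s (%d ms)%n", value,
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
            Thread.sleep(1000);
        }
    }
}
```

The key detail is overriding reload: by default CacheLoader.reload just calls load synchronously on the thread that triggered the refresh, so handing the work to an executor-backed ListenableFutureTask is what actually makes the refresh non-blocking.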

The sleeps are in there to create artificial latency and show what this looks like in action. If you run this you'll see the asynchronous load events kick off, and you can witness the five seconds of latency between an event firing and the data being updated. You should also notice that reads keep succeeding in the meantime. There is a small spike the first time an asynchronous load fires, which I assume is a one-time resource-allocation cost within Guava Cache.

There is one point to consider when doing this: how to shut down your refresh thread. In my example I used a ThreadFactory (courtesy of ThreadFactoryBuilder) to mark my refresh thread as a daemon thread, which allows the JVM to shut down at the end. I also used the ThreadFactory to name the thread, which I'd recommend as a general practice whenever you're creating thread pools; it makes debugging easier on yourself. In my example there aren't any resource concerns, so it doesn't matter if the thread is simply terminated, but if you had resource cleanup to perform you'd want to wire a shutdown hook up to your ExecutorService, since otherwise the pool would live for the lifetime of the application.
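If you did have cleanup to worry about, the shutdown hook could look something like this sketch (the shutdownGracefully helper and the five-second timeout are my own choices):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class RefreshPoolShutdown {

    /** Stops the pool, waiting briefly for in-flight refreshes to finish. */
    static void shutdownGracefully(ExecutorService pool, long timeoutSeconds) {
        pool.shutdown(); // stop accepting new refresh tasks
        try {
            if (!pool.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)) {
                pool.shutdownNow(); // give up and interrupt any stragglers
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        final ExecutorService refreshPool = Executors.newFixedThreadPool(1);
        // register the hook wherever the pool is created
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> shutdownGracefully(refreshPool, 5)));
    }
}
```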

For a use case like this, you'll also want to be judicious about how many threads you're willing to allocate to refreshing. The number should scale with the maximum number of entries and the refresh interval you choose, so that you can refresh in a timely manner without consuming too many resources in your application.

conclusion

If you've come across this problem, then I hope this post helps you get past it. To reiterate what I said before, ehcache is a solid API overall; it just doesn't handle this case well. I haven't tested the Guava Cache implementation under high load yet, so it's certainly possible it has issues I've left out of this post, but at face value it addresses the problems I've seen with ehcache without requiring you to roll your own solution from scratch.

Feel free to share any feedback or things I may have missed in the comments!