Google Code offered in: 中文 - English - Português - Pусский - Español - 日本語
When developing an efficient application on Google App Engine you need to pay attention to how often an entity is updated. While the datastore for App Engine scales to support a huge number of entities it is important to note that you can only expect to update any single entity, or entity-group, about five times a second. That is an estimate and the actual update rate for an entity is going to be dependent several attributes of the entity, including how many properties it has, how large the entity is, and how many indexes need updating. While a single entity or entity-group has a limit on how quickly it can be updated, App Engine excels at handling many parallel requests distributed across distinct entites, and so in aggregate can handle a much higher update rate, which is something we will exploit in this article by using sharding.
The question is, what if you had an entity that you wanted to update faster than five times a second? For example you might count the number of votes in a poll, or the number of comments, etc. Take this simple example:
class Counter(db.Model): count = db.IntergerProperty()
If you had a single entity that was the counter and the update rate
was too fast then you would have contention as the serialized
writes would stack up and start to timeout. The way to solve this
problem is a little counter-intuitive if you are coming from a
relational database, and the solution relies on the fact that reads
from the App Engine datastore are extremely fast and cheap since
entities that have been recently read or updated are cached in
memory. The way to reduce the contention is to build a sharded
counter; break the counter up into N
different counters. When you
want to increment the counter you pick one of the shards at random
and increment it. When you want to know the value of the counter
you read all of the counter shards and add them all together. The
more shards you have the higher the throughput you will have for
increments on your counter. This technique works for a lot more
than just counters and an important skill to learn is spotting the
entities in your application with a lot of writes and then finding
good ways to shard them.
Here is a very simple implemenation of a sharded counter:
from google.appengine.ext import db import random class SimpleCounterShard(db.Model): """Shards for the counter""" count = db.IntegerProperty(required=True, default=0) NUM_SHARDS = 20 def get_count(): """Retrieve the value for a given sharded counter.""" total = 0 for counter in SimpleCounterShard.all(): total += counter.count return total def increment(): """Increment the value for a given sharded counter.""" def txn(): index = random.randint(0, NUM_SHARDS - 1) shard_name = "shard" + str(index) counter = SimpleCounterShard.get_by_key_name(shard_name) if counter is None: counter = SimpleCounterShard(key_name=shard_name) counter.count += 1 counter.put() db.run_in_transaction(txn)
In get_count()
we
simple loop over all the shards (CounterShard.all()
) and add up
the individual shard counts. In increment()
we need to read,
increment, and then write one of the counter shards chosen at
random and that needs to be done inside a transaction.
Note that we create the shards lazily, only creating them when they are first incremented. The lazy creation of the shards allows the number of shards to be increased (but never decreased) in the future if more are needed. The value of NUM_SHARDS could be doubled to 20 and the results from get_count() would not change since the query only selects the shards that have been added to the datastore, and increment() will lazily create shards that aren't there.
That is useful as an example to learn from, but a more general purpose counter would allow you to create named counters on the fly, increase the number of shards dynamically, and use memcache to speed up reads to shards. The exampe code that Brett Slatkin gave in his Google I/O talk does just that and I've included that code here, along with a function to increase the number of shards for a particular counter:
from google.appengine.api import memcache from google.appengine.ext import db import random class GeneralCounterShardConfig(db.Model): """Tracks the number of shards for each named counter.""" name = db.StringProperty(required=True) num_shards = db.IntegerProperty(required=True, default=20) class GeneralCounterShard(db.Model): """Shards for each named counter""" name = db.StringProperty(required=True) count = db.IntegerProperty(required=True, default=0) def get_count(name): """Retrieve the value for a given sharded counter. Parameters: name - The name of the counter """ total = memcache.get(name) if total is None: total = 0 for counter in GeneralCounterShard.all().filter('name = ', name): total += counter.count memcache.add(name, str(total), 60) return total def increment(name): """Increment the value for a given sharded counter. Parameters: name - The name of the counter """ config = GeneralCounterShardConfig.get_or_insert(name, name=name) def txn(): index = random.randint(0, config.num_shards - 1) shard_name = name + str(index) counter = GeneralCounterShard.get_by_key_name(shard_name) if counter is None: counter = GeneralCounterShard(key_name=shard_name, name=name) counter.count += 1 counter.put() db.run_in_transaction(txn) memcache.incr(name) def increase_shards(name, num): """Increase the number of shards for a given sharded counter. Will never decrease the number of shards. Parameters: name - The name of the counter num - How many shards to use """ config = GeneralCounterShardConfig.get_or_insert(name, name=name) def txn(): if config.num_shards < num: config.num_shards = num config.put() db.run_in_transaction(txn)
The source for both of these counters is available in the Google App Engine Samples as the sharded-counter example. While the web interface to the examples isn't much to look at, it's instructive to use the admin interface an inspect the data models after you have incremented both counters a few times.
Sharding is one of many important techniques in building a scalable application and hopefully these examples will give you ideas of where you apply the technique in your application. The code in these articles is available under the Apache 2 license so feel free to start with them as you build your solutions.
Watch Brett Slatkin's Google I/O talk "Building Scalable Web Applications with Google AppEngine".