Sunday, March 13, 2011

App Engine Datastore: Fast, consistent check if entity exists by key

Many people have asked whether it is possible in App Engine to check if an entity exists in a way that satisfies the following conditions:
  • It is as fast as possible
  • It is strongly consistent (i.e. returns true immediately after a put and false immediately after a delete)
In App Engine Datastore the fastest operation is "get", but it always retrieves the whole entity. If your entity is big (e.g. it contains blobs), "get" may be too slow for this purpose. Queries, on the other hand, can return only keys (keys-only queries). However, queries are inherently slower than "get" because they have to scan an index, and in the High Replication Datastore (HRD) queries that are not restricted to an ancestor are only eventually consistent.

Here's the solution. Split your logical entity (e.g. a blog post) into two physical entities:
  • The first entity (e.g. BlogPost) contains all the data fields and the key (both auto-generated IDs and application-assigned key names will work). We will only retrieve this entity when we actually need to work with the data.
  • The second entity (let's call it BlogPostCheck) contains no fields, only the key. When we want to check whether a blog post exists, we run a "get" against the BlogPostCheck. Because the latter contains no data, the "get" will be extremely light and fast.
So what does the existence of BlogPostCheck have to do with the existence of BlogPost? The trick is to put both entities into the same entity group by making BlogPost the parent of BlogPostCheck. When saving a blog post, we create both a BlogPost and a BlogPostCheck in a single transaction; when deleting a blog post, we delete both in a single transaction. This way a BlogPostCheck exists if and only if its BlogPost exists (in a strongly consistent manner), so a "get" on BlogPostCheck is equivalent to a "get" on BlogPost, except that it doesn't retrieve the data and is therefore faster. And because the BlogPostCheck key is built from the BlogPost key, the check needs nothing but the blog post's key. Here's some Java pseudo-code for this example:


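// Checks if a blog post exists
boolean check(Key blogPostKey) {
    DatastoreService ds = ...;
    // The marker key is derived from the post key: parent = blog post, kind "BlogPostCheck", name "check"
    Key blogPostCheckKey = KeyFactory.createKey(blogPostKey, "BlogPostCheck", "check");
    Transaction tx = ds.beginTransaction();
    try {
        ds.get(tx, blogPostCheckKey);  // fetches only the tiny, field-less marker entity
        return true;
    } catch (EntityNotFoundException notFound) {
        return false;
    } finally {
        tx.commit();
    }
}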

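// Saving a blog post
DatastoreService ds = ...;
Key blogPostKey = ds.allocateIds("BlogPost", 1).getStart();
Key blogPostCheckKey = KeyFactory.createKey(blogPostKey, "BlogPostCheck", "check");
Entity bp = new Entity(blogPostKey);        // holds all the data fields
// ... set the blog post's data fields with bp.setProperty(name, value)
Entity bpc = new Entity(blogPostCheckKey);  // no properties, only the key
Transaction tx = ds.beginTransaction();
try {
    // Both entities belong to the blog post's entity group, so one transaction covers them
    ds.put(tx, Arrays.asList(bp, bpc));
    tx.commit();
} finally {
    if (tx.isActive()) {
        tx.rollback();
    }
}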

// Deleting a blog post
Key blogPostKey = ...;
DatastoreService ds = ...;
Key blogPostCheckKey = KeyFactory.createKey(blogPostKey, "BlogPostCheck", "check");
Transaction tx = ds.beginTransaction();
try {
    ds.delete(tx, blogPostKey, blogPostCheckKey);  // removes both entities atomically
    tx.commit();
} finally {
    if (tx.isActive()) {
        tx.rollback();
    }
}
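To see why keeping both entities in one entity group makes the check consistent, here is a toy in-memory model that runs without the App Engine SDK. It is only a sketch under stated assumptions: ToyStore, its methods, and the "Kind:id" path-string keys are hypothetical, and a Java lock per entity group stands in for a datastore transaction on that group. It mirrors the save, delete and check operations above and verifies the invariant they maintain: the marker exists if and only if the post exists.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy in-memory model of the BlogPost/BlogPostCheck pattern. This is NOT the
// App Engine API: keys are hypothetical path strings, and one Java lock per
// entity group stands in for a datastore transaction on that group.
final class ToyStore {
    private final Map<String, String> entities = new ConcurrentHashMap<>();
    private final Map<String, Object> groupLocks = new ConcurrentHashMap<>();

    // Derives the marker key from the post key, mirroring
    // KeyFactory.createKey(blogPostKey, "BlogPostCheck", "check").
    static String checkKeyFor(String blogPostKey) {
        return blogPostKey + "/BlogPostCheck:check";
    }

    // The entity group is identified by the root (first segment) of the key path,
    // so a post and its marker always share one group and therefore one lock.
    private Object groupLock(String key) {
        String root = key.split("/", 2)[0];
        return groupLocks.computeIfAbsent(root, r -> new Object());
    }

    // Writes the post and its empty marker as one atomic step.
    void savePost(String blogPostKey, String data) {
        synchronized (groupLock(blogPostKey)) {
            entities.put(blogPostKey, data);
            entities.put(checkKeyFor(blogPostKey), "");
        }
    }

    // Deletes the post and its marker as one atomic step.
    void deletePost(String blogPostKey) {
        synchronized (groupLock(blogPostKey)) {
            entities.remove(blogPostKey);
            entities.remove(checkKeyFor(blogPostKey));
        }
    }

    // Existence check: looks up only the marker, never the post's data.
    boolean exists(String blogPostKey) {
        String checkKey = checkKeyFor(blogPostKey);
        synchronized (groupLock(checkKey)) {
            return entities.containsKey(checkKey);
        }
    }

    // Invariant of the pattern: the marker exists if and only if the post exists.
    boolean isConsistent(String blogPostKey) {
        synchronized (groupLock(blogPostKey)) {
            return entities.containsKey(blogPostKey)
                    == entities.containsKey(checkKeyFor(blogPostKey));
        }
    }

    public static void main(String[] args) {
        ToyStore store = new ToyStore();
        System.out.println(store.exists("BlogPost:1")); // false
        store.savePost("BlogPost:1", "Hello, world");
        System.out.println(store.exists("BlogPost:1")); // true
        store.deletePost("BlogPost:1");
        System.out.println(store.exists("BlogPost:1")); // false
    }
}
```

In this model exists() never touches the post's data, which is the point of the split; in the real datastore it is the entity-group transaction, not a Java lock, that makes the paired writes atomic.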