Personal Project


Thursday, May 25, 2017

Solutions to Solve Writing Contention Problems in Google Datastore

Google Cloud Platform provides many technologies that save developers time otherwise spent working out how to scale and maintain services for millions of users. If you are familiar with MySQL's architecture, you know it runs into several problems when serving a high volume of users, because MySQL takes locks while writing data to the database. Google Datastore avoids this issue and can write large amounts of data in parallel; in particular, it is designed for high availability and scalability.

However, you must test and deeply understand how to write data to Datastore simultaneously. Otherwise, your service may only be able to serve about 5 concurrent requests per second, which looks like a bug. What is worse, Google's documentation does not give you an easy-to-understand example of how to implement a sharding counter. You may wonder how to apply this technique to your entity model and how to avoid this common mistake.


Limitations of Google Datastore
  • Any single entity group can only be written at a rate of about 1 request per second.
  • If you use @ndb.transactional or @ndb.transactional(xg=True) to write data, your API can only serve about 5 concurrent requests per second. Beyond that, you will get write-contention errors from Datastore.

Why is writing data in Datastore so slow?

Because Datastore replicates your data globally to keep it highly available.



Solutions to Solve Writing Contention Problems

  • Sharding counters
  • Use Memcache to batch write requests: do the work in memory first and return to your clients quickly
  • Defer a task queue to write the data to Datastore
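The Memcache idea above can be sketched independently of App Engine: accumulate increments in fast in-memory storage and flush them to Datastore in one write once a threshold is reached. The class and names below are illustrative, a minimal sketch of the batching pattern; in production the buffer would live in App Engine's memcache service and flush() would issue the actual datastore write.

# A framework-agnostic sketch of the "batch writes in memory" idea.
# The BatchedCounter name and FLUSH_THRESHOLD value are hypothetical.
FLUSH_THRESHOLD = 10

class BatchedCounter(object):
    def __init__(self):
        self.pending = 0    # increments buffered in memory (memcache in production)
        self.persisted = 0  # value already written to the datastore

    def increment(self):
        self.pending += 1
        if self.pending >= FLUSH_THRESHOLD:
            self.flush()

    def flush(self):
        # One datastore write covers many client requests.
        self.persisted += self.pending
        self.pending = 0

counter = BatchedCounter()
for _ in range(25):
    counter.increment()
# 25 increments caused only two datastore writes; 5 increments remain buffered.

The trade-off of this approach is durability: increments still sitting in memory are lost if the instance or memcache entry is evicted, so it fits counters where approximate values are acceptable.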


In fact, the sharding counter is just an example provided by Google. The key point is that we can shard our entity group with a unique id, as shown below.

The following code shows how to write thousands of FriendShip entities simultaneously. If we need more throughput, we just increase NUM_SHARDS.


import random
from google.appengine.ext import ndb

class FriendShip(ndb.Model):
    user_key = ndb.StringProperty()
    friend_key = ndb.StringProperty()

NUM_SHARDS = 1000
# Picking a random shard id spreads writes across NUM_SHARDS keys,
# so no single entity group takes all the write traffic.
shard_string_index = str(random.randint(0, NUM_SHARDS - 1))
FriendShip(id=shard_string_index,
           user_key='user id',
           friend_key='friend id').put()


If you have many data models to update in one request, use the task queue to do the updates and return only a small amount of information to your clients.

If you need to write transactional data using @ndb.transactional or @ndb.transactional(xg=True), defer the work to a task queue and return a small acknowledgement to the client.
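The defer-to-a-task-queue pattern can be sketched with a plain in-process queue standing in for App Engine's deferred library (where the enqueue call would be deferred.defer(...) and the write would run under @ndb.transactional inside the task). All names here are illustrative, a minimal sketch of the pattern rather than App Engine code.

# Illustrative stand-in for App Engine's task queue: the handler enqueues
# the heavy transactional write and responds immediately with minimal data.
task_queue = []  # on App Engine: deferred.defer(write_friendship, ...)

def write_friendship(user_id, friend_id):
    # On App Engine this body would run under @ndb.transactional,
    # off the request path, so contention no longer slows the client.
    return {'user': user_id, 'friend': friend_id}

def handle_request(user_id, friend_id):
    task_queue.append((write_friendship, (user_id, friend_id)))
    # Respond right away with only a small acknowledgement.
    return {'status': 'queued'}

def run_worker():
    # The task-queue service drains enqueued work at its own pace.
    while task_queue:
        func, args = task_queue.pop(0)
        func(*args)

response = handle_request('user id', 'friend id')
run_worker()

Because the task queue retries failed tasks, the deferred function should be idempotent, otherwise a retry after a partial failure could write the same entity twice.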

Depending on your data model's design, you can use sharding counters, Memcache, task queues, or a hybrid approach to get the best performance out of Google Datastore and Google App Engine.