App Engine datastore tip: monotonically increasing values are bad

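The key phrase in the title is "monotonically increasing values are bad". As far as I know, the same problem exists in NoSQL databases such as HBase and MongoDB, so how you handle monotonically increasing row keys matters a great deal. Also, the diagrams drawn by the author, Ikai Lan, are a lot of fun. Brilliant!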

 

Original article: http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/

 

 

When saving entities to App Engine’s datastore at a high write rate, avoid monotonically increasing values such as timestamps. Generally speaking, you don’t have to worry about this sort of thing until your application hits hundreds of queries per second. Once you’re in that ballpark, you may want to examine potential hotspots in your application that can increase datastore latency.

To explain why this is, let’s examine what happens to the underlying Bigtable of an application with a high write rate. When a Bigtable tablet, a contiguous unit of storage, experiences a high write rate, the tablet will have to “split” into more than one tablet. This “split” allows new writes to shard. Here’s a visual approximation of what happens:

 

[Image 1 from the original post]

 

There’s a moment of pain – this is one of the causes of datastore timeouts in high write applications, as discussed in Nick Johnson’s article, “Handling Datastore Errors”.

Remember that for indexed values, we must write corresponding index rows. When values are randomly or even semi-randomly distributed, like, say, user email addresses, tablet splits function well. This is because the work to write multiple values is distributed amongst several Bigtable tablets:

 

[Image 2 from the original post]

The problems appear when we start saving monotonically increasing values like timestamps, or insert dictionary words in alphabetical order:

 

[Image 3 from the original post]

 

The new writes aren’t evenly distributed: whichever tablet they land on becomes the new hot tablet in need of a split, and after the split the next writes all land on the tablet holding the highest values again.
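To make the hot-tablet effect concrete, here is a toy simulation in plain Python. It is only a sketch: the `simulate` function, the `capacity` threshold and the split-at-the-median rule are assumptions made for illustration, not how Bigtable actually decides when or where to split. It models tablets as contiguous sorted key ranges and counts how many writes hit the tablet that owns the highest range.

```python
import bisect
import random


def simulate(keys, capacity=100):
    """Toy model of range-partitioned tablets (not Bigtable's real logic).

    Returns (number of tablets, fraction of writes that hit the last tablet).
    """
    tablets = [[]]        # each tablet is a sorted list of keys
    lower_bounds = []     # lower_bounds[i] is the smallest key of tablets[i + 1]
    last_tablet_writes = 0
    for key in keys:
        # Find the tablet whose key range contains this key.
        idx = bisect.bisect_right(lower_bounds, key)
        if idx == len(tablets) - 1:
            last_tablet_writes += 1
        tablet = tablets[idx]
        bisect.insort(tablet, key)
        # Hypothetical split rule: split at the median once over capacity.
        if len(tablet) > capacity:
            mid = len(tablet) // 2
            right = tablet[mid:]
            del tablet[mid:]
            tablets.insert(idx + 1, right)
            lower_bounds.insert(idx, right[0])
    return len(tablets), last_tablet_writes / len(keys)


if __name__ == "__main__":
    n = 10000
    # Monotonically increasing keys, like timestamps.
    print("sequential:", simulate(list(range(n))))
    # Unique random keys, like hashed or otherwise well-spread values.
    rng = random.Random(0)
    print("random:    ", simulate(rng.sample(range(10**9), n)))
```

In this model, sequential keys send every single write to the last tablet no matter how many splits have happened, while random keys spread across all the tablets once a few splits have occurred.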

As a developer, what can you do to avoid this situation?

  • Avoid indexes unless you need to query against the values. No index means no hot tablet for an increasing value.
  • Lower your write rate, or figure out how to better distribute values. A pure random distribution is best, but even a distribution that isn’t random will be better than a predictable, monotonically increasing value.
  • Prefix a shard identifier to your value. This is problematic if you plan on doing queries, as you will need to prefix and unprefix the values, then join the results in memory – but it will reduce the error rate of your writes.
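The shard-prefix idea from the last tip can be sketched in plain Python. This does not use the App Engine datastore API: a sorted list stands in for the index rows, and `NUM_SHARDS`, `shard_key`, `unprefix` and `query_range` are hypothetical names chosen for illustration. It shows both halves of the trade-off: writes get a prefix that spreads them over several key ranges, and a range query has to scan every shard, strip the prefixes, and merge the results in memory.

```python
import bisect
import heapq
import random

NUM_SHARDS = 8  # hypothetical shard count; pick one to suit your write rate


def shard_key(timestamp, shard=None):
    """Prefix a shard id so consecutive timestamps land in different key ranges.

    Both parts are zero-padded so string order matches numeric order.
    Assumes non-negative integer timestamps.
    """
    if shard is None:
        shard = random.randrange(NUM_SHARDS)
    return "%02d|%020d" % (shard, timestamp)


def unprefix(key):
    """Strip the shard prefix and return the original integer value."""
    return int(key.split("|", 1)[1])


def query_range(index, start, end):
    """Return all values in [start, end) from a sorted list of prefixed keys.

    One range scan per shard, then an in-memory merge of the sorted results.
    """
    per_shard = []
    for shard in range(NUM_SHARDS):
        lo = bisect.bisect_left(index, shard_key(start, shard))
        hi = bisect.bisect_left(index, shard_key(end, shard))
        per_shard.append([unprefix(k) for k in index[lo:hi]])
    return list(heapq.merge(*per_shard))


if __name__ == "__main__":
    # Simulate 1000 writes with increasing timestamps and random shards.
    index = sorted(shard_key(t) for t in range(1000))
    print(query_range(index, 100, 110))
```

The cost is visible in `query_range`: what would have been a single scan becomes `NUM_SHARDS` scans plus a merge, which is why this approach only pays off when write contention is the real bottleneck.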

These tips apply whether you are on the Master-Slave or the High Replication datastore. And one more tip: don’t prematurely optimize for this case, since chances are you won’t run into it. You can spend that time working on features instead.

- Ikai

P.S. Yes, I drew those doodles. No, I do not have any formal art training (how could you tell?!)

 

 

 
