First, let’s start with some numbers and limitations on our problem, so we can solve the problem without losing sight of what we’re trying to do. Let’s say that 1,000 articles are submitted each day. Of those 1,000 articles, about 50 of them are interesting enough that we want them to be in the top-100 articles for at least one day. All of those 50 articles will receive at least 200 up votes. We won’t worry about down votes for this version.
When dealing with scores that go down over time, we need to make the posting time, the current time, or both relevant to the overall score. To keep things simple, we’ll say that the score of an item is a function of the time that the article was posted, plus a constant multiplier times the number of votes that the article has received.
For the time, we’ll use the number of seconds since January 1, 1970, in the UTC time zone, which is commonly referred to as Unix time. We’ll use Unix time because it can be fetched easily in most programming languages and on every platform that we may use Redis on. For our constant, we’ll take the number of seconds in a day (86,400) divided by the number of votes required (200) to last a full day, which gives us 432 “points” added to the score per vote.
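The scoring rule described so far can be written down directly. This is only a minimal sketch: the function name `article_score` and the sample timestamp are illustrative assumptions, not part of the listings.

```python
SECONDS_PER_DAY = 86400
VOTES_FOR_FULL_DAY = 200

# 86,400 seconds / 200 votes = 432 points per vote.
VOTE_SCORE = SECONDS_PER_DAY // VOTES_FOR_FULL_DAY


def article_score(posted_at, votes):
    """Score = Unix posting time + a constant multiplier times the vote count."""
    return posted_at + VOTE_SCORE * votes


posted_at = 1_700_000_000  # hypothetical Unix timestamp
bonus = article_score(posted_at, 200) - posted_at
print(bonus)  # 86400: 200 votes are worth exactly one day
```

Under this rule, an article with 200 votes scores the same as an article with no votes that was posted exactly one day later.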
To actually build this, we need to start thinking of structures to use in Redis. For starters, we need to store article information like the title, the link to the article, who posted it, the time it was posted, and the number of votes received. We can use a Redis HASH to store this information, and an example article can be seen in figure 1.8.
USING THE COLON CHARACTER AS A SEPARATOR Throughout this and other chapters, you’ll find that we use the colon character (:) as a separator between parts of names; for example, in figure 1.8, we used it to separate article from the ID of the article, creating a sort of namespace. The choice of : is subjective, but common among Redis users. Other common choices include a period (.), forward slash (/), and even occasionally the pipe character (|). Regardless of what you choose, be consistent, and note how we use colons to define nested namespaces throughout the examples in the book.
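As a small illustration of the naming convention, the sketch below builds and splits colon-separated keys. The helper name `make_key` is an assumption introduced here; the listings in this chapter simply concatenate strings.

```python
def make_key(*parts):
    """Join name parts with ':' to form a namespaced key."""
    return ':'.join(str(part) for part in parts)


article = make_key('article', 100408)      # 'article:100408'
voted = make_key('voted', 100408)          # 'voted:100408'

# Recover the ID portion from an article:id identifier.
article_id = article.partition(':')[-1]    # '100408'
```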
To store a sorted set of articles themselves, we’ll use a ZSET, which keeps items ordered by the item scores. We can use our article ID as the member, with the ZSET score being the article score itself. While we’re at it, we’ll create another ZSET with the score being just the times that the articles were posted, which gives us an option of browsing articles based on article score or time. We can see a small example of time- and score-ordered article ZSETs in figure 1.9.
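To make the two orderings concrete, here is a rough in-memory stand-in that uses plain dictionaries (member to score) instead of real ZSETs. The article IDs, timestamps, and vote counts are made up for illustration, and `highest_first` is a hypothetical helper, not a Redis command.

```python
# member -> score; every number here is invented for the example.
time_zset = {
    'article:1': 1000.0,
    'article:2': 2000.0,
    'article:3': 3000.0,
}
score_zset = {
    'article:1': 1000.0 + 432 * 10,  # 5320.0
    'article:2': 2000.0 + 432 * 1,   # 2432.0
    'article:3': 3000.0 + 432 * 2,   # 3864.0
}


def highest_first(zset):
    """Return the members ordered from highest score to lowest."""
    return sorted(zset, key=zset.get, reverse=True)


by_score = highest_first(score_zset)
by_time = highest_first(time_zset)
print(by_score)  # ['article:1', 'article:3', 'article:2']
print(by_time)   # ['article:3', 'article:2', 'article:1']
```

The same members appear in both structures; only the score attached to each member differs, which is what lets us browse by popularity or by recency.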
In order to prevent users from voting for the same article more than once, we need to store a unique listing of users who have voted for each article. For this, we’ll use a SET for each article, and store the member IDs of all users who have voted on the given article. An example SET of users who have voted on an article is shown in figure 1.10.
For the sake of memory use over time, we’ll say that after a week users can no longer vote on an article and its score is fixed. After that week has passed, we’ll delete the SET of users who have voted on the article.
Before we build this, let’s take a look at what would happen if user 115423 were to vote for article 100408 in figure 1.11.
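The effect of such a vote can be sketched without Redis by using plain Python dictionaries and sets as stand-ins for the HASH, ZSETs, and SET. Everything here is an assumption made for illustration: the `simulate_vote` helper, the `user:` prefix on user IDs, the poster ID, and the timestamps are not taken from the figure.

```python
ONE_WEEK_IN_SECONDS = 7 * 86400
VOTE_SCORE = 432


def simulate_vote(state, user, article, now):
    """Apply one vote to in-memory stand-ins for the Redis structures."""
    if state['time'][article] < now - ONE_WEEK_IN_SECONDS:
        return False                       # the voting window has closed
    voters = state['voted'][article]
    if user in voters:
        return False                       # each user may vote only once
    voters.add(user)
    state['score'][article] += VOTE_SCORE  # bump the article score
    state['votes'][article] += 1           # bump the vote count
    return True


now = 1_700_000_000                        # hypothetical current time
article = 'article:100408'
state = {
    'time': {article: now - 3600},
    'score': {article: now - 3600 + VOTE_SCORE},
    'votes': {article: 1},
    'voted': {article: {'user:234487'}},   # hypothetical poster
}

first = simulate_vote(state, 'user:115423', article, now)
second = simulate_vote(state, 'user:115423', article, now)
print(first, second, state['votes'][article])  # True False 2
```

The second attempt is rejected because the user is already in the voted set, so the score and vote count change only once.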
Now that you know what we’re going to build, let’s build it! First, let’s handle voting. When someone tries to vote on an article, we first verify that the article was posted within the last week by checking the article’s post time with ZSCORE. If we still have time, we then try to add the user to the article’s voted SET with SADD. Finally, if the user didn’t previously vote on that article, we increment the score of the article by 432 (which we calculated earlier) with ZINCRBY (a command that increments the score of a member), and update the vote count in the HASH with HINCRBY (a command that increments a value in a hash). The voting code is shown in listing 1.6.
# Listing 1.6 The article_vote() function
import time

# Prepare our constants.
ONE_WEEK_IN_SECONDS = 7 * 86400
VOTE_SCORE = 432

def article_vote(conn, user, article):
    # Calculate the cutoff time for voting.
    cutoff = time.time() - ONE_WEEK_IN_SECONDS
    # Check to see if the article can still be voted on (we could use the
    # article HASH here, but scores are returned as floats so we don't have
    # to cast it).
    if conn.zscore('time:', article) < cutoff:
        return
    # Get the id portion from the article:id identifier.
    article_id = article.partition(':')[-1]
    # If the user hasn't voted for this article before, increment the
    # article score and vote count.
    if conn.sadd('voted:' + article_id, user):
        # redis-py 3.0+ argument order is (key, amount, member).
        conn.zincrby('score:', VOTE_SCORE, article)
        conn.hincrby(article, 'votes', 1)
REDIS TRANSACTIONS In order to be correct, technically our SADD, ZINCRBY, and HINCRBY calls should be in a transaction. But since we don’t cover transactions until chapter 4, we won’t worry about them for now.
To post an article, we first create an article ID by incrementing a counter with INCR. We then create the voted SET by adding the poster’s ID to the SET with SADD. To ensure that the SET is removed after one week, we’ll give it an expiration time with the EXPIRE command, which lets Redis automatically delete it. We then store the article information with HMSET. Finally, we add the initial score and posting time to the relevant ZSETs with ZADD. We can see the code for posting an article in listing 1.7.
# Listing 1.7 The post_article() function
def post_article(conn, user, title, link):
    # Generate a new article id.
    article_id = str(conn.incr('article:'))

    voted = 'voted:' + article_id
    # Start with the posting user having voted for the article, and set the
    # article voting information to automatically expire in a week (we
    # discuss expiration in chapter 3).
    conn.sadd(voted, user)
    conn.expire(voted, ONE_WEEK_IN_SECONDS)

    now = time.time()
    article = 'article:' + article_id
    # Create the article hash.
    conn.hmset(article, {
        'title': title,
        'link': link,
        'poster': user,
        'time': now,
        'votes': 1
    })

    # Add the article to the time- and score-ordered ZSETs.
    # redis-py 3.0+ takes a {member: score} mapping.
    conn.zadd('score:', {article: now + VOTE_SCORE})
    conn.zadd('time:', {article: now})
    return article_id
Okay, so we can vote, and we can post articles. But what about fetching the current top-scoring or most recent articles? For that, we can use ZRANGE to fetch the article IDs, and then we can make calls to HGETALL to fetch information about each article. The only tricky part is that we must remember that ZSETs are sorted in ascending order by their score. But we can fetch items by index in reverse order with ZREVRANGE. The function to fetch a page of articles is shown in listing 1.8.
# Listing 1.8 The get_articles() function
ARTICLES_PER_PAGE = 25

def get_articles(conn, page, order='score:'):
    # Set up the start and end indexes for fetching the articles.
    start = (page - 1) * ARTICLES_PER_PAGE
    end = start + ARTICLES_PER_PAGE - 1

    # Fetch the article ids.
    ids = conn.zrevrange(order, start, end)
    articles = []
    # Get the article information from the list of article ids.
    for id in ids:
        article_data = conn.hgetall(id)
        article_data['id'] = id
        articles.append(article_data)

    return articles
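The index arithmetic in listing 1.8 depends on the range indexes being zero-based and inclusive at both ends, which is why 1 is subtracted when computing the end index. A minimal sketch of just that calculation, using a hypothetical helper named `page_bounds`:

```python
ARTICLES_PER_PAGE = 25


def page_bounds(page, per_page=ARTICLES_PER_PAGE):
    """Return the inclusive (start, end) indexes for a 1-based page number."""
    start = (page - 1) * per_page
    end = start + per_page - 1
    return start, end


print(page_bounds(1))  # (0, 24)
print(page_bounds(2))  # (25, 49)
```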
DEFAULT ARGUMENTS AND KEYWORD ARGUMENTS Inside listing 1.8, we used an argument named order, and we gave it a default value of score:. Some of the details of default arguments and passing arguments as names (instead of by position) can be strange to newcomers to the Python language. If you’re having difficulty understanding what’s going on with function definition or argument passing, the Python language tutorial offers a good introduction to what’s going on, and you can jump right to the particular section by visiting this shortened URL: http://mng.bz/KM5x.
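For readers new to Python, the short sketch below shows the three calling styles in isolation. The function `describe_request` is a hypothetical stand-in that only mirrors the shape of the get_articles() signature.

```python
def describe_request(page, order='score:'):
    """Mirror the (page, order='score:') signature without touching Redis."""
    return 'page %d ordered by %s' % (page, order)


default_order = describe_request(1)                   # order falls back to 'score:'
positional = describe_request(2, 'time:')             # order passed by position
by_keyword = describe_request(order='time:', page=3)  # both passed by name

print(default_order)  # page 1 ordered by score:
print(positional)     # page 2 ordered by time:
print(by_keyword)     # page 3 ordered by time:
```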
We can now get the top-scoring articles across the entire site. But many of these article voting sites have groups that only deal with articles of a particular topic like cute animals, politics, programming in Java, and even the use of Redis. How could we add or alter our code to offer these topical groups?
To offer groups requires two steps. The first step is to add information about which articles are in which groups, and the second is to actually fetch articles from a group. We’ll use a SET for each group, which stores the article IDs of all articles in that group. In listing 1.9, we see a function that allows us to add and remove articles from groups.
# Listing 1.9 The add_remove_groups() function
def add_remove_groups(conn, article_id, to_add=[], to_remove=[]):
    # Construct the article information like we did in post_article.
    article = 'article:' + article_id
    for group in to_add:
        # Add the article to groups that it should be a part of.
        conn.sadd('group:' + group, article)
    for group in to_remove:
        # Remove the article from groups that it should be removed from.
        conn.srem('group:' + group, article)
At first glance, these SETs with article information may not seem that useful. So far, you’ve only seen the ability to check whether a SET has an item. But Redis has the capability to perform operations involving multiple SETs, and in some cases, Redis can perform operations between SETs and ZSETs.
When we’re browsing a specific group, we want to be able to see the scores of all of the articles in that group. Or, really, we want them to be in a ZSET so that we can have the scores already sorted and ready for paging over. Redis has a command called ZINTERSTORE, which, when provided with SETs and ZSETs, will find those entries that are in all of the SETs and ZSETs, combining their scores in a few different ways (items in SETs are considered to have scores equal to 1). In our case, we want the maximum score from each item (which will be either the article score or when the article was posted, depending on the sorting option chosen).
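The behavior described here can be approximated in plain Python to show what ends up in the result. This is a simplified sketch, not how Redis implements the command: `zinterstore_sketch` is a hypothetical function, it ignores the optional weights, it expects at least one input, and the article IDs and scores are invented.

```python
def zinterstore_sketch(inputs, aggregate=max):
    """Intersect sets and member->score dicts, combining scores per member.

    Plain sets are treated as ZSETs whose members all have a score of 1.
    """
    as_zsets = [
        dict.fromkeys(item, 1.0) if isinstance(item, (set, frozenset)) else dict(item)
        for item in inputs
    ]
    # Keep only the members present in every input.
    common = set(as_zsets[0])
    for zset in as_zsets[1:]:
        common &= set(zset)
    return {member: aggregate(zset[member] for zset in as_zsets) for member in common}


group = {'article:92617', 'article:100408'}  # stand-in for a group SET
scores = {                                   # stand-in for the score ZSET
    'article:92617': 1332065417.47,
    'article:100408': 1332174713.47,
    'article:100635': 1332164063.49,
}

result = zinterstore_sketch([group, scores])
print(sorted(result))  # ['article:100408', 'article:92617']
```

Because every Unix-time-based score is far larger than 1, taking the maximum keeps the article's own score and discards the placeholder score contributed by the SET.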
To visualize what is going on, let’s look at figure 1.12. This figure shows an example ZINTERSTORE operation on a small group of articles stored as a SET with the much larger (but not completely shown) ZSET of scored articles. Notice how only those articles that are in both the SET and the ZSET make it into the result ZSET?
To calculate the scores of all of the items in a group, we only need to make a ZINTERSTORE call with the group and the scored or recent ZSETs. Because a group may be large, it may take some time to calculate, so we’ll keep the ZSET around for 60 seconds to reduce the amount of work that Redis is doing. If we’re careful (and we are), we can even use our existing get_articles() function to handle pagination and article data fetching so we don’t need to rewrite it. We can see the function for fetching a page of articles from a group in listing 1.10.
# Listing 1.10 The get_group_articles() function
def get_group_articles(conn, group, page, order='score:'):
    # Create a key for each group and each sort order.
    key = order + group
    # If we haven't sorted these articles recently, we should sort them.
    if not conn.exists(key):
        # Actually sort the articles in the group based on score or recency.
        conn.zinterstore(
            key,
            ['group:' + group, order],
            aggregate='max'
        )
        # Tell Redis to automatically expire the ZSET in 60 seconds.
        conn.expire(key, 60)
    # Call our earlier get_articles() function to handle pagination and
    # article data fetching.
    return get_articles(conn, page, key)
On some sites, articles are typically only in one or two groups at most (“all articles” and whatever group best matches the article). In that situation, it would make more sense to keep the group that the article is in as part of the article’s HASH, and add one more ZINCRBY call to the end of our article_vote() function. But in our case, we chose to allow articles to be a part of multiple groups at the same time (maybe a picture can be both cute and funny), so to update scores for articles in multiple groups, we’d need to increment all of those groups at the same time. For an article in many groups, that could be expensive, so we instead occasionally perform an intersection. How we choose to offer flexibility or limitations can change how we store and update our data in any database, and Redis is no exception.