设计社交网站的数据结构 原文链接
1.描述使用场景和约束
使用场景:
- 用户搜索其他用户,并且显示出二人之间的最近关系
假设和约束:
- 流量不均衡,存在热点数据
- 图状数据无法存放在单台机器上(需要分布式数据存储)
- 图的边没有权重
- 1亿用户量
- 平均每位用户50位朋友
- 平均每月10亿次搜索操作
- 不能使用GraphQL和neo4j
容量估算:
- 50亿的用户之间的关系
- 平均每秒400次搜索
2.创建系统设计图
3.设计关键组件
使用场景:用户搜索某人,先是最近的关系路径
在用户关系搜索上,可以采用BFS来做:
class Graph(Graph):
def shortest_path(self, source, dest):
if source is None or dest is None:
return None
if source is dest:
return [source.key]
prev_node_keys = self._shortest_path(source, dest)
if prev_node_keys is None:
return None
else:
path_ids = [dest.key]
prev_node_key = prev_node_keys[dest.key]
while prev_node_key is not None:
path_ids.append(prev_node_key)
prev_node_key = prev_node_keys[prev_node_key]
return path_ids[::-1]
def _shortest_path(self, source, dest):
queue = deque()
queue.append(source)
prev_node_keys = {source.key: None}
source.visit_state = State.visited
while queue:
node = queue.popleft()
if node is dest:
return prev_node_keys
prev_node = node
for adj_node in node.adj_nodes.values():
if adj_node.visit_state == State.unvisited:
queue.append(adj_node)
prev_node_keys[adj_node.key] = prev_node.key
adj_node.visit_state = State.visited
return None
鉴于用户数量,Person Server
的数据应该做分片:
- 搜索服务通过调用用户服务
- 用户服务做以下操作
- 通过
Lookup Service
确定用户信息存储的节点(这个可以通过数据库中间件做,或者由用户服务负责把请求转发到目标节点) - 从目标节点获取用户信息和好友列表
- 使用
BFS
从好友列表出发查找目标用户
- 通过
查找服务:
class LookupService(object):
def __init__(self):
self.lookup = self._init_lookup() # key: person_id, value: person_server
def _init_lookup(self):
...
def lookup_person_server(self, person_id):
return self.lookup[person_id]
用户服务:
class PersonServer(object):
def __init__(self):
self.people = {} # key: person_id, value: person
def add_person(self, person):
...
def people(self, ids):
results = []
for id in ids:
if id in self.people:
results.append(self.people[id])
return results
用户实体:
class Person(object):
def __init__(self, id, name, friend_ids):
self.id = id
self.name = name
self.friend_ids = friend_ids
用户图服务:
class UserGraphService(object):
def __init__(self, lookup_service):
self.lookup_service = lookup_service
def person(self, person_id):
person_server = self.lookup_service.lookup_person_server(person_id)
return person_server.people([person_id])
def shortest_path(self, source_key, dest_key):
if source_key is None or dest_key is None:
return None
if source_key is dest_key:
return [source_key]
prev_node_keys = self._shortest_path(source_key, dest_key)
if prev_node_keys is None:
return None
else:
# Iterate through the path_ids backwards, starting at dest_key
path_ids = [dest_key]
prev_node_key = prev_node_keys[dest_key]
while prev_node_key is not None:
path_ids.append(prev_node_key)
prev_node_key = prev_node_keys[prev_node_key]
# Reverse the list since we iterated backwards
return path_ids[::-1]
def _shortest_path(self, source_key, dest_key, path):
# Use the id to get the Person
source = self.person(source_key)
# Update our bfs queue
queue = deque()
queue.append(source)
# prev_node_keys keeps track of each hop from
# the source_key to the dest_key
prev_node_keys = {source_key: None}
# We'll use visited_ids to keep track of which nodes we've
# visited, which can be different from a typical bfs where
# this can be stored in the node itself
visited_ids = set()
visited_ids.add(source.id)
while queue:
node = queue.popleft()
if node.key is dest_key:
return prev_node_keys
prev_node = node
for friend_id in node.friend_ids:
if friend_id not in visited_ids:
friend_node = self.person(friend_id)
queue.append(friend_node)
prev_node_keys[friend_id] = prev_node.key
visited_ids.add(friend_id)
return None
4.完善设计
可以优化的点有:将热点信息落入cache提高读服务能力;通过源用户和目标用户同时开始BFS,取路径重合点;对目标步数设置最大值(例如仅显示关系在6个人以内的情况)。