Designing the Data Structures for a Social Network


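1. Outline use cases and constraints

Use cases:

  • A user searches for another user and sees the shortest relationship path between the two

Assumptions and constraints:

  • Traffic is not evenly distributed; some data is hot
  • The graph data does not fit on a single machine (distributed data storage is required)
  • Graph edges are unweighted
  • 100 million users
  • 50 friends per user on average
  • 1 billion searches per month on average
  • GraphQL and Neo4j cannot be used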

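Capacity estimates:

  • 5 billion friend relationships (100 million users × 50 friends each)
  • About 400 searches per second on average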
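The two estimates follow directly from the assumptions listed earlier. The short calculation below reproduces them; the 30-day month is an assumption made here for the arithmetic, and the variable names are illustrative only.

```python
# Back-of-the-envelope estimates from the stated assumptions.
USERS = 100_000_000                  # 100 million users
AVG_FRIENDS = 50                     # average friends per user
SEARCHES_PER_MONTH = 1_000_000_000   # 1 billion searches per month

SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month

relationships = USERS * AVG_FRIENDS
searches_per_second = SEARCHES_PER_MONTH / SECONDS_PER_MONTH

print(f"{relationships:,} relationships")
print(f"{searches_per_second:.0f} searches per second")
```

The exact result is roughly 386 searches per second, which rounds to the 400 quoted above. The 5 billion figure counts one friend-list entry per user, so each mutual friendship is counted twice.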

2. Create a high-level design

[Figure: overall system design]

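3. Design core components

Use case: a user searches for someone and sees the shortest relationship path
The relationship search can be implemented with breadth-first search (BFS), which finds shortest paths in an unweighted graph: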

from collections import deque
from enum import Enum


class State(Enum):
    unvisited = 0
    visited = 1


class Graph(object):
    # Assumes every node exposes `key`, `adj_nodes` (a dict of key -> node),
    # and `visit_state` (initially State.unvisited). The visit states must be
    # reset to State.unvisited before another search is run.

    def shortest_path(self, source, dest):
        if source is None or dest is None:
            return None
        if source is dest:
            return [source.key]
        prev_node_keys = self._shortest_path(source, dest)
        if prev_node_keys is None:
            return None
        # Walk backwards from dest to source, then reverse the path
        path_ids = [dest.key]
        prev_node_key = prev_node_keys[dest.key]
        while prev_node_key is not None:
            path_ids.append(prev_node_key)
            prev_node_key = prev_node_keys[prev_node_key]
        return path_ids[::-1]

    def _shortest_path(self, source, dest):
        queue = deque()
        queue.append(source)
        # Maps each visited node's key to the key it was reached from
        prev_node_keys = {source.key: None}
        source.visit_state = State.visited
        while queue:
            node = queue.popleft()
            if node is dest:
                return prev_node_keys
            for adj_node in node.adj_nodes.values():
                if adj_node.visit_state == State.unvisited:
                    queue.append(adj_node)
                    prev_node_keys[adj_node.key] = node.key
                    adj_node.visit_state = State.visited
        return None
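To see the same BFS idea run end to end, here is a minimal self-contained sketch that uses a plain adjacency dictionary instead of node objects. The function name `bfs_shortest_path` and the sample friendship graph are hypothetical and not part of the original design.

```python
from collections import deque


def bfs_shortest_path(graph, source, dest):
    """Return the shortest path from source to dest as a list of keys, or None."""
    if source not in graph or dest not in graph:
        return None
    prev = {source: None}  # predecessor map; also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == dest:
            # Walk predecessors back to the source, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in prev:
                prev[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical friendship graph (undirected, so every edge is listed twice).
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
    "frank": [],
}

print(bfs_shortest_path(friends, "alice", "erin"))
print(bfs_shortest_path(friends, "alice", "frank"))
```

The first call finds a three-hop path through bob and dave; the second returns None because frank has no friends and is unreachable.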

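Given the number of users, the data on the Person Servers should be sharded:

  • The search service handles a request by calling the User Graph Service
  • The User Graph Service does the following:
    • Uses the Lookup Service to determine which node stores the user's data (this can be done by database middleware, or the service itself can forward the request to the target node)
    • Fetches the user's info and friend list from the target node
    • Runs a BFS starting from the friend list to find the target user

Lookup Service: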

class LookupService(object):

    def __init__(self):
        self.lookup = self._init_lookup()  # key: person_id, value: person_server

    def _init_lookup(self):
        ...

    def lookup_person_server(self, person_id):
        return self.lookup[person_id]
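The design above keeps an explicit person_id to server table and does not say how it is populated. As one hypothetical alternative, the shard can be computed from the id itself. The sketch below assumes integer ids and uses a simple modulo rule; `ModuloLookupService` and the server names are invented for illustration.

```python
class ModuloLookupService(object):
    """Hypothetical lookup that maps a person_id to a shard by modulo."""

    def __init__(self, person_servers):
        if not person_servers:
            raise ValueError("at least one person server is required")
        self.person_servers = list(person_servers)

    def lookup_person_server(self, person_id):
        # Assumes person_id is a non-negative integer.
        index = person_id % len(self.person_servers)
        return self.person_servers[index]


servers = ["person-server-0", "person-server-1", "person-server-2"]
lookup = ModuloLookupService(servers)
print(lookup.lookup_person_server(7))  # 7 % 3 == 1, so person-server-1
```

A modulo rule needs no lookup table, but changing the number of servers moves most users to a different shard, and it does nothing by itself about hot users. An explicit table, as in the design above, avoids both problems at the cost of storing and maintaining the mapping.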

Person Server:

class PersonServer(object):

    def __init__(self):
        # Stored as _people so the dict does not shadow the people() method
        self._people = {}  # key: person_id, value: person

    def add_person(self, person):
        ...

    def people(self, ids):
        results = []
        for person_id in ids:
            if person_id in self._people:
                results.append(self._people[person_id])
        return results

Person entity:

class Person(object):

    def __init__(self, id, name, friend_ids):
        self.id = id
        self.name = name
        self.friend_ids = friend_ids

User Graph Service:

from collections import deque


class UserGraphService(object):

    def __init__(self, lookup_service):
        self.lookup_service = lookup_service

    def person(self, person_id):
        # Ask the Lookup Service which Person Server holds this user
        person_server = self.lookup_service.lookup_person_server(person_id)
        people = person_server.people([person_id])
        return people[0] if people else None

    def shortest_path(self, source_key, dest_key):
        if source_key is None or dest_key is None:
            return None
        if source_key == dest_key:
            return [source_key]
        prev_node_keys = self._shortest_path(source_key, dest_key)
        if prev_node_keys is None:
            return None
        # Iterate through the path_ids backwards, starting at dest_key
        path_ids = [dest_key]
        prev_node_key = prev_node_keys[dest_key]
        while prev_node_key is not None:
            path_ids.append(prev_node_key)
            prev_node_key = prev_node_keys[prev_node_key]
        # Reverse the list since we iterated backwards
        return path_ids[::-1]

    def _shortest_path(self, source_key, dest_key):
        # Use the id to get the Person
        source = self.person(source_key)
        if source is None:
            return None
        # Update our bfs queue
        queue = deque()
        queue.append(source)
        # prev_node_keys keeps track of each hop from
        # the source_key to the dest_key
        prev_node_keys = {source_key: None}
        # We'll use visited_ids to keep track of which nodes we've
        # visited, which can be different from a typical bfs where
        # this can be stored in the node itself
        visited_ids = set()
        visited_ids.add(source.id)
        while queue:
            node = queue.popleft()
            if node.id == dest_key:
                return prev_node_keys
            for friend_id in node.friend_ids:
                if friend_id not in visited_ids:
                    friend_node = self.person(friend_id)
                    if friend_node is None:
                        # Skip ids that no Person Server knows about
                        continue
                    queue.append(friend_node)
                    prev_node_keys[friend_id] = node.id
                    visited_ids.add(friend_id)
        return None
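The pieces can be wired together on toy data to check the flow from lookup to Person Server to BFS. The sketch below is a compact, self-contained stand-in for the classes in this section, not the original code: the modulo sharding rule, the `add_person` body, and the sample users are all assumptions made so the example runs.

```python
from collections import deque


class Person(object):
    def __init__(self, id, name, friend_ids):
        self.id = id
        self.name = name
        self.friend_ids = friend_ids


class PersonServer(object):
    def __init__(self):
        self._people = {}  # key: person_id, value: person

    def add_person(self, person):
        # Assumed behavior: store the person under its id.
        self._people[person.id] = person

    def people(self, ids):
        return [self._people[i] for i in ids if i in self._people]


class LookupService(object):
    def __init__(self, person_servers):
        # Hypothetical sharding rule: person_id modulo the number of servers.
        self.person_servers = person_servers

    def lookup_person_server(self, person_id):
        return self.person_servers[person_id % len(self.person_servers)]


class UserGraphService(object):
    def __init__(self, lookup_service):
        self.lookup_service = lookup_service

    def person(self, person_id):
        server = self.lookup_service.lookup_person_server(person_id)
        found = server.people([person_id])
        return found[0] if found else None

    def shortest_path(self, source_id, dest_id):
        if self.person(source_id) is None:
            return None
        prev = {source_id: None}  # predecessor map, doubles as visited set
        queue = deque([source_id])
        while queue:
            current_id = queue.popleft()
            if current_id == dest_id:
                path = []
                while current_id is not None:
                    path.append(current_id)
                    current_id = prev[current_id]
                return path[::-1]
            current = self.person(current_id)  # one shard lookup per visited user
            if current is None:
                continue
            for friend_id in current.friend_ids:
                if friend_id not in prev:
                    prev[friend_id] = current_id
                    queue.append(friend_id)
        return None


# Toy data: a chain 1 - 2 - 3 - 4, plus user 5 with no friends.
friend_lists = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3], 5: []}
servers = [PersonServer(), PersonServer()]
lookup = LookupService(servers)
for pid, friend_ids in friend_lists.items():
    lookup.lookup_person_server(pid).add_person(Person(pid, "user%d" % pid, friend_ids))

graph_service = UserGraphService(lookup)
print(graph_service.shortest_path(1, 4))
print(graph_service.shortest_path(1, 5))
```

Every visited user costs one call to a Person Server, which in a real deployment would be a network round trip. That cost is what the caching and search-pruning ideas in the next section aim to reduce.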

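4. Refine the design

[Figure: final design]

Possible optimizations:

  • Cache hot data to increase read capacity
  • Run BFS from the source user and the target user at the same time, and join the two paths where they meet
  • Cap the number of hops searched (for example, only show relationships within 6 degrees)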
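The bidirectional search and the hop limit can be combined in one routine. The following is a minimal in-memory sketch, assuming an undirected graph stored as an adjacency dictionary; the function names, the `max_hops` parameter, and the sample data are illustrative and not taken from the original design.

```python
from collections import deque


def _expand_level(graph, frontier, prev_this, prev_other):
    """Expand one full BFS level; return a meeting node or None."""
    for _ in range(len(frontier)):
        node = frontier.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in prev_this:
                continue
            prev_this[neighbor] = node
            if neighbor in prev_other:
                return neighbor  # the two searches have met
            frontier.append(neighbor)
    return None


def _trace(prev, node):
    """Follow predecessor links from node back to the search root."""
    path = []
    while node is not None:
        path.append(node)
        node = prev[node]
    return path


def bidirectional_shortest_path(graph, source, dest, max_hops=6):
    if source not in graph or dest not in graph:
        return None
    if source == dest:
        return [source]
    prev_fwd, prev_bwd = {source: None}, {dest: None}
    frontier_fwd, frontier_bwd = deque([source]), deque([dest])
    depth_fwd = depth_bwd = 0
    # A meeting found in the next expansion has depth_fwd + depth_bwd + 1 hops.
    while frontier_fwd and frontier_bwd and depth_fwd + depth_bwd < max_hops:
        # Expand the smaller frontier to keep the two searches balanced.
        if len(frontier_fwd) <= len(frontier_bwd):
            meet = _expand_level(graph, frontier_fwd, prev_fwd, prev_bwd)
            depth_fwd += 1
        else:
            meet = _expand_level(graph, frontier_bwd, prev_bwd, prev_fwd)
            depth_bwd += 1
        if meet is not None:
            # source ... meet, followed by the rest of meet ... dest
            return _trace(prev_fwd, meet)[::-1] + _trace(prev_bwd, meet)[1:]
    return None


# Hypothetical undirected friendship graph.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["alice", "dave"],
    "carol": ["alice", "dave"],
    "dave": ["bob", "carol", "erin"],
    "erin": ["dave"],
    "frank": [],
}

print(bidirectional_shortest_path(friends, "alice", "erin"))
print(bidirectional_shortest_path(friends, "alice", "erin", max_hops=2))
```

The first call finds the three-hop path; the second returns None because the limit of two hops is reached before the searches meet. Searching from both ends explores two small neighborhoods instead of one large one, which matters when each user has about 50 friends and every expansion may require a call to another shard.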
