Why System Design Interviews?


  1. high-level design

  2. guide and move the conversation forward

  3. discussion with interviewer --> core

  4. gather all requirements, because the interviewer will not volunteer them

  5. Leading the conversation: candidate leads the discussion to go broad and deep -> take the interviewer with you step by step

  6. Solving by breaking down: top-down and modularization
    6.1 break problems into modules and solve them independently
    6.2 each component -> sub-problem -> algorithm
    6.3 NOTE:
    What really matters is how you make progress on solving the problem and which strategy you adopt.

  7. handle bottleneck:
    7.1 each solution is a kind of trade-off
    7.2 talk about these trade-offs and measure their impact on the system, keeping all the constraints and use cases in mind

  8. Steps:
    8.1 Scoping the problem: make no assumptions, ask questions, pin down constraints and use cases
    8.2 Abstract design: blocks of the system and the relationships between them.
    8.3 Identify and address the bottlenecks by using the fundamental principles of scalable system design.

  9. Know the preference of the interviewer
    9.1 focus on the right things while discussing the problem: keep the big picture in mind, and keep asking the interviewer whether they want to go deeper into the current (or some other) part


  1. Spend a few minutes with the interviewer clarifying the full scope of the system
  2. After finishing the high-level design, make sure the interviewer is OK with it, then move on to details --> scale

Super Important:

  1. NEVER assume things that are not stated!!!


  1. A load balancer (LB) distributes load across multiple resources

  2. Where to ADD:
    2.1 user <--> web server
    2.2 web server <--> internal platform layer(e.g. app server or cache server)
    2.3 internal platform layer <--> DB

  3. Implement
    3.1 Smart Clients
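    In a word: developed entirely in-house; can sit at any layer, including the user and the DB server
    3.2 Hardware Load Balancers
    In a word: pure hardware
    3.3 Software Load Balancers
    In a word: uses LB software, but sometimes software and hardware have to be combined (when the ports on the user's host cannot be controlled)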
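
To make "distribute load to multiple resources" concrete, here is a minimal sketch of a software load balancer's selection step. Round-robin is an assumed strategy (the notes do not name one), and the class and server names are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Hands out backend servers in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # itertools.cycle repeats the server list forever.
        self._cycle = itertools.cycle(list(servers))

    def pick(self):
        # Each call returns the next server in the rotation.
        return next(self._cycle)


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
picks = [balancer.pick() for _ in range(5)]
print(picks)
```

The same object could sit at any of the three places listed under "Where to ADD"; only the list of servers changes.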


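1. Types of cache
1.1 Application server cache
Whichever node receives a request caches the result locally
pros: straightforward
cons: if the LB distributes requests randomly, the same request lands on different nodes and misses the cache
1.2 Distributed cache
Add a node that tracks whether a cache entry is available. Using a hashing function, each node stores only the entries it is responsible for. When a node finds that it holds the requested entry, it tells the node that received the request (before that node goes to the database for the data).
pros: fixes the con of the application server cache
cons: a missing node is a real problem; storing multiple copies of the data on different nodes mitigates it, but makes the whole cache much more complex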
1.3 Global Cache
There are two kinds of global cache:


The second kind is beneficial in some cases, e.g. (1) the cached files are very large (2) the cached content is static and should not be evicted

1.4 Content Distribution Network (CDN)
Personal understanding (to be changed if it's wrong): in essence it is a global cache
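
The hashing idea behind the distributed cache (1.2) can be sketched as follows. This is a simplified illustration, assuming plain modulo hashing over a fixed node list; `node_for_key`, `DistributedCache`, and the node names are hypothetical.

```python
import hashlib


def node_for_key(key, nodes):
    """Map a cache key to the one node responsible for it (modulo hashing)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


class DistributedCache:
    """Each node keeps only the entries that hash to it."""

    def __init__(self, node_names):
        self.nodes = list(node_names)
        # One in-memory dict per node stands in for a real cache server.
        self.stores = {name: {} for name in self.nodes}

    def put(self, key, value):
        self.stores[node_for_key(key, self.nodes)][key] = value

    def get(self, key):
        # Any node can compute the owner, so a lookup never depends on
        # which node the load balancer happened to pick.
        return self.stores[node_for_key(key, self.nodes)].get(key)


cache = DistributedCache(["cache-a", "cache-b", "cache-c"])
cache.put("user:42", "Ada")
print(cache.get("user:42"))
```

With modulo hashing, losing a node changes `len(nodes)` and remaps most keys, which is one way to see why a missing node is painful.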
2. Cache Invalidation
2.1 Write-through cache
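Both sides (cache + DB) are updated together
cons: very slow writes, heavy load
2.2 Write-around cache
cons: slow; a read of recently written data misses the cache
2.3 Write-back cache
cons: if the cache crashes before the DB is updated, the data may be lost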
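
The three write policies can be compared side by side in a small sketch. It assumes a plain dict stands in for the DB; the class names are illustrative, not taken from any library.

```python
class WriteThroughCache:
    """Every write goes to the cache and the DB together."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        self.cache[key] = value
        self.db[key] = value  # two writes per operation: the slow part


class WriteAroundCache:
    """Writes bypass the cache; data is cached only when it is read."""

    def __init__(self, db):
        self.db = db
        self.cache = {}

    def write(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)  # drop any stale cached copy

    def read(self, key):
        if key not in self.cache:  # recently written data misses here
            self.cache[key] = self.db[key]
        return self.cache[key]


class WriteBackCache:
    """Writes land in the cache first and reach the DB on flush()."""

    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.dirty = set()

    def write(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)  # lost if the cache dies before flush()

    def flush(self):
        for key in self.dirty:
            self.db[key] = self.cache[key]
        self.dirty.clear()
```

In the write-back case, anything still in `dirty` when the cache process dies never reaches the DB, which is the data-loss risk noted in 2.3.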
3. Cache eviction policies
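
One common eviction policy is least recently used (LRU): when the cache is full, drop the entry that was accessed longest ago. A minimal sketch, assuming LRU as the example policy (the class name and capacity are illustrative):

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


lru = LRUCache(capacity=2)
lru.put("a", 1)
lru.put("b", 2)
lru.get("a")     # "a" is now the most recently used
lru.put("c", 3)  # evicts "b"
print(lru.get("b"))
```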

Sharding or Data Partitioning

-> break up a big database (DB) into many smaller parts

  1. Partition Methods
    1.1 Horizontal partitioning: range-based
    cons: uneven distribution
    1.2 Vertical Partitioning: column-based
    cons: usually needs further partitioning later
    1.3 Directory Based Partitioning: mapping

  2. Partitioning Criteria
    2.1 Key or Hash-based partitioning:
    hash -> consistent hashing
    2.2 List partitioning:
    Only contains the content with specific values
    2.3 Round-robin partitioning
    2.4 Composite partitioning

  3. Common Problems of Sharding
    a. Joins and Denormalization:
    problem: cannot join across shards
    solution: denormalization -> con: have to deal with data inconsistency(caused by denormalization)

    b. Referential integrity:
    problem: foreign key constraints are not supported
    solution: the application has to (1) enforce them itself (2) clean up dangling references

    c. Rebalancing:
    problem: for various reasons the shards can become unbalanced
    solution: Hash-based partitioning with consistent hashing
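
The partitioning methods and criteria above each boil down to a function from a row to a shard number. A rough sketch, where the function names, the user-id ranges, and the country lists are all made-up examples:

```python
import bisect
import hashlib
import itertools


def range_shard(user_id, upper_bounds):
    """Range-based: shard i holds ids below upper_bounds[i].

    Ids at or beyond the last bound fall into one extra, final shard.
    """
    return bisect.bisect_right(upper_bounds, user_id)


def hash_shard(key, shard_count):
    """Key or hash-based: hash the key, then take it modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def list_shard(country, shard_by_country, default_shard):
    """List partitioning: each shard owns an explicit list of values."""
    return shard_by_country.get(country, default_shard)


def round_robin_shards(shard_count):
    """Round-robin: the i-th row goes to shard i mod shard_count."""
    return itertools.cycle(range(shard_count))


print(range_shard(500, [1000, 2000]))   # below the first bound
print(range_shard(2500, [1000, 2000]))  # beyond the last bound
print(list_shard("NO", {"NO": 0, "SE": 0, "JP": 1}, default_shard=2))
```

`hash_shard` with a plain modulo remaps most keys when `shard_count` changes, which is the rebalancing pain that consistent hashing is meant to reduce.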


  • An index is used to improve the speed of data retrieval operations on the data store
  • It's a data structure


A proxy is used to filter, log, or transform requests
?As a cache?
Collapse the same requests from a system-wide perspective
Collapse requests for data that is spatially close together (e.g. in the same DB)

Very useful (1) under high load (2) when the cache is limited
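
Collapsing identical requests can be sketched as below. This is a simplified, single-batch illustration; `collapse_and_fetch` and `slow_fetch` are hypothetical names, and a real proxy would collapse requests that are in flight concurrently.

```python
def collapse_and_fetch(requested_keys, fetch):
    """Answer a batch of requests while fetching each distinct key once."""
    fetched = {}
    responses = []
    for key in requested_keys:
        if key not in fetched:
            fetched[key] = fetch(key)  # only the first request hits the backend
        responses.append(fetched[key])
    return responses


backend_calls = []


def slow_fetch(key):
    # Stand-in for an expensive backend or DB read.
    backend_calls.append(key)
    return key.upper()


responses = collapse_and_fetch(["a", "b", "a", "a"], slow_fetch)
print(responses)
print(backend_calls)
```

Four client requests produce only two backend reads here, which is where the benefit under high load comes from.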


With a queue, the client does not have to sit and wait for a particular server's response.


  1. asynchronously run tasks
  2. allows a more flexible retry mechanism and fault tolerance.

Queue limits: the size of the data and the number of outstanding requests
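
A small sketch of the points above, using Python's standard `queue` and `threading` modules: the producer enqueues work and moves on, a worker runs tasks asynchronously, and failed tasks are re-queued. The retry limit, the `maxsize` bound, and `flaky_double` are assumptions made for the example.

```python
import queue
import threading


def worker(tasks, results, max_attempts=3):
    """Run queued tasks; re-queue a failed task until max_attempts is hit."""
    while True:
        item = tasks.get()
        if item is None:  # sentinel: stop the worker
            tasks.task_done()
            break
        func, arg, attempt = item
        try:
            results.append(func(arg))
        except Exception:
            if attempt + 1 < max_attempts:
                tasks.put((func, arg, attempt + 1))  # retry later
            else:
                results.append(("failed", arg))
        finally:
            tasks.task_done()


attempts_seen = {}


def flaky_double(n):
    # Hypothetical task: the first attempt for n == 2 fails, the retry succeeds.
    attempts_seen[n] = attempts_seen.get(n, 0) + 1
    if n == 2 and attempts_seen[n] == 1:
        raise RuntimeError("transient failure")
    return n * 2


tasks = queue.Queue(maxsize=100)  # bounds the number of outstanding requests
results = []
thread = threading.Thread(target=worker, args=(tasks, results))
thread.start()
for n in (1, 2, 3):
    tasks.put((flaky_double, n, 0))  # returns without waiting for the result
tasks.join()     # wait until every task, including retries, is done
tasks.put(None)
thread.join()
print(sorted(results))
```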

Redundancy and Replication

Backup important data or service.

  1. Failover
  2. shared-nothing architecture


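SQL and NoSQL databases differ in (1) the way they were built (2) the kind of info they store (3) how they store it

SQL: structured, pre-defined schema (example: a phone book with name, phone, address)
NoSQL: unstructured, distributed, dynamic schema (example: a folder per person holding everything about them, such as address, phone, number of FB likes)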

NoSQL types:

  1. Key-Value Stores:
    Stored in key-value pairs.
    e.g. Redis, Dynamo
  2. Document Databases:
    Stored in documents. Documents are grouped by collections. Each document can have a completely different data structure.
    e.g. MongoDB
  3. Wide-Column Databases:
    Column families --> containers of rows. Columns need not be known up front, and rows need not share the same columns.
    Commonly used for large datasets.
    e.g. Cassandra, HBase.
  4. Graph Databases:

High level differences between SQL and NoSQL

  • Storage:
    SQL: row --> entity
    NoSQL: different models
  • Schema
    SQL: fixed schema <-- whole database
    NoSQL: dynamic
  • Querying
    SQL: SQL -> data
    NoSQL: focus on collections, different DB --> different syntax
  • Scalability
    SQL: vertically scalable
    NoSQL: horizontally scalable
  • Reliability or ACID Compliancy
    SQL: the vast majority are ACID compliant
    NoSQL: most sacrifice ACID compliance for performance and scalability

SQL VS. NoSQL - Which one to use?

  • When SQL:
    (1) ensure ACID compliance
    (2) data is structured and unchanging
  • When NoSQL:
    (1) data with no or little structure.
    (2) Used in Cloud Computing. Scalable.
    (3) Rapid development.

CAP Theorem



Consistent Hashing

If you can't remember the details, refer to the wiki or
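
As a memory aid, here is a minimal sketch of a consistent hash ring. It assumes MD5 for placing points on the ring and 100 virtual nodes per server (both arbitrary choices); the class and node names are hypothetical.

```python
import bisect
import hashlib


def _hash(value):
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class ConsistentHashRing:
    """Keys and nodes share one hash ring; a key belongs to the next node
    clockwise from its own position."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per server (assumed value)
        self._ring = []           # sorted list of (hash, node) points
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def remove_node(self, node):
        self._ring = [point for point in self._ring if point[1] != node]

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First point at or after the key's hash; wrap around at the end.
        index = bisect.bisect_right(self._ring, (_hash(key), ""))
        return self._ring[index % len(self._ring)][1]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"key-{i}" for i in range(1000)]
before = {key: ring.get_node(key) for key in keys}
ring.remove_node("cache-c")
after = {key: ring.get_node(key) for key in keys}
moved = [key for key in keys if before[key] != after[key]]
print(len(moved), "of", len(keys), "keys moved")
```

Only keys that lived on the removed node move; everything else keeps its node, which is the property that helps with rebalancing shards and cache nodes.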

Long-Polling vs WebSockets vs Server-Sent Events

Ajax Polling
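client sends a request --> server receives it, processes it, and responds (it must respond whether or not there is anything to send, so empty responses occur)
cons: too many requests
Long polling
client sends a request but does NOT expect an immediate reply --> server holds the request and responds whenever there is something to send --> client receives the response and immediately sends another request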
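
The difference between the two polling styles can be simulated in-process, without any networking. In this sketch a `queue.Queue` stands in for the server's pending events; the function names and the timeout on the long poll are assumptions for the example.

```python
import queue
import threading


def ajax_poll(events):
    """Regular polling: answer immediately, even with nothing to send."""
    try:
        return events.get_nowait()
    except queue.Empty:
        return None  # the "empty response"


def long_poll(events, timeout):
    """Long polling: hold the request until an event arrives or time runs out."""
    try:
        return events.get(timeout=timeout)
    except queue.Empty:
        return None  # timed out; the client would simply ask again


events = queue.Queue()
empty = ajax_poll(events)  # nothing pending yet

# Simulate the server producing an event 0.1 s from now.
threading.Timer(0.1, events.put, args=("new message",)).start()
message = long_poll(events, timeout=2.0)  # blocks until the event shows up
print(empty, message)
```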

WebSockets
persistent connection between client and server

Server-Sent Events

