无论是在大厂还是初创公司,技术产品经理 (TPM)都需要具备系统设计的基础知识。从历史上看,系统设计基础知识通常是软件工程师在面试时的要求,而 TPM 不受此期望的约束。然而,现在趋势正在发生变化。作为 TPM,您需要在面试和领导产品团队的工作中,对系统设计有深入的了解。
Technical product managers (TPMs) at big tech companies and startups are required to have a fundamental knowledge of System Design. Historically, System Design fundamentals were usually a requirement for software engineers during interviews, and TPMs were exempt from that expectation. However, the trend is now changing. As a TPM, you need to have a solid understanding of System Design in interviews and on the job as you lead a product team.
在本文中,我们将讨论5 个最常见的系统设计基础知识,您必须了解这些基础知识才能在技术产品管理中取得成功,同时领导工程团队并推出出色的产品。
In this article, we’ll discuss 5 of the most common fundamentals of System Design that you must know to succeed in your role in technical product management while leading an engineering team and rolling out great products.
我们将涵盖:
什么是系统设计,作为 TPM 为什么要关心?
1.负载均衡
2.键值存储
3.限速器
4. 内容分发网络 (CDN)
5.数据库
从这里开始系统设计
We’ll cover:
What is System Design, and why should you care as a TPM?
1. Load balancing
2. Key-value stores
3. Rate limiters
4. Content delivery networks (CDNs)
5. Databases
Where to go from here on System Design
系统设计是构建系统以满足所有功能和非功能需求的过程,包括 API、用例和集成。即使您作为 TPM 不直接负责此体系结构的复杂细节,您也应该了解全局,了解不同的系统组件如何支持您的组织的目标,并满足您的产品的要求。
TPM 应该具备系统设计的基础知识才能做好他们的工作。
Educative 的 CEO Fahim ul Haq 在 FAANG 公司从事分布式系统工作八年。在广泛采访了 TPM 之后,他同意了。
“TPM 应该了解可伸缩系统如何工作以及分布式系统的不同部分如何在抽象层面上相互交互以正确指导开发的所有基本概念。他们需要了解他们正在构建的系统的核心概念和构建块。” — Fahim ul Haq(前系统工程师,Educative 首席执行官)
TPM 需要了解系统设计基础知识,以便为其产品做出明智的设计决策。例如,如果你要设计 Facebook 的照片存储系统,你需要为每张图片分配一个特定的 ID,并有一个系统来唯一标识每张上传的照片。如果您了解系统设计,那么您就会知道您需要一个序列生成器来实现此功能。
因此,作为有效的 TPM,您的目标应该是构建敏捷、可扩展、可靠、可维护且健壮的系统,以在任何给定时间满足用户需求。
接下来,我们将介绍系统设计的五个基本概念,它们是您作为 TPM 在工作中取得成功所绝对必要的。
System Design is the process of architecting a system so that all functional and non-functional requirements are met, including APIs, use cases, and integrations. Even if you’re not directly responsible for the intricate details of this architecture as a TPM, you should understand the big-picture, how different system components support the goals of your organization, and meet the requirements of your products.
TPMs should have a fundamental knowledge of System Design to do their jobs well.
Fahim ul Haq, the CEO of Educative, has worked on distributed systems at FAANG companies for eight years. Having interviewed TPMs extensively, he agrees.
“A TPM should understand all the basic concepts of how scalable systems work and how different parts of a distributed system interact with each other at an abstract level to properly guide development. They need to know the core concepts and building blocks of the systems they are building.” — Fahim ul Haq (former systems engineer, CEO of Educative)
A TPM needs to know System Design fundamentals to make informed design decisions for their products. For instance, if you were to design Facebook’s photo storage system, you’d need to assign a specific ID to each image and have a system in place to uniquely identify every photo uploaded. If you know System Design, then you’d know that you’ll need a sequence generator to achieve this function.
Therefore, as an effective TPM, you should aim to architect agile, scalable, reliable, maintainable, and robust systems that meet user requirements at any given time.
Next, we’ll cover five fundamental concepts of System Design that are an absolute necessity for you, a TPM, to succeed in your job.
负载平衡是系统设计生命周期的一个组成部分,指的是在不同的计算服务器之间重新分配任务以提高系统性能和可靠性。
每秒有数百万个请求,负载均衡器将在可用资源之间平均分配任务,以确保流量顺畅流动。
提高效率:负载均衡器在不同服务器之间平均分配负载流量,从而提高效率并同时降低成本。
服务器的可用性:如果一台或多台服务器发生故障,负载均衡器将绕过它们并通过在正常运行的服务器之间分配流量来确保系统保持可用。
可扩展性:添加更多服务器可确保通过负载平衡同时增加应用程序容量。
作为 TPM,您将经常面临需要扩展服务器以满足用户需求的情况,或者出现流量激增和故障的情况。在这种情况下,负载均衡器就会派上用场。
此外,您必须具有决策能力,可以根据定价、涉众承诺和其他变量为您的开发团队选择合适的负载均衡器算法。
负载平衡器将帮助您的系统提高可伸缩性、性能、可用性,并将减少冗余。通过确保可以同时更改服务器容量,故障服务器优先于正常工作的服务器被绕过,并且服务器负载均匀分布。
Load balancing is an integral part of the System Design lifecycle and refers to redistributing tasks across different computing servers to enhance system performance and reliability.
With millions of requests per second, load balancers will evenly spread tasks between available resources to ensure traffic flows smoothly.
Improving efficiency: Load balancers distribute load traffic evenly among different servers, therefore improving efficiency and cutting down costs simultaneously.
Availability of servers: If one or more servers break down, load balancers will bypass them and ensure the system remains available by distributing traffic among properly functioning servers.
Scalability: Adding more servers ensures application capacity is increased concurrently via load balancing.
As a TPM, you will constantly be faced with situations where your servers need to be scaled up to meet user demand, or where there is a traffic surge and failure. In this scenario, a load balancer will come in handy.
Additionally, you must have the decision-making capacity to select a suitable load balancer algorithm for your development team depending on pricing, stakeholder commitment, and other variables.
A load balancer will help your system improve scalability, performance, availability, and will reduce redundancy. By ensuring server capacity can be altered simultaneously, failed servers are bypassed in preference for working ones and server load is distributed evenly.
键值存储是一种软件存储系统,它建立在关联数组数据模型(例如哈希表或字典)的基础上,为集合中的每个键分配唯一值。值可以是唯一 ID、blob 或服务器名称中的任何内容。
在分布式环境中使用传统存储系统进行扩展,同时仍保持强大且一致的可用性可能具有挑战性。几家顶级科技公司,包括 Facebook、Netflix 和亚马逊,比传统的在线事务处理 (OLTP) 数据库更依赖主键访问数据存储。根据定义,OLTP 是通过 Internet 快速实时执行大量数据库事务。
可扩展性:它们可以持续处理越来越多的数据,而不会显着降低性能。
速度:简单的检索和使用命令,如get
,put
并delete
确保效率。
灵活性:由于结合了可扩展性和速度键值存储提供的功能,扩展任何大型业务模型都变得更加容易。
在将系统设计为 TPM 时,您应该考虑何时何地需要使用键值存储,以及为什么它可能是当时的最佳选择。该模型有利于存储客户个性化数据,因为它具有可扩展性、速度和灵活性。
例如,您可以提高系统的处理性能,因为您将在多台具有更多内存的计算机上处理数据集,并提高容错能力。LinkedIn、Amazon 和 MongoDB 等公司在过去几年中使用键值存储来显着扩展。
A key-value store is a software storage system that builds on an associative array data model such as a hash table or dictionary to assign every key with a unique value in a collection. Values can be anything from unique ids, blobs, or server names.
It can be challenging to scale with traditional storage systems in distributed environments while still maintaining strong and consistent availability. Several top tech companies, including Facebook, Netflix, and Amazon, rely more on primary-key access data stores than traditional online transaction processing (OLTP) databases. By definition, OLTP is the rapid real-time execution of huge database transactions over the internet.
Scalability: They can continuously process increasingly large amounts of data without a significant drop in performance.
Speed: Simple retrieval and usage commands like get
, put
, and delete
ensure efficiency.
Flexibility: Scaling any large business model is easier because of the combined scalability and speed key-value stores offer.
When designing a system as a TPM, you should consider when and where you need to use a key-value store and why it may be the best choice at that given time. This model is beneficial for storing customer personalization data because of the scalability, speed, and flexibility that come with it.
For instance, you can increase processing performance on your systems because you will be working with datasets on multiple computers with more memory and also increase fault tolerance. Companies like LinkedIn, Amazon, and MongoDB have used key-value stores to scale significantly over the last couple of years.
速率限制器确保服务仅响应设定数量的请求。超出预定义限制的任何内容都会受到限制。例如,如果服务的 API 配置为每分钟仅处理 200 个请求,则超过该请求的任何请求都将被阻止。
成本效率:它们有助于控制运营成本,例如,通过防止运营实验超过服务器请求的设定配额。
避免资源剥夺:通过速率限制可以防止由于软件配置错误而发生的多种拒绝服务 (DoS) 攻击。
分配数据流:与负载均衡器一样,速率限制器确保系统不会因大量数据而负担过重,并在需要时帮助在不同服务器之间平均分配负载。
在您作为 TPM 的角色中,您希望确保您的服务器以最佳方式运行并且数据库不会因性能低下而受到损害。这是可以应用适当的速率限制算法的地方。
像 Lyft 这样的公司利用速率限制器来有效地运行他们的流程。
A rate limiter ensures that a service responds only to a set number of requests. Anything beyond the predefined limits is throttled. For example, if an API for a service has been configured to handle only 200 requests per minute, any requests over that will be blocked.
Cost efficiency: They help control operational costs, for instance, by preventing operational experiments from exceeding the set quota of server requests.
Averting resource deprivation: Several denial of service (DoS) attacks that happen due to software configuration errors are prevented with rate limiting.
Distributing data flow: Like load balancers, rate limiters ensure that systems are not overburdened with a large amount of data and help evenly spread the load among different servers when required.
In your role as a TPM, you want to ensure that your servers are running optimally and databases are not being compromised by slow performance. This is where an appropriate rate limiting algorithm can be applied.
Companies like Lyft make use of rate limiters to run their processes efficiently.
内容分发网络是地理分布的服务器,它们协同工作以确保通过 Internet 快速高效地分发内容。CDN 使用缓存作为一种机制来加速内容在 Web 上的传输。
CDN 服务的内容可以有多种类型,包括网站数据、社交媒体内容、可下载媒体等。
一些组织使用 CDN 来加速通过 Internet 传送内容。例如,银行可能会使用 CDN 来安全地传输敏感数据。
提高效率:CDN 可以缩短网页加载时间,同时降低跳出率。这使用户留在页面上并防止他们放弃它。
增强安全性:通过缓解分布式拒绝服务 (DDoS) 攻击,CDN 在增强安全性方面发挥着巨大作用。
降低带宽成本:由于 CDN 主要依赖缓存和其他优化,它们可以显着降低服务器带宽,从而降低网站管理员和所有者的托管成本。
如果您的组织内容繁多,作为 TPM,您可能会发现在某些情况下使用 CDN 很有帮助。您将能够减少数据加载时间和延迟、减少冗余、提高安全性并减少带宽费用,从而为组织节省时间和成本。
Content delivery networks are geographically distributed servers that work together to ensure quick and efficient content delivery over the internet. CDNs use caching as a mechanism to speed up the delivery of content across the web.
Content serviced by CDNs can be of several types, including website data, social media content, downloadable media, and so on.
Several organizations use CDNs to accelerate the delivery of content via the internet. A bank, for instance, might use a CDN to transfer sensitive data securely.
Improving efficiency: CDNs enhance web page load times while simultaneously cutting down bounce rates. This keeps a user on the page and prevents them from abandoning it.
Enhancing security: By mitigating distributed denial-of-service (DDoS) attacks, CDNs play a massive role in boosting security.
Cutting down on bandwidth costs: Because CDNs primarily rely on caching and other optimizations, they can significantly reduce server bandwidth, keeping hosting costs down for website administrators and owners.
If your organization is content-heavy, you, as a TPM, may find a CDN helpful to employ in some instances. You will be able to reduce data load times and latency, reduce redundancy, boost security and reduce bandwidth expenses hence saving time and costs for the organization.
传统的文件系统有很多缺点,因此数据库通常是首选。数据库是以易于访问、维护、管理和结构化的方式组织的数据集合,以便可以有效地更新和处理。
数据库有两种主要类型:
关系数据库是在多个表、列和记录中组织的数据集的集合。关系数据库通过数据库表相互通信。结构化查询语言 (SQL) 用于使用 、 、 和 等命令从这些数据库中操作insert
和delete
检索update
信息retrieve
。
非关系数据库(NoSQL) 通常以与关系数据库不同的格式存储非结构化数据。NoSQL 数据库有多种类型,包括图形、键值、文档和宽列。
数据一致性:数据库将确保消除数据冗余,所做的更改会立即反映在数据库中,因此不会出现数据不一致的情况。
数据完整性:通过确保向所有用户提供正确和准确的信息,可以维护数据完整性。
数据安全:包括密码和用户身份验证在内的多种安全功能有助于维护数据库中数据的安全性。
在这个数字时代,每个组织都使用数据库来扩展业务并改进工作流程和效率。你作为 TPM 的角色可能经常需要你也戴上数据产品经理的帽子。在这里,您可能需要监督数据在组织内的分布和使用方式的整个生命周期。在这里,强大的数据科学背景和使用数据库将帮助您大放异彩。
根据您选择的类型,数据库有几个优势,包括数据完整性、数据一致性、数据安全性、数据持久性和易于访问等。
Traditional file systems come with many disadvantages, so databases are often preferred. A database is a collection of data organized in a way that is easily accessible, maintainable, manageable, and structured so that it can be updated and processed efficiently.
Databases come in two main types:
Relational databases are collections of datasets organized in multiple tables, columns, and records. Relational databases communicate with each other via database tables. Structured query language (SQL) is used to manipulate and retrieve information from these databases with commands like insert
, delete
, update
, and retrieve
.
Non-relational databases (NoSQL) typically store unstructured data in a different format from relational databases. NoSQL databases have several types, including graph, key-value, document, and wide-column.
Data consistency: Databases will ensure that data redundancy is eliminated and changes made are reflected in the database immediately, hence no inconsistency in data.
Data integrity: By ensuring all users are presented with correct and accurate information, data integrity is maintained.
Data security: Several security features including password and user authentication help maintain the security of data in databases.
Every organization works with databases in this digital era to scale their business and improve workflows and efficiency. Your role as a TPM may often require you to wear the hat of a data product manager too. Here, you may be required to oversee the entire lifecycle of how data is distributed and used within an organization. This is where a strong background in data science and working with databases will help you shine.
Databases have several advantages depending on the type you choose, including data integrity, data consistency, data security, data persistence, and ease of access, among others.
NoSQL refers to "not only SQL" or non-relational database which does not store the data in the form of a table. Depending on the model, NoSQL has a variety of database types to store the data. The main types are document database,key-value pair, wide column, and graph database.
NoSQL database is used in the real-time web application.NoSQL database can easily be scaled with a large amount of data and high user loads. Horizontal scaling to clusters of machines can easily be done.
Different types of NoSQL database:
Document Database - store the data in JSON like format. Different types of data can be stored (strings, numbers, booleans, arrays, or objects).Ex(MongoDB is the example of the document database.)
Key-value Databases - items are stored in the key-value pair. This key is used as a unique identifier. It is simple to retrieve the information from the key-value pair does not require the complex queries. Example -:Redis and DynanoDB are popular key-value databases.
Wide-column Stores - store data in tables ,rows and dynamic columns . It is more flexible than a relational database table because it is not necessary for each row the same set of columns. Wide-column stores are used for storing the IOT data and user profile data. Examples:- Cassandra and HBase fall under this category of databases.
Graph Databases - We know, the graph is made up of two elements node and relationship. The node represents the entity and the edges represent the relationship. These types of databases are generally used for social networks.
Example: - Neo4j and Janusgraph fall under this category.
您学习系统设计的基础知识是否有趣?无论您是刚刚考虑开始技术产品管理的职业生涯还是已经在该领域,我们希望本文能帮助您在构建可扩展的软件产品时驾驭系统设计的复杂性。
但我们只是触及了这个话题的表面。本文未讨论但作为 TPM 绝对需要学习和掌握的更多系统设计基础知识包括:
域名系统 (DNS)
序列器
分布式缓存
发布-订阅系统
分片计数器
分布式消息队列
分布式任务调度
分布式日志记录
快乐学习!
Did you have fun learning about the fundamentals of System Design? Whether you are just thinking about starting a career in technical product management or are already in the field, we hope this article helped you navigate the complexities of System Design as you build scalable software products.
But we have just scratched the surface of this topic. More System Design fundamentals not discussed in this article but absolutely necessary for you as a TPM to learn and master include:
Domain Name Systems (DNS)
Sequencers
Distributed Caching
Publish-Subscribe Systems
Sharded Counters
Distributed Messaging Queues
Distributed Task Scheduling
Distributed Logging
Happy learning!
How to prepare for the System Design Interview in 2022
System Design fundamentals: What is the CAP theorem?
How machine learning gives you an edge in System Design
TPM 还需要了解哪些其他系统设计基础知识?本文是否有帮助?在下面的评论中让我们知道!
What other System Design fundamentals do TPMs need to know? Was this article helpful? Let us know in the comments below!
附:架构师知识图谱
【更多阅读】
《人月神话》(The Mythical Man-Month)5画蛇添足(The Second-System Effect)
《人月神话》(The Mythical Man-Month)4概念一致性:专制、民主和系统设计(System Design)
《人月神话》(The Mythical Man-Month)3 外科手术队伍(The Surgical Team)
《人月神话(The Mythical Man-Month)》2人和月可以互换吗?人月神话存在吗?
《人月神话(The Mythical Man-Month)》1 看清问题的本质:如果我们想解决问题,就必须试图先去理解它
悼念《人月神话》作者 Fred Brooks
【图文详解】一文全面彻底搞懂HBase、LevelDB、RocksDB等NoSQL背后的存储原理:LSM-tree日志结构合并树
在平时的工作中如何体现你的技术深度?
Redis 作者 Antirez 讲如何实现分布式锁?Redis 实现分布式锁天然的缺陷分析&Redis分布式锁的正确使用姿势!
程序员职业生涯系列:关于技术能力的思考与总结
十年技术进阶路:让我明白了三件要事。关于如何做好技术 Team Leader?如何提升管理业务技术水平?(10000字长文)
当你工作几年就会明白,以下几个任何一个都可以超过90%程序员
编程语言:类型系统的本质
软件架构设计的核心:抽象与模型、“战略编程”
【图文详解】深入理解 Hbase 架构 Deep Into HBase Architecture
HBase 架构详解及读写流程原理剖析
HDFS 底层交互原理,看这篇就够了!
MySQL 体系架构简介
一文看懂MySQL的异步复制、全同步复制与半同步复制
【史上最全】MySQL各种锁详解:一文搞懂MySQL的各种锁