SDE面试技巧之二:System Design

在面试的时候,有时会碰到这样一类的问题:

How will you design a search engine? Or How do you support google like instant responses

How will you design a system like eBay?

这样大的系统,可不是一两句话能说清楚的,面试官想知道什么呢?

通过这类问题,面试官可以了解两个问题:1. 如果应聘者对这个领域不熟悉,那他怎么想办法解决它? 2. 如果应聘者对领域有了解,但没经验,面试者对里面的技术、难点等有多少了解。这两个方面,应该来说,第一个更重要一些,第二个方面依赖于平时积累,不是技巧,而且一般也不决定面试成功与否。下面我们重点看怎么解决第一个问题。下面这段摘自《Cracking the Coding Interview (4th Edition)》:

General Approach

The general approach is as follows: Imagine we’re designing a hypothetical system X for millions of items (users, files, megabytes, etc):

  1. How would you solve it for a small number of items? Develop an algorithm for this case, which is often pretty straight-forward.
  2. What happens when you try to implement that algorithm with millions of items? It’s likely that you have run out of space on the computer. So, divide up the files across many computers.
    • How do you divide up data across many machines? That is, do the first 100 items appear on the same computer? Or all items with the same hash value mod 100?
    • About how many computers will you need? To estimate this, ask how big each item is and take a guess at how much space a typical computer has.
  3. Now, fix the problems that occur when you are using many computers. Make sure to answer the following questions:
    • How does one machine know which machine it should access to look up data?
    • Can data get out of sync across computers? How do you handle that?
    • How can you minimize expensive reads across computers?

这段话告诉我们,先从简单的问题出发,找到一个办法能解决简单问题,然后把问题逐步变复杂,在变复杂的过程中,会产生新的问题,再解决新问题。当然变复杂也有往哪个方向变复杂,复杂到什么程度,在复杂问题中是否有些共性以简化问题。然后还需要解答在你的新方案中需要解决哪些其他核心问题。

但其实面试官期待的不仅仅是你能找到一个办法把问题解决了,而是希望你知道最好的办法!!这个词太重要了,最好的办法,可是什么样的办法是最好的办法呢?好不好比较一下就知道了。面对一个问题,找出一个解决方法只是第一步【如果题目比较难,面试官也许就之期望面试者能做出第一步,或者一种方案+优化】,而找到更好的办法才是面试官想看到的。下面是一个例子:

If you were integrating a feed of end of day stock price information (open, high, low, and closing price) for 5,000 companies, how would you do it? You are responsible for the development, rollout and ongoing monitoring and maintenance of the feed. Describe the different methods you considered and why you would recommend your approach. The feed is delivered once per trading day in a comma-separated format via an FTP site. The feed will be used by 1000 daily users in a web application.

初看到这个问题,我们先得搞明白题意啊,这是很重要的!!题目说:每天的数据通过FTP发布出来,我们要把股票收盘后价格信息进行统一管理。如果管理信息呢?其实最主要的就是要确定存储数据模型。【如果从题目里看不到这一点,可以问】。数据存储有哪些办法呢,分别有什么优缺点呢?

Let’s assume we have some scripts which are scheduled to get the data via FTP at the end of the day. Where do we store the data? How do we store the data in such a way that we can do various analyses of it?
Proposal #1
Keep the data in text files. This would be very difficult to manage and update, as well as very hard to query. Keeping unorganized text files would lead to a very inefficient data model.
Proposal #2
We could use a database. This provides the following benefits:
»»Logical storage of data.
»»Facilitates an easy way of doing query processing over the data.
Example: return all stocks having open > N AND closing price < M
Advantages:
»»Makes the maintenance easy once installed properly.
»»Roll back, backing up data, and security could be provided using standard database features. We don’t have to “reinvent the wheel.”
Proposal #3
If requirements are not that broad and we just want to do a simple analysis and distribute the data, then XML could be another good option.
Our data has fixed format and fixed size: company_name, open, high, low, closing price. The XML could look like this:

< root >
  < date value ="2008-10-12" >
    < company name =“foo“>
     
<open >126.23 </ open >
      < high >130.27 </ high >
      < low >122.83 </ low >
    </ company >
    < company name ="bar" >
      < open >52.73 </ open >
      < high >60.27 </ high >
      < low >50.29 </ low >
      < closingPrice >54.91 </ closingPrice >
    </ company >
  </ date >
  < date value ="2008-10-11" > . . . </ date >
</ root >

Benefits:
»»Very easy to distribute. This is one reason that XML is a standard data model to share /distribute data.
»»Efficient parsers are available to parse the data and extract out only desired data.
»»We can add new data to the XML file by carefully appending data. We would not have to re-query the database.
However, querying the data could be difficult.

这个解答里面提到了3种办法,并说明了其优缺点。 如果根据优缺点和程序的实际需要得到一个最优方案,那就是Problem-Solving类问题的最好答案了。

 

Reference:

  1. Cracking the Coding Interview (4th Edition)
  2. http://www.palantir.com/2011/10/how-to-rock-a-systems-design-interview/

你可能感兴趣的:(System)