fb interview

1. new feed

http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture

http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed

facebook graph的建立,affinity, owner的activity表达式

再是如何aggregate the activities,如何classify activities,score activities,remove duplicate,flatten, sort by score, trim off, stuff into memcache, 最后那个rollup不懂

 

2. chat

https://www.facebook.com/note.php?note_id=14218138919

http://www.cnblogs.com/piaoger/archive/2012/08/19/2646530.html

http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf

http://essay.utwente.nl/59204/1/scriptie_J_Schipers.pdf

关键是那个图

原理如下,基本也是用到memcache

fb interview

fb interview

3. memcache

https://www.youtube.com/watch?v=UH7wkvcf0ys

https://www.adayinthelifeof.nl/2011/02/06/memcache-internals/

memcache(server) - slab(LRU) - page(1MB) - chunk

先定义,再running time, LRU algorithm, memory allocation, consistent hashing

 

4. distributed web system

http://www.aosabook.org/en/distsys.html

 

5. typehead search

https://www.facebook.com/notes/facebook-engineering/under-the-hood-building-out-the-infrastructure-for-graph-search/10151347573598920

https://www.facebook.com/note.php?note_id=389105248919

https://www.facebook.com/notes/facebook-engineering/under-the-hood-indexing-and-ranking-in-graph-search/10151361720763920

 

    • The NLP module parses the query, and identifies portions to search in Unicorn. Steps 2 through 9 are then performed simultaneously for each of these search requests.

    • The Top Aggregator receives the request and fans it out for each vertical. Steps 3 through 8 are performed simultaneously for each vertical.

    • The Top Aggregator rewrites the query for the vertical. Each vertical has different query rewriting requirements. Rewritten queries are typically augmented with additional searcher context.

    • The Top Aggregator sends the rewritten query to the vertical – first to the Vertical Aggregator, which passes it on to each of the Index Servers.

    • Each Index Server retrieves a specified number of entities from the index.

    • Each of these retrieved entities is scored and the top results are returned from the Index Server to the Vertical Aggregator.

    • The Vertical Aggregator combines the results from all Index Servers and sends them back to the Top Aggregator.

    • The Top Aggregator performs result set scoring on the returned results separately for each vertical.

    • The Top Aggregator runs the blending algorithm to combine the results from each vertical and returns this combined result set to the NLP module.

    • Once the results for all the search requests are back at the NLP module, it constructs all possible parse trees with this information, assigns a score to each parse tree and shows the top parse trees as suggestions to the searcher.

 6. message system:

https://www.youtube.com/watch?v=UaGINWPK068

http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase/

architecture(cell, haystack), 其实就是HBASE,讲下distributed system, three dimensional array of cells: Rowkey, ColumnKey, Tiemstamp/version的建立,举例inbox search:

key: Rowkey: userid, columnid: word, version: messageID

value: auxillary info(like offset of word in message).

why Hbase? horizontal scalability, automatic failover, map reduce for large scale data processing

migrate data from mysql to Hbase

 

总结下system design的套路:

1. 先证明system的操作独立性以及大数据特点,这样就可以用distributed system了。

2. 提下distributed system需要考虑的6个地方:availability(replica, recovery system), performance, reliability(race condition), scalability(vertically & horizontally, partition), manageability, cost.

3. Access: Memcache

作用,running time(O1), LRU, 结构, consistent hashing

4. Access: Proxies.

两种方式

5. Access: Index

原理,layered index, reverse index

6. Access: Load Balancing

多次load balancing, user-session-specific data

7. Write: Queue.

asynchronous

你可能感兴趣的:(interview)