1. new feed
http://www.slideshare.net/danmckinley/etsy-activity-feeds-architecture
http://www.quora.com/What-are-best-practices-for-building-something-like-a-News-Feed
facebook graph的建立,affinity, owner的activity表达式
再是如何aggregate the activities,如何classify activities,score activities,remove duplicate,flatten, sort by score, trim off, stuff into memcache, 最后那个rollup不懂
2. chat
https://www.facebook.com/note.php?note_id=14218138919
http://www.cnblogs.com/piaoger/archive/2012/08/19/2646530.html
http://www.erlang-factory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf
http://essay.utwente.nl/59204/1/scriptie_J_Schipers.pdf
关键是那个图
原理如下,基本也是用到memcache
3. memcache
https://www.youtube.com/watch?v=UH7wkvcf0ys
https://www.adayinthelifeof.nl/2011/02/06/memcache-internals/
memcache(server) - slab(LRU) - page(1MB) - chunk
先定义,再running time, LRU algorithm, memory allocation, consistent hashing
4. distributed web system
http://www.aosabook.org/en/distsys.html
5. typehead search
https://www.facebook.com/note.php?note_id=389105248919
The NLP module parses the query, and identifies portions to search in Unicorn. Steps 2 through 9 are then performed simultaneously for each of these search requests.
The Top Aggregator receives the request and fans it out for each vertical. Steps 3 through 8 are performed simultaneously for each vertical.
The Top Aggregator rewrites the query for the vertical. Each vertical has different query rewriting requirements. Rewritten queries are typically augmented with additional searcher context.
The Top Aggregator sends the rewritten query to the vertical – first to the Vertical Aggregator, which passes it on to each of the Index Servers.
Each Index Server retrieves a specified number of entities from the index.
Each of these retrieved entities is scored and the top results are returned from the Index Server to the Vertical Aggregator.
The Vertical Aggregator combines the results from all Index Servers and sends them back to the Top Aggregator.
The Top Aggregator performs result set scoring on the returned results separately for each vertical.
The Top Aggregator runs the blending algorithm to combine the results from each vertical and returns this combined result set to the NLP module.
Once the results for all the search requests are back at the NLP module, it constructs all possible parse trees with this information, assigns a score to each parse tree and shows the top parse trees as suggestions to the searcher.
6. message system:
https://www.youtube.com/watch?v=UaGINWPK068
http://www.slideshare.net/brizzzdotcom/facebook-messages-hbase/
architecture(cell, haystack), 其实就是HBASE,讲下distributed system, three dimensional array of cells: Rowkey, ColumnKey, Tiemstamp/version的建立,举例inbox search:
key: Rowkey: userid, columnid: word, version: messageID
value: auxillary info(like offset of word in message).
why Hbase? horizontal scalability, automatic failover, map reduce for large scale data processing
migrate data from mysql to Hbase
总结下system design的套路:
1. 先证明system的操作独立性以及大数据特点,这样就可以用distributed system了。
2. 提下distributed system需要考虑的6个地方:availability(replica, recovery system), performance, reliability(race condition), scalability(vertically & horizontally, partition), manageability, cost.
3. Access: Memcache
作用,running time(O1), LRU, 结构, consistent hashing
4. Access: Proxies.
两种方式
5. Access: Index
原理,layered index, reverse index
6. Access: Load Balancing
多次load balancing, user-session-specific data
7. Write: Queue.
asynchronous