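1. Sinfonia: A New Paradigm for Building Scalable Distributed Systems. This paper won Best Paper at SOSP 2007 and lays out a paradigm for building scalable distributed systems; I personally find it very useful. Taobao drew heavily on this paper when building TFS, OceanBase, and Tair.
2. The Chubby lock service for loosely-coupled distributed systems, http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/zh-CN//archive/chubby-osdi06.pdf. This paper describes in detail Chubby, Google's distributed lock mechanism. Chubby is a distributed lock implemented on top of files. Google's Bigtable, MapReduce, and Spanner services are all built on it, so Chubby is effectively the foundation of Google's distributed transactions and is well worth studying. The well-known ZooKeeper is an open-source implementation modeled on Chubby, but according to a friend who works at Google, ZooKeeper still lags behind Chubby in both performance and features.
3. Spanner: Google's Globally-Distributed Database. Also from Google, this is the first distributed database at truly global scale. The paper covers many design considerations around consistency, and to keep the logic simple the design even relies on atomic clocks. It is likewise a valuable reference for distributed systems work.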
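In addition, there is a book:
It has just come out. I read the sample chapter and liked it, so I am recommending it to everyone as well: 《大规模分布式存储系统:原理解析与架构实战》 (Large-Scale Distributed Storage Systems: Principles and Architecture in Practice), from 华章图书 (Huazhang Books).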
Prerequisites
- Unix shell basics: http://www.amazon.com/Unix-Progr...
- C: http://www.amazon.com/Programmin...
- OS basics: http://www.amazon.com/Andrew-S-T... , http://www.amazon.com/Linux-Kern...
- Unix Programming: http://www.kohala.com/start/
- Networking Basics: http://www.amazon.com/Computer-N... , more advanced: http://web.mit.edu/dimitrib/www/...
- Sockets: http://www.amazon.com/TCP-Socket... , http://www.amazon.com/Foundation... , and network programming: http://www.kohala.com/start/
- Transmission of information: http://www.amazon.com/Informatio... , http://www.inference.phy.cam.ac.... and network coding
- Intro to concurrency: http://www.amazon.com/Practical-...
- Java: Which are the frequently asked interview questions for Java Engineers?
- Data structures and algorithms: What are the most learner-friendly resources for learning about algorithms?
Nice to have: Scala, Erlang, Go
Courses
- 15-440: CMU Distributed Systems in Go
- ETH - Principles of Distributed Computing
- CS525: http://www.cs.uiuc.edu/class/sp1...
- 6.824: 6.824 Home Page: Spring 2013 (by rtm)
- 6.828: 6.828 / Fall 2012 / Schedule
- CS264: http://www.cs264.org/lectures/le...
- CS294: http://www.cs.berkeley.edu/~oded...
- CS 707: http://www.cs.gmu.edu/~setia/cs7...
- Advanced Computer Science Courses: http://the-paper-trail.org/blog/...
- CS696: http://www.eli.sdsu.edu/courses/...
- Google Code University: http://code.google.com/edu/paral...
- CS7960: http://www.cs.utah.edu/~jeffp/te...
- 6.829: http://ocw.mit.edu/courses/elect...
- Textbook: http://www.tacc.utexas.edu/~eijk...
Notes
- BitTorrent: http://en.wikipedia.org/wiki/Bit...
- A case for NOW (Networks of Workstations, 1995): http://www.ncic.ac.cn/~majie/Pap...
Other:
- Start with Lin, Data-Intensive Text Processing with MapReduce, ISBN 1608453421, http://www.umiacs.umd.edu/~jimmy...
- See Tom White, Hadoop: The Definitive Guide, ISBN 1449389732, http://www.amazon.com/Hadoop-Def... , Sean Owen et al., Mahout in Action, ISBN 978193518268, http://www.manning.com/owen/
- HDFS under the hood: http://assets.en.oreilly.com/1/e...
- ZooKeeper: http://highscalability.com/zooke...
- Download Hadoop (http://hadoop.apache.org/) and run some MapReduce jobs on your laptop in pseudo-distributed mode (see What's the best way to come up to speed on MapReduce, Hadoop, and Hive?)
- Learn about the Google technology stack (MapReduce, BigTable, Dremel, Pregel, GFS, Chubby, Protobuf, Snappy, Ganeti, Tenzing, Sawzall, BigQuery, F1, Spanner, Jingle, GCM, Google Talk, Cloud Analytics, etc.). (See What are the most interesting Google Research papers? Also http://www.columbia.edu/~ak2834/... , http://www.cs.rutgers.edu/~muthu... , http://the-paper-trail.org/blog/...)
- Set up an account with Amazon AWS/EC2/S3/EBS and experiment with running Hadoop on a cluster with large data sets (you can use Cloudera or YDN images, but in my opinion you can better understand the system if you set it up from scratch, using the original distribution). Watch the costs: http://www.networkworld.com/news...
- Try out Hadoop alternatives, specifically the minimalist frameworks such as BashReduce: http://github.com/erikfrey/bashr... and CloudMapReduce: http://code.google.com/p/cloudma... (see What are some promising open-source alternatives to Hadoop MapReduce for map/reduce?)
- See What are some good class projects for machine learning using MapReduce?
- Run Brian Cooper's Cloud Serving Benchmark on AWS, compare Hbase vs Cassandra performance on a small cluster (6-8 nodes): http://wiki.github.com/brianfran... also see Pete Warden's tests: http://petewarden.typepad.com/se... , Hbase book: http://hbase.apache.org/book.html
- Run LINPACK benchmark: http://www.datawrangling.com/on-...
- Run some experiments with MPI (http://www.mcs.anl.gov/research/...): try to implement a simple clustering algorithm (e.g. http://en.wikipedia.org/wiki/K-m...) with MPI vs Hadoop/MapReduce and compare the performance, fault tolerance, ease of use etc. Learn the differences between the two approaches, and when it makes sense to use each one.
- Check out Dongarra's papers: http://www.netlib.org/utk/people... , works by Gibbons: http://www.bell-labs.com/org/112... , Lamport: http://research.microsoft.com/en... , Blelloch: http://www.cs.cmu.edu/~guyb/pubs... , also see What are the seminal papers in distributed systems? Why?
- There is a new library called MPI-Mapreduce (http://www.sandia.gov/~sjplimp/m...); see how it works and how it compares to other MapReduce implementations
- Run some tests with Scalapack (http://www.netlib.org/scalapack/), try to port one of the routines to Hadoop, compare the performance and scalability. See how stability of numerical algorithms is evaluated: http://portal.acm.org/citation.c...
- Write your own simplified MapReduce runtime in C or any other programming language.
- Try xargs -P and GNU Parallel, also see What are some lesser known but useful Unix commands?
- Check out http://www.cascading.org/ , http://clojure.org/ , http://www.bloom-lang.net/features/
- Learn about distributed hash tables (http://en.wikipedia.org/wiki/Dis... , http://www.linuxjournal.com/arti...), run some experiments with Paxos (algorithm) http://the-paper-trail.org/blog/..., Kademlia: http://en.wikipedia.org/wiki/Kad... . See Wolf Garbe's answers on Peer-to-Peer Technology. Also see Petar Maymounkov's answer to Have there been any new advances in distributed hash tables?
- Download Nutch (http://nutch.apache.org/) or Solr (http://lucene.apache.org/solr/), run a crawl on Wikipedia. Analyze the collected data with R (see item 2 above) or Python (http://www.nltk.org/)
- Write your own simplified crawler/indexer, test the performance and scalability, look at the Lucene source for ideas, look at http://infolab.stanford.edu/~bac... for inspiration. You can probably build it as a term project in either an Information Retrieval or a Search Engines course.
- Learn about prefix-sum: http://en.wikipedia.org/wiki/Pre... , parallel matrix multiplication: http://www.cs.berkeley.edu/~yeli... , streaming: http://infolab.stanford.edu/stream/ and BSP: http://en.wikipedia.org/wiki/Bul... , DSM: http://www.amazon.com/Distribute... and "The Use of Name Spaces in Plan 9 Tech Incubator"
- Check out Persistent Linda: http://www.google.com/#sclient=p... , if you find it interesting see Linda and Friends (an article by Sudhir Ahuja from 1986), also search for "Linda in Context", "tuple space", Javaspaces, Gigaspaces. Read "How to write parallel programs: a guide to the perplexed" by Nicholas Carriero and David Gelernter: http://portal.acm.org/citation.c...
- Learn about Compute Unified Device Architecture (CUDA): http://www.amazon.com/CUDA-Examp... , Graphics Processing Unit (GPU) and Field Programmable Gate Array (FPGA) accelerators, PlayStation 3 (video game console) programming: http://en.wikipedia.org/wiki/Cel... , http://www.hotchips.org/hc21/mai...
- Pick one of the PGAS languages (http://en.wikipedia.org/wiki/Par...), e.g. X10 (http://en.wikipedia.org/wiki/X10...), go through the tutorials (http://ppppcourse.ning.com/forum...), run some HPC benchmarks (LU, FFT) and the examples (the streaming example in particular): see how it scales on a cluster/AWS, compare to sequential and Hadoop/MapReduce implementation, see what kind of performance/scalability gains it gives you on multicore boxes.
- Some good references on parallel programming: Herlihy & Shavit, The art of multiprocessor programming: http://www.amazon.com/Art-Multip... , Blelloch, Vector models for data-parallel computing: http://citeseerx.ist.psu.edu/vie... , Valiant, A bridging model for parallel computation: http://portal.acm.org/citation.c... , Hillis & Steele, Data Parallel Algorithms: http://portal.acm.org/citation.c... , Miller & Boxer, Algorithms sequential and parallel: http://www.amazon.com/Algorithms... , Leighton, Introduction to Parallel Algorithms and Architectures: http://www.amazon.com/Introducti... , JaJa: Introduction to Parallel Algorithms: http://www.amazon.com/Introducti...
- You should probably start with Dijkstra, Cooperating Sequential Processes: http://www.cs.utexas.edu/users/E... and Ben-Ari, Principles of Concurrent and Distributed Programming: http://www.amazon.com/Principles... (I have the older edition from 1982, which is an excellent intro)
- Take a course in Parallel Computer Architecture: http://www.eecs.berkeley.edu/~cu... , http://www.amazon.com/Parallel-C... , http://people.engr.ncsu.edu/efg/...
- Check out Cilk: http://software.intel.com/en-us/... and Matlab Parallel computing toolbox: http://www.mathworks.com/product...
- For some theoretical background on distributed algorithms, information decomposition and complexity see: Feldman et al., On the Complexity of Processing Massive, Unordered, Distributed Data: http://arxiv.org/abs/cs/0611108 , Traub, An introduction to information based complexity: http://octopus.library.cmu.edu/C... and Is Nancy Lynch's book still the best intro to distributed algorithms?
- Parallel Distributed Processing (PDP) by Rumelhart and PDP research group: http://www.amazon.com/Parallel-D... - look up the computing architectures for Artificial Neural Networks, e.g. http://www.amazon.com/Parallel-A...
- Run some experiments with Weka (http://www.cs.waikato.ac.nz/ml/w...) or RapidMiner (http://rapid-i.com/), pick a simple algorithm and port it to MapReduce, see how it scales on a cluster/AWS
- Experiment with distributed 'NoSQL' data stores (Voldemort, Hbase, Redis, Tokyo, Cassandra etc). Figure out what the CAP theorem is all about (http://www.allthingsdistributed.... , http://www.cloudera.com/blog/201... ). Create a simple app with a key-value or column-based store as a back-end. Import several GBs of interesting data into it and run some simple clustering/KNN algos (http://en.wikipedia.org/wiki/Clu... , http://en.wikipedia.org/wiki/Nea...). Optimize your algo to better utilize random access patterns, experiment with various tuning options. Build a front-end visualization for the results (check out Protovis or a similar visualization package: http://vis.stanford.edu/protovis/)
- A good resource on 'NoSQL': Daniel Abadi's publications: http://cs-www.cs.yale.edu/homes/... and Varley, No Relation: The Mixed Blessings of Non-Relational Databases: http://ianvarley.com/UT/MR/Varle...
- Doozer: http://xph.us/2011/04/13/introdu...
- Learn about main-memory databases: http://en.wikipedia.org/wiki/In-memory_database , http://scholar.google.com/schola..., http://monetdb.cwi.nl/ , http://hstore.cs.brown.edu/ , Microsoft Trinity - a graph database over distributed memory cloud: http://research.microsoft.com/en...
- Write a distributed hash table in C, here is a good reference: http://pdos.csail.mit.edu/papers... or use node: https://github.com/stbuehler/nod...
- Networking: http://www.amazon.com/Unix-Netwo... , http://www.amazon.com/TCP-Illust... , What are some good resources for learning about network programming?
- Write a distributed file system in C. See git for inspiration: http://apenwarr.ca/log/?m=200801#31 , Frangipani: http://portal.acm.org/citation.c... . For a good intro see Tanenbaum's series: http://www.amazon.com/Distribute... , http://www.amazon.com/Modern-Ope... and http://www.stanford.edu/class/cs... , The Amoeba Distributed OS: http://www.cs.vu.nl/pub/amoeba/a...
- Graph databases, etc: http://nosql-database.org/ , http://www.graph-database.org/ , GraphLab: http://graphlab.org/
- What is Facebook's architecture? , What are the most interesting Facebook Data papers/projects?
- Hadoop/Hbase at Facebook: http://borthakur.com/ftp/Realtim...
- What is YouTube's architecture?
- How does justin.tv work?
- Scalability: How does Heroku work?
- What is Netflix's architecture?
- What is Hulu's architecture?
- What is eBay's architecture?
- What is Dropbox's architecture?
- What are the core technologies that Twitter uses for their platform and what is the Twitter Macro architecture?
- LinkedIn (product) SNA: http://sna-projects.com/sna/
- What is LinkedIn's database architecture like?
- Quora Infrastructure: How does LiveNode work?
- Scaling LiveJournal: http://danga.com/words/2007_06_u...
- Content Delivery Networks: How does Akamai CDN work?
- GitHub architecture: https://github.com/blog/530-how-...
- Twitter Rainbird: http://www.slideshare.net/kevinw..., http://www.slideshare.net/nkalle...
- Yahoo! S4: https://github.com/s4/core , http://docs.s4.io/
- IBM Infosphere streams/System S: http://www-01.ibm.com/software/d...
- BackType Storm: http://news.ycombinator.com/item...
- Octobot, a distributed task queue worker: http://octobot.taco.cat/
- F* by Microsoft: http://research.microsoft.com/en...
- HN thread on the architecture of backend systems: http://news.ycombinator.com/item...
- The secrets of Node's success: http://radar.oreilly.com/2011/06...
- Druid: A Distributed, In-Memory OLAP Store: http://metamarketsgroup.com/blog... (some dissing here: http://news.ycombinator.com/item...)
- FathomDB response to AWS outage: http://news.ycombinator.com/item...
- Google Ganeti - Cluster-based virtualization management software: http://code.google.com/p/ganeti/ , http://k1024.org/~iusty/papers/i...
- Google GO: http://www.theregister.co.uk/201...
- Erlang/OTP: http://learnyousomeerlang.com/co...
- Cloud Haskell: http://research.microsoft.com/en...
- NASA Nebula: http://nebula.nasa.gov/
- Platform MR: http://www.platform.com/Products...
- Fast 2011: http://www.usenix.org/events/fas...
- The history of consensus: http://betathoughts.blogspot.com... (via http://the-paper-trail.org )
- Distributed Linked List: http://www.google.com/search?scl...
- Gibbons, Synopsis Data Structures For Massive Data Sets: www.cs.princeton.edu/courses/archive/spring04/cos598B/bib/GibbonsM-syn.pdf
- Lock-Free Linked Lists and Skip Lists: http://www.cse.yorku.ca/~ruppert...
- Maekawa's lock: http://www.google.com/#sclient=p...
- Crossbow, searching for SNPs with cloud computing: http://www.biomedcentral.com/con...
- Distributed Caching: Hazelcast, Ehcache, Terracotta (company), Memcached, Oracle Coherence
- Bitcoin, A Peer-to-Peer Electronic Cash System: www.bitcoin.org/bitcoin.pdf
- Distributed computing with JS: BitCoin miner: http://news.ycombinator.com/item..., MapRejuice: https://github.com/ryanmcgrath/m... , http://www.igvita.com/2009/03/03...
- OpenCirrus - Cloud Computing Research Testbed: https://opencirrus.org/content/r...
- Antonio Piccolboni, A Comparison of Eight MapReduce Languages: http://www.dataspora.com/2011/04...
- Caching and processing 2TB in memory with Hazelcast: http://highscalability.com/blog/...
- Dapper: a Large-Scale Distributed Systems Tracing Infrastructure: http://static.googleusercontent....
- On the performance of distributed lock-based synchronization: http://portal.acm.org/citation.c...
- Tonika: social routing with organic security: http://pdos.csail.mit.edu/~petar...
- http://www.linuxvirtualserver.org/
- It's time for low latency: http://www.matt-welsh.blogspot.c...
- OS Research Wanted: http://surriel.com/research_wanted
- FlightPath: Obedience vs. Choice in Cooperative Services: http://www.usenix.org/event/osdi...
- Piccolo: Building Fast, Distributed Programs with Partitioned Tables: http://piccolo.news.cs.nyu.edu/p...
- CIEL: a universal execution engine for distributed data-flow computing: www.usenix.org/event/nsdi11/tech/full_papers/Murray.pdf
- Directed Edge: On building a stupidly fast graph database: http://blog.directededge.com/200...
- What is the best tutorial for Python's Twisted framework?
- What are the best resources to learn Node.js?
- Parallelism /= Concurrency: http://ghcmutterings.wordpress.c...
- PyCon 2011: Handling ridiculous amounts of data with probabilistic data structures: http://blip.tv/pycon-us-videos-2...
- Go (programming language) at Heroku: http://blog.golang.org/2011/04/g...
- Meijer & Lamport, Mathematical Reasoning and Distributed Systems: http://channel9.msdn.com/Shows/G...
- Concurrency's Shysters: http://blogs.oracle.com/bmc/entr...
- Horton: Online Query Execution On Large Distributed Graphs: http://www.graph-database.org/20...
- DataDomain: http://www.datadomain.com/
- WebRTC: https://sites.google.com/site/we...
- Bloom: http://www.bloom-lang.net/ and http://boom.cs.berkeley.edu/pape...
- Jini tutorial: http://jan.newmarch.name/java/ji...
- Java distributed cache for low latency, high availability: http://stackoverflow.com/questio...
- Scalable, Distributed Data Structures for Internet Service Construction (2000): http://usenix.org/events/osdi200...
- Cloud Programming: From Doom and Gloom to BOOM and Bloom: http://neilconway.org/talks/boom...
- SEDA: An Architecture for Highly Concurrent Server Applications: http://www.eecs.harvard.edu/~mdw... (http://matt-welsh.blogspot.com/2... )
- Scalable Network programming: http://bulk.fefe.de/scalable-net...
- Protocol Buffers: http://news.ycombinator.com/item...
- What are some current directions in operating system research?
- What are the best resources for learning about distributed file systems?
- How do I approach building a distributed queue architecture?
- What are some good resources for learning about data compression? Why?
- What are some good resources to get started with Information Retrieval? Why?
- What are good resources to learn about search engine architecture?
- What are the good resources to learn about distributed, scalable, robust software architecture/infrastructure building?
- What are some common approaches to error aggregation, alerting, and analysis in distributed systems?
- What are some good research papers and articles on fault-tolerant systems design?
- What are the most influential papers in the world of big data? Why?
- What are some introductory resources for learning about large scale machine learning? Why?
- Which CS areas have the most low-hanging fruit for research?
- Why the current obsession with big data?
- Which conferences are the best to follow for Distributed Systems?
- What are the best recommended research topics on databases according to edge technologies and recent research trends?
- What are the most interesting research projects related to the management of distributed systems?
- What are some high performance TCP hacks?
- Pike, Systems Software Research is Irrelevant: http://herpolhode.com/rob/utah20...
- Concurrency in Go (programming language): http://golang.org/doc/effective_...
- Communicating Sequential Processes (CSP): http://www.usingcsp.com/
- Scalable Joins: http://research.microsoft.com/en...
- Kestrel, tiny queue system based on starling, in Scala: https://github.com/robey/kestrel
- Disruptor - concurrent programming framework: http://code.google.com/p/disruptor/
- DataTurbine streaming engine: http://www.dataturbine.org/
- The Task Parallel Library (TPL) in .NET: http://msdn.microsoft.com/en-us/...
- A crash course on modern hardware: http://www.infoq.com/presentatio...
- Infinispan data grid on top of JGroups: http://www.jboss.org/infinispan
- Memcached distributed cache on top of JGroups: http://www.jgroups.org/memcached...
- Scalable Application Layer Multicast: http://pages.cs.wisc.edu/~suman/...
- Hadapt: Efficient Processing of Data Warehousing Queries in a split execution environment: http://portal.acm.org/citation.c...
- Danga's open source projects: http://danga.com/
- Hadoop on MPI: http://hadoopbi.com/index.php/te...
- Hadoop on Pallet: http://sritchie.github.com/2011/...
- Oracle Grid Engine: http://en.wikipedia.org/wiki/Ora...
- Ejabberd: a scalable XMPP instant messaging server: http://www.ejabberd.im/
- Cheetah - Circuit-switched High-speed End-to-End Transport ArcHitecture: http://www.ece.virginia.edu/chee...
- PVM: http://www.snakebytestudios.com/...
- Systems at ETH Zurich: http://www.systems.ethz.ch/resea...
- OpenCL: http://www.khronos.org/opencl/
- Concurrent programming in Erlang: http://www.erlang.org/erlang_boo...
- Concurrent programming in Occam 2: http://www.amazon.com/Programmin...
- Concurrent and Real-Time Programming in Ada: http://www.amazon.com/Concurrent...
- Distributed Programming in Ruby: http://www.amazon.com/dp/0321638...
- REST: http://rest.elkstein.org/
- Gearman: http://gearman.org
- Drizzle: https://launchpad.net/drizzle
- Distributed logging with syslog: https://wiki.archlinux.org/index...
- Logs are streams, not files: http://adam.heroku.com/past/2011...
- Utilizing Redis in distributed Erlang systems (Heroku): http://erlang-factory.herokuapp....
- MS Command Shell: http://arstechnica.com/business/...
- MS PowerShell: http://en.wikipedia.org/wiki/Win...
- DRb - Distributed Ruby: http://segment7.net/projects/rub...
- God - The Ruby Framework for Process Management: https://github.com/mojombo/god
- Taco Bell programming: http://teddziuba.com/2010/10/tac...
- Rush - the Ruby Shell: http://rush.heroku.com/
- CloudCrowd: https://github.com/documentcloud...
- Coda: http://www.coda.cs.cmu.edu
- Tenzing: http://research.google.com/pubs/...
- GNTP: http://www.growlforwindows.com/g...
- CycleCloud : http://blog.cyclecomputing.com/2...
- GNU Parallel: http://www.gnu.org/software/para...
- Torque: http://www.adaptivecomputing.com...
- Chef: http://www.opscode.com/chef/
- Dempsy - Nokia's Distributed Elastic Message Processing System: http://dempsy.github.com/Dempsy/
- F1 - The Fault-Tolerant Distributed RDBMS Supporting Google's Ad Business: http://research.google.com/pubs/...
- Galaxy - a distributed in-memory data grid by Parallel Universe: http://blog.paralleluniverse.co/...
- Spanner: Google's Globally-Distributed Database: http://research.google.com/archi...
- Jingle: http://code.google.com/p/libjingle/
- Paxos Made Live: http://www.eecs.harvard.edu/cs26...
- TeleHash: https://github.com/quartzjer/Tel...
- Adobe RTMP: http://en.wikipedia.org/wiki/Rea...
- UPnP: http://en.wikipedia.org/wiki/Uni...
- IP multicast: http://en.wikipedia.org/wiki/IP_...
- Reliable multicast: http://en.wikipedia.org/wiki/Rel...
- JGroups: http://www.jgroups.org/
- LDPC Codes: http://en.wikipedia.org/wiki/Low...
- Erasure codes: see the last chapter in D.J.C MacKay's http://www.inference.phy.cam.ac.... (note that some of these are heavily patented, e.g. http://www.inference.phy.cam.ac.... )
- Blaze: Next-gen NumPy on which to build out-of-core and distributed algorithms: https://speakerdeck.com/sdiehl/b...
- Spark cluster computing by UC Berkeley AMPLab
- NSQ by bitly: realtime distributed message processing at scale
- Celery: Distributed Task Queue
- Go'Circuit by Tumblr: Paradigm for developing and sustaining Big Data apps
- List of popular backend stacks: ragingwind/backend-architectures.md
- CRAN Task View: High-Performance and Parallel Computing with R
- Apache Accumulo by NSA
- alternative-internet
- OpenFlow for programmable networks: Page on Openflow, switch spec: http://archive.openflow.org/docu...
- On data center scale, OpenFlow, and SDN
- Docker, the Linux container engine, and one use case at Etsy: LXC - Running 14,000 tests per day and beyond! (Part 1)
- Apache Mesos (A Platform for Fine-Grained Resource Sharing in the Data Center, tech report: Page on Berkeley)
- Apache Spark: Lightning-Fast Cluster Computing via https://news.ycombinator.com/ite...
- Google's MillWheel: Fault-Tolerant Stream Processing at Internet Scale: Page on Googleusercontent
- Google's Sibyl, a distributed learning system: www.magicbroom.info/Papers/Ladis10.pdf
- Berkeley AMPLab - SQL Benchmark: Redshift vs Hive vs Impala vs Shark: Big Data Benchmark based on Page on Brown
- The Log: What every software engineer should know about real-time data's unifying abstraction | LinkedIn Engineering by Jay Kreps (also Kafka: a Distributed Messaging System for Log Processing and Why local state is a fundamental primitive in stream processing)
- Apache Phoenix | Distributed SQL query engine for Hbase by Salesforce
- Presto | Distributed SQL Query Engine for Big Data by Facebook
- Supersonic Query Engine by Google
- GridGain Systems | In-Memory Computing
- 0xdata/h2o - R accelerator
- Consul - coordination service
- Fastpass - TCP alternative
- GQL - a SQL-like language for Google App Engine
- ElasticSearch - Rivers & Plugins
- DocumentDB | Azure
- Overview of Spark ecosystem: The lab that created Spark wants to speed up everything, including cures for cancer
- DataMPI - Extending MPI for Spark with Key-Value based Communication, DataMPI paper, Performance of DataMPI
- Big Data Benchmark by Chinese Academy of Sciences
- Big Data Benchmark by AMPLab - UC Berkeley
- SPARK-LIBLINEAR: Linear Support Vector Machines Using Spark
- Ceph, a scalable alternative to the Hadoop Distributed File System
- OpenStack Open Source Cloud Computing Software
- InfluxDB - Distributed Time Series Database in Go
- Firebase - Cloud Backend for Realtime Apps
- Tango: Distributed Data Structures over a Shared Log
- Signal/Collect - Parallel Graph Processing
- GraphX: A Resilient Distributed Graph System on Spark
- Apache Mesos or Why the data center needs an operating system
- In-Stream Data Processing
- Distributed (in-memory) graph processing with Akka
- Fora language
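
One of the exercises in the list suggests writing your own simplified MapReduce runtime in C or any other programming language. As a possible starting point, here is a minimal single-process sketch in Python. It only illustrates the map, shuffle (group by key), and reduce phases. The names `map_reduce`, `word_count_mapper`, and `word_count_reducer` are made up for this illustration and are not taken from Hadoop or any other library, and a real runtime would also need input splitting, distribution across workers, and fault tolerance.

```python
from collections import defaultdict


def map_reduce(inputs, mapper, reducer):
    """Run a toy, single-process MapReduce job and return {key: result}.

    mapper(record) yields (key, value) pairs.
    reducer(key, values) returns one result for that key.
    """
    # Map phase plus shuffle: apply the mapper to every record and
    # group the emitted values by key.
    grouped = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            grouped[key].append(value)

    # Reduce phase: fold each key's values into a single result.
    # Keys are sorted so the output order is deterministic.
    return {key: reducer(key, grouped[key]) for key in sorted(grouped)}


def word_count_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The fox"]
    print(map_reduce(lines, word_count_mapper, word_count_reducer))
```

A natural next step for the exercise would be to run the mapper calls in separate worker processes and partition the keys across several reducers, then compare the result against the frameworks listed above.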