[System Design] Summary Notes

1 Recently tested topics: NoSQL, big data, concurrency, distributed computing

1. NoSQL:

K/V store: Memcached, Redis
Document based: MongoDB, CouchDB (the favorite of the expert zhaoce)
Column based: HBase, Cassandra
Graph based: Neo4j

2. Big data
Hadoop, including HDFS and MapReduce (a favorite of the expert 800题)
HBase
Hive, Pig, Cascalog etc
Data mining

3. Concurrency
Multi threading: Java, C++
Actor model: Scala Akka, Erlang
Reactor model: Node.js, Ruby EventMachine, Python Twisted
STM: Clojure, Haskell

4. Distributed computing
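This is a fusion of many technologies. The expert 800题 would be the right person to cover it, since he has recently been studying ZooKeeper.
The main concerns are reliability, scalability, concurrency, high availability, low latency, high throughput, fault tolerance, failover, and so on; it is a fairly mixed bag. You need to understand a range of technologies and be able to propose a good solution for the requirements at hand, which involves many tradeoffs.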



2 About Akka: concurrent and distributed systems:

site: http://akka.io/  

doc: http://doc.akka.io/docs/akka/2.3.7/java.html?_ga=1.80409751.1491357007.1417354870

Actors

Actors are very lightweight concurrent entities. They process messages asynchronously using an event-driven receive loop. Pattern matching against messages is a convenient way to express an actor's behavior. They raise the abstraction level and make it much easier to write, test, understand and maintain concurrent and/or distributed systems. You focus on workflow (how the messages flow in the system) instead of low level primitives like threads, locks and socket IO.

Remoting

Actors are location transparent and distributable by design. This means that you can write your application without hardcoding how it will be deployed and distributed, and then later just configure your actor system against a certain topology with all of the application’s semantics, including actor supervision, retained.

Supervision

Actors form a tree with actors being parents to the actors they've created. As a parent, the actor is responsible for handling its children’s failures (so-called supervision), forming a chain of responsibility, all the way to the top. When an actor crashes, its parent can either restart or stop it, or escalate the failure up the hierarchy of actors. This enables a clean set of semantics for managing failures in a concurrent, distributed system and allows for writing highly fault-tolerant systems that self-heal.


3 Java concurrency 

Original article: http://blog.sina.com.cn/s/blog_b9285de20101j7cx.html

Below is an outline summarizing it.


1. How to implement a thread

  • implement Runnable: recommended
  • implement Callable: has a return value
  • extend Thread
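
The three options above can be sketched roughly as follows. This is only an illustration: the class name `TaskStyles` and the method `runAll` are made up for this example, and `FutureTask` is used here simply as one way to run a `Callable` on a plain `Thread`.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

class TaskStyles {
    // Hypothetical helper: runs one task in each style and returns the Callable's result.
    static int runAll() {
        // 1. Runnable: no return value; the task is separate from the thread that runs it.
        Runnable runnable = () -> System.out.println("runnable ran");
        Thread t1 = new Thread(runnable);

        // 2. Callable: returns a value; FutureTask adapts it so a Thread can run it.
        Callable<Integer> callable = () -> 6 * 7;
        FutureTask<Integer> future = new FutureTask<>(callable);
        Thread t2 = new Thread(future);

        // 3. Subclassing Thread: works, but ties the task to the Thread class.
        Thread t3 = new Thread() {
            @Override
            public void run() {
                System.out.println("thread subclass ran");
            }
        };

        t1.start();
        t2.start();
        t3.start();
        try {
            t1.join();
            t2.join();
            t3.join();
            return future.get(); // the value computed by the Callable
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(runAll());
    }
}
```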

2. How to create and start a thread

  • Executor (thread pool): recommended
  • Thread.start
  • Timer
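
As a rough sketch of the pool-based and timer-based options in this list (`Thread.start` is just calling `start()` on a `Thread` object), assuming a pool size of 2 and a 50 ms delay chosen arbitrarily; `StartStyles`, `viaExecutor` and `viaTimer` are hypothetical names:

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class StartStyles {
    // Submits a task to a fixed-size thread pool and waits for its result.
    static int viaExecutor() {
        ExecutorService pool = Executors.newFixedThreadPool(2); // size is an arbitrary choice
        try {
            Future<Integer> result = pool.submit(() -> 1 + 2);
            return result.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown(); // let the pool threads exit
        }
    }

    // Schedules a one-shot TimerTask and reports whether it fired within one second.
    static boolean viaTimer() {
        CountDownLatch fired = new CountDownLatch(1);
        Timer timer = new Timer(true); // daemon timer thread
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                fired.countDown();
            }
        }, 50); // delay in milliseconds
        try {
            return fired.await(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println(viaExecutor() + " " + viaTimer());
    }
}
```

One reason the pool is usually preferred is that it reuses threads and bounds how many exist at once, whereas calling `Thread.start` directly creates a new thread per task.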

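The material above is all fairly easy to understand. In fact, if threads do not share resources, or more precisely do not share mutable resources, multithreading is not complicated at all. But once shared mutable variables are involved, the hard problems start piling up. This is also why functional programming is better suited to parallel computation: immutability is FP's most basic property, and it removes the main source of multithreading complexity.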


3. Synchronization: sharing mutable variables leads to race conditions, so we need synchronization. Below are several ways to synchronize.

  • synchronized keyword: instance method, class method, code block
  • Lock
  • Semaphore
  • ReadWriteLock
  • CountDownLatch, CyclicBarrier etc.
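
A minimal sketch of three of these tools working together, assuming a shared counter as the mutable state: one counter is guarded by `synchronized`, one by a `ReentrantLock`, and a `CountDownLatch` lets the caller wait for all workers. The class `SafeCounter` and its members are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class SafeCounter {
    private int viaSynchronized = 0;
    private int viaLock = 0;
    private final ReentrantLock lock = new ReentrantLock();

    // synchronized instance method: the monitor is this object.
    synchronized void incSynchronized() {
        viaSynchronized++;
    }

    // Explicit lock: always release in finally.
    void incLock() {
        lock.lock();
        try {
            viaLock++;
        } finally {
            lock.unlock();
        }
    }

    // Starts `threads` workers, each incrementing both counters `perThread` times.
    // Returns {viaSynchronized, viaLock} once every worker has finished.
    static int[] run(int threads, int perThread) {
        SafeCounter counter = new SafeCounter();
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incSynchronized();
                    counter.incLock();
                }
                done.countDown();
            }).start();
        }
        try {
            done.await(); // blocks until all workers have counted down
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] { counter.viaSynchronized, counter.viaLock };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 10_000);
        System.out.println(totals[0] + " " + totals[1]);
    }
}
```

Without the `synchronized` keyword or the lock, the `++` operations from different threads could interleave and lose updates, which is the race condition described above.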

Because synchronized code is hard to get right, it is best to avoid synchronization where possible. Below are some ways to avoid it.

  • Immutability like FP
  • ThreadLocal
  • volatile
  • Atomic Wrapper: java.util.concurrent.atomic
  • concurrent collection
    • ConcurrentHashMap (HashMap)
    • CopyOnWriteArrayList (ArrayList)
    • CopyOnWriteArraySet (Set)
    • ConcurrentLinkedQueue (Queue)
    • ConcurrentSkipListMap (TreeMap)
    • ConcurrentSkipListSet (TreeSet)
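
Two of the items in this list can be sketched as below: an `AtomicInteger` counter and a `ConcurrentHashMap` word count, neither using an explicit lock. Parallel streams are used here only as a convenient way to get several threads; `LockFreeCounts` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class LockFreeCounts {
    // Increments an AtomicInteger n times from parallel workers.
    static int atomicCount(int n) {
        AtomicInteger counter = new AtomicInteger();
        IntStream.range(0, n).parallel().forEach(i -> counter.incrementAndGet());
        return counter.get();
    }

    // Counts word occurrences concurrently; merge() updates each key atomically.
    static ConcurrentHashMap<String, Integer> wordCount(String[] words) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        Arrays.stream(words).parallel().forEach(w -> counts.merge(w, 1, Integer::sum));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(atomicCount(10_000));
        System.out.println(wordCount(new String[] { "a", "b", "a" }));
    }
}
```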

4. Communication: besides synchronization, threads also need to communicate and cooperate. Below are several ways to communicate.

  • wait, notify, notifyAll
  • Condition
  • Semaphore: can be used both for lock and signal  (can avoid missed signal)
  • concurrent collection
    • BlockingQueue
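
As one illustration of thread communication through a `BlockingQueue`, here is a small producer/consumer sketch. The capacity of 2 and the `POISON` end-of-stream marker are assumptions made for this example, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class ProducerConsumer {
    private static final int POISON = -1; // hypothetical end-of-stream marker

    // The producer puts 0..n-1 and then POISON; the consumer collects values until POISON.
    static List<Integer> run(int n) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(2); // small capacity forces blocking
        List<Integer> received = new ArrayList<>();

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) {
                    queue.put(i); // blocks while the queue is full
                }
                queue.put(POISON);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the queue is empty
                for (int v = queue.take(); v != POISON; v = queue.take()) {
                    received.add(v);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join(); // after join, the consumer's writes to `received` are visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run(5));
    }
}
```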

5. Common problems and solutions

  • Race condition (synchronization)
  • Deadlock (lock ordering, lock timeout, deadlock detection)
  • Starvation (fairness)
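
Lock ordering, the first deadlock remedy listed above, can be sketched with a bank-transfer example. This is an assumed scenario, not one from the original article: every account has a unique `id`, and both locks are always taken in ascending `id` order, so two opposite transfers cannot wait on each other in a cycle.

```java
class OrderedAccount {
    final int id; // assumed unique; used only to fix a global lock order
    private long balance;

    OrderedAccount(int id, long balance) {
        this.id = id;
        this.balance = balance;
    }

    synchronized long balance() {
        return balance;
    }

    // Locks the account with the smaller id first, regardless of transfer direction.
    static void transfer(OrderedAccount from, OrderedAccount to, long amount) {
        OrderedAccount first = from.id < to.id ? from : to;
        OrderedAccount second = first == from ? to : from;
        synchronized (first) {
            synchronized (second) {
                from.balance -= amount;
                to.balance += amount;
            }
        }
    }

    // Runs many opposite transfers concurrently and returns the final balances.
    static long[] demo() {
        OrderedAccount a = new OrderedAccount(1, 1000);
        OrderedAccount b = new OrderedAccount(2, 1000);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) transfer(a, b, 1);
        });
        Thread t2 = new Thread(() -> {
            for (int i = 0; i < 10_000; i++) transfer(b, a, 1);
        });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new long[] { a.balance(), b.balance() };
    }

    public static void main(String[] args) {
        long[] balances = demo();
        System.out.println(balances[0] + " " + balances[1]);
    }
}
```

If `transfer` instead locked `from` and then `to`, the two threads in `demo` could each hold one lock while waiting for the other, which is the classic deadlock.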

6. Common coding questions

  • blocking queue
  • producer/consumer
  • read write lock
  • reentrant read write lock
  • dining philosophers
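
For the first question in this list, one possible answer is a bounded blocking queue built on `ReentrantLock` and two `Condition` objects. This is a sketch of one common approach, not the only acceptable one; `SimpleBlockingQueue` and `demoSum` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class SimpleBlockingQueue<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    SimpleBlockingQueue(int capacity) {
        this.capacity = capacity;
    }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) {
                notFull.await(); // loop re-checks the condition after every wakeup
            }
            items.addLast(item);
            notEmpty.signal();
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();
            }
            T item = items.removeFirst();
            notFull.signal();
            return item;
        } finally {
            lock.unlock();
        }
    }

    // Moves 1..n through a capacity-1 queue and returns the sum seen by the consumer.
    static int demoSum(int n) {
        SimpleBlockingQueue<Integer> queue = new SimpleBlockingQueue<>(1);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= n; i++) queue.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < n; i++) sum += queue.take();
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(demoSum(100));
    }
}
```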


4 About the fundamentals

Systems are complex, and when you’re designing a system you’re grappling with its full complexity. Given this, there are many topics you should be familiar with, such as:

  • Concurrency

  • Do you understand threads, deadlock, and starvation? Do you know how to parallelize algorithms? Do you understand consistency and coherence?

  • Networking

  • Do you roughly understand IPC and TCP/IP? Do you know the difference between throughput and latency, and when each is the relevant factor?

  • Abstraction

  • You should understand the systems you’re building upon. Do you know roughly how an OS, file system, and database work? Do you know about the various levels of caching in a modern OS?

  • Real-World Performance

  • You should be familiar with the speed of everything your computer can do, including the relative performance of RAM, disk, SSD and your network.

  • Estimation

  • Estimation, especially in the form of a back-of-the-envelope calculation, is important because it helps you narrow down the list of possible solutions to only the ones that are feasible. Then you have only a few prototypes or micro-benchmarks to write.

  • Availability and Reliability

  • Are you thinking about how things can fail, especially in a distributed environment? Do you know how to design a system to cope with network failures? Do you understand durability?

Remember, we’re not looking for mastery of all these topics. We’re looking for familiarity. We just want to make sure you have a good lay of the land, so you know which questions to ask and when to consult an expert.
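
To make the estimation point concrete, here is a tiny back-of-the-envelope sketch. Every input is a made-up assumption for illustration (10 million daily users, 20 requests per user, 10 million writes of 1000 bytes per day), and the names `Estimate`, `averageQps` and `storageGbPerYear` are hypothetical.

```java
class Estimate {
    // Average requests per second, given daily users and requests per user per day.
    static long averageQps(long dailyUsers, long requestsPerUser) {
        return dailyUsers * requestsPerUser / 86_400; // 86,400 seconds per day
    }

    // Storage added per year, in gigabytes (10^9 bytes).
    static long storageGbPerYear(long writesPerDay, long bytesPerWrite) {
        return writesPerDay * bytesPerWrite * 365 / 1_000_000_000L;
    }

    public static void main(String[] args) {
        // Assumed inputs, not measurements.
        System.out.println(averageQps(10_000_000L, 20));           // about 2,300 requests/s
        System.out.println(storageGbPerYear(10_000_000L, 1_000));  // 3,650 GB per year
    }
}
```

Numbers like these are only meant to rule options in or out: a few thousand requests per second and a few terabytes per year suggest a very different design from one that must handle a thousand times more.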


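Networking: IPC (inter-process communication) refers to communication between processes (or threads), on the same machine or on different machines.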

| Method | Short description | Provided by (operating systems or other environments) |
|---|---|---|
| File | A record stored on disk, or a record synthesized on demand by a file server, which can be accessed by multiple processes. | Most operating systems |
| Signal | A system message sent from one process to another, not usually used to transfer data but instead used to remotely command the partnered process. | Most operating systems |
| Socket | A data stream sent over a network interface, either to a different process on the same computer or to another computer on the network. | Most operating systems |
| Message queue | An anonymous data stream similar to a socket, usually implemented by the operating system, that allows multiple processes to read and write to the message queue without being directly connected to each other. | Most operating systems |
| Pipe | A unidirectional data stream between two processes, typically connected through standard input and output. | All POSIX systems, Windows |
| Named pipe | A pipe implemented through a file on the file system instead of standard input and output. Multiple processes can read and write to the file as a buffer for IPC data. | All POSIX systems, Windows |
| Semaphore | A simple structure that synchronizes multiple processes acting on shared resources. | All POSIX systems, Windows |
| Shared memory | Multiple processes are given access to the same block of memory, which creates a shared buffer for the processes to communicate with each other. | All POSIX systems, Windows |
| Message passing | Allows multiple programs to communicate using channels, commonly used in concurrency models. | Used in MPI paradigm, Java RMI, CORBA, DDS, MSMQ, MailSlots, QNX, others |
| Memory-mapped file | A file mapped to RAM that can be modified by changing memory addresses directly instead of outputting to a stream. This shares the same benefits as a standard file. | All POSIX systems, Windows |
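
The memory-mapped file entry can be illustrated with a small sketch. As a simplifying assumption, both mappings live in one JVM here; in real IPC the writer and reader would be separate processes opening the same file. `MappedFileDemo` and `roundTrip` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedFileDemo {
    // Writes `message` through one mapping of a temp file and reads it back
    // through a second, independent mapping of the same file.
    static String roundTrip(String message) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        try {
            Path file = Files.createTempFile("ipc-demo", ".bin");
            file.toFile().deleteOnExit();
            try (FileChannel writer = FileChannel.open(file,
                         StandardOpenOption.READ, StandardOpenOption.WRITE);
                 FileChannel reader = FileChannel.open(file, StandardOpenOption.READ)) {
                // Mapping READ_WRITE past the end of the file extends the file.
                MappedByteBuffer out = writer.map(FileChannel.MapMode.READ_WRITE, 0, bytes.length);
                out.put(bytes); // a plain memory write, no stream involved
                out.force();    // flush the change to the underlying file

                MappedByteBuffer in = reader.map(FileChannel.MapMode.READ_ONLY, 0, bytes.length);
                byte[] copy = new byte[bytes.length];
                in.get(copy);
                return new String(copy, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello via mmap"));
    }
}
```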
