Lesson 1: Introduction to Distributed Systems

0x00 Introduction

This post focuses on distributed systems: the basic concepts and some of the main topics.

0x01 Distributed Systems

1. Definition

Various definitions of distributed systems have been given in the literature. Here are two.

One:

A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal.

The other:

A distributed system is a collection of independent computers that appears to its users as a single coherent system.

Three significant characteristics of distributed systems are:

  • concurrency of components
  • lack of a global clock
  • independent failure of components

Some examples of distributed systems: big databases, P2P file sharing, MapReduce, DNS, and so on. In general, any set of multiple cooperating computers qualifies, and lots of critical infrastructure is distributed!

2. Why distributed?

  • to connect physically separate entities
  • to achieve security via isolation
  • to tolerate faults via replication
  • to scale up throughput via parallel CPUs/mem/disk/net

3. Difficulties?

  • complex: many concurrent parts
  • must cope with partial failure
  • tricky to realize performance potential

0x02 Topics

1. Performance

What we want: scalable throughput.

Nx servers -> Nx total throughput via parallel CPU, disk, net. So handling more load only requires buying more computers.

But scaling gets harder as N grows. Why?

  • Load imbalance and stragglers (some nodes are much slower than others).
  • Non-parallelizable code: initialization, interaction.
  • Bottlenecks from shared resources, e.g. network.
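The effect of non-parallelizable code can be made concrete with Amdahl's law. The notes above do not name it, so treat the following as an illustrative sketch: the function name `amdahl_speedup` and the 5% serial fraction are assumptions chosen for the example, and the model ignores load imbalance and shared-resource bottlenecks.

```python
def amdahl_speedup(serial_fraction: float, n_servers: int) -> float:
    """Ideal speedup on n_servers when serial_fraction of the work
    (e.g. initialization, interaction) cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_servers < 1:
        raise ValueError("n_servers must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # The serial part takes the same time no matter how many servers exist;
    # only the parallel part is divided among them.
    return 1.0 / (serial_fraction + parallel_fraction / n_servers)


if __name__ == "__main__":
    # Hypothetical workload: 5% of the work is serial.
    for n in (1, 10, 100, 1000):
        print(n, round(amdahl_speedup(0.05, n), 2))
```

With a 5% serial fraction the speedup can never exceed 1 / 0.05 = 20, however many servers are bought, which is one reason "Nx servers -> Nx throughput" stops holding as N grows.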

2. Fault tolerance

1000s of servers, complex net -> always something broken. We'd like to hide these failures from the application.

What we want:

  • Availability -- app can keep using its data despite failures
  • Durability -- app's data will come back to life when failures are repaired

How: replicated servers.

If one server crashes, client can proceed using the other(s).
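A minimal sketch of that idea, assuming in-memory replicas that hold identical data. The names `Replica`, `ServerDown`, and `get_with_failover` are hypothetical and exist only for this illustration; a real client would also need timeouts, since a crashed server usually does not announce itself with an exception.

```python
class ServerDown(Exception):
    """Raised by a replica that has crashed (simulated)."""


class Replica:
    def __init__(self, name, data, alive=True):
        self.name = name
        self.data = dict(data)  # each replica keeps its own copy
        self.alive = alive

    def get(self, key):
        if not self.alive:
            raise ServerDown(self.name)
        return self.data[key]


def get_with_failover(replicas, key):
    """Try each replica in order and return the first successful answer."""
    last_error = None
    for replica in replicas:
        try:
            return replica.get(key)
        except ServerDown as err:
            last_error = err  # remember the failure, move on to the next one
    raise RuntimeError("all replicas are down") from last_error


if __name__ == "__main__":
    data = {"x": 1}
    replicas = [Replica("s1", data, alive=False), Replica("s2", data)]
    print(get_with_failover(replicas, "x"))  # s1 is down, s2 answers
```

The application calling `get_with_failover` never sees the crash of `s1`, which is the sense in which replication hides failures.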

3. Consistency

Consistency is an issue both for replicated objects and for transactions involving related updates to different objects (recall the ACID properties).

Achieving good behavior is hard!

  • "Replica" servers are hard to keep identical.
  • Clients may crash midway through multi-step update.
  • Servers crash at awkward moments, e.g. after executing but before replying.
  • Network may make live servers look dead; risk of "split brain".
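The "after executing but before replying" case can be sketched as follows. Everything here is a simplified assumption: `Server`, `deposit`, `deposit_once`, and the request ID `"req-1"` are hypothetical, and the lost reply is simulated by calling the operation twice. Remembering request IDs is one common mitigation, not something the notes above prescribe.

```python
class Server:
    def __init__(self):
        self.balance = 0
        self.seen = {}  # request_id -> reply already computed for it

    def deposit(self, amount):
        """Non-idempotent operation: every execution changes the state."""
        self.balance += amount
        return self.balance

    def deposit_once(self, request_id, amount):
        """Same operation, but a repeated request_id gets the cached reply."""
        if request_id not in self.seen:
            self.seen[request_id] = self.deposit(amount)
        return self.seen[request_id]


def call_with_lost_first_reply(operation):
    """Simulate: the server executes, the reply is lost, the client retries."""
    operation()          # executed, but the reply never reaches the client
    return operation()   # the client cannot tell, so it sends the request again


if __name__ == "__main__":
    naive = Server()
    call_with_lost_first_reply(lambda: naive.deposit(10))
    print(naive.balance)  # 20: the deposit was applied twice

    dedup = Server()
    call_with_lost_first_reply(lambda: dedup.deposit_once("req-1", 10))
    print(dedup.balance)  # 10: the retry was recognized as a duplicate
```

Note that the `seen` table lives in memory here, so it would itself be lost if the server crashed, which hints at why getting this right is hard.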

Consistency and performance are enemies.

  • Consistency requires communication, e.g. to get latest Put().
  • "Strong consistency" often leads to slow systems.
  • High performance often imposes "weak consistency" on applications.
    People have pursued many design points in this spectrum.
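Two points on that spectrum can be sketched with a toy replicated key/value store. This is only an illustration under stated assumptions: the class `ReplicatedKV` and its methods are hypothetical, replicas are plain dictionaries in one process, and the number of replicas touched stands in for the number of messages a real system would send.

```python
class ReplicatedKV:
    def __init__(self, n_replicas=3):
        self.replicas = [{} for _ in range(n_replicas)]
        self.pending = []  # updates not yet applied to replicas 1..N-1

    def put_strong(self, key, value):
        """Update every replica before acknowledging the Put()."""
        self.sync()  # flush older weak updates so they cannot overwrite this one
        for replica in self.replicas:
            replica[key] = value
        return len(self.replicas)  # replicas contacted before the ack

    def put_weak(self, key, value):
        """Update only replica 0 and acknowledge; the others catch up later."""
        self.replicas[0][key] = value
        self.pending.append((key, value))
        return 1  # replicas contacted before the ack

    def sync(self):
        """Apply the queued updates, in order, to the remaining replicas."""
        for key, value in self.pending:
            for replica in self.replicas[1:]:
                replica[key] = value
        self.pending.clear()

    def get(self, key, replica_index):
        return self.replicas[replica_index].get(key)


if __name__ == "__main__":
    kv = ReplicatedKV()
    kv.put_strong("x", 1)
    kv.put_weak("x", 2)
    print(kv.get("x", 0))  # 2: replica 0 has the latest Put()
    print(kv.get("x", 1))  # 1: replica 1 is stale until sync() runs
    kv.sync()
    print(kv.get("x", 1))  # 2
```

`put_weak` returns after touching one replica, so it is cheaper, but a reader that happens to hit another replica can see an old value; `put_strong` pays for communication with every replica to rule that out.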

References

  • https://www.cl.cam.ac.uk/teaching/0910/ConcDistS/11a-cons-tx.pdf
  • https://en.wikipedia.org/wiki/Consistency_model
  • https://pdos.csail.mit.edu/6.824/notes/l01.txt
  • https://en.wikipedia.org/wiki/Distributed_computing
