6.824 Notes1 (lec1 && lec2)

Introduction

Reasons people build distributed systems:

  • parallelism
  • fault tolerance
  • physical reasons
  • security / isolation

6.824 focuses on the first two.

Challenges:

  • concurrency
  • partial failure
  • performance

Infrastructure for applications:

  • storage
  • communication
  • computation

Implementation tools:

  • RPC, threads, concurrency control

Performance:

  • scalability (2x resources ----> 2x throughput); this takes careful design to actually achieve

Fault tolerance:

  • availability
  • recoverability: non-volatile storage (RAID, WAL) and replication (keeping replicas in sync and managing them is pretty complex)

Consistency:

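  • with several copies of the data (kept for fault tolerance), a crash partway through an update can leave the copies with different values, so reads may disagree
  • weak consistency
  • strong consistency (expensive)

Replicas should fail independently; otherwise replication adds little fault tolerance.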

MapReduce

input --> split into a bunch of chunks (files) --> call Map() on every input file --> intermediate output: a list of key-value pairs

--> collect all values corresponding to the same key --> call Reduce() once per key --> one [key, totalVal] pair per call.

The whole computation is called a job.
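
A minimal single-process sketch of the flow above, using word count as the example. All names here (KeyValue, mapFn, reduceFn, runJob) are made up for illustration; a real MapReduce job spreads the Map and Reduce calls across many machines, which this does not attempt.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// KeyValue is one intermediate pair emitted by Map (hypothetical type).
type KeyValue struct {
	Key   string
	Value int
}

// mapFn emits (word, 1) for every word in one input chunk.
func mapFn(chunk string) []KeyValue {
	var out []KeyValue
	for _, w := range strings.Fields(chunk) {
		out = append(out, KeyValue{Key: w, Value: 1})
	}
	return out
}

// reduceFn sums all values collected for one key.
func reduceFn(key string, values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

// runJob calls Map on every chunk, groups the values by key,
// then calls Reduce once per key.
func runJob(chunks []string) map[string]int {
	grouped := make(map[string][]int)
	for _, c := range chunks {
		for _, kv := range mapFn(c) {
			grouped[kv.Key] = append(grouped[kv.Key], kv.Value)
		}
	}
	result := make(map[string]int)
	for k, vs := range grouped {
		result[k] = reduceFn(k, vs)
	}
	return result
}

func main() {
	counts := runJob([]string{"a b a", "b c"})
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys) // map iteration order is random, so sort for stable output
	for _, k := range keys {
		fmt.Println(k, counts[k]) // prints "a 2", "b 2", "c 1"
	}
}
```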


Go

Good support for threads, locking, and synchronization; a convenient RPC package.

Type safe and memory safe.

Garbage collected (GC).

Threads

In Go, threads are called goroutines.

Each thread has its own stack, but all the stacks live in the same memory address space.

Reasons to use threads:

  • I/O concurrency
  • parallelism
  • convenience (e.g., a goroutine that sleeps for 1 second, then checks whether the workers are still alive)

Alternative: event-driven (asynchronous) programming with a single thread of control. It gives I/O concurrency, but no CPU parallelism.

Locks

The language knows nothing about the relationship between a lock and the variables it protects. Locks work only because programmers know which variables need protecting and hold the lock every time they touch them. It is entirely up to us to decide what each lock protects.
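
A small sketch of that convention, assuming a hypothetical Counter type: the mutex sits next to the field it guards, and every method takes the lock before touching the field. Nothing in Go checks this; code that reads or writes n without holding mu would still compile.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter pairs a mutex with the field it protects. The pairing is only a
// convention kept by the programmer; the compiler does not enforce it.
type Counter struct {
	mu sync.Mutex
	n  int // protected by mu (by convention only)
}

// Inc adds one to the counter while holding the lock.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value reads the counter while holding the lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countConcurrently has `workers` goroutines each call Inc `perWorker` times.
func countConcurrently(workers, perWorker int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countConcurrently(10, 1000)) // prints 10000
}
```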

Coordination

channels.
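
A minimal sketch of coordinating through a channel (sumSquares is a made-up example): each goroutine sends its result on an unbuffered channel, and the caller receives exactly one value per goroutine, so it knows when all of them are done.

```go
package main

import "fmt"

// sumSquares starts one goroutine per input. Each goroutine sends n*n on the
// channel, and the caller receives exactly len(nums) values.
func sumSquares(nums []int) int {
	results := make(chan int)
	for _, n := range nums {
		go func(n int) {
			results <- n * n // blocks until the receiver is ready
		}(n)
	}
	total := 0
	for range nums {
		total += <-results
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4})) // prints 30
}
```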

sync.Cond (condition variable)
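
A sketch of the usual condition-variable pattern, assuming a made-up vote-counting scenario: goroutines update shared state under the lock and call Broadcast, while the waiter re-checks its condition in a loop around Wait. waitForVotes is a hypothetical name, and needed must not exceed voters or the wait never ends.

```go
package main

import (
	"fmt"
	"sync"
)

// waitForVotes starts `voters` goroutines that each record one vote, and
// blocks until at least `needed` votes have been counted.
func waitForVotes(voters, needed int) int {
	var mu sync.Mutex
	cond := sync.NewCond(&mu)
	count := 0

	for i := 0; i < voters; i++ {
		go func() {
			mu.Lock()
			defer mu.Unlock()
			count++
			cond.Broadcast() // wake the waiter so it can re-check the condition
		}()
	}

	mu.Lock()
	defer mu.Unlock()
	for count < needed { // always re-check in a loop after waking up
		cond.Wait() // releases mu while sleeping, reacquires it before returning
	}
	return count
}

func main() {
	got := waitForVotes(10, 6)
	fmt.Println("votes counted when the wait ended:", got)
}
```

The returned count is at least needed but can be higher, since more voters may have run before the waiter woke up.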

sync.WaitGroup
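
A short sketch of waiting for a batch of goroutines with sync.WaitGroup (squareAll is a made-up example). Each goroutine writes only to its own slice slot, so no lock is needed here; Wait returns once every Done has been called.

```go
package main

import (
	"fmt"
	"sync"
)

// squareAll squares every element concurrently and returns once all
// goroutines have finished.
func squareAll(nums []int) []int {
	out := make([]int, len(nums))
	var wg sync.WaitGroup
	for i, n := range nums {
		wg.Add(1) // register the goroutine before starting it
		go func(i, n int) {
			defer wg.Done()
			out[i] = n * n // each goroutine owns a distinct index
		}(i, n)
	}
	wg.Wait() // blocks until the counter drops back to zero
	return out
}

func main() {
	fmt.Println(squareAll([]int{1, 2, 3})) // prints [1 4 9]
}
```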
