AWS

AWS is IaaS (Infrastructure as a Service), like Azure and Google Cloud: virtualized infrastructure used by large companies that need many different kinds of servers.

This is unlike DigitalOcean and Linode (VPS - virtual private server), which are more for building a WordPress blog or other small website that runs on a single server.

Services

  • CDN (CloudFront)
    Content delivery network; serves the website from the edge location closest to the user.
  • Glacier
    Stores data that is not accessed frequently (archival storage)
  • Storage
    Stores data that is accessed frequently
  • Virtual Server (EC2)
  • Lambda
    Pure compute without having to manage a server (see the handler sketch after this list).
  • Database
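
A minimal Lambda handler in Java, as a sketch of the "pure compute" idea above: you implement one method and AWS runs it on demand, so there is no server to manage. This assumes the aws-lambda-java-core library is on the classpath; the class name and greeting are made up for illustration.

  import com.amazonaws.services.lambda.runtime.Context;
  import com.amazonaws.services.lambda.runtime.RequestHandler;

  // Hypothetical handler: Lambda calls handleRequest() once per event,
  // so we only write the function body and never provision a server.
  public class HelloHandler implements RequestHandler<String, String> {
      @Override
      public String handleRequest(String input, Context context) {
          return "Hello, " + input;
      }
  }

You would package this as a jar, upload it to Lambda, and set the handler to HelloHandler; AWS provisions and scales the compute for you.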

Benefits

  • Scalable (just spend more money)
  • Total Cost of Ownership is low: running your own hardware means hiring people to deal with the servers and supporting systems, like power and cooling
  • Highly reliable for the price point
  • Centralized Billing and Management

Problems

  • vendor lock-in
  • learning curve
  • costs add up

Pricing

  • compute
  • storage
  • bandwidth
  • interaction

Normal File system

  • Linux default disk block size = 4 KB; if a file is smaller than a block, the rest of the block is wasted (e.g. a 1 KB file still occupies a full 4 KB block)
  • GFS (Google) <-> HDFS (open-source counterpart)
  • MapReduce (Google) <-> Hadoop MapReduce (open-source counterpart)

HDFS

  • Specially designed FS for storing big data with a streaming access pattern (write once, read as many times as you want)
  • default block size = 64 MB; if a file is smaller than a block, the rest of the block will NOT be wasted (see the rough numbers below)
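
Rough numbers to show why the bigger block matters for big data: a 1 GB file needs 1 GB / 4 KB = 262,144 blocks on a normal file system, but only 1 GB / 64 MB = 16 blocks on HDFS, so the name node tracks far less metadata per file and reads stay large and sequential.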

Hadoop

daemons

  • master daemons: name node, secondary name node, job tracker
  • slave daemons: data node, task tracker

example - theory

  • we (the client) have 200 MB of data, so we need 4 blocks (at 64 MB per block)
  • we need 1 name node (nn) and several data nodes (dn), e.g. 8 data nodes
  • the nn creates the metadata and the daemons
  • the nn passes the metadata back to the client; the client then distributes the blocks to the data nodes and replicates them based on the info from the name node (see the Java sketch after this list)
  • the data nodes send heartbeats back to the nn to signal that they are alive
  • the client sends the code to the data nodes
  • the job tracker tells the task trackers to do their jobs
  • after the map tasks are finished, the job tracker assigns a reducer
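
To make the metadata step concrete, here is a small Java sketch (illustration only, not from the lecture) that asks the name node where each block of a file lives, using the standard Hadoop FileSystem API. The path /user/class/input.txt reuses the example from the HDFS instructions below; the class name is made up.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.BlockLocation;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class ShowBlocks {
      public static void main(String[] args) throws Exception {
          // Connect to the name node configured in core-site.xml / hdfs-site.xml.
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);

          // Ask the name node for this file's metadata.
          Path p = new Path("/user/class/input.txt");
          FileStatus status = fs.getFileStatus(p);

          // For each block, the name node reports which data nodes hold a replica.
          for (BlockLocation b : fs.getFileBlockLocations(status, 0, status.getLen())) {
              System.out.println("offset " + b.getOffset()
                      + " length " + b.getLength()
                      + " hosts " + String.join(",", b.getHosts()));
          }
          fs.close();
      }
  }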

example - real world

  • split the data (documents) into input splits, pass them to record readers,
    and send the records to the mappers (the default for text jobs is to split the document into lines and send the lines to the mappers)
  • then shuffle the data so that pairs with the same key end up together; the default shuffle (sort) order in Hadoop is alphabetical
  • then reduce (each reduce call processes all the values for one key; see the WordCount sketch below)
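
The classic WordCount in Java makes this flow concrete. The sketch below follows the standard Hadoop example; the class name matches the WordCount run in step 5 of the HDFS instructions, but the package layout there is not given in these notes. The map step emits (word, 1) pairs, the framework shuffles and groups them by key, and each reduce call sums the counts for one word.

  import java.io.IOException;
  import java.util.StringTokenizer;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.Path;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapreduce.Job;
  import org.apache.hadoop.mapreduce.Mapper;
  import org.apache.hadoop.mapreduce.Reducer;
  import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
  import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

  public class WordCount {

      // Map: one call per input line; emit (word, 1) for every token.
      public static class TokenizerMapper
              extends Mapper<Object, Text, Text, IntWritable> {
          private static final IntWritable ONE = new IntWritable(1);
          private final Text word = new Text();

          @Override
          public void map(Object key, Text value, Context context)
                  throws IOException, InterruptedException {
              StringTokenizer itr = new StringTokenizer(value.toString());
              while (itr.hasMoreTokens()) {
                  word.set(itr.nextToken());
                  context.write(word, ONE);
              }
          }
      }

      // Reduce: the shuffle has grouped all (word, 1) pairs with the same key;
      // one reduce call sums the counts for that single word.
      public static class IntSumReducer
              extends Reducer<Text, IntWritable, Text, IntWritable> {
          private final IntWritable result = new IntWritable();

          @Override
          public void reduce(Text key, Iterable<IntWritable> values, Context context)
                  throws IOException, InterruptedException {
              int sum = 0;
              for (IntWritable v : values) {
                  sum += v.get();
              }
              result.set(sum);
              context.write(key, result);
          }
      }

      public static void main(String[] args) throws Exception {
          // args[0] = input path in HDFS, args[1] = output path (must not exist yet).
          Job job = Job.getInstance(new Configuration(), "word count");
          job.setJarByClass(WordCount.class);
          job.setMapperClass(TokenizerMapper.class);
          job.setReducerClass(IntSumReducer.class);
          job.setOutputKeyClass(Text.class);
          job.setOutputValueClass(IntWritable.class);
          FileInputFormat.addInputPath(job, new Path(args[0]));
          FileOutputFormat.setOutputPath(job, new Path(args[1]));
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }

Run it against the file uploaded in step 2 below, e.g. hadoop jar wordcount.jar WordCount /user/class/input.txt /user/class/output (the output directory must not exist yet).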

HDFS instructions

  • step 1 basic commands
    hdfs dfs -ls /, hdfs dfs -mkdir, hdfs dfs -put, hdfs dfs -get
  • step 2 move the file to HDFS (a Java equivalent is sketched after this list)
    hdfs dfs -put input.txt /user/class/
  • step 3 compile
    javac -cp $HADOOP_core.jar *.java
  • step 4 package the classes into a jar
    jar cvf test.jar *.class
  • step 5 run the job
    hadoop jar wordcount.jar ...WordCount
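
The shell commands above also have Java equivalents in the Hadoop FileSystem API. The sketch below is illustrative only (the class name is made up, and the paths reuse the step 2 example); it assumes the Hadoop client jars are on the classpath.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsPutGet {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);

          // hdfs dfs -mkdir /user/class
          fs.mkdirs(new Path("/user/class"));

          // hdfs dfs -put input.txt /user/class/
          fs.copyFromLocalFile(new Path("input.txt"), new Path("/user/class/input.txt"));

          // hdfs dfs -ls /
          for (FileStatus s : fs.listStatus(new Path("/"))) {
              System.out.println(s.getPath());
          }

          // hdfs dfs -get /user/class/input.txt copy.txt
          fs.copyToLocalFile(new Path("/user/class/input.txt"), new Path("copy.txt"));

          fs.close();
      }
  }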

Setup

Set up your AWS account by following the steps below:

  1. Go to AWS (https://aws.amazon.com/) and create an account. You need to enter your credit card info.
  2. You can find your AWS account number in your AWS profile. Use that account number to apply for AWS Educate credits at https://aws.amazon.com/education/awseducate/apply/. It will take a few hours before you receive an email confirming your credits are active.

If you have not received your AWS Educate credits and you are not using free-tier services, your credit card will be charged for usage, and you will be responsible for any costs incurred.
