Handson-3


515030910223
杨健邦

  • Overall architecture
Hostname      | ZooKeeper | HDFS                        | HBase                      | Spark
hadoop-master |           | NameNode, SecondaryNameNode |                            |
hadoop-slave  |           | DataNode                    |                            |
hbase-master  | YES       |                             | HMaster                    |
hbase-region1 | YES       |                             | RegionServer, BackupMaster |
hbase-region2 | YES       |                             | RegionServer, BackupMaster |
spark-master  |           |                             |                            | Master
spark-worker1 |           |                             |                            | Worker
spark-worker2 |           |                             |                            | Worker
spark-worker3 |           |                             |                            | Worker
spark-worker4 |           |                             |                            | Worker

Question 1:
After configuring HDFS, please type "jps" in the shell. What is the result for the two containers?

  • Master Node
root@hadoop-master:~# jps
516 ResourceManager
786 Jps
164 NameNode
363 SecondaryNameNode
  • Slave Node
root@hadoop-slave:~# jps
69 DataNode
310 Jps
181 NodeManager
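
For context, these daemons are brought up with Hadoop's standard start scripts on hadoop-master (a sketch; it assumes HADOOP_HOME points at the Hadoop installation):

$HADOOP_HOME/sbin/start-dfs.sh    # starts NameNode, SecondaryNameNode and the DataNodes listed in the slaves file
$HADOOP_HOME/sbin/start-yarn.sh   # starts ResourceManager and the NodeManagers
jps                               # lists the JVM processes now running on each container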

Question 2:

If you use a standalone ZooKeeper service, after setting it up, type

bin/zkServer.sh status

Is there any difference among the outputs from the containers? If so, what's the difference?

root@hbase-region1:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
root@hbase-region2:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader
root@hbase-master:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
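
The outputs differ only in the Mode line: exactly one node (hbase-region2 here) is elected leader, while the other two run as followers. The three nodes form one replicated ensemble because every node lists all three servers in its zoo.cfg. A minimal sketch of writing such a config (the conf path matches the transcript above; dataDir and the port numbers are assumptions, and each host additionally needs a myid file under dataDir containing its own server id):

cat > /usr/local/zookeeper/conf/zoo.cfg <<'EOF'
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/usr/local/zookeeper/data
clientPort=2181
server.1=hbase-master:2888:3888
server.2=hbase-region1:2888:3888
server.3=hbase-region2:2888:3888
EOF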

Question 3:
After killing the primary master, the backup master should be elected as a new primary. Please read the logs of ZooKeeper and HBase and describe what actually happens.

  • HBase log on hbase-region1
2017-12-22 08:09:15,911 INFO  [hbase-region1:16000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, hbase-master,16000,1513930144374; waiting to become the next active master
2017-12-22 08:10:10,039 INFO  [hbase-region1:16000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, hbase-region2,16000,1513930151343; waiting to become the next active master
  • HBase log on hbase-region2
2017-12-22 08:09:15,963 INFO  [hbase-region2:16000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, hbase-master,16000,1513930144374; waiting to become the next active master
2017-12-22 08:10:10,046 INFO  [hbase-region2:16000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=hbase-region2,16000,1513930151343

At first, hbase-master is the primary and the two hbase-region nodes serve as backup masters; each of the three runs a ZooKeeper process, and the ensemble maintains agreement on who the primary is. The two backups keep watching for the active primary and simply wait as long as one exists. Once the primary can no longer be reached, the ActiveMasterManager on hbase-region2 registers itself as the new active master through ZooKeeper (the second log line above), and the cluster continues to work normally.
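
The election state can be checked directly in ZooKeeper with the stock zkCli.sh client (a sketch; /hbase is HBase's default znode root, and the server address is an assumption):

zkCli.sh -server hbase-master:2181 get /hbase/master          # ephemeral znode naming the active master
zkCli.sh -server hbase-master:2181 ls /hbase/backup-masters   # znodes registered by the waiting backups

When the active master dies, its ephemeral /hbase/master znode disappears, which is what wakes the waiting backups and triggers the re-election seen in the logs.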

Question 4:
Can you find where the data is stored in HDFS? Please answer it in detail by describing the files within the directories related to HBase.

The relevant excerpt from hbase-site.xml:

  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://hadoop-master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>hbase-master,hbase-region1,hbase-region2</value>
  </property>

This configuration stores HBase's data in a directory named /hbase at the root of HDFS.

root@hadoop-master:~# hadoop fs -ls /
Found 1 items
drwxr-xr-x   - root supergroup          0 2017-12-22 08:26 /hbase
root@hadoop-master:~# hadoop fs -ls /hbase
Found 8 items
drwxr-xr-x   - root supergroup          0 2017-12-22 08:26 /hbase/.tmp
drwxr-xr-x   - root supergroup          0 2017-12-22 08:26 /hbase/MasterProcWALs
drwxr-xr-x   - root supergroup          0 2017-12-22 08:26 /hbase/WALs
drwxr-xr-x   - root supergroup          0 2017-12-22 08:26 /hbase/corrupt
drwxr-xr-x   - root supergroup          0 2017-12-22 07:48 /hbase/data
-rw-r--r--   3 root supergroup         42 2017-12-22 07:48 /hbase/hbase.id
-rw-r--r--   3 root supergroup          7 2017-12-22 07:48 /hbase/hbase.version
drwxr-xr-x   - root supergroup          0 2017-12-22 08:26 /hbase/oldWALs

Each table is stored under /hbase/data in a directory named after the table; tables created without an explicit namespace go under /hbase/data/default.
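
For example, the on-disk layout of a single table can be inspected with a recursive listing (a sketch, assuming the table test used in Question 5; each region directory contains one subdirectory per column family, which in turn holds the HFiles):

hadoop fs -ls -R /hbase/data/default/test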

Question 5:
What parameters need to be configured to run the code above? Please list them all.

The HBase client configuration has to be loaded, and the input table has to be named:

conf.addResource("hbase-site.xml")
conf.set(TableInputFormat.INPUT_TABLE, "test")

Here test is the name of the HBase table that TableInputFormat will scan.

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object WorkCount {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("Spark_hbase").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)
    // Load hbase-site.xml (ZooKeeper quorum, HBase root dir) and name the table to scan.
    val conf = HBaseConfiguration.create()
    conf.addResource("hbase-site.xml")
    conf.set(TableInputFormat.INPUT_TABLE, "test")
    // Read the table as an RDD of (row key, Result) pairs via TableInputFormat.
    val usersRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[org.apache.hadoop.hbase.io.ImmutableBytesWritable],
      classOf[org.apache.hadoop.hbase.client.Result])
    val count = usersRDD.count()
    println("Temp RDD count:" + count)
  }
}
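
One way to run it (a sketch: spark-hbase-example.jar is a placeholder for the assembled jar, and the HBase client jars, assumed to live under /usr/local/hbase/lib, must be put on the classpath via --jars):

spark-submit \
  --class WorkCount \
  --jars "$(echo /usr/local/hbase/lib/*.jar | tr ' ' ',')" \
  spark-hbase-example.jar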
