Hands-on 3
515030910223
杨健邦
- Overall cluster layout
Hostname | ZooKeeper | HDFS | HBase | Spark
---|---|---|---|---
hadoop-master | | NameNode, SecondaryNameNode | |
hadoop-slave | | DataNode | |
hbase-master | YES | | HMaster |
hbase-region1 | YES | | RegionServer, BackupMaster |
hbase-region2 | YES | | RegionServer, BackupMaster |
spark-master | | | | Master
spark-worker1 | | | | Worker
spark-worker2 | | | | Worker
spark-worker3 | | | | Worker
spark-worker4 | | | | Worker
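All containers reach each other by hostname, so every hostname above must resolve on every container, e.g. via `/etc/hosts` or the Docker network's DNS. A hypothetical `/etc/hosts` fragment (the 172.18.0.x addresses are made up for illustration; the real ones depend on the Docker network):

```
172.18.0.2  hadoop-master
172.18.0.3  hadoop-slave
172.18.0.4  hbase-master
172.18.0.5  hbase-region1
172.18.0.6  hbase-region2
172.18.0.7  spark-master
172.18.0.8  spark-worker1
172.18.0.9  spark-worker2
172.18.0.10 spark-worker3
172.18.0.11 spark-worker4
```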
Question 1:
After configuring HDFS, type `jps` in the shell. What is the result for the two containers?
- Master node:

```
root@hadoop-master:~# jps
516 ResourceManager
786 Jps
164 NameNode
363 SecondaryNameNode
```

- Slave node:

```
root@hadoop-slave:~# jps
69 DataNode
310 Jps
181 NodeManager
```
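NameNode and SecondaryNameNode (on the master) and DataNode (on the slave) are the HDFS daemons; ResourceManager and NodeManager are YARN daemons started alongside HDFS, and Jps is just the listing tool itself.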
Question 2:
If you use a standalone ZooKeeper service, after setting it up, type `bin/zkServer.sh status`. Is there any difference among the outputs from the containers? If so, what is the difference?
```
root@hbase-region1:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower

root@hbase-region2:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: leader

root@hbase-master:~# zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /usr/local/zookeeper/bin/../conf/zoo.cfg
Mode: follower
```
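The outputs differ only in the `Mode` line: hbase-region2 reports `leader`, while hbase-master and hbase-region1 report `follower`. A ZooKeeper ensemble elects exactly one leader, which serializes all writes; the followers serve reads and forward write requests to the leader. Which node wins the election depends on startup order and server ids, so the leader may differ between runs.

For reference, a minimal `conf/zoo.cfg` sketch for this three-node ensemble (ports, timeouts, and `dataDir` are the usual defaults, not taken from this report):

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
clientPort=2181
server.1=hbase-master:2888:3888
server.2=hbase-region1:2888:3888
server.3=hbase-region2:2888:3888
```

Each container additionally needs a `myid` file under `dataDir` containing its own server number (1, 2, or 3).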
Question 3:
After killing the primary master, the backup master should be elected as the new primary. Please read the logs in ZooKeeper and HBase and describe what actually happens.
- HBase log on hbase-region1:

```
2017-12-22 08:09:15,911 INFO [hbase-region1:16000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, hbase-master,16000,1513930144374; waiting to become the next active master
2017-12-22 08:10:10,039 INFO [hbase-region1:16000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, hbase-region2,16000,1513930151343; waiting to become the next active master
```
- HBase log on hbase-region2:

```
2017-12-22 08:09:15,963 INFO [hbase-region2:16000.activeMasterManager] master.ActiveMasterManager: Another master is the active master, hbase-master,16000,1513930144374; waiting to become the next active master
2017-12-22 08:10:10,046 INFO [hbase-region2:16000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=hbase-region2,16000,1513930151343
```
Initially, the hbase-master node is the primary and the hbase-region nodes act as backup masters; every node runs a ZooKeeper process, and the ensemble maintains consensus on who the primary is. The two backups keep watching for an active primary and simply wait as long as one exists. When the active master can no longer be reached, hbase-region2's ActiveMasterManager registers itself as the new active master through ZooKeeper, and the cluster continues to work normally.
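The mechanism behind this is ZooKeeper's ephemeral znodes: every master races to create the same znode, the winner becomes active, and the losers set a watch and compete again when the node disappears. Below is a minimal sketch of this pattern in Scala using the plain ZooKeeper Java client; the znode path, quorum string, and server name are illustrative, and HBase's real ActiveMasterManager is more involved:

```scala
import org.apache.zookeeper._
import org.apache.zookeeper.ZooDefs.Ids

// Sketch of the leader-election pattern HBase masters use on ZooKeeper.
object MasterElectionSketch {
  val masterZnode = "/hbase/master" // illustrative; the real layout varies by version

  def main(args: Array[String]): Unit = {
    val zk = new ZooKeeper(
      "hbase-master:2181,hbase-region1:2181,hbase-region2:2181", 30000,
      new Watcher { def process(e: WatchedEvent): Unit = () })
    tryToBecomeActive(zk, "hbase-region2,16000")
    Thread.sleep(Long.MaxValue) // keep the session (and thus the ephemeral znode) alive
  }

  def tryToBecomeActive(zk: ZooKeeper, myName: String): Unit =
    try {
      // EPHEMERAL: the znode is deleted automatically when our session dies,
      // which is exactly how the backups detect a killed master.
      zk.create(masterZnode, myName.getBytes("UTF-8"),
        Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL)
      println(s"Registered Active Master=$myName")
    } catch {
      case _: KeeperException.NodeExistsException =>
        println("Another master is the active master; waiting to become the next active master")
        // Watch the znode; when it is deleted, compete for mastership again.
        zk.exists(masterZnode, new Watcher {
          def process(e: WatchedEvent): Unit =
            if (e.getType == Watcher.Event.EventType.NodeDeleted)
              tryToBecomeActive(zk, myName)
        })
    }
}
```

This matches the log lines above: the losers print "waiting to become the next active master", and the eventual winner logs "Registered Active Master=...".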
Question 4:
Can you find where the data is stored in HDFS? Please answer in detail by describing the files within the directories related to HBase.

The relevant settings in `hbase-site.xml`:
```xml
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://hadoop-master:9000/hbase</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>hbase-master,hbase-region1,hbase-region2</value>
</property>
```
These settings place all HBase data under a directory named `/hbase` at the root of HDFS (`hbase.rootdir`):
```
root@hadoop-master:~# hadoop fs -ls /
Found 1 items
drwxr-xr-x   - root supergroup   0 2017-12-22 08:26 /hbase

root@hadoop-master:~# hadoop fs -ls /hbase
Found 8 items
drwxr-xr-x   - root supergroup   0 2017-12-22 08:26 /hbase/.tmp
drwxr-xr-x   - root supergroup   0 2017-12-22 08:26 /hbase/MasterProcWALs
drwxr-xr-x   - root supergroup   0 2017-12-22 08:26 /hbase/WALs
drwxr-xr-x   - root supergroup   0 2017-12-22 08:26 /hbase/corrupt
drwxr-xr-x   - root supergroup   0 2017-12-22 07:48 /hbase/data
-rw-r--r--   3 root supergroup  42 2017-12-22 07:48 /hbase/hbase.id
-rw-r--r--   3 root supergroup   7 2017-12-22 07:48 /hbase/hbase.version
drwxr-xr-x   - root supergroup   0 2017-12-22 08:26 /hbase/oldWALs
```
Each table is stored as a directory named after the table under `/hbase/data`; if no namespace is specified, it is placed in `/hbase/data/default`.
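Based on the standard HBase layout (directory purposes from the HBase documentation, not from this cluster's logs), the entries are:

- `/hbase/data`: the table data itself, organized as `namespace/table/region/column-family/HFile`; system tables such as `hbase:meta` live under `/hbase/data/hbase`.
- `/hbase/WALs`: per-RegionServer write-ahead logs for edits not yet flushed to HFiles.
- `/hbase/oldWALs`: WALs no longer needed for recovery, kept until cleanup.
- `/hbase/MasterProcWALs`: write-ahead logs for master procedures (table creation, etc.).
- `/hbase/corrupt`: quarantine area for corrupt WALs or HFiles.
- `/hbase/.tmp`: scratch space for temporary files.
- `/hbase/hbase.id`: the unique id of this cluster.
- `/hbase/hbase.version`: the version of the HBase filesystem layout.

One can drill down further, for example:

```
root@hadoop-master:~# hadoop fs -ls /hbase/data/default   # user tables
root@hadoop-master:~# hadoop fs -ls /hbase/data/hbase     # system tables: meta, namespace
```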
Question 5:
What parameters need to be configured to run the code above? Please list them all.
Two parameters must be configured: the HBase client configuration and the input table.

```scala
conf.addResource("hbase-site.xml")              // load the cluster's HBase settings
conf.set(TableInputFormat.INPUT_TABLE, "test")  // "test" is the name of the table to read
```

`hbase-site.xml` supplies `hbase.zookeeper.quorum` (and the other settings shown in Question 4) so the client can locate the cluster, and `TableInputFormat.INPUT_TABLE` selects the table to scan.
```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat

object WorkCount {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("Spark_hbase").setMaster("local[2]")
    val sc = new SparkContext(sparkConf)

    val conf = HBaseConfiguration.create()
    conf.addResource("hbase-site.xml")              // HBase cluster settings
    conf.set(TableInputFormat.INPUT_TABLE, "test")  // table to scan

    // Each record is a (row key, row contents) pair.
    val usersRDD = sc.newAPIHadoopRDD(conf, classOf[TableInputFormat],
      classOf[ImmutableBytesWritable], classOf[Result])
    val count = usersRDD.count()
    println("Temp RDD count:" + count)
    sc.stop()
  }
}
```
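To compile and run this, the Spark and HBase client jars must also be on the classpath, together with `hbase-site.xml`. A hypothetical `build.sbt` fragment (version numbers are illustrative, not taken from this report):

```scala
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"   % "2.2.0" % "provided",
  "org.apache.hbase"  % "hbase-client" % "1.2.6",
  // TableInputFormat ships in hbase-server for HBase 1.x
  "org.apache.hbase"  % "hbase-server" % "1.2.6"
)
```

When running on the cluster via spark-submit instead of `local[2]`, the same jars and `hbase-site.xml` have to reach the executors, e.g. through `--jars` and `--files`.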