

  • CentOS 7.2
  • CDH 5.10
  • Kudu 1.2

3.1 Kudu Installation

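  • Starting with CDH 5.10, Cloudera ships Kudu 1.2 for CDH and officially supports it.
    • From this version on, installing Kudu is much simpler than before: the separate Impala_Kudu package is no longer needed, and once Kudu is installed, Impala can work with Kudu tables directly.
  • The installation steps below assume that Kudu 1.2 is installed and deployed with Cloudera Manager.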

3.1.1 Install the CSD File

  • Download the CSD file
[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/csd/KUDU-5.10.0.jar
  • Move the downloaded jar file to the /opt/cloudera/csd directory
[root@ip-172-31-2-159 ~]# mv KUDU-5.10.0.jar /opt/cloudera/csd
  • Change the file's owner and permissions
[root@ip-172-31-2-159 ~]# chown cloudera-scm:cloudera-scm /opt/cloudera/csd/KUDU-5.10.0.jar
[root@ip-172-31-2-159 ~]# chmod 644 /opt/cloudera/csd/KUDU-5.10.0.jar
  • Restart the Cloudera Manager Server
[root@ip-172-31-2-159 ~]# systemctl restart cloudera-scm-server
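Before restarting Cloudera Manager, the owner and mode of the CSD jar can be double-checked from the shell. A minimal sketch, assuming GNU coreutils `stat` as shipped with CentOS 7; the helper name `check_csd` is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: check that a CSD jar has the expected owner, group and
# mode (defaults: cloudera-scm:cloudera-scm and 644, as set by the steps above).
check_csd() {
  local jar="$1"
  local want_owner="${2:-cloudera-scm:cloudera-scm}"
  local want_mode="${3:-644}"
  local actual
  # GNU stat: %U = owner name, %G = group name, %a = octal permission bits
  actual="$(stat -c '%U:%G %a' "$jar")" || return 1
  if [[ "$actual" == "$want_owner $want_mode" ]]; then
    echo "OK: $jar ($actual)"
  else
    echo "MISMATCH: $jar is '$actual', expected '$want_owner $want_mode'" >&2
    return 1
  fi
}

# Example:
# check_csd /opt/cloudera/csd/KUDU-5.10.0.jar
```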

3.1.2 Install the Kudu Service

  • Download the Parcel files required by the Kudu service
[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel
[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel.sha1
[root@ip-172-31-2-159 ~]# wget http://archive.cloudera.com/kudu/parcels/5.10/manifest.json
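The `.sha1` file downloaded above can be used to check the parcel's integrity before publishing it. A minimal sketch, assuming the `.sha1` file holds the hex SHA-1 digest as its first whitespace-separated field; `verify_parcel` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a parcel's SHA-1 with the digest in its .sha1 file.
verify_parcel() {
  local parcel="$1"
  local sha_file="${2:-$1.sha1}"
  local expected actual
  # Assumption: the digest is the first whitespace-separated field of the .sha1 file.
  expected="$(awk '{print $1; exit}' "$sha_file")" || return 1
  if [[ -z "$expected" ]]; then
    echo "no digest found in $sha_file" >&2
    return 1
  fi
  actual="$(sha1sum "$parcel" | awk '{print $1}')"
  # Compare case-insensitively in case the digest is stored in upper case.
  if [[ "${actual,,}" == "${expected,,}" ]]; then
    echo "OK: $parcel"
  else
    echo "SHA-1 mismatch for $parcel: expected $expected, got $actual" >&2
    return 1
  fi
}

# Example:
# verify_parcel KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel
```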
  • Publish the Kudu Parcel through an HTTP server
[root@ip-172-31-2-159 ~]# mkdir kudu1.2
[root@ip-172-31-2-159 ~]# mv KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel* kudu1.2/
[root@ip-172-31-2-159 ~]# mv manifest.json kudu1.2
[root@ip-172-31-2-159 ~]# mv kudu1.2/ /var/www/html/
[root@ip-172-31-2-159 ~]# systemctl start httpd
  • Check that the HTTP server serves the Kudu files correctly
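Instead of checking in a browser, the repository can also be checked from the command line. A sketch, assuming `curl` is installed, that the repository URL is `http://ip-172-31-2-159/kudu1.2` (httpd's default document root `/var/www/html` on this host), and that `manifest.json` lists the parcel by its file name, as Cloudera manifests do under `parcelName`; `check_parcel_repo` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: confirm that manifest.json and the parcel are reachable
# under the repository URL that will be entered in Cloudera Manager.
check_parcel_repo() {
  local base="${1%/}"   # strip a trailing slash, if any
  local parcel="$2"
  # -f: fail on HTTP errors, -s: silent, -S: still show errors
  curl -fsS "$base/manifest.json" | grep -qF "\"$parcel\"" || {
    echo "manifest.json at $base does not list $parcel" >&2
    return 1
  }
  # -I: fetch headers only, so the large parcel itself is not downloaded
  curl -fsSI "$base/$parcel" > /dev/null || return 1
  echo "OK: $base serves $parcel"
}

# Example (assumed URL):
# check_parcel_repo http://ip-172-31-2-159/kudu1.2 KUDU-1.2.0-1.cdh5.10.1.p0.66-el7.parcel
```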

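  • In the CM UI, add the Kudu Parcel repository URL, then download, distribute, and activate the Kudu Parcel.

  • Install Kudu 1.2 through CM
    • Add the Kudu service

    • Assign the Master and Tablet Server roles

    • Configure the corresponding directories. Note: depending on the available disks, both the Master and the Tablet Server can have several data directories (fs_data_dirs); spreading them over multiple disks allows more concurrent reads and writes and therefore improves Kudu performance.

    • Start the Kudu service

    • Installation complete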
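For reference, Kudu reads its data directories from the `--fs_data_dirs` flag as one comma-separated list (the write-ahead log location is set separately with `--fs_wal_dir`). A minimal sketch of building that list from one directory per disk; the helper `kudu_data_dirs`, the mount points `/data/disk1`, `/data/disk2`, and the `kudu/tserver` subdirectory are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: join one Kudu directory per disk into the
# comma-separated form used by --fs_data_dirs.
# Assumption: each argument after the first is a disk mount point.
kudu_data_dirs() {
  local sub="$1"; shift          # e.g. kudu/tserver (hypothetical layout)
  local dirs=() mnt
  for mnt in "$@"; do
    dirs+=("${mnt%/}/$sub")      # drop a trailing slash on the mount point
  done
  local IFS=,                    # "${dirs[*]}" joins with the first char of IFS
  echo "${dirs[*]}"
}

# Example:
# kudu_data_dirs kudu/tserver /data/disk1 /data/disk2 /data/disk3
# prints /data/disk1/kudu/tserver,/data/disk2/kudu/tserver,/data/disk3/kudu/tserver
```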

3.1.3 Configure Impala


3.2 Quick Verification of Component Services

3.2.1 HDFS Verification (mkdir + put + cat + get)

[root@ip-172-31-2-159 ~]# hadoop fs -mkdir -p /lilei/test_table
[root@ip-172-31-2-159 ~]# cat > a.txt
[root@ip-172-31-2-159 ~]# hadoop fs -put a.txt /lilei/test_table
[root@ip-172-31-2-159 ~]# hadoop fs -cat /lilei/test_table/a.txt
[root@ip-172-31-2-159 ~]# rm -rf a.txt
[root@ip-172-31-2-159 ~]# hadoop fs -get /lilei/test_table/a.txt
[root@ip-172-31-2-159 ~]# cat a.txt
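The `cat > a.txt` step reads the file contents from the keyboard (ended with Ctrl-D), so the transcript does not show what was written. The same round trip can be scripted. A minimal sketch: `hdfs_smoke_test` is a hypothetical helper, and the two rows `1#2` and `c#d` are an assumption inferred from the `#`-delimited Hive table and the query results that follow:

```shell
#!/usr/bin/env bash
# Hypothetical non-interactive version of the HDFS check above.
# Assumption: the sample rows "1#2" and "c#d" match the data that the
# Hive and Impala queries return.
hdfs_smoke_test() {
  local dir="${1:-/lilei/test_table}"
  local tmp
  tmp="$(mktemp -d)" || return 1
  printf '1#2\nc#d\n' > "$tmp/a.txt"

  hadoop fs -mkdir -p "$dir" &&
    hadoop fs -put -f "$tmp/a.txt" "$dir" &&   # -f overwrites an existing a.txt
    hadoop fs -cat "$dir/a.txt" &&
    hadoop fs -get "$dir/a.txt" "$tmp/a.copy" || { rm -rf "$tmp"; return 1; }

  # The copy fetched back from HDFS must be byte-identical to the original.
  if cmp -s "$tmp/a.txt" "$tmp/a.copy"; then
    echo "HDFS round trip OK"
    rm -rf "$tmp"
  else
    echo "HDFS round trip FAILED" >&2
    rm -rf "$tmp"
    return 1
  fi
}

# Example: hdfs_smoke_test /lilei/test_table
```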

3.2.2 Hive Verification

[root@ip-172-31-2-159 ~]# hive

Logging initialized using configuration in jar:file:/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/jars/hive-common-1.1.0-cdh5.10.0.jar!/hive-log4j.properties
WARNING: Hive CLI is deprecated and migration to Beeline is recommended.
hive> create external table test_table
    > (
    > s1  string,
    > s2  string
    > )
    > row format delimited fields terminated by '#'
    > stored as textfile location '/lilei/test_table';
Time taken: 0.631 seconds
hive> select * from test_table;
1   2
c   d
Time taken: 0.36 seconds, Fetched: 2 row(s)
hive> select count(*) from test_table;
Query ID = root_20170404013939_69844998-4456-4bc1-9da5-53ea91342e43
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1491283979906_0005, Tracking URL = http://ip-172-31-2-159:8088/proxy/application_1491283979906_0005/
Kill Command = /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/bin/hadoop job -kill job_1491283979906_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2017-04-04 01:39:25,425 Stage-1 map = 0%, reduce = 0%
2017-04-04 01:39:31,689 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.02 sec
2017-04-04 01:39:36,851 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.34 sec
MapReduce Total cumulative CPU time: 2 seconds 340 msec
Ended Job = job_1491283979906_0005
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.34 sec   HDFS Read: 6501 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 340 msec
Time taken: 21.56 seconds, Fetched: 1 row(s)
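Since the Hive CLI warns that it is deprecated in favor of Beeline, the same count can also be checked through HiveServer2. A sketch, assuming HiveServer2 runs on `ip-172-31-2-159` on its default port 10000 without Kerberos; `hive_count` and the `HS2_URL` variable are hypothetical, and the exact console output of Beeline may vary between versions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "select count(*)" on a table through Beeline and
# print only the result value.
# Assumption: HiveServer2 is reachable at HS2_URL without Kerberos.
HS2_URL="${HS2_URL:-jdbc:hive2://ip-172-31-2-159:10000/default}"

hive_count() {
  local table="$1"
  # --silent, --showHeader=false and tsv2 keep the output to the bare value
  beeline -u "$HS2_URL" -n "$(id -un)" --silent=true --showHeader=false \
    --outputformat=tsv2 -e "select count(*) from ${table};"
}

# Example: the two-row test file should give 2
# [[ "$(hive_count test_table)" == "2" ]] && echo "Hive OK"
```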

3.2.3 MapReduce Verification

[root@ip-172-31-2-159 ~]# hadoop jar /opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/hadoop-examples.jar pi 5 5
Number of Maps   = 5
Samples per Map = 5
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Starting Job
17/04/04 01:38:15 INFO client.RMProxy: Connecting to ResourceManager at ip-172-31-2-159/
17/04/04 01:38:15 INFO mapreduce.JobSubmissionFiles: Permissions on staging directory /user/root/.staging are incorrect: rwxrwxrwx. Fixing permissions to correct value rwx------
17/04/04 01:38:15 INFO input.FileInputFormat: Total input paths to process : 5
17/04/04 01:38:15 INFO mapreduce.JobSubmitter: number of splits:5
17/04/04 01:38:15 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491283979906_0004
17/04/04 01:38:16 INFO impl.YarnClientImpl: Submitted application application_1491283979906_0004
17/04/04 01:38:16 INFO mapreduce.Job: The url to track the job: http://ip-172-31-2-159:8088/proxy/application_1491283979906_0004/
17/04/04 01:38:16 INFO mapreduce.Job: Running job: job_1491283979906_0004
17/04/04 01:38:21 INFO mapreduce.Job: Job job_1491283979906_0004 running in uber mode : false
17/04/04 01:38:21 INFO mapreduce.Job:  map 0% reduce 0%
17/04/04 01:38:26 INFO mapreduce.Job:  map 100% reduce 0%
17/04/04 01:38:32 INFO mapreduce.Job:  map 100% reduce 100%
17/04/04 01:38:32 INFO mapreduce.Job: Job job_1491283979906_0004 completed successfully
17/04/04 01:38:32 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=64
        FILE: Number of bytes written=749758
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=1350
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=23
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=5
        Launched reduce tasks=1
        Data-local map tasks=5
        Total time spent by all maps in occupied slots (ms)=16111
        Total time spent by all reduces in occupied slots (ms)=2872
        Total time spent by all map tasks (ms)=16111
        Total time spent by all reduce tasks (ms)=2872
        Total vcore-seconds taken by all map tasks=16111
        Total vcore-seconds taken by all reduce tasks=2872
        Total megabyte-seconds taken by all map tasks=16497664
        Total megabyte-seconds taken by all reduce tasks=2940928
    Map-Reduce Framework
        Map input records=5
        Map output records=10
        Map output bytes=90
        Map output materialized bytes=167
        Input split bytes=760
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=167
        Reduce input records=10
        Reduce output records=0
        Spilled Records=20
        Shuffled Maps =5
        Failed Shuffles=0
        Merged Map outputs=5
        GC time elapsed (ms)=213
        CPU time spent (ms)=3320
        Physical memory (bytes) snapshot=2817884160
        Virtual memory (bytes) snapshot=9621606400
        Total committed heap usage (bytes)=2991587328
    Shuffle Errors
    File Input Format Counters
        Bytes Read=590
    File Output Format Counters
        Bytes Written=97
Job Finished in 17.145 seconds
Estimated value of Pi is 3.68000000000000000000
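The rough estimate is expected with so few samples. Hadoop's `pi` example counts how many of its quasi-random points in the unit square fall inside the inscribed circle and multiplies that fraction by 4. With 5 maps and 5 samples per map there are only 25 points, so the printed value corresponds to 23 points inside the circle:

```latex
\hat{\pi} = 4 \cdot \frac{N_{\text{inside}}}{N_{\text{maps}} \times N_{\text{samples}}}
          = 4 \cdot \frac{23}{5 \times 5}
          = \frac{92}{25}
          = 3.68
```

Larger arguments (for example `pi 10 1000`) give a closer estimate at the cost of a longer job; for a smoke test, only the successful job completion matters.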

3.2.4 Impala Verification

[root@ip-172-31-2-159 ~]# impala-shell -i ip-172-31-7-96
Starting Impala Shell without Kerberos authentication
Connected to ip-172-31-7-96:21000
Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)
Welcome to the Impala shell.
(Impala Shell v2.7.0-cdh5.10.0 (785a073) built on Fri Jan 20 12:03:56 PST 2017)

Run the PROFILE command after a query has finished to see a comprehensive summary
of all the performance and diagnostic information that Impala gathered for that
query. Be warned, it can be very long!
[ip-172-31-7-96:21000] > show tables;
Query: show tables
+------------+
| name       |
+------------+
| test_table |
+------------+
Fetched 1 row(s) in 0.20s
[ip-172-31-7-96:21000] > select * from test_table;
Query: select * from test_table
Query submitted at: 2017-04-04 01:41:56 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=c4a06bd46f9106b:4a69f04800000000
+----+----+
| s1 | s2 |
+----+----+
| 1  | 2  |
| c  | d  |
+----+----+
Fetched 2 row(s) in 3.73s
[ip-172-31-7-96:21000] > select count(*) from test_table;
Query: select count(*) from test_table
Query submitted at: 2017-04-04 01:42:06 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=2a415724696f7414:1f9113ea00000000
+----------+
| count(*) |
+----------+
| 2        |
+----------+
Fetched 1 row(s) in 0.15s
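The same Impala check can be scripted with `impala-shell`'s non-interactive options: `-q` runs a single query and exits, `-B` prints plain delimited rows without table borders, and `--quiet` suppresses informational messages. A sketch; `impala_count` and the `IMPALAD` variable are hypothetical, and the daemon host `ip-172-31-7-96` is taken from the session above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the result of "select count(*)" for a table.
IMPALAD="${IMPALAD:-ip-172-31-7-96}"

impala_count() {
  local table="$1"
  # -q: single query, -B: no table borders, --quiet: no informational messages
  impala-shell -i "$IMPALAD" --quiet -B -q "select count(*) from ${table}"
}

# Example: the HDFS-backed test table should hold two rows
# [[ "$(impala_count test_table)" == "2" ]] && echo "Impala OK"
```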

3.2.5 Spark Verification

[root@ip-172-31-2-159 ~]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_67)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, app id = application_1491283979906_0006).
17/04/04 01:43:26 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0
17/04/04 01:43:27 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
SQL context available as sqlContext.

scala> val textFile = sc.textFile("hdfs://ip-172-31-2-159:8020/lilei/test_table/a.txt")
textFile: org.apache.spark.rdd.RDD[String] = hdfs://ip-172-31-2-159:8020/lilei/test_table/a.txt MapPartitionsRDD[1] at textFile at <console>:27

scala> textFile.count()
res0: Long = 2

3.2.6 Kudu Verification

[root@ip-172-31-2-159 ~]# impala-shell -i ip-172-31-7-96
Starting Impala Shell without Kerberos authentication
Connected to ip-172-31-7-96:21000
Server version: impalad version 2.7.0-cdh5.10.0 RELEASE (build 785a073cd07e2540d521ecebb8b38161ccbd2aa2)
Welcome to the Impala shell.
(Impala Shell v2.7.0-cdh5.10.0 (785a073) built on Fri Jan 20 12:03:56 PST 2017)

Every command must be terminated by a ';'.
[ip-172-31-7-96:21000] > CREATE TABLE my_first_table
                       > (
                       >   id BIGINT,
                       >   name STRING,
                       >   PRIMARY KEY(id)
                       > )
                       > PARTITION BY HASH PARTITIONS 16
                       > STORED AS KUDU;
Query: create TABLE my_first_table
(
  id BIGINT,
  name STRING,
  PRIMARY KEY(id)
)
PARTITION BY HASH PARTITIONS 16
STORED AS KUDU

Fetched 0 row(s) in 1.35s
[ip-172-31-7-96:21000] > INSERT INTO my_first_table VALUES (99, "sarah");
Query: insert INTO my_first_table VALUES (99, "sarah")
Query submitted at: 2017-04-04 01:46:08 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=824ce0b3765c6b91:5ea8dd7c00000000
Modified 1 row(s), 0 row error(s) in 3.37s
[ip-172-31-7-96:21000] > INSERT INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim");
Query: insert INTO my_first_table VALUES (1, "john"), (2, "jane"), (3, "jim")
Query submitted at: 2017-04-04 01:46:13 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=a645259c3b8ae7cd:e446e15500000000
Modified 3 row(s), 0 row error(s) in 0.11s
[ip-172-31-7-96:21000] > select * from my_first_table;
Query: select * from my_first_table
Query submitted at: 2017-04-04 01:46:19 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=f44021589ff0d94d:8d30568200000000
+----+-------+
| id | name  |
+----+-------+
| 2  | jane  |
| 3  | jim   |
| 1  | john  |
| 99 | sarah |
+----+-------+
Fetched 4 row(s) in 0.55s
[ip-172-31-7-96:21000] > delete from my_first_table where id =99;
Query: delete from my_first_table where id =99
Query submitted at: 2017-04-04 01:46:56 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=814090b100fdf0b4:1b516fe400000000
Modified 1 row(s), 0 row error(s) in 0.15s
[ip-172-31-7-96:21000] > select * from my_first_table;
Query: select * from my_first_table
Query submitted at: 2017-04-04 01:46:57 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=724aa3f84cedb109:a679bf0200000000
+----+------+
| id | name |
+----+------+
| 2  | jane |
| 3  | jim  |
| 1  | john |
+----+------+
Fetched 3 row(s) in 0.15s
[ip-172-31-7-96:21000] > INSERT INTO my_first_table VALUES (99, "sarah");
Query: insert INTO my_first_table VALUES (99, "sarah")
Query submitted at: 2017-04-04 01:47:32 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=6244b3c6d33b443e:f43c857300000000
Modified 1 row(s), 0 row error(s) in 0.11s
[ip-172-31-7-96:21000] > update my_first_table set name='lilei' where id=99;
Query: update my_first_table set name='lilei' where id=99
Query submitted at: 2017-04-04 01:47:32 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=8f4ab0dd3c19f9df:b2c7bdfa00000000
Modified 1 row(s), 0 row error(s) in 0.13s
[ip-172-31-7-96:21000] > select * from my_first_table;
Query: select * from my_first_table
Query submitted at: 2017-04-04 01:47:34 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=6542579c8bd5b6ad:af68f50800000000
+----+-------+
| id | name  |
+----+-------+
| 2  | jane  |
| 3  | jim   |
| 1  | john  |
| 99 | lilei |
+----+-------+
Fetched 4 row(s) in 0.15s
[ip-172-31-7-96:21000] > upsert into my_first_table values(1, "john"), (4, "tom"), (99, "lilei1");
Query: upsert into my_first_table values(1, "john"), (4, "tom"), (99, "lilei1")
Query submitted at: 2017-04-04 01:48:52 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=694fc7ac2bc71d21:947f1fa200000000
Modified 3 row(s), 0 row error(s) in 0.11s
[ip-172-31-7-96:21000] > select * from my_first_table;
Query: select * from my_first_table
Query submitted at: 2017-04-04 01:48:52 (Coordinator: http://ip-172-31-7-96:25000)
Query progress can be monitored at: http://ip-172-31-7-96:25000/query_plan?query_id=a64e0ee707762b6b:69248a6c00000000
+----+--------+
| id | name   |
+----+--------+
| 2  | jane   |
| 3  | jim    |
| 1  | john   |
| 99 | lilei1 |
| 4  | tom    |
+----+--------+
Fetched 5 row(s) in 0.16s
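Because `my_first_table` is hash-partitioned into 16 tablets, `select *` returns rows in no guaranteed order, as the output above shows. For a scripted check, adding `order by` makes the result directly comparable. A sketch, assuming the same impalad host; `kudu_rows`, `IMPALAD` and `expected` are hypothetical names, and the expected rows are the ones from the final query above (`-B` separates columns with a tab by default):

```shell
#!/usr/bin/env bash
# Hypothetical helper: list my_first_table sorted by primary key, one
# tab-separated row per line.
IMPALAD="${IMPALAD:-ip-172-31-7-96}"

kudu_rows() {
  impala-shell -i "$IMPALAD" --quiet -B \
    -q "select id, name from my_first_table order by id"
}

# Expected state after the upsert above (id <TAB> name).
expected=$'1\tjohn\n2\tjane\n3\tjim\n4\ttom\n99\tlilei1'

# Example:
# [[ "$(kudu_rows)" == "$expected" ]] && echo "Kudu CRUD check OK"
```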

