Building a CDH 6.3 Big Data Platform

1. CDH Overview

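CDH (Cloudera's Distribution Including Apache Hadoop) is Cloudera's Hadoop distribution. It is managed through the web-based Cloudera Manager interface and supports most Hadoop components, including HDFS, MapReduce, Hive, Pig, HBase, ZooKeeper and Sqoop. This makes a big data platform much easier to install and use.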

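CDH bundles a complete set of components and is easy to install and maintain, so quite a few companies in China already run CDH platforms. This guide uses CDH 6.3.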

2. Preparation Before Installing CDH

2.1 Environment

Hosts:

IP            Hostname
10.31.1.123   hp1
10.31.1.124   hp2
10.31.1.125   hp3
10.31.1.126   hp4

Hardware:
Each host: 4 CPU cores, 8 GB RAM, 500 GB disk

Software versions:

Name              Version
Operating system  CentOS release 7.8 (Final), 64-bit
JDK               1.8
Database          MySQL 5.6.49
JDBC driver       MySQL Connector Java 5.1.38
Cloudera Manager  6.3.1
CDH               6.3.1

2.2 Pre-installation Setup

2.2.1 Hostname Configuration (all nodes)

Set the hostname on each host by running the matching command on that host:

hostnamectl set-hostname hp1
hostnamectl set-hostname hp2
hostnamectl set-hostname hp3
hostnamectl set-hostname hp4

Configure /etc/hosts on all 4 machines:

vi /etc/hosts
127.0.0.1               localhost
10.31.1.123             hp1
10.31.1.124             hp2
10.31.1.125             hp3
10.31.1.126             hp4
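Editing /etc/hosts by hand on four machines is error-prone. The following is a minimal sketch that assumes bash and the IP/hostname pairs from the table above. `add_host_entry` and `add_cluster_hosts` are hypothetical helper names. The target file is a parameter, so you can try the helpers on a copy before touching /etc/hosts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append "IP HOSTNAME" to a hosts file unless the hostname
# is already mapped on a non-comment line, so re-running it is harmless.
add_host_entry() {
  local file="$1" ip="$2" name="$3"
  # awk exits 0 if the hostname appears in any name column (field 2 onwards)
  if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 } END { exit !found }' "$file"; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$file"
}

# Hypothetical wrapper: apply this guide's host table to a hosts file
# (defaults to /etc/hosts).
add_cluster_hosts() {
  local file="${1:-/etc/hosts}"
  add_host_entry "$file" 10.31.1.123 hp1
  add_host_entry "$file" 10.31.1.124 hp2
  add_host_entry "$file" 10.31.1.125 hp3
  add_host_entry "$file" 10.31.1.126 hp4
}
```

Run `add_cluster_hosts` as root with no argument on each of the four hosts to update /etc/hosts in place.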

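Configure /etc/sysconfig/network on all 4 machines. 10.31.1.123 is shown as an example; on the other 3 hosts, set HOSTNAME to hp2, hp3 and hp4 respectively:
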
[root@10-31-1-123 ~]# more /etc/sysconfig/network
# Created by anaconda
HOSTNAME=hp1

2.2.2 Firewall and SELinux Configuration (all nodes)

Disable the firewall:

systemctl disable firewalld
systemctl stop firewalld

Set SELinux to permissive mode:

vi /etc/selinux/config
Change SELINUX=enforcing to SELINUX=permissive
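You can make the same edit non-interactively with `sed`. The config file is only read at boot, so also run `setenforce 0` to switch the running system to permissive mode right away. The following sketch assumes bash. `set_selinux_mode` is a hypothetical helper, and the config path is a parameter so you can test it on a copy.

```bash
# Hypothetical helper: rewrite the SELINUX= line of an SELinux config file.
# Usage: set_selinux_mode [file] [mode]   (defaults: /etc/selinux/config, permissive)
set_selinux_mode() {
  local file="${1:-/etc/selinux/config}" mode="${2:-permissive}"
  case "$mode" in
    enforcing|permissive|disabled) ;;
    *) echo "invalid SELinux mode: $mode" >&2; return 1 ;;
  esac
  # Only the SELINUX= line changes; SELINUXTYPE= does not match the pattern.
  sed -i "s/^SELINUX=.*/SELINUX=${mode}/" "$file"
}
```

On each node, run `set_selinux_mode` followed by `setenforce 0`. The latter just reports an error if SELinux is already disabled.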

2.2.3 NTP Service Configuration (all nodes)

yum install ntp
systemctl start ntpd
systemctl enable ntpd

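hp1 (10.31.1.123) acts as the NTP server, and the other 3 hosts synchronize with it. Edit /etc/ntp.conf:

vi /etc/ntp.conf
# on hp1: allow hosts in the 10.31.1.0/24 subnet to query this server
restrict 10.31.1.0 mask 255.255.255.0
# on hp2, hp3 and hp4: use hp1 as the time source
server 10.31.1.123

systemctl restart ntpd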

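2.2.4 Install Python (all nodes)

CDH requires Python 2.7, which ships with the operating system here, so this step is skipped.

2.2.5 Database Requirements (master node)

MySQL 5.6 is used here; its installation steps are omitted.

2.2.6 Install the JDK (all nodes)

JDK 1.8 is installed here.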

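cd /usr/
mkdir -p java
cd java
# The AuthParam token in this Oracle link is time-limited and no longer valid;
# download jdk-8u181-linux-x64.tar.gz from Oracle manually (an Oracle account
# is required) and upload it to /usr/java instead.
wget -O jdk-8u181-linux-x64.tar.gz "http://download.oracle.com/otn-pub/java/jdk/8u181-b13/96a7b8442fe848ef90c96a2fad6ed6d1/jdk-8u181-linux-x64.tar.gz?AuthParam=1534129356_6b3ac55c6a38ba5a54c912855deb6a22"
tar -zxvf jdk-8u181-linux-x64.tar.gz
vi /etc/profile
# append the following lines to /etc/profile:
#java
export JAVA_HOME=/usr/java/jdk1.8.0_181
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
# reload the profile in the current shell
source /etc/profile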

The following output shows that Java was installed successfully:

[root@10-31-1-123 java]# java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
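To check many nodes from a script, you can extract the version string from the `java -version` output, which is written to stderr. The following sketch assumes bash. `java_version_from_output` and `check_jdk8` are hypothetical helper names.

```bash
# Hypothetical helper: print the version string (e.g. 1.8.0_181) from
# `java -version` output supplied on stdin.
java_version_from_output() {
  awk -F '"' '/version/ { print $2; exit }'
}

# Hypothetical check: succeed only if the java on PATH is 1.8.x,
# the version this guide uses.
check_jdk8() {
  local v
  v=$(java -version 2>&1 | java_version_from_output)
  case "$v" in
    1.8.*) echo "OK: java $v" ;;
    *) echo "unexpected java version: ${v:-none}" >&2; return 1 ;;
  esac
}
```

For example, `ssh hp2 "$(declare -f java_version_from_output check_jdk8); check_jdk8"` runs the check on another node. This assumes root SSH access between the hosts.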

2.2.7 Download the Installation Packages (all nodes)

CM: CM 6.3.1
Link: https://archive.cloudera.com/cm6/6.3.1/repo-as-tarball/cm6.3.1-redhat7.tar.gz

Parcel:
https://archive.cloudera.com/cdh6/6.3.1/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel
https://archive.cloudera.com/cdh6/6.3.1/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1
https://archive.cloudera.com/cdh6/6.3.1/parcels/manifest.json

The packages above are also shared on Baidu Netdisk:
Link: https://pan.baidu.com/s/1UH50Uweyi7yg6bV7dl02mQ
Extraction code: nx7p
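A truncated or corrupted 2 GB download only shows up later, when parcel distribution fails, so check the parcel against its .sha1 file before continuing. The following sketch assumes bash and GNU coreutils `sha1sum`. It also assumes the .sha1 file holds only the 40-character hex digest, which its 40-byte size in the listing in 3.1.1 suggests. `verify_parcel` is a hypothetical helper.

```bash
# Hypothetical helper: compare a parcel's SHA-1 with the digest stored in
# "<parcel>.sha1" (assumed to contain just the hex digest).
verify_parcel() {
  local parcel="$1" expected actual
  expected=$(tr -d '[:space:]' < "${parcel}.sha1")
  actual=$(sha1sum "$parcel" | awk '{ print $1 }')
  if [ "$expected" = "$actual" ]; then
    echo "OK: $parcel"
  else
    echo "SHA-1 mismatch for $parcel: expected $expected, got $actual" >&2
    return 1
  fi
}
```

Usage: `verify_parcel /usr/local/cdh/CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel`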

2.2.8 Install the MySQL JDBC Driver (master node)

Cloudera Manager looks for the driver at /usr/share/java/mysql-connector-java.jar. Upload the connector jar to that directory and rename it:

[root@10-31-1-123 mysql]# mkdir -p /usr/share/java
[root@10-31-1-123 mysql]# cd /usr/share/java
[root@10-31-1-123 java]# ll
total 832
-rw-r--r--. 1 root root 848067 Jan 15 2014 mysql-connector-java-commercial-5.1.25-bin.jar
[root@10-31-1-123 java]# mv mysql-connector-java-commercial-5.1.25-bin.jar mysql-connector-java.jar
[root@10-31-1-123 java]# ll
total 832
-rw-r--r--. 1 root root 848067 Jan 15 2014 mysql-connector-java.jar

2.2.9 Create the Cloudera Manager Metadata Database, Users, and the amon (Activity Monitor) Database (master node)

create database cmf DEFAULT CHARACTER SET utf8;
create database amon DEFAULT CHARACTER SET utf8;
grant all on cmf.* TO 'cmf'@'%' IDENTIFIED BY 'www.research.com';
grant all on amon.* TO 'amon'@'%' IDENTIFIED BY 'www.research.com';
flush privileges;

2.2.10 Adjust the Linux swappiness Parameter (all nodes)

Swapping degrades server performance, so vm.swappiness is usually set to 0 (Cloudera recommends a value of 10 or lower).

[root@hp1 mysql]# cd /usr/lib/tuned/
[root@hp1 tuned]# grep "vm.swappiness" * -R
latency-performance/tuned.conf:vm.swappiness=10
throughput-performance/tuned.conf:vm.swappiness=10
virtual-guest/tuned.conf:vm.swappiness = 30

Change each of these values to 0, then sync the modified files to the other machines.
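Editing each tuned.conf by hand is tedious. The following sketch assumes bash, GNU grep and GNU sed. `set_tuned_swappiness` is a hypothetical helper that rewrites every `vm.swappiness` line, with or without spaces around `=`, in the tuned.conf files under a directory. The edited profiles only apply when tuned re-applies its profile (for example after a reboot), so the code also notes how to set the running value immediately with `sysctl`.

```bash
# Hypothetical helper: set vm.swappiness to VALUE in every tuned.conf under DIR.
# Usage: set_tuned_swappiness [dir] [value]   (defaults: /usr/lib/tuned, 0)
set_tuned_swappiness() {
  local dir="${1:-/usr/lib/tuned}" value="${2:-0}" f
  grep -rl --include=tuned.conf 'vm.swappiness' "$dir" | while read -r f; do
    sed -i -E "s/^vm\.swappiness[[:space:]]*=.*/vm.swappiness=${value}/" "$f"
  done
}

# To change the running kernel value as well (needs root):
#   sysctl -w vm.swappiness=0
```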

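2.2.11 Disable Transparent Huge Pages (all nodes)

[root@hp1 ~]# vim /etc/rc.local
Add the following lines to the file:
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
On CentOS 7, /etc/rc.local is a symlink to /etc/rc.d/rc.local, which only runs at boot if it is executable, so also run chmod +x /etc/rc.d/rc.local. Then sync the file to the other machines and reboot all servers.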
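After the reboot, both sysfs files should show `never` in brackets; the bracketed word marks the active setting, as in `always madvise [never]`. The following sketch assumes bash. `thp_setting` and `check_thp_disabled` are hypothetical helpers that extract the active value so you can script the check across nodes.

```bash
# Hypothetical helper: print the active (bracketed) value of a THP sysfs file,
# e.g. "always madvise [never]" -> "never".
thp_setting() {
  sed -n 's/.*\[\([^]]*\)\].*/\1/p' "$1"
}

# Hypothetical check: succeed only if both THP knobs are set to "never".
check_thp_disabled() {
  local base="${1:-/sys/kernel/mm/transparent_hugepage}" k
  for k in enabled defrag; do
    if [ "$(thp_setting "$base/$k")" != "never" ]; then
      echo "THP $k is not 'never'" >&2
      return 1
    fi
  done
  echo "OK: transparent huge pages disabled"
}
```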

3. CDH Deployment

3.1 Offline Deployment of the CM Server and Agents

3.1.1 Create the Software Directory and Extract the Packages (all nodes)

[root@10-31-1-123 ~]# mkdir -p /opt/cloudera-manager
[root@10-31-1-123 ~]# cd /usr/local/cdh/
[root@10-31-1-123 cdh]# ls -lrth
total 3.3G
-rw-r--r--. 1 root root  34K Nov 13 15:46 manifest.json
-rw-r--r--. 1 root root 1.4G Nov 13 16:10 cm6.3.1-redhat7.tar.gz
-rw-r--r--. 1 root root   40 Nov 13 16:10 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1
-rw-r--r--. 1 root root 2.0G Nov 13 16:37 CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel
[root@10-31-1-123 cdh]# tar -zxf cm6.3.1-redhat7.tar.gz -C /opt/cloudera-manager

3.1.2 Deploy the CM Server on hp1, the Master Node (master node)

cd /opt/cloudera-manager/cm6.3.1/RPMS/x86_64/
rpm -ivh cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm --nodeps --force
rpm -ivh cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm --nodeps --force 

Installation log:

warning: cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID b0b19c9f: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:cloudera-manager-daemons-6.3.1-14################################# [100%]
[root@10-31-1-123 x86_64]# rpm -ivh cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm --nodeps --force
warning: cloudera-manager-server-6.3.1-1466458.el7.x86_64.rpm: Header V3 RSA/SHA256 Signature, key ID b0b19c9f: NOKEY
Preparing...                          ################################# [100%]
Updating / installing...
   1:cloudera-manager-server-6.3.1-146################################# [100%]
Created symlink from /etc/systemd

3.1.3 Deploy the CM Agent (all nodes)

cd /opt/cloudera-manager/cm6.3.1/RPMS/x86_64
rpm -ivh cloudera-manager-daemons-6.3.1-1466458.el7.x86_64.rpm --nodeps --force
rpm -ivh cloudera-manager-agent-6.3.1-1466458.el7.x86_64.rpm --nodeps --force

3.1.4 Point the Agent Configuration at the Server Node hp1 (all nodes)

sed -i "s/server_host=localhost/server_host=10.31.1.123/g" /etc/cloudera-scm-agent/config.ini
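The `sed` above only works while the file still contains the default `server_host=localhost`. The following is a slightly more robust sketch that assumes bash. `set_agent_server_host` is a hypothetical helper that replaces whatever value `server_host` currently has, so it is safe to re-run.

```bash
# Hypothetical helper: point a Cloudera Manager agent config.ini at a server.
# Usage: set_agent_server_host HOST [config_file]
set_agent_server_host() {
  local host="$1" file="${2:-/etc/cloudera-scm-agent/config.ini}"
  if grep -q '^server_host=' "$file"; then
    # Replace the whole value, whatever it was before.
    sed -i "s/^server_host=.*/server_host=${host}/" "$file"
  else
    echo "no server_host= line found in $file" >&2
    return 1
  fi
}
```

Usage on every node: `set_agent_server_host 10.31.1.123`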

3.1.5 Server Configuration (master node)

vim /etc/cloudera-scm-server/db.properties
com.cloudera.cmf.db.type=mysql
com.cloudera.cmf.db.host=10.31.1.123
com.cloudera.cmf.db.name=cmf
com.cloudera.cmf.db.user=cmf
com.cloudera.cmf.db.password=www.research.com
com.cloudera.cmf.db.setupType=EXTERNAL

3.2 Set Up the Offline Parcel Repository (master node)

3.2.1 安装httpd

yum install -y httpd 

3.2.2 Deploy the Offline Parcel Repository (master node)

mkdir -p /var/www/html/cdh6_parcel
cp /usr/local/cdh/CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel /var/www/html/cdh6_parcel/
mv /usr/local/cdh/CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha1 /var/www/html/cdh6_parcel/CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel.sha
mv /usr/local/cdh/manifest.json /var/www/html/cdh6_parcel/
systemctl start httpd
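Before adding the URL in Cloudera Manager, you can confirm the repository directory has the layout produced above: the parcel, its checksum renamed to `.sha`, and manifest.json. The following sketch assumes bash. `check_parcel_repo` is a hypothetical helper.

```bash
# Hypothetical helper: verify that a parcel repo directory contains the parcel,
# its ".sha" checksum file and manifest.json, all non-empty.
# Usage: check_parcel_repo DIR PARCEL_FILE_NAME
check_parcel_repo() {
  local dir="$1" parcel="$2" f missing=0
  for f in "$parcel" "${parcel}.sha" manifest.json; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `check_parcel_repo /var/www/html/cdh6_parcel CDH-6.3.1-1.cdh6.3.1.p0.1470567-el7.parcel`. Then `curl -sI http://10.31.1.123/cdh6_parcel/manifest.json` should print an HTTP 200 status line.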

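3.2.3 Check the Repository in a Browser

Open http://10.31.1.123/cdh6_parcel/ ; the directory listing should show the parcel, the .sha file and manifest.json.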

3.3 Start the Server (master node)

[root@10-31-1-123 x86_64]#  systemctl start cloudera-scm-server
[root@10-31-1-123 x86_64]#  ll /var/log/cloudera-scm-server/ 
total 20
-rw-r-----. 1 cloudera-scm cloudera-scm 19610 Nov 13 17:29 cloudera-scm-server.log
-rw-r-----. 1 cloudera-scm cloudera-scm     0 Nov 13 17:29 cmf-server-nio.log
-rw-r-----. 1 cloudera-scm cloudera-scm     0 Nov 13 17:29 cmf-server-perf.log
[root@10-31-1-123 x86_64]# tail /var/log/cloudera-scm-server/cloudera-scm-server.log
        at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:195)
        at com.mchange.v2.c3p0.WrapperConnectionPoolDataSource.getPooledConnection(WrapperConnectionPoolDataSource.java:184)
        at com.mchange.v2.c3p0.impl.C3P0PooledConnectionPool$1PooledConnectionResourcePoolManager.acquireResource(C3P0PooledConnectionPool.java:200)
        at com.mchange.v2.resourcepool.BasicResourcePool.doAcquire(BasicResourcePool.java:1086)
        at com.mchange.v2.resourcepool.BasicResourcePool.doAcquireAndDecrementPendingAcquiresWithinLockOnSuccess(BasicResourcePool.java:1073)
        at com.mchange.v2.resourcepool.BasicResourcePool.access$800(BasicResourcePool.java:44)
        at com.mchange.v2.resourcepool.BasicResourcePool$ScatteredAcquireTask.run(BasicResourcePool.java:1810)
        at com.mchange.v2.async.ThreadPoolAsynchronousRunner$PoolThread.run(ThreadPoolAsynchronousRunner.java:648)
2020-11-13 17:29:26,939 WARN C3P0PooledConnectionPoolManager[identityToken->2t3hq3ad1hj75spfrks4l|3be4f71]-HelperThread-#1:com.mchange.v2.resourcepool.BasicResourcePool: Having failed to acquire a resource, com.mchange.v2.resourcepool.BasicResourcePool@3b0ee03a is interrupting all Threads waiting on a resource to check out. Will try again in response to new client requests.
2020-11-13 17:29:27,568 INFO main:com.cloudera.enterprise.CommonMain: Statistics not enabled, JMX will not be registered
[root@10-31-1-123 x86_64]# 
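The c3p0 "Having failed to acquire a resource" warnings in the log tail above suggest the server could not (yet) open a connection to the cmf database. If they persist, re-check db.properties, the MySQL grants and the JDBC driver in /usr/share/java. Startup can take several minutes. The following sketch assumes bash. `wait_for_cm_server` is a hypothetical helper that polls the log for `Started Jetty server`, the message Cloudera Manager is assumed to log once its web server is up. Treat that wording as an assumption and adjust it if your version logs something different.

```bash
# Hypothetical helper: poll the CM server log until the web server reports it
# has started, or give up after TIMEOUT seconds.
# Usage: wait_for_cm_server [log_file] [timeout_seconds]
wait_for_cm_server() {
  local log="${1:-/var/log/cloudera-scm-server/cloudera-scm-server.log}"
  local timeout="${2:-600}" waited=0
  # Assumed marker line; change it if your CM version logs differently.
  local marker='Started Jetty server'
  until grep -q "$marker" "$log" 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "CM server not up after ${timeout}s; check $log" >&2
      return 1
    fi
    sleep 5
    waited=$((waited + 5))
  done
  echo "CM server is up; port 7180 should now answer"
}
```

Run `wait_for_cm_server` on hp1 right after `systemctl start cloudera-scm-server`.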

3.4 Start the Agent on All Nodes (all nodes)

systemctl start cloudera-scm-agent

3.5 Steps in the Web UI

3.5.1 Log In on Port 7180 of the Master Node

http://10.31.1.123:7180/

Username: admin
Password: admin

3.5.2 Choose the Free Edition

3.5.3 Create a Cluster

Enter the cluster name.

Enter the cluster hosts (hostnames are used here).

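Select the repository. The versions must match exactly, otherwise the installation keeps failing.
Add the local repository: http://10.31.1.123/cdh6_parcel/
Then replace {latest_version} with the version being installed: 6.3.1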

Install the JDK.

Configure SSH login.

Wait for the installation to finish.

Install the parcels.

This completes the installation stage.

Select the services.

Accept the default assignments.

Create the databases:

create database hive DEFAULT CHARACTER SET utf8;
grant all on hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';

create database oozie DEFAULT CHARACTER SET utf8;
grant all on oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie';

create database hue DEFAULT CHARACTER SET utf8;
grant all on hue.* TO 'hue'@'%' IDENTIFIED BY 'hue';

flush privileges;
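Every service database here follows the same pattern: the database, user and password are all named after the service. You can therefore generate the statements instead of typing them. The following sketch assumes bash. `gen_service_db_sql` is a hypothetical helper. Using the service name as the password only mirrors the example above, so choose real passwords in production. This `GRANT ... IDENTIFIED BY` form works on the MySQL 5.6 used here; MySQL 8.0 requires a separate `CREATE USER`.

```bash
# Hypothetical helper: print CREATE DATABASE / GRANT statements for each
# "name:password" argument (database and user share the same name).
gen_service_db_sql() {
  local spec name pass
  for spec in "$@"; do
    name=${spec%%:*}
    pass=${spec#*:}
    printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$name"
    printf "GRANT ALL ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s';\n" "$name" "$name" "$pass"
  done
  printf 'FLUSH PRIVILEGES;\n'
}
```

Usage: `gen_service_db_sql hive:hive oozie:oozie hue:hue | mysql -uroot -p`. The same helper covers section 2.2.9 with `cmf:www.research.com amon:www.research.com`.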

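Hue reports an error at this step. The fix is to make the MySQL client library libmysqlclient.so.18 available to the dynamic linker:

mkdir -p /usr/lib64/mysql
cp /usr/local/mysql/lib/libmysqlclient.so.18.1.0 /usr/lib64/mysql/
cd /usr/lib64/mysql/
ln -s libmysqlclient.so.18.1.0 libmysqlclient.so.18

Add /usr/lib64/mysql to /etc/ld.so.conf, then rebuild the linker cache:

[root@hp1 mysql]# more /etc/ld.so.conf
include ld.so.conf.d/*.conf
/usr/lib64/mysql
[root@hp1 mysql]# ldconfig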

Review the changes.

Done.

FAQ

1. CDH File Permission Issue

When Hive runs a query, it reports insufficient permissions on /user:

hive> 
    > select count(*) from fact_sale;
Query ID = root_20201119152619_16f496b5-2482-4efb-a26c-e18117b2f10c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=
In order to set a constant number of reducers:
  set mapreduce.job.reduces=
org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x

Solution: /user is owned by hdfs, so root cannot change its permissions. Run the command as the hdfs user instead:

[root@hp1 ~]# hadoop fs -ls /
Found 2 items
drwxrwxrwt   - hdfs supergroup          0 2020-11-15 12:22 /tmp
drwxr-xr-x   - hdfs supergroup          0 2020-11-15 12:21 /user
[root@hp1 ~]# hadoop fs -chmod 777 /user
chmod: changing permissions of '/user': Permission denied. user=root is not the owner of inode=/user
[root@hp1 ~]# sudo -u hdfs hadoop fs -chmod 777 /user
[root@hp1 ~]# hadoop fs -ls /
Found 2 items
drwxrwxrwt   - hdfs supergroup          0 2020-11-15 12:22 /tmp
drwxrwxrwx   - hdfs supergroup          0 2020-11-15 12:21 /user

Making /user world-writable is the quick fix. A narrower option is to give root its own HDFS home directory: sudo -u hdfs hadoop fs -mkdir /user/root, then sudo -u hdfs hadoop fs -chown root /user/root.

2. CDH YARN Resource Package Issue

A Hive INSERT INTO statement fails:

> insert into fact_sale(id,sale_date,prod_name,sale_nums) values (1,'2011-08-16','PROD4',28);
Query ID = root_20201119163832_f78a095d-2656-4da6-825f-64127e84b8b4
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
20/11/19 16:38:32 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm69
Starting Job = job_1605767427026_0013, Tracking URL = http://hp3:8088/proxy/application_1605767427026_0013/
Kill Command = /opt/cloudera/parcels/CDH-6.3.1-1.cdh6.3.1.p0.1470567/lib/hadoop/bin/hadoop job  -kill job_1605767427026_0013
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2020-11-19 16:39:12,211 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1605767427026_0013 with errors
Error during job, obtaining debugging information...
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 HDFS EC Read: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

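Following the hint to the job's logs reveals the actual error: the MapReduce task fails with "Download and unpack failed".

Installing the missing package fixes the problem. This most likely corresponds to the YARN service action "Install YARN MapReduce Framework JARs" in Cloudera Manager.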

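3. Hue Load Balancer Fails to Start

The Hue Load Balancer role fails to start.

There are no logs, and even after creating the expected log directory, nothing is written to it.

Solution: the Hue Load Balancer runs on Apache httpd, so install httpd and mod_ssl on the host that runs it:

yum -y install httpd
yum -y install mod_ssl

Then restart Hue.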

References

1.https://docs.cloudera.com/documentation/enterprise/latest/topics/installation.html
2.https://www.cnblogs.com/shwang/p/12112508.html
3.https://blog.csdn.net/gxd520/article/details/100982436
4.https://wxy0327.blog.csdn.net/article/details/51768968
5.https://blog.csdn.net/sinat_35045195/article/details/102566776

References for error handling
1.https://blog.csdn.net/weixin_39478115/article/details/77483251
2.https://q.cnblogs.com/q/110190/
