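Installing CDH 5.14.3 with Cloudera Manager on CentOS 7.4 (64-bit)

1. Overview

This article is based on Path B of the official installation documentation.

There are several ways to deploy CDH. In general, you first set up Cloudera Manager and then use it to deploy CDH. For a test environment, you can let Cloudera Manager install everything automatically; that approach stores metadata and other data in an embedded PostgreSQL database and is not suitable for production. Production environments usually use MariaDB, MySQL, or another separately deployed database (with HA, of course), so we first set up a MySQL database to serve as the external datastore.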

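2. Preparation

The following uses Tencent Cloud servers as an example.

2.1 Cluster configuration (all nodes)

Hostname  CPU  Memory  Bandwidth  Private IP   System disk  Data disk
cdh1      4    16      10         10.0.11.15   50           300
cdh2      4    16      10         10.0.11.4    50           300
cdh3      4    16      10         10.0.11.3    50           300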

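2.2 Change the hostname (all nodes)

Set the hostname in /etc/hostname and apply it to the running system. Use each node's own name (cdh1, cdh2, or cdh3); cdh3 is shown here:

echo "cdh3" > /etc/hostname && hostname cdh3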

Edit /etc/hosts to add the hostname-to-IP mappings:

vim /etc/hosts

Add the following entries to the hosts file (note that they may be lost after a server reboot; see 4.3.2):

10.0.11.15 cdh1
10.0.11.4 cdh2
10.0.11.3 cdh3

2.3 Disable the firewall (all nodes)

systemctl stop firewalld
systemctl disable firewalld

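2.4 Set up passwordless SSH login (all nodes)

Run the following on each node in turn, copying the authorized_keys file to the next node each time. After every node has been processed, copy the authorized_keys file, which now contains the public keys of all nodes, back to cdh1 and sync it to all nodes.

mkdir -p ~/.ssh && cd ~/.ssh/
ssh-keygen -t rsa
cat id_rsa.pub >> authorized_keys
chmod 600 authorized_keys
scp authorized_keys root@cdh2:~/.ssh/    # copy to the next node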

2.5 Configure the package repositories (all nodes)

curl -o /etc/yum.repos.d/cloudera-manager.repo https://archive.cloudera.com/cm5/redhat/7/x86_64/cm/cloudera-manager.repo # Cloudera yum repository

2.5.1 Point the CentOS-Base repository at the Aliyun mirror (if needed)

curl -o /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo # Aliyun yum mirror

2.5.2 Update the system

yum update

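2.6 Install ntp and schedule periodic synchronization with a time server (all nodes)

On Tencent Cloud use: ntpupdate.tencentyun.com
On Alibaba Cloud use: ntp1.aliyun.com or time1.aliyun.com

yum install ntp
ntpdate ntpupdate.tencentyun.com
crontab -e

In the editor that opens, add the following line to synchronize every two hours:

0 */2 * * * ntpdate ntpupdate.tencentyun.com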

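2.7 Install the JDK (all nodes)

2.7.1 Oracle JDK 1.7

yum install oracle-j2sdk1.7 -y

2.7.2 Oracle JDK 1.8 (recommended)

The JDK version and installation path must be identical on all nodes.

  • Before installing, be sure to confirm which JDK versions Cloudera Manager and CDH support (see the official documentation).

  • Taking JDK 1.8 as an example, the official documentation requires at least 1.8u31 and recommends 1.8u74 or later (All JDK 8 updates, from the minimum required version, are supported in Cloudera Manager/CDH 5.3 and higher unless specifically excluded. Updates above the minimum that are not listed are supported but not tested.)

  • Some JDK 8 releases have known bugs or security issues; see the figure in the original post.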

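2.7.3 Installation

  • First download the .tar.gz archive for the desired version from Java SE 8 Downloads. Using jdk-8u172-linux-x64.tar.gz as an example, upload it to /opt on cdh1 once the download finishes.

    rpm -qa | grep java    # check whether an older JDK is installed
    rpm -e --nodeps <package-name>    # remove the old JDK, using a package name printed by the previous command
    mkdir /usr/java
    tar zxvf /opt/jdk-8u172-linux-x64.tar.gz -C /usr/java    # extract into /usr/java (important)
    vim /etc/profile    # edit the environment variables

  • Append the following settings to the end of the file, then run source /etc/profile to apply them

    JAVA_HOME=/usr/java/jdk1.8.0_172
    JRE_HOME=/usr/java/jdk1.8.0_172/jre
    PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
    export JAVA_HOME JRE_HOME PATH

  • Create a symbolic link under /usr/bin

    ln -s /usr/java/jdk1.8.0_172/bin/java /usr/bin/java

  • Test

    java -version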

2.8 Disable SELinux (all nodes)

Edit /etc/selinux/config and set SELINUX to disabled. The change takes effect after a reboot.

vim /etc/selinux/config
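Since this edit has to be repeated on every node, it can also be scripted instead of done in an editor. The following is a minimal sketch, not part of the official procedure: disable_selinux_in is a hypothetical helper name, and the default path is simply the file named above. It rewrites only the SELINUX= line and leaves SELINUXTYPE and comments alone.

```shell
#!/usr/bin/env bash
# Sketch: set SELINUX=disabled in an SELinux config file non-interactively.
# disable_selinux_in is a hypothetical helper, not a system tool.
disable_selinux_in() {
    local file="${1:-/etc/selinux/config}"
    local tmp
    tmp="$(mktemp)"
    # "^SELINUX=" does not match "SELINUXTYPE=", so that line is untouched.
    # Writing back with cat keeps the original file's ownership and mode.
    sed 's/^SELINUX=.*/SELINUX=disabled/' "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as root, then reboot); commented out to avoid side effects:
# disable_selinux_in /etc/selinux/config
```

A reboot is still required afterwards, exactly as with the manual edit.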

3. Install and configure MySQL (master node only)

  • Reference for this section: Cloudera Manager and Managed Service Datastores

3.0 Notes

Cloudera Manager and CDH require the following databases:

  • Cloudera Manager - Contains all the information about services you have configured and their role assignments, all configuration history, commands, users, and running processes. This relatively small database (< 100 MB) is the most important to back up. Important: When you restart processes, the configuration for each of the services is redeployed using information saved in the Cloudera Manager database. If this information is not available, your cluster does not start or function correctly. You must schedule and maintain regular backups of the Cloudera Manager database to recover the cluster in the event of the loss of this database.
  • Oozie Server - Contains Oozie workflow, coordinator, and bundle data. Can grow very large.
  • Sqoop Server - Contains entities such as the connector, driver, links and jobs. Relatively small.
  • Activity Monitor - Contains information about past activities. In large clusters, this database can grow large. Configuring an Activity Monitor database is only necessary if a MapReduce service is deployed.
  • Reports Manager - Tracks disk utilization and processing activities over time. Medium-sized.
  • Hive Metastore Server - Contains Hive metadata. Relatively small.
  • Hue Server - Contains user account information, job submissions, and Hive queries. Relatively small.
  • Sentry Server - Contains authorization metadata. Relatively small.
  • Cloudera Navigator Audit Server - Contains auditing information. In large clusters, this database can grow large.
  • Cloudera Navigator Metadata Server - Contains authorization, policies, and audit report metadata. Relatively small.

3.1 Install MySQL

Note: CentOS may ship with MariaDB preinstalled; watch the installation log to see whether it gets replaced.

cd /opt/
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum update
yum install mysql-server
service mysqld start

3.2 Configure MySQL

MySQL is configured for the following reasons:

  • To prevent deadlocks, set the isolation level to read committed.
  • Configure the InnoDB engine. Cloudera Manager will not start if its tables are configured with the MyISAM engine. (Typically, tables revert to MyISAM if the InnoDB engine is misconfigured.)
  • The default settings in the MySQL installations in most distributions use conservative buffer sizes and memory usage. Cloudera Management Service roles need high write throughput because they might insert many records in the database. Cloudera recommends that you set the innodb_flush_method property to O_DIRECT.
  • Set the max_connections property according to the size of your cluster:
    • Small clusters (fewer than 50 hosts) - You can store more than one database (for example, both the Activity Monitor and Service Monitor) on the same host. If you do this, you should:
      • Put each database on its own storage volume.
      • Allow 100 maximum connections for each database and then add 50 extra connections. For example, for two databases, set the maximum connections to 250. If you store five databases on one host (the databases for Cloudera Manager Server, Activity Monitor, Reports Manager, Cloudera Navigator, and Hive metastore), set the maximum connections to 550.
    • Large clusters (more than 50 hosts) - Do not store more than one database on the same host. Use a separate host for each database/host pair. The hosts need not be reserved exclusively for databases, but each database should be on a separate host.
  • Binary logging is not a requirement for Cloudera Manager installations. Binary logging provides benefits such as MySQL replication or point-in-time incremental recovery after database restore. Examples of this configuration follow. For more information, see The Binary Log.
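The max_connections rule for small clusters above is plain arithmetic: 100 connections per database plus 50 spare. As an illustration only, it can be written as a small shell function; max_connections_for is a hypothetical helper name, not a Cloudera or MySQL tool.

```shell
#!/usr/bin/env bash
# Sketch of the sizing rule quoted above:
# 100 connections per database on the host, plus 50 extra.
# max_connections_for is a hypothetical helper name.
max_connections_for() {
    local db_count="$1"
    echo $(( db_count * 100 + 50 ))
}

max_connections_for 2   # two databases on one host  -> prints 250
max_connections_for 5   # five databases on one host -> prints 550
```

The two calls reproduce the figures given in the quoted guidance (250 and 550); the second one matches the max_connections value used in the configuration file in 3.2.2.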

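3.2.1 Remove the old InnoDB log files

Delete /var/lib/mysql/ib_logfile0 and /var/lib/mysql/ib_logfile1. Stop MySQL first (service mysqld stop), and start it again once my.cnf has been updated in 3.2.2.

rm -f /var/lib/mysql/ib_logfile0
rm -f /var/lib/mysql/ib_logfile1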

3.2.2 Edit the MySQL configuration file

cp /etc/my.cnf /etc/my.cnf.bak
vim /etc/my.cnf

Replace the contents with the following (officially recommended):

[mysqld]
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links = 0

key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1

max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M

#log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
#and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log

# For MySQL version 5.1.8 or later. For older versions, reference MySQL documentation for configuration help.
binlog_format = mixed

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M

# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

sql_mode=STRICT_ALL_TABLES

3.2.3 Install the MySQL JDBC driver

Download the driver from the Download Connector/J page and upload it to /opt on cdh1. mysql-connector-java-5.1.41.tar.gz is used as the example here.

cd /opt && tar zxvf mysql-connector-java-5.1.41.tar.gz
mkdir -p /usr/share/java/
cp mysql-connector-java-5.1.41/mysql-connector-java-5.1.41-bin.jar /usr/share/java/mysql-connector-java.jar

3.2.4 Initialize the root account with the MySQL script

/usr/bin/mysql_secure_installation

Answer the prompts in order as follows:

Set root password? [Y/n] y
Remove anonymous users? [Y/n] y
Disallow root login remotely? [Y/n] n
Remove test database and access to it? [Y/n] y
Reload privilege tables now? [Y/n] y

3.2.5 Make sure MySQL starts at boot

/sbin/chkconfig mysqld on
/sbin/chkconfig --list mysqld

The second command should print:

mysqld 0:off 1:off 2:on 3:on 4:on 5:on 6:off

3.2.6 Create databases and users for Activity Monitor and the other CDH components

Note: the database list in the official documentation is incomplete (hue and scm are missing). The accurate list is:

Role                                Database  User    Password
Activity Monitor                    amon      amon    amon_password
Reports Manager                     rman      rman    rman_password
Hive Metastore Server               hive      hive    hive_password
Sentry Server                       sentry    sentry  sentry_password
Cloudera Navigator Audit Server     nav       nav     nav_password
Cloudera Navigator Metadata Server  navms     navms   navms_password
Cloudera Manager                    scm       scm     scm_password
Oozie                               oozie     oozie   oozie_password
Hue                                 hue       hue     hue_password

  • Log in to MySQL as an administrator and run the following SQL:

CREATE DATABASE scm DEFAULT CHARACTER SET utf8;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm_password';
CREATE DATABASE amon DEFAULT CHARACTER SET utf8;
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY 'amon_password';
CREATE DATABASE rman DEFAULT CHARACTER SET utf8;
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY 'rman_password';
CREATE DATABASE hive DEFAULT CHARACTER SET utf8;
GRANT ALL ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive_password';
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8;
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY 'sentry_password';
CREATE DATABASE nav DEFAULT CHARACTER SET utf8;
GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY 'nav_password';
CREATE DATABASE navms DEFAULT CHARACTER SET utf8;
GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY 'navms_password';
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8;
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY 'oozie_password';
CREATE DATABASE hue DEFAULT CHARACTER SET utf8;
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY 'hue_password';
FLUSH PRIVILEGES;
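Because every database above follows the same pattern (database name equals user name, password is the name plus _password), the statements can also be generated rather than typed. The following is a sketch under that naming assumption; gen_cdh_sql is a hypothetical helper name, and the placeholder passwords should be replaced before real use.

```shell
#!/usr/bin/env bash
# Sketch: print the CREATE DATABASE / GRANT pair for each name given.
# Assumes database name == user name and password "<name>_password",
# as in the table above. gen_cdh_sql is a hypothetical helper name.
gen_cdh_sql() {
    local name
    for name in "$@"; do
        printf "CREATE DATABASE %s DEFAULT CHARACTER SET utf8;\n" "$name"
        # %% prints a literal % for the 'user'@'%' host wildcard.
        printf "GRANT ALL ON %s.* TO '%s'@'%%' IDENTIFIED BY '%s_password';\n" \
            "$name" "$name" "$name"
    done
    printf "FLUSH PRIVILEGES;\n"
}

gen_cdh_sql scm amon rman hive sentry nav navms oozie hue
```

The output can be reviewed first and then piped into the client, for example gen_cdh_sql scm amon | mysql -u root -p.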

4. Install and start Cloudera Manager

4.1 Install Cloudera Manager Server and Cloudera Manager Agent on the master node

yum install cloudera-manager-daemons cloudera-manager-server cloudera-manager-agent

4.1.1 Run scm_prepare_database.sh on the master node (recommended)

To avoid database errors, it is recommended to run the database initialization script /usr/share/cmf/schema/scm_prepare_database.sh after installing Cloudera Manager Server; see the official documentation.

/usr/share/cmf/schema/scm_prepare_database.sh mysql scm scm scm_password

4.1.2 Start Cloudera Manager Server

service cloudera-scm-server start

4.1.3 Cloudera Manager Server log

tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log

4.1.4 Start Cloudera Manager Agent

service cloudera-scm-agent start

4.1.5 Cloudera Manager Agent log

tail -f /var/log/cloudera-scm-agent/cloudera-scm-agent.log

4.2 Install Cloudera Manager Agent on the worker nodes

yum install cloudera-manager-daemons cloudera-manager-agent

4.2.1 Configure the server address for Cloudera Manager Agent

In /etc/cloudera-scm-agent/config.ini, set server_host to the master node (use its private IP or its hostname):

vim /etc/cloudera-scm-agent/config.ini

Change it to:

[General]
# Hostname of the CM server.
server_host=cdh1
# Port that the CM server is listening on.
server_port=7182
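The same change has to be made on every worker node, so a scripted edit may be convenient. This is a sketch only: set_agent_server_host is a hypothetical helper name, and it assumes the file already contains an uncommented server_host= line, as in the fragment above.

```shell
#!/usr/bin/env bash
# Sketch: point the agent at the Cloudera Manager server by rewriting
# the server_host= line in config.ini.
# set_agent_server_host is a hypothetical helper, not a Cloudera tool.
set_agent_server_host() {
    local host="$1" file="${2:-/etc/cloudera-scm-agent/config.ini}"
    local tmp
    tmp="$(mktemp)"
    # Only the line starting with "server_host=" is changed;
    # server_port and comments are left as they are.
    sed "s/^server_host=.*/server_host=${host}/" "$file" > "$tmp" && cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as root on each worker node); commented out to avoid side effects:
# set_agent_server_host cdh1
```

Restart the agent after changing the file so the new address is picked up.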

4.2.2 Start Cloudera Manager Agent

service cloudera-scm-agent start

If the log shows no errors, continue to the next step.

4.3 Common errors

4.3.1 Unable to retrieve non-local address

"ScmActive-0:com.cloudera.server.cmf.components.ScmActive: ScmActive: Unable to retrieve non-local non-loopback IP address. Seeing address: cdh1/127.0.0.1."

Fix: edit /etc/hosts and comment out the line 127.0.0.1 cdh1, keeping only the mapping between the private IP and the hostname.

4.3.2 /etc/hosts is reset after a Tencent Cloud server reboots

Fix: add a shell script that writes the mapping table into /etc/hosts at startup.
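One possible shape for such a startup script is sketched below. It is an illustration, not the author's script: ensure_host_entry is a hypothetical helper name, and calling it from /etc/rc.local is an assumption. The function appends a mapping only when that exact IP and hostname pair is missing, so running it on every boot does not create duplicates.

```shell
#!/usr/bin/env bash
# Sketch: append "IP hostname" to a hosts file only if that exact
# mapping is not already present (safe to run on every boot).
# ensure_host_entry is a hypothetical helper name.
ensure_host_entry() {
    local ip="$1" name="$2" file="${3:-/etc/hosts}"
    # awk exits 0 when a line starts with this IP and lists this hostname.
    if ! awk -v ip="$ip" -v n="$name" \
        '$1 == ip { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
         END { exit !found }' "$file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$file"
    fi
}

# Example calls for this cluster (run as root, e.g. from /etc/rc.local);
# commented out so the sketch has no side effects:
# ensure_host_entry 10.0.11.15 cdh1
# ensure_host_entry 10.0.11.4  cdh2
# ensure_host_entry 10.0.11.3  cdh3
```

Note that the check is on the IP and hostname together, so a leftover 127.0.0.1 cdh1 line (see 4.3.1) does not count as a valid mapping and still has to be commented out separately.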

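5. Install CDH 5.14.3 with Cloudera Manager

5.1 Installation steps

  • Open http://masterHost:7180 (masterHost is the public IP of the server running cloudera-manager-server) to reach the admin console and begin the installation

  • On the login page, enter the username and password: admin/admin

  • On the terms page, click Agree

  • On the "Welcome to Cloudera Manager" page, choose the Cloudera Express edition

  • On the next page, the servers set up earlier should appear under "Currently Managed Hosts"; select all of them and continue

  • Choose the CDH components; the default selection is fine

  • The installation screen then appears

  • Wait for it to finish (the speed depends on how fast files are distributed over the internal network)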

5.2 Working around a slow CDH download

At this point you will find that the CDH download is extremely slow and may take dozens of hours.

The workaround is to upload the parcel manually:

  1. Upload the following three files to cdh1 (the server running Cloudera Manager Server)

    scp CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel cdh1:/home/
    scp CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 cdh1:/home/
    scp manifest.json cdh1:/home/

  2. Log in to cdh1, copy the three files into /opt/cloudera/parcel-repo, and rename CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 to CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha

    cp /home/CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel /opt/cloudera/parcel-repo/
    cp /home/CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha1 /opt/cloudera/parcel-repo/CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel.sha # very important: the .sha1 file must be renamed to .sha
    cp /home/manifest.json /opt/cloudera/parcel-repo/

  3. Set the file ownership

    cd /opt/cloudera/parcel-repo
    chown cloudera-scm:cloudera-scm ./*

  4. Drop the scm database in MySQL and recreate it

    DROP DATABASE scm;
    CREATE DATABASE scm DEFAULT CHARACTER SET utf8;
    GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm_password';

  5. Run the database preparation script again

    /usr/share/cmf/schema/scm_prepare_database.sh mysql scm scm scm_password

  6. Delete cm_guid on every server running cloudera-scm-agent

    rm -f /var/lib/cloudera-scm-agent/cm_guid

  7. Restart cloudera-scm-server and cloudera-scm-agent

    service cloudera-scm-server restart
    service cloudera-scm-agent restart

  8. Open http://masterHost:7180 again and run the installation
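After step 2 above, it can be worth confirming that the uploaded parcel is intact before restarting anything, since a corrupted multi-gigabyte upload is easy to miss. The sketch below compares the parcel's SHA-1 with the hash stored in the .sha file. verify_parcel_sha is a hypothetical helper name, and the sketch assumes the .sha file holds the hex digest as its first whitespace-separated field.

```shell
#!/usr/bin/env bash
# Sketch: compare a parcel's SHA-1 with the digest stored next to it.
# verify_parcel_sha is a hypothetical helper name; by default it looks
# for "<parcel>.sha" and reads the first field of its first line.
verify_parcel_sha() {
    local parcel="$1" sha_file="${2:-$1.sha}"
    local expected actual
    expected="$(awk 'NR == 1 { print $1 }' "$sha_file")"
    actual="$(sha1sum "$parcel" | awk '{ print $1 }')"
    # Succeeds (exit status 0) only when both digests are non-empty and equal.
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Example; commented out because the paths only exist on cdh1:
# verify_parcel_sha /opt/cloudera/parcel-repo/CDH-5.14.2-1.cdh5.14.2.p0.3-el7.parcel \
#     && echo "parcel checksum OK"
```

If the check fails, upload the parcel again rather than continuing with steps 3 to 8.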

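5.3 Adjustments after installation

After installation, the wizard moves on to the "Inspect hosts for correctness" step (see the figure in the original post).

5.3.1 Fix the swappiness warning (value too high)

Run on all nodes:

echo 10 > /proc/sys/vm/swappiness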
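Writing to /proc changes the value only until the next reboot. To keep it, the setting is usually also recorded in a sysctl configuration file. The sketch below does that idempotently; set_sysctl_entry is a hypothetical helper name, and using /etc/sysctl.conf as the default target is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: persist a kernel parameter as "key = value" in a sysctl-style
# file, replacing an existing uncommented assignment of the same key.
# set_sysctl_entry is a hypothetical helper name.
set_sysctl_entry() {
    local key="$1" value="$2" file="${3:-/etc/sysctl.conf}"
    local tmp
    tmp="$(mktemp)"
    # Keep every line whose name part (text before "=") is not this key.
    awk -F'=' -v k="$key" \
        '{ name = $1; gsub(/[[:space:]]/, "", name); if (name != k) print }' \
        "$file" > "$tmp" 2>/dev/null
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# Example (run as root on every node); commented out to avoid side effects:
# set_sysctl_entry vm.swappiness 10 && sysctl -p
```

Running sysctl -p afterwards reloads /etc/sysctl.conf, so the value applies immediately as well as after a reboot.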

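5.3.2 Fix the "transparent hugepage compaction" setting warning

Run the following two commands on all nodes and also add them to /etc/rc.local:

echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled

5.3.3 Fix the "mismatched versions across the system, which will cause failures" warning

TODO: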

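6. Cluster setup

6.1 Select Services

Choose Core Hadoop for now; more components can be added later.

6.2 Customize role assignments

Usually no changes are needed.

6.3 Configure the database connections for Hive, Hue, and Oozie

Enter the database names, users, and passwords created in 3.2.6.

6.4 Other settings

Keep the defaults.

6.5 Finish the configuration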

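7. Common problems

7.1 HDFS

7.1.1 HDFS reports "Under-replicated blocks: 744"

  • Cause: the configured replication factor does not match the number of DataNodes.

  • Explanation: dfs.replication defaults to 3, meaning every block is stored as 3 copies, but the cluster has only two DataNodes, so the blocks are under-replicated.

  • Fix:

    1. Set the target replication factor to 2

      1.1 Click Cluster -> HDFS -> Configuration

      1.2 Search for dfs.replication, set it to 2, and save the change.

    2. On the master server, change the replication factor of the existing files

      su hdfs
      hadoop fs -setrep -R 2 /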
