CDH 6.0.1 Installation Guide and Troubleshooting

This article is adapted from: https://blog.csdn.net/u010003835/article/details/85007946

Reference articles:

Official documentation

https://www.cloudera.com/documentation/enterprise/6/6.0/topics/installation.html

 

Common CDH problems and solutions (to be continued)

https://blog.csdn.net/qq_24699959/article/details/80310070


CDH 6.0.1 cluster setup 「Process」

http://www.cnblogs.com/piperck/p/9944469.html

 

Disabling the firewall on CentOS 7

https://blog.csdn.net/Post_Yuan/article/details/78603212

 

This article explains how to install CDH 6.0.1. Configuring CDH took me nearly a week.

First, a word of emphasis: the main steps below follow the official installation guide. The official documentation is the best resource; the many blog posts online are only supplementary references — keep that firmly in mind.

 

Key documents:

Pre-installation requirements

https://www.cloudera.com/documentation/enterprise/6/latest/topics/installation_reqts.html

 

Steps for installing CM and CDH

https://www.cloudera.com/documentation/enterprise/6/latest/topics/install_cm_cdh.html

 

Other documents:

 

CDH5 in practice (2): Installing Hive, HBase, Impala, Spark, and other services with Cloudera Manager 5

https://blog.csdn.net/csolo/article/details/52506718

 

After installing CDH 2.2, the page shows "Unable to issue query: the Host Monitor is not running"

http://www.aboutyun.com/thread-12751-1-1.html

 

How to check whether a Linux system is 32-bit or 64-bit

https://blog.csdn.net/qq_33326449/article/details/79712249

 
  1. Run the command uname -a and press Enter.

  2. If the output contains x86_64, the system is 64-bit; otherwise it is 32-bit.

 

Getting "No package python-pip available" when installing pip for Python

https://blog.csdn.net/u011418530/article/details/79986251

 

Special thanks to Brother Jun and Brother Yang for their help during the installation.

 

0. Installation Environment

  0.1 Installation environment

1. Pre-installation Preparation

  1.1 Operating system setup

    1.1.1 Configure static IPs and hostname resolution

    1.1.2 Disable SELinux

    1.1.3 Disable the firewall

    1.1.4 Configure NTP time synchronization

    1.1.5 Configure passwordless SSH across the cluster

2. Installation

  2.1 Step 1: Install the JDK on Linux

  2.2 Step 2: Configure the repo

  2.3 Step 3: Install the services

  2.4 Step 4: Update each node's configuration and start cloudera-scm-agent

  2.5 Step 5: Install and configure MySQL

  2.6 Step 6: Set up the CM and service databases on MySQL; add the mysql-connector JAR

  2.7 Step 7: Initialize the database and start the services

  2.8 Step 8: Parcel installation

3. Common Problems in Detail

  3.1 Resuming an interrupted installation

  3.2 Inspector checks report errors and warnings

  3.3 Where are the logs?

4. Ways to Speed Up the Installation

  4.1 Start an httpd service and install from offline parcel packages

  4.2 Create a local yum repository and install from offline rpm packages

 

======================================================================

 

0. Installation Environment

  0.1 Installation environment

This setup was done on CentOS 7.

Linux version: CentOS 7

CDH version: CM 6.0.1

 

1. Pre-installation Preparation

  1.1 Operating system setup

    1.1.1 Configure static IPs and hostname resolution

For this topic, see the following two articles:

CentOS 6

Three ways to configure a static IP on CentOS 6

https://blog.csdn.net/u010003835/article/details/52233296

CentOS 7

Configuring a static IP for CentOS 7 under VMware

https://blog.csdn.net/u010003835/article/details/84791773

 

    1.1.2 Disable SELinux

During installation, SELinux must be disabled on every node of the cluster.

Check the SELinux status:

[root@cdh-node2 ~]# /usr/sbin/sestatus -v
SELinux status: disabled

To change it, edit the config file:

vim /etc/selinux/config

and change SELINUX=enforcing to SELINUX=disabled.
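The same edit can be scripted. A minimal sketch, run here against a demo copy of the file (on a real node you would target /etc/selinux/config itself, and the change takes full effect after a reboot):

```shell
# Work on a demo copy; the real file is /etc/selinux/config.
cat > /tmp/selinux_config_demo <<'EOF'
# This file controls the state of SELinux on the system.
SELINUX=enforcing
SELINUXTYPE=targeted
EOF

# Flip enforcing -> disabled in place.
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /tmp/selinux_config_demo

grep '^SELINUX=' /tmp/selinux_config_demo
```

For the current boot, setenforce 0 additionally switches SELinux to permissive mode without a restart.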

 

    1.1.3 Disable the firewall

For how to disable the firewall on CentOS 7 and check its status, I recommend the following article:

 

Disabling the firewall on CentOS 7

https://blog.csdn.net/Post_Yuan/article/details/78603212

 

The key commands (managing the firewalld service):

1. Install firewalld:

yum -y install firewalld

2. Enable/disable the service at boot:

systemctl enable/disable firewalld

3. Start/stop the service:

systemctl start/stop firewalld

4. Check the service status:

systemctl status firewalld

 

    1.1.4 Configure NTP time synchronization

To keep services consistent across the cluster, the nodes' clocks must be synchronized. You can follow my earlier article:

Linux NTP network synchronization -> ntpd / CDH6

https://blog.csdn.net/u010003835/article/details/84962098

 

    1.1.5 Configure passwordless SSH across the cluster

See my blog post:

https://blog.csdn.net/u010003835/article/details/85006786
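In outline, the setup is a passwordless key pair on the main node plus ssh-copy-id to every other node. A minimal sketch — the hostnames cdh-node1..3 and the /tmp demo key path are placeholders; on a real node use the default ~/.ssh/id_rsa and your own host list:

```shell
# Generate a passwordless RSA key pair (demo path; use ~/.ssh/id_rsa on a real node).
ssh-keygen -t rsa -N '' -q -f /tmp/demo_id_rsa

ls -l /tmp/demo_id_rsa /tmp/demo_id_rsa.pub

# Then push the public key to each node (hostnames are placeholders):
#   for host in cdh-node1 cdh-node2 cdh-node3; do
#       ssh-copy-id -i /tmp/demo_id_rsa.pub root@$host
#   done
```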

 

 

2. Installation

  2.1 Step 1: Install Java

For installing the JDK on Linux, see:

Installing and configuring Java on Linux

http://www.cnblogs.com/caosiyang/archive/2013/03/14/2959087.html

 

 

  2.2 Step 2: Configure the repo

For this step, consider using the local yum repository approach instead (see section 4.2)!

First we need to download the repo file — the configuration file yum uses to manage packages.

 

Download address:

Step 1:

https://archive.cloudera.com/cm6/6.0.1/redhat7/yum/cloudera-manager.repo

Download it with the following command, substituting the address above for repo_file_url:

sudo wget <repo_file_url> -P /etc/yum.repos.d/

(-P LOCAL_DIR saves all downloaded files into the given directory.)

 

Step 2:

Next, import the repository signing GPG key:

sudo rpm --import https://archive.cloudera.com/cm6/6.0.1/redhat7/yum/RPM-GPG-KEY-cloudera

 

 

  2.3 Step 3: Install the services

This step installs cloudera-manager-daemons, cloudera-manager-agent, and cloudera-manager-server.

cloudera-manager-daemons and cloudera-manager-agent must be installed on every machine; cloudera-manager-server only needs to be installed on the machine that will serve as the management node.

First, check that the packages can be found by running:

yum search cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server

 

Here we install with yum.

Install commands:

Management node:

yum install cloudera-manager-daemons cloudera-manager-agent cloudera-manager-server

Other nodes:

yum install cloudera-manager-daemons cloudera-manager-agent

Finally, I recommend enabling these services at boot, so that when the cluster restarts you don't have to worry about them:

Management node:

sudo systemctl enable cloudera-scm-agent
sudo systemctl enable cloudera-scm-server

Other nodes:

sudo systemctl enable cloudera-scm-agent

 

  2.4 Step 4: Update each node's configuration and start cloudera-scm-agent

Configure the agent on every node so that each one points at the management node.

The main step is editing /etc/cloudera-scm-agent/config.ini:

Property     Description
server_host  Name of the host where Cloudera Manager Server is running.
server_port  Port on the host where Cloudera Manager Server is running.

 

[General]
# Hostname of the CM server.
#server_host=localhost
server_host=cdh-manager

# Port that the CM server is listening on.
server_port=7182
 

 

The full file looks roughly as follows; adjust it for your own machines:

 
[General]
# Hostname of the CM server.
#server_host=localhost
server_host=cdh-manager

# Port that the CM server is listening on.
server_port=7182

## It should not normally be necessary to modify these.
# Port that the CM agent should listen on.
# listening_port=9000

# IP Address that the CM agent should listen on.
# listening_ip=

# Hostname that the CM agent reports as its hostname. If unset, will be
# obtained in code through something like this:
#
# python -c 'import socket; \
# print socket.getfqdn(), \
# socket.gethostbyname(socket.getfqdn())'
#
# listening_hostname=

# An alternate hostname to report as the hostname for this host in CM.
# Useful when this agent is behind a load balancer or proxy and all
# inbound communication must connect through that proxy.
# reported_hostname=

# Port that supervisord should listen on.
# NB: This only takes effect if supervisord is restarted.
# supervisord_port=19001

# Log file. The supervisord log file will be placed into
# the same directory. Note that if the agent is being started via the
# init.d script, /var/log/cloudera-scm-agent/cloudera-scm-agent.out will
# also have a small amount of output (from before logging is initialized).
# log_file=/var/log/cloudera-scm-agent/cloudera-scm-agent.log

# Persistent state directory. Directory to store CM agent state that
# persists across instances of the agent process and system reboots.
# Particularly, the agent's UUID is stored here.
# lib_dir=/var/lib/cloudera-scm-agent

# Parcel directory. Unpacked parcels will be stored in this directory.
# Downloaded parcels will be stored in /../parcel-cache
# parcel_dir=/opt/cloudera/parcels

# Enable supervisord event monitoring. Used in eager heartbeating, amongst
# other things.
# enable_supervisord_events=true

# Maximum time to wait (in seconds) for all metric collectors to finish
# collecting data.
max_collection_wait_seconds=10.0

# Maximum time to wait (in seconds) when connecting to a local role's
# webserver to fetch metrics.
metrics_url_timeout_seconds=30.0

# Maximum time to wait (in seconds) when connecting to a local TaskTracker
# to fetch task attempt data.
task_metrics_timeout_seconds=5.0

# The list of non-device (nodev) filesystem types which will be monitored.
monitored_nodev_filesystem_types=nfs,nfs4,tmpfs

# The list of filesystem types which are considered local for monitoring purposes.
# These filesystems are combined with the other local filesystem types found in
# /proc/filesystems
local_filesystem_whitelist=ext2,ext3,ext4,xfs

# The largest size impala profile log bundle that this agent will serve to the
# CM server. If the CM server requests more than this amount, the bundle will
# be limited to this size. All instances of this limit being hit are logged to
# the agent log.
impala_profile_bundle_max_bytes=1073741824

# The largest size stacks log bundle that this agent will serve to the CM
# server. If the CM server requests more than this amount, the bundle will be
# limited to this size. All instances of this limit being hit are logged to the
# agent log.
stacks_log_bundle_max_bytes=1073741824

# The size to which the uncompressed portion of a stacks log can grow before it
# is rotated. The log will then be compressed during rotation.
stacks_log_max_uncompressed_file_size_bytes=5242880

# The orphan process directory staleness threshold. If a directory is more stale
# than this amount of seconds, CM agent will remove it.
orphan_process_dir_staleness_threshold=5184000

# The orphan process directory refresh interval. The CM agent will check the
# staleness of the orphan processes config directory every this amount of
# seconds.
orphan_process_dir_refresh_interval=3600

# A knob to control the agent logging level. The options are listed as follows:
# 1) DEBUG (set the agent logging level to 'logging.DEBUG')
# 2) INFO (set the agent logging level to 'logging.INFO')
scm_debug=INFO

# The DNS resolution collection interval in seconds. A java base test program
# will be executed with at most this frequency to collect java DNS resolution
# metrics. The test program is only executed if the associated health test,
# Host DNS Resolution, is enabled.
dns_resolution_collection_interval_seconds=60

# The maximum time to wait (in seconds) for the java test program to collect
# java DNS resolution metrics.
dns_resolution_collection_timeout_seconds=30

# The directory location in which the agent-wide kerberos credential cache
# will be created.
# agent_wide_credential_cache_location=/var/run/cloudera-scm-agent

[Security]
# Use TLS and certificate validation when connecting to the CM server.
use_tls=0

# The maximum allowed depth of the certificate chain returned by the peer.
# The default value of 9 matches the default specified in openssl's
# SSL_CTX_set_verify.
max_cert_depth=9

# A file of CA certificates in PEM format. The file can contain several CA
# certificates identified by
#
# -----BEGIN CERTIFICATE-----
# ... (CA certificate in base64 encoding) ...
# -----END CERTIFICATE-----
#
# sequences. Before, between, and after the certificates text is allowed which
# can be used e.g. for descriptions of the certificates.
#
# The file is loaded once, the first time an HTTPS connection is attempted. A
# restart of the agent is required to pick up changes to the file.
#
# Note that if neither verify_cert_file or verify_cert_dir is set, certificate
# verification will not be performed.
# verify_cert_file=

# Directory containing CA certificates in PEM format. The files each contain one
# CA certificate. The files are looked up by the CA subject name hash value,
# which must hence be available. If more than one CA certificate with the same
# name hash value exist, the extension must be different (e.g. 9d66eef0.0,
# 9d66eef0.1 etc). The search is performed in the ordering of the extension
# number, regardless of other properties of the certificates. Use the c_rehash
# utility to create the necessary links.
#
# The certificates in the directory are only looked up when required, e.g. when
# building the certificate chain or when actually performing the verification
# of a peer certificate. The contents of the directory can thus be changed
# without an agent restart.
#
# When looking up CA certificates, the verify_cert_file is first searched, then
# those in the directory. Certificate matching is done based on the subject name,
# the key identifier (if present), and the serial number as taken from the
# certificate to be verified. If these data do not match, the next certificate
# will be tried. If a first certificate matching the parameters is found, the
# verification process will be performed; no other certificates for the same
# parameters will be searched in case of failure.
#
# Note that if neither verify_cert_file or verify_cert_dir is set, certificate
# verification will not be performed.
# verify_cert_dir=

# PEM file containing client private key.
# client_key_file=

# A command to run which returns the client private key password on stdout
# client_keypw_cmd=

# If client_keypw_cmd isn't specified, instead a text file containing
# the client private key password can be used.
# client_keypw_file=

# PEM file containing client certificate.
# client_cert_file=

## Location of Hadoop files. These are the CDH locations when installed by
## packages. Unused when CDH is installed by parcels.
[Hadoop]
#cdh_crunch_home=/usr/lib/crunch
#cdh_flume_home=/usr/lib/flume-ng
#cdh_hadoop_bin=/usr/bin/hadoop
#cdh_hadoop_home=/usr/lib/hadoop
#cdh_hbase_home=/usr/lib/hbase
#cdh_hbase_indexer_home=/usr/lib/hbase-solr
#cdh_hcat_home=/usr/lib/hive-hcatalog
#cdh_hdfs_home=/usr/lib/hadoop-hdfs
#cdh_hive_home=/usr/lib/hive
#cdh_httpfs_home=/usr/lib/hadoop-httpfs
#cdh_hue_home=/usr/share/hue
#cdh_hue_plugins_home=/usr/lib/hadoop
#cdh_impala_home=/usr/lib/impala
#cdh_kudu_home=/usr/lib/kudu
#cdh_llama_home=/usr/lib/llama
#cdh_mr1_home=/usr/lib/hadoop-0.20-mapreduce
#cdh_mr2_home=/usr/lib/hadoop-mapreduce
#cdh_oozie_home=/usr/lib/oozie
#cdh_parquet_home=/usr/lib/parquet
#cdh_pig_home=/usr/lib/pig
#cdh_solr_home=/usr/lib/solr
#cdh_spark_home=/usr/lib/spark
#cdh_sqoop_home=/usr/lib/sqoop
#cdh_sqoop2_home=/usr/lib/sqoop2
#cdh_yarn_home=/usr/lib/hadoop-yarn
#cdh_zookeeper_home=/usr/lib/zookeeper
#hive_default_xml=/etc/hive/conf.dist/hive-default.xml
#webhcat_default_xml=/etc/hive-webhcat/conf.dist/webhcat-default.xml
#jsvc_home=/usr/libexec/bigtop-utils
#tomcat_home=/usr/lib/bigtop-tomcat
#oracle_home=/usr/share/oracle/instantclient

## Location of Cloudera Management Services files.
[Cloudera]
#mgmt_home=/usr/share/cmf

## Location of JDBC Drivers.
[JDBC]
#cloudera_mysql_connector_jar=/usr/share/java/mysql-connector-java.jar
#cloudera_oracle_connector_jar=/usr/share/java/oracle-connector-java.jar
#By default, postgres jar is found dynamically in $MGMT_HOME/lib
#cloudera_postgresql_jdbc_jar=

For more configuration options, see:

 

https://www.cloudera.com/documentation/enterprise/6/latest/topics/cm_ag_agent_config.html#cmug_topic_5_16__section_kw3_5rq_wm
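In practice the only line that usually needs changing is server_host. A minimal sketch of scripting that edit, run here against a demo copy (the real file is /etc/cloudera-scm-agent/config.ini, and cdh-manager is this article's CM hostname):

```shell
# Demo copy; on a real node target /etc/cloudera-scm-agent/config.ini.
cfg=/tmp/scm_agent_config_demo.ini
printf '[General]\nserver_host=localhost\nserver_port=7182\n' > "$cfg"

# Point the agent at the CM server host.
sed -i 's/^server_host=.*/server_host=cdh-manager/' "$cfg"

grep '^server_' "$cfg"
```

After editing the file, start the agent on each node with systemctl start cloudera-scm-agent.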

 

  2.5 Step 5: Install and configure MySQL (management node only)

First, install MySQL. I followed the official guide:

https://www.cloudera.com/documentation/enterprise/6/6.0/topics/cm_ig_mysql.html#cmig_topic_5_5

 

wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
sudo rpm -ivh mysql-community-release-el7-5.noarch.rpm
sudo yum update        # optional; this can take 20-30 minutes
sudo yum install mysql-server
sudo systemctl start mysqld

 

Check that the installation works.

Start the service:

sudo systemctl start mysqld

Then run:

mysql

and see whether you can connect with an empty password.

 

 

Next, edit the configuration file (stop the MySQL service first).

The config file lives at /etc/my.cnf (the default location).

Back it up, then replace its contents with the following:

 
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
symbolic-links = 0

key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1

max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M

#log_bin should be on a disk with enough free space.
#Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your
#system and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log

#In later versions of MySQL, if you enable the binary log and do not set
#a server_id, MySQL will not start. The server_id must be unique within
#the replicating group.
server_id=1

binlog_format = mixed

read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M

# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid

sql_mode=STRICT_ALL_TABLES

For what these settings mean, see the document linked at the top of this section.

 

 

Next, move the old InnoDB log files out of the data directory:

Move the old InnoDB log files /var/lib/mysql/ib_logfile0 and /var/lib/mysql/ib_logfile1 out of /var/lib/mysql/ to a backup location.
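Concretely, that move looks like the sketch below, demonstrated here on placeholder files (the backup directory name is my choice; on a real node the source files live in /var/lib/mysql and the MySQL service must be stopped first):

```shell
# Demo with placeholder files; real paths are /var/lib/mysql/ib_logfile0 and ib_logfile1.
mkdir -p /tmp/mysql_demo /tmp/mysql_backup
touch /tmp/mysql_demo/ib_logfile0 /tmp/mysql_demo/ib_logfile1

# The actual operation: move the old redo logs to a backup location.
mv /tmp/mysql_demo/ib_logfile0 /tmp/mysql_demo/ib_logfile1 /tmp/mysql_backup/

ls /tmp/mysql_backup
```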

 

 

Then start MySQL again. To keep the default MySQL accounts from being abused, run the following tool:

sudo /usr/bin/mysql_secure_installation

[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] Y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!

 

Finally, don't forget to add MySQL to the boot services:

sudo systemctl enable mysqld

 

 

  2.6 Step 6: Set up the CM and service databases on MySQL; add the mysql-connector JAR

We need to create the databases and users that the other components will use, with MySQL as their backing database.

Databases for Cloudera Software

Service                             Database   User
Cloudera Manager Server             scm        scm
Activity Monitor                    amon       amon
Reports Manager                     rman       rman
Hue                                 hue        hue
Hive Metastore Server               metastore  hive
Sentry Server                       sentry     sentry
Cloudera Navigator Audit Server     nav        nav
Cloudera Navigator Metadata Server  navms      navms
Oozie                               oozie      oozie

Log in to MySQL and run the following statements; here every database user gets the password 123456:

CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE DATABASE amon DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE DATABASE rman DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE DATABASE hue DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE DATABASE metastore DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE DATABASE sentry DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE DATABASE nav DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE DATABASE navms DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
CREATE DATABASE oozie DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;

 

GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY '123456';
GRANT ALL ON amon.* TO 'amon'@'%' IDENTIFIED BY '123456';
GRANT ALL ON rman.* TO 'rman'@'%' IDENTIFIED BY '123456';
GRANT ALL ON hue.* TO 'hue'@'%' IDENTIFIED BY '123456';
GRANT ALL ON metastore.* TO 'metastore'@'%' IDENTIFIED BY '123456';
GRANT ALL ON sentry.* TO 'sentry'@'%' IDENTIFIED BY '123456';
GRANT ALL ON nav.* TO 'nav'@'%' IDENTIFIED BY '123456';
GRANT ALL ON navms.* TO 'navms'@'%' IDENTIFIED BY '123456';
GRANT ALL ON oozie.* TO 'oozie'@'%' IDENTIFIED BY '123456';

# Flush the privileges:

FLUSH PRIVILEGES;
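The statement list above is mechanical, so it can also be generated. A small sketch that writes the same CREATE/GRANT statements to a file (the db:user pairs mirror the GRANT statements above; 123456 is the password used throughout this article):

```shell
# Generate the CREATE DATABASE / GRANT statements for every service database.
# Each entry is <database>:<user>, matching the statements in the text.
for pair in scm:scm amon:amon rman:rman hue:hue metastore:metastore \
            sentry:sentry nav:nav navms:navms oozie:oozie; do
    db=${pair%%:*}
    user=${pair##*:}
    echo "CREATE DATABASE $db DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;"
    echo "GRANT ALL ON $db.* TO '$user'@'%' IDENTIFIED BY '123456';"
done > /tmp/cdh_db_init.sql
echo "FLUSH PRIVILEGES;" >> /tmp/cdh_db_init.sql

cat /tmp/cdh_db_init.sql
```

Feed the result to MySQL with mysql -u root -p < /tmp/cdh_db_init.sql.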

 

Two situations where FLUSH PRIVILEGES is required:

1. After changing a password.

2. After granting superuser privileges.

 

Other checks:

Confirm that you have created all of the databases:

SHOW DATABASES;

You can also confirm the privilege grants for a given user by running:

SHOW GRANTS FOR '<user>'@'%';

 

Beyond that, we also need to install the MySQL JDBC driver.

The steps are as follows:

 

1. Download the MySQL JDBC driver from http://www.mysql.com/downloads/connector/j/5.1.html (in .tar.gz format). As of the time of writing, you can download version 5.1.46 using wget as follows:

   wget https://dev.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.46.tar.gz

2. Extract the JDBC driver JAR file from the downloaded file. For example:

   tar zxvf mysql-connector-java-5.1.46.tar.gz

3. Copy the JDBC driver, renamed, to /usr/share/java/. If the target directory does not yet exist, create it. For example:

   sudo mkdir -p /usr/share/java/
   cd mysql-connector-java-5.1.46
   sudo cp mysql-connector-java-5.1.46-bin.jar /usr/share/java/mysql-connector-java.jar

  2.7 Step 7: Initialize the database and start the services

Because CM uses MySQL as its backing database, we need to initialize that database.

The syntax for the scm_prepare_database.sh script is as follows:

sudo /opt/cloudera/cm/schema/scm_prepare_database.sh [options] <databaseType> <databaseName> <databaseUser>

For example:

sudo /opt/cloudera/cm/schema/scm_prepare_database.sh mysql -h db01.example.com --scm-host cm01.example.com scm scm

Enter database password:

Once that is done, we can start CM.

Start the server:

sudo systemctl start cloudera-scm-server

Enable it at boot:

sudo systemctl enable cloudera-scm-server

 

  2.8 Step 8: Parcel installation

If all the steps above went smoothly, we can now reach port 7180 on the CM host and run the initial setup wizard.

The initial setup screens look as follows:

We move on to the next step. If the agents are configured correctly, the wizard can scan and find the machines in the cluster. See section 2.4:

Configure the agent on every node so that each one points at the management node, by editing /etc/cloudera-scm-agent/config.ini:

[General]
# Hostname of the CM server.
#server_host=localhost
server_host=cdh-manager

# Port that the CM server is listening on.
server_port=7182

Then we choose the parcel 6.0.1 network installation. From inside China this takes roughly 8-10 hours, so you may prefer to install from offline packages instead — see chapter 4 on speeding up the installation.

At the end of the installation, log in to CM again to reach the management console.

 

 

3. Common Problems in Detail

  3.1 Resuming an interrupted installation

Why it happens:

After the parcel package is downloaded and installed automatically, no services have been assigned yet.

Step 1:

In the top status bar, choose to continue to the next installation step.

Step 2:

Assign the services.

 

 

 

 

  3.2 Inspector checks report errors and warnings

Problem: when CDH creates its MySQL databases, it can find the server but not the database.
After creating a new database, run the following statement to fix the access privileges:

mysql -uroot -p1234 -e"GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY '1234' WITH GRANT OPTION;FLUSH PRIVILEGES;"

===============================

 

Cloudera recommends setting /proc/sys/vm/swappiness to at most 10. The current setting is 30.

echo "vm.swappiness=10" >> /etc/sysctl.conf

(Run sysctl -p afterwards to apply the setting without a reboot.)

===========================


Transparent hugepage compaction is enabled, which can cause serious performance problems.

echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> /etc/rc.local
echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.local
chmod +x /etc/rc.d/rc.local


===================


CDH services die after the machine auto-starts.
Causes: wrong IP address mapping, or the MySQL service was not started.

 

========================

NodeManager fails to start when installing YARN through CDH:
Error starting NodeManager
org.apache.hadoop.service.ServiceStateException: org.fusesource.leveldbjni.internal.NativeDB$DBException: IO error: /var/lib/hadoop-yarn/yarn-nm-recovery/yarn-nm-state/LOCK: permission denied

Option 1: fix the directory permissions:

chmod -R 755 /var/lib/hadoop-yarn/

Option 2: delete the corresponding subdirectory and retry the CDH "add service" operation.

Host memory usage exceeds the CDH threshold:
Adjust the host configuration parameters; the default threshold is 0.8, with the remaining memory reserved for the system (this can be tuned).
 

 

  3.3 Where are the logs?

The runtime logs live under:

/var/log

The directories there whose names start with cloudera are the relevant ones; the most important is cloudera-scm-server.

 

 

4. Ways to Speed Up the Installation

  4.1 Start an httpd service and install from offline parcel packages

Here we use httpd to serve the files over the network.

Httpd

CentOS 7 does not install httpd by default, so first install the service:

     yum install -y httpd

Start the httpd service:

    systemctl start httpd

Enable it at boot:

   systemctl enable httpd

 

Once it is started, browse to the server's address to check that the service is up. httpd listens on port 80 by default.

Note httpd's conventions: /var/www/html is its document root.

If we place the files under /var/www/html/cdh6/parcels/6.0.1, they can be reached at:

 http://192.168.75.134/cdh6/parcels/6.0.1/

 

==================================

 

Here we download the parcel for our operating system from the official site, then serve it locally to speed up the installation.

The 6.0.1 parcels are available at:

https://archive.cloudera.com/cdh6/6.0.1/parcels/

Note:

It is best to download manifest.json and the sha256 checksum file along with the parcel.
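Putting the pieces together, the served directory ends up looking like the sketch below, demonstrated here under /tmp (on the real server the root is /var/www/html, and the parcel filename shown is only illustrative):

```shell
# Demo layout under /tmp; on the real server use /var/www/html as the root.
root=/tmp/www_html_demo
mkdir -p "$root/cdh6/parcels/6.0.1"

# The downloaded parcel, its sha256 checksum, and manifest.json all go in the
# same directory (the parcel filename below is illustrative).
touch "$root/cdh6/parcels/6.0.1/CDH-6.0.1-el7.parcel"
touch "$root/cdh6/parcels/6.0.1/CDH-6.0.1-el7.parcel.sha256"
touch "$root/cdh6/parcels/6.0.1/manifest.json"

ls "$root/cdh6/parcels/6.0.1"
```

In the CM setup wizard, point the parcel repository URL at http://<your-httpd-host>/cdh6/parcels/6.0.1/ so the agents download from the local server instead of the Cloudera archive.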
 

 

 

  4.2 Create a local yum repository and install from offline rpm packages

 

Step 1: set up the httpd service.

This reuses the httpd service from section 4.1; if you are not familiar with it, read that section first.

Once it is set up, start httpd.

 

Step 2: download the required packages.

Download the rpm packages for our operating system from the official site:

https://archive.cloudera.com/cm6/6.0.1/redhat7/yum/RPMS/x86_64/

Download all the rpms in that directory, then place them under:

 /var/www/html/yum/centos7/repo/

 

Step 3: create the local repo.

First, generate the repository metadata (the dependency information) for the downloaded rpms by running:

 cd /var/www/html/yum/centos7/repo/

createrepo .

(If the createrepo tool is missing, install it first with yum install -y createrepo.)

Afterwards, a repodata directory will have been generated.

 

 

Besides that, we need to write a .repo file describing the repository.

First, switch to the yum repository configuration directory:

cd /etc/yum.repos.d/

Create a new file named localCDH.repo with the following contents:

 
[localCDHRepo]
name=local_CDH_Repo
baseurl=http://192.168.75.134/yum/centos7/repo/
enabled=true
gpgcheck=false

 

   

Finally, we run:

yum makecache

[root@cdh-repo yum.repos.d]# yum makecache
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: mirrors.tuna.tsinghua.edu.cn
 * extras: mirrors.neusoft.edu.cn
 * updates: mirrors.tuna.tsinghua.edu.cn
base | 3.6 kB 00:00:00
epel | 3.2 kB 00:00:00
extras | 3.4 kB 00:00:00
localCDHRepo | 2.9 kB 00:00:00
updates | 3.4 kB 00:00:00
Metadata Cache Created

This refreshes the local cache and loads our newly created repository.

 

Let's verify:

[root@cdh-repo yum.repos.d]# yum search cloudera
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: mirrors.tuna.tsinghua.edu.cn
 * extras: mirrors.neusoft.edu.cn
 * updates: mirrors.tuna.tsinghua.edu.cn
========================================================================== N/S matched: cloudera ===========================================================================
cloudera-manager-agent.x86_64 : The Cloudera Manager Agent
cloudera-manager-server.x86_64 : The Cloudera Manager Server
cloudera-manager-server-db-2.x86_64 : Embedded database for the Cloudera Manager Server
cloudera-manager-daemons.x86_64 : Provides daemons for monitoring Hadoop and related tools.

Name and summary matches only, use "search all" for everything.
[root@cdh-repo yum.repos.d]#

The corresponding packages can now be found.

 

 

Afterwards, copy localCDH.repo into /etc/yum.repos.d/ on the other machines so they can also reach these rpms through httpd.

 
