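People often mention CDH, whose full name is Cloudera's Distribution Including Apache Hadoop. Put simply, it is Cloudera's Hadoop platform, which packages and enhances the native Apache Hadoop components. What does CDH contain? See the figure below:
So how is CDH installed? Cloudera provides a tool for installing CDH and for managing and maintaining its components, called Cloudera Manager (CM from here on). CM itself has a master/slave architecture made up of the CM Server and the CM Agents. As you will see later, installing CM means first installing the CM Server on one host and then installing a CM Agent on every host.
What follows describes how to install CDH 5.6 with CM 5.6.
Cloudera's official documentation on installing CDH with CM describes several installation paths: A, B, and C. B and C are suitable for production. With B, you install CM by hand and then let CM install the other components automatically. With C, CM and all other components are installed by hand from tarballs. We install CM from a tarball and install all other components through CM.
Unless stated otherwise, every step is performed as the root user.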
SuSEfirewall2 stop
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Then append the contents of each machine's ~/.ssh/authorized_keys file to the end of ~/.ssh/authorized_keys on every other machine.
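Appending the files by hand tends to leave duplicate entries when the step is repeated. The merge can be sketched as below; this is only an illustration, and the function name `merge_authorized_keys` and the key strings are made-up placeholders, not anything shipped with CM or OpenSSH.

```python
def merge_authorized_keys(existing: str, incoming: str) -> str:
    """Append keys from `incoming` that are not already in `existing`.

    Both arguments are the text of an authorized_keys file.
    Blank lines are dropped and the order of existing keys is preserved.
    """
    lines = [ln for ln in existing.splitlines() if ln.strip()]
    seen = set(ln.strip() for ln in lines)
    for ln in incoming.splitlines():
        key = ln.strip()
        if key and key not in seen:
            lines.append(key)
            seen.add(key)
    return "\n".join(lines) + ("\n" if lines else "")


# Hypothetical keys for two nodes; real entries are much longer.
local = "ssh-rsa AAAA1 root@node1\n"
remote = "ssh-rsa AAAA2 root@node2\nssh-rsa AAAA1 root@node1\n"
merged = merge_authorized_keys(local, remote)
print(merged, end="")
```

Running the merge a second time with the same input leaves the file unchanged, so it is safe to repeat on every host.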
export JAVA_HOME=/path/to/java   # replace with your Java installation directory
export PATH=.:$JAVA_HOME/bin:$PATH
Make the change take effect:
source .bash_profile
rpm -e mysql --nodeps
Then install the new package:
rpm -ivh MySQL-server-5.5.28-1.linux2.6.x86_64.rpm
touch /etc/my.cnf
For its contents you can use the values recommended in the documentation:
[mysqld]
transaction-isolation = READ-COMMITTED
# Disabling symbolic-links is recommended to prevent assorted security risks;
# to do so, uncomment this line:
# symbolic-links = 0
key_buffer = 16M
key_buffer_size = 32M
max_allowed_packet = 32M
thread_stack = 256K
thread_cache_size = 64
query_cache_limit = 8M
query_cache_size = 64M
query_cache_type = 1
max_connections = 550
#expire_logs_days = 10
#max_binlog_size = 100M
#log_bin should be on a disk with enough free space. Replace '/var/lib/mysql/mysql_binary_log' with an appropriate path for your system
#and chown the specified folder to the mysql user.
log_bin=/var/lib/mysql/mysql_binary_log
# For MySQL version 5.1.8 or later. Comment out binlog_format for older versions.
binlog_format = mixed
read_buffer_size = 2M
read_rnd_buffer_size = 16M
sort_buffer_size = 8M
join_buffer_size = 8M
# InnoDB settings
innodb_file_per_table = 1
innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 64M
innodb_buffer_pool_size = 4G
innodb_thread_concurrency = 8
innodb_flush_method = O_DIRECT
innodb_log_file_size = 512M
[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
sql_mode=STRICT_ALL_TABLES
chkconfig --add mysql
service mysql start
If it fails to start, see the "Problems encountered" section later on.
/usr/bin/mysql_secure_installation
$ sudo /usr/bin/mysql_secure_installation
[…]
Enter current password for root (enter for none):
OK, successfully used password, moving on…
[…]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[…]
Disallow root login remotely? [Y/n] N
[…]
Remove test database and access to it [Y/n] Y
[…]
Reload privilege tables now? [Y/n] Y
All done!
tar zxvf mysql-connector-java-5.1.38.tar.gz
cp mysql-connector-java-5.1.38/mysql-connector-java-5.1.38-bin.jar /usr/share/java/mysql-connector-java.jar
create database hive DEFAULT CHARACTER SET utf8;
grant all on hive.* TO 'root'@'%' IDENTIFIED BY 'root';
flush privileges;
use hive;
Because CM is installed from a tarball here, the documentation for that method can be used as a reference.
tar -xzf cloudera-manager*.tar.gz
useradd --system --home=/opt/cm-5.6.0/run/cloudera-scm-server --shell=/bin/false --comment "Cloudera SCM User" cloudera-scm
mkdir /var/lib/cloudera-scm-server
mkdir /var/log/cloudera-scm-server
chown cloudera-scm:cloudera-scm /var/log/cloudera-scm-server
Configure the CM Agent
On the CM Server host, edit /opt/cm-5.6.0/etc/cloudera-scm-agent/config.ini. The only change needed is to set server_host to the hostname of the CM Server.
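If you prefer to script this edit rather than open the file in an editor, a minimal sketch follows. It assumes the file contains a line of the form `server_host=<value>`; the function name `set_server_host`, the sample file text, and the hostname are hypothetical.

```python
import re


def set_server_host(ini_text: str, hostname: str) -> str:
    """Return ini_text with the value of server_host replaced by hostname.

    Only the uncommented server_host line is touched; comments and all
    other settings are left exactly as they were.
    """
    pattern = re.compile(r"^([ \t]*server_host[ \t]*=[ \t]*).*$", re.MULTILINE)
    # A function replacement avoids backslash interpretation in hostname.
    new_text, count = pattern.subn(lambda m: m.group(1) + hostname, ini_text)
    if count == 0:
        raise ValueError("no server_host line found")
    return new_text


# Made-up excerpt of a config.ini, for illustration only.
sample = (
    "[General]\n"
    "# Hostname of the CM server.\n"
    "server_host=localhost\n"
    "server_port=7182\n"
)
updated = set_server_host(sample, "cm-server.example.com")
print(updated, end="")
```

To apply it to the real file, read /opt/cm-5.6.0/etc/cloudera-scm-agent/config.ini, pass its text through the function, and write the result back before copying the directory to the other hosts.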
Copy the entire extracted directory to every other host with scp:
scp -r /opt/cm-5.6.0 <each-host>:/opt/
mkdir -p /opt/cloudera/parcel-repo
chown cloudera-scm:cloudera-scm /opt/cloudera/parcel-repo
mkdir -p /opt/cloudera/parcels
chown cloudera-scm:cloudera-scm /opt/cloudera/parcels
/opt/cm-5.6.0/share/cmf/schema/scm_prepare_database.sh mysql scm -hlocalhost -uroot -proot --scm-host localhost scm scm scm
/opt/cm-5.6.0/etc/init.d/cloudera-scm-server start
cp /opt/cm-5.6.0/etc/init.d/cloudera-scm-server /etc/init.d/cloudera-scm-server
chkconfig cloudera-scm-server on
Edit /etc/init.d/cloudera-scm-server and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to /opt/cm-5.6.0/etc/default
/opt/cm-5.6.0/etc/init.d/cloudera-scm-agent start
cp /opt/cm-5.6.0/etc/init.d/cloudera-scm-agent /etc/init.d/cloudera-scm-agent
chkconfig cloudera-scm-agent on
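Edit /etc/init.d/cloudera-scm-agent and change the value of CMF_DEFAULTS from ${CMF_DEFAULTS:-/etc/default} to /opt/cm-5.6.0/etc/default
Note: if the CM Server host should also run a CM Agent, run the commands above on that host as well.
Once the CM Server and the CM Agents have all started successfully, we can install CDH.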
mv CDH-5.6.0-1.cdh5.6.0.p0.45-sles11.parcel.sha1 CDH-5.6.0-1.cdh5.6.0.p0.45-sles11.parcel.sha
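Before handing the parcel to CM it can be worth confirming that the download is intact. The sketch below assumes the renamed .sha file holds the hexadecimal SHA-1 digest of the parcel as its first token; the function names are hypothetical helpers, not part of CM.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parcel_matches_sha(parcel_path, sha_path):
    """Compare a parcel's SHA-1 with the first token stored in its .sha file."""
    tokens = Path(sha_path).read_text().split()
    expected = tokens[0].lower() if tokens else ""
    return sha1_of_file(parcel_path) == expected
```

For example, `parcel_matches_sha("CDH-5.6.0-1.cdh5.6.0.p0.45-sles11.parcel", "CDH-5.6.0-1.cdh5.6.0.p0.45-sles11.parcel.sha")` returns False when the parcel is truncated or corrupted, in which case it should be downloaded again.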
sysctl vm.swappiness=0
After fixing these issues, you can choose to rerun the check.
mysql_install_db --user=mysql
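Error: users missing on hosts
The agent log showed that the agent failed while creating users. Searching the code showed that users are created in the following file: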
/opt/cm-5.6.0/lib64/cmf/agent/src/cmf/parcel.py
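It contains code that calls the useradd command with a -U option. The useradd command on SUSE Linux has no such option; I do not know whether useradd on other operating systems supports it.
The fix is simple: comment out this part of line 499 in the listing that follows:
#umask_arg, umask_param,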
487     for user, data in users.items():
488       try:
489         if self.is_suse:
490           umask_arg = '-U'
491           umask_param = '022'
492         else:
493           umask_arg = '-K'
494           umask_param = 'UMASK=022'
495
496         useradd_args = [ "/usr/sbin/useradd",
497                          "-r", "-m",
498                          "-g", user,
499                          umask_arg, umask_param,
500                          "--home", data['home'],
501                          "--comment", data['longname'],
502                          "--shell", data['shell'] ]
Traceback (most recent call last):
  File "/opt/cm-5.6.0/lib64/cmf/agent/src/cmf/util.py", line 370, in source
    return dict((line.split("=", 1) for line in data.splitlines()))
ValueError: dictionary update sequence element #103 has length 1; 2 is required
Someone online posted the following fix:
This error is a bug in CM. The fix is to edit /opt/cm-5.3.0/lib64/cmf/agent/src/cmf/util.py and change this code:
pipe = subprocess.Popen(['/bin/bash', '-c', ". %s; %s; env" % (path, command)],
                        stdout=subprocess.PIPE, env=caller_env)
to:
pipe = subprocess.Popen(['/bin/bash', '-c', ". %s; %s; env | grep -v { | grep -v }" % (path, command)],
                        stdout=subprocess.PIPE, env=caller_env)
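That approach filters the output of env, but it did not help in my environment. The code simply stores the output of env in a dictionary, treating each line as key=value. If any line of the env output has only a key and no equals sign, building the dictionary fails. In the agent log I found the printed env output, and sure enough it contained a line like this: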
CLASSPATH=/usr/java/java^M/lib
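The ^M is a special character, a carriage return (\r), which splitlines() treats as a line break. As a result, /lib ends up on a line of its own with no equals sign. So the real fix is:
Correct the value of the malformed environment variable.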
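The failure is easy to reproduce outside CM, and the same idea helps locate the offending variable. The sketch below is not CM code: the sample env output is made up, and `find_malformed_env_lines` and `vars_with_line_breaks` are hypothetical helper names.

```python
def find_malformed_env_lines(env_output: str):
    """Return (line_number, line) pairs from env output that lack an '='."""
    return [
        (number, line)
        for number, line in enumerate(env_output.splitlines(), 1)
        if "=" not in line
    ]


def vars_with_line_breaks(environ):
    """Return the names of variables whose values contain \\r or \\n."""
    return sorted(
        name for name, value in environ.items()
        if "\r" in value or "\n" in value
    )


# Made-up env output with a carriage return inside CLASSPATH.
env_output = (
    "JAVA_HOME=/usr/java/java\n"
    "CLASSPATH=/usr/java/java\r/lib\n"
    "PATH=/usr/bin\n"
)

# The same expression as in the traceback: it fails on the stray "/lib" line.
try:
    dict(line.split("=", 1) for line in env_output.splitlines())
except ValueError as exc:
    print("parse failed:", exc)

print(find_malformed_env_lines(env_output))
```

Calling `vars_with_line_breaks(dict(os.environ))` in the agent's environment (after `import os`) lists the variables to clean up; such values often come from a profile script saved with Windows line endings.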