Introduction to CDH
In the Hadoop ecosystem, Cloudera is the largest and best-known vendor, and many companies in China now use its distribution, CDH.
CDH packages Apache Hadoop and its ecosystem into an integrated distribution that can be installed automatically with Cloudera Manager. CDH comes in an enterprise edition and a free edition; the free edition provides:
1. Available components
HDFS, MapReduce, Hive, Hue, Impala, Oozie, Sqoop, ZooKeeper, HBase
2. Cluster-management features
Start, stop, and add nodes; create two kinds of users, administrator (full access) and non-administrator (read-only); monitor component status and edit configuration; adjust logging settings.
Environment
Three hosts: dev-cdh1, dev-cdh2, dev-cdh3
OS: CentOS 6.5, 64-bit
JDK: Oracle JDK 1.7.0_65
Configure /etc/hosts entries
vi /etc/hosts

# hadoop (hdfs + hbase)
10.57.1.116 dev-cdh1
10.57.1.120 dev-cdh2
10.57.1.148 dev-cdh3
Install ZooKeeper (cluster mode)
Node Type: dev-cdh1, dev-cdh2, dev-cdh3
1. Install zookeeper and zookeeper-server on all nodes
yum install -y zookeeper zookeeper-server
2. Edit the ZooKeeper configuration file on all nodes
vi /etc/zookeeper/conf/zoo.cfg
Add the server entries:
server.1=dev-cdh1:2888:3888
server.2=dev-cdh2:2888:3888
server.3=dev-cdh3:2888:3888
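The three server entries follow a fixed pattern, so they can be generated rather than typed by hand; a small sketch using the hostnames from this guide (2888 is the peer port, 3888 the leader-election port):

```shell
# Emit the server entries for zoo.cfg; append the output to
# /etc/zookeeper/conf/zoo.cfg on every node.
zk_server_lines() {
  for i in 1 2 3; do
    echo "server.$i=dev-cdh$i:2888:3888"
  done
}
zk_server_lines
```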
3. Initialize zookeeper-server on all nodes
The myid must be unique on each node:
dev-cdh1: service zookeeper-server init --myid=1
dev-cdh2: service zookeeper-server init --myid=2
dev-cdh3: service zookeeper-server init --myid=3
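Since each node needs a distinct myid, one way to push the same init command to every host is to derive the id from the hostname. A hedged sketch (`myid_for_host` is a hypothetical helper, not part of the ZooKeeper packages; the ids must match the `server.N` entries in zoo.cfg):

```shell
# Map a hostname to its ZooKeeper myid.
myid_for_host() {
  case "$1" in
    dev-cdh1) echo 1 ;;
    dev-cdh2) echo 2 ;;
    dev-cdh3) echo 3 ;;
    *) echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# On each node one would then run:
#   service zookeeper-server init --myid="$(myid_for_host "$(hostname -s)")"
myid_for_host dev-cdh2   # -> 2
```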
4. Start ZooKeeper on all nodes
service zookeeper-server start
5. Check ZooKeeper status
zookeeper-server status
Install CDH (cluster mode, HDFS + YARN)
Node Type:
namenode: dev-cdh1
datanode: dev-cdh1, dev-cdh2, dev-cdh3
yarn-resourcemanager: dev-cdh2
yarn-nodemanager: dev-cdh1, dev-cdh2, dev-cdh3
mapreduce-historyserver: dev-cdh3
yarn-proxyserver: dev-cdh3
node1: yum install hadoop-hdfs-namenode
node2: yum install hadoop-yarn-resourcemanager
node3: yum install hadoop-mapreduce-historyserver hadoop-yarn-proxyserver
All nodes:
yum install hadoop-client
yum install hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce
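Every node installs the common packages plus a role-specific set, so the per-node commands can be generated from the layout above. A sketch that only prints the commands (the role mapping mirrors this guide; nothing is installed):

```shell
# Role-specific packages per node, from the layout above.
role_packages() {
  case "$1" in
    dev-cdh1) echo "hadoop-hdfs-namenode" ;;
    dev-cdh2) echo "hadoop-yarn-resourcemanager" ;;
    dev-cdh3) echo "hadoop-mapreduce-historyserver hadoop-yarn-proxyserver" ;;
  esac
}

# Print (not run) the full yum command for each node.
for h in dev-cdh1 dev-cdh2 dev-cdh3; do
  echo "$h: yum install -y hadoop-client hadoop-yarn-nodemanager hadoop-hdfs-datanode hadoop-mapreduce $(role_packages "$h")"
done
```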
Deploy CDH
1. Deploy HDFS
(1) Configuration files
core-site.xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://dev-cdh1:8020</value>
</property>
<property>
  <name>fs.trash.interval</name>
  <value>1440</value>
</property>
hdfs-site.xml
<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hadoop</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/hadoop/hdfs/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/hadoop/hdfs/datanode</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
slaves
dev-cdh1
dev-cdh2
dev-cdh3
(2) Create the namenode and datanode directories
namenode:
mkdir -p /hadoop/hdfs/namenode
chown -R hdfs:hdfs /hadoop/hdfs/namenode
chmod 700 /hadoop/hdfs/namenode
datanode:
mkdir -p /hadoop/hdfs/datanode
chown -R hdfs:hdfs /hadoop/hdfs/datanode
chmod 700 /hadoop/hdfs/datanode
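The namenode and datanode directories are created the same way, so the three steps can be wrapped in one helper. A hedged sketch (`make_storage_dir` is a hypothetical name; on the cluster the owner is hdfs:hdfs and the paths are the ones above):

```shell
# Create a storage directory with the ownership and 700 mode used above.
make_storage_dir() {
  dir="$1"; owner="$2"
  mkdir -p "$dir"
  chown -R "$owner" "$dir"
  chmod 700 "$dir"
}

# On the cluster:
#   make_storage_dir /hadoop/hdfs/namenode hdfs:hdfs   # on the namenode
#   make_storage_dir /hadoop/hdfs/datanode hdfs:hdfs   # on every datanode
```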
(3) Format the namenode
sudo -u hdfs hadoop namenode -format
(4) Start HDFS
namenode(dev-cdh1):
service hadoop-hdfs-namenode start
datanode(dev-cdh1, dev-cdh2, dev-cdh3):
service hadoop-hdfs-datanode start
(Alternatively, start every HDFS service installed on a node with: for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done)
(5) Check HDFS status
sudo -u hdfs hdfs dfsadmin -report
sudo -u hdfs hadoop fs -ls -R -h /
(6) Create the HDFS /tmp directory
sudo -u hdfs hadoop fs -mkdir /tmp
sudo -u hdfs hadoop fs -chmod -R 1777 /tmp
NameNode web UI: http://10.57.1.116:50070
2. Deploy YARN
(1) Configure YARN
mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>dev-cdh3:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>dev-cdh3:19888</value>
</property>
yarn-site.xml
<property>
  <name>yarn.resourcemanager.address</name>
  <value>dev-cdh2:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>dev-cdh2:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>dev-cdh2:8088</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>dev-cdh2:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>dev-cdh2:8033</value>
</property>
<property>
  <description>Classpath for typical applications.</description>
  <name>yarn.application.classpath</name>
  <value>
    $HADOOP_CONF_DIR,
    $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
    $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
    $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
    $HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*
  </value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/hadoop/data/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/hadoop/data/yarn/logs</value>
</property>
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <description>Where to aggregate logs</description>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/var/log/hadoop-yarn/apps</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.staging-dir</name>
  <value>/user</value>
</property>
(2) Create local directories on all NodeManager nodes
sudo mkdir -p /hadoop/data/yarn/local
sudo chown -R yarn:yarn /hadoop/data/yarn/local
sudo mkdir -p /hadoop/data/yarn/logs
sudo chown -R yarn:yarn /hadoop/data/yarn/logs
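The two NodeManager directories follow the same pattern, so they can be created in one loop; a sketch with the base path as a parameter (the guide uses /hadoop/data/yarn, owned by yarn:yarn):

```shell
# Create the NodeManager local and log directories under a base path.
make_yarn_dirs() {
  base="$1"
  for d in local logs; do
    mkdir -p "$base/$d"
  done
  # On the cluster, additionally: chown -R yarn:yarn "$base"
}
```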
(3) Create HDFS directories
sudo -u hdfs hadoop fs -mkdir -p /user/history
sudo -u hdfs hadoop fs -chmod -R 1777 /user/history
sudo -u hdfs hadoop fs -chown yarn /user/history
sudo -u hdfs hadoop fs -mkdir -p /var/log/hadoop-yarn
sudo -u hdfs hadoop fs -chown yarn:mapred /var/log/hadoop-yarn
(4) Start YARN
ResourceManager(dev-cdh2):
sudo service hadoop-yarn-resourcemanager start
NodeManager(dev-cdh1, dev-cdh2, dev-cdh3):
sudo service hadoop-yarn-nodemanager start
MapReduce JobHistory Server(dev-cdh3):
sudo service hadoop-mapreduce-historyserver start
(5) Create the YARN user directory on HDFS
sudo -u hdfs hadoop fs -mkdir -p /user/$USER
sudo -u hdfs hadoop fs -chown $USER /user/$USER
(6) Test
Check node status and run a sample job:
yarn node -all -list
hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar randomwriter input
(7) Stop
sudo service hadoop-yarn-resourcemanager stop
sudo service hadoop-yarn-nodemanager stop
sudo service hadoop-mapreduce-historyserver stop
ResourceManager web UI: http://10.57.1.120:8088/
Install and Deploy HBase
Node Type:
hbase-master: dev-cdh1, dev-cdh3
hbase-regionserver: dev-cdh1, dev-cdh2, dev-cdh3
hbase-thrift: dev-cdh3
hbase-rest: dev-cdh1, dev-cdh2, dev-cdh3
1. Install HBase
(1) Edit the configuration
/etc/security/limits.conf, add:
hdfs - nofile 32768
hbase - nofile 32768
hdfs-site.xml, add:
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
(2) Install HBase
hbase-master: sudo yum install hbase hbase-master
hbase-regionserver: sudo yum install hbase hbase-regionserver
hbase-thrift: sudo yum install hbase-thrift
hbase-rest: sudo yum install hbase-rest
(3) Configure HBase
hbase-site.xml
<property>
  <name>hbase.rest.port</name>
  <value>60050</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>dev-cdh1,dev-cdh2,dev-cdh3</value>
</property>
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/hadoop/hbase</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://dev-cdh1:8020/hbase/</value>
</property>
(4) Create the local directory
mkdir -p /hadoop/hbase
chown -R hbase:hbase /hadoop/hbase
(5) Create the HBase directory on HDFS
sudo -u hdfs hadoop fs -mkdir /hbase/
sudo -u hdfs hadoop fs -chown hbase /hbase
(6) Start HBase
hbase-master: sudo service hbase-master start
hbase-regionserver: sudo service hbase-regionserver start
hbase-thrift: sudo service hbase-thrift start
hbase-rest: sudo service hbase-rest start
HBase Master web UI: http://10.57.1.116:60010
This article is from the "让一切随风" blog. Please do not repost.