IP range: 10.1.130.10 - 10.1.130.19
cmanager: 10.1.130.10  master, namenode (RStudio)
cslave1:  10.1.130.11  worker node, datanode
cslave2:  10.1.130.12  worker node, datanode
cslave3:  10.1.130.13  worker node, datanode
…
cslave9:  10.1.130.19  worker node, datanode
root password: 123456
hadoop password: hadoop
hive password: hive (if this user exists, this is its password)
Java: jdk-7u71-linux-x64.rpm (Hadoop 2.7 requires JDK 7 or later)
Hadoop: hadoop-2.7.1.tar.gz
Source:
wget http://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
ZooKeeper: zookeeper-3.4.6.tar.gz
wget http://archive.apache.org/dist/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
HBase: hbase-1.0.1.1-bin.tar.gz
Source:
wget http://archive.apache.org/dist/hbase/stable/hbase-1.0.1.1-bin.tar.gz
Hive: apache-hive-1.2.1-bin.tar.gz
Source:
wget http://archive.apache.org/dist/hive/stable/apache-hive-1.2.1-bin.tar.gz
Sqoop: sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
Source:
wget http://archive.apache.org/dist/sqoop/1.4.6/sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
Thrift: thrift-0.9.2.tar.gz
Source:
wget http://archive.apache.org/dist/thrift/0.9.2/thrift-0.9.2.tar.gz
Spark: spark-1.4.1.tgz
wget http://archive.apache.org/dist/spark/spark-1.4.1/spark-1.4.1-bin-hadoop2.6.tgz
# rpm -qa | grep jdk
"jdk" here is just the search term - query whatever you need: python, java, etc.
First log in to the Linux system as root and create the hadoop user:
useradd hadoop -p hadoop
useradd creates the user hadoop; -p sets the user's password (here "hadoop", though note that -p actually expects an already-encrypted string, so running passwd hadoop afterwards is the safer way to set it). Any other password works as well.
Create this account on every machine.
Reference document: \\192.168.203.44\soft\HADOOP资料备份\4.documents\2.服务器安装记录文档.docx
1. Network setup
(1) Change the hostname
(2) Edit /etc/hosts
(3) Configure the network interface
(4) Disable the firewall and SELinux
(5) Test connectivity between the nodes
2. SSH (already bundled with CentOS)
3. Install Java
[root@cslave9 jars]# rpm -ivh jdk-7u71-linux-x64.rpm
Preparing... ########################################### [100%]
1:jdk ###########################################[100%]
Unpacking JAR files...
rt.jar...
jsse.jar...
charsets.jar...
tools.jar...
localedata.jar...
jfxrt.jar...
After installation the files are under /usr; then set the global environment variables:
# vim /etc/profile
/*-------------------------------------------------------------- paste at the very end of the file
#JAVA
export JAVA_HOME=/usr/java/jdk1.7.0_71
export JRE_HOME=/usr/java/jdk1.7.0_71/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export JAVA_LIBRARY_PATH=/gtabigdata/soft/hadoop/lib/native
---------------------------------------------------------------*/
4. Configure vncserver
5. Passwordless access between machines
Note: before passwordless authentication is fully set up, the machines cannot yet ssh to each other as hadoop. In that case, as root, copy authorized_keys into /home/hadoop/.ssh on each machine and then run chown -R hadoop:hadoop authorized_keys so the file belongs to the hadoop user.
For example:
scp -r authorized_keys [email protected]:/home/hadoop/.ssh    # did not work
scp -pr .ssh/ 10.1.139.10:/home/hadoop                       # this worked
chown -R hadoop:hadoop authorized_keys
After finishing the copy on each node, push the final merged result back to every machine; this gives passwordless login between all machines.
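The key-distribution steps above can be sketched as a small script. This is a hedged sketch: the node names and the hadoop account follow this cluster's plan, and the DRY_RUN switch (on by default here) only prints each command instead of executing it, so it can be rehearsed safely before touching the real nodes.

```shell
#!/bin/sh
# Push the merged .ssh directory to every slave, then fix ownership (sketch).
# DRY_RUN=1 prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

for i in 1 2 3 4 5 6 7 8 9; do
    host="cslave$i"
    # copy the whole .ssh directory, then fix ownership via root if needed
    run scp -pr .ssh/ "hadoop@$host:/home/hadoop"
    run ssh "root@$host" chown -R hadoop:hadoop /home/hadoop/.ssh
done
```

Run it once with DRY_RUN=1 to review the commands, then with DRY_RUN=0 on the master.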
All original installation packages for this deployment are staged under /jars.
Plan: apart from Java, which goes into the default /usr/java directory, all big-data platform software is installed under /gtabigdata/soft.
Hadoop's HDFS directories live under /gtabigdata/hdfs/; see each component's configuration for the exact folders.
Temporary files used during installation are kept under /gtabigdata/tmp.
Other per-component data goes under /gtabigdata/data/<component>/.
[hadoop@cmanager jars]$ sudo chown -R hadoop:hadoop /gtabigdata
/gtabigdata
  soft/
    hadoop/
    hbase/
    hive/
    ....
    zookeeper/
  data/
    tmp/
    dfs/
      name/
      data/
  data1/
    dfs/
      name/
      data/
  tmp/
Installation strategy: unpack and install everything on the master (cmanager), then copy to the data nodes (cslave1-cslave9).
[hadoop@cmanager jars]$ sudo tar -zxvf hadoop-2.7.1.tar.gz
[hadoop@cmanager jars]$ ll
total 374444
drwxr-xr-x 9 10021 10021 4096 Jun 29 14:15 hadoop-2.7.1
[hadoop@cmanager jars]$ sudo mv hadoop-2.7.1 /gtabigdata/soft/hadoop
[hadoop@cmanager gtabigdata]$ ll /gtabigdata/soft
total 4
drwxr-xr-x 9 10021 10021 4096 Jun 29 14:15 hadoop
[hadoop@cmanager root]$ sudo vim /etc/profile
/*------------------------------------------------------------------------- paste at the very end of the file
#hadoop
export HADOOP_HOME=/gtabigdata/soft/hadoop
export PATH=.:$PATH:$HADOOP_HOME/bin
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"
export YARN_CONF_DIR=${HADOOP_CONF_DIR}
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
-------------------------------------------------------------------------*/
[hadoop@cmanager root]$ source /etc/profile
[hadoop@cmanager hadoop]$ ll /gtabigdata/soft/hadoop/etc/hadoop/
-rw-r--r-- 1 10021 10021 4224 Jun 29 14:15 hadoop-env.sh
-rw-r--r-- 1 10021 10021 4567 Jun 29 14:15 yarn-env.sh
(1)hadoop env
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/hadoop-env.sh
Edit line 25:
/*-------------------------------------------------------------------------
export JAVA_HOME=/usr/java/jdk1.7.0_71
-------------------------------------------------------------------------*/
(2)yarn env
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/yarn-env.sh
Edit line 23:
/*-------------------------------------------------------------------------
export JAVA_HOME=/usr/java/jdk1.7.0_71
-------------------------------------------------------------------------*/
[hadoop@cmanager hadoop]$ ll /gtabigdata/soft/hadoop/etc/hadoop/
-rw-r--r-- 1 10021 10021 774 Jun 29 14:15 core-site.xml
-rw-r--r-- 1 10021 10021 775 Jun 29 14:15 hdfs-site.xml
-rw-r--r-- 1 10021 10021 758 Jun 29 14:15 mapred-site.xml.template
-rw-r--r-- 1 10021 10021 10 Jun 29 14:15 slaves
-rw-r--r-- 1 10021 10021 690 Jun 29 14:15 yarn-site.xml
There is no mapred-site.xml yet; generate it as follows:
[hadoop@cmanager hadoop]$ cd /gtabigdata/soft/hadoop/etc/hadoop
[hadoop@cmanager hadoop]$ cp mapred-site.xml.template mapred-site.xml
2.3 Configure ${HADOOP_HOME}/etc/hadoop/core-site.xml
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data/tmp/hadoop
[hadoop@cmanager hadoop]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cmanager:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/gtabigdata/data/tmp/hadoop</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>
</configuration>
The default values for core-site.xml can be found at: http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/core-default.xml
For a fresh cluster the only item that must be changed is fs.defaultFS. It specifies the file system's access point - in effect it tells every datanode which node is the namenode, so that communication between the namenode and the datanodes can be established.
Beyond that, following the directory plan above, we set hadoop.tmp.dir to /gtabigdata/data/tmp/hadoop. Looking through core-default.xml you can see that every directory-related setting defaults to a subfolder of ${hadoop.tmp.dir}, so for this installation we simply change hadoop.tmp.dir from its default /tmp/hadoop-${user.name} to /gtabigdata/data/tmp/hadoop. That concentrates all files Hadoop creates and uses under /gtabigdata/data/tmp/hadoop instead of mixing them with everything else under /tmp.
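As a quick sanity check, the core-site.xml above can be generated and verified from the shell. A minimal sketch, assuming the hostnames and paths from this installation; the here-doc simply reproduces the file shown above into a temp location before it is copied into place.

```shell
#!/bin/sh
# Write the core-site.xml from the section above into a temp file (sketch)
# and sanity-check that the mandatory fs.defaultFS entry is present.
out=$(mktemp)
cat > "$out" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://cmanager:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/gtabigdata/data/tmp/hadoop</value>
  </property>
</configuration>
EOF
grep -q '<name>fs.defaultFS</name>' "$out" && echo "core-site ok"
```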
2.4 Configure ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data/dfs/name
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data/dfs/data
[hadoop@cmanager hadoop]$ sudo chown -R hadoop:hadoop /gtabigdata/data
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data1/dfs/name
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data1/dfs/data
[hadoop@cmanager hadoop]$ sudo chown -R hadoop:hadoop /gtabigdata/data1
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/hdfs-site.xml
-----------------------------------------------------------------------------------------------------------------------------
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///gtabigdata/data/dfs/name,file:///gtabigdata/data1/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///gtabigdata/data/dfs/data,file:///gtabigdata/data1/dfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
---------------------------------------------------------------------------------------------------------------------------------
The default values for hdfs-site.xml can be found at: http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
For a fresh cluster nothing in this file strictly has to be changed, but several important settings often should be:
dfs.namenode.secondary.http-address  // designates the secondary namenode; if unset, whichever node runs start-dfs.sh automatically becomes the secondary namenode
dfs.replication  // number of copies of each block, default 3; if the cluster has fewer than 3 datanodes, set it to at most the number of datanodes
dfs.namenode.name.dir  // directory for namenode data
dfs.datanode.data.dir  // directory for datanode data
dfs.namenode.checkpoint.dir  // directory for secondary-namenode data
The last three default to subfolders of ${hadoop.tmp.dir}; adjust them to suit the cluster, e.g. point them at a folder mounted on NFS.
2.5 Configure ${HADOOP_HOME}/etc/hadoop/mapred-site.xml
Under ${HADOOP_HOME}/etc/hadoop, copy mapred-site.xml.template to mapred-site.xml and add the following:
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
The default values for mapred-site.xml can be found at: http://hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml
The only item that must be changed is mapreduce.framework.name, which tells Hadoop which framework executes map-reduce jobs.
This file also has several settings that control file locations:
mapreduce.cluster.local.dir
mapreduce.jobtracker.system.dir
mapreduce.jobtracker.staging.root.dir
mapreduce.cluster.temp.dir
Adjust them if necessary.
2.6 Configure ${HADOOP_HOME}/etc/hadoop/yarn-site.xml
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/hadoop/etc/hadoop/yarn-site.xml
-----------------------------------------------------------------------------------------------------------
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>cmanager</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
----------------------------------------------------------------------------------------------------------------
The default values for yarn-site.xml can be found at: http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Two items must be changed. yarn.resourcemanager.hostname plays the same role as fs.defaultFS in core-site.xml: it tells every nodemanager where the resourcemanager is, establishing communication between them. The other is yarn.nodemanager.aux-services, set to mapreduce_shuffle as above.
This file also has several settings that control file locations:
yarn.nodemanager.local-dirs
yarn.resourcemanager.fs.state-store.uri
Adjust them if necessary.
Normally we start the whole cluster with the start-dfs.sh script. Reading that script shows it starts the namenode, secondary namenode, and slave (datanode) processes on the target nodes based on the configuration files. The slave (datanode) list is configured in ${HADOOP_HOME}/etc/hadoop/slaves, one node per line, so we need to add every datanode's hostname or IP to this file.
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/slaves
/*-------------------------------------------------------------------------
cslave1
cslave2
cslave3
-------------------------------------------------------------------------*/
Note: one line per datanode; this installation has nine, cslave1 through cslave9.
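Rather than typing nine hostnames by hand, the slaves file can be generated. A sketch; SLAVES_FILE is a stand-in path for rehearsal, on the cluster it would be /gtabigdata/soft/hadoop/etc/hadoop/slaves.

```shell
#!/bin/sh
# Generate the slaves file for cslave1..cslave9 instead of typing each line.
SLAVES_FILE=${SLAVES_FILE:-./slaves}
: > "$SLAVES_FILE"              # truncate/create
for i in $(seq 1 9); do
    echo "cslave$i" >> "$SLAVES_FILE"
done
cat "$SLAVES_FILE"
```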
2.8 Repackage the hadoop installation configured above, copy it to the same location on every node, and unpack it there. Remember to update /etc/profile on each node and to create the required directories.
--- create the following folders on every datanode
sudo mkdir /gtabigdata
sudo mkdir -p /gtabigdata/data/tmp/hadoop
sudo mkdir -p /gtabigdata/data/dfs/name
sudo mkdir -p /gtabigdata/data/dfs/data
sudo mkdir -p /gtabigdata/data1/dfs/name
sudo mkdir -p /gtabigdata/data1/dfs/data
sudo chown -R hadoop:hadoop /gtabigdata
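The folder list above can be created in one command. A sketch: ROOT defaults to a scratch location so the layout can be rehearsed anywhere; on a real datanode run it with ROOT=/gtabigdata (and then chown as shown above).

```shell
#!/bin/sh
# Create the planned directory layout on a datanode (sketch).
# ROOT defaults to a scratch dir; set ROOT=/gtabigdata on the real nodes.
ROOT=${ROOT:-${TMPDIR:-/tmp}/gtabigdata}
mkdir -p "$ROOT/data/tmp/hadoop" \
         "$ROOT/data/dfs/name"  "$ROOT/data/dfs/data" \
         "$ROOT/data1/dfs/name" "$ROOT/data1/dfs/data"
# On the real nodes, ownership must follow:
# chown -R hadoop:hadoop "$ROOT"
```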
Copy hadoop to each datanode with:
scp -r /gtabigdata/soft/hadoop hadoop@cslave1:/gtabigdata/soft/
Update /etc/profile (copying it over works, but adjust to each machine's actual situation so other applications are not affected)
source /etc/profile
Note on accounts:
In a production deployment we recommend dedicated accounts for running Hadoop, e.g. a hadoop user to start the namenode and datanodes, and a yarn user to start the resourcemanager and nodemanagers. These users' groups can be same-named groups (as CDH does) or one shared group such as hadoop - it is a matter of preference. The users have no inherent relationship to each other, and putting them in one group carries no special meaning, so I prefer giving each user its own group.
One important caveat: if dedicated accounts start the Hadoop services, the owner and group of every folder Hadoop uses (for example dfs.namenode.name.dir) must be changed to those accounts, otherwise the services will fail to start for lack of permission to access those folders.
3.2 Format the cluster
Before the first start, the cluster must be formatted:
[hadoop@cmanager usr]$ hadoop namenode -format
3.3 Start HDFS
Run:
start-dfs.sh
The command can be run on any node. Note, however, that if no secondary namenode is specified in the configuration (i.e. dfs.namenode.secondary.http-address is not set in hdfs-site.xml), whichever node runs the command automatically becomes the secondary namenode.
Commands to start individual services:
Start the namenode:
hadoop-daemon.sh start namenode
Start the secondary namenode:
hadoop-daemon.sh start secondarynamenode
Start a datanode:
hadoop-daemon.sh start datanode
After starting, visit:
http://cmanager:50070
to check the state of the HDFS nodes. If everything is reachable, HDFS is fine; if the page is unreachable or nodes are missing, analyze the logs to find the cause.
3.4 Start YARN
Run:
start-yarn.sh
The command can be run on any node. Its slave list is the same as HDFS's: it also reads ${HADOOP_HOME}/etc/hadoop/slaves.
Commands to start individual services:
Start the resourcemanager:
yarn-daemon.sh start resourcemanager
Start a nodemanager:
yarn-daemon.sh start nodemanager
After starting, visit:
http://cmanager:8088
to check the state of the YARN nodes. If everything is reachable, YARN is fine; if not, analyze the logs to find the cause.
start-all.sh runs both start-dfs.sh and start-yarn.sh.
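A quick way to confirm the master daemons came up is to inspect the jps output. A sketch: check_daemons reads the jps text on stdin so the logic can be rehearsed without a live cluster; the daemon names are the ones expected on cmanager in this setup.

```shell
#!/bin/sh
# Check that the expected master daemons appear in `jps` output (sketch).
# Reads the jps text on stdin; on the master run: jps | check_daemons
check_daemons() {
    input=$(cat)
    missing=""
    for d in NameNode SecondaryNameNode ResourceManager; do
        # -w so that "NameNode" does not also match "SecondaryNameNode"
        echo "$input" | grep -qw "$d" || missing="$missing $d"
    done
    if [ -z "$missing" ]; then
        echo "all master daemons running"
    else
        echo "missing:$missing"
    fi
}

# Rehearsal with sample output; prints: all master daemons running
printf '9088 NameNode\n8370 SecondaryNameNode\n8532 ResourceManager\n' | check_daemons
```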
[hadoop@cmanager jars]$ sudo tar -zxvf hbase-1.0.1.1-bin.tar.gz
[hadoop@cmanager jars]$ sudo mv hbase-1.0.1.1 /gtabigdata/soft/hbase
[hadoop@cmanager gtabigdata]$ cd /gtabigdata/soft/
[hadoop@cmanager gtabigdata]$ ll
total 8
drwxr-xr-x 10 hadoop hadoop 4096 Aug 4 17:45 hadoop
drwxr-xr-x 7 root root 4096 Aug 5 09:27 hbase
[hadoop@cmanager gtabigdata]$ sudo chown -R hadoop:hadoop hbase
[hadoop@cmanager gtabigdata]$ sudo vim /etc/profile
/*-------------------------------------------------------------------------
#HBASE
export HBASE_HOME=/gtabigdata/soft/hbase
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
-------------------------------------------------------------------------*/
[hadoop@cmanager gtabigdata]$ source /etc/profile
(1) vim /gtabigdata/soft/hbase/conf/hbase-env.sh
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/hbase/conf/hbase-env.sh
Paste at the very end of the file:
/*-------------------------------------------------------------------------
export JAVA_HOME=/usr/java/jdk1.7.0_71
export HBASE_MANAGES_ZK=false -- do not let HBase manage its bundled ZooKeeper; use the external ensemble
-------------------------------------------------------------------------*/
(2) vim /gtabigdata/soft/hbase/conf/hbase-site.xml
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/hbase/conf/hbase-site.xml
/*-------------------------------------------------------------------------
<property>
<name>hbase.rootdir</name>
<value>hdfs://cmanager:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>cmanager:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>cmanager,cslave1,cslave2,cslave3,cslave4,cslave5,cslave6,cslave7,cslave8,cslave9</value>
</property>
(the quorum value is comma-separated and must list the actual ZooKeeper hosts; an odd number of nodes is recommended. hbase.rootdir must match fs.defaultFS in core-site.xml.)
-------------------------------------------------------------------------*/
ZooKeeper has this property: the cluster as a whole stays available as long as more than half of its machines are working. So with 2 zookeepers, if 1 dies the service is down, because 1 is not more than half - a 2-node ensemble tolerates 0 failures. With 3, one can die and the remaining 2 are still a majority, so the tolerance is 1. Listing a few more: 2->0, 3->1, 4->1, 5->2, 6->2. The pattern: 2n and 2n-1 nodes both tolerate n-1 failures, so for efficiency there is no point adding the extra even node.
We have nine slave machines here and configure ZooKeeper across the cluster; in practice, adjust to however many ZooKeeper nodes the cluster actually runs.
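The majority rule above boils down to simple arithmetic: an n-node ensemble tolerates floor((n-1)/2) failures. A tiny sketch reproducing the 2->0, 3->1, 4->1, 5->2, 6->2 table:

```shell
#!/bin/sh
# Failure tolerance of an n-node ZooKeeper ensemble: the service stays up
# while a strict majority survives, i.e. it tolerates floor((n-1)/2) failures.
zk_tolerance() {
    echo $(( ($1 - 1) / 2 ))
}

for n in 2 3 4 5 6 9; do
    echo "$n nodes -> tolerates $(zk_tolerance "$n") failure(s)"
done
```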
Add the slave hostnames.
The regionservers file works like slaves in the Hadoop config: one line per data node, using the node's hostname.
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/hbase/conf/regionservers
/*-------------------------------------------------------------------------
cslave1
cslave2
cslave3
cslave4
cslave5
cslave6
cslave7
cslave8
cslave9
-------------------------------------------------------------------------*/
[hadoop@cmanager jars]$ sudo tar -zxvf zookeeper-3.4.6.tar.gz
[hadoop@cmanager jars]$ sudo mv zookeeper-3.4.6 /gtabigdata/soft/zookeeper
[hadoop@cmanager jars]$ cd /gtabigdata/soft/
[hadoop@cmanager gtabigdata]$ sudo chown -R hadoop:hadoop zookeeper/
[hadoop@cmanager gtabigdata]$ sudo vim /etc/profile
/*-------------------------------------------------------------------------
#zookeeper
export ZOOKEEPER_HOME=/gtabigdata/soft/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH
-------------------------------------------------------------------------*/
[hadoop@cmanager gtabigdata]$ source /etc/profile
[hadoop@cmanager gtabigdata]$ mkdir -p /gtabigdata/data/zookeeper/data/
[hadoop@cmanager gtabigdata]$ mkdir -p /gtabigdata/data/zookeeper/log/
[hadoop@cmanager gtabigdata]$ cp /gtabigdata/soft/zookeeper/conf/zoo_sample.cfg /gtabigdata/soft/zookeeper/conf/zoo.cfg
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/zookeeper/conf/zoo.cfg
/*-------------------------------------------------------------------------
tickTime=2000
dataDir=/gtabigdata/data/zookeeper/data
dataLogDir=/gtabigdata/data/zookeeper/log
clientPort=2181
server.1=cmanager:2888:3888
server.2=cslave1:2888:3888
server.3=cslave2:2888:3888
server.4=cslave3:2888:3888
server.5=cslave4:2888:3888
server.6=cslave5:2888:3888
server.7=cslave6:2888:3888
server.8=cslave7:2888:3888
server.9=cslave8:2888:3888
server.10=cslave9:2888:3888
# 2888 is the leader service port, 3888 the election port
We have 10 machines here (cmanager plus cslave1-cslave9)
-------------------------------------------------------------------------*/
Under {dataDir}, i.e. /gtabigdata/data/zookeeper/data/, create and edit a myid file, entering the number that matches each machine. On cmanager the myid file contains just 1 (each host's myid is different and must match the server.* entries in zoo.cfg).
# vim /gtabigdata/data/zookeeper/data/myid
The value is determined by the server.* number in zoo.cfg.
[hadoop@cmanager conf]$ vim /gtabigdata/data/zookeeper/data/myid
/*-------------------------------------------------------------------------
1
-------------------------------------------------------------------------*/
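Since each myid must match the server.N line that names that host in zoo.cfg, the id can be derived instead of remembered. A sketch: myid_for_host takes the config path and hostname as parameters so it can be rehearsed with a sample file; on a real node, feed it the installed zoo.cfg and the output of `hostname`.

```shell
#!/bin/sh
# Derive this host's myid from the server.N entries in zoo.cfg (sketch).
myid_for_host() {
    ZOO_CFG=$1; HOST=$2
    # Print the N from the line "server.N=<host>:2888:3888"
    sed -n "s/^server\.\([0-9][0-9]*\)=$HOST:.*/\1/p" "$ZOO_CFG"
}

# Rehearsal with a sample config:
cfg=$(mktemp)
printf 'server.1=cmanager:2888:3888\nserver.2=cslave1:2888:3888\n' > "$cfg"
myid_for_host "$cfg" cslave1   # prints: 2
```

On a node one could then write the result into /gtabigdata/data/zookeeper/data/myid.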
Configuration notes:
tickTime: the heartbeat interval between ZooKeeper servers, and between clients and servers - one heartbeat is sent every tickTime.
dataDir: as the name suggests, the directory where ZooKeeper stores its data; by default the write-ahead log also goes here.
dataLogDir: the directory for the write-ahead log.
clientPort: the port clients use to connect; ZooKeeper listens on it and accepts client requests.
ZooKeeper start | stop | status commands:
zkServer.sh start|stop|status
All the software above must be installed on the master (cmanager) and on the slaves. In particular, ZooKeeper's myid must be adjusted per machine. Hadoop and HBase are installed identically on master and slaves, so the master copy can simply be copied out to the slaves. Hadoop and HBase only need to be started on the master, but ZooKeeper must be started on every machine - consider a batch script on the master to start them all at once.
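Such a batch start script might look like this. A hedged sketch: the host list assumes ZooKeeper runs on all ten machines as described, and the DRY_RUN switch (on by default) prints the ssh commands instead of running them.

```shell
#!/bin/sh
# Start ZooKeeper on every node from the master in one pass (sketch).
# DRY_RUN=1 (the default here) prints the ssh commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
run() { [ "$DRY_RUN" = "1" ] && echo "+ $*" || "$@"; }

for host in cmanager cslave1 cslave2 cslave3 cslave4 cslave5 cslave6 cslave7 cslave8 cslave9; do
    run ssh "$host" zkServer.sh start
done
```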
[hadoop@cmanager jars]$ sudo tar -zxvf thrift-0.9.2.tar.gz
[hadoop@cmanager jars]$ sudo mv thrift-0.9.2 /gtabigdata/soft/thrift
[hadoop@cmanager jars]$ sudo chown -R hadoop:hadoop /gtabigdata/soft/thrift
# cd /gtabigdata/soft/thrift
# ./configure
# make
# make install
[hadoop@cmanager thrift]$ cd /gtabigdata/soft/thrift
[hadoop@cmanager thrift]$ ./configure
An error came up here: configure: error: Bison version 2.5 or higher must be installed on the system!
So upgrade Bison as follows (skip this if there was no error):
wget http://ftp.gnu.org/gnu/bison/bison-2.5.1.tar.gz
tar xvf bison-2.5.1.tar.gz
cd bison-2.5.1
./configure --prefix=/usr
make
sudo make install
cd ..
------- in case make reports errors:
[hadoop@cmanager thrift]$ make
[hadoop@cmanager thrift]$ make install
vim /etc/profile  -------- needed later when installing rhbase and rhive
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig/
--------------------------------------------------------------#
# pkg-config --cflags thrift
If it returns:
-I/usr/local/include/thrift
everything is correct.
Start it:
hbase-daemon.sh start thrift
Copy the lib.
------ thrift startup:
[hadoop@cmanager bin]$ hbase-daemon.sh start thrift
starting thrift, logging to /gtabigdata/soft/hbase/logs/hbase-hadoop-thrift-cmanager.out
[hadoop@cmanager bin]$ jps
5037 Jps
8532 ResourceManager
8370 SecondaryNameNode
9088 NameNode
12692 HMaster
4957 ThriftServer   --- the thrift process
[root@cmanager jars]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[hadoop@cmanager jars]$ sudo mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /gtabigdata/soft/sqoop
[hadoop@cmanager jars]$ cd /gtabigdata/soft/
[hadoop@cmanager soft]$ ll
…
drwxr-xr-x 8 root root 4096 Apr 27 14:19 sqoop
[hadoop@cmanager soft]$ sudo chown hadoop:hadoop -R sqoop
Edit /etc/profile:
#sqoop
export SQOOP_HOME=/gtabigdata/soft/sqoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$SQOOP_HOME/bin
Edit sqoop-env.sh:
# cp sqoop-env-template.sh sqoop-env.sh
# vim sqoop-env.sh
[hadoop@cmanager conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@cmanager conf]$ vim sqoop-env.sh
Changes as follows (only set what you actually use):
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/gtabigdata/soft/hadoop/
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/gtabigdata/soft/hadoop/share/hadoop/mapreduce/
#Set the path to where bin/hbase is available
export HBASE_HOME=/gtabigdata/soft/hbase/
#Set the path to where bin/hive is available
export HIVE_HOME=/gtabigdata/soft/hive/
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/gtabigdata/soft/zookeeper
Copy the MySQL connector jar into lib (likewise Oracle's ojdbc):
mysql-connector-java-5.1.28-bin.jar
ojdbc6.jar
sqljdbc4.jar
[hadoop@cmanager jars]$ sudo mv ojdbc6.jar /gtabigdata/soft/sqoop/lib/
[hadoop@cmanager jars]$ sudo mv sqljdbc4.jar /gtabigdata/soft/sqoop/lib/
[hadoop@cmanager jars]$ sudo mv mysql-connector-java-5.1.28-bin.jar /gtabigdata/soft/sqoop/lib/
Using Sqoop:
sqoop import --connect jdbc:oracle:thin:@10.224.1.129:1521:gtadb3 --username DCSYS --password DCSYS --m 1 --table DEPT --columns id,name,addr --hbase-create-table --hbase-table DEPT --hbase-row-key id --column-family deptinfo
sqoop import --connect jdbc:mysql://cmanager:3306/mysql --table baa --hbase-create-table --hbase-table h_test --column-family name --hbase-row-key id --m 1
[hadoop@cmanager ~]$ sqoop import --connect jdbc:oracle:thin:@10.224.1.129:1521:gtadb3 --username DCSYS --password DCSYS --m 1 --table DEPT --columns id,name,addr --hbase-create-table --hbase-table DEPT --hbase-row-key id --column-family deptinfo
After installing, importing data from Oracle or MySQL into HBase with the statements above still fails against hbase-1.0.1, just as in the old environment. It looks like falling back to version 0.98 will be needed - maddening. For now HBase will not be reinstalled; this issue will be revisited once everything else is installed.
Someone hit the same problem on this site, with no solution: http://www.aboutyun.com/thread-12236-1-1.html
Feeling down!!
Now Hive. Hive needs MySQL as its metastore database, so install MySQL first.
Briefly:
1.yum install mysql* -y
[root@cmanager conf]# yum install mysql* -y
2. Common operations
Start:
# chkconfig mysqld on
# service mysqld start
[root@cmanager conf]# chkconfig mysqld on
[root@cmanager conf]# service mysqld start
Check status:
service mysqld status
Stop:
--service mysqld stop
Set the password:
--set password for 'root'@'%'=password('123456');
Log in:
# mysql -uroot            (right after installation, log in with this)
# mysql -uroot -p123456   /* you will be prompted for the password */
Grant privileges:
grant all privileges on *.* to 'hadoop'@'%' identified by 'hadoop';
--'%' means any login origin: hostname alias or IP
flush privileges;
Hive points to watch: (1) the metastore (2) remote access
# tar zxvf ******
# mv
[hadoop@cmanager jars]$ sudo tar -zxvf apache-hive-1.2.1-bin.tar.gz
[hadoop@cmanager jars]$ sudo mv apache-hive-1.2.1-bin /gtabigdata/soft/hive/
[hadoop@cmanager jars]$ cd /gtabigdata/soft/hive/
[hadoop@cmanager conf]$ sudo vim /etc/profile
/*-------------------------------------------------------------------------
#HIVE
export HIVE_HOME=/gtabigdata/soft/hive
export PATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin
-------------------------------------------------------------------------*/
[hadoop@cmanager conf]$ source /etc/profile
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
[hadoop@cmanager conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@cmanager conf]$ vim hive-env.sh
/*-------------------------------------------------------------------------
export HADOOP_HOME=/gtabigdata/soft/hadoop
export HIVE_CONF_DIR=/gtabigdata/soft/hive/conf
export HIVE_AUX_JARS_PATH=/gtabigdata/soft/hive/lib
-------------------------------------------------------------------------*/
Create the log directory:
# mkdir -p /gtabigdata/data/hive/log4j
# cp hive-log4j.properties.template hive-log4j.properties
# vim hive-log4j.properties
Line 20:
/*-------------------------------------------------------------------------
hive.log.dir=/gtabigdata/data/hive/log4j/
-------------------------------------------------------------------------*/
(1) warehouse directory
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /hive/hivewarehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /hive/hivewarehouse
(2) log directories
# cd /gtabigdata/data/hive
# mkdir logs
# cd logs
# mkdir querylog
# mkdir operation_logs
cd /gtabigdata/data/hive
mkdir temp
mkdir resources
(3) If MySQL is the metastore database:
Create a hive user with password hive, and create a hive database in MySQL.
grant all privileges on *.* to 'hive'@'%' identified by 'hive';
(4) Create the directories needed by the R integration
# mkdir -p /rhive/lib/2.0-0.2
# chmod 755 -R /rhive/lib/2.0-0.2
(5) Properties that matter for remote mode (this part is important - other programs connect to Hive through remote mode)
· hive.metastore.warehouse.dir: the data directory; default /user/hive/warehouse.
· hive.exec.scratchdir: the scratch directory; default /tmp/hive-${user.name}.
· hive.metastore.local: whether to use a local metastore; set to false here so the remote MySQL database stores the metadata.
· javax.jdo.option.ConnectionURL: the database connection string; here:
jdbc:mysql://cmanager:3306/hive?createDatabaseIfNotExist=true
· javax.jdo.option.ConnectionDriverName: the database driver; here com.mysql.jdbc.Driver.
· javax.jdo.option.ConnectionUserName: the MySQL username; set according to your environment.
· javax.jdo.option.ConnectionPassword: the MySQL password; set according to your environment.
· hive.stats.dbclass: the database type; here jdbc:mysql.
· hive.stats.jdbcdriver: the driver for the stats database; here com.mysql.jdbc.Driver.
· hive.stats.dbconnectionstring: the connection string for Hive's temporary statistics; here jdbc:mysql://192.168.10.203:3306/hivestat?useUnicode=true&characterEncoding=utf8&user=hive&password=hive&createDatabaseIfNotExist=true
· hive.metastore.uris: the metastore access URI; here thrift://127.0.0.1:9083.
Copy the database driver jar into HIVE_HOME/lib:
mysql: cp ~/pakages/mysql-connector-java-5.1.28-bin.jar /gtabigdata/soft/hive/lib/
6. Edit the properties in hive-site.xml
# cd /gtabigdata/soft/hive/conf
# cp hive-default.xml.template hive-site.xml
# vim hive-site.xml
# mkdir -p /gtabigdata/data/hive/tmp
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions
See the file for details.
In hive-site.xml, replace:
${system:java.io.tmpdir} -----> /gtabigdata/data/hive/tmp/
${system:user.name} ------> hive
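The two substitutions above can be done with sed. A sketch that applies them to a sample string; on the cluster, the same expressions would be run with sed -i over /gtabigdata/soft/hive/conf/hive-site.xml (back it up first).

```shell
#!/bin/sh
# Replace the ${system:...} placeholders used throughout hive-site.xml (sketch).
substitute() {
    sed -e 's|\${system:java.io.tmpdir}|/gtabigdata/data/hive/tmp|g' \
        -e 's|\${system:user.name}|hive|g'
}

# Rehearsal on a sample line; prints: <value>/gtabigdata/data/hive/tmp/hive</value>
echo '<value>${system:java.io.tmpdir}/${system:user.name}</value>' | substitute
```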
Copy the hive-site.xml file:
cp hive-default.xml.template hive-site.xml
vim hive-site.xml
Configure the database connection:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://cmanager:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
The four items above are: the connection URL, the driver name, the username, and the password.
javax.jdo.option.ConnectionURL: metastore connection string
javax.jdo.option.ConnectionDriverName: DB driver; for MySQL, com.mysql.jdbc.Driver
javax.jdo.option.ConnectionUserName: DB username
javax.jdo.option.ConnectionPassword: DB password
Configure the Hive warehouse path:
hive-site.xml
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/hivewarehouse</value>
<description>location of default database for the warehouse</description>
</property>
Other directory settings:
<property>
<name>hive.querylog.location</name>
<value>/gtabigdata/data/hive/logs/querylog</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/gtabigdata/data/hive/temp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/gtabigdata/data/hive/resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
hive --service metastore
hive --service hiveserver2 10000
HiveServer2 must also be configured, because RStudio needs to connect through it.
PS: hive --service hiveserver 10001 - Hive 1.2 has dropped the old HiveServer service; only HiveServer2 remains.
Once the services were up, Hive logins worked - a complete success, almost hard to believe.
[hadoop@cmanager ~]$ hive
Logging initialized using configuration infile:/gtabigdata/soft/hive/conf/hive-log4j.properties
hive> show tables;
------- try importing into Hive with sqoop
sqoop create-hive-table --connect jdbc:oracle:thin:@192.168.103.236:1521:gtadcsys --table STK_STOCKINFO --username DCSYS --password DCSYS --hive-table STK_STOCKINFO1
sqoop import --connect jdbc:oracle:thin:@192.168.103.236:1521:gtadcsys --table STK_STOCKINFO --username DCSYS --password DCSYS --hive-import --m 1
1. The MySQL version installed this time turned out to be fairly new:
[hadoop@cmanager ~]$ mysql -V
mysql Ver 14.14 Distrib 5.7.6-m16, for Linux (x86_64) using EditLine wrapper
mysql-connector-java-5.1.28-bin.jar
2. The sqoop import into Hive succeeds - so it really is an HBase version incompatibility!
For the R part, follow Chen Huang's document.
R should still be installed under /gtabigdata/soft as planned.
sudo yum install gcc-gfortran
sudo yum install gcc gcc-c++
sudo yum install readline-devel
sudo yum install libXt-devel
[hadoop@cmanager jars]$ sudo tar -zxvf R-3.1.2.tar.gz
[hadoop@cmanager jars]$ sudo mv R-3.1.2 /gtabigdata/soft/R
[hadoop@cmanager R]$ ./configure --enable-R-shlib
[hadoop@cmanager R]$ make
[hadoop@cmanager R]$ make install
/etc/profile
export HADOOP_CMD=/gtabigdata/soft/hadoop/bin/hadoop
export HADOOP_STREAMING=/gtabigdata/soft/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar
export JAVA_LIBRARY_PATH=/gtabigdata/soft/hadoop/lib/native
R:
install.packages("rJava")
> library(rJava)
Check which dependency packages need installing: R CMD check "rmr2_3.3.0.tar.gz"
Install the packages below. Be sure to enter the R environment as the hadoop account to install them; otherwise installing rmr later under hadoop will hit permission problems.
install.packages("reshape2")
install.packages("Rcpp")
install.packages("iterators")
install.packages("stringr")
install.packages("plyr")
install.packages("itertools")
install.packages("digest")
install.packages("RJSONIO")
install.packages("functional")
install.packages("testthat")
install.packages("bitops")
install.packages("caTools")
install.packages("devtools")
install.packages("RCurl")
install.packages("httr")
A few of the "suggests" packages could not be installed.
As root:
# R CMD INSTALL 'rhdfs_1.0.8.tar.gz'
# R CMD INSTALL 'rmr2_3.3.0.tar.gz'
-- the ones that would not install; looking for workarounds:
(1)
install.packages(c("Rcpp","rjson","bit64"))
# R CMD INSTALL 'rmr2_3.3.0.tar.gz'
Some errors remain, though none say rmr2 is missing.
Still not working.
(2)
Imports: digest, functional, testthat, pryr, microbenchmark, lazyeval, dplyr, bitops
Tried it; errors remain, keep installing:
install.packages("lazyeval")
install.packages("dplyr")
# ln -s /gtabigdata/soft/R/bin /usr/bin
# ln -s /gtabigdata/soft/R/bin/Rscript /usr/bin
Test whether it works.
quickcheck was installed from the downloaded package.
Goal: rmr2 runs on the master,
and rmr2 and rhdfs load without problems on the datanodes.
An ordinary R program:
Switch to the hadoop account before running R, or it will error out.
> small.ints = 1:10
> sapply(small.ints, function(x) x^2)
An R MapReduce program:
library(rhdfs)
hdfs.init()
library(rmr2)
library(rJava)
small.ints=to.dfs(1:10)
mapreduce(input=small.ints,map=function(k,v)cbind(v,v*v))
> small.ints = to.dfs(1:10)
> mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
> from.dfs("/tmp/RtmpWnzxl4/file5deb791fcbd5")
Install R on every datanode as well, same steps as on the master:
sudo yum install gcc-gfortran
sudo yum install gcc gcc-c++
sudo yum install readline-devel
sudo yum install libXt-devel
rmr2 depends on many packages. If errors persist after installing the ones above, install whatever is still missing:
install.packages(c("RJSONIO","digest","functional","caTools"))
PS: While installing R it turned out that with R, rhdfs, and rmr2 only on the master namenode, running library(rmr2) and a mapreduce computation throws many errors. R, rhdfs, and rmr2 need to be installed on the DataNodes too. After installing on a few machines, runs on those machines stopped erroring, even before everything was installed everywhere. That was puzzling: does it only take the master plus one datanode? One datanode alone was definitely not enough, but by the time four datanodes were done the problem was gone. At least it was solved; to make sure R works on every machine, I decided to install R, rhdfs, and rmr2 on all nine machines.
The errors along the way closely resemble the questions on http://cos.name/2013/03/rhadoop2-rhadoop/, which pointed me toward installing on the datanodes. I had assumed R did not need to be on every machine, but apparently not: at minimum R, rhdfs, and rmr2 must be on the master node and several datanodes. This deserves further study.
[hadoop@cslave7 jars]$ cd rhadoop/
[hadoop@cslave7 rhadoop]$ ll
total 900
-rw-r--r-- 1 hadoop hadoop 251733 Aug 11 13:50 rhbase-1.2.1.tar.gz
-rw-r--r-- 1 hadoop hadoop 25105 Aug 11 13:50 rhdfs_1.0.8.tar.gz
-rw-r--r-- 1 hadoop hadoop 567515 Aug 11 13:50 rJava_0.9-6.tar.gz
-rw-r--r-- 1 hadoop hadoop 62774 Aug 11 13:50 rmr2_3.3.0.tar.gz
drwxr-xr-x 4 hadoop hadoop 4096 Aug 11 13:50 rmr2.Rcheck
-------- For the step below to succeed under hadoop, all dependency packages must have been installed from the R environment under the hadoop account. I initially did this as root, and after a successful root install, entering R as hadoop again claimed rmr2 was missing - a real ordeal!
[hadoop@cslave5 rhadoop]$ R CMD INSTALL "rmr2_3.3.0.tar.gz"
If this errors, enter R as the hadoop user and install the packages listed above.
Run it again after that:
Success!
Next install rhbase and rhive; these only need to go on cmanager, not on every machine.
See Chen Huang's earlier document.
RStudio installation: see Chen Huang's document.
[hadoop@cslave3 ~]$ ssh cslave1
[hadoop@cslave1 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave1 ~]$ ssh cslave2
Last login: Wed Aug 5 16:57:26 2015 from cslave1
[hadoop@cslave2 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave2 ~]$ ssh cslave3
Last login: Wed Aug 5 16:57:28 2015 from cslave2
[hadoop@cslave3 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave3 ~]$ ssh cslave4
Last login: Wed Aug 5 12:09:48 2015 from cslave3
[hadoop@cslave4 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave4 ~]$ ssh cslave5
Last login: Wed Aug 5 10:50:24 2015 from cslave6
[hadoop@cslave5 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave5 ~]$ ssh cslave6
Last login: Wed Aug 5 12:08:43 2015 from cslave5
[hadoop@cslave6 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave6 ~]$ ssh cslave7
Last login: Wed Aug 5 12:09:04 2015 from cslave6
[hadoop@cslave7 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave7 ~]$ ssh cslave8
Last login: Wed Aug 5 12:09:07 2015 from cslave7
[hadoop@cslave8 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave8 ~]$ ssh cslave9
Last login: Wed Aug 5 12:09:00 2015 from cslave8
[hadoop@cslave9 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
/gtabigdata holds hadoop, hbase, zookeeper, and their data directories.
Set each node's zookeeper myid:
[hadoop@cslave1 ~]$ vim /gtabigdata/data/zookeeper/data/myid
set the myid content on cslave1 to 2
[hadoop@cslave2 ~]$ vim /gtabigdata/data/zookeeper/data/myid
---- 3
… and so on through cslave9.
Set the environment variables on every machine:
vim /etc/profile    (mirror the master's /etc/profile on each slave)
source /etc/profile
Start hadoop:
[hadoop@cmanager ~]$ start-all.sh
Start zookeeper:
[hadoop@cslave1 ~]$ zkServer.sh start
JMX enabled by default
Using config: /gtabigdata/soft/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
-------- once every node is started, check zookeeper's status on each machine
[hadoop@cslave6 ~]$ zkServer.sh status
JMX enabled by default
Using config: /gtabigdata/soft/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[hadoop@cslave6 ~]$ ssh cslave5
Last login: Thu Aug 6 14:55:54 2015 from cslave4
[hadoop@cslave5 ~]$ zkServer.sh status
JMX enabled by default
Using config: /gtabigdata/soft/zookeeper/bin/../conf/zoo.cfg
Mode: follower
Start HBase:
[hadoop@cmanager ~]$ start-hbase.sh
starting master, logging to /gtabigdata/soft/hbase/logs/hbase-hadoop-master-cmanager.out
cslave4: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave4.out
cslave2: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave2.out
cslave9: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave9.out
cslave5: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave5.out
cslave6: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave6.out
cslave1: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave1.out
cslave8: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave8.out
cslave7: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave7.out
cslave3: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave3.out
[hadoop@cmanager ~]$ jps
8532 ResourceManager
8370 SecondaryNameNode
9088 NameNode
12692 HMaster --------- the HBase master process
12978 Jps
[hadoop@cmanager ~]$ ssh cslave1
Last login: Thu Aug 6 14:53:47 2015 from cmanager
[hadoop@cslave1 ~]$ jps
10340 Jps
10150 HRegionServer ---- the HBase RegionServer process
8791 NodeManager
8667 DataNode
10055 QuorumPeerMain — the zookeeper process
For building Hadoop's native libraries, see the Native Libraries Guide:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html
hbase-client-1.0.1.1.jar
Hive on Spark + HBase integration
file:///gtabigdata/soft/hive/lib/hive-hbase-handler-1.2.1.jar,file:///usr/local/hive/lib/hbase-it-1.0.1.1.jar,file:///gtabigdata/soft/hive/lib/zookeeper-3.4.6.jar
http://my.oschina.net/repine/blog/285015#OSC_h1_3
4. Configure the VNC server
5. Passwordless access between machines
Note: until passwordless login is in place, the hadoop account cannot ssh between machines. In that case use root to copy authorized_keys into /home/hadoop/.ssh on each machine, then run chown -R hadoop:hadoop authorized_keys so the file belongs to the hadoop user.
For example:
scp -r authorized_keys [email protected]:/home/hadoop/.ssh    -- did not work
scp -pr .ssh/ 10.1.139.10:/home/hadoop    -- this worked
chown -R hadoop:hadoop authorized_keys
After appending every machine's key, push the final merged file back out to each machine; that gives passwordless login among all of them.
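The merge-and-redistribute idea above can be sketched as follows. This is a local demonstration with placeholder key lines (real entries come from ssh-keygen on each host and are pushed back with scp); only the merge and permission logic is the point.

```shell
# Demo of building one authorized_keys from several hosts' public keys.
# The key strings below are placeholders, not real keys.
mkdir -p /tmp/sshdemo
printf 'ssh-rsa AAAA...1 hadoop@cmanager\n' > /tmp/sshdemo/key_cmanager.pub
printf 'ssh-rsa AAAA...2 hadoop@cslave1\n'  > /tmp/sshdemo/key_cslave1.pub
cat /tmp/sshdemo/key_*.pub > /tmp/sshdemo/authorized_keys  # one line per host
chmod 600 /tmp/sshdemo/authorized_keys                     # sshd insists on tight permissions
wc -l < /tmp/sshdemo/authorized_keys
```

On the real cluster the merged file goes to /home/hadoop/.ssh/authorized_keys on every node, owned by hadoop:hadoop.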
All the original packages for this install are staged under /jars.
Apart from Java, which goes to the default /usr/java, all big-data software is installed under /gtabigdata/soft.
Hadoop's HDFS directories live under /gtabigdata/hdfs/; see each component's configuration for the exact folders.
Temporary files used during operation are kept under /gtabigdata/tmp.
Each component's other data goes under /gtabigdata/data/<component>/.
[hadoop@cmanager jars]$ sudo chown -R hadoop:hadoop /usr/gtabigdata
/gtabigdata
    soft/
        hadoop/
        hbase/
        hive/
        ...
        zookeeper/
    data/
        dfs/
            name/
            data/
        ...        (per-component data directories)
    data1/
        dfs/
            name/
            data/
    tmp/
Install strategy: unpack and configure each package on the master (cmanager), then copy it out to the data nodes (cslave1–cslave9).
[hadoop@cmanager jars]$ sudo tar -zxvf hadoop-2.7.1.tar.gz
[hadoop@cmanager jars]$ ll
total 374444
drwxr-xr-x 9 10021 10021 4096 Jun 29 14:15 hadoop-2.7.1
[hadoop@cmanager jars]$ sudo mv hadoop-2.7.1 /gtabigdata/soft/hadoop
[hadoop@cmanager gtabigdata]$ ll /gtabigdata/soft
total 4
drwxr-xr-x 9 10021 10021 4096 Jun 29 14:15 hadoop
[hadoop@cmanager root]$ sudo vim /etc/profile
/*------------------------------------------------------------------------- paste at the end of the file
#hadoop
export HADOOP_HOME=/gtabigdata/soft/hadoop
export PATH=.:$PATH:$HADOOP_HOME/bin
export HADOOP_PREFIX=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR="$HADOOP_HOME/etc/hadoop/"
export YARN_CONF_DIR=${HADOOP_CONF_DIR}
export PATH=$PATH:$HADOOP_PREFIX/bin:$HADOOP_PREFIX/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
-------------------------------------------------------------------------*/
[hadoop@cmanager root]$ source /etc/profile
[hadoop@cmanager hadoop]$ ll /gtabigdata/soft/hadoop/etc/hadoop/
-rw-r--r-- 1 10021 10021 4224 Jun 29 14:15 hadoop-env.sh
-rw-r--r-- 1 10021 10021 4567 Jun 29 14:15 yarn-env.sh
(1)hadoop env
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/hadoop-env.sh
Edit line 25:
/*-------------------------------------------------------------------------
export JAVA_HOME=/usr/java/jdk1.7.0_71
-------------------------------------------------------------------------*/
(2)yarn env
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/yarn-env.sh
Edit line 23:
/*-------------------------------------------------------------------------
export JAVA_HOME=/usr/java/jdk1.7.0_71
-------------------------------------------------------------------------*/
[hadoop@cmanager hadoop]$ ll /gtabigdata/soft/hadoop/etc/hadoop/
-rw-r--r-- 1 10021 10021 774 Jun 29 14:15 core-site.xml
-rw-r--r-- 1 10021 10021 775 Jun 29 14:15 hdfs-site.xml
-rw-r--r-- 1 10021 10021 758 Jun 29 14:15 mapred-site.xml.template
-rw-r--r-- 1 10021 10021 10 Jun 29 14:15 slaves
-rw-r--r-- 1 10021 10021 690 Jun 29 14:15 yarn-site.xml
There is no mapred-site.xml; generate it from the template:
[hadoop@cmanager hadoop]$ cd /gtabigdata/soft/hadoop/etc/hadoop
[hadoop@cmanager hadoop]$ cp mapred-site.xml.template mapred-site.xml
2.3 Configure ${HADOOP_HOME}/etc/hadoop/core-site.xml
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data/tmp/hadoop
[hadoop@cmanager hadoop]$ sudo chown -Rhadoop:hadoop /gtabigdata
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://cmanager:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/gtabigdata/data/tmp/hadoop</value>
</property>
<property>
<name>hadoop.native.lib</name>
<value>true</value>
<description>Should native hadoop libraries, if present, be used.</description>
</property>
</configuration>
The default core-site.xml settings are documented at http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/core-default.xml
For a new cluster the only item that must change is fs.defaultFS. It names the file system's entry point — in effect it tells every datanode which node is the namenode, establishing the namenode–datanode communication.
Beyond that, following the layout agreed earlier, we set hadoop.tmp.dir to /gtabigdata/data/tmp/hadoop. Looking at core-default.xml, every directory-related option defaults to a subfolder of ${hadoop.tmp.dir}, so simply changing it from its default /tmp/hadoop-${user.name} gathers everything Hadoop generates and uses under /gtabigdata/data/tmp/hadoop instead of mixing it with the other files in /tmp.
2.4 Configure ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data/dfs/name
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data/dfs/data
[hadoop@cmanager hadoop]$ sudo chown -R hadoop:hadoop /gtabigdata/data
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data1/dfs/name
[hadoop@cmanager hadoop]$ sudo mkdir -p /gtabigdata/data1/dfs/data
[hadoop@cmanager hadoop]$ sudo chown -R hadoop:hadoop /gtabigdata/data1
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/hdfs-site.xml
-----------------------------------------------------------------------------------------------------------------------------
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///gtabigdata/data/dfs/name,file:///gtabigdata/data1/dfs/name</value>
<final>true</final>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///gtabigdata/data/dfs/data,file:///gtabigdata/data1/dfs/data</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration> ---------------------------------------------------------------------------------------------------------------------------------
The default hdfs-site.xml settings are documented at http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml. For a new cluster nothing in this file strictly has to change, but several options commonly do:
dfs.namenode.secondary.http-address — designates the secondary namenode; if unset, the node where start-dfs.sh is run becomes the secondary namenode
dfs.replication — copies kept of each block, default 3; with fewer than 3 datanodes, set it to at most the datanode count
dfs.namenode.name.dir — folder for namenode data
dfs.datanode.data.dir — folder for datanode data
dfs.namenode.checkpoint.dir — folder for secondary-namenode data
The last three also default to subfolders of ${hadoop.tmp.dir}; adjust them to the cluster's situation, e.g. point them at a folder mounted over NFS.
2.5 Configure ${HADOOP_HOME}/etc/hadoop/mapred-site.xml
Copy mapred-site.xml.template in ${HADOOP_HOME}/etc/hadoop to mapred-site.xml and add the following:
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
The default mapred-site.xml settings are documented at hadoop.apache.org/docs/r2.2.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml. The single item that must change is mapreduce.framework.name, which tells Hadoop which framework executes MapReduce jobs.
The file also has several options governing file locations:
mapreduce.cluster.local.dir
mapreduce.jobtracker.system.dir
mapreduce.jobtracker.staging.root.dir
mapreduce.cluster.temp.dir
Adjust these as needed.
2.6 Configure ${HADOOP_HOME}/etc/hadoop/yarn-site.xml
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/hadoop/etc/hadoop/yarn-site.xml
-----------------------------------------------------------------------------------------------------------
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>cmanager</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
----------------------------------------------------------------------------------------------------------------
The default yarn-site.xml settings are documented at http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml. Two items must change here; yarn.resourcemanager.hostname plays the same role as fs.defaultFS in core-site.xml — it points every nodemanager at the resourcemanager and establishes their communication.
The file also has options governing file locations:
yarn.nodemanager.local-dirs
yarn.resourcemanager.fs.state-store.uri
Adjust these as needed.
We normally start the whole cluster with start-dfs.sh. Reading that script shows that it starts the namenode, secondary namenode, and slave (datanode) processes on the nodes named in the configuration; the slave (datanode) list lives in ${HADOOP_HOME}/etc/hadoop/slaves, one node per line. So every datanode's hostname or IP must be added to this file.
[hadoop@cmanager hadoop]$ vim /gtabigdata/soft/hadoop/etc/hadoop/slaves
/*-------------------------------------------------------------------------
cslave1
cslave2
cslave3
-------------------------------------------------------------------------*/
Note: list one line per datanode — this install has nine, cslave1 through cslave9.
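Since the nine slaves are uniformly named, the file can be generated rather than typed; a small sketch (writing to /tmp here — the real target is $HADOOP_HOME/etc/hadoop/slaves):

```shell
# Generate one hostname per line, cslave1 through cslave9.
for i in $(seq 1 9); do echo "cslave$i"; done > /tmp/slaves
cat /tmp/slaves
```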
2.8 Repackage the configured Hadoop tree, copy it to the corresponding location on every node and unpack it there; remember to update each node's /etc/profile and create the required directories.
--- create these folders on every datanode:
sudo mkdir /gtabigdata
sudo mkdir -p /gtabigdata/data/tmp/hadoop
sudo mkdir -p /gtabigdata/data/dfs/name
sudo mkdir -p /gtabigdata/data/dfs/data
sudo mkdir -p /gtabigdata/data1/dfs/name
sudo mkdir -p /gtabigdata/data1/dfs/data
sudo chown -R hadoop:hadoop /gtabigdata
Copy hadoop to each datanode with:
scp -r /gtabigdata/soft/hadoop hadoop@cslave1:/gtabigdata/soft/
Update /etc/profile (copying the master's is fine, but check it against each machine so other applications are not affected), then:
source /etc/profile
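The scp shown for cslave1 repeats for all nine slaves; a hedged sketch that only prints the commands so they can be reviewed first (drop the echo to execute; it assumes passwordless ssh is already working):

```shell
# Print the copy command for each data node instead of running it.
for i in $(seq 1 9); do
  echo "scp -r /gtabigdata/soft/hadoop hadoop@cslave$i:/gtabigdata/soft/"
done
```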
Note on accounts:
For a production deployment we recommend dedicated accounts for the Hadoop services — e.g. a hadoop user to start the namenode and datanodes and a yarn user to start the resourcemanager and nodemanagers. Each user's group may be a same-named group (as CDH does) or one shared group such as hadoop; the users have no inherent relationship and sharing a group carries no special meaning, so it is a matter of taste — I prefer giving each user its own group.
One caution: if dedicated accounts start the Hadoop services, the owner and group of every folder Hadoop uses (e.g. dfs.namenode.name.dir and the like) must be changed to those accounts, otherwise the services fail to start for lack of permission on those folders.
3.2 Format the cluster
Before the first start, format the cluster:
[hadoop@cmanager usr]$ hadoop namenode -format
3.3 Start HDFS
Run:
start-dfs.sh
This can be run on any node. Note that if no secondary namenode is configured (i.e. hdfs-site.xml has no dfs.namenode.secondary.http-address), the node where the command runs automatically becomes the secondary namenode.
To start individual services:
start the namenode
hadoop-daemon.sh start namenode
start the secondary namenode
hadoop-daemon.sh start secondarynamenode
start a datanode
hadoop-daemon.sh start datanode
Once started, visit:
http://cmanager:50070
and check every HDFS node. If all are reachable HDFS is fine; if not, or if nodes are missing, dig into the logs for the cause.
3.4 Start YARN
Run:
start-yarn.sh
This too can be run on any node; its slave list is the same ${HADOOP_HOME}/etc/hadoop/slaves file HDFS uses.
To start individual services:
start the resourcemanager
yarn-daemon.sh start resourcemanager
start a nodemanager
yarn-daemon.sh start nodemanager
Once started, visit:
http://cmanager:8088
and check every YARN node; if all are reachable YARN is fine, otherwise analyze the logs for the cause.
start-all.sh wraps start-dfs.sh, start-yarn.sh, and the rest in a single command.
[hadoop@cmanager jars]$ sudo tar -zxvf hbase-1.0.1.1-bin.tar.gz
[hadoop@cmanager jars]$ sudo mv hbase-1.0.1.1 /gtabigdata/soft/hbase
[hadoop@cmanager gtabigdata]$ cd /gtabigdata/soft/
[hadoop@cmanager gtabigdata]$ ll
total 8
drwxr-xr-x 10 hadoop hadoop 4096 Aug 4 17:45 hadoop
drwxr-xr-x 7 root root 4096 Aug 5 09:27 hbase
[hadoop@cmanager gtabigdata]$ sudo chown -R hadoop:hadoop hbase
[hadoop@cmanager gtabigdata]$ sudo vim /etc/profile
/*-------------------------------------------------------------------------
#HBASE
export HBASE_HOME=/gtabigdata/soft/hbase
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
-------------------------------------------------------------------------*/
[hadoop@cmanager gtabigdata]$ source /etc/profile
(1) vim /gtabigdata/soft/hbase/conf/hbase-env.sh
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/hbase/conf/hbase-env.sh
Paste at the end of the file:
/*-------------------------------------------------------------------------
export JAVA_HOME=/usr/java/jdk1.7.0_71
export HBASE_MANAGES_ZK=false -- do not let HBase manage zookeeper: disable the bundled one and use the external ensemble
-------------------------------------------------------------------------*/
(2) vim /gtabigdata/soft/hbase/conf/hbase-site.xml
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/hbase/conf/hbase-site.xml
/*-------------------------------------------------------------------------
<property>
<name>hbase.rootdir</name>
<value>hdfs://cmanager:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.master</name>
<value>cmanager:60000</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>cmanager,cslave1,cslave2,cslave3,cslave4,cslave5,cslave6,cslave7,cslave8,cslave9</value>
<!-- comma-separated list of the zookeeper hosts; prefer an odd count -->
</property>
-------------------------------------------------------------------------*/
Note: hbase.rootdir must match fs.defaultFS (hdfs://cmanager:9000 on this cluster), and the quorum list must match the zoo.cfg ensemble below.
ZooKeeper has this property: the ensemble stays available as long as more than half of its machines are working. So with 2 zookeepers, one failure leaves 1 — not more than half — and the ensemble is down: 2 servers tolerate 0 failures. With 3, one failure leaves 2, which is a majority, so the tolerance is 1. Listing a few more — 2→0, 3→1, 4→1, 5→2, 6→2 — a pattern appears: 2n and 2n−1 servers both tolerate n−1 failures. The extra even machine buys nothing, so why add an unnecessary zookeeper? Use an odd count.
Here we have nine slave machines and configure zookeeper across the cluster; in practice, adjust to the cluster's actual zookeeper layout.
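The tolerance pattern above is floor((n−1)/2) failed servers for an n-server ensemble; shell arithmetic reproduces the 2→0, 3→1, 4→1, 5→2, 6→2 series:

```shell
# Failure tolerance of an n-server zookeeper ensemble: a majority must survive.
for n in 2 3 4 5 6 9 10; do
  echo "$n servers tolerate $(( (n - 1) / 2 )) failures"
done
```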
Add the slave hostnames.
The regionservers file works like Hadoop's slaves file: one line per data node, as many lines as there are data nodes.
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/hbase/conf/regionservers
/*-------------------------------------------------------------------------
cslave1
cslave2
cslave3
cslave4
cslave5
cslave6
cslave7
cslave8
cslave9
-------------------------------------------------------------------------*/
[hadoop@cmanager jars]$ sudo tar -zxvf zookeeper-3.4.6.tar.gz
[hadoop@cmanager jars]$ sudo mv zookeeper-3.4.6 /gtabigdata/soft/zookeeper
[hadoop@cmanager jars]$ cd /gtabigdata/soft/
[hadoop@cmanager gtabigdata]$ sudo chown -R hadoop:hadoop zookeeper/
[hadoop@cmanager gtabigdata]$ sudo vim /etc/profile
/*-------------------------------------------------------------------------
#zookeeper
export ZOOKEEPER_HOME=/gtabigdata/soft/zookeeper
export PATH=$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH
-------------------------------------------------------------------------*/
[hadoop@cmanager gtabigdata]$ source /etc/profile
[hadoop@cmanager gtabigdata]$ mkdir -p /gtabigdata/data/zookeeper/data/
[hadoop@cmanager gtabigdata]$ mkdir -p /gtabigdata/data/zookeeper/log/
[hadoop@cmanager gtabigdata]$ cp /gtabigdata/soft/zookeeper/conf/zoo_sample.cfg /gtabigdata/soft/zookeeper/conf/zoo.cfg
[hadoop@cmanager gtabigdata]$ vim /gtabigdata/soft/zookeeper/conf/zoo.cfg
/*-------------------------------------------------------------------------
server.1=cmanager:2888:3888
server.2=cslave1:2888:3888
server.3=cslave2:2888:3888
server.4=cslave3:2888:3888
server.5=cslave4:2888:3888
server.6=cslave5:2888:3888
server.7=cslave6:2888:3888
server.8=cslave7:2888:3888
server.9=cslave8:2888:3888
server.10=cslave9:2888:3888
# 2888 is the leader (quorum) port, 3888 the leader-election port
We have 10 machines here (cmanager plus nine slaves), hence ten server entries.
-------------------------------------------------------------------------*/
In dataDir (= /gtabigdata/data/zookeeper/data/), create a myid file and put the machine's number in it, matching its server.* entry in zoo.cfg — on cmanager the content is 1, and every host's myid is different.
# vim /gtabigdata/data/zookeeper/data/myid
The number comes from the machine's server.* entry in zoo.cfg.
[hadoop@cmanager conf]$ vim /gtabigdata/data/zookeeper/data/myid
/*-------------------------------------------------------------------------
1
-------------------------------------------------------------------------*/
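Because myid must track the server.* numbering (cmanager is 1, and per the earlier myid notes cslaveN is N+1), the per-host assignments can be printed for checking before editing each file; a sketch:

```shell
# Expected myid value per host, matching the zoo.cfg numbering above.
echo "cmanager 1"
for i in $(seq 1 9); do
  echo "cslave$i $(( i + 1 ))"
done
```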
Configuration notes:
tickTime: the heartbeat interval used between zookeeper servers and between clients and servers; a heartbeat is sent every tickTime.
dataDir: where zookeeper keeps its data; by default the transaction log is written here too.
dataLogDir: a separate directory for the transaction log.
clientPort: the port zookeeper listens on for client connections.
Zookeeper start/stop/status commands:
zkServer.sh start|stop|status
All of the software above is installed on both the master (cmanager) and the slaves, and zookeeper's myid in particular must be adjusted on every machine. Hadoop and HBase are installed identically on master and slaves, so the master's copy can simply be copied out, and they are started from the master only — but zookeeper must be started on every machine, so consider a batch script on the master to start them all at once.
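A batch-start script of the kind suggested above could look like this; it only echoes the ssh commands here so they can be reviewed (remove the echo to run them; it assumes passwordless ssh and that each node's /etc/profile puts zkServer.sh on PATH, and the host list follows zoo.cfg):

```shell
# Start zookeeper on every ensemble member from cmanager; echo first, run once reviewed.
for host in cmanager cslave1 cslave2 cslave3 cslave4 cslave5 cslave6 cslave7 cslave8 cslave9; do
  echo ssh "$host" "'source /etc/profile && zkServer.sh start'"
done
```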
[hadoop@cmanager jars]$ sudo tar -zxvf thrift-0.9.2.tar.gz
[hadoop@cmanager jars]$ sudo mv thrift-0.9.2 /gtabigdata/soft/thrift
[hadoop@cmanager jars]$ sudo chown -R hadoop:hadoop /gtabigdata/soft/thrift
# cd /gtabigdata/soft/thrift
# ./configure
# make
# make install
[hadoop@cmanager thrift]$ cd /gtabigdata/soft/thrift
[hadoop@cmanager thrift]$ ./configure
This error appeared: configure: error: Bison version 2.5 or higher must be installed on the system!
So upgrade Bison as follows (skip this if there was no error):
wget http://ftp.gnu.org/gnu/bison/bison-2.5.1.tar.gz
tar xvf bison-2.5.1.tar.gz
cd bison-2.5.1
./configure --prefix=/usr
make
sudo make install
cd ..
------- if make reports errors, fix them and rerun:
[hadoop@cmanager thrift]$ make
[hadoop@cmanager thrift]$ make install
vim /etc/profile    -------- needed later for installing rhbase and rhive
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig/
--------------------------------------------------------------#
export PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig/
# pkg-config --cflags thrift
It should return:
-I/usr/local/include/thrift
if everything is correct.
Start it:
hbase-daemon.sh start thrift
Copy the lib:
------ starting thrift:
[hadoop@cmanager bin]$ hbase-daemon.sh start thrift
starting thrift, logging to /gtabigdata/soft/hbase/logs/hbase-hadoop-thrift-cmanager.out
[hadoop@cmanager bin]$ jps
5037 Jps
8532 ResourceManager
8370 SecondaryNameNode
9088 NameNode
12692 HMaster
4957 ThriftServer --- the thrift process
[root@cmanager jars]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
[hadoop@cmanager jars]$ sudo mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha /gtabigdata/soft/sqoop
[hadoop@cmanager jars]$ cd /gtabigdata/soft/
[hadoop@cmanager soft]$ ll
…
drwxr-xr-x 8 root root 4096 Apr 27 14:19 sqoop
[hadoop@cmanager soft]$ sudo chown hadoop:hadoop -R sqoop
Edit /etc/profile:
#sqoop
export SQOOP_HOME=/gtabigdata/soft/sqoop
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin:$SQOOP_HOME/bin
Sqoop environment configuration (sqoop-env.sh):
# cp sqoop-env-template.sh sqoop-env.sh
# vim sqoop-env.sh
[hadoop@cmanager conf]$ cp sqoop-env-template.sh sqoop-env.sh
[hadoop@cmanager conf]$ vim sqoop-env.sh
Edit as follows (set only what you actually use):
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/gtabigdata/soft/hadoop/
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/gtabigdata/soft/hadoop/share/hadoop/mapreduce/
#Set the path to where bin/hbase is available
export HBASE_HOME=/gtabigdata/soft/hbase/
#Set the path to where bin/hive is available
export HIVE_HOME=/gtabigdata/soft/hive/
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/gtabigdata/soft/zookeeper
Copy the MySQL connector jar into sqoop's lib directory (likewise Oracle's ojdbc):
mysql-connector-java-5.1.28-bin.jar
ojdbc6.jar
sqljdbc4.jar
[hadoop@cmanager jars]$ sudo mv ojdbc6.jar /gtabigdata/soft/sqoop/lib/
[hadoop@cmanager jars]$ sudo mv sqljdbc4.jar /gtabigdata/soft/sqoop/lib/
[hadoop@cmanager jars]$ sudo mv mysql-connector-java-5.1.28-bin.jar /gtabigdata/soft/sqoop/lib/
Using Sqoop:
sqoop import --connect jdbc:oracle:thin:@10.224.1.129:1521:gtadb3 --username DCSYS --password DCSYS --m 1 --table DEPT --columns id,name,addr --hbase-create-table --hbase-table DEPT --hbase-row-key id --column-family deptinfo
sqoop import --connect jdbc:mysql://cmanager:3306/mysql --table baa --hbase-create-table --hbase-table h_test --column-family name --hbase-row-key id --m 1
[hadoop@cmanager ~]$ sqoop import --connect jdbc:oracle:thin:@10.224.1.129:1521:gtadb3 --username DCSYS --password DCSYS --m 1 --table DEPT --columns id,name,addr --hbase-create-table --hbase-table DEPT --hbase-row-key id --column-family deptinfo
After the install I used the statements above to import from Oracle and MySQL into HBase, and the import into hbase-1.0.1 still fails, just as in the old environment. It looks like dropping back to HBase 0.98 will be necessary — maddening. I will not reinstall HBase for now; once the whole environment is up I will revisit this.
Someone hit the same problem on this site too, but nobody solved it: http://www.aboutyun.com/thread-12236-1-1.html
Thoroughly demoralized!!
Now install Hive. Hive uses MySQL as its metastore database, so install MySQL first.
In brief:
1.yum install mysql* -y
[root@cmanager conf]# yum install mysql* -y
2. Common operations
Start it and enable it at boot:
# chkconfig mysqld on
# service mysqld start
[root@cmanager conf]# chkconfig mysqld on
[root@cmanager conf]# service mysqld start
Check status:
service mysqld status
Stop:
--service mysqld stop
Set a password:
--set password for 'root'@'%'=password('123456');
Log in:
# mysql -uroot    (right after installation, log in this way)
# mysql -uroot -p123456    /* with -p alone you are prompted for the password */
Grant privileges:
grant all privileges on *.* to 'hadoop'@'%' identified by 'hadoop';
--'%' allows every login route: hostname or IP
flush privileges;
Things to watch with Hive: (1) the metastore, (2) remote access.
# tar zxvf ******
# mv
[hadoop@cmanager jars]$ sudo tar -zxvf apache-hive-1.2.1-bin.tar.gz
[hadoop@cmanager jars]$ sudo mv apache-hive-1.2.1-bin /gtabigdata/soft/hive/
[hadoop@cmanager jars]$ cd /gtabigdata/soft/hive/
[hadoop@cmanager conf]$ sudo vim /etc/profile
/*-------------------------------------------------------------------------
#HIVE
export HIVE_HOME=/gtabigdata/soft/hive
exportPATH=.:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HBASE_HOME/bin:$HIVE_HOME/bin
-------------------------------------------------------------------------*/
[hadoop@cmanager conf]$ source /etc/profile
cp hive-env.sh.template hive-env.sh
vim hive-env.sh
[hadoop@cmanager conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@cmanager conf]$ vim hive-env.sh
/*-------------------------------------------------------------------------
export HADOOP_HOME=/gtabigdata/soft/hadoop
export HIVE_CONF_DIR=/gtabigdata/soft/hive/conf
export HIVE_AUX_JARS_PATH=/gtabigdata/soft/hive/lib
-------------------------------------------------------------------------*/
Create the log directory:
# mkdir -p /gtabigdata/data/hive/log4j
# cp hive-log4j.properties.template hive-log4j.properties
# vim hive-log4j.properties
Line 20:
/*-------------------------------------------------------------------------
hive.log.dir=/gtabigdata/data/hive/log4j/
-------------------------------------------------------------------------*/
(1) warehouse directories
$ $HADOOP_HOME/bin/hadoop fs -mkdir /tmp
$ $HADOOP_HOME/bin/hadoop fs -mkdir /hive/hivewarehouse
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /hive/hivewarehouse
(2) log directories
# cd /gtabigdata/data/hive
# mkdir logs
# cd logs
# mkdir querylog
# mkdir operation_logs
cd /gtabigdata/data/hive
mkdir temp
mkdir resources
(3) If MySQL is the metastore:
Create a hive user with password hive, and create a hive database in MySQL:
grant all privileges on *.* to 'hive'@'%' identified by 'hive';
(4) Create the directories needed for calling from R:
# mkdir -p /rhive/lib/2.0-0.2
# chmod 755 -R /rhive/lib/2.0-0.2
(5) Properties that matter for remote mode (important — other programs all connect to Hive through remote mode):
· hive.metastore.warehouse.dir: the data directory, default /user/hive/warehouse;
· hive.exec.scratchdir: the temporary file directory, default /tmp/hive-${user.name};
· hive.metastore.local: whether to use a local metastore; set to false here so metadata is stored in the remote MySQL database;
· javax.jdo.option.ConnectionURL: the database connection string, here:
jdbc:mysql://cmanager:3306/hive?createDatabaseIfNotExist=true
· javax.jdo.option.ConnectionDriverName: the JDBC driver, here com.mysql.jdbc.Driver;
· javax.jdo.option.ConnectionUserName: the MySQL username, set per your environment;
· javax.jdo.option.ConnectionPassword: the MySQL password, set per your environment;
· hive.stats.dbclass: the database type, here jdbc:mysql;
· hive.stats.jdbcdriver: the driver, here com.mysql.jdbc.Driver;
· hive.stats.dbconnectionstring: the connection string for Hive's temporary statistics, here jdbc:mysql://192.168.10.203:3306/hivestat?useUnicode=true&characterEncoding=utf8&user=hive&password=hive&createDatabaseIfNotExist=true;
· hive.metastore.uris: the metastore access URI, here thrift://127.0.0.1:9083;
Copy the database driver jar into $HIVE_HOME/lib:
mysql: cp ~/packages/mysql-connector-java-5.1.28-bin.jar /gtabigdata/soft/hive/lib/
6. Modify the hive-site.xml properties
# cd /gtabigdata/soft/hive/conf
# cp hive-default.xml.template hive-site.xml
# vim hive-site.xml
#mkdir -p /gtabigdata/data/hive/tmp
https://cwiki.apache.org/confluence/display/Hive/Hive+Transactions#HiveTransactions-NewConfigurationParametersforTransactions
See the file for the full details. In hive-site.xml, replace every occurrence of:
${system:java.io.tmpdir} -----> /gtabigdata/data/hive/tmp/
${system:user.name} ------> hive
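Doing those two substitutions by hand across a large hive-site.xml is error-prone; sed can do both in one pass. A sketch demonstrated on a one-line stand-in file (for real use, point it at $HIVE_HOME/conf/hive-site.xml after backing it up):

```shell
# Create a stand-in fragment, then replace both placeholders in place.
printf '<value>${system:java.io.tmpdir}/${system:user.name}</value>\n' > /tmp/hive-frag.xml
sed -i 's|${system:java.io.tmpdir}|/gtabigdata/data/hive/tmp|g; s|${system:user.name}|hive|g' /tmp/hive-frag.xml
cat /tmp/hive-frag.xml
```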
Create hive-site.xml by copying the template:
cp hive-default.xml.template hive-site.xml
vim hive-site.xml
Configure the database connection:
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://cmanager:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
These four are, in order: the connection URL, the driver class, the username, and the password.
javax.jdo.option.ConnectionURL — the metastore connection string
javax.jdo.option.ConnectionDriverName — the DB driver; for MySQL, com.mysql.jdbc.Driver
javax.jdo.option.ConnectionUserName — the DB username
javax.jdo.option.ConnectionPassword — the DB password
Configure the hive warehouse path in hive-site.xml:
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/hivewarehouse</value>
<description>location of default database for the warehouse</description>
</property>
Other directory settings:
<property>
<name>hive.querylog.location</name>
<value>/gtabigdata/data/hive/logs/querylog</value>
<description>Location of Hive run time structured log file</description>
</property>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/gtabigdata/data/hive/temp</value>
<description>Local scratch space for Hive jobs</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/gtabigdata/data/hive/resources</value>
<description>Temporary local directory for added resources in the remote file system.</description>
</property>
hive --service metastore
hive --service hiveserver2 10000
Rstudio needs to connect, so hiveserver2 must be brought up as well.
PS: hive --service hiveserver 10001 — Hive 1.2 has dropped the old hiveserver; only hiveserver2 remains.
With the services up, Hive logins work — such a success I can hardly believe it.
[hadoop@cmanager ~]$ hive
Logging initialized using configuration infile:/gtabigdata/soft/hive/conf/hive-log4j.properties
hive> show tables;
------- now try importing into Hive with sqoop:
sqoop create-hive-table --connect jdbc:oracle:thin:@192.168.103.236:1521:gtadcsys --table STK_STOCKINFO --username DCSYS --password DCSYS --hive-table STK_STOCKINFO1
sqoop import --connect jdbc:oracle:thin:@192.168.103.236:1521:gtadcsys --table STK_STOCKINFO --username DCSYS --password DCSYS --hive-import --m 1
1. The MySQL installed this time is a fairly new version:
[hadoop@cmanager ~]$ mysql -V
mysql Ver 14.14 Distrib 5.7.6-m16, for Linux (x86_64) using EditLine wrapper
mysql-connector-java-5.1.28-bin.jar
2. The sqoop import into Hive succeeds — so it really is the HBase version that is incompatible!
For this R part, 陈黄's document suffices;
just note that R should still be planned under /gtabigdata/soft.
sudo yum install gcc-gfortran
sudo yum install gcc gcc-c++
sudo yum install readline-devel
sudo yum install libXt-devel
[hadoop@cmanager jars]$ sudo tar -zxvf R-3.1.2.tar.gz
[hadoop@cmanager jars]$ sudo mv R-3.1.2 /gtabigdata/soft/R
[hadoop@cmanager R]$ ./configure --enable-R-shlib
[hadoop@cmanager R]$ make
[hadoop@cmanager R]$ make install
Add to /etc/profile:
export HADOOP_CMD=/gtabigdata/soft/hadoop/bin/hadoop
export HADOOP_STREAMING=/gtabigdata/soft/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.7.1.jar
export JAVA_LIBRARY_PATH=/gtabigdata/soft/hadoop/lib/native
R:
install.packages("rJava")
> library(rJava)
Check which dependent packages are needed: R CMD check "rmr2_3.3.0.tar.gz"
Install the packages below — and enter R under the hadoop account to do it, otherwise installing rmr later under hadoop hits permission problems:
install.packages("reshape2")
install.packages("Rcpp")
install.packages("iterators")
install.packages("stringr")
install.packages("plyr")
install.packages("itertools")
install.packages("digest")
install.packages("RJSONIO")
install.packages("functional")
install.packages("testthat")
install.packages("bitops")
install.packages("caTools")
install.packages("devtools")
install.packages("RCurl")
install.packages("httr")
A few of the Suggests packages would not install.
As root:
# R CMD INSTALL 'rhdfs_1.0.8.tar.gz'
# R CMD INSTALL 'rmr2_3.3.0.tar.gz'
--for the ones that would not install, workarounds:
(1)
install.packages(c("Rcpp","rjson","bit64"))
# R CMD INSTALL 'rmr2_3.3.0.tar.gz'
There were errors in the output, but no more complaints that rmr2 is missing.
Still not working.
(2)
Imports: digest, functional, testthat, pryr, microbenchmark, lazyeval, dplyr, bitops
Tried again — still errors; keep installing:
install.packages("lazyeval")
install.packages("dplyr")
#ln -s /gtabigdata/soft/R/bin/R /usr/bin
#ln -s /gtabigdata/soft/R/bin/Rscript /usr/bin
Test whether it works.
quickcheck was installed from the downloaded package.
Goal: get rmr2 running on the master,
and have every DataNode load rmr2 and rhdfs without problems.
A plain R program:
Switch to the hadoop account before running R, otherwise it will report errors.
> small.ints = 1:10
> sapply(small.ints, function(x) x^2)
An R MapReduce program:
library(rhdfs)
hdfs.init()
library(rmr2)
library(rJava)
> small.ints = to.dfs(1:10)
> mapreduce(input = small.ints, map = function(k, v) cbind(v, v^2))
> from.dfs("/tmp/RtmpWnzxl4/file5deb791fcbd5")
Install R on every DataNode as well, following the same steps as on the master:
sudo yum install gcc-gfortran
sudo yum install gcc gcc-c++
sudo yum install readline-devel
sudo yum install libXt-devel
Installing rmr2 pulls in many dependency packages. If errors persist after installing the packages above, install whichever packages are still reported missing:
install.packages(c("RJSONIO","digest","functional","caTools"))
PS: While installing R I found that when only the master (namenode) had R, rhdfs and rmr2, running library(rmr2) and a MapReduce job produced many errors. It turns out R, rhdfs and rmr2 must also be installed on the DataNodes. Oddly, after installing on a few DataNodes, jobs launched from the machines that had the packages stopped failing even before every node was done. With only one DataNode installed it still failed; by the time four DataNodes were installed everything worked, which puzzled me. To be safe, and so that R is usable everywhere, I installed R, rhdfs and rmr2 on all nine machines.
The errors closely match the questions on http://cos.name/2013/03/rhadoop2-rhadoop/, which is what pointed me toward installing on the DataNodes. I had assumed R did not need to be on every machine, but that is not the case: at minimum R, rhdfs and rmr2 must be on the master and on the DataNodes. The underlying reason deserves further study.
[hadoop@cslave7 jars]$ cd rhadoop/
[hadoop@cslave7 rhadoop]$ ll
total 900
-rw-r--r-- 1 hadoop hadoop 251733 Aug 11 13:50 rhbase-1.2.1.tar.gz
-rw-r--r-- 1 hadoop hadoop 25105 Aug 11 13:50 rhdfs_1.0.8.tar.gz
-rw-r--r-- 1 hadoop hadoop 567515 Aug 11 13:50 rJava_0.9-6.tar.gz
-rw-r--r-- 1 hadoop hadoop 62774 Aug 11 13:50 rmr2_3.3.0.tar.gz
drwxr-xr-x 4 hadoop hadoop 4096 Aug 11 13:50 rmr2.Rcheck
--------For the step below to succeed under the hadoop account, all the dependency packages must have been installed from within the R environment as the hadoop user. I initially did this as root; the install succeeded under root, but entering R as hadoop again reported that rmr2 was missing. Very frustrating!
[hadoop@cslave5 rhadoop]$ R CMD INSTALL "rmr2_3.3.0.tar.gz"
If that error appears, enter the R environment as the hadoop user and install the packages listed above.
Then run the install again:
Success!
Next, install rhbase and rhive; these only need to go on cmanager, not on every machine.
Refer to Chen Huang's earlier document.
RStudio installation: see Chen Huang's document.
[hadoop@cslave3 ~]$ ssh cslave1
[hadoop@cslave1 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave1 ~]$ ssh cslave2
Last login: Wed Aug 5 16:57:26 2015 from cslave1
[hadoop@cslave2 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave2 ~]$ ssh cslave3
Last login: Wed Aug 5 16:57:28 2015 from cslave2
[hadoop@cslave3 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave3 ~]$ ssh cslave4
Last login: Wed Aug 5 12:09:48 2015 from cslave3
[hadoop@cslave4 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave4 ~]$ ssh cslave5
Last login: Wed Aug 5 10:50:24 2015 from cslave6
[hadoop@cslave5 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave5 ~]$ ssh cslave6
Last login: Wed Aug 5 12:08:43 2015 from cslave5
[hadoop@cslave6 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave6 ~]$ ssh cslave7
Last login: Wed Aug 5 12:09:04 2015 from cslave6
[hadoop@cslave7 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave7 ~]$ ssh cslave8
Last login: Wed Aug 5 12:09:07 2015 from cslave7
[hadoop@cslave8 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
[hadoop@cslave8 ~]$ ssh cslave9
Last login: Wed Aug 5 12:09:00 2015 from cslave8
[hadoop@cslave9 ~]$ sudo chown -R hadoop:hadoop /gtabigdata
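Hopping node to node by hand works, but the same chown can be fanned out from one machine in a loop. A sketch that first just generates the per-node commands (assumes passwordless ssh, already configured, and sudo rights for hadoop; review the output, then run the ssh lines or pipe them to sh):

```shell
# Sketch: generate the per-slave ownership-fix commands for cslave1..cslave9.
gen_chown_cmds() {
  local i
  for i in $(seq 1 9); do
    echo "ssh cslave$i 'sudo chown -R hadoop:hadoop /gtabigdata'"
  done
}
gen_chown_cmds
```

Generating the commands before executing them makes it easy to eyeball the host list once, instead of retyping the chown nine times.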
/gtabigdata contains hadoop, hbase, zookeeper and their data directories.
Edit each node's ZooKeeper myid:
[hadoop@cslave1 ~]$ vim /gtabigdata/data/zookeeper/data/myid
Set the myid file on this node to 2.
[hadoop@cslave2 ~]$ vim /gtabigdata/data/zookeeper/data/myid
Set to 3.
...and so on for the remaining nodes.
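The numbering above (cslave1 gets 2, cslave2 gets 3, and so on) can be derived from the hostname instead of edited by hand on each node. A sketch, assuming the hostnames keep the cslaveN form used in this cluster:

```shell
# Sketch: derive a node's ZooKeeper myid from its hostname.
# Convention from this log: cslaveN gets myid N+1.
myid_for() {
  local host=$1
  local n=${host#cslave}   # strip the "cslave" prefix, leaving N
  echo $((n + 1))
}
# On each slave one would then run, e.g.:
#   myid_for "$(hostname)" > /gtabigdata/data/zookeeper/data/myid
```

The myid value must match the server.N entry for that host in zoo.cfg, so deriving both from the same hostname convention avoids mismatches.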
Set environment variables on every machine:
vim /etc/profile    -- configure each slave to match the master's /etc/profile
source /etc/profile
Start Hadoop:
[hadoop@cmanager ~]$ start-all.sh
Start ZooKeeper:
[hadoop@cslave1 ~]$ zkServer.sh start
JMX enabled by default
Using config: /gtabigdata/soft/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
--------After starting ZooKeeper on every node, check each node's status:
[hadoop@cslave6 ~]$ zkServer.sh status
JMX enabled by default
Using config: /gtabigdata/soft/zookeeper/bin/../conf/zoo.cfg
Mode: leader
[hadoop@cslave6 ~]$ ssh cslave5
Last login: Thu Aug 6 14:55:54 2015 from cslave4
[hadoop@cslave5 ~]$ zkServer.sh status
JMX enabled by default
Using config: /gtabigdata/soft/zookeeper/bin/../conf/zoo.cfg
Mode: follower
Start HBase:
[hadoop@cmanager ~]$ start-hbase.sh
starting master, logging to /gtabigdata/soft/hbase/logs/hbase-hadoop-master-cmanager.out
cslave4: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave4.out
cslave2: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave2.out
cslave9: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave9.out
cslave5: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave5.out
cslave6: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave6.out
cslave1: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave1.out
cslave8: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave8.out
cslave7: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave7.out
cslave3: starting regionserver, logging to /gtabigdata/soft/hbase/bin/../logs/hbase-hadoop-regionserver-cslave3.out
[hadoop@cmanager ~]$ jps
8532 ResourceManager
8370 SecondaryNameNode
9088 NameNode
12692 HMaster --------- HBase master process
12978 Jps
[hadoop@cmanager ~]$ ssh cslave1
Last login: Thu Aug 6 14:53:47 2015 from cmanager
[hadoop@cslave1 ~]$ jps
10340 Jps
10150 HRegionServer ---- HBase RegionServer process
8791 NodeManager
8667 DataNode
10055 QuorumPeerMain ---- ZooKeeper process
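The jps listings above suggest a quick health check: compare each node's jps output against the daemons expected for its role. A sketch; the process names are the ones seen in this log, and the role names (master/datanode) are just this function's own labels:

```shell
# Sketch: expected JVM daemons per node role, taken from the jps output above.
expected_procs() {
  case $1 in
    master)   echo "NameNode SecondaryNameNode ResourceManager HMaster" ;;
    datanode) echo "DataNode NodeManager HRegionServer QuorumPeerMain" ;;
    *)        echo "unknown role: $1" >&2; return 1 ;;
  esac
}
# Usage idea, on a live node:
#   for p in $(expected_procs datanode); do
#     jps | grep -q "$p" || echo "MISSING: $p"
#   done
```

Running this loop after each cluster restart catches a RegionServer or QuorumPeer that silently failed to come up.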
For how to compile Hadoop's native libraries, see the Native Libraries Guide:
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/NativeLibraries.html
hbase-client-1.0.1.1.jar
Hive on Spark + HBase integration
file:///gtabigdata/soft/hive/lib/hive-hbase-handler-1.2.1.jar,file:///usr/local/hive/lib/hbase-it-1.0.1.1.jar,file:///gtabigdata/soft/hive/lib/zookeeper-3.4.6.jar
http://my.oschina.net/repine/blog/285015#OSC_h1_3