Hadoop + Hive + HBase: A Fully Distributed Installation Log

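After learning the basics of Hadoop on an established platform, I set out to build a Hadoop platform myself. Once the pseudo-distributed setup was working, I moved on to a truly distributed deployment.
The pseudo-distributed setup used hadoop-1.2.1 + apache-hive-0.13.0 + hbase-0.98.1. I tested this combination and it works, so the distributed environment uses the same versions.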

Environment plan
Hostname  IP           Roles
master    100.2.12.85  NameNode, master, JobTracker
slave1    100.2.12.13  DataNode, slave, TaskTracker
slave2    100.2.12.97  DataNode, slave, TaskTracker
slave3    100.2.12.94  DataNode, slave, TaskTracker

I. Hadoop Installation
1. Set the hostname on each machine according to the plan.
2. Configure /etc/hosts, using the same entries on every machine:
100.2.12.85 master
100.2.12.13 slave1
100.2.12.97 slave2
100.2.12.94 slave3
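Since the same four entries have to be added on every machine, it can help to script the step so that running it twice does not duplicate lines. The following is only a sketch: the function names add_host_entry and add_cluster_hosts are made up for illustration, and the IP addresses are the ones from the plan above.

```shell
# Hypothetical helper: append "IP hostname" to a hosts file only if that
# hostname is not already mapped there.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # Match the hostname as a whole whitespace-delimited field.
    if ! grep -Eq "[[:space:]]${name}([[:space:]]|\$)" "$hosts_file" 2>/dev/null; then
        printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    fi
}

# Apply the four entries from the environment plan to the given file
# (defaults to /etc/hosts, which normally requires root to modify).
add_cluster_hosts() {
    hosts_file=${1:-/etc/hosts}
    add_host_entry "$hosts_file" 100.2.12.85 master
    add_host_entry "$hosts_file" 100.2.12.13 slave1
    add_host_entry "$hosts_file" 100.2.12.97 slave2
    add_host_entry "$hosts_file" 100.2.12.94 slave3
}
```

After loading these functions, running add_cluster_hosts as root on each machine should leave /etc/hosts with exactly one line per node.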
3. Optionally create a dedicated user for installing Hadoop, for example a user named hadoop. You can also do the distributed deployment as root.
useradd hadoop
passwd hadoop
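4. Set up passwordless SSH login
This part took me a while because I did not understand how public and private keys work. For a deeper explanation, see http://www.blogjava.net/yxhxj2006/archive/2012/10/15/389547.html
a. ssh-keygen -t dsa -P ''
b. cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
To let several machines log in to each other without a password (master to slave1, and slave1 to master), append each machine's id_dsa.pub to the other machine's authorized_keys. That is, append master's public key to authorized_keys on slave1, and slave1's public key to authorized_keys on master; master and slave1 can then log in to each other without a password. Configure the remaining nodes the same way.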
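The key exchange described above can be scripted with ssh-copy-id, which appends a public key to the remote authorized_keys file. This is a sketch under a few assumptions: the function name push_key_to_nodes and the DRY_RUN switch are invented for illustration, the remote account is the hadoop user from step 3, and the key path matches the DSA key generated in step a. Newer OpenSSH releases (7.0 and later) disable DSA keys by default, so on a recent system you would probably generate an RSA key instead and point PUBKEY at id_rsa.pub.

```shell
# Nodes from the environment plan; the public key generated in step a.
NODES="master slave1 slave2 slave3"
PUBKEY="${PUBKEY:-$HOME/.ssh/id_dsa.pub}"

# Hypothetical helper: push this machine's public key to every node.
# With DRY_RUN=1 the commands are only printed, not executed.
push_key_to_nodes() {
    for node in $NODES; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "ssh-copy-id -i $PUBKEY hadoop@$node"
        else
            # Prompts once for the remote password, then appends the key.
            ssh-copy-id -i "$PUBKEY" "hadoop@$node"
        fi
    done
}
```

Running push_key_to_nodes on each of the four machines in turn should give every node passwordless access to every other node.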
5. Deploy Hadoop
a. Extract the Hadoop tarball with tar, then rename the directory to hadoop with mv.
b. Edit the configuration files
Edit conf/hadoop-env.sh and set JAVA_HOME.
Edit conf/core-site.xml, and create the directory /home/hadoop/tempdir.
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tempdir</value>
  </property>
</configuration>
c. Edit hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>
d. Edit mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
e. Configure conf/masters
master
f. Configure conf/slaves
slave1
slave2
slave3
The Hadoop configuration is now complete. Use scp to copy the hadoop directory to slave1 through slave3.
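The copy to the three slaves can be done in a loop. The sketch below assumes the Hadoop directory is /home/hadoop/hadoop (the path used later in hive-env.sh) and that the remote account is hadoop; the function name distribute_hadoop and the DRY_RUN switch are hypothetical.

```shell
SLAVES="slave1 slave2 slave3"
HADOOP_DIR="${HADOOP_DIR:-/home/hadoop/hadoop}"

# Hypothetical helper: copy the configured Hadoop tree to each slave so it
# ends up at the same path as on master. DRY_RUN=1 prints the commands only.
distribute_hadoop() {
    parent=$(dirname "$HADOOP_DIR")
    for node in $SLAVES; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "scp -r $HADOOP_DIR hadoop@$node:$parent/"
        else
            scp -r "$HADOOP_DIR" "hadoop@$node:$parent/"
        fi
    done
}
```

With passwordless SSH already in place, distribute_hadoop should run without prompting.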
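Once the copy finishes, format the NameNode:
hadoop namenode -format
After a successful format, run start-all.sh to start the Hadoop cluster, and use jps to check that the daemons are running.
You can check the cluster status with the command
hadoop dfsadmin -report
or through the web interfaces at http://master:50070 and http://master:50030.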
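Reading jps output by eye is easy to get wrong when a daemon silently fails to start. The sketch below checks the output against a list of expected daemon names. The function name check_daemons is made up; the daemon names are the ones a Hadoop 1.x cluster with the roles planned above would normally show (NameNode, SecondaryNameNode and JobTracker on master; DataNode and TaskTracker on the slaves).

```shell
# Hypothetical helper: read `jps` output on stdin and report every expected
# daemon that is missing. Returns non-zero if anything is missing.
check_daemons() {
    jps_output=$(cat)
    missing=0
    for daemon in "$@"; do
        # jps prints "<pid> <ClassName>"; compare the class name exactly so
        # that NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" | awk '{print $2}' | grep -qx "$daemon"; then
            echo "missing: $daemon"
            missing=1
        fi
    done
    return $missing
}
```

On master this would be used as `jps | check_daemons NameNode SecondaryNameNode JobTracker`, and on each slave as `jps | check_daemons DataNode TaskTracker`.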
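II. Hive Installation
Installing Hive is relatively simple: only a few settings need changing. Hive ships with an embedded metastore database, but it is inconvenient to use, because tables are only visible when you start the hive shell from the same directory. For example, a table created after starting hive from /home is not visible when you start hive from /home/hadoop. I therefore recommend installing a separate database for the metastore; I installed MySQL for this.
1. Install MySQL, preferably with yum or apt-get depending on your platform. This saves a lot of trouble because you do not have to resolve package dependencies yourself.
After installing MySQL, a little configuration is needed.
Set the password:
Right after installation, the MySQL root user has no password. Set the MySQL root password first.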

[root@sample ~]# mysql -u root  ← log in to the MySQL server as root
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 2 to server version: 4.1.20

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql> grant all on mysql.* to 'root'@'localhost' identified by 'password';
After that, you can log in to MySQL with the password.


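Enable remote access:

First, log in locally on the MySQL server with the following command:
# mysql -u root -p
Then use the privilege command to grant all privileges on all tables to the user. The example below continues to use root (note that the user name root, the allowed host %, and the login password password must each be enclosed in single quotes):
mysql> grant all privileges on *.* to 'root'@'%' identified by 'password';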
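The hive-site.xml shown later connects to a database named hive as user hive with password mysql, but the steps above only configure root. If you want a dedicated account matching that configuration, something like the following could be used. It is a sketch, not what the original setup did: the function name hive_metastore_sql is hypothetical, the defaults simply mirror the values in hive-site.xml, and the GRANT ... IDENTIFIED BY form follows the syntax used above, which MySQL 8.0 no longer accepts.

```shell
# Hypothetical helper: print the SQL that creates the metastore database
# and grants a dedicated account access to it from any host.
# Arguments: database name, user name, password (defaults mirror hive-site.xml).
hive_metastore_sql() {
    db=${1:-hive}
    user=${2:-hive}
    pass=${3:-mysql}
    cat <<EOF
CREATE DATABASE IF NOT EXISTS ${db};
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%' IDENTIFIED BY '${pass}';
FLUSH PRIVILEGES;
EOF
}
```

The output can be piped straight into the client, for example `hive_metastore_sql | mysql -u root -p`.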
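2. Extract the Hive tarball with tar
Edit the configuration files
Hive's configuration files live in HIVE_HOME/conf. We need to edit two of them: hive-env.sh and hive-site.xml.
Running ls shows that neither file exists, but hive-env.sh.template and hive-default.xml.template do. Copy these two files and name the copies hive-env.sh and hive-site.xml respectively.
There is usually also a hive-default.xml file, likewise copied from hive-default.xml.template. hive-default.xml holds the default configuration, and hive-site.xml holds your site-specific settings, which override those in hive-default.xml.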
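The template copy can be scripted so that an existing, already edited file is never overwritten. This is a minimal sketch; the function name init_hive_conf is invented, and the conf directory is passed in as an argument.

```shell
# Hypothetical helper: create hive-env.sh and hive-site.xml from their
# templates, leaving any file that already exists untouched.
init_hive_conf() {
    conf_dir=$1
    for pair in \
        "hive-env.sh.template:hive-env.sh" \
        "hive-default.xml.template:hive-site.xml"; do
        src=$conf_dir/${pair%%:*}   # part before the colon: template name
        dst=$conf_dir/${pair##*:}   # part after the colon: target name
        if [ -f "$src" ] && [ ! -e "$dst" ]; then
            cp "$src" "$dst"
        fi
    done
}
```

For the layout used in this article the call would be `init_hive_conf /home/hadoop/hive/conf`.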
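a. Edit hive-env.sh
Remove the '#' in front of export HADOOP_HEAPSIZE=1024; you can of course tune the default of 1024 for your environment.
Remove the '#' in front of export HADOOP_HOME and point it at your Hadoop installation directory; mine is /home/hadoop/hadoop.
Set export HIVE_CONF_DIR=/home/hadoop/hive/conf and remove the leading '#'.
Set export HIVE_AUX_JARS_PATH=/home/hadoop/hive/lib and remove the leading '#'.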
b. Edit hive-site.xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- a directory in HDFS -->
  <value>/user/hive/warehouse</value>
  <description>location of default database for the warehouse</description>
</property>
<!-- Directory for Hive query logs -->
<property>
  <name>hive.querylog.location</name>
  <!-- create this directory manually -->
  <value>/usr/hadoop/hive/log</value>
  <description>
    Location of Hive run time structured log file
  </description>
</property>
c. Configure the MySQL metastore in hive-site.xml
<configuration>
    <property>
        <name>hive.metastore.local</name>
        <value>true</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://192.168.11.157:3306/hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>mysql</value>
    </property>
</configuration>
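Adjust these values to match your environment, then download the MySQL JDBC driver and copy it into hive/lib.
Hive is now installed and configured. Start hive and take a look: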
hive
show tables;
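III. HBase Installation
HBase requires ZooKeeper and ships with a bundled copy. I tried both a standalone ZooKeeper and the bundled one. If you want to install it yourself, zookeeper-3.4.5 works; for installation steps see http://blog.sina.com.cn/s/blog_7c5a82970101trxu.html.
1. Extract the HBase tarball with tar
2. Configure HBase
a. conf/hbase-env.sh
export JAVA_HOME=
export HBASE_MANAGES_ZK=true (this enables the ZooKeeper bundled with HBase, so no separate installation is needed; if you installed ZooKeeper separately, set it to false)
b. conf/hbase-site.xml
The following configuration uses the ZooKeeper bundled with HBase: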
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>slave1,slave2,slave3</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>
</description>
</property>
</configuration>
With a separately installed ZooKeeper, use the following configuration instead:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://master:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>master,slave1,slave2,slave3</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
<description>
</description>
</property>

<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hadoop/zk</value>
<description>
</description>
</property>
</configuration>
Note that hbase.rootdir must be consistent with the Hadoop configuration (fs.default.name in core-site.xml).
c. conf/regionservers
slave1
slave2
slave3
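The HBase configuration is now complete. Use scp to copy it to slave1 through slave3.

Start HBase:
start-hbase.sh
Use jps to check that everything started correctly, or open master:60010 in a browser.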


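Summary:
The deployment as a whole is not difficult. There are just quite a few settings, and missing one can lead to baffling problems. As long as you pick compatible versions, things generally go smoothly.
When I operate HBase through the Java API, an error is reported. It does not stop the program from running, and everything works normally in the hbase shell. I do not know whether it is a version compatibility issue or something else, and I have not found the cause. I am posting the exception here in the hope that anyone who has solved it can share the fix. It does not affect execution, but it is unpleasant to look at. The exception is: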
14/05/22 13:52:49 INFO zookeeper.ClientCnxn: Session establishment complete on server slave3/100.2.12.94:2181, sessionid = 0x346227c54700001, negotiated timeout = 90000
14/05/22 13:52:51 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:52:52 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:52:53 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:52:54 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:52:55 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:52:56 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:52:57 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:52:58 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:52:59 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:53:00 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
14/05/22 13:53:00 WARN util.DynamicClassLoader: Failed to identify the fs of dir hdfs://localhost:9000/hbase/lib, ignored
java.net.ConnectException: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
at org.apache.hadoop.ipc.Client.call(Client.java:1118)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at com.sun.proxy.$Proxy5.getProtocolVersion(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
at com.sun.proxy.$Proxy5.getProtocolVersion(Unknown Source)
at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
at org.apache.hadoop.hbase.util.DynamicClassLoader.<init>(DynamicClassLoader.java:104)
at org.apache.hadoop.hbase.protobuf.ProtobufUtil.<clinit>(ProtobufUtil.java:201)
at org.apache.hadoop.hbase.ClusterId.parseFrom(ClusterId.java:64)
at org.apache.hadoop.hbase.zookeeper.ZKClusterId.readClusterIdZNode(ZKClusterId.java:69)
at org.apache.hadoop.hbase.client.ZooKeeperRegistry.getClusterId(ZooKeeperRegistry.java:83)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.retrieveClusterId(HConnectionManager.java:857)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.<init>(HConnectionManager.java:662)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:414)
at org.apache.hadoop.hbase.client.HConnectionManager.createConnection(HConnectionManager.java:393)
at org.apache.hadoop.hbase.client.HConnectionManager.getConnection(HConnectionManager.java:274)
at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:192)
at com.inspur.hbase.HbaseTest12.createTable(HbaseTest12.java:61)
at com.inspur.hbase.HbaseTest12.main(HbaseTest12.java:44)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:160)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:457)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
at org.apache.hadoop.ipc.Client.call(Client.java:1093)
... 41 more
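One observation on the trace: the client keeps retrying localhost/127.0.0.1:9000 and then warns about hdfs://localhost:9000/hbase/lib, while the cluster itself is configured with hdfs://master:9000/hbase. A possible explanation, offered as an assumption rather than a confirmed diagnosis, is that the Java client is reading a configuration file that still points at localhost, perhaps one left over from the pseudo-distributed setup. A quick way to look for such leftovers is to search the configuration files on the client's classpath. The function name find_localhost_refs below is made up for illustration.

```shell
# Hypothetical helper: list every line in the given Hadoop/HBase config
# files that still mentions localhost or 127.0.0.1.
# Output format is "file:line-number:matching line"; the exit status is
# non-zero when nothing matches.
find_localhost_refs() {
    # /dev/null is added so grep always prints the file name, even when
    # only one real file is passed.
    grep -nE 'localhost|127\.0\.0\.1' "$@" /dev/null
}
```

For example, `find_localhost_refs conf/hbase-site.xml conf/core-site.xml` run against the files the client application actually loads would show whether hbase.rootdir or fs.default.name there differs from the cluster's values.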
