Installing hadoop-2.6.5, zookeeper-3.4.9, hbase-1.2.4, and sqoop-1.99.7, plus Snappy compression configuration

1. Configure hosts

vi /etc/hosts

IP0 master

IP1 slave01

IP2 slave02

Verify that all machines can ping one another by hostname.
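Before pinging, it can help to confirm that every expected name actually has an entry in the hosts file. The following is a minimal sketch, not part of the original procedure; missing_hosts is a hypothetical helper name introduced here for illustration.

```shell
# missing_hosts is a hypothetical helper, not a standard tool.
# Usage: missing_hosts <hosts-file> <hostname>...
# Prints each hostname that has no entry in the hosts file.
missing_hosts() {
    hosts_file=$1
    shift
    for name in "$@"; do
        # Strip comments, then look for the name in any column after the IP.
        if ! awk -v h="$name" '
            { sub(/#.*/, "") }
            { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
            END { exit !found }' "$hosts_file"; then
            echo "$name"
        fi
    done
}
```

Running missing_hosts /etc/hosts master slave01 slave02 prints nothing when all three names are present; any name it prints still needs a line in /etc/hosts.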

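2. Add the hadoop user (configure the master machine first)

useradd hadoop (creates the user)

passwd hadoop (sets the password; for simplicity, use the same hadoop password on all 3 machines, e.g. hadoop123)

For convenience, it is recommended to put hadoop in the root group. To do so:

Log in as root, then run

usermod -g root hadoop

After this command, hadoop belongs to the root group. To verify, run

id hadoop

and check for output similar to the following:

uid=502(hadoop) gid=0(root) groups=0(root)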

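3. Configure passwordless SSH login
su hadoop
Go to the hadoop user's home directory, /home/hadoop
ssh-keygen -t rsa -P ''
Press Enter at every prompt until the tool reports that the public and private keys were generated. -P '' sets an empty passphrase.
Once the keys are generated, import the public key:
cat .ssh/id_rsa.pub >> .ssh/authorized_keys
If ssh localhost then logs in without asking for a password, the setup worked. If it still asks, fix the file permissions:
chmod 600 .ssh/authorized_keys
Once that works, generate a key pair on each of the other machines as well, and copy each public key to the master machine with scp.
On slave01:
scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa_01.pub
On slave02:
scp .ssh/id_rsa.pub hadoop@master:/home/hadoop/id_rsa_02.pub
On the master host, append the public keys to the authorized keys file: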
cat id_rsa_01.pub >> .ssh/authorized_keys

cat id_rsa_02.pub >> .ssh/authorized_keys

Copy the now complete authorized_keys file from master to the other machines

a) Still on master, run

scp .ssh/authorized_keys hadoop@slave01:/home/hadoop/.ssh/authorized_keys

scp .ssh/authorized_keys hadoop@slave02:/home/hadoop/.ssh/authorized_keys

b) Fix the permissions of the authorized_keys file on the other machines

On both slave01 and slave02, run

chmod 600 .ssh/authorized_keys

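Verification:
On each virtual machine, run ssh <hostname> for each of the other machines. If the login succeeds without a password prompt, the setup is OK.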
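The verification step can be scripted so that it fails instead of prompting. This is a sketch under stated assumptions: check_passwordless_ssh is a hypothetical helper name, BatchMode and ConnectTimeout are standard OpenSSH client options, and each host's key is assumed to be in known_hosts already (in batch mode an unknown host key also counts as a failure).

```shell
# check_passwordless_ssh is a hypothetical helper, not part of OpenSSH.
# Usage: check_passwordless_ssh <hostname>...
# BatchMode=yes makes ssh fail rather than ask for a password.
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

On master, for example, run check_passwordless_ssh slave01 slave02, then repeat on each slave with the other two hostnames.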

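All of the installations below are done on the master host first and then copied to the remote hosts.
Since the software to install and the directory layout are known in advance, it helps to set up the relevant paths before installing, which makes later installation and debugging easier. Configure the following first: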
vi ~/.bashrc

# .bashrc

# User specific aliases and functions

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions

if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
alias cdhp='cd /home/hadoop/hadoop-2.6.5'
alias cdhb='cd /home/hadoop/hbase-1.2.4'
alias cdz='cd /home/hadoop/zookeeper-3.4.9'

vi ~/.bash_profile

# .bash_profile

# Get the aliases and functions

if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi

# User specific environment and startup programs

PATH=$PATH:$HOME/bin

export PATH

# User defined

export HADOOP_HOME=/home/hadoop/hadoop-2.6.5
export ZOOKEEPER_HOME=/home/hadoop/zookeeper-3.4.9
export HBASE_HOME=/home/hadoop/hbase-1.2.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin:$HBASE_HOME/bin:$PATH

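source ~/.bashrc
source ~/.bash_profile
These two commands make the configuration above take effect. The paths in both files must match the directories the software is actually installed in, so adjust them if your layout differs (the steps below install under /home/hadoop/opt).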
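Because the directory names vary between steps in this guide, a quick check that each *_HOME variable points at a real directory can save debugging time later. A minimal sketch, assuming a POSIX shell; check_homes is a hypothetical helper name.

```shell
# check_homes is a hypothetical helper.
# Usage: check_homes <VARIABLE_NAME>...
# Reports whether each named variable is set to an existing directory.
check_homes() {
    status=0
    for var in "$@"; do
        # Indirect lookup: read the value of the variable whose name is in $var.
        eval "dir=\${$var:-}"
        if [ -d "$dir" ]; then
            echo "$var=$dir"
        else
            echo "$var is unset or not a directory: '$dir'" >&2
            status=1
        fi
    done
    return "$status"
}
```

After the software is unpacked, check_homes HADOOP_HOME ZOOKEEPER_HOME HBASE_HOME should list all three directories and return 0.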

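4. Install hadoop
First download the installation package and transfer it by FTP to the target host. This guide uses hadoop-2.6.5.tar.gz.
All of the software is installed under /home/hadoop/opt, inside the hadoop user's home directory.
a). Unpack
tar -xzvf hadoop-2.6.5.tar.gz
mv hadoop-2.6.5 hadoop
Run hadoop version to check the installed hadoop version.
b). Edit the configuration files. Seven files need to be changed: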
$HADOOP_HOME/etc/hadoop/hadoop-env.sh

$HADOOP_HOME/etc/hadoop/yarn-env.sh

$HADOOP_HOME/etc/hadoop/core-site.xml

$HADOOP_HOME/etc/hadoop/hdfs-site.xml

$HADOOP_HOME/etc/hadoop/mapred-site.xml

$HADOOP_HOME/etc/hadoop/yarn-site.xml

$HADOOP_HOME/etc/hadoop/slaves

Here $HADOOP_HOME is the hadoop root directory, which in this article defaults to /home/hadoop/opt/hadoop

Run echo $JAVA_HOME to check whether JAVA_HOME is already set

Changes to hadoop-env.sh

# The java implementation to use.

# export JAVA_HOME=${JAVA_HOME}

export JAVA_HOME="/usr/java/jdk1.8.0_131"

Changes to yarn-env.sh

# some Java parameters

# export JAVA_HOME=/home/y/libexec/jdk1.6.0/

export JAVA_HOME="/usr/java/jdk1.8.0_131"

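Changes to core-site.xml

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/data/hadoop/tmp</value>
  </property>
</configuration>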

master

slave01
slave02

c). Format the hadoop file system
$HADOOP_HOME/bin/hdfs namenode -format
The output includes:
15/02/12 21:29:53 INFO namenode.FSImage: Allocated new BlockPoolId: BP-85825581-192.168.187.102-1423747793784

15/02/12 21:29:53 INFO common.Storage: Storage directory /home/hadoop/tmp/dfs/name has been successfully formatted.

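When you see these lines, the format succeeded.

If this is a reinstall and the configured tmp or data directories already contain files, delete all of them by hand and then format again.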

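d). Start dfs
$HADOOP_HOME/sbin/start-dfs.sh
If startup succeeds, jps shows these processes:
66576 NameNode
66790 SecondaryNameNode
If the master host is also listed in slaves as a data node, a DataNode process appears at this point as well (and a NodeManager process once yarn is started).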

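If the startup log reports
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable
Starting namenodes on [dev-103]
check which library versions the libhadoop.so.1.0.0 file under lib/native was compiled against:
ldd libhadoop.so.1.0.0
Then check the local version:
ldd --version
If the two versions do not match, the libhadoop.so.1.0.0 shipped in the package is a precompiled build that does not fit this system; replace it with a build that matches the local version.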

e). Start yarn
$HADOOP_HOME/sbin/start-yarn.sh
If startup succeeds, jps now shows:
66576 NameNode
66790 SecondaryNameNode
67042 ResourceManager

f). Copy the hadoop directory from the master host to the slave01 and slave02 nodes
scp -r hadoop hadoop@slave01:/home/hadoop/opt

scp -r hadoop hadoop@slave02:/home/hadoop/opt
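The same copy is repeated for every slave, here and again for zookeeper, so it can be wrapped in a loop. A minimal sketch; copy_to_slaves is a hypothetical helper name, and it assumes the hadoop user and identical target paths on every host, as in this guide.

```shell
# copy_to_slaves is a hypothetical helper: it copies one directory to the
# same destination path on every host given, as the hadoop user.
# Usage: copy_to_slaves <local-dir> <remote-parent-dir> <host>...
copy_to_slaves() {
    src=$1
    dest=$2
    shift 2
    for host in "$@"; do
        echo "copying $src to $host:$dest"
        # Stop at the first failed copy so a broken node is noticed early.
        scp -r "$src" "hadoop@$host:$dest" || return 1
    done
}
```

From /home/hadoop/opt on master, copy_to_slaves hadoop /home/hadoop/opt slave01 slave02 runs the two scp commands above in turn.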

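g). Restore the slaves file: mv slaves.bak slaves
Then restart on the master node.
If everything is working, the master host should show at least
66576 NameNode
66790 SecondaryNameNode
67042 ResourceManager
plus DataNode and NodeManager if master is also configured as a data node.
slave01 and slave02 each run 2 processes:
DataNode
NodeManager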
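Comparing jps output against the expected process list by eye is error-prone across three machines. The sketch below reads jps output on standard input and prints the daemons that are absent; missing_daemons is a hypothetical helper name, and it relies only on the "<pid> <name>" layout shown in the listings above.

```shell
# missing_daemons is a hypothetical helper.
# Usage: jps | missing_daemons <DaemonName>...
# Prints each expected daemon that does not appear in the jps output.
missing_daemons() {
    jps_output=$(cat)
    for daemon in "$@"; do
        # jps prints "<pid> <name>"; compare the second column exactly.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

On master, jps | missing_daemons NameNode SecondaryNameNode ResourceManager should print nothing; on a slave, check DataNode and NodeManager instead.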

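You can also browse to
http://master:50070/
http://master:8088/
To check the state of dfs, run
$HADOOP_HOME/bin/hdfs dfsadmin -report
which prints the hdfs status report.

Note: before reformatting master (i.e. the namenode node), first empty the data directory on every datanode (preferably the tmp directory too). Otherwise the datanodes fail to start when dfs is started after the format.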

5. Install zookeeper
Download zookeeper-3.4.9.tar.gz to the master host
a). Unpack
tar -xvzf zookeeper-3.4.9.tar.gz
mv zookeeper-3.4.9 zookeeper
b). Create and edit the configuration file zoo.cfg
cd $ZOOKEEPER_HOME/conf

cp zoo_sample.cfg zoo.cfg

vi zoo.cfg

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
# These directories are up to you; make sure their read/write permissions are correct
dataDir=/data/tmp/zookeeper
dataLogDir=/data/tmp/datalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
autopurge.purgeInterval=1
# Service nodes are configured here
server.1=master:2888:3888
server.2=slave01:2888:3888
server.3=slave02:2888:3888

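For each server.N entry, create a myid file in the dataDir directory of the corresponding host; the file contains the value N.
For example, for server.1, create the file /data/tmp/zookeeper/myid on the master host with the value 1.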
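Since the id and the dataDir both live in zoo.cfg, the myid file can be derived from it rather than typed by hand on each node. This is a sketch, not part of ZooKeeper: zk_id_for_host and write_myid are hypothetical helper names, and the parsing assumes the plain key=value layout shown above.

```shell
# zk_id_for_host and write_myid are hypothetical helpers, not ZooKeeper tools.

# Print N for the "server.N=<host>:<port>:<port>" line that names the host.
# Usage: zk_id_for_host <zoo.cfg> <hostname>
zk_id_for_host() {
    awk -F'[=:]' -v h="$2" \
        '/^server\.[0-9]+=/ && $2 == h { sub(/^server\./, "", $1); print $1 }' "$1"
}

# Create <dataDir>/myid for the given host, taking dataDir from the same file.
# Usage: write_myid <zoo.cfg> <hostname>
write_myid() {
    cfg=$1
    host=$2
    data_dir=$(sed -n 's/^dataDir=//p' "$cfg")
    id=$(zk_id_for_host "$cfg" "$host")
    if [ -z "$data_dir" ] || [ -z "$id" ]; then
        echo "dataDir or server entry for $host not found in $cfg" >&2
        return 1
    fi
    mkdir -p "$data_dir" && echo "$id" > "$data_dir/myid"
}
```

On each node, write_myid $ZOOKEEPER_HOME/conf/zoo.cfg <that node's hostname> writes the matching id, for example 2 on slave01.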

c). Copy the configuration on the master host to the slave hosts
From the /home/hadoop/opt/ directory:
scp -r zookeeper hadoop@slave01:/home/hadoop/opt
scp -r zookeeper hadoop@slave02:/home/hadoop/opt

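d). Start zookeeper
zookeeper has to be started on every node. On each host, run
bin/zkServer.sh start

bin/zkServer.sh status shows the startup state. If the server started successfully, it prints something like the following:
JMX enabled by default
Using config: /home/hadoop/zookeeper/bin/../conf/zoo.cfg
Mode: follower
Exactly one node in the ensemble reports Mode: leader instead.
To track down errors, start the server in the foreground with bin/zkServer.sh start-foreground and read the startup log,
or look in the logs directory for the startup logs of the master or of each region node.

jps shows the following process:
QuorumPeerMain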
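To confirm from one machine that the ensemble has exactly one leader, each server can be asked for its mode over the client port. The sketch below uses ZooKeeper's srvr four-letter command; zk_modes is a hypothetical helper name, and it assumes nc (netcat) is installed and that clientPort is 2181 as configured above.

```shell
# zk_modes is a hypothetical helper. It sends the "srvr" four-letter
# command to each host and prints "<host> <mode>".
# Assumes nc (netcat) is available and clientPort=2181.
# Usage: zk_modes <hostname>...
zk_modes() {
    for host in "$@"; do
        # The srvr reply contains a line of the form "Mode: leader".
        mode=$(echo srvr | nc "$host" 2181 2>/dev/null | sed -n 's/^Mode: //p') || mode=
        echo "$host ${mode:-unreachable}"
    done
}
```

With all three servers running, zk_modes master slave01 slave02 should list one leader and two followers.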

6. Install HBase
Download hbase-1.2.4-bin.tar.gz to the master host
a). Unpack
tar -xzvf hbase-1.2.4-bin.tar.gz
mv hbase-1.2.4 hbase
b). Edit the configuration files

Changes to hbase-env.sh

# The java implementation to use. Java 1.7+ required.

# export JAVA_HOME=/usr/java/jdk1.6.0/

export JAVA_HOME=/usr/local/jdk1.8.0_65

# Tell HBase whether it should manage it's own instance of Zookeeper or not.

# export HBASE_MANAGES_ZK=true

# Chooses between the zookeeper bundled with hbase (true) and a standalone one (false)
export HBASE_MANAGES_ZK=false

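Changes to hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://master:9000/hbase</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>master</value>
  </property>
  <property>
    <name>hbase.master</name>
    <value>hdfs://master:60000</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/data/tmp/zookeeper</value>
  </property>
</configuration>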

Changes to sqoop.properties (sqoop-1.99.7)

# Hadoop configuration directory

# org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/etc/hadoop/conf/

org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/hadoop/opt/hadoop/etc/hadoop/
Change the authentication method
#

# Authentication configuration

#

# user change the simple authentication

org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true
d). Verify the configuration

bin/sqoop2-tool verify

e). Start the server
bin/sqoop2-server start
Run jps; if a SqoopJettyServer process is listed, the server started successfully.

Snappy compression configuration

First run hadoop checknative to check whether the snappy compression library is available

If the library is already present locally, copy it straight into hadoop's native library directory:
cp /usr/lib64/libsnappy.so.1 /home/hadoop/opt/hadoop-2.6.5/lib/native/libsnappy.so.1
If it is not, download the source and compile it.
Add the following to core-site.xml:

<property>
  <name>io.compression.codecs</name>
  <value>
    org.apache.hadoop.io.compress.GzipCodec,
    org.apache.hadoop.io.compress.DefaultCodec,
    org.apache.hadoop.io.compress.BZip2Codec,
    org.apache.hadoop.io.compress.SnappyCodec
  </value>
</property>

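Configure compression
Restart hadoop: ./start-dfs.sh ./start-yarn.sh
Run jps to check that everything started

Run hadoop checknative again to confirm that the snappy library is now found at the new location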
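The checknative result can also be tested in a script instead of read by eye. This is a sketch under a stated assumption: it expects the output to contain one line per library of the form "snappy: true <path>" or "snappy: false", which is an assumption about the checknative output layout; snappy_status is a hypothetical helper name.

```shell
# snappy_status is a hypothetical helper. It reads `hadoop checknative`
# output on stdin and prints the flag from the "snappy:" line.
# The "<name>: <true|false> [path]" layout is an assumption about that output.
# Usage: hadoop checknative 2>/dev/null | snappy_status
snappy_status() {
    awk '$1 == "snappy:" { print $2; found = 1 }
         END { if (!found) print "unknown" }'
}
```

A result of true means hadoop found the library; false or unknown means the copy step above needs another look.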

Configure the compression libraries for hbase
cp -r /home/hadoop/opt/hadoop-2.6.5/lib/native/* /home/hadoop/opt/hbase-1.2.4/lib

Then add the following to hbase-env.sh:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/:/usr/local/lib/
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/:/usr/local/lib/
To test, create the file /home/hadoop/data/mytest.txt and run
hbase -Djava.library.path=/home/hadoop/opt/hadoop-2.6.5/lib/native/ org.apache.hadoop.hbase.util.CompressionTest /home/hadoop/data/mytest.txt snappy
After the command finishes, it prints:
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/opt/hbase-1.2.4/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/opt/hadoop-2.6.5/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2017-03-07 16:45:33,475 INFO [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
2017-03-07 16:45:33,626 INFO [main] compress.CodecPool: Got brand-new compressor [.snappy]
2017-03-07 16:45:33,630 INFO [main] compress.CodecPool: Got brand-new compressor [.snappy]
2017-03-07 16:45:33,832 INFO [main] hfile.CacheConfig: Created cacheConfig: CacheConfig:disabled
2017-03-07 16:45:33,877 INFO [main] compress.CodecPool: Got brand-new decompressor [.snappy]
SUCCESS
A final SUCCESS line means the configuration works.

Once all nodes are configured, restart hbase.
