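Hadoop basics: http://hadoop.apache.org/docs/r1.0.4/cn/quickstart.html
Note: figures missing from this article are in the attached DOC file.
Setting up a Hadoop 2.7.1 cluster
1. System configuration
PC (Lenovo), Windows 7 64-bit, 8 GB RAM; a VM on it runs the host name.
PC (Lenovo), Windows 7 64-bit, 8 GB RAM; a VM on it runs the host sname (standby NameNode).
PC (Lenovo), Windows 7 64-bit, 8 GB RAM; a VM on it runs the host amrm.
Virtual machine: VMware 12.0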
Hadoop2.7.1
Zookeeper3.4.6
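2. Cluster plan
The plan is as follows:
The JournalServer should be a dedicated machine; the slaves file lists the JournalNodes (which store the NameNode metadata).
ZooKeeper, the NameNode and the standby NameNode are configured on the JournalServer and the JournalNodes.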
Hostname  IP              Software                Processes
name      192.168.32.137  JDK, Hadoop, ZooKeeper  NameNode, DFSZKFailoverController, DataNode, JobHistoryServer, NodeManager, JournalNode, QuorumPeerMain
sname     192.168.32.135  JDK, Hadoop, ZooKeeper  NameNode, DFSZKFailoverController, DataNode, NodeManager, JournalNode, QuorumPeerMain
amrm      192.168.32.136  JDK, Hadoop, ZooKeeper  DataNode, NodeManager, JournalNode, QuorumPeerMain, ResourceManager
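Notes:
In Hadoop 2.0 there are usually two NameNodes, one in the active state and one in the standby state. The active NameNode serves clients; the standby NameNode does not, and only keeps its state in sync with the active one so that it can take over quickly if the active one fails.
Hadoop 2.0 officially offers two HDFS HA solutions: NFS and QJM (Quorum Journal Manager). This guide uses the simpler QJM.
In this scheme the active and standby NameNodes synchronize metadata through a group of JournalNodes; an edit counts as written once a majority of the JournalNodes have stored it, so an odd number of JournalNodes is usually configured.
A ZooKeeper cluster is also configured for ZKFC (DFSZKFailoverController) failover: when the active NameNode dies, the standby NameNode is automatically switched to the active state.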
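The majority rule can be made concrete with a little arithmetic. A minimal sketch; `quorum_size` and `max_failures` are hypothetical helper names, not Hadoop commands. With N JournalNodes an edit needs floor(N/2)+1 acknowledgements, so floor((N-1)/2) JournalNodes may fail while writes still succeed.

```bash
#!/usr/bin/env bash
# Hypothetical helpers illustrating the QJM majority rule; not part of Hadoop.

# Number of JournalNodes that must acknowledge an edit for it to count as written.
quorum_size() {
  local n=$1
  echo $(( n / 2 + 1 ))
}

# Number of JournalNodes that may fail while a majority can still be reached.
max_failures() {
  local n=$1
  echo $(( (n - 1) / 2 ))
}
```

For this guide's three JournalNodes, `quorum_size 3` prints 2 and `max_failures 3` prints 1. A fourth JournalNode raises the quorum to 3 but still tolerates only one failure, which is why an odd count is the usual choice.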
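1) On name, sname and amrm, edit /etc/hostname (vim /etc/hostname) and set the hostname to name, sname and amrm respectively.
2) On name, sname and amrm, edit /etc/hosts (vim /etc/hosts) and map each hostname to its IP address:
192.168.32.137 name
192.168.32.135 sname
192.168.32.136 amrm
3) Verify that all machines can ping each other.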
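Steps 2 and 3 can be scripted so they are safe to rerun on every node. A sketch, assuming the addresses from the cluster plan; `append_cluster_hosts` and `CLUSTER_HOSTS` are hypothetical names. It appends an entry only when no non-comment line already lists that hostname.

```bash
#!/usr/bin/env bash
# Sketch: add the cluster plan's host/IP pairs to a hosts file, skipping any
# hostname that already has an entry, so it is safe to run more than once.

CLUSTER_HOSTS=(
  "192.168.32.137 name"
  "192.168.32.135 sname"
  "192.168.32.136 amrm"
)

append_cluster_hosts() {
  local file=${1:-/etc/hosts} entry host
  for entry in "${CLUSTER_HOSTS[@]}"; do
    host=${entry##* }   # the hostname is the last word of the entry
    # awk exits 0 only if a non-comment line already lists this exact hostname.
    if ! awk -v h="$host" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
                            END { exit !found }' "$file" 2>/dev/null; then
      echo "$entry" >> "$file"
    fi
  done
}
```

Run `append_cluster_hosts /etc/hosts` as root on each node, then check step 3 with `for h in name sname amrm; do ping -c 1 "$h"; done`.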
4) Install SSH and generate a key pair on name (you can copy ~/.ssh to sname and amrm so that all nodes share one key pair):
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Copy the keys to sname and amrm (sharing one key pair is simplest):
scp -r /root/.ssh root@sname:/root/
scp -r /root/.ssh root@amrm:/root/
Check with ssh sname and ssh amrm that the nodes can log in to each other without a password. If the slaves file includes the local host, also run
ssh name
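The password-free check can be run against all nodes at once. A sketch; `check_passwordless` is a hypothetical helper and assumes root logins as in this guide. `BatchMode=yes` makes ssh fail instead of prompting, so a missing key shows up as an error rather than a hang.

```bash
#!/usr/bin/env bash
# Sketch: check key-based root login to each node without a password prompt.
# BatchMode=yes makes ssh fail instead of prompting. Unknown host keys also
# fail in batch mode, so connect once interactively to accept each host key.

check_passwordless() {
  local host rc=0
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
      echo "ok: $host"
    else
      echo "FAILED: $host" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Run `check_passwordless name sname amrm` on each of the three machines; every line should read `ok:`.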
---------------------------------------------------------------------------------------------------------------------------------------------
scp usage:
# copy from a remote source to a local destination
scp root@data:/root/.ssh/id_dsa.pub ~/.ssh/data_dsa.pub
# copy from a local source to a remote destination
scp -r /root/.ssh/id_dsa.pub root@amrm:/root/.ssh/id_dsa.pub
---------------------------------------------------------------------------------------------------------------------------------------------
Note: scp only works once SSH connectivity is in place.
Errors:
-1.Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending ECDSA key in /root/.ssh/known_hosts:2
remove with:
ssh-keygen -f "/root/.ssh/known_hosts" -R sname
Fix: run ssh-keygen -f "/root/.ssh/known_hosts" -R sname, or delete line 2 of /root/.ssh/known_hosts.
-2.Warning: the ECDSA host key for 'sname' differs from the key for the IP address '192.168.32.138'
Offending key for IP in /root/.ssh/known_hosts:2
Matching host key in /root/.ssh/known_hosts:5
Are you sure you want to continue connecting (yes/no)? yes
Welcome to Ubuntu 15.10 (GNU/Linux 4.2.0-16-generic x86_64)
* Documentation: https://help.ubuntu.com/
82 packages can be updated.
42 updates are security updates.
Last login: Fri Dec 11 22:30:00 2015 from 192.168.32.138
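Fix: delete line 2 of /root/.ssh/known_hosts.
-3. id_dsa has mode 755, so ssh refuses to use it: a private key must not be accessible by group or others.
chmod 700 ~/.ssh/id_dsa (permissions of the private key file)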
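Permission problems like the one above can be avoided by resetting the whole ~/.ssh directory in one go. A sketch; `fix_ssh_perms` is a hypothetical helper. It uses 600 for private keys, the common choice; the chmod 700 above also works, because what ssh checks is that group and others have no access.

```bash
#!/usr/bin/env bash
# Sketch: reset the permissions ssh and sshd expect on a ~/.ssh directory.
# ssh ignores a private key that group/others can access, and sshd's
# StrictModes (on by default) rejects authorized_keys others can write.

fix_ssh_perms() {
  local dir=${1:-$HOME/.ssh} f
  chmod 700 "$dir"
  for f in "$dir"/id_*; do
    [[ -e "$f" ]] || continue     # no keys matched the glob
    case "$f" in
      *.pub) chmod 644 "$f" ;;    # public keys may be world-readable
      *)     chmod 600 "$f" ;;    # private keys: owner read/write only
    esac
  done
  if [[ -e "$dir/authorized_keys" ]]; then
    chmod 600 "$dir/authorized_keys"
  fi
}
```

After copying ~/.ssh to sname and amrm, run `fix_ssh_perms /root/.ssh` on each node.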
5) Disable IPv6
-1. cat /proc/sys/net/ipv6/conf/all/disable_ipv6
Output 0 means IPv6 is enabled; 1 means it is disabled.
-2. Add the following lines to /etc/sysctl.conf and reboot.
#disable IPv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
-3. sudo vim /etc/default/grub
-4. In that file, change GRUB_CMDLINE_LINUX_DEFAULT="quiet splash" to
GRUB_CMDLINE_LINUX_DEFAULT="ipv6.disable=1 quiet splash"
-5. Save with :wq, then run sudo update-grub.
-6. Restart the networking service; IPv6 is now disabled.
You can check with
ip a | grep inet6
If this prints nothing, IPv6 has been disabled.
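Steps -1 and -2 can be made repeatable with a small helper. A sketch; `add_ipv6_disable` and `ipv6_state` are hypothetical names. Running `sysctl -p` as root applies /etc/sysctl.conf without waiting for the reboot.

```bash
#!/usr/bin/env bash
# Sketch: add the IPv6-disable sysctl lines only if they are not already there,
# and report the state the kernel currently exposes.

add_ipv6_disable() {
  local file=${1:-/etc/sysctl.conf} iface line
  for iface in all default lo; do
    line="net.ipv6.conf.${iface}.disable_ipv6 = 1"
    # -x: whole-line match, -F: fixed string, so reruns do not duplicate lines.
    grep -qxF "$line" "$file" 2>/dev/null || echo "$line" >> "$file"
  done
}

# Prints "disabled" or "enabled". The flag file is absent altogether when the
# kernel was booted with ipv6.disable=1 (the GRUB change above).
ipv6_state() {
  local flag=${1:-/proc/sys/net/ipv6/conf/all/disable_ipv6}
  if [[ ! -e "$flag" ]] || [[ "$(cat "$flag")" == "1" ]]; then
    echo disabled
  else
    echo enabled
  fi
}
```

As root: `add_ipv6_disable && sysctl -p`, then `ipv6_state` should print `disabled`.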
3. Install and configure the ZooKeeper cluster
1) Extract the ZooKeeper archive to /hadoop
tar -zxvf zookeeper-3.4.6.tar.gz -C /hadoop
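2) In /hadoop/zookeeper-3.4.6/conf, create zoo.cfg (for example by copying zoo_sample.cfg) and configure it as shown in the figure.
3) Create a tmp directory under /hadoop/zookeeper-3.4.6
mkdir /hadoop/zookeeper-3.4.6/tmp
4) Create the file myid in /hadoop/zookeeper-3.4.6/tmp and write 4 into it (myid must sit in the dataDir set in zoo.cfg, and its number must match this host's server.N entry)
vim /hadoop/zookeeper-3.4.6/tmp/myid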
5) Copy the configured ZooKeeper to sname and amrm
scp -r /hadoop/zookeeper-3.4.6 root@sname:/hadoop/
scp -r /hadoop/zookeeper-3.4.6 root@amrm:/hadoop/
6) On sname and amrm, change myid to 2 and 3 respectively.
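The zoo.cfg contents were only shown as a figure, so here is a sketch of a configuration consistent with the steps above, with every value an assumption: the zoo_sample.cfg defaults for tickTime, initLimit, syncLimit and clientPort, the tmp directory from step 3 as dataDir, and the conventional peer and election ports 2888 and 3888. `render_zoo_cfg` and `write_myid` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Sketch of a zoo.cfg for this three-node ensemble. All values are assumptions
# (see the lead-in); the real file was only shown as a figure.

# Usage: render_zoo_cfg DATADIR host:id [host:id ...]
render_zoo_cfg() {
  local datadir=$1 pair
  shift
  cat <<EOF
tickTime=2000
initLimit=10
syncLimit=5
dataDir=${datadir}
clientPort=2181
EOF
  for pair in "$@"; do
    # server.<myid>=<host>:<peer port>:<leader-election port>
    echo "server.${pair##*:}=${pair%%:*}:2888:3888"
  done
}

# Writes this node's id into DATADIR/myid; it must equal the N of its server.N line.
write_myid() {
  local datadir=$1 id=$2
  mkdir -p "$datadir"
  echo "$id" > "$datadir/myid"
}
```

On name, with the myid values from steps 4 and 6: `render_zoo_cfg /hadoop/zookeeper-3.4.6/tmp name:4 sname:2 amrm:3 > /hadoop/zookeeper-3.4.6/conf/zoo.cfg` and `write_myid /hadoop/zookeeper-3.4.6/tmp 4`; after copying, run `write_myid` with 2 on sname and 3 on amrm.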
4. Install and configure the Hadoop cluster
1) Extract the Hadoop archive to /hadoop
tar -zxvf hadoop-2.7.1.tar.gz -C /hadoop
2) Add the Hadoop environment variables to ~/.bashrc, as follows:
# the variable for hadoop
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export HADOOP_HOME=/hadoop/hadoop-2.7.1
export ZOOKEEPER_HOME=/hadoop/zookeeper-3.4.6
export PATH=${JAVA_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin:${ZOOKEEPER_HOME}/bin:${PATH}
export HADOOP_MAPRED_HOME=${HADOOP_HOME}
export HADOOP_COMMON_HOME=${HADOOP_HOME}
export HADOOP_HDFS_HOME=${HADOOP_HOME}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
export YARN_HOME=${HADOOP_HOME}
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
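After `source ~/.bashrc`, it is worth confirming that the variables point at real directories and that the launch scripts are reachable. A sketch; `check_hadoop_env` is a hypothetical helper that checks only the three home variables and three commands named below.

```bash
#!/usr/bin/env bash
# Sketch: confirm the home directories exported above exist and the key
# commands resolve through PATH.

check_hadoop_env() {
  local var cmd rc=0
  for var in JAVA_HOME HADOOP_HOME ZOOKEEPER_HOME; do
    # ${!var} is bash indirect expansion: the value of the variable named by $var.
    if [[ -z "${!var}" || ! -d "${!var}" ]]; then
      echo "not set or not a directory: $var=${!var}" >&2
      rc=1
    fi
  done
  for cmd in java hdfs zkServer.sh; do
    if ! command -v "$cmd" >/dev/null; then
      echo "not on PATH: $cmd" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Typical use: `source ~/.bashrc && check_hadoop_env && hadoop version`.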
5. Configure Hadoop
All Hadoop 2.7.1 configuration files are in /hadoop/hadoop-2.7.1/etc/hadoop.
cd /hadoop/hadoop-2.7.1/etc/hadoop
1) Edit hadoop-env.sh and set the JDK home directory
export JAVA_HOME=/usr/lib/java/jdk1.7.0_79
2) Edit core-site.xml
<configuration>
<!-- Set the HDFS nameservice to ns -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://ns</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/hadoop/tmp</value>
</property>
<!-- ZooKeeper quorum addresses -->
<property>
<name>ha.zookeeper.quorum</name>
<value>name:2181,sname:2181,amrm:2181</value>
</property>
</configuration>
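Before copying the configuration to other nodes, it can help to read back what a *-site.xml actually says, for example to catch a stray space inside a `<value>`. A sketch; `get_prop` is a hypothetical helper that does a plain-text scan, not real XML parsing, and assumes the layout used in this guide (one `<name>` and one single-line `<value>` per property).

```bash
#!/usr/bin/env bash
# Sketch: print one property's value from a Hadoop *-site.xml laid out like the
# files in this guide. A text scan, not an XML parser; unusual layouts can fool it.

get_prop() {
  local file=$1 key=$2
  awk -v key="$key" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && /<value>/ {
      v = $0
      sub(/.*<value>[ \t]*/, "", v)     # drop everything up to <value>
      sub(/[ \t]*<\/value>.*/, "", v)   # drop </value> and what follows
      print v
      exit
    }
  ' "$file"
}
```

For example, `get_prop /hadoop/hadoop-2.7.1/etc/hadoop/core-site.xml fs.defaultFS` should print `hdfs://ns`.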
3) Edit hdfs-site.xml
<configuration>
<!-- Set the HDFS nameservice to ns; it must match fs.defaultFS in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>ns</value>
</property>
<!-- ns has two NameNodes, nm and snm -->
<property>
<name>dfs.ha.namenodes.ns</name>
<value>nm,snm</value>
</property>
<!-- RPC address of nm -->
<property>
<name>dfs.namenode.rpc-address.ns.nm</name>
<value>name:9000</value>
</property>
<!-- HTTP address of nm -->
<property>
<name>dfs.namenode.http-address.ns.nm</name>
<value>name:50070</value>
</property>
<!-- RPC address of snm -->
<property>
<name>dfs.namenode.rpc-address.ns.snm</name>
<value>sname:9000</value>
</property>
<!-- HTTP address of snm -->
<property>
<name>dfs.namenode.http-address.ns.snm</name>
<value>sname:50070</value>
</property>
<!-- With hadoop.tmp.dir set in core-site.xml these two are optional; otherwise add them -->
<property>
<name>dfs.namenode.name.dir</name>
<value>/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/hadoop/dfs/data</value>
</property>
<!-- Where the NameNode edits are stored on the JournalNodes; including amrm makes the quorum more robust -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://name:8485;sname:8485;amrm:8485/ns</value>
</property>
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/hadoop/journal</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Class that clients use to find the active NameNode during failover -->
<property>
<name>dfs.client.failover.proxy.provider.ns</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; separate multiple methods with newlines, one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless SSH -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_dsa</value>
</property>
<!-- sshfence connection timeout in milliseconds -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
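Once the files are in place, `hdfs getconf -confKey KEY` prints the value Hadoop actually resolves for a key, which catches misspelled property names: a key Hadoop does not know about simply comes back as missing. A sketch; `check_ha_conf` is a hypothetical helper that assumes a single nameservice and the two-NameNode limit of Hadoop 2.x.

```bash
#!/usr/bin/env bash
# Sketch: cross-check the HA keys through `hdfs getconf -confKey`, which prints
# the value Hadoop resolves for a key and fails if the key is not set.

check_ha_conf() {
  local ns fs ids id addr count=0 rc=0
  ns=$(hdfs getconf -confKey dfs.nameservices) || return 1
  fs=$(hdfs getconf -confKey fs.defaultFS) || return 1
  # fs.defaultFS must point at the logical nameservice, not at one host.
  if [[ "$fs" != "hdfs://$ns" ]]; then
    echo "fs.defaultFS is $fs, expected hdfs://$ns" >&2
    rc=1
  fi
  ids=$(hdfs getconf -confKey "dfs.ha.namenodes.$ns") || return 1
  # The id list is comma-separated (nm,snm); split it on commas.
  for id in ${ids//,/ }; do
    count=$((count + 1))
    if addr=$(hdfs getconf -confKey "dfs.namenode.rpc-address.$ns.$id"); then
      echo "$id -> $addr"
    else
      echo "no RPC address for NameNode id $id" >&2
      rc=1
    fi
  done
  if (( count != 2 )); then
    echo "expected 2 NameNode ids, found $count" >&2
    rc=1
  fi
  return "$rc"
}
```

On name, with the files above, `check_ha_conf` should print `nm -> name:9000` and `snm -> sname:9000`.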
4) Edit mapred-site.xml
<configuration>
<!-- Run MapReduce on YARN -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- JobHistoryServer addresses -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>name:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>name:19888</value>
</property>
<!-- This directory is in HDFS; start HDFS first, then the history server -->
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/history/indone</value>
</property>
<!-- This directory is in HDFS; start HDFS first, then the history server -->
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/history/done</value>
</property>
</configuration>
5) Edit yarn-site.xml
<configuration>
<!-- ResourceManager host -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>amrm</value>
</property>
<!-- Address the ResourceManager exposes to clients.
Clients use it to submit applications, kill applications, etc. -->
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<!-- Address the ResourceManager exposes to ApplicationMasters.
ApplicationMasters use it to request and release resources, etc. -->
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<!-- Address the ResourceManager exposes to NodeManagers.
NodeManagers use it to send heartbeats, receive tasks, etc. -->
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<!-- Address the ResourceManager exposes to administrators.
Administrators use it to send admin commands. Default: ${yarn.resourcemanager.hostname}:8033 -->
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<!-- ResourceManager web UI address -->
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<!-- Auxiliary service each NodeManager loads at startup: the MapReduce shuffle server -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
6) Edit slaves
The slaves file lists the worker nodes. HDFS is started on name and YARN on amrm, so the slaves file on name gives the DataNode hosts and the slaves file on amrm gives the NodeManager hosts.
cd /hadoop/hadoop-2.7.1/etc/hadoop/
vim slaves
name
sname
amrm
Note: on name, slaves lists the DataNode and JournalNode hosts (step 8 starts a JournalNode on each of them); on amrm, slaves lists the NodeManager hosts.
6. Copy the configured Hadoop to sname and amrm
scp -r /hadoop/hadoop-2.7.1 root@sname:/hadoop/
scp -r /hadoop/hadoop-2.7.1 root@amrm:/hadoop/
********************* NOTE: the following steps must be performed strictly in order *********************
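7. Start the ZooKeeper cluster (on name, sname and amrm, from /hadoop/zookeeper-3.4.6/bin)
cd /hadoop/zookeeper-3.4.6/bin    # start on name, sname and amrm, in that order
./zkServer.sh start    (starts this ZooKeeper node)
./zkServer.sh status    (shows its state: one node reports leader, the others follower)
8. Start the JournalNodes from /hadoop/hadoop-2.7.1/sbin. Running this on name is enough: hadoop-daemons.sh (plural, not hadoop-daemon.sh) starts a JournalNode on every host listed in slaves.
hadoop-daemons.sh start journalnode
jps    (check on each node that a JournalNode process has appeared)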
9. Format HDFS: run the format command on name
hdfs namenode -format ns
Formatting creates the NameNode metadata directory: dfs.namenode.name.dir from hdfs-site.xml, or dfs/name under hadoop.tmp.dir from core-site.xml when that property is not set. Copy that directory to sname and amrm; with the hdfs-site.xml above it lives under /hadoop/dfs:
scp -r /hadoop/dfs root@sname:/hadoop/
scp -r /hadoop/dfs root@amrm:/hadoop/
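Note: do not delete the formatted directories casually; otherwise startup reports an inconsistency error.
10. Format the failover state in ZooKeeper: run the format command on name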
hdfs zkfc -formatZK
11. Start HDFS: on name, run start-dfs.sh from /hadoop/hadoop-2.7.1/sbin
cd /hadoop/hadoop-2.7.1/sbin/
start-dfs.sh
After startup, run jps on name, sname and amrm; name and sname should each show two new processes, NameNode and DFSZKFailoverController.
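The jps check can be automated by comparing its output with the process list from the cluster plan. A sketch; `missing_procs` is a hypothetical helper. jps prints one `<pid> <main class>` line per JVM, so the second field is compared exactly.

```bash
#!/usr/bin/env bash
# Sketch: given `jps` output and the daemon names a node should run, print the
# ones that are absent; returns non-zero if any are missing.

missing_procs() {
  local jps_out=$1 proc rc=0
  shift
  for proc in "$@"; do
    # jps lines look like "2481 NameNode"; match the class name exactly.
    if ! awk -v p="$proc" '$2 == p { found = 1 } END { exit !found }' <<<"$jps_out"; then
      echo "$proc"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `missing_procs "$(ssh sname /usr/lib/java/jdk1.7.0_79/bin/jps)" NameNode DFSZKFailoverController DataNode JournalNode QuorumPeerMain` prints nothing when sname is healthy. The full jps path is used because a non-interactive ssh session may not load ~/.bashrc.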
12. Start the JobHistoryServer: on name, run the following from /hadoop/hadoop-2.7.1/sbin
hdfs dfs -mkdir /history
hdfs dfs -mkdir /history/indone
hdfs dfs -mkdir /history/done
mr-jobhistory-daemon.sh start historyserver
13. Start YARN
On amrm, run start-yarn.sh from /hadoop/hadoop-2.7.1/sbin:
cd /hadoop/hadoop-2.7.1/sbin/
start-yarn.sh
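start-yarn.sh runs on amrm because the NameNode and the ResourceManager are placed on different machines for performance: both use a lot of resources. Since they are separated, each has to be started on its own machine.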
14. The Hadoop 2.7.1 configuration is now complete. Check the deployment in a browser:
1) http://192.168.32.137:50070  NameNode
2) http://192.168.32.136:8088  ResourceManager
3) http://192.168.32.137:19888  JobHistoryServer
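The same three pages can be probed from a shell. A sketch; `check_web_uis` is a hypothetical helper. `-L` follows the redirects some UIs issue from "/", `-m 5` caps each request at five seconds, and curl reports `000` when it cannot connect at all.

```bash
#!/usr/bin/env bash
# Sketch: probe each web UI and print "<HTTP status> <URL>"; returns non-zero
# unless every URL ends in HTTP 200.

check_web_uis() {
  local url code rc=0
  for url in "$@"; do
    code=$(curl -s -L -o /dev/null -m 5 -w '%{http_code}' "$url")
    echo "$code $url"
    [[ "$code" == 200 ]] || rc=1
  done
  return "$rc"
}
```

Usage: `check_web_uis http://192.168.32.137:50070 http://192.168.32.136:8088 http://192.168.32.137:19888`.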
15. Run a job (from /hadoop/hadoop-2.7.1, because the paths below are relative to it)
1)hdfs dfs -mkdir /test
2)hdfs dfs -mkdir /test/input
3)hdfs dfs -put etc/hadoop/*.xml /test/input
4)hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar grep /test/input /test/output 'dfs[a-z.]+'
5) hdfs dfs -get /test/output output    # copies into the current directory
6) cat output/*    (view the result)
Alternatively, view the result directly in HDFS:
hdfs dfs -cat /test/output/*
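Steps 1 to 6 can be rolled into one rerunnable function. A sketch; `run_grep_example` is a hypothetical helper that assumes the example jar under HADOOP_HOME as in step 4. It also creates /user/<current user>, which the grep example needs (see exception 4 below), and removes /test/output first because MapReduce refuses to write to an existing output directory.

```bash
#!/usr/bin/env bash
# Sketch: the example job as one function. Quoting '/test/output/*' leaves the
# glob for HDFS to expand instead of the local shell.

run_grep_example() {
  local home=${HADOOP_HOME:-/hadoop/hadoop-2.7.1}
  local jar="$home/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar"
  # The grep example keeps temporary data under /user/<name>; create it up front.
  hdfs dfs -mkdir -p "/user/$(whoami)" /test/input || return 1
  hdfs dfs -put -f "$home"/etc/hadoop/*.xml /test/input || return 1
  hdfs dfs -rm -r -f /test/output
  hadoop jar "$jar" grep /test/input /test/output 'dfs[a-z.]+' || return 1
  hdfs dfs -cat '/test/output/*'
}
```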
Job status can be checked in the JobHistoryServer web UI (http://192.168.32.137:19888).
16. Shut down Hadoop
On amrm:
stop-yarn.sh
On name:
mr-jobhistory-daemon.sh stop historyserver
stop-dfs.sh
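The start order (steps 7, 8 and 11 to 13) and the shutdown in step 16 can be driven from one machine over SSH. A sketch, assuming passwordless root SSH, this guide's install paths, and that JAVA_HOME is visible to non-interactive shells (for example via /etc/environment); `run_on`, `start_cluster` and `stop_cluster` are hypothetical helpers. The one-time formatting steps 9 and 10 are deliberately left out, and stopping ZooKeeper at the end goes beyond step 16 and is optional.

```bash
#!/usr/bin/env bash
# Sketch: start and stop the cluster in the order this guide requires.
# Full paths are used because a non-interactive ssh session may not load
# ~/.bashrc, so PATH may lack the Hadoop and ZooKeeper scripts.

ZK_BIN=/hadoop/zookeeper-3.4.6/bin
HADOOP_SBIN=/hadoop/hadoop-2.7.1/sbin
ZK_NODES=(name sname amrm)

# Run one command string on a host, echoing it first; warn but keep going on failure.
run_on() {
  local host=$1 cmd=$2
  echo ">> $host: $cmd"
  ssh "root@$host" "$cmd" || echo "warning: '$cmd' failed on $host" >&2
}

start_cluster() {
  local h
  for h in "${ZK_NODES[@]}"; do run_on "$h" "$ZK_BIN/zkServer.sh start"; done  # step 7
  run_on name "$HADOOP_SBIN/hadoop-daemons.sh start journalnode"               # step 8
  run_on name "$HADOOP_SBIN/start-dfs.sh"                                      # step 11
  run_on name "$HADOOP_SBIN/mr-jobhistory-daemon.sh start historyserver"       # step 12
  run_on amrm "$HADOOP_SBIN/start-yarn.sh"                                     # step 13
}

stop_cluster() {
  local h
  run_on amrm "$HADOOP_SBIN/stop-yarn.sh"
  run_on name "$HADOOP_SBIN/mr-jobhistory-daemon.sh stop historyserver"
  run_on name "$HADOOP_SBIN/stop-dfs.sh"
  # Optional: stop ZooKeeper last, after the ZKFCs that use it are gone.
  for h in "${ZK_NODES[@]}"; do run_on "$h" "$ZK_BIN/zkServer.sh stop"; done
}
```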
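17. If HDFS stays in safe mode, leave it manually:
hdfs dfsadmin -safemode leave
Note: if anything above goes wrong, check the log files (by default under /hadoop/hadoop-2.7.1/logs).
Common exceptions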
1.org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
After a power failure, Hadoop HA raised the exception above. The web UI showed that both NameNodes had gone into standby, so one of them has to be switched to active manually:
hdfs haadmin -transitionToActive --forcemanual nm
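Before forcing a transition it is worth confirming that neither NameNode is active. A sketch; `ensure_active` is a hypothetical helper built on the real `hdfs haadmin -getServiceState` and `-transitionToActive --forcemanual` subcommands, with nm and snm taken from dfs.ha.namenodes.ns. haadmin asks for confirmation before a forced manual transition, and forcing one while automatic failover is enabled can confuse the ZKFCs, so check that DFSZKFailoverController is running first.

```bash
#!/usr/bin/env bash
# Sketch: report both NameNodes' HA state and, only if neither is active,
# force the preferred one (default nm) to active.

ensure_active() {
  local preferred=${1:-nm} id state
  for id in nm snm; do
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
    echo "$id: ${state:-unreachable}"
    if [[ "$state" == active ]]; then
      return 0
    fi
  done
  echo "no active NameNode; forcing $preferred to active" >&2
  # --forcemanual is needed while automatic failover is enabled.
  hdfs haadmin -transitionToActive --forcemanual "$preferred"
}
```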
2. org.apache.hadoop.ipc.Client: Retrying connect to server: amrm/192.168.32.136:8032. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
telnet: Unable to connect to remote host: Connection refused Ubuntu 15.10
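Check that the host can be pinged and that the port is open. If ping works and the port should be open, list the ports the system is listening on with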
netstat -ntulp
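Make sure the Local Address is 0.0.0.0 or the host's own IP (192.168.32.136 for amrm here) rather than a loopback address.
Fix: correct the address mappings in /etc/hosts. On Ubuntu, the default 127.0.1.1 line for the machine's hostname can make the daemon listen on loopback only.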
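Reading the netstat table by eye is error-prone, so the check can be scripted. A sketch; `listen_addrs` and `loopback_only` are hypothetical helpers that assume the usual Linux `netstat -ntulp` columns (Local Address in field 4, State in field 6 for TCP). IPv6 loopback (::1) is not treated as loopback here, since this guide disables IPv6.

```bash
#!/usr/bin/env bash
# Sketch: read `netstat -ntulp` output on stdin and check which addresses a
# TCP port listens on.

# Prints the local address(es) listening on PORT.
listen_addrs() {
  local port=$1
  awk -v p=":$port" '$1 ~ /^tcp/ && $6 == "LISTEN" {
    if (substr($4, length($4) - length(p) + 1) == p) print $4
  }'
}

# Succeeds when PORT is listened on, and only on 127.x addresses.
loopback_only() {
  local addrs
  addrs=$(listen_addrs "$1")
  [[ -n "$addrs" ]] && ! grep -qv '^127\.' <<<"$addrs"
}
```

On amrm: `netstat -ntulp | loopback_only 8032 && echo "8032 is bound to loopback only; check /etc/hosts"`.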
3. org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category WRITE is not supported in state standby
The NameNode being accessed is in the standby state (see exception 1).
4.org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://sname:9000/user/root/grep-temp-1382738569
The grep example writes intermediate files under /user/${username} in HDFS, so create that directory first:
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/${username}
5. java.io.IOException: Unknown Job job_1450012188054_0001
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.verifyAndGetJob(HistoryClientService.java:218)
at org.apache.hadoop.mapreduce.v2.hs.HistoryClientService$HSClientProtocolHandler.getCounters(HistoryClientService.java:232)
at org.apache.hadoop.mapreduce.v2.api.impl.pb.service.MRClientProtocolPBServiceImpl.getCounters(MRClientProtocolPBServiceImpl.java:159)
at org.apache.hadoop.yarn.proto.MRClientProtocol$MRClientProtocolService$2.callBlockingMethod(MRClientProtocol.java:281)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
Fix: hdfs dfs -chmod -R 777 /history
6. Fix: add an address mapping for the host jamel to /etc/hosts.
Notes:
1. Output of a successful job:
15/12/13 22:17:44 INFO mapreduce.Job: Job job_1450012188054_0002 completed successfully
15/12/13 22:17:45 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=493
        FILE: Number of bytes written=1176179
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=33949
        HDFS: Number of bytes written=663
        HDFS: Number of read operations=30
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Killed map tasks=3
        Launched map tasks=12
        Launched reduce tasks=1
        Data-local map tasks=12
        Total time spent by all maps in occupied slots (ms)=1450715
        Total time spent by all reduces in occupied slots (ms)=112387
        Total time spent by all map tasks (ms)=1450715
        Total time spent by all reduce tasks (ms)=112387
        Total vcore-seconds taken by all map tasks=1450715
        Total vcore-seconds taken by all reduce tasks=112387
        Total megabyte-seconds taken by all map tasks=1485532160
        Total megabyte-seconds taken by all reduce tasks=115084288
    Map-Reduce Framework
        Map input records=926
        Map output records=17
        Map output bytes=508
        Map output materialized bytes=541
        Input split bytes=969
        Combine input records=17
        Combine output records=15
        Reduce input groups=15
        Reduce shuffle bytes=541
        Reduce input records=15
        Reduce output records=15
        Spilled Records=30
        Shuffled Maps =9
        Failed Shuffles=0
        Merged Map outputs=9
        GC time elapsed (ms)=67395
        CPU time spent (ms)=15090
        Physical memory (bytes) snapshot=1492398080
        Virtual memory (bytes) snapshot=6682742784
        Total committed heap usage (bytes)=1178963968
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=32980
    File Output Format Counters
        Bytes Written=663
2. The high availability built in this article covers the NameNode only. For ResourceManager HA, see:
http://hadoop.apache.org/docs/stable/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html