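Hadoop 2.7.7 on CentOS 7: notes on fully distributed configuration and the problems encountered
These are the main steps I followed when setting up hadoop + zookeeper + hbase + solr on three ECS nodes. The text has not been polished, so please read it alongside other blog posts and remember to adjust the parameters to your own environment.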
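0. Prepare
JDK (1.8 recommended)
Disable the firewall
Open the required ports in the ECS security group
Set up passwordless SSH login between the three machines
IP mapping: [question1] Hadoop fails on startup with java.net.BindException: Cannot assign requested address
This error means the IP mapping in /etc/hosts is wrong. The correct approach is to use an "internal, external, external" layout on every node: map the local machine's hostname to its own private IP, and map the other machines' hostnames to their public IPs.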
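The "internal, external, external" rule can be sketched as a small shell function that prints the /etc/hosts entries for a given node. The IP addresses below are placeholders, not the real ECS addresses, and the function name is made up for this example; the hostnames Gwj, Ssj and Pyf are the ones used later in this post.

```shell
#!/usr/bin/env bash
# Sketch: print /etc/hosts entries for one node using the
# "internal, external, external" rule. All IPs are placeholders.

# Columns: hostname private_ip public_ip
NODES="Gwj 172.16.0.11 203.0.113.11
Ssj 172.16.0.12 203.0.113.12
Pyf 172.16.0.13 203.0.113.13"

# hosts_entries_for <local-hostname>
# The local node maps to its private IP, every other node to its public IP.
hosts_entries_for() {
  local self="$1"
  echo "$NODES" | while read -r name private_ip public_ip; do
    if [ "$name" = "$self" ]; then
      echo "$private_ip $name"
    else
      echo "$public_ip $name"
    fi
  done
}

# Example: the entries that would go into /etc/hosts on Gwj
hosts_entries_for Gwj
```

Run it once per node with that node's own hostname, and compare the output with what is actually in /etc/hosts there.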
The files below must be edited on every node.
1. vi /etc/profile
# Hadoop installation directory: /opt/hadoop/hadoop-2.7.7
export HADOOP_HOME=/opt/hadoop/hadoop-2.7.7
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
export PATH=.:${JAVA_HOME}/bin:${HADOOP_HOME}/bin:$PATH
# Apply the environment variables
source /etc/profile
# Verify
hadoop version
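If `hadoop version` fails, the usual cause is a variable that was not exported or a wrong install path. Here is a minimal sketch of a check you could run after `source /etc/profile`; the function name is hypothetical, and it only looks at the two variables this post relies on.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the variables exported in /etc/profile.

# check_hadoop_env: prints one line per problem, returns non-zero if any.
check_hadoop_env() {
  local status=0
  if [ -z "${JAVA_HOME:-}" ]; then
    echo "JAVA_HOME is not set"
    status=1
  fi
  if [ -z "${HADOOP_HOME:-}" ]; then
    echo "HADOOP_HOME is not set"
    status=1
  elif [ ! -x "$HADOOP_HOME/bin/hadoop" ]; then
    echo "no executable at $HADOOP_HOME/bin/hadoop"
    status=1
  fi
  return "$status"
}

# Usage after `source /etc/profile`:
#   check_hadoop_env && hadoop version
```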
2. vi /.../hadoop-2.7.7/etc/hadoop/core-site.xml
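<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Gwj:8020</value>
    <description>Host and port of the default file system</description>
  </property>
  <property>
    <name>io.file.buffer.size</name>
    <value>4096</value>
    <description>Buffer size for stream files: 4 KB</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/opt/hadoop/hadoop-2.7.7/tempdata</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>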
3. vi /.../hadoop-2.7.7/etc/hadoop/hdfs-site.xml
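<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/opt/hadoop/hadoop-2.7.7/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/opt/hadoop/hadoop-2.7.7/dfs/data</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>Gwj:50070</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>Ssj:50090</value>
  </property>
</configuration>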
4. vi /.../hadoop-2.7.7/etc/hadoop/mapred-site.xml
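<configuration>
  <!-- "local" runs each job in a single local JVM; set it to "yarn" to submit jobs to the YARN cluster configured in step 5 -->
  <property>
    <name>mapreduce.framework.name</name>
    <value>local</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>0.0.0.0:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>0.0.0.0:19888</value>
  </property>
</configuration>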
5. vi /.../hadoop-2.7.7/etc/hadoop/yarn-site.xml
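<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Gwj</value>
    <description>Hostname of the node that runs the ResourceManager</description>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Auxiliary service run on the NodeManager. It must be set to mapreduce_shuffle for MapReduce programs to run.</description>
  </property>
</configuration>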
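Since the same four XML files have to be edited on every node, it helps to be able to read a single value back and compare it across machines. In the actual files each name/value pair sits inside a `<property>` element. The helper below is a rough sketch with a made-up name (`get_prop`); it is plain awk text matching, not a real XML parser, and it assumes each `<name>` and `<value>` element fits on one line.

```shell
#!/usr/bin/env bash
# Sketch: read one property value from a Hadoop *-site.xml file.

# get_prop <xml-file> <property-name>
# Prints the <value> that follows the matching <name> element.
get_prop() {
  awk -v key="$2" '
    /<name>/  { n = $0; gsub(/.*<name>|<\/name>.*/, "", n) }
    /<value>/ { v = $0; gsub(/.*<value>|<\/value>.*/, "", v)
                if (n == key) { print v; exit } }
  ' "$1"
}

# Usage on any node:
#   get_prop /opt/hadoop/hadoop-2.7.7/etc/hadoop/core-site.xml fs.defaultFS
```

Once the environment variables are in place, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolves, which is the more reliable check for HDFS settings.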
6. vi /.../hadoop-2.7.7/etc/hadoop/slaves
Older versions use the slaves file; Hadoop 3.x (3.0.3, for example) uses a workers file instead.
Delete localhost and add the hostnames of the DataNode nodes.
[root@Gwj ~]# cat /opt/hadoop/hadoop-2.7.7/etc/hadoop/slaves
Ssj
Pyf
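Because the configuration files must be identical on every node, one option is to edit them on Gwj and copy them to the hosts listed in the slaves file. The sketch below only prints the scp commands (a dry run) so they can be reviewed first. The function name is hypothetical, and it assumes the passwordless SSH from step 0, the root user, and the same install path on every node.

```shell
#!/usr/bin/env bash
# Sketch: print the scp commands that would push the edited config files
# to every host listed in the slaves file. Nothing is copied by this script.

# print_sync_commands <conf-dir>
print_sync_commands() {
  local conf_dir="$1"
  local files="core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml slaves"
  # Skip blank lines and comment lines in the slaves file.
  grep -Ev '^[[:space:]]*(#|$)' "$conf_dir/slaves" | while read -r host; do
    echo "cd $conf_dir && scp $files root@$host:$conf_dir/"
  done
}

# Usage on Gwj:
#   print_sync_commands /opt/hadoop/hadoop-2.7.7/etc/hadoop
```

After checking the printed lines, they can be piped to `sh` to perform the copy.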
7. Format the NameNode before first use
hdfs namenode -format
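Formatting is meant to happen once. Formatting again gives the NameNode a new clusterID, and DataNodes that still hold the old one will refuse to start. A small guard like the following sketch can help avoid that; it assumes the name directory configured in hdfs-site.xml above and relies on the `current/VERSION` file that a formatted NameNode directory contains. The function name is made up.

```shell
#!/usr/bin/env bash
# Sketch: only suggest formatting when the NameNode directory is still empty.

NAME_DIR="/opt/hadoop/hadoop-2.7.7/dfs/name"   # dfs.namenode.name.dir

# is_formatted <name-dir>: succeeds if NameNode metadata already exists there.
is_formatted() {
  [ -f "$1/current/VERSION" ]
}

if is_formatted "$NAME_DIR"; then
  echo "NameNode already formatted, skipping"
else
  echo "not formatted yet; run: hdfs namenode -format"
fi
```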
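8. Start
Everything at once:
/.../hadoop-2.7.7/sbin/start-all.sh
HDFS only:
/.../hadoop-2.7.7/sbin/start-dfs.sh
YARN only:
/.../hadoop-2.7.7/sbin/start-yarn.sh
# Replace start with stop to shut the services down. There is no matching status script; check the running daemons with jps (step 9).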
9. Verify
Check the running processes on each node with jps:
hadoop
hdfs
Master---NameNode (SecondaryNameNode)
Slave---DataNode
Yarn
Master---ResourceManager
Slave---NodeManager
Alternatively, open the NameNode web UI in a browser at http://<Master IP>:50070
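The jps check can be scripted so that a missing daemon stands out. This is a sketch only: the function name is hypothetical, and it assumes the usual jps output of one "pid ClassName" pair per line.

```shell
#!/usr/bin/env bash
# Sketch: report which expected Hadoop daemons are absent from jps output.

# missing_daemons <expected names...>
# Reads `jps` output on stdin and prints each expected daemon that is not running.
missing_daemons() {
  local running d
  running="$(awk '{print $2}')"
  for d in "$@"; do
    # -x matches the whole line, so NameNode does not match SecondaryNameNode.
    echo "$running" | grep -qx "$d" || echo "$d"
  done
}

# Usage:
#   on Gwj: jps | missing_daemons NameNode ResourceManager
#   on Ssj: jps | missing_daemons DataNode NodeManager SecondaryNameNode
#   on Pyf: jps | missing_daemons DataNode NodeManager
```

Empty output means everything expected on that node is running. With the configuration in this post, the SecondaryNameNode is expected on Ssj, since dfs.namenode.secondary.http-address points there.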
--- Note: the YARN properties below were NOT set in this deployment! ---
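<property>
  <name>yarn.resourcemanager.address</name>
  <value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
  <description>The address of the scheduler interface.</description>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
  <description>The http address of the RM web application.</description>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
  <description>The https address of the RM web application.</description>
  <name>yarn.resourcemanager.webapp.https.address</name>
  <value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
  <description>The address of the RM admin interface.</description>
  <name>yarn.resourcemanager.admin.address</name>
  <value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
  <description>Maximum memory that a single container request can be allocated, in MB. The default is 8192 MB; set to 2048 MB here to fit the Alibaba Cloud ECS instances.</description>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>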
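A quick way to see what the memory values above imply is to do the arithmetic. The sketch below assumes a container size of 1024 MB, which is the default yarn.scheduler.minimum-allocation-mb and is not set anywhere in this post; the function names are made up for the example.

```shell
#!/usr/bin/env bash
# Sketch: rough arithmetic for the YARN memory settings listed above.

NODE_MEM_MB=2048     # yarn.nodemanager.resource.memory-mb
VMEM_RATIO=2.1       # yarn.nodemanager.vmem-pmem-ratio
CONTAINER_MB=1024    # assumed container size (default minimum allocation)

# containers_per_node <node-mem-mb> <container-mb>
containers_per_node() {
  echo $(( $1 / $2 ))
}

# vmem_limit_mb <container-mb> <ratio>: virtual memory allowed per container,
# rounded down to a whole MB.
vmem_limit_mb() {
  awk -v m="$1" -v r="$2" 'BEGIN { printf "%d\n", m * r }'
}

echo "containers per node: $(containers_per_node "$NODE_MEM_MB" "$CONTAINER_MB")"
echo "virtual memory limit per container: $(vmem_limit_mb "$CONTAINER_MB" "$VMEM_RATIO") MB"
```

With yarn.nodemanager.vmem-check-enabled set to false, the virtual memory limit is computed but not enforced, so containers are not killed for exceeding it.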