Deploying a Hadoop Cluster
Before installing Hadoop, configure passwordless SSH so that the Master node can log in to each Slave node without a password.
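The passwordless login described above can be sketched as follows. This is a minimal outline, not the author's exact procedure: the key path and the `DRY_RUN` switch are assumptions introduced for illustration, and the host names are the ones used later in this guide.

```shell
#!/bin/sh
# Sketch: set up passwordless SSH from the master to other nodes.
# With DRY_RUN=1 the commands are printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_passwordless_ssh HOST...: create a key if needed, then install it on each HOST.
setup_passwordless_ssh() {
    # Assumed default key location; override with SSH_KEY.
    key="${SSH_KEY:-$HOME/.ssh/id_rsa}"
    # Create an RSA key pair with an empty passphrase if none exists yet.
    if [ ! -f "$key" ]; then
        run ssh-keygen -t rsa -N "" -f "$key"
    fi
    # Append the public key to authorized_keys on every host passed in.
    for host in "$@"; do
        run ssh-copy-id -i "$key.pub" "$host"
    done
}
```

Run on the master as `setup_passwordless_ssh hadoop2 hadoop3`; each `ssh-copy-id` call asks for that node's password once. Afterwards `ssh hadoop2 hostname` should return without a password prompt.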
Version installed: hadoop-2.8.3.tar.gz
mkdir /usr/local/hadoop
tar zxvf hadoop-2.8.3.tar.gz -C /usr/local/hadoop
Map host names to IP addresses (hadoop2 and hadoop3 need the same changes to their hosts file)
vi /etc/hosts
10.2.15.176 hadoop1
10.2.15.177 hadoop2
10.2.15.170 hadoop3
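Because every node needs the same entries, a quick check on each machine can catch a missing line before the cluster is started. The helper below is a hypothetical sketch (the function name `check_hosts` is not part of Hadoop); it only looks for the host names, not for specific IP addresses.

```shell
#!/bin/sh
# check_hosts FILE NAME...: succeed only if every NAME appears as a host name in FILE.
check_hosts() {
    file="$1"; shift
    for name in "$@"; do
        # Skip comment lines; compare NAME with every field after the IP address.
        if ! awk -v n="$name" '
            /^[[:space:]]*#/ { next }
            { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit found ? 0 : 1 }' "$file"; then
            echo "missing: $name"
            return 1
        fi
    done
    echo "all hosts present"
}
```

For example, `check_hosts /etc/hosts hadoop1 hadoop2 hadoop3` can be run on each of the three servers.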
Configure environment variables (hadoop2 and hadoop3 need the same changes to /etc/profile)
vi /etc/profile
export HADOOP_HOME=/usr/local/hadoop/hadoop-2.8.3
export PATH=$HADOOP_HOME/bin:$PATH
source /etc/profile
Create the directories that will be needed later
mkdir -p /usr/local/hadoop/tmp
mkdir -p /usr/local/hadoop/var
mkdir -p /usr/local/hadoop/dfs/name
mkdir -p /usr/local/hadoop/dfs/data
Edit core-site.xml
vi /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/core-site.xml
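<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
  </property>
</configuration>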
Edit mapred-site.xml
cp /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/mapred-site.xml
vi /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/mapred-site.xml
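<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>hadoop1:10020</value>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop1:19888</value>
  </property>
</configuration>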
Edit hdfs-site.xml
vi /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/hdfs-site.xml
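<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/hadoop/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Number of replicas stored for each HDFS data block; the default is 3.</description>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>Disable HDFS permission checking.</description>
  </property>
</configuration>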
Edit yarn-site.xml
vi /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/yarn-site.xml
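<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop1</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop1:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop1:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop1:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>hadoop1:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>hadoop1:8088</value>
  </property>
  <property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>4</value>
    <description>The maximum number of application master execution attempts.</description>
  </property>
</configuration>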
If jobs are started in Flink on YARN mode, remember that Hadoop YARN is a resource scheduler, so the memory allocated to each Container has to be planned. Set yarn.nodemanager.resource.memory-mb, yarn.scheduler.minimum-allocation-mb, yarn.scheduler.maximum-allocation-mb, yarn.app.mapreduce.am.resource.mb and yarn.app.mapreduce.am.command-opts in yarn-site.xml. Otherwise a container can exceed its memory limit and the Application fails to start with an error such as:
Current usage: 303.2 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
vi /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/yarn-site.xml
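<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>106496</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>4096</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx3276m</value>
</property>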
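The numbers in the error message and in the settings above can be reproduced with a little arithmetic. The NodeManager caps a container's virtual memory at its physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, whose default is 2.1, which is where the "2.1 GB" limit for a 1 GB container comes from. The heap helper below assumes a rule of thumb of about 80% of the container size; the source does not state that ratio, but it is consistent with -Xmx3276m for a 4096 MB application master. Both function names are hypothetical.

```shell
#!/bin/sh
# vmem_limit_mb CONTAINER_MB [RATIO_X10]: virtual memory cap of a container in MB.
# RATIO_X10 is the vmem-pmem ratio times ten (21 means 2.1) so the math stays integer.
vmem_limit_mb() {
    echo $(( $1 * ${2:-21} / 10 ))
}

# heap_mb CONTAINER_MB: JVM heap sized at 80% of the container (assumed rule of thumb).
heap_mb() {
    echo $(( $1 * 80 / 100 ))
}
```

`vmem_limit_mb 1024` gives the cap for the 1 GB container in the error message, and `heap_mb 4096` gives a heap size for the 4096 MB application master.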
Edit hadoop-env.sh, mapred-env.sh and yarn-env.sh
vi /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME="/usr/local/jdk/jdk1.8.0_251"
vi /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/mapred-env.sh
export JAVA_HOME="/usr/local/jdk/jdk1.8.0_251"
vi /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/yarn-env.sh
export JAVA_HOME="/usr/local/jdk/jdk1.8.0_251"
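One file the steps above do not touch is the worker list. In Hadoop 2.x the start scripts read etc/hadoop/slaves to decide where to launch DataNode and NodeManager, and the file shipped with the distribution contains only localhost. The sketch below assumes hadoop2 and hadoop3 are the intended workers, which the source implies but does not state; `write_slaves` is a hypothetical helper.

```shell
#!/bin/sh
# write_slaves FILE HOST...: write one worker host name per line to FILE,
# replacing whatever the file contained before.
write_slaves() {
    file="$1"; shift
    printf '%s\n' "$@" > "$file"
}
```

On the master this would be called as `write_slaves /usr/local/hadoop/hadoop-2.8.3/etc/hadoop/slaves hadoop2 hadoop3`. Add hadoop1 to the list if the master should also store data and run containers.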
Copy /usr/local/hadoop to the other two servers
scp -r /usr/local/hadoop hadoop2:/usr/local
scp -r /usr/local/hadoop hadoop3:/usr/local
Start the Hadoop cluster
Format the HDFS file system
/usr/local/hadoop/hadoop-2.8.3/bin/hdfs namenode -format
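Formatting is meant to happen once. Re-running it on a cluster that already holds data assigns a new cluster ID, and DataNodes that still carry the old ID will not register with the NameNode. A small guard can make the step safe to repeat. This is a sketch with hypothetical function names; it relies on the fact that a formatted NameNode directory contains current/VERSION.

```shell
#!/bin/sh
# needs_format NAME_DIR: succeed when NAME_DIR has not been formatted yet.
needs_format() {
    [ ! -f "$1/current/VERSION" ]
}

# format_once: format the NameNode only if its directory is still empty.
# Paths are the dfs.namenode.name.dir and install location used in this guide.
format_once() {
    name_dir="/usr/local/hadoop/dfs/name"
    if needs_format "$name_dir"; then
        /usr/local/hadoop/hadoop-2.8.3/bin/hdfs namenode -format
    else
        echo "NameNode already formatted, skipping"
    fi
}
```

Calling `format_once` on hadoop1 formats a fresh installation and does nothing on later runs.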
Start the HDFS and YARN daemons (NameNode, DataNode, ResourceManager and NodeManager)
/usr/local/hadoop/hadoop-2.8.3/sbin/start-all.sh
Open http://hadoop1:50070 in a browser to view information about the cluster
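Besides the web page, the `jps` tool from the JDK lists the Java processes running on a node, which makes it easy to confirm that the expected daemons came up. The helper below is a hypothetical sketch that reads `jps` output on standard input and prints the name of every expected daemon it cannot find.

```shell
#!/bin/sh
# missing_daemons EXPECTED...: read `jps` output ("PID Name" per line) on stdin
# and print each EXPECTED daemon name that is not running.
missing_daemons() {
    # Keep only the process-name column.
    running=$(awk '{ print $2 }')
    for d in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        echo "$running" | grep -qx "$d" || echo "$d"
    done
}
```

On the master, `jps | missing_daemons NameNode SecondaryNameNode ResourceManager` should print nothing. On a worker node the corresponding check is `jps | missing_daemons DataNode NodeManager`.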
Run the wordcount demo (the commands below are run from /usr/local/hadoop/hadoop-2.8.3)
bin/hdfs dfs -mkdir /input
bin/hdfs dfs -ls /
bin/hdfs dfs -put /usr/local/hadoop/tmp/input_hadoop_demo_test.txt /input/
bin/hdfs dfs -ls /input/
bin/hadoop jar /usr/local/hadoop/hadoop-2.8.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.3.jar wordcount /input/input_hadoop_demo_test.txt /output
bin/hdfs dfs -ls /output
bin/hdfs dfs -cat /output/part-r-00000
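For a small input file, the job's result can be cross-checked locally with standard tools. The pipeline below is an approximation of what the wordcount example computes, assuming words are separated by whitespace and the output is sorted by byte order; the function name `local_wordcount` is hypothetical.

```shell
#!/bin/sh
# local_wordcount: read text on stdin and print "word<TAB>count" lines,
# sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' |          # one word per line
        grep -v '^$' |                # drop the empty line left by leading whitespace
        LC_ALL=C sort |               # byte-order sort so equal words are adjacent
        uniq -c |                     # count each distinct word
        awk '{ print $2 "\t" $1 }'    # reorder to word, then count
}
```

Running `local_wordcount < /usr/local/hadoop/tmp/input_hadoop_demo_test.txt` and comparing the result with the contents of /output/part-r-00000 is a quick sanity check, provided the job ran with a single reducer so that all results are in that one file.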