CentOS 7
jdk-8u212-linux-x64.rpm
hadoop-2.9.2.tar.gz
Disable the firewall and SELinux
systemctl stop firewalld.service
systemctl disable firewalld.service
setenforce 0
vi /etc/selinux/config  (change "SELINUX=enforcing" to "SELINUX=disabled")
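Equivalently, a one-line edit (a sketch, assuming the stock SELINUX=enforcing entry is present):
# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config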
Install JDK 1.8
# rpm -ivh jdk-8u212-linux-x64.rpm
# vi /etc/profile.d/java.sh  (add the following lines)
export JAVA_HOME=/usr/java/jdk1.8.0_212-amd64
export CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
# source /etc/profile  (takes effect after this runs)
# java -version
Install Hadoop 2.9.2 (extracted under /data/soft, matching the HADOOP_HOME set below)
# tar zxvf hadoop-2.9.2.tar.gz
# ln -s hadoop-2.9.2 hadoop
# vi /etc/profile.d/hadoop.sh  (contents as follows)
export HADOOP_HOME=/data/soft/hadoop
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
# source /etc/profile
# whereis hdfs  (verify the environment variables)
Standalone (local) mode has no HDFS, only MapReduce programs. The Hadoop release runs in standalone mode by default, with no configuration required, which makes it convenient for local debugging.
(Hadoop can also be pointed at its own JDK: just edit JAVA_HOME in hadoop/etc/hadoop/hadoop-env.sh, as shown below. This step is optional.)
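For example, reusing the JDK path from the installation above:
export JAVA_HOME=/usr/java/jdk1.8.0_212-amd64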
Once extracted, Hadoop is ready for a standalone test.
# mkdir -p /data/input
# vi /data/input/data.txt  (contents as follows)
hello world
hello hadoop
# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /data/input/data.txt /data/output
(Run this from the hadoop directory, since the jar path is relative; the output directory /data/output must not already exist.)
# cat /data/output/*
hadoop 1
hello 2
world 1
The above is the result after the MapReduce job finishes. The run log reports an error, EBADF: Bad file descriptor, which is harmless and can be ignored.
Pseudo-distributed mode is a special case of fully distributed mode, with only one node.
It involves four configuration files under hadoop/etc/hadoop: core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
etc/hadoop/core-site.xml:
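A minimal configuration, following the Apache single-node guide linked below; hdfs://localhost:9000 is the guide's default address, and hadoop.tmp.dir is assumed to point at the /data/hadoop/tmp directory created later:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/tmp</value>
    </property>
</configuration>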
etc/hadoop/hdfs-site.xml:
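A minimal configuration per the same guide (replication is set to 1 because there is only one node):
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>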
To use YARN, also edit the following two configuration files:
etc/hadoop/mapred-site.xml:
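In Hadoop 2.x this file must first be created from the template: cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml. A minimal configuration per the single-node guide:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>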
etc/hadoop/yarn-site.xml:
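A minimal configuration per the single-node guide:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>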
Run ssh localhost to check whether passwordless login works; if it does not, run the following commands to configure it:
# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# chmod 0600 ~/.ssh/authorized_keys
Then verify that ssh localhost logs in without a password:
# ssh localhost
Last login: Sat Jul 13 22:02:41 2019 from localhost
# mkdir -p /data/hadoop/tmp
# hdfs namenode -format
Format only once. Reformatting gives the NameNode a new cluster ID that no longer matches the existing DataNodes, so the DataNode data is lost.
# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /data/soft/hadoop-2.9.2/logs/hadoop-root-namenode-localhost.localdomain.out
localhost: starting datanode, logging to /data/soft/hadoop-2.9.2/logs/hadoop-root-datanode-localhost.localdomain.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /data/soft/hadoop-2.9.2/logs/hadoop-root-secondarynamenode-localhost.localdomain.out
# jps
17185 DataNode
17091 NameNode
17561 Jps
17357 SecondaryNameNode
# start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /data/soft/hadoop-2.9.2/logs/yarn-root-resourcemanager-localhost.localdomain.out
localhost: starting nodemanager, logging to /data/soft/hadoop-2.9.2/logs/yarn-root-nodemanager-localhost.localdomain.out
# jps
17185 DataNode
17091 NameNode
17767 Jps
17705 NodeManager
17357 SecondaryNameNode
17613 ResourceManager
http://192.168.100.160:50070/  HDFS web UI (NameNode and DataNodes)
http://192.168.100.160:50090/  SecondaryNameNode web UI
http://192.168.100.160:8088  YARN web UI (ResourceManager)
Prepare the test data:
# mkdir -p /data/input
# vi /data/input/data.txt  (contents as follows)
hello world
hello hadoop
# hdfs dfs -mkdir /input
# hdfs dfs -put /data/input/data.txt /input
# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount /input/data.txt /data/output
(Here /data/output is a path in HDFS, not the local directory used in the standalone test, and it must not already exist.)
# hdfs dfs -cat /data/output/part-r-00000
hadoop 1
hello 2
world 1
The output matches, so the test passes.
To browse the HDFS filesystem, use the hdfs dfs command, e.g.:
# hdfs dfs -ls /
Found 3 items
drwxr-xr-x - root supergroup 0 2019-07-13 23:04 /data
drwxr-xr-x - root supergroup 0 2019-07-13 23:03 /input
drwx------ - root supergroup 0 2019-07-13 22:58 /tmp
# hdfs dfs -ls /data
Found 1 items
drwxr-xr-x - root supergroup 0 2019-07-13 23:04 /data/output
To stop the daemons: stop-dfs.sh, stop-yarn.sh, or stop-all.sh.
References:
http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html