Download the Hadoop package from the official mirror: http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
# Extract; the resulting path is /Users/zheng/hadoop/hadoop-3.2.1
$ tar -zxvf hadoop-3.2.1.tar.gz
#Set the environment variables
$ vim /etc/profile
#Add the following settings
export HADOOP_HOME=/Users/zheng/hadoop/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
#Apply the changes
$ source /etc/profile
#Check that the environment variables were set successfully
$ hadoop version
#Output like the following means the setup succeeded
2020-04-09 09:20:20,371 DEBUG util.VersionInfo: version: 3.2.1
Hadoop 3.2.1
Source code repository https://gitbox.apache.org/repos/asf/hadoop.git -r b3cbbb467e22ea829b3808f4b7b01d07e0bf3842
Compiled by rohithsharmaks on 2019-09-10T15:56Z
Compiled with protoc 2.5.0
From source with checksum 776eaf9eee9c0ffc370bcbc1888737
This command was run using /Users/zheng/hadoop-3.2.1/share/hadoop/common/hadoop-common-3.2.1.jar
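As a quick sanity check, the two exports above can be verified from any shell. A minimal sketch, using the paths from this walkthrough:

```shell
# Re-create the two exports from /etc/profile and confirm they took effect.
export HADOOP_HOME=/Users/zheng/hadoop/hadoop-3.2.1
export PATH=$PATH:$HADOOP_HOME/bin
echo "HADOOP_HOME=$HADOOP_HOME"
# Check that $HADOOP_HOME/bin really is on PATH.
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) echo "PATH OK" ;;
  *) echo "PATH missing $HADOOP_HOME/bin" ;;
esac
```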
core-site.xml: cluster-wide parameters, i.e. system-level settings such as the HDFS URL and Hadoop's temporary directory
# Edit /Users/zheng/hadoop/hadoop-3.2.1/etc/hadoop/core-site.xml
<configuration>
<!--File system host and port-->
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<!--Directory for files Hadoop generates at runtime; no need to create it in advance, it is generated automatically-->
<property>
<name>hadoop.tmp.dir</name>
<value>file:/Users/zheng/hadoop/tmp</value>
</property>
</configuration>
hdfs-site.xml: namenode and datanode storage locations, number of file replicas, file read permissions, etc.
# Edit /Users/zheng/hadoop/hadoop-3.2.1/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<!--No need to create this in advance; it is generated automatically-->
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/Users/zheng/hadoop/dfs/name</value>
</property>
<!--No need to create this in advance; it is generated automatically-->
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/Users/zheng/hadoop/dfs/data</value>
</property>
</configuration>
mapred-site.xml: MapReduce parameters
# Edit /Users/zheng/hadoop/hadoop-3.2.1/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
yarn-site.xml: cluster resource-management parameters, such as the ResourceManager/NodeManager communication ports and the web monitoring port
# Edit /Users/zheng/hadoop/hadoop-3.2.1/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
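With all four files edited, it is cheap to confirm that none of them has a stray or missing tag before formatting HDFS. A minimal sketch using xmllint (bundled with macOS and most Linux distributions); the throwaway file below stands in for a real config so the snippet is self-contained, while in this walkthrough the actual files live under /Users/zheng/hadoop/hadoop-3.2.1/etc/hadoop:

```shell
# Well-formedness check for a Hadoop XML config file.
cat > /tmp/core-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
xmllint --noout /tmp/core-site-demo.xml && echo "core-site-demo.xml: well-formed"
```

Running the same `xmllint --noout` over core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml catches exactly the kind of copy-paste damage described in the troubleshooting section below.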
Initialize HDFS
# Format HDFS. You can run hdfs namenode -format directly; if that command does not work, run the commands below
cd /Users/zheng/hadoop/hadoop-3.2.1/bin
./hdfs namenode -format
Start Hadoop
cd /Users/zheng/hadoop/hadoop-3.2.1/sbin
./start-all.sh
Startup reported the following errors:
WARNING: Attempting to start all Apache Hadoop daemons as zheng in 10 seconds.
WARNING: This is not a recommended production deployment configuration.
WARNING: Use CTRL-C to abort.
Starting namenodes on [zheng-2.local]
zheng-2.local: ERROR: Cannot set priority of namenode process 3282
Starting datanodes
Starting secondary namenodes [account.jetbrains.com]
account.jetbrains.com: ERROR: Cannot set priority of secondarynamenode process 3524
2020-04-09 12:01:03,083 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting resourcemanager
Starting nodemanagers
To troubleshoot: the startup logs can be found under /Users/zheng/hadoop/hadoop-3.2.1/logs.
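A quick way to surface the relevant lines is to grep the daemon logs for errors. A sketch; the throwaway file below stands in for a real namenode log, which in this setup would live under /Users/zheng/hadoop/hadoop-3.2.1/logs with a name like hadoop-&lt;user&gt;-namenode-&lt;host&gt;.log:

```shell
# Demo log file mimicking a failed namenode start.
LOG=/tmp/hadoop-demo-namenode.log
cat > "$LOG" <<'EOF'
2020-04-09 12:00:00,000 INFO namenode.NameNode: STARTUP_MSG
2020-04-09 12:00:01,000 ERROR namenode.NameNode: Failed to start namenode.
java.lang.IllegalArgumentException: Invalid URI for NameNode address
EOF
# Show only error lines and exceptions.
grep -E "ERROR|Exception" "$LOG"
```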
Here the namenode failed to start; the log shows the following error:
java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/Users/zheng/hadoop/tmp has no authority.
at org.apache.hadoop.hdfs.DFSUtilClient.getNNAddress(DFSUtilClient.java:780)
at org.apache.hadoop.hdfs.DFSUtilClient.getNNAddressCheckLogical(DFSUtilClient.java:809)
at org.apache.hadoop.hdfs.DFSUtilClient.getNNAddress(DFSUtilClient.java:771)
at org.apache.hadoop.hdfs.server.namenode.NameNode.getRpcServerAddress(NameNode.java:545)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loginAsNameNodeUser(NameNode.java:676)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:696)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:953)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:926)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1692)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1759)
Solution:
At first this looked like a permissions problem on /Users/zheng/hadoop/tmp, but even after granting access with chmod 777 /Users/zheng/hadoop/tmp it still failed. It turned out the core-site.xml configured earlier had been copy-pasted incorrectly; the correct property is:
<property>
<name>hadoop.tmp.dir</name>
<value>file:/Users/zheng/hadoop/tmp</value>
</property>
After changing the parameter, re-run hdfs namenode -format and then restart Hadoop.
Confirm the startup succeeded
#After starting, run jps to check that all daemons are up
$ jps
17521 SecondaryNameNode
17717 ResourceManager
17369 DataNode
17820 NodeManager
17262 NameNode
17886 Jps
Hadoop web UI default address: http://localhost:9870/
YARN default address: http://localhost:8088
1. The JDK must be installed beforehand; search online for installation instructions.
2、ssh localhost
#SSH login. It had apparently been configured here before, so running the command below logs in directly; without prior configuration it will likely fail with a permissions-related error
ssh localhost
# If there is a problem, run the following to set up key-based access
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
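One common reason key-based login still fails after the steps above is that sshd rejects an authorized_keys file readable by group or others. The 0600 mode can be sanity-checked like this; a throwaway file is used here so the snippet is self-contained, but the same check applies to the real ~/.ssh/authorized_keys:

```shell
# Create a stand-in for ~/.ssh/authorized_keys and verify the 0600 mode
# that sshd expects before allowing key-based authentication.
KEYS=/tmp/demo_authorized_keys
touch "$KEYS"
chmod 0600 "$KEYS"
mode=$(ls -l "$KEYS" | cut -c1-10)
[ "$mode" = "-rw-------" ] && echo "permissions OK"
```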