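This tutorial uses the CDH distribution of Hadoop to avoid errors caused by version dependency conflicts. The steps were written on macOS and apply equally to Linux (CentOS recommended).
Hadoop runs in pseudo-distributed mode throughout this tutorial.
Hadoop run modes
Local (standalone) mode
By default Hadoop runs in non-distributed (local) mode, which needs no configuration. Everything runs in a single Java process, which makes debugging easy.
Pseudo-distributed mode
Hadoop can also run on a single node in pseudo-distributed mode. Each Hadoop daemon runs in its own Java process, the node acts as both NameNode and DataNode, and files are read from HDFS rather than from the local file system.
Fully distributed mode
Multiple nodes form a cluster that runs Hadoop.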
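Downloading the CDH release of Hadoop
Download URL: https://archive.cloudera.com/cdh5/cdh/5/
Version: hadoop-2.6.0-cdh5.9.3.tar.gz
Environment preparation
Passwordless SSH login (this step is optional, but without it Hadoop asks for a password every time it starts)
Run the following commands in a terminal:
zhangzhaodeMacBook-Pro:~ zhangzhao$ ssh-keygen -t rsa -P ""  # press Enter at every prompt
zhangzhaodeMacBook-Pro:~ zhangzhao$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Verify passwordless login:
zhangzhaodeMacBook-Pro:~ zhangzhao$ ssh localhost
Last login: Fri Jan 4 13:45:54 2019  # this output means passwordless login works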
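If ssh localhost still asks for a password, a common cause is file permissions: with its default StrictModes setting, sshd ignores an authorized_keys file that other users can write to. On macOS, Remote Login also has to be enabled in the Sharing preferences before ssh localhost can connect at all. The sketch below tightens the permissions. secure_ssh_dir is a hypothetical helper name, not something from the original setup.

```shell
# Hypothetical helper: restrict permissions on an SSH directory and its
# authorized_keys file so that sshd accepts the key.
secure_ssh_dir() {
  local dir="${1:-$HOME/.ssh}"
  [ -d "$dir" ] || return 1          # nothing to fix if the directory is absent
  chmod 700 "$dir"                   # only the owner may enter the directory
  if [ -f "$dir/authorized_keys" ]; then
    chmod 600 "$dir/authorized_keys" # only the owner may read or write the key list
  fi
  return 0
}
```

Calling secure_ssh_dir with no argument operates on ~/.ssh.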
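Installing the JDK
JDK version:
macOS: jdk-8u192-macosx-x64.dmg
Linux: jdk-8u192-linux-x64.tar.gz
On macOS, double-click the installer; on Linux, just extract the archive.
Configuring the JDK environment variables:
macOS:
Open .bash_profile in your home directory (~):
zhangzhaodeMacBook-Pro:~ zhangzhao$ vim .bash_profile
Add the following: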
JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home/
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
PATH=$JAVA_HOME/bin:$PATH
export JAVA_HOME
export CLASSPATH
export PATH
Finally, apply the environment variables:
zhangzhaodeMacBook-Pro:~ zhangzhao$ source .bash_profile
Verify the JDK:
zhangzhaodeMacBook-Pro:~ zhangzhao$ java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
Linux (skip this if a default OpenJDK is already installed):
Open .bash_profile in your home directory (~):
vim .bash_profile
Add the following:
JAVA_HOME=/usr/lib/jdk1.8.0_192
CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
PATH=$JAVA_HOME/bin:$HOME/bin:$HOME/.local/bin:$PATH
export JAVA_HOME CLASSPATH PATH
Finally, apply the environment variables:
source .bash_profile
Verify the JDK:
java -version
java version "1.8.0_192"
Java(TM) SE Runtime Environment (build 1.8.0_192-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.192-b12, mixed mode)
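To check the JDK version from a script rather than by eye, the first line of the java -version output can be parsed. The following is a minimal sketch; java_major_version is a hypothetical helper, and it assumes the version appears in double quotes as in the output above.

```shell
# Hypothetical helper: extract the Java major version from the first line
# of `java -version` output, passed in as a string.
java_major_version() {
  local line="$1" ver
  # Pull out the quoted version string, e.g. 1.8.0_192 or 11.0.2
  ver=$(printf '%s\n' "$line" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
  [ -n "$ver" ] || return 1
  case "$ver" in
    1.*) ver=${ver#1.}; printf '%s\n' "${ver%%.*}" ;;  # legacy scheme: 1.8.0_192 -> 8
    *)   printf '%s\n' "${ver%%[.+-]*}" ;;             # newer scheme: 11.0.2 -> 11
  esac
}
```

Because java -version writes to standard error, a real call needs a redirect: java_major_version "$(java -version 2>&1 | head -n 1)".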
Downloading Hadoop
Use the wget command, or download the archive manually.
Here it is downloaded to /Users/zhangzhao/develop/hadoop:
zhangzhaodeMacBook-Pro:hadoop zhangzhao$ wget https://archive.cloudera.com/cdh5/cdh/5/hadoop-2.6.0-cdh5.9.3.tar.gz
macOS does not ship with wget; install it with Homebrew (Linux users can skip this):
zhangzhaodeMacBook-Pro:~ zhangzhao$ brew install wget
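If Homebrew itself is missing, install it first (Linux users can skip this):
zhangzhaodeMacBook-Pro:~ zhangzhao$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
This Ruby-based installer has since been deprecated, so check the official Homebrew website for the current install command and for usage instructions.
Extracting Hadoop
zhangzhaodeMacBook-Pro:hadoop zhangzhao$ tar -zxvf hadoop-2.6.0-cdh5.9.3.tar.gz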
zhangzhaodeMacBook-Pro:hadoop zhangzhao$ ls
hadoop-2.6.0-cdh5.9.3
hadoop-2.6.0-cdh5.9.3.tar.gz
Hadoop directory layout
zhangzhaodeMacBook-Pro:hadoop zhangzhao$ cd hadoop-2.6.0-cdh5.9.3/
zhangzhaodeMacBook-Pro:hadoop-2.6.0-cdh5.9.3 zhangzhao$ ls
LICENSE.txt cloudera lib
NOTICE.txt etc libexec
README.txt examples sbin
bin examples-mapreduce1 share
bin-mapreduce1 include src
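bin: the basic scripts for managing and using Hadoop. The management scripts under sbin are built on top of these, and users can call them directly.
etc: configuration files such as core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml. Files ending in .template are templates.
lib: Hadoop's native libraries (used for compressing and decompressing data).
sbin: scripts that start or stop the Hadoop cluster services.
share: Hadoop's dependency JARs, documentation, and official examples.
libexec: the shell configuration files for each service, used to set basics such as the log output directory and startup parameters (for example JVM options).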
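A quick way to confirm that the archive extracted completely is to check for the directories described above. This is a minimal sketch; check_hadoop_layout is a hypothetical helper, and the list of directories it checks is taken from the descriptions in this section.

```shell
# Hypothetical helper: report which expected top-level directories are
# missing from an extracted Hadoop distribution.
check_hadoop_layout() {
  local home="$1" d rc=0
  for d in bin etc lib libexec sbin share; do
    if [ ! -d "$home/$d" ]; then
      echo "missing: $d"
      rc=1
    fi
  done
  return "$rc"   # 0 when every directory is present
}
```

For the layout in this tutorial the call would be check_hadoop_layout ~/develop/hadoop/hadoop-2.6.0-cdh5.9.3, which prints nothing when all directories exist.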
Configuring Hadoop's core configuration files
Configuration directory: ~/develop/hadoop/hadoop-2.6.0-cdh5.9.3/etc/hadoop
hadoop-env.sh
Open the file:
vim hadoop-env.sh
Set the JDK installation path:
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_192.jdk/Contents/Home/
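core-site.xml
Open the file:
vim core-site.xml
Add the following properties inside the <configuration> element:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:8020</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/Users/zhangzhao/develop/tmp</value>
</property>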
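hdfs-site.xml
Open the file:
vim hdfs-site.xml
Add the following properties inside the <configuration> element:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/Users/zhangzhao/develop/tmp/dfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/Users/zhangzhao/develop/tmp/dfs/data</value>
</property>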
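After saving the files, it can help to confirm that a property really has the value you expect. Once Hadoop is on the PATH, hdfs getconf -confKey fs.defaultFS does this. As an alternative that needs no running Hadoop, here is a minimal sketch in plain shell. get_hadoop_property is a hypothetical helper, and it assumes each <name> and <value> element sits on a single line.

```shell
# Hypothetical helper: print the <value> of a named property in a Hadoop
# *-site.xml file. Exits non-zero if the property is not found.
get_hadoop_property() {
  local file="$1" key="$2"
  awk -v key="$key" '
    # Remember the most recent property name seen
    /<name>/  { name = $0; sub(/.*<name>/, "", name); sub(/<\/name>.*/, "", name) }
    # Print the value that follows the requested name, then stop
    /<value>/ && name == key {
      val = $0; sub(/.*<value>/, "", val); sub(/<\/value>.*/, "", val)
      print val; found = 1; exit
    }
    END { exit (found ? 0 : 1) }
  ' "$file"
}
```

For example, get_hadoop_property core-site.xml fs.defaultFS, run from the configuration directory, should print the HDFS address configured in that file.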
Hadoop environment variables
vim ~/.bash_profile
Add the following:
# added by Hadoop installer
export HADOOP_HOME=/Users/zhangzhao/develop/hadoop/hadoop-2.6.0-cdh5.9.3
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Apply the configuration:
source ~/.bash_profile
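One side effect of an export PATH=$PATH:... line is that every time .bash_profile is sourced, the same directories are appended again. The sketch below shows one way to avoid that. path_append is a hypothetical helper, not part of Hadoop or of the original configuration.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already
# there, so that sourcing ~/.bash_profile repeatedly does not grow PATH.
path_append() {
  case ":$PATH:" in
    *":$1:"*) ;;              # already present, nothing to do
    *) PATH="$PATH:$1" ;;     # not present yet, append it
  esac
}
```

With this in place, the last export line could be replaced by path_append "$HADOOP_HOME/sbin" and path_append "$HADOOP_HOME/bin", followed by export PATH.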
Formatting, starting, and stopping HDFS
Formatting HDFS
Note: run this step only once, during initial setup. Formatting again wipes all data stored in HDFS.
zhangzhaodeMacBook-Pro:bin zhangzhao$ hdfs namenode -format
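Formatting succeeded if the log output includes a line saying that the storage directory has been successfully formatted.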
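Because an accidental second format wipes HDFS, it can be worth guarding the command. The sketch below assumes that a formatted NameNode directory contains a current/VERSION file, which is how Hadoop 2.x lays out its NameNode storage. Both function names are hypothetical.

```shell
# Hypothetical helper: succeed if the NameNode directory already holds a
# formatted file system, detected via the current/VERSION file.
namenode_is_formatted() {
  [ -f "$1/current/VERSION" ]
}

# Hypothetical wrapper: format only when no existing file system is found.
format_namenode_once() {
  local name_dir="$1"
  if namenode_is_formatted "$name_dir"; then
    echo "already formatted, skipping: $name_dir"
    return 0
  fi
  hdfs namenode -format
}
```

With the configuration from this tutorial, the call would be format_namenode_once /Users/zhangzhao/develop/tmp/dfs/name.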
Starting HDFS
zhangzhaodeMacBook-Pro:hadoop-2.6.0-cdh5.9.3 zhangzhao$ cd sbin/
zhangzhaodeMacBook-Pro:sbin zhangzhao$ start-dfs.sh
19/01/05 12:43:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: starting namenode, logging to /Users/zhangzhao/develop/hadoop/hadoop-2.6.0-cdh5.9.3/logs/hadoop-zhangzhao-namenode-zhangzhaodeMacBook-Pro.local.out
localhost: starting datanode, logging to /Users/zhangzhao/develop/hadoop/hadoop-2.6.0-cdh5.9.3/logs/hadoop-zhangzhao-datanode-zhangzhaodeMacBook-Pro.local.out
Starting secondary namenodes [account.jetbrains.com]
account.jetbrains.com: starting secondarynamenode, logging to /Users/zhangzhao/develop/hadoop/hadoop-2.6.0-cdh5.9.3/logs/hadoop-zhangzhao-secondarynamenode-zhangzhaodeMacBook-Pro.local.out
19/01/05 12:44:06 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Verifying that HDFS started
zhangzhaodeMacBook-Pro:sbin zhangzhao$ jps
87715 NameNode
87781 DataNode
87871 SecondaryNameNode
87950 Jps
If the three daemons above (NameNode, DataNode, and SecondaryNameNode) are listed, HDFS started successfully.
HDFS web UI: http://localhost:50070
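The jps check can also be scripted, which is handy when the same check is repeated for the YARN daemons. This is a minimal sketch; missing_daemons is a hypothetical helper that takes the jps output as a string followed by the daemon names to look for.

```shell
# Hypothetical helper: print the names of expected daemons that do not
# appear in the given jps output. Returns 0 when none are missing.
missing_daemons() {
  local jps_output="$1"; shift
  local name missing=""
  for name in "$@"; do
    # Compare the second column exactly so that NameNode does not
    # match the SecondaryNameNode line
    if ! printf '%s\n' "$jps_output" | awk -v n="$name" '$2 == n { ok = 1 } END { exit (ok ? 0 : 1) }'; then
      missing="$missing $name"
    fi
  done
  [ -z "$missing" ] && return 0
  printf '%s\n' "${missing# }"
  return 1
}
```

A typical call is missing_daemons "$(jps)" NameNode DataNode SecondaryNameNode, which prints nothing when all three are running.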
Stopping HDFS
zhangzhaodeMacBook-Pro:sbin zhangzhao$ stop-dfs.sh
19/01/05 12:47:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [account.jetbrains.com]
account.jetbrains.com: stopping secondarynamenode
19/01/05 12:48:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
zhangzhaodeMacBook-Pro:sbin zhangzhao$ jps
88263 Jps
Starting the Hadoop cluster
zhangzhaodeMacBook-Pro:sbin zhangzhao$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
19/01/05 13:13:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [localhost]
localhost: namenode running as process 88426. Stop it first.
localhost: datanode running as process 88500. Stop it first.
Starting secondary namenodes [account.jetbrains.com]
account.jetbrains.com: secondarynamenode running as process 88592. Stop it first.
19/01/05 13:13:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
starting yarn daemons
starting resourcemanager, logging to /Users/zhangzhao/develop/hadoop/hadoop-2.6.0-cdh5.9.3/logs/yarn-zhangzhao-resourcemanager-zhangzhaodeMacBook-Pro.local.out
localhost: starting nodemanager, logging to /Users/zhangzhao/develop/hadoop/hadoop-2.6.0-cdh5.9.3/logs/yarn-zhangzhao-nodemanager-zhangzhaodeMacBook-Pro.local.out
zhangzhaodeMacBook-Pro:sbin zhangzhao$ jps
88592 SecondaryNameNode
88500 DataNode
89591 NodeManager
88426 NameNode
89519 ResourceManager
89615 Jps
If jps lists the five services above, the cluster is running normally.
Stopping the Hadoop cluster
zhangzhaodeMacBook-Pro:sbin zhangzhao$ stop-all.sh
Like start-all.sh, stop-all.sh is deprecated; running stop-yarn.sh followed by stop-dfs.sh does the same job. Afterwards, jps should list only the Jps process itself.