Installation of Hadoop 3.x on Ubuntu on Single Node Cluster
1. Objective
In this tutorial on the installation of Hadoop 3.x on Ubuntu, we will go through the steps for setting up a pseudo-distributed, single-node Hadoop 3.x cluster on Ubuntu. We will learn how to install Java, how to install SSH and configure passwordless SSH, how to download Hadoop, how to set up the Hadoop configuration files (.bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml), how to start the Hadoop cluster, and how to stop the Hadoop services.
Learn the step-by-step installation of Hadoop 2.7.x on Ubuntu.
Installation of Hadoop 3.x on Ubuntu on Single Node Cluster
2. Installation of Hadoop 3.x on Ubuntu
Before we start with the Hadoop 3.x installation on Ubuntu, let us understand the key features added in Hadoop 3 and how Hadoop 3 compares with Hadoop 2.
2.1. Java 8 installation
Hadoop requires a working Java installation. Let us start with the steps for installing Java 8:
a. Install Python Software Properties
sudo apt-get install python-software-properties
b. Add Repository
sudo add-apt-repository ppa:webupd8team/java
c. Update the source list
sudo apt-get update
d. Install Java 8
sudo apt-get install oracle-java8-installer
e. Check if Java is correctly installed
java -version
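If Java 8 is installed correctly, the command prints the version details. The output looks roughly like the following (the exact update and build numbers will differ on your machine):
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)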
2.2. Configure SSH
SSH is used for remote login. Hadoop requires SSH to manage its nodes, i.e. the remote machines and the local machine on which you want to run Hadoop. Let us now see how to set up SSH for Hadoop 3.x on Ubuntu:
a. Installation of passwordless SSH
sudo apt-get install ssh
sudo apt-get install pdsh
b. Generate Key Pairs
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
c. Configure passwordless ssh
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
d. Change the permission of the file that contains the key
chmod 0600 ~/.ssh/authorized_keys
e. Check SSH to the localhost
ssh localhost
2.3. Install Hadoop
a. Download Hadoop
http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz
(Download the latest version of Hadoop hadoop-3.0.0-alpha2.tar.gz)
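For example, the tarball can be fetched from the command line with wget, using the mirror URL above (adjust the URL if you pick a different mirror or a newer release):
wget http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz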
b. Untar Tarball
tar -xzf hadoop-3.0.0-alpha2.tar.gz
2.4. Hadoop Setup Configuration
a. Edit .bashrc
Open .bashrc:
nano ~/.bashrc
The .bashrc file is located in the user's home directory. Add the following parameters to it:
export HADOOP_PREFIX="/home/dataflair/hadoop-3.0.0-alpha2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
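In addition to the variables above, many Hadoop 3.x setups also export the following in .bashrc; these are optional additions, not part of the original steps:
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop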
Then run:
source ~/.bashrc
b. Edit hadoop-env.sh
Edit configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
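If you are unsure where Java is installed on your system, one way to find it (assuming java is already on your PATH) is:
readlink -f $(which java)
The printed path ends in .../jre/bin/java or .../bin/java; strip that suffix to get the directory to use as JAVA_HOME.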
c. Edit core-site.xml
Edit the configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/dataflair/hdata</value>
  </property>
</configuration>
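Make sure the directory given in hadoop.tmp.dir exists and is writable by your user, for example (assuming the path used above):
mkdir -p /home/dataflair/hdata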
d. Edit hdfs-site.xml
Edit the configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
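Optionally, you can also set explicit NameNode and DataNode storage directories in hdfs-site.xml. This is not part of the original steps, and the paths below are only illustrative:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/dataflair/hdata/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/dataflair/hdata/datanode</value>
</property>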
e. Edit mapred-site.xml
If the mapred-site.xml file is not available, create it from the template:
cp mapred-site.xml.template mapred-site.xml
Edit the configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
f. Edit yarn-site.xml
Edit the configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Test your Hadoop knowledge with this Big data Hadoop quiz.
2.5. How to Start the Hadoop services
Let us now see how to start the Hadoop cluster:
The first step to starting up your Hadoop installation is formatting the Hadoop filesystem which is implemented on top of the local filesystem of your “cluster”. This is done as follows:
a. Format the NameNode
bin/hdfs namenode -format
NOTE: Format the NameNode only once, when you first install Hadoop. Do not format it again for an existing Hadoop filesystem, or it will delete all your data from HDFS.
b. Start HDFS Services
sbin/start-dfs.sh
If you get an error while starting the HDFS services, run:
echo "ssh" | sudo tee /etc/pdsh/rcmd_default
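Alternatively, setting the pdsh remote command type in your shell environment (for example in .bashrc) has the same effect; this variant is an assumption, not part of the original steps:
export PDSH_RCMD_TYPE=ssh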
c. Start YARN Services
sbin/start-yarn.sh
d. Check how many daemons are running
Let us now see whether expected Hadoop processes are running or not:
jps
2961 ResourceManager
2482 DataNode
3077 NodeManager
2366 NameNode
2686 SecondaryNameNode
3199 Jps
Learn how to install Cloudera Hadoop CDH5 on Ubuntu from this installation guide.
2.6. How to Stop the Hadoop services
Let us learn how to stop Hadoop services now:
a. Stop YARN services
sbin/stop-yarn.sh
b. Stop HDFS services
sbin/stop-dfs.sh
Note:
Browse the web interface for the NameNode; by default, it is available at:
NameNode – http://localhost:9870/
Browse the web interface for the ResourceManager; by default, it is available at:
ResourceManager – http://localhost:8088/
Run a MapReduce job
We are now ready to run our first Hadoop MapReduce job using the Hadoop word count example.
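As a minimal sketch, assuming you are in the Hadoop installation directory and using the examples jar bundled with your release (the jar file name depends on your exact Hadoop version), the job can be run like this:
bin/hdfs dfs -mkdir -p /user/dataflair/input          # create an input directory in HDFS
bin/hdfs dfs -put etc/hadoop/*.xml /user/dataflair/input          # copy some sample files into it
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar wordcount /user/dataflair/input /user/dataflair/output          # run the word count example
bin/hdfs dfs -cat /user/dataflair/output/part-r-*          # inspect the result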
Learn MapReduce job optimization and performance tuning techniques.
Also see:
Hadoop 2.6 Multi Node Cluster Setup and Hadoop Installation
How Hadoop Works Internally – Inside Hadoop
Reference
https://data-flair.training/blogs/installation-of-hadoop-3-x-on-ubuntu