Installation of Hadoop 3.x on Ubuntu on Single Node Cluster
1. Objective
In this tutorial on the installation of Hadoop 3.x on Ubuntu, we will go through the steps for setting up a pseudo-distributed, single-node Hadoop 3.x cluster on Ubuntu. We will learn how to install Java, how to install SSH and configure passwordless SSH, how to download Hadoop, how to set up the Hadoop configuration files (.bashrc, hadoop-env.sh, core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml), how to start the Hadoop cluster, and how to stop the Hadoop services.
Learn the step-by-step installation of Hadoop 2.7.x on Ubuntu.
Installation of Hadoop 3.x on Ubuntu on Single Node Cluster
2. Installation of Hadoop 3.x on Ubuntu
Before we start with the Hadoop 3.x installation on Ubuntu, let us understand the key features added in Hadoop 3 and how Hadoop 3 compares with Hadoop 2.
2.1. Java 8 installation
Hadoop requires a working Java installation. Let us start with the steps for installing Java 8:
a. Install Python Software Properties
sudo apt-get install python-software-properties
b. Add Repository
sudo add-apt-repository ppa:webupd8team/java
c. Update the source list
sudo apt-get update
d. Install Java 8
sudo apt-get install oracle-java8-installer
e. Check if Java is correctly installed
java -version
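If Java 8 is installed correctly, the command prints the version details. The output looks roughly like the following (the exact update and build numbers will differ on your machine):
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)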
2.2. Configure SSH
SSH is used for remote login. Hadoop requires SSH to manage its nodes, i.e. the remote machines and the local machine on which you want to run Hadoop. Let us now see how to set up SSH for Hadoop 3.x on Ubuntu:
a. Installation of passwordless SSH
sudo apt-get install ssh
sudo apt-get install pdsh
b. Generate Key Pairs
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
c. Configure passwordless ssh
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
d. Change the permission of the file that contains the key
chmod 0600 ~/.ssh/authorized_keys
e. Check SSH to the localhost
ssh localhost
2.3. Install Hadoop
a. Download Hadoop
http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz
(Download the latest version of Hadoop hadoop-3.0.0-alpha2.tar.gz)
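For example, the tarball can be fetched from the command line with wget, using the mirror URL above (adjust the URL if you pick a different mirror or a newer release):
wget http://redrockdigimark.com/apachemirror/hadoop/common/hadoop-3.0.0-alpha2/hadoop-3.0.0-alpha2.tar.gz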
b. Untar Tarball
tar -xzf hadoop-3.0.0-alpha2.tar.gz
2.4. Hadoop Setup Configuration
a. Edit .bashrc
Open .bashrc:
nano ~/.bashrc
The .bashrc file is located in the user's home directory. Add the following parameters to it:
export HADOOP_PREFIX="/home/dataflair/hadoop-3.0.0-alpha2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
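In addition to the variables above, many Hadoop 3.x setups also export the following in .bashrc; these are optional additions, not part of the original steps:
export HADOOP_COMMON_HOME=${HADOOP_PREFIX}
export HADOOP_YARN_HOME=${HADOOP_PREFIX}
export HADOOP_CONF_DIR=${HADOOP_PREFIX}/etc/hadoop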
Then run:
source ~/.bashrc
b. Edit hadoop-env.sh
Edit configuration file hadoop-env.sh (located in HADOOP_HOME/etc/hadoop) and set JAVA_HOME:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
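If you are unsure where Java is installed on your system, one way to find it (assuming java is already on your PATH) is:
readlink -f $(which java)
The printed path ends in .../jre/bin/java or .../bin/java; strip that suffix to get the directory to use as JAVA_HOME.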
c. Edit core-site.xml
Edit the configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/dataflair/hdata</value>
  </property>
</configuration>
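Make sure the directory given in hadoop.tmp.dir exists and is writable by your user, for example (assuming the path used above):
mkdir -p /home/dataflair/hdata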
d. Edit hdfs-site.xml
Edit the configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
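Optionally, you can also set explicit NameNode and DataNode storage directories in hdfs-site.xml. This is not part of the original steps, and the paths below are only illustrative:
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/home/dataflair/hdata/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/home/dataflair/hdata/datanode</value>
</property>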
e. Edit mapred-site.xml
If the mapred-site.xml file is not available, create it from the template:
cp mapred-site.xml.template mapred-site.xml
Edit the configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
f. Edit yarn-site.xml
Edit the configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) and add the following entries:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>
Test your Hadoop knowledge with this Big data Hadoop quiz.
2.5. How to Start the Hadoop services
Let us now see how to start the Hadoop cluster:
The first step to starting up your Hadoop installation is formatting the Hadoop filesystem which is implemented on top of the local filesystem of your “cluster”. This is done as follows:
a. Format the NameNode
bin/hdfs namenode -format
NOTE: Format the NameNode only once, when you first install Hadoop. Do not format it again for an existing Hadoop filesystem, or it will delete all your data from HDFS.
b. Start HDFS Services
sbin/start-dfs.sh
If you get an error while starting the HDFS services, run:
echo "ssh" | sudo tee /etc/pdsh/rcmd_default
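Alternatively, setting the pdsh remote command type in your shell environment (for example in .bashrc) has the same effect; this variant is an assumption, not part of the original steps:
export PDSH_RCMD_TYPE=ssh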
c. Start YARN Services
sbin/start-yarn.sh
d. Check how many daemons are running
Let us now see whether expected Hadoop processes are running or not:
jps
2961 ResourceManager
2482 DataNode
3077 NodeManager
2366 NameNode
2686 SecondaryNameNode
3199 Jps
Learn how to install Cloudera Hadoop CDH5 on Ubuntu from this installation guide.
2.6. How to Stop the Hadoop services
Let us learn how to stop Hadoop services now:
a. Stop YARN services
sbin/stop-yarn.sh
b. Stop HDFS services
sbin/stop-dfs.sh
Note:
Browse the web interface for the NameNode; by default, it is available at:
NameNode – http://localhost:9870/
Browse the web interface for the ResourceManager; by default, it is available at:
ResourceManager – http://localhost:8088/
Run a MapReduce job
We are now ready to run our first Hadoop MapReduce job using the Hadoop word count example.
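As a minimal sketch, assuming you are in the Hadoop installation directory and using the examples jar bundled with your release (the jar file name depends on your exact Hadoop version), the job can be run like this:
bin/hdfs dfs -mkdir -p /user/dataflair/input          # create an input directory in HDFS
bin/hdfs dfs -put etc/hadoop/*.xml /user/dataflair/input          # copy some sample files into it
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha2.jar wordcount /user/dataflair/input /user/dataflair/output          # run the word count example
bin/hdfs dfs -cat /user/dataflair/output/part-r-*          # inspect the result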
Learn MapReduce job optimization and performance tuning techniques.
Also see:
Hadoop 2.6 Multi Node Cluster Setup and Hadoop Installation
How Hadoop Works Internally – Inside Hadoop
Reference
https://data-flair.training/blogs/installation-of-hadoop-3-x-on-ubuntu