018 Install Hadoop 2 on Ubuntu 16.04 | Apache Hadoop Installation
1. Install Hadoop 2 on Ubuntu 16.04: Objective
This document describes how to install Hadoop 2 on Ubuntu 16.04. A single-machine Hadoop cluster is also called Hadoop pseudo-distributed mode. The steps are kept short and to the point, so you can have Hadoop running on Ubuntu 16.04 within minutes. Once the installation is done, you can experiment with Hadoop and its components, such as **MapReduce** for data processing and HDFS for data storage.
2. Steps to Install Hadoop 2 on Ubuntu 16.04
2.1 Recommended Platform to install Hadoop 2
I. Platform Requirements
Operating system: Ubuntu 16.04 or later; other Linux flavors such as CentOS or Red Hat also work.
Hadoop: Cloudera Distribution for Apache Hadoop CDH5.x (you can also use Apache Hadoop 2.x)
II. Configure & Setup Platform
If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu on it using VMware Player or, alternatively, Oracle VirtualBox.
2.2. Prerequisites to install Hadoop 2 on Ubuntu
Complete the following steps before you install Hadoop 2 on Ubuntu:
I. Install Java 8
a. Install Python Software Properties
To add the Java repository we need the python-software-properties package. To download and install it, run the command below in a terminal:
sudo apt-get install python-software-properties
NOTE: After you press “Enter”, you will be asked for your password, because “sudo” runs the installation with root privileges. System-wide installation and configuration steps like this one need root privileges.
b. Add Repository
Now we will manually add the repository from which Ubuntu will install Java. To add the repository, type the command below in a terminal:
sudo add-apt-repository ppa:webupd8team/java
You will be prompted to press [Enter] to continue. Press “Enter”.
c. Update the source list
Always update the source list before you install or upgrade a package. The source list is the set of locations from which Ubuntu downloads and installs software. To update it, type the command below in a terminal:
sudo apt-get update
When you run the above command, Ubuntu updates its source list.
d. Install Java
Now we will download and install Java. Type the command below in a terminal:
sudo apt-get install oracle-java8-installer
When you press “Enter”, the download and installation of Java will start.
To confirm that the installation completed successfully and to check your Java version, type the command below in a terminal:
java -version
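The Java check can also be scripted so that later steps stop early when Java is missing. The following is a minimal sketch, assuming a POSIX shell; `have_cmd` is a hypothetical helper name introduced here, not part of Java or Ubuntu.

```shell
#!/bin/sh
# have_cmd NAME: succeed if NAME resolves to a command on the current PATH.
# (have_cmd is a hypothetical helper introduced for this sketch.)
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd java; then
    # java prints its version banner on stderr, so merge it into stdout.
    java -version 2>&1
else
    echo "java not found on PATH; repeat the Java installation steps"
fi
```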
II. Configure SSH
SSH stands for Secure Shell and is used for remote login: with SSH we can log in to a remote machine. We now need to configure passwordless SSH, that is, login to a remote machine without typing a password. Passwordless SSH is required for remote script invocation: the master uses it to start the daemons on the slaves automatically.
a. Install Open SSH Server-Client
These are the SSH tools.
sudo apt-get install openssh-server openssh-client
b. Generate Key Pairs
ssh-keygen -t rsa -P ""
It will ask “Enter file in which to save the key (/home/dataflair/.ssh/id_rsa):”. Keep the default: do not specify any path, just press “Enter”. The keys are then stored in the default location, the “.ssh” directory in your home directory. To check, run “ls ~/.ssh/”; you will see that two files have been created: “id_rsa”, the private key, and “id_rsa.pub”, the public key.
c. Configure passwordless SSH
We will append the contents of “id_rsa.pub” to the “authorized_keys” file by using the command below:
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
d. Check by SSH to localhost
ssh localhost
It will not ask for a password, and you will be logged in to localhost directly, since we have configured passwordless SSH.
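Running the `cat ... >>` command more than once appends the same key again each time. The sketch below shows one way to make that step repeatable, assuming a POSIX shell; `authorize_key` is a hypothetical helper name, and the `chmod 600` reflects sshd's default StrictModes check on file permissions.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE AUTH_FILE
# Append the public key to AUTH_FILE only if that exact line is not there yet.
# (authorize_key is a hypothetical helper introduced for this sketch.)
authorize_key() {
    key=$(cat "$1") || return 1
    mkdir -p "$(dirname "$2")"
    touch "$2"
    if ! grep -qxF -- "$key" "$2"; then
        printf '%s\n' "$key" >> "$2"
    fi
    # With its default StrictModes setting, sshd rejects key files
    # that are writable by other users.
    chmod 600 "$2"
}

# Typical call for the setup described in this section:
# authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
```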
2.3. Install Hadoop 2 on Ubuntu 16.04
I. Download Hadoop
Download Hadoop from the link below:
http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.5.0-cdh5.3.2.tar.gz
After downloading Hadoop, copy it to your desktop and then move it from the desktop to your home directory by using the following command:
mv Desktop/hadoop-2.5.0-cdh5.3.2.tar.gz /home/dataflair/
Note: /home/dataflair/ is my home directory path.
To find the path of your home directory, run “pwd” in a newly opened terminal.
Substitute this path in the above command so that the setup file is moved to your home directory.
II. Untar Tarball
tar xzf hadoop-2.5.0-cdh5.3.2.tar.gz
Note: All the necessary files, such as jars, scripts, and configuration files, are already available in the HADOOP_HOME directory (hadoop-2.5.0-cdh5.3.2).
III. Setup Configuration to Install Hadoop 2 on Ubuntu
a. Edit configuration .bashrc file
Edit the “.bashrc” file in your home directory. You can open it with the following command: “nano ~/.bashrc”. Now add the text below at the end of this file:
export HADOOP_PREFIX="/home/dataflair/hadoop-2.5.0-cdh5.3.2"
export PATH=$PATH:$HADOOP_PREFIX/bin
export PATH=$PATH:$HADOOP_PREFIX/sbin
export HADOOP_MAPRED_HOME=${HADOOP_PREFIX}
export HADOOP_HDFS_HOME=${HADOOP_PREFIX}
Note: Make sure that you enter the correct path. “/home/dataflair/hadoop-2.5.0-cdh5.3.2” is the Hadoop directory inside my home directory. To find the path of your home directory, use the command: pwd
After adding the above parameters we need to save this file. To save it, press “Ctrl+X”, then “Y” and “Enter” to confirm.
Note: After the above step, restart the terminal so that the new environment variables take effect.
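Because “.bashrc” is read every time a shell starts, plain `PATH=$PATH:...` lines can add the same directory repeatedly in nested shells. A minimal sketch of a guard against that, assuming a POSIX shell; `path_append` is a hypothetical helper name, and the fallback path is the tutorial's example location.

```shell
#!/bin/sh
# path_append DIR: add DIR to PATH unless it is already listed.
# (path_append is a hypothetical helper introduced for this sketch.)
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
    export PATH
}

# Fall back to the tutorial's install location if HADOOP_PREFIX is unset.
HADOOP_PREFIX="${HADOOP_PREFIX:-/home/dataflair/hadoop-2.5.0-cdh5.3.2}"
path_append "$HADOOP_PREFIX/bin"
path_append "$HADOOP_PREFIX/sbin"
```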
b. Edit hadoop-env.sh file
Edit the configuration file “hadoop-env.sh”, located in the configuration directory (HADOOP_HOME/etc/hadoop), and set JAVA_HOME:
dataflair@ubuntu:~$ cd hadoop-2.5.0-cdh5.3.2/etc/hadoop
dataflair@ubuntu:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano hadoop-env.sh
In this file set JAVA_HOME as:
export JAVA_HOME=/usr/lib/jvm/java-8-oracle/
After adding the above parameter we need to save this file. To save it, press “Ctrl+X”, then “Y” and “Enter” to confirm.
Note: “/usr/lib/jvm/java-8-oracle/” is the default Java path. If you changed your Java path, enter your own path here.
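If you are not sure where Java is installed, the value for JAVA_HOME can be derived from the resolved location of the `java` binary. This is a sketch under two assumptions: GNU `readlink -f` is available (as on Ubuntu), and the binary sits in `bin/` or `jre/bin/` below the Java home. `java_home_from_binary` is a hypothetical helper name.

```shell
#!/bin/sh
# java_home_from_binary PATH_TO_JAVA
# Strip the trailing /bin/java (and an optional /jre) to get the Java home.
# (java_home_from_binary is a hypothetical helper introduced for this sketch.)
java_home_from_binary() {
    home=${1%/bin/java}
    home=${home%/jre}
    printf '%s\n' "$home"
}

if command -v java >/dev/null 2>&1; then
    # readlink -f follows the symlink chain from PATH to the real binary.
    java_home_from_binary "$(readlink -f "$(command -v java)")"
fi
```

Compare the printed directory with the path you put in hadoop-env.sh before starting the cluster.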
c. Edit core-site.xml file
Edit the configuration file core-site.xml (located in HADOOP_HOME/etc/hadoop) by using the following command:
dataflair@ubuntu:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano core-site.xml
Add the entries below between the <configuration> and </configuration> tags at the end of this file:
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/dataflair/hdata</value>
</property>
*Note:* “/home/dataflair/hdata” is my location; please specify a location where you have read and write privileges.
After adding the above parameters we need to save this file. To save it, press “Ctrl+X”, then “Y” and “Enter” to confirm.
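As an alternative to editing by hand, the same two properties can be written from a script, which also creates the hadoop.tmp.dir location. This is a sketch only: `write_core_site` is a hypothetical helper name, it overwrites any existing core-site.xml, and it assumes the standard Hadoop layout of `<property>` elements with `<name>` and `<value>` children inside `<configuration>`.

```shell
#!/bin/sh
# write_core_site CONF_DIR TMP_DIR
# Create TMP_DIR and write CONF_DIR/core-site.xml with fs.defaultFS and
# hadoop.tmp.dir. Overwrites an existing file.
# (write_core_site is a hypothetical helper introduced for this sketch.)
write_core_site() {
    mkdir -p "$1" "$2" || return 1
    cat > "$1/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$2</value>
  </property>
</configuration>
EOF
}

# Example call with the tutorial's paths:
# write_core_site "$HADOOP_PREFIX/etc/hadoop" /home/dataflair/hdata
```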
d. Edit hdfs-site.xml file
Edit the configuration file hdfs-site.xml (located in HADOOP_HOME/etc/hadoop) by using the following command:
dataflair@ubuntu:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano hdfs-site.xml
Add the entries below between the <configuration> and </configuration> tags at the end of this file:
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
After adding the above parameters we need to save this file. To save it, press “Ctrl+X”, then “Y” and “Enter” to confirm.
e. Edit mapred-site.xml file
Edit the configuration file mapred-site.xml (located in HADOOP_HOME/etc/hadoop).
*Note: The configuration directory does not contain a file named mapred-site.xml. It contains a template file, mapred-site.xml.template, so before you can edit mapred-site.xml you first have to create it as a copy of mapred-site.xml.template.*
To make a copy of this file use the following command:
dataflair@ubuntu:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ cp mapred-site.xml.template mapred-site.xml
Now edit the file mapred-site.xml by using the following command:
dataflair@ubuntu:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano mapred-site.xml
Now add the entries below between the <configuration> and </configuration> tags at the end of this file:
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
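Repeating the `cp` step after you have edited mapred-site.xml would overwrite your changes with the template. A minimal sketch of a guarded copy, assuming a POSIX shell; `ensure_from_template` is a hypothetical helper name.

```shell
#!/bin/sh
# ensure_from_template CONF_DIR NAME
# Copy NAME.template to NAME inside CONF_DIR unless NAME already exists.
# (ensure_from_template is a hypothetical helper introduced for this sketch.)
ensure_from_template() {
    if [ ! -f "$1/$2" ]; then
        cp "$1/$2.template" "$1/$2"
    fi
}

# Example call with the tutorial's layout:
# ensure_from_template "$HADOOP_PREFIX/etc/hadoop" mapred-site.xml
```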
f. Edit yarn-site.xml file
Edit the configuration file yarn-site.xml (located in HADOOP_HOME/etc/hadoop) by using the following command:
dataflair@ubuntu:~/hadoop-2.5.0-cdh5.3.2/etc/hadoop$ nano yarn-site.xml
Add the entries below between the <configuration> and </configuration> tags at the end of this file:
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
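Before starting the cluster, it can help to confirm that every property from the four configuration files is actually present. The sketch below does a plain text search for each `<name>` element; it does not validate the XML or the values. `has_property` and `check_hadoop_conf` are hypothetical helper names.

```shell
#!/bin/sh
# has_property FILE NAME: succeed if FILE contains <name>NAME</name>.
has_property() {
    grep -qF -- "<name>$2</name>" "$1"
}

# check_hadoop_conf CONF_DIR: print one line per expected property
# that is missing (a missing file counts as a missing property).
check_hadoop_conf() {
    while read -r file name; do
        has_property "$1/$file" "$name" 2>/dev/null || echo "missing: $name in $file"
    done <<EOF
core-site.xml fs.defaultFS
core-site.xml hadoop.tmp.dir
hdfs-site.xml dfs.replication
mapred-site.xml mapreduce.framework.name
yarn-site.xml yarn.nodemanager.aux-services
yarn-site.xml yarn.nodemanager.aux-services.mapreduce.shuffle.class
EOF
}

# Example call: check_hadoop_conf "$HADOOP_PREFIX/etc/hadoop"
```

No output means all six properties were found.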
IV. Start the Cluster
a. Format the name node
dataflair@ubuntu:~$ hdfs namenode -format
NOTE: Do this only once, right after you have installed Hadoop; formatting again will delete all your data from HDFS.
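To avoid formatting twice by accident, a script can first look for existing NameNode metadata. This sketch assumes the default NameNode directory, `dfs/name` under hadoop.tmp.dir, where a completed format leaves a `current/VERSION` file; if you set dfs.namenode.name.dir yourself, adjust the path. `namenode_is_formatted` and the `HDATA` variable are hypothetical names.

```shell
#!/bin/sh
# namenode_is_formatted HADOOP_TMP_DIR
# Succeed if a formatted NameNode directory already exists under HADOOP_TMP_DIR.
# (namenode_is_formatted is a hypothetical helper introduced for this sketch.)
namenode_is_formatted() {
    [ -f "$1/dfs/name/current/VERSION" ]
}

# HDATA should match hadoop.tmp.dir from core-site.xml (tutorial's example path).
HDATA="${HDATA:-/home/dataflair/hdata}"
if namenode_is_formatted "$HDATA"; then
    echo "NameNode already formatted; skipping format to keep HDFS data"
else
    echo "NameNode not formatted yet; run: hdfs namenode -format"
fi
```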
b. Start HDFS Services
dataflair@ubuntu:~$ start-dfs.sh
c. Start YARN Services
dataflair@ubuntu:~$ start-yarn.sh
d. Check running Hadoop services
dataflair@ubuntu:~$ jps
NameNode
DataNode
ResourceManager
NodeManager
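The jps check can be automated by scanning its output for the four daemon names listed in this step. A minimal sketch, assuming a POSIX shell with a `grep` that supports `-w`; `missing_daemons` is a hypothetical helper name.

```shell
#!/bin/sh
# missing_daemons: read jps output on stdin and print each expected
# daemon name that does not appear in it as a whole word.
# (missing_daemons is a hypothetical helper introduced for this sketch.)
missing_daemons() {
    out=$(cat)
    for d in NameNode DataNode ResourceManager NodeManager; do
        printf '%s\n' "$out" | grep -qw -- "$d" || printf '%s\n' "$d"
    done
}

# Usage on a running system: jps | missing_daemons
```

An empty result means all four services are up. Whole-word matching keeps a line such as SecondaryNameNode from being counted as NameNode.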
See Also-
Install & Configure Apache Hadoop 2.7.x on Ubuntu
Install Hadoop 1.x on multi-node cluster
Top 10 Useful HDFS Commands Part-I
Comparison between Hadoop 2.x vs Hadoop 3.x
We hope this tutorial on installing Hadoop 2 on Ubuntu was helpful. If you run into any difficulty while installing Hadoop 2 on Ubuntu, just drop a comment and our support team will help you out.
https://data-flair.training/blogs/install-hadoop-2-on-ubuntu