I. Configuring the Java environment variables on Ubuntu
I downloaded jdk1.8.0_151, the latest release at the time, from the official site: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
(Since my machine runs 64-bit Linux, I chose jdk-8u151-linux-x64.tar.gz.)
1. Edit /etc/profile and append the following:
vim /etc/profile
# My Java root directory is /java
export JAVA_HOME=/java/jdk1.8.0_151
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
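After saving /etc/profile, the variables can be sanity-checked like this (a sketch; it assumes the JDK really was unpacked to /java/jdk1.8.0_151 as above):

```shell
# Load the new profile into the current shell.
source /etc/profile
echo "JAVA_HOME=$JAVA_HOME"
# Confirm the java binary is where we expect it before relying on it.
if [ -x "$JAVA_HOME/bin/java" ]; then
  "$JAVA_HOME/bin/java" -version
else
  echo "JAVA_HOME does not point at a JDK"
fi
```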
II. Installing ssh-server and setting up passwordless login
1. Install openssh-server on Ubuntu:
sudo apt-get install openssh-server
2. Start the ssh server:
sudo /etc/init.d/ssh start
You should see output like:
[ ok ] Starting ssh (via systemctl): ssh.service.
Check whether the ssh server is running:
ps -ef | grep ssh
If the output looks like this:
root 1073 1 0 13:05 ? 00:00:00 /usr/sbin/sshd -D
root 6799 2245 0 14:02 pts/19 00:00:00 grep --color=auto ssh
then the ssh server started successfully.
3. Set up passwordless ssh login
Run the following command and press Enter at every prompt until the RSA key pair has been generated:
ssh-keygen -t rsa
Append the public key to authorized_keys:
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Test whether passwordless login now works:
ssh localhost
If you see a welcome banner like the following, it does:
Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-28-generic x86_64)
* Documentation: https://help.ubuntu.com
* Management: https://landscape.canonical.com
* Support: https://ubuntu.com/advantage
packages can be updated.
updates are security updates.
Last login: Sat Oct 28 15:05:51 2017 from 127.0.0.1
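The key setup above can also be scripted end to end. The snippet below uses a throwaway directory /tmp/demo_ssh (a hypothetical path, for illustration only) instead of ~/.ssh, and tightens the permissions that sshd checks before it will accept a key:

```shell
# Generate an RSA key pair non-interactively (-N "" = empty passphrase),
# append the public key to authorized_keys, and tighten permissions --
# sshd refuses keys kept in group- or world-writable files.
KEYDIR=/tmp/demo_ssh                     # use ~/.ssh for a real login setup
mkdir -p "$KEYDIR" && chmod 700 "$KEYDIR"
ssh-keygen -t rsa -N "" -f "$KEYDIR/id_rsa" -q
cat "$KEYDIR/id_rsa.pub" >> "$KEYDIR/authorized_keys"
chmod 600 "$KEYDIR/authorized_keys"
echo "key pair ready in $KEYDIR"
```

If `ssh localhost` still prompts for a password after this, wrong permissions on ~/.ssh or authorized_keys are the usual cause.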
Disable the firewall:
sudo ufw disable
III. Installing Hadoop in standalone and pseudo-distributed mode
1. Download hadoop-2.7.3.tar.gz and extract it to /usr/local (standalone setup):
Download site: http://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/?C=S;O=A
Switch to /usr/local and rename hadoop-2.7.3 to hadoop:
cd /usr/local
sudo mv hadoop-2.7.3 hadoop
Change the permissions of /usr/local/hadoop (recursively, so the whole tree is writable):
sudo chmod -R 777 /usr/local/hadoop
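For reference, the download-and-install steps might be scripted as follows (a sketch; it assumes the tarball was saved to ~/Downloads, and uses chown to the current user as a less blunt alternative to chmod 777):

```shell
# Unpack the Hadoop tarball into /usr/local and rename the directory.
sudo tar -xzf ~/Downloads/hadoop-2.7.3.tar.gz -C /usr/local
sudo mv /usr/local/hadoop-2.7.3 /usr/local/hadoop
# Give the current user ownership instead of opening it up to everyone.
sudo chown -R "$USER:$USER" /usr/local/hadoop
```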
2. Configure the .bashrc file
Open ~/.bashrc, append the following at the end, then save:
#HADOOP VARIABLES START
export HADOOP_INSTALL=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib/native"
export PATH=$PATH:$HADOOP_INSTALL/sbin
export PATH=$PATH:$HADOOP_INSTALL/bin
#HADOOP VARIABLES END
Run the following command to make the new variables take effect:
source ~/.bashrc
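A quick way to confirm the new variables resolve after the source step (the two exports are repeated inline here so the check runs standalone; on a real install `source ~/.bashrc` provides them):

```shell
# These lines mirror what ~/.bashrc now exports.
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/sbin:$HADOOP_INSTALL/bin
# Check that the Hadoop bin directory actually made it onto PATH.
case ":$PATH:" in
  *":$HADOOP_INSTALL/bin:"*) echo "hadoop bin is on PATH" ;;
  *)                         echo "hadoop bin is MISSING from PATH" ;;
esac
```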
Hadoop configuration (pseudo-distributed setup)
1. Configure hadoop-env.sh:
sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Add the following to the file:
# The Java implementation to use.
export JAVA_HOME=/java/jdk1.8.0_151
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"
Configure yarn-env.sh:
sudo vim /usr/local/hadoop/etc/hadoop/yarn-env.sh
Add the following at the end of the file:
# the Java implementation to use
export JAVA_HOME=/java/jdk1.8.0_151
Configure core-site.xml. First create the /usr/local/hadoop/tmp directory, then edit the file:
sudo mkdir /usr/local/hadoop/tmp
sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml
Add the following properties inside the <configuration> element:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop4:9000</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hdfs/name</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.</description>
  </property>
</configuration>
(Strictly speaking, the dfs.* properties are HDFS settings that belong in hdfs-site.xml; they also take effect here because the daemons load both files into one configuration.)
Configure hdfs-site.xml:
sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
Add the following properties inside the <configuration> element:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/local/hadoop/data</value>
  </property>
</configuration>
Configure mapred-site.xml (the file does not exist by default, so copy it from the bundled template first):
cd /usr/local/hadoop/etc/hadoop
sudo cp mapred-site.xml.template mapred-site.xml
sudo vim /usr/local/hadoop/etc/hadoop/mapred-site.xml
Add the following properties inside the <configuration> element:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Configure yarn-site.xml:
sudo vim /usr/local/hadoop/etc/hadoop/yarn-site.xml
Add the following properties inside the <configuration> element:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>hadoop4:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>hadoop4:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>hadoop4:8031</value>
  </property>
</configuration>
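A stray angle bracket in any of these files prevents the daemons from starting, so it is worth validating the XML before continuing. A sketch (it writes a minimal core-site.xml to /tmp for demonstration; in practice, point the parser at the real files under /usr/local/hadoop/etc/hadoop):

```shell
# Write a minimal core-site.xml and confirm it is well-formed XML.
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop4:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
EOF
# ElementTree raises an error (non-zero exit) on malformed XML.
python3 -c "import xml.etree.ElementTree as ET; ET.parse('/tmp/core-site.xml'); print('well-formed')"
```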
Reboot the system so that all of the changes take effect:
sudo reboot
Test whether Hadoop is installed and configured correctly
Verify the standalone installation:
hadoop version
If Hadoop version information appears, the standalone installation succeeded. For example:
Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.3.jar
Start HDFS in pseudo-distributed mode
Format the namenode:
hdfs namenode -format
If the output contains "... has been successfully formatted", the format succeeded.
Start the daemons:
start-dfs.sh
start-yarn.sh
(start-dfs.sh starts only the HDFS daemons; start-yarn.sh is also needed so that ResourceManager and NodeManager come up.)
List the running Java processes:
jps
If the following six entries appear, everything is up:
ResourceManager
Jps
DataNode
SecondaryNameNode
NameNode
NodeManager
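The jps check can be automated. The loop below is shown against a hard-coded sample so it runs anywhere; on a live node, replace the sample with out="$(jps)":

```shell
# Sample jps output standing in for a live cluster (PIDs are made up).
out='2312 NameNode
2458 DataNode
2643 SecondaryNameNode
2801 ResourceManager
2934 NodeManager
3101 Jps'
# Report any of the five expected daemons that is missing.
missing=0
for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
  echo "$out" | grep -qw "$d" || { echo "$d is not running"; missing=1; }
done
[ "$missing" -eq 0 ] && echo "all daemons up"
```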
Open http://localhost:50070/ in a browser to check the HDFS web UI.
Open http://localhost:8088/ to check the YARN web UI and confirm that the pseudo-distributed setup works.
Stop all daemons:
stop-all.sh
Running wordcount
Start the daemons:
start-all.sh
List the files and directories at the root of HDFS (hdfs dfs is the current form; hadoop dfs is deprecated):
hdfs dfs -ls /
If this is the first time HDFS has been used, nothing will be listed.
Create a directory /input in HDFS and upload /usr/local/hadoop/README.txt into it:
hdfs dfs -mkdir /input
hadoop fs -put /usr/local/hadoop/README.txt /input
Run wordcount with the following command, writing the results to /output:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output
After a successful run, the /output directory contains two files: _SUCCESS, an empty marker indicating success, and part-r-00000, which holds the results. View them with:
hadoop fs -cat /output/part-r-00000
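What the job computes can be previewed without a cluster: wordcount emits one (word, count) pair per distinct word, which coreutils can emulate on a small sample:

```shell
# Split on whitespace, sort so duplicates are adjacent, then count runs --
# the same (count, word) pairs the MapReduce job writes to part-r-00000.
printf 'hello hadoop\nhello world\n' | tr -s ' ' '\n' | sort | uniq -c > /tmp/wc_demo.txt
cat /tmp/wc_demo.txt
```

In this sample, hello appears twice, so its line reads "2 hello", while hadoop and world each get a count of 1.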