I. Install and configure the Java runtime environment
1. Install OpenJDK 1.8
#Install the JDK
#yum install java-1.8.0-openjdk-devel.x86_64
Note: this is also a chance to try out the yum package manager:
(1) Search for a package (method 1)
#yum list | grep telnet-server
(2) Search for a package (method 2)
#yum search MySQL
(3) Install MySQL (called MariaDB on CentOS 7)
#yum install mariadb-server.x86_64
(4) Install telnet
#yum install telnet.*
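For reference, two query commands confirm what was installed and where the JDK actually landed, which helps when setting JAVA_HOME in the next step (the exact version in the path will differ from system to system):
#yum list installed | grep openjdk #list the installed OpenJDK packages
#readlink -f $(which javac) #resolve the real JDK path, e.g. /usr/lib/jvm/java-1.8.0-openjdk-<version>/bin/javac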
2. Configure the OpenJDK 1.8 runtime environment
#vi ~/.bashrc (or ~/.profile, or /etc/profile)
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre #adjust to the exact JDK version installed (see the readlink output above)
export CLASSPATH=.:$JAVA_HOME/lib/rt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
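A quick sanity check that the variables take effect in a new shell (the exact version string will vary):
#source ~/.bashrc #reload the shell configuration
#echo $JAVA_HOME #should print the JRE path configured above
#java -version #should report openjdk version "1.8.0_191" or similar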
II. Install and configure the Hadoop runtime environment
1. Install Hadoop
Download the Hadoop 3.1.1 binary distribution (hadoop-3.1.1.tar.gz) with curl and extract it to /home/hadoop/hadoop-3.1.1:
#curl -o /home/hadoop/hadoop-3.1.1.tar.gz http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz
#tar -xvf /home/hadoop/hadoop-3.1.1.tar.gz -C /home/hadoop
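Optionally, verify the download and the unpacked distribution before continuing (compare the checksum against the value published on the Apache download page; the version command assumes JAVA_HOME from step I.2 is already set):
#sha256sum /home/hadoop/hadoop-3.1.1.tar.gz #compare with the published checksum
#/home/hadoop/hadoop-3.1.1/bin/hadoop version #should report Hadoop 3.1.1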
2. Configure Hadoop environment variables (optional)
#vi /etc/profile (or /root/.profile, or /root/.bashrc)
export HADOOP_INSTALL=/home/hadoop/hadoop-3.1.1
export PATH=${HADOOP_INSTALL}/bin:${HADOOP_INSTALL}/sbin:${PATH}
export HADOOP_MAPRED_HOME=${HADOOP_INSTALL}
export HADOOP_COMMON_HOME=${HADOOP_INSTALL}
export HADOOP_HDFS_HOME=${HADOOP_INSTALL}
export YARN_HOME=${HADOOP_INSTALL}
export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_INSTALL}/lib/native
export HADOOP_OPTS="-Djava.library.path=${HADOOP_INSTALL}/lib:${HADOOP_INSTALL}/lib/native"
#source /etc/profile
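After sourcing the profile, the Hadoop commands should be on the PATH; a brief check:
#which hdfs #should resolve to /home/hadoop/hadoop-3.1.1/bin/hdfs
#echo $HADOOP_COMMON_HOME #should print /home/hadoop/hadoop-3.1.1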
III. Clone the first virtual machine in the VirtualBox Manager to create the other virtual machines
1. Cloning procedure: omitted.
2. Network configuration: after starting each virtual machine, set its hostname (hn2, hn3) and IP address (hn2: 192.168.56.101, hn3: 192.168.56.102) according to the plan. The hostname is changed by editing /etc/hostname (omitted; a configuration sketch follows the node plan below). Then add the following entries to the local name-resolution file /etc/hosts on every machine:
192.168.56.100 hn1
192.168.56.101 hn2
192.168.56.102 hn3
3. Node plan: hn1 (192.168.56.100): NameNode, Secondary NameNode, ResourceManager
hn2 (192.168.56.101) and hn3 (192.168.56.102): DataNode, NodeManager
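A sketch of the per-machine setup, shown for hn2 and done analogously on hn3 (it assumes the VirtualBox host-only adapter appears as a connection named enp0s8; the connection name may differ on your VMs):
[root@hn2]#hostnamectl set-hostname hn2 #persistently set the hostname (writes /etc/hostname)
[root@hn2]#nmcli con mod enp0s8 ipv4.addresses 192.168.56.101/24 ipv4.method manual #static IP on the host-only adapter
[root@hn2]#nmcli con up enp0s8 #re-activate the connection
[root@hn2]#getent hosts hn1 hn3 #confirm the /etc/hosts entries resolve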
IV. Configure the SSH environment
Run the following three commands on hn1:
#ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
#cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#chmod 0600 ~/.ssh/authorized_keys
Log in to hn2 and hn3 from hn1, one after the other:
[root@hn1]#ssh root@hn2 #log in to hn2
Run the same three SSH setup commands there as on hn1 (see above, omitted).
Note: after entering the password you are switched to hn2; the hostname command shows the current host name.
[root@hn2]#exit #log out of the remote host hn2
[root@hn1]#ssh root@hn3 #log in to hn3
Run the same three SSH setup commands there as on hn1 (see above, omitted).
[root@hn3]#exit #log out of the remote host hn3
[root@hn1]#scp root@hn2:/root/.ssh/id_rsa.pub /root/id_rsa_hn2.pub
[root@hn1]#scp root@hn3:/root/.ssh/id_rsa.pub /root/id_rsa_hn3.pub
[root@hn1]#cat /root/id_rsa_hn2.pub >> /root/.ssh/authorized_keys
[root@hn1]#cat /root/id_rsa_hn3.pub >> /root/.ssh/authorized_keys
[root@hn1]#scp /root/.ssh/authorized_keys root@hn2:/root/.ssh/authorized_keys
[root@hn1]#scp /root/.ssh/authorized_keys root@hn3:/root/.ssh/authorized_keys
At this point all three machines can log in to one another over SSH without a password.
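A quick way to confirm the passwordless logins work in every direction (each command should print the remote hostname without asking for a password):
[root@hn1]#ssh root@hn2 hostname #expect hn2
[root@hn1]#ssh root@hn3 hostname #expect hn3
[root@hn2]#ssh root@hn1 hostname #expect hn1 (run after logging in to hn2; likewise on hn3)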
V. Hadoop configuration files
Reference: the official Hadoop ClusterSetup guide
The main configuration files are core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml and workers, all in the ${HADOOP_HOME}/etc/hadoop directory.
core-site.xml
<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://hn1:9000</value></property>
  <property><name>hadoop.tmp.dir</name><value>/home/hadoop/tmp</value></property>
</configuration>
hdfs-site.xml
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
  <property><name>dfs.namenode.name.dir</name><value>file:///home/hadoop/dfs/name</value></property>
  <property><name>dfs.datanode.data.dir</name><value>file:///home/hadoop/dfs/data</value></property>
  <property><name>dfs.namenode.http-address</name><value>hn1:50070</value></property>
  <property><name>dfs.namenode.secondary.http-address</name><value>hn1:50090</value></property>
  <property><name>dfs.permissions.enabled</name><value>false</value></property>
</configuration>
mapred-site.xml
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>hn1:10020</value></property>
  <property><name>mapreduce.jobhistory.webapp.address</name><value>hn1:19888</value></property>
</configuration>
yarn-site.xml
<configuration>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
  <property><name>yarn.resourcemanager.hostname</name><value>hn1</value></property>
  <property><name>yarn.nodemanager.local-dirs</name><value>file:///home/hadoop/yarn/nm</value></property>
</configuration>
workers (in Hadoop 2.x this was the slaves file)
hn2
hn3
VI. Hadoop start-script configuration
Edit the following scripts in the ${HADOOP_HOME}/sbin directory: start-dfs.sh, stop-dfs.sh, start-yarn.sh, stop-yarn.sh.
1. Add the following environment variables at the top of start-dfs.sh and stop-dfs.sh:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_NAMENODE_SECONDARYNAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
2. Add the following environment variables at the top of start-yarn.sh and stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
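As a variant (not required if the scripts above were already edited), these user variables can instead be exported from ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh, which leaves the stock sbin scripts untouched; a minimal sketch:
#vi ${HADOOP_HOME}/etc/hadoop/hadoop-env.sh
export HDFS_NAMENODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export HDFS_DATANODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root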
VII. Distribute the Hadoop configuration to the other virtual machines
#scp ${HADOOP_HOME}/etc/hadoop/* root@hn2:/home/hadoop/hadoop-3.1.1/etc/hadoop/
#scp ${HADOOP_HOME}/etc/hadoop/* root@hn3:/home/hadoop/hadoop-3.1.1/etc/hadoop/
#scp ${HADOOP_HOME}/sbin/* root@hn2:/home/hadoop/hadoop-3.1.1/sbin/
#scp ${HADOOP_HOME}/sbin/* root@hn3:/home/hadoop/hadoop-3.1.1/sbin/
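A short check from hn1 that the files actually arrived (the grep pattern is just one of the variables added in section VI):
#ssh root@hn2 "cat /home/hadoop/hadoop-3.1.1/etc/hadoop/workers" #should list hn2 and hn3
#ssh root@hn3 "grep -c HDFS_NAMENODE_USER /home/hadoop/hadoop-3.1.1/sbin/start-dfs.sh" #should print at least 1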
VIII. Start Hadoop and manage files
1. File management (format the NameNode once before the first start; the hdfs dfs commands below only work after HDFS is running, see step 2)
#${HADOOP_HOME}/bin/hdfs namenode -format #must be run once before the first start of Hadoop
#${HADOOP_HOME}/bin/hdfs dfs -chmod ugo+rwx / #grant read/write/execute on the root directory to all users; adjust to your actual requirements
#${HADOOP_HOME}/bin/hdfs dfs -mkdir /user #create the directory /user under the root
#${HADOOP_HOME}/bin/hdfs dfs -rm -f -r /user #delete the directory /user
#${HADOOP_HOME}/bin/hdfs dfs -rm -f -r /user/input #delete the directory /user/input
#${HADOOP_HOME}/bin/hdfs dfs -mkdir -p /user/input #create /user/input before uploading
#${HADOOP_HOME}/bin/hdfs dfs -put ${HADOOP_HOME}/etc/hadoop/*.xml /user/input/ #upload files
2. Start the cluster
${HADOOP_HOME}/sbin/start-all.sh
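Once start-all.sh finishes, jps (shipped with the JDK) should show the daemons from the node plan in section III, roughly:
[root@hn1]#jps #expect NameNode, SecondaryNameNode and ResourceManager (plus Jps)
[root@hn2]#jps #expect DataNode and NodeManager (plus Jps); likewise on hn3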
IX. Web UIs
On Windows 7, open Firefox (IE reports JavaScript errors) and visit the YARN ResourceManager UI at http://192.168.56.100:8088
HDFS UI: http://192.168.56.100:50070/
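As a final end-to-end check (a sketch; it assumes /user/input was populated as in section VIII and that /user/output does not exist yet), the example job bundled with the distribution can be run and its result inspected:
#${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /user/input /user/output
#${HADOOP_HOME}/bin/hdfs dfs -cat /user/output/part-r-00000 #word counts computed from the uploaded XML files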