Building a Hadoop 3.2.0 Cluster with Docker


1. Install a CentOS image in Docker
The command for pulling an image from a Docker registry is docker pull. Its format is:
docker pull [OPTIONS] [registry-address[:port]/]repository[:tag]
You can pull the image directly with docker pull centos:7.
Once the download finishes, list your local images with docker image ls:
[hadoop@localhost ~]$ docker image ls
REPOSITORY TAG IMAGE ID CREATED SIZE
docker.io/centos latest 2d194b392dd1 3 weeks ago 195 MB
docker.io/hello-world latest f2a91732366c 4 months ago 1.85 kB
One is the centos image; the other is the image we downloaded earlier with docker run hello-world.
An image and a container relate much like a class and an instance in object-oriented programming: the image is the static definition, and the container is a running instance of that image. Containers can be created, started, stopped, deleted, paused, and so on.
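As a minimal sketch of that lifecycle (the container name demo is illustrative):
docker create --name demo centos:7   # create a container from the image without starting it
docker start demo                     # start the container
docker stop demo                      # stop it
docker rm demo                        # delete the container; the centos:7 image itself is untouched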
2. Run a container
With the image in place, we can start and run a container based on it.
[hadoop@localhost ~]$ docker run -it --rm centos bash
[root@58f67e873eb9 /]# cat /etc/os-release
NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"
CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"
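Note that --rm deletes the container the moment you exit it. That is convenient for a quick look around, but the cluster build below relies on docker commit to snapshot a container's state, so start the working container without --rm (the name hadoop-base is illustrative):
docker run -it --name hadoop-base centos:7 bash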
 
3. Install Java
Before installing, check whether the system already ships with OpenJDK:
rpm -qa | grep java
rpm -qa | grep jdk
rpm -qa | grep gcj
If these commands print nothing, no JDK is installed. If one is installed, you can remove every Java-related package in one pass (the grep keyword is java):
rpm -qa | grep java | xargs rpm -e --nodeps
First, list the available packages whose names contain java:
yum list java*
Narrow the list to 1.8:
yum list java-1.8*
Install all of the 1.8.0 OpenJDK packages:
yum install java-1.8.0-openjdk* -y
Verify the installation:
java -version
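If the install succeeded, the output should look roughly like the following (exact build numbers will differ):
openjdk version "1.8.0_201"
OpenJDK Runtime Environment (build 1.8.0_201-b09)
OpenJDK 64-Bit Server VM (build 25.201-b09, mixed mode)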
Alternatively, download the Oracle JDK with wget:
wget --no-cookies --no-check-certificate --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com%2F; oraclelicense=accept-securebackup-cookie" "https://download.oracle.com/otn-pub/java/jdk/8u201-b09/42970487e3af4f5aa5bca3f542482c60/jdk-8u201-linux-x64.tar.gz"
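The environment variables configured below assume the JDK sits at /usr/java/jdk1.8.0_201, so unpack the downloaded tarball there (a sketch):
mkdir -p /usr/java
tar -zxvf jdk-8u201-linux-x64.tar.gz -C /usr/java   # extracts to /usr/java/jdk1.8.0_201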
4. Download Hadoop 3.2.0
mkdir /usr/hadoop/
cd /usr/hadoop/
wget https://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-3.2.0/hadoop-3.2.0.tar.gz
tar -zxvf hadoop-3.2.0.tar.gz
5. Configure environment variables
vim /etc/profile
Append the following:
#JAVA VARIABLES START
export JAVA_HOME=/usr/java/jdk1.8.0_201
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
#JAVA VARIABLES END
 
#HADOOP VARIABLES START
export HADOOP_HOME=/usr/hadoop/hadoop-3.2.0
#export HADOOP_INSTALL=$HADOOP_HOME
#export HADOOP_MAPRED_HOME=$HADOOP_HOME
#export HADOOP_COMMON_HOME=$HADOOP_HOME
#export HADOOP_HDFS_HOME=$HADOOP_HOME
#export YARN_HOME=$HADOOP_HOME
#export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$HADOOP_HOME/sbin:$HADOOP_HOME/bin:$PATH
#export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
#HADOOP VARIABLES END 
Apply the changes: source /etc/profile
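A quick check that the variables took effect (expected values follow from the exports above):
echo $JAVA_HOME     # expect /usr/java/jdk1.8.0_201
echo $HADOOP_HOME   # expect /usr/hadoop/hadoop-3.2.0
hadoop version      # first line should read: Hadoop 3.2.0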
 
Save the container's updates as a new image:
docker commit 6ebd4423e2de hadoop-master
6. Configure Hadoop
Edit the Hadoop configuration files under /usr/hadoop/hadoop-3.2.0/etc/hadoop/.
1) core-site.xml
 
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/hadoop-3.2.0/tmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
    <final>true</final>
  </property>
</configuration>
(fs.default.name is deprecated in Hadoop 3 in favor of fs.defaultFS, but is still accepted.)
 
2) hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/usr/hadoop/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/usr/hadoop/datanode</value>
  </property>
</configuration>
 
3) mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
</configuration>
(mapred.job.tracker is a legacy MRv1 property; on Hadoop 3, MapReduce jobs normally run on YARN, selected via mapreduce.framework.name=yarn.)
 
4) Set JAVA_HOME for Hadoop
vim /usr/hadoop/hadoop-3.2.0/etc/hadoop/hadoop-env.sh
Set: export JAVA_HOME=/usr/java/jdk1.8.0_201
 
5) Format the NameNode
cd /usr/hadoop/hadoop-3.2.0/bin
hdfs namenode -format   # the older "hadoop namenode -format" still works but is deprecated
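On success, the format log should end with a line roughly like (path per dfs.namenode.name.dir above):
INFO common.Storage: Storage directory /usr/hadoop/namenode has been successfully formatted.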
 
Install SSH
Check whether it is already installed: rpm -qa | grep ssh
Install it: yum install openssh*
Setting up passwordless SSH login on CentOS 7:
1. Generate an RSA key pair (this produces both id_rsa and id_rsa.pub):
ssh-keygen -t rsa

2. Append the public key to the authorized-keys file (both live in ~/.ssh):
cat id_rsa.pub >> authorized_keys
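sshd refuses keys with loose permissions, so tighten them and test the loopback login (a sketch):
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost   # should log in without a password prompt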
 
Save the container as a new image:
docker commit b243b3926f0a hadoop-basic
Now create the master, slave1, and slave2 containers (master from the hadoop-master image committed earlier, the slaves from hadoop-basic):
docker run -p 50070:50070 -p 19888:19888 -p 8088:8088 --name master -ti -h master hadoop-master
docker run -it -h slave1 --name slave1 hadoop-basic /bin/bash
docker run -it -h slave2 --name slave2 hadoop-basic /bin/bash
(Note: in Hadoop 3 the NameNode web UI defaults to port 9870 rather than 50070.)
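Before starting the daemons, every node must resolve the others' hostnames, and the master must know its workers. A sketch, assuming the containers received the default bridge addresses 172.17.0.2-4 (check the real ones with docker inspect):

# on every container: map hostnames to container IPs
cat >> /etc/hosts <<EOF
172.17.0.2 master
172.17.0.3 slave1
172.17.0.4 slave2
EOF

# on master only: Hadoop 3 reads worker hostnames from etc/hadoop/workers (the old "slaves" file)
cat > /usr/hadoop/hadoop-3.2.0/etc/hadoop/workers <<EOF
slave1
slave2
EOF

# start HDFS and YARN from the master
start-dfs.sh
start-yarn.sh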
 
Check that the DataNodes came up correctly: hdfs dfsadmin -report
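With both slaves registered, the report should include a line like:
Live datanodes (2):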
 
Troubleshooting

Problem 1:
Starting namenodes on [localhost]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [bogon]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

Fix 1:
$ vim sbin/start-dfs.sh
$ vim sbin/stop-dfs.sh
Add the following to both files:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
Fix 2:
$ vim sbin/start-yarn.sh
$ vim sbin/stop-yarn.sh
Add the following to both files:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
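An alternative that avoids patching the sbin scripts is to declare these users once in etc/hadoop/hadoop-env.sh, which the start scripts source (a sketch with the same root/yarn choices as above):
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root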
 
Problem 2:
localhost: ssh: connect to host localhost port 22: Cannot assign requested address
cd /etc/ssh
vim sshd_config
Add (or uncomment) the line Port 22, then restart sshd.
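Containers usually lack systemd, so restart sshd by hand (note the absolute path, which also sidesteps problem 4 below):
pkill sshd        # stop any running daemon; ignore the error if none exists
/usr/sbin/sshd    # sshd must be launched via an absolute path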
Problem 3:
Failed to get D-Bus connection: Operation not permitted
Fix: run the container privileged, with systemd as PID 1:
docker run --privileged -ti -e "container=docker" -v /sys/fs/cgroup:/sys/fs/cgroup hadoop-master /usr/sbin/init
 
Problem 4:
sshd re-exec requires execution with an absolute path
This error appears when starting the sshd service. Starting it via an absolute path then fails with:
Could not load host key: /etc/ssh/ssh_host_key
Could not load host key: /etc/ssh/ssh_host_rsa_key
Could not load host key: /etc/ssh/ssh_host_dsa_key
Disabling protocol version 1. Could not load host key
Disabling protocol version 2. Could not load host key
sshd: no hostkeys available -- exiting.
Fix: generate the missing host keys, then start sshd:
#ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
#ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
#/usr/sbin/sshd
sshd then still complains about two more key types:
Could not load host key: /etc/ssh/ssh_host_ecdsa_key
Could not load host key: /etc/ssh/ssh_host_ed25519_key
Fix: generate those as well, with the matching key types:
#ssh-keygen -t ecdsa -f /etc/ssh/ssh_host_ecdsa_key
#ssh-keygen -t ed25519 -f /etc/ssh/ssh_host_ed25519_key
#/usr/sbin/sshd
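A shorter route is ssh-keygen's -A flag, which generates every missing default host key in one pass:
ssh-keygen -A    # creates all missing host keys under /etc/ssh
/usr/sbin/sshd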
 
Problem 5:
WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.
Starting namenodes on [master]
master: /usr/hadoop/hadoop-3.2.0/libexec/hadoop-functions.sh: line 982: ssh: command not found
Starting datanodes
Last login: Mon Jan 28 08:32:32 UTC 2019 on pts/0
localhost: /usr/hadoop/hadoop-3.2.0/libexec/hadoop-functions.sh: line 982: ssh: command not found
Starting secondary namenodes [b982e2adc393]
Last login: Mon Jan 28 08:32:33 UTC 2019 on pts/0
b982e2adc393: /usr/hadoop/hadoop-3.2.0/libexec/hadoop-functions.sh: line 982: ssh: command not found
Starting resourcemanager
Last login: Mon Jan 28 08:32:35 UTC 2019 on pts/0
Starting nodemanagers
Last login: Mon Jan 28 08:32:42 UTC 2019 on pts/0
localhost: /usr/hadoop/hadoop-3.2.0/libexec/hadoop-functions.sh: line 982: ssh: command not found
 
Fix:
$ vim sbin/start-dfs.sh
$ vim sbin/stop-dfs.sh
Replace HADOOP_SECURE_DN_USER=hdfs with HDFS_DATANODE_SECURE_USER=hdfs (the warning itself names the replacement variable).
The "ssh: command not found" errors have a different cause: the CentOS image ships the SSH server by default, but not the client.
Check what is installed:
# rpm -qa | grep openssh
openssh-5.3p1-123.el6_9.x86_64
openssh-server-5.3p1-123.el6_9.x86_64
openssh-clients is missing, so install it with yum:
yum -y install openssh-clients
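Afterwards, confirm the client exists and retry the start scripts:
which ssh    # should print /usr/bin/ssh
start-dfs.sh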
 
 
Problem 6: Failed to get D-Bus connection: Operation not permitted (see problem 3 above).
Problem 7: docker: Error response from daemon: cgroups: cannot find cgroup mount destination: unknown.
No specific fix was found for problem 7; access was restored after rebooting the host.
 
Problem 8: Datanode denied communication with namenode because hostname cannot be resolved
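The original post leaves this one unresolved. Two remedies commonly used for it (suggestions, not the author's fix): add every node's hostname to /etc/hosts on the NameNode so reverse resolution works, or relax the check in hdfs-site.xml:
<property>
  <name>dfs.namenode.datanode.registration.ip-hostname-check</name>
  <value>false</value>
</property>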
 

 

Reposted from: https://my.oschina.net/u/3896892/blog/3007374
