Setting Up a Hadoop Cluster with Docker

1. Pull the image

lcc@lcc ~$ docker pull registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop
Using default tag: latest
# Check the image
lcc@lcc ~$ docker images
REPOSITORY                                       TAG                 IMAGE ID            CREATED             SIZE
registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop   latest              44c31aee79de        2 years ago         927MB
lcc@lcc ~$

2. Create the containers

With the image in place, create three containers from it: one Master container to act as the Hadoop cluster's NameNode, and two Slave containers to act as DataNodes.
Create the NameNode (Master):

lcc@lcc ~$ docker run -i -t --name Master -h Master registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop /bin/bash
root@Master:/#
lcc@lcc ~$ docker ps
CONTAINER ID        IMAGE                                            COMMAND                  CREATED             STATUS              PORTS                                                                                                                                        NAMES
9fa16e92911e        registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop   "/bin/bash"              7 seconds ago       Up 12 seconds       22/tcp, 2122/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 8088/tcp, 19888/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp   Master

As you can see, the image exposes quite a few ports (SSH, HDFS, and YARN among them).
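
These ports are only exposed inside the Docker bridge network; they are not published to the host. If you also want to open the NameNode web UI (port 50070) or the YARN ResourceManager UI (port 8088) from a browser on the host, one option is to publish them when creating the Master container. This is just a sketch and not part of the original steps; adjust the host-side ports as you like:

# Hypothetical variant: publish the web UI ports to the host when creating Master
lcc@lcc ~$ docker run -i -t -p 50070:50070 -p 8088:8088 --name Master -h Master registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop /bin/bash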

Create DataNode 1 (open a new terminal window):

lcc@lcc ~$ docker run -i -t --name Slave1 -h Slave1 registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop /bin/bash
root@Slave1:/# jps

Create DataNode 2 (open a new terminal window):

lcc@lcc ~$ docker run -i -t --name Slave2 -h Slave2 registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop /bin/bash
root@Slave2:/# jps

Final result:

lcc@lcc $ docker ps
CONTAINER ID        IMAGE                                            COMMAND                  CREATED             STATUS              PORTS                                                                                                                                        NAMES
5c0ddf8dbaeb        registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop   "/bin/bash"              2 minutes ago       Up 3 minutes        22/tcp, 2122/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 8088/tcp, 19888/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp   Slave2
7a70a2a67ec0        registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop   "/bin/bash"              3 minutes ago       Up 3 minutes        22/tcp, 2122/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 8088/tcp, 19888/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp   Slave1
9fa16e92911e        registry.cn-hangzhou.aliyuncs.com/kaibb/hadoop   "/bin/bash"              3 minutes ago       Up 4 minutes        22/tcp, 2122/tcp, 8030-8033/tcp, 8040/tcp, 8042/tcp, 8088/tcp, 19888/tcp, 49707/tcp, 50010/tcp, 50020/tcp, 50070/tcp, 50075/tcp, 50090/tcp   Master

3. Environment configuration

Check the environment:

root@Master:/# echo ${PATH}
/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/tools/jdk1.8.0_77/bin:/opt/tools/hadoop/bin:/opt/tools/hadoop/sbin
root@Master:/#
root@Master:/# java -version
java version "1.8.0_77"
Java(TM) SE Runtime Environment (build 1.8.0_77-b03)
Java HotSpot(TM) 64-Bit Server VM (build 25.77-b03, mixed mode)
root@Master:/#

root@Master:/# hostname
Master
root@Master:/#
root@Slave1:/# hostname
Slave1
root@Slave1:/#
root@Slave2:/# hostname
Slave2
root@Slave2:/#

As you can see, PATH already contains the JDK and Hadoop bin/sbin directories, so the java and hadoop commands can be used directly; the Java environment is ready as well.

3.1 Configure passwordless SSH

First, start the SSH service.

root@Master:/# /etc/init.d/ssh start
 * Starting OpenBSD Secure Shell server sshd                                                                                                                                                                                                                                               [ OK ]
root@Master:/#
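
Note that sshd does not come up by itself when a container is started or restarted, so this command has to be repeated after every restart; forgetting it is exactly what causes the "Connection refused" error shown later. As a small convenience you could append the start command to root's .bashrc on each container so it runs whenever you attach; this is only a suggestion, not something the image does for you:

# Convenience only: start sshd on every interactive login (run on all three containers)
root@Master:/# echo '/etc/init.d/ssh start' >> /root/.bashrc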

Then generate a key pair and save the public key into authorized_keys.

root@Master:/# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
/root/.ssh/id_rsa already exists.
Overwrite (y/n)? y
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
df:da:ed:b1:b3:1b:ef:ee:6a:e6:8d:05:fe:d5:3c:66 root@Master
The key's randomart image is:
+---[RSA 2048]----+
|                 |
|                 |
|                 |
|                 |
|        S     .  |
|         . . . o.|
|          . . +E=|
|           o .*Xo|
|          . .=XOB|
+-----------------+
root@Master:/#
root@Master:/# cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
root@Master:/# cat  /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDrIp0gUg7JPuLZaLUtAr8UvQqdWyhNkazp+Vpas+QlwmyQY6UZpmpQG8YF6RycBaoq7sq9A0SzHyrsRGwF1L7QoL5hwt6Bm83CbXkJvXhn9lRqGJ+6fVJ7O04gNWpYkzrW1rmGd7Lt3aRjNfG3lnmFe3/Li8llLlU/wZhJ3bcIMVQRu361HQMqUICBpTG/1LpCOQXLpuamp4cDc+Jh2Tvz7JS3HX6OHRDKPGhPLHSINeRaRL14RE9r+amB0IhgkhkNBMNNfxieAPs2TO/b3V70mWVDPHKsBUsVnScjDY9KmhO7sMbgbVQPbCayoJq44Xkq5323cs0nKHwVwF09hrUr root@Master
root@Master:/#

Next, do the same on the two Slave nodes, then make sure each node's public key is saved in every node's authorized_keys.

DataNode 1 (Slave1):

root@Slave1:/# ssh-keygen -t rsa
root@Slave1:/# cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
root@Slave1:/# cat  /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAFdJXMhswac1BV20jmM5XjoJxZ1lVLTubRiim8+zLcFAvHOPsrG4JfW9XFE+VWnbfKoBuBdzdY90VgeRUnv6IkhxegOhLlJ0nB9WI53VA4s81TgcVkGA/K94aUw48sCGyCpJVEWXN1NLmoEn5ir87A2uxOGfcgZmgWM/P/d+eQDHvFYA3Ix2lUlmDVyELtCDlXzWDJWgjuxFlb+Je2oFUNqeZIVNN8TXZwhP30PZEooIgbZ0QXa6ETQLrQwMK7f6FIvwZ6egwD5n0iNTNE+3NVY0NOu7lL7lIBQrsy7ACa8QD2sTPNnHjLPEzJp/U81aSZ53mwlmCD2WWIYsu4jNn root@Slave1
root@Slave1:/#

DataNode 2 (Slave2):

root@Slave2:/# ssh-keygen -t rsa
root@Slave2:/# cat /root/.ssh/id_rsa.pub > /root/.ssh/authorized_keys
root@Slave2:/# cat  /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDAFdJXMhswac1BV20jmM5XjoJxZ1lVLTubRiim8+zLcFAvHOPsrG4JfW9XFE+VWnbfKoBuBdzdY90VgeRUnv6IkhxegOhLlJ0nB9WI53VA4s81TgcVkGA/K94aUw48sCGyCpJVEWXN1NLmoEn5ir87A2uxOGfcgZmgWM/P/d+eQDHvFYA3Ix2lUlmDVyELtCDlXzWDJWgjuxFlb+Je2oFUNqeZIVNN8TXZwhP30PZEooIgbZ0QXa6ETQLrQwMK7f6FIvwZ6egwD5n0iNTNE+3NVY0NOu7lL7lIBQrsy7ACa8QD2sTPNnHjLPEzJp/U81aSZ53mwlmCD2WWIYsu4jNn root@Slave2
root@Slave2:/#

Give all three machines passwordless access to each other: every node's authorized_keys must contain all three public keys (a host-side alternative using docker cp is sketched after the listing below).

root@Master:/# vi   /root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC1K1POJ35zoeGNR0wnla1T5tmR0ZqudhefiU9nSsW4DrCFTcYaLvw5ldw8YkaQWNFHHr4MIOuR80ehmgiqH4ULfHfIXNPM/RNYNp6NFJFaP+/PJfqa2ojjMHT3Tm4OxQvy374q/hR6mSufnvBG092LugusZwfO/OrPxy2mpo24iymFe8NTECSGS4PvuxQK+pigkmxf3Oy2fy6oUEtC5KYpWfbZAsk5C4ZVVz8+hCg9D9a5KQ9Kukks/3yczPSRx0p4A3YCrIdRfE2B1mUGI/c5yDhIxx7HLxeF9fZjXQCh4lPVar0bME2yPJ/2KUvv19126ETnqyiFJkjH/W19cNIR root@Master
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC7ENTLgDisi2ljJEuZDdyYe8waL2EpaBVqpqDKnSk6sY8sslNQ+Pn7WBBl+dLis7PVRjgykSja6L+7hlJeQVnf0oF8/hQdyh+jBwzgMADZKMpgVX3/XnOdsdKdBHekpzTNrRavLFUSENAym379A+sp8inOyy300sHyQ5RH9ZB29HOZIld+MJlAdUi13hi3UNMB2p/tH7Id3xaiKrMaHKbjnVoGh9/khsaz8l7Zveo6P1+SdZz35dnA8mQkqtObShPNTKWAkMinp85f2P/pqpifxm+sHlK5//jrj6GI/+/D1zMDpF1t/YdshLseCd+MtiGHLccWILwpgoTECWSn+5uH root@Slave1
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQD09mXG7uy+bfliCFJk2i6t3W2sXZsIn5bRBnpNlf3/a0SnKnGyv8d4bAj9xab5RT+k+5Zfc5pLyqqRn7o8jVzctozk+fWI5tkVlgWz2+F+Xl9bm+SW2ZUTevkaov5pmhEjsabOK9y/8D7knTOpfqPV1d+auR8dYYhuz9N9ZJDoJpkNKq5oNU6H3aQHoSDxVkRY0c0e6v9SF5rPvdPKovFD1ixmdxJJyoJ/9fgWzZAb9ZLmbRrI/ySKnezkbGX2JrXQjhdkS2z+OAY7rhcCLDwsELFNGk69nRYMLnsEGNl9eLRWYi2yJA+ryK6ul4PxWyljlt6NlHdE69z3tHT48uof root@Slave2

root@Slave1:/# vi  /root/.ssh/authorized_keys
root@Slave2:/# vi  /root/.ssh/authorized_keys
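
Pasting the three public keys by hand with vi works, but it is easy to make a copy error. An alternative is to assemble a combined authorized_keys on the host with docker cp and push it back into all three containers. This is only a sketch run from the host shell, assuming the containers are named Master, Slave1 and Slave2 as above:

# Collect each container's public key on the host
lcc@lcc ~$ docker cp Master:/root/.ssh/id_rsa.pub master.pub
lcc@lcc ~$ docker cp Slave1:/root/.ssh/id_rsa.pub slave1.pub
lcc@lcc ~$ docker cp Slave2:/root/.ssh/id_rsa.pub slave2.pub
# Merge them and push the combined file back into every container
lcc@lcc ~$ cat master.pub slave1.pub slave2.pub > authorized_keys
lcc@lcc ~$ for c in Master Slave1 Slave2; do docker cp authorized_keys $c:/root/.ssh/authorized_keys; done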

Next, check each node's IP address. If you reach for the familiar ifconfig, you will find the command is missing: the image is minimal enough that it is not included.
You can use ip addr instead, or install net-tools (apt-get install net-tools) to get ifconfig back.

root@Master:/# ip addr
inet 172.17.0.5/16 brd 172.17.255.255 scope global eth0
root@Slave1:/# ip addr
inet 172.17.0.6/16 brd 172.17.255.255 scope global eth0
root@Slave2:/# ip addr
inet 172.17.0.7/16 brd 172.17.255.255 scope global eth0

Then edit /etc/hosts and add each hostname with its IP address, so that ssh can be invoked by hostname:

root@Master:/# vi /etc/hosts
172.17.0.5      Master
172.17.0.6      Slave1
172.17.0.7      Slave2

Add these entries on all three machines.
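
To avoid typing the entries three times, you could also append them with a here-document on each container (a small convenience sketch; the IP addresses are the ones shown above and may differ in your environment):

# Append the host entries in one shot; repeat on Slave1 and Slave2
root@Master:/# cat >> /etc/hosts <<'EOF'
172.17.0.5      Master
172.17.0.6      Slave1
172.17.0.7      Slave2
EOF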

Verification:
If ssh Slave1 connects without asking for a password, the setup succeeded. (The very first time you will be asked to answer yes or no so the host key is remembered; after answering yes you can connect directly, as shown below.)

root@Master:/# ssh Slave1
ssh: connect to host slave1 port 22: Connection refused
Connection refused means sshd has not been started on the target machine (or on this one); start SSH on all of the nodes:
root@Master:/# /etc/init.d/ssh start
 * Starting OpenBSD Secure Shell server sshd
 root@Slave1:/# /etc/init.d/ssh start
 * Starting OpenBSD Secure Shell server sshd                                                                                                                                                  [ OK ]
root@Slave1:/#
root@Slave2:/# /etc/init.d/ssh start
 * Starting OpenBSD Secure Shell server sshd                                                                                                                                                  [ OK ]
root@Slave2:/#

Try ssh again and it works:
root@Master:/# ssh Slave1
The authenticity of host 'slave1 (172.17.0.6)' can't be established.
ECDSA key fingerprint is b4:6a:fd:7d:b6:bc:c2:2e:cd:d3:20:c0:57:6b:3b:2d.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'slave1,172.17.0.6' (ECDSA) to the list of known hosts.
Welcome to Ubuntu 15.04 (GNU/Linux 4.9.87-linuxkit-aufs x86_64)

 * Documentation:  https://help.ubuntu.com/

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

root@Slave1:~# 

4. Configure Hadoop

hadoop-env.sh: set the Java environment. First export HADOOP_HOME so the configuration directory is easier to reference:

root@Master:/# vi .bash_profile
export HADOOP_HOME=/opt/tools/hadoop
root@Master:/# source .bash_profile
root@Master:/# echo $HADOOP_HOME
/opt/tools/hadoop
root@Master:/#

root@Master:~# vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/opt/tools/jdk1.8.0_77
root@Master:~# vi $HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/hadoop/tmp</value>
  </property>
</configuration>

root@Master:~# vi $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/name</value>
  </property>
</configuration>

root@Master:~# vi $HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

root@Master:~# vi $HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>Master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>Master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>Master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>Master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>Master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

root@Master:/opt/tools/hadoop# vi etc/hadoop/slaves
localhost
Master
Slave1
Slave2

In the slaves file it works better to list localhost rather than Master here.

The relevant directories are created automatically during formatting, and the directories in hdfs-site.xml are by default placed under hadoop.tmp.dir from core-site.xml, so the hadoop.tmp.dir directory (along with the data and name directories) can be created in advance; a looped alternative is sketched after the listings below.

root@Master:~# mkdir -p  /hadoop/tmp
root@Master:~# mkdir -p  /hadoop/data
root@Master:~# mkdir -p  /hadoop/name

root@Slave1:/# mkdir -p  /hadoop/tmp
root@Slave1:/# mkdir -p  /hadoop/data
root@Slave1:/# mkdir -p  /hadoop/name

root@Slave2:/# mkdir -p  /hadoop/tmp
root@Slave2:/# mkdir -p  /hadoop/data
root@Slave2:/# mkdir -p  /hadoop/name
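
Since passwordless SSH and the /etc/hosts entries are already in place, the same directories can also be created from Master in a single loop instead of attaching to each container (just a sketch of the same mkdir commands):

# Create the HDFS directories on all three nodes from Master
root@Master:~# for h in Master Slave1 Slave2; do ssh $h "mkdir -p /hadoop/tmp /hadoop/data /hadoop/name"; done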

Then copy these configuration files to each Slave node with scp, overwriting the originals:

root@Master:~# scp $HADOOP_HOME/etc/hadoop/*   Slave1:/opt/tools/hadoop/etc/hadoop/
root@Master:~# scp $HADOOP_HOME/etc/hadoop/*   Slave2:/opt/tools/hadoop/etc/hadoop/

At this point, all the preparation work is done.

5. Run Hadoop

Format the NameNode (hadoop namenode -format still works but is deprecated in Hadoop 2.x; hdfs namenode -format is the preferred form):


root@Master:~# hadoop namenode -format

Start the cluster:

Run start-all.sh; if during startup you are prompted to answer yes or no for the 0.0.0.0 address, type yes:

root@Master:~# cd $HADOOP_HOME
root@Master:/opt/tools/hadoop# sbin/start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [localhost]
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is b4:6a:fd:7d:b6:bc:c2:2e:cd:d3:20:c0:57:6b:3b:2d.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
localhost: starting namenode, logging to /opt/tools/hadoop-2.7.2/logs/hadoop-root-namenode-Master.out
localhost: starting datanode, logging to /opt/tools/hadoop-2.7.2/logs/hadoop-root-datanode-Master.out
Starting secondary namenodes [0.0.0.0]
The authenticity of host '0.0.0.0 (0.0.0.0)' can't be established.
ECDSA key fingerprint is b4:6a:fd:7d:b6:bc:c2:2e:cd:d3:20:c0:57:6b:3b:2d.
Are you sure you want to continue connecting (yes/no)? yes
0.0.0.0: Warning: Permanently added '0.0.0.0' (ECDSA) to the list of known hosts.
0.0.0.0: starting secondarynamenode, logging to /opt/tools/hadoop-2.7.2/logs/hadoop-root-secondarynamenode-Master.out
starting yarn daemons
starting resourcemanager, logging to /opt/tools/hadoop/logs/yarn--resourcemanager-Master.out
localhost: starting nodemanager, logging to /opt/tools/hadoop-2.7.2/logs/yarn-root-nodemanager-Master.out

Check the running processes:

root@Master:/opt/tools/hadoop# jps
12833 SecondaryNameNode
13141 Jps
12998 ResourceManager
13110 NodeManager
12536 NameNode
root@Master:/opt/tools/hadoop#

root@Slave1:/opt/tools/hadoop# jps
2802 Jps
2678 NodeManager
2574 DataNode
root@Slave1:/opt/tools/hadoop#

root@Slave2:/# jps
2517 DataNode
2746 Jps
2621 NodeManager
root@Slave2:/#
root@Master:/opt/tools/hadoop# netstat -ano |grep 50070
tcp        0      0 0.0.0.0:50070           0.0.0.0:*               LISTEN      off (0.00/0/0)
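
Beyond jps, you can check that the DataNodes have actually registered with the NameNode and that a MapReduce job runs end to end. A quick sketch (the examples jar path assumes the stock Hadoop 2.7.x layout under $HADOOP_HOME; adjust it if your build differs):

# Show live DataNodes as seen by the NameNode
root@Master:/opt/tools/hadoop# hdfs dfsadmin -report
# Run the bundled pi example to exercise YARN and MapReduce
root@Master:/opt/tools/hadoop# hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10

The NameNode web UI should also be reachable on port 50070 and the ResourceManager UI on port 8088 (via the Master container's IP, or from the host browser if the ports were published when the container was created).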

Reference:

https://blog.csdn.net/qq_33530388/article/details/72811705
