Create a Hadoop 2.4.1 Cluster with Docker on CentOS 6.5 (host networking)

1. Prepare three CentOS hosts for the PoC

10.28.241.174 shuynh-gecko1
10.28.241.172 shuynh-gecko2
10.28.241.175 shuynh-gecko3

root@shuynh-gecko1:~# cat /etc/redhat-release

2. Get the Hadoop base image on each node

2.1 pull the image on every node

root@shuynh-gecko1:~# docker pull sequenceiq/hadoop-docker:2.4.1

root@shuynh-gecko2:~# docker pull sequenceiq/hadoop-docker:2.4.1

root@shuynh-gecko3:~# docker pull sequenceiq/hadoop-docker:2.4.1
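
The pulls can also be scripted from a single machine — a minimal sketch, assuming passwordless root SSH from one host to all three nodes:

# pull the base image on every node over SSH (assumes root SSH keys are set up)
for h in shuynh-gecko1 shuynh-gecko2 shuynh-gecko3; do
    ssh root@$h "docker pull sequenceiq/hadoop-docker:2.4.1"
done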

2.2 check the image on each node

root@shuynh-gecko1:~# docker images |grep sequenceiq/hadoop-docker |grep 2.4.1
sequenceiq/hadoop-docker           2.4.1               8040f2b27b10        4 weeks ago         854.1 MB


3. Get the docker build scripts on each node

root@shuynh-gecko1:/# git clone https://github.com/jay-lau/hadoop-docker-master-cluster.git
Cloning into 'hadoop-docker-master-cluster'...
remote: Counting objects: 16, done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 16 (delta 1), reused 16 (delta 1)
Unpacking objects: 100% (16/16), done.
Checking connectivity... done.

root@shuynh-gecko2:/# git clone https://github.com/jay-lau/hadoop-docker-master-cluster.git

root@shuynh-gecko3:/# git clone https://github.com/jay-lau/hadoop-docker-master-cluster.git


Note: by default, when the node type ($3) is N, bootstrap.sh starts both the NameNode and a DataNode on the same host. To start only the NameNode, comment out the DataNode/NodeManager start logic in bootstrap.sh, as in the excerpt below (those lines are already commented out here):

if [ "$3" = "N" ] ; then
    echo "starting Hadoop Namenode,resourcemanager,datanode,nodemanager"

    # uncomment to re-format HDFS on every start (this wipes existing data):
    #rm -rf  /tmp/hadoop-root
    #$HADOOP_PREFIX/bin/hdfs namenode -format > /dev/null 2>&1
    $HADOOP_PREFIX/sbin/hadoop-daemon.sh  start namenode > /dev/null 2>&1
    echo "Succeed to start namenode"

    $HADOOP_PREFIX/sbin/yarn-daemon.sh  start resourcemanager > /dev/null 2>&1
    echo "Succeed to start resourcemanager"

    # DataNode/NodeManager start logic, commented out so this node runs only
    # the NameNode and ResourceManager; uncomment to co-locate a DataNode here.
    #$HADOOP_PREFIX/sbin/hadoop-daemon.sh  start datanode > /dev/null 2>&1
    #echo "Succeed to start datanode"

    #$HADOOP_PREFIX/sbin/yarn-daemon.sh  start nodemanager > /dev/null 2>&1
    #echo "Succeed to start nodemanager"

    # "hadoop dfsadmin" is deprecated in favour of "hdfs dfsadmin"; kept as-is
    # to match the upstream script (hence the DEPRECATED warning in 6.1).
    $HADOOP_PREFIX/bin/hadoop dfsadmin -safemode leave
else
    echo "starting Hadoop Datanode,nodemanager"

    rm -rf  /tmp/hadoop-root
    $HADOOP_PREFIX/sbin/hadoop-daemon.sh  start datanode > /dev/null 2>&1
    echo "Succeed to start datanode"

    $HADOOP_PREFIX/sbin/yarn-daemon.sh  start nodemanager > /dev/null 2>&1
    echo "Succeed to start nodemanager"
fi
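
For reference, the image's ENTRYPOINT (Step 21 in the build log below) is /etc/bootstrap.sh, so the arguments passed to docker run in section 6 reach this script as $1..$6. A hypothetical direct invocation with the same positions would look like:

/etc/bootstrap.sh 9000 50010 N 1 -d 10.28.241.174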

4. Build the Hadoop cluster image on each node

# enter the hadoop-docker-master-cluster folder first.

4.1 build the image on shuynh-gecko1

root@shuynh-gecko1:~# cd /root/hadoop-docker-master-cluster
root@shuynh-gecko1:~/hadoop-docker-master-cluster# docker build -t="sequenceiq/hadoop-cluster-docker:2.4.1" .

Sending build context to Docker daemon   149 kB
Sending build context to Docker daemon
Step 0 : FROM sequenceiq/hadoop-docker:2.4.1
 ---> 8040f2b27b10
Step 1 : MAINTAINER SequenceIQ
 ---> Using cache
 ---> 882cff7182a4
Step 2 : USER root
 ---> Using cache
 ---> 408f0a434373
Step 3 : ADD core-site.xml $HADOOP_PREFIX/etc/hadoop/core-site.xml
 ---> 927521fd85ae
Removing intermediate container 7df7dba3d730
Step 4 : ADD hdfs-site.xml $HADOOP_PREFIX/etc/hadoop/hdfs-site.xml
 ---> 949460061b1e
Removing intermediate container e4cb6829fdb9
Step 5 : ADD mapred-site.xml $HADOOP_PREFIX/etc/hadoop/mapred-site.xml
 ---> e268a15c1d3f
Removing intermediate container 5c901152fb30
Step 6 : ADD yarn-site.xml $HADOOP_PREFIX/etc/hadoop/yarn-site.xml
 ---> 284ca37d9857
Removing intermediate container 9c780fc17aa7
Step 7 : ADD slaves $HADOOP_PREFIX/etc/hadoop/slaves
 ---> 1e3a4ffa5632
Removing intermediate container 2094c6c5622f
Step 8 : ADD bootstrap.sh /etc/bootstrap.sh
 ---> b8c32c42b655
Removing intermediate container 0d9616f32157
Step 9 : RUN chown root:root /etc/bootstrap.sh
 ---> Running in 103d2f89a580
 ---> 765f1e58c184
Removing intermediate container 103d2f89a580
Step 10 : RUN chmod 700 /etc/bootstrap.sh
 ---> Running in 5cc86e285299
 ---> 1a4b1dfb615c
Removing intermediate container 5cc86e285299
Step 11 : ENV BOOTSTRAP /etc/bootstrap.sh
 ---> Running in 57e323c93b5f
 ---> a1082f764127
Removing intermediate container 57e323c93b5f
Step 12 : RUN rm -f /etc/ssh/ssh_host_dsa_key
 ---> Running in 6be294648cc9
 ---> a9c5d835c39c
Removing intermediate container 6be294648cc9
Step 13 : RUN rm -f /etc/ssh/ssh_host_rsa_key
 ---> Running in 80f727977a76
 ---> ae19d6e5171d
Removing intermediate container 80f727977a76
Step 14 : RUN rm -f /root/.ssh/id_rsa
 ---> Running in 3fbebc17ee38
 ---> c473e2ed5f6f
Removing intermediate container 3fbebc17ee38
Step 15 : RUN ssh-keygen -q -N "" -t dsa -f /etc/ssh/ssh_host_dsa_key
 ---> Running in 72b62e9a0656
 ---> f7444a1eb624
Removing intermediate container 72b62e9a0656
Step 16 : RUN ssh-keygen -q -N "" -t rsa -f /etc/ssh/ssh_host_rsa_key
 ---> Running in 550b8fb8809d
 ---> 3338f146799a
Removing intermediate container 550b8fb8809d
Step 17 : RUN ssh-keygen -q -N "" -t rsa -f /root/.ssh/id_rsa
 ---> Running in 99d28e7ead76
 ---> d4befb3f8898
Removing intermediate container 99d28e7ead76
Step 18 : RUN cp /root/.ssh/id_rsa.pub /root/.ssh/authorized_keys
 ---> Running in 74be0823aad2
 ---> cf03143c566f
Removing intermediate container 74be0823aad2
Step 19 : EXPOSE 50020 50021 50090 50070 50010 50011 50075 50076 8031 8032 8033 8040 8042 49707 22 8088 8030
 ---> Running in 46c625d45f0d
 ---> 62fce6617879
Removing intermediate container 46c625d45f0d
Step 20 : CMD ["-h"]
 ---> Running in 4677defeb509
 ---> 268426cafb54
Removing intermediate container 4677defeb509
Step 21 : ENTRYPOINT ["/etc/bootstrap.sh"]
 ---> Running in d5d4c1a34868
 ---> b51d46a23ae3
Removing intermediate container d5d4c1a34868
Successfully built b51d46a23ae3

root@shuynh-gecko1:~/hadoop-docker-master-cluster# docker images|grep sequenceiq/hadoop-cluster-docker
sequenceiq/hadoop-cluster-docker   2.4.1               b51d46a23ae3        6 minutes ago       854.1 MB

4.2 build the image on shuynh-gecko2

root@shuynh-gecko2:~# cd /root/hadoop-docker-master-cluster
root@shuynh-gecko2:~/hadoop-docker-master-cluster# docker build -t="sequenceiq/hadoop-cluster-docker:2.4.1" .

4.3 build the image on shuynh-gecko3

root@shuynh-gecko3:~# cd /root/hadoop-docker-master-cluster
root@shuynh-gecko3:~/hadoop-docker-master-cluster# docker build -t="sequenceiq/hadoop-cluster-docker:2.4.1" .
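
To confirm the image was built on all three nodes, a quick check over SSH (same passwordless-SSH assumption as in step 2):

for h in shuynh-gecko1 shuynh-gecko2 shuynh-gecko3; do
    ssh root@$h "docker images | grep hadoop-cluster-docker"
done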

5. Configure the /etc/hosts file on each node

Add the following entries to /etc/hosts on every node:
10.28.241.174 shuynh-gecko1
10.28.241.172 shuynh-gecko2
10.28.241.175 shuynh-gecko3
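
A sketch for appending the entries in one step (run on every node; it does not check for existing duplicate entries):

cat >> /etc/hosts <<'EOF'
10.28.241.174 shuynh-gecko1
10.28.241.172 shuynh-gecko2
10.28.241.175 shuynh-gecko3
EOF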

6. Create Hadoop Cluster

# Start a container
docker run   --net=host  sequenceiq/hadoop-cluster-docker:2.4.1 $1 $2 $3 $4 $5 $6

The parameters are defined as follows:
$1: HDFS port, e.g. 9000
$2: HDFS DataNode port, e.g. 50010
$3: node type, N (NameNode) or D (DataNode)
$4: HDFS replication factor, default 1 (this parameter still needs improvement)
$5: run mode: "-d" to run as a background service, "-bash" to run interactively
$6: master node IP address, e.g. 10.28.241.174

# To run the container interactively, also pass the "-i -t" options to docker run.
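
As a concrete reading of the parameter list, the NameNode run in 6.1 below maps onto the positions like this:

# $1=9001 (HDFS port)  $2=50010 (DataNode port)  $3=N (node type)
# $4=1 (replication)   $5=-bash (interactive)    $6=10.28.241.174 (master IP)
docker run -i -t --net=host sequenceiq/hadoop-cluster-docker:2.4.1 9001 50010 N 1 -bash 10.28.241.174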

6.1 Create NameNode and DataNode (interactive service, using -bash; add -i -t) on shuynh-gecko1:

[root@shuynh-gecko1 ~]# docker stop $(docker ps -a -q)
[root@shuynh-gecko1 ~]# docker rm $(docker ps -a -q)
[root@shuynh-gecko1 ~]# docker run  -i -t --net="host"  sequenceiq/hadoop-cluster-docker:2.4.1 9001 50010 N 1 -bash 10.28.241.174
BOOTSTRAP=/etc/bootstrap.sh
HOSTNAME=shuynh-gecko1
HADOOP_PREFIX=/usr/local/hadoop
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin
PWD=/
JAVA_HOME=/usr/java/default
SHLVL=1
HOME=/
_=/usr/bin/env
/
Hdfs port:9001
Hdfs DataNode port:50010
Namenode or datanode:N
Number of hdfs replication:1
Default command:-bash
Master ip:10.28.241.174
Starting sshd: [  OK  ]
starting Hadoop Namenode,resourcemanager,datanode,nodemanager
Succeed to start namenode
Succeed to start resourcemanager
Succeed to start datanode
Succeed to start nodemanager
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is OFF

6.2 Create DataNode (background service, using -d) on shuynh-gecko2:
[root@shuynh-gecko2 ~]# docker stop $(docker ps -a -q)
[root@shuynh-gecko2 ~]# docker rm $(docker ps -a -q)
[root@shuynh-gecko2 hadoop-docker-master-cluster]# docker run   --net="host"  sequenceiq/hadoop-cluster-docker:2.4.1 9001 50010 D 1 -d 10.28.241.174
BOOTSTRAP=/etc/bootstrap.sh
HOSTNAME=shuynh-gecko2
HADOOP_PREFIX=/usr/local/hadoop
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/java/default/bin
PWD=/
JAVA_HOME=/usr/java/default
SHLVL=1
HOME=/
_=/usr/bin/env
/
Hdfs port:9001
Hdfs DataNode port:50010
Namenode or datanode:D
Number of hdfs replication:1
Default command:-d
Master ip:10.28.241.174
starting Hadoop Datanode,nodemanager
Succeed to start datanode
Succeed to start nodemanager

6.3 Create DataNode (background service, using -d) on shuynh-gecko3:
[root@shuynh-gecko3 ~]# docker stop $(docker ps -a -q)
[root@shuynh-gecko3 ~]# docker rm $(docker ps -a -q)
[root@shuynh-gecko3 hadoop-docker-master-cluster]# docker run   --net="host"  sequenceiq/hadoop-cluster-docker:2.4.1 9001 50010 D 1 -d 10.28.241.174
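
After starting the DataNodes, it is worth verifying from another shell on each host that the container is still running (plain docker ps, filtered with grep):

docker ps | grep hadoop-cluster-docker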

7. Check the cluster status

7.1 Access the web UI

Access http://10.28.241.174:50070/dfshealth.html#tab-datanode

7.2 Check the status from the command line

bash-4.1# $HADOOP_PREFIX/bin/hdfs dfsadmin -report
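
To check just the DataNode lines without reading the whole report (the exact wording of the report varies slightly between Hadoop versions):

bash-4.1# $HADOOP_PREFIX/bin/hdfs dfsadmin -report | grep -i datanode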

8. Run a sample Hadoop job

bash-4.1# $HADOOP_PREFIX/bin/hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar grep input output 'dfs[a-z.]+'

14/07/30 23:59:40 INFO client.RMProxy: Connecting to ResourceManager at /10.28.241.174:8032
14/07/30 23:59:40 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
14/07/30 23:59:40 INFO input.FileInputFormat: Total input paths to process : 26
14/07/30 23:59:41 INFO mapreduce.JobSubmitter: number of splits:26
14/07/30 23:59:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1406778261600_0003
14/07/30 23:59:41 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
14/07/30 23:59:42 INFO impl.YarnClientImpl: Submitted application application_1406778261600_0003
14/07/30 23:59:42 INFO mapreduce.Job: The url to track the job: http://shuynh-gecko1:8088/proxy/application_1406778261600_0003/
14/07/30 23:59:42 INFO mapreduce.Job: Running job: job_1406778261600_0003

14/07/30 23:59:49 INFO mapreduce.Job: Job job_1406778261600_0003 running in uber mode : false
14/07/30 23:59:49 INFO mapreduce.Job:  map 0% reduce 0%
14/07/30 23:59:57 INFO mapreduce.Job:  map 4% reduce 0%
14/07/30 23:59:58 INFO mapreduce.Job:  map 8% reduce 0%
14/07/30 23:59:59 INFO mapreduce.Job:  map 27% reduce 0%
14/07/31 00:00:00 INFO mapreduce.Job:  map 31% reduce 0%
14/07/31 00:00:01 INFO mapreduce.Job:  map 38% reduce 0%
14/07/31 00:00:04 INFO mapreduce.Job:  map 42% reduce 0%
....
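
The grep example assumes an input directory already exists in HDFS (here it held 26 files). If starting from scratch, it can be seeded from the bundled Hadoop configs, following the base image's own example, and the result read back once the job finishes:

bash-4.1# $HADOOP_PREFIX/bin/hdfs dfs -put $HADOOP_PREFIX/etc/hadoop input
bash-4.1# $HADOOP_PREFIX/bin/hdfs dfs -cat output/*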

