Hadoop Deployment


1. The Three Hadoop Run Modes (Startup Modes)

1.1 Standalone Mode (Local or Standalone Mode)

  - This is Hadoop's default mode, used for development and debugging.

  - No configuration files need to be modified.
  - The local filesystem is used instead of a distributed filesystem.
  - Hadoop starts no daemons (no NameNode, DataNode, JobTracker, or TaskTracker); the Map() and Reduce() tasks run as different parts of the same process.
  - It is used to debug the logic of a MapReduce program and confirm it is correct (a minimal local run is sketched below).
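A hedged sketch of such a local smoke test: with an unmodified configuration everything runs in one JVM against local directories, so no daemons or HDFS are involved. The install path and input text are illustrative and assume the Hadoop 3.2.0 tarball has been unpacked as in section 3 below.

mkdir -p /tmp/standalone-test/input
echo "hello hadoop hello world" > /tmp/standalone-test/input/sample.txt
cd /usr/local/applications/hadoop
# run the bundled wordcount example entirely on the local filesystem
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar \
    wordcount /tmp/standalone-test/input /tmp/standalone-test/output
cat /tmp/standalone-test/output/part-r-00000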

1.2 Pseudo-Distributed Mode

  - The Hadoop daemons run on the local machine, simulating a small cluster.

  - A single host simulates multiple hosts.
  - Hadoop starts the NameNode, DataNode, JobTracker, and TaskTracker daemons on one machine, each as an independent Java process.
  - In this mode Hadoop uses the distributed filesystem, and jobs are managed by the JobTracker service as independent processes. On top of standalone mode it adds debugging facilities, letting you inspect memory usage, HDFS input/output, and interaction with the other daemons. Because it behaves like fully distributed mode, it is commonly used to develop and test whether Hadoop programs run correctly.
  - Three configuration files are modified: core-site.xml (cluster-wide properties, affecting all daemons and clients), hdfs-site.xml (HDFS properties), and mapred-site.xml (MapReduce properties).
  - The filesystem must be formatted.

1.3 Fully Distributed Mode (Full-Distributed Mode)

  - The Hadoop daemons run on a cluster.

  - The daemons run on a cluster built from multiple hosts; this is the real production setup.
  - Install the JDK and Hadoop on every host and make sure the hosts can reach each other over the network.
  - Set up passwordless SSH between the hosts, adding each slave node's public key to the master node's trusted list (see the sketch after this list).
  - Modify the three configuration files core-site.xml, hdfs-site.xml, and mapred-site.xml to specify the NameNode and JobTracker addresses and ports, the replication factor, and other parameters.
  - Format the filesystem.
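A hedged sketch of the multi-node SSH step (the host names master/slave1 are placeholders, not part of this guide's single-node setup); key distribution is typically done with ssh-keygen plus ssh-copy-id:

# on each node, generate a key pair once (no passphrase)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
# on each slave node, append its public key to the master's authorized_keys
ssh-copy-id master
# verify: should print the remote hostname without asking for a password
ssh master hostname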

 

2. Prerequisites for Building the Pseudo-Distributed Cluster

Environment: CentOS 7

    jdk1.8.0_201

    hadoop 3.2

 

1. JDK installation (omitted)

2. Configure environment variables

Global environment variables go in /etc/profile:




export JAVA_HOME=/usr/local/java/jdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=.:$JAVA_HOME/lib/:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin

source the file to reload the configuration (source /etc/profile)

3. Check that the installation succeeded

java, javac, java -version
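A quick hedged check that the variables from /etc/profile took effect (expected values are illustrative, matching the paths above):

source /etc/profile
echo $JAVA_HOME        # expect /usr/local/java/jdk
java -version          # expect java version "1.8.0_201"
which javac            # expect a path under $JAVA_HOME/bin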

4. Install SSH and configure passwordless SSH login

1) Check whether SSH is installed; install it if it is not:
[hadoop@strong ~]$ rpm -qa|grep ssh
openssh-7.4p1-16.el7.x86_64
openssh-server-7.4p1-16.el7.x86_64
libssh2-1.4.3-12.el7.x86_64
openssh-clients-7.4p1-16.el7.x86_64
2) Configure passwordless SSH login:

[xx@master ~]$ cd .ssh/
[xx@master .ssh]$ ssh-keygen -t rsa


[xx@master .ssh]$ cat id_rsa.pub >> authorized_keys
[xx@master .ssh]$ chmod 600 authorized_keys
[xx@master .ssh]$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:uNXqrj0m4VpQRgv3LXDEV5si9fywauOOcxa9dOX17/4.
ECDSA key fingerprint is MD5:95:41:0a:7b:1d:d7:0a:5e:33:53:d9:b6:3c:0b:90:22.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Wed Apr 3 15:37:00 2019

5. Disable the firewall

sudo service iptables stop # stop the firewall service

sudo chkconfig iptables off # disable the firewall at boot

Note: CentOS 7 uses firewalld by default, so the equivalents there are sudo systemctl stop firewalld and sudo systemctl disable firewalld.

3. Building the Pseudo-Distributed Cluster

3.1 Install Hadoop

  1) Extract the Hadoop tarball into /usr/local/applications

    tar -zxvf ~/Downloads/hadoop-3.2.0.tar.gz -C /usr/local/applications/

  2) Create a symbolic link

         cd /usr/local/applications

   ln -s hadoop-3.2.0 hadoop

    3) Configure environment variables

       Add the following to /etc/profile:

  export HADOOP_HOME=/usr/local/applications/hadoop
  export HADOOP_COMMON_HOME=$HADOOP_HOME      # used later by the wordcount example
  export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  

       source /etc/profile

  4) Run hadoop version to check that the configuration works

[root@master applications]# hadoop version
Hadoop 3.2.0
Source code repository https://github.com/apache/hadoop.git -r e97acb3bd8f3befd27418996fa5d4b50bf2e17bf
Compiled by sunilg on 2019-01-08T06:08Z
Compiled with protoc 2.5.0
From source with checksum d3f0795ed0d9dc378e2c785d3668f39

3.2 Configure Hadoop

  The configuration files live in /usr/local/applications/hadoop/etc/hadoop. There are many files there, but for now only five need to be changed.

  1)hadoop-env.sh

    Around line 25:

export JAVA_HOME=/usr/local/java/jdk

export HADOOP_LOG_DIR=/wls/log/hadoop/logs

Note: the file already contains commented-out hints showing how to set these values; rather than deleting the hints, it is better to leave them commented and add your own lines.
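Two optional follow-ups (a hedged aside): create the log directory configured above, and use the Hadoop 3.x envvars subcommand to confirm that hadoop-env.sh is being picked up:

mkdir -p /wls/log/hadoop/logs     # may need sudo if /wls is not writable by the current user
$HADOOP_HOME/bin/hadoop envvars   # prints the computed Hadoop environment variables, including JAVA_HOME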

     2)core-site.xml 

  The hostname is configured as master:


[root@master hadoop]# cat /etc/hostname
master


   
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master</value>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/Data/hadoop/tmp</value>
    </property>

    <property>
        <name>fs.trash.interval</name>
        <value>7200</value>
    </property>

    <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
    </property>
</configuration>

 

Note: running start-all.sh as root later fails because no HDFS_NAMENODE_USER is defined:

[root@master hadoop]# ./sbin/start-all.sh
Starting namenodes on [master]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes

hadoop.http.staticuser.user is configured with the xx user.
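If the daemons really must be started as root, one common hedged workaround is to declare the expected users in hadoop-env.sh (the cleaner alternative is simply to start Hadoop as the regular xx user):

cat >> /usr/local/applications/hadoop/etc/hadoop/hadoop-env.sh <<'EOF'
# users allowed to run each daemon when the start scripts are launched as root
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root
EOF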

 

3)hdfs-site.xml 

dfs.block.size is set to 64 MB here; 128 MB is the more common choice nowadays.


   
<configuration>
    <property>
        <name>dfs.nameservices</name>
        <value>master</value>
    </property>

    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>

    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>

    <property>
        <name>dfs.namenode.http-address</name>
        <value>master:5007</value>
    </property>

    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/Data/hadoop/hdfs/nn</value>
    </property>

    <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>/Data/hadoop/hdfs/snn</value>
    </property>

    <property>
        <name>dfs.namenode.checkpoint.edits.dir</name>
        <value>/Data/hadoop/hdfs/snn</value>
    </property>

    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/Data/hadoop/hdfs/dn</value>
    </property>

    <property>
        <name>dfs.block.size</name>
        <value>67108864</value>
    </property>

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>localhost:9001</value>
    </property>

    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>

Create the data directories:

sudo mkdir -p /Data/hadoop/hdfs/nn
sudo mkdir -p /Data/hadoop/hdfs/snn
sudo mkdir -p /Data/hadoop/hdfs/dn
sudo mkdir -p /Data/hadoop/tmp

These correspond to dfs.namenode.name.dir, dfs.namenode.checkpoint.dir, dfs.datanode.data.dir, and hadoop.tmp.dir configured above.

[root@master ~]# chown -R xx:xx /Data/hadoop
[root@master ~]# chmod -R 777 /Data/hadoop
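An optional sanity check (hedged) that the directories exist and are owned by the xx user:

ls -ld /Data/hadoop /Data/hadoop/hdfs/nn /Data/hadoop/hdfs/snn /Data/hadoop/hdfs/dn /Data/hadoop/tmp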

4)yarn-site.xml



<configuration>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>master</value>
    </property>

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/wls/log/hadoop-yarn/apps</value>
        <description>Where to aggregate logs to.</description>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>

    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
    </property>

    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
    </property>

    <property>
        <name>yarn.log.server.url</name>
        <value>http://master:19888/jobhistory/logs/</value>
    </property>
</configuration>

mkdir -p /wls/log/hadoop-yarn/apps
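A hedged aside: yarn.nodemanager.remote-app-log-dir is a path inside HDFS rather than on the local disk, and YARN will normally create it when log aggregation first runs. Once HDFS is running (section 3.4), it can also be created explicitly:

# create the aggregation root in HDFS (the permission bits only matter on multi-user clusters)
hdfs dfs -mkdir -p /wls/log/hadoop-yarn/apps
hdfs dfs -chmod -R 1777 /wls/log/hadoop-yarn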

5)mapred-site.xml



<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
    </property>

    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
    </property>

    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
    </property>

    <property>
        <name>yarn.app.mapreduce.am.staging-dir</name>
        <value>/tmp/hadoop-yarn/staging</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value>
    </property>

    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>${yarn.app.mapreduce.am.staging-dir}/history/done</value>
    </property>
</configuration>

3.3 Format the NameNode

hdfs namenode -format
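A hedged sanity check after formatting: the directory configured as dfs.namenode.name.dir should now contain a fresh fsimage and VERSION file.

ls /Data/hadoop/hdfs/nn/current/
# expect something like: VERSION  fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid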

3.4 Start Hadoop

./sbin/start-all.sh

[xx@master hadoop]$ ./sbin/start-all.sh 
Starting namenodes on [master]
Starting datanodes
Starting resourcemanager
Starting nodemanagers
[xx@master hadoop]$ jps
14736 NodeManager
14050 DataNode
14403 ResourceManager
13865 NameNode
14956 Jps

Or start each daemon individually:

sbin/hadoop-daemon.sh start namenode

sbin/hadoop-daemon.sh start datanode

sbin/yarn-daemon.sh start resourcemanager

sbin/yarn-daemon.sh start nodemanager

sbin/mr-jobhistory-daemon.sh start historyserver
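In Hadoop 3.x the *-daemon.sh wrappers above should still work but typically print deprecation warnings; the hedged 3.x-style equivalents are:

bin/hdfs --daemon start namenode
bin/hdfs --daemon start datanode
bin/yarn --daemon start resourcemanager
bin/yarn --daemon start nodemanager
bin/mapred --daemon start historyserver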

Or alternatively:

sbin/start-dfs.sh

sbin/start-yarn.sh

bin/mapred --daemon start historyserver
[xx@master hadoop]$ jps
14736 NodeManager
14050 DataNode
14403 ResourceManager
15240 Jps
13865 NameNode
15162 JobHistoryServer

3.5 Verification

NameNode web UI: http://localhost:9870

YARN resource scheduling UI: http://localhost:8088

Job History: http://localhost:19888
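The same checks can be done from the command line (an illustrative sketch):

hdfs dfsadmin -report    # should show one live DataNode
yarn node -list          # should show one running NodeManager
jps                      # NameNode, DataNode, ResourceManager, NodeManager (and JobHistoryServer if started)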

 

HDFS test



[xx@master hadoop]$ hdfs dfs -mkdir -p /input
[xx@master hadoop]$ vi hello.txt
[xx@master hadoop]$ hdfs dfs -put hello.txt /input/

Run the wordcount example:

hadoop jar /usr/local/applications/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /input/hello.txt /output

This fails with the following error:

Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:

<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
</property>

Fix: configure the following in mapred-site.xml


<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>

and in /etc/profile:

export HADOOP_COMMON_HOME=$HADOOP_HOME

hadoop jar /usr/local/applications/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /input/hello.txt /output

The job's staging directory: /tmp/hadoop-yarn/staging/xx/.staging/job_1554284100733_0001

After the job succeeds, list the filesystem root with hadoop fs -ls /; a new /output directory has appeared.

[xx@master hadoop]$ hdfs dfs -ls /
Found 5 items
drwxr-xr-x   - xx supergroup          0 2019-04-03 16:55 /input
drwxr-xr-x   - xx supergroup          0 2019-04-03 17:37 /output
drwxrwx---   - xx supergroup          0 2019-04-03 17:01 /tmp
drwx------   - xx supergroup          0 2019-04-03 17:36 /user
drwxr-xr-x   - xx supergroup          0 2019-04-03 17:37 /wls

View the result file:

[xx@master hadoop]$ hdfs dfs -cat /output/part-r-00000
"mapred 1
--daemon 1
./sbin/mr-jobhistory-daemon.sh 1
10 1
13865 1
14050 1
14403 1
14736 1
14956 1
Apache 1
Attempting 2
CTRL-C 1
DataNode 1
Hadoop 1
JobHistory 1
...
