Hadoop Notes (2): Setting Up a Hadoop Pseudo-Distributed Cluster

Setting Up a Hadoop Pseudo-Distributed Cluster

Preface

The following preparation is needed first (a command sketch follows the list):

  1. Disable the firewall
  2. Set the hostname
  3. Bind the IP address to the hostname
  4. Install the JDK (Hadoop runs on Java)
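
  A minimal sketch of these four steps, assuming CentOS 7 with firewalld; the hostname hadoop01 and the IP 192.168.1.10 are placeholders, substitute your own:

    # Disable the firewall (firewalld on CentOS 7)
    systemctl stop firewalld
    systemctl disable firewalld

    # Set the hostname (hadoop01 is a placeholder)
    hostnamectl set-hostname hadoop01

    # Bind the IP to the hostname (192.168.1.10 is a placeholder)
    echo "192.168.1.10 hadoop01" >> /etc/hosts

    # Confirm the JDK is installed and on the PATH
    java -version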

I. Configure the Hadoop Environment

  1. Unpack the Hadoop tarball into /opt/app

    tar -zxvf hadoop-2.7.1.tar.gz -C /opt/app
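
    Optionally, Hadoop can be put on the PATH so its commands work from any directory. This guide runs everything relative to the Hadoop root, so the lines below (appended to ~/.bashrc) are a convenience, not a requirement:

      export HADOOP_HOME=/opt/app/hadoop-2.7.1
      export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin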
    
  2. Edit the configuration files under etc/hadoop

    • Set the directory ownership

      sudo chown -R root:root /opt/app/hadoop-2.7.1
      
    • Configure hadoop-env.sh, filling in the JDK path

      # The java implementation to use.
      export JAVA_HOME=/opt/app/jdk1.8.0_152
      
    • Return to the Hadoop root directory and check that the environment works

      bin/hadoop
      Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
        CLASSNAME            run the class named CLASSNAME
       or
        where COMMAND is one of:
        fs                   run a generic filesystem user client
        version              print the version
        jar <jar>            run a jar file
                             note: please use "yarn jar" to launch
                                   YARN applications, not this command.
        checknative [-a|-h]  check native hadoop and compression libraries availability
        distcp <srcurl> <desturl> copy file or directories recursively
        archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
        classpath            prints the class path needed to get the
        credential           interact with credential providers
                             Hadoop jar and the required libraries
        daemonlog            get/set the log level for each daemon
        trace                view and modify Hadoop tracing settings
      
      Most commands print help when invoked w/o parameters.
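
      A quicker sanity check is to print the version, which should report 2.7.1:

        bin/hadoop version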
      
      
    • Configure core-site.xml, which sets the default filesystem (the NameNode address)

      
          
      <configuration>
          <property>
              <name>fs.defaultFS</name>
              <value>hdfs://localhost:9000</value>
          </property>
      </configuration>
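
      An optional addition, not part of the original setup: HDFS keeps its data under /tmp by default (controlled by hadoop.tmp.dir), which is often cleared on reboot. A sketch that points it at a persistent directory; the path below is an assumed example:

        <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/app/hadoop-2.7.1/data</value>
        </property>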
          
      
      
    • Configure hdfs-site.xml; replication is set to 1 since a pseudo-cluster has only a single DataNode

      
          
      <configuration>
          <property>
              <name>dfs.replication</name>
              <value>1</value>
          </property>
      </configuration>
          
      
      
    • Configure mapred-site.xml (mapred-site.xml.template must first be renamed to mapred-site.xml; see the command after the snippet)

      
          
      <configuration>
          <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
          </property>
          <property>
              <name>mapreduce.application.classpath</name>
              <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
          </property>
      </configuration>
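
      The rename mentioned above, run from the Hadoop root:

        mv etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml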
          
      
      
    • Configure yarn-site.xml

      
          
      <configuration>
          <property>
              <name>yarn.nodemanager.aux-services</name>
              <value>mapreduce_shuffle</value>
          </property>
          <property>
              <name>yarn.nodemanager.env-whitelist</name>
              <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
          </property>
      </configuration>
          
      
      

II. Start the Hadoop Services

Return to the Hadoop root directory.

  1. Format the filesystem (only do this once: reformatting generates a new cluster ID, and a DataNode initialized under the old ID will refuse to start)

    bin/hdfs namenode -format
    
  2. Start the NameNode and DataNode daemons

    sbin/hadoop-daemon.sh start namenode
    sbin/hadoop-daemon.sh start datanode
    
  3. Start the YARN ResourceManager and NodeManager daemons

    sbin/yarn-daemon.sh start resourcemanager
    sbin/yarn-daemon.sh start nodemanager
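
    Alternatively, sbin/start-dfs.sh and sbin/start-yarn.sh bring up the HDFS and YARN daemons in one go. They ssh into each host (localhost here), so passwordless ssh must be set up first; a sketch of the standard setup:

      # Passwordless ssh to localhost, required by the start-*.sh scripts
      ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
      cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
      chmod 0600 ~/.ssh/authorized_keys

      # Then start HDFS and YARN together
      sbin/start-dfs.sh
      sbin/start-yarn.sh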
    
  4. Check the running Java processes with jps

     jps
    126098 Jps
    14532 DataNode
    17284 ResourceManager
    14235 NameNode
    17679 NodeManager
    
    
  5. The web UIs are now available: the NameNode at http://localhost:50070 and the YARN ResourceManager at http://localhost:8088


III. Run a Simple Example: Word Count

  1. Create a file a.txt

    hadoop java hbase hello
    hadoop java zookeeper hello
    sqoop hbase flume spark
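
    One way to create it from the shell:

      echo "hadoop java hbase hello"     >  a.txt
      echo "hadoop java zookeeper hello" >> a.txt
      echo "sqoop hbase flume spark"     >> a.txt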
    
  2. Upload it to the root of HDFS (it can be verified in the NameNode web UI at localhost:50070)

    bin/hdfs dfs -put a.txt /a.txt
    
  3. Run the wordcount example that ships with MapReduce

    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /a.txt /output
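
    Note that MapReduce refuses to start if the output directory already exists; to re-run the job, remove it first:

      bin/hdfs dfs -rm -r /output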
    
  4. List the output directory on HDFS

    bin/hdfs dfs -ls /output
    Found 2 items
    -rw-r--r--   1 hadoop supergroup          0 2020-02-24 23:39 /output/_SUCCESS
    -rw-r--r--   1 hadoop supergroup         68 2020-02-24 23:39 /output/part-r-00000
    
    
  5. View the result file as text

    bin/hdfs dfs -text /output/part*
    flume   1
    hadoop  2
    hbase   2
    hello   2
    java    2
    spark   1
    sqoop   1
    zookeeper       1
    
    

Done!


Directories in the Hadoop installation

bin: basic management scripts (hadoop, hdfs, yarn, ...)

sbin: scripts for starting and stopping the services

share: the Hadoop jars, including the bundled examples

etc: configuration files
