[Ops] Setting Up a Hadoop Cluster

Table of Contents

      • 1. Basic information
      • 2. Installation steps
        • 1). Switch to the hadoop account and extract Hadoop to the install directory with tar -zxvf
        • 2). Create the tmpdir directory
        • 3). Configure hadoop-env.sh
        • 4). Configure mapred-env.sh
        • 5). Configure core-site.xml
        • 6). Configure hdfs-site.xml
        • 7). Configure mapred-site.xml
        • 8). Configure yarn-site.xml
        • 9). Configure the Hadoop environment variables
        • 10). Edit the slaves file
        • 11). Copy hadoop-2.7.3 from test to test2 and test3, create the symlink, and set environment variables as in step 9
        • 12). Format the namenode (first start only), start Hadoop, and start the jobhistory server
        • 13). Check the services on each machine by running jps on test, test2, and test3
      • Q&A
      • Hadoop core concepts

1. Basic information

  • Version: 2.7.3
  • Machines: three
  • Account: hadoop
  • Source path: /opt/software/hadoop-2.7.3.tar.gz
  • Install path: /opt/hadoop -> /opt/hadoop-2.7.3
  • Dependency: zookeeper
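
All three machines must be able to resolve the hostnames test, test2, and test3 (via /etc/hosts or DNS). A sketch of the /etc/hosts entries needed on every node — only test's address (172.24.5.173) appears later in this article; the other two are placeholders you must replace with your actual addresses:

```
172.24.5.173   test
192.168.0.2    test2    # placeholder address
192.168.0.3    test3    # placeholder address
```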

2. Installation steps

1). Switch to the hadoop account and extract Hadoop to the install directory with tar -zxvf:

[root@test opt]# su hadoop
[hadoop@test opt]$ cd /opt/software
[hadoop@test software]$  tar -zxvf hadoop-${version}.tar.gz  -C /opt
[hadoop@test software]$ cd /opt
[hadoop@test opt]$ ln -s /opt/hadoop-${version} /opt/hadoop

2). Create the tmpdir directory:

[hadoop@test opt]$ cd  /opt/hadoop
[hadoop@test hadoop]$ mkdir -p tmpdir

3). Configure hadoop-env.sh:

[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ mkdir -p /opt/hadoop/pids
[hadoop@test hadoop]$ vim hadoop-env.sh

Add the following to hadoop-env.sh:

export JAVA_HOME=/opt/java
export HADOOP_PID_DIR=/opt/hadoop/pids

4). Configure mapred-env.sh:

[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-env.sh

Add the following to mapred-env.sh:
export JAVA_HOME=/opt/java

5). Configure core-site.xml:

[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$  vim core-site.xml

Add the following to core-site.xml:

<configuration>
    <!-- namenode working/temporary directory -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop/tmpdir</value>
    </property>
    <!-- HDFS entry point: the host and port the namenode listens on -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://test:8020</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
    </property>
    <!-- keep deleted files in the trash for 1440 minutes (one day) -->
    <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
    </property>
</configuration>

6). Configure hdfs-site.xml:

If Ranger is not installed, comment out the following block in this file:

<property>
    <name>dfs.namenode.inode.attributes.provider.class</name>
    <value>org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer</value>
</property>

[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim hdfs-site.xml

Add the following to hdfs-site.xml:

<configuration>
    <!-- replication factor; usually at most the number of datanodes -->
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop/data/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop/data/datanode</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.secondary.http.address</name>
        <value>test:50090</value>
    </property>
</configuration>

7). Configure mapred-site.xml:

[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim mapred-site.xml

Add the following to mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>test:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>test:19888</value>
    </property>
</configuration>

8). Configure yarn-site.xml:

[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop/
[hadoop@test hadoop]$ vim yarn-site.xml

Add the following to yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>test:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>test:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>test:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>test:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>test:8088</value>
    </property>
</configuration>

9). Configure the Hadoop environment variables:

[hadoop@test hadoop]$ vim /etc/profile
export HADOOP_HOME=/opt/hadoop
export PATH=$HADOOP_HOME/bin:$PATH

After saving, run source /etc/profile to apply the changes:

[hadoop@test hadoop]$ source /etc/profile

10). Edit the slaves file:

[hadoop@test hadoop]$ cd /opt/hadoop/etc/hadoop
[hadoop@test hadoop]$ vim slaves

Add the datanode hostnames to slaves, one per line:

test2
test3

11). From test, copy hadoop-2.7.3 to hadoop@test2 and hadoop@test3, create the /opt/hadoop symlink on each of them, and set the environment variables there as in step 9 (this, like the start scripts in the next step, assumes password-less ssh from test to the other nodes):

[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test2:/opt/
[hadoop@test hadoop]$ ssh hadoop@test2 "ln -s /opt/hadoop-${version} /opt/hadoop"
[hadoop@test hadoop]$ scp -r /opt/hadoop-${version} hadoop@test3:/opt/
[hadoop@test hadoop]$ ssh hadoop@test3 "ln -s /opt/hadoop-${version} /opt/hadoop"

12). Format the namenode (only needed before the very first start!), start Hadoop, and start the jobhistory server:

# format the namenode — only needed before the very first start!
[hadoop@test hadoop]$ hadoop namenode -format
# 启动
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/start-all.sh
[hadoop@test hadoop]$ ${HADOOP_HOME}/sbin/mr-jobhistory-daemon.sh start historyserver

start-all.sh starts both the DFS and YARN daemons (it wraps start-dfs.sh and start-yarn.sh), so HDFS and YARN can also be started separately.
Note: if a datanode fails to start, check whether tmpdir still holds stale data from a previous run; delete that directory on all three machines.

13). Check the services on each machine by running jps on test, test2, and test3:

[hadoop@test ~]$ jps
24429 Jps
22898 ResourceManager
24383 JobHistoryServer
22722 SecondaryNameNode
22488 NameNode
[hadoop@test2 ~]$ jps
7650 DataNode
7788 NodeManager
8018 Jps
[hadoop@test3 ~]$ jps
28407 Jps
28038 DataNode
28178 NodeManager

If all three machines show the processes above, the Hadoop cluster is running correctly.
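
This visual check can be scripted. A minimal sketch (the helper name check_daemons is made up here): pass it the jps output plus the daemon names expected on that host, and it fails on the first missing one. Note it uses a simple substring match:

```shell
# check_daemons: succeed only if every expected daemon name appears
# (as a substring) in the jps output passed as the first argument
check_daemons() {
    jps_out=$1; shift
    for daemon in "$@"; do
        case "$jps_out" in
            *"$daemon"*) ;;                          # found, keep checking
            *) echo "missing: $daemon"; return 1 ;;
        esac
    done
    echo "all daemons running"
}

# on the master:  check_daemons "$(jps)" NameNode SecondaryNameNode ResourceManager JobHistoryServer
# on a worker:    check_daemons "$(jps)" DataNode NodeManager
```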

Open the YARN web UI in a browser: http://172.24.5.173:8088

Run a simple MapReduce job to verify the installation:

[hadoop@test mapreduce]$ cd /opt/hadoop/share/hadoop/mapreduce
[hadoop@test mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.3.jar pi 2 4
Number of Maps  = 2
Samples per Map = 4
Wrote input for Map #0
Wrote input for Map #1
Starting Job
17/04/06 09:36:47 INFO client.RMProxy: Connecting to ResourceManager at test/172.24.5.173:8032
17/04/06 09:36:47 INFO input.FileInputFormat: Total input paths to process : 2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: number of splits:2
17/04/06 09:36:48 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1491470782060_0001
17/04/06 09:36:48 INFO impl.YarnClientImpl: Submitted application application_1491470782060_0001
17/04/06 09:36:48 INFO mapreduce.Job: The url to track the job: http://test:8088/proxy/application_1491470782060_0001/
17/04/06 09:36:48 INFO mapreduce.Job: Running job: job_1491470782060_0001
17/04/06 09:36:56 INFO mapreduce.Job: Job job_1491470782060_0001 running in uber mode : false
17/04/06 09:36:56 INFO mapreduce.Job:  map 0% reduce 0%
17/04/06 09:37:00 INFO mapreduce.Job:  map 50% reduce 0%
17/04/06 09:37:02 INFO mapreduce.Job:  map 100% reduce 0%
17/04/06 09:37:08 INFO mapreduce.Job:  map 100% reduce 100%
17/04/06 09:37:08 INFO mapreduce.Job: Job job_1491470782060_0001 completed successfully
17/04/06 09:37:08 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=50
        FILE: Number of bytes written=357588
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=554
        HDFS: Number of bytes written=215
        HDFS: Number of read operations=11
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=3
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=6118
        Total time spent by all reduces in occupied slots (ms)=4004
        Total time spent by all map tasks (ms)=6118
        Total time spent by all reduce tasks (ms)=4004
        Total vcore-milliseconds taken by all map tasks=6118
        Total vcore-milliseconds taken by all reduce tasks=4004
        Total megabyte-milliseconds taken by all map tasks=6264832
        Total megabyte-milliseconds taken by all reduce tasks=4100096
    Map-Reduce Framework
        Map input records=2
        Map output records=4
        Map output bytes=36
        Map output materialized bytes=56
        Input split bytes=318
        Combine input records=0
        Combine output records=0
        Reduce input groups=2
        Reduce shuffle bytes=56
        Reduce input records=4
        Reduce output records=0
        Spilled Records=8
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=213
        CPU time spent (ms)=2340
        Physical memory (bytes) snapshot=713646080
        Virtual memory (bytes) snapshot=6332133376
        Total committed heap usage (bytes)=546308096
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=236
    File Output Format Counters
        Bytes Written=97
Job Finished in 20.744 seconds
Estimated value of Pi is 3.50000000000000000000
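
The example job estimates pi by Monte Carlo sampling: each map task throws random points into the unit square and counts how many land inside the quarter circle. The same idea in a single awk process, for intuition (estimate_pi is just a name for this sketch; the sample count is illustrative):

```shell
# single-process version of the pi estimator: fraction of random points
# inside the quarter circle, times 4, approaches pi as n grows
estimate_pi() {
    awk -v n="$1" 'BEGIN {
        srand(7); inside = 0
        for (i = 0; i < n; i++) {
            x = rand(); y = rand()
            if (x * x + y * y <= 1) inside++
        }
        printf "%.4f\n", 4 * inside / n
    }'
}

estimate_pi 200000    # prints a value near 3.14
```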

Q&A

Q: stop-all.sh cannot stop the Hadoop cluster?
A: By default the daemon PID files are kept under /tmp, which is periodically cleaned; once the PID files are gone the stop scripts cannot find the processes. Setting HADOOP_PID_DIR (see step 3) avoids this.

Q: The namenode will not start.
A: The namenode host in core-site.xml (the fs.defaultFS value) must not contain underscores!
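
The host part of fs.defaultFS must be a valid URI hostname, which rules out underscores. A quick guard for scripts (valid_hdfs_host is a hypothetical name for this sketch):

```shell
# reject hostnames containing an underscore — hdfs://my_host:8020 is not a
# valid URI authority, so the namenode and clients will fail to parse it
valid_hdfs_host() {
    case "$1" in
        *_*) echo "invalid hostname: $1 (contains underscore)"; return 1 ;;
        *)   echo "ok: $1"; return 0 ;;
    esac
}
```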

Hadoop core concepts

  • node
    • namenode
      stores the filesystem metadata
  • manage
    • nodemanager

      1. manages the compute resources of a single node
      2. communicates with the ResourceManager (the cluster-wide manager) and the ApplicationMaster (the per-application master process)
      3. manages container lifecycles and monitors each container's resource usage (memory, CPU), tracks node health, and manages logs
