Big Data Cluster Installation: Hadoop

Preparation for installing Hadoop

Hadoop download URL:

https://archive.apache.org/dist/hadoop/common/hadoop-2.7.5/

Installation

Extract the installation archive into /export/servers:
tar -zxvf hadoop-2.7.5.tar.gz -C /export/servers/
Verify that the extraction succeeded, for example:
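
A quick listing of the target directory (assuming the paths above) should show the extracted release:

ls -l /export/servers/ | grep hadoop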

Add Hadoop to the environment variables
Open the /etc/profile file

vi /etc/profile

Append the following Hadoop entries at the end of the profile file (in vi, Shift+G jumps to the last line):

# HADOOP_HOME
export HADOOP_HOME=/export/servers/hadoop-2.7.5
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Reload the file so the changes take effect:

source /etc/profile
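
As a quick sanity check, the shell should now resolve both the variable and the binary:

echo $HADOOP_HOME
which hadoop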

Test whether the installation succeeded:

hadoop version
Hadoop 2.7.5
...

Configuration files

Edit core-site.xml

<configuration>
    <!-- ZooKeeper quorum used for HA coordination -->
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>node01:2181,node02:2181,node03:2181</value>
    </property>

    <!-- Default filesystem: the HDFS nameservice defined in hdfs-site.xml -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns</value>
    </property>

    <!-- Base directory for Hadoop temporary files -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/export/servers/hadoop-2.7.5/data/tmp</value>
    </property>

    <!-- Trash retention in minutes (10080 = 7 days) -->
    <property>
        <name>fs.trash.interval</name>
        <value>10080</value>
    </property>
</configuration>
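
Once the configuration files are in place, one way to confirm that the default filesystem setting is picked up is hdfs getconf, which reads the local configuration (shown with the value expected from the file above):

bin/hdfs getconf -confKey fs.defaultFS
hdfs://ns
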
Edit hdfs-site.xml

<configuration>
    <!-- Logical name of the HDFS nameservice -->
    <property>
        <name>dfs.nameservices</name>
        <value>ns</value>
    </property>

    <!-- The two NameNodes in the nameservice -->
    <property>
        <name>dfs.ha.namenodes.ns</name>
        <value>nn1,nn2</value>
    </property>

    <!-- Client RPC addresses of the two NameNodes -->
    <property>
        <name>dfs.namenode.rpc-address.ns.nn1</name>
        <value>node01:8020</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns.nn2</name>
        <value>node02:8020</value>
    </property>

    <!-- Service RPC addresses (used by DataNodes and the ZKFC) -->
    <property>
        <name>dfs.namenode.servicerpc-address.ns.nn1</name>
        <value>node01:8022</value>
    </property>
    <property>
        <name>dfs.namenode.servicerpc-address.ns.nn2</name>
        <value>node02:8022</value>
    </property>

    <!-- Web UI addresses of the two NameNodes -->
    <property>
        <name>dfs.namenode.http-address.ns.nn1</name>
        <value>node01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns.nn2</name>
        <value>node02:50070</value>
    </property>

    <!-- JournalNode quorum holding the shared edit log -->
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://node01:8485;node02:8485;node03:8485/ns1</value>
    </property>

    <!-- Proxy class clients use to locate the active NameNode -->
    <property>
        <name>dfs.client.failover.proxy.provider.ns</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>

    <!-- Fence the previous active NameNode over SSH during failover -->
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>sshfence</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
    </property>

    <!-- Local directory where JournalNodes store edits -->
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/export/servers/hadoop-2.7.5/data/dfs/jn</value>
    </property>

    <!-- Automatic failover via the ZKFC -->
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>

    <!-- NameNode metadata and edits directories -->
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/name</value>
    </property>
    <property>
        <name>dfs.namenode.edits.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/data/dfs/nn/edits</value>
    </property>

    <!-- DataNode block storage directory -->
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///export/servers/hadoop-2.7.5/data/dfs/dn</value>
    </property>

    <!-- Disable permission checking (test clusters only) -->
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>

    <!-- Block size: 128 MB -->
    <property>
        <name>dfs.blocksize</name>
        <value>134217728</value>
    </property>
</configuration>
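
As a sanity check on the HA settings above, hdfs getconf can also list the NameNode hosts it resolves for the nameservice:

bin/hdfs getconf -namenodes
node01 node02
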
Edit yarn-site.xml

<configuration>
    <!-- Aggregate container logs to HDFS -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>

    <!-- Enable ResourceManager HA -->
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>

    <!-- Cluster ID for this RM HA pair -->
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>mycluster</value>
    </property>

    <!-- Logical IDs of the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>

    <!-- Hosts running the two ResourceManagers -->
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>node03</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>node02</value>
    </property>

    <!-- Service addresses for rm1 (node03) -->
    <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>node03:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>node03:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>node03:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>node03:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>node03:8088</value>
    </property>

    <!-- Service addresses for rm2 (node02) -->
    <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>node02:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>node02:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>node02:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>node02:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>node02:8088</value>
    </property>

    <!-- Preserve RM state across restarts -->
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>

    <!-- ID of the local RM: rm1 on node03; change to rm2 on node02 (see below) -->
    <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm1</value>
        <description>If we want to launch more than one RM in single node, we need this configuration</description>
    </property>

    <!-- Keep RM state in ZooKeeper -->
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>node02:2181,node03:2181,node01:2181</value>
        <description>For multiple zk services, separate them with comma</description>
    </property>

    <!-- Automatic failover between the two RMs -->
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
        <description>Enable automatic failover; by default, it is enabled only when HA is enabled.</description>
    </property>
    <property>
        <name>yarn.client.failover-proxy-provider</name>
        <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
    </property>

    <!-- Resources each NodeManager offers -->
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>512</value>
    </property>

    <!-- Per-container allocation bounds -->
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>512</value>
    </property>

    <!-- Log retention: aggregated logs 30 days, local logs 7 days -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>2592000</value>
    </property>
    <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>604800</value>
    </property>
    <property>
        <name>yarn.nodemanager.log-aggregation.compression-type</name>
        <value>gz</value>
    </property>

    <!-- NodeManager local working directory -->
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/export/servers/hadoop-2.7.5/yarn/local</value>
    </property>

    <!-- Maximum number of completed applications the RM keeps in memory -->
    <property>
        <name>yarn.resourcemanager.max-completed-applications</name>
        <value>1000</value>
    </property>

    <!-- Shuffle service required by MapReduce -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>

    <!-- Retry interval when clients reconnect to the RM -->
    <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
    </property>
</configuration>
Edit mapred-site.xml

<configuration>
    <!-- Run MapReduce jobs on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>

    <!-- JobHistory server RPC and web addresses -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>node03:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>node03:19888</value>
    </property>

    <property>
        <name>mapreduce.jobtracker.system.dir</name>
        <value>/export/servers/hadoop-2.7.5/data/system/jobtracker</value>
    </property>

    <!-- Container memory for map and reduce tasks.
         Note: these requests exceed yarn.scheduler.maximum-allocation-mb (512)
         set in yarn-site.xml above; raise that limit (or lower these values),
         otherwise YARN will reject the container requests. -->
    <property>
        <name>mapreduce.map.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>1024</value>
    </property>

    <!-- Sort buffer size (MB) and merge factor -->
    <property>
        <name>mapreduce.task.io.sort.mb</name>
        <value>100</value>
    </property>
    <property>
        <name>mapreduce.task.io.sort.factor</name>
        <value>10</value>
    </property>

    <!-- Parallel copies during the reduce-side shuffle -->
    <property>
        <name>mapreduce.reduce.shuffle.parallelcopies</name>
        <value>25</value>
    </property>

    <!-- MapReduce ApplicationMaster JVM options and container size -->
    <property>
        <name>yarn.app.mapreduce.am.command-opts</name>
        <value>-Xmx1024m</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.resource.mb</name>
        <value>1536</value>
    </property>

    <property>
        <name>mapreduce.cluster.local.dir</name>
        <value>/export/servers/hadoop-2.7.5/data/system/local</value>
    </property>
</configuration>
Edit the slaves file (the start scripts SSH into each host listed here to launch the worker daemons)
node01
node02
node03
Edit hadoop-env.sh
export JAVA_HOME=/export/servers/jdk1.8.0_141
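
If you are unsure of the JDK location on your machines, the current shell's value is a reasonable starting point (assuming JAVA_HOME was exported in /etc/profile during the earlier JDK setup):

echo $JAVA_HOME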

Cluster startup procedure

Distribute the installation from the first machine to the other machines
Run the following on the first machine:

cd /export/servers
scp -r hadoop-2.7.5/ node02:$PWD
scp -r hadoop-2.7.5/ node03:$PWD
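
Optionally, confirm that the copies landed (a quick remote listing; assumes passwordless SSH is already set up between the nodes):

ssh node02 ls /export/servers/
ssh node03 ls /export/servers/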

Create the data directories on all three machines (these match the dfs.namenode.name.dir, dfs.namenode.edits.dir, dfs.datanode.data.dir, and dfs.journalnode.edits.dir paths configured above)
Run the following on all three machines:

mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/name
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/nn/edits
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/dn
mkdir -p /export/servers/hadoop-2.7.5/data/dfs/jn

Change yarn.resourcemanager.ha.id to rm2 on node02
Run the following on the second machine:

cd /export/servers/hadoop-2.7.5/etc/hadoop
vim  yarn-site.xml

<property>
    <name>yarn.resourcemanager.ha.id</name>
    <value>rm2</value>
    <description>If we want to launch more than one RM in single node, we need this configuration</description>
</property>
Starting HDFS

Run the following on node01:

cd   /export/servers/hadoop-2.7.5
bin/hdfs zkfc -formatZK
sbin/hadoop-daemons.sh start journalnode
bin/hdfs namenode -format
bin/hdfs namenode -initializeSharedEdits -force
sbin/start-dfs.sh
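
After start-dfs.sh returns, jps on each node should show the expected daemons; on node01, for example, you would expect roughly NameNode, DataNode, JournalNode, and DFSZKFailoverController (the exact list depends on the node's role):

jps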

Run the following on node02:

cd   /export/servers/hadoop-2.7.5
bin/hdfs namenode -bootstrapStandby
sbin/hadoop-daemon.sh start namenode
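
Both NameNodes should now be running, one active and one standby. haadmin reports the state of each (run from either NameNode host):

bin/hdfs haadmin -getServiceState nn1
bin/hdfs haadmin -getServiceState nn2
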
Starting YARN

Run the following on node03:

cd   /export/servers/hadoop-2.7.5
sbin/start-yarn.sh

Run the following on node02:

cd   /export/servers/hadoop-2.7.5
sbin/start-yarn.sh
Check the ResourceManager state

Run the following on node03:

cd   /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm1

Run the following on node02:

cd   /export/servers/hadoop-2.7.5
bin/yarn rmadmin -getServiceState rm2
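
One of the two commands should report active and the other standby; if both report standby, recheck the yarn.resourcemanager.zk-address setting in yarn-site.xml.
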
Start the JobHistory server on node03

Run the following on node03 to start the JobHistory server:

cd /export/servers/hadoop-2.7.5
sbin/mr-jobhistory-daemon.sh start historyserver

Checking HDFS status

View the HDFS status for node01:

http://192.168.52.100:50070/dfshealth.html#tab-overview

View the HDFS status for node02:

http://192.168.52.110:50070/dfshealth.html#tab-overview

Accessing the YARN cluster web UI

http://node03:8088/cluster

Job history web UI

Access the page at:

http://192.168.52.120:19888/jobhistory
