Hadoop + Spark Cluster Installation Guide

Environment:
Operating system: SUSE Linux Enterprise Server 11 (x86_64) SP3
Hosts:
192.168.0.10    node1
192.168.0.11    node2
192.168.0.12    node3
192.168.0.13    node4

Installation packages: /data/install
Hadoop cluster path: /data
JAVA_HOME: /usr/jdk1.8.0_66

Versions:
Component    Package                          Notes
JRE          jdk-8u66-linux-x64.tar.gz
ZooKeeper    zookeeper-3.4.6.tar.gz
Hadoop       hadoop-2.7.3.tar.gz              main distribution package
Spark        spark-2.0.2-bin-hadoop2.7.tgz
HBase        hbase-1.2.5-bin.tar.gz

I. Common Commands
1. Check the system version:
linux-n4ga:~ # uname -a                            # kernel version
Linux node1 3.0.76-0.11-default #1 SMP Fri Jun 14 08:21:43 UTC 2013 (ccab990) x86_64 x86_64 x86_64 GNU/Linux
linux-n4ga:~ # lsb_release                         # distribution
LSB Version:    core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
linux-n4ga:~ # cat /etc/SuSE-release           # patch level
SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3
node1:~ # cat /etc/issue
Welcome to SUSE Linux Enterprise Server 11 SP3  (x86_64) - Kernel \r (\l).
node1:~ #
2. Start the cluster
start-dfs.sh
start-yarn.sh
3. Stop the cluster
stop-yarn.sh
stop-dfs.sh
4. Monitor the cluster
hdfs dfsadmin -report
5. Start/stop a single daemon
hadoop-daemon.sh start|stop namenode|datanode|journalnode
yarn-daemon.sh start|stop resourcemanager|nodemanager
http://blog.chinaunix.net/uid-25723371-id-4943894.html
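Since this guide sets up both NameNode HA and ResourceManager HA (see hdfs-site.xml and yarn-site.xml below), it is also handy to check which instance is currently active. A small sketch, using the nn1/nn2 and rm1/rm2 ids defined in those files:
hdfs haadmin -getServiceState nn1          # prints "active" or "standby"
hdfs haadmin -getServiceState nn2
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2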
 
 
 
II. Environment Preparation (all servers)
6. Stop the firewall and disable it at boot
linux-n4ga:~ # rcSuSEfirewall2 stop
Shutting down the Firewall                                done
 
linux-n4ga:~ # chkconfig SuSEfirewall2_setup off
linux-n4ga:~ # chkconfig SuSEfirewall2_init off
linux-n4ga:~ # chkconfig --list|grep fire
SuSEfirewall2_init        0:off  1:off  2:off  3:off  4:off  5:off  6:off
SuSEfirewall2_setup       0:off  1:off  2:off  3:off  4:off  5:off  6:off
7. Set the hostname (repeat on the other nodes)
linux-n4ga:~ # hostname node1
linux-n4ga:~ # vim /etc/HOSTNAME
node1.site
8. Passwordless SSH login
node1:~ # ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
node1:~ # cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
node1:~ # ll -d .ssh/
drwx------ 2 root root 4096 Jun  5 08:50 .ssh/
node1:~ # ll .ssh/  
total 12
-rw-r--r-- 1 root root 599 Jun  5 08:50 authorized_keys
-rw------- 1 root root 672 Jun  5 08:50 id_dsa
-rw-r--r-- 1 root root 599 Jun  5 08:50 id_dsa.pub
Append the contents of ~/.ssh/id_dsa.pub from the other servers to ~/.ssh/authorized_keys on node1 as well, then distribute the file:
scp -rp ~/.ssh/authorized_keys [email protected]:~/.ssh/
scp -rp ~/.ssh/authorized_keys [email protected]:~/.ssh/
scp -rp ~/.ssh/authorized_keys [email protected]:~/.ssh/
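A quick check from node1 that the key exchange worked (a sketch; each command should print the remote hostname without asking for a password):
node1:~ # for ip in 192.168.0.11 192.168.0.12 192.168.0.13; do ssh root@$ip hostname; done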
 
9. Edit the hosts file
node1:~ # vim /etc/hosts
… …
ff02::2         ipv6-allrouters
ff02::3         ipv6-allhosts
192.168.0.10    node1
192.168.0.11    node2
192.168.0.12    node3
192.168.0.13    node4
Distribute it:
scp -rp /etc/hosts [email protected]:/etc/
scp -rp /etc/hosts [email protected]:/etc/
scp -rp /etc/hosts [email protected]:/etc/
10. Raise the open-file and process limits
node1:~ # vim /etc/security/limits.conf
*           soft    nofile       24000
*           hard    nofile       65535
*           soft    nproc        24000
*           hard    nproc        65535
Log out and back in (or start a new session) so the limits take effect, then verify:
node1:~ # ulimit -n
24000
11. Time synchronization
Test (example):
node1:~ # /usr/sbin/ntpdate 192.168.0.10
13 Jun 13:49:41 ntpdate[8370]: adjust time server 192.168.0.10 offset -0.007294 sec
Add a cron job:
node1:~ # crontab -e
*/10 * * * * /usr/sbin/ntpdate 192.168.0.10 > /dev/null 2>&1;/sbin/hwclock -w
node1:~ # service cron restart
Shutting down CRON daemon                                                          done
Starting CRON daemon                                                                             done
node1:~ # date
Tue Jun 13 05:32:49 CST 2017
node1:~ #
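The cron job above assumes node1 (192.168.0.10) itself runs an NTP service for the other nodes to sync against. A minimal server-side sketch with placeholder values (the subnet and fallback source below are assumptions; adjust them to your network):
node1:~ # vim /etc/ntp.conf
# allow the cluster subnet to query this server (placeholder subnet)
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
# fall back to the local clock when no upstream NTP source is reachable
server 127.127.1.0
fudge  127.127.1.0 stratum 10
node1:~ # service ntp restart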
12. Upload the installation packages to node1
node1:~ # mkdir -pv /data/install
node1:~ # cd /data/install
node1:~ # pwd
/data/install
Upload the packages into /data/install:
node1:/data/install # ll
total 671968
-rw-r--r-- 1 root root 214092195 Jun  5 05:40 hadoop-2.7.3.tar.gz
-rw-r--r-- 1 root root 104584366 Jun  5 05:40 hbase-1.2.5-bin.tar.gz
-rw-r--r-- 1 root root 181287376 Jun  5 05:47 jdk-8u66-linux-x64.tar.gz
-rw-r--r-- 1 root root 187426587 Jun  5 05:40 spark-2.0.2-bin-hadoop2.7.tgz
-rw-r--r-- 1 root root 187426587 Jun  5 05:40 zookeeper-3.4.6.tar.gz
13. Install the JDK
node1:~ # cd /data/install
node1:/data/install # tar -zxvf jdk-8u66-linux-x64.tar.gz -C /usr/
Configure the environment variables:
node1:/data/install # vim /etc/profile
export JAVA_HOME=/usr/jdk1.8.0_66
export HADOOP_HOME=/data/hadoop-2.7.3
export HBASE_HOME=/data/hbase-1.2.5
export SPARK_HOME=/data/spark-2.0.2
export ZOOKEEPER_HOME=/data/zookeeper-3.4.6
export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
export PATH=$ZOOKEEPER_HOME/bin:$PATH
export PATH=$HBASE_HOME/bin:$PATH
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export PATH=$SPARK_HOME/bin:$PATH
 
node1:/opt # source /etc/profile
node1:~ # java -version               # verify
java version "1.8.0_66"
Java(TM) SE Runtime Environment (build 1.8.0_66-b17)
Java HotSpot(TM) 64-Bit Server VM (build 25.66-b17, mixed mode)
node1:~ # echo $JAVA_HOME
/usr/jdk1.8.0_66
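The JDK and /etc/profile are also needed on node2–node4; one way to push them out from node1 (a sketch, assuming the same paths on every node):
node1:~ # for ip in 192.168.0.11 192.168.0.12 192.168.0.13; do
>   scp -rp /usr/jdk1.8.0_66 root@$ip:/usr/
>   scp -p /etc/profile root@$ip:/etc/
>   ssh root@$ip "source /etc/profile && java -version"
> done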
 
III. Install ZooKeeper
14. Extract ZooKeeper
node1:~ # cd /data/install
node1:/data/install # tar -zxvf  zookeeper-3.4.6.tar.gz  -C /data/
15. Configure zoo.cfg
node1:/data/install # cd /data/zookeeper-3.4.6/conf/            # enter the conf directory
node1: /data/zookeeper-3.4.6/conf/ # cp zoo_sample.cfg zoo.cfg                                      # copy the template
node1: /data/zookeeper-3.4.6/conf/ # vi zoo.cfg
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper-3.4.6/data
dataLogDir=/data/zookeeper-3.4.6/dataLog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
16. Create the myid files and distribute (deploy an odd number of ZooKeeper servers)
Create the dataDir directory and add a myid file inside it; myid holds the id of the local ZooKeeper server. Because server.1=node1:2888:3888 assigns id 1 to node1:
node1: /data/zookeeper-3.4.6/conf/ # mkdir -pv /data/zookeeper-3.4.6/{data,dataLog}
node1: /data/zookeeper-3.4.6/conf/ # echo 1 > /data/zookeeper-3.4.6/data/myid
17. Distribute:
node1: /data/zookeeper-3.4.6/conf/ # scp -rp /data/zookeeper-3.4.6  [email protected]:/data
node1: /data/zookeeper-3.4.6/conf/ # scp -rp /data/zookeeper-3.4.6  [email protected]:/data
On the remaining machines, write myid to match the server.N entries: 2 on node2, 3 on node3.
node2: /data/zookeeper-3.4.6/conf/ # echo 2 > /data/zookeeper-3.4.6/data/myid
node3: /data/zookeeper-3.4.6/conf/ # echo 3 > /data/zookeeper-3.4.6/data/myid
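A quick cross-check that each server's myid matches its server.N line (a sketch; the expected values are 1, 2 and 3 respectively):
node1:~ # for h in node1 node2 node3; do echo -n "$h: "; ssh $h cat /data/zookeeper-3.4.6/data/myid; done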

 
 
IV. Install Hadoop
18. Extract Hadoop
node1:~ # cd /data/install
node1:/data/install # tar -zxvf hadoop-2.7.3.tar.gz -C /data/
19. Configure hadoop-env.sh
node1:~ # vim /data/hadoop-2.7.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/jdk1.8.0_66
20. Configure core-site.xml
node1:~ # vim /data/hadoop-2.7.3/etc/hadoop/core-site.xml
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-2.7.3/data/tmp</value>
</property>
<property>
    <name>ha.zookeeper.quorum</name>
    <value>node1:2181,node2:2181,node3:2181</value>
    <description>ZooKeeper client connection addresses</description>
</property>
<property>
    <name>ha.zookeeper.session-timeout.ms</name>
    <value>10000</value>
</property>
<property>
    <name>fs.trash.interval</name>
    <value>1440</value>
    <description>Trash retention time in minutes; files older than this are deleted from the trash. 0 disables the trash feature.</description>
</property>
<property>
    <name>fs.trash.checkpoint.interval</name>
    <value>1440</value>
    <description>Trash checkpoint interval in minutes.</description>
</property>
21. Configure yarn-site.xml
node1:~ # vim /data/hadoop-2.7.3/etc/hadoop/yarn-site.xml
<property>
    <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
    <value>5000</value>
    <description>How long the AM waits before reconnecting after losing contact with the scheduler</description>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    <description>Auxiliary service run on the NodeManager; must be mapreduce_shuffle for MapReduce jobs to run</description>
</property>
<property>
    <name>yarn.resourcemanager.ha.enabled</name>
    <value>true</value>
    <description>Whether RM HA is enabled; default is false (disabled)</description>
</property>
<property>
    <name>yarn.resourcemanager.cluster-id</name>
    <value>cluster1</value>
    <description>Cluster id; the elector uses it to ensure the RM does not become active for another cluster</description>
</property>
<property>
    <name>yarn.resourcemanager.ha.rm-ids</name>
    <value>rm1,rm2</value>
    <description>Comma-separated list of logical RM ids, e.g. rm1,rm2</description>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm1</name>
    <value>node3</value>
    <description>Hostname of rm1</description>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:8030</value>
    <description>Address the RM exposes to AMs; AMs request and release resources through it</description>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:8031</value>
    <description>Address the RM exposes to NMs; NMs send heartbeats and pick up tasks through it</description>
</property>
<property>
    <name>yarn.resourcemanager.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:8032</value>
    <description>Address the RM exposes to clients; clients submit applications through it</description>
</property>
<property>
    <name>yarn.resourcemanager.admin.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:8033</value>
    <description>Address the RM exposes to administrators for admin commands</description>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm1</name>
    <value>${yarn.resourcemanager.hostname.rm1}:8088</value>
    <description>Web UI address of the RM; cluster information can be viewed here in a browser</description>
</property>
<property>
    <name>yarn.resourcemanager.hostname.rm2</name>
    <value>node4</value>
</property>
<property>
    <name>yarn.resourcemanager.scheduler.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:8030</value>
</property>
<property>
    <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:8031</value>
</property>
<property>
    <name>yarn.resourcemanager.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:8032</value>
</property>
<property>
    <name>yarn.resourcemanager.admin.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:8033</value>
</property>
<property>
    <name>yarn.resourcemanager.webapp.address.rm2</name>
    <value>${yarn.resourcemanager.hostname.rm2}:8088</value>
</property>
<property>
    <name>yarn.resourcemanager.recovery.enabled</name>
    <value>true</value>
    <description>Default is false, i.e. if the ResourceManager dies, running applications are not recovered after it restarts</description>
</property>
<property>
    <name>yarn.resourcemanager.store.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    <description>Class used to store RM state</description>
</property>
<property>
    <name>yarn.resourcemanager.zk-address</name>
    <value>node1:2181,node2:2181,node3:2181</value>
</property>
<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>240000</value>
    <description>Total physical memory available to the NodeManager on this node</description>
</property>
<property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>24</value>
    <description>Number of virtual CPU cores available to the NodeManager on this node</description>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
    <description>Minimum physical memory a single container can request</description>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>240000</value>
    <description>Maximum physical memory a single container can request</description>
</property>
<property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>Minimum number of virtual CPU cores a single container can request</description>
</property>
<property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>24</value>
    <description>Maximum number of virtual CPU cores a single container can request</description>
</property>
<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
    <description>Maximum virtual memory a task may use per 1 MB of physical memory; default is 2.1</description>
</property>
22. Configure mapred-site.xml
node1:~ # cp /data/hadoop-2.7.3/etc/hadoop/mapred-site.xml{.template,}
node1:~ # vim /data/hadoop-2.7.3/etc/hadoop/mapred-site.xml
<property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
23. Configure hdfs-site.xml
node1:~ # vim /data/hadoop-2.7.3/etc/hadoop/hdfs-site.xml
<property>
    <name>dfs.replication</name>
    <value>2</value>
    <description>Number of block replicas</description>
</property>
<property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
</property>
<property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>node1:8020</value>
</property>
<property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>node2:8020</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn1</name>
    <value>node1:50070</value>
</property>
<property>
    <name>dfs.namenode.http-address.mycluster.nn2</name>
    <value>node2:50070</value>
</property>
<property>
    <name>dfs.namenode.shared.edits.dir</name>
    <value>qjournal://node1:8485;node2:8485;node3:8485/mycluster</value>
</property>
<property>
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
    <name>dfs.ha.fencing.methods</name>
    <value>sshfence</value>
</property>
<property>
    <name>dfs.ha.fencing.ssh.private-key-files</name>
    <value>/root/.ssh/id_dsa</value>
</property>
<property>
    <name>dfs.journalnode.edits.dir</name>
    <value>/data/hadoop-2.7.3/data/journal</value>
</property>
<property>
    <name>dfs.permissions.superusergroup</name>
    <value>root</value>
    <description>Name of the superuser group</description>
</property>
<property>
    <name>dfs.ha.automatic-failover.enabled</name>
    <value>true</value>
    <description>Enable automatic failover</description>
</property>
Create the corresponding directories:
node1:~ # mkdir -pv /data/hadoop-2.7.3/data/{journal,tmp}
24. Configure capacity-scheduler.xml
node1:~ # vim /data/hadoop-2.7.3/etc/hadoop/capacity-scheduler.xml
<property>
    <name>yarn.scheduler.capacity.maximum-applications</name>
    <value>10000</value>
    <description>
      Maximum number of applications that can be pending and running.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
    <value>0.1</value>
    <description>
      Maximum percent of resources in the cluster which can be used to run
      application masters i.e. controls number of concurrent running
      applications.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    <description>
      The ResourceCalculator implementation to be used to compare
      Resources in the scheduler.
      The default i.e. DefaultResourceCalculator only uses Memory while
      DominantResourceCalculator uses dominant-resource to compare
      multi-dimensional resources such as Memory, CPU etc.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>default</value>
    <description>
      The queues at the this level (root is the root queue).
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.default.capacity</name>
    <value>100</value>
    <description>Default queue target capacity.</description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
    <value>1</value>
    <description>
      Default queue user limit a percentage from 0.0 to 1.0.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
    <value>100</value>
    <description>
      The maximum capacity of the default queue.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.default.state</name>
    <value>RUNNING</value>
    <description>
      The state of the default queue. State can be one of RUNNING or STOPPED.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
    <value>*</value>
    <description>
      The ACL of who can submit jobs to the default queue.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
    <value>*</value>
    <description>
      The ACL of who can administer jobs on the default queue.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.node-locality-delay</name>
    <value>40</value>
    <description>
      Number of missed scheduling opportunities after which the CapacityScheduler
      attempts to schedule rack-local containers.
      Typically this should be set to number of nodes in the cluster, By default is setting
      approximately number of nodes in one rack which is 40.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.queue-mappings</name>
    <value></value>
    <description>
      A list of mappings that will be used to assign jobs to queues
      The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
      Typically this list will be used to map users to queues,
      for example, u:%user:%user maps all users to queues with the same name
      as the user.
    </description>
</property>
<property>
    <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
    <value>false</value>
    <description>
      If a queue mapping is present, will it override the value specified
      by the user? This can be used by administrators to place jobs in queues
      that are different than the one specified by the user.
      The default is false.
    </description>
</property>
25. Configure slaves
node1:~ # vim /data/hadoop-2.7.3/etc/hadoop/slaves
node1
node2
node3
node4
26. Edit $HADOOP_HOME/sbin/hadoop-daemon.sh
node1: /data/hadoop-2.7.3 # cd /data/hadoop-2.7.3/sbin/
# Add this line to hadoop-daemon.sh:
HADOOP_PID_DIR=/data/hdfs/pids
27. Edit $HADOOP_HOME/sbin/yarn-daemon.sh
# Add this line to yarn-daemon.sh (this script reads YARN_PID_DIR rather than HADOOP_PID_DIR):
YARN_PID_DIR=/data/hdfs/pids
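If the PID directory /data/hdfs/pids does not already exist on a node (it is also reused by HBASE_PID_DIR below), it can be created up front on all of them, for example:
node1:~ # for ip in 192.168.0.10 192.168.0.11 192.168.0.12 192.168.0.13; do ssh root@$ip "mkdir -p /data/hdfs/pids"; done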
28. Distribute
node1: /data/hadoop-2.7.3/etc/hadoop/ # scp -rp /data/hadoop-2.7.3  [email protected]:/data
node1: /data/hadoop-2.7.3/etc/hadoop/ # scp -rp /data/hadoop-2.7.3  [email protected]:/data
node1: /data/hadoop-2.7.3/etc/hadoop/ # scp -rp /data/hadoop-2.7.3  [email protected]:/data

 
V. Install HBase
29. Extract HBase
node1:/data # cd /data/install
node1:/data/install # tar -zxvf  hbase-1.2.5-bin.tar.gz  -C /data
30. Edit $HBASE_HOME/conf/hbase-env.sh and add:
node1:/data # cd /data/hbase-1.2.5/conf
node1: /data/hbase-1.2.5 # vim hbase-env.sh
export HBASE_HOME=/data/hbase-1.2.5
export JAVA_HOME=/usr/jdk1.8.0_66
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$HADOOP_HOME/lib/native/
export HBASE_LIBRARY_PATH=$HBASE_LIBRARY_PATH:$HBASE_HOME/lib/native/
# Pointing HBASE_CLASSPATH at Hadoop's etc/hadoop directory lets HBase find the Hadoop
# configuration, i.e. it links HBase to Hadoop [required, otherwise the HMaster will not start]
export HBASE_CLASSPATH=$HADOOP_HOME/etc/hadoop
export HBASE_MANAGES_ZK=false               # do not use the ZooKeeper bundled with HBase
export HBASE_PID_DIR=/data/hdfs/pids
export HBASE_SSH_OPTS="-o ConnectTimeout=1 -p 36928"                # sshd port
31. Edit the regionservers file
node1: /data/hbase-1.2.5 # vim regionservers
node1
node2
node3
node4
node1: /data/hbase-1.2.5 #
32. Edit hbase-site.xml
node1:/data/hbase-1.2.5/conf # vim hbase-site.xml
<property>
    <name>hbase.rootdir</name>
    <value>hdfs://mycluster/hbase</value>
</property>
<property>
    <name>hbase.zookeeper.quorum</name>
    <value>node1,node2,node3</value>
</property>
<property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
</property>
33. Distribute
node1: /data/hbase-1.2.5/conf # scp -rp /data/hbase-1.2.5  [email protected]:/data
node1: /data/hbase-1.2.5/conf # scp -rp /data/hbase-1.2.5  [email protected]:/data
node1: /data/hbase-1.2.5/conf # scp -rp /data/hbase-1.2.5  [email protected]:/data

VI. Install Spark
34. Extract Spark
node1:/data #cd /data/install
node1:/data/install # tar -zxvf spark-2.0.2-bin-hadoop2.7.tgz  -C /data
35. Rename the extracted directory to spark-2.0.2
node1:/data # mv spark-2.0.2-bin-hadoop2.7 spark-2.0.2
36. Configure spark-env.sh
node1:/data #cd /data/spark-2.0.2/conf/
node1: /data/spark-2.0.2/conf/ #cp spark-env.sh.template spark-env.sh
node1: /data/spark-2.0.2/conf/ #vim spark-env.sh                     
# Add:
export JAVA_HOME=/usr/jdk1.8.0_66
export SPARK_PID_DIR=/data/spark-2.0.2/conf/pids
# worker memory
export SPARK_WORKER_MEMORY=240g
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export LD_LIBRARY_PATH=$HADOOP_HOME/lib/native
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_INSTANCES=1
export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=18080 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://mycluster/directory"
# cap the default number of cores an application may claim
export SPARK_MASTER_OPTS="-Dspark.deploy.defaultCores=12"
export SPARK_SSH_OPTS="-p 36928 -o StrictHostKeyChecking=no $SPARK_SSH_OPTS"
# if memory per JVM is below 32 GB, the following can also be set
export SPARK_JAVA_OPTS="-XX:+UseCompressedOops -XX:+UseCompressedStrings $SPARK_JAVA_OPTS"
37. Configure spark-defaults.conf
node1:/data #cd /data/spark-2.0.2/conf/
node1: /data/spark-2.0.2/conf/ #cp spark-defaults.conf.template spark-defaults.conf
node1: /data/spark-2.0.2/conf/ #vi spark-defaults.conf
# Add:
spark.serializer                  org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs://mycluster/directory
spark.local.dir                  /data/spark-2.0.2/sparktmp
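spark.eventLog.dir and spark.history.fs.logDirectory point at hdfs://mycluster/directory, and spark.local.dir at a local scratch path; neither exists by default. A sketch of creating them, once HDFS is up (section VII) and on every node for the local directory:
node1:~ # hdfs dfs -mkdir -p hdfs://mycluster/directory
node1:~ # for ip in 192.168.0.10 192.168.0.11 192.168.0.12 192.168.0.13; do ssh root@$ip "mkdir -p /data/spark-2.0.2/sparktmp"; done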
38. Configure slaves
node1:/data #cd /data/spark-2.0.2/conf/
node1: /data/spark-2.0.2/conf/ #mv slaves.template slaves
node1: /data/spark-2.0.2/conf/ # vim slaves
node1
node2
node3
node4
node1: /data/spark-2.0.2/conf/ #
39. Distribute
node1: /data/spark-2.0.2/conf/ # scp -rp /data/spark-2.0.2  [email protected]:/data
node1: /data/spark-2.0.2/conf/ # scp -rp /data/spark-2.0.2  [email protected]:/data
node1: /data/spark-2.0.2/conf/ # scp -rp /data/spark-2.0.2  [email protected]:/data
VII. Startup
40. Start ZooKeeper on all ZooKeeper nodes
node1:/data #cd /data/zookeeper-3.4.6/bin
node1: /data/zookeeper-3.4.6/bin #zkServer.sh start
node2: /data/zookeeper-3.4.6/bin #zkServer.sh start
node3: /data/zookeeper-3.4.6/bin #zkServer.sh start
41. Start all JournalNode daemons
node1:/data #cd /data/hadoop-2.7.3
node1:/data/hadoop-2.7.3 #sbin/hadoop-daemon.sh start journalnode
node2:/data/hadoop-2.7.3 #sbin/hadoop-daemon.sh start journalnode
node3:/data/hadoop-2.7.3 #sbin/hadoop-daemon.sh start journalnode
42. Format the NameNode (primary node, node1)
node1:/data #cd /data/hadoop-2.7.3
node1:/data/hadoop-2.7.3 #./bin/hdfs namenode -format
43. Start the newly formatted NameNode (primary node, node1)
node1:/data/hadoop-2.7.3 #./sbin/hadoop-daemon.sh start namenode
44. Run the sync command on the unformatted NameNode (standby node, node2)
node2:/data/hadoop-2.7.3 #./bin/hdfs namenode -bootstrapStandby
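One step the listing does not show explicitly: because dfs.ha.automatic-failover.enabled is true, the ZKFailoverController needs its znode initialized in ZooKeeper before HDFS is first started. If that has not been done yet, it can be initialized once from node1:
node1:/data/hadoop-2.7.3 # ./bin/hdfs zkfc -formatZK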
45. Start HDFS
node1:/data/hadoop-2.7.3 #./sbin/hadoop-daemon.sh start namenode
node1:/data/hadoop-2.7.3 #./sbin/start-dfs.sh
46. Start YARN
node1:~ # $HADOOP_HOME/sbin/start-yarn.sh
47. Start the ResourceManager on the two ResourceManager nodes
node3:~ # $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
node4:~ # $HADOOP_HOME/sbin/yarn-daemon.sh start resourcemanager
The HDFS and YARN web consoles listen on ports 50070 and 8088 by default; open them in a browser to check the cluster.
Stop commands:
$HADOOP_HOME/sbin/stop-dfs.sh
$HADOOP_HOME/sbin/stop-yarn.sh
If everything is running correctly, jps shows the Hadoop daemons; on my machine the output is:
7312 Jps
1793 NameNode
2163 JournalNode
357 NodeManager
2696 QuorumPeerMain
14428 DFSZKFailoverController
1917 DataNode
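At this point a quick smoke test can confirm HDFS and YARN are usable, for example writing a file and running the bundled Pi estimator (a sketch; the examples jar path assumes the default layout of the Hadoop 2.7.3 distribution):
node1:~ # hdfs dfs -mkdir -p /tmp/test
node1:~ # hdfs dfs -put /etc/hosts /tmp/test/
node1:~ # hdfs dfs -ls /tmp/test
node1:~ # yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 5 10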
48. Start HBase
node1:/data/hadoop-2.7.3 #cd /data/hbase-1.2.5/bin
node1:/data/hbase-1.2.5/bin #./start-hbase.sh
node1:/data/hbase-1.2.5/bin # jps
7312 Jps
8463 HMaster
1793 NameNode
2163 JournalNode
357 NodeManager
14632 HRegionServer
2696 QuorumPeerMain
14428 DFSZKFailoverController
1917 DataNode
 
HBase web UI: http://node1:16010
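A simple way to confirm the HMaster and RegionServers registered correctly is the status command in the HBase shell (sketch):
node1:~ # echo "status" | hbase shell
# should report 4 region servers (node1-node4) and 0 dead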
49. Start Spark
node1:/data/hbase-1.2.5/bin # cd /data/spark-2.0.2/sbin
node1:/data/spark-2.0.2/sbin # ./start-all.sh
node1:/data/spark-2.0.2/sbin # ./start-history-server.sh
node1:/data/spark-2.0.2/sbin # jps
7312 Jps
8463 HMaster
1793 NameNode
2163 JournalNode
4901 Worker
357 NodeManager
14632 HRegionServer
2696 QuorumPeerMain
14428 DFSZKFailoverController
1917 DataNode
1722 Master
node1:/data/spark-2.0.2/sbin #
 
Spark master web UI: http://node1:8080
Spark application history UI: http://node1:18080
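To verify the standalone cluster end to end, the bundled SparkPi example can be submitted against the master (a sketch; the examples jar name below is assumed to be the one shipped with spark-2.0.2-bin-hadoop2.7):
node1:~ # spark-submit --master spark://node1:7077 \
    --class org.apache.spark.examples.SparkPi \
    /data/spark-2.0.2/examples/jars/spark-examples_2.11-2.0.2.jar 100
# the driver output should end with a line like "Pi is roughly 3.14..."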

