Hadoop + Spark + ZooKeeper HA cluster deployment

Hostname   IP address       Installed software              Running processes
Node10     192.168.18.23    jdk, hadoop, spark              NameNode, ResourceManager, zkfc, Master
Node20     192.168.18.230   jdk, hadoop, spark              NameNode, ResourceManager, zkfc, Master
Node30     192.168.18.248   jdk, hadoop, zookeeper, spark   DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker
Node40     192.168.18.246   jdk, hadoop, zookeeper, spark   DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker
Node50     192.168.18.232   jdk, hadoop, zookeeper, spark   DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker

 

 

1. Disable SELinux and iptables

vi /etc/sysconfig/selinux

SELINUX=disabled

service iptables stop

chkconfig iptables off

 

2. Configure /etc/hosts (the addresses must match the cluster table above)

vi /etc/hosts

192.168.18.23     node10

192.168.18.230    node20

192.168.18.248    node30

192.168.18.246    node40

192.168.18.232    node50

 

3. Create the user and set up passwordless SSH login

[root@localhost ~]# useradd heren

[root@localhost ~]# passwd heren

[root@localhost ~]# su - heren

[heren@localhost ~]$ ssh-keygen -t rsa

[heren@localhost ~]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[heren@localhost ~]$ chmod 600 .ssh/authorized_keys

 

The following steps only need to be run on node10 (collect every node's public key, then push the merged authorized_keys file back out):

[heren@node10 ~]$  ssh node20 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[heren@node10 ~]$  ssh node30 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[heren@node10 ~]$  ssh node40 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[heren@node10 ~]$  ssh node50 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

[heren@node10 ~]$ scp ~/.ssh/authorized_keys node20:/home/heren/.ssh/authorized_keys

[heren@node10 ~]$ scp ~/.ssh/authorized_keys node30:/home/heren/.ssh/authorized_keys

[heren@node10 ~]$ scp ~/.ssh/authorized_keys node40:/home/heren/.ssh/authorized_keys

[heren@node10 ~]$ scp ~/.ssh/authorized_keys node50:/home/heren/.ssh/authorized_keys
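The eight per-node commands above can be sketched as a loop. The helper below is a hypothetical dry-run (not part of the original steps): it only prints each command so they can be reviewed before being pasted back into the shell.

```shell
# Dry-run sketch of the key-gathering and redistribution steps above.
# It prints each command instead of running it; paste the output into
# a shell on node10 (passwordless ssh assumed) to actually execute it.
print_key_sync_cmds() {
  local h
  for h in node20 node30 node40 node50; do
    echo "ssh $h cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys"
  done
  for h in node20 node30 node40 node50; do
    echo "scp ~/.ssh/authorized_keys $h:/home/heren/.ssh/authorized_keys"
  done
}
print_key_sync_cmds
```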

 

 

4. Install JDK 1.7 and configure environment variables

rpm -qa | grep java | xargs rpm -e --nodeps

mkdir -p /usr/java

[root@localhost ~]#cp /root/jdk-7u80-linux-x64.tar.gz /usr/java/

[root@localhost java]#tar xf jdk-7u80-linux-x64.tar.gz

[root@localhost ~]#vi /etc/profile

 

export JAVA_HOME=/usr/java/jdk1.7.0_80

export JRE_HOME=/usr/java/jdk1.7.0_80/jre

export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH

export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

 

export ZOOKEEPER_HOME=/usr/local/software/zookeeper

export CLASSPATH=$CLASSPATH:$ZOOKEEPER_HOME/lib

export PATH=$PATH:$ZOOKEEPER_HOME/bin

 

export HADOOP_HOME=/usr/local/software/hadoop

export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

 

export SCALA_HOME=/usr/local/software/scala

export PATH=$SCALA_HOME/bin:$PATH

 

export SPARK_HOME=/usr/local/software/spark

export PATH=$SPARK_HOME/bin:$PATH

                                               

[root@localhost ~]# source /etc/profile

[root@localhost ~]#java -version

 

5. Create directories

Create the application directory:

[root@node10 ~]# mkdir -p /usr/local/software

Change its ownership to the heren user:

[root@node10 ~]# chown -R heren:heren /usr/local/software

Create the data directories:

mkdir -p /data/hadoop/

mkdir -p /data/spark/

mkdir -p /data/zookeeper/

chown -R heren:heren /data/

 

6. Install the ZooKeeper cluster (node30, node40, node50)

[root@node30 software]# su - heren

[heren@node30 ~]$ cd /usr/local/software/

[heren@node30 software]$ tar xf zookeeper-3.4.6.tar.gz

[heren@node30 software]$ mv zookeeper-3.4.6 zookeeper

Edit the configuration file:

[heren@node30 software]$ cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg

[heren@node30 software]$ vi zookeeper/conf/zoo.cfg

dataLogDir=/data/zookeeper/logs

dataDir=/data/zookeeper/data

server.1=node30:2888:3888

server.2=node40:2888:3888

server.3=node50:2888:3888

Create the myid file. The value on each node must match its server.N line in zoo.cfg:

mkdir -p /data/zookeeper/data/

echo '1' > /data/zookeeper/data/myid    # on node30

echo '2' > /data/zookeeper/data/myid    # on node40

echo '3' > /data/zookeeper/data/myid    # on node50
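Since the three commands differ only in the id, a small helper (hypothetical, not in the original notes) can derive the myid value from the hostname, following the server.N mapping in zoo.cfg:

```shell
# Map a hostname to its ZooKeeper myid, following the
# server.1=node30 / server.2=node40 / server.3=node50 lines in zoo.cfg.
myid_for() {
  case "$1" in
    node30) echo 1 ;;
    node40) echo 2 ;;
    node50) echo 3 ;;
    *) echo "not a zookeeper node: $1" >&2; return 1 ;;
  esac
}
# On each ZooKeeper node one would then run:
#   myid_for "$(hostname -s)" > /data/zookeeper/data/myid
myid_for node40
```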

 

Start ZooKeeper on each node (node30, node40, node50):

[heren@node30 zookeeper]$ ./bin/zkServer.sh start

JMX enabled by default

Using config: /usr/local/software/zookeeper/bin/../conf/zoo.cfg

Starting zookeeper ... STARTED

[heren@node30 ~]$ /usr/local/software/zookeeper/bin/zkServer.sh status

JMX enabled by default

Using config: /usr/local/software/zookeeper/bin/../conf/zoo.cfg

Mode: leader

 

Start ZooKeeper:    sh /usr/local/software/zookeeper/bin/zkServer.sh start

Check whether ZooKeeper started successfully:    sh /usr/local/software/zookeeper/bin/zkServer.sh status

Stop ZooKeeper:    sh /usr/local/software/zookeeper/bin/zkServer.sh stop
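To check all three nodes at a glance, the "Mode:" line of the status output shown above (leader/follower) can be extracted. The small filter below is an illustration; the commented ssh loop assumes passwordless login from step 3.

```shell
# Extract the "Mode:" value from zkServer.sh status output
# (format shown above: "Mode: leader" or "Mode: follower").
zk_mode() { awk -F': ' '/^Mode:/ {print $2}'; }

# Illustrative cluster-wide check (assumes passwordless ssh from step 3):
#   for h in node30 node40 node50; do
#     printf '%s: ' "$h"
#     ssh "$h" /usr/local/software/zookeeper/bin/zkServer.sh status 2>/dev/null | zk_mode
#   done

printf 'JMX enabled by default\nMode: leader\n' | zk_mode
```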

 

7. Install and configure the Hadoop cluster (node10-node50)

[root@node10 ~]# su - heren

[heren@node10 ~]$ cd /usr/local/software/

[heren@node10 software]$ tar xf hadoop-2.6.0.tar.gz

[heren@node10 software]$ mv hadoop-2.6.0 hadoop

Edit hadoop-env.sh:

[heren@node10 hadoop]$ vi hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_80

Edit core-site.xml:

[heren@node10 hadoop]$ vi core-site.xml

<configuration>
  <property><name>fs.defaultFS</name><value>hdfs://masters</value></property>
  <property><name>hadoop.tmp.dir</name><value>/usr/local/software/hadoop/tmp</value></property>
  <property><name>ha.zookeeper.quorum</name><value>node30:2181,node40:2181,node50:2181</value></property>
</configuration>

Edit hdfs-site.xml:

[heren@node10 hadoop]$ vi hdfs-site.xml

<configuration>
  <property><name>dfs.nameservices</name><value>masters</value></property>
  <property><name>dfs.ha.namenodes.masters</name><value>node10,node20</value></property>
  <property><name>dfs.namenode.rpc-address.masters.node10</name><value>node10:9000</value></property>
  <property><name>dfs.namenode.http-address.masters.node10</name><value>node10:50070</value></property>
  <property><name>dfs.namenode.rpc-address.masters.node20</name><value>node20:9000</value></property>
  <property><name>dfs.namenode.http-address.masters.node20</name><value>node20:50070</value></property>
  <property><name>dfs.namenode.shared.edits.dir</name><value>qjournal://node30:8485;node40:8485;node50:8485/masters</value></property>
  <property><name>dfs.journalnode.edits.dir</name><value>/usr/local/software/hadoop/journal</value></property>
  <property><name>dfs.ha.automatic-failover.enabled</name><value>true</value></property>
  <property><name>dfs.client.failover.proxy.provider.masters</name><value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value></property>
  <property>
    <name>dfs.ha.fencing.methods</name>
    <value>
      sshfence
      shell(/bin/true)
    </value>
  </property>
  <property><name>dfs.ha.fencing.ssh.private-key-files</name><value>/home/heren/.ssh/id_rsa</value></property>
  <property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value></property>
</configuration>

Edit mapred-site.xml (copy it from the shipped template first):

[heren@node10 hadoop]$ cp mapred-site.xml.template mapred-site.xml

[heren@node10 hadoop]$ vi mapred-site.xml

<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>

Edit yarn-site.xml:

[heren@node10 hadoop]$ vi yarn-site.xml

 

<configuration>
  <property><name>yarn.resourcemanager.ha.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.cluster-id</name><value>RM_HA_ID</value></property>
  <property><name>yarn.resourcemanager.ha.rm-ids</name><value>rm1,rm2</value></property>
  <property><name>yarn.resourcemanager.hostname.rm1</name><value>node10</value></property>
  <property><name>yarn.resourcemanager.hostname.rm2</name><value>node20</value></property>
  <property><name>yarn.resourcemanager.recovery.enabled</name><value>true</value></property>
  <property><name>yarn.resourcemanager.store.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value></property>
  <property><name>yarn.resourcemanager.zk-address</name><value>node30:2181,node40:2181,node50:2181</value></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value></property>
</configuration>

 

Edit slaves:

[heren@node10 hadoop]$ vi slaves

node30

node40

node50

Copy the configured hadoop directory to the other nodes:

[heren@node10 etc]$ scp -r hadoop heren@node20:/usr/local/software/hadoop/etc/

[heren@node10 etc]$ scp -r hadoop heren@node30:/usr/local/software/hadoop/etc/

[heren@node10 etc]$ scp -r hadoop heren@node40:/usr/local/software/hadoop/etc/

[heren@node10 etc]$ scp -r hadoop heren@node50:/usr/local/software/hadoop/etc/

Format and start the Hadoop cluster

Start the JournalNodes (run on node30, node40, and node50):

[heren@node30 software]$  hadoop-daemon.sh start journalnode

starting journalnode, logging to /usr/local/software/hadoop/logs/hadoop-heren-journalnode-node30.out

 

Format HDFS (on node10), then copy the resulting metadata directory to the standby NameNode on node20:

[heren@node10 software]$ hdfs namenode -format

[heren@node10 hadoop]$ scp -r tmp/ heren@node20:/usr/local/software/hadoop/

 

Format the HA state znode in ZooKeeper (run on node10):

[heren@node10 hadoop]$ hdfs zkfc -formatZK

Start HDFS:

[heren@node10 hadoop]$ start-dfs.sh

 

Start YARN:

[heren@node10 hadoop]$ start-yarn.sh

The standby ResourceManager on node20 must be started manually:

[heren@node20 hadoop]$  yarn-daemon.sh start resourcemanager

 

Check cluster status through the web UIs

NameNode:

http://node10:50070/

http://node20:50070/

ResourceManager:

http://node10:8088/

http://node20:8088/

Check cluster status with the hdfs command:

[heren@node10 hadoop]$  hdfs dfsadmin -report
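For scripting, the DataNode count can be pulled out of the report. The filter below assumes the report contains a line like "Live datanodes (3):", as printed by Hadoop 2.x; for this five-node cluster the expected value is 3.

```shell
# Extract the live-DataNode count from "hdfs dfsadmin -report" output;
# assumes a Hadoop 2.x style "Live datanodes (N):" line in the report.
live_datanodes() { sed -n 's/^Live datanodes (\([0-9]*\)).*/\1/p'; }

# On the cluster: hdfs dfsadmin -report | live_datanodes
echo 'Live datanodes (3):' | live_datanodes
```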

After downloading the 64-bit native libraries (hadoop-native-64-2.6.0.tar), extract them into hadoop's lib/native directory, overwriting the existing files:

[heren@node10 lib]$tar xf hadoop-native-64-2.6.0.tar

[heren@node10 lib]$ mv libh* native

 

 

Verify HDFS HA

First, upload a file to HDFS:

[heren@node10 hadoop]$ hadoop fs -put /etc/profile /profile

[heren@node10 hadoop]$ hadoop fs -ls /

Then kill the active NameNode (10020 is its PID here; find yours with jps):

[heren@node10 hadoop]$ kill -9 10020

The file is still there:

[heren@node10 hadoop]$ hadoop fs -ls /

Manually restart the NameNode that was killed:

[heren@node10 hadoop]$ sbin/hadoop-daemon.sh start namenode
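Besides watching the web UI, the HA state of each NameNode can be queried with `hdfs haadmin -getServiceState <id>`, where the ids node10 and node20 come from dfs.ha.namenodes.masters above. The helper below only encodes the decision on the two reported states; the actual cluster query is shown in the comment.

```shell
# Decide which NameNode is active given the two states reported by
# "hdfs haadmin -getServiceState node10" / "... node20" (each prints
# "active" or "standby"). Prints the active host, or "none".
active_nn() {
  # usage: active_nn <state-of-node10> <state-of-node20>
  [ "$1" = "active" ] && { echo node10; return; }
  [ "$2" = "active" ] && { echo node20; return; }
  echo none
}
# On the cluster:
#   active_nn "$(hdfs haadmin -getServiceState node10)" \
#             "$(hdfs haadmin -getServiceState node20)"
active_nn standby active
```

After the kill/restart test above, the formerly standby NameNode should be reported as active.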

 

Verify YARN

Run the WordCount program from the examples shipped with Hadoop:

[heren@node10 mapreduce]$ hadoop jar /usr/local/software/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /profile /out

 

Commands to start and stop the cluster:

/usr/local/software/hadoop/sbin/start-dfs.sh

/usr/local/software/hadoop/sbin/stop-dfs.sh

 

 

8. Install the Spark cluster

[root@node10 ~]#mkdir -p /usr/scala

[root@node10 scala]# tar xf scala-2.11.7.tgz

 

[heren@node10 software]$ tar xf spark-1.5.2-bin-hadoop2.6.tgz

[heren@node10 software]$ mv spark-1.5.2-bin-hadoop2.6 spark

 

[heren@node10 conf]$ cp spark-env.sh.template spark-env.sh

[heren@node10 conf]$ vi spark-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_80

export SCALA_HOME=/usr/scala/scala-2.11.7

export SPARK_WORKER_MEMORY=1g

export HADOOP_CONF_DIR=/usr/local/software/hadoop/etc/hadoop

export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node30:2181,node40:2181,node50:2181 -Dspark.deploy.zookeeper.dir=/spark"



[heren@node10 conf]$ vi spark-defaults.conf

spark.master            spark://node10:7077,node20:7077

spark.serializer        org.apache.spark.serializer.KryoSerializer

spark.eventLog.enabled  true

# the event-log directory must exist and use the HDFS nameservice defined above:
# hadoop fs -mkdir /sparklogs
spark.eventLog.dir      hdfs://masters/sparklogs



[heren@node10 conf]$ vi slaves

node20

node30

node40

node50

 

Start/stop the Spark cluster:

/usr/local/software/spark/sbin/start-all.sh

/usr/local/software/spark/sbin/stop-all.sh
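A quick smoke test after startup is the SparkPi example against the HA master pair configured in spark-defaults.conf. The examples jar path below is an assumption for the spark-1.5.2-bin-hadoop2.6 package and should be verified, so the helper only prints the command.

```shell
# Print a SparkPi smoke-test command for the HA master pair configured
# above; the examples jar location is an assumed path for the
# spark-1.5.2-bin-hadoop2.6 prebuilt package, so verify it first.
sparkpi_cmd() {
  echo "spark-submit --master spark://node10:7077,node20:7077" \
       "--class org.apache.spark.examples.SparkPi" \
       "/usr/local/software/spark/lib/spark-examples-1.5.2-hadoop2.6.0.jar 100"
}
sparkpi_cmd
```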

 
