Spark / Hadoop / Kafka Deployment and Installation Guide

Spark cluster port usage

Service          Port
spark-master     7077
spark-slave      -
hadoop-master    9000
kafka-zookeeper  2181
kafka-master     9092

Note: the ports of the *-master services must be exposed to business applications. Hadoop master/slave and Spark master/slave partly communicate over SSH, so passwordless SSH login must be enabled between the master and the slaves.
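The exposure requirement above can be sanity-checked from an application host. The `check_port` helper below is a hypothetical sketch (it relies on bash's built-in /dev/tcp); the host names are the /etc/hosts aliases configured in section 1.2:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if host:port accepts a TCP connection.
check_port() {
  (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# Ports that must be reachable from business applications
for svc in spark-master:7077 hdfs-master:9000 kafka-master:2181 kafka-master:9092; do
  if check_port "${svc%%:*}" "${svc##*:}"; then
    echo "$svc reachable"
  else
    echo "$svc NOT reachable"
  fi
done
```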

 

1. System environment

1.1 Passwordless SSH login between master and slaves


# Generate a key pair first if none exists
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Also copy the master's public key to each slave (e.g. with ssh-copy-id)
# Make sure permissions are:
#   ~/.ssh           700
#   authorized_keys  744
#   id_rsa.pub       744

1.2 Set master/slave host names: vim /etc/hosts

{host ip}  spark-master
{host ip}  hdfs-master
{host ip}  kafka-master
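Whether the aliases resolve can then be checked on each node (a small sketch; getent consults the same name service the daemons will use):

```shell
# Verify each cluster alias from /etc/hosts resolves on this node
for h in spark-master hdfs-master kafka-master; do
  if getent hosts "$h" >/dev/null; then
    echo "$h resolves"
  else
    echo "$h: not resolvable yet, add it to /etc/hosts"
  fi
done
```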

2 Hadoop

2.1 Download and unpack


# Download
wget http://********/hadoop-2.7.3/hadoop-2.7.3.tar.gz

# Unpack
tar -zxvf hadoop-2.7.3.tar.gz

cd hadoop-2.7.3

2.2 Configure environment variables: vim ~/.bashrc


export JAVA_HOME=/usr/java/latest
export HADOOP_HOME=/*****/hadoop-2.7.3
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export HADOOP_HOME_WARN_SUPPRESS=1
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_HOME/bin:$PATH

2.3 Hadoop configuration

2.3.1 core-site.xml

vim etc/hadoop/core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hdfs-master:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>********/hadoop/tmp</value>
        </property>
        <property>
                <name>fs.trash.interval</name>
                <value>1440</value>
        </property>
</configuration>

mkdir -p ******/hadoop/tmp

2.3.2 hdfs-site.xml

 
vim etc/hadoop/hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.support.append</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
        <value>never</value>
    </property>
</configuration>

2.3.3 log4j.properties

vim etc/hadoop/log4j.properties
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR

2.3.4 mapred-site.xml

cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
vim etc/hadoop/mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
2.3.5 yarn-site.xml

 
vim etc/hadoop/yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
</configuration>
2.4 Format the NameNode

hdfs namenode -format

 
The last few lines of the output should include:
xx/xx/xx xx:xx:xx INFO common.Storage: Storage directory /*****/hadoop/tmp/dfs/name has been successfully formatted.
which indicates the format succeeded.

2.5 Start Hadoop


$HADOOP_HOME/sbin/start-dfs.sh

2.6 Test HDFS


hdfs dfs -put README.txt /README.txt
hdfs dfs -cat /README.txt
# If the README contents are printed, HDFS is working

Hadoop (HDFS) service address: hdfs-master:9000

3 Spark

3.1 Download


wget http://*****/spark-2.1.0-bin-hadoop2.7.tgz

tar -zxvf spark-2.1.0-bin-hadoop2.7.tgz
cd spark-2.1.0-bin-hadoop2.7

3.2 vim ~/.bashrc


export SPARK_HOME=/home/*****/spark-2.1.0-bin-hadoop2.7

3.3 Edit conf/spark-defaults.conf

cp spark-defaults.conf.template spark-defaults.conf


spark.master                     spark://spark-master:7077
spark.eventLog.enabled           true
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.driver.memory              1g
spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"

spark.ui.enabled                 false
spark.executor.memory            1g
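One caveat about the settings above: spark.eventLog.enabled is true but spark.eventLog.dir is not set, and Spark then falls back to file:///tmp/spark-events, which must already exist on the driver host (this default is an assumption based on Spark's documented behavior; adjust the path if you set an explicit directory):

```shell
# Create the default event-log directory assumed when spark.eventLog.dir is unset
mkdir -p /tmp/spark-events
```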

3.4 Edit $SPARK_HOME/conf/log4j.properties

cp log4j.properties.template log4j.properties

Change:
log4j.rootCategory=INFO, console, file

Add:
log4j.appender.file=org.apache.log4j.DailyRollingFileAppender
log4j.appender.file.File=/home/******/spark/log
log4j.appender.file.DatePattern='.'yyyy-MM-dd
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} - %m%n

mkdir -p /home/******/spark/

3.5 Start Spark


$SPARK_HOME/sbin/start-all.sh

 

3.6 Verify that Spark is running

$ jps

14340 SecondaryNameNode
14132 DataNode
13960 NameNode
14760 Master
14953 Jps
14892 Worker

If Master and Worker appear in the list, Spark is running.

Spark master service address: spark-master:7077

4 Kafka

4.1 Download


wget http://apache.mirror.iweb.ca/kafka/0.10.2.0/kafka_2.11-0.10.2.0.tgz
tar -zxvf kafka_2.11-0.10.2.0.tgz
cd kafka_2.11-0.10.2.0

4.2 Kafka port configuration

vim config/server.properties


############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
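The template above leaves listeners commented out, so the broker binds to the default interface. A possible edit (a suggestion, not part of the original doc) so that clients on other hosts connect through the kafka-master alias set up in section 1.2:

```properties
# kafka-master is the /etc/hosts alias from section 1.2
listeners=PLAINTEXT://kafka-master:9092
advertised.listeners=PLAINTEXT://kafka-master:9092
```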

4.3 Start ZooKeeper and Kafka


nohup bin/zookeeper-server-start.sh config/zookeeper.properties  >/dev/null 2>&1 &
nohup bin/kafka-server-start.sh config/server.properties >/dev/null 2>&1 &

4.4 Create a topic


bin/kafka-topics.sh --create --zookeeper kafka-master:2181 --replication-factor 1 --partitions 1 --topic log

4.5 Test


# Create a test topic
bin/kafka-topics.sh --create --zookeeper kafka-master:2181 --replication-factor 1 --partitions 1 --topic test
# Send a message
bin/kafka-console-producer.sh --broker-list kafka-master:9092 --topic test
>  it's a test message!


# In another terminal, consume it
bin/kafka-console-consumer.sh --bootstrap-server kafka-master:9092 --topic test --from-beginning

# If "it's a test message!" is received, Kafka is working
