Integrating Spark with Hadoop

For setting up the Hadoop environment itself, see the article "Hadoop 3.2.2 cluster setup".

Environment

CentOS 7, JDK 1.8.0_311, Scala 2.12.15, ZooKeeper 3.6.3, Hadoop 3.2.2, Spark 3.2.1 (spark-3.2.1-bin-hadoop3.2)

Spark configuration

  1. Edit ${SPARK_HOME}/conf/spark-defaults.conf and add the following:
spark.serializer                   org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled             true
spark.eventLog.dir                 hdfs://vmcluster/spark-history
spark.eventLog.compress            true
spark.yarn.historyServer.address   node-3:18080
spark.history.ui.port              18080
spark.history.fs.logDirectory      hdfs://vmcluster/spark-history
spark.history.retainedApplications 10
spark.history.fs.update.interval   5s

Note: rename spark-defaults.conf.template to spark-defaults.conf.
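The history server can only list applications whose event logs it finds, so spark.eventLog.dir (where applications write) and spark.history.fs.logDirectory (where the history server reads) should point to the same place, as they do above. A minimal sketch of a consistency check, assuming the whitespace-separated "key value" layout used in this file; `conf_value` and `check_event_log_dirs` are hypothetical helper names, not Spark tools:

```shell
# Hypothetical helper: print the value of a key from a spark-defaults.conf file.
# Assumes "key<whitespace>value" lines; "#" comment lines never match a key.
conf_value() {
  local conf_file=$1 key=$2
  awk -v k="$key" '$1 == k { print $2; exit }' "$conf_file"
}

# Hypothetical helper: succeed only if applications write event logs to the
# same directory the history server reads them from.
check_event_log_dirs() {
  local conf_file=$1 write_dir read_dir
  write_dir=$(conf_value "$conf_file" spark.eventLog.dir)
  read_dir=$(conf_value "$conf_file" spark.history.fs.logDirectory)
  if [ -n "$write_dir" ] && [ "$write_dir" = "$read_dir" ]; then
    echo "OK: event logs go to $write_dir"
    return 0
  fi
  echo "MISMATCH: spark.eventLog.dir='$write_dir' spark.history.fs.logDirectory='$read_dir'" >&2
  return 1
}
```

For example, `check_event_log_dirs "${SPARK_HOME}/conf/spark-defaults.conf"` should print the HDFS path configured above. Lines written as `key=value` are not handled by this sketch.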

  2. Edit ${SPARK_HOME}/conf/spark-env.sh and add the following:
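export JAVA_HOME=/home/bigdata/env/jdk1.8.0_311
export SCALA_HOME=/home/bigdata/env/scala-2.12.15
export SPARK_HOME=/home/bigdata/env/spark-3.2.1-bin-hadoop3.2
# SPARK_CONF_DIR is the variable Spark reads; ${SPARK_HOME}/conf is also its default
export SPARK_CONF_DIR=${SPARK_HOME}/conf
export HADOOP_HOME=/home/bigdata/env/hadoop-3.2.2
# Spark on YARN locates HDFS and the ResourceManager through the Hadoop client configs here
export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop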

Note: rename spark-env.sh.template to spark-env.sh.
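Spark finds HDFS and the YARN ResourceManager through the Hadoop client configuration files in HADOOP_CONF_DIR / YARN_CONF_DIR. A quick sketch that checks the directory set above, assuming the standard core-site.xml, hdfs-site.xml and yarn-site.xml file names; `check_hadoop_conf_dir` is a hypothetical helper. hdfs-site.xml matters here because a logical name without host:port, such as vmcluster, is normally an HA nameservice defined in that file.

```shell
# Hypothetical helper: report any Hadoop client config file missing from a directory.
# Returns 0 if all three files exist, 1 otherwise.
check_hadoop_conf_dir() {
  local dir=$1 missing=0 f
  for f in core-site.xml hdfs-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

After sourcing spark-env.sh, `check_hadoop_conf_dir "$HADOOP_CONF_DIR"` should return silently.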

Start the history server

On node-3, the host configured in spark.yarn.historyServer.address, run:

${SPARK_HOME}/sbin/start-history-server.sh
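With spark.eventLog.enabled set to true, both applications and the history server expect the event log directory to exist already; if it is missing, applications typically fail during startup and the history server refuses to start. A sketch that creates it before launching the server, assuming the `hdfs` client is on PATH; `start_spark_history` is a hypothetical wrapper, and its default path is copied from spark-defaults.conf above:

```shell
# Hypothetical wrapper: make sure the event log directory exists on HDFS,
# then start the Spark history server.
start_spark_history() {
  local log_dir=${1:-hdfs://vmcluster/spark-history}
  # -p: succeed even if the directory already exists
  hdfs dfs -mkdir -p "$log_dir" || return 1
  "${SPARK_HOME}/sbin/start-history-server.sh"
}
```

Once the server is up, `curl http://node-3:18080/api/v1/applications` should return a JSON array of the applications it has found.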

Testing

Submit the SparkPi example that ships with Spark as a test job, using the following command:

spark-submit \
--class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--num-executors 1 \
--executor-memory 512m \
--executor-cores 1 \
--queue bigdata \
${SPARK_HOME}/examples/jars/spark-examples*.jar \
100

Note: the SPARK_HOME environment variable must be set in your shell, since the command uses it to locate the examples jar.
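On YARN, Spark requests more memory per container than --driver-memory or --executor-memory, because it adds a memory overhead on top. A sketch of Spark 3.2's default rule, assuming spark.driver.memoryOverhead and spark.executor.memoryOverhead are not set (overhead = max(10% of the memory, 384 MiB)); in cluster mode the driver runs inside the ApplicationMaster, so the AM request follows --driver-memory. `container_request_mb` is a hypothetical name:

```shell
# Approximate Spark's default YARN container request in MiB:
# memory + max(memory / 10, 384). Integer division truncates, like Spark's .toLong.
container_request_mb() {
  local mem_mb=$1 overhead
  overhead=$(( mem_mb / 10 ))
  if [ "$overhead" -lt 384 ]; then
    overhead=384
  fi
  echo $(( mem_mb + overhead ))
}

echo "AM/driver container: $(container_request_mb 1024) MB"  # --driver-memory 1g
echo "executor container:  $(container_request_mb 512) MB"   # --executor-memory 512m
```

For the command above this gives 1408 MB for the AM, which matches the "Will allocate AM container, with 1408 MB memory including 384 MB overhead" line in the log below, and 896 MB per executor. YARN may then round each request up to a multiple of yarn.scheduler.minimum-allocation-mb.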
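Because the job is submitted in cluster mode, the driver runs inside the YARN ApplicationMaster container, so the result is not printed to the console. The console log output is as follows: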

2022-03-16 10:43:41,387 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2022-03-16 10:43:41,784 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers
2022-03-16 10:43:42,334 INFO conf.Configuration: resource-types.xml not found
2022-03-16 10:43:42,335 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2022-03-16 10:43:42,357 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
2022-03-16 10:43:42,358 INFO yarn.Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
2022-03-16 10:43:42,358 INFO yarn.Client: Setting up container launch context for our AM
2022-03-16 10:43:42,359 INFO yarn.Client: Setting up the launch environment for our AM container
2022-03-16 10:43:42,367 INFO yarn.Client: Preparing resources for our AM container
2022-03-16 10:43:42,487 WARN yarn.Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
2022-03-16 10:43:43,802 INFO yarn.Client: Uploading resource file:/tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82/__spark_libs__7226558732161014901.zip -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/__spark_libs__7226558732161014901.zip
2022-03-16 10:43:56,526 INFO yarn.Client: Uploading resource file:/home/bigdata/env/spark-3.2.1-bin-hadoop3.2/examples/jars/spark-examples_2.12-3.2.1.jar -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/spark-examples_2.12-3.2.1.jar
2022-03-16 10:43:57,009 INFO yarn.Client: Uploading resource file:/tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82/__spark_conf__3589752284083344005.zip -> hdfs://lvcluster/user/bigdata/.sparkStaging/application_1647396476966_0002/__spark_conf__.zip
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing view acls to: bigdata
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing modify acls to: bigdata
2022-03-16 10:43:57,203 INFO spark.SecurityManager: Changing view acls groups to: 
2022-03-16 10:43:57,204 INFO spark.SecurityManager: Changing modify acls groups to: 
2022-03-16 10:43:57,204 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(bigdata); groups with view permissions: Set(); users  with modify permissions: Set(bigdata); groups with modify permissions: Set()
2022-03-16 10:43:57,254 INFO yarn.Client: Submitting application application_1647396476966_0002 to ResourceManager
2022-03-16 10:43:57,515 INFO impl.YarnClientImpl: Submitted application application_1647396476966_0002
2022-03-16 10:43:58,520 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:43:58,522 INFO yarn.Client: 
         client token: N/A
         diagnostics: AM container is launched, waiting for AM container to Register with RM
         ApplicationMaster host: N/A
         ApplicationMaster RPC port: -1
         queue: root.bigdata
         start time: 1647398637277
         final status: UNDEFINED
         tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
         user: bigdata
2022-03-16 10:43:59,527 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:00,537 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:01,548 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:02,555 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:03,557 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:04,562 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:05,564 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:06,574 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:07,588 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:08,595 INFO yarn.Client: Application report for application_1647396476966_0002 (state: ACCEPTED)
2022-03-16 10:44:09,605 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:09,605 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: server1
         ApplicationMaster RPC port: 44451
         queue: root.bigdata
         start time: 1647398637277
         final status: UNDEFINED
         tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
         user: bigdata
2022-03-16 10:44:10,617 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:11,630 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:12,643 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:13,653 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:14,658 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:15,667 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:16,709 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:17,722 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:18,727 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:19,730 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:20,737 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:21,749 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:22,752 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:23,760 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:24,782 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:25,791 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:26,793 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:27,803 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:28,809 INFO yarn.Client: Application report for application_1647396476966_0002 (state: RUNNING)
2022-03-16 10:44:29,822 INFO yarn.Client: Application report for application_1647396476966_0002 (state: FINISHED)
2022-03-16 10:44:29,823 INFO yarn.Client: 
         client token: N/A
         diagnostics: N/A
         ApplicationMaster host: server1
         ApplicationMaster RPC port: 44451
         queue: root.bigdata
         start time: 1647398637277
         final status: SUCCEEDED
         tracking URL: http://server1:8088/proxy/application_1647396476966_0002/
         user: bigdata
2022-03-16 10:44:29,843 INFO util.ShutdownHookManager: Shutdown hook called
2022-03-16 10:44:29,844 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-d6ff4da4-4283-43fb-a517-9085d51a1e82
2022-03-16 10:44:29,848 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-35dc976c-c371-4888-acc8-25e3a44d60a5
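In cluster mode the "Pi is roughly ..." line printed by SparkPi lands in the driver's container log, not on the console. A sketch for pulling the application ID and final status out of a saved copy of the client log above; `app_id_from_log` and `final_status_from_log` are hypothetical helpers, and capturing the log with `spark-submit ... 2>&1 | tee submit.log` is an assumption about how you save it:

```shell
# Hypothetical helper: print the YARN application ID from the
# "Submitted application application_..." line of a spark-submit client log.
app_id_from_log() {
  sed -n 's/.*Submitted application \(application_[0-9_]*\).*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: print the last "final status:" value (e.g. SUCCEEDED).
final_status_from_log() {
  awk '/final status:/ { s = $3 } END { print s }' "$1"
}
```

With YARN log aggregation enabled (yarn.log-aggregation-enable), something like `yarn logs -applicationId "$(app_id_from_log submit.log)" | grep "Pi is roughly"` should then show the computed value.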

YARN web UI

(Screenshots 1 to 3: the SparkPi application in the YARN web UI)

Navigating from the YARN web UI to the Spark web UI

(Screenshots 4 and 5: the Spark web UI opened from the YARN web UI)
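The jump works through the tracking URL shown in the client log: while the application runs, the ResourceManager proxies the Spark UI at http://server1:8088/proxy/<application id>/, and after it finishes, spark.yarn.historyServer.address lets YARN link to the history server instead. A sketch that prints both entry points for an application ID; `spark_ui_urls` is a hypothetical helper, and the second URL uses the history server's REST API (/api/v1/applications/<app-id>) rather than the HTML UI:

```shell
# Hypothetical helper: print the RM proxy URL and the history server REST URL
# for one application. Arguments: RM web address, history server address, app ID.
spark_ui_urls() {
  local rm_addr=$1 history_addr=$2 app_id=$3
  # Live Spark UI, proxied by the ResourceManager while the app is running
  echo "http://${rm_addr}/proxy/${app_id}/"
  # Application info from the history server's REST API
  echo "http://${history_addr}/api/v1/applications/${app_id}"
}
```

For the run above: `spark_ui_urls server1:8088 node-3:18080 application_1647396476966_0002`.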

The whole process is fairly simple, so I won't go into more detail.
