Spark - Using yarn-client Mode

SparkConf

If you write it like this:

new SparkConf().setMaster("yarn-client")

then debugging inside IDEA fails with:

Exception in thread "main" java.lang.IllegalStateException: Library directory '....../data-platform-task/assembly/target/scala-2.11/jars' does not exist; make sure Spark is built.
	at org.apache.spark.launcher.CommandBuilderUtils.checkState(CommandBuilderUtils.java:248)

According to the official Spark documentation, spark.yarn.jars or spark.yarn.archive needs to be set:

  • spark.yarn.jars: a list of jars containing Spark code; both local jars and HDFS paths are supported (a sketch of this option follows the list).
  • spark.yarn.archive: an archive containing the needed Spark jars.
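For reference, a minimal sketch of the spark.yarn.jars alternative; the HDFS path below is an assumption (globs are allowed per the Spark docs):

// Hedged sketch: point spark.yarn.jars at individual Spark runtime jars on HDFS.
new SparkConf().setMaster("yarn-client")
        .set("spark.yarn.jars", "hdfs://localhost:9000/user/xxx/spark-libs/jars/*.jar");

I chose spark.yarn.archive.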

The program becomes:

new SparkConf().setMaster("yarn-client")
	.set("spark.yarn.archive", getProperty(HDFS_SPARK_ARCHIVE))

Debugging in IDEA again fails with another error:

Caused by: java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1.x$334 of type org.apache.spark.api.java.function.PairFunction in instance of org.apache.spark.api.java.JavaPairRDD$$anonfun$pairFunToScalaFun$1
	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2287)

This happens because the executors cannot find the classes the task depends on: without the application's classes on its classpath, an executor cannot deserialize the lambda. Reading the documentation further:

  • spark.yarn.dist.jars: a comma-separated list of jars to be placed in the working directory of each executor.

Modify the program again:

new SparkConf().setMaster("yarn-client")
	.set("spark.yarn.archive", getProperty(HDFS_SPARK_ARCHIVE))
	.set("spark.yarn.dist.jars", getProperty(TASK_JARS))

Debug again, and now it runs:

19/06/27 12:53:12 INFO yarn.YarnAllocator: Will request 2 executor container(s), each with 1 core(s) and 1408 MB memory (including 384 MB of overhead)
19/06/27 12:53:12 INFO yarn.YarnAllocator: Submitted 2 unlocalized container requests.
19/06/27 12:53:12 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
19/06/27 12:53:12 INFO impl.AMRMClientImpl: Received new token for : leishu-OptiPlex-7060:39105
19/06/27 12:53:12 INFO yarn.YarnAllocator: Launching container container_1561543784696_0031_01_000002 on host leishu-OptiPlex-7060 for executor with ID 1
19/06/27 12:53:13 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
19/06/27 12:53:13 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
19/06/27 12:53:13 INFO impl.ContainerManagementProtocolProxy: Opening proxy : leishu-OptiPlex-7060:39105
19/06/27 12:53:13 INFO yarn.YarnAllocator: Launching container container_1561543784696_0031_01_000003 on host leishu-OptiPlex-7060 for executor with ID 2
19/06/27 12:53:13 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
19/06/27 12:53:13 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
19/06/27 12:53:13 INFO impl.ContainerManagementProtocolProxy: Opening proxy : leishu-OptiPlex-7060:39105
19/06/27 12:53:16 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 0 of them.
19/06/27 12:53:18 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
19/06/27 12:53:18 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 172.16.209.105:33251
19/06/27 12:53:18 INFO yarn.ApplicationMaster$AMEndpoint: Driver terminated or disconnected! Shutting down. 172.16.209.105:33251
19/06/27 12:53:18 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
19/06/27 12:53:18 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
19/06/27 12:53:18 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
19/06/27 12:53:18 INFO yarn.ApplicationMaster: Deleting staging directory file:/home/.../.sparkStaging/application_1561543784696_0031
19/06/27 12:53:18 INFO util.ShutdownHookManager: Shutdown hook called

Uploading files to HDFS

For the spark.yarn.archive parameter, I compressed the required jars (everything under the /spark-2.4.3-bin-hadoop2.7/jars directory) into a single zip file and uploaded it to HDFS.
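Building the zip can be scripted as well; below is a minimal Java sketch, with the jars directory and output path as assumptions. Per the Spark docs, the archive should contain the jar files in its root directory, so each jar is added at the top level:

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipSparkJars {

    public static void main(String[] args) throws IOException {
        // Assumed locations; adjust to your installation.
        Path jarsDir = Paths.get("/spark-2.4.3-bin-hadoop2.7/jars");
        Path zipFile = Paths.get("/tmp/spark-2.4.3-hadoop2.7.7.zip");

        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile));
             DirectoryStream<Path> jars = Files.newDirectoryStream(jarsDir, "*.jar")) {
            for (Path jar : jars) {
                // Add each jar at the root of the archive, as spark.yarn.archive expects.
                zos.putNextEntry(new ZipEntry(jar.getFileName().toString()));
                Files.copy(jar, zos);
                zos.closeEntry();
            }
        }
    }
}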
The upload to HDFS itself is done in code:

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SparkJar2Hdfs {

    public static void main(String[] args) throws Exception {

        // Local path of the source file (the zip) to upload
        Path src = new Path(getProperty(SPARK_JARS_ZIP));

        // Target directory on HDFS
        Path dst = new Path(getProperty(HDFS_SPARK_JARS_PATH));

        removeDir(dst);

        if (createDir(dst) && uploadPath(src, dst)) {
            listStatus(dst);
        }
    }

    // Obtain a FileSystem handle for the configured HDFS root.
    // Each caller opens and closes its own handle via try-with-resources.
    private static FileSystem getCorSys() {
        Configuration conf = new Configuration();
        try {
            return FileSystem.get(URI.create(getProperty(HDFS_SPARK_ROOT)), conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
        return null;
    }

    // Create the directory (no-op if it already exists)
    private static boolean createDir(Path path) {
        try (FileSystem coreSys = getCorSys()) {
            if (coreSys.exists(path)) {
                return true;
            } else {
                return coreSys.mkdirs(path);
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    // Recursively delete the directory if it exists
    private static boolean removeDir(Path path) {
        try (FileSystem coreSys = getCorSys()) {
            if (coreSys.exists(path)) {
                return coreSys.delete(path, true);
            } else {
                return true;
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    // Upload the local file into the destination directory
    private static boolean uploadPath(Path srcPath, Path desPath) {
        try (FileSystem coreSys = getCorSys()) {
            if (coreSys.isDirectory(desPath)) {
                coreSys.copyFromLocalFile(srcPath, desPath);
                return true;
            } else {
                throw new IOException("destination path does not exist or is not a directory");
            }
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }
    }

    // List the files under the destination directory
    private static void listStatus(Path desPath) {
        try (FileSystem coreSys = getCorSys()) {
            for (FileStatus file : coreSys.listStatus(desPath)) {
                System.out.println(file.getPath());
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Running it prints the URL of the uploaded file:

Connected to the target VM, address: '127.0.0.1:39539', transport: 'socket'
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
hdfs://localhost:9000/user/.../spark-libs/spark-2.4.3-hadoop2.7.7.zip
Disconnected from the target VM, address: '127.0.0.1:39539', transport: 'socket'

Process finished with exit code 0
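The printed URL is what HDFS_SPARK_ARCHIVE should resolve to; a sketch of the wiring (the truncated path is copied from the output above):

// Hedged sketch: the uploaded archive becomes the value of spark.yarn.archive.
new SparkConf().setMaster("yarn-client")
        .set("spark.yarn.archive",
                "hdfs://localhost:9000/user/.../spark-libs/spark-2.4.3-hadoop2.7.7.zip");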

maven-shade

For the spark.yarn.dist.jars parameter, maven-shade-plugin can be used to build that single jar:

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>3.2.1</version>
                <configuration>
                    <shadedArtifactAttached>false</shadedArtifactAttached>
                    <outputFile>${project.build.directory}/shaded/data-platform-task-${project.version}-shaded.jar</outputFile>
                    <artifactSet>
                        <includes>
                            <include>com.alibaba:druid</include>
                            <include>com.aliyun:emr-core</include>
                            <include>com.google.inject:guice</include>
                            <include>log4j:log4j</include>
                            <include>org.postgresql:postgresql</include>
                            <include>org.slf4j:slf4j-api</include>
                            <include>org.slf4j:slf4j-log4j12</include>
                            <include>org.projectlombok:lombok</include>
                            <include>org.springframework:spring-jdbc</include>
                        </includes>
                    </artifactSet>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>

This bundles the project's own task classes together with the third-party dependencies listed in artifactSet into a single jar.
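To close the loop, a hedged sketch of pointing spark.yarn.dist.jars at that shaded jar; the literal path mirrors the <outputFile> configured above, with the version left as a placeholder:

// Sketch only: TASK_JARS from earlier would now resolve to the single shaded jar.
new SparkConf().setMaster("yarn-client")
        .set("spark.yarn.archive", getProperty(HDFS_SPARK_ARCHIVE))
        .set("spark.yarn.dist.jars",
                "target/shaded/data-platform-task-<version>-shaded.jar");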
