Problems Encountered When Developing Spark Programs in Java Local Mode

1. Dependency conflicts when a Spark application is packaged as a JAR and submitted to Spark on YARN

Solution: when developing with a Maven project, add the following tag to the Spark, Scala, and Hadoop dependencies:

<scope>provided</scope>

For example:

<dependencies>

    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>${scala.version}</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>${spark.version}</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
        <scope>provided</scope>
    </dependency>

    <dependency>
        <groupId>redis.clients</groupId>
        <artifactId>jedis</artifactId>
        <version>2.9.0</version>
    </dependency>

</dependencies>
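
Dependencies marked provided are available at compile time but are left out of the packaged JAR; at runtime, Spark on YARN supplies its own Spark, Scala, and Hadoop jars, which is what avoids the version conflicts. Application-only libraries such as Jedis keep the default compile scope so they are bundled into the assembled JAR described in problem 3 below.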

2. Running a Spark program may fail with Caused by: java.lang.ClassNotFoundException: jxl.read.biff.BiffException

Solution: add the jxl dependency:


<dependency>
    <groupId>jexcelapi</groupId>
    <artifactId>jxl</artifactId>
    <version>2.4.2</version>
</dependency>
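
For context, this exception typically surfaces when the job reads .xls files through the JExcelAPI (jxl) but the jxl jar is missing from the runtime classpath. A minimal sketch of such a read, with a placeholder file path, looks like this:

import java.io.File;
import java.io.IOException;

import jxl.Sheet;
import jxl.Workbook;
import jxl.read.biff.BiffException;

public class JxlReadExample {
    public static void main(String[] args) throws IOException, BiffException {
        // "data/input.xls" is a placeholder path for illustration
        Workbook workbook = Workbook.getWorkbook(new File("data/input.xls"));
        Sheet sheet = workbook.getSheet(0);

        // Read the contents of the cell in the first column of the first row
        String value = sheet.getCell(0, 0).getContents();
        System.out.println(value);

        workbook.close();
    }
}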

3. Other dependencies of a Spark + Maven project are not packaged into the JAR

Solution: add the maven-assembly plugin:

<build>
    <finalName>${project.artifactId}</finalName>
    <plugins>
        <plugin>
            <artifactId>maven-assembly-plugin</artifactId>
            <configuration>
                <appendAssemblyId>false</appendAssemblyId>
                <descriptorRefs>
                    <descriptorRef>jar-with-dependencies</descriptorRef>
                </descriptorRefs>
                <archive>
                    <manifest>
                        <mainClass>your.fully.qualified.MainClass</mainClass>
                    </manifest>
                </archive>
            </configuration>
            <executions>
                <execution>
                    <id>make-assembly</id>
                    <phase>package</phase>
                    <goals>
                        <goal>assembly</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <configuration>
                <source>1.7</source>
                <target>1.7</target>
            </configuration>
        </plugin>
    </plugins>
</build>
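
With this in place, mvn package builds target/${project.artifactId}.jar, which bundles every dependency not marked provided (for example Jedis); that assembled JAR is the artifact to hand to spark-submit.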

4. java.lang.OutOfMemoryError: GC overhead limit exceeded and java.lang.OutOfMemoryError: Java heap space when debugging Spark locally in Eclipse

Solution: set the SparkConf parameter spark.executor.memoryOverhead and raise the JVM heap with -Xmx2048m (in Eclipse, under Run Configurations > Arguments > VM arguments):

SparkConf sparkConf = new SparkConf()
        .setAppName("jobName")   // job name
        .setMaster("local[1]")
        .set("spark.executor.memoryOverhead", "2048");


5. Task not serializable when running a Spark program

Solution: write a class that extends the original (non-serializable) class and implements the Serializable interface, and use that subclass in place of the class that triggers the exception.
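
A sketch of that pattern, assuming a hypothetical non-serializable helper class; LegacyFormatter, SerializableFormatter, and the other names are illustrative rather than taken from the original code:

import java.io.Serializable;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class TaskSerializationExample {

    // Hypothetical helper class used inside a transformation; capturing an
    // instance of it in the closure triggers "Task not serializable".
    static class LegacyFormatter {
        public String format(String s) { return "[" + s + "]"; }
    }

    // Subclass that extends the original class and adds Serializable,
    // so instances can be shipped to the executors.
    static class SerializableFormatter extends LegacyFormatter implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("serializationDemo").setMaster("local[1]");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        final SerializableFormatter formatter = new SerializableFormatter();

        List<String> result = jsc.parallelize(Arrays.asList("a", "b", "c"))
                .map(new Function<String, String>() {
                    @Override
                    public String call(String s) {
                        return formatter.format(s);
                    }
                })
                .collect();

        System.out.println(result);
        jsc.stop();
    }
}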

