Running Spark programs with java -jar: common problems and solutions

java.io.IOException: No FileSystem for scheme: file

  • Cause:

    The hadoop-common and hadoop-hdfs jars both contain a service file with the same name, org.apache.hadoop.fs.FileSystem, under META-INF/services. When the project is packaged with maven-assembly-plugin, only one of these files survives in the merged jar, so the FileSystem implementation whose entry was overwritten can no longer be found.

  • Solution:

    JavaSparkContext sc = new JavaSparkContext(conf);
    
    // Register the FileSystem implementations explicitly, so both schemes
    // resolve even though the service files were overwritten during packaging.
    Configuration h_conf = sc.hadoopConfiguration();
    h_conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
    h_conf.set("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
    

com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'akka.version'

  • Cause:

    Akka's configuration mechanism relies heavily on every module/jar carrying its own reference.conf file, all of which are discovered and loaded by the config library. Unfortunately, this also means that if you merge multiple jars into a single jar, you have to merge all of their reference.conf files as well; otherwise all default settings are lost and Akka will not run.

  • Solution:

    Use the maven-shade-plugin in pom.xml:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <shadedArtifactAttached>true</shadedArtifactAttached>
              <shadedClassifierName>allinone</shadedClassifierName>
              <artifactSet>
                <includes>
                  <include>*:*</include>
                </includes>
              </artifactSet>
              <transformers>
                <transformer
                  implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                  <resource>reference.conf</resource>
                </transformer>
                <transformer
                  implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <manifestEntries>
                    <Main-Class>com.gm.hive.Demo</Main-Class>
                  </manifestEntries>
                </transformer>
              </transformers>
            </configuration>
          </execution>
        </executions>
      </plugin>
    

    After packaging this way, the following problem can still appear:

    Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
    
    • Cause:

      Repackaging changes the contents of MANIFEST.MF, so it no longer matches the signatures of the original jars; the signature check fails and the program cannot run.

    • Solution:

      Exclude the *.RSA, *.SF and *.DSA files when packaging.

    The complete maven-shade-plugin configuration is as follows:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.3</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>shade</goal>
            </goals>
            <configuration>
              <shadedArtifactAttached>true</shadedArtifactAttached>
              <shadedClassifierName>allinone</shadedClassifierName>
              <artifactSet>
                <includes>
                  <include>*:*</include>
                </includes>
              </artifactSet>
              <transformers>
                <transformer
                  implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                  <resource>reference.conf</resource>
                </transformer>
                <transformer
                  implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                  <manifestEntries>
                    <Main-Class>com.gm.hive.Demo</Main-Class>
                  </manifestEntries>
                </transformer>
              </transformers>
              <filters>
                <filter>
                  <artifact>*:*</artifact>
                  <excludes>
                    <exclude>META-INF/*.SF</exclude>
                    <exclude>META-INF/*.DSA</exclude>
                    <exclude>META-INF/*.RSA</exclude>
                  </excludes>
                </filter>
              </filters>
            </configuration>
          </execution>
        </executions>
    </plugin>
    

java.lang.ClassNotFoundException: Failed to find data source: json

  • Solution:
    df_result.write().mode(SaveMode.Overwrite).json("hdfs://s0:8020/input/df_result");
    
    Change the code above to the following, explicitly specifying the fully qualified class of the output format:
    df_result.write().format("org.apache.spark.sql.execution.datasources.json.JsonFileFormat").mode(SaveMode.Overwrite).save("hdfs://s0:8020/input/df_result");
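
    The same workaround applies when reading the result back. A short sketch, assuming the same fully qualified format class also resolves on the read path and that spark is the active SparkSession:

    // Name the format class explicitly, since the short name "json" may not
    // resolve inside a shaded jar.
    Dataset<Row> df_read = spark.read()
            .format("org.apache.spark.sql.execution.datasources.json.JsonFileFormat")
            .load("hdfs://s0:8020/input/df_result");
    df_read.show();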
    

java.lang.IllegalArgumentException: System memory 259522560 must be at least 471859200. Please increase heap size using the --driver-memory option or spark.driver.memory in Spark configuration.

  • Cause:

    Spark is a very memory-intensive compute framework, so it has minimum requirements on machine memory; this is the typical exception thrown when there is not enough memory. Here the local machine acts as the Driver Program, i.e. the process that runs the application's main() method and creates the SparkContext.

  • Solution:

    When building the SparkConf, add the following extra setting to specify the memory allocated to Spark:

    conf.set("spark.testing.memory", "2147480000");
    

Enabling Hive support in Spark SQL

  • Solution:

    spark = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate();
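
    A short sketch of the builder in context; the query and table name are purely illustrative:

    SparkSession spark = SparkSession.builder()
            .config(conf)            // the SparkConf built earlier
            .enableHiveSupport()     // enables HiveQL and access to the Hive metastore
            .getOrCreate();
    
    // Illustrative query against a hypothetical Hive table.
    Dataset<Row> rows = spark.sql("SELECT * FROM some_hive_table LIMIT 10");
    rows.show();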
    

Spark cannot reach the host (Driver Program) by hostname

  • Solution:
    conf.set("spark.driver.host", "192.168.104.251");

More details and code examples

See the Git repository: https://github.com/gm19900510/spark_learn. Stars are welcome.
