Running a Spark job with java -jar: Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs

A Spark problem I hit today; it took quite a while to solve.

First, my Spark cluster was deployed from the official binary package

spark-1.0.2-bin-hadoop2.tgz

on top of an existing Hadoop cluster.

When running the application jar, I used the command

java -jar chinahadoop-1.0-SNAPSHOT.jar chinahadoop-1.0-SNAPSHOT.jar hdfs://node1:8020/user/ning/data.txt /user/ning/output

and got the following error:

14/08/23 23:18:55 INFO AppClient$ClientActor: Executor updated: app-20140823231852-0000/1 is now RUNNING
before count:MappedRDD[1] at textFile at Analysis.scala:35
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
 at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2421)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2428)
 at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:88)
 at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2467)
 at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2449)
 at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
 at org.apache.hadoop.fs.Path.getFileSystem(Path.java:287)
 at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:221)
 at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:270)
 at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:175)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
 at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
 at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
 at scala.Option.getOrElse(Option.scala:120)
 at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
 at org.apache.spark.SparkContext.runJob(SparkContext.scala:1097)
 at org.apache.spark.rdd.RDD.count(RDD.scala:861)
 at cn.chinahadoop.spark.Analysis$.main(Analysis.scala:39)
 at cn.chinahadoop.spark.Analysis.main(Analysis.scala)
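
The top of the stack trace shows exactly where this fails: org.apache.hadoop.fs.FileSystem.getFileSystemClass cannot find any implementation registered for the hdfs scheme. You can confirm this independently of Spark by querying Hadoop directly. A minimal diagnostic sketch (the CheckHdfsScheme object name is just for illustration):

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.FileSystem

    object CheckHdfsScheme {
      def main(args: Array[String]): Unit = {
        // Throws the same "No FileSystem for scheme: hdfs" IOException when
        // the ServiceLoader registrations were lost while packaging the jar.
        val cls = FileSystem.getFileSystemClass("hdfs", new Configuration())
        println("hdfs scheme is served by: " + cls.getName)
      }
    }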

I searched online for a long time without finding an answer. In the end, adding the following transformer to my Maven pom.xml finally made the job run:

        <transformer
                implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
            <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
        </transformer>
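
Why this one line fixes it: both hadoop-common and hadoop-hdfs ship a file named META-INF/services/org.apache.hadoop.fs.FileSystem, through which Hadoop's ServiceLoader discovers the available FileSystem implementations. When maven-shade-plugin merges all dependency jars into a single fat jar, only one of these identically named files survives by default, so the DistributedFileSystem registration from hadoop-hdfs can be silently dropped. The AppendingTransformer concatenates all copies instead, so the merged service file ends up containing roughly the following (exact entries depend on the Hadoop version):

    org.apache.hadoop.fs.LocalFileSystem
    org.apache.hadoop.fs.viewfs.ViewFileSystem
    org.apache.hadoop.fs.ftp.FTPFileSystem
    org.apache.hadoop.fs.HarFileSystem
    org.apache.hadoop.hdfs.DistributedFileSystem
    org.apache.hadoop.hdfs.web.WebHdfsFileSystem

The shade plugin also provides org.apache.maven.plugins.shade.resource.ServicesResourceTransformer, which merges every META-INF/services file automatically; either transformer works here.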

The full Maven configuration is as follows:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>chinahadoop</groupId>
    <artifactId>chinahadoop</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>Akka repository</id>
            <url>http://repo.akka.io/releases</url>
        </repository>
    </repositories>

    <build>
        <sourceDirectory>src/main/scala/</sourceDirectory>
        <testSourceDirectory>src/test/scala/</testSourceDirectory>

        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
                <configuration>
                    <scalaVersion>2.10.3</scalaVersion>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>2.2</version>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <filters>
                                <filter>
                                    <artifact>*:*</artifact>
                                    <excludes>
                                        <exclude>META-INF/*.SF</exclude>
                                        <exclude>META-INF/*.DSA</exclude>
                                        <exclude>META-INF/*.RSA</exclude>
                                    </excludes>
                                </filter>
                            </filters>
                            <transformers>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>reference.conf</resource>
                                </transformer>

                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                    <manifestEntries>
                                        <Main-Class>cn.chinahadoop.spark.Analysis</Main-Class>
                                    </manifestEntries>
                                </transformer>

                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                                    <resource>META-INF/services/org.apache.hadoop.fs.FileSystem</resource>
                                </transformer>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.0.2</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.4.1</version>
        </dependency>

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.10</artifactId>
            <version>1.0.2</version>
        </dependency>

        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>2.4.1</version>
        </dependency>
    </dependencies>

</project>
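
After rebuilding with mvn clean package, you can check that the transformer took effect by printing the merged service file from the shaded jar:

unzip -p target/chinahadoop-1.0-SNAPSHOT.jar META-INF/services/org.apache.hadoop.fs.FileSystem

If org.apache.hadoop.hdfs.DistributedFileSystem shows up in the output, the hdfs scheme will resolve. When repackaging is not an option, the mapping can also be pinned in code; a minimal sketch, assuming a SparkContext named sc:

    sc.hadoopConfiguration.set("fs.hdfs.impl",
      classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
    sc.hadoopConfiguration.set("fs.file.impl",
      classOf[org.apache.hadoop.fs.LocalFileSystem].getName)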
