Spark run using IDE / Maven

From: http://stackoverflow.com/questions/26892389/org-apache-spark-sparkexception-job-aborted-due-to-stage-failure-task-from-app

 

  1. Create a fat jar (one which includes all dependencies). Use the Maven Shade Plugin for this. Example pom:

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <version>2.2</version>
        <configuration>
            <!-- Strip signature files so the merged jar is not rejected as tampered -->
            <filters>
                <filter>
                    <artifact>*:*</artifact>
                    <excludes>
                        <exclude>META-INF/*.SF</exclude>
                        <exclude>META-INF/*.DSA</exclude>
                        <exclude>META-INF/*.RSA</exclude>
                    </excludes>
                </filter>
            </filters>
        </configuration>
        <executions>
            <execution>
                <id>job-driver-jar</id>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>driver</shadedClassifierName>
                    <transformers>
                        <!-- Merge META-INF/services entries from all dependencies -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                        <!-- Concatenate Akka/Typesafe reference.conf files instead of overwriting them -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer">
                            <resource>reference.conf</resource>
                        </transformer>
                        <!-- Replace mainClass with the fully qualified name of your driver class -->
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                            <mainClass>mainClass</mainClass>
                        </transformer>
                    </transformers>
                </configuration>
            </execution>
            <execution>
                <id>worker-library-jar</id>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <shadedArtifactAttached>true</shadedArtifactAttached>
                    <shadedClassifierName>worker</shadedClassifierName>
                    <transformers>
                        <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                    </transformers>
                </configuration>
            </execution>
        </executions>
    </plugin>

  2. Now we have to ship the compiled jar file to the cluster. For this, specify the jar file in the Spark config like this (a fuller driver sketch follows the snippet):

SparkConf conf = new SparkConf().setAppName("appName").setMaster("spark://machineName:7077").setJars(new String[] {"target/appName-1.0-SNAPSHOT-driver.jar"});
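
For context, here is a minimal sketch of a complete driver class built around that config. The class name, app name, machine name, and jar path are placeholders, not from the original answer:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.Arrays;

    // Hypothetical driver class; adjust the names and the jar path to your project.
    public class AppDriver {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf()
                    .setAppName("appName")
                    .setMaster("spark://machineName:7077")
                    .setJars(new String[] {"target/appName-1.0-SNAPSHOT-driver.jar"});
            JavaSparkContext sc = new JavaSparkContext(conf);

            // Trivial job to confirm the executors can load classes from the shipped jar.
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4));
            System.out.println("count = " + numbers.count());

            sc.stop();
        }
    }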

  3. Run mvn clean package to create the jar file. It will be created in your target folder.

  4. Run it from your IDE or with the Maven command:

mvn exec:java -Dexec.mainClass="className"

This does not require spark-submit. Just remember to package the jar before running.
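
If you tend to skip the packaging step, one option is a small guard at the top of main that fails fast. A sketch, assuming the jar path from step 2:

    // Hypothetical guard: fail fast if the shaded driver jar has not been built yet.
    java.io.File driverJar = new java.io.File("target/appName-1.0-SNAPSHOT-driver.jar");
    if (!driverJar.isFile()) {
        throw new IllegalStateException(
                "Shaded jar not found; run 'mvn clean package' first: " + driverJar);
    }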

If you don't want to hardcode the jar path, you can do this:

  1. In the config, write (a note for static main methods follows the snippet):

SparkConf conf = new SparkConf().setAppName("appName").setMaster("spark://machineName:7077").setJars(JavaSparkContext.jarOfClass(this.getClass()));
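
Note that this.getClass() only works from instance code; in a static main method, pass a class literal instead. A sketch, reusing the hypothetical AppDriver class from above:

    // jarOfClass returns the path of the jar the given class was loaded from.
    SparkConf conf = new SparkConf()
            .setAppName("appName")
            .setMaster("spark://machineName:7077")
            .setJars(JavaSparkContext.jarOfClass(AppDriver.class));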

  2. Create the fat jar (as above) and, after running the Maven package command, launch it with java:

java -jar target/application-1.0-SNAPSHOT-driver.jar

This way the jar path is taken from the location the class itself was loaded from.
