Spark Pitfalls

Notes on problems hit while using Spark SQL with Elasticsearch as the data source:

Problem 1:

java.lang.RuntimeException: java.io.InvalidClassException: org.apache.spark.rpc.netty.RequestMessage; local class incompatible: stream classdesc serialVersionUID = -2221986757032131007, local class serialVersionUID = -5447855329526097695
	at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
	at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
	at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)

Solution: the Spark version used by the client did not match the version running on the Spark server. Aligning the client-side jars with the server version resolved it.
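
One quick way to confirm which Spark version the client jars actually provide is to print it from a throwaway local session and compare it with the version shown on the cluster's web UI. A minimal sketch (the app name is illustrative):

import org.apache.spark.sql.SparkSession;

public class SparkVersionCheck {
    public static void main(String[] args) {
        // Throwaway local session, used only to read the client-side Spark version.
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("version-check")
                .getOrCreate();
        // If this does not match the version reported by the cluster UI,
        // the InvalidClassException above is the expected symptom.
        System.out.println("client Spark version: " + spark.version());
        spark.stop();
    }
}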

Problem 2:

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:F:/WorkSpace/Work/AI+/AI+doc/spark/spark/spark-warehouse
	at org.apache.hadoop.fs.Path.initialize(Path.java:206)
	at org.apache.hadoop.fs.Path.<init>(Path.java:172)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:114)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
	at org.apache.spark.sql.catalyst.catalog.SessionCatalog.<init>(SessionCatalog.scala:89)
	at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
	at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
	at org.apache.spark.sql.internal.SessionState$$anon$1.<init>(SessionState.scala:112)
	at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
	at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
	at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
	at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
	at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:382)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:143)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)


The spark.sql.warehouse.dir path was not specified.

Adding one line to the code fixes it:

conf.set("spark.sql.warehouse.dir", "F:/spark-warehouse");

Problem 3:

org.apache.spark.sql.AnalysisException: Table not found: emobilelog; line 1 pos 21  
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)  
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.getTable(Analyzer.scala:305)  
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:314)  
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$9.applyOrElse(Analyzer.scala:309)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$resolveOperators$1.apply(LogicalPlan.scala:57)  
    at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:69)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperators(LogicalPlan.scala:56)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)  
    at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan$$anonfun$1.apply(LogicalPlan.scala:54)  
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:281)  

Solution: use JavaEsSparkSQL to load the ES index and register it as a temporary view, so the table name can be resolved in the SQL:

Dataset<Row> log = JavaEsSparkSQL.esDF(sqlContext, "aipplog_2017_05"); // load the ES index as a DataFrame
log.createOrReplaceTempView("aipplog");                                // register it under the table name used in the SQL
// log.show();
Dataset<Row> dataset = sqlContext.sql(sql);                            // the view "aipplog" now resolves
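
Putting the pieces together, a self-contained sketch of the flow: it assumes the elasticsearch-spark connector (which provides JavaEsSparkSQL) is on the classpath and an ES node at 127.0.0.1:9200; the query is illustrative, and the index and view names follow the snippet above.

import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

public class EsSqlExample {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
                .setAppName("es-sql-demo")
                .setMaster("local[*]")
                .set("spark.sql.warehouse.dir", "F:/spark-warehouse")
                .set("es.nodes", "127.0.0.1")   // assumed ES host
                .set("es.port", "9200");        // assumed ES REST port

        SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
        SQLContext sqlContext = spark.sqlContext();

        // Load the index as a DataFrame and register it as a temp view
        // so the table name resolves in SQL.
        Dataset<Row> log = JavaEsSparkSQL.esDF(sqlContext, "aipplog_2017_05");
        log.createOrReplaceTempView("aipplog");

        Dataset<Row> result = spark.sql("SELECT * FROM aipplog LIMIT 10");
        result.show();
        spark.stop();
    }
}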

Problem 4:

17/09/22 09:43:53 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
17/09/22 09:44:08 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

For local testing there is no need to connect to the server-side Spark cluster:

conf.setMaster(PropReader.getString(Constants.sparkUrl).trim());
spark.url=local
Setting spark.url=local selects local mode. The master URL passed to Spark can take any of the following forms (a brief sketch follows the list):

local - run locally with a single thread
local[K] - run locally with K worker threads (K cores)
local[*] - run locally with as many worker threads as there are available cores
spark://HOST:PORT - connect to the given Spark standalone cluster master; the port must be specified
mesos://HOST:PORT - connect to the given Mesos cluster; the port must be specified
yarn-client - connect to a YARN cluster in client mode; HADOOP_CONF_DIR must be configured
yarn-cluster - connect to a YARN cluster in cluster mode; HADOOP_CONF_DIR must be configured
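
A brief sketch of the configurable-master pattern from Problem 4, using a plain system property instead of the PropReader helper (the property name, app name, and standalone URL are illustrative):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class MasterUrlExample {
    public static void main(String[] args) {
        // Defaults to local mode with all cores; pass e.g.
        // -Dspark.master.url=spark://master-host:7077 to run against a standalone cluster.
        String master = System.getProperty("spark.master.url", "local[*]");

        SparkConf conf = new SparkConf()
                .setAppName("master-url-demo")
                .setMaster(master);

        JavaSparkContext sc = new JavaSparkContext(conf);
        System.out.println("running against master: " + master);
        sc.stop();
    }
}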


