The spark.sql.warehouse.dir problem in Spark

While working through Spark MLlib recently, the NaiveBayesExample kept failing with an error about the spark-warehouse path (I had no idea what this spark-warehouse thing even was).
At first, the code in main looked like this:
import org.apache.log4j.{Level, Logger}
import org.apache.spark.{SparkConf, SparkContext}

def main(args: Array[String]): Unit = {
  // Point at a local Hadoop install (Windows needs this to find winutils.exe)
  System.setProperty("hadoop.home.dir", "D:\\hadoop\\hadoop-2.5.2")

  // Silence the noisier loggers
  Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
  Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

  val conf = new SparkConf().setAppName("NaiveBayesExample").setMaster("local[2]")
  val sc = new SparkContext(conf)

The error was as follows:
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:F:/spark培训/sparkML1/spark-warehouse
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.&lt;init&gt;(Path.java:172)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.makeQualifiedPath(SessionCatalog.scala:114)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.createDatabase(SessionCatalog.scala:145)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.&lt;init&gt;(SessionCatalog.scala:89)
at org.apache.spark.sql.internal.SessionState.catalog$lzycompute(SessionState.scala:95)
at org.apache.spark.sql.internal.SessionState.catalog(SessionState.scala:95)
at org.apache.spark.sql.internal.SessionState$$anon$1.&lt;init&gt;(SessionState.scala:112)
at org.apache.spark.sql.internal.SessionState.analyzer$lzycompute(SessionState.scala:112)
at org.apache.spark.sql.internal.SessionState.analyzer(SessionState.scala:111)
at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:49)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:64)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:266)
at org.apache.spark.mllib.classification.NaiveBayesModel$SaveLoadV2_0$.save(NaiveBayes.scala:205)
at org.apache.spark.mllib.classification.NaiveBayesModel.save(NaiveBayes.scala:170)
at com.dlmu.sparkML.test.NaiveBayesExample$.main(NaiveBayesExample.scala:36)
at com.dlmu.sparkML.test.NaiveBayesExample.main(NaiveBayesExample.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:F:/spark培训/sparkML1/spark-warehouse
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.&lt;init&gt;(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 21 more
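Reading the trace from the bottom up shows what actually breaks. As far as I can tell, in Spark 2.0.0 the default value of spark.sql.warehouse.dir is roughly the literal string "file:" plus user.dir plus "/spark-warehouse", which on Windows expands to file:F:/spark培训/sparkML1/spark-warehouse: a URI with a scheme but a path that has no leading slash. Hadoop's Path then feeds that into the multi-argument java.net.URI constructor, which rejects it. A minimal sketch of the failure (the path is copied from the trace; how Path.initialize builds the URI is my assumption):

// Hypothetical repro of the failure inside Path.initialize: the five-argument
// URI constructor rejects a scheme combined with a path lacking a leading "/".
val warehouse = "F:/spark培训/sparkML1/spark-warehouse" // taken from the trace
new java.net.URI("file", null, warehouse, null, null)
// => java.net.URISyntaxException: Relative path in absolute URI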

After a lot of searching, it turns out spark-warehouse is related to Spark SQL. I never use SQL directly, but the stack trace explains the connection: NaiveBayesModel.save internally calls SparkSession.createDataFrame, and creating that session initializes the SQL catalog, which resolves the warehouse path. By default the path is apparently derived from user.dir, and in my case it pointed at a location that doesn't exist. A related post: http://www.voidcn.com/blog/BrotherDong90/article/p-6210551.html. That post configures everything through a SparkSession, but my code uses a plain SparkContext, so I couldn't apply it directly and gave up on that approach.
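For anyone who is on the SparkSession API, the fix from that post presumably boils down to setting the warehouse dir on the builder before the session is created. A minimal sketch, reconstructed by me under the assumption of Spark 2.0's builder API, with an example path:

// Sketch of the SparkSession-based fix (assumes the Spark 2.0 builder API;
// the warehouse path here is only an example, not the one from that post).
val spark = org.apache.spark.sql.SparkSession.builder()
  .appName("NaiveBayesExample")
  .master("local[2]")
  .config("spark.sql.warehouse.dir", "F:\\spark培训\\sparkML1\\spark-warehouse")
  .getOrCreate()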

More searching pointed at spark.sql.warehouse.dir itself, so I tried setting that property explicitly, pointing it at my Spark 2.0.0 directory:
 System.setProperty("spark.sql.warehouse.dir", "F:\\spark培训\\spark-2.0.0-bin-hadoop2.6")
With that one line added, the example finally runs.
The code now looks like this:
def main(args: Array[String]): Unit = {
  System.setProperty("hadoop.home.dir", "D:\\hadoop\\hadoop-2.5.2")
  // The fix: set the warehouse dir explicitly, before the SparkContext exists
  System.setProperty("spark.sql.warehouse.dir", "F:\\spark培训\\spark-2.0.0-bin-hadoop2.6")

  Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
  Logger.getLogger("org.eclipse.jetty.server").setLevel(Level.OFF)

  val conf = new SparkConf().setAppName("NaiveBayesExample").setMaster("local[2]")
  val sc = new SparkContext(conf)
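Why does a plain System.setProperty call work when I never create a SparkSession myself? As far as I understand it, SparkConf (with its default loadDefaults = true) copies every JVM system property whose name starts with "spark." into the configuration, and NaiveBayesModel.save then creates a SparkSession from the existing SparkContext behind the scenes, inheriting those settings. If that is right, setting the option on the SparkConf directly should be equivalent; a sketch under that assumption:

// Equivalent sketch (assumption: spark.sql.* options set on SparkConf reach
// the SparkSession that MLlib creates internally during model.save).
val conf = new SparkConf()
  .setAppName("NaiveBayesExample")
  .setMaster("local[2]")
  .set("spark.sql.warehouse.dir", "F:\\spark培训\\spark-2.0.0-bin-hadoop2.6")
val sc = new SparkContext(conf)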
The output:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
16/10/18 10:57:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/10/18 10:57:17 INFO FileInputFormat: Total input paths to process : 1
16/10/18 10:57:17 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
16/10/18 10:57:17 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
16/10/18 10:57:17 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
16/10/18 10:57:17 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
16/10/18 10:57:17 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
16/10/18 10:57:18 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeSystemBLAS
16/10/18 10:57:18 WARN BLAS: Failed to load implementation from: com.github.fommil.netlib.NativeRefBLAS
16/10/18 10:57:19 INFO FileOutputCommitter: Saved output of task 'attempt_201610181057_0005_m_000000_10' to file:/F:/spark培训/sparkML1/spark-warehouse/myNaiveBayesModel/metadata/_temporary/0/task_201610181057_0005_m_000000
16/10/18 10:57:58 INFO CodecPool: Got brand-new compressor [.snappy]
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
16/10/18 10:58:05 INFO FileOutputCommitter: Saved output of task 'attempt_201610181057_0007_m_000000_0' to file:/F:/spark培训/sparkML1/spark-warehouse/myNaiveBayesModel/data/_temporary/0/task_201610181057_0007_m_000000
16/10/18 10:58:06 INFO FileInputFormat: Total input paths to process : 1
16/10/18 10:58:06 INFO ParquetFileReader: Initiating action with parallelism: 5
16/10/18 10:58:08 WARN ParquetRecordReader: Can not initialize counter due to context is not a instance of TaskInputOutputContext, but is org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
16/10/18 10:58:08 INFO InternalParquetRecordReader: RecordReader initialized will read a total of 1 records.
16/10/18 10:58:08 INFO InternalParquetRecordReader: at row 0. reading next block
16/10/18 10:58:08 INFO CodecPool: Got brand-new decompressor [.snappy]
16/10/18 10:58:08 INFO InternalParquetRecordReader: block read in memory in 37 ms. row count = 1


Process finished with exit code 0

