Recently, while inserting data into a Hudi table from Spark SQL, I kept hitting a java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V exception.
A simple create table followed by select queries works fine. But a statement like create table test_cow using hudi as select * from source_cw has to write data through AbstractHoodieWriteClient, and that write path is exactly where NoSuchMethodError: SessionHandler.setHttpOnly is thrown.
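For reference, a minimal reproduction from the spark-sql CLI might look like the following. The bundle path is a placeholder, and the Kryo serializer setting is the one Hudi's quickstart recommends for writes:

# Plain SELECTs succeed, but this CTAS exercises the Hudi write path and fails
spark-sql \
  --jars /path/to/hudi-spark-bundle.jar \
  --conf spark.serializer=org.apache.spark.serializer.KryoSerializer \
  -e "create table test_cow using hudi as select * from source_cw"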
Based on the stack trace and the source code, the root cause is as follows. When Spark writes Hudi data, SparkRDDWriteClient goes through AbstractHoodieWriteClient, which extends AbstractHoodieClient. AbstractHoodieClient holds an EmbeddedTimelineService, which in turn holds a TimelineService: an embedded Jetty server that serves HDFS file metadata to Hudi clients. SessionHandler is a class shipped in the jetty-server artifact, and Hudi's POM pins jetty.version to 9.4.15.v20190215. However, some of Hudi's dependencies pull in older Jetty versions; if those are not excluded correctly at build time, the older Jetty classes get bundled into the jar and the error only surfaces at runtime. In Jetty releases before 9.4, where SessionManager had not yet been merged into SessionHandler, SessionHandler has no setHttpOnly method, and that is precisely the method Javalin calls when TimelineService starts.

Dependency analysis shows that several of the Hive dependencies in hudi-spark-bundle do not have Jetty excluded. The fix is to exclude the Jetty artifacts that Hive brings in, explicitly add Jetty back at the version Hudi pins, and rebuild the bundle.
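To confirm which modules drag in the older Jetty, you can run Maven's dependency tree on the bundle module from the Hudi source root; a sketch, with the module path taken from the Hudi repo layout:

# List every Jetty artifact each module pulls in, along with its version
mvn dependency:tree -pl packaging/hudi-spark-bundle -am -Dincludes=org.eclipse.jetty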
Changes to the hudi-spark-bundle POM:
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
    <!-- exclude the older Jetty that Hive pulls in -->
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.pentaho</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-service-rpc</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
</dependency>
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-metastore</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>javax.servlet</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.datanucleus</groupId>
      <artifactId>datanucleus-core</artifactId>
    </exclusion>
    <exclusion>
      <groupId>javax.servlet.jsp</groupId>
      <artifactId>*</artifactId>
    </exclusion>
    <exclusion>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>${hive.groupid}</groupId>
  <artifactId>hive-common</artifactId>
  <version>${hive.version}</version>
  <scope>${spark.bundle.hive.scope}</scope>
  <exclusions>
    <exclusion>
      <groupId>org.eclipse.jetty.orbit</groupId>
      <artifactId>javax.servlet</artifactId>
    </exclusion>
    <exclusion>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>*</artifactId>
    </exclusion>
  </exclusions>
</dependency>
<!-- add Jetty back explicitly at the version Hudi pins -->
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-server</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-util</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-webapp</artifactId>
  <version>${jetty.version}</version>
</dependency>
<dependency>
  <groupId>org.eclipse.jetty</groupId>
  <artifactId>jetty-http</artifactId>
  <version>${jetty.version}</version>
</dependency>
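After the POM change, rebuild the bundle and check that the relocated SessionHandler in the new jar actually has setHttpOnly. A sketch, assuming the standard Hudi repo layout; the exact jar name depends on your Hudi and Scala versions:

# Rebuild just the spark bundle plus the modules it depends on
mvn clean package -DskipTests -pl packaging/hudi-spark-bundle -am

# The bundle relocates Jetty under org.apache.hudi.org.apache.jetty
# (as the stack trace shows), so inspect the shaded class directly
javap -cp packaging/hudi-spark-bundle/target/hudi-spark-bundle_*.jar \
  org.apache.hudi.org.apache.jetty.server.session.SessionHandler | grep setHttpOnly
# Expected output: public void setHttpOnly(boolean);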
The full NoSuchMethodError stack trace:
java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V
at io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
at io.javalin.Javalin.<init>(Javalin.java:94)
at io.javalin.Javalin.create(Javalin.java:107)
at org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:270)
at org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:94)
at org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.startTimelineService(EmbeddedTimelineServerHelper.java:71)
at org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.createEmbeddedTimelineService(EmbeddedTimelineServerHelper.java:58)
at org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:109)
at org.apache.hudi.client.AbstractHoodieClient.<init>(AbstractHoodieClient.java:77)
at org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:139)
at org.apache.hudi.client.SparkRDDWriteClient.<init>(SparkRDDWriteClient.java:98)
at org.apache.hudi.client.SparkRDDWriteClient.<init>(SparkRDDWriteClient.java:82)
at org.apache.hudi.internal.DataSourceInternalWriterHelper.<init>(DataSourceInternalWriterHelper.java:64)
at org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite.<init>(HoodieDataSourceInternalBatchWrite.java:64)
at org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWriteBuilder.buildForBatch(HoodieDataSourceInternalBatchWriteBuilder.java:61)
at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:225)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.doExecute(V2CommandExec.scala:55)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:370)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:482)
at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:162)
at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:110)
at org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.run(CreateHoodieTableAsSelectCommand.scala:92)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3700)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3698)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:381)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:500)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:494)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:494)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:284)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)