[Hudi Data Lake] Fixing the hudi-spark-bundle NoSuchMethodError (SessionHandler.setHttpOnly)

Recently, inserting data into a Hudi table from Spark SQL started failing with java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V.

A plain create table followed by select queries works fine. But a statement like create table test_cow using hudi as select * from source_cw has to write data through AbstractHoodieWriteClient, and that write path is what throws NoSuchMethodError: SessionHandler.setHttpOnly.
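
For reference, a minimal way to reproduce this from the spark-sql CLI. This is a sketch: the jar path is illustrative, and the two --conf flags are the standard Hudi Spark SQL setup (the Hudi session extension is what provides the CreateHoodieTableAsSelectCommand seen in the stack trace below):

    # Plain DDL and SELECTs succeed; this CTAS is what fails, because
    # its write path starts the embedded Jetty-based TimelineService.
    spark-sql --jars /path/to/hudi-spark-bundle_2.12.jar \
        --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
        --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
        -e "create table test_cow using hudi as select * from source_cw"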

Reading the exception stack against the source code, the cause is as follows. When Spark writes Hudi data, SparkRDDWriteClient builds on AbstractHoodieWriteClient, which extends AbstractHoodieClient; AbstractHoodieClient holds an EmbeddedTimelineService, which in turn wraps a TimelineService. TimelineService is an embedded Jetty server that serves HDFS file metadata to Hudi clients. SessionHandler is a class in the jetty-server artifact, and Hudi's pom pins jetty.version to 9.4.15.v20190215. However, some of Hudi's dependencies pull in older Jetty releases; if those are not excluded at build time, an older Jetty gets packaged into the bundle and blows up at runtime. Dependency analysis shows that in hudi-spark-bundle the Jetty pulled in by several Hive artifacts is not excluded. The fix: exclude the Jetty that Hive brings in, declare Jetty explicitly at the version Hudi pins, and rebuild the bundle.
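
The dependency analysis mentioned above can be done with Maven's dependency:tree. A minimal sketch, assuming a checkout of the Hudi source tree where the bundle module lives at packaging/hudi-spark-bundle (the upstream layout):

    # List every org.eclipse.jetty artifact on the bundle's dependency tree;
    # the hits under the hive-* dependencies are the older Jetty that gets bundled.
    mvn dependency:tree -pl packaging/hudi-spark-bundle -am \
        -Dincludes='org.eclipse.jetty:*'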

The pom.xml changes for hudi-spark-bundle:

 
    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-service</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <artifactId>guava</artifactId>
          <groupId>com.google.guava</groupId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.pentaho</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-service-rpc</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-jdbc</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-metastore</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <groupId>javax.servlet</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.datanucleus</groupId>
          <artifactId>datanucleus-core</artifactId>
        </exclusion>
        <exclusion>
          <groupId>javax.servlet.jsp</groupId>
          <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
          <artifactId>guava</artifactId>
          <groupId>com.google.guava</groupId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>${hive.groupid}</groupId>
      <artifactId>hive-common</artifactId>
      <version>${hive.version}</version>
      <scope>${spark.bundle.hive.scope}</scope>
      <exclusions>
        <exclusion>
          <groupId>org.eclipse.jetty.orbit</groupId>
          <artifactId>javax.servlet</artifactId>
        </exclusion>
        <exclusion>
          <groupId>org.eclipse.jetty</groupId>
          <artifactId>*</artifactId>
        </exclusion>
      </exclusions>
    </dependency>

    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-server</artifactId>
      <version>${jetty.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-util</artifactId>
      <version>${jetty.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-webapp</artifactId>
      <version>${jetty.version}</version>
    </dependency>
    <dependency>
      <groupId>org.eclipse.jetty</groupId>
      <artifactId>jetty-http</artifactId>
      <version>${jetty.version}</version>
    </dependency>
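
With the pom updated, rebuild the bundle and sanity-check the result. A sketch assuming the upstream module path and a Scala 2.12 build; adjust the path and profiles to your checkout:

    # Rebuild only the spark bundle module (plus the modules it depends on).
    mvn clean package -DskipTests -pl packaging/hudi-spark-bundle -am

    # The relocated SessionHandler should now expose setHttpOnly(boolean);
    # if grep prints the method, the bundle shades a new-enough Jetty.
    javap -cp packaging/hudi-spark-bundle/target/hudi-spark-bundle_2.12-*.jar \
        org.apache.hudi.org.apache.jetty.server.session.SessionHandler | grep setHttpOnly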

The full NoSuchMethodError stack trace:

java.lang.NoSuchMethodError: org.apache.hudi.org.apache.jetty.server.session.SessionHandler.setHttpOnly(Z)V
        at io.javalin.core.util.JettyServerUtil.defaultSessionHandler(JettyServerUtil.kt:50)
        at io.javalin.Javalin.<init>(Javalin.java:94)
        at io.javalin.Javalin.create(Javalin.java:107)
        at org.apache.hudi.timeline.service.TimelineService.startService(TimelineService.java:270)
        at org.apache.hudi.client.embedded.EmbeddedTimelineService.startServer(EmbeddedTimelineService.java:94)
        at org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.startTimelineService(EmbeddedTimelineServerHelper.java:71)
        at org.apache.hudi.client.embedded.EmbeddedTimelineServerHelper.createEmbeddedTimelineService(EmbeddedTimelineServerHelper.java:58)
        at org.apache.hudi.client.AbstractHoodieClient.startEmbeddedServerView(AbstractHoodieClient.java:109)
        at org.apache.hudi.client.AbstractHoodieClient.<init>(AbstractHoodieClient.java:77)
        at org.apache.hudi.client.AbstractHoodieWriteClient.<init>(AbstractHoodieWriteClient.java:139)
        at org.apache.hudi.client.SparkRDDWriteClient.<init>(SparkRDDWriteClient.java:98)
        at org.apache.hudi.client.SparkRDDWriteClient.<init>(SparkRDDWriteClient.java:82)
        at org.apache.hudi.internal.DataSourceInternalWriterHelper.<init>(DataSourceInternalWriterHelper.java:64)
        at org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite.<init>(HoodieDataSourceInternalBatchWrite.java:64)
        at org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWriteBuilder.buildForBatch(HoodieDataSourceInternalBatchWriteBuilder.java:61)
        at org.apache.spark.sql.execution.datasources.v2.AppendDataExec.run(WriteToDataSourceV2Exec.scala:225)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:40)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:40)
        at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.doExecute(V2CommandExec.scala:55)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$execute$1(SparkPlan.scala:180)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:218)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:215)
        at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:176)
        at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:132)
        at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:131)
        at org.apache.spark.sql.DataFrameWriter.$anonfun$runCommand$1(DataFrameWriter.scala:989)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:989)
        at org.apache.spark.sql.DataFrameWriter.saveInternal(DataFrameWriter.scala:370)
        at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:301)
        at org.apache.hudi.HoodieSparkSqlWriter$.bulkInsertAsRow(HoodieSparkSqlWriter.scala:482)
        at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:162)
        at org.apache.spark.sql.hudi.command.InsertIntoHoodieTableCommand$.run(InsertIntoHoodieTableCommand.scala:110)
        at org.apache.spark.sql.hudi.command.CreateHoodieTableAsSelectCommand.run(CreateHoodieTableAsSelectCommand.scala:92)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
        at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:79)
        at org.apache.spark.sql.Dataset.$anonfun$logicalPlan$1(Dataset.scala:228)
        at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3700)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
        at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
        at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
        at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3698)
        at org.apache.spark.sql.Dataset.<init>(Dataset.scala:228)
        at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:99)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:96)
        at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:618)
        at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
        at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:650)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:381)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:500)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:494)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.IterableLike.foreach(IterableLike.scala:74)
        at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:494)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:284)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
