Pitfalls encountered when integrating Spark with Hudi

1. Reading and writing MOR Hudi tables with spark-sql

Spark version: 2.4.3
Hudi version: 0.9.0


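Following the official documentation, reading and writing COW tables works fine, but reading or writing a MOR table fails with: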

Caused by: org.apache.hudi.exception.HoodieException: Unable to load class
	at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:57)
	at org.apache.hudi.common.util.ReflectionUtils.loadPayload(ReflectionUtils.java:78)
	at org.apache.hudi.common.util.SpillableMapUtils.convertToHoodieRecordPayload(SpillableMapUtils.java:133)
	at org.apache.hudi.common.util.SpillableMapUtils.convertToHoodieRecordPayload(SpillableMapUtils.java:118)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.createHoodieRecord(AbstractHoodieLogRecordScanner.java:322)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processDataBlock(AbstractHoodieLogRecordScanner.java:316)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.processQueuedBlocksForInstant(AbstractHoodieLogRecordScanner.java:352)
	at org.apache.hudi.common.table.log.AbstractHoodieLogRecordScanner.scan(AbstractHoodieLogRecordScanner.java:276)
	... 26 more
Caused by: java.lang.ClassNotFoundException: org.apache.hudi.common.model.EventTimeAvroPayload
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:264)
	at org.apache.hudi.common.util.ReflectionUtils.getClass(ReflectionUtils.java:54)
	... 33 more

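The error says the class org.apache.hudi.common.model.EventTimeAvroPayload cannot be found. The hudi-common jar in use is the latest released version, 0.9.0, and it indeed does not contain EventTimeAvroPayload. According to Hudi's GitHub repository, the class was added on 2021-11-04, whereas 0.9.0 was released in 2021-08.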
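One way to confirm whether a given jar ships the class is to list the jar's entries and look for the matching .class path. The following is a minimal sketch, not part of the original troubleshooting: the helper names class_to_entry and jar_has_class are hypothetical, and it assumes unzip and grep are available on the machine.

```shell
# Convert a fully qualified class name into its entry path inside a jar,
# e.g. a.b.C becomes a/b/C.class.
class_to_entry() {
  printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Exit status 0 if the jar lists the class, non-zero otherwise.
# Assumption: `unzip` is installed; `jar tf` would serve the same purpose.
jar_has_class() {
  entry="$(class_to_entry "$2")"
  unzip -l "$1" | grep -qF "$entry"
}
```

For example, `jar_has_class /home/wuzixuan/hudi-common-0.11.0-SNAPSHOT.jar org.apache.hudi.common.model.EventTimeAvroPayload && echo found` prints "found" only when the class is present in that jar.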

So I pulled the latest Hudi source code and built the hudi-common jar by hand.

Before building, first delete the following plugin from Hudi's root pom file:

      
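      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-checkstyle-plugin</artifactId>
        <version>3.0.0</version>
        <dependencies>
          <dependency>
            <groupId>com.puppycrawl.tools</groupId>
            <artifactId>checkstyle</artifactId>
            <version>8.18</version>
          </dependency>
        </dependencies>
        <configuration>
          <consoleOutput>true</consoleOutput>
          <encoding>UTF-8</encoding>
          <configLocation>style/checkstyle.xml</configLocation>
          <suppressionsLocation>style/checkstyle-suppressions.xml</suppressionsLocation>
          <suppressionsFileExpression>checkstyle.suppressions.file</suppressionsFileExpression>
          <failOnViolation>true</failOnViolation>
          <violationSeverity>warning</violationSeverity>
          <includeTestSourceDirectory>true</includeTestSourceDirectory>
          <sourceDirectories>
            <sourceDirectory>${project.build.sourceDirectory}</sourceDirectory>
          </sourceDirectories>
          <propertyExpansion>basedir=${maven.multiModuleProjectDirectory}</propertyExpansion>
          <excludes>**\/generated-sources\/</excludes>
        </configuration>
        <executions>
          <execution>
            <phase>compile</phase>
            <goals>
              <goal>check</goal>
            </goals>
          </execution>
        </executions>
      </plugin>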
	

Otherwise the build fails with: Failed during checkstyle configuration

After deleting it, build hudi-common and copy the resulting jar, hudi-common-0.11.0-SNAPSHOT.jar, to the server.
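The exact build command is not given above. As a rough sketch, a single module can be built from the root of the Hudi checkout with Maven's standard -pl (project list) and -am (also make dependencies) options. The helper name hudi_build_cmd is hypothetical; it only prints the command so it can be inspected before running.

```shell
# Print (do not run) a Maven command that builds one module together with
# the modules it depends on. -DskipTests skips running tests to shorten
# the build. The default module name matches the jar built in this post.
hudi_build_cmd() {
  module="${1:-hudi-common}"
  printf 'mvn clean package -DskipTests -pl %s -am\n' "$module"
}

# From the root of the Hudi source tree:
#   eval "$(hudi_build_cmd hudi-common)"
# With the default Maven layout the jar ends up under hudi-common/target/.
```

The maven-checkstyle-plugin also has a skip property, so appending -Dcheckstyle.skip=true may be an alternative to editing the pom. That is an untested assumption and is not verified against this particular error.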

Start spark-shell with the following command to test:

spark-shell --jars /home/wuzixuan/hudi-spark-bundle_2.11-0.10.0-SNAPSHOT.jar,/home/wuzixuan/hudi-common-0.11.0-SNAPSHOT.jar
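The --jars option takes a single comma-separated list with no spaces. When several locally built jars are involved, a small helper can assemble that list and flag paths that do not exist. This is an illustrative sketch; join_jars is a hypothetical name, not a Spark or Hudi utility.

```shell
# Join jar paths with commas, the separator that --jars expects.
# Paths that are not regular files are reported on stderr but still
# included, so the caller can decide how to react.
join_jars() {
  out=""
  for jar in "$@"; do
    [ -f "$jar" ] || printf 'warning: %s not found\n' "$jar" >&2
    if [ -z "$out" ]; then out="$jar"; else out="$out,$jar"; fi
  done
  printf '%s\n' "$out"
}

# Example:
#   spark-shell --jars "$(join_jars /home/wuzixuan/hudi-spark-bundle_2.11-0.10.0-SNAPSHOT.jar /home/wuzixuan/hudi-common-0.11.0-SNAPSHOT.jar)"
```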

Create the MOR table:

create table if not exists hudi_mor_wuzixuan
(  
    uuid string,   
    name string,   
    age int,  
    et timestamp,  
    ts timestamp,  
    par string
) using hudi location '/user/hive/warehouse/test.db/hudi_mor_wuzixuan'
options 
(  
    type = 'mor',  
    primaryKey = 'uuid,name',  
    preCombineField = 'ts' 
);

Test reading:


scala>  spark.sql("select * from test.hudi_mor_wuzixuan").show()
21/12/06 14:05:49 WARN metadata.HoodieBackedTableMetadata: Metadata table was not found at path hdfs://testppdha/user/hive/warehouse/test.db/hudi_mor_wuzixuan/.hoodie/metadata
21/12/06 14:05:55 WARN metadata.HoodieBackedTableMetadata: Metadata table was not found at path hdfs://testppdha/user/hive/warehouse/test.db/hudi_mor_wuzixuan/.hoodie/metadata
+-------------------+--------------------+------------------+----------------------+--------------------+----+----+---+--------------------+--------------------+----------+
|_hoodie_commit_time|_hoodie_commit_seqno|_hoodie_record_key|_hoodie_partition_path|   _hoodie_file_name|uuid|name|age|                  et|                  ts|       par|
+-------------------+--------------------+------------------+----------------------+--------------------+----+----+---+--------------------+--------------------+----------+
|     20211206114614|20211206114614_0_...|               fc0|        par=2021/12/04|3fbb8a21-46cb-4d1...| fc0|52c0| 21|2021-12-01 11:47:...|2021-12-06 11:47:...|2021/12/04|
|     20211206114513|20211206114513_1_...|               d54|        par=2021/12/04|3fbb8a21-46cb-4d1...| d54|7356| 67|2021-12-05 11:41:...|2021-12-06 11:41:...|2021/12/04|
|     20211206114614|20211206114614_0_...|               e29|        par=2021/12/04|3fbb8a21-46cb-4d1...| e29|4d13| 75|2021-12-04 11:47:...|2021-12-06 11:47:...|2021/12/04|
|     20211206114614|20211206114614_0_...|               231|        par=2021/12/04|3fbb8a21-46cb-4d1...| 231|c9d8| 78|2021-12-03 11:47:...|2021-12-06 11:47:...|2021/12/04|
|     20211206114614|20211206114614_0_...|               240|        par=2021/12/04|3fbb8a21-46cb-4d1...| 240|10de|  6|2021-12-02 11:46:...|2021-12-06 11:46:...|2021/12/04|
|     20211206114513|20211206114513_1_...|               e9f|        par=2021/12/04|3fbb8a21-46cb-4d1...| e9f|6d50| 40|2021-12-03 11:45:...|2021-12-06 11:45:...|2021/12/04|
|     20211206114513|20211206114513_1_...|               271|        par=2021/12/04|3fbb8a21-46cb-4d1...| 271|d24b| 21|2021-12-05 11:43:...|2021-12-06 11:43:...|2021/12/04|
|     20211206114614|20211206114614_0_...|               399|        par=2021/12/04|3fbb8a21-46cb-4d1...| 399|3e32| 93|2021-12-05 11:46:...|2021-12-06 11:46:...|2021/12/04|
|     20211206114513|20211206114513_1_...|               ee6|        par=2021/12/04|3fbb8a21-46cb-4d1...| ee6|b3e7| 42|2021-12-02 11:44:...|2021-12-06 11:44:...|2021/12/04|
|     20211206114614|20211206114614_0_...|               2a7|        par=2021/12/04|3fbb8a21-46cb-4d1...| 2a7|e3f7| 20|2021-12-03 11:46:...|2021-12-06 11:46:...|2021/12/04|
|     20211206114614|20211206114614_0_...|               54c|        par=2021/12/04|3fbb8a21-46cb-4d1...| 54c|abba|  1|2021-12-05 11:46:...|2021-12-06 11:46:...|2021/12/04|
|     20211206114513|20211206114513_1_...|               a11|        par=2021/12/04|3fbb8a21-46cb-4d1...| a11|8c58| 45|2021-12-05 11:45:...|2021-12-06 11:45:...|2021/12/04|
|     20211206114514|20211206114514_0_...|               a69|        par=2021/12/04|3fbb8a21-46cb-4d1...| a69|c868| 90|2021-12-04 11:46:...|2021-12-06 11:46:...|2021/12/04|
|     20211206114513|20211206114513_1_...|               a77|        par=2021/12/04|3fbb8a21-46cb-4d1...| a77|4be7| 56|2021-12-01 11:43:...|2021-12-06 11:43:...|2021/12/04|
|     20211206114514|20211206114514_0_...|               a7c|        par=2021/12/04|3fbb8a21-46cb-4d1...| a7c|d571| 61|2021-12-01 11:45:...|2021-12-06 11:45:...|2021/12/04|
|     20211206114614|20211206114614_0_...|               703|        par=2021/12/04|3fbb8a21-46cb-4d1...| 703|dfb2| 47|2021-12-03 11:47:...|2021-12-06 11:47:...|2021/12/04|
|     20211206114513|20211206114513_1_...|               75c|        par=2021/12/04|3fbb8a21-46cb-4d1...| 75c|ccb7| 38|2021-12-03 11:45:...|2021-12-06 11:45:...|2021/12/04|
|     20211206114614|20211206114614_0_...|               048|        par=2021/12/04|3fbb8a21-46cb-4d1...| 048|84d5| 27|2021-12-05 11:47:...|2021-12-06 11:47:...|2021/12/04|
|     20211206114614|20211206114614_0_...|               7ea|        par=2021/12/04|3fbb8a21-46cb-4d1...| 7ea|3b2f| 48|2021-12-01 11:47:...|2021-12-06 11:47:...|2021/12/04|
|     20211206114513|20211206114513_1_...|               8c4|        par=2021/12/04|3fbb8a21-46cb-4d1...| 8c4|b4ad| 34|2021-12-03 11:45:...|2021-12-06 11:45:...|2021/12/04|
+-------------------+--------------------+------------------+----------------------+--------------------+----+----+---+--------------------+--------------------+----------+
only showing top 20 rows


