Writing to a Parquet-format Hive table from a Spark program fails with ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
The error is as follows:
org.apache.spark.SparkException: Task failed while writing rows.
at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:333)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
at org.apache.spark.sql.hive.execution.InsertIntoHiveTable$$anonfun$saveAsHiveFile$3.apply(InsertIntoHiveTable.scala:210)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:99)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Parquet record is malformed: empty fields are illegal, the field should be ommited completely instead
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:64)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:59)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriteSupport.write(DataWritableWriteSupport.java:31)
at parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:121)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:123)
at parquet.hadoop.ParquetRecordWriter.write(ParquetRecordWriter.java:42)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:111)
at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.write(ParquetRecordWriterWrapper.java:124)
at org.apache.spark.sql.hive.SparkHiveDynamicPartitionWriterContainer.writeToFile(hiveWriterContainers.scala:321)
... 8 more
Caused by: parquet.io.ParquetEncodingException: empty fields are illegal, the field should be ommited completely instead
at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.endField(MessageColumnIO.java:244)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeMap(DataWritableWriter.java:241)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeValue(DataWritableWriter.java:116)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.writeGroupFields(DataWritableWriter.java:89)
at org.apache.hadoop.hive.ql.io.parquet.write.DataWritableWriter.write(DataWritableWriter.java:60)
Solution: first, locate which map<?,?> or array<?> column is causing the error. (One way to localize it: bisect over the suspect columns — set half of the map/array columns' values to null and retry the write; if it still fails, the offender is in the other half. Repeat, narrowing by half each time, until a single column remains.)
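The bisection idea above can be sketched as follows. This is a minimal, hypothetical illustration: the column names are made up, and `write_succeeds(nulled)` stands in for actually re-running the insert with the given columns set to null (in practice, a Spark job).

```python
def find_bad_column(columns, write_succeeds):
    """Bisect over the suspect map/array columns until one offender remains.

    Assumes exactly one column triggers the error.  `write_succeeds(nulled)`
    must return True when the write succeeds with the columns in `nulled`
    replaced by NULL.
    """
    suspects = list(columns)
    while len(suspects) > 1:
        first_half = suspects[: len(suspects) // 2]
        second_half = suspects[len(suspects) // 2 :]
        # Null out the second half; if the write now succeeds, the
        # offending column must be among the nulled (second) half.
        if write_succeeds(nulled=second_half):
            suspects = second_half
        else:
            suspects = first_half
    return suspects[0]
```

Each retry halves the candidate set, so a table with N suspect columns needs about log2(N) test writes instead of N.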
Second: once the offending column is found, filter out the empty values yourself before writing.
The error is triggered when an empty collection is inserted into such a column, or when a map contains a null key.
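A minimal sketch of that filtering step, in plain Python: replace empty maps/arrays with NULL and drop null map keys before the row reaches the Parquet writer. The row layout here is an illustrative assumption; in a real Spark job this logic would typically live in a UDF or a map over the DataFrame/RDD applied to the map/array columns before the insert.

```python
def clean_value(value):
    """Normalize one column value so the Parquet writer never sees an
    empty collection or a null map key."""
    if isinstance(value, dict):
        # Drop entries whose key is null, then turn an empty map into NULL.
        cleaned = {k: v for k, v in value.items() if k is not None}
        return cleaned or None
    if isinstance(value, list):
        # Turn an empty array into NULL.
        return value or None
    return value


def clean_row(row):
    """Apply clean_value to every column of a row (dict of column -> value)."""
    return {col: clean_value(v) for col, v in row.items()}
```

Writing NULL for the whole field is what the error message means by "the field should be ommited completely": Parquet can encode a missing (null) map or array, but not a present-yet-empty one under Hive's writer.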
It later turned out there is already an upstream discussion of this issue: https://issues.apache.org/jira/browse/HIVE-11625
At the time of writing, no released version had fixed this problem!