A table with 100k partitions: Hive can run it, and Spark SQL can run it just fine too

 

1. Problem background:

For a partitioned table with tens of thousands of partitions, Spark SQL dies as soon as the query runs, while Hive does not. How should this be handled?

The SQL being executed:

ga10.coin_gain_lost is a partitioned table with tens of thousands of partitions.

The date column is the first-level partition key.

Caused by: org.apache.thrift.transport.TTransportException: Frame size (47350517) larger than max length (16384000)!

        at org.apache.spark.sql.hive.client.HiveTable.getAllPartitions(ClientInterface.scala:74)

        at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.get_partitions(ThriftHiveMetastore.java:1979)

Initial diagnosis: Spark pulls the metadata for every partition of the table back to the driver (HiveTable.getAllPartitions); with tens of thousands of partitions the serialized response (about 47 MB here) exceeds the Thrift transport's maximum frame size of 16384000 bytes (~16 MB), which is exactly what the exception reports.

 

Additional note: the same SQL runs fine in Hive.

Resources / submit command: spark-sql --num-executors 6 --driver-memory 20g --executor-memory 18g --master yarn

The Spark UI shows no job generated and no stage information.
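A possible mitigation worth trying (not verified on the cluster above): have Spark push the partition predicate down to the metastore instead of fetching every partition on the driver. The flag spark.sql.hive.metastorePartitionPruning was an experimental, default-off setting around Spark 1.5/1.6, and the query below is only illustrative; whether it actually avoids the getAllPartitions call depends on the table's storage format and the exact Spark version, so treat this as a sketch.

# Sketch: enable metastore-side partition pruning (assumption: the flag exists
# and is honored in this Spark build). The predicate must hit the first-level
# partition key (date) for pruning to help; the query itself is hypothetical.
spark-sql --master yarn \
  --num-executors 6 --driver-memory 20g --executor-memory 18g \
  --conf spark.sql.hive.metastorePartitionPruning=true \
  -e "select * from ga10.coin_gain_lost where date = '20160401' limit 10"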

 

 

2. Reproducing the problem

Steps to reproduce the problem with a large number of partitions, tested with Spark.

 

Spark test environment:

- Huawei RH2285 server (8 cores / 16 threads, 48 GB RAM)

- Windows + VMware Workstation + Ubuntu Linux

- 9 virtual machines (1 master, 8 workers), Hadoop 2.6.0, Spark 1.6.0, Scala 2.10

 

 

 

Generating the test data

- Create hivepartitiontest.txt containing 3 rows of data

root@master:/usr/local/IMF_testdata# cat hivepartitiontest.txt

001,zhangsan

002,lisi

003,wangwu

 

- In Hive, create the partitioned table

create table partition_test (
  member_id string,
  name string
)
partitioned by (
  stat_date string,
  province string
)
row format delimited fields terminated by ',';

 

- Table definition

hive> show create table partition_test;

OK

CREATE TABLE `partition_test`(
  `member_id` string,
  `name` string)
PARTITIONED BY (
  `stat_date` string,
  `province` string)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.TextInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://master:9000/user/hive/warehouse/partition_test'
TBLPROPERTIES (
  'transient_lastDdlTime'='1460257115')
Time taken: 20.963 seconds, Fetched: 16 row(s)

hive>

 

- Load the local file hivepartitiontest.txt into the Hive table partition_test

hive> LOAD DATA LOCAL INPATH '/usr/local/IMF_testdata/hivepartitiontest.txt' INTO TABLE partition_test PARTITION(stat_date='20160401', province='3');

Loading data to table default.partition_test partition (stat_date=20160401, province=3)

Partition default.partition_test{stat_date=20160401, province=3} stats: [numFiles=1, totalSize=33]
OK
Time taken: 2.08 seconds

    

- Generate the partition-creation script addpartitions.sh

#!/bin/bash
# bash (not plain sh) is needed for the {1..100} brace expansions below.
# Emits one "alter table ... add partition ..." statement per day/province pair:
# 100 days x 1000 provinces = 100,000 statements. The output is written to a
# file and executed later with "hive -f".
alias hive='/home/hadoop2/hive/bin/hive'   # not used by the loop itself
DATE_STR=`date +%Y%m%d`                    # not used by the loop itself

for i in {1..100}; do
  DAYS_AGO=`date +%Y%m%d -d "$i days ago"`
  for num in {1..1000}; do
    echo alter table partition_test add partition\(stat_date="'"${DAYS_AGO}"'",province="'"$num"'"\)\;
  done
done

 

 

 

$ chmod u+x addpartitions.sh
$ ./addpartitions.sh > partitions10w
$ hive -f partitions10w

 

The partitions were added in batches of roughly 30k, 50k and 100k (e.g. for i in {1..30}; for i in {30..53}; for i in {53..101};) to ramp up the stress test. The machine was rebooted once part-way through, so some statements were never executed; in total 100531 partition directories were created.

 

root@master:~# hadoop fs -count /user/hive/warehouse/partition_test

     100531            1                 33 /user/hive/warehouse/partition_test
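In the hadoop fs -count output the first column is the directory count, so there are 100531 directories under the table's location. As a cross-check against what the metastore itself has registered, something like the following can be used (a sketch; -S only silences Hive's informational output):

# Sketch: count the partitions registered in the metastore for partition_test.
hive -S -e "show partitions partition_test;" | wc -l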

 

 

 

Running the test with Spark

Start the Hive metastore service:

root@master:/usr/local/spark-1.6.0-bin-hadoop2.6/bin# hive --service metastore &

[1] 5089

 

Start spark-sql:

root@master:/usr/local/spark-1.6.0-bin-hadoop2.6/bin# spark-sql --master spark://192.168.189.1:7077

 

Result: with a 100k-partition Hive table, spark-sql runs the query without any problem

 

spark-sql> select * from partition_test where province ='3' and stat_date in ('20160401','20160328','20160405');

16/04/10 17:38:44 INFO parse.ParseDriver: Parsing command: select * from partition_test where province ='3' and stat_date in ('20160401','20160328','20160405')

16/04/10 17:38:44 INFO parse.ParseDriver: Parse Completed

 

......

 

16/04/10 17:41:13 INFO scheduler.DAGScheduler: Job 2 finished: processCmd at CliDriver.java:376, took 2.420353 s

001    zhangsan        20160401        3

002    lisi    20160401        3

003    wangwu  20160401        3

16/04/10 17:41:13 INFO CliDriver: Time taken: 149.108 seconds, Fetched 3 row(s)

spark-sql> 16/04/10 17:41:14 INFO scheduler.StatsReportListener: Finished stage: org.apache.spark.scheduler.StageInfo@482626f0

 

Observed in the Spark history server: 2 tasks

 

 

 

Result of the same query in Hive

hive> select * from partition_test where province ='3' and stat_date in ('20160401','20160328','20160405');

OK

001    zhangsan        20160401        3

002    lisi    20160401        3

003    wangwu  20160401        3

Time taken: 44.004 seconds, Fetched: 3 row(s)

hive>

 

 

 

====================================================================

Change the Hive table format to Parquet and re-run

====================================================================

 

Set the table file format to Parquet in Hive

hive> ALTER TABLE partition_test SET FILEFORMAT parquet;

OK

Time taken: 2.882 seconds

hive> show create table partition_test;

OK

CREATE TABLE `partition_test`(

 `member_id` string,

 `name` string)

PARTITIONED BY (

 `stat_date` string,

 `province` string)

ROW FORMAT DELIMITED

 FIELDS TERMINATED BY ','

STORED AS INPUTFORMAT

 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'

OUTPUTFORMAT

 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'

LOCATION

 'hdfs://master:9000/user/hive/warehouse/partition_test'

TBLPROPERTIES (

 'last_modified_by'='root',

 'last_modified_time'='1460286402',

 'transient_lastDdlTime'='1460286402')

Time taken: 0.262 seconds, Fetched: 18 row(s)

hive>

 

 

 

Run the SQL again in Hive

hive> select * from partition_test where province ='3' and stat_date in ('20160401','20160328','20160405');

OK

001    zhangsan        20160401        3

002    lisi    20160401        3

003    wangwu  20160401        3

Time taken: 61.539 seconds, Fetched: 3 row(s)

hive> 

 

 

Run the query in Spark SQL

select * from partition_test where province='3' and stat_date in ('20160401','20160328','20160405');

It fails with an error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 2.0 failed 4 times, most recent failure: Lost task 7.3 in stage 2.0 (TID 20, worker3): java.io.IOException: Could not read footer: java.lang.RuntimeException: hdfs://master:9000/user/hive/warehouse/partition_test/stat_date=20160401/province=3/hivepartitiontest.txt is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [103, 119, 117, 10]
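The error is expected rather than a Spark bug: ALTER TABLE ... SET FILEFORMAT only changes the table metadata, while the file loaded earlier under stat_date=20160401/province=3 is still the comma-delimited text file, so Spark's Parquet reader rejects it when it checks the footer (Hive still reads it because that partition keeps the text format it was created with). If the data really had to be converted, one option is to rewrite it into a Parquet table with a dynamic-partition insert; the sketch below uses a hypothetical table name partition_test_parquet and standard Hive dynamic-partition settings, and was not part of the original test.

# Sketch only: rewrite the text data into a separate Parquet table.
hive -e "
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;
create table partition_test_parquet (member_id string, name string)
  partitioned by (stat_date string, province string)
  stored as parquet;
insert overwrite table partition_test_parquet partition (stat_date, province)
  select member_id, name, stat_date, province from partition_test;
"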

 

 

Spark SQL fails when querying the partition that contains data, and still fails when querying partitions that contain no data

 

spark-sql> select * from partition_test where province='2' and stat_date in ('20160402','20160322','20160406');

16/04/10 19:20:39 INFO parse.ParseDriver: Parsing command: select * from partition_test where province ='2' and stat_date in ('20160402','20160322','20160406')

16/04/10 19:20:39 INFO parse.ParseDriver: Parse Completed

 

 

 

Drop the partition that contains the txt file

hive> alter table partition_test drop partition(stat_date='20160401', province='3');

Dropped the partition stat_date=20160401/province=3

OK

Time taken: 41.553 seconds

hive>
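Because partition_test is a managed (non-external) table, dropping the partition also deletes its directory, and with it the only non-Parquet file under the table location, which is consistent with the next Spark SQL run succeeding. A quick check that the file is gone (sketch; lists whatever province sub-directories remain for that date):

# Sketch: the province=3 directory should no longer appear in the listing.
hadoop fs -ls /user/hive/warehouse/partition_test/stat_date=20160401/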

   

 

Run Spark SQL once more: OK (the Hive table format is still Parquet at this point)

spark-sql> select * from partition_test where province='100' and stat_date in ('20160302','20160312','20160306');

16/04/10 19:40:48 INFO parse.ParseDriver: Parsing command: select * from partition_test where province ='100' and stat_date in ('20160302','20160312','20160306')

16/04/10 19:40:48 INFO parse.ParseDriver: Parse Completed




16/04/10 19:44:42 INFO scheduler.StatsReportListener: Finished stage: org.apache.spark.scheduler.StageInfo@50b216d7
16/04/10 19:44:42 INFO scheduler.StatsReportListener: task runtime:(count: 8, mean: 180.750000, stdev: 27.316433, max: 206.000000, min: 124.000000)
16/04/10 19:44:42 INFO scheduler.StatsReportListener:   0%      5%      10%     25%     50%     75%     90%     95%     100%
16/04/10 19:44:42 INFO scheduler.StatsReportListener:   124.0 ms        124.0 ms        124.0 ms        169.0 ms        196.0 ms        204.0 ms206.0 ms 206.0 ms        206.0 ms
16/04/10 19:44:42 INFO scheduler.StatsReportListener: task result size:(count: 8, mean: 912.000000, stdev: 0.000000, max: 912.000000, min: 912.000000)
16/04/10 19:44:42 INFO scheduler.StatsReportListener:   0%      5%      10%     25%     50%     75%     90%     95%     100%
16/04/10 19:44:42 INFO scheduler.StatsReportListener:   912.0 B 912.0 B 912.0 B 912.0 B 912.0 B 912.0 B 912.0 B 912.0 B 912.0 B
16/04/10 19:44:42 INFO scheduler.StatsReportListener: executor (non-fetch) time pct: (count: 8, mean: 3.594028, stdev: 2.948552, max: 7.258065, min: 0.000000)
16/04/10 19:44:42 INFO scheduler.StatsReportListener:   0%      5%      10%     25%     50%     75%     90%     95%     100%
16/04/10 19:44:42 INFO scheduler.StatsReportListener:    0 %     0 %     0 %     1 %     5 %     7 %     7 %     7 %     7 %
16/04/10 19:44:42 INFO scheduler.StatsReportListener: other time pct: (count: 8, mean: 96.405972, stdev: 2.948552, max: 100.000000, min: 92.741935)
16/04/10 19:44:42 INFO scheduler.StatsReportListener:   0%      5%      10%     25%     50%     75%     90%     95%     100%
16/04/10 19:44:42 INFO scheduler.StatsReportListener:   93 %    93 %    93 %    94 %    98 %    99 %    100 %   100 %   100 %
16/04/10 19:44:44 INFO datasources.DataSourceStrategy: Selected 3 partitions out of 100427, pruned 99.99701275553386% partitions.
16/04/10 19:44:44 INFO storage.MemoryStore: Block broadcast_10 stored as values in memory (estimated size 212.8 KB, free 460.4 KB)
16/04/10 19:44:44 INFO storage.MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 19.5 KB, free 479.9 KB)
16/04/10 19:44:44 INFO storage.BlockManagerInfo: Added broadcast_10_piece0 in memory on 192.168.189.1:41077 (size: 19.5 KB, free: 517.3 MB)
16/04/10 19:44:44 INFO spark.SparkContext: Created broadcast 10 from processCmd at CliDriver.java:376
16/04/10 19:46:02 INFO spark.ContextCleaner: Cleaned accumulator 6
16/04/10 19:46:03 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on worker6:33406 in memory (size: 20.8 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on worker2:46221 in memory (size: 20.8 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on worker8:52611 in memory (size: 20.8 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on worker3:34777 in memory (size: 20.8 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on worker7:51829 in memory (size: 20.8 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on worker1:44626 in memory (size: 20.8 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on worker4:52414 in memory (size: 20.8 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on worker5:39102 in memory (size: 20.8 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_9_piece0 on 192.168.189.1:41077 in memory (size: 20.8 KB, free: 517.3 MB)
16/04/10 19:46:04 INFO spark.ContextCleaner: Cleaned accumulator 9
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on worker6:33406 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on worker2:46221 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on worker3:34777 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on worker8:52611 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on worker4:52414 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on worker1:44626 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on worker5:39102 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on worker7:51829 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on 192.168.189.1:41077 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO spark.ContextCleaner: Cleaned accumulator 8
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on 192.168.189.1:41077 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on worker6:33406 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on worker7:51829 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on worker8:52611 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on worker1:44626 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on worker3:34777 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on worker2:46221 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on worker5:39102 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:04 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on worker4:52414 in memory (size: 20.9 KB, free: 517.4 MB)
16/04/10 19:46:06 WARN hdfs.DFSClient: Slow ReadProcessor read fields took 78391ms (threshold=30000ms); ack: seqno: -2 status: SUCCESS status: ERROR downstreamAckTimeNanos: 0, targets: [192.168.189.4:50010, 192.168.189.7:50010, 192.168.189.9:50010]
16/04/10 19:46:06 WARN hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-367257699-192.168.189.1-1454825792055:blk_1073741921_1107
java.io.IOException: Bad response ERROR for block BP-367257699-192.168.189.1-1454825792055:blk_1073741921_1107 from datanode 192.168.189.7:50010
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:897)
16/04/10 19:46:06 WARN hdfs.DFSClient: Error Recovery for block BP-367257699-192.168.189.1-1454825792055:blk_1073741921_1107 in pipeline 192.168.189.4:50010, 192.168.189.7:50010, 192.168.189.9:50010: bad datanode 192.168.189.7:50010
16/04/10 19:46:09 INFO Configuration.deprecation: mapred.min.split.size is deprecated. Instead, use mapreduce.input.fileinputformat.split.minsize
16/04/10 19:46:09 INFO parquet.ParquetRelation: Reading Parquet file(s) from 
16/04/10 19:46:09 INFO parquet.ParquetRelation: Reading Parquet file(s) from 
16/04/10 19:46:09 INFO parquet.ParquetRelation: Reading Parquet file(s) from 
16/04/10 19:46:09 INFO spark.SparkContext: Starting job: processCmd at CliDriver.java:376
16/04/10 19:46:09 INFO scheduler.DAGScheduler: Job 9 finished: processCmd at CliDriver.java:376, took 0.000036 s
16/04/10 19:46:09 INFO CliDriver: Time taken: 322.634 seconds
spark-sql> 
         > 
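The line to notice in the log above is "DataSourceStrategy: Selected 3 partitions out of 100427, pruned 99.99701275553386% partitions": on the Parquet data-source path Spark prunes partitions during planning, so even with roughly 100k partitions only the three requested ones are read. To inspect the plan without waiting for the whole job, something like the following could be used (a sketch; explain extended is standard Spark SQL syntax):

# Sketch: print the analyzed/optimized/physical plans for the same query.
spark-sql --master spark://192.168.189.1:7077 -e \
  "explain extended select * from partition_test where province='100' and stat_date in ('20160302','20160312','20160306');"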
