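Background:
Recently, dw users reported that wormhole transfers were very slow. Some jobs took 3-4 hours to finish, which delays the daily online reports. I looked into it: almost all of the slow jobs move data from Hive to some other destination, which means they use hivereader. The logs also show a very low real-time transfer speed for hivereader, so the problem most likely lies there.
First, a brief introduction. wormhole is a high-speed data transfer tool that we developed (https://github.com/lalaguozhe/wormhole). It supports a variety of heterogeneous data sources. The architecture diagram is as follows: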
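Problem description:
Every wormhole job runs on a single machine. The user fills in a wormhole job XML descriptor that defines the data source, the data destination, and a series of other configuration parameters, and then submits the job. After receiving the job XML file, wormhole creates a job, runs preprocessing (Periphery) on both the reader side and the writer side, and splits the job (Splitter). It then starts a reader thread pool and a writer thread pool that read and write data concurrently, with a storage in between acting as a buffer queue.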
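The reader/storage/writer flow described above can be sketched roughly as follows. This is only an illustration in Python, not wormhole's actual Java implementation: the name `run_pipeline`, the sentinel-based shutdown, and the use of one thread per split (instead of a real thread pool) are all assumptions made for the example.

```python
import queue
import threading

SENTINEL = object()  # hypothetical end-of-stream marker, one per writer


def run_pipeline(splits, num_writers=2, buffer_size=1000):
    """Push every line of every split through a bounded buffer to the writers."""
    storage = queue.Queue(maxsize=buffer_size)  # stand-in for the storage buffer
    written = []
    written_lock = threading.Lock()

    def reader(split):
        for line in split:
            storage.put(line)  # blocks while the buffer is full

    def writer():
        while True:
            line = storage.get()
            if line is SENTINEL:
                break
            with written_lock:
                written.append(line)  # a real writer would write to the destination

    readers = [threading.Thread(target=reader, args=(s,)) for s in splits]
    writers = [threading.Thread(target=writer) for _ in range(num_writers)]
    for t in readers + writers:
        t.start()
    for t in readers:
        t.join()
    for _ in writers:
        storage.put(SENTINEL)  # sent only after all readers have finished
    for t in writers:
        t.join()
    return written
```

The bounded queue is what keeps a fast reader from running far ahead of a slow writer, which is the role the storage plays between the two pools.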
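Back to the problem. In hive reader, I submit the user's hql to Hive Server over JDBC, and Hive Server executes it and returns the result data. This approach has several drawbacks:
1. The hql cannot be split, so only one reader thread can be started and the advantage of parallel reading is lost.
2. We have two hive servers deployed. Because other products and queries also need to access hive server, a large-scale data pull is limited by the network throughput of the hive server and service nodes.
3. After the hql is submitted, the mapred job first puts the result data in a temporary directory. A fetch task then pulls it to hive server, which passes it on to the wormhole client. The data travels datanode -> hive server -> wormhole client, so the bottleneck is still hive server.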
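Solution:
Provide another execution mode for hivereader. Since reading data through hive server is the bottleneck, I can bypass hive server and read from the datanodes directly and in parallel, using hive server only to submit the hql. For example, if the user's original query is "select * from bi.dpdm_device_permanent_city", it can be automatically rewritten as "INSERT OVERWRITE DIRECTORY 'hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67' select * from bi.dpdm_device_permanent_city", which inserts the data into a temporary directory that we specify. Two points to note:
1. Enable set hive.exec.compress.output=true to compress the result files, which further reduces network IO when they are transferred to the wormhole client.
2. The user can set the number of reducers with set mapred.reduce.tasks=N. Each reducer produces one file, and hive reader splits by file count, so the user can estimate the output size and choose the number of reducers accordingly.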
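As a rough sketch of the rewrite step, the following Python builds the statements that would be sent to hive server. The function names `rewrite_hql` and `estimate_reduce_tasks` are hypothetical, and generating the temporary directory name with `uuid4().hex` is an assumption; the post does not say how wormhole derives the 32-character directory name.

```python
import uuid


def rewrite_hql(user_hql, temp_root, reduce_tasks=None, compress=True):
    """Return (statements, temp_dir) for the direct-read mode."""
    # Assumption: a random 32-char hex name; wormhole's real scheme is not stated.
    temp_dir = temp_root.rstrip("/") + "/" + uuid.uuid4().hex
    statements = []
    if compress:
        statements.append("set hive.exec.compress.output=true")
    if reduce_tasks is not None:
        statements.append("set mapred.reduce.tasks=%d" % reduce_tasks)
    query = user_hql.strip().rstrip(";")
    statements.append("INSERT OVERWRITE DIRECTORY '%s' %s" % (temp_dir, query))
    return statements, temp_dir


def estimate_reduce_tasks(expected_output_bytes, target_bytes_per_file):
    """Ceiling division: number of output files of roughly the target size."""
    return max(1, -(-expected_output_bytes // target_bytes_per_file))
```

For example, `rewrite_hql("select * from bi.dpdm_device_permanent_city", "hdfs://10.2.6.102/user/hadoop/wormhole_test_temp")` yields the compression setting followed by an INSERT OVERWRITE DIRECTORY statement of the same shape as the one quoted above.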
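In the periphery stage the hql is submitted to hiveserver. Once it finishes, the data has landed on different datanodes. The splitter then generates one split per file, and a Reader Thread Pool with concurrency threads is started so that multiple threads fetch from different datanodes in parallel (each thread maintains its own DFSClient, which first talks to the Namenode via ClientProtocol and then reads block data directly from the datanodes). Finally, the temporary directory is deleted.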
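The split-by-file and parallel-fetch steps can be sketched as below. To keep the example self-contained it reads a local directory instead of HDFS, so the local filesystem here is only a stand-in for the DFSClient reads; the function names are hypothetical and each "read" simply counts lines.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def split_by_file(directory):
    """One split per file in the directory, sorted for a stable order."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def read_split(path):
    """Read one split; a real reader would push each line into the storage."""
    with open(path, "r", encoding="utf-8") as f:
        return sum(1 for _ in f)


def fetch_and_cleanup(directory, concurrency=10):
    """Read all splits with a fixed-size pool, then delete the temp directory."""
    splits = split_by_file(directory)
    try:
        with ThreadPoolExecutor(max_workers=concurrency) as pool:
            counts = list(pool.map(read_split, splits))
    finally:
        shutil.rmtree(directory)  # mirrors deleting the temp directory at the end
    return dict(zip(splits, counts))
```

With more splits than threads, as in the run logged below (44 files), a thread picks up the next file as soon as it finishes the previous one.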
Performance comparison:
Test table: dpdm_device_permanent_city
108593390 records in total, HDFS_BYTES_READ: 10,149,072,324
Reading through hiveserver:
2013-07-12 12:00:30,806 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 107373504 | Write 107372736 | speed 2.89MB/s 34163L/s|
2013-07-12 12:00:40,809 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 107695040 | Write 107694912 | speed 2.84MB/s 32192L/s|
2013-07-12 12:00:50,812 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 108027968 | Write 108027392 | speed 2.83MB/s 33254L/s|
2013-07-12 12:01:00,815 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 108386624 | Write 108386560 | speed 2.93MB/s 35904L/s|
2013-07-12 12:01:09,234 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-0 to file:/data/home/yukang.chen/wormhole_hive/prefix-0
2013-07-12 12:01:09,235 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-1 to file:/data/home/yukang.chen/wormhole_hive/prefix-1
2013-07-12 12:01:09,245 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:139) INFO core.Engine - Nebula wormhole Job is Completed successfully!
2013-07-12 12:01:09,592 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:206) INFO core.Engine - writer-id-0-hdfswriter:
Wormhole starts work at : 2013-07-12 11:01:19
Wormhole ends work at : 2013-07-12 12:01:09
Total time costs : 3590.01s
Average byte speed : 2.58MB/s
Average line speed : 30248L/s
Total transferred records : 108593326
Reading directly from the datanodes:
2013-07-12 10:21:47,431 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:66) INFO core.Engine - Nebula wormhole Start 2013-07-12 10:21:47,458 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:100) INFO core.Engine - Start Reader Threads 2013-07-12 10:21:47,550 [main] com.dp.nebula.wormhole.plugins.common.DFSUtils.getConf(DFSUtils.java:112) INFO common.DFSUtils - fs.default.name=hdfs://10.2.6.102:-1 2013-07-12 10:21:49,246 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderPeriphery.createTempDir(HiveReaderPeriphery.java:86) INFO hivereader.HiveReaderPeriphery - create data temp directory successfully hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67 2013-07-12 10:21:50,685 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveJdbcClient.processInsertQuery(HiveJdbcClient.java:65) INFO hivereader.HiveJdbcClient - hive execute insert sql:INSERT OVERWRITE DIRECTORY 'hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67' select * from bi.dpdm_device_permanent_city 2013-07-12 10:24:10,943 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveJdbcClient.printMetaDataInfoAndGetColumnCount(HiveJdbcClient.java:104) INFO hivereader.HiveJdbcClient - selected column names: string deviceid, int trainid, int cityid, string first_day, string last_day, double confidence_lower_bound, double confidence_upper_bound, bigint month_state 2013-07-12 10:24:11,127 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderSplitter.split(HiveReaderSplitter.java:69) INFO hivereader.HiveReaderSplitter - splitted files num:44 2013-07-12 10:24:11,151 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000000_0 2013-07-12 10:24:11,154 [pool-1-thread-3] 
com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000001_0 2013-07-12 10:24:11,157 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000002_0 2013-07-12 10:24:11,161 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000003_0 2013-07-12 10:24:11,164 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000004_0 2013-07-12 10:24:11,169 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000005_0 2013-07-12 10:24:11,172 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000006_0 2013-07-12 10:24:11,177 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000007_0 2013-07-12 10:24:11,181 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read 
hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000008_0 2013-07-12 10:24:11,185 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000009_0 log4j:WARN No appenders could be found for logger (com.hadoop.compression.lzo.GPLNativeCodeLoader). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. 2013-07-12 10:24:11,296 [main] com.dp.nebula.wormhole.engine.core.ReaderManager.run(ReaderManager.java:125) INFO core.ReaderManager - Nebula WormHole start to read data 2013-07-12 10:24:11,297 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:105) INFO core.Engine - Start Writer Threads 2013-07-12 10:24:11,313 [main] com.dp.nebula.wormhole.plugins.common.DFSUtils.getConf(DFSUtils.java:112) INFO common.DFSUtils - fs.default.name=file://null:-1 2013-07-12 10:24:11,450 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsDirSplitter.split(HdfsDirSplitter.java:73) INFO hdfswriter.HdfsDirSplitter - HdfsWriter splits file to 2 sub-files . 
2013-07-12 10:24:11,457 [main] com.dp.nebula.wormhole.engine.core.WriterManager.run(WriterManager.java:147) INFO core.WriterManager - Writer: writer-id-0-hdfswriter start to write data 2013-07-12 10:24:20,481 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 5116352 | Write 5115776 | speed 43.79MB/s 512748L/s| 2013-07-12 10:24:30,501 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 10688896 | Write 10688320 | speed 47.99MB/s 556083L/s| 2013-07-12 10:24:40,510 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 17341248 | Write 17340672 | speed 55.84MB/s 665222L/s| 2013-07-12 10:24:50,584 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 22791040 | Write 22789824 | speed 46.90MB/s 544902L/s| 2013-07-12 10:24:53,507 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000010_0 2013-07-12 10:24:53,599 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:25:00,597 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 30125696 | Write 30124608 | speed 63.22MB/s 733472L/s| 2013-07-12 10:25:08,345 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000011_0 2013-07-12 10:25:08,582 [pool-1-thread-8] 
com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:25:09,263 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000012_0 2013-07-12 10:25:09,291 [pool-1-thread-3] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:25:10,131 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000013_0 2013-07-12 10:25:10,199 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:25:10,685 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 36688002 | Write 36687106 | speed 55.07MB/s 656237L/s| 2013-07-12 10:25:12,262 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000014_0 2013-07-12 10:25:12,274 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:01,532 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 67816280 | Write 67815704 | speed 57.08MB/s 673481L/s| 2013-07-12 10:26:03,898 
[pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000025_0 2013-07-12 10:26:03,908 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:06,370 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000026_0 2013-07-12 10:26:06,415 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:10,864 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000027_0 2013-07-12 10:26:10,889 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:11,539 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 73378191 | Write 73377295 | speed 47.58MB/s 556146L/s| 2013-07-12 10:26:21,576 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:21,690 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 79406971 | Write 79405898 | speed 51.83MB/s 602846L/s| 2013-07-12 
10:26:29,739 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000031_0 2013-07-12 10:26:29,940 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:32,031 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 85765697 | Write 85764545 | speed 53.87MB/s 635847L/s| 2013-07-12 10:26:34,598 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000032_0 2013-07-12 10:26:34,606 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:36,369 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000033_0 2013-07-12 10:26:36,373 [pool-1-thread-7] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:38,984 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000034_0 2013-07-12 10:26:38,990 [pool-1-thread-2] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO 
hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:39,126 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000035_0 2013-07-12 10:26:39,134 [pool-1-thread-9] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:42,090 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 91872401 | Write 91872209 | speed 52.52MB/s 610760L/s| 2013-07-12 10:26:50,914 [pool-1-thread-5] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:52,096 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 97049556 | Write 97048852 | speed 43.83MB/s 517657L/s| 2013-07-12 10:26:53,283 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000039_0 2013-07-12 10:26:53,304 [pool-1-thread-4] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:26:54,701 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000040_0 2013-07-12 10:26:54,709 [pool-1-thread-6] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) 
INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:27:02,163 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:145) INFO core.Engine - writer-id-0-hdfswriter stat: Read 103048760 | Write 103047800 | speed 51.35MB/s 599869L/s| 2013-07-12 10:27:03,159 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000041_0 2013-07-12 10:27:03,170 [pool-1-thread-1] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:27:03,266 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000042_1 2013-07-12 10:27:03,281 [pool-1-thread-8] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:27:03,742 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.read(HiveReader.java:67) INFO hivereader.HiveReader - start to read hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67/000043_0 2013-07-12 10:27:03,754 [pool-1-thread-10] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReader.readFromHdfs(HiveReader.java:85) INFO hivereader.HiveReader - codec not found, using text file reader 2013-07-12 10:27:11,188 [main] com.dp.nebula.wormhole.plugins.reader.hivereader.HiveReaderPeriphery.doPost(HiveReaderPeriphery.java:106) INFO hivereader.HiveReaderPeriphery - hdfs://10.2.6.102/user/hadoop/wormhole_test_temp/83a6cc1ab94fe4d8d848bef133ff9d67 has been deleted at dopost stage 2013-07-12 10:27:12,212 [main] 
com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-0 to file:/data/home/yukang.chen/wormhole_hive/prefix-0 2013-07-12 10:27:12,213 [main] com.dp.nebula.wormhole.plugins.writer.hdfswriter.HdfsWriterPeriphery.renameFiles(HdfsWriterPeriphery.java:124) INFO hdfswriter.HdfsWriterPeriphery - successfully rename file from file:/data/home/yukang.chen/wormhole_hive/_prefix-1 to file:/data/home/yukang.chen/wormhole_hive/prefix-1 2013-07-12 10:27:12,214 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:139) INFO core.Engine - Nebula wormhole Job is Completed successfully! 2013-07-12 10:27:12,525 [main] com.dp.nebula.wormhole.engine.core.Engine.run(Engine.java:206) INFO core.Engine - writer-id-0-hdfswriter: Wormhole starts work at : 2013-07-12 10:21:47 Wormhole ends work at : 2013-07-12 10:27:12 Total time costs : 325.08s Average byte speed : 28.55MB/s Average line speed : 334046L/s Total transferred records : 108593262
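Reading directly from the datanodes averages about 53MB/s, while reading through hiveserver averages about 3MB/s, a difference of roughly 18x. Even counting the execution time of the extra stage added by INSERT OVERWRITE DIRECTORY, the total time still differs by about 11x (3590.01s versus 325.08s), so the improvement is substantial.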
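The ratios can be checked against the figures in the two job summaries. The snippet below only redoes that arithmetic; the 53MB/s and 3MB/s inputs are the rounded transfer-phase averages quoted above, not values printed by wormhole.

```python
def ratio(slow, fast):
    """How many times larger `slow` is than `fast`."""
    return slow / fast


# End-to-end wall-clock time from the two job summaries (seconds).
total_time_ratio = ratio(3590.01, 325.08)

# Whole-job average byte speed from the two job summaries (MB/s).
avg_speed_ratio = ratio(28.55, 2.58)

# Rounded transfer-phase speeds quoted in the text (MB/s).
transfer_speed_ratio = ratio(53.0, 3.0)

print("total time:     %.1fx" % total_time_ratio)
print("average speed:  %.1fx" % avg_speed_ratio)
print("transfer phase: %.1fx" % transfer_speed_ratio)
```

The whole-job ratios come out near 11x because the direct-read run spends part of its 325.08s waiting for the INSERT OVERWRITE DIRECTORY stage (submitted at 10:21:50 and finished by 10:24:10 in the log) before any data is transferred.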
Original link: http://blog.csdn.net/lalaguozhe/article/details/9465953. Please credit the source when reposting.