CompactIndex in Hive 0.7.1: how a Hive MapRedTask gets executed

The key point for us is how the plan is built.

    int ret = compile(command); // key point: this is where the plan is generated

A simple query:

hive> select * from table02 where id=500000;

The basic flow looks like this:

CliDriver.processLine(line);

CliDriver.processCmd(command);    // once the run finishes, prints the result and the time taken

(Driver) qp.run(cmd)

Driver.compile(command);

Driver.execute();

Driver.launchTask(tsk,queryId,noName,running,jobname,jobs,cxt);

TaskRunner.runSequential();  //new TaskRunner(tsk,tskRes);

Task.executeTask();  //org.apache.hadoop.hive.ql.exec.MapRedTask@6f513

Task.execute(DriverContext);

MapRedTask.execute(); //override

      executor = Runtime.getRuntime().exec(cmdLine, env, new File(workDir));
      //console.printError("After");
      StreamPrinter outPrinter = new StreamPrinter(
          executor.getInputStream(), null,
          SessionState.getConsole().getChildOutStream());
      StreamPrinter errPrinter = new StreamPrinter(
          executor.getErrorStream(), null,
          SessionState.getConsole().getChildErrStream());

      outPrinter.start(); // start forwarding the child's stdout
      errPrinter.start(); // start forwarding the child's stderr

      int exitVal = executor.waitFor(); // block until the child process exits (sequential execution)
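
The pattern in the snippet above (start a child process, drain its stdout and stderr on separate threads, then block on waitFor()) can be sketched in plain Java as follows. This is only a minimal sketch under my own assumptions, not Hive's code: ChildRunner, StreamPump and pumpToString are hypothetical names, with StreamPump standing in for Hive's StreamPrinter. The separate threads matter because a child that fills its output pipe while nobody reads it will block.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

class ChildRunner {

    /** Hypothetical stand-in for Hive's StreamPrinter: copies a stream line by line. */
    static class StreamPump extends Thread {
        private final InputStream in;
        private final PrintStream out;

        StreamPump(InputStream in, PrintStream out) {
            this.in = in;
            this.out = out;
        }

        @Override
        public void run() {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    out.println(line);
                }
            } catch (IOException e) {
                // The stream was closed abruptly; there is nothing more to copy.
            }
        }
    }

    /** Starts the command, forwards its output, and blocks until it exits. */
    static int runAndWait(List<String> cmdLine) throws IOException, InterruptedException {
        Process child = new ProcessBuilder(cmdLine).start();
        StreamPump outPump = new StreamPump(child.getInputStream(), System.out);
        StreamPump errPump = new StreamPump(child.getErrorStream(), System.err);
        outPump.start();
        errPump.start();
        int exitVal = child.waitFor(); // the caller is parked here until the child exits
        outPump.join();
        errPump.join();
        return exitVal;
    }

    /** Demo helper: pumps an in-memory stream and returns everything that was copied. */
    static String pumpToString(InputStream in) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        StreamPump pump = new StreamPump(in, new PrintStream(buffer, true));
        pump.start();
        try {
            pump.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while pumping", e);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        InputStream fake = new ByteArrayInputStream(
                "line 1\nline 2\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(pumpToString(fake));
    }
}
```

In real use, runAndWait would be handed the full hadoop command line as a list of arguments; pumpToString exists only so the pump can be tried without spawning a process.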

In this way the query is handed over to a new process, executor, whose cmdLine is as follows:

/home/allen/Hadoop/hadoop-0.20.2/bin/hadoop jar /home/allen/Desktop/hive-0.7.1/lib/hive-exec-0.7.1.jar org.apache.hadoop.hive.ql.exec.ExecDriver  -plan file:/tmp/allen/hive_2012-03-05_16-15-28_863_4469375855705861948/-local-10002/plan.xml  -jobconf hive.mapjoin.hashtable.initialCapacity=100000 -jobconf datanucleus.connectionPoolingType=DBCP -jobconf hive.exec.script.allow.partial.consumption=false -jobconf hive.metastore.client.connect.retry.delay=1 -jobconf hive.stats.jdbc.atomic=false -jobconf hive.query.id=allen_20120305161515_3e3d24cc-7d40-41c7-b00b-bb35ba4ddd3c -jobconf hive.metastore.sasl.enabled=false -jobconf hive.hwi.listen.port=9999 -jobconf hive.mapjoin.followby.map.aggr.hash.percentmemory=0.3 -jobconf hive.mergejob.maponly=true -jobconf hive.support.concurrency=false -jobconf hive.map.aggr=true -jobconf hive.map.aggr.hash.min.reduction=0.5 -jobconf datanucleus.plugin.pluginRegistryBundleCheck=LOG -jobconf hive.exec.reducers.bytes.per.reducer=1000000000 -jobconf hive.exec.default.partition.name=__HIVE_DEFAULT_PARTITION__ -jobconf hive.optimize.cp=true -jobconf hive.exec.dynamic.partition.mode=strict -jobconf hive.metastore.client.socket.timeout=20 -jobconf datanucleus.cache.level2.type=SOFT -jobconf hive.exec.max.created.files=100000 -jobconf hive.script.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe -jobconf hive.error.on.empty.partition=false -jobconf hive.fileformat.check=true -jobconf hive.exec.max.dynamic.partitions.pernode=100 -jobconf hive.enforce.sorting=false -jobconf hive.optimize.ppd=true -jobconf hive.optimize.groupby=true -jobconf hive.enforce.bucketing=false -jobconf javax.jdo.option.ConnectionUserName=root -jobconf hive.mapjoin.check.memory.rows=100000 -jobconf hive.mapred.reduce.tasks.speculative.execution=true -jobconf mapred.job.name=select+*+from+table02+where+id%3D500000%28Stage-1%29 -jobconf javax.jdo.option.DetachAllOnCommit=true -jobconf hive.mapred.local.mem=0 -jobconf datanucleus.cache.level2=false -jobconf 
hive.session.id=allen_201203051549 -jobconf hive.lock.sleep.between.retries=60 -jobconf hive.exec.show.job.failure.debug.info=false -jobconf hive.script.operator.id.env.var=HIVE_SCRIPT_OPERATOR_ID -jobconf hive.archive.har.parentdir.settable=false -jobconf hive.metastore.server.max.threads=100000 -jobconf hive.udtf.auto.progress=false -jobconf hive.hwi.war.file=lib%2Fhive-hwi-0.7.1.war -jobconf hive.auto.progress.timeout=0 -jobconf datanucleus.validateTables=false -jobconf hive.optimize.ppd.storage=true -jobconf hive.exec.compress.output=false -jobconf hive.test.mode.prefix=test_ -jobconf hive.exec.drop.ignorenonexistent=true -jobconf hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator -jobconf hive.stats.dbconnectionstring=jdbc%3Aderby%3A%3BdatabaseName%3DTempStatsStore%3Bcreate%3Dtrue -jobconf hive.mapjoin.bucket.cache.size=100 -jobconf datanucleus.validateConstraints=false -jobconf hive.stats.jdbc.timeout=30 -jobconf hive.metastore.server.tcp.keepalive=true -jobconf mapred.reduce.tasks=-1 -jobconf hive.query.string=select+*+from+table02+where+id%3D500000 -jobconf hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat -jobconf hive.task.progress=false -jobconf hive.mapjoin.followby.gby.localtask.max.memory.usage=0.55 -jobconf hive.metastore.ds.retry.interval=1000 -jobconf javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver -jobconf hive.index.compact.file.ignore.hdfs=false -jobconf hive.skewjoin.mapjoin.map.tasks=10000 -jobconf hive.variable.substitute=true -jobconf hive.mapjoin.maxsize=100000 -jobconf hive.archive.enabled=false -jobconf hive.exec.dynamic.partition=false -jobconf hive.optimize.skewjoin=false -jobconf hive.groupby.mapaggr.checkinterval=100000 -jobconf hive.mapjoin.localtask.max.memory.usage=0.90 -jobconf hive.test.mode=false -jobconf hive.exec.parallel=false -jobconf hive.exec.counters.pull.interval=1000 -jobconf hive.default.fileformat=TextFile -jobconf 
hive.exec.max.dynamic.partitions=1000 -jobconf fs.har.impl=org.apache.hadoop.hive.shims.HiveHarFileSystem -jobconf hive.test.mode.samplefreq=32 -jobconf hive.metastore.ds.retry.attempts=1 -jobconf javax.jdo.option.NonTransactionalRead=true -jobconf hive.zookeeper.clean.extra.nodes=false -jobconf hive.script.auto.progress=false -jobconf hive.zookeeper.namespace=hive_zookeeper_namespace -jobconf hive.merge.mapredfiles=false -jobconf javax.jdo.option.ConnectionURL=jdbc%3Amysql%3A%2F%2Flocalhost%3A3306%2Fhive01%3FcreateDatabaseIfNotExist%3Dtrue -jobconf hive.exec.compress.intermediate=false -jobconf hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore -jobconf hive.map.aggr.hash.percentmemory=0.5 -jobconf hive.hwi.listen.host=0.0.0.0 -jobconf datanucleus.transactionIsolation=read-committed -jobconf hive.metastore.cache.pinobjtypes=Table%2CStorageDescriptor%2CSerDeInfo%2CPartition%2CDatabase%2CType%2CFieldSchema%2COrder -jobconf hive.merge.size.per.task=256000000 -jobconf hive.merge.smallfiles.avgsize=16000000 -jobconf datanucleus.autoCreateSchema=true -jobconf hive.groupby.skewindata=false -jobconf hive.metastore.local=true -jobconf hive.skewjoin.mapjoin.min.split=33554432 -jobconf hive.mapjoin.smalltable.filesize=25000000 -jobconf hive.mapred.mode=nonstrict -jobconf hive.optimize.pruner=true -jobconf hive.skewjoin.key=100000 -jobconf hive.security.authorization.enabled=false -jobconf hive.metastore.kerberos.principal=hive-metastore%2F_HOST%40EXAMPLE.COM -jobconf hive.cli.print.header=false -jobconf hive.hbase.wal.enabled=true -jobconf datanucleus.validateColumns=false -jobconf hive.auto.convert.join=false -jobconf datanucleus.identifierFactory=datanucleus -jobconf hive.session.silent=false -jobconf hive.lock.numretries=100 -jobconf hive.optimize.reducededuplication=true -jobconf hive.exec.reducers.max=999 -jobconf javax.jdo.PersistenceManagerFactoryClass=org.datanucleus.jdo.JDOPersistenceManagerFactory -jobconf hive.heartbeat.interval=1000 
-jobconf hive.stats.autogather=true -jobconf hive.map.aggr.hash.force.flush.memory.threshold=0.9 -jobconf datanucleus.autoStartMechanismMode=checked -jobconf hive.join.cache.size=25000 -jobconf hive.metastore.warehouse.dir=%2Fuser%2Fhive%2Fwarehouse -jobconf javax.jdo.option.ConnectionPassword=password -jobconf hive.metastore.connect.retries=5 -jobconf hive.concurrency.manager=org.apache.hadoop.hive.ql.lockmgr.ZooKeeperLockMgr -jobconf hive.exec.mode.local.auto=false -jobconf hive.mapjoin.cache.numrows=25000 -jobconf hive.exec.parallel.thread.number=8 -jobconf hive.mapjoin.hashtable.loadfactor=0.75 -jobconf datanucleus.storeManagerType=rdbms -jobconf hive.script.recordreader=org.apache.hadoop.hive.ql.exec.TextRecordReader -jobconf hive.zookeeper.client.port=2181 -jobconf hive.exec.scratchdir=%2Ftmp%2Fhive-%24%7Buser.name%7D -jobconf hive.stats.dbclass=jdbc%3Aderby -jobconf hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider -jobconf hive.metastore.server.min.threads=200 -jobconf hive.fetch.output.serde=org.apache.hadoop.hive.serde2.DelimitedJSONSerDe -jobconf hive.script.recordwriter=org.apache.hadoop.hive.ql.exec.TextRecordWriter -jobconf hive.merge.mapfiles=true -jobconf hive.stats.jdbcdriver=org.apache.derby.jdbc.EmbeddedDriver -jobconf hive.exec.script.maxerrsize=100000 -jobconf hive.join.emit.interval=1000 -jobconf hive.added.jars.path= -jobconf mapred.system.dir=%2Ftmp%2Fhadoop-allen%2Fmapred%2Fsystem%2F712166882 -jobconf mapred.local.dir=%2Ftmp%2Fhadoop-allen%2Fmapred%2Flocal%2F299118477

hive> Set hive.index.compact.file=/tmp/table02_index_data;
Set hive.index.compact.file=/tmp/table02_index_data;
hive> Set hive.optimize.index.filter=false;
Set hive.optimize.index.filter=false;
hive> Set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
Set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
hive> select * from table02 where id =500000;
select * from table02 where id =500000;

With the index enabled, the cmdLine becomes:

/home/allen/Hadoop/hadoop-0.20.2/bin/hadoop jar /home/allen/Desktop/hive-0.7.1/lib/hive-exec-0.7.1.jar org.apache.hadoop.hive.ql.exec.ExecDriver  -plan file:/tmp/allen/hive_2012-03-06_10-52-57_695_2032164111332457666/-local-10002/plan.xml  -jobconf hive.mapjoin.hashtable.initialCapacity=100000 -jobconf datanucleus.connectionPoolingType=DBCP -jobconf hive.exec.script.allow.partial.consumption=false -jobconf hive.metastore.client.connect.retry.delay=1 -jobconf hive.stats.jdbc.atomic=false -jobconf hive.query.id=allen_20120306105353_edd17c2f-008e-489a-a354-b0d65da5d99c -jobconf hive.metastore.sasl.enabled=false -jobconf hive.hwi.listen.port=9999 -jobconf hive.mapjoin.followby.map.aggr.hash.percentmemory=0.3 -jobconf hive.mergejob.maponly=true -jobconf hive.support.concurrency=false -jobconf hive.map.aggr=true -jobconf hive.map.aggr.hash.min.reduction=0.5 -jobconf datanucleus.plugin.pluginRegistryBundleCheck=LOG -jobconf hive.exec.reducers.bytes.per.reducer=1000000000 -jobconf hive.exec.default.partition.name=__HIVE_DEFAULT_PARTITION__ -jobconf hive.optimize.cp=true -jobconf hive.exec.dynamic.partition.mode=strict -jobconf hive.metastore.client.socket.timeout=20 -jobconf datanucleus.cache.level2.type=SOFT -jobconf hive.exec.max.created.files=100000 -jobconf hive.script.serde=org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe -jobconf hive.error.on.empty.partition=false -jobconf hive.fileformat.check=true -jobconf hive.exec.max.dynamic.partitions.pernode=100 -jobconf hive.enforce.sorting=false -jobconf hive.optimize.ppd=true -jobconf hive.optimize.groupby=true -jobconf hive.enforce.bucketing=false -jobconf javax.jdo.option.ConnectionUserName=root -jobconf hive.mapjoin.check.memory.rows=100000 -jobconf hive.mapred.reduce.tasks.speculative.execution=true -jobconf mapred.job.name=select+*+from+table02+where+id+%3D500000%28Stage-1%29 -jobconf javax.jdo.option.DetachAllOnCommit=true -jobconf hive.mapred.local.mem=0 -jobconf datanucleus.cache.level2=false -jobconf 
hive.session.id=allen_201203061051 -jobconf hive.lock.sleep.between.retries=60 -jobconf hive.exec.show.job.failure.debug.info=false -jobconf hive.script.operator.id.env.var=HIVE_SCRIPT_OPERATOR_ID -jobconf hive.archive.har.parentdir.settable=false -jobconf hive.metastore.server.max.threads=100000 -jobconf hive.udtf.auto.progress=false -jobconf hive.hwi.war.file=lib%2Fhive-hwi-0.7.1.war -jobconf hive.auto.progress.timeout=0 -jobconf datanucleus.validateTables=false -jobconf hive.optimize.ppd.storage=true -jobconf hive.exec.compress.output=false -jobconf hive.test.mode.prefix=test_ -jobconf hive.exec.drop.ignorenonexistent=true -jobconf hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.HadoopDefaultAuthenticator -jobconf hive.stats.dbconnectionstring=jdbc%3Aderby%3A%3BdatabaseName%3DTempStatsStore%3Bcreate%3Dtrue -jobconf hive.mapjoin.bucket.cache.size=100 -jobconf datanucleus.validateConstraints=false -jobconf hive.stats.jdbc.timeout=30 -jobconf hive.metastore.server.tcp.keepalive=true -jobconf mapred.reduce.tasks=-1 -jobconf hive.query.string=select+*+from+table02+where+id+%3D500000 -jobconf hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat -jobconf hive.task.progress=false -jobconf hive.index.compact.file=%2Ftmp%2Ftable02_index_data -jobconf hive.mapjoin.followby.gby.localtask.max.memory.usage=0.55 -jobconf hive.metastore.ds.retry.interval=1000 -jobconf javax.jdo.option.ConnectionDriverName=com.mysql.jdbc.Driver -jobconf hive.index.compact.file.ignore.hdfs=false -jobconf hive.skewjoin.mapjoin.map.tasks=10000 -jobconf hive.variable.substitute=true -jobconf hive.mapjoin.maxsize=100000 -jobconf hive.archive.enabled=false -jobconf hive.exec.dynamic.partition=false -jobconf hive.optimize.skewjoin=false -jobconf hive.groupby.mapaggr.checkinterval=100000 -jobconf hive.mapjoin.localtask.max.memory.usage=0.90 -jobconf hive.test.mode=false -jobconf hive.exec.parallel=false -jobconf 
hive.exec.counters.pull.interval=1000 -jobconf hive.default.fileformat=TextFile -jobconf hive.exec.max.dynamic.partitions=1000 -jobconf fs.har.impl=org.apache.hadoop.hive.shims.HiveHarFileSystem -jobconf hive.test.mode.samplefreq=32 -jobconf hive.metastore.ds.retry.attempts=1 -jobconf javax.jdo.option.NonTransactionalRead=true -jobconf hive.zookeeper.clean.extra.nodes=false -jobconf hive.script.auto.progress=false -jobconf hive.zookeeper.namespace=hive_zookeeper_namespace -jobconf hive.merge.mapredfiles=false -jobconf javax.jdo.option.ConnectionURL=jdbc%3Amysql%3A%2F%2Flocalhost%3A3306%2Fhive01%3FcreateDatabaseIfNotExist%3Dtrue -jobconf hive.exec.compress.intermediate=false -jobconf hive.metastore.rawstore.impl=org.apache.hadoop.hive.metastore.ObjectStore -jobconf hive.map.aggr.hash.percentmemory=0.5 -jobconf hive.hwi.listen.host=0.0.0.0 -jobconf datanucleus.transactionIsolation=read-committed -jobconf hive.metastore.cache.pinobjtypes=Table%2CStorageDescriptor%2CSerDeInfo%2CPartition%2CDatabase%2CType%2CFieldSchema%2COrder -jobconf hive.merge.size.per.task=256000000 -jobconf hive.merge.smallfiles.avgsize=16000000 -jobconf datanucleus.autoCreateSchema=true -jobconf hive.groupby.skewindata=false -jobconf hive.metastore.local=true -jobconf hive.skewjoin.mapjoin.min.split=33554432 -jobconf hive.mapjoin.smalltable.filesize=25000000 -jobconf hive.mapred.mode=nonstrict -jobconf hive.optimize.pruner=true -jobconf hive.skewjoin.key=100000 -jobconf hive.security.authorization.enabled=false -jobconf hive.metastore.kerberos.principal=hive-metastore%2F_HOST%40EXAMPLE.COM -jobconf hive.cli.print.header=false -jobconf hive.hbase.wal.enabled=true -jobconf datanucleus.validateColumns=false -jobconf hive.auto.convert.join=false -jobconf datanucleus.identifierFactory=datanucleus -jobconf hive.session.silent=false -jobconf hive.lock.numretries=100 -jobconf hive.optimize.reducededuplication=true -jobconf hive.exec.reducers.max=999 -jobconf 
javax.jdo.PersistenceManagerFactoryClass=org.datanucleus.jdo.JDOPersistenceManagerFactory -jobconf hive.optimize.index.filter=false -jobconf hive.heartbeat.interval=1000 -jobconf hive.stats.autogather=true -jobconf hive.map.aggr.hash.force.flush.memory.threshold=0.9 -jobconf datanucleus.autoStartMechanismMode=checked -jobconf hive.join.cache.size=25000 -jobconf hive.metastore.warehouse.dir=%2Fuser%2Fhive%2Fwarehouse -jobconf javax.jdo.option.ConnectionPassword=password -jobconf hive.metastore.connect.retries=5 -jobconf hive.concurrency.manager=org.apache.hadoop.hive.ql.lockmgr.ZooKeeperLockMgr -jobconf hive.exec.mode.local.auto=false -jobconf hive.mapjoin.cache.numrows=25000 -jobconf hive.exec.parallel.thread.number=8 -jobconf hive.mapjoin.hashtable.loadfactor=0.75 -jobconf datanucleus.storeManagerType=rdbms -jobconf hive.script.recordreader=org.apache.hadoop.hive.ql.exec.TextRecordReader -jobconf hive.zookeeper.client.port=2181 -jobconf hive.exec.scratchdir=%2Ftmp%2Fhive-%24%7Buser.name%7D -jobconf hive.stats.dbclass=jdbc%3Aderby -jobconf hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.DefaultHiveAuthorizationProvider -jobconf hive.metastore.server.min.threads=200 -jobconf hive.fetch.output.serde=org.apache.hadoop.hive.serde2.DelimitedJSONSerDe -jobconf hive.script.recordwriter=org.apache.hadoop.hive.ql.exec.TextRecordWriter -jobconf hive.merge.mapfiles=true -jobconf hive.stats.jdbcdriver=org.apache.derby.jdbc.EmbeddedDriver -jobconf hive.exec.script.maxerrsize=100000 -jobconf hive.join.emit.interval=1000 -jobconf hive.added.jars.path= -jobconf mapred.system.dir=%2Ftmp%2Fhadoop-allen%2Fmapred%2Fsystem%2F1181744880 -jobconf mapred.local.dir=%2Ftmp%2Fhadoop-allen%2Fmapred%2Flocal%2F76345676

The basic format is:

hadoop jar hive-exec-0.7.1.jar

org.apache.hadoop.hive.ql.exec.ExecDriver

-plan  .../plan.xml

-jobconf key=value ....

In other words, hadoop is invoked to run the job.
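
As a rough illustration of that format, the sketch below assembles such a command line from a map of settings. It is a minimal sketch, not Hive's implementation: CmdLineSketch and its methods are hypothetical names, and the use of URLEncoder is my assumption, based on the values in the dumps above looking URL-encoded (for example %2F for / and %3D for =).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CmdLineSketch {

    /** URL-encodes a value, which is how the values in the dumps appear to be encoded. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    /** Builds: hadoop jar <jar> ExecDriver -plan <plan> -jobconf k=v ... */
    static List<String> build(String hadoopBin, String hiveExecJar,
                              String planFile, Map<String, String> conf) {
        List<String> cmd = new ArrayList<>();
        cmd.add(hadoopBin);
        cmd.add("jar");
        cmd.add(hiveExecJar);
        cmd.add("org.apache.hadoop.hive.ql.exec.ExecDriver");
        cmd.add("-plan");
        cmd.add(planFile);
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            cmd.add("-jobconf");
            cmd.add(entry.getKey() + "=" + encode(entry.getValue()));
        }
        return cmd;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hive.query.string", "select * from table02 where id=500000");
        conf.put("hive.index.compact.file", "/tmp/table02_index_data");
        // The plan path here is a placeholder, not the real scratch directory.
        List<String> cmd = build("hadoop", "hive-exec-0.7.1.jar", "file:/tmp/plan.xml", conf);
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the command as a list of arguments rather than one string avoids quoting problems if it is later passed to ProcessBuilder.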

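I ran into one problem here. I was running ExecDriver as a plain Java program inside Eclipse rather than submitting it to the cluster for debugging, so I had to add the option -jobconf fs.default.name=hdfs://localhost:9000. Only with that option is the data read from HDFS.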

So we need to take a careful look at ExecDriver.

Find its main function.

After a series of steps that parse and print the arguments, we see:

MapredWork plan = Utilities.deserializeMapRedWork(pathData, conf);
ExecDriver ed = new ExecDriver(plan, conf, isSilent);
ret = ed.execute(new DriverContext());
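
Before those three lines run, main has to turn the -plan and -jobconf arguments into a plan path and a configuration. The sketch below shows one way such parsing could look. It is not ExecDriver's actual code: ExecArgsSketch is a hypothetical class, and decoding the values with URLDecoder is an assumption that mirrors the URL-encoded values seen on the command line.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class ExecArgsSketch {
    /** Settings collected from every "-jobconf key=value" pair, in order. */
    final Map<String, String> jobConf = new LinkedHashMap<>();
    /** Value of the "-plan" option, or null if it was not given. */
    String planPath;

    static String decode(String value) {
        try {
            return URLDecoder.decode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    static ExecArgsSketch parse(String[] args) {
        ExecArgsSketch parsed = new ExecArgsSketch();
        for (int i = 0; i < args.length; i++) {
            if ("-plan".equals(args[i]) && i + 1 < args.length) {
                parsed.planPath = args[++i];
            } else if ("-jobconf".equals(args[i]) && i + 1 < args.length) {
                String pair = args[++i];
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    throw new IllegalArgumentException("expected key=value but got: " + pair);
                }
                // Everything after the first '=' is the (possibly empty) value.
                parsed.jobConf.put(pair.substring(0, eq), decode(pair.substring(eq + 1)));
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        ExecArgsSketch parsed = parse(new String[] {
            "-plan", "file:/tmp/plan.xml", // placeholder plan path
            "-jobconf", "fs.default.name=hdfs://localhost:9000",
            "-jobconf", "hive.index.compact.file=%2Ftmp%2Ftable02_index_data"
        });
        System.out.println(parsed.planPath + " " + parsed.jobConf);
    }
}
```

Note that an unencoded value such as hdfs://localhost:9000 passes through the decoder unchanged, which is why the extra option can be typed by hand in the Eclipse run configuration.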

This brings us to ExecDriver.execute(ctx);  //Execute a query plan using Hadoop.

//We set it to  be org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat
    String inpFormat = HiveConf.getVar(job, HiveConf.ConfVars.HIVEINPUTFORMAT);
    
    if ((inpFormat == null) || (!StringUtils.isNotBlank(inpFormat))) {
      inpFormat = ShimLoader.getHadoopShims().getInputFormatClassName();
    }
Here the inputFormat resolves to HiveCompactIndexInputFormat, and it is set as the job's inputFormat.
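
The fallback logic in that snippet boils down to "use the configured class name unless it is missing or blank". A minimal sketch of the same decision, using a plain Map in place of JobConf; InputFormatChoice is a hypothetical class and the fallback name is passed in by the caller, since I am not asserting what the shim returns:

```java
import java.util.HashMap;
import java.util.Map;

class InputFormatChoice {

    /** Returns the configured hive.input.format, or the fallback if it is null or blank. */
    static String resolve(Map<String, String> conf, String fallback) {
        String configured = conf.get("hive.input.format");
        boolean blank = configured == null || configured.trim().isEmpty();
        return blank ? fallback : configured;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("hive.input.format",
                "org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat");
        // "example.FallbackInputFormat" is a made-up name used only for this demo.
        System.out.println(resolve(conf, "example.FallbackInputFormat"));
    }
}
```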

//12/03/06 13:36:26 INFO exec.ExecDriver: Processing alias table02
      //12/03/06 13:36:26 INFO exec.ExecDriver: Adding input file hdfs://localhost:9000/user/hive/warehouse/table02
      addInputPaths(job, work, emptyScratchDirStr);//put input path into jobconf from work.


JobClient jc = new JobClient(job);


      rj = jc.submitJob(job);

submitJob calls JobClient's submitJobInternal:

    // Create the splits for the job
    LOG.debug("Creating splits at " + fs.makeQualified(submitSplitFile));
    int maps;
    if (job.getUseNewMapper()) {
      maps = writeNewSplits(context, submitSplitFile);
    } else {
      maps = writeOldSplits(job, submitSplitFile);
    }
Here writeOldSplits is what produces maps, i.e. the number of splits. Tracing into it:

  private int writeOldSplits(JobConf job, 
                             Path submitSplitFile) throws IOException {
    InputSplit[] splits = 
      job.getInputFormat().getSplits(job, job.getNumMapTasks());
This is where HiveCompactIndexInputFormat is asked to provide the splits.

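Its getSplits method reads the index file /tmp/table02_index_data to find the real input paths and returns the actual splits, that is, the part of table02 that contains the indexed id value. Along the way it also calls TextInputFormat to cut the data into splits, then filters out every split that does not contain the indexed value and returns the rest.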
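
The filtering step can be sketched roughly as below. This is a simplified model under my own assumptions, not the real HiveCompactIndexInputFormat: IndexSplitFilter and Split are hypothetical types, the index is modelled as "file path to set of matching row offsets", and a split is kept when at least one offset falls inside its byte range.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class IndexSplitFilter {

    /** Simplified file split: the byte range [start, start + length) of one file. */
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    /** File path -> sorted offsets of rows that matched the indexed value. */
    private final Map<String, TreeSet<Long>> offsetsByFile = new HashMap<>();

    /** Records one index hit, as if read from a line of the index result file. */
    void addHit(String path, long offset) {
        offsetsByFile.computeIfAbsent(path, k -> new TreeSet<>()).add(offset);
    }

    /** True if some recorded offset lies inside the split's byte range. */
    boolean contains(Split split) {
        TreeSet<Long> offsets = offsetsByFile.get(split.path);
        if (offsets == null) {
            return false; // the index has no hit in this file at all
        }
        Long firstAtOrAfterStart = offsets.ceiling(split.start);
        return firstAtOrAfterStart != null
                && firstAtOrAfterStart < split.start + split.length;
    }

    /** Keeps only the candidate splits that contain at least one hit. */
    List<Split> filter(List<Split> candidates) {
        List<Split> kept = new ArrayList<>();
        for (Split candidate : candidates) {
            if (contains(candidate)) {
                kept.add(candidate);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        IndexSplitFilter index = new IndexSplitFilter();
        index.addHit("table02/part-0", 150L); // hypothetical file name and offset
        List<Split> candidates = Arrays.asList(
                new Split("table02/part-0", 0L, 100L),
                new Split("table02/part-0", 100L, 100L),
                new Split("table02/part-0", 200L, 100L));
        for (Split s : index.filter(candidates)) {
            System.out.println(s.path + " @ " + s.start + " + " + s.length);
        }
    }
}
```

With a selective predicate such as id=500000, most candidate splits contain no hit and are dropped, so far fewer map tasks are launched than for a full scan.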

writeOldSplits then writes the split information to a file on HDFS.

The split file is written through DataOutputStream out = writeSplitsFileHeader(job, submitSplitFile, splits.length); and once writeOldSplits has returned, the path of that file is stored in the job's config. After that comes the real submitJob:

    //
    // Now, actually submit the job (using the submit name)
    //
    JobStatus status = jobSubmitClient.submitJob(jobId);
    if (status != null) {
      return new NetworkedJob(status);

As a result, the split computation described above is carried out one more time.

Back in ExecDriver: ExecDriverTaskHandle th = new ExecDriverTaskHandle(jc, rj);//start to execute

success = progress(th); then waits for the job to finish.
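
Waiting for a submitted job usually comes down to a polling loop. The sketch below shows that idea only. It is not the body of ExecDriver.progress: ProgressSketch and JobHandle are hypothetical names, and a real implementation would also report map and reduce progress while it waits.

```java
class ProgressSketch {

    /** Hypothetical view of a running job, standing in for the task handle. */
    interface JobHandle {
        boolean isComplete();
        boolean isSuccessful();
    }

    /** Polls until the job completes, then reports whether it succeeded. */
    static boolean waitFor(JobHandle job, long pollMillis) {
        while (!job.isComplete()) {
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }
        return job.isSuccessful();
    }

    public static void main(String[] args) {
        final int[] polls = {0};
        JobHandle fake = new JobHandle() {
            @Override
            public boolean isComplete() {
                return ++polls[0] >= 3; // pretend the job finishes on the third poll
            }

            @Override
            public boolean isSuccessful() {
                return true;
            }
        };
        System.out.println("success = " + waitFor(fake, 10L));
    }
}
```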

At this point the job has, more or less, been submitted and run.


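In short, the two settings

hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat

hive.index.compact.file=%2Ftmp%2Ftable02_index_data

delegate the work of computing file splits to HiveCompactIndexInputFormat. It parses the index information in the file named by hive.index.compact.file and works out which HDFS file splits actually need to be read, which avoids a full scan.
All of this is achieved by manually setting the following parameters: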
hive> Set hive.index.compact.file=/tmp/table02_index_data;  
hive> Set hive.optimize.index.filter=false;  
hive> Set hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;  




   
