[Big Data] Hive query (select 1): a source-code analysis

Query result

The query took about 0.5 seconds in total (the log below shows 0.433 s compiling plus 0.006 s executing), which is surprisingly slow for a statement that touches no table.

Log output

2021-02-03T10:50:05,288 INFO  [HiveServer2-Handler-Pool: Thread-83393]: conf.HiveConf (HiveConf.java:getLogIdVar(5130)) - Using the default value passed in for log id: 89172071-3587-4ed8-8e3f-d798d3e56695
2021-02-03T10:50:05,288 INFO  [HiveServer2-Handler-Pool: Thread-83393]: session.SessionState (:()) - Updating thread name to 89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393
2021-02-03T10:50:05,289 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: operation.OperationManager (:()) - Adding operation: OperationHandle [opType=EXECUTE_STATEMENT, getHandleIdentifier()=3def4007-b221-44a9-9896-c6c41c37ab83]
2021-02-03T10:50:05,289 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ql.Driver (:()) - Compiling command(queryId=hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e): select 1
2021-02-03T10:50:05,318 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: lockmgr.DbTxnManager (:()) - Opened txnid:34518
2021-02-03T10:50:05,320 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Starting Semantic Analysis
2021-02-03T10:50:05,320 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Completed phase 1 of Semantic Analysis
2021-02-03T10:50:05,320 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Get metadata for source tables
2021-02-03T10:50:05,320 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Get metadata for subqueries
2021-02-03T10:50:05,320 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Get metadata for destination tables
2021-02-03T10:50:05,323 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ql.Context (:()) - New scratch dir is hdfs://bigdev/tmp/hive/hive/89172071-3587-4ed8-8e3f-d798d3e56695/hive_2021-02-03_10-50-05_309_8616204617178133166-400
2021-02-03T10:50:05,323 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Completed getting MetaData in Semantic Analysis
2021-02-03T10:50:05,324 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ql.Context (:()) - New scratch dir is hdfs://bigdev/tmp/hive/hive/89172071-3587-4ed8-8e3f-d798d3e56695/hive_2021-02-03_10-50-05_309_8616204617178133166-400
2021-02-03T10:50:05,711 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Get metadata for source tables
2021-02-03T10:50:05,714 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Get metadata for subqueries
2021-02-03T10:50:05,714 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Get metadata for destination tables
2021-02-03T10:50:05,714 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ql.Context (:()) - New scratch dir is hdfs://bigdev/tmp/hive/hive/89172071-3587-4ed8-8e3f-d798d3e56695/hive_2021-02-03_10-50-05_309_8616204617178133166-400
2021-02-03T10:50:05,714 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ql.Context (:()) - New scratch dir is hdfs://bigdev/tmp/hive/hive/89172071-3587-4ed8-8e3f-d798d3e56695/hive_2021-02-03_10-50-05_309_8616204617178133166-400
2021-02-03T10:50:05,715 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: common.FileUtils (FileUtils.java:mkdir(580)) - Creating directory if it doesn't exist: hdfs://bigdev/tmp/hive/hive/89172071-3587-4ed8-8e3f-d798d3e56695/hive_2021-02-03_10-50-05_309_8616204617178133166-400/-mr-10001/.hive-staging_hive_2021-02-03_10-50-05_309_8616204617178133166-400
2021-02-03T10:50:05,719 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - CBO Succeeded; optimized logical plan.
2021-02-03T10:50:05,719 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ppd.OpProcFactory (:()) - Processing for FS(2)
2021-02-03T10:50:05,719 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ppd.OpProcFactory (:()) - Processing for SEL(1)
2021-02-03T10:50:05,719 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ppd.OpProcFactory (:()) - Processing for TS(0)
2021-02-03T10:50:05,720 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Completed plan generation
2021-02-03T10:50:05,720 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Not eligible for results caching - no mr/tez/spark jobs
2021-02-03T10:50:05,720 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ql.Driver (:()) - Semantic Analysis Completed (retrial = false)
2021-02-03T10:50:05,720 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ql.Driver (:()) - Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:_c0, type:int, comment:null)], properties:null)
2021-02-03T10:50:05,721 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: exec.TableScanOperator (:()) - Initializing operator TS[0]
2021-02-03T10:50:05,722 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: exec.SelectOperator (:()) - Initializing operator SEL[1]
2021-02-03T10:50:05,722 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: exec.SelectOperator (:()) - SELECT null
2021-02-03T10:50:05,722 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: exec.ListSinkOperator (:()) - Initializing operator LIST_SINK[3]
2021-02-03T10:50:05,722 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ql.Driver (:()) - Completed compiling command(queryId=hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e); Time taken: 0.433 seconds
2021-02-03T10:50:05,723 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: conf.HiveConf (HiveConf.java:getLogIdVar(5130)) - Using the default value passed in for log id: 89172071-3587-4ed8-8e3f-d798d3e56695
2021-02-03T10:50:05,723 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: session.SessionState (:()) - Resetting thread name to  HiveServer2-Handler-Pool: Thread-83393
2021-02-03T10:50:05,723 INFO  [HiveServer2-Background-Pool: Thread-83405]: reexec.ReExecDriver (:()) - Execution #1 of query
2021-02-03T10:50:05,723 INFO  [HiveServer2-Background-Pool: Thread-83405]: lockmgr.DbTxnManager (:()) - Setting lock request transaction to txnid:34518 for queryId=hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e
2021-02-03T10:50:05,723 INFO  [HiveServer2-Background-Pool: Thread-83405]: lockmgr.DbLockManager (:()) - Requesting: queryId=hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e LockRequest(component:[LockComponent(type:SHARED_READ, level:TABLE, dbname:_dummy_database, tablename:_dummy_table, operationType:SELECT, isTransactional:false)], txnid:34518, user:hive, hostname:bigdev4.wuhan.cestc, agentInfo:hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e)
2021-02-03T10:50:05,736 INFO  [HiveServer2-Background-Pool: Thread-83405]: lockmgr.DbLockManager (:()) - Response to queryId=hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e LockResponse(lockid:25218, state:ACQUIRED)
2021-02-03T10:50:05,740 INFO  [HiveServer2-Background-Pool: Thread-83405]: ql.Driver (:()) - Executing command(queryId=hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e): select 1
2021-02-03T10:50:05,740 INFO  [HiveServer2-Background-Pool: Thread-83405]: hooks.HiveProtoLoggingHook (:()) - Received pre-hook notification for: hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e
2021-02-03T10:50:05,742 INFO  [HiveServer2-Background-Pool: Thread-83405]: conf.HiveConf (HiveConf.java:getLogIdVar(5130)) - Using the default value passed in for log id: 89172071-3587-4ed8-8e3f-d798d3e56695
2021-02-03T10:50:05,746 INFO  [HiveServer2-Background-Pool: Thread-83405]: hooks.HiveProtoLoggingHook (:()) - Received post-hook notification for: hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e
2021-02-03T10:50:05,746 INFO  [HiveServer2-Background-Pool: Thread-83405]: ddlintercept.DDLIntercepter (:()) - DDLIntercepter run:
2021-02-03T10:50:05,746 INFO  [HiveServer2-Background-Pool: Thread-83405]: ql.Driver (:()) - Completed executing command(queryId=hive_20210203105005_f3d9b645-22dd-4a93-b6e8-99bcb55cf58e); Time taken: 0.006 seconds
2021-02-03T10:50:05,746 INFO  [HiveServer2-Background-Pool: Thread-83405]: ql.Driver (:()) - OK
The log shows that the time is mostly spent in this gap (10:50:05,324 to 10:50:05,711, roughly 387 ms):
2021-02-03T10:50:05,324 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: ql.Context (:()) - New scratch dir is hdfs://bigdev/tmp/hive/hive/89172071-3587-4ed8-8e3f-d798d3e56695/hive_2021-02-03_10-50-05_309_8616204617178133166-400
2021-02-03T10:50:05,711 INFO  [89172071-3587-4ed8-8e3f-d798d3e56695 HiveServer2-Handler-Pool: Thread-83393]: parse.CalcitePlanner (:()) - Get metadata for source tables
Let's walk through the code to see what happens in that gap.

All source code below is from apache-hive-3.1.2.

Driver.java

The time is mainly spent in the compile phase, so let's look at the important parts of compile():

private void compile(String command, boolean resetTaskIds, boolean deferClose) throws CommandProcessorResponse {
    
    // ... some code omitted
    
    LOG.info("Compiling command(queryId=" + queryId + "): " + queryStr);

    conf.setQueryString(queryStr);
    // FIXME: sideeffect will leave the last query set at the session level
    if (SessionState.get() != null) {
      SessionState.get().getConf().setQueryString(queryStr);
      SessionState.get().setupQueryCurrentTimestamp();
    }

    // Whether any error occurred during query compilation. Used for query lifetime hook.
    boolean compileError = false;
    boolean parseError = false;

    // Create the transaction manager through the SessionState
    try {

      // Initialize the transaction manager.  This must be done before analyze is called.
      if (initTxnMgr != null) {
        queryTxnMgr = initTxnMgr;
      } else {
        queryTxnMgr = SessionState.get().initTxnMgr(conf);
      }
      if (queryTxnMgr instanceof Configurable) {
        ((Configurable) queryTxnMgr).setConf(conf);
      }
      queryState.setTxnManager(queryTxnMgr);

      // In case when user Ctrl-C twice to kill Hive CLI JVM, we want to release locks
      // if compile is being called multiple times, clear the old shutdownhook
      ShutdownHookManager.removeShutdownHook(shutdownRunner);
      final HiveTxnManager txnMgr = queryTxnMgr;
      // Shutdown hook: release locks and commit or roll back before the process exits
      shutdownRunner = new Runnable() {
        @Override
        public void run() {
          try {
            releaseLocksAndCommitOrRollback(false, txnMgr);
          } catch (LockException e) {
            LOG.warn("Exception when releasing locks in ShutdownHook for Driver: " +
                e.getMessage());
          }
        }
      };
      ShutdownHookManager.addShutdownHook(shutdownRunner, SHUTDOWN_HOOK_PRIORITY);

      checkInterrupted("before parsing and analysing the query", null, null);

      if (ctx == null) {
        ctx = new Context(conf);
        setTriggerContext(queryId);
      }

      // Wire up the components the Context needs
      ctx.setHiveTxnManager(queryTxnMgr);
      ctx.setStatsSource(statsSource);
      ctx.setCmd(command);
      ctx.setHDFSCleanup(true);

      perfLogger.PerfLogBegin(CLASS_NAME, PerfLogger.PARSE);

      // Trigger query hook before compilation
      hookRunner.runBeforeParseHook(command);

      // The key parsing step: turn the input command into an AST
      ASTNode tree;
      try {
        tree = ParseUtils.parse(command, ctx);
      } catch (ParseException e) {
        parseError = true;
        throw e;
      } finally {
        hookRunner.runAfterParseHook(command, parseError);
      }
      
      // PerfLogger records how long the parse took
      perfLogger.PerfLogEnd(CLASS_NAME, PerfLogger.PARSE);

      hookRunner.runBeforeCompileHook(command);
      // clear CurrentFunctionsInUse set, to capture new set of functions
      // that SemanticAnalyzer finds are in use
      SessionState.get().getCurrentFunctionsInUse().clear();
      perfLogger.PerfLogBegin(CLASS_NAME, PerfLogger.ANALYZE);

      // Flush the metastore cache.  This assures that we don't pick up objects from a previous
      // query running in this same thread.  This has to be done after we get our semantic
      // analyzer (this is when the connection to the metastore is made) but before we analyze,
      // because at that point we need access to the objects.
      // Get the metastore client and flush its cache.
      Hive.get().getMSC().flushCache();

      backupContext = new Context(ctx);
      boolean executeHooks = hookRunner.hasPreAnalyzeHooks();

      HiveSemanticAnalyzerHookContext hookCtx = new HiveSemanticAnalyzerHookContextImpl();
      if (executeHooks) {
        hookCtx.setConf(conf);
        hookCtx.setUserName(userName);
        hookCtx.setIpAddress(SessionState.get().getUserIpAddress());
        hookCtx.setCommand(command);
        hookCtx.setHiveOperation(queryState.getHiveOperation());

        tree =  hookRunner.runPreAnalyzeHooks(hookCtx, tree);
      }

      // Once the command is an AST, hand it to a semantic analyzer.
      // Do semantic analysis and plan generation
      // Per the log above, the analyzer obtained here is a CalcitePlanner.
      BaseSemanticAnalyzer sem = SemanticAnalyzerFactory.get(queryState, tree);

      if (!retrial) {
        openTransaction();
        generateValidTxnList();
      }

      // Start the analysis - this is where the time is spent
      sem.analyze(tree, ctx);

      // ... some code omitted
  }
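For reference, the ParseUtils.parse call above can be reproduced outside the server if you want to inspect the AST for select 1. A minimal sketch, assuming the apache-hive-3.1.2 jars and a hive-site.xml on the classpath (the class name ParseProbe is hypothetical):

import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.ql.Context;
import org.apache.hadoop.hive.ql.parse.ASTNode;
import org.apache.hadoop.hive.ql.parse.ParseUtils;

public class ParseProbe {
  public static void main(String[] args) throws Exception {
    HiveConf conf = new HiveConf();                   // reads hive-site.xml from the classpath
    Context ctx = new Context(conf);                  // same Context type Driver.compile builds
    ASTNode tree = ParseUtils.parse("select 1", ctx); // the call made at the PARSE step above
    System.out.println(tree.dump());                  // prints TOK_QUERY / TOK_INSERT / TOK_SELECT ...
  }
}

Running this shows that the parse itself is cheap; it is not where the time goes.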

Combining the log with the code, the time is all spent in sem.analyze(tree, ctx).

Let's see what that call actually does, in SemanticAnalyzer.java:

void analyzeInternal(ASTNode ast, PlannerContextFactory pcf) throws SemanticException {
    LOG.info("Starting Semantic Analysis");
    // 1. Generate Resolved Parse tree from syntax tree
    boolean needsTransform = needsTransform();
    //change the location of position alias process here
    processPositionAlias(ast);
    PlannerContext plannerCtx = pcf.create();
    if (!genResolvedParseTree(ast, plannerCtx)) {
      return;
    }

    if (HiveConf.getBoolVar(conf, ConfVars.HIVE_REMOVE_ORDERBY_IN_SUBQUERY)) {
      for (String alias : qb.getSubqAliases()) {
        removeOBInSubQuery(qb.getSubqForAlias(alias));
      }
    }

    // Check query results cache.
    // If no masking/filtering required, then we can check the cache now, before
    // generating the operator tree and going through CBO.
    // Otherwise we have to wait until after the masking/filtering step.
    boolean isCacheEnabled = isResultsCacheEnabled();
    QueryResultsCache.LookupInfo lookupInfo = null;
    if (isCacheEnabled && !needsTransform && queryTypeCanUseCache()) {
      lookupInfo = createLookupInfoForQuery(ast);
      if (checkResultsCache(lookupInfo)) {
        return;
      }
    }

    ASTNode astForMasking;
    if (isCBOExecuted() && needsTransform &&
        (qb.isCTAS() || qb.isView() || qb.isMaterializedView() || qb.isMultiDestQuery())) {
      // If we use CBO and we may apply masking/filtering policies, we create a copy of the ast.
      // The reason is that the generation of the operator tree may modify the initial ast,
      // but if we need to parse for a second time, we would like to parse the unmodified ast.
      astForMasking = (ASTNode) ParseDriver.adaptor.dupTree(ast);
    } else {
      astForMasking = ast;
    }

    // 2. Gen OP Tree from resolved Parse Tree
    Operator sinkOp = genOPTree(ast, plannerCtx);

    boolean usesMasking = false;
    if (!unparseTranslator.isEnabled() &&
        (tableMask.isEnabled() && analyzeRewrite == null)) {
      // Here we rewrite the * and also the masking table
      ASTNode rewrittenAST = rewriteASTWithMaskAndFilter(tableMask, astForMasking, ctx.getTokenRewriteStream(),
          ctx, db, tabNameToTabObject, ignoredTokens);
      if (astForMasking != rewrittenAST) {
        usesMasking = true;
        plannerCtx = pcf.create();
        ctx.setSkipTableMasking(true);
        init(true);
        //change the location of position alias process here
        processPositionAlias(rewrittenAST);
        // Remote debugging shows the time is spent right here.
        genResolvedParseTree(rewrittenAST, plannerCtx);
        if (this instanceof CalcitePlanner) {
          ((CalcitePlanner) this).resetCalciteConfiguration();
        }
        sinkOp = genOPTree(rewrittenAST, plannerCtx);
      }
    }
    
    // ... some code omitted
  }
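The comment above mentions remote debugging, which is how the hot spot was pinpointed. For reference, this only requires starting the HiveServer2 JVM with the standard JDWP agent and attaching an IDE debugger to the port; how the option is injected (for example via HADOOP_OPTS, or your distribution's service configuration) varies, so treat that part as an assumption:

-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000

With the debugger attached, a breakpoint in SemanticAnalyzer.genResolvedParseTree shows where the ~390 ms goes.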

Now look at genResolvedParseTree in SemanticAnalyzer.java; the LOG.info calls in this method produce the "Completed phase 1 of Semantic Analysis" and "Completed getting MetaData in Semantic Analysis" lines seen in the log above:

boolean genResolvedParseTree(ASTNode ast, PlannerContext plannerCtx) throws SemanticException {
    ASTNode child = ast;
    this.ast = ast;
    viewsExpanded = new ArrayList<String>();
    ctesExpanded = new ArrayList<String>();

    // 1. analyze and process the position alias
    // step processPositionAlias out of genResolvedParseTree

    // 2. analyze create table command
    if (ast.getToken().getType() == HiveParser.TOK_CREATETABLE) {
      // if it is not CTAS, we don't need to go further and just return
      if ((child = analyzeCreateTable(ast, qb, plannerCtx)) == null) {
        return false;
      }
    } else {
      queryState.setCommandType(HiveOperation.QUERY);
    }

    // 3. analyze create view command
    if (ast.getToken().getType() == HiveParser.TOK_CREATEVIEW ||
        ast.getToken().getType() == HiveParser.TOK_CREATE_MATERIALIZED_VIEW ||
        (ast.getToken().getType() == HiveParser.TOK_ALTERVIEW &&
            ast.getChild(1).getType() == HiveParser.TOK_QUERY)) {
      child = analyzeCreateView(ast, qb, plannerCtx);
      if (child == null) {
        return false;
      }
      viewSelect = child;
      // prevent view from referencing itself
      viewsExpanded.add(createVwDesc.getViewName());
    }

    switch(ast.getToken().getType()) {
    case HiveParser.TOK_SET_AUTOCOMMIT:
      assert ast.getChildCount() == 1;
      if(ast.getChild(0).getType() == HiveParser.TOK_TRUE) {
        setAutoCommitValue(true);
      }
      else if(ast.getChild(0).getType() == HiveParser.TOK_FALSE) {
        setAutoCommitValue(false);
      }
      else {
        assert false : "Unexpected child of TOK_SET_AUTOCOMMIT: " + ast.getChild(0).getType();
      }
      //fall through
    case HiveParser.TOK_START_TRANSACTION:
    case HiveParser.TOK_COMMIT:
    case HiveParser.TOK_ROLLBACK:
      if(!(conf.getBoolVar(ConfVars.HIVE_IN_TEST) || conf.getBoolVar(ConfVars.HIVE_IN_TEZ_TEST))) {
        throw new IllegalStateException(SemanticAnalyzerFactory.getOperation(ast.getToken().getType()) +
            " is not supported yet.");
      }
      queryState.setCommandType(SemanticAnalyzerFactory.getOperation(ast.getToken().getType()));
      return false;
    }

    // masking and filtering should be created here
    // the basic idea is similar to unparseTranslator.
    tableMask = new TableMask(this, conf, ctx.isSkipTableMasking());

    // 4. continue analyzing from the child ASTNode.
    Phase1Ctx ctx_1 = initPhase1Ctx();
    if (!doPhase1(child, qb, ctx_1, plannerCtx)) {
      // if phase1Result false return
      return false;
    }
    LOG.info("Completed phase 1 of Semantic Analysis");

    // 5. Resolve Parse Tree
    // Materialization is allowed if it is not a view definition
    getMetaData(qb, createVwDesc == null);
    LOG.info("Completed getting MetaData in Semantic Analysis");

    plannerCtx.setParseTreeAttr(child, ctx_1);

    return true;
  }

Inside getMetaData there are HDFS operations, mainly the creation of the scratch (staging) directory:

private void getMetaData(QB qb, ReadEntity parentInput) throws HiveException {
    LOG.info("Get metadata for source tables");

    // ... some code omitted

    for (String alias : tabAliases) {
      String tabName = qb.getTabNameForAlias(alias);
      String cteName = tabName.toLowerCase();

      // ... some code omitted
      // The logic here resolves table definitions, descriptions, partition info, etc.
    }

    // ... later in the same method, when the destination is resolved ...
    // (enclosing branch shown for context)
    if (qb.isCTAS() || qb.isMaterializedView()) {
      Path location;
      // If the CTAS query does specify a location, use the table location, else use the db location
      if (qb.getTableDesc() != null && qb.getTableDesc().getLocation() != null) {
        location = new Path(qb.getTableDesc().getLocation());
      } else {
        // No location on the table descriptor:
        // allocate a temporary output dir on the location of the table
        String tableName = getUnescapedName((ASTNode) ast.getChild(0));
        String[] names = Utilities.getDbTableName(tableName);
        try {
          Warehouse wh = new Warehouse(conf);
          // Use destination table's db location.
          String destTableDb = qb.getTableDesc() != null ? qb.getTableDesc().getDatabaseName() : null;
          if (destTableDb == null) {
            destTableDb = names[0];
          }
          location = wh.getDatabasePath(db.getDatabase(destTableDb));
        } catch (MetaException e) {
          throw new SemanticException(e);
        }
      }
      try {
        CreateTableDesc tblDesc = qb.getTableDesc();
        // This is where Hive talks to HDFS.
        if (tblDesc != null
            && tblDesc.isTemporary()
            && AcidUtils.isInsertOnlyTable(tblDesc.getTblProps(), true)) {
          fname = FileUtils.makeQualified(location, conf).toString();
        } else {
          fname = ctx.getExtTmpPathRelTo(
              FileUtils.makeQualified(location, conf)).toString();
        }
      } catch (Exception e) {
        throw new SemanticException(generateErrorMessage(ast,
            "Error creating temporary folder on: " + location.toString()), e);
      }
      if (HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVESTATSAUTOGATHER)) {
        TableSpec ts = new TableSpec(db, conf, this.ast);
        // Add the table spec for the destination table.
        qb.getParseInfo().addTableSpec(ts.tableName.toLowerCase(), ts);
      }
    } else {
      // This is the only place where isQuery is set to true; it defaults to false.
      // For a plain SELECT this branch runs and creates the .hive-staging directory
      // on HDFS (the FileUtils mkdir line in the log above).
      qb.setIsQuery(true);
      Path stagingPath = getStagingDirectoryPathname(qb);
      fname = stagingPath.toString();
      ctx.setResDir(stagingPath);
    }

    // ... some code omitted
}

Conclusion

From the log and the final result, a slow select 1 is dominated by table-metadata operations against the metastore and by HDFS round trips (scratch/staging directory creation), not by any actual computation.
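If you want to verify that the HDFS round trips, rather than anything inside Hive's compiler, dominate, you can time a scratch-dir-style mkdir directly against the same cluster with the standard Hadoop FileSystem API. A minimal sketch with a hypothetical probe path; run it with the cluster's core-site.xml/hdfs-site.xml on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ScratchDirProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();                    // picks up *-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    Path dir = new Path("/tmp/hive/probe-" + System.nanoTime()); // hypothetical test path
    long start = System.nanoTime();
    fs.mkdirs(dir);          // same kind of NameNode RPC the staging-dir creation issues
    fs.delete(dir, true);    // clean up
    System.out.println("mkdir+delete round trip: "
        + (System.nanoTime() - start) / 1_000_000 + " ms");
  }
}

If this probe is also slow, the bottleneck is on the HDFS/NameNode side (or the network in between) rather than in Hive itself.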

While checking system-level metrics during the investigation, we found a very large number of network connections stuck in CLOSE_WAIT:

[root@bigdev3 hive]# netstat -nat |awk '{print $6}'|sort|uniq -c
    4900 CLOSE_WAIT
      1 established)
    411 ESTABLISHED
      5 FIN_WAIT2
      1 Foreign
     82 LISTEN
    194 TIME_WAIT

With so many connections in CLOSE_WAIT, the likely causes are the following (a minimal reproduction of case 2 appears after this list):

  1. A very large number of file handles/sockets are opened and never properly closed.
  2. The local process fails to close its socket after the peer has closed: CLOSE_WAIT arises during the four-way connection teardown, when the remote side has sent its FIN but the local application has not yet called close().
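Case 2 is easy to reproduce. A minimal, self-contained sketch (port 9999 is arbitrary):

import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {
  public static void main(String[] args) throws Exception {
    ServerSocket server = new ServerSocket(9999);
    Socket client = new Socket("localhost", 9999);
    Socket accepted = server.accept();

    client.close();          // the peer closes: a FIN arrives at 'accepted'
    Thread.sleep(1_000);

    // 'accepted' is never closed, so its side of the connection now sits in
    // CLOSE_WAIT - observable with: netstat -nat | grep 9999
    Thread.sleep(60_000);    // keep the leak alive long enough to look
  }
}

At 4,900 such sockets, the practical next step is netstat -natp (or ss -tanp) to find which process owns them, then audit that process's close paths.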
