HQL中count(*),count(1),count(column)的区别

1、执行计划:

count(*)

explain select count(*) from dpods_SMS_Queue limit 100;
Explain
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME dpods_SMS_Queue))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTIONSTAR count))) (TOK_LIMIT 100)))
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        dpods_sms_queue 
          TableScan
            alias: dpods_sms_queue
            Select Operator
              Group By Operator
                aggregations:
                      expr: count()    // 此处不同
                bucketGroup: false
                mode: hash
                outputColumnNames: _col0
                Reduce Output Operator
                  sort order: 
                  tag: -1
                  value expressions:
                        expr: _col0
                        type: bigint
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: count(VALUE._col0)
          bucketGroup: false
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
                  expr: _col0
                  type: bigint
            outputColumnNames: _col0
            Limit
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: 100

count(1)

explain select count(1) from dpods_SMS_Queue limit 100;
Explain
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME dpods_SMS_Queue))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION count 1))) (TOK_LIMIT 100)))
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        dpods_sms_queue 
          TableScan
            alias: dpods_sms_queue
            Select Operator
              Group By Operator
                aggregations:
                      expr: count(1)    // 此处不同
                bucketGroup: false
                mode: hash
                outputColumnNames: _col0
                Reduce Output Operator
                  sort order: 
                  tag: -1
                  value expressions:
                        expr: _col0
                        type: bigint
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: count(VALUE._col0)
          bucketGroup: false
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
                  expr: _col0
                  type: bigint
            outputColumnNames: _col0
            Limit
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: 100

count(column)

explain select count(msgid) from dpods_SMS_Queue limit 100;
Explain
ABSTRACT SYNTAX TREE:
  (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME dpods_SMS_Queue))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_FUNCTION count (TOK_TABLE_OR_COL msgid)))) (TOK_LIMIT 100)))
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 is a root stage
STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Alias -> Map Operator Tree:
        dpods_sms_queue 
          TableScan
            alias: dpods_sms_queue
            Select Operator    // 此处多了一个Select Operator
              expressions:
                    expr: msgid
                    type: int
              outputColumnNames: msgid
              Group By Operator
                aggregations:
                      expr: count(msgid)    // 此处不同
                bucketGroup: false
                mode: hash
                outputColumnNames: _col0
                Reduce Output Operator
                  sort order: 
                  tag: -1
                  value expressions:
                        expr: _col0
                        type: bigint
      Reduce Operator Tree:
        Group By Operator
          aggregations:
                expr: count(VALUE._col0)
          bucketGroup: false
          mode: mergepartial
          outputColumnNames: _col0
          Select Operator
            expressions:
                  expr: _col0
                  type: bigint
            outputColumnNames: _col0
            Limit
              File Output Operator
                compressed: false
                GlobalTableId: 0
                table:
                    input format: org.apache.hadoop.mapred.TextInputFormat
                    output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0
    Fetch Operator
      limit: 100

从上面三个执行计划来看,只有两处不同:

  • count(column)的时候多了一个Select operator;

  • Group By Operator 的 aggregations过程略有不同;

2、结果:

count(*):所有行进行统计,包括NULL行

count(1):所有行进行统计,包括NULL行

count(column):对column中非Null进行统计

count(distinct column):对column中非Null进行去重统计

你可能感兴趣的:(HQL中count(*),count(1),count(column)的区别)