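Cause: my table is stored in LZO format, but I did not specify the file format when writing to it. As a result, select * from the table returned no data, while select on a column did return data. Setting the output compression codec before the insert fixes this: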
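-- Compress the output of the MapReduce job and of the Hive query.
set mapred.output.compress=true;
set hive.exec.compress.output=true;
-- Write the output with the LZO (lzop) codec so the files match the table format.
set mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
-- Request 137 reduce tasks for the job.
set mapred.reduce.tasks=137;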
insert overwrite table e_ods.ods_dhprds_priceinfo
select priceid,
hotelid,
roomtypeid,
rateplanid,
begindate,
enddate,
rateplancode,
gensalecost,
gensaleprice,
weekendsalecost,
weekendsaleprice,
allowaddbed,
addbedprice,
currencycode,
ispriceset,
iseffective,
auditstatus,
operatetime,
operator,
operateip,
ratecalculationmodeltype,
commissioncalculationtype,
weekdaycommissioncalculationvalue,
weekendcommissioncalculationvalue,
weekdaynetrate,
weekendnetrate,
minprofit,
maxprofit
from e_ods.ods_dhprds_priceinfo_fl;
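To check whether the written files match what the table expects, one option is to inspect the table's declared storage format and then compare the two query forms. This is only a sketch: it uses the table name from the script above, and the LIMIT value is an arbitrary choice for illustration.

```sql
-- Show storage details for the table, including its InputFormat and OutputFormat.
DESCRIBE FORMATTED e_ods.ods_dhprds_priceinfo;

-- If the files on HDFS match the table format, both queries should return rows.
SELECT * FROM e_ods.ods_dhprds_priceinfo LIMIT 10;
SELECT priceid FROM e_ods.ods_dhprds_priceinfo LIMIT 10;
```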
What is the difference between 'select * from table' and 'select column from table' in Hive?
A Hive table is stored as a directory in HDFS. When you run 'select * from table', the Hive query processor simply goes to that directory, which holds one or more files adhering to the table schema, and dumps the data to the display as is. This is reasonable for very small tables, for example less than a gigabyte. On a real cluster the table may hold terabytes of data, and displaying all of it will run for a very long time.
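One way to observe this behavior is to ask Hive for the query plan. The following is a minimal sketch, assuming a hypothetical table named demo_table. On many setups the plan for a bare select * shows only a fetch stage and no MapReduce stage, but the exact output depends on the Hive version and configuration.

```sql
-- demo_table is a hypothetical table name used only for illustration.
EXPLAIN
SELECT * FROM demo_table;
```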
'select column from table' is a projection query: the Hive query processor has to read every row in the table, extract the column value from each row, and display it. To do this, Hive compiles the SQL query into a sequence of MapReduce programs that read the table data stored on HDFS. Any data processing you do in Hive is carried out this way. Hive is a MapReduce-based, batch-oriented query processing engine.
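The plan of a projection query can be inspected with EXPLAIN. This is a sketch under stated assumptions: demo_table and col_a are hypothetical names. The setting hive.fetch.task.conversion is a real Hive property; it is set to none here on the assumption that newer Hive versions might otherwise serve a simple projection through a fetch task instead of a MapReduce job.

```sql
-- Assumption: disable fetch-task conversion so the projection is planned as a job.
set hive.fetch.task.conversion=none;

-- demo_table and col_a are hypothetical names used only for illustration.
EXPLAIN
SELECT col_a FROM demo_table;
```

With the property left at its default, the plan may differ from what is described here, so the output is worth checking on your own cluster.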
Similarly, adding where conditions to the SQL query triggers MapReduce-based processing, except when the table is partitioned and the where clause is on the partition value. For example, if a table is partitioned by day, the data added for each new day goes into a new subfolder under the table's folder. A query like select * from table where day='400' then simply dumps the contents of all files under that partition's subdirectory (by default named day=400) in the main table directory.
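The partition layout described here can be sketched as follows. All table and column names (demo_events, demo_events_staging, id, payload) are hypothetical and chosen only for illustration. The staging table is assumed to exist already.

```sql
-- Hypothetical table partitioned by day.
CREATE TABLE demo_events (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (day STRING);

-- Loading a day writes files into that partition's own subdirectory,
-- by default named day=400 under the table directory.
INSERT OVERWRITE TABLE demo_events PARTITION (day='400')
SELECT id, payload FROM demo_events_staging;

-- The filter is on the partition column, so only that subdirectory is read.
SELECT * FROM demo_events WHERE day='400';
```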
Furthermore, Hive tables may be wide, for example 50 columns representing different values. For 'select column from table', the MapReduce program scans every row and extracts one column out of the 50 values in each row. If you frequently extract one or a few columns for processing, a better approach is to use a columnar storage format for the table files, such as Parquet or RCFile.
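A columnar copy of a wide table might be created as below. This is a sketch, assuming a hypothetical source table demo_wide with a column col_a, and a Hive version that supports STORED AS PARQUET (0.13 or later). Replacing PARQUET with RCFILE would give the RCFile variant.

```sql
-- demo_wide and demo_wide_parquet are hypothetical table names.
-- Create a Parquet-backed copy of the wide table.
CREATE TABLE demo_wide_parquet
STORED AS PARQUET
AS
SELECT * FROM demo_wide;

-- A projection on the columnar table only needs to read the requested column's data.
SELECT col_a FROM demo_wide_parquet;
```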