Changing the field delimiter of a Hive table:
- alter table tableName set SERDEPROPERTIES('field.delim'='\t');
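For a text-format table, a row is essentially one line of the data file split on `field.delim`, and `ALTER TABLE ... SET SERDEPROPERTIES` only changes that table metadata; the files themselves are not rewritten, so existing rows are simply re-read with the new delimiter. The Python sketch below illustrates this under that simplified model (real SerDes also handle escaping, collection delimiters, etc.); `parse_row` and the sample line are hypothetical, and `\x01` (Ctrl-A) is Hive's default field delimiter for text tables.

```python
def parse_row(line: str, field_delim: str = "\x01") -> list[str]:
    """Split one line of a text-format data file into columns (simplified model)."""
    return line.rstrip("\n").split(field_delim)


# Hypothetical tab-separated line as it might sit in the table's data file.
raw = "vin001\tobd01\t2016-10-10 00:00:00\n"

# With the default Ctrl-A delimiter the whole line lands in the first column...
print(parse_row(raw))
# ...after setting field.delim to '\t', the same bytes parse into three columns.
print(parse_row(raw, "\t"))
```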
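Creating Hive partitions from the data and loading rows into them dynamically (the partition value is taken from the last column of the select; `date` is backquoted because it is a reserved word in newer Hive versions):
- insert into table device_status_log partition (`date`)
  select `vin`, `obd_id`, `function_id`, `message_id`, `message_content`,
         `longitude`, `latitude`, `speed`, `engine_speed`, `gps_stat`, `client_time`,
         `create_time`, `analytical_result`,
         regexp_replace(to_date(create_time), '-', '') as `date`
  from pre_device_status_log;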
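Conceptually, a fully dynamic partition insert evaluates the trailing partition expression for every row and writes the row into the partition named by that value. The Python sketch below models only that grouping step (not Hive's actual file writing); `partition_key`, `route_rows`, and the sample rows are hypothetical.

```python
from collections import defaultdict


def partition_key(create_time: str) -> str:
    # Same idea as regexp_replace(to_date(create_time), '-', ''):
    # keep the 'yyyy-MM-dd' part and drop the hyphens.
    return create_time[:10].replace("-", "")


def route_rows(rows: list[dict]) -> dict[str, list[dict]]:
    """Group rows by the value of the trailing partition expression."""
    partitions: dict[str, list[dict]] = defaultdict(list)
    for row in rows:
        partitions[partition_key(row["create_time"])].append(row)
    return dict(partitions)


rows = [
    {"vin": "v1", "create_time": "2016-10-15 08:00:00"},
    {"vin": "v2", "create_time": "2016-10-16 09:30:00"},
    {"vin": "v3", "create_time": "2016-10-15 23:59:59"},
]
for key, part_rows in sorted(route_rows(rows).items()):
    print(f"date={key}: {len(part_rows)} row(s)")
```

Each distinct key corresponds to one `Loading partition {date=...}` line in the load output.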
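If it fails with the following error:
Dynamic partition strict mode requires at least one static partition column. To turn this off set hive.exec.dynamic.partition.mode=nonstrict
then, as the message suggests, set this in the Hive CLI and rerun the insert:
- set hive.exec.dynamic.partition.mode=nonstrict;
After that the insert runs and prints output like: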
Loading data to table obd_message.device_status_log partition (date=null)
Time taken for load dynamic partitions : 4073
Loading partition {date=20161020}
Loading partition {date=20161017}
Loading partition {date=20161024}
Loading partition {date=20161021}
Loading partition {date=20161023}
Loading partition {date=20161026}
Loading partition {date=20161015}
Loading partition {date=20161018}
Loading partition {date=20161016}
Loading partition {date=20161019}
Loading partition {date=20161025}
Loading partition {date=20161022}
Time taken for adding to write entity : 6
Partition obd_message.device_status_log{date=20161015} stats: [numFiles=1, numRows=188, totalSize=79565, rawDataSize=79377]
Partition obd_message.device_status_log{date=20161016} stats: [numFiles=1, numRows=648, totalSize=299298, rawDataSize=298650]
Partition obd_message.device_status_log{date=20161017} stats: [numFiles=1, numRows=912, totalSize=414597, rawDataSize=413685]
Partition obd_message.device_status_log{date=20161018} stats: [numFiles=1, numRows=895, totalSize=410935, rawDataSize=410040]
Partition obd_message.device_status_log{date=20161019} stats: [numFiles=1, numRows=1412, totalSize=613903, rawDataSize=612491]
Partition obd_message.device_status_log{date=20161020} stats: [numFiles=1, numRows=475, totalSize=204375, rawDataSize=203900]
Partition obd_message.device_status_log{date=20161021} stats: [numFiles=1, numRows=346, totalSize=142079, rawDataSize=141733]
Partition obd_message.device_status_log{date=20161022} stats: [numFiles=1, numRows=561, totalSize=220711, rawDataSize=220150]
Partition obd_message.device_status_log{date=20161023} stats: [numFiles=1, numRows=856, totalSize=352452, rawDataSize=351596]
Partition obd_message.device_status_log{date=20161024} stats: [numFiles=1, numRows=1997, totalSize=783701, rawDataSize=781704]
Partition obd_message.device_status_log{date=20161025} stats: [numFiles=1, numRows=1384, totalSize=556970, rawDataSize=555586]
Partition obd_message.device_status_log{date=20161026} stats: [numFiles=1, numRows=326, totalSize=133275, rawDataSize=132949]
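The partition stats above can be sanity-checked with a little arithmetic. In this particular log, `totalSize - rawDataSize` equals `numRows` for every partition, which is consistent with `rawDataSize` excluding one line terminator per row (that explanation is an inference, not something the log states). The sketch below copies the numbers from the log and checks both that relationship and the total row count.

```python
# (partition, numRows, totalSize, rawDataSize), copied from the load log above.
STATS = [
    ("20161015", 188, 79565, 79377),
    ("20161016", 648, 299298, 298650),
    ("20161017", 912, 414597, 413685),
    ("20161018", 895, 410935, 410040),
    ("20161019", 1412, 613903, 612491),
    ("20161020", 475, 204375, 203900),
    ("20161021", 346, 142079, 141733),
    ("20161022", 561, 220711, 220150),
    ("20161023", 856, 352452, 351596),
    ("20161024", 1997, 783701, 781704),
    ("20161025", 1384, 556970, 555586),
    ("20161026", 326, 133275, 132949),
]


def mismatches(stats):
    """Partitions where totalSize - rawDataSize is not exactly numRows."""
    return [p for p, rows, total, raw in stats if total - raw != rows]


def total_rows(stats):
    return sum(rows for _, rows, _, _ in stats)


print(mismatches(STATS))   # → []
print(total_rows(STATS))   # → 10000
```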
Listing a table's partitions in Hive:
- show partitions device_status_log;
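Stripping a given separator with a Hive regex replace:
create_time values look like 2016-10-10 00:00:00; to_date keeps the date part (2016-10-10), and regexp_replace then removes the hyphens, giving 20161010.
- regexp_replace(to_date(create_time), '-', '') as `date`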
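Because `regexp_replace` treats its pattern as a regular expression (Java regex in Hive), a hyphen works as-is, but a separator that is a regex metacharacter, such as `.`, would match every character. The Python sketch below mirrors the expression with `re`; `to_date` and `strip_separator` are hypothetical helpers named after the Hive functions, and in HiveQL a character class like `'[.]'` is one way to match a literal dot.

```python
import re


def to_date(ts: str) -> str:
    # Like Hive's to_date on a 'yyyy-MM-dd HH:mm:ss' string: keep the date part.
    return ts.split(" ")[0]


def strip_separator(s: str, sep: str) -> str:
    # Escape the separator so it is matched literally, not as a regex.
    return re.sub(re.escape(sep), "", s)


print(strip_separator(to_date("2016-10-10 00:00:00"), "-"))  # → 20161010

# Pitfall: an unescaped '.' matches any character, so everything is removed.
print(repr(re.sub(".", "", "2016.10.10")))                   # → ''
print(strip_separator("2016.10.10", "."))                    # → 20161010
```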
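Adding minutes or seconds with Hive time functions: convert the value to a Unix timestamp (in seconds), add the offset, and convert it back. The example below adds 8 hours:
- from_unixtime(unix_timestamp(client_time) + 8*3600) as client_time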
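The same "to seconds, add, format back" idea in Python, as a sketch: `add_seconds` is a hypothetical helper, and it uses naive `datetime` arithmetic, which matches `from_unixtime(unix_timestamp(ts) + n)` as long as the session time zone has no DST change in the interval. The `+ 8*3600` in the note above presumably shifts UTC to UTC+8; that intent is an assumption.

```python
from datetime import datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"  # Hive's default 'yyyy-MM-dd HH:mm:ss' string format


def add_seconds(ts: str, seconds: int) -> str:
    """Shift a 'yyyy-MM-dd HH:mm:ss' string by a number of seconds."""
    return (datetime.strptime(ts, FMT) + timedelta(seconds=seconds)).strftime(FMT)


print(add_seconds("2016-10-10 20:30:00", 8 * 3600))  # +8 hours, crosses midnight
print(add_seconds("2016-10-10 00:00:00", 90))        # +90 seconds
print(add_seconds("2016-10-10 00:05:00", -10 * 60))  # -10 minutes
```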
Hive's built-in date_add() can only add or subtract whole days, not minutes or seconds. Related names:
- date, date(), date_add(), date_sub(), datediff(), datetime
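To make the day-only behaviour concrete, here is a Python sketch of what `date_add(start_date, num_days)`, `date_sub(start_date, num_days)`, and `datediff(enddate, startdate)` compute; the helpers are hypothetical stand-ins, and they return `'yyyy-MM-dd'` strings (older Hive versions return strings, newer ones may return a DATE). Any time-of-day part of the input is ignored, just as these functions work on whole days.

```python
from datetime import date, timedelta


def _day(s: str) -> date:
    # Keep only the 'yyyy-MM-dd' part; the time of day is dropped.
    return date.fromisoformat(s[:10])


def date_add(start: str, days: int) -> str:
    return (_day(start) + timedelta(days=days)).isoformat()


def date_sub(start: str, days: int) -> str:
    return date_add(start, -days)


def datediff(end: str, start: str) -> int:
    return (_day(end) - _day(start)).days


print(date_add("2016-10-10", 1))               # → 2016-10-11
print(date_sub("2016-10-01 12:00:00", 1))      # → 2016-09-30
print(datediff("2016-10-26", "2016-10-15"))    # → 11
```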
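Some tips
Create hiveInit.sh with the following content (the goal is to have jobs run locally whenever possible, which shortens the wait and makes debugging easier):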
set mapred.job.tracker=local;
set mapred.reduce.tasks=1;
set hive.exec.mode.local.auto.input.files.max=1000;
set hive.exec.mode.local.auto.inputbytes.max=50000000;
set hive.exec.mode.local.auto.tasks.max=10;
set hive.exec.mode.local.auto=true;
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;
show databases;
use obd_message;
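The `hive.exec.mode.local.auto*` settings tell Hive when it may run a job locally on its own: roughly, when the job's input size and number of input files stay within the limits and it needs at most one reducer. The Python sketch below approximates that check with the values from hiveInit.sh; it is a simplified model based on the conditions described in the Hive documentation, the exact rules differ between versions, `hive.exec.mode.local.auto.tasks.max` is not modeled, and with `mapred.job.tracker=local` set, MapReduce jobs run locally anyway. `runs_locally` is hypothetical.

```python
# Thresholds copied from hiveInit.sh.
INPUT_BYTES_MAX = 50_000_000  # hive.exec.mode.local.auto.inputbytes.max
INPUT_FILES_MAX = 1000        # hive.exec.mode.local.auto.input.files.max


def runs_locally(total_input_bytes: int, num_input_files: int,
                 num_reducers: int, auto_enabled: bool = True) -> bool:
    """Rough approximation of Hive's automatic local-mode decision."""
    if not auto_enabled:                      # hive.exec.mode.local.auto=false
        return False
    if total_input_bytes > INPUT_BYTES_MAX:   # input too large
        return False
    if num_input_files > INPUT_FILES_MAX:     # too many input files
        return False
    return num_reducers <= 1                  # local mode allows 0 or 1 reducer


# A job reading the 12 partition files from the load log (about 4.2 MB in total):
print(runs_locally(4_211_861, 12, 1))
```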
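Then create hiveStart.sh:
#!/bin/bash
hive -i hiveInit.sh
Make it executable (chmod +x hiveStart.sh); running ./hiveStart.sh from that directory then starts the Hive CLI with the settings from hiveInit.sh applied.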