Hive study notes

===== 20131219 hive.limit.optimize.enable optimization

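Even with limit 100 and no complex conditions in the query, hive still scans all the data, which is strange and wasteful. It turns out that by default hive executes a limit by running over all the data first.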

>>> test1

select * from s_test;

No difference between the two cases (setting on or off).

>>> test2

select dp_id from s_order_hbase limit 100;

number of mappers: 486; number of reducers: 0

---- after set hive.limit.optimize.enable=true;

number of mappers: 1; number of reducers: 0
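
To make the on/off comparison above easy to repeat, the setting and the query can be put into one script string. This is only a minimal sketch: the helper name with_settings is made up for illustration, and it only builds the text, it does not run hive.

```python
def with_settings(query, settings):
    """Prefix a HiveQL query with `set key=value;` lines (hypothetical helper)."""
    lines = ["set {}={};".format(key, value) for key, value in settings.items()]
    # Make sure the query ends with exactly one semicolon.
    lines.append(query.rstrip().rstrip(";") + ";")
    return "\n".join(lines)


# Query and setting are the ones from test2 above.
script = with_settings(
    "select dp_id from s_order_hbase limit 100",
    {"hive.limit.optimize.enable": "true"},
)
print(script)
```

The resulting text can be pasted into the hive shell or passed to hive -e; passing an empty dict gives the baseline run without the setting.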


2012.08.07

  1. Running hive from python
  • >>> command = "hive -e " + "\"" + load data inpath '/fenxi_system/cs/20120612/sms_20120612' overwrite into table s_sms partition(stat_time='20120612') + "\""
 File "<stdin>", line 1
 command = "hive -e " + "\"" + load data inpath '/fenxi_system/cs/20120612/sms_20120612' overwrite into table s_sms  partition(stat_time='20120612') + "\""
                                          ^
SyntaxError: invalid syntax
Error in sys.excepthook:
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/apport_python_hook.py", line 68, in apport_excepthook
    binary = os.path.realpath(os.path.join(os.getcwdu(), sys.argv[0]))
OSError: [Errno 2] No such file or directory

  •   >>> command = "hive -e " +  "'load data inpath '/fenxi_system/cs/20120612/sms_20120612' overwrite into table s_sms partition(stat_time='20120612')'"
ok!
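
The first attempt fails because the load data statement sits outside any Python string. The second form parses, but if that string is later handed to a shell, the inner single quotes end and restart the outer quoting, so the quotes around the path and the partition value may never reach hive. A possible way around this, sketched here in Python 3 (the notes above use Python 2.7), is to pass the statement as one list element so no shell quoting is involved. The helper names are made up for illustration.

```python
import shlex


def build_hive_command(hql):
    """Return the argv list for `hive -e <hql>`; the statement stays one argument."""
    return ["hive", "-e", hql]


def as_shell_string(argv):
    """Render argv as a single shell-safe string, e.g. for logging or copy/paste."""
    return " ".join(shlex.quote(arg) for arg in argv)


hql = (
    "load data inpath '/fenxi_system/cs/20120612/sms_20120612' "
    "overwrite into table s_sms partition(stat_time='20120612')"
)
argv = build_hive_command(hql)
print(as_shell_string(argv))

# To actually run it (needs hive on PATH), something like:
# import subprocess
# subprocess.run(argv, check=True)
```

With a list there is nothing to escape by hand, and shlex.quote is only needed when the command has to be shown or run as one shell string.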


2012.08.08

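  1. After creating the table, I wanted to copy in the contents of a table with the same structure from another hive user
  • Directly with hadoop fs -cp
hadoop fs -ls shows the files, but select * from s_sms returns nothing, so evidently the metadata was not updated.
  • load data inpath '/user/hive/warehouse/s_edm/stat_time=20120612/edm_20120612' overwrite into table s_sms partition(stat_time='20120808');
select * from s_sms now works. (Note that load data inpath without local moves the source file in HDFS rather than copying it.)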

  2. About row format delimited fields
hive>  create table test2(uid string,name string)row format delimited fields terminated by 'aaa';
hive> load data local inpath '/home/mjiang/tes' overwrite into table test2;                      
hive> select * from test2;    
123

// contents of tes: 123aaa456


hive>  create table test3(uid string,name string)row format delimited fields terminated by ',';  
hive> load data local inpath '/home/mjiang/tes' overwrite into table test3;                    
hive> select * from test3;                            
123 456

// contents of tes: 123,456


hive>  create table test1(uid string,name string)row format delimited fields terminated by '/t';

hive> select * from test1;                                                 
123 t 456

// contents of tes: 123/t456


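The delimiter can apparently only be a single character. Using ',' definitely works, though. (Note that '/t' in the last test is not a tab; the tab escape is written '\t'.)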
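
The test2 result fits the guess that only the first character of the delimiter string is used. A small Python simulation of that guess follows. It is an assumption about the default behaviour, not something confirmed in these notes, and the function name is made up for illustration.

```python
def split_like_single_char_delimiter(line, delimiter, num_columns):
    """Split a line the way the guess predicts: only delimiter[0] counts."""
    separator = delimiter[0]  # assumption: the rest of the string is ignored
    fields = line.split(separator)
    # Extra fields are dropped; missing ones are filled with None (shown as NULL).
    fields = fields[:num_columns]
    fields += [None] * (num_columns - len(fields))
    return fields


# test2: 'aaa' acts like 'a', so the second column is the empty string
print(split_like_single_char_delimiter("123aaa456", "aaa", 2))
# test3: ',' splits cleanly into both columns
print(split_like_single_char_delimiter("123,456", ",", 2))
```

Under this guess test2 yields 123 followed by an empty name column, which matches the single 123 shown by select * from test2.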





