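HDFS: the Hadoop Distributed File System is the part of the Hadoop framework that stores the data sets to be processed. It provides a fault-tolerant file system that runs on commodity hardware.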
Online transaction processing is abbreviated OLTP (On-Line Transaction Processing); online analytical processing is abbreviated OLAP (On-Line Analytical Processing).
HiveQL: Data Definition
In Hive, a database is essentially a catalog or namespace of tables.
1. List the existing databases: show databases;
---------------------------------------------
    hive> show databases;
    OK
    default
    Time taken: 4.324 seconds, Fetched: 1 row(s)
---------------------------------------------
The lines "OK" and "Time taken: 4.324 seconds, Fetched: 1 row(s)" are execution information reported by Hive, not query results.
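2. Create a database named sample
    create database sample;
Note: if a database named sample already exists, reusing the name fails with an "already exists" error:
    create database sample;
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Database sample already exists
To create the database only when it is missing, add if not exists. The keyword is exists, with a trailing s; the misspelling exist is rejected by the parser:
    hive> create database if not exist sample;
    FAILED: ParseException line 1:23 missing KW_EXISTS at 'exist' near '<EOF>'
    line 1:29 extraneous input 'sample' expecting EOF near '<EOF>'
The correct statement is:
    create database if not exists sample;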
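3. List only the databases whose names match a pattern
    hive> show databases like 's.*' ;
    OK
    sample
    Time taken: 0.101 seconds, Fetched: 1 row(s)
Hive creates a directory for each database, and every table in the database is stored as a subdirectory of that directory. The default database is the exception: it has no directory of its own, so its tables sit directly under the warehouse directory set by hive.metastore.warehouse.dir.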
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/user/hive/warehouse</value>
      <description>location of default database for the warehouse</description>
    </property>
In this setup, /user/hive/warehouse corresponds to hdfs://Master:9000/user/hive/warehouse/ on HDFS.
4. See where a database is stored: describe database sample
    hive> describe database sample;
    OK
    sample          hdfs://Master:9000/user/hive/warehouse/sample.db        root    USER
    Time taken: 0.696 seconds, Fetched: 1 row(s)
The URI scheme here is hdfs; on a MapR installation it would be maprfs.
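The default location can also be overridden per database. The following is a minimal sketch, assuming a hypothetical database name sample_custom and a hypothetical HDFS path; neither appears in the notes above.

```sql
-- Hypothetical name and path, chosen only for illustration.
create database if not exists sample_custom
comment 'database stored outside the default warehouse directory'
location '/user/hive/custom/sample_custom.db';

-- The location column of the output should now show the path given above
-- instead of a directory under hive.metastore.warehouse.dir.
describe database sample_custom;
```

Tables created in such a database are then placed under the chosen directory rather than under /user/hive/warehouse.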
5. Switch to a database: use sample
    hive> use sample;
    OK
    Time taken: 0.15 seconds
From now on, statements operate on the sample database.
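6. Drop a database
    drop database sample;
As with create, you can add if exists to avoid an error when the database is missing. Dropping a database that still contains tables fails:
    hive> drop database sample;
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. InvalidOperationException(message:Database sample is not empty. One or more tables exist.)
Fix: either drop all the tables first and then drop the database, or add cascade so that Hive drops the tables together with the database:
    hive> drop database if exists sample cascade;
    OK
    Time taken: 4.089 seconds
There is also a restrict keyword. It behaves the same as the default: the tables must be dropped before the database can be dropped.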
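The two ways of removing a non-empty database can be sketched side by side. This is an illustration rather than a transcript; the table name employee is taken from later in these notes, and switching to default before the drop is a precaution I am assuming, not something the notes require.

```sql
-- Option 1: remove the tables yourself, then drop the empty database.
use sample;
show tables;                      -- check which tables still exist
drop table if exists employee;    -- repeat for every table listed
use default;                      -- assumed precaution: leave the database first
drop database if exists sample restrict;

-- Option 2: let Hive drop the tables and the database in one statement.
drop database if exists sample cascade;
```

Keep in mind that dropping a managed table also deletes its data files, so cascade removes the data of every managed table in the database.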
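7. Create a table
Creating a table in Hive is largely the same as in SQL, but Hive adds some extensions, such as where the data files are stored and which format they are stored in.
For example, create an employee table: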
    create table employee(
      name         STRING comment 'employee name',
      salary       FLOAT comment 'employee salary',
      subordinates ARRAY<STRING> comment 'name of subordinates',
      deductions   MAP<STRING,FLOAT> comment 'key are deduction names,value are percentages',
      address      STRUCT<Street:STRING,City:STRING,State:STRING,Zip:INT> comment 'home address')
    comment 'description of the table'
    location '/user/hive/warehouse/sample.db/employee'
    tblproperties('creator'='me','created_at'='2016-01-26 10:00:00',...);
Analysis:
1. The location clause specifies where the table is stored, here location '/user/hive/warehouse/sample.db/employee'. In Hive's grammar the location clause comes before tblproperties.
2. Besides the entries given in tblproperties, Hive automatically adds two properties to the table:
   1: last_modified_by stores the user who last modified the table
   2: last_modified_time stores the time of the last modification
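The collection columns declared above (ARRAY, MAP, STRUCT) are read with index, key, and dot notation respectively. A small sketch, assuming the employee table already holds data; the map key 'Federal Taxes' is a made-up example value, not something defined in these notes.

```sql
select name,
       subordinates[0]             as first_subordinate,  -- ARRAY: zero-based index
       deductions['Federal Taxes'] as federal_tax_pct,    -- MAP: lookup by key (hypothetical key)
       address.City                as city                -- STRUCT: field access with a dot
from employee;
```

An index or key that does not exist yields NULL rather than an error.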
8. View a table's properties
    hive> show tblproperties employee;
    OK
    comment                 description of table
    transient_lastDdlTime   1461649362
    Time taken: 0.278 seconds, Fetched: 2 row(s)
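9. Create a table with the same schema as an existing table
The like clause copies the table definition but not the data:
    hive> create table sample.employee11
        > like sample.employees;
    OK
    Time taken: 0.486 seconds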
10. List the tables whose names match a pattern
    use sample;
    show tables 'empl.*';
Note: placing an in database_name clause after the pattern does not work:
    hive> show tables 'empl.*' in sample;
    FAILED: ParseException line 1:21 missing EOF at 'in' near ''empl.*''
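The variants of show tables can be put next to each other. The last statement is an assumption on my part: the Hive language manual gives the grammar as SHOW TABLES [IN database_name] ['pattern'], with the database before the pattern, but I have not run it on the Hive version used in these notes.

```sql
-- Select the database first, then filter by pattern.
use sample;
show tables 'empl.*';

-- List every table of a database without switching to it.
show tables in sample;

-- Assumption: database first, pattern second, as in the documented grammar.
show tables in sample 'empl.*';
```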
11. View a table's details
    hive> describe sample.employees;
    OK
    name            string                  employee name
    salary          float                   employee salary
    subordinates    array<string>           name of subordinates
    deductions      map<string,float>       key are deduction names,value are percentages
    address         struct<Street:string,City:string,State:string,Zip:int>
    Time taken: 0.279 seconds, Fetched: 5 row(s)

    hive> describe extended sample.employees;
    OK
    name            string                  employee name
    salary          float                   employee salary
    subordinates    array<string>           name of subordinates
    deductions      map<string,float>       key are deduction names,value are percentages
    address         struct<Street:string,City:string,State:string,Zip:int>

    Detailed Table Information      Table(tableName:employees, dbName:sample, owner:root, createTime:1461649004, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:name, type:string, comment:employee name), FieldSchema(name:salary, type:float, comment:employee salary), FieldSchema(name:subordinates, type:array<string>, comment:name of subordinates), FieldSchema(name:deductions, type:map<string,float>, comment:key are deduction names,value are percentages), FieldSchema(name:address, type:struct<Street:string,City:string,State:string,Zip:int>, comment:null)], location:hdfs://Master:9000/user/local/hive/warehouse/sample.db/employees, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1461649004}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
    Time taken: 0.306 seconds, Fetched: 7 row(s)

    hive> describe formatted sample.employees;
    OK
    # col_name      data_type               comment
    name            string                  employee name
    salary          float                   employee salary
    subordinates    array<string>           name of subordinates
    deductions      map<string,float>       key are deduction names,value are percentages
    address         struct<Street:string,City:string,State:string,Zip:int>

    # Detailed Table Information
    Database:               sample
    Owner:                  root
    CreateTime:             Mon Apr 25 22:36:44 PDT 2016
    LastAccessTime:         UNKNOWN
    Protect Mode:           None
    Retention:              0
    Location:               hdfs://Master:9000/user/local/hive/warehouse/sample.db/employees
    Table Type:             MANAGED_TABLE
    Table Parameters:
            transient_lastDdlTime   1461649004

    # Storage Information
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
    Compressed:             No
    Num Buckets:            -1
    Bucket Columns:         []
    Sort Columns:           []
    Storage Desc Params:
            serialization.format    1
    Time taken: 0.438 seconds, Fetched: 30 row(s)
Adding extended or formatted after describe lists more detail than plain describe. Of the three, formatted gives the most detailed output and is the easiest to read.
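12. Managed and external tables
The tables created so far are so-called managed tables, sometimes also called internal tables. Sometimes we want Hive to work with data that is managed elsewhere (for example, data produced by Pig) without giving Hive ownership of that data. In that case we can create an external table that points to the data, and Hive can query it without owning it.
Create an external table: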
    hive> create external table stocks(
        > symbol string,
        > ymd string,
        > price_open float,
        > price_high float,
        > price_low float,
        > price_close float,
        > volume int,
        > price_adj_close float)
        > row format delimited fields terminated by ','
        > location '/data/stock';
    OK
    Time taken: 0.734 seconds
Analysis: an external table is declared with the external keyword before table, and the location clause tells Hive where the data lives.
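The practical difference from a managed table shows up when the table is dropped: for an external table Hive removes only the metadata and leaves the files in place. A sketch of that round trip, assuming comma-separated files already exist under /data/stock:

```sql
-- Table Type in the output should read EXTERNAL_TABLE rather than MANAGED_TABLE.
describe formatted stocks;

-- Removes only the table definition; the files under /data/stock are untouched.
drop table if exists stocks;

-- Declaring the table again makes the same files queryable once more.
create external table if not exists stocks (
  symbol string, ymd string,
  price_open float, price_high float, price_low float, price_close float,
  volume int, price_adj_close float)
row format delimited fields terminated by ','
location '/data/stock';

select symbol, ymd, price_close from stocks limit 5;
```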
13. Rename a table
Rename employee to empl:
    hive> alter table employee rename to empl;
    OK
    Time taken: 5.14 seconds
14. Add a partition to a table
    hive> alter table employees add if not exists
        > partition(year = 2011,month = 1,day = 1) location '/logs/2011/01/01';
15. Drop a partition from a table
    alter table employees drop if exists partition(year=2011,month=12,day=2);
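Both statements only work on a table that was created with a partitioned by clause. The sketch below shows the full cycle on a hypothetical table log_messages; the table name and its columns are invented for illustration, while the partition keys and the path mirror the statement above.

```sql
-- Hypothetical partitioned table; name and columns are illustrative only.
create table if not exists log_messages (
  severity string,
  message  string)
partitioned by (year int, month int, day int);

-- Register a partition whose data lives at an explicit location.
alter table log_messages add if not exists
  partition (year = 2011, month = 1, day = 1) location '/logs/2011/01/01';

-- List the partitions Hive knows about.
show partitions log_messages;

-- Remove the partition again.
alter table log_messages drop if exists
  partition (year = 2011, month = 1, day = 1);
```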
16. Add columns
Before the change:
---------------------------------------------
    hive> describe formatted sample.employees;
    OK
    # col_name      data_type               comment
    name            string                  employee name
    salary          float                   employee salary
    subordinates    array<string>           name of subordinates
    deductions      map<string,float>       key are deduction names,value are percentages
    address         struct<Street:string,City:string,State:string,Zip:int>
---------------------------------------------
Run:
    alter table employees add columns (
      school string comment 'school infomation'
    );
The new column is appended after the existing columns:
---------------------------------------------
    hive> describe formatted sample.employees;
    OK
    # col_name      data_type               comment
    name            string                  employee name
    salary          float                   employee salary
    subordinates    array<string>           name of subordinates
    deductions      map<string,float>       key are deduction names,value are percentages
    address         struct<Street:string,City:string,State:string,Zip:int>
    school          string                  school infomation
---------------------------------------------
17. Drop or replace columns
replace columns discards the table's existing column list and defines the listed columns instead:
    alter table employee1 replace columns(
      school string comment 'school information',
      name string comment '-------------',
      age int
    );
---------------------------------------------
    hive> describe formatted employee1;
    OK
    # col_name      data_type               comment
    school          string                  school information
    name            string                  -------------
    age             int
---------------------------------------------
18. Modify table properties
    alter table employee1 set tblproperties(
      'note'='the process is add property!!!'
    );
Note: this statement can add new table properties and change existing ones, but it cannot remove a property.
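To check the effect, the property can be read back with show tblproperties. A short sketch reusing the employee1 table and the 'note' key from above; the second value is made up to show that setting an existing key overwrites it.

```sql
-- Add the property, then list all properties of the table.
alter table employee1 set tblproperties ('note' = 'the process is add property!!!');
show tblproperties employee1;

-- Setting the same key again replaces its value (hypothetical new value).
alter table employee1 set tblproperties ('note' = 'updated note');

-- Show a single property by name.
show tblproperties employee1('note');
```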