I. Basic usage

1. Entering hive

Log in remotely with Xshell to reach the Linux system, then type hive from any directory.

2. Querying inside the hive shell

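List the databases:

show databases;

(Ctrl+L clears the screen.)

use wt;
show tables;

select * from a;

set hive.cli.print.current.db=true; -- show the current database in the prompt

The current database is shown in parentheses in the prompt.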

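3. Running queries from a shell script

Method 1: hive -e runs a SQL statement; hive -v -e also echoes the executed SQL to the console

Use this when a SQL statement should run automatically: wrap it in a shell script and let a scheduling platform invoke that script. This is the most common approach, since the script can branch on conditions and use variables so that different inputs produce different outputs.

Create a .sh file that holds the shell script for querying the hive database.

Unlike working inside the hive shell, the command returns to the Linux prompt once the query finishes.

If the .sh file uses hive -v -e instead, the executed SQL statements are printed as well.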
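
The script itself is not reproduced in the text, so here is a minimal sketch of what such a .sh file might look like. The variable names (db, min_score) and the query are illustrative assumptions; the hive call is guarded so the script also runs on a machine without hive.

```shell
#!/bin/bash
# Hypothetical inputs: a scheduler could set these differently per run.
db="frog_db"
min_score="60"

# Shell variables are expanded inside the double-quoted SQL string.
sql="use ${db}; select id, name, score from student where score >= ${min_score};"

if command -v hive >/dev/null 2>&1; then
    # -v echoes each statement before its result
    hive -v -e "${sql}"
else
    echo "hive not found on PATH; would run: hive -v -e \"${sql}\""
fi
```

Because the SQL is assembled by the shell, the same script can produce different queries for different inputs, which is the flexibility described above.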

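Method 2: hive -f runs a SQL script file

This suits large batches of SQL statements. Compared with hive -e, however, a file run with hive -f cannot expand shell variables directly, so it is less flexible and used less often.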
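
As a hedged sketch of this method: the file name student_query.sql and the variable names are assumptions. Although the file cannot see shell variables, hive's own --hivevar option can pass values in, which the file then references as ${hivevar:name}.

```shell
#!/bin/bash
# Hypothetical location for the SQL script.
work_dir=$(mktemp -d)
sql_file="${work_dir}/student_query.sql"

# The quoted 'EOF' stops the shell from expanding ${...};
# hive substitutes the hivevar values when it runs the file.
cat > "${sql_file}" <<'EOF'
use ${hivevar:db};
select id, name, score from student where score >= ${hivevar:min_score};
EOF

if command -v hive >/dev/null 2>&1; then
    hive --hivevar db=frog_db --hivevar min_score=60 -f "${sql_file}"
else
    echo "hive not found on PATH; script saved to ${sql_file}"
fi
```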


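Method 3: hive -i runs a file of configuration statements first, then drops into the hive shell.

The statements in frog3.conf are as follows:

Run the conf file: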
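
The contents of frog3.conf are not shown in the text, so the following is only a sketch using a hypothetical init file named init_example.conf. The settings in it are assumptions chosen for illustration, not the author's file.

```shell
#!/bin/bash
# Hypothetical init file; not the frog3.conf from the text.
init_file="$(mktemp -d)/init_example.conf"

cat > "${init_file}" <<'EOF'
set hive.cli.print.current.db=true;
set hive.cli.print.header=true;
use frog_db;
EOF

if command -v hive >/dev/null 2>&1; then
    # Runs the statements in the file, then stays in the interactive hive shell.
    hive -i "${init_file}"
else
    echo "hive not found on PATH; init file saved to ${init_file}"
fi
```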

II. SQL statements

1. Creating a database

create database frog_db;
drop database frog_db;
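drop database frog_db cascade; -- force-drops a non-empty database; use with caution!

Note that the database exists only after the statement has actually been executed.

2. Creating a table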

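create table student(
id int comment 'id',
name string comment 'name',
score decimal(30,6) comment 'score')
stored as textfile;

comment attaches a description to a column.
The last line sets the table's storage format to textfile, stored under an HDFS path. textfile is also the default, so this line can be omitted.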

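3. Inspecting a hive table

View the table structure

View the statement that created the table

The location line in that output is the HDFS storage path. Files under an HDFS path are visible only through hadoop commands; HDFS is not the same thing as the Linux file system.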
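
The two steps above correspond to the standard statements desc and show create table. A minimal sketch, assuming the student table lives in frog_db, run through hive -e so it fits in a shell script:

```shell
#!/bin/bash
# desc prints the columns and their types; show create table prints the full
# DDL, including the location line with the HDFS path.
sql="use frog_db; desc student; show create table student;"

if command -v hive >/dev/null 2>&1; then
    hive -e "${sql}"
else
    echo "hive not found on PATH; would run: hive -e \"${sql}\""
fi
```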

Specifying the table format yourself: delimit fields with commas to make importing data easier

use frog_db;
drop table student;
create table student(
  id int comment 'ID',
  name string comment 'name',
  score string comment 'score')
row format delimited fields terminated by ','
lines terminated by '\n' 
stored as textfile;

4. Importing data (from the local file system):

load data local inpath '/home/froghd/student.txt' into table student;
load data local inpath '/home/froghd/student.txt' overwrite into table student;

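The second statement overwrites the data already in the student table.

Note (important and easy to confuse): loading from the local file system and loading from HDFS both use load data, but loading from the local file system requires the local keyword. Without it, the path is read from HDFS.

Reference tutorial: "Learn Hive Together: Four Ways to Import Data in Detail"

Note: before loading, check the data's path on Linux.

ls and pwd confirm the target file and the directory it is in.

It is also worth checking that the target file actually contains data.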
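
A small sketch of these pre-load checks. To keep it runnable anywhere, it works on a temporary stand-in for /home/froghd/student.txt; the two sample rows are made up for illustration.

```shell
#!/bin/bash
# Hypothetical stand-in for the real data file.
data_dir=$(mktemp -d)
printf '1,zhangsan,88\n2,lisi,59\n' > "${data_dir}/student.txt"

cd "${data_dir}" || exit 1
pwd                     # directory part of the path used in load data local inpath
ls                      # confirm that student.txt is present
head -n 5 student.txt   # confirm the file has data and is comma-separated

row_count=$(grep -c '' student.txt)
echo "rows: ${row_count}"
```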

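5. Viewing files in HDFS

Using hadoop commands on Linux

The output of show create table for student contains an HDFS path, which can be inspected from Linux with hadoop commands.

hadoop command: hadoop fs -ls <HDFS path>

Commands for working with HDFS:
hadoop fs -ls <path>
hadoop fs -mkdir <path>
... (and so on for the others)

As the listing shows, /user/ contains five subdirectories.

Files or directories can be created under /user/tmo.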

Using dfs commands inside hive
To view HDFS files from within hive:

Command form inside hive:
dfs -ls <HDFS path>;

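6. Downloading a file from an HDFS path to Linux

hadoop fs -get <HDFS path> <local path>

7. Deleting a file under an HDFS path from Linux

hadoop fs -rm <HDFS path>

In the same way, the student.txt file in HDFS can be displayed:

hadoop fs -cat <HDFS path>/student.txt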
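
Putting the read-only commands together, as a sketch: the directory /tmp/student is borrowed from the external-table example in this article and is only an assumption here; substitute the location reported by show create table. The local file name student_copy.txt is hypothetical.

```shell
#!/bin/bash
# Assumed HDFS directory; replace with your table's location.
hdfs_dir="/tmp/student"
hdfs_file="${hdfs_dir}/student.txt"

if command -v hadoop >/dev/null 2>&1; then
    hadoop fs -ls "${hdfs_dir}"                       # list the files
    hadoop fs -cat "${hdfs_file}"                     # print the contents
    hadoop fs -get "${hdfs_file}" ./student_copy.txt  # download a local copy
else
    echo "hadoop not found on PATH; would inspect ${hdfs_file}"
fi
```

The -rm command is left out of the sketch on purpose, since it is destructive.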

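Note: after a table's file has been deleted (-rm) and uploaded again (-put) with hadoop commands, the statements below do not need to be rerun in hive; the data is available in the hive table again once the file is back in the table's HDFS directory.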

load data local inpath '/home/froghd/student.txt' into table student;
load data local inpath '/home/froghd/student.txt' overwrite into table student;

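8. Internal and external tables

A table created with plain create table is an internal (managed) table; one created with create external table, usually together with a location clause as in the example here, is an external table. In effect, the location points the table at data that has been put from Linux onto an HDFS path.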

use frog_db;
drop table student;
create external table student(
  id int comment 'ID',
  name string comment 'name')
row format delimited fields terminated by ','
lines terminated by '\n' 
stored as textfile
location '/tmp/student';

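When a table is dropped, an internal table loses both its data and its metadata, whereas an external table loses only its metadata and keeps its data.

9. Partitioned tables

Partition subdirectories are created for the data files inside the table directory, so that at query time the MR job can process only the data in the relevant partition subdirectory and read less data (otherwise it has to read everything).

For example, a website produces browsing records every day, and they belong in one table. Sometimes only a single day's records need analysing. Making the table a partitioned table handles this: each day's data is loaded into its own partition, and each daily partition directory gets a directory name derived from the partition column.

-- partitioned by (day string) defines the partition column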
use frog_db;
drop table pv_log;
create table pv_log(
  ip string,
  url string,
  visit_time string)
partitioned by(day string)
row format delimited fields terminated by ',';
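-- the create statement only creates the table directory; partition directories appear when data is loaded, so create the table first, then load the data

Data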

27.38.32.58,http://www.baidu.com.cn,2006-12-13 12:34:16
27.38.32.59,http://www.baidu.com.cn,2011-08-13 08:37:16
27.38.32.60,http://www.baidu.com.cn,2006-12-13 12:24:16
27.38.32.61,http://www.baidu.com.cn,2016-06-13 06:34:16
27.38.32.54,http://www.baidu.com.cn,2012-12-15 12:34:16
27.38.32.55,http://www.baidu.com.cn,2009-08-13 09:24:16
27.38.32.57,http://www.baidu.com.cn,2005-12-13 12:14:16
27.38.32.50,http://www.baidu.com.cn,2003-07-16 10:04:16
27.38.32.52,http://www.baidu.com.cn,2007-12-13 12:34:16

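Create the partitioned table

View the partitioned table's schema

load data local inpath '/home/frogdata005/lee1/pv.log' into table pv_log partition (day='20150120');
-- the value after day= is arbitrary; name the partition as needed

Load the data in hive

Query the partitioned table's data

Show the partitions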

show partitions pv_log;
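
To illustrate why partitions reduce the amount of data read, here is a hedged sketch of a query that filters on the partition column. The day value matches the load statement in this section; wrapping it in hive -e is just one way to run it.

```shell
#!/bin/bash
# Partition value to read; only the matching subdirectory is scanned.
day="20150120"
sql="use frog_db; show partitions pv_log; select ip, url, visit_time from pv_log where day='${day}';"

if command -v hive >/dev/null 2>&1; then
    hive -e "${sql}"
else
    echo "hive not found on PATH; would run: hive -e \"${sql}\""
fi
```

The partition column day can be used in where like an ordinary column, even though it is stored as a directory name rather than inside the data files.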


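10. CTAS table-creation syntax

-- create table2 with the same columns as table1
create table table2 like table1;

Create a table modelled on an existing one that has not only the same columns but also the data (without partitions). The new table's column names are whatever the query returns.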

create table pv_log2
as 
select * from pv_log where visit_time>'2006-12-13';

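A table created from a query this way is not partitioned.

III. Common hive functions

1. case when

Usage matches MySQL: each record is labelled with the value after then for the first when condition it satisfies.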

select 
  id, 
  name,
  case when score<=60 then 'fail'
       when score>60 and score <=80 then 'good'
       else 'excellent' end as grade
from student;

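2. The if function, similar to a ternary expression

This is also a labelling operation: if the condition in the first argument is true, the second argument 'pass' is returned; otherwise the third argument 'fail' is returned.

select id, if(score>=60,'pass','fail') from student;
-- returns pass when the score is at least 60, otherwise fail

3. nvl: the null-conversion function

Form: nvl(expr1, expr2)
It works with numeric, string, and date types, but expr1 and expr2 must have the same data type.
Purpose: replaces a NULL result with the specified value.

-- insert a row with a null value via a select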
insert into student 
select
  5,
  'frog',
  null;

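Or the usual values form (here the quoted 'null' is a string literal, not a NULL value):

insert into table student values(5,'lisi','null');

Using nvl(expr1, expr2):

-- returns 0 when score is null, otherwise score itself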
select score,nvl(score,0) from student;


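IV. Window functions

1. row_number() over()
Syntax and example of ROW_NUMBER() OVER in Hive SQL

row_number() OVER (PARTITION BY COL1 ORDER BY COL2) means:
group the rows by COL1 and sort within each group by COL2; the function's value is each row's sequence number within its sorted group (consecutive and unique within the group).

Example: given the data below, find the two oldest records for each sex.

Create the data file (.txt) on Linux: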

1,18,a,male
2,19,a,male
3,22,a,female
4,16,b,female
5,30,b,male
6,26,b,female

Create the table (hive)

use frog_db;
drop table userinfo;
create table userinfo(
  id string,
  age int,
  title string,
  sex string)
row format delimited fields terminated by ','
lines terminated by '\n' 
stored as textfile;

Load the data (hive)

load data local inpath '/home/xxx/xxx.txt' into table userinfo;

select
  id,
  age,
  title,
  sex,
  row_number() over(partition by sex order by age desc) as rn
from userinfo;


select * 
from(select
  id,
  age,
  title,
  sex,
  row_number() over(partition by sex order by age desc) as rn
from userinfo) as a
where a.rn<3;

Filtering on rn limits how many rows each group returns; here the first two rows of each group are output.
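
Since the query output is not reproduced in the text, the following sketch imitates the same logic outside hive with sort and awk on the six sample rows. It is only an illustration of what partition by sex order by age desc plus rn < 3 selects, not how hive executes the query.

```shell
#!/bin/bash
# The six sample rows from the data file: id,age,title,sex
data='1,18,a,male
2,19,a,male
3,22,a,female
4,16,b,female
5,30,b,male
6,26,b,female'

# Sort by sex, then by age descending, number the rows within each sex,
# and keep only the rows whose number is below 3.
top2=$(printf '%s\n' "$data" | sort -t, -k4,4 -k2,2nr |
  awk -F, '{ rn = ++count[$4]; if (rn < 3) print $0 "," rn }')

echo "$top2"
```

For this data the rows kept are ids 6 and 3 for female and ids 5 and 2 for male, each with rn 1 and 2.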

2. sum() over(partition by column1 order by column2)
Purpose: computes a running total.

On Linux, use vi to save the data into a .txt file:

A,2012-01,1000
A,2012-02,2030
A,2012-03,3600
A,2012-04,6008
A,2012-05,3000
B,2012-01,2000
B,2012-02,2300
B,2012-03,1800
B,2012-04,2000
B,2012-05,1300
B,2012-06,1600
B,2012-07,5000
C,2012-01,1020
C,2012-02,2000
C,2012-03,3200
C,2012-04,6000
C,2012-05,5300
C,2012-06,8800
C,2012-07,9000

In hive:

use frog_db;
drop table saleinfo;
create table saleinfo(
  product_name string,
  month string,
  money string)
row format delimited fields terminated by ','
lines terminated by '\n' 
stored as textfile;

Load the local Linux data in hive:

load data local inpath '/home/frog005/lee1/purchase_order.txt' overwrite into table saleinfo;

select
  product_name,
  month,
  money,
  sum(money) over(partition by product_name order by month) as all_money
from saleinfo;

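Output

Finally, compare with a plain grouped sum:

select product_name,sum(money) as sum_money from saleinfo group by product_name;

Output:

This yields only the total for each group, not the running total that the window function gives. In MySQL versions without window functions, a running total requires user-defined variables.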
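
The two outputs are not reproduced in the text. As a rough illustration of the difference, the sketch below recomputes both results with awk for product A only, using its five sample rows. It mimics the arithmetic of the two queries rather than running them in hive.

```shell
#!/bin/bash
# Product A's rows from the sample data: product_name,month,money
data='A,2012-01,1000
A,2012-02,2030
A,2012-03,3600
A,2012-04,6008
A,2012-05,3000'

# Running total per product in month order, one output row per input row,
# like sum(money) over(partition by product_name order by month).
running=$(printf '%s\n' "$data" | sort -t, -k1,1 -k2,2 |
  awk -F, '{ total[$1] += $3; print $1 "," $2 "," $3 "," total[$1] }')

# One row per product, like sum(money) ... group by product_name.
grouped=$(printf '%s\n' "$data" |
  awk -F, '{ total[$1] += $3 } END { for (p in total) print p "," total[p] }')

echo "$running"
echo "$grouped"
```

The running column grows row by row (1000, 3030, 6630, 12638, 15638), while the grouped result is the single row A,15638.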
