Software versions
Hadoop version: 2.4.0
Hive version: 0.12.0
MySQL version: 5.1.73
1) Create a hive user in MySQL and grant it sufficient privileges
[root@master mysql]# mysql -u root -p
mysql> create user 'hive' identified by 'hive';
mysql> grant all privileges on *.* to 'hive' with grant option;
mysql> flush privileges;
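Note: 'hive' with no host part means 'hive'@'%'. On some MySQL 5.1 installs, the anonymous ''@'localhost' account takes precedence over '%' for local logins, so the hive user may be rejected when connecting locally. If that happens, an explicit localhost account can be created as well (an optional extra step):
mysql> create user 'hive'@'localhost' identified by 'hive';
mysql> grant all privileges on *.* to 'hive'@'localhost' with grant option;
mysql> flush privileges;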
2) Create the hive database in MySQL
[root@master mysql]# mysql -u hive -p
mysql> create database hive;
mysql> use hive;
mysql> show tables;
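A known pitfall with a MySQL-backed metastore: if the hive database defaults to the utf8 character set, schema creation can fail with "Specified key was too long; max key length is 767 bytes". A common workaround, only needed if you actually hit that error, is to switch the database to latin1:
mysql> alter database hive character set latin1;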
3) Unpack the Hive tarball
[hadoop@master ~]$ tar -xzvf hive-0.12.0-bin.tar.gz
[hadoop@master ~]$ mv hive-0.12.0 hive
[hadoop@master ~]$ cd hive
[hadoop@master hive]$ ls
4) Download the MySQL JDBC driver (mysql-connector-java-5.1.24-bin.jar) and copy it into the lib directory under the Hive home
[hadoop@master ~]$ mv mysql-connector-java-5.1.24-bin.jar ./hive/lib
5) Update the environment variables to add Hive to PATH
vim /etc/profile
export HIVE_HOME=/home/hadoop/hive
export PATH=$PATH:$HIVE_HOME/bin
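Reload the profile so the change takes effect in the current shell, then verify (a quick sanity check; the output reflects the settings above):
[hadoop@master ~]$ source /etc/profile
[hadoop@master ~]$ echo $HIVE_HOME
/home/hadoop/hive
[hadoop@master ~]$ which hive
/home/hadoop/hive/bin/hive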
6) Edit hive-env.sh and set the Hadoop path
[hadoop@master conf]$ cp hive-env.sh.template hive-env.sh
[hadoop@master conf]$ vim hive-env.sh
HADOOP_HOME=/home/hadoop/hadoop-2.4.0
7) Copy hive-default.xml to hive-site.xml and update the MySQL settings in it
[hadoop@master conf]$ cp hive-default.xml.template hive-site.xml
[hadoop@master conf]$ vim hive-site.xml
(The file is long; in vim's normal mode, use a "/pattern" search to locate each property.)
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
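With datanucleus.autoCreateSchema left at its default (true in this version), Hive creates the metastore tables in MySQL on first use. After step 8 you can confirm this from the MySQL side; tables such as DBS, TBLS, and SDS should appear:
mysql> use hive;
mysql> show tables;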
8) Start Hadoop, then open the hive shell to test
Start the Hadoop cluster first, then run the command below from Hive's bin directory to enter the hive console:
[hadoop@master bin]$ ./hive
Create table 1: a basic test
create table test(id int,name string);
List the tables
show tables;
Add a column
alter table test add columns(remark string comment 'some remark');
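Check the table structure to confirm the new column:
describe test;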
Load data
load data local inpath '/home/hadoop/file/test.csv' overwrite into table test;
Query the table; much of the syntax is the same as in SQL
select * from test;
View the file in HDFS (the path depends on how the warehouse directory is configured; the default is /user/hive/warehouse)
hadoop fs -ls /home/hadoop/hive/warehouse/test
Create table 2: with an explicit data format
create table user_info (id int, name string, age string)
row format delimited
fields terminated by '\t'
lines terminated by '\n';
The data file to import uses tab-separated fields, one record per line. That is, the content looks like this (user_info.txt):
1001 jack 30
1002 tom 25
1003 kate 20
(Note: I found that pasting this text directly into a file on Linux turned the \t characters into spaces, so nothing showed up after the load into Hive. If you copy and paste, check the file first.)
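A quick way to verify that real tabs survived is cat -A, which renders tabs as ^I and line ends as $:
[hadoop@master ~]$ cat -A /home/hadoop/file/user_info.txt
1001^Ijack^I30$
1002^Itom^I25$
1003^Ikate^I20$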
Load the file
load data local inpath '/home/hadoop/file/user_info.txt' overwrite into table user_info;
Create table 3: CSV format
create table product (id string, name string, remark string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES ('input.regex' = '\"(.*)\",\"(.*)\",\"(.*)\"','output.format.string' = '%1$s\\001%2$s\\001%3$s')
STORED AS TEXTFILE;
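If Hive reports that the SerDe class cannot be found, the contrib jar may need to be added to the session first (the jar lives under $HIVE_HOME/lib; the exact file name below is an assumption and depends on your build):
add jar /home/hadoop/hive/lib/hive-contrib-0.12.0.jar;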
File content format (in prd.csv, every field must be wrapped in double quotes and separated by commas):
"1001","iWatch","new"
"1002","iPhone6 Plus","128G"
"1003","Macbook Air","test"
Load the file
load data local inpath '/home/hadoop/file/prd.csv' overwrite into table product;
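Verify the import; the regex capture groups keep only the text inside the quotes, so the query should return the bare values:
select * from product;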
(The End)