This setup uses a real cluster of three Tencent Cloud servers, not a pseudo-distributed one.
For Hive and HBase version compatibility, see here.
For now, only one host needs to be configured; MySQL is used here as the metadata store. Reference link: here.
Check whether MySQL is already installed:
yum list installed mysql*
rpm -qa | grep -i mysql
Remove any leftovers from a previous installation:
rm -rf /usr/lib/mysql
rm -rf /usr/share/mysql
rm -rf /usr/my.cnf
rm -rf /root/.mysql_secret
chkconfig --list | grep -i mysql
chkconfig --del mysqld
cd /usr/local/src
wget http://repo.mysql.com/mysql57-community-release-el7-8.noarch.rpm
rpm -ivh mysql57-community-release-el7-8.noarch.rpm
yum -y install mysql-server
rpm -qa | grep mysql
Several MySQL-related package names should be listed.
service mysqld restart
grep "password" /var/log/mysqld.log
Find the line reading A temporary password is generated for root@localhost: xxxx; the xxxx part is the temporary password.
Log in with the temporary password in order to change it:
mysql -u root -p
Log in with the temporary password from above. Although you are now logged in, you cannot run anything else until the password is changed. Note that the new password must satisfy fairly strict requirements; a password that falls short will be rejected.
ALTER USER 'root'@'localhost' identified by 'new_password';
exit;
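If a new password keeps getting rejected, you can inspect the rules being enforced before exiting; in MySQL 5.7 they come from the validate_password plugin (a quick check, not part of the original steps):
show variables like 'validate_password%';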
mysql -u root -p
flush privileges;
grant all on *.* to 'root'@'%' identified by 'new_password' with grant option;
mysql -u root -p
set password for 'root'@'localhost' = password('new_password');
After changing the password, remember to flush privileges and grant remote-login access, as the statements above do.
Remote login was tested with SQLyog (the trial version); databases and tables display correctly.
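If you prefer the command line over SQLyog, a quick remote-login check from another node might look like this (assuming the MySQL host is named master, matching the JDBC URL used later):
mysql -h master -u root -p -e "show databases;"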
hadoop fs -ls /
hadoop fs -mkdir /hive
hadoop fs -mkdir /hive/warehouse
hadoop fs -chmod 777 /hive
hadoop fs -chmod 777 /hive/warehouse
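You can then confirm the directories and their permissions with a simple listing:
hadoop fs -ls /hive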
Copy hive-default.xml.template to hive-site.xml.
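A minimal sketch of that copy step, assuming Hive is installed under /opt/hive/hive1.2 (the paths used in hive-env.sh below):
cd /opt/hive/hive1.2/conf
cp hive-default.xml.template hive-site.xml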
Modify or add the following properties:
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/root/hive/warehouse</value>
</property>
<property>
  <name>hive.exec.scratchdir</name>
  <value>/root/hive</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value></value>   <!-- left empty: use a local metastore -->
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://master:3306/hive?createDatabaseIfNotExist=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>123456</value>
</property>
<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>
Tip: in vim you can use the '/' command to search and jump quickly.
Note that master in javax.jdo.option.ConnectionURL is the hostname of the metastore host.
Also, replace every occurrence of ${system:java.io.tmpdir} in the config file with /opt/hive/tmp, and every occurrence of ${system:user.name} with root.
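The two replacements can also be done in one go with sed instead of vim; a sketch using the paths above:
sed -i 's#${system:java.io.tmpdir}#/opt/hive/tmp#g; s#${system:user.name}#root#g' hive-site.xml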
Copy hive-env.sh from hive-env.sh.template, then add the following, adjusting the paths to your own versions:
export HADOOP_HOME=/opt/hadoop/hadoop2.8
export HIVE_CONF_DIR=/opt/hive/hive1.2/conf
export HIVE_AUX_JARS_PATH=/opt/hive/hive1.2/lib
Upload the MySQL JDBC driver (mysql-connector-java-5.1.28.jar) into hive1.2/lib.
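For example (assuming the jar was downloaded to /usr/local/src; adjust to where yours actually is):
cp /usr/local/src/mysql-connector-java-5.1.28.jar /opt/hive/hive1.2/lib/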
On first startup, switch to the bin directory and run:
schematool -initSchema -dbType mysql
After that you can enter the Hive CLI and start working.
Use the correct mysql-connector jar, make sure every path is configured correctly, and create the HDFS directories ahead of time.
Once this is configured, any database or table you create, wherever you create it, is recorded in MySQL; Hive no longer creates a fresh metastore database in whatever directory it happens to be launched from, as the default embedded Derby setup does.
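To sanity-check the initialization, you can list the metastore tables that schematool created in MySQL (hive being the database name from the JDBC URL above):
mysql -u root -p -e "use hive; show tables;"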
For MySQL start/stop operations, see this link.
create table table_name(id int, name string, storage string, price double)   -- create a new table
row format delimited
fields terminated by '\t'   -- the delimiter used to split each text line into fields; here tab, ',' also works
stored as sequencefile;   -- storage format of the table's files; sequencefile here, textfile is another option
load data local inpath 'path/to/filename' into table tablename;   -- a local file; an HDFS path also works (drop the local keyword)
* select * from tablename;
![](http://pbpkien9l.bkt.clouddn.com/18-7-17/8877229.jpg)
* select count(*) from tablename;
A time-consuming operation; it did not finish here.
You can also bypass load and run hadoop fs -put filepath/file tablepath directly, which simply uploads the table data as a file into HDFS; Hive reads it as table data all the same.
What happens if the uploaded data doesn't line up with the table's columns? Missing fields are treated as NULL, and extra fields are discarded.
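A quick way to observe this, with hypothetical file and table names:
hadoop fs -put /root/extra.data /hive/warehouse/tablename/
hive -e 'select * from tablename'   # short rows show NULL in the missing columns; surplus fields are dropped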
So far every import has put files under hive/warehouse. What if the data is not under Hive's working directory? As the screenshot above showed, uuu.data currently sits under the HDFS root directory /.
Hive can still import such external data, but afterwards uuu.data is gone from the HDFS root: it has been automatically moved (cut, not copied) into the table's folder under hive/warehouse.
The problem this creates is that the data is forcibly relocated, which is a fatal error for any code that depends on the original data path. This is why Hive supports external tables.
If you create an external table and point it at a path, the original files on HDFS stay put and nothing new shows up in Hive's working directory. Dropping an internal (managed) table removes the table's files and its metadata; dropping an external table removes only the table definition while the data remains. An external table is effectively just a link in Hive.
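A short sketch of the difference, using the table names from the snippets below (tab_ip is managed, tab_ip_ext is external):
drop table tab_ip;       -- managed: removes the metadata AND the files under hive/warehouse
drop table tab_ip_ext;   -- external: removes only the metadata; the files under /external/hive remain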
The created table automatically locates and reads its data according to the clauses given at creation time.
set hive.cli.print.header=true;
CREATE TABLE page_view(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(dt STRING, country STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\001'
STORED AS SEQUENCEFILE;   -- or TEXTFILE
//sequencefile
create table tab_ip_seq(id int,name string,ip string,country string)
row format delimited
fields terminated by ','
stored as sequencefile;
insert overwrite table tab_ip_seq select * from tab_ext;
//create & load
create table tab_ip(id int,name string,ip string,country string)
row format delimited
fields terminated by ','
stored as textfile;
load data local inpath '/home/hadoop/ip.txt' into table tab_ext;   -- tab_ext is assumed to exist already, created the same way as tab_ip
//external
CREATE EXTERNAL TABLE tab_ip_ext(id int, name string,
ip STRING,
country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/external/hive';
// CTAS: handy for creating temporary tables that hold intermediate results
CREATE TABLE tab_ip_ctas
AS
SELECT id new_id, name new_name, ip new_ip,country new_country
FROM tab_ip_ext
SORT BY new_id;
//insert from select: used to append intermediate result data into a temporary table
create table tab_ip_like like tab_ip;
insert overwrite table tab_ip_like
select * from tab_ip;
//CLUSTER <-- somewhat advanced; save it for when you have spare energy>
create table tab_ip_cluster(id int,name string,ip string,country string)
clustered by(id) into 3 buckets;
load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_cluster;
set hive.enforce.bucketing=true;
insert into table tab_ip_cluster select * from tab_ip;
select * from tab_ip_cluster tablesample(bucket 2 out of 3 on id);
//PARTITION
create table tab_ip_part(id int,name string,ip string,country string)
partitioned by (part_flag string)
row format delimited fields terminated by ',';
load data local inpath '/home/hadoop/ip.txt' overwrite into table tab_ip_part
partition(part_flag='part1');
load data local inpath '/home/hadoop/ip_part2.txt' overwrite into table tab_ip_part
partition(part_flag='part2');
select * from tab_ip_part;
select * from tab_ip_part where part_flag='part2';
select count(*) from tab_ip_part where part_flag='part2';
alter table tab_ip change id id_alter string;
ALTER TABLE tab_cts ADD PARTITION (partCol = 'dt') location '/external/hive/dt';
show partitions tab_ip_part;
//write to hdfs
insert overwrite local directory '/home/hadoop/hivetemp/test.txt' select * from tab_ip_part where part_flag='part1';
insert overwrite directory '/hiveout.txt' select * from tab_ip_part where part_flag='part1';
//array
create table tab_array(a array<int>,b array<string>)
row format delimited
fields terminated by '\t'
collection items terminated by ',';
Sample data:
tobenbrone,laihama,woshishui 13866987898,13287654321
abc,iloveyou,itcast 13866987898,13287654321
select a[0] from tab_array;
select * from tab_array where array_contains(b,'word');
insert into table tab_array select array(0),array(name,ip) from tab_ext t;
//map
create table tab_map(name string,info map<string,string>)
row format delimited
fields terminated by '\t'
collection items terminated by ';'
map keys terminated by ':';
Sample data:
fengjie age:18;size:36A;addr:usa
furong age:28;size:39C;addr:beijing;weight:180KG
load data local inpath '/home/hadoop/hivetemp/tab_map.txt' overwrite into table tab_map;
insert into table tab_map select name,map('name',name,'ip',ip) from tab_ext;
//struct
create table tab_struct(name string,info struct<age:int,tel:string,addr:string>)
row format delimited
fields terminated by '\t'
collection items terminated by ',';
load data local inpath '/home/hadoop/hivetemp/tab_st.txt' overwrite into table tab_struct;
insert into table tab_struct select name,named_struct('age',id,'tel',name,'addr',country) from tab_ext;
//cli shell
hive -S -e 'select country,count(*) from tab_ext group by country' > /home/hadoop/hivetemp/e.txt
With this execution mode, we can use scripting languages (bash shell, Python) to batch-run HQL statements.
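A minimal bash sketch of that idea (the queries and output paths are illustrative):
#!/bin/bash
# batch-run HQL statements in silent mode and save each result to its own file
queries=("select count(*) from tab_ext"
         "select country,count(*) from tab_ext group by country")
for i in "${!queries[@]}"; do
    hive -S -e "${queries[$i]}" > "/home/hadoop/hivetemp/result_$i.txt"
done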
select * from tab_ext sort by id desc limit 5;
select a.ip,b.book from tab_ext a join tab_ip_book b on(a.name=b.name);
//UDF
select if(id=1,'first','no-first'),name from tab_ext;
hive> add jar /home/hadoop/myudf.jar;
hive> CREATE TEMPORARY FUNCTION my_lower AS 'org.dht.Lower';
select my_lower(name) from tab_ext;
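For reference, the class behind myudf.jar might look roughly like this; a sketch using the classic Hive UDF API, since the actual source isn't shown in these notes:
package org.dht;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

// simple UDF: lowercases a string column; nulls pass through unchanged
public final class Lower extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) return null;
        return new Text(s.toString().toLowerCase());
    }
}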