1 Starting Hive produces:
ls: cannot access '/usr/local/spark/lib/spark-assembly-*.jar': No such file or directory
Fix:
Edit the bin/hive script under the Hive install directory and replace the original lib/spark-assembly-*.jar path with jars/*.jar; the error then goes away. (Spark 2.x no longer ships a single assembly jar; its jars live under jars/.)
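A minimal sketch of that edit, assuming Hive is installed at /usr/local/hive and the script contains the stock sparkAssemblyPath line (adjust the path to your install):

# Keep a .bak copy and rewrite the assembly-jar glob to Spark 2.x's jars/ directory.
sed -i.bak 's|lib/spark-assembly-\*\.jar|jars/*.jar|' /usr/local/hive/bin/hive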
Using MySQL as Hive's metastore
2 A simple way to install MySQL; the user and password are set during installation:
sudo apt-get update
sudo apt-get install mysql-server
3 Change the root password:
update mysql.user set authentication_string=PASSWORD('123456'), plugin='mysql_native_password' where user='root';
flush privileges;
Done.
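A sketch of running those statements in one shot, assuming MySQL 5.x (where the PASSWORD() function still exists) and root access via sudo, the default for Ubuntu's mysql-server package:

# Feed the SQL to the mysql client non-interactively via a heredoc.
sudo mysql -u root <<'SQL'
UPDATE mysql.user
   SET authentication_string = PASSWORD('123456'),
       plugin = 'mysql_native_password'
 WHERE user = 'root';
FLUSH PRIVILEGES;
SQL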
4 Create a new MySQL user:
GRANT USAGE ON *.* TO 'hive'@'localhost' IDENTIFIED BY '123456' WITH GRANT OPTION;
grant all privileges on hive.* to 'hive'@'localhost' identified by '123456';
flush privileges;
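A quick check that the new account works (the password is passed inline for brevity; -p with no space is the mysql client's syntax):

# Log in as the hive user and list the databases it can see.
mysql -u hive -p123456 -e 'SHOW DATABASES;'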
5 Verify the Hive configuration
schematool -dbType mysql -initSchema fails with:
Error: Specified key was too long; max key length is 3072 bytes (state=42000,code=1071)
Fix: drop Hive's metastore database and create a new one. The cause is a wrong character set: the database was created as UTF-8, whose multi-byte characters push Hive's index keys past MySQL's key-length limit, so recreate it as latin1.
Check MySQL's character-set settings:
show variables like '%char%';
drop database hive;
create database hive character set latin1;
# create the database with the latin1 character set
grant all on *.* to hive@localhost identified by '123456';
# grant the hive user all privileges on all tables of all databases;
# the user and password must match those configured in hive-site.xml
flush privileges; # reload MySQL's privilege tables
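The whole fix as one non-interactive sequence, a sketch under the same root-access assumption as above:

# Recreate the metastore database with latin1, re-grant, then re-run the schema tool.
sudo mysql -u root <<'SQL'
DROP DATABASE IF EXISTS hive;
CREATE DATABASE hive CHARACTER SET latin1;
GRANT ALL ON *.* TO 'hive'@'localhost' IDENTIFIED BY '123456';
FLUSH PRIVILEGES;
SQL
schematool -dbType mysql -initSchema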
6 After creating a Hive table from a Spark application, operating on Hive through the Hive shell fails;
starting hive --service metastore reports:
MetaException(message:Hive Schema version 2.3.0 does not match metastore's schema version 1.2.0)
Cause:
The Spark application created the table with schema version 1.2.0, while Hive's schema version is 2.3.0; the two versions are incompatible.
Fixes:
Temporary fix:
In MySQL (assuming the metastore lives in MySQL), switch to the hive database and run (see the sketch after this list):
UPDATE VERSION SET SCHEMA_VERSION='2.3.0', VERSION_COMMENT='fix conflict' WHERE VER_ID=1;
Fix 2: set the hive.metastore.schema.verification parameter to false in hive-site.xml (as in the config below).
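A sketch of applying the temporary fix from the shell, assuming root access to MySQL as in the earlier steps:

# Bump the recorded schema version in the metastore's VERSION table.
sudo mysql -u root hive <<'SQL'
UPDATE VERSION
   SET SCHEMA_VERSION = '2.3.0',
       VERSION_COMMENT = 'fix conflict'
 WHERE VER_ID = 1;
SQL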
Keep a few points in mind:
1 The hive-site.xml configuration must be correct.
2 hiveserver2 is a service, not a one-shot command, so it keeps running once started.
3 The username DBeaver uses to connect to the database is hadoop, the OS login username.
4 Hive 2.3.4 is required; hiveserver2 failed to start with the previously used 1.2.1.
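HiveServer2 can also be sanity-checked without DBeaver; a sketch using beeline, with the host and port taken from the hive-site.xml below (adjust to your machine):

# Connect over the Thrift/JDBC port and run a trivial query; -n passes the login username.
beeline -u jdbc:hive2://192.168.147.159:10000 -n hadoop -e 'SHOW DATABASES;'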
The hive-site.xml configuration file is as follows:
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.server2.logging.operation.log.location</name>
    <value>/tmp/hive/operation_logs</value>
  </property>
  <property>
    <name>hive.server2.long.polling.timeout</name>
    <value>5000</value>
    <description>Time in milliseconds that HiveServer2 will wait, before responding to asynchronous calls that use long polling</description>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>192.168.147.159</value>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>
</configuration>
Startup commands:
1 start-all.sh
2 service mysql start
3 hdfs dfsadmin -safemode leave
4 hiveserver2
5 DBeaver (a GUI visualization tool for the Hive database)
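A minimal start script bundling steps 1-4; service names and ordering assume the setup described in these notes:

#!/usr/bin/env bash
# Bring the stack up in dependency order: Hadoop, MySQL, then HiveServer2.
set -e
start-all.sh                    # HDFS + YARN
sudo service mysql start       # metastore backend
hdfs dfsadmin -safemode leave  # make sure HDFS accepts writes
hiveserver2 &                  # long-running service (see note 2 above)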
Why use MySQL for the metadata?
Hive's default metadata store is Derby, which has the drawback that only one Hive instance can access it at a time;
that is fine for local testing during development, but MySQL allows multiple users to connect concurrently.
Sources for the above:
Installing Hive on Ubuntu with MySQL as the metastore database
http://dblab.xmu.edu.cn/blog/install-hive/
Hive error: Specified key was too long; max key length is 767 bytes (detailed walkthrough)
https://blog.csdn.net/lsr40/article/details/79422718#commentBox
[Hive] Hive pitfalls we have hit over the years (covers most of these errors)
https://blog.csdn.net/sunnyyoona/article/details/51648871
Common HiveQL operations
http://dblab.xmu.edu.cn/blog/hive-in-practice/#more-509
Drop a database:
drop database if exists dbrecruitment cascade;
Drop a table:
DROP TABLE IF EXISTS table_name;
Create a database:
create database if not exists dbrecruitment;
Create a table:
CREATE TABLE student (
  id int,
  name string,
  gender string,
  age int
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ('separatorChar' = ' ')
STORED AS textfile;
Load data:
use dbrecruitment;
load data local inpath '/home/hadoop/from_window/jobs1.csv' into table jobs1;
Strip empty values (the NOT LIKE filter also drops internship postings; '实习' means 'intern'):
create table job2_clean as
SELECT * FROM jobs2
WHERE job_info != '\N' AND job_info IS NOT NULL AND LENGTH(job_info) > 0
  AND job_name != '\N' AND job_name IS NOT NULL AND LENGTH(job_name) > 0
  AND job_name NOT LIKE '%实习%';
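The same statements can also be run non-interactively; a sketch that submits the cleanup query through beeline to the HiveServer2 configured above (the file name is illustrative):

# Write the cleanup HiveQL to a file and submit it via beeline's -f flag.
cat > clean_jobs.hql <<'HQL'
use dbrecruitment;
create table job2_clean as
select * from jobs2
where job_info != '\N' and job_info is not null and length(job_info) > 0
  and job_name != '\N' and job_name is not null and length(job_name) > 0
  and job_name not like '%实习%';
HQL
beeline -u jdbc:hive2://192.168.147.159:10000 -n hadoop -f clean_jobs.hql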