This time, let's go over the pitfalls Hive beginners commonly run into.
First, make sure you have successfully set up a Hadoop cluster: https://blog.csdn.net/H_crab/article/details/79673885
I. Hive
Hive is very picky about version numbers, and the version you pick matters later for both Spark on Hive and Hive on Spark.
So follow Cloudera's CDH version compatibility matrix strictly (experts can ignore this).
Newer is not automatically better; in practice the choice usually comes from management, and ideally every cluster should run the same version.
1. What is Hive?
Simply put, Hive is a data warehouse tool built on top of the Hadoop ecosystem. It is fundamentally different from databases like MySQL: it does not support updating or deleting individual rows (small-scale changes and deletes are possible, just cumbersome).
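To make the batch-oriented point concrete, here is a minimal sketch of typical Hive usage; the table name and HDFS path are made up for illustration:
hive -e "
CREATE TABLE IF NOT EXISTS logs (id INT, msg STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA INPATH '/data/logs.txt' INTO TABLE logs;   -- bulk-load a file from HDFS
SELECT COUNT(*) FROM logs;                           -- queries run as batch jobs
"
The workflow is load-then-query; per-row UPDATE/DELETE is not the normal pattern.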
Hive still stores its data on Hadoop (HDFS). By default Hive keeps its metadata in an embedded Derby database;
you should configure MySQL as the metadata store instead.
A simple way to check which one you have: open two client sessions against the server at the same time. If both can stay open, the metastore is MySQL; if the second one errors out, it is Derby.
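In practice the check looks like this (the exact error text varies by Hive version):
# terminal 1
$ hive
hive> show databases;

# terminal 2, while terminal 1 is still open
$ hive
# With the default Derby metastore the second session fails with an error like
# "Unable to instantiate ...SessionHiveMetaStoreClient", because embedded Derby
# allows only one connection at a time; with a MySQL metastore both sessions work.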
Install MySQL first:
yum install mysql-server
yum problems are generally fixable. If yum itself reports errors, first check that it is pointing at the correct Python version.
If yum still does not work:
yum clean all                # clear all cached packages and metadata
rm -f /var/lib/rpm/__db*     # remove stale rpm database/lock files
rpm --rebuilddb              # rebuild the rpm database (note: two dashes)
yum update
That is: delete, rebuild, and upgrade.
MySQL problems
How to fix MySQL's "mysql.sock not found" error:
https://blog.csdn.net/u012346692/article/details/52329553
https://blog.csdn.net/FrankieHello/article/details/78304092
MySQL deadlock problems:
https://www.cnblogs.com/sivkun/p/7518540.html
There is also the problem of yum not being able to find the MySQL packages during installation.
yum list mysql*        # check whether MySQL is already installed
yum install mysql-server
yum install mysql-devel
If there is no yum repo for MySQL, add one:
wget http://repo.mysql.com/mysql-community-release-el7-5.noarch.rpm
rpm -ivh mysql-community-release-el7-5.noarch.rpm
yum install mysql-server
After a successful install, start the MySQL service:
service mysqld start
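The GRANT below is run from inside the MySQL client. On a fresh install from this el7-5 repo (MySQL 5.6) the root password is typically empty, so:
mysql -u root     # then run the GRANT at the mysql> prompt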
grant all privileges on *.* to 'hive'@'%' identified by 'hive' with grant option;
The first 'hive' is the username and the second 'hive' is the password; this is the account Hive uses to log in to MySQL and store its metadata.
flush privileges;
Reload MySQL's privilege tables.
service mysqld restart
Restart the service.
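You can then verify that the new account works (host1 is the same metastore-host placeholder used in the config below; drop -h for a local check):
mysql -u hive -phive -h host1 -e "SHOW DATABASES;"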
Then edit Hive's configuration files.
hive-site.xml (host1 below is a placeholder for your metastore host):
<configuration>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/home/hive/warehouse</value>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/home/hive/scratchdir</value>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/home/hive/logs</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://host1:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
  </property>
  <property>
    <name>hive.cli.print.header</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.fetch.task.conversion</name>
    <value>more</value>
    <description>
      Expects one of [none, minimal, more].
      Some select queries can be converted to single FETCH task minimizing latency.
      Currently the query should be single sourced not having any subquery and should not have
      any aggregations or distincts (which incurs RS), lateral views and joins.
      0. none : disable hive.fetch.task.conversion
      1. minimal : SELECT STAR, FILTER on partition columns, LIMIT only
      2. more : SELECT, FILTER, LIMIT only (support TABLESAMPLE and virtual columns)
    </description>
  </property>
  <property>
    <name>hive.auto.convert.join.noconditionaltask.size</name>
    <value>10000000</value>
    <description>
      If hive.auto.convert.join.noconditionaltask is off, this parameter does not take effect.
      However, if it is on, and the sum of size for n-1 of the tables/partitions for an n-way join
      is smaller than this size, the join is directly converted to a mapjoin (there is no
      conditional task). The default is 10MB.
    </description>
  </property>
  <property>
    <name>hive.cluster.delegation.token.store.class</name>
    <value>org.apache.hadoop.hive.thrift.ZooKeeperTokenStore</value>
  </property>
  <property>
    <name>hive.server2.thrift.bind.host</name>
    <value>host1</value>
    <description>Bind host on which to run the HiveServer2 Thrift service.</description>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10000</value>
    <description>Port number of HiveServer2 Thrift interface when hive.server2.transport.mode is 'binary'.</description>
  </property>
  <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>5</value>
    <description>Minimum number of Thrift worker threads</description>
  </property>
  <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>500</value>
    <description>Maximum number of Thrift worker threads</description>
  </property>
  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>
  <property>
    <name>hive.server2.transport.mode</name>
    <value>binary</value>
  </property>
  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10001</value>
  </property>
  <property>
    <name>hive.exec.reducers.bytes.per.reducer</name>
    <value>1000000000</value>
    <description>Size per reducer. The default is 1G, i.e. if the input size is 10G, it will use 10 reducers.</description>
  </property>
  <property>
    <name>hive.enable.spark.execution.engine</name>
    <value>true</value>
  </property>
</configuration>
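One more thing that trips people up: Hive does not ship with the MySQL JDBC driver, so the connector jar has to be put on Hive's classpath by hand. A sketch, assuming a standard layout (the jar version and HIVE_HOME path are examples; adjust to your install):
cp mysql-connector-java-5.1.46.jar $HIVE_HOME/lib/    # the driver class named in ConnectionDriverName above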
hive-env.sh
export JAVA_HOME=/home/jdk1.8.0_161
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/home/hadoop
# Hive Configuration Directory can be controlled by HIVE_CONF_DIR. This sets which directory hive-site.xml is read from; I dropped it into Hadoop's config directory:
export HIVE_CONF_DIR=/home/hadoop/etc/hadoop
Once all of the above is configured, remember to start the metastore service:
bin/hive --service metastore &
The trailing & runs it in the background. After it starts, run jps and you will see a RunJar process; that is the metastore.
!! If you do not start it, the hive CLI will hang on startup or throw errors.
Now just type hive to enter the Hive shell.
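A quick sanity check (output abbreviated; the PID is made up):
$ jps
12345 RunJar                      # the metastore service started above
$ hive
hive (default)> show databases;   # the prompt shows the current db because hive.cli.print.current.db=true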
And never, never, NEVER delete MySQL's data files: Hive's metadata is stored in MySQL, and deleting it has serious consequences.
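If you want a safety net, periodically dump the metastore database. A minimal sketch (the database is named hive per the ConnectionURL above; the output filename is arbitrary):
mysqldump -u hive -phive hive > hive_metastore_backup.sql   # restore with: mysql -u hive -phive hive < hive_metastore_backup.sql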