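Introduction
Hive is a client-side tool (you can also think of it as a standalone piece of software) that translates HQL statements, which closely resemble SQL, into MapReduce jobs and runs them to produce the result you need.
It works by storing the metadata that describes how to parse structured files in the Hadoop file system (table schemas, delimiters, file locations) in MySQL or another relational database. Hive then reads that metadata from the database and uses it to operate on the files in the distributed file system.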
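Environment preparation
1. Three CentOS 6.5 machines
Disable the firewall
Install the JDK
Configure hosts entries (zk1, zk2, zk3)
Configure passwordless SSH (including each host connecting to itself)
2. One MySQL machine (hostname mysql)
Allow remote connections
Grant database privileges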
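The passwordless SSH step can be sketched as the script below. It is a minimal sketch rather than a definitive procedure: the `run` helper and the `DRY_RUN` switch are illustrative additions, and it assumes the `root` account (the one used by the scp commands later in this guide), an RSA key at the default path, and that `ssh-copy-id` is installed.

```shell
#!/usr/bin/env bash
# Sketch: give this host passwordless SSH access to every node, itself included.
# Assumptions: root account, RSA key at the default path, ssh-copy-id available.
HOSTS="zk1 zk2 zk3"
DRY_RUN="${DRY_RUN:-1}"   # illustrative switch: 1 = only print the commands

run() {
  # Print the command in dry-run mode, otherwise execute it.
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

setup_ssh() {
  # Create a key pair with an empty passphrase only if none exists yet.
  [ -f "$HOME/.ssh/id_rsa" ] || run ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
  for h in $HOSTS; do
    run ssh-copy-id "root@$h"   # asks for the remote password once per host
  done
}

setup_ssh
```

With `DRY_RUN=1` the script only prints what it would do. Running it with `DRY_RUN=0` on each of the three hosts covers every pair of hosts, including each host connecting to itself.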
Install and run Hadoop
1. Configure Hadoop
Extract the archive
mkdir -p /opt/modules/cdh/
tar -zxvf hadoop-2.5.0-cdh5.3.6.tar.gz -C /opt/modules/cdh
cd /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop
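Edit the configuration files
- For
core-site.xml
hdfs-site.xml
yarn-site.xml
mapred-site.xml
hadoop-env.sh
yarn-env.sh
mapred-env.sh
remove the .template suffix from any file that still carries it (in this release that is typically only mapred-site.xml.template)
- In
hadoop-env.sh
yarn-env.sh
mapred-env.sh
set the JAVA_HOME variable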
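Setting JAVA_HOME in the three env scripts can be sketched as follows. The function name `set_java_home` is hypothetical, and the JDK path in the example call is the one this guide uses later in hive-env.sh; adjust it to wherever the JDK is actually installed.

```shell
#!/usr/bin/env bash
# Sketch: append JAVA_HOME to the three env scripts; safe to run more than once.
set_java_home() {
  conf_dir="$1"
  java_home="$2"
  line="export JAVA_HOME=$java_home"
  for f in hadoop-env.sh yarn-env.sh mapred-env.sh; do
    # Append only when this exact line is not already in the file.
    grep -qxF "$line" "$conf_dir/$f" 2>/dev/null || echo "$line" >> "$conf_dir/$f"
  done
}

# Example call (paths taken from this guide):
# set_java_home /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop /usr/local/jdk
```

Because the line is appended at the end of each script, it takes effect even if an earlier `JAVA_HOME` assignment is already present in the file.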
- core-site.xml
fs.defaultFS
hdfs://zk1:8020
hadoop.tmp.dir
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/data
- hdfs-site.xml
dfs.replication
3
dfs.permissions.enabled
false
dfs.namenode.secondary.http-address
zk3:50090
dfs.namenode.http-address
zk1:50070
dfs.webhdfs.enabled
true
- yarn-site.xml
yarn.nodemanager.aux-services
mapreduce_shuffle
yarn.resourcemanager.hostname
zk2
yarn.log-aggregation-enable
true
yarn.log-aggregation.retain-seconds
86400
yarn.log.server.url
http://zk1:19888/jobhistory/logs/
- mapred-site.xml
mapreduce.framework.name
yarn
mapreduce.jobhistory.address
zk1:10020
mapreduce.jobhistory.webapp.address
zk1:19888
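Each name/value pair listed above becomes one `<property>` element inside the `<configuration>` root of the corresponding file. The helper below is a minimal sketch of that mapping; `render_site_xml` is a hypothetical name, and it performs no XML escaping, which is acceptable for the plain values used in this guide. Note that Hadoop spells the JobHistory property names with `address`.

```shell
#!/usr/bin/env bash
# Sketch: turn "name value" pairs into a Hadoop *-site.xml document on stdout.
render_site_xml() {
  echo '<?xml version="1.0"?>'
  echo '<configuration>'
  # Arguments are consumed two at a time: property name, then its value.
  while [ "$#" -ge 2 ]; do
    echo '  <property>'
    echo "    <name>$1</name>"
    echo "    <value>$2</value>"
    echo '  </property>'
    shift 2
  done
  echo '</configuration>'
}

# Example: the mapred-site.xml settings from this guide.
render_site_xml \
  mapreduce.framework.name yarn \
  mapreduce.jobhistory.address zk1:10020 \
  mapreduce.jobhistory.webapp.address zk1:19888
```

Redirecting the output to `etc/hadoop/mapred-site.xml` would produce the file; the same call pattern applies to the core-site.xml, hdfs-site.xml and yarn-site.xml pairs.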
Add the slaves file (in the etc/hadoop/ directory)
vi slaves
Add the following hosts:
zk1
zk2
zk3
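Hadoop 2.x reads the worker list from a file named `slaves`, one hostname per line. As an alternative to editing it by hand, the file can be written non-interactively; the sketch below uses a hypothetical helper named `write_slaves`.

```shell
#!/usr/bin/env bash
# Sketch: write the slaves file, one worker hostname per line.
write_slaves() {
  conf_dir="$1"
  shift
  # printf repeats the format for every remaining argument.
  printf '%s\n' "$@" > "$conf_dir/slaves"
}

# Example call (path taken from this guide):
# write_slaves /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop zk1 zk2 zk3
```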
Once the configuration is complete, scp the directory to the other two hosts
scp -r /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop root@zk2:/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/
scp -r /opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/hadoop root@zk3:/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/etc/
On the NameNode machine (zk1), format the NameNode
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/bin/hdfs namenode -format
2. Start Hadoop
Start the NameNode (zk1)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start namenode
Start the SecondaryNameNode (zk3)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start secondarynamenode
Start the DataNodes (zk1, zk2, zk3)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/hadoop-daemon.sh start datanode
Start the ResourceManager (zk2)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/yarn-daemon.sh start resourcemanager
Start the NodeManagers (zk1, zk2, zk3)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/yarn-daemon.sh start nodemanager
Start the JobHistoryServer (zk1)
/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/sbin/mr-jobhistory-daemon.sh start historyserver
To verify that startup completed, open http://zk1:50070 in a browser
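Since each daemon runs on specific hosts, it can help to see the start commands grouped per host. The sketch below only prints the commands for a given host, following the layout described in this guide; the function name `start_commands` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the start commands for one host, following this guide's layout.
HADOOP_HOME=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6

start_commands() {
  host="$1"
  sbin="$HADOOP_HOME/sbin"
  # Reject hosts that are not part of the three-node cluster.
  case "$host" in
    zk1|zk2|zk3) ;;
    *) echo "unknown host: $host" >&2; return 1 ;;
  esac
  # HDFS: NameNode on zk1, SecondaryNameNode on zk3, DataNode everywhere.
  case "$host" in
    zk1) echo "$sbin/hadoop-daemon.sh start namenode" ;;
    zk3) echo "$sbin/hadoop-daemon.sh start secondarynamenode" ;;
  esac
  echo "$sbin/hadoop-daemon.sh start datanode"
  # YARN: ResourceManager on zk2, NodeManager everywhere.
  case "$host" in
    zk2) echo "$sbin/yarn-daemon.sh start resourcemanager" ;;
  esac
  echo "$sbin/yarn-daemon.sh start nodemanager"
  # MapReduce JobHistoryServer on zk1.
  case "$host" in
    zk1) echo "$sbin/mr-jobhistory-daemon.sh start historyserver" ;;
  esac
}

start_commands zk1
```

After the daemons are started, running `jps` on each host lists the Java processes, which is a quick way to confirm that the expected daemons (for example NameNode and DataNode on zk1) are up.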
Install and run Hive
Install Hive
Extract the tarball
tar -zxvf hive-0.13.1-cdh5.3.6.tar.gz -C /opt/modules/cdh/
Edit the configuration files
- Rename the configuration files
mv hive-default.xml.template hive-site.xml
mv hive-env.sh.template hive-env.sh
mv hive-log4j.properties.template hive-log4j.properties
- hive-env.sh
JAVA_HOME=/usr/local/jdk
HADOOP_HOME=/opt/modules/cdh/hadoop-2.5.0-cdh5.3.6/
export HIVE_CONF_DIR=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/conf
- hive-site.xml (modify the existing entries; do not add new ones)
javax.jdo.option.ConnectionURL
jdbc:mysql://mysql:3306/metastore?createDatabaseIfNotExist=true
JDBC connect string for a JDBC metastore
javax.jdo.option.ConnectionDriverName
com.mysql.jdbc.Driver
Driver class name for a JDBC metastore
javax.jdo.option.ConnectionUserName
root
username to use against metastore database
javax.jdo.option.ConnectionPassword
123123
password to use against metastore database
- hive-log4j.properties
hive.log.dir=/opt/modules/cdh/hive-0.13.1-cdh5.3.6/logs
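The metastore settings above only work if the MySQL account is allowed to log in from the Hive host, which is what the "allow remote connections" and "grant privileges" steps in the environment preparation refer to. The sketch below prints one possible pair of statements. It assumes MySQL 5.x syntax (MySQL 8.0 no longer accepts `IDENTIFIED BY` inside `GRANT`), limits the grant to the `metastore` database named in the JDBC URL, and uses a hypothetical function name `metastore_grant_sql`.

```shell
#!/usr/bin/env bash
# Sketch: print SQL that lets a MySQL 5.x account reach the metastore remotely.
# Assumption: MySQL 5.x; on MySQL 8.0 use CREATE USER first, then GRANT.
metastore_grant_sql() {
  user="$1"
  password="$2"
  cat <<EOF
GRANT ALL PRIVILEGES ON metastore.* TO '$user'@'%' IDENTIFIED BY '$password';
FLUSH PRIVILEGES;
EOF
}

# Example: review the statements first, then pipe them into the mysql client
# on the mysql host.
metastore_grant_sql root 123123
# metastore_grant_sql root 123123 | mysql -uroot -p
```

The `'%'` host pattern allows connections from any machine; replacing it with the Hive host name is a tighter choice when the client address is fixed.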
Copy the JDBC driver into the lib directory
cp -a mysql-connector-java-5.1.27-bin.jar /opt/modules/cdh/hive-0.13.1-cdh5.3.6/lib/
Run Hive
/opt/modules/cdh/hive-0.13.1-cdh5.3.6/bin/hive
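Once the CLI starts, a short HQL script is a convenient way to check that the metastore and the Hadoop cluster are both reachable. The sketch below writes such a script; the table name `demo_words`, the helper `write_smoke_hql` and the path `/tmp/smoke.hql` are all hypothetical and used only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: write a small HQL script; names such as demo_words are hypothetical.
write_smoke_hql() {
  cat > "$1" <<'EOF'
-- Hypothetical table used only to check the installation.
CREATE TABLE IF NOT EXISTS demo_words (word STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';

SHOW TABLES;

-- An aggregate query like this one is compiled into a MapReduce job.
SELECT word, COUNT(*) AS cnt FROM demo_words GROUP BY word;
EOF
}

write_smoke_hql /tmp/smoke.hql
# Run it with the Hive CLI installed above:
# /opt/modules/cdh/hive-0.13.1-cdh5.3.6/bin/hive -f /tmp/smoke.hql
```

The `CREATE TABLE` statement exercises the MySQL metastore, while the `GROUP BY` query exercises the HQL-to-MapReduce path described in the introduction.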