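Hadoop is a technology for storing and processing data, and anything tied to data falls within the database field, so it is natural for Hadoop to concern DBAs. That in turn calls for tooling suited to DBAs.
Hive provides exactly this: it uses SQL-like statements (HiveQL) to manage data stored in Hadoop.
Download the mysql-server repository package from the official site (MySQL itself is installed with yum)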
wget http://dev.mysql.com/get/mysql-community-release-el7-5.noarch.rpm
Install the repository package
rpm -ivh mysql-community-release-el7-5.noarch.rpm
Install MySQL server
yum install mysql-community-server
Restart the MySQL service:
service mysqld restart
Log in to MySQL
mysql -u root
Set a password for the root user
mysql> set password for 'root'@'localhost' = password('root');
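Remote connection setup:
Grant all privileges on all tables in all databases to the root user connecting from any IP address:
mysql> grant all privileges on *.* to 'root'@'%' identified by 'root';
mysql> flush privileges;  -- reload the privilege tables
If you are using a new user rather than root, create the user first:
mysql> create user 'username'@'%' identified by 'password';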
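The create-user and grant steps above can be combined into a dedicated metastore account instead of reusing root. A minimal sketch, assuming the MySQL 5.6 syntax used in this post; the helper name `hive_user_sql`, the account `hive`, and the password `hivepass` are hypothetical, and `hivedb` is the database name that appears in the JDBC URL later on. The function only prints SQL, so it can be reviewed before anything is run.

```shell
# Print SQL that creates a dedicated metastore user (hypothetical helper).
# Arguments: user, password, database. Nothing is executed against MySQL here.
hive_user_sql() {
  user=$1; pass=$2; db=$3
  cat <<EOF
CREATE USER '${user}'@'%' IDENTIFIED BY '${pass}';
GRANT ALL PRIVILEGES ON ${db}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
EOF
}

# Review the statements first; the values below are placeholders.
hive_user_sql hive hivepass hivedb
```

Once the output looks right, it could be applied with something like `hive_user_sql hive hivepass hivedb | mysql -u root -p`.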
This walkthrough uses hive-2.3.5 as the example
Download hive-2.3.5 with wget
wget http://mirror.bit.edu.cn/apache/hive/hive-2.3.5/apache-hive-2.3.5-bin.tar.gz
Extract Hive into /usr/local
tar -zxvf apache-hive-2.3.5-bin.tar.gz -C /usr/local/
Rename the extracted directory to hive:
mv /usr/local/apache-hive-2.3.5-bin /usr/local/hive
Append the following to the end of /etc/profile (vi /etc/profile):
export HIVE_HOME=/usr/local/hive
export PATH=$PATH:$HIVE_HOME/bin
Run source /etc/profile
Run hive --version
If the Hive version is printed, the installation succeeded.
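Editing /etc/profile by hand works, but running the setup twice tends to leave duplicate export lines. A small sketch of an idempotent append, assuming a POSIX shell; `append_once` is a hypothetical helper, and the demo writes to a scratch file rather than the real /etc/profile.

```shell
# Append a line to a file only if that exact line is not already present
# (hypothetical helper; makes re-running the setup harmless).
append_once() {
  file=$1; line=$2
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demo against a scratch file instead of /etc/profile.
profile=$(mktemp)
append_once "$profile" 'export HIVE_HOME=/usr/local/hive'
append_once "$profile" 'export PATH=$PATH:$HIVE_HOME/bin'
append_once "$profile" 'export HIVE_HOME=/usr/local/hive'   # second call is a no-op
cat "$profile"
```

Pointing the first argument at /etc/profile (as root) would apply the same two lines shown in this post.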
Configure hive-env.sh (in /usr/local/hive/conf)
cp hive-env.sh.template hive-env.sh
Set the Hadoop installation path
HADOOP_HOME=/opt/module/hadoop-2.3.5
Set the path to Hive's conf directory
export HIVE_CONF_DIR=/usr/local/hive/conf
Configure hive-site.xml
cp hive-default.xml.template hive-site.xml
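<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://bigdata131:3306/hivedb?createDatabaseIfNotExist=true</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>root</value>
  <description>username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>root</value>
  <description>password to use against metastore database</description>
</property>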
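Instead of editing the full copied template, the four metastore settings above could also live in a much smaller hive-site.xml, since Hive falls back to its built-in defaults for anything not listed. The sketch below prints such a file; `metastore_site_xml` is a hypothetical helper, and the host, database, user, and password are simply the values used in this post.

```shell
# Print a minimal hive-site.xml holding only the metastore connection settings
# (hypothetical helper). Arguments: JDBC URL, user, password.
# Values are inserted verbatim, so a URL containing "&" must be passed as "&amp;".
metastore_site_xml() {
  url=$1; user=$2; pass=$3
  cat <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>${url}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>${user}</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>${pass}</value>
  </property>
</configuration>
EOF
}

# Print the file for review; redirect to $HIVE_HOME/conf/hive-site.xml to use it.
metastore_site_xml 'jdbc:mysql://bigdata131:3306/hivedb?createDatabaseIfNotExist=true' root root
```

Keeping the file this small makes it easier to see which settings differ from the defaults.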
Start Hadoop
Initialize the Metastore schema: schematool -dbType mysql -initSchema
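schematool talks to MySQL over JDBC, so the driver class com.mysql.jdbc.Driver configured earlier has to be on Hive's classpath. The steps in this post do not show a driver jar being copied, so the check below is a sketch under an assumption: that the jar belongs in $HIVE_HOME/lib and is named like mysql-connector-java-*.jar (the naming of older Connector/J releases; yours may differ). `has_mysql_driver` is a hypothetical helper.

```shell
# Return success if a MySQL Connector/J jar appears to be present in a directory.
# The mysql-connector-java-*.jar pattern is an assumption about the jar's file name.
has_mysql_driver() {
  for jar in "$1"/mysql-connector-java-*.jar; do
    [ -e "$jar" ] && return 0
  done
  return 1
}

# Fall back to the install path used in this post when HIVE_HOME is unset.
lib_dir="${HIVE_HOME:-/usr/local/hive}/lib"
if has_mysql_driver "$lib_dir"; then
  echo "JDBC driver found in $lib_dir"
else
  echo "no MySQL JDBC driver in $lib_dir; schematool may fail to load com.mysql.jdbc.Driver"
fi
```

Running this before the schema initialization can save a confusing class-loading error.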
Start Hive: hive
hive> (this prompt means you are in the Hive shell)
Create a data source file and upload it to the /user/input directory on HDFS
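The upload step might look like the following. This is a sketch with made-up sample text; it only calls `hdfs dfs` when the client is actually on the PATH, so it does nothing harmful on a machine without Hadoop.

```shell
# Create a small sample text file for the word count (contents are made up).
sample=$(mktemp)
printf '%s\n' 'hello hive' 'hello hadoop' > "$sample"

# Upload only when the hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /user/input          # create the target directory if missing
  hdfs dfs -put -f "$sample" /user/input/ # -f overwrites an existing copy
else
  echo "hdfs not found; sample file left at $sample"
fi
```

Any plain text file with space-separated words would serve equally well as input.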
Create the source table t1: create table t1 (line string);
Load the data: load data inpath '/user/input' overwrite into table t1;
Write a HiveQL statement that implements word count and creates table wct1 to hold the result:
create table wct1 as select word, count(1) as count from (select explode (split (line, ' ')) as word from t1) w group by word order by word;
View the word count result:
select * from wct1;
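To sanity-check what the query returns, the same computation can be approximated locally on the input file. The sketch below splits each line on single spaces, groups by word, and sorts by word, which mirrors the split/explode/group by/order by steps of the query; `local_wordcount` is a hypothetical helper and the sample text is made up. Edge cases such as empty lines may be handled differently by Hive.

```shell
# Count words roughly the way the query does: split each line on single spaces,
# then group by word and sort by word. Output is "word<TAB>count".
local_wordcount() {
  awk -F'[ ]' '{ for (i = 1; i <= NF; i++) c[$i]++ }
       END { for (w in c) printf "%s\t%d\n", w, c[w] }' "$1" | LC_ALL=C sort
}

# Demo on a scratch file with two lines of sample text.
input=$(mktemp)
printf '%s\n' 'hello hive' 'hello hadoop' > "$input"
local_wordcount "$input"
```

For this sample the rows are hadoop 1, hello 2, hive 1, which is what wct1 should contain for the same input.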