Reference: Chen Kai's notes (work through the 50 related examples):
https://www.jianshu.com/nb/11486691
Hive Class Notes
1. Create a table psn1 to demonstrate loading data from a file into a Hive table
Data format: 1,xiaoming,tv-read-game,shanghai:pudong-beijing:chaoyang
CREATE TABLE IF NOT EXISTS PSN1
(
ID INT,
NAME STRING,
HOBBY ARRAY<STRING>,
ADDRESS MAP<STRING,STRING>
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
COLLECTION ITEMS TERMINATED BY '-'
MAP KEYS TERMINATED BY ':';
2. Load the data into the Hive table
hive> load data local inpath '/home/hadoop/hive/data' into table psn1;
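The three delimiters in the DDL can be sketched in plain Java to show how one raw line splits into the complex columns. This is only an illustration of the delimiter logic, not Hive's actual SerDe; the class and method names are made up for the sketch:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class RowParse {
    // Split the ADDRESS field the way the DDL's delimiters describe:
    // fields by ',', collection items by '-', map keys by ':'
    public static Map<String, String> parseAddress(String line) {
        String[] fields = line.split(",");
        Map<String, String> address = new LinkedHashMap<>();
        for (String item : fields[3].split("-")) { // COLLECTION ITEMS TERMINATED BY '-'
            String[] kv = item.split(":");         // MAP KEYS TERMINATED BY ':'
            address.put(kv[0], kv[1]);
        }
        return address;
    }

    public static void main(String[] args) {
        String line = "1,xiaoming,tv-read-game,shanghai:pudong-beijing:chaoyang";
        String[] fields = line.split(",");         // FIELDS TERMINATED BY ','
        System.out.println("id=" + fields[0] + " name=" + fields[1]);
        System.out.println("hobby=" + Arrays.asList(fields[2].split("-")));
        System.out.println("address=" + parseAddress(line));
    }
}
```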
3. Internal, external, and temporary tables
Internal (managed) table: Hive manages both the metadata and the data on HDFS.
If you drop the table, the data on HDFS is deleted as well.
External table: Hive manages only the metadata; dropping the table does not delete the data on HDFS.
The keyword is EXTERNAL.
A LOCATION needs to be configured; the data files are stored at that path.
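A minimal sketch of an external-table DDL; the table name and HDFS path are example values, not from the notes:

```sql
-- EXTERNAL keyword + LOCATION: dropping this table leaves the files in place
CREATE EXTERNAL TABLE IF NOT EXISTS psn_ext
(
ID INT,
NAME STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/hadoop/psn_ext';
```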
4. DDL & DML
DDL: the basic structures are very close to standard SQL.
DML:
loading files into tables;
inserting data into Hive tables from queries
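A hedged sketch of "inserting data into Hive tables from queries"; psn1_copy is a hypothetical target table, created here with the same schema as psn1:

```sql
-- copy psn1's schema, then fill the new table from a query
CREATE TABLE psn1_copy LIKE psn1;
INSERT INTO TABLE psn1_copy SELECT * FROM psn1;
```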
5. Today's topics
### HiveServer2 Client and Beeline
HiveServer2
Beeline
### Operators and UDFs
Operators
UDFs: User-Defined Functions
### All Hive topics
# HiveServer2 Client and Beeline
HiveServer2 must be running before you start Beeline.
Dependencies of HS2
HS2 depends on the Metastore, so start the metastore first:
hive --service metastore &
The Hadoop cluster must also be running:
start-all.sh
Start HS2 (two ways):
$HIVE_HOME/bin/hiveserver2
$HIVE_HOME/bin/hive --service hiveserver2
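Once HS2 is up, a Beeline session can be opened against it; a hedged sketch, reusing the host, port, and user from the JDBC example later in these notes:

```shell
# -u is the JDBC URL, -n the user name (the Hadoop cluster user)
beeline -u "jdbc:hive2://192.168.4.110:10000/default" -n hadoop
```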
## Before starting the hiveserver2 service, edit hive-site.xml
and add the following configuration:
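The notes do not include the snippet itself; as an assumption, a commonly used hive-site.xml fragment for HS2 pins the Thrift host and port (the IP here matches the JDBC example below):

```xml
<!-- assumption: typical HS2 settings; the original snippet is missing from the notes -->
<property>
  <name>hive.server2.thrift.bind.host</name>
  <value>192.168.4.110</value>
</property>
<property>
  <name>hive.server2.thrift.port</name>
  <value>10000</value>
</property>
```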
# Write a JDBC program that connects to Hive and operates on tables
### JDBC Client Sample Code (official reference)
https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients
1) Create a Java project
2) Add the Hive and Hadoop jars to the project
3) Create the data file; note the format (fields separated by Ctrl-A, i.e. \x01)
echo -e '1\x01foo' >> /home/hadoop/hive/demo
echo -e '2\x01rdd' >> /home/hadoop/hive/demo
---
Connection con = DriverManager.getConnection("jdbc:hive2://192.168.4.110:10000/default", "hadoop", "");
---
Note: the username and password here are those of the Hadoop cluster.
package com.hive;

import java.sql.SQLException;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import java.sql.DriverManager;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        Connection con = DriverManager.getConnection("jdbc:hive2://192.168.4.110:10000/default", "hadoop", "");
        Statement stmt = con.createStatement();
        String tableName = "testHiveDriverTable";
        stmt.execute("drop table if exists " + tableName);
        stmt.execute("create table " + tableName + " (key int, value string)");

        // show tables
        String sql = "show tables '" + tableName + "'";
        System.out.println("Running: " + sql);
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
            System.out.println(res.getString(1));
        }

        // describe table
        sql = "describe " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }

        // load data into table
        // NOTE: filepath has to be local to the hive server
        // NOTE: /tmp/a.txt is a ctrl-A separated file with two fields per line
        // String filepath = "/tmp/a.txt";
        String filepath = "/home/hadoop/hive/demo";
        sql = "load data local inpath '" + filepath + "' into table " + tableName;
        System.out.println("Running: " + sql);
        stmt.execute(sql);

        // select * query
        sql = "select * from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(String.valueOf(res.getInt(1)) + "\t" + res.getString(2));
        }

        // regular hive query
        sql = "select count(1) from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }
    }
}
### Operators and UDFs
#### Creating Custom UDFs (official reference)
https://cwiki.apache.org/confluence/display/Hive/HivePlugins
Step 1: create a class that extends UDF.
package com.example.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public final class Lower extends UDF {
    public Text evaluate(final Text s) {
        if (s == null) { return null; }
        return new Text(s.toString().toLowerCase());
    }
}
Step 2: package the code into a jar and add it to the Hive classpath:
hive> add jar /home/hadoop/Lower.jar;
Step 3: the first time you run Hive, register your function (the name must be the class's fully qualified name, matching its package declaration):
create temporary function lower as 'com.example.hive.udf.Lower';
Step 4: the function can now be used:
hive> select key,lower(value),value from testhivedrivertable;
OK
1 foo foo
2 rdd rdd
Time taken: 0.7 seconds, Fetched: 2 row(s)
### Row Format
Regular expressions (RegexSerDe)
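A hedged sketch of a regex-based row format; the table, columns, and pattern are example values (assuming log lines of three space-separated fields), not from the notes:

```sql
-- each capture group in input.regex maps to one column, in order
CREATE TABLE access_log
(
HOST STRING,
METHOD STRING,
PATH STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "(\\S+) (\\S+) (\\S+)"
)
STORED AS TEXTFILE;
```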
Wrap-up:
Hive tuning interview prep -- TODO