Note: if you connect through the Hive API, you need to use Hive beeline; if you use Spark SQL, you need to start Spark's ThriftServer.
After the installation above, typing hive directly gives you the CLI. This mode may be deprecated later in favor of beeline, but because many people still use the CLI, it has not been removed yet.
1. Start the hiveserver2 service: nohup $HIVE_HOME/bin/hiveserver2 > /tmp/hiveserver2.log 2>&1 &
2. Run the beeline script: $HIVE_HOME/bin/beeline
3. !connect jdbc:hive2://master:10000
This connects to the hiveserver2 service. Enter the username hadoop-jrq; no password is needed.
4. If you get the following error, add two properties to Hadoop's core-site.xml:
WARN jdbc.HiveConnection: Failed to connect to master:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://master:10000: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: hadoop-jrq is not allowed to impersonate anonymous (state=08S01,code=0)
<property>
  <name>hadoop.proxyuser.hadoop-jrq.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop-jrq.groups</name>
  <value>*</value>
</property>
Note: all three machines in the cluster need this configuration. After configuring, restart the cluster and then restart hiveserver2.
5. If you get an error saying HDFS is in safe mode, run hadoop dfsadmin -safemode leave
6. If there are no errors, you can now run SQL queries.
Dependency:
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
</dependency>
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HiveJdbcClient {
    private static String driverName = "org.apache.hive.jdbc.HiveDriver";

    public static void main(String[] args) throws SQLException {
        try {
            // Register the JDBC driver
            Class.forName(driverName);
        } catch (ClassNotFoundException e) {
            e.printStackTrace();
            System.exit(1);
        }
        // Replace "hadoop-jrq" here with the name of the user the queries should run as
        // Get a connection; the cluster is not in secure mode, so the password is empty
        Connection con =
                DriverManager.getConnection("jdbc:hive2://master:10000", "hadoop-jrq", "");
        // Create a Statement, after which Hive can be operated on with plain SQL
        Statement stmt = con.createStatement();
        // Database-qualified table name
        String tableName = "jrq.tracker_session";
        // Create the database only if it does not already exist
        stmt.execute("CREATE DATABASE IF NOT EXISTS jrq");
        // Create a table stored as Parquet
        stmt.execute("CREATE TABLE IF NOT EXISTS " + tableName + " (\n" +
                "  session_id string,\n" +
                "  session_server_time string,\n" +
                "  cookie string,\n" +
                "  cookie_label string,\n" +
                "  ip string,\n" +
                "  landing_url string,\n" +
                "  pageview_count int,\n" +
                "  click_count int,\n" +
                "  domain string,\n" +
                "  domain_label string)\n" +
                "STORED AS PARQUET");
        // show tables
        String sql = "show tables '" + tableName + "'";
        System.out.println("Running: " + sql);
        // Fetch the result set
        ResultSet res = stmt.executeQuery(sql);
        if (res.next()) {
            System.out.println(res.getString(1));
        }
        // describe table: show the table's columns and types
        sql = "describe " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }
        // Load data into the table
        // Without a scheme, the path is taken to be on HDFS by default
        // After a successful LOAD the files are moved out of this source directory
        sql =
                "LOAD DATA INPATH 'hdfs://slave1:8020/user/hadoop-jrq/example/output/trackerSession'" +
                " OVERWRITE INTO TABLE " + tableName;
        System.out.println("Running: " + sql);
        stmt.execute(sql);
        // select * query
        sql = "select * from " + tableName;
        System.out.println("Running: " + sql);
        // Fetch the results and print them row by row
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1) + "\t" + res.getString(2));
        }
        // A regular Hive query
        sql = "select count(1) from " + tableName;
        System.out.println("Running: " + sql);
        res = stmt.executeQuery(sql);
        while (res.next()) {
            System.out.println(res.getString(1));
        }
        // Close the resources when done
        res.close();
        stmt.close();
        con.close();
    }
}
To integrate Spark SQL with Hive:
1. Copy $HIVE_HOME/conf/hive-site.xml to $SPARK_HOME/conf:
cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf
2. Upload mysql-connector-java-x.x.x-bin.jar to $SPARK_HOME/jars
3. Restart the Spark cluster.
4. Use the $SPARK_HOME/bin/spark-sql script to operate on Hive.
This is the equivalent of the Hive CLI.
1. Start the Spark SQL HiveThriftServer.
On the server, run:
$SPARK_HOME/sbin/start-thriftserver.sh \
--hiveconf hive.server2.thrift.port=10000 \
--hiveconf hive.server2.thrift.bind.host=master \
--master spark://master:7077
Note: there is a pitfall here; see my other post: https://blog.csdn.net/weixin_42411818/article/details/100116987
After it starts, jps should show an additional SparkSubmit process.
2. Spark SQL beeline:
Run $SPARK_HOME/bin/beeline
then enter !connect jdbc:hive2://master:10000
The username is the account of your current machine; there is no password.
3. Accessing Spark SQL from code via JDBC
The client depends on hive-jdbc version 1.2.1,
and the JDBC URL is jdbc:hive2://master:10000
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
</dependency>
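Since the ThriftServer speaks the HiveServer2 protocol, the JDBC code looks just like the Hive example above. Below is a minimal sketch; the class name SparkSqlJdbcClient is made up, and it assumes the ThriftServer from step 1 is running on master:10000 and that the jrq.tracker_session table from earlier exists:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class SparkSqlJdbcClient {
    public static void main(String[] args) throws Exception {
        // Same HiveServer2 driver as before; the ThriftServer speaks the same protocol
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        // Username is the account of the current machine, password is empty (see step 2 above)
        Connection con =
                DriverManager.getConnection("jdbc:hive2://master:10000", "hadoop-jrq", "");
        Statement stmt = con.createStatement();
        // The query is executed by Spark SQL rather than Hive's own execution engine
        ResultSet res = stmt.executeQuery("select count(1) from jrq.tracker_session");
        while (res.next()) {
            System.out.println(res.getString(1));
        }
        res.close();
        stmt.close();
        con.close();
    }
}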
4. Reading and writing tables as a Spark SQL data source: table and saveAsTable
spark.read.table reads an existing table; saveAsTable saves a DataFrame as a new table.
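Below is a minimal sketch in Java (the Scala API is analogous). It assumes a SparkSession built with Hive support and the jrq.tracker_session table created earlier; the target table name jrq.tracker_session_copy and the class name SparkTableExample are made up for illustration:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class SparkTableExample {
    public static void main(String[] args) {
        // Hive support is needed so that table() and saveAsTable() go through the Hive metastore
        SparkSession spark = SparkSession.builder()
                .appName("SparkTableExample")
                .enableHiveSupport()
                .getOrCreate();
        // Read an existing Hive table
        Dataset<Row> sessions = spark.read().table("jrq.tracker_session");
        // Write it back out as a new managed table (target name is hypothetical)
        sessions.write()
                .mode(SaveMode.Overwrite)
                .saveAsTable("jrq.tracker_session_copy");
        spark.stop();
    }
}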