This article walks through the problems encountered while deploying Hive 3 and how to solve them.
Part 1: Deploying Hive
For the installation steps, see the separate post on installing and configuring Hive 3 integrated with Hadoop 3.
Part 2: Installing Tez as Hive's execution engine
Download the Tez release package from the download page and extract it into the installation directory; the official installation guide (in English) covers the same steps.
A brief summary of the installation:
1: Make sure Hadoop is deployed before Tez, with version 2.7.0 or later.
2: Build Tez from source. If you downloaded a prebuilt bin distribution, this step can be skipped; we use the bin version here.
3: Upload the Tez tarball to HDFS and configure tez-site.xml:
hadoop fs -mkdir /user/tez
hadoop fs -put ${TEZ_HOME}/tez.tar.gz /user/tez
3.1 In tez-site.xml, set the tez.lib.uris parameter, pointing it at the HDFS path we just put the tarball to:
<property>
  <name>tez.lib.uris</name>
  <value>/user/tez/tez.tar.gz</value>
</property>
Make sure tez.use.cluster.hadoop-libs is not set in tez-site.xml; if it is set, its value should be false.
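If tez.lib.uris points at a path that does not actually exist in HDFS, Tez jobs will fail at submission time, so it can be worth verifying the upload first. A minimal check, sketched in Java against the Hadoop FileSystem API (the class name TezLibUrisCheck is ours, not part of Tez; run it on a node whose classpath carries the cluster configuration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TezLibUrisCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml/hdfs-site.xml from the classpath.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // The path we uploaded with hadoop fs -put above.
        Path tezTarball = new Path("/user/tez/tez.tar.gz");
        if (fs.exists(tezTarball)) {
            System.out.println("tez.lib.uris target found, "
                    + fs.getFileStatus(tezTarball).getLen() + " bytes");
        } else {
            System.err.println("tez.lib.uris target missing: " + tezTarball);
        }
    }
}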
4: To run MapReduce jobs on top of Tez, change the following parameter in Hadoop's mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn-tez</value>
</property>
5: Update the client-node configuration so the Tez libraries are on Hadoop's classpath.
Edit hadoop-env.sh and append the following at the end of the file:
TEZ_CONF_DIR=/opt/programs/hadoop-3.2.2/etc/hadoop
TEZ_JARS=/opt/programs/apache-tez-0.9.2-bin
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
Note the "*": it is required when putting a directory of jar files on the classpath (a bare directory entry only picks up .class files, not jars). Also note that TEZ_CONF_DIR should point at the directory containing tez-site.xml rather than at the file itself, since classpath entries must be directories or jars.
6: tez-examples.jar contains a basic example of an MRR job; see OrderedWordCount.java in the source. To run the example (it expects an input path and an output path on HDFS):
hadoop jar tez-examples.jar orderedwordcount <input> <output>
Part 3: Hive multi-user support: adding a custom authentication mechanism and changing the Hadoop configuration
By default Hive performs no authentication, so anyone can access the data directly, which is insecure. Here we plug in a custom authentication mechanism, after which clients must supply credentials to connect to Hive via Beeline or JDBC.
The code mainly implements the PasswdAuthenticationProvider interface, filling in its method with our authentication logic:
package org.puppy.hive.auth.basic;

import javax.security.sasl.AuthenticationException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hive.service.auth.PasswdAuthenticationProvider;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class BasicUsernamePasswdAuthenticator implements PasswdAuthenticationProvider {

    private static final Logger LOGGER = LoggerFactory.getLogger(BasicUsernamePasswdAuthenticator.class);

    // Each user's password lives in hive-site.xml under hive.jdbc.passwd.<username>.
    private static final String HIVE_JDBC_PASSWD_AUTH_PREFIX = "hive.jdbc.passwd.%s";

    private Configuration conf = null;

    @Override
    public void Authenticate(String user, String password) throws AuthenticationException {
        LOGGER.info("user: " + user + " try login.");
        String passwdFromConf = getConf().get(String.format(HIVE_JDBC_PASSWD_AUTH_PREFIX, user));
        // Debug only: avoid logging passwords like this in production.
        LOGGER.info("configured password for user {} is {}, supplied password is {}", user, passwdFromConf, password);
        if (passwdFromConf == null) {
            String message = "user's ACL configuration is not found. user:" + user;
            LOGGER.info(message);
            throw new AuthenticationException(message);
        }
        if (!passwdFromConf.equals(password)) {
            String message = "user name and password do not match. user:" + user;
            LOGGER.error(message);
            throw new AuthenticationException(message);
        }
        LOGGER.info("authentication passed for user: " + user);
    }

    public Configuration getConf() {
        if (conf == null) {
            // Lazily load hive-site.xml so the per-user password properties are visible.
            this.conf = new Configuration(new HiveConf());
        }
        return conf;
    }

    public void setConf(Configuration conf) {
        this.conf = conf;
    }
}
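Before wiring the class into HiveServer2, you can convince yourself the logic behaves as intended with a small standalone harness; this is a sketch for local testing only (the class name AuthenticatorSmokeTest is ours, and we inject the password property directly instead of loading hive-site.xml):

import javax.security.sasl.AuthenticationException;

import org.apache.hadoop.conf.Configuration;
import org.puppy.hive.auth.basic.BasicUsernamePasswdAuthenticator;

public class AuthenticatorSmokeTest {
    public static void main(String[] args) throws Exception {
        BasicUsernamePasswdAuthenticator auth = new BasicUsernamePasswdAuthenticator();
        // Bypass hive-site.xml by injecting the per-user property directly.
        Configuration conf = new Configuration(false);
        conf.set("hive.jdbc.passwd.hadoop", "123456789");
        auth.setConf(conf);

        auth.Authenticate("hadoop", "123456789"); // should pass silently
        try {
            auth.Authenticate("hadoop", "wrong-password"); // should throw
        } catch (AuthenticationException expected) {
            System.out.println("rejected as expected: " + expected.getMessage());
        }
    }
}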
Then package the class into a jar, copy it into Hive's lib directory, and add the following to Hive's configuration (hive-site.xml):
<property>
  <name>hive.server2.authentication</name>
  <value>CUSTOM</value>
  <description>
    Expects one of [nosasl, none, ldap, kerberos, pam, custom].
    Client authentication types.
      NONE: no authentication check
      LDAP: LDAP/AD based authentication
      KERBEROS: Kerberos/GSSAPI authentication
      CUSTOM: Custom authentication provider
              (Use with property hive.server2.custom.authentication.class)
      PAM: Pluggable authentication module
      NOSASL: Raw transport
  </description>
</property>
<property>
  <name>hive.server2.custom.authentication.class</name>
  <value>org.puppy.hive.auth.basic.BasicUsernamePasswdAuthenticator</value>
</property>
<property>
  <name>hive.jdbc.passwd.hadoop</name>
  <value>123456789</value>
</property>
The above configures one user, hadoop, whose password is 123456789; we use this account to connect to Hive and operate on the data in HDFS. The username hadoop is matched against the String.format pattern in the code, which is why a single property per user is all that is needed.
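Concretely, the lookup works like this (illustrative snippet; the class name KeyLookupDemo and the second user bob are ours):

public class KeyLookupDemo {
    public static void main(String[] args) {
        // How the authenticator maps a login name to a hive-site.xml property:
        String key = String.format("hive.jdbc.passwd.%s", "hadoop");
        System.out.println(key); // prints hive.jdbc.passwd.hadoop, matching the property above
        // To add another user, say bob, you would just add a hive.jdbc.passwd.bob property.
    }
}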
If we stop at this point in the configuration, start Hive, and try connecting with Beeline:
$ nohup hiveserver2 >> /opt/programs/apache-hive-3.1.2-bin/logs/hive.log &
$ beeline
Beeline version 3.1.2 by Apache Hive
beeline> !connect jdbc:hive2://hadoop000:10000
Enter username for jdbc:hive2://hadoop000:10000: hadoop
Enter password for jdbc:hive2://hadoop000:10000: *********
Pressing Enter yields an error:
Error: Could not open client transport with JDBC Uri:
jdbc:hive2://hadoop000:10000: Failed to open new session: java.lang.RuntimeException:
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException):
User: puppy is not allowed to impersonate hadoop (state=08S01,code=0)
The HiveServer2 log in the background shows a matching exception:
puppy is not allowed to impersonate hadoop
To explain: this means the user puppy (the OS user running HiveServer2) is not allowed to masquerade as the user hadoop (the account we logged in with). Why does this happen? It comes down to Hadoop's security mechanism: HDFS cannot be operated on by just any user; a superuser must act in place of the end user for the operation to go through. Hadoop provides an impersonation mechanism for this, documented as Superusers Acting On Behalf Of Other Users, whereby the superuser can submit tasks under a proxy user's identity. Two configuration changes are needed:
hive-site.xml:
<property>
  <name>hive.server2.enable.doAs</name>
  <value>true</value>
</property>
core-site.xml:
<property>
  <name>hadoop.proxyuser.puppy.hosts</name>
  <value>hadoop000,172.24.163.174,localhost,127.0.0.1</value>
</property>
<property>
  <name>hadoop.proxyuser.puppy.groups</name>
  <value>supergroup</value>
</property>
<property>
  <name>hadoop.proxyuser.puppy.users</name>
  <value>hadoop,bob,joe</value>
</property>
Refresh the proxy-user configuration (no full restart required):
hdfs dfsadmin -refreshSuperUserGroupsConfiguration
yarn rmadmin -refreshSuperUserGroupsConfiguration
To be on the safe side, restart Hadoop as well, and the connection then works.
If hadoop.proxyuser.puppy.hosts above is misconfigured, you will see the following error:
Caused by: org.apache.hadoop.ipc.RemoteException: Unauthorized connection for super-user: puppy from IP /172.24.163.174
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1562) ~[hadoop-common-3.2.2.jar:?]
at org.apache.hadoop.ipc.Client.call(Client.java:1508) ~[hadoop-common-3.2.2.jar:?]
at org.apache.hadoop.ipc.Client.call(Client.java:1405) ~[hadoop-common-3.2.2.jar:?]
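To make impersonation less abstract, here is what it looks like at the Hadoop API level. This is a minimal sketch, not part of the Hive setup itself; it assumes you run it as the superuser (puppy) on a node with the cluster configuration on the classpath, and the class name ImpersonationSketch is ours:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonationSketch {
    public static void main(String[] args) throws Exception {
        // The real (super)user, e.g. puppy, who actually holds the credentials.
        UserGroupInformation realUser = UserGroupInformation.getLoginUser();
        // Act as "hadoop"; HDFS authorizes this only if the
        // hadoop.proxyuser.puppy.* rules in core-site.xml allow it.
        UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser("hadoop", realUser);
        proxyUgi.doAs((PrivilegedExceptionAction<Void>) () -> {
            FileSystem fs = FileSystem.get(new Configuration());
            // This listing runs with "hadoop" as the effective user.
            for (FileStatus status : fs.listStatus(new Path("/user"))) {
                System.out.println(status.getPath());
            }
            return null;
        });
    }
}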
Part 4: Connecting to Hive from a JDBC client
The code is as follows (using the Druid connection pool):
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import com.alibaba.druid.pool.DruidDataSource;
import com.alibaba.druid.pool.DruidPooledConnection;

public class HiveJdbcClient {
    public static void main(String[] args) throws SQLException {
        DruidDataSource source = new DruidDataSource();
        source.setUrl("jdbc:hive2://hadoop000:10000");
        source.setDbType("hive");
        // The credentials configured in hive-site.xml above.
        source.setUsername("hadoop");
        source.setPassword("123456789");

        // try-with-resources closes the statement and connection in the right order.
        try (DruidPooledConnection connection = source.getConnection();
             PreparedStatement statement =
                     connection.prepareStatement("select * from test_db.u_data")) {
            System.out.println("Got connection: " + connection);
            try (ResultSet resultSet = statement.executeQuery()) {
                while (resultSet.next()) {
                    String phone = resultSet.getString("phone");
                    System.out.println("Phone number: " + phone);
                }
            }
        }
    }
}
Maven dependencies:
<dependency>
  <groupId>org.apache.hive</groupId>
  <artifactId>hive-jdbc</artifactId>
  <version>3.1.2</version>
</dependency>
<dependency>
  <groupId>com.alibaba</groupId>
  <artifactId>druid</artifactId>
  <version>1.2.5</version>
</dependency>
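If a connection pool is not needed, plain JDBC works as well. A minimal sketch using DriverManager and the standard Hive driver shipped by the hive-jdbc dependency above (the class name PlainJdbcClient is ours, and the limit 10 is only to keep the output short):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PlainJdbcClient {
    public static void main(String[] args) throws Exception {
        // hive-jdbc registers org.apache.hive.jdbc.HiveDriver with DriverManager.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hadoop000:10000", "hadoop", "123456789");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select * from test_db.u_data limit 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}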