Connecting to Hive and HDFS from Python 3.6.5 with Kerberos authentication

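1. Kerberos is a computer network authentication protocol that lets parties prove their identity to each other securely over a non-secure network. See the official site for details.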

2. Packages to install (on CentOS)

# libsasl2-dev is the Debian/Ubuntu package name; on CentOS the SASL headers come from cyrus-sasl-devel
yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64
yum install krb5-devel
yum install python-krbV

pip install krbcontext==0.9
pip install thrift==0.9.3
pip install thrift-sasl==0.2.1
pip install impyla==0.14.1
pip install hdfs[kerberos]
pip install pykerberos==1.2.1

3. Configure /etc/krb5.conf: set the Kerberos realm (domain) that your servers belong to in this file.
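
As a rough illustration of what this step involves, the sketch below embeds a minimal krb5.conf as a string and pulls out the default realm and the KDC hosts. The realm EXAMPLE.COM and the host kdc.example.com are hypothetical placeholders, and the two helper functions are simplified readers for this flat layout only, not a full krb5.conf parser.

```python
import re

# Hypothetical minimal krb5.conf; EXAMPLE.COM and kdc.example.com are placeholders.
SAMPLE_KRB5_CONF = """
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.example.com
        admin_server = kdc.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
"""

def default_realm(conf_text):
    """Return the default_realm value, or None if it is not set."""
    match = re.search(r"^\s*default_realm\s*=\s*(\S+)", conf_text, re.MULTILINE)
    return match.group(1) if match else None

def kdc_hosts(conf_text):
    """Return every 'kdc = host' value found in the text, in file order."""
    return re.findall(r"^\s*kdc\s*=\s*(\S+)", conf_text, re.MULTILINE)

print(default_realm(SAMPLE_KRB5_CONF))
print(kdc_hosts(SAMPLE_KRB5_CONF))
```

To check a real machine, the same helpers could be applied to the contents of /etc/krb5.conf read with open().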

4. Configure /etc/hosts: add entries for the cluster machines and for the machine that hosts the Kerberos realm.
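
A quick way to confirm that nothing was forgotten in this step is to compare the hosts file against the list of machines you expect to reach. The sketch below does this on a sample string; all addresses and host names in it are hypothetical, and parse_hosts and missing_hosts are illustrative helpers, not part of any library.

```python
def parse_hosts(text):
    """Map each hostname or alias in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            mapping.setdefault(name, ip)  # keep the first entry seen for a name
    return mapping

def missing_hosts(text, required):
    """Return the required hostnames that have no entry in the hosts text."""
    known = parse_hosts(text)
    return [name for name in required if name not in known]

# Hypothetical entries; the addresses and names are placeholders.
SAMPLE_HOSTS = """
127.0.0.1   localhost
10.0.0.11   namenode.example.com namenode
10.0.0.12   hiveserver.example.com hiveserver   # HiveServer2
"""

print(missing_hosts(SAMPLE_HOSTS, ["namenode.example.com", "kdc.example.com"]))
```

Here the KDC host is reported as missing, which is the kind of gap that tends to surface later as a confusing authentication failure.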

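5. Use kinit to obtain a ticket cache (ccache_file), or have a keytab file (keytab_file) ready. Note that kinit itself only creates the ticket cache; a keytab is normally produced with ktutil or kadmin.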
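
For scripted setups, the kinit call can be assembled and checked from Python. This is a minimal sketch assuming MIT Kerberos command-line tools are installed; build_kinit_command and has_valid_ticket are hypothetical helper names, and 'xxx.keytab' and 'xxx' are the same placeholders used elsewhere in this article.

```python
import subprocess

def build_kinit_command(keytab_file, principal, ccache_file=None):
    """Build a kinit command that logs in from a keytab without a password prompt."""
    command = ["kinit", "-k", "-t", keytab_file]
    if ccache_file is not None:
        command += ["-c", ccache_file]  # write the ticket to this cache
    command.append(principal)
    return command

def has_valid_ticket():
    """Return True if `klist -s` exits with 0, i.e. a usable ticket cache exists."""
    return subprocess.call(["klist", "-s"]) == 0

# Only prints the command; pass the list to subprocess.check_call to run it.
print(" ".join(build_kinit_command("xxx.keytab", "xxx")))
```

Building the command as a list rather than a shell string avoids quoting problems when the keytab path contains spaces.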

6. Code for connecting to Hive:

import os
from impala.dbapi import connect
from krbcontext import krbcontext

# The keytab file sits next to this script; the 'xxx' values are placeholders
keytab_path = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'xxx.keytab')
principal = 'xxx'
ip = 'xxx'  # HiveServer2 host (placeholder)

# krbcontext obtains a ticket from the keytab for the duration of the block
with krbcontext(using_keytab=True, principal=principal, keytab_file=keytab_path):
    conn = connect(host=ip, port=10000, auth_mechanism='GSSAPI', kerberos_service_name='hive')
    cursor = conn.cursor()
    cursor.execute('SELECT * FROM default.books')
    for row in cursor:
        print(row)
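
The rows printed above are plain tuples. Since the cursor follows the Python DB-API, column names are available through cursor.description, so rows can be turned into dicts with a small helper. The sketch below is an illustration only: rows_as_dicts is a hypothetical helper, and sqlite3 stands in for Hive so the example runs without a cluster; the books table layout is invented for the demo.

```python
import sqlite3

def rows_as_dicts(cursor):
    """Yield each remaining row of an executed DB-API cursor as a dict."""
    columns = [col[0] for col in cursor.description]
    for row in cursor:
        yield dict(zip(columns, row))

# Stand-in demo: sqlite3 replaces Hive so the sketch runs anywhere.
demo = sqlite3.connect(":memory:")
cur = demo.cursor()
cur.execute("CREATE TABLE books (id INTEGER, title TEXT)")
cur.execute("INSERT INTO books VALUES (1, 'Dune')")
cur.execute("SELECT * FROM books")
print(list(rows_as_dicts(cur)))
```

With the Hive cursor, the same helper would be called right after cursor.execute(...). Depending on server settings, Hive may report column names prefixed with the table name.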

 

7. Code for connecting to HDFS:

 

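from hdfs.ext.kerberos import KerberosClient
from krbcontext import krbcontext

# In the original class, the keytab was first fetched and saved by the author's own
# helpers, self._get_keytab(sso_ticket) and self._save_keytab(data); here the keytab
# file is assumed to exist already.
def list_hdfs_dir(host, port, path, keytab_file, principal):
    hdfs_url = 'http://' + host + ':' + str(port)
    with krbcontext(using_keytab=True, keytab_file=keytab_file, principal=principal):
        client = KerberosClient(hdfs_url)
        # Get the files and directories under path. _list_status is a private method
        # of the client; client.list(path, status=True) is the public equivalent.
        return client._list_status(path).json()['FileStatuses']['FileStatus']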
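
The listing call returns the WebHDFS FileStatus entries, one dict per file or directory, with the name in pathSuffix and the kind in type. A minimal sketch of separating the two kinds follows; split_entries is a hypothetical helper, and the sample data is invented and trimmed to the fields used here.

```python
def split_entries(file_statuses):
    """Split WebHDFS FileStatus dicts into (directory names, file names)."""
    dirs = [s["pathSuffix"] for s in file_statuses if s["type"] == "DIRECTORY"]
    files = [s["pathSuffix"] for s in file_statuses if s["type"] == "FILE"]
    return dirs, files

# Hypothetical sample trimmed to the fields used here.
SAMPLE = [
    {"pathSuffix": "warehouse", "type": "DIRECTORY", "length": 0},
    {"pathSuffix": "books.csv", "type": "FILE", "length": 2048},
]

dirs, files = split_entries(SAMPLE)
print(dirs, files)
```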

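8. Notes: the krbcontext package is officially documented as supporting Python 2, but it also works with Python 3.

    The hdfs_url must start with "http://"; otherwise an error is raised.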
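
One way to avoid the missing-scheme mistake is to build the URL through a small helper that adds the scheme when the host does not already carry one. This is a sketch under assumptions: webhdfs_url is a hypothetical helper, and the host name and port in the example are placeholders.

```python
def webhdfs_url(host, port, scheme="http"):
    """Build a WebHDFS base URL, adding the scheme only if host lacks one."""
    if "://" in host:
        base = host.rstrip("/")  # scheme already present, just tidy the end
    else:
        base = "{}://{}".format(scheme, host)
    return "{}:{}".format(base, port)

# Placeholder host and port.
print(webhdfs_url("namenode.example.com", 50070))
print(webhdfs_url("http://namenode.example.com/", "50070"))
```

Both calls print the same URL, so the result can be passed to KerberosClient regardless of how the host was written in the configuration.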

 

9. I have since added some further configuration-file settings; see my newer article for details:

https://blog.csdn.net/u012133034/article/details/100704082
