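This document describes:
1. Setting up the unixODBC driver manager, so that the database can be accessed through ODBC
2. Setting up pyodbc, so that Python scripts can operate on the database through the system's ODBC layer
3. The benefits of this architecture:
- Accessing the database through ODBC performs better than accessing it through JDBC
- Because the work is written as Python scripts, a scheduling platform can later run them for data analysis, for example as scheduled jobs that execute the scripts periodically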
Preparation
(1) Operating system (essential development tools such as gcc must already be installed)
[root@CDHA ~]# cat /etc/redhat-release
CentOS release 6.7 (Final)
[root@CDHA ~]$ python
Python 2.6.2 (r262:71600, May 12 2009, 15:34:31)
[GCC 4.1.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
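(2) Required software packages
greenplum-connectivity-4.3.8.2-build-1-RHEL5-x86_64.zip
--the GP JDBC and ODBC drivers, downloaded from the Greenplum website
pyodbc-3.0.10.tar.gz
--the pyodbc package that Python needs to connect to GP
unixODBC-2.2.12.tar.gz
--the unixODBC driver manager
(3) Upload the packages above to the CDHA server where the environment is being built, for example to /software/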
Installing the GP driver package
1. Unzip greenplum-connectivity-4.3.8.2-build-1-RHEL5-x86_64.zip
unzip greenplum-connectivity-4.3.8.2-build-1-RHEL5-x86_64.zip
2. Run the greenplum-connectivity-4.3.8.2-build-1-RHEL5-x86_64.bin installer extracted from the archive
bash greenplum-connectivity-4.3.8.2-build-1-RHEL5-x86_64.bin
(part of the output omitted)
********************************************************************
Do you accept the EMC Connectivity license agreement? [yes | no]
********************************************************************
yes ---------accept the license
********************************************************************
Provide the installation path for Greenplum Connectivity or press ENTER to
Accept the default installation path: greenplum-connectivity-4.3.8.2-build-1
********************************************************************
********************************************************************
Install Greenplum Connectivity into? [yes | no]
********************************************************************
yes ----------------keep the default installation path (you can also specify any other path)
********************************************************************
/usr/local/greenplum-connectivity-4.3.8.2-build-1 does not exist.
Create /usr/local/greenplum-connectivity-4.3.8.2-build-1 ? [ yes | no ]
(Selecting no will exit the installer)
********************************************************************
yes ----------------create the installation directory
Extracting product to /usr/local/greenplum-connectivity-4.3.8.2-build-1
********************************************************************
Installation complete.
Greenplum Connectivity is installed in /usr/local/greenplum-connectivity-4.3.8.2-build-1
************************************************************************
3. Configure the Greenplum DB driver
The ODBC drivers in the installation directory are:
[root@CDHA software]# ll /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc
total 24
drwxr-xr-x 3 gpadmin gpadmin 4096 May 10 13:34 psqlodbc-08.02.0400
drwxr-xr-x 6 gpadmin gpadmin 4096 May 10 13:37 psqlodbc-08.02.0500
drwxr-xr-x 3 gpadmin gpadmin 4096 May 10 13:37 psqlodbc-08.03.0400
drwxr-xr-x 3 gpadmin gpadmin 4096 May 10 13:38 psqlodbc-08.04.0200
drwxr-xr-x 3 gpadmin gpadmin 4096 May 10 13:38 psqlodbc-09.00.0200
drwxr-xr-x 3 gpadmin gpadmin 4096 May 10 13:38 psqlodbc-09.02.0100
[root@CDHA software]#
There are several driver versions. We choose psqlodbc-08.02.0500; listing its directory shows:
[root@CDHA software]# ll /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/
total 48
drwxr-xr-x 2 gpadmin gpadmin 4096 May 10 13:37 datadirect-51sp2_64
drwxr-xr-x 2 gpadmin gpadmin 4096 May 10 13:37 datadirect-52_64
drwxr-xr-x 2 gpadmin gpadmin 4096 May 10 13:37 datadirect-53sp2_64
-rwxr-xr-x 1 gpadmin gpadmin 25746 May 10 13:36 license.txt
-rwxr-xr-x 1 gpadmin gpadmin 1383 May 10 13:36 readme.txt
drwxr-xr-x 3 gpadmin gpadmin 4096 Jul 6 11:38 unixodbc-2.2.12
Likewise, here we can see the driver-manager directories.
Because GP is based on PostgreSQL 8.2, we choose psqlodbc-08.02.0500 as the driver and datadirect-52_64 as the driver manager.
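The selection rule above (match the driver's major.minor version to the PostgreSQL version GP is based on) can be written as a small Python sketch. The helper name pick_driver is hypothetical, and it assumes that within a matching major.minor the highest build number is preferred, which is consistent with choosing 0500 over 0400 here:

```python
import re

# Directory names look like psqlodbc-08.02.0500 (major.minor.build),
# as in the drivers/odbc listing above.
_DRIVER_RE = re.compile(r'^psqlodbc-(\d+)\.(\d+)\.(\d+)$')


def pick_driver(dir_names, pg_major, pg_minor):
    """Return the newest psqlodbc directory for a PostgreSQL major.minor.

    Hypothetical helper: among names whose major.minor matches, the one
    with the highest build number wins. Returns None if nothing matches.
    """
    best = None
    best_build = None
    for name in dir_names:
        m = _DRIVER_RE.match(name)
        if not m:
            continue  # ignore anything that is not a psqlodbc-X.Y.Z directory
        major, minor, build = (int(g) for g in m.groups())
        if (major, minor) != (pg_major, pg_minor):
            continue
        if best_build is None or build > best_build:
            best, best_build = name, build
    return best


if __name__ == '__main__':
    names = ['psqlodbc-08.02.0400', 'psqlodbc-08.02.0500',
             'psqlodbc-08.03.0400', 'psqlodbc-08.04.0200',
             'psqlodbc-09.00.0200', 'psqlodbc-09.02.0100']
    print(pick_driver(names, 8, 2))  # psqlodbc-08.02.0500
```

The driver-manager subdirectory (datadirect-52_64 here) is still chosen by hand; the sketch only covers the version match.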
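Accordingly, set these values in greenplum_connectivity_path.sh:
GP_ODBC_DRIVER=psqlodbc-08.02.0500            # must match the actual directory name
GP_ODBC_DRIVER_MANAGER=datadirect-52_64       # must match the actual directory name
Note: by default this file's permissions are 444 (read-only), so it cannot be edited as is. Either change the permissions of this one file, or set the whole installation directory to 755: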
chmod -R 755 /usr/local/greenplum-connectivity-4.3.8.2-build-1
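If you prefer to change only the one file, the Python equivalent of chmod u+w is sketched below. make_owner_writable is a hypothetical helper name; it adds only the owner's write bit (444 becomes 644) instead of changing the whole tree:

```python
import os
import stat


def make_owner_writable(path):
    """Add the owner write bit to path (like chmod u+w) and return the new mode.

    All other permission bits are left unchanged.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Call it with the path of greenplum_connectivity_path.sh in your installation directory.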
After saving greenplum_connectivity_path.sh, remember to source it so that the environment variables take effect:
source greenplum_connectivity_path.sh
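Because both values must match real directory names, a typo only shows up later as a connection failure. A minimal Python sketch that checks them up front follows; read_gp_vars and check_gp_vars are hypothetical names, and the parser assumes plain NAME=value assignments like the two lines above (optionally prefixed by export), not arbitrary shell syntax:

```python
import os
import re

# Matches NAME=value for the two variables, optionally prefixed by "export"
# and followed by a "#" comment (assumed layout, as in the lines above).
_ASSIGN_RE = re.compile(
    r'^\s*(?:export\s+)?(GP_ODBC_DRIVER(?:_MANAGER)?)=([^\s#]+)')


def read_gp_vars(script_text):
    """Return {name: value} for GP_ODBC_DRIVER and GP_ODBC_DRIVER_MANAGER."""
    found = {}
    for line in script_text.splitlines():
        m = _ASSIGN_RE.match(line)
        if m:
            # Later assignments override earlier ones, as in the shell
            found[m.group(1)] = m.group(2).strip('"\'')
    return found


def check_gp_vars(install_root, gp_vars):
    """Return a list of problems; an empty list means both directories exist."""
    driver = gp_vars.get('GP_ODBC_DRIVER')
    manager = gp_vars.get('GP_ODBC_DRIVER_MANAGER')
    if not driver or not manager:
        return ['GP_ODBC_DRIVER and GP_ODBC_DRIVER_MANAGER must both be set']
    # Layout from the listings above: drivers/odbc/<driver>/<manager>
    driver_dir = os.path.join(install_root, 'drivers', 'odbc', driver)
    manager_dir = os.path.join(driver_dir, manager)
    return ['missing directory: ' + d
            for d in (driver_dir, manager_dir) if not os.path.isdir(d)]
```

For example, read the script's text and call check_gp_vars('/usr/local/greenplum-connectivity-4.3.8.2-build-1', read_gp_vars(text)); an empty list means both names point at existing directories.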
Installing the unixODBC driver manager
1. Build and install unixODBC
tar -zxvf unixODBC-2.2.12.tar.gz
cd unixODBC-2.2.12
./configure --prefix=/etc/unixODBC --enable-fdb --disable-gui
make
make install
2. Check the unixODBC installation directory
[root@CDHA software]# ll /etc/unixODBC/
total 16
drwxr-xr-x 2 root root 4096 Jul 5 23:19 bin
drwxr-xr-x 3 root root 4096 Jul 6 10:09 etc
drwxr-xr-x 2 root root 4096 Jul 5 23:19 include
drwxr-xr-x 2 root root 4096 Jul 6 11:47 lib
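3. Edit the two configuration files under unixODBC's etc directory as shown below (the ---- annotations are explanations, not part of the files):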
[root@CDHA unixODBC]# cat /etc/unixODBC/etc/odbc.ini
[GreenplumDSN]
Driver = Greenplum ----must match the section name in /etc/unixODBC/etc/odbcinst.ini
Trace = 1
Debug = 1
Database = zhangyun_db_safe ----GP database name
Servername = 192.168.1.24 ----GP host IP address
UserName = zhangyun ----GP user name
Password = xxxxxx ----GP user password
Port = 5432 ----GP port
ReadOnly = No
RowVersioning = No
DisallowPremature = No
ShowSystemTables = Yes
ShowOidColumn = No
FakeOidIndex = No
useDeclareFetch = 1
Fetch = 4096
UpdatableCursors = Yes
Protocol = 7.4-1
[root@CDHA unixODBC]# cat /etc/unixODBC/etc/odbcinst.ini
[Greenplum]
Description=PostgreSQL driver for Greenplum
Driver=/usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/unixodbc-2.2.12/psqlodbcw.so ----the GP ODBC driver library
UsageCount=1
FileUsage=1
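The two files are linked only by name: the DSN's Driver value must equal a section name in odbcinst.ini, and that section's Driver must point at an existing library. A minimal Python 3 sketch of that consistency check follows. check_odbc_config is a hypothetical name, and it assumes Driver= in odbc.ini refers to a section name, as in the configuration above:

```python
import configparser
import os


def _load(text):
    # RawConfigParser: no %-interpolation; option names are lower-cased
    parser = configparser.RawConfigParser()
    parser.read_string(text)
    return parser


def check_odbc_config(odbc_ini_text, odbcinst_ini_text, check_files=True):
    """Return a list of problems found in an odbc.ini / odbcinst.ini pair."""
    dsns = _load(odbc_ini_text)
    drivers = _load(odbcinst_ini_text)
    problems = []
    for dsn in dsns.sections():
        if not dsns.has_option(dsn, 'driver'):
            continue
        name = dsns.get(dsn, 'driver')
        if not drivers.has_section(name):
            problems.append('DSN %s: driver %r not defined in odbcinst.ini'
                            % (dsn, name))
    if check_files:
        # Each driver section should point at a library file that exists
        for name in drivers.sections():
            if drivers.has_option(name, 'driver'):
                path = drivers.get(name, 'driver')
                if not os.path.isfile(path):
                    problems.append('driver %s: file not found: %s'
                                    % (name, path))
    return problems
```

Pass it the contents of /etc/unixODBC/etc/odbc.ini and odbcinst.ini. An annotation accidentally copied into a value (for example "Greenplum ----...") makes the name lookup fail and is reported.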
4. Test the connection with isql
[root@CDHA unixODBC]# isql GreenplumDSN zhangyun xxxxxx
+---------------------------------------+
| Connected! |
| |
| sql-statement |
| help [tablename] |
| quit |
| |
+---------------------------------------+
SQL> select count(1) from test_bigtable;
+---------------------+
| count |
+---------------------+
| 1791144834 |
+---------------------+
SQLRowCount returns -1
1 rows fetched
SQL> select user;
+-----------------------------------------------------------------+
| current_user |
+-----------------------------------------------------------------+
| zhangyun |
+-----------------------------------------------------------------+
SQLRowCount returns -1
1 rows fetched
SQL>
Note: if running isql produces the following:
[root@CDHA unixODBC]# isql GreenplumDSN
[ISQL]ERROR: Could not SQLConnect
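In most cases this happens because greenplum_connectivity_path.sh has not been sourced in the current shell. Run source greenplum_connectivity_path.sh and try again. To avoid the problem permanently, source the file from a system-wide environment script such as /etc/profile.
If it still fails, add the -v option to see detailed error messages, and see the solutions in the troubleshooting section at the end.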
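We have not inspected exactly what greenplum_connectivity_path.sh exports. The ldd output in the troubleshooting section at the end, where libpq.so.5 resolves from the datadirect-52_64 directory, suggests that it puts that directory on the library search path. Under that assumption, a quick check of the current environment can be sketched as follows (on_ld_library_path is a hypothetical name):

```python
import os


def on_ld_library_path(directory, environ=None):
    """Return True if directory is an entry of LD_LIBRARY_PATH.

    Paths are normalized, so a trailing slash does not matter.
    """
    if environ is None:
        environ = os.environ
    target = os.path.normpath(directory)
    entries = environ.get('LD_LIBRARY_PATH', '').split(':')
    return any(e and os.path.normpath(e) == target for e in entries)
```

For example, on_ld_library_path('/usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64') returning False in the shell where isql fails would point at the missing source step.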
Installing pyodbc
1. Build and install pyodbc
Before building, make sure the following packages are installed:
[root@CDHA pyodbc-3.0.10]# yum install gcc-c*
[root@CDHA pyodbc-3.0.10]# yum install compat-gcc-34-c++
[root@CDHA pyodbc-3.0.10]# yum install unixODBC-devel
[root@CDHA pyodbc-3.0.10]# yum install python-devel
If any other libraries are missing, install them with yum as well.
Then build and install pyodbc:
tar -zxvf pyodbc-3.0.10.tar.gz
cd pyodbc-3.0.10
python setup.py build
python setup.py install
2. Check the pyodbc installation
[root@CDHA pyodbc-3.0.10]# ll /usr/lib64/python2.6/site-packages/pyodbc*
-rwxr-xr-x 1 root root 913 Jul 6 14:34 /usr/lib64/python2.6/site-packages/pyodbc-3.0.10-py2.6.egg-info
-rwxr-xr-x 1 root root 391676 Jul 6 14:34 /usr/lib64/python2.6/site-packages/pyodbc.so
Testing with a Python script
1. Prepare a Python test script:
[zhangyun@CDHA ~]$ cat helloworld.py
#!/usr/bin/python
#-*- encoding: utf-8 -*-
####################################################################
# name: helloworld.py
# describe: test Python access to the Greenplum database through ODBC
####################################################################
import pyodbc
import sys
reload(sys)
sys.setdefaultencoding('utf8')

class GreenplumTest:
    debug = 1

    def __init__(self, dbinfo):
        # Set these first so __del__ is safe even if connect() fails
        self.cnxn = None
        self.cursor = None
        self.UID = dbinfo[1]
        self.PWD = dbinfo[2]
        # dbinfo = [DSN, user name, password]
        odbcinfo = 'DSN=%s;UID=%s;PWD=%s' % (dbinfo[0], dbinfo[1], dbinfo[2])
        self.cnxn = pyodbc.connect(odbcinfo, autocommit=True, ansi=True)
        self.cursor = self.cnxn.cursor()

    def __del__(self):
        if self.cursor is not None:
            self.cursor.close()
        if self.cnxn is not None:
            self.cnxn.close()

    def _printinfo(self, msg):
        print "%s" % (msg)
        print "\n"

    def testsql(self):
        # Business logic like this can be pushed down into SQL
        # Example: create a table and insert rows
        sql_1 = '''
        drop table if exists helloworld;
        create table helloworld(id int,name text) distributed by (id);
        insert into helloworld values(1,'Spark'),(2,'Hadoop'),(3,'Apache');
        '''
        self.cursor.execute(sql_1.strip())
        # Query the table and return the rows
        sql_2 = '''
        select * from helloworld;
        '''
        self.cursor.execute(sql_2.strip())
        row = self.cursor.fetchall()
        return row

# Main
def main():
    # Check the number of command-line arguments
    if len(sys.argv) < 4:
        print 'usage: python helloworld.py GreenplumDSN username password\n'
        sys.exit(1)
    # Connection info for GP: DSN, user name, password
    dbinfo = [sys.argv[1], sys.argv[2], sys.argv[3]]
    GPT = GreenplumTest(dbinfo)
    ret = GPT.testsql()
    # Print the rows and exit with status 0; passing the list itself to
    # sys.exit() would print it to stderr and exit with status 1
    print ret
    return 0

if __name__ == '__main__':
    sys.exit(main())
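helloworld.py builds its connection string by plain string formatting. ODBC connection strings are semicolon-separated KEY=value pairs, so a password containing ';' would be split into the wrong attributes. A minimal guard is sketched below; build_odbc_conn_str is a hypothetical name, and it rejects such values rather than assuming any particular driver's escaping rules:

```python
# Characters that change how an ODBC connection string is split or quoted
_SPECIAL = (';', '{', '}')


def build_odbc_conn_str(dsn, uid, pwd):
    """Build 'DSN=...;UID=...;PWD=...' in the same format as helloworld.py.

    Raises ValueError if any value contains ';', '{' or '}'.
    """
    for key, value in (('DSN', dsn), ('UID', uid), ('PWD', pwd)):
        if any(ch in value for ch in _SPECIAL):
            raise ValueError('%s contains a character that needs escaping' % key)
    return 'DSN=%s;UID=%s;PWD=%s' % (dsn, uid, pwd)
```

In the script, odbcinfo could then be set to build_odbc_conn_str(dbinfo[0], dbinfo[1], dbinfo[2]).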
2. Run the test:
python helloworld.py GreenplumDSN zhangyun xxxxxx
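The script prints the following. The row order differs from the psql query below because, without ORDER BY, Greenplum does not guarantee row order (the rows are distributed across segments):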
[(3, 'Apache'), (1, 'Spark'), (2, 'Hadoop')]
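When the values come from program input rather than fixed SQL text, binding them as parameters is the usual DB-API pattern, and adding ORDER BY makes the result order deterministic. pyodbc uses '?' placeholders and provides executemany; so that this sketch runs without a Greenplum server, it uses the standard-library sqlite3 module as a stand-in driver (which also uses '?') and drops Greenplum's distributed by clause. insert_and_fetch is a hypothetical name:

```python
import sqlite3


def insert_and_fetch(conn, rows):
    """Insert (id, name) rows with bound parameters; read them back in id order."""
    cur = conn.cursor()
    cur.execute('drop table if exists helloworld')
    cur.execute('create table helloworld(id int, name text)')
    # '?' placeholders: values are passed separately from the SQL text
    cur.executemany('insert into helloworld values (?, ?)', rows)
    # ORDER BY makes the result order deterministic
    cur.execute('select id, name from helloworld order by id')
    return cur.fetchall()


if __name__ == '__main__':
    conn = sqlite3.connect(':memory:')
    print(insert_and_fetch(conn, [(3, 'Apache'), (1, 'Spark'), (2, 'Hadoop')]))
    # On Python 3 prints [(1, 'Spark'), (2, 'Hadoop'), (3, 'Apache')]
```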
3. Log in to the GP database with psql and check the data
[gpadmin@CDHA ~]$ psql -d zhangyun_db_safe -U zhangyun -h CDHA
psql (8.2.15)
Type "help" for help.
zhangyun_db_safe=# select * from helloworld ;
id | name
----+--------
2 | Hadoop
1 | Spark
3 | Apache
(3 rows)
zhangyun_db_safe=#
Troubleshooting:
1. After unixODBC is installed, isql reports that /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/unixodbc-2.2.12/psqlodbcw.so cannot be found:
isql GreenplumDSN -v
[01000][unixODBC][Driver Manager]Can't open lib '/usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/unixodbc-2.2.12/psqlodbcw.so' : file not found
[ISQL]ERROR: Could not SQLConnect
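Solution:
Although the message says "file not found", the driver file itself exists; the load fails because some of the libraries psqlodbcw.so depends on cannot be located. First use ldd to see which dependencies are missing: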
ldd /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/unixodbc-2.2.12/psqlodbcw.so
linux-vdso.so.1 => (0x00007ffdd9106000)
libpq.so.5 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libpq.so.5 (0x00002addb8e3f000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002addb908c000)
libodbcinst.so.1 => not found
libodbc.so.1 => not found
libc.so.6 => /lib64/libc.so.6 (0x00002addb92aa000)
libssl.so.0.9.8 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libssl.so.0.9.8 (0x00002addb963e000)
libcrypto.so.0.9.8 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libcrypto.so.0.9.8 (0x00002addb9893000)
libgssapi_krb5.so.2 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libgssapi_krb5.so.2 (0x00002addb9c26000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00002addb9e50000)
libldap_r-2.3.so.0 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libldap_r-2.3.so.0 (0x00002addba088000)
/lib64/ld-linux-x86-64.so.2 (0x0000003f8bc00000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002addba2dc000)
libkrb5.so.3 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libkrb5.so.3 (0x00002addba4e1000)
libk5crypto.so.3 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libk5crypto.so.3 (0x00002addba76b000)
libcom_err.so.3 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libcom_err.so.3 (0x00002addba990000)
libkrb5support.so.0 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libkrb5support.so.0 (0x00002addbab95000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00002addbad9c000)
libfreebl3.so => /lib64/libfreebl3.so (0x00002addbafb7000)
liblber-2.3.so.0 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/liblber-2.3.so.0 (0x00002addbb1ba000)
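Picking the "not found" entries out of a long ldd listing can be automated. This sketch (missing_libraries is a hypothetical name) assumes the usual "name => target" line format shown above:

```python
def missing_libraries(ldd_output):
    """Return the library names that ldd reports as 'not found', in order."""
    missing = []
    for line in ldd_output.splitlines():
        # Lines look like: "libodbc.so.1 => not found"
        name, sep, target = line.partition('=>')
        if sep and target.strip() == 'not found':
            missing.append(name.strip())
    return missing
```

Pass it the text printed by ldd; for the listing above it returns ['libodbcinst.so.1', 'libodbc.so.1'].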
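The two unixODBC libraries are missing, so create symbolic links in /lib64 to the copies shipped in the driver package's unixodbc-2.2.12 directory: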
ln -s /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/unixodbc-2.2.12/libodbcinst.so.1 /lib64/libodbcinst.so.1
ln -s /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/unixodbc-2.2.12/libodbc.so.1 /lib64/libodbc.so.1
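The same step can be sketched in Python for any list of missing libraries. link_missing is a hypothetical name; src_dir would be the unixodbc-2.2.12 directory of the driver package and dest_dir /lib64, as in the ln -s commands above (writing to /lib64 requires root):

```python
import os


def link_missing(libs, src_dir, dest_dir):
    """Create dest_dir/<lib> -> src_dir/<lib> symlinks and return those created.

    Skips a library if its source file does not exist or if something
    (file or link) with that name already exists in dest_dir.
    """
    created = []
    for lib in libs:
        src = os.path.join(src_dir, lib)
        dest = os.path.join(dest_dir, lib)
        if not os.path.exists(src) or os.path.lexists(dest):
            continue
        os.symlink(src, dest)
        created.append(dest)
    return created
```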
Running ldd again shows that all dependencies now resolve:
ldd /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/unixodbc-2.2.12/psqlodbcw.so
linux-vdso.so.1 => (0x00007ffcc97d4000)
libpq.so.5 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libpq.so.5 (0x00002b10ed9ee000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b10edc3b000)
libodbcinst.so.1 => /lib64/libodbcinst.so.1 (0x00002b10ede58000)
libodbc.so.1 => /lib64/libodbc.so.1 (0x00002b10ee071000)
libc.so.6 => /lib64/libc.so.6 (0x00002b10ee300000)
libssl.so.0.9.8 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libssl.so.0.9.8 (0x00002b10ee694000)
libcrypto.so.0.9.8 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libcrypto.so.0.9.8 (0x00002b10ee8e9000)
libgssapi_krb5.so.2 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libgssapi_krb5.so.2 (0x00002b10eec7c000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00002b10eeea6000)
libldap_r-2.3.so.0 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libldap_r-2.3.so.0 (0x00002b10ef0de000)
/lib64/ld-linux-x86-64.so.2 (0x0000003f8bc00000)
libdl.so.2 => /lib64/libdl.so.2 (0x00002b10ef332000)
libkrb5.so.3 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libkrb5.so.3 (0x00002b10ef537000)
libk5crypto.so.3 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libk5crypto.so.3 (0x00002b10ef7c1000)
libcom_err.so.3 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libcom_err.so.3 (0x00002b10ef9e6000)
libkrb5support.so.0 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/libkrb5support.so.0 (0x00002b10efbeb000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00002b10efdf2000)
libfreebl3.so => /lib64/libfreebl3.so (0x00002b10f000d000)
liblber-2.3.so.0 => /usr/local/greenplum-connectivity-4.3.8.2-build-1/drivers/odbc/psqlodbc-08.02.0500/datadirect-52_64/liblber-2.3.so.0 (0x00002b10f0210000)