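PostgreSQL is an open-source, extensible relational database, and the official RDKit documentation uses PostgreSQL for its demonstrations. This article covers setting up PostgreSQL with the RDKit extension on Windows, along with basic operations from Python.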
C:\Users\Administrator> conda install -c rdkit rdkit-postgresql
C:\Users\Administrator> initdb -D d:\postgresql\data
Execution of PostgreSQL by a user with administrative permissions is not permitted.
C:\Users\Administrator> net user postgres /add
C:\Users\Administrator> net user postgres /active:yes
C:\Users\Administrator> net user postgres
C:\Users\Administrator> runas /user:postgres cmd
C:\Users\Administrator> postgres -D d:\postgresql\data
C:\Users\Administrator> createdb mols
C:\Users\Administrator> psql -c "create extension rdkit" mols
C:\Users\Administrator> psql mols
mols=# select count(*) from info;
C:\Users\Administrator> dropdb mols
C:\Users\Administrator> conda install -c conda-forge psycopg2
>>> import psycopg2
>>> connection = psycopg2.connect(database='mols',
...                               user='Administrator',
...                               password='postgresql',
...                               port='5432',
...                               host='127.0.0.1')
>>> type(connection)
psycopg2.extensions.connection
>>> cur = connection.cursor()
>>> type(cur)
psycopg2.extensions.cursor
>>> cur.execute("select * from current_user;")
>>> reply = cur.fetchall()
>>> reply
[('Administrator',)]
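The connection call above hardcodes the credentials. A minimal sketch of one alternative, assuming the settings are supplied through the standard libpq environment variables (PGDATABASE, PGUSER, PGPASSWORD, PGHOST, PGPORT): the helper name `connection_kwargs` is hypothetical, and the fallback values simply mirror the ones used in this article.

```python
import os


def connection_kwargs(env=None):
    """Build keyword arguments for psycopg2.connect from libpq-style variables.

    Hypothetical helper: the defaults mirror the values used in this article.
    """
    env = os.environ if env is None else env
    kwargs = {
        "dbname": env.get("PGDATABASE", "mols"),
        "user": env.get("PGUSER", "Administrator"),
        "host": env.get("PGHOST", "127.0.0.1"),
        "port": int(env.get("PGPORT", "5432")),
    }
    # Only pass a password when one was actually provided.
    if "PGPASSWORD" in env:
        kwargs["password"] = env["PGPASSWORD"]
    return kwargs


# Demonstration with an explicit mapping instead of the real environment.
print(connection_kwargs({"PGPORT": "5433"}))

# With a running server, the result can be passed straight through:
#   import psycopg2
#   connection = psycopg2.connect(**connection_kwargs())
```

Keeping the password out of the script makes it easier to share the code without leaking credentials.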
>>> query = '''create table info(id serial primary key,
...                          database text,
...                          project text,
...                          SMILES text,
...                          HA integer,
...                          HD integer,
...                          RB integer,
...                          MW float,
...                          LOGP float);'''
>>> cur.execute(query)
>>> import pandas as pd
>>> df = pd.read_excel('ippin.xlsx')
Build the list of mol objects:
>>> from rdkit import Chem
>>> mol_list = [x for x in [Chem.MolFromSmiles(i) for i in df.SMILES] if x]
>>> len(mol_list)
1351
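The list comprehension above silently drops any SMILES that fails to parse, so the resulting list no longer lines up with the rows of the DataFrame. A rough sketch of one way to keep each molecule paired with its source string and to see what was rejected: `parse_smiles` is a hypothetical helper, and `fake_parser` is a stand-in used only so the snippet runs without RDKit; in practice `Chem.MolFromSmiles` would be passed instead.

```python
def parse_smiles(smiles_list, parser):
    """Return ((smiles, mol) pairs, rejected inputs).

    `parser` is any callable that returns None for unparsable input,
    for example rdkit.Chem.MolFromSmiles.
    """
    parsed, rejected = [], []
    for smiles in smiles_list:
        # Non-string cells (e.g. empty spreadsheet cells) are rejected outright.
        mol = parser(smiles) if isinstance(smiles, str) else None
        if mol is None:
            rejected.append(smiles)
        else:
            parsed.append((smiles, mol))
    return parsed, rejected


def fake_parser(smiles):
    """Stand-in for illustration only: treats any string containing '?' as invalid."""
    return None if "?" in smiles else smiles.upper()


parsed, rejected = parse_smiles(["CCO", "c1ccccc1", "C?C", None], fake_parser)
print(len(parsed), rejected)  # prints: 2 ['C?C', None]

# With RDKit installed:
#   parsed, rejected = parse_smiles(df.SMILES, Chem.MolFromSmiles)
#   mol_list = [mol for _, mol in parsed]
```

Holding on to the pairs also makes it possible to store the SMILES column alongside the descriptors later.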
Create a descriptor calculator and load it with the rule-of-five descriptors; readers unfamiliar with this step can refer to this article.
>>> from rdkit.ML.Descriptors import MoleculeDescriptors
>>> des_list = ['MolWt', 'NumHAcceptors', 'NumHDonors', 'MolLogP', 'NumRotatableBonds']
>>> calculator = MoleculeDescriptors.MolecularDescriptorCalculator(des_list)
>>> feat_list = [str(calculator.CalcDescriptors(mol)) for mol in mol_list]
>>> query = "insert into info (MW, HA, HD, LOGP, RB) values %s" % (','.join(feat_list))
>>> cur.execute(query)
>>> query = 'select count(*) from info'
>>> cur.execute(query)
>>> reply = cur.fetchall()
>>> reply
[(1351,)]
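The insert above builds the SQL text by string formatting, which works here because the values are plain numbers. A minimal sketch of a parameterized variant, assuming the descriptor tuples arrive in the same order as `des_list` (MolWt, NumHAcceptors, NumHDonors, MolLogP, NumRotatableBonds): `to_rows` and `insert_rows` are hypothetical helper names, the sample values are made up, and `cursor.executemany` is the standard DB-API call that psycopg2 cursors provide.

```python
import math

INSERT_SQL = "insert into info (MW, HA, HD, LOGP, RB) values (%s, %s, %s, %s, %s)"


def to_rows(descriptor_tuples):
    """Convert descriptor tuples to rows of plain Python numbers.

    Tuples containing NaN are skipped, since they cannot be stored
    in the integer columns.
    """
    rows = []
    for mw, ha, hd, logp, rb in descriptor_tuples:
        if any(math.isnan(float(v)) for v in (mw, ha, hd, logp, rb)):
            continue
        rows.append((float(mw), int(ha), int(hd), float(logp), int(rb)))
    return rows


def insert_rows(cursor, rows):
    """Insert all rows with placeholders; the driver handles the quoting."""
    cursor.executemany(INSERT_SQL, rows)
    return len(rows)


# Made-up sample values, for illustration only.
sample = [(180.16, 3, 1, 1.31, 2), (float("nan"), 1, 1, 0.0, 0)]
print(to_rows(sample))

# With the objects from this article:
#   rows = to_rows(calculator.CalcDescriptors(mol) for mol in mol_list)
#   insert_rows(cur, rows)
```

Placeholders become more important once text columns such as SMILES are inserted, because those values may contain characters that would break a hand-built statement.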
>>> connection.commit()
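Nothing is persisted until the final commit, so an error partway through the session leaves the transaction open. A small sketch of a wrapper that commits on success and rolls back on failure: `run_in_transaction` is a hypothetical helper that relies only on the `cursor()`, `commit()`, and `rollback()` methods every DB-API connection, including psycopg2's, exposes.

```python
def run_in_transaction(connection, work):
    """Run work(cursor); commit if it succeeds, roll back if it raises."""
    cursor = connection.cursor()
    try:
        result = work(cursor)
        connection.commit()
        return result
    except Exception:
        # Undo any partial changes, then let the caller see the error.
        connection.rollback()
        raise
    finally:
        cursor.close()


# With the connection from this article:
#   def count_rows(cur):
#       cur.execute("select count(*) from info")
#       return cur.fetchall()
#   reply = run_in_transaction(connection, count_rows)
```

psycopg2 connections can also be used as context managers (`with connection:`) for the same commit-or-rollback behavior; the explicit version is shown here to make the control flow visible.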
This article draws on the RDKit and PostgreSQL installation documentation.
The Python code and source files are available here.