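RDKit includes a number of functions for modifying molecules, which make it easy to delete or replace substructures. For more complex transformations, see the related functionality under Chemical Reactions.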
>>> from rdkit import Chem
>>> from rdkit.Chem import AllChem, Draw
>>> # Phenylalanine. 'OC(=O)C(N)Cc1ccccc1' describes the same molecule, but the atom indices shown below assume this atom order.
>>> m = Chem.MolFromSmiles('c1ccccc1CC(N)C(=O)O')
>>> patt = Chem.MolFromSmarts('Cc1ccccc1')
>>> matches = m.GetSubstructMatches(patt)
>>> matches
((6, 5, 4, 3, 2, 1, 0),)
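>>> # Draw the molecule with the matched benzyl atoms highlighted
>>> Draw.MolToImage(m, (250,250), highlightAtoms=matches[0])
>>> # Delete the benzyl group; what remains is the glycine skeleton
>>> rm = Chem.DeleteSubstructs(m, patt)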
>>> Draw.MolToImage(rm, (250,250))
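The images returned by `Draw.MolToImage` are not visible in a plain-text session, so the result of the deletion can also be checked as SMILES. The following is a minimal sketch using the same molecule and pattern as above; the explicit `SanitizeMol` call and the comparison against glycine are additions for illustration.

```python
from rdkit import Chem

# Phenylalanine and the benzyl pattern used in the text above
m = Chem.MolFromSmiles('c1ccccc1CC(N)C(=O)O')
patt = Chem.MolFromSmarts('Cc1ccccc1')

# Confirm that the pattern occurs before deleting it
has_benzyl = m.HasSubstructMatch(patt)

# Delete all atoms matched by the pattern, then sanitize the remainder
rm = Chem.DeleteSubstructs(m, patt)
Chem.SanitizeMol(rm)
remaining = Chem.MolToSmiles(rm)

# Reference: canonical SMILES of glycine, generated the same way
glycine = Chem.MolToSmiles(Chem.MolFromSmiles('NCC(=O)O'))
print(has_benzyl, remaining, glycine)
```

Comparing against a reference molecule's canonical SMILES avoids depending on the atom order of the product.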
Now replace the benzyl group with a hydroxymethyl group, which gives serine:
>>> repl = Chem.MolFromSmiles('CO')
>>> rms = AllChem.ReplaceSubstructs(m, patt, repl)
>>> rms[0]
>>> # Attach the replacement through its second atom (index 1) instead of its first
>>> repl = Chem.MolFromSmiles('CCCC')
>>> rms = AllChem.ReplaceSubstructs(m, patt, repl, replacementConnectionPoint=1)
>>> rms[0]
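The two replacements above can be verified without drawing by comparing canonical SMILES. This sketch assumes that `NC(CO)C(=O)O` (serine) and `CCC(C)C(N)C(=O)O` (isoleucine, stereochemistry ignored) are the expected products; the helper `canon` is a hypothetical name introduced here.

```python
from rdkit import Chem

def canon(smiles):
    """Return the canonical SMILES of a SMILES string (illustrative helper)."""
    return Chem.MolToSmiles(Chem.MolFromSmiles(smiles))

m = Chem.MolFromSmiles('c1ccccc1CC(N)C(=O)O')
patt = Chem.MolFromSmarts('Cc1ccccc1')

# Default: the replacement is attached through its first atom (index 0)
ser = Chem.ReplaceSubstructs(m, patt, Chem.MolFromSmiles('CO'))[0]
Chem.SanitizeMol(ser)

# replacementConnectionPoint=1 attaches butane through its second carbon
ile = Chem.ReplaceSubstructs(m, patt, Chem.MolFromSmiles('CCCC'),
                             replacementConnectionPoint=1)[0]
Chem.SanitizeMol(ile)

ser_smiles = Chem.MolToSmiles(ser)
ile_smiles = Chem.MolToSmiles(ile)
print(ser_smiles, ile_smiles)
```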
>>> # Replace the hydroxyl oxygen of acetic acid with an ethoxy group (gives ethyl acetate)
>>> m = Chem.MolFromSmiles('CC(=O)O')
>>> patt = Chem.MolFromSmarts('[$(OC=O)]')
>>> repl = Chem.MolFromSmiles('OCC')
>>> rms = Chem.ReplaceSubstructs(m, patt, repl)
>>> rms[0]
>>> Chem.SanitizeMol(rms[0])
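By default `ReplaceSubstructs` returns one product per match, each with a single site replaced. When a molecule contains several matches, passing `replaceAll=True` replaces all of them in one product. The sketch below assumes malonic acid as an illustrative input (it is not part of the original example) and reuses the pattern and replacement from above.

```python
from rdkit import Chem

# Malonic acid has two carboxylic acid hydroxyl oxygens (illustrative input)
m = Chem.MolFromSmiles('OC(=O)CC(=O)O')
patt = Chem.MolFromSmarts('[$(OC=O)]')
repl = Chem.MolFromSmiles('OCC')

# One product per match, each with a single hydroxyl replaced
singles = Chem.ReplaceSubstructs(m, patt, repl)

# A single product with every match replaced
both = Chem.ReplaceSubstructs(m, patt, repl, replaceAll=True)
diester = both[0]
Chem.SanitizeMol(diester)

diester_smiles = Chem.MolToSmiles(diester)
reference = Chem.MolToSmiles(Chem.MolFromSmiles('CCOC(=O)CC(=O)OCC'))
print(len(singles), len(both), diester_smiles)
```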
Keep the amino acid core, NH2-C-COOH:
>>> m = Chem.MolFromSmiles('c1ccccc1CC(N)C(=O)O')
>>> core = Chem.MolFromSmiles('NCC(=O)O')
>>> m1 = Chem.ReplaceSidechains(m, core)
>>> m1
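In a plain-text session the molecule returned by `ReplaceSidechains` can be inspected as SMILES, where each removed side chain shows up as a dummy atom marking the attachment point. A minimal sketch with the same molecule and core as above; the dummy-atom count is an added check for illustration.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('c1ccccc1CC(N)C(=O)O')
core = Chem.MolFromSmiles('NCC(=O)O')

# Keep the core and replace each side chain with a dummy atom
m1 = Chem.ReplaceSidechains(m, core)
core_smiles = Chem.MolToSmiles(m1)

# Dummy atoms have atomic number 0
n_dummies = sum(1 for atom in m1.GetAtoms() if atom.GetAtomicNum() == 0)
print(core_smiles, n_dummies)
```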
Keep the R group and remove the core:
>>> r = AllChem.ReplaceCore(m, core)
>>> side_mols = Chem.GetMolFrags(r, asMols=True)
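The side chains obtained after `ReplaceCore` can likewise be listed as SMILES. The sketch below uses the same inputs as above; the name `side_smiles` is introduced here for illustration.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('c1ccccc1CC(N)C(=O)O')
core = Chem.MolFromSmiles('NCC(=O)O')

# Remove the core; each side chain keeps a dummy atom at its attachment point
r = Chem.ReplaceCore(m, core)

# Split the result into one molecule per side chain
side_mols = Chem.GetMolFrags(r, asMols=True)
side_smiles = [Chem.MolToSmiles(frag) for frag in side_mols]
print(side_smiles)
```

For phenylalanine there is a single side chain, so `side_mols` holds one fragment: the benzyl group plus its dummy atom.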
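In addition, RDKit provides a very flexible function for breaking specified bonds to obtain molecular fragments. As a brief introduction, the example below breaks every bond between a ring atom and a non-ring atom.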
>>> m = Chem.MolFromSmiles('CC1CC(O)C1CCC1CC1')
>>> patt = Chem.MolFromSmarts('[!R][R]')
>>> bis = m.GetSubstructMatches(patt)
>>> print(bis)
((0, 1), (4, 3), (6, 5), (7, 8))
>>> m
>>> bs = [m.GetBondBetweenAtoms(x, y).GetIdx() for x, y in bis]
>>> bs
[0, 3, 5, 7]
>>> nm = Chem.FragmentOnBonds(m, bs, addDummies=False)
>>> nm
>>> mol_list = Chem.GetMolFrags(nm, asMols=True)
>>> Draw.MolsToGridImage(mol_list, molsPerRow=5, subImgSize=(200,100))
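To see what `addDummies` changes, the fragmentation can be run both ways and the pieces printed as SMILES. This is a sketch on the same molecule; the variable names `with_dummies` and `without_dummies` are introduced here for illustration.

```python
from rdkit import Chem

m = Chem.MolFromSmiles('CC1CC(O)C1CCC1CC1')

# Find bonds joining a non-ring atom to a ring atom
patt = Chem.MolFromSmarts('[!R][R]')
bond_ids = [m.GetBondBetweenAtoms(a, b).GetIdx()
            for a, b in m.GetSubstructMatches(patt)]

# Default behaviour: each end of a broken bond is capped with a dummy atom
with_dummies = Chem.FragmentOnBonds(m, bond_ids)
# addDummies=False: the bonds are removed without adding capping atoms
without_dummies = Chem.FragmentOnBonds(m, bond_ids, addDummies=False)

# Dummy atoms have atomic number 0
n_dummy_atoms = sum(1 for a in with_dummies.GetAtoms() if a.GetAtomicNum() == 0)

# Split the uncapped result into separate molecules
frags = Chem.GetMolFrags(without_dummies, asMols=True)
frag_sizes = sorted(f.GetNumAtoms() for f in frags)

print(Chem.MolToSmiles(with_dummies))
print([Chem.MolToSmiles(f) for f in frags])
```

Keeping the dummy atoms is useful when the attachment points matter later, for example when recombining fragments; dropping them gives plain fragments that are easier to compare or count.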
This article is based on the official RDKit documentation.
The code and source files are available here.