RDKit MurckoScaffold: extracting compound scaffolds; scaffold-based clustering and retrieval of similar compounds

References: http://www.rdkit.org/docs/source/rdkit.Chem.Scaffolds.MurckoScaffold.html
https://cloud.tencent.com/developer/article/1782620

1. Extracting compound scaffolds with RDKit MurckoScaffold


## Original molecule
from rdkit import Chem
mol22 = Chem.MolFromSmiles('Cc1cc(Oc2nccc(CCC)c2)ccc1')
mol22


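from rdkit.Chem.Scaffolds import MurckoScaffold
## Extract the scaffold from a Mol object
MurckoScaffold.GetScaffoldForMol(mol22)
## Or, equivalently, compute the scaffold SMILES and parse it back into a Mol
Chem.MolFromSmiles(MurckoScaffold.MurckoScaffoldSmiles('Cc1cc(Oc2nccc(CCC)c2)ccc1'))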
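
The two routes above should give the same scaffold. The sketch below checks this by comparing canonical SMILES, and also shows `MurckoScaffold.MakeScaffoldGeneric`, which the original post does not use: it abstracts the scaffold further by turning every atom into carbon and every bond into a single bond. The variable names are illustrative only.

```python
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

smiles = 'Cc1cc(Oc2nccc(CCC)c2)ccc1'
mol = Chem.MolFromSmiles(smiles)

# Route 1: Mol in, Mol out; convert to canonical SMILES for comparison.
scaffold_mol = MurckoScaffold.GetScaffoldForMol(mol)
scaffold_from_mol = Chem.MolToSmiles(scaffold_mol)

# Route 2: SMILES in, scaffold SMILES out.
scaffold_from_smiles = MurckoScaffold.MurckoScaffoldSmiles(smiles)

print(scaffold_from_mol, scaffold_from_smiles)

# Optional extra abstraction (not part of the original workflow):
# all atoms become carbon and all bonds become single bonds.
generic = MurckoScaffold.MakeScaffoldGeneric(scaffold_mol)
generic_smiles = Chem.MolToSmiles(generic)
print(generic_smiles)
```

Generic scaffolds merge clusters that differ only in heteroatoms or bond orders, which may or may not be desirable depending on how coarse the grouping should be.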


2. Scaffold-based clustering and retrieval of similar compounds

import numpy as np
from rdkit import Chem
from rdkit.Chem import Draw
from rdkit.Chem.Scaffolds import MurckoScaffold



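## Load the data: 10,000 molecules as SMILES. The file can be downloaded from: https://github.com/zilliztech/MolSearch/edit/master/script/test_1w.smi
## (the code below assumes it was saved locally as 1w.smi)

mols2 = []  # note: holds SMILES strings, not Mol objects
with open(r"1w.smi", "r") as f:
    for line in f:
        fields = line.split()
        if fields:  # skip blank lines
            mols2.append(fields[0])  # the first column is the SMILES string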
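
`Chem.MolFromSmiles` returns `None` for a string it cannot parse, and later steps (drawing, scaffold extraction) are not guaranteed to cope with that. A minimal sketch of a pre-filter follows; the function name `keep_parsable` and the demo strings are hypothetical, not part of the original post.

```python
from rdkit import Chem, RDLogger

# Silence RDKit's parse-error messages while filtering.
RDLogger.DisableLog('rdApp.*')

def keep_parsable(smiles_list):
    """Split SMILES strings into those RDKit can parse and those it cannot."""
    valid, rejected = [], []
    for smi in smiles_list:
        if Chem.MolFromSmiles(smi) is not None:
            valid.append(smi)
        else:
            rejected.append(smi)
    return valid, rejected

# 'c1ccccc' has an unclosed ring and 'xyz' is not SMILES at all.
valid, rejected = keep_parsable(['CCO', 'c1ccccc1', 'c1ccccc', 'xyz'])
print(len(valid), "parsable;", len(rejected), "rejected")
```

In the workflow above this could be applied as `mols2, bad = keep_parsable(mols2)` right after loading.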

## Inspect the first nine molecules
Draw.MolsToGridImage([ Chem.MolFromSmiles(i) for i in mols2[:9]], molsPerRow=3, subImgSize=(300,300))


## Extract the scaffolds and inspect them
smi_scaffolds = [MurckoScaffold.MurckoScaffoldSmiles(smi, includeChirality=False) for smi in mols2]
mol_scaffolds = [Chem.MolFromSmiles(smi_scaffold) for smi_scaffold in smi_scaffolds]

Draw.MolsToGridImage(mol_scaffolds[:9], molsPerRow=3, subImgSize=(300,300))


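Molecules whose Murcko scaffold SMILES are identical are placed in the same cluster.
## Cluster by Murcko scaffold

scaffolds = {}       # scaffold SMILES -> cluster ID
clusters_list = []   # cluster ID of each molecule, in input order

idx = 1              # cluster IDs start at 1
for smi in mols2:
    scaffold_smi = MurckoScaffold.MurckoScaffoldSmiles(smi, includeChirality=False)
    if scaffold_smi not in scaffolds:
        scaffolds[scaffold_smi] = idx
        idx += 1

    cluster_id = scaffolds[scaffold_smi]
    clusters_list.append(cluster_id)
print("Num of Murcko scaffolds in dataset:", len(scaffolds))  ## total number of clusters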
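
The same grouping can also be kept as a mapping from scaffold SMILES to member molecules, which makes it easy to rank clusters by size. This is a sketch under assumptions: `group_by_scaffold` and the four demo molecules are hypothetical, and acyclic molecules (which have no ring system) are expected to share the empty-string scaffold.

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold

def group_by_scaffold(smiles_list):
    """Map each Murcko scaffold SMILES to the input SMILES that share it."""
    groups = defaultdict(list)
    for smi in smiles_list:
        key = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=False)
        groups[key].append(smi)
    return dict(groups)

# Three benzene derivatives and one pyridine derivative.
demo = ['Cc1ccccc1', 'CCc1ccccc1', 'Cc1ccncc1', 'OCc1ccccc1']
demo_groups = group_by_scaffold(demo)

# Largest clusters first.
ranked = sorted(demo_groups.items(), key=lambda kv: len(kv[1]), reverse=True)
for scaffold, members in ranked:
    print(len(members), scaffold, members)
```

Compared with the integer cluster IDs used above, this layout avoids the separate reverse lookup from ID to scaffold, at the cost of holding the member lists in memory.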
## Inspect cluster 11: its scaffold and its member compounds
new_dict = {v: k for k, v in scaffolds.items()}  # cluster ID -> scaffold SMILES

Chem.MolFromSmiles(new_dict[11])  ## scaffold of cluster 11


clusters_list = np.array(clusters_list)
idx_c11 = np.where(clusters_list == 11)[0]  # positions of the molecules in cluster 11
mol_list_c11 = [mols2[i] for i in idx_c11]
# print(mol_list_c11)
Draw.MolsToGridImage([Chem.MolFromSmiles(i) for i in mol_list_c11], molsPerRow=3, subImgSize=(300,300))
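
Since the same lookup is repeated for several cluster IDs, it could be wrapped in a small helper. The sketch below is only an illustration: `cluster_members` is a hypothetical name, and the toy inputs stand in for `mols2` and `clusters_list`.

```python
import numpy as np

def cluster_members(smiles_list, cluster_ids, cluster_id):
    """Return the SMILES whose cluster ID equals `cluster_id`, in input order."""
    ids = np.asarray(cluster_ids)
    return [smiles_list[i] for i in np.flatnonzero(ids == cluster_id)]

# Toy data: four molecules assigned to clusters 1, 2, 1 and 3.
toy_smiles = ['Cc1ccccc1', 'Cc1ccncc1', 'CCc1ccccc1', 'C1CCCCC1']
toy_ids = [1, 2, 1, 3]
print(cluster_members(toy_smiles, toy_ids, 1))
```

With the variables from this section, `cluster_members(mols2, clusters_list, 11)` would give the same list as the cluster 11 lookup above.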


## Inspect cluster 2: its scaffold and its member compounds
## (new_dict and clusters_list are reused from the cluster 11 step above)

Chem.MolFromSmiles(new_dict[2])  ## scaffold of cluster 2


idx_c2 = np.where(clusters_list == 2)[0]  # positions of the molecules in cluster 2
mol_list_c2 = [mols2[i] for i in idx_c2]
# print(mol_list_c2)
Draw.MolsToGridImage([Chem.MolFromSmiles(i) for i in mol_list_c2], molsPerRow=3, subImgSize=(300,300))


## Inspect cluster 3: its scaffold and its member compounds
## (new_dict and clusters_list are reused from the cluster 11 step above)

Chem.MolFromSmiles(new_dict[3])  ## scaffold of cluster 3

idx_c3 = np.where(clusters_list == 3)[0]  # positions of the molecules in cluster 3
mol_list_c3 = [mols2[i] for i in idx_c3]
# print(mol_list_c3)
Draw.MolsToGridImage([Chem.MolFromSmiles(i) for i in mol_list_c3], molsPerRow=3, subImgSize=(300,300))
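
For the retrieval side of this workflow, one option is to start from a query molecule instead of a cluster ID and return every library compound that shares its scaffold. The following is a minimal sketch of that idea, not the original author's code; `same_scaffold_hits` and the toy library are hypothetical.

```python
from rdkit.Chem.Scaffolds import MurckoScaffold

def scaffold_key(smiles):
    """Murcko scaffold SMILES, used as the cluster key."""
    return MurckoScaffold.MurckoScaffoldSmiles(smiles=smiles, includeChirality=False)

def same_scaffold_hits(query_smiles, library_smiles):
    """Return the library SMILES whose scaffold equals the query's scaffold."""
    query_key = scaffold_key(query_smiles)
    return [smi for smi in library_smiles if scaffold_key(smi) == query_key]

# Toy library: two benzene derivatives and one pyridine derivative.
library = ['Cc1ccccc1', 'Cc1ccncc1', 'Clc1ccccc1']
hits = same_scaffold_hits('CCCc1ccccc1', library)
print(hits)
```

For a library of 10,000 molecules it would likely be cheaper to compute the scaffold keys once (for example the `scaffolds` dictionary built above) and look the query key up there, rather than recomputing every scaffold per query.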
