Chapter 23: Molecule Ideation Using Matched Molecular Pairs

Reading notes on "Artificial Intelligence in Drug Design"


Table of Contents

  • 1.Introduction
  • 2.MMP Algorithms
  • 3.BioDig: The GSK Transform Database
  • 4.Large Scale Molecule Ideation Using MMPs
  • 5.Quantifying the Value of an MMP-Based Knowledge Base
  • 6.The Ever-Growing Tail of New Transforms
  • 7.The Subset of Useful MedChem Transforms
  • 8.Assessing MMPs as a Molecule Generation Tool
  • 9.First Test - Human Inclusion
  • 10.Second Test - Human Imitation
  • 11.Third Test - Legacy Projects
  • 12.Conclusion

1.Introduction

  • Matched Molecular Pair (MMP) analysis is one of the many ways medicinal chemists can understand SAR data. The attraction of MMP analysis lies in its ability to intuitively relate structural changes to changes in a relevant property.

2.MMP Algorithms

  • There are several implementations of the MMP algorithm in the literature. One of the most widely used MMP generation algorithms, which has been adapted by many institutions, was originally published by Hussain and Rea.
  • The common core fragment is termed the context (typically >50% of the molecule by heavy atom count). Two molecules with the same context are termed an MMP. The variable part between the molecule pair is termed the transform and encodes a change from fragment X to fragment Y. The transform is typically represented as a SMIRKS reaction.
  • A similar procedure has been extended to MMPs with a chemical core change. In this case multiple cuts (fragmentation operations) are applied to the molecules. Where the terminal groups are all the same but the core is different, an MMP is defined that encodes a core or scaffold change. Figure 1 shows a pictorial demonstration of the MMP algorithm.
  • Deriving MMPs across a large set of molecules with associated physicochemical properties or assay readouts allows the transforms to be generalized across the dataset. If two or more compound pairs share the same transform, their data can be aggregated. For each transform, statistics are derived that express the change in a chosen endpoint as a mean change with an associated standard deviation or related statistics.
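The indexing and aggregation steps described above can be sketched in plain Python. This is only a minimal illustration, not the published Hussain and Rea implementation: it assumes each molecule has already been cut into a (context, fragment) record by a cheminformatics toolkit, and the function names, record layout, and fragment strings are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean, stdev


def build_pairs(fragmented):
    """Index pre-fragmented molecules by context and enumerate MMPs.

    `fragmented` is an iterable of (mol_id, context, fragment, value) records,
    where `value` is the measured endpoint (for example LogD).
    Returns a list of (transform, context, delta) tuples.
    """
    by_context = defaultdict(list)
    for mol_id, context, fragment, value in fragmented:
        by_context[context].append((mol_id, fragment, value))

    pairs = []
    for context, members in by_context.items():
        for a, b in combinations(members, 2):
            if a[1] == b[1]:
                continue  # identical fragments: no structural change
            # Fix one direction (X sorts before Y) so that X>>Y and Y>>X
            # observations end up under the same transform key.
            (_, frag_x, val_x), (_, frag_y, val_y) = sorted((a, b), key=lambda m: m[1])
            pairs.append((f"{frag_x}>>{frag_y}", context, val_y - val_x))
    return pairs


def transform_stats(pairs):
    """Aggregate endpoint changes per transform: count, mean, standard deviation."""
    deltas = defaultdict(list)
    for transform, _context, delta in pairs:
        deltas[transform].append(delta)
    return {
        transform: {
            "n": len(values),
            "mean": mean(values),
            # A standard deviation needs at least two observations.
            "sd": stdev(values) if len(values) > 1 else None,
        }
        for transform, values in deltas.items()
    }
```

Keying the index on the context string is what keeps pair finding cheap: only molecules that share a context are ever compared, rather than every molecule against every other.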

3.BioDig: The GSK Transform Database

  • For a dataset of 300K compounds approximately 2.3 million MMPs can be extracted. This necessitates a solution for bulk storage and fast query reporting. These requirements along with the process of indexing transforms lend themselves to a relational database. This database is named BioDig at GSK.
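As a rough illustration of why a relational layout fits this problem, the sketch below stores transforms and pairs in SQLite via Python's standard `sqlite3` module. The table names, columns, and helper functions are assumptions made for this example; the actual BioDig schema is not described in these notes.

```python
import sqlite3

# Hypothetical two-table layout: one row per unique transform,
# one row per matched pair pointing at its transform.
SCHEMA = """
CREATE TABLE transform (
    transform_id INTEGER PRIMARY KEY,
    smirks       TEXT NOT NULL UNIQUE
);
CREATE TABLE mmp (
    pair_id      INTEGER PRIMARY KEY,
    transform_id INTEGER NOT NULL REFERENCES transform(transform_id),
    context      TEXT NOT NULL,
    mol_a        TEXT NOT NULL,
    mol_b        TEXT NOT NULL,
    delta        REAL
);
CREATE INDEX idx_mmp_transform ON mmp(transform_id);
"""


def open_db(path=":memory:"):
    """Create the database (in memory by default) and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_pair(conn, smirks, context, mol_a, mol_b, delta):
    """Insert one matched pair, registering its transform on first sight."""
    conn.execute("INSERT OR IGNORE INTO transform(smirks) VALUES (?)", (smirks,))
    (transform_id,) = conn.execute(
        "SELECT transform_id FROM transform WHERE smirks = ?", (smirks,)
    ).fetchone()
    conn.execute(
        "INSERT INTO mmp(transform_id, context, mol_a, mol_b, delta) "
        "VALUES (?, ?, ?, ?, ?)",
        (transform_id, context, mol_a, mol_b, delta),
    )


def transform_summary(conn, min_pairs=1):
    """Return (smirks, pair count, mean delta) for transforms with enough pairs."""
    return conn.execute(
        """
        SELECT t.smirks, COUNT(*) AS n, AVG(m.delta) AS mean_delta
        FROM mmp AS m
        JOIN transform AS t ON m.transform_id = t.transform_id
        GROUP BY t.transform_id
        HAVING COUNT(*) >= ?
        ORDER BY n DESC, t.smirks
        """,
        (min_pairs,),
    ).fetchall()
```

The index on `mmp(transform_id)` is the part that matters for query speed: reporting on a transform then touches only its own pairs instead of scanning millions of rows.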

4.Large Scale Molecule Ideation Using MMPs

  • MMPs have been historically used to interrogate the effect of a chemical transform on physicochemical properties such as LogD, clearance, and membrane permeability.
  • At GSK we have extended its applicability as a molecule library generation tool.
  • For example, the effect on solubility when a primary amide is replaced by a secondary amide is different for an aliphatic and an aromatic context (see Fig. 3).
  • SMARTS patterns can be generalized with aliphatic and aromatic flags as opposed to full atom type information. This extends a single transform into six related forms, as shown in Fig. 4.

5.Quantifying the Value of an MMP-Based Knowledge Base

  • A key aspect in the application of an MMP-based knowledge base is quantifying its usefulness in a medicinal chemistry design scenario. Ideally, the database must be comprehensive enough to cover the full range of transforms that could be used. Each transform in the database must also be derived from enough data to make it statistically valid.
  • To assess both requirements, the transforms in the Eli Lilly ADME/Tox knowledge database were compared with those in a larger 2.1 million compound diversity set. A second comparison was made between the transforms in the Eli Lilly ADME/Tox knowledge database and a subset of transforms seen in historical small molecule discovery projects.

6.The Ever-Growing Tail of New Transforms

  • A linear relationship was seen between the number of molecules in the dataset and the final number of derived matched pairs and transforms. This is seen in Table 1 and Fig. 5.

7.The Subset of Useful MedChem Transforms

  • The knowledge database was analyzed to assess how many of the Top 100, 500, 1000, 2500, 5K, 10K, 25K, 50K, and 100K MedChem project transforms were contained in the database. The results are given in Table 2.
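The coverage analysis described above amounts to ranking project transforms by frequency and checking what fraction of the top N appear in the knowledge base. A minimal sketch follows, assuming transforms are represented as comparable strings (for example canonical SMIRKS); the function name and inputs are hypothetical, and the values in Table 2 are not reproduced here.

```python
def coverage_at_top_n(project_transform_counts, knowledge_base, cutoffs):
    """Fraction of the N most frequent project transforms found in the knowledge base.

    project_transform_counts: dict mapping transform -> number of times it was
        used in historical projects.
    knowledge_base: set of transforms present in the database.
    cutoffs: iterable of N values (e.g. 100, 500, 1000).
    """
    # Rank by descending frequency; break ties alphabetically for determinism.
    ranked = sorted(
        project_transform_counts,
        key=lambda t: (-project_transform_counts[t], t),
    )
    result = {}
    for n in cutoffs:
        top = ranked[:n]
        if not top:
            result[n] = 0.0
            continue
        found = sum(1 for t in top if t in knowledge_base)
        # Divide by the number actually ranked, which may be fewer than n.
        result[n] = found / len(top)
    return result
```

Reporting coverage at several cutoffs rather than as a single number separates the frequently used transforms, which the database should contain, from the long tail of rare ones.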

8.Assessing MMPs as a Molecule Generation Tool

  • Three tests were used to assess the performance of the molecule generators used at GSK, including an MMP-based molecule generator.
    • The first test explored the ability of the algorithms to reproduce ideas generated by a team of medicinal chemists.
    • The second test explored whether the additional ~10^3 molecules generated by the algorithms were considered good ideas by the medicinal chemists.
    • The third test assessed the ability of the algorithms to generate the molecules in legacy drug discovery programs from a single starting molecule in the series.
  • The tests compared three in-house molecule generators (Fig. 6):
    • BioDig: a matched molecular pair-based algorithm described earlier in this chapter.
    • BRICS: a fragment replacement-based algorithm.
    • RG2Smi: a language processing machine learning algorithm that translates a reduced graph input to a SMILES output.
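The first two tests can be framed as simple set operations between the chemists' ideas and each generator's output. The sketch below is one possible way to compute this, not the procedure used in the chapter: it assumes every molecule has already been converted to a canonical SMILES string upstream so that string equality means structural identity, and the function name is hypothetical.

```python
def compare_to_human_ideas(human_ideas, generated):
    """Compare a generator's output with ideas proposed by medicinal chemists.

    Both inputs are iterables of canonical SMILES strings (canonicalization is
    assumed to have happened before this call).
    """
    human = set(human_ideas)
    machine = set(generated)
    reproduced = human & machine
    return {
        # First test: which human ideas the generator reproduced.
        "reproduced": reproduced,
        "recall": len(reproduced) / len(human) if human else 0.0,
        # Input to the second test: molecules the chemists did not propose,
        # to be reviewed by them afterwards.
        "additional": machine - human,
    }
```

Running the same function once per generator gives directly comparable recall values, while the `additional` sets are what the chemists would be asked to judge.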

9.First Test - Human Inclusion

[Image 8 of the original post]

10.Second Test - Human Imitation

[Image 9 of the original post]

11.Third Test - Legacy Projects

[Image 10 of the original post]

12.Conclusion

  • MMP analysis has emerged as a key method in the medicinal chemistry toolbox and there are many examples of publicly available algorithms and applications. Many companies have worked to summarize MMPs into databases of transforms.
