Reading notes on "Artificial Intelligence in Drug Design"
- Hits with promising potency and ADMET properties are chosen as "lead compounds"; these then need to be optimized for potency and selectivity while maintaining an appropriate ADMET profile. This process is often ineffective at finding molecules with the right pharmacodynamic and pharmacokinetic properties in patients.
- Datasets problem:
What characterizes ligand-based VS is that it does not use any information about the receptor; it can therefore be applied to a broader range of problems.
QSAR and ML models are similar: both are supervised methods that identify patterns in molecular data to learn a target signal.
In addition to fitting complicated target functions, ML models allow a greater breadth of molecular representations, some of which are abstract but important, such as molecule-induced transcriptomic signatures or cell painting imaging profiles.
In addition to using new biological data types, another advantage of some ML models is that they can learn their own representation from a dataset. A VAE's latent space is usually smaller than its input, so the model is forced to find a compressed representation of the input.
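The bottleneck idea can be sketched in a few lines. This is an illustrative toy, not a trained model: the dimensions, the fingerprint input, and the "encoder" (whose outputs are random placeholders) are all invented for the sketch; only the shape bookkeeping and the reparameterization step mirror a real VAE.

```python
import math
import random

# Toy VAE-style bottleneck: a wide molecular representation (e.g. a
# 2048-bit fingerprint) is mapped to a much smaller latent vector,
# forcing the model to find a compressed representation of the input.

INPUT_DIM = 2048   # e.g. an ECFP fingerprint length (assumed)
LATENT_DIM = 64    # bottleneck: far smaller than the input

def encode(x, rng):
    """Placeholder encoder: emit a mean and log-variance per latent unit."""
    mean_activity = sum(x) / len(x)
    mu = [mean_activity + rng.gauss(0, 0.01) for _ in range(LATENT_DIM)]
    logvar = [rng.gauss(-2.0, 0.1) for _ in range(LATENT_DIM)]
    return mu, logvar

def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps (the reparameterization trick)."""
    return [m + math.exp(lv / 2) * rng.gauss(0, 1) for m, lv in zip(mu, logvar)]

rng = random.Random(0)
fingerprint = [rng.randint(0, 1) for _ in range(INPUT_DIM)]
mu, logvar = encode(fingerprint, rng)
z = reparameterize(mu, logvar, rng)
print(len(fingerprint), "->", len(z))  # 2048 -> 64
```

The compression (2048 -> 64 here) is what forces the latent space to capture only the salient structure of the input distribution.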
Some ML models accept flexible training regimes, which presents another opportunity to improve ligand-based methods. Several approaches build on the idea of shared knowledge, such as transfer learning, multitask learning, and self-supervised learning.
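The shared-knowledge idea can be illustrated with a minimal multitask sketch (entirely invented, not from the book): two related property-prediction tasks share one slope parameter while each keeps its own bias, so gradient updates from both tasks pool into the shared parameter.

```python
# Minimal multitask learning sketch: two toy tasks share a weight w,
# each has its own bias. Both tasks' errors update the shared w.

def train_multitask(tasks, lr=0.01, epochs=500):
    w = 0.0                      # shared parameter
    biases = [0.0] * len(tasks)  # task-specific parameters
    for _ in range(epochs):
        for t, data in enumerate(tasks):
            for x, y in data:
                err = (w * x + biases[t]) - y
                w -= lr * err * x       # shared update: pools knowledge
                biases[t] -= lr * err   # task-specific update
    return w, biases

# Two toy tasks with the same underlying slope (2.0) but different offsets.
task_a = [(x, 2.0 * x + 1.0) for x in range(5)]
task_b = [(x, 2.0 * x - 1.0) for x in range(5)]
w, (b_a, b_b) = train_multitask([task_a, task_b])
```

After training, w approaches the shared slope 2.0 while the biases diverge toward their task-specific offsets, which is the essence of hard parameter sharing.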
Flexibility brings practical difficulties, too. The high degree of choice makes it difficult to decide which model is most promising for a specific problem. Part of the challenge is that it is difficult to diagnose why an ML method succeeds.
Early rule-based CASP programs used hand-coded transformation rules describing reaction centers and the functional groups that may affect reactivity. Later rule-based CASP programs instead extract transformation rules automatically from reaction databases.
Rule-based CASP programs have undergone relatively thorough prospective validation, with evidence of successful application.
Forward reaction prediction: ML models need an appropriate output representation that translates to a product molecular structure. Coley et al. tackled this by modeling the task as predicting the edits made to reaction-center atoms and by augmenting the data with chemically plausible negative examples. In later work, Coley et al. used a Weisfeiler-Lehman network, and the model performed on par with human chemists. ML algorithms inspired by natural language processing, which frame the task as a translation problem, circumvent the need for negative data or reaction templates. The approach of Schwaller et al. outperformed human chemists on a benchmark set of 80 reactions.
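In the translation framing, a reaction is written as a "source sentence" of reactant SMILES and a "target sentence" of product SMILES. Only the preprocessing is sketched below; the regex loosely follows common SMILES tokenization patterns (the exact pattern and the esterification example are my own simplifications, and a real tokenizer covers many more symbols).

```python
import re

# Tokenize reaction SMILES for an NLP-style (seq2seq) reaction model.
# A seq2seq model would then learn the source -> target mapping.
SMILES_TOKEN = re.compile(
    r"(\[[^\]]+\]|Br|Cl|[BCNOSPFI]|[bcnosp]|\(|\)|=|#|\d|\.|>>)"
)

def tokenize(smiles):
    """Split a (reaction) SMILES string into model tokens."""
    return SMILES_TOKEN.findall(smiles)

reaction = "CCO.CC(=O)O>>CC(=O)OCC"   # esterification, schematically
src, tgt = reaction.split(">>")
print(tokenize(src))   # reactant-side tokens
print(tokenize(tgt))   # product-side tokens
```

Because every character is covered by some token class here, joining the tokens reconstructs the original string, a useful sanity check for any SMILES tokenizer.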
Yield prediction: it is much more challenging to predict how successful a reaction will be in the context of its reagents and conditions. Schwaller et al. extended NLP-style ML models to predict reaction yields; performance was acceptable on clean, curated datasets of specific reaction types, but dropped significantly on noisier datasets.
Condition prediction: several works (e.g., Marcou et al. and Coley et al.) emphasize the difficulty of predicting reaction conditions, owing not only to incomplete data but also to modeling such flexible parameters, and even to evaluating such models.
Retrosynthetic strategy: the problem is to search the vast space of retrosynthetic possibilities for suitable synthetic routes. Schwaller et al. proposed a new metric called round-trip accuracy, and Segler et al. tackled the search problem with Monte Carlo tree search (an RL approach) in combination with neural networks.
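The round-trip accuracy idea can be sketched as follows: a retrosynthesis model proposes precursors for a target, a forward model predicts the product of those precursors, and the prediction counts as correct if the original target is recovered. The stand-in "models" below are lookup tables invented for illustration, not real predictors.

```python
# Round-trip accuracy sketch: retro proposes precursors, forward checks
# whether those precursors regenerate the original target.

def round_trip_accuracy(targets, retro_model, forward_model):
    hits = 0
    for target in targets:
        precursors = retro_model(target)
        if forward_model(precursors) == target:
            hits += 1
    return hits / len(targets)

# Toy stand-in models backed by one known reaction (product -> precursors).
KNOWN = {"CC(=O)OCC": "CCO.CC(=O)O"}

retro = lambda product: KNOWN.get(product, "?")
forward = lambda precursors: {v: k for k, v in KNOWN.items()}.get(precursors, "?")

# One recoverable target, one the toy models know nothing about.
print(round_trip_accuracy(["CC(=O)OCC", "c1ccccc1"], retro, forward))  # 0.5
```

The appeal of the metric is that it does not require a single "ground truth" precursor set; any precursors that the forward model maps back to the target are accepted.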
The most challenging aspect of de novo molecule generation is that in many cases the endpoint is not explicitly known. Conceptually, this is usually addressed by an iterative design process known as the design-make-test-analyze (DMTA) cycle.
As described by Schneider et al., earlier models had to address three main objectives: how to build a chemical structure, how to evaluate molecule quality, and how to explore chemical space efficiently. One example is growing structures fragment by fragment (generation), conditioned on a receptor binding pocket, its steric constraints, and its hydrogen-bonding sites (evaluation), while using a depth- or breadth-first algorithm to explore the possibilities (searching), as in Skeletons.
A common theme among scoring-function pitfalls, one that is also relevant outside generative models, is the large disconnect between the property endpoint and the scoring-function proxy.
The behavior of generative models when optimizing against a scoring function can further exacerbate its limitations. At some point during training, the model is likely to propose molecules outside the scoring function's domain of applicability, resulting in aberrant predictions; this is why evaluating model confidence is so important.
Optimizing all desired properties simultaneously, i.e., multiparameter/multi-objective optimization (MPO/MOO), is difficult and often neglected in publications.
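One common way to turn an MPO problem into a single score is a weighted geometric mean of per-property desirabilities; the properties, weights, and numbers below are invented for illustration. A geometric mean (rather than an arithmetic one) ensures a molecule cannot buy its way past one terrible property with excellent scores elsewhere.

```python
# MPO sketch: combine per-property scores in [0, 1] into one desirability
# value via a weighted geometric mean.

def geometric_mpo(scores, weights):
    """A near-zero score drags the whole product down, so the aggregate
    rewards molecules that satisfy ALL objectives simultaneously."""
    assert len(scores) == len(weights)
    total_w = sum(weights)
    product = 1.0
    for s, w in zip(scores, weights):
        product *= max(s, 1e-9) ** (w / total_w)  # guard against 0 ** x edge cases
    return product

# A balanced profile beats a lopsided one, even with a lower best score.
balanced = geometric_mpo([0.8, 0.7, 0.9], [1, 1, 1])
lopsided = geometric_mpo([1.0, 1.0, 0.05], [1, 1, 1])
print(round(balanced, 3), round(lopsided, 3))
```

This scalarization is only one option; Pareto-based approaches avoid choosing weights at all, at the cost of returning a front of candidates rather than a single ranking.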
Figure 7 demonstrates the difficulty of measuring generative-model performance.
Furthermore, such molecules could likely also be found using traditional drug-design approaches, raising further concern about the real-world advantage of generative models over traditional methods.
Model evaluation should focus more on the context of prospective application, so that evaluation is more interpretable when considering the integration of generative de novo design into real-world projects.
It is difficult enough to robustly measure individual properties such as molecular diversity. However, progress is being made on more interpretable metrics, such as Zhang et al.'s use of GDB-13.
It can also be unclear to what extent model performance is due to expert intervention, as with Zhavoronkov et al.'s model GENTRL. This aspect is often overlooked.
How generative models compare to non-AI methods has yet to be determined.
The premise of generative models is faster, higher-quality drug design in silico. There is some evidence for the former, while the latter remains an open research goal with greater potential impact.
It is also worth noting that efficiency gains from generative models may result in less experimental screening, which could negatively impact data collection; data availability (for optimization) is one of the current limiting factors for generative models.
The predictions of ML models are subject to multiple sources of heterogeneous errors.
Predictive uncertainty can be split theoretically into two components: aleatoric uncertainty (irreducible noise inherent in the data) and epistemic uncertainty (reducible uncertainty due to the model's limited knowledge of the problem).
To identify predictions with high epistemic uncertainty, a model's domain of applicability can be defined.
In order to account for variations in uncertainty across the training data, the boundaries of the domain can be refined by considering the local performance of the model.
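A simple, commonly used applicability-domain heuristic (my own generic sketch, not a specific method from the book) declares a query "in domain" if its distance to the nearest training example is below some threshold calibrated on the training set:

```python
# Distance-based applicability domain: flag queries far from training data,
# whose predictions carry high epistemic uncertainty.

def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nearest_train_distance(query, train):
    return min(euclidean(query, t) for t in train)

def in_domain(query, train, threshold):
    return nearest_train_distance(query, train) <= threshold

# Toy 2-D "descriptor space" with four training molecules.
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
print(in_domain((0.5, 0.5), train, threshold=1.0))    # near the data: True
print(in_domain((10.0, 10.0), train, threshold=1.0))  # far outside: False
```

Refining the boundary by local model performance, as the note above suggests, would replace the single global threshold with one that varies across descriptor space.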
Estimating the total prediction uncertainty of a model is important. Given a model, there are frequentist methods that transform the model's predictions into predictive uncertainties using a held-out validation set, such as conformal regression and the Venn-ABERS approach.
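Split conformal regression can be sketched in a few lines, under simplifying assumptions (absolute residuals as the nonconformity score; the toy model and calibration data are invented): the (1 - alpha) quantile of calibration residuals gives an interval half-width with approximate coverage 1 - alpha for exchangeable data.

```python
import math

# Split conformal regression sketch: calibrate an interval half-width from
# held-out residuals of any point-prediction model.

def conformal_half_width(model, calibration, alpha=0.1):
    residuals = sorted(abs(y - model(x)) for x, y in calibration)
    n = len(residuals)
    # take the ceil((n + 1) * (1 - alpha))-th smallest residual, clipped to n
    k = min(n, math.ceil((n + 1) * (1 - alpha)))
    return residuals[k - 1]

model = lambda x: 2.0 * x          # pretend this was trained elsewhere
calibration = [(x, 2.0 * x + r) for x, r in
               zip(range(10), [0.1, -0.2, 0.05, 0.3, -0.1,
                               0.15, -0.25, 0.2, -0.05, 0.12])]
q = conformal_half_width(model, calibration, alpha=0.1)
x_new = 4.0
print((model(x_new) - q, model(x_new) + q))  # ~90% predictive interval
```

The appeal of conformal methods is that they wrap any underlying model; the price is a fixed-width interval in this simple variant (normalized variants adapt the width per query).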
As an alternative to these frequentist methods for uncertainty estimation, Bayesian models can be used.
Manual checking of predictions for concordance with external data and the literature is needed. Consequently, a user needs to understand how and why a model has made a particular prediction.
The predictions of an interpretable model need to be reducible to a small number of key parameters.
As an alternative to construction of interpretable models, external explanation methods which associate selected inputs and outputs of complex models to construct a simpler “metamodel” have been proposed.
One way of interpreting an ML model in drug design is to assess which features of an input example have most strongly affected a model’s decision; this is known as feature attribution. The ease with which this can be done depends on the model used.
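For models where attribution is not available analytically, a model-agnostic occlusion sketch works on any black box: zero out each feature in turn and record the change in the model's output as that feature's importance. The "model" below is a toy linear scorer invented for illustration (which is why the attributions exactly recover its weights).

```python
# Occlusion-style feature attribution: importance of feature i is the drop
# in model output when feature i is zeroed out.

def occlusion_attribution(model, x):
    base = model(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = 0.0              # "remove" feature i
        scores.append(base - model(occluded))
    return scores

# Toy black-box model: a fixed linear scorer over three descriptors.
model = lambda x: 3.0 * x[0] + 0.5 * x[1] - 2.0 * x[2]

scores = occlusion_attribution(model, [1.0, 1.0, 1.0])
print(scores)  # for a linear model this recovers the weights: [3.0, 0.5, -2.0]
```

For nonlinear models the attributions are only local (they depend on the input being explained), which is exactly why such explanations must be interpreted with care.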
Another way in which AI systems can justify their decisions is to link input examples to relevant training examples. Evaluating similarity over the representations learned by neural networks can identify molecules that are processed similarly by the network; this is a promising way to identify potential analogues. To justify inference of the properties of an unknown chemical, structurally similar chemicals with known properties must be presented, and the relationship between structure and properties for these molecules must be described.
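Similarity over learned representations is typically just cosine similarity between embedding vectors; the embeddings below are hand-made placeholders for what a trained network would produce, and the molecule names are invented.

```python
# Find the training molecule whose learned embedding is most similar to a
# query embedding, as a candidate "analogue" explanation.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def nearest_analogue(query_emb, train_embs):
    return max(train_embs, key=lambda item: cosine(query_emb, item[1]))[0]

# Placeholder (name, embedding) pairs standing in for network outputs.
train_embs = [("mol_A", [0.9, 0.1, 0.0]),
              ("mol_B", [0.0, 1.0, 0.2]),
              ("mol_C", [0.1, 0.0, 1.0])]
print(nearest_analogue([1.0, 0.0, 0.1], train_embs))  # mol_A
```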
The potential of AI in drug design is slowly approaching reality, but much work remains to be done.
Several applications of AI to drug design have yet to achieve their full potential.