Reading notes on "Artificial Intelligence in Drug Design"
Historically, assay definitions and upload procedures were often set up to allow direct consumption of the data by the requesting research project, but not with further usage in mind.
In March 2016, a consortium of scientists published a paper outlining four foundational principles (Findability, Accessibility, Interoperability, and Reusability), abbreviated as the FAIR principles, which describe the process of making data FAIR ("FAIRification").
Close communication with the experimentalists is of utmost importance for the data scientist during the data preprocessing stage.
An assay is composed of four components: a biological or physicochemical test system, a detection method, the technical infrastructure, and finally data analysis and processing.
It might be necessary to cleave off the leaving groups of prodrugs in case the experimental property is determined for the pharmacologically active substance.
Inconsistent hydrogen treatment may result in differing descriptor values.
Most of the descriptor packages cannot cope with stereochemistry anyhow, and therefore stereocenters are flattened.
One may, in the case of modeling target affinity profiles, additionally apply structure filters for frequent hitters, such as PAINS or "Hit Dexter", to avoid noise due to unspecific binding.
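A minimal preprocessing sketch with RDKit, illustrating the steps mentioned above (salt stripping, flattening stereocenters, flagging PAINS-type frequent hitters); the choice and order of steps, and the example SMILES, are my own assumptions, not the book's or MELLODDY's pipeline.

```python
# Illustrative preprocessing sketch with RDKit (not a production pipeline):
# strip salts, flatten stereocenters, and flag frequent hitters via PAINS filters.
from rdkit import Chem
from rdkit.Chem.SaltRemover import SaltRemover
from rdkit.Chem import FilterCatalog

remover = SaltRemover()
params = FilterCatalog.FilterCatalogParams()
params.AddCatalog(FilterCatalog.FilterCatalogParams.FilterCatalogs.PAINS)
pains = FilterCatalog.FilterCatalog(params)

def preprocess(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:                      # unparseable structure -> drop
        return None
    mol = remover.StripMol(mol)          # remove common counter-ions
    Chem.RemoveStereochemistry(mol)      # flatten stereocenters (see note above)
    if pains.HasMatch(mol):              # frequent-hitter alert -> flag/drop
        return None
    return Chem.MolToSmiles(mol)         # canonical SMILES of the parent structure

print(preprocess("C[C@H](N)C(=O)O.Cl"))  # e.g. alanine hydrochloride
```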
The European Union-funded MELLODDY consortium, part of the Innovative Medicines Initiative (IMI), has developed and published an end-to-end open-source tool for the process described above, named MELLODDY_tuner. The tool is used to standardize the data needed for the project's goal of federated and privacy-preserving machine learning, leveraging the world's largest collection of small molecules with known biochemical or cellular activity to enable more accurate predictive models and increase efficiency in drug discovery.
Combining data from different sources poses further challenges; one alternative is to establish a multitask ML model that predicts the values of one assay and uses the other assay variant as a helper task.
There are three categories of data that require curation:
Random Forests have long been the method of choice in Bayer’s ADMET platform for several reasons:
The advantages of deep neural networks in Drug Discovery appear to be that
Here is a classification scheme from Wikipedia, which defines five main classes:
1D-descriptors (i.e., lists of structural fragments, fingerprints). Our work-horse descriptors for more than a decade, confirmed by many publications, are circular extended connectivity fingerprints (ECFP), which encode properties of atoms and their neighbors into a bit vector, parameterized by a topological radius (number of bonds from the starting atom) and a feature type (element; function as donor, acceptor, etc.; or atom type); see the sketch after this list.
2D-descriptors (i.e., graph invariants). Graph-invariant 2D-descriptors such as topological or connectivity indices, at least in our hands, often yield overfitted models that work well in cross-validation but are not predictive on external test sets.
3D-descriptors (such as 3D-MoRSE descriptors, WHIM descriptors, GETAWAY descriptors, quantum-chemical descriptors, and size, steric, surface, and volume descriptors). The main issue is their dependence on the conformation, which introduces ambiguities and noise.
4D-descriptors (such as those derived from GRID or CoMFA methods, e.g. Volsurf). These approaches have the additional limitation of depending on the alignment of the ligands, which is sometimes not obvious and is only possible for congeneric series.
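To make the ECFP work-horse descriptors from the 1D item above concrete, here is a minimal sketch using RDKit's Morgan fingerprint implementation; radius 2 roughly corresponds to ECFP4, and useFeatures=True switches to the FCFP-style donor/acceptor feature invariants. The molecule and the bit-vector length are arbitrary choices for illustration.

```python
# Sketch: ECFP-like circular fingerprints via RDKit's Morgan algorithm.
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")          # aspirin
ecfp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048)
fcfp = AllChem.GetMorganFingerprintAsBitVect(mol, radius=2, nBits=2048,
                                             useFeatures=True)  # feature invariants

x = np.array(ecfp)        # bit vector usable directly as ML input
print(x.shape, x.sum())   # (2048,) and the number of set bits
```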
There are now public databases and model repositories for compound collections, such as QsarDB, the Danish (Q)SAR Models database, the QSAR Toolbox, etc.
Actually, the perception of learning the optimal representation directly from the molecule is not exactly correct, because the SMILES or InChI typically used as structure input is already an abstract representation (i.e., a reduction) of the molecule.
Winter et al. applied an encoder-decoder (autoencoder-like) concept to learn a fixed set of continuous data-driven descriptors, the CDDD, by translating random SMILES into canonical SMILES during training. The resulting descriptor is based on approximately 72 million compounds from the ZINC and PubChem databases. The validity of the approach was tested by model performance on eight QSAR datasets and by application to virtual screening; it showed performance similar to various human-engineered descriptors and graph-convolutional models.
Machine learning problems concerned with the reactivity of atoms, such as reaction rates and regioselectivity, pKa values, the prediction of metabolic fate, or hydrogen-bonding interactions, require encoding the properties of the atoms and their surroundings into specialized atom descriptors. In many applications the descriptor values are retrieved directly from quantum-chemical calculations.
There are also examples of well-performing classical neighborhood-encoding atom descriptors for SoM prediction and regioselectivity in Diels-Alder reactions.
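As a deliberately simple stand-in for such atom descriptors, the sketch below assembles per-atom features with RDKit: Gasteiger partial charges as a cheap surrogate for quantum-chemically derived electron density, plus a few topological counts. The feature selection is my own illustration, not a descriptor set from the book.

```python
# Sketch: simple per-atom descriptors (charge + topological counts).
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.MolFromSmiles("c1ccccc1O")          # phenol
AllChem.ComputeGasteigerCharges(mol)           # stores charges on the atoms

for atom in mol.GetAtoms():
    features = {
        "symbol":   atom.GetSymbol(),
        "charge":   float(atom.GetProp("_GasteigerCharge")),  # electron-density proxy
        "degree":   atom.GetDegree(),          # number of heavy-atom neighbours
        "n_h":      atom.GetTotalNumHs(),      # crude steric/accessibility proxy
        "aromatic": atom.GetIsAromatic(),
    }
    print(atom.GetIdx(), features)
```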
Common metrics for regression models include R², root mean square error (RMSE), and Spearman's rho.
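A quick sketch of these three regression metrics on dummy observed/predicted values, using scikit-learn and SciPy; the numbers are made up.

```python
# Sketch: R2, RMSE and Spearman's rho on toy data.
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error
from scipy.stats import spearmanr

y_obs  = np.array([5.1, 6.3, 4.8, 7.0, 5.9])
y_pred = np.array([5.4, 6.0, 5.0, 6.6, 6.1])

r2   = r2_score(y_obs, y_pred)
rmse = np.sqrt(mean_squared_error(y_obs, y_pred))
rho, _ = spearmanr(y_obs, y_pred)
print(f"R2={r2:.2f}  RMSE={rmse:.2f}  rho={rho:.2f}")
```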
Common metrics for assessing the quality of classification models are derived from the confusion matrix, also called the contingency matrix.
In the case of highly imbalanced datasets, accuracy can be misleading: if the model always predicts the more highly populated class, it will achieve high accuracy without being predictive.
Specificity, or true negative rate, is the proportion of observed negatives that are predicted as such, while sensitivity, also called true positive rate or recall, is the proportion of observed positives that are predicted correctly. Another metric focusing more on the predictions than on the observed values is the positive predictive value, also called precision, which gives the proportion of correctly predicted positives out of all predicted positives. The F-score is the harmonic mean of precision and sensitivity.
The Matthews correlation coefficient (MCC) is the geometric mean of the regression coefficients of the problem and its dual (in essence a correlation coefficient between observed and predicted classes) and is also suitable for classification problems with imbalanced class distributions.
Cohen’s kappa is also a good measure that can handle imbalanced class distributions and shows how much better the classifier is compared to a classifier that would guess randomly according to the frequency of each class.
Another popular tool is the receiver operating characteristic (ROC) graph, used to visualize the performance of a classification algorithm. The area under the ROC curve (ROC AUC) is the numerical metric used to summarize the ROC curve.
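The sketch below derives the metrics discussed above from a toy confusion matrix with scikit-learn; the labels and scores are made up to mimic an imbalanced dataset.

```python
# Sketch: classification metrics derived from a confusion matrix.
from sklearn.metrics import (confusion_matrix, f1_score, matthews_corrcoef,
                             cohen_kappa_score, roc_auc_score)

y_obs   = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]    # imbalanced toy labels
y_pred  = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]    # hard class predictions
y_score = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1, 0.2, 0.3, 0.6, 0.1]  # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_obs, y_pred).ravel()
sensitivity = tp / (tp + fn)                 # true positive rate / recall
specificity = tn / (tn + fp)                 # true negative rate
precision   = tp / (tp + fp)                 # positive predictive value
print(sensitivity, specificity, precision)
print("F1:",      f1_score(y_obs, y_pred))
print("MCC:",     matthews_corrcoef(y_obs, y_pred))
print("kappa:",   cohen_kappa_score(y_obs, y_pred))
print("ROC AUC:", roc_auc_score(y_obs, y_score))  # uses scores, not hard labels
```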
The process accepted as best practice for machine learning, which has developed over the last 20 years and is now generally applied, is described in current reviews and outlined in detail in the respective OECD guideline.
Validation strategies broadly applied are cross-validation, bootstrapping and Y-scrambling.
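A minimal sketch of cross-validation combined with Y-scrambling (response permutation) as a sanity check, assuming a scikit-learn random forest on synthetic data; a sound model should lose its apparent predictivity once the labels are shuffled.

```python
# Sketch: cross-validation plus a Y-scrambling sanity check on synthetic data.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=50, noise=10, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0)

cv_real = cross_val_score(model, X, y, cv=5, scoring="r2").mean()

rng = np.random.default_rng(0)
cv_scrambled = cross_val_score(model, X, rng.permutation(y), cv=5,
                               scoring="r2").mean()          # shuffled labels

print(f"cross-validated R2: {cv_real:.2f}, after Y-scrambling: {cv_scrambled:.2f}")
```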
By sharing parameters in (some of) their hidden layers between all tasks, multitask neural networks force the learning of a joint representation of the input that is useful to all tasks.
The main advantages of multitask learning are
Multitask effects are highly dataset-dependent, which suggests the use of dataset-specific models to maximize overall performance.
In the author's work, the multitask graph convolutional network performed on par with or better than the single-task graph convolutional network and outperformed single-task random forests or neural networks with circular fingerprint descriptors, especially in the case of solubility, where the improvement was a breakthrough.
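For illustration, a minimal multitask network with shared hidden layers and one output head per task, written in PyTorch; this is a generic sketch under my own assumptions (fingerprint-sized input, ten tasks), not the architecture used in the cited work.

```python
# Sketch: shared hidden layers across tasks, one output head per task/assay.
import torch
import torch.nn as nn

class MultitaskNet(nn.Module):
    def __init__(self, n_features=2048, n_hidden=512, n_tasks=10):
        super().__init__()
        self.shared = nn.Sequential(            # joint representation for all tasks
            nn.Linear(n_features, n_hidden), nn.ReLU(),
            nn.Linear(n_hidden, n_hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(             # task-specific output heads
            [nn.Linear(n_hidden, 1) for _ in range(n_tasks)]
        )

    def forward(self, x):
        h = self.shared(x)
        return torch.cat([head(h) for head in self.heads], dim=1)  # (batch, n_tasks)

x = torch.randn(32, 2048)                       # e.g. a batch of fingerprint vectors
print(MultitaskNet()(x).shape)                  # torch.Size([32, 10])
```

In practice the loss is usually masked so that each compound only contributes to the tasks for which it actually has a measurement, since the label matrix is sparse.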
Any oral drug first passes through the liver before entering the rest of the body. Metabolic transformations occur in two phases. In phase I, mostly cytochrome P450 enzymes increase polarity through oxidative and reductive reactions. In phase II, a plethora of enzymes such as UDP-glucuronosyltransferases, sulfotransferases, or glutathione S-transferases conjugate specific fragments onto the phase I metabolites for renal excretion.
The high effort and limited experimental assay capacity for the identification of SoMs have led to many computational approaches over the last 20 years, applying docking, molecular dynamics, quantum chemistry calculations, and machine learning, with and without incorporation of protein target information. The reader is referred to Kirchmair et al. for a comprehensive overview of experimental and computational approaches.
The lability of atoms with regard to metabolic reactions is determined by their chemical reactivity, i.e., the local electron density, and by the steric accessibility of the respective atoms, necessitating atomic descriptors instead of molecular ones, as well as machine learning on atoms instead of molecules.
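To make the atoms-as-instances idea tangible, the sketch below turns every heavy atom of a molecule into one feature row (Gasteiger charge as a crude reactivity proxy, neighbor and hydrogen counts as crude accessibility proxies); the features are illustrative only, and a real SoM model would attach experimentally derived per-atom labels.

```python
# Sketch: atoms, not molecules, as the machine-learning instances for SoM prediction.
from rdkit import Chem
from rdkit.Chem import AllChem
import numpy as np

def atom_features(smiles):
    mol = Chem.MolFromSmiles(smiles)
    AllChem.ComputeGasteigerCharges(mol)
    rows = []
    for atom in mol.GetAtoms():
        rows.append([
            float(atom.GetProp("_GasteigerCharge")),   # local electron density proxy
            atom.GetDegree(),                          # heavy-atom neighbours
            atom.GetTotalNumHs(),                      # attached hydrogens
            int(atom.GetIsAromatic()),
        ])
    return np.array(rows)                              # one row per heavy atom

X = atom_features("CCOc1ccccc1")
print(X.shape)                                         # (n_atoms, 4)
# A real SoM model would add a 0/1 label per atom (metabolized or not)
# and train a per-atom classifier, e.g. a random forest.
```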
The complex multi-parameter optimization necessary to find the best compromise among many optimization parameters is a key challenge in drug discovery projects.
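One common, simple way to formalize such a compromise is a weighted desirability score that maps each predicted property onto [0, 1] and averages the contributions; the property names, target ranges, and weights below are invented for illustration and do not come from the book.

```python
# Sketch: a weighted desirability score for multi-parameter optimization.
import numpy as np

def desirability(value, low, high):
    """1.0 inside the preferred range, linearly decaying to 0 outside."""
    if low <= value <= high:
        return 1.0
    dist = min(abs(value - low), abs(value - high))
    return max(0.0, 1.0 - dist / (high - low))

def mpo_score(props, targets, weights):
    d = [desirability(props[k], *targets[k]) for k in targets]
    w = [weights[k] for k in targets]
    return float(np.average(d, weights=w))   # a weighted geometric mean is also common

# Hypothetical target ranges and weights for three predicted properties.
targets = {"logD": (1.0, 3.0), "solubility_uM": (50, 1000), "clearance": (0, 30)}
weights = {"logD": 1.0, "solubility_uM": 2.0, "clearance": 1.0}
compound = {"logD": 3.4, "solubility_uM": 120, "clearance": 25}
print(mpo_score(compound, targets, weights))
```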
In this section, we sketch a project situation from 2016, showing by example how to tackle the prioritization of compounds from a large virtual chemical space by combining cheminformatics and physics-based approaches.
The willingness to share more of those data, in conjunction with the use of blockchain technologies, enables the privacy-preserving exchange of data among many pharmaceutical companies, increasing the data basis for models by several orders of magnitude.
Some of those machine learning models have reached a quality that allows experimental measurements to be significantly reduced or even halted. However, this does not hold for all endpoints; despite the availability of large homogeneous datasets, some ADMET endpoints still cannot be modeled with sufficient quality. Phys-chem/ADMET properties as well as chemical synthesizability are mainly modeled with data-based approaches such as machine learning.
For pharmacological endpoints, the data are sparse, and only a smaller fraction (typically <30%) of a larger, diverse drug target portfolio will be covered by ML models with sufficient predictivity. Pharmacological activity is therefore often addressed with protein-structure-based approaches.