paper read 1
Created by: 银晗 张
Created time: May 27, 2023 3:47 PM
Tags: Product
- Brush up on the biology of proteins
- Learn how diffusion models work
Method & Innovations
- Framework
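- Among the first deep learning models to perform antibody sequence-structure design that takes the 3D structure of the antigen into account
- Designs not only protein sequences and coordinates but also side-chain orientations; it is the first to achieve atomic-resolution antibody design, and it is equivariant to rotation and translation
- Applicable to several antibody design tasks: sequence-structure co-design, fixed-backbone CDR design, and antibody optimization
Proposed Method: a diffusion-based generative model that jointly samples antibody CDR sequences and structures
- The joint distribution of CDR sequences and structures depends directly on the antigen structure, so the task is: given a protein complex consisting of the antigen and the antibody framework as input, obtain the structures of the CDRs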
- Differences from previous work
Traditional Computational Antibody Design Problems:
- The search space of CDRs is vast: a CDR of length L has 20^L possible sequences
- Search is time-consuming and prone to getting stuck in local optima
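To get a feel for how quickly the 20^L search space grows, here is a small illustrative calculation. The function name `cdr_search_space` and the example lengths are chosen for illustration only and are not taken from the paper.

```python
def cdr_search_space(length: int, alphabet_size: int = 20) -> int:
    """Number of distinct amino acid sequences of a given length."""
    return alphabet_size ** length


if __name__ == "__main__":
    # Example lengths are illustrative, not taken from the paper.
    for length in (5, 10, 15):
        print(f"L={length:2d}: {cdr_search_space(length):.3e} sequences")
```

Even a loop of 10 residues already gives about 10^13 candidate sequences, which is why exhaustive or purely sampling-based search is impractical.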
Generative model challenges:
- How to model the intrinsic relation between CDR sequences and 3D structures
- How to model the distribution of CDRs conditional on the rest of the antibody sequence
- The model should be explicitly conditioned on the 3D structure of the antigen and generate CDRs that fit the antigen structure in 3D space
- The model should consider both the position and the orientation of amino acids
- Beyond de novo design, the model should be applicable to another realistic scenario: optimizing a particular antibody to increase its binding affinity to the antigen
Related Diffusion-Based Generative Models
- Sequence-based methods can only generate new antibodies based on previously observed antibodies and can hardly generate antibodies for specific antigen structures
- Protein structure prediction algorithms: MSAs, AlphaFold2
- Diffusion models: denoising from a prior distribution; molecular 3D structure
Model Steps:
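- Initialize the CDR with arbitrary sequences, positions, and orientations. The diffusion model first aggregates information from the antigen and the antibody framework
- Iteratively update the amino acid type, position, and orientation (the side-chain orientation) of each amino acid on the CDRs
- Based on the predicted orientations, reconstruct the CDR structure at the atomic level with a side-chain packing algorithm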
- What insights would the proposed approach bring?
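SO(3) Denoising for Amino Acid Orientations:
s: amino acid types, x: $C_\alpha$ coordinates, O: orientations
- An isotropic Gaussian distribution on SO(3); noise is applied by changing the rotation angle
- A neural network denoises the orientations and outputs the denoised orientation matrix
- The objective is the discrepancy between the true and the predicted orientation matrices, measured through their matrix (inner) product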
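A minimal sketch of such an orientation objective, assuming it takes the form $\| (O^0)^\top \tilde{O} - I \|_F^2$ (my reading of the notes, not a verified transcription of the paper). The helper names `matmul3`, `transpose3`, `rot_z`, and `orientation_loss` are hypothetical.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose3(a):
    """Transpose of a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_z(theta):
    """Rotation by angle theta (radians) about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def orientation_loss(o_true, o_pred):
    """Squared Frobenius norm of (o_true^T o_pred - I).

    Zero when the two rotations agree; for a relative rotation by
    `angle` it equals 4 - 4 * cos(angle).
    """
    rel = matmul3(transpose3(o_true), o_pred)
    return sum((rel[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(3) for j in range(3))
```

For example, `orientation_loss(rot_z(0.3), rot_z(0.3))` is zero, while two orientations that differ by a half turn give the maximum value of 8.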
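Diffusion for $C_\alpha$ Coordinates:
- The coordinates follow a normal distribution
- A varying rate: the diffusion rate $\beta$ changes with the step $t$
- A neural network predicts the Gaussian noise
- The objective is the MSE between the predicted noise and the noise that was actually added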
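The sketch below illustrates the standard Gaussian diffusion recipe that these bullets describe: noising a coordinate in closed form and scoring a noise prediction with MSE. The linear schedule, the example coordinate, and the function names are assumptions made for illustration; the paper's actual schedule and normalization may differ.

```python
import math
import random


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) over the first t steps."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def noise_position(x0, betas, t, rng):
    """Sample x_t from q(x_t | x_0) in closed form; also return the noise used."""
    a_bar = alpha_bar(betas, t)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a_bar) * x + math.sqrt(1.0 - a_bar) * e
          for x, e in zip(x0, eps)]
    return xt, eps


def noise_mse(eps_true, eps_pred):
    """Mean squared error between the true and the predicted noise."""
    return sum((a - b) ** 2 for a, b in zip(eps_true, eps_pred)) / len(eps_true)


if __name__ == "__main__":
    rng = random.Random(0)
    betas = [0.01 * (i + 1) for i in range(50)]  # illustrative linear schedule
    x0 = [1.0, -2.0, 0.5]  # made-up, already normalized C-alpha coordinate
    xt, eps = noise_position(x0, betas, t=25, rng=rng)
    # A perfect noise predictor would return eps itself, giving zero loss.
    print(noise_mse(eps, eps))
```

In training, `eps_pred` would come from the network given the noised state and the context; here it is left to the caller.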
Migrate Markov chains
MLP embedding: the single-residue embedding MLP encodes the amino acid type, torsional angles, and 3D coordinates of all heavy atoms of each amino acid into a feature vector $e_i$. The pairwise embedding MLP encodes the Euclidean distances and dihedral angles between amino acids $i$ and $j$ into feature vectors $z_{ij}$. Invariant Point Attention (IPA) then transforms $e_i$ and $z_{ij}$ into hidden representations $h_i$ that represent each amino acid and its environment
Denoise: the representations are fed to three different MLPs to denoise the amino acid types, 3D positions, and orientations of the CDR, respectively

The predicted vector is converted to a rotation matrix $M_j \in SO(3)$, which right-multiplies the current orientation to produce a new mean orientation for the next generation step: $O_j^{t-1} \leftarrow O_j^t M_j$.
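One common way to turn a 3-vector into a rotation matrix is Rodrigues' formula, treating the vector as axis times angle. The sketch below assumes that convention (the notes do not say which conversion the authors use), and the names `rotvec_to_matrix` and `update_orientation` are hypothetical.

```python
import math


def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def rotvec_to_matrix(v):
    """Rodrigues' formula: axis-angle vector -> rotation matrix in SO(3)."""
    theta = math.sqrt(sum(c * c for c in v))
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if theta < 1e-12:
        return identity  # a zero vector means no rotation
    x, y, z = (c / theta for c in v)
    k = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew-symmetric axis matrix
    k2 = matmul3(k, k)
    sin_t, one_minus_cos = math.sin(theta), 1.0 - math.cos(theta)
    return [[identity[i][j] + sin_t * k[i][j] + one_minus_cos * k2[i][j]
             for j in range(3)] for i in range(3)]


def update_orientation(o_t, v):
    """Right-multiply the current orientation by the predicted rotation."""
    return matmul3(o_t, rotvec_to_matrix(v))
```

Right-multiplication means the predicted rotation is expressed in the residue's own local frame, so applying the same vector twice from the identity composes two turns about the same local axis.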

Sampling algorithm:
- Amino acid types from the uniform distribution over the 20 types: $s_j^T \sim \mathrm{Uniform}(20)$
- $C_\alpha$ positions from the standard normal distribution: $x_j^T \sim \mathcal{N}(0, I_3)$, side-chain $C_\beta$
- Orientations from the uniform distribution over SO(3): $O_j^T \sim \mathrm{Uniform}(SO(3))$
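The three priors above can be sampled with the standard library alone. This is a sketch under assumptions: amino acid types are encoded as integers 0..19, and the uniform rotation is drawn by normalizing a 4D Gaussian into a unit quaternion, which is one standard way to sample SO(3) uniformly. The function names are hypothetical.

```python
import random


def random_rotation(rng):
    """Uniform (Haar) sample from SO(3) via a normalized Gaussian quaternion."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = sum(c * c for c in q) ** 0.5
    w, x, y, z = (c / norm for c in q)
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def sample_prior(num_residues, rng):
    """Draw the step-T state (type, position, orientation) for each CDR residue."""
    state = []
    for _ in range(num_residues):
        aa_type = rng.randrange(20)                          # s_j^T ~ Uniform(20)
        position = [rng.gauss(0.0, 1.0) for _ in range(3)]   # x_j^T ~ N(0, I_3)
        orientation = random_rotation(rng)                   # O_j^T ~ Uniform(SO(3))
        state.append((aa_type, position, orientation))
    return state
```

Calling `sample_prior(10, random.Random(0))` gives a reproducible starting state for a 10-residue CDR, which the denoising steps would then refine.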
DiffAb Experiment
Antigen-Specific Antibody Design and Optimization with Diffusion-Based Generative Models for Protein Structures
Domain words:
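- antigen: a molecule recognized by the immune system; antibody: a protein that binds a specific antigen
- complementarity-determining regions (CDRs): the variable loops of an antibody that determine which antigen it binds
- amino acids: the building blocks of proteins
- molecule, atom
- the structure of the antibody-antigen complex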

Target: to design effective therapeutic antibody structures

s: amino acid types, x: $C_\alpha$ coordinates, O: orientations
- A diffusion probabilistic model defines two Markov chains of diffusion processes
- The forward diffusion process gradually adds noise to the data until the data distribution approximately reaches the prior distribution
- The generative diffusion process starts from the prior distribution and iteratively transforms it into the desired distribution
Multinomial distribution → Gaussian distribution
- The state at any step $t$ can be expressed directly in terms of the initial state $t_0$ and $\beta$
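For the multinomial (amino acid type) case, the closed form at step $t$ can be sketched as a mix of the one-hot initial type and the uniform distribution, weighted by the cumulative product of $1 - \beta$. This follows the usual multinomial diffusion formulation and is an assumption about what the note refers to; the function names are hypothetical.

```python
def alpha_bar(betas, t):
    """Cumulative product of (1 - beta) over the first t steps."""
    prod = 1.0
    for beta in betas[:t]:
        prod *= 1.0 - beta
    return prod


def type_probs_at_step(s0, betas, t, num_types=20):
    """Class probabilities of q(s_t | s_0) for multinomial diffusion.

    Mixes the one-hot vector of the initial type with the uniform
    distribution: a_bar * onehot(s0) + (1 - a_bar) / num_types.
    """
    a_bar = alpha_bar(betas, t)
    uniform = (1.0 - a_bar) / num_types
    return [a_bar * (1.0 if k == s0 else 0.0) + uniform
            for k in range(num_types)]
```

At `t = 0` the result is exactly the one-hot vector of `s0`, and as the cumulative product goes to zero it approaches the uniform prior over the 20 types.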

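$C$: spatial structure information; $R^t$: the state of the CDR at step $t$
An all-one vector is a vector whose elements are all 1. For example, an all-one vector of length n can be written as [1, 1, 1, ..., 1]. In mathematics and computer science it is often used in matrix and vector operations: multiplying a matrix by an all-one vector gives the sum of each row of the matrix. It can also represent a set of equal weights: to compute a mean, take the dot product of the values with an all-one vector and divide by the length of the vector.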
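A tiny illustration of the two uses of an all-one vector mentioned above (row sums and the mean), written with plain lists; the helper names are made up for this example.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def mean_via_ones(values):
    """Mean as the dot product with an all-one vector, divided by the length."""
    ones = [1.0] * len(values)
    return sum(v * o for v, o in zip(values, ones)) / len(values)


matrix = [[1, 2, 3], [4, 5, 6]]
row_sums = matvec(matrix, [1, 1, 1])  # each entry is the sum of one row
```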
