First, an overview of what ProDy can do.
Original ProDy paper: ProDy: Protein Dynamics Inferred from Theory and Experiments
https://academic.oup.com/bioinformatics/article/27/11/1575/217006
2.1 Input for ProDy
The input for ProDy is the set of atomic coordinates in PDB format for the protein of interest, or simply the PDB id or sequence of the protein. Given a query protein, fast and flexible ProDy parsers are used to Blast search the PDB, retrieve the corresponding files (e.g. mutants, complexes or sequence homologs with user-defined minimal sequence identity) from the PDB FTP server and extract their coordinates and other relevant data. Additionally, the program can be used to analyze a series of conformers from molecular dynamics (MD) trajectories inputted in PDB file format or programmatically through Python NumPy arrays. More information on the input format is given at the ProDy web site tutorial and examples.
2.2 Protein ‘dynamics’ from experiments
The experimental data refer to ensembles of structures, X-ray crystallographic or NMR. These are usually heterogeneous datasets, in the sense that they have disparate coordinate data arising from sequence dissimilarities, insertions/deletions or missing data due to unresolved disordered regions. In ProDy, we implemented algorithms for optimal alignment of such heterogeneous datasets and building corresponding covariance matrices. Covariance matrices describe the mean-square deviations in atomic coordinates from their mean position (diagonal elements) or the correlations between their pairwise fluctuations (off-diagonal elements). The principal modes of structural variation are determined upon principal component analysis (PCA) of the covariance matrix, as described previously (Bakan and Bahar, 2009).
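The covariance and PCA step described above can be sketched with NumPy alone. This is a minimal illustration, not ProDy's implementation: it assumes the conformers are already superposed and have no missing atoms, it divides by the number of conformers, and the synthetic ensemble and the function name `pca_of_ensemble` are made up for the example.

```python
import numpy as np

def pca_of_ensemble(coords, n_modes=3):
    """PCA of an ensemble of shape (n_conformers, n_atoms, 3).

    Assumes the conformers are already optimally superposed.
    """
    n_conf, n_atoms, _ = coords.shape
    flat = coords.reshape(n_conf, n_atoms * 3)
    deviations = flat - flat.mean(axis=0)          # deviations from the mean structure
    covariance = deviations.T @ deviations / n_conf
    eigvals, eigvecs = np.linalg.eigh(covariance)  # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_modes]    # largest variance first
    return covariance, eigvals[order], eigvecs[:, order]

# Synthetic ensemble: 5 atoms, 50 conformers, atom 0 moving mostly along x.
rng = np.random.default_rng(0)
mean_structure = rng.normal(size=(5, 3))
direction = np.zeros((5, 3))
direction[0, 0] = 1.0
amplitudes = rng.normal(scale=2.0, size=(50, 1, 1))
noise = rng.normal(scale=0.05, size=(50, 5, 3))
ensemble = mean_structure + amplitudes * direction + noise

covariance, variances, modes = pca_of_ensemble(ensemble)
fraction = variances / np.trace(covariance)        # fraction of variance per PC
print("fraction of variance in PC1-PC3:", np.round(fraction, 3))
```

The diagonal of `covariance` holds the mean-square deviations and the off-diagonal elements hold the pairwise correlations, which is the structure the paragraph describes.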
2.3 Protein dynamics from theory and simulations
We have implemented classes for Gaussian network model (GNM) analysis and for normal mode analysis (NMA) of a given structure using the ANM (Eyal et al., 2006). Both models have been widely used in recent years for analyzing and visualizing biomolecular systems dynamics (Bahar et al., 2010). The implementation is generic and flexible. The user can (i) build the models for any set of atoms, e.g. the substrate or inhibitor can be explicitly included to study the perturbing effect of binding on dynamics, and (ii) utilize user-defined or built-in distance-dependent or residue-specific force constants (Hinsen et al., 2000; Kovacs et al., 2004). ProDy also offers the option to perform essential dynamics analysis (EDA; Amadei et al., 1993) of MD snapshots, which is equivalent to the singular value decomposition of trajectories to extract principal variations (Velazquez-Muriel et al., 2009).
Note: because the substrate or inhibitor can be included explicitly, this can be used to study the perturbing effect of inhibitor/substrate binding on dynamics.
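To make the network-model idea concrete, here is a minimal NumPy sketch of a GNM: nodes within a cutoff distance are connected by identical springs, and the modes come from diagonalizing the resulting Kirchhoff (connectivity) matrix. It is an illustration under simple assumptions (uniform force constant, toy coordinates, an arbitrary cutoff), not ProDy's own classes; the function name `gnm_modes` is hypothetical.

```python
import numpy as np

def gnm_modes(coords, cutoff=10.0, gamma=1.0):
    """Build a GNM Kirchhoff matrix and diagonalize it.

    coords: (n_nodes, 3) array, e.g. C-alpha positions in angstroms.
    cutoff and gamma are illustrative values, not recommendations.
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    kirchhoff = -gamma * contact.astype(float)           # off-diagonal: -gamma per contact
    np.fill_diagonal(kirchhoff, -kirchhoff.sum(axis=1))  # diagonal: gamma * number of contacts
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)         # ascending; first mode is trivial
    return kirchhoff, eigvals, eigvecs

# Toy chain of 6 nodes spaced 3.8 angstroms apart along x.
chain = [[3.8 * i, 0.0, 0.0] for i in range(6)]
kirchhoff, eigvals, eigvecs = gnm_modes(chain, cutoff=8.0)
print("eigenvalues:", np.round(eigvals, 3))
```

Adding ligand atoms to `coords` changes the contact pattern and therefore the modes, which is how explicit inclusion of a substrate or inhibitor perturbs the predicted dynamics.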
2.4 Dynamics analysis example
Figure 1 illustrates the outputs generated by ProDy in a comparative analysis of experimental and computational data for p38 kinase (Bakan and Bahar, 2011). Figure 1A displays the dataset of 150 X-ray crystallographically resolved p38 structures retrieved from the PDB and optimally overlaid by ProDy. The ensemble contains the apo and inhibitor-bound forms of p38, thus providing information on the conformational space sampled by p38 upon inhibitor binding. Parsing structures, building and diagonalizing the covariance matrix to determine the principal modes of structural variations takes only 38 s on Intel CPU at 3.20 GHz. Figure 1C illustrates the first principal mode of structural variation (PC1; violet arrows) based exclusively on experimental structural dataset for p38.
As to generating computational data, two approaches are taken in ProDy: NMA of a representative structure using its ANM representation (Figure 1B; color-coded such that red/blue regions refer to largest/smallest conformational mobilities); and EDA (synonymous with PCA) of MD trajectories provided that an ensemble of snapshots is provided by the user. The green arrows in Figure 1C describe the first (lowest frequency, most collective) mode predicted by the ANM, shortly designated as ANM1. The heatmap in Figure 1D shows the overlap (Marques and Sanejouand, 1995) between top-ranking PCA and ANM modes. The cumulative overlap between the top three pairs of modes is 0.73.
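The overlap between two modes is commonly taken as the absolute value of the normalized dot product of their vectors. The sketch below uses that definition, plus one common definition of cumulative overlap (root of the summed squared overlaps against a set of orthonormal modes). Whether this matches exactly how the 0.73 above was computed is an assumption; the vectors are toy data.

```python
import math

def overlap(u, v):
    """Absolute normalized dot product between two mode vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return abs(dot) / (norm_u * norm_v)

def cumulative_overlap(target, modes):
    """Root of summed squared overlaps; assumes `modes` are mutually orthogonal."""
    return math.sqrt(sum(overlap(target, m) ** 2 for m in modes))

# Toy example: a "PCA" direction compared against three orthogonal "ANM" modes.
pc1 = [1.0, 1.0, 0.0]
anm_modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

single = overlap(pc1, anm_modes[0])
cumulative = cumulative_overlap(pc1, anm_modes[:2])
print(round(single, 4), round(cumulative, 4))
```

An overlap of 1 means the two modes describe the same direction of motion, and 0 means they are orthogonal.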
An important aspect of ProDy is the sampling of a representative set of conformers consistent with experiments—a feature expected to find wide utility in flexible docking and structure refinement. Figure 1E displays the conformational space sampled by experimental structures (blue dots), projected onto the subspace spanned by the top three PCA directions, which accounts for 59% of the experimentally observed structural variance. The conformations generated using the softest modes ANM1-ANM3 predicted to be intrinsically accessible to p38 in the apo form, are shown by the red dots. The sizes of the motions along these modes obey a Gaussian distribution with variance scaling with the inverse square root of the corresponding eigenvalues. ANM conformers cover a subspace (green ellipsoidal envelope) that encloses all experimental structures. Detailed information on how to generate such plots and figures using ProDy is given in the online documentation, along with several examples of downloadable scripts.
Note: the 'softest modes' ANM1-ANM3 should be the first three (lowest frequency, most collective) modes.
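A rough sketch of sampling conformers along a few soft modes follows. It assumes each displacement amplitude is drawn from a normal distribution whose standard deviation is proportional to the inverse square root of the eigenvalue; that is my reading of the statement above, and the exact scaling ProDy applies should be checked in its documentation. The function name, the `scale` parameter and the toy modes are all hypothetical.

```python
import random

def sample_conformers(reference, modes, eigenvalues, n_confs=100, scale=1.0, seed=0):
    """Displace `reference` (flat coordinate list) along each mode.

    Assumption: amplitude ~ Normal(0, scale / sqrt(eigenvalue)).
    """
    rng = random.Random(seed)
    conformers = []
    for _ in range(n_confs):
        conf = list(reference)
        for mode, eigval in zip(modes, eigenvalues):
            amplitude = rng.gauss(0.0, 1.0) * scale / eigval ** 0.5
            conf = [x + amplitude * m for x, m in zip(conf, mode)]
        conformers.append(conf)
    return conformers

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

# One "atom" with a soft mode along x and a stiff mode along y.
reference = [0.0, 0.0, 0.0]
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
eigenvalues = [1.0, 100.0]
conformers = sample_conformers(reference, modes, eigenvalues, n_confs=500)

xs = [c[0] for c in conformers]
ys = [c[1] for c in conformers]
print("spread along soft mode:", round(variance(xs), 3))
print("spread along stiff mode:", round(variance(ys), 5))
```

The soft mode (small eigenvalue) produces much larger excursions than the stiff one, which is why a handful of low-frequency modes can span most of the accessible conformational space.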
How to perform normal mode analysis with VMD?
Extensions > Normal Mode Wizard
Window 1:
Help
NMWiz Main
==========
Load Normal Mode Data
---------------------
Main interface allows user to load data into NMWiz in two ways:
**Load NMD File**
If you have an NMD file, click 'Load NMD file' button to select the file. The contents will be loaded and a Wizard window associated with the data will appear.
**From Molecule** (loads normal mode data already displayed in VMD)
Alternatively, when normal mode data is present in a file format that is recognized by VMD, load the files into VMD as a molecule and click 'From Molecule' button. A window will appear to facilitate selection of normal mode data from a molecule with multiple frames.
Note that, data obtained from a molecule can be saved in NMD format from the Main window and NMD files can be parsed with ProDy for further analysis.
Perform NMA Calculations
------------------------
You can use NMWiz to perform NMA calculations via ProDy for molecules loaded in VMD. Click 'ProDy Interface' and follow the instructions therein for ANM, GNM, and PCA (EDA) calculations.
Settings and Options
--------------------
**Preserve View**
When NMWiz loads data, VMD will shift focus to the new molecule. Check this to preserve the current view when loading a new dataset.
Window 2:
ProDy Interface
===============
ProDy interface allows users to perform the following calculations for molecules loaded in VMD:
* Anisotropic Network Model (ANM)
* Gaussian Network Model (GNM)
* Principal Component Analysis (PCA), a.k.a. Essential Dynamics Analysis (EDA)
Atom Selection
--------------
The first thing you need to do is select the molecule and specify the atoms that you want to include in the calculations. If you do not see all molecules in the menu, click 'Update'.
ProDy Job Settings
------------------
Specify the calculation type and output options in this panel. Coordinate data for selected atoms and the NMD data after calculations will be written into the 'Output directory'. All output files will be named after 'Output filename'.
**ProDy Scripts** (Note: I could not get the path to work on Windows. On Linux, installing ProDy directly with 'pip install prody' under Python 3.7 was enough for VMD to detect it automatically.)
NMWiz will try to find the path to the Python executable ('python' or 'python.exe') and the ProDy script ('prody'). For this to work, both of these files must be in your PATH environment variable. If they are not found, you will be prompted to specify the path.
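Before opening NMWiz, it can help to check whether both executables are visible on PATH. The sketch below does this with the standard library only; it reproduces the kind of lookup the help text describes, not NMWiz's actual search logic, and the function name `find_prody_tools` is hypothetical.

```python
import shutil

def find_prody_tools(names=("python", "prody")):
    """Return the full path of each executable found on PATH, or None."""
    return {name: shutil.which(name) for name in names}

found = find_prody_tools()
for name, path in found.items():
    print(f"{name}: {path or 'not found on PATH'}")
```

If either entry is reported as not found, NMWiz will likely prompt for the path, as described above.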
ANM/GNM Settings
----------------
Specify the following:
* number of modes to be calculated
* index of the frame (coordinate set) to be used in calculations
* cutoff distance
* force constant
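These settings end up as command-line flags: the console log later in this post shows NMWiz executing 'prody anm' with '-n 10 -c 15 -g 1'. The sketch below assembles the same kind of command. The flags are copied from that logged command; reading -n, -c and -g as number of modes, cutoff and force constant is my interpretation, -u is reproduced as logged without interpretation, and the file and directory names are placeholders.

```python
import shlex

def build_anm_command(pdb_path, out_dir, prefix,
                      n_modes=10, cutoff=15, gamma=1, selection="all"):
    """Assemble an argument list modeled on the command NMWiz logged."""
    return [
        "prody", "anm", "--quiet",
        "-s", selection,        # atom selection
        "-o", out_dir,          # output directory
        "-p", prefix,           # output file prefix
        "-n", str(n_modes),     # assumed: number of modes
        "-c", str(cutoff),      # assumed: cutoff distance
        "-g", str(gamma),       # assumed: force constant
        "-u", pdb_path,         # flag kept exactly as it appears in the log
    ]

# Placeholder names, not real files.
cmd = build_anm_command("structure_anm.pdb", "output_dir", "output_dir/structure_anm")
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list form can be handed to `subprocess.run(cmd)` on a machine where ProDy is installed; it is only printed here.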
PCA/EDA Settings
----------------
Note that for PCA/EDA calculations the molecule must have multiple frames. Specify the range of frames to be used in calculations. For large systems, prefer to write coordinates in DCD format to gain IO speed and save disk space.
Window 3:
ProDy Interface
===============
Active Mode
--------------
Select the active mode for which you want to draw arrows or make an animation. The direction of the arrows depicting the normal mode can be changed using the +/- button. Arrows can be drawn along both directions by changing the options in the Mode Graphics Options panel. The selected color affects both arrow graphics and square fluctuation plots.
(Note: the active mode can be chosen from 1-10 here; pressing +/- flips the arrows to the opposite direction.)
**RMSD** (used to scale the arrows)
The RMSD corresponding to the displacement described by the arrows is displayed. User can change the RMSD value to rescale the arrows. The scaling factor that produces the specified RMSD is printed to the VMD console (along with the magnitude of the mode provided in NMD file).
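The relation between an RMSD value and a scaling factor can be worked out directly: if a mode with per-atom displacement vectors is multiplied by s, the resulting RMSD is s times the norm of the mode divided by the square root of the number of atoms. The sketch below inverts that relation. It illustrates the arithmetic only; how NMWiz folds the mode magnitude from the NMD file into its printed factor is not reproduced here, and the toy mode is made up.

```python
import math

def scale_for_rmsd(mode, target_rmsd):
    """Scaling factor s such that s * mode has the requested RMSD.

    mode: list of (dx, dy, dz) tuples, one per atom.
    """
    n_atoms = len(mode)
    norm = math.sqrt(sum(dx * dx + dy * dy + dz * dz for dx, dy, dz in mode))
    return target_rmsd * math.sqrt(n_atoms) / norm

def rmsd_of_displacement(displacement):
    n_atoms = len(displacement)
    total = sum(dx * dx + dy * dy + dz * dz for dx, dy, dz in displacement)
    return math.sqrt(total / n_atoms)

# Toy mode for three atoms.
mode = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
s = scale_for_rmsd(mode, target_rmsd=2.0)
scaled = [(s * dx, s * dy, s * dz) for dx, dy, dz in mode]
print("scaling factor:", round(s, 4))
print("RMSD after scaling:", round(rmsd_of_displacement(scaled), 4))
```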
**Selection**
Selection entry allows the user to display arrows for a subset of atoms.
*TIP*: If the arrow graphics are too crowded or the display is slow, draw arrows for an evenly spaced subset of residues, e.g. try 'name CA and residue % 4 == 0', which will draw an arrow for every fourth residue.
Mode Graphics
--------------
Id of the molecule that contains the arrow graphics of the active mode is shown in parentheses.
Buttons:
* Draw: draw/redraw arrow graphics for the active mode
* Clean: remove most recently drawn arrow graphics
* Hide/Show: hide/show most recently drawn arrow graphics
* Options: show/hide arrow graphics option panel
Options:
User can change arrow graphics properties and how NMWiz behave upon such changes in this panel.
By default:
* arrow graphics are set to automatically change when graphics properties are changed by the user
* current graphics are hidden when the active mode is changed
Optionally:
* arrows can be drawn in both directions to look like a double headed arrow
* arrows shorter than a length (A) can be hidden
Additionally, user can change:
* width of the arrow cylinder
* width/height of the arrow head cone
* graphics material and resolution
Animations
----------
Id of the molecule that contains the most recently generated animation is shown in parentheses.
Buttons:
* Draw: animate fluctuations along the active mode
* Play: play/pause the animation
* Hide: hide/show the animation
* Options: show/hide animation option panel
Options:
User can elect automatic generation and continuous play of animations when the active mode changes. User can also select the number of frames in the animation.
Plotting
--------
(Note: this panel is now labeled 'Figures'. Its options affect the figure produced by Plot Mobility, e.g. the axes can be adjusted.)
Id of the molecule for displaying selected residues is shown in parentheses.
Buttons:
* Plot: plot squared-fluctuations along the active mode
* Clear: clear all selections and selected atom labels
* Hide/Show: hide/show the selected residues
* Options: change plotting options
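What the Plot button displays can be approximated by hand: for a single mode, the squared fluctuation of an atom is the variance of the mode times the squared length of that atom's displacement vector. The sketch assumes an ANM-style mode, where the variance is the inverse of the eigenvalue (for PCA modes the eigenvalue is itself the variance). The toy mode and the function name are made up.

```python
def square_fluctuations(mode, eigenvalue):
    """Per-atom squared fluctuations along one ANM-style mode.

    mode: list of (dx, dy, dz) components of the eigenvector, one per atom.
    Assumption: variance of the mode = 1 / eigenvalue.
    """
    variance = 1.0 / eigenvalue
    return [variance * (dx * dx + dy * dy + dz * dz) for dx, dy, dz in mode]

# Toy normalized mode for two atoms.
mode = [(0.6, 0.0, 0.0), (0.0, 0.8, 0.0)]
sq_flucts = square_fluctuations(mode, eigenvalue=0.5)
print([round(v, 3) for v in sq_flucts])
```

Residues with large values are the mobile ones in that mode, which is also what the 'Mobility' coloring reflects.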
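Molecule Representations
------------------------
(Note: this panel is now labeled 'Molecules'. Its options control the display in the VMD window, which can be switched to an elastic network representation.)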
Id of the molecule that contains the structure is shown in parentheses.
Buttons:
* Update: update molecule representation
* Focus: reset view to focus on the structure
* Hide/Show: hide/show structure
* Options: change molecular system representation
Options:
User can select the representation and coloring scheme. User can change the molecule representation settings manually, by setting 'Show structure as' to 'Custom'.
Structure can be colored based on the `Mobility` of the residues in the active mode, based on 'Bfactors' that came in NMD file, or based on residue/atom 'Index'.
In addition to the standard representations (e.g. Tube/Trace/Licorice), structure can be represented as an elastic network. Color scale method and midpoint can be used to adjust mobility and Bfactors based coloring. User can set the cutoff distance, width of dynamic bonds, and node spheres. Note that changing the cutoff distance only affects representation, not the precalculated normal mode data.
*TIP*: When visualizing a large system, display molecule at lower resolutions and/or try displaying fewer atoms if all atoms are displayed.
Calculation: click Submit Job (the input parameters affect the final result; here we use the defaults throughout).
/software_install/vmd-1.9/lib/vmd_LINUXAMD64: /usr/lib/libGL.so.1: no version information available (required by /software_install/vmd-1.9/lib/vmd_LINUXAMD64)
Info) VMD for LINUXAMD64, version 1.9.4a35 (July 10, 2019)
Info) http://www.ks.uiuc.edu/Research/vmd/
Info) Email questions and bug reports to vmd@ks.uiuc.edu
Info) Please include this reference in published work using VMD:
Info) Humphrey, W., Dalke, A. and Schulten, K., `VMD - Visual
Info) Molecular Dynamics', J. Molec. Graphics 1996, 14.1, 33-38.
Info) -------------------------------------------------------------
Info) Multithreading available, 24 CPUs detected.
Info) CPU features: SSE2 AVX AVX2 FMA
Info) Free system memory: 122GB (97%)
Info) Creating CUDA device pool and initializing hardware...
Info) Detected 2 available CUDA accelerators:
Info) [0-1] GeForce RTX 2080 Ti 68 SM_7.5 1.5 GHz, 11GB RAM SP32 AE3 ZC
Info) OpenGL renderer: GDI Generic
Info) Features: STENCIL
Info) GLSL rendering mode is NOT available.
Info) Textures: 2-D (1024x1024)
Info) Detected 2 available TachyonL/OptiX ray tracing accelerators
Info) Compiling 1 OptiX shaders on 2 target GPUs...
Info) Dynamically loaded 3 plugins in directory:
Info) /software_install/vmd-1.9/lib/plugins/LINUXAMD64/molfile
Info) File loading in progress, please wait.
Info) Using plugin pdb for structure file md_CE2_pro_firstframe.pdb
Info) Using plugin pdb for coordinates from file md_CE2_pro_firstframe.pdb
Info) Determining bond structure from distance search ...
Info) Finished with coordinate file md_CE2_pro_firstframe.pdb.
Info) Analyzing structure ...
Info) Atoms: 7919
Info) Bonds: 8032
Info) Angles: 0 Dihedrals: 0 Impropers: 0 Cross-terms: 0
Info) Bondtypes: 0 Angletypes: 0 Dihedraltypes: 0 Impropertypes: 0
Info) Residues: 515
Info) Waters: 0
Info) Segments: 1
Info) Fragments: 1 Protein: 1 Nucleic: 0
vmd > Info) Using plugin xtc for coordinates from file /home/amax/project/ce/purepro/CE2/md_CE2_pro.xtc
Info) Coordinate I/O rate 1152.5 frames/sec, 103 MB/sec, 8.7 sec
Info) Finished with coordinate file /home/amax/project/ce/purepro/CE2/md_CE2_pro.xtc.
Info) Opened coordinate file /home/amax/project/ce/purepro/CE2/md_CE2_pro_firstframe_anm.pdb for writing.
Info) Finished with coordinate file /home/amax/project/ce/purepro/CE2/md_CE2_pro_firstframe_anm.pdb.
Info) Executing: /software_install/anaconda3-5.3.0/bin/python /software_install/anaconda3-5.3.0/bin/prody anm --quiet -s all -o /home/amax/project/ce/purepro/CE2 -p /home/amax/project/ce/purepro/CE2/md_CE2_pro_firstframe_anm -n 10 -c 15 -g 1 -u "/home/amax/project/ce/purepro/CE2/md_CE2_pro_firstframe_anm.pdb"
Info) NMWiz: Parsing file /home/amax/project/ce/purepro/CE2/md_CE2_pro_firstframe_anm.nmd
Warning) NMWiz: segnames does not contain any data.
Info) NMWiz: File contains a 3D model.
Info) NMWiz: All segment names are set as "A".
Info) NMWiz: Residue numbers will be used for plotting.
Info) Using plugin pdb for structure file /tmp/.nmdset1.pdb
Info) Using plugin pdb for coordinates from file /tmp/.nmdset1.pdb
Info) Determining bond structure from distance search ...
Info) Analyzing structure ...
Info) Atoms: 515
Info) Bonds: 0
Info) Angles: 0 Dihedrals: 0 Impropers: 0 Cross-terms: 0
Info) Bondtypes: 0 Angletypes: 0 Dihedraltypes: 0 Impropertypes: 0
Info) Residues: 515
Info) Waters: 0
Info) Segments: 1
Info) Fragments: 515 Protein: 0 Nucleic: 0
Info) Coordinate I/O rate 0.07 frames/sec, 0 MB/sec, 13.7 sec
Info) Finished with coordinate file /tmp/.nmdset1.pdb.
Results:
The residue cross-correlations of the ligand-bound and unbound forms can be compared to study the perturbing effect of binding on dynamics.
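A minimal NumPy sketch of such a comparison follows. It builds normalized residue cross-correlations from a set of ANM-style modes (covariance as the sum of mode outer products weighted by inverse eigenvalues, then the trace of each 3x3 block, normalized by the diagonal) and subtracts the unbound map from the bound one. The two-atom "bound" and "unbound" modes are toy data, the function name is hypothetical, and both systems are assumed to have the same residues in the same order.

```python
import numpy as np

def cross_correlations(eigvals, eigvecs):
    """Normalized cross-correlations from ANM-style modes.

    eigvals: (k,) eigenvalues; eigvecs: (3N, k) eigenvectors as columns.
    Returns an (N, N) matrix with ones on the diagonal.
    """
    n_atoms = eigvecs.shape[0] // 3
    covariance = (eigvecs / eigvals) @ eigvecs.T          # sum_k u_k u_k^T / lambda_k
    blocks = covariance.reshape(n_atoms, 3, n_atoms, 3)
    per_atom = np.einsum("iaja->ij", blocks)              # trace of each 3x3 block
    diag = np.sqrt(np.diag(per_atom))
    return per_atom / np.outer(diag, diag)

# Toy data: two atoms, one mode each.
# "Unbound": the atoms move in opposite directions along x.
unbound_vecs = np.array([[1.0, 0.0, 0.0, -1.0, 0.0, 0.0]]).T / np.sqrt(2)
# "Bound": the atoms move together along x.
bound_vecs = np.array([[1.0, 0.0, 0.0, 1.0, 0.0, 0.0]]).T / np.sqrt(2)
eigvals = np.array([1.0])

cc_unbound = cross_correlations(eigvals, unbound_vecs)
cc_bound = cross_correlations(eigvals, bound_vecs)
difference = cc_bound - cc_unbound                         # positive: more correlated when bound
print(np.round(difference, 3))
```

Residue pairs with large positive or negative entries in `difference` are the ones whose coupling changes most on binding.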