Summary
gmx_mpi cluster -f md_*_mol.xtc -s md_*.tpr -method gromos -o rmsd-clust_*.xpm -g cluster_*.log -dist rmsd-dist_*.xvg -cutoff 0.3 -clid clust-id_*.xvg -cl clusters_*.pdb -tu ns
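Output:
1. rmsd-clust_*.xpm (important): comparing the .xpm files of different systems shows differences in protein dynamics. For example, the native protein falls into fewer clusters than the mutant, i.e. it is more stable. The plot is not very intuitive, though; a more intuitive 2D scatter plot is still needed.
2. clust-id_*.xvg (important): clusters are numbered in order of decreasing population, so the free energy rises from 1 to 2 to 3 to 4 to 5 to 6. Clusters 1, 2 and 3 are the dominant conformations, and the free-energy differences between them can be estimated from their populations. There are many transitions between 1 and 2 and between 1 and 3, which indicates low barriers. The file gives the cluster at every time point, so transitions between clusters can be followed roughly by eye, but counting the transitions between each pair of clusters from it requires a custom script.
3. clusters_*.pdb (important): representative structures.
4. cluster_*.log: for checking the details of each cluster.
5. rmsd-dist_*.xvg: for checking the RMSD distribution.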
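Item 2 says pairwise transition counts from clust-id.xvg need a custom script. The following is a minimal Python sketch of such a script. It assumes the .xvg file has time in column 1 and the cluster number in column 2, after the usual `#`/`@` header lines. It also estimates free-energy differences from populations with ΔG = -RT ln(N_i/N_j). Several parts are my own assumptions and do not come from gmx cluster: the 300 K default, the requirement of equilibrium sampling, and the function names (`read_xvg`, `pair_transitions`, `delta_g`, `summarize`).

```python
import math
from collections import Counter

def read_xvg(lines):
    """Parse .xvg text: skip '#' comments and '@' xmgrace directives."""
    rows = []
    for line in lines:
        line = line.strip()
        if line and not line.startswith(("#", "@")):
            rows.append([float(x) for x in line.split()])
    return rows

def pair_transitions(cluster_ids):
    """Count i -> j changes between consecutive frames (i != j)."""
    counts = Counter()
    for a, b in zip(cluster_ids, cluster_ids[1:]):
        if a != b:
            counts[(a, b)] += 1
    return counts

def delta_g(n_i, n_j, temperature=300.0):
    """G_i - G_j in kJ/mol from cluster populations: -RT ln(n_i / n_j).
    Only meaningful if the trajectory samples the clusters at equilibrium."""
    gas_constant = 0.008314462618  # kJ/(mol K)
    return -gas_constant * temperature * math.log(n_i / n_j)

def summarize(path, temperature=300.0):
    """Print pairwise transitions and free energies relative to cluster 1."""
    with open(path) as fh:
        ids = [int(row[1]) for row in read_xvg(fh)]
    for (a, b), n in sorted(pair_transitions(ids).items()):
        print(f"{a} -> {b}: {n}")
    pops = Counter(ids)
    for c in sorted(pops):
        dg = delta_g(pops[c], pops[1], temperature)
        print(f"cluster {c}: {pops[c]} frames, G - G1 = {dg:.2f} kJ/mol")
```

Usage would be something like `summarize("clust-id_native.xvg")`, where the file name is only an example. A more populated cluster comes out with a negative G - G1 relative to a less populated one. Frames are correlated in time, so the transition counts depend on the output interval.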
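What is clustered:
The distance between pairs of structures. It can be measured either by the RMS deviation after fitting or by the RMS deviation of atom-pair distances. The distances can be computed directly from the trajectory or read from a precomputed matrix supplied with -dm (.xpm).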
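The following pure-Python sketch shows the two distance measures, with coordinates given as lists of (x, y, z) tuples. gmx cluster fits the structures by least squares before computing the coordinate RMSD (-fit, on by default). The fit (e.g. Kabsch) is left out here to keep the example short, so `rmsd_no_fit` only illustrates why fitting is needed. The normalization of the pair-distance version (unweighted mean over all pairs) is an assumption and may differ from what GROMACS does. The function names are hypothetical.

```python
import math

def rmsd_no_fit(a, b):
    """Coordinate RMSD between two conformations WITHOUT superposition.
    (gmx cluster fits first by default; the fit is omitted here.)"""
    sq = sum((p - q) ** 2 for x, y in zip(a, b) for p, q in zip(x, y))
    return math.sqrt(sq / len(a))

def distance_rmsd(a, b):
    """RMS deviation of all intra-structure atom-pair distances.
    Invariant to rotation and translation, so no fit is needed."""
    n = len(a)
    diffs = [math.dist(a[i], a[j]) - math.dist(b[i], b[j])
             for i in range(n) for j in range(i + 1, n)]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))

# Three atoms, and the same structure rotated 90 degrees about z.
a = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
b = [(-y, x, z) for x, y, z in a]
print(rmsd_no_fit(a, b))    # > 0: the rotation changes the coordinates
print(distance_rmsd(a, b))  # ~0: the internal distances are unchanged
```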
Clustering methods:
Method 1: single linkage: a structure joins a cluster when its distance to any member of the cluster is below the cutoff. (add a structure to a cluster when its distance to any element of the cluster is less than cutoff.)
Method 2: Jarvis-Patrick: a structure joins a cluster when it and a member of the cluster are each other's neighbors and share at least P neighbors. (add a structure to a cluster when this structure and a structure in the cluster have each other as neighbors and they have at least P neighbors in common. The neighbors of a structure are the M closest structures or all structures within cutoff.)
Monte Carlo: uses MC to choose the order of the frames so that the RMSD between successive frames increases in the smallest possible steps, which gives a smooth progression. The result can be inspected in the .xpm matrix, which should vary smoothly from bottom to top. (reorder the RMSD matrix using Monte Carlo such that the order of the frames is using the smallest possible increments. With this it is possible to make a smooth animation going from one structure to another with the largest possible (e.g.) RMSD between them, however the intermediate steps should be as small as possible. Applications could be to visualize a potential of mean force ensemble of simulations or a pulling simulation. Obviously the user has to prepare the trajectory well (e.g. by not superimposing frames). The final result can be inspect visually by looking at the matrix .xpm file, which should vary smoothly from bottom to top.)
diagonalization: diagonalize the RMSD matrix.
Method 3: gromos: builds each cluster around the structure with the most neighbors. (use algorithm as described in Daura et al. (Angew. Chem. Int. Ed. 1999, 38, pp 236-240). Count number of neighbors using cut-off, take structure with largest number of neighbors with all its neighbors as cluster and eliminate it from the pool of clusters. Repeat for remaining structures in pool.)
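The following small pure-Python sketch makes the difference between the single-linkage and gromos rules concrete. It works on a precomputed RMSD matrix (a list of lists) and follows the descriptions quoted above. It is not GROMACS's implementation: the strict `<` cutoff test, the lowest-index tie-breaking, and the function names are assumptions.

```python
def neighbors(dist, i, pool, cutoff):
    """Structures in `pool` (other than i) closer than `cutoff` to i."""
    return [j for j in sorted(pool) if j != i and dist[i][j] < cutoff]

def gromos(dist, cutoff):
    """Daura et al. scheme: take the structure with the most neighbors,
    make it plus all its neighbors one cluster, remove them, repeat."""
    pool = set(range(len(dist)))
    clusters = []
    while pool:
        # ties are broken by the lowest frame index (an arbitrary choice)
        center = max(sorted(pool), key=lambda i: len(neighbors(dist, i, pool, cutoff)))
        members = [center] + neighbors(dist, center, pool, cutoff)
        clusters.append(members)
        pool.difference_update(members)
    return clusters

def single_linkage(dist, cutoff):
    """Join a structure to a cluster if it is closer than `cutoff` to ANY
    member, i.e. the connected components of the neighbor graph."""
    n = len(dist)
    label = [None] * n
    clusters = []
    for start in range(n):
        if label[start] is not None:
            continue
        label[start] = len(clusters)
        stack, members = [start], []
        while stack:
            i = stack.pop()
            members.append(i)
            for j in range(n):
                if label[j] is None and dist[i][j] < cutoff:
                    label[j] = label[start]
                    stack.append(j)
        clusters.append(sorted(members))
    return clusters

# Toy 1-D "structures": a chain with 0.1 nm spacing plus one outlier.
pos = [0.0, 0.1, 0.2, 0.3, 1.0]
dist = [[abs(a - b) for b in pos] for a in pos]
print(gromos(dist, 0.15))          # [[1, 0, 2], [3], [4]]
print(single_linkage(dist, 0.15))  # [[0, 1, 2, 3], [4]]
```

On the toy chain, single linkage chains all four nearby frames into one cluster. gromos limits each cluster to the neighborhood of its center (the first member listed), so it tends to give more compact clusters.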
Input:
-f [<.xtc/.trr/...>] (traj.xtc) (Opt.)
Trajectory: xtc trr cpt gro g96 pdb tng
-s [<.tpr/.gro/...>] (topol.tpr)
Structure+mass(db): tpr gro g96 pdb brk ent
-n [<.ndx>] (index.ndx) (Opt.)
Index file
-dm [<.xpm>] (rmsd.xpm) (Opt.) distance matrix obtained from another analysis
X PixMap compatible matrix file
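Output files:
-o [<.xpm>] (rmsd-clust.xpm) // written by default. The RMSD values are written in the upper-left half of the matrix and a graphical depiction of the clusters in the lower-right half. With -minstruct = 1, a pixel is black when the two structures are in the same cluster; with -minstruct > 1, each cluster gets its own color.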
X PixMap compatible matrix file
* -o writes the RMSD values in the upper left half of the matrix and a graphical depiction of the clusters in the lower right half. When -minstruct = 1 the graphical depiction is black when two structures are in the same cluster. When -minstruct > 1 different colors will be used for each cluster.
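-g [<.log>] (cluster.log) // written by default; contains all output information. The reported total of 124 transitions is the sum of all per-cluster transitions divided by 2. Columns: cl. (cluster number) | #st (number of structures) rmsd (RMSD within the cluster; should be below the chosen cutoff, but the last groups do not seem to satisfy this) | middle (time of the central frame) rmsd (its RMSD) | cluster members (time of each member structure)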
Log file
* -g writes information on the options used and a detailed list of all clusters and their members.
-dist [<.xvg>] (rmsd-dist.xvg) (Opt.) // written by default; x axis is RMSD, y axis is the count
xvgr/xmgr file
* -dist writes the RMSD distribution.
-om [<.xpm>] (rmsd-raw.xpm) // writes the raw RMSD matrix again; not needed
X PixMap compatible matrix file
-ev [<.xvg>] (rmsd-eig.xvg) (Opt.) // eigenvectors from diagonalizing the RMSD matrix; nothing was written in testing, which appears to be a bug
xvgr/xmgr file
* -ev writes the eigenvectors of the RMSD matrix diagonalization.
-sz [<.xvg>] (clust-size.xvg) (Opt.) // number of structures in each cluster; the sizes sum to the number of frames. Already included in the log.
xvgr/xmgr file
* -sz writes the cluster sizes.
-clid [<.xvg>] (clust-id.xvg) (Opt.) // cluster number at each time point; shows the transitions between clusters
xvgr/xmgr file
* -clid writes the cluster number as a function of time.
-clndx [<.ndx>] (clusters.ndx) (Opt.) // writes the frame numbers (not atom numbers!) of each cluster to an index file that trjconv can read. This information is already in the log file.
Index file
* -clndx writes the frame numbers corresponding to the clusters to the specified index file to be read into trjconv.
Transitions between clusters:
-tr [<.xpm>] (clust-trans.xpm) (Opt.) // number of transitions between each pair of clusters; hard to read, clust-id.xvg is more intuitive
X PixMap compatible matrix file
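* -tr writes a matrix of the number of transitions between cluster pairs.
-ntr [<.xvg>] (clust-trans.xvg) (Opt.) // total number of transitions to or from each cluster. For example, cluster 6 has 1 structure and therefore 2 transitions, and cluster 5 has 2 structures and therefore 4 transitions (for isolated frames that are not at either end of the trajectory). Note that transitions do not map one-to-one onto structures: consecutive frames in the same cluster count once. For example, if t = 1, 2, 3, 5 belong to cluster 1 and t = 4 to cluster 2, there are two transitions between t = 1 and t = 5.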
xvgr/xmgr file
* -ntr writes the total number of transitions to or from each cluster.
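The counting rule described for -ntr can be sketched as follows. This is an assumption-based reimplementation, not GROMACS code. It is consistent with the observations in these notes: an isolated frame gives 2 transitions, and the grand total in the log equals the per-cluster sum divided by 2. `transitions_per_cluster` is a hypothetical name.

```python
from collections import Counter

def transitions_per_cluster(cluster_ids):
    """Count, for every cluster, the transitions into or out of it.
    A run of consecutive frames in the same cluster counts once."""
    counts = Counter()
    for a, b in zip(cluster_ids, cluster_ids[1:]):
        if a != b:
            counts[a] += 1  # leaving cluster a
            counts[b] += 1  # entering cluster b
    return counts

# Example from the -ntr note: t = 1, 2, 3, 5 in cluster 1, t = 4 in cluster 2.
ids = [1, 1, 1, 2, 1]
per_cluster = transitions_per_cluster(ids)
print(dict(per_cluster))               # {1: 2, 2: 2}
print(sum(per_cluster.values()) // 2)  # 2 transitions in total
```

Each transition is counted once for the cluster it leaves and once for the cluster it enters. That double counting is why the total is halved.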
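Representative structures of the clusters:
-cl [<.xtc/.trr/...>] (clusters.pdb) (Opt.) // writes the average (with -av) or central structure of each cluster, or, for a selected set of clusters, numbered files containing the cluster members (with -wcl; depends on -nst and -rmsmin). The center of a cluster is the structure with the smallest average RMSD to all other structures in the cluster.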
Trajectory: xtc trr cpt gro g96 pdb tng
* -cl writes average (with option -av) or central structure of each cluster or writes numbered files with cluster members for a selected set of clusters (with option -wcl, depends on -nst and -rmsmin). The center of a cluster is the structure with the smallest average RMSD from all other structures of the cluster.
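The following minimal sketch implements the "central structure" definition above on a hypothetical 4-frame RMSD matrix. Ties are broken by the first member listed, which may not match GROMACS. `middle_structure` is a hypothetical name.

```python
def middle_structure(dist, members):
    """Return the member with the smallest average RMSD to the other
    members, i.e. the structure -cl writes when -av is not set."""
    if len(members) == 1:
        return members[0]

    def mean_rmsd(i):
        return sum(dist[i][j] for j in members if j != i) / (len(members) - 1)

    return min(members, key=mean_rmsd)

# Hypothetical 4-frame RMSD matrix (nm), symmetric with a zero diagonal.
dist = [
    [0.00, 0.10, 0.25, 0.30],
    [0.10, 0.00, 0.12, 0.20],
    [0.25, 0.12, 0.00, 0.15],
    [0.30, 0.20, 0.15, 0.00],
]
print(middle_structure(dist, [0, 1, 2, 3]))  # 1
```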
Important parameters (worth experimenting with):
-method
Method for cluster determination: linkage, jarvis-patrick, monte-carlo, diagonalization, gromos
-cutoff
RMSD cut-off (nm) for two structures to be neighbor
-rmsmin
minimum rms difference with rest of cluster for writing structures
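Options for the .xpm image:
-nlevels (40) // important!!! Example: the RMSD range obtained was 0.0169-0.0657 nm. With the lower end of the color scale at 0 nm by default, the .xpm uses 40 colors spaced (0.0657 - 0.0)/40 ≈ 0.0016 nm apart.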
Discretize RMSD matrix in this number of levels
-max (-1) // no upper limit on the levels; leave as is
Maximum level in RMSD matrix
-minstruct
Minimum number of structures in cluster for coloring in the .xpm file
-[no]binary (no) // only two colors
Treat the RMSD matrix as consisting of 0 and 1, where the cut-off is given by -cutoff
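The -nlevels arithmetic can be written out as follows. The sketch assumes the color levels split [minimum, maximum] evenly and linearly. That matches the example in these notes but is not taken from the GROMACS source. The function names are hypothetical.

```python
def level_width(rmsd_max, nlevels=40, rmsd_min=0.0):
    """Width (nm) of one color level when [rmsd_min, rmsd_max] is split evenly."""
    return (rmsd_max - rmsd_min) / nlevels

def to_level(rmsd, rmsd_max, nlevels=40, rmsd_min=0.0):
    """Index 0..nlevels-1 of the color level an RMSD value falls into."""
    idx = int((rmsd - rmsd_min) / level_width(rmsd_max, nlevels, rmsd_min))
    return min(idx, nlevels - 1)  # put rmsd_max itself in the top level

# Numbers from the -nlevels example: RMSD range 0.0169-0.0657 nm, scale from 0.
print(round(level_width(0.0657), 5))  # 0.00164
print(to_level(0.0169, 0.0657))       # 10: off-diagonal values start at level 10
print(to_level(0.0657, 0.0657))       # 39
```

Under this assumption, with the scale starting at 0 nm, roughly a quarter of the 40 colors are never used by the off-diagonal values. Raising the lower bound would spread the colors over the range actually observed.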
Options for writing representative structures:
-wcl
Write the structures for this number of clusters to numbered files
-nst
Only write all structures if more than this number of structures per cluster
-[no]av (no) // writes the average structure (which has no physical meaning) instead of the central (representative) structure; leave as is
Write average instead of middle structure for each cluster
Other options
-tu
Unit for time values: fs, ps, ns, us, ms, s
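-[no]dista (no) // by default the distance between frames ti and tj is the RMS deviation of their coordinates after fitting; -dista uses the RMS deviation of interatomic distances instead. Leave as is.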
Use RMSD of distances instead of RMS deviation
-[no]fit (yes) // fitting is on by default; leave as is
Use least squares fitting before RMSD calculation
-[no]pbc (yes) // default; leave as is
PBC check
-M
Number of nearest neighbors considered for Jarvis-Patrick algorithm, 0 is use cutoff
-P
Number of identical nearest neighbors required to form a cluster
-seed
Random number seed for Monte Carlo clustering algorithm (0 means generate)
-niter
Number of iterations for MC
-nrandom
The first iterations for MC may be done complete random, to shuffle the frames
-kT
Boltzmann weighting factor for Monte Carlo optimization (zero turns off uphill steps)
-b
Time of first frame to read from trajectory (default unit ps)
-e
Time of last frame to read from trajectory (default unit ps)
-dt
Only use frame when t MOD dt = first time (default unit ps)
-skip
Only analyze every nr-th frame
-[no]w (no)
View output .xvg, .xpm, .eps and .pdb files
-xvg
xvg plot formatting: xmgrace, xmgr, none
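Procedure:
test0: pilot run with the default settings
Default output files:
rmsd-dist.xvg: x axis is RMSD (nm), y axis is in arbitrary units, here the number of pairs. The second column sums to (955*955 - 955)/2 = 455535, i.e. half of the RMSD matrix without the diagonal.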
cluster.log
rmsd-clust.xpm: both axes are time, from 0 to 95400 ps (the length of the current simulation); color matrix
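The following quick check tests the test0 observation that the histogram in rmsd-dist.xvg sums to the number of distinct frame pairs, N(N-1)/2. The parser assumes the count sits in the second column, as observed above. The function names are hypothetical.

```python
def n_pairs(n_frames):
    """Number of distinct frame pairs: the off-diagonal half of an N x N matrix."""
    return (n_frames * n_frames - n_frames) // 2

def histogram_total(xvg_lines):
    """Sum the second column of an .xvg file, skipping '#' and '@' lines."""
    total = 0.0
    for line in xvg_lines:
        line = line.strip()
        if line and not line.startswith(("#", "@")):
            total += float(line.split()[1])
    return total

print(n_pairs(955))  # 455535
# With the real file:
# with open("rmsd-dist.xvg") as fh:
#     print(histogram_total(fh) == n_pairs(955))
```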
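test1: adjust the cutoff and the minimum RMSD
The lower-right half of the matrix is displayed incorrectly: color n is also rendered as white. Needs fixing!!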