Notes on the gmx cluster command

Summary

gmx_mpi cluster -f md_*_mol.xtc -s md_*.tpr -method gromos -o rmsd-clust_*.xpm -g cluster_*.log -dist rmsd-dist_*.xvg -cutoff 0.3 -clid clust-id_*.xvg -cl clusters_*.pdb -tu ns

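Output:

1. rmsd-clust_*.xpm (important): comparing the .xpm files of different systems shows how the protein dynamics differ. For example, the native protein has fewer clusters than the mutant, so it is more stable. The matrix image is not very intuitive, though; a two-dimensional scatter plot would be easier to read.

2. clust-id_*.xvg (important): the cluster that each time point belongs to. The energy rises in the order 1->2->3->4->5->6 as the clusters become less populated. Clusters 1, 2 and 3 are the main conformations, and free-energy differences can be estimated from them. Transitions between 1 and 2 and between 1 and 3 are frequent, so the barriers between them are low. The file gives a rough view of the transitions between clusters, but it does not itself count the transitions between each pair of clusters; counting them from this file requires a script of your own (the -tr and -ntr options described later also report transition counts).

3. clusters_*.pdb (important): the representative structures.

4. cluster_*.log: used to check the details of each cluster.

5. rmsd-dist_*.xvg: the RMSD distribution, used as a sanity check.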
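Counting the transitions between each pair of clusters from a clust-id .xvg file takes only a few lines. The sketch below is one possible way to do it, assuming the file has two whitespace-separated columns (time, cluster id) and header lines starting with # or @. The function names and the sample values are made up for illustration.

```python
from collections import Counter

def read_cluster_ids(lines):
    """Parse (time, cluster id) pairs from the lines of a clust-id .xvg file."""
    ids = []
    for line in lines:
        line = line.strip()
        # Skip blank lines and xvg header/comment lines ('#' or '@').
        if not line or line[0] in "#@":
            continue
        time, cluster = line.split()[:2]
        ids.append((float(time), int(float(cluster))))
    return ids

def count_pair_transitions(cluster_ids):
    """Count direct transitions a -> b between consecutive frames (a != b)."""
    counts = Counter()
    for prev, curr in zip(cluster_ids, cluster_ids[1:]):
        if prev != curr:
            counts[(prev, curr)] += 1
    return counts

# Hypothetical excerpt of a clust-id .xvg file (made-up values).
sample = [
    "# comment line",
    '@ title "Cluster id"',
    "0 1", "1 1", "2 2", "3 1", "4 3", "5 1",
]
frames = read_cluster_ids(sample)
transitions = count_pair_transitions([cluster for _, cluster in frames])
for (a, b), n in sorted(transitions.items()):
    print(f"{a} -> {b}: {n}")
```

For a real file, pass an open file handle instead of the sample list, e.g. read_cluster_ids(open("clust-id.xvg")).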


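What is clustered:

Clustering is based on the distance between structures, measured either as the RMS deviation after fitting or as the RMS deviation of atom-pair distances. The distances can be computed directly from the trajectory or read from an .xpm matrix with -dm.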
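To make the second distance measure concrete, here is a minimal sketch of an RMS deviation of atom-pair distances between two structures. It assumes the usual definition (root mean square of the differences between corresponding intra-structure pair distances) and is not taken from the GROMACS source; the function name and the coordinates are illustrative.

```python
import math
from itertools import combinations

def distance_rmsd(coords_a, coords_b):
    """RMS difference between all atom-pair distances of two structures."""
    if len(coords_a) != len(coords_b) or len(coords_a) < 2:
        raise ValueError("need two structures with the same number (>= 2) of atoms")
    squared_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])  # pair distance in structure A
        d_b = math.dist(coords_b[i], coords_b[j])  # same pair in structure B
        squared_sum += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(squared_sum / n_pairs)

# Made-up coordinates (nm): b is a translated copy of a, c is stretched along x.
a = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
b = [(1, 2, 3), (2, 2, 3), (1, 3, 3)]
c = [(0, 0, 0), (2, 0, 0), (0, 1, 0)]
print(distance_rmsd(a, b))  # 0.0: a pure translation changes no pair distance
print(distance_rmsd(a, c))
```

Because only internal distances enter, this measure needs no fitting step, which is why it is an alternative to the fitted RMS deviation.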


Clustering methods:

Method 1: single linkage: add a structure to a cluster when its distance to any element of the cluster is less than cutoff.

Method 2: Jarvis Patrick: add a structure to a cluster when this structure and a structure in the cluster have each other as neighbors and they have at least P neighbors in common. The neighbors of a structure are the M closest structures or all structures within cutoff.

Monte Carlo: uses MC to choose the order of the frames so that the RMSD between successive frames increases as little as possible, which gives a smooth progression. (reorder the RMSD matrix using Monte Carlo such that the order of the frames is using the smallest possible increments. With this it is possible to make a smooth animation going from one structure to another with the largest possible (e.g.) RMSD between them, however the intermediate steps should be as small as possible. Applications could be to visualize a potential of mean force ensemble of simulations or a pulling simulation. Obviously the user has to prepare the trajectory well (e.g. by not superimposing frames). The final result can be inspected visually by looking at the matrix .xpm file, which should vary smoothly from bottom to top.)

diagonalization: diagonalize the RMSD matrix.

Method 3: gromos: an algorithm that builds each cluster around the structure with the most neighbors. (use algorithm as described in Daura et al. (Angew. Chem. Int. Ed. 1999, 38, pp 236-240). Count number of neighbors using cut-off, take structure with largest number of neighbors with all its neighbors as cluster and eliminate it from the pool of clusters. Repeat for remaining structures in pool.)
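The gromos procedure described above can be sketched in a few lines. This is only an illustration of the quoted steps on a precomputed distance matrix, not the GROMACS implementation: the tie-breaking rule (lowest frame index) and the use of <= for the cutoff test are assumptions, and the one-dimensional sample data are made up.

```python
def gromos_cluster(dist, cutoff):
    """Cluster frame indices given a symmetric distance matrix (list of lists).

    Returns a list of (center, members) tuples in the order the clusters are found.
    """
    pool = set(range(len(dist)))
    clusters = []
    while pool:
        # Neighbors of each remaining structure, counted within the pool only.
        neighbors = {i: [j for j in pool if dist[i][j] <= cutoff] for i in pool}
        # The structure with the most neighbors becomes the cluster center;
        # ties go to the lowest index (an assumption of this sketch).
        center = max(sorted(pool), key=lambda i: len(neighbors[i]))
        members = sorted(neighbors[center])
        clusters.append((center, members))
        # Remove the whole cluster from the pool and repeat.
        pool -= set(members)
    return clusters

# Made-up 1D "structures"; the distance is the absolute difference.
x = [0.0, 0.1, 0.2, 1.0, 1.1, 5.0]
dist = [[abs(p - q) for q in x] for p in x]
for center, members in gromos_cluster(dist, cutoff=0.15):
    print(f"center {center}: members {members}")
```

With this sample the first cluster is built around frame 1, which has the most neighbors, and the isolated frame 5 ends up as a cluster of its own.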


Input:

-f      [<.xtc/.trr/...>]  (traj.xtc)      (Opt.)

          Trajectory: xtc trr cpt gro g96 pdb tng

-s      [<.tpr/.gro/...>]  (topol.tpr)

          Structure+mass(db): tpr gro g96 pdb brk ent

-n      [<.ndx>]          (index.ndx)      (Opt.)

          Index file

-dm    [<.xpm>]          (rmsd.xpm)      (Opt.)  // distance matrix obtained from another analysis

          X PixMap compatible matrix file 


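Output files:


-o      [<.xpm>]          (rmsd-clust.xpm) // written by default. RMSD values go in the upper left half of the matrix and a graphical depiction of the clusters in the lower right half. With -minstruct = 1 the depiction is black wherever two structures are in the same cluster; with -minstruct > 1 each cluster gets its own color.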

          X PixMap compatible matrix file

* -o writes the RMSD values in the upper left half of the matrix and a graphical depiction of the clusters in the lower right half When -minstruct  = 1 the graphical depiction is black when two structures are in the same cluster. When -minstruct > 1 different colors will be used for each cluster.

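-g      [<.log>]          (cluster.log) // written by default; contains all output information. The 124 transitions reported there are the sum of the transition counts of all clusters divided by 2. Columns: cl. (cluster number) | #st (number of structures) rmsd (RMSD within the cluster; it should be below the chosen cutoff, although the last group does not seem to satisfy this) | middle (time of the representative frame) rmsd (its RMSD) | cluster members (time of each member structure)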

          Log file

* -g writes information on the options used and a detailed list of all clusters and their members.

Cluster1
cluster 4 5 6

-dist  [<.xvg>]          (rmsd-dist.xvg)  (Opt.) // written by default; the x axis is the RMSD, the y axis is the count

          xvgr/xmgr file

* -dist writes the RMSD distribution.

-om    [<.xpm>]          (rmsd-raw.xpm) // writes the input RMSD matrix out again; not needed

          X PixMap compatible matrix file

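-ev    [<.xvg>]          (rmsd-eig.xvg)  (Opt.)   // eigenvectors from diagonalizing the RMSD matrix; nothing was written in my test, possibly a bug, or possibly because it is only written with -method diagonalization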

          xvgr/xmgr file

* -ev writes the eigenvectors of the RMSD matrix diagonalization.

-sz    [<.xvg>]          (clust-size.xvg) (Opt.)  // number of structures in each cluster; the sizes sum to the number of frames. Already included in the log.

          xvgr/xmgr file

* -sz writes the cluster sizes.
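The cluster sizes are also what a rough free-energy estimate between conformations would be based on. The sketch below uses the standard relation dG = -kB*T*ln(N_other/N_ref) and assumes the trajectory samples the clusters at equilibrium, which a single run may not do. The temperature and the cluster sizes are hypothetical values, not results from these notes.

```python
import math

KB = 0.008314462618  # molar gas constant in kJ/(mol K)

def free_energy_difference(n_ref, n_other, temperature=300.0):
    """Free energy of a cluster relative to a reference cluster, in kJ/mol.

    Assumes cluster populations follow Boltzmann statistics.
    """
    return -KB * temperature * math.log(n_other / n_ref)

# Hypothetical cluster sizes, largest cluster first (made-up numbers).
sizes = [500, 300, 100]
for index, size in enumerate(sizes, start=1):
    dg = free_energy_difference(sizes[0], size)
    print(f"cluster {index}: {dg:.2f} kJ/mol relative to cluster 1")
```

A smaller cluster always comes out higher in free energy than the reference, which matches the ordering noted in the summary.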


-clid  [<.xvg>]          (clust-id.xvg)  (Opt.) // cluster number at each time point; shows the transitions between clusters

          xvgr/xmgr file

* -clid writes the cluster number as a function of time.

-clndx  [<.ndx>]          (clusters.ndx)  (Opt.) // writes the frame numbers (not atom indices!) of each cluster to an index file for trjconv. This information is already in the log file.

          Index file

* -clndx writes the frame numbers corresponding to the clusters to the specified index file to be read into trjconv.


clusters.ndx: the frame numbers grouped by cluster, convenient for inspection

Transitions between clusters:

-tr    [<.xpm>]          (clust-trans.xpm) (Opt.) // number of transitions between cluster pairs; hard to read, clust-id.xvg is more intuitive

          X PixMap compatible matrix file

* -tr writes a matrix of the number transitions between cluster pairs.

A 6*6 matrix, hard to read; it shows the transitions between clusters, and pairs with no transitions are white.

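-ntr    [<.xvg>]          (clust-trans.xvg) (Opt.) // total number of transitions to or from each cluster. For example, cluster 6 has 1 structure and therefore 2 transitions; cluster 5 has 2 structures and 4 transitions. Note that the count does not simply follow the number of structures: frames that are consecutive in time and stay in the same cluster add no transitions. For example, if t=1,2,3,5 belong to cluster 1 and t=4 to cluster 2, there are two transitions between t=1 and t=5.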

          xvgr/xmgr file

* -ntr writes the total number of transitions to or from each cluster.
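The t=1..5 example can be checked by hand with a short script. It counts, for each cluster, every change of cluster id between consecutive frames that enters or leaves it; this is my reading of the -ntr description, not code from GROMACS, and the function name is illustrative.

```python
from collections import Counter

def transitions_per_cluster(cluster_ids):
    """Total number of transitions to or from each cluster."""
    totals = Counter()
    for prev, curr in zip(cluster_ids, cluster_ids[1:]):
        if prev != curr:
            totals[prev] += 1  # one transition out of the previous cluster
            totals[curr] += 1  # the same transition into the current cluster
    return totals

# Cluster ids at t = 1, 2, 3, 4, 5 from the example above.
ids = [1, 1, 1, 2, 1]
totals = transitions_per_cluster(ids)
n_transitions = sum(totals.values()) // 2  # each transition is counted twice
print(dict(totals), n_transitions)  # {1: 2, 2: 2} 2
```

Every transition is counted once for the cluster it leaves and once for the cluster it enters, which is why the overall number is the sum over clusters divided by 2.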


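Representative structures of the clusters:

-cl    [<.xtc/.trr/...>]  (clusters.pdb)  (Opt.) // writes the average (with -av) or central structure of each cluster, or writes numbered files with the cluster members for a selected set of clusters (with -wcl, depending on -nst and -rmsmin). The center of a cluster is the structure with the smallest average RMSD from all other structures of the cluster.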

          Trajectory: xtc trr cpt gro g96 pdb tng

* -cl writes average (with option -av) or central structure of each cluster or writes numbered files with cluster members for a selected set of clusters (with option -wcl, depends on -nst and -rmsmin). The center of a cluster is the structure with the smallest average RMSD from all other structures of the cluster.
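The definition of the cluster center translates directly into code. The following sketch picks, from a precomputed distance matrix, the member with the smallest average distance to the other members; the function name and the sample data are made up for illustration.

```python
def central_structure(dist, members):
    """Return the member with the smallest average distance to the other members."""
    def mean_distance(i):
        others = [j for j in members if j != i]
        if not others:
            return 0.0  # a single-member cluster is its own center
        return sum(dist[i][j] for j in others) / len(others)
    return min(members, key=mean_distance)

# Made-up 1D "structures"; the distance is the absolute difference.
x = [0.0, 1.0, 3.0]
dist = [[abs(p - q) for q in x] for p in x]
print(central_structure(dist, [0, 1, 2]))  # 1
```

Unlike an average structure, the center is an actual frame of the trajectory, so it is always a physically reasonable conformation.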

Important parameters (need tuning):

-method             (linkage) // important!!! selects the clustering method

 Method for cluster determination: linkage, jarvis-patrick, monte-carlo, diagonalization, gromos

-cutoff             (0.1) // important!!!! adjust after the first run to obtain a reasonable number of clusters

          RMSD cut-off (nm) for two structures to be neighbor

-rmsmin             (0) // minimum RMS difference for writing structures; a first trial run reports the RMSD range to base it on

          minimum rms difference with rest of cluster for writing structures

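Options affecting the xpm image:

-nlevels             (40) // important!!! Example: if the RMSD range is 0.0169-0.0657 nm and the minimum RMS is left at the default of 0 (-rmsmin=0), the colors in the .xpm are spaced (0.0657-0.0)/40 = 0.0016 nm apart, 40 colors in total.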

          Discretize RMSD matrix in this number of levels
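The spacing of the color levels follows from simple arithmetic. The snippet below reproduces the calculation from the note for an RMSD maximum of 0.0657 nm and 40 levels; it only illustrates the division and assumes the scale starts at 0, and it does not claim to match how gmx cluster discretizes internally.

```python
def level_width(rmsd_max, nlevels=40, rmsd_min=0.0):
    """Width of one color level when [rmsd_min, rmsd_max] is split into nlevels."""
    return (rmsd_max - rmsd_min) / nlevels

width = level_width(0.0657)
print(f"{width:.4f} nm per level")  # 0.0016 nm per level
```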

-max                (-1) // no upper limit on the levels; leave as is

          Maximum level in RMSD matrix

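-minstruct           (1) // minimum number of structures a cluster needs to be colored in the .xpm file (colors appear only when > 1); a cluster with a single structure is white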

          Minimum number of structures in cluster for coloring in the .xpm file

-[no]binary                (no) // only two colors

          Treat the RMSD matrix as consisting of 0 and 1, where the cut-off  is given by -cutoff

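Options for writing representative structures:

-wcl                  (0)  // writes the structures of the given number of clusters to numbered files (my guess); by default all clusters; leave as is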

          Write the structures for this number of clusters to numbered files

-nst                  (1) // by default only one structure is written; all structures are written if a cluster has more than this number (my guess); leave as is

          Only write all structures if more than this number of structures per cluster

-[no]av                    (no)  // write the average structure (physically meaningless) instead of the central (representative) structure; leave as is

          Write average instead of middle structure for each cluster

Other options

-tu                (ps) // set to ns

          Unit for time values: fs, ps, ns, us, ms, s

-[no]dista                (no) // use the RMSD of atom-pair distances instead of the RMS deviation of the fitted coordinates; leave as is

          Use RMSD of distances instead of RMS deviation

-[no]fit                  (yes) // fitting is on by default; leave as is

          Use least squares fitting before RMSD calculation

-[no]pbc                  (yes) // leave at the default

          PBC check

-M                    (10) // number of nearest neighbors considered by the Jarvis-Patrick algorithm; 0 means use the cutoff distance instead

          Number of nearest neighbors considered for Jarvis-Patrick algorithm, 0 is use cutoff

-P                    (3) // number of identical nearest neighbors required to form a cluster

          Number of identical nearest neighbors required to form a cluster

-seed                (0) // random seed for MC; 0 generates one; leave as is

          Random number seed for Monte Carlo clustering algorithm (0 means generate)

-niter                (10000) // number of MC iterations; leave as is

          Number of iterations for MC

-nrandom             (0)  // the first MC iterations can be done completely at random to shuffle the frames; leave as is

          The first iterations for MC may be done complete random, to shuffle the frames

-kT                (0.001) // Boltzmann weighting factor for the MC optimization (0 turns off uphill steps); leave as is

          Boltzmann weighting factor for Monte Carlo optimization (zero turns off uphill steps)

-b     

          Time of first frame to read from trajectory (default unit ps)

-e     

          Time of last frame to read from trajectory (default unit ps)

-dt   

          Only use frame when t MOD dt = first time (default unit ps)

-skip                (1) // subsample by frame count

          Only analyze every nr-th frame

-[no]w                    (no)

          View output .xvg, .xpm, .eps and .pdb files

-xvg                (xmgrace) // formatted for xmgrace by default; leave as is

          xvg plot formatting: xmgrace, xmgr, none


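Step-by-step workflow:

test0: a trial run with the defaults

Result: there are 955 input frames in total, so a 955*955 RMSD matrix is computed. The RMSD ranges from 0.10019 to 0.657485 nm with an average of 0.319467 nm, and the energy of the matrix is 18.5475 (how is this computed?). The run reports a problem: the default cutoff of 0.1 nm lies outside the range 0.10019-0.657485 nm, i.e. it is below the smallest RMSD of 0.10019 nm, so every frame forms its own cluster. The default minimum RMSD of 0 is also below 0.10019 nm.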

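Default output files:

rmsd-dist.xvg: the x axis is the RMS (nm); the y axis is labeled in arbitrary units and is a count here. The second column sums to (955*955-955)/2 = 455535, i.e. half of the RMS matrix with the diagonal removed.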
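This check is easy to automate. The sketch below sums the second column of an .xvg histogram and compares it with n(n-1)/2; it assumes header lines start with # or @ and that the second column holds the counts. The function names and the 4-frame sample are made up.

```python
def sum_histogram_counts(lines):
    """Sum the second column of an .xvg histogram, skipping '#' and '@' lines."""
    total = 0.0
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        total += float(line.split()[1])
    return total

def expected_pair_count(n_frames):
    """Number of entries in one off-diagonal half of an n x n symmetric matrix."""
    return n_frames * (n_frames - 1) // 2

print(expected_pair_count(955))  # 455535

# Hypothetical histogram for 4 frames (made-up values): 6 frame pairs in total.
sample = ['@ title "RMS Distribution"', "0.1 2", "0.2 3", "0.3 1"]
print(sum_histogram_counts(sample) == expected_pair_count(4))  # True
```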

cluster.log

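rmsd-clust.xpm: a comment header, then both axes as Time running from 0 to 95400 (the simulated time in ps), then the color matrix

The comment section must be deleted before the file can be opened in IrfanView; otherwise it reports an error!!!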
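Removing the comments by hand gets tedious for several files, so a small script can help. The sketch below is one possible approach under assumptions about the file layout: comments are C-style /* ... */ on a single line, the first line /* XPM */ must stay because the XPM format requires it, and data strings never contain a double quote. Whether the result opens in a particular viewer has not been verified here; the sample lines are made up.

```python
import re

COMMENT = re.compile(r"/\*.*?\*/")

def strip_xpm_comments(lines):
    """Remove comments from .xpm lines while keeping the '/* XPM */' magic line."""
    cleaned = []
    for line in lines:
        text = line.rstrip("\n")
        if text.strip() == "/* XPM */":
            cleaned.append(text)            # mandatory XPM header
        elif text.lstrip().startswith("/*"):
            continue                        # whole-line comment: drop it
        elif text.startswith('"'):
            end = text.index('"', 1)        # closing quote of the data string
            # Strip comments only after the string, so pixel data is never touched.
            tail = COMMENT.sub("", text[end + 1:]).strip()
            cleaned.append(text[:end + 1] + tail)
        else:
            cleaned.append(text)
    return cleaned

# Made-up miniature .xpm content in the style of a GROMACS matrix file.
sample = [
    "/* XPM */",
    '/* title:   "RMSD" */',
    "static char *gromacs_xpm[] = {",
    '"2 2   2 1",',
    '"A  c #FFFFFF " /* "0" */,',
    '"B  c #000000 " /* "1" */,',
    "/* x-axis:  0 100 */",
    '"AB",',
    '"BA"',
    "};",
]
for line in strip_xpm_comments(sample):
    print(line)
```

To process a real file, read its lines, pass them through strip_xpm_comments, and write the result to a new file so that the original stays untouched.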


[Figure: axes]

[Figure: colors]


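RMSD values are written in the upper left half of the matrix and the graphical depiction in the lower right half. With -minstruct = 1 the depiction is black wherever two structures are in the same cluster; with -minstruct > 1 each cluster gets its own color. In this result everything is white, meaning that no two frames are in the same cluster.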

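test1: adjust the cutoff and the minimum RMSD



Switched to the gromos algorithm; the RMSD range and the energy look essentially unchanged. Next time the RMSD minimum should be set to 0.10020 nm. The frames split into 6 clusters.

The lower right half of the matrix is displayed incorrectly: color n also shows up as white. The display needs to be fixed!!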
