glide作为广为人知的分子对接软件,提供了非常方便的各类型对接工作,本篇文章旨在介绍如何方便的在无图形化界面的liinux集群中运行整个分子对接流程
总体流程大致可以输入,预处理和分子对接三个部分
• 输入为蛋白文件和小分子2D结构数据库。
• 输入文件经过ligprep和grid_gen_from_pv.py进行预处理, 其中ligprep是将小分子2D结构转3D。grid_gen_from_pv.py是从蛋白复合物中的已知配体生成蛋白结结合位点网格。
• 分子对接依赖于GSVrun程序进行,GSVrun包括多种常用的对接模式,以及对接过程中的可调参数。详见第4节。
• 输出文件包括每个对接阶段的小分子3D构象输出以及得分。
Bash
usage: ligprep [options] (-ismi|-icsv|-imae|-isd) infile (-osd|-omae) outfile
General:
-h, -help Print brief help message.
-long_help Print exhaustive help message.
-inp <filename> Read arguments from the supplied input file.
-sif_docs List supported simplified input file format (SIF)
keywords.
Ionization:
-epik Use Epik for ionization and tautomerization
(Recommended, overrides -i).
-emb, -epik_metal_binding
Run Epik with the metal_binding option so that states
appropriate for interactions with metal ions in
protein binding pockets are also generated.
-i {0,1,2} Ionization treatment: 0 - do not neutralize or ionize,
1 - neutralize only, 2 - neutralize and ionize
(Default: 1). Note that -epik option overrides -i and
always implies ionization and tautomerization.
-ph <number> Effective/target pH.
-pht <number> pH tolerance for generated structures.
Stereoisomers:
-ac Do not respect existing chirality properties and do
not respect chiralities from the input geometry.
Generate stereoisomers for all chiral centers up to
the number permitted (specified using the -s option).
This is equivalent to "Generate all combinations" in
the Ligand Preparation user interface. Default
behavior is to respect only explicitly indicated
chiralities.
-g Respect chiralities from input geometry when
generating stereoisomers.
-s # Generate up to this many (#) stereoisomers per input
structure. (Default: 32).
Force-field based geometry optimization:
-bff {14,16} Force-field to be used for the final geometry
optimization. Default: 14 (OPLS_2005). For S-OPLS
specify 16.
Standard and job control options:
-NJOBS NJOBS Divide the overall job into NJOBS subjobs.
-NSTRUCTS NSTRUCTS Divide the overall job into subjobs with no more than
NSTRUCTS structures.
-HOST <hostname> Run job remotely on the indicated host entry.
-WAIT Do not return a prompt until the job completes.
-LOCAL Do not use temporary directory for job files. Keep
files in the current directory.
Typical usage:
Sample protonation states and stereoisomers, optimize geometry
using OPLS force-field, and store results in Maestro format:
ligprep <input> -omae outfile.maegz -epik
where <input> is one of:
-ismi infile.smi
-icsv infile.csv
-isd infile.sd
-imae infile.mae
一般情况下默认参数即可,在slurm集群上采用如下代码提交任务(ligprep的cpu利用率不高, 一般建议Njobs设置为cpu核数的2~3倍
):
srun -c 4 -J ${name}_ligprep /home/xiaxichen/hyd/Schrodinger-soft/ligprep -ismi ${name}.smi -omae ${name}.maegz -epik -NJOBS 12 -WAIT
SCHRODINGER本身其实就提供了很多python脚本以实现各种功能
脚本路径:/home/xiaxichen/hyd/Schrodinger-soft/mmshare-v5.4/python/script
脚本执行方式 $SCHRODINGER/run XXX.py [Options]
脚本查询:https://www.schrodinger.com/scriptcenter
Bash
Options:
-v, -version Show the program version and exit.
-h, -help Show this help message and exit.
-o OUT_BASE, -outbase OUT_BASE
Base name for gridgen input to be produced.
-a, -allprint Report all levels of status message.
可以看到参数非常少,连基本的网格大小都不可以设置。所以建议还是在本地创建好grid文件在传到集群上为宜,首先是所需时间其实也非常短,其次就是能够加深对蛋白结构的理解和更多的网格设置选项。
Bash
Warning: The GVSrun need a $compound_library environmental variable, or you can use -d to specify a library.
Perform Virtual Screening Workflow.
Usage: GVSrun [OPTION] <parameter>
Input parameter:
-i Gird file input.
Use a file name (Multiple files are wrapped in "", and split by ' ') or regular expression to represent your input Grid file, default is *.zip.
-D Which databases do you want to screen? The Database basic path is <>.
These database are Supported:
Custom_DB-ab-site-Fast GVSrun GVXrun_help.txt plmd README.md Scripts XDock
-d Provide your own database path, the compounds files are recommended as maegz or SDF file format.
-R Optional, reference Ligands correlated with the grid or use for RMSD calculation.
As *.mae, *.maegz (for query structure) or *.phypo (for pharmacophore hypothesis).
-m Running Mode: a serial combination of different computing tasks.
Show all mode and task in this option using "-O".
@ Available Running Mode: <Fast>
Fast: Fast Virtual Screening.
Normal: Filter The Drug-like compounds and dock screening.
Reference: Virtual Screening with reference ligand restrain.
Prep_Normal: Virtual Screening for un-prepared compounds database.
*Normal_MMGBSA: Virtual Screening and MMGBSA re-scoring.
*Cov_Screening: Virtual Screening to discover covalent durg.
*Induce_Fit_Screening: Induce fit screening.
*QM_Screening: Virtual Screening and QMMM re-docking.
Local: Local docking and screening were carried out using the input ligand structure.
Shape_Screening: perform screening based on reference ligand shape.
Advance: dock screening and clustering.
Advance_Shape_Screening: perform screening based on reference ligand shape.
@ or Custom Running Mode, e.g. -m "EDL+R+HTVS+CD"
Control parameters:
-F Force Fields for docking stage (SP/XP/MMGBSA...), OPLS_2005, OPLS3e or OPLS4. <OPLS4>
-f Force Fields for other stage (HTVS/LigPrep...), OPLS_2005, OPLS3e or OPLS4. <OPLS_2005>
-W Define a Max/Min Molecular weight for MW module, such as "100:400" <100:400>
-T Set a Job Name. Default is "Grid_name-Database_name-Run_Mode".
-C Aattach residue number on receptor, required in Covalent Docking.
e.g. "cys:A:1425", the A is chain name and 145 is the atom number(Heavy atom).
The cys is residue name for A:1425, Supported residues: cys, ser, lys.
-q Define a "DFT:Basis_Set" to QM/QMMM, default is "B3LYP-D3(BJ):6-311G**".
Other QM setting: "B3LYP-D3M(BJ):6-311G+**","M06‑2X:def2-tzvpp(-g)","wB97M‑V:cc-pVTZ-pp"
-p set a PH:PHT for ligPrep, default is 7.0:2.0, means 7.0±2.0. <7.0:2.0>
-s Set a SMARTs Expression for compounds filter at first step. such as [B]([O])[O].
-E Shape Screening Options. <10000:mmod:rapid:1200>
The paramter means: "keep_num:atomtypes:shape_sample_method:max_confs";
NOTE: element and qsar are alternative atomtypes; thorough are alternative shape_sample_method.
OUTPUT parameters:
-a The number of Output compounds per Screening Task. <5%>
e.g. 10% means reatin top 10% compounds.
10000 means reatin top 10000 compounds.
-b The number of Output compounds after Standard docking Task. <4000>
-c The number of conformations generated by each ligand in the docking task. <1>
-e The number of candidates to IFT/CD/MMGBSA/QMMM. <500>
Job control:
-H Host Name of your Queue, defult is CPU.
-N The max number of subjobs. <100>
-G The number of Glide subjobs. <90>
-P The number of Prime subjobs. <10>
-L The number of LigPrep subjobs. <30>
-A The number of phase subjobs. <10>
-Q The number of Qsite subjobs. <10>
-K The number of QIKPROP subjobs. <10>
-M The number of MACROMODEL subjobs. <10>
-S Your Schrodinger path. </home/xiaxichen/hyd/Schrodinger-soft>
GVSrun通过环境变量compound_library来获取系统中化合物数据库的位置;如果不存在该变量需要利用-d参数给定化合物数据库路径(纯SDF或MAEGZ格式的多个文件,不可套叠文件夹
)除此以外,脚本需要知道安装好的薛定谔路径:
以Advance为例
),每个Running_model其实就是个各项任务的组合,对于新的Running_model制定所需要添加的内容有三个部分:方法注释,运行逻辑, 任务运行代码:#===========================
#添加方法注释
#===========================
@Available Running Mode:
Advance: dock screening.
"HTVS_Normal+SP_ExtensionA+SP_Enhanced" #任务组合,代码93行
@Screening: Reduce the number of compounds rough docking. Related to -a option.
HTVS_Normal: The standard HTVS screening.
@Standard_Docking: Standard docking was performed to rank the compounds. Related to -b and -c option.
SP_ExtensionA: Reward intramolecular ligand hydrogen bonds, and accept halogens as H-bond acceptors.
SP_Enhanced: Increase the depth of sampling for larger grid or More accurate poses.
#===========================
#添加运行逻辑
#===========================
Parse_Running_Mode(){
# Parse_Running_Mode Running_Mode
Running_Mode=$1
if [ $Running_Mode == "Normal" ]; then
Pipeline="RDL+HTVS_Normal+QIKPROP+R5R+SP_ExtensionA"
elif [ $Running_Mode == "Prep_Normal" ]; then
Pipeline="No_Dup+RDL+IONIZE+HTVS_Normal+QIKPROP+R5R+SP_ExtensionA"
elif [ $Running_Mode == "Normal_MMGBSA" ]; then
Pipeline="RDL+HTVS_Normal+QIKPROP+R5R+SP_ExtensionA+MMGBSA_EN"
elif [ $Running_Mode == "Reference" ]; then
Pipeline="HTVS_REF+SP_REF+QIKPROP+R5R"
elif [ $Running_Mode == "Induce_Fit_Screening" ]; then
Pipeline="IFT_pre+IFT"
elif [ $Running_Mode == "Cov_Screening" ]; then
Pipeline="R+RDL+HTVS_Normal+SP_ExtensionA+CD"
elif [ $Running_Mode == "Fast" ]; then
Pipeline="HTVS_Normal+SP_Normal"
elif [ $Running_Mode == "QM_Screening" ]; then
Pipeline="RDL+HTVS_Normal+QIKPROP+R5R+SP_ExtensionA+QM_redock+RMSD"
elif [ $Running_Mode == "Shape_Screening" ];then
Pipeline="PhaseShape+HTVS_local+SP_local"
elif [ $Running_Mode == "Local" ];then
Pipeline="HTVS_local+SP_local+MMGBSA_OPT"
elif [ $Running_Mode == "Advance" ];then
Pipeline="HTVS_Normal+SP_ExtensionA+SP_Enhanced"
elif [ $Running_Mode == "Advance_Shape_Screening" ];then
Pipeline="HTVS_Shape+SP_Shape"
else
Pipeline=$Running_Mode
Running_Mode="User_Defined"
fi
}
#===========================
#添加任务运行代码
#===========================
HTVS_Normal(){
KEEP_NUM=`parse_num_or_percentage ${HTVS_out_num}`
cat<<HTVS >> ${Inp_Name}
[STAGE:PRE_DOCK_HTVS]
STAGECLASS gencodes.RecombineStage
INPUTS ${Ligand_name},
OUTPUTS HTVS_RECOMBINE_OUT,
NUMOUT njobs
OUTFORMAT maegz
MIN_SUBJOB_STS 4000
MAX_SUBJOB_STS 40000
GENCODES YES
OUTCOMPOUNDFIELD s_vsw_compound_code
OUTVARIANTFIELD s_vsw_variant
UNIQUEFIELD NONE
[STAGE:DOCK_HTVS]
STAGECLASS glide.DockingStage
INPUTS HTVS_RECOMBINE_OUT, GRID
OUTPUTS HTVS_OUT,
RECOMBINE NO
PRECISION HTVS
UNIQUEFIELD s_vsw_compound_code
${KEEP_NUM}
grid_gen_from_pv:
链接:https://pan.baidu.com/s/1ymh7e2eesWmU5yMFgxK6mA?pwd=6666
GVSrun:
https://pan.baidu.com/s/1wTLhDBruY5F0iA-ndx944A?pwd=6666
流程名 | 速度 | 1万个分子所需时长 |
---|---|---|
ligprep | 1.5 秒/分子 | 4.1 小时/1万分子 |
glide_HTVS | 0.5 秒/分子 | 1.5 小时/1万分子 |
glide_SP | 9 秒/分子 | 25 小时/1万分子 |
glide_XP | 100 秒/分子 | 280 小时/1万分子 |
感谢上海科技大学的王林写的GVSrun脚本并将其开源,用起来非常方便。附上他的github主页:
https://github.com/Wang-Lin-boop