Deep learning for chemoinformatics
XU Youjun, PEI Jianfeng
Center for Quantitative Biology, Academy for Advanced Interdisciplinary Studies, Peking University, Beijing 100871, China
CLC number: TP301  Document code: A
Abstract: Deep learning has been successfully applied in computer vision, speech recognition, and natural language processing, driving the rapid development of artificial intelligence. Its key techniques have also been applied to chemoinformatics, accelerating the adoption of artificial intelligence in chemistry. Because developing quantitative structure-activity relationship (QSAR) models is one of the major tasks of chemoinformatics, this review focuses on the application of deep learning to QSAR research. It discusses how three kinds of deep learning frameworks, namely deep neural networks, convolutional neural networks, and recurrent or recursive neural networks, have been applied to QSAR, and gives a perspective on the future impact of deep learning on chemoinformatics.
Key words: deep learning, artificial intelligence, quantitative structure-activity relationship, chemoinformatics
Citation: XU Y J, PEI J F. Deep learning for chemoinformatics[J]. Big Data Research, 2017, 3(2): 45-66.
4 Comparison and analysis of deep learning frameworks
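● As datasets grow in number and diversity, researchers increasingly favor multitask training strategies. Within multitask learning, the idea of transfer learning is applied to tasks whose datasets are small, improving predictive performance on those tasks. Multitask models are mostly evaluated by AUC, which suggests that they are currently suited only to classification problems; for multitask regression models, better training methods and strategies remain to be developed.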
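● ReLU activation is currently the most commonly used training technique in QSAR and is used in essentially all DNN and CNN frameworks. Developing better and faster training ...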
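As a minimal sketch of the two points above (not code from any of the surveyed works), the following pure-Python example shows a forward pass through a hidden layer shared across tasks with ReLU activations and one sigmoid output per task, followed by a per-task AUC that skips compounds with no measurement for a task. The function names, the weight layout, and the use of `None` for missing labels are illustrative assumptions.

```python
import math


def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def sigmoid(x):
    """Logistic function mapping a real value to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def multitask_forward(features, shared_w, head_w):
    """Shared ReLU hidden layer followed by one sigmoid output per task.

    shared_w: one weight row per hidden unit (shared by all tasks).
    head_w:   one weight row per task, applied to the hidden layer.
    """
    hidden = [relu(sum(w * f for w, f in zip(row, features)))
              for row in shared_w]
    return [sigmoid(sum(w * h for w, h in zip(head, hidden)))
            for head in head_w]


def auc(labels, scores):
    """Rank-based AUC: fraction of (active, inactive) pairs in which the
    active compound scores higher; ties count as half a correct pair."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def per_task_auc(label_rows, score_rows):
    """AUC for each task; label_rows[i][t] is 1, 0, or None when
    compound i has no measurement for task t (assumed convention)."""
    n_tasks = len(score_rows[0])
    results = []
    for t in range(n_tasks):
        pairs = [(labels[t], scores[t])
                 for labels, scores in zip(label_rows, score_rows)
                 if labels[t] is not None]
        results.append(auc([y for y, _ in pairs], [s for _, s in pairs]))
    return results


# Toy usage: 2 input features, 2 shared hidden units, 2 tasks.
shared_w = [[1.0, 0.0], [0.0, 1.0]]
head_w = [[1.0, 1.0], [-1.0, 0.5]]
compounds = [[1.0, -1.0], [0.2, 0.4], [-0.5, 2.0]]
predictions = [multitask_forward(c, shared_w, head_w) for c in compounds]
print(predictions)
```

Because the hidden layer is shared, a task with few labelled compounds still benefits from features learned on the data-rich tasks, which is the transfer effect described above. Evaluating each task separately, and only on the compounds actually measured for it, reflects the sparse label matrices typical of multitask QSAR datasets.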
5 Summary and outlook