Paper reading (10): Next-generation Machine Learning for Biological Networks

Title: Next-generation Machine Learning for Biological Networks

Google Scholar citations: 90

Pages: 12

Published: 2018.05

Journal: Cell

Authors: Diogo M. Camacho, Katherine M. Collins, Rani K. Powers, James C. Costello and James J. Collins

Abstract:

Keywords: Machine learning, deep learning, systems biology, synthetic biology, network biology, neural network

Machine learning, a collection of data-analysis techniques aimed at building predictive models from multi-dimensional datasets, is becoming integral to modern biological research. By enabling one to generate models that learn from large datasets and make predictions on likely outcomes, machine learning can be used to study complex cellular systems such as biological networks. Here, we provide a primer on machine learning for life scientists, including an introduction to deep learning. We discuss opportunities and challenges at the intersection of machine learning and network biology, which could impact disease biology, drug discovery, microbiome research, and synthetic biology.

Conclusions:

  • Challenge 1: the need for massively large datasets
  • Although data captured from biological systems can be incredibly complex, the majority of these datasets are orders of magnitude too small for deep learning algorithms to be applied appropriately.
  • Options for addressing this challenge:
  1. Invest in the collection of suitably large, well-annotated datasets for state-of-the-art studies in network biology.
  2. Generate in silico data with the properties of real data (e.g., with GANs).
  • Challenge 2: the black-box nature of most next-generation machine learning models

Introduction:

  • Applications of machine learning in biology:
  1. genome annotation
  2. predictions of protein binding
  3. the identification of key transcriptional drivers of cancer
  4. predictions of metabolic functions in complex microbial communities
  5. the characterization of transcriptional regulatory networks
  6. and so on...
  • A key advantage is that machine-learning methods can sift through volumes of data to find patterns that would be missed otherwise.
  • Network biology involves the study of the complex interactions of biomolecules that contribute to the structures and functions of living cells.

Structure of the paper:

1. Introduction

2. A primer on Machine Learning

  • Basics of Machine Learning
  • Categories of Machine-Learning Methods
  • Applying Machine Learning in Biological Contexts
  • Deep Learning: Next-Generation Machine Learning

3. Intersection of Machine Learning and Network Biology

  • Disease Biology
  • Drug Discovery
  • Microbiome Research
  • Synthetic Biology
  • Challenges and Future Outlook

Excerpts from the main text:

  • GANs are deep neural network architectures composed of two neural networks that are pitted against each other: one is a generative model that produces new data that mimic the distributions of the training dataset, while the other is a discriminative model (the adversary) that evaluates the new data and determines whether or not it belongs to the actual training dataset.
  • In biological applications, features can include one or more types of data, such as gene expression profiles, a genomic sequence, protein-protein interactions, metabolite concentrations, or copy number alterations.
  • Overfitting and underfitting are major causative factors underlying poor performance of machine-learning approaches. 
  • The old computer-science adage of “garbage in, garbage out” was never truer than it is with machine-learning applications. That one really hits home...
  • Feature selection: the authors refer the reader to several excellent articles (Chandrashekar and Sahin, 2014; Domingos, 2012; Guyon and Elisseeff, 2003; Little and Rubin, 1987; Saeys et al., 2007). Worth a read!
  • Unsupervised techniques can be used in a case where the sample labels are missing or incorrect. 
  • Dialogue on Reverse Engineering Assessment and Methods (DREAM)
  • Each DREAM challenge presents the network biology research community with a specific question and the necessary data to address it. 
  • rules of thumb
  1. Simple is often better
  2. Prior knowledge improves performance
  3. Ensemble models produce robust results
  • A key drawback of the deep learning paradigm is that training a deep neural network requires massive datasets, of a size often not attainable in many biological studies.
  • capsule networks allow for the learning of data structures in a manner that preserves hierarchical aspects of the data itself. 
  • Capsule networks are ripe for application in network biology and disease biology given that biological networks are highly modular in nature, with specified layers for the many biomolecules, while allowing each of these layers to interact with other layers. 
  • It is exciting to consider how multi-task learners could be used to bridge the gap between the biological and chemical aspects of drug discovery by incorporating structural data on chemical entities.
  • The human microbiome consists of the microorganisms—bacteria, archaea, viruses, fungi, protozoa—that live on or inside the human body.
  • Transfer learning: the immutable nature of biochemical compounds
  • Moving from a learning model that embeds biological sequences to ones that embed regulatory motifs and circuit structures
  • The generated deep learning model could be used to identify fundamental design principles for synthetic biology. 
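To make the overfitting/underfitting excerpt concrete (and the "simple is often better" rule of thumb), here is a small toy sketch of my own, not an example from the paper: k-nearest-neighbour regression on hypothetical noisy linear data, where the data generator and all function names are made up for illustration. With k=1 the model memorizes the training set (training error is zero, yet test error is not); with k equal to the training-set size it predicts the global mean everywhere and underfits; an intermediate k usually generalizes better than either extreme.

```python
import random


def knn_predict(train, x, k):
    """Predict y at x as the mean y of the k training points nearest to x (1-D input)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, eval_set, k):
    """Mean squared error of k-NN fitted on `train` and evaluated on `eval_set`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in eval_set) / len(eval_set)


def make_data(n, rng):
    """Hypothetical toy data: y = 2x plus Gaussian noise (sd 1), x uniform on [0, 10]."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2 * x + rng.gauss(0, 1)) for x in xs]


rng = random.Random(0)  # fixed seed for reproducibility
train_set = make_data(40, rng)
test_set = make_data(200, rng)

for k in (1, 5, len(train_set)):
    print(f"k={k:>2}  train MSE={mse(train_set, train_set, k):6.2f}  "
          f"test MSE={mse(train_set, test_set, k):6.2f}")
```

The gap between training and test error is the thing to watch: a model that looks perfect on its own training data (k=1) tells you little about how it will do on new samples.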
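On the feature-selection excerpt: the paper points to the cited articles instead of prescribing a method. As a minimal sketch of one simple family of methods, a univariate filter, the code below ranks features by the absolute Pearson correlation between each feature and a binary phenotype label. The expression matrix, gene names and the `top_k` parameter are all hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return 0.0 if sd_a == 0 or sd_b == 0 else cov / (sd_a * sd_b)


def select_top_features(X, y, names, top_k):
    """Univariate filter: keep the top_k features ranked by |correlation| with y."""
    columns = list(zip(*X))  # transpose: samples x features -> features x samples
    scores = {name: abs(pearson(col, y)) for name, col in zip(names, columns)}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Hypothetical data: 6 samples x 3 genes, with a binary phenotype label per sample.
X = [
    [5.1, 0.2, 3.0],
    [4.9, 0.1, 2.9],
    [5.0, 0.3, 3.1],
    [1.2, 0.2, 3.0],
    [1.0, 0.1, 2.8],
    [1.1, 0.3, 3.2],
]
y = [1, 1, 1, 0, 0, 0]
print(select_top_features(X, y, ["geneA", "geneB", "geneC"], top_k=1))  # ['geneA']
```

Filters like this score each feature in isolation and so ignore interactions between features; wrapper and embedded methods can capture those, at a higher computational cost.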
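On the excerpt about unsupervised techniques: when sample labels are missing or incorrect, samples can still be grouped purely by their measurements. Below is a minimal sketch using plain k-means (Lloyd's algorithm) on hypothetical two-gene expression profiles. k-means is my choice of unsupervised technique here, not one singled out by the paper, and the fixed initialization is only there to keep the toy run reproducible.

```python
import math


def kmeans(points, init_centroids, n_iter=20):
    """Plain k-means (Lloyd's algorithm); returns (labels, centroids)."""
    centroids = [list(c) for c in init_centroids]
    labels = []
    for _ in range(n_iter):
        # Assignment step: each sample joins its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the samples assigned to it.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster ends up empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical unlabeled samples: expression of two genes per sample.
profiles = [
    [5.0, 1.0], [5.2, 0.9], [4.8, 1.1],
    [1.0, 4.0], [1.2, 4.2], [0.9, 3.8],
]
# Fixed initialization (samples 0 and 3) keeps this toy run reproducible.
labels, centroids = kmeans(profiles, init_centroids=[profiles[0], profiles[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In practice the number of clusters and the initialization both matter, and the resulting clusters still need biological interpretation; they are hypotheses, not labels.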
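The DREAM excerpts and the "ensemble models produce robust results" rule of thumb can be tied together in a toy example. Three hypothetical network-inference methods each return a ranked list of (regulator, target) edges; the lists are combined by averaging each edge's rank, and every list is scored against a hypothetical gold standard with precision at k. DREAM network-inference challenges use their own scoring (precision-recall-based metrics, among others), so precision at k is only a simpler stand-in, and the toy rankings were picked so that the effect is visible, not as evidence that ensembles always win.

```python
def average_rank_ensemble(rankings):
    """Combine ranked edge lists by averaging each edge's rank across methods.

    An edge missing from a method's list gets rank len(list) + 1 for that method.
    Ties are broken by the edge tuple itself so the output is deterministic.
    """
    all_edges = set().union(*rankings)

    def avg_rank(edge):
        total = 0
        for ranking in rankings:
            total += (ranking.index(edge) + 1) if edge in ranking else (len(ranking) + 1)
        return total / len(rankings)

    return sorted(all_edges, key=lambda e: (avg_rank(e), e))


def precision_at_k(predicted, gold, k):
    """Fraction of the top-k predicted edges that are in the gold-standard set."""
    return sum(edge in gold for edge in predicted[:k]) / k


# Hypothetical gold-standard regulatory edges (regulator, target).
gold = {("TF1", "G1"), ("TF1", "G2"), ("TF2", "G3")}

# Hypothetical ranked predictions from three different inference methods.
method_a = [("TF1", "G1"), ("TF3", "G4"), ("TF2", "G3"), ("TF1", "G2")]
method_b = [("TF1", "G2"), ("TF2", "G5"), ("TF1", "G1"), ("TF2", "G3")]
method_c = [("TF2", "G3"), ("TF1", "G2"), ("TF3", "G4"), ("TF1", "G1")]

ensemble = average_rank_ensemble([method_a, method_b, method_c])
for name, ranking in [("a", method_a), ("b", method_b), ("c", method_c), ("ensemble", ensemble)]:
    print(name, round(precision_at_k(ranking, gold, k=3), 2))
# prints: a 0.67, b 0.67, c 0.67, ensemble 1.0 (one per line)
```

Average-rank aggregation is just one way to build an ensemble; a practical advantage is that it needs only each method's ranking, not confidence scores that are comparable across methods.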
