General Linear Models and Linear Mixed Models: Linear Mixed Model from Scratch

Mathematical Statistics and Machine Learning for Life Sciences

This is the eighteenth article in the column Mathematical Statistics and Machine Learning for Life Sciences, where I try to explain in a simple way some mysterious analytical techniques used in Bioinformatics and Computational Biology. The Linear Mixed Model (also called the Linear Mixed Effects Model) is widely used in Life Sciences, and there are many tutorials showing how to run the model in R. However, it is sometimes unclear how exactly the Random Effects parameters are optimized in the likelihood maximization procedure. In my previous post, How Linear Mixed Model Works, I gave an introduction to the concepts of the model. In this tutorial we will derive and code the Linear Mixed Model (LMM) from scratch using the Maximum Likelihood (ML) approach, i.e. we will code LMM in plain R and compare the output with that of the lmer and lme R functions. The goal of this tutorial is to explain LMM “like for my grandmother”, meaning that people with no mathematical background should be able to understand what LMM does under the hood.

Toy Data Set

Let us consider a toy data set that is very simple but still keeps all the necessary elements of a typical setup for Linear Mixed Modelling (LMM). Suppose we have only 4 data points / samples: 2 originating from Individual #1 and the other 2 coming from Individual #2. Further, the 4 points are spread between two conditions: untreated and treated. Let us assume we measure the response (Resp) of each individual to the treatment and would like to assess whether the treatment resulted in a significant response of the individuals in the study. In other words, we are aiming to implement something similar to the paired t-test and assess the significance of the treatment. Later we will relate the outputs from LMM and the paired t-test and show that they are indeed identical. In the toy data set, 0 in the Treat column means “untreated” and 1 means “treated”. First, we will use a naive Ordinary Least Squares (OLS) linear regression that does not take the relatedness between the data points into account.
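The naive fit described above can be sketched in a few lines. This is only an illustration under stated assumptions: the article itself works in R, while the sketch uses plain Python; the response values are hypothetical placeholders chosen for the example; and `fit_ols` is a helper name introduced here, not a function from the article.

```python
# Hypothetical placeholder values for illustration only.
individual = [1, 1, 2, 2]          # which individual each sample comes from
treat = [0, 1, 0, 1]               # 0 = untreated, 1 = treated
resp = [10.0, 25.0, 3.0, 6.0]      # measured response (made-up values)


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Naive fit: all 4 points are pooled and the individual labels are ignored.
intercept, slope = fit_ols(treat, resp)
print(f"intercept = {intercept:.2f}, slope (treatment effect) = {slope:.2f}")
```

Because Treat takes only the values 0 and 1, the intercept of this fit is the mean untreated response and the slope is the difference between the treated and untreated means. Nothing in the fit uses the `individual` labels, which is exactly what makes it naive.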

[Image 1]
[Image 2]

Technically it works. However, this is not a good fit, and we have a problem here. Ordinary Least Squares (OLS) linear regression assumes that all observations (data points on the plot) are independent, which should result in uncorrelated residuals (the usual significance tests additionally assume that the residuals are normally distributed). However, we know that the data points on the plot belong to 2 individuals, i.e. 2 points for each individual, so the observations are not independent. In principle, we can fit a linear model for each individual separately. However, this is not a good fit either: with only two points per individual, a straight line passes through both of them exactly, which leaves nothing to estimate the error from. In addition, as we saw previously, individual fits do not say much about the overall / population profile, as some of them may behave in the opposite way compared to the rest of the individual fits.
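Both problems can be made concrete with a small sketch. As before, this is Python rather than the article's R, the response values are hypothetical placeholders, and `fit_ols` is a helper name introduced for the illustration. The sketch first groups the residuals of the pooled fit by individual, then fits a separate line to each individual's two points.

```python
from collections import defaultdict

# Hypothetical placeholder values for illustration only.
individual = [1, 1, 2, 2]
treat = [0, 1, 0, 1]               # 0 = untreated, 1 = treated
resp = [10.0, 25.0, 3.0, 6.0]


def fit_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# 1) Pooled fit: collect the residuals separately for each individual.
intercept, slope = fit_ols(treat, resp)
residuals = [y - (intercept + slope * x) for x, y in zip(treat, resp)]
residuals_by_individual = defaultdict(list)
for who, res in zip(individual, residuals):
    residuals_by_individual[who].append(res)

# 2) Separate fit per individual: 2 points, 2 parameters, so the line
#    goes through both points and no residual degrees of freedom remain.
per_individual_fit = {}
for who in sorted(set(individual)):
    xs = [x for k, x in zip(individual, treat) if k == who]
    ys = [y for k, y in zip(individual, resp) if k == who]
    per_individual_fit[who] = fit_ols(xs, ys)

for who in sorted(residuals_by_individual):
    print(f"individual {who}: pooled residuals = {residuals_by_individual[who]}, "
          f"own (intercept, slope) = {per_individual_fit[who]}")
```

With these placeholder values, both pooled residuals of one individual are positive and both residuals of the other are negative, so the residuals are clearly grouped by individual rather than independent. The per-individual lines reproduce their two points exactly, so they cannot be used to judge the noise.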
