Variance in OLS/GLM

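For a GLM, two questions matter a great deal: how do we estimate the confidence interval of a prediction, and how do we estimate the variance of the coefficients? (The latter is often needed when a linear model is used for causal inference on its variables.)
Because a GLM supports outcome distributions over real continuous numbers as well as integers (including binary values), the variance is estimated differently in each case.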

1. Variance estimation in the linear model (OLS) [under homoscedasticity]

For details see Variance in Linear Model

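Understanding the OLS estimator [0]
- a. The MSE of a point estimator is its variance (V) plus the square of its bias: $\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^2$.
  Minimizing the MSE therefore means controlling variance and bias at the same time.
- b. OLS assumptions: $\epsilon$ is unrelated to $X$, i.e. $E[\epsilon \mid X] = 0$, and there is no autocorrelation, i.e. $\mathrm{Cov}(\epsilon_i, \epsilon_j) = 0$ for $i \neq j$.
- c. Setup:
  $y \in \mathbb{R}^{n \times 1}$, $X \in \mathbb{R}^{n \times k}$, $\beta \in \mathbb{R}^{k \times 1}$, $\epsilon \in \mathbb{R}^{n \times 1}$, with $x_i^T$ the $i$-th row of $X$.
  This gives $y_i = x_i^T \beta + \epsilon_i$, or in matrix form $y = X\beta + \epsilon$.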
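- d. Formal derivation:
  - 1. Write the residual in matrix form:
    $e = y - X\hat{\beta}$
  - 2. Write the loss in matrix form.
    Substituting gives (note that $e^T e$ is a scalar, unlike the $n \times n$ matrix $e e^T$):
    $e^T e = (y - X\hat{\beta})^T (y - X\hat{\beta}) = y^T y - \hat{\beta}^T X^T y - y^T X \hat{\beta} + \hat{\beta}^T X^T X \hat{\beta}$
    Here $y^T X \hat{\beta} = (\hat{\beta}^T X^T y)^T = \hat{\beta}^T X^T y$, because the result is a scalar and the transpose of a scalar is itself.
    So the loss function to minimize is, in matrix form: $e^T e = y^T y - 2\hat{\beta}^T X^T y + \hat{\beta}^T X^T X \hat{\beta}$
  - 3. To minimize this expression we take the partial derivative of $e^T e$ with respect to $\hat{\beta}$. (Matrix Derivatives [13])
    Jacobian: $\frac{\partial\, e^T e}{\partial \hat{\beta}} = -2 X^T y + 2 X^T X \hat{\beta}$
    Hessian: $\frac{\partial^2 e^T e}{\partial \hat{\beta}\, \partial \hat{\beta}^T} = 2 X^T X$
    The matrix of second partial derivatives of $e^T e$ with respect to $\hat{\beta}$ (the Hessian) is positive semidefinite, so the function is convex (proof in [12]). To minimize a convex function it is enough to solve Jacobian = 0.
  - 4. Setting $-2 X^T y + 2 X^T X \hat{\beta} = 0$ gives the normal equation: $X^T X \hat{\beta} = X^T y$
  - 5. This yields the closed-form solution (assuming $X$ has full column rank, so that $X^T X$ is invertible): $\hat{\beta} = (X^T X)^{-1} X^T y$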
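  - 6. From the closed-form solution we can derive the expectation of $\hat{\beta}$:
    Assume the true parameter is $\beta$, so $y = X\beta + \epsilon$.
    Then: $\hat{\beta} = (X^T X)^{-1} X^T (X\beta + \epsilon)$ [substituting $y$]
    $= \beta + (X^T X)^{-1} X^T \epsilon$
    Therefore: $E[\hat{\beta} \mid X] = \beta + (X^T X)^{-1} X^T E[\epsilon \mid X]$
    Since $E[\epsilon \mid X] = 0$,
    we get $E[\hat{\beta}] = \beta$, i.e. OLS is an unbiased estimator.
  - 7. We can likewise derive its variance (conditioning on $X$ throughout):
    $\mathrm{Var}(\hat{\beta}) = E[(\hat{\beta} - E[\hat{\beta}])^2]$
    In matrix form:
    $\mathrm{Var}(\hat{\beta}) = E[(\hat{\beta} - \beta)(\hat{\beta} - \beta)^T]$ [substituting $E[\hat{\beta}] = \beta$]
    $= E[((X^T X)^{-1} X^T \epsilon)((X^T X)^{-1} X^T \epsilon)^T]$ [substituting the expression for $\hat{\beta} - \beta$ obtained when computing $E[\hat{\beta}]$]
    $= E[(X^T X)^{-1} X^T \epsilon \epsilon^T X (X^T X)^{-1}]$ [$(X^T X)^{-1}$ is symmetric, so its transpose is itself]
    $= (X^T X)^{-1} X^T E[\epsilon \epsilon^T] X (X^T X)^{-1}$
    $= (X^T X)^{-1} X^T \Omega X (X^T X)^{-1}$ [where $\Omega = E[\epsilon \epsilon^T]$]
    Viewed as a matrix, $\Omega$ is diagonal: the off-diagonal entries $E[\epsilon_i \epsilon_j]$ are 0 because there is no autocorrelation, and the diagonal entries are $E[\epsilon_i^2] = \mathrm{Var}(\epsilon_i) + E[\epsilon_i]^2$. Since $E[\epsilon_i] = 0$ and, by homoscedasticity, $\mathrm{Var}(\epsilon_i) = \sigma^2$, we have $\Omega = \sigma^2 I$.
    Simplifying:
    $\mathrm{Var}(\hat{\beta}) = \sigma^2 (X^T X)^{-1}$
    Since $\sigma^2$ is unknown, we usually replace it with the sample variance $\hat{\sigma}^2 = \frac{e^T e}{n - k}$, where $k$ is the number of parameters.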
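The closed-form solution and its covariance can be checked numerically. The following is a minimal NumPy sketch on synthetic data; the sample size, the true coefficients and the noise level are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (all values here are illustrative assumptions).
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept + 2 features
beta_true = np.array([1.0, 2.0, -0.5])
sigma_true = 0.7
y = X @ beta_true + rng.normal(scale=sigma_true, size=n)

# Closed-form solution of the normal equation X'X b = X'y.
XtX = X.T @ X
beta_hat = np.linalg.solve(XtX, X.T @ y)

# Residuals and the estimate of sigma^2 with n - k degrees of freedom.
e = y - X @ beta_hat
sigma2_hat = (e @ e) / (n - k)

# Var(beta_hat) = sigma^2 (X'X)^{-1}, with sigma^2 replaced by its estimate.
cov_beta = sigma2_hat * np.linalg.inv(XtX)
se_beta = np.sqrt(np.diag(cov_beta))

print("beta_hat:", beta_hat)
print("std err :", se_beta)
```

Solving the normal equation with `np.linalg.solve` avoids forming the inverse for the point estimate; the inverse is only needed for the covariance matrix itself.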
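On the bias that a missing variable can introduce: Omitted Variable Bias [14]
This is similar to confounding bias (though not identical: here omitting a mediator also appears to cause bias). Write the true model as $y = X_1 \beta_1 + X_2 \beta_2 + \epsilon$ and suppose $X_2$ is left out. Regressing $y$ on $X_1$ alone gives $E[\hat{\beta}_1] = \beta_1 + (X_1^T X_1)^{-1} X_1^T X_2 \beta_2$, so the OLS estimator stays unbiased when the omitted variable satisfies $\beta_2 = 0$ or $X_1^T X_2 = 0$.
This is easy to see. Suppose $y = \beta_1 x_1 + \beta_2 x_2 + \epsilon$ and, in the true model, $x_2 = \gamma x_1 + u$ (this form looks more like a mediator). If we omit $x_2$ and run OLS of $y$ on $x_1$, we naturally end up estimating $\beta_1 + \gamma \beta_2$, so intuitively $\hat{\beta}_1$ is biased.
In the original problem $\hat{\beta}$ is unbiased, and hence $\hat{y} = X\hat{\beta}$ is unbiased as well. Whether $\hat{y}$ is biased when we regress with an omitted variable still needs a proof: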
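Does an omitted variable also bias $\hat{y}$?
Let $X_2$ be the omitted variable.
From $y = X_1 \beta_1 + X_2 \beta_2 + \epsilon$ and $\hat{\beta}_1 = (X_1^T X_1)^{-1} X_1^T y$,
we have
$\hat{y} = X_1 \hat{\beta}_1 = X_1 (X_1^T X_1)^{-1} X_1^T (X_1 \beta_1 + X_2 \beta_2 + \epsilon)$
$= X_1 \beta_1 + X_1 (X_1^T X_1)^{-1} X_1^T X_2 \beta_2 + X_1 (X_1^T X_1)^{-1} X_1^T \epsilon$ [associativity in the first term]
$E[\hat{y}] = X_1 \beta_1 + X_1 (X_1^T X_1)^{-1} X_1^T X_2 \beta_2$
This is unbiased only when $X_1 (X_1^T X_1)^{-1} X_1^T X_2 \beta_2 = X_2 \beta_2$, i.e. when $X_2 \beta_2$ lies in the column space of $X_1$. In general $E[\hat{y}] \neq X_1 \beta_1 + X_2 \beta_2$, so $\hat{y}$ is still biased. Therefore, with an omitted variable, $\hat{\beta}_1$ is biased and cannot be used for causal inference, and $\hat{y}$ is also biased (as an estimate of the mean under the true model), so it cannot be relied on for prediction either.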
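Another way to see why prediction is also affected when a variable is omitted:
an omitted variable can induce heteroscedasticity [15].
The form of the OLS solution does not actually require homoscedasticity. Under heteroscedasticity, however, the Gauss-Markov assumptions fail, so the usual variance estimate $\hat{\sigma}^2 (X^T X)^{-1}$ is no longer an unbiased estimate of $\mathrm{Var}(\hat{\beta})$ [16]. We can still fit by OLS (the derivation of the solution never uses the Gauss-Markov assumptions to simplify anything), but only when the Gauss-Markov assumptions hold is the OLS estimator BLUE (Best Linear Unbiased Estimator), where "best" means it has the smallest variance.
For the direction of the bias under an omitted variable (positive or negative), see Omitted Variable Bias: The Simple Case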
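A small simulation makes the omitted-variable effect concrete. This is a sketch under assumed values: the coefficients `beta1`, `beta2` and the dependence `gamma` between the two regressors are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical true model: y = beta1*x1 + beta2*x2 + noise, with x2 depending on x1.
beta1, beta2, gamma = 1.0, 2.0, 0.5
x1 = rng.normal(size=n)
x2 = gamma * x1 + rng.normal(size=n)
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

def ols(X, y):
    """Closed-form OLS coefficients (X'X)^{-1} X'y."""
    return np.linalg.solve(X.T @ X, X.T @ y)

b_full = ols(np.column_stack([x1, x2]), y)   # both regressors included
b_short = ols(x1[:, None], y)                # x2 omitted

# Bias term (X1'X1)^{-1} X1'X2 beta2, evaluated on this sample.
predicted_bias = (x1 @ x2) / (x1 @ x1) * beta2

print("full model  beta1_hat :", b_full[0])
print("short model beta1_hat :", b_short[0])
print("beta1 + predicted bias:", beta1 + predicted_bias)
```

With these assumed values the short regression should land near `beta1 + gamma * beta2` rather than `beta1`, while the full regression recovers `beta1`.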
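Variance computation
- a. Error term:
  $\hat{\sigma}^2 = \frac{e^T e}{n - k}$
  This is usually written $s^2$.
  By homoscedasticity, the error of every point has the same variance $\sigma^2$. The true variance $\sigma^2$ is usually hard to compute, so its estimate $\hat{\sigma}^2$ is used instead; the formulas below also use $\hat{\sigma}^2$.
- b. Parameter term:
  $\widehat{\mathrm{Var}}(\hat{\beta}) = \hat{\sigma}^2 (X^T X)^{-1}$
  Note that $(X^T X)^{-1}$ is a $k \times k$ matrix and $\hat{\sigma}^2$ is a scalar, so the result is a $k \times k$ matrix: the covariance matrix of the $k$-dimensional coefficient vector. The $i$-th diagonal element is the variance of $\hat{\beta}_i$.
- c. Estimate term:
  $\mathrm{Var}(\hat{y}_0) = \mathrm{Var}(x_0^T \hat{\beta})$
  $= x_0^T \mathrm{Var}(\hat{\beta})\, x_0$ [property of variance]
  $= \hat{\sigma}^2\, x_0^T (X^T X)^{-1} x_0$ [substituting $\widehat{\mathrm{Var}}(\hat{\beta})$]
  Note that $x_0$ is the feature vector of one particular sample.
- d. Prediction Interval: [6]
  TODO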
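The error, parameter and estimate variances can be computed in a few lines. The sketch below uses synthetic data with assumed coefficients. The last step, the interval for a new observation, is not derived in the text: it is the standard formula that adds the error variance to the variance of the fitted value, included here as an assumption. The normal quantile stands in for the t quantile, which is reasonable only for large $n - k$.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(2)
n, k = 200, 2
X = np.column_stack([np.ones(n), rng.uniform(0, 10, size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(scale=1.0, size=n)  # assumed true model

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat
sigma2_hat = (e @ e) / (n - k)        # a: error variance estimate
cov_beta = sigma2_hat * XtX_inv       # b: coefficient covariance matrix

x0 = np.array([1.0, 4.0])             # one sample point (intercept, x)
y0_hat = x0 @ beta_hat
var_fit = x0 @ cov_beta @ x0          # c: Var(x0' beta_hat)
var_pred = var_fit + sigma2_hat       # assumption: adds the noise of a new observation

z = NormalDist().inv_cdf(0.975)       # normal approximation to the t quantile
ci_mean = (y0_hat - z * np.sqrt(var_fit), y0_hat + z * np.sqrt(var_fit))
pi_new = (y0_hat - z * np.sqrt(var_pred), y0_hat + z * np.sqrt(var_pred))

print("95% CI for the mean at x0  :", ci_mean)
print("95% interval for a new y0  :", pi_new)
```

The interval for a new observation is always wider than the one for the mean, because it carries the irreducible noise term in addition to the estimation uncertainty.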

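2. OLS: variance estimation under heteroscedasticity

Difference from the homoscedastic case:
Heteroscedasticity.
Here we assume $\mathrm{Var}(\epsilon_i) = \sigma_i^2$, which may differ across observations. This is where it departs from the OLS assumptions above.

Common modeling choices: [10]
Normal, Exponential, Inverse Gaussian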
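Estimator
- a. Weighted Least Squares requires a fairly explicit model of $\sigma_i^2$ (which is then fed to the fit as weights). In practice we need to find a variable that is proportional to the variance. The heteroscedasticity problem is solved only if this variable models the variance reasonably well. [16]
- b. White Estimator
  Treat the problem as a nuisance and fix it by correcting the variance of the estimator, rather than by modeling the variance itself. [16]
  See Heteroscedasticity-consistent standard errors [17]
  Under heteroscedasticity $\hat{\beta}$ is still an unbiased estimator, but it is no longer BLUE, i.e. its variance is no longer the smallest. Moreover, in $\mathrm{Var}(\hat{\beta}) = (X^T X)^{-1} X^T E[\epsilon \epsilon^T] X (X^T X)^{-1} = \sigma^2 (X^T X)^{-1}$ [the first equality holds when $E[\epsilon \mid X] = 0$, the second only under homoscedasticity], so the variance estimate from section 1 does not hold.
  Here we assume the $\epsilon_i$ come from different distributions but are mutually independent, i.e. there is no autocorrelation, so we define:
  $\Omega = E[\epsilon \epsilon^T] = \mathrm{diag}(\sigma_1^2, \dots, \sigma_n^2)$
  Hence:
  $\mathrm{Var}(\hat{\beta}) = (X^T X)^{-1} X^T \Omega X (X^T X)^{-1}$
  [this step is the same as in the earlier derivation]
  However, we usually cannot obtain $\sigma_i^2$ accurately, so we estimate it in a purely empirical way: $\hat{\sigma}_i^2 = e_i^2$ [$e_i$ is the residual of observation $i$ after the OLS fit]
  Therefore:
  $\hat{\Omega} = \mathrm{diag}(e_1^2, \dots, e_n^2)$
  Substituting gives the variance: $\widehat{\mathrm{Var}}(\hat{\beta}) = (X^T X)^{-1} X^T \hat{\Omega} X (X^T X)^{-1}$
- c. Compared with the purely empirical estimate above, we can also add assumptions, for example that some subset of observations shares the same variance. This is group (cluster) variance.
Variance computation:
See the derivation above.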
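The White estimator is short to write down in code. Below is a sketch of the simplest variant (often called HC0, with no small-sample correction) on synthetic data whose noise standard deviation grows with `x`; the data-generating choices are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
# Heteroscedastic noise: the standard deviation grows with x (illustrative choice).
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n) * x**2

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
e = y - X @ beta_hat

# Classical estimate, valid only under homoscedasticity.
cov_classic = (e @ e) / (n - X.shape[1]) * XtX_inv

# White (HC0) estimate: Omega_hat = diag(e_i^2) plugged into the sandwich formula.
meat = X.T @ (e[:, None] ** 2 * X)   # X' diag(e^2) X without building the n x n matrix
cov_white = XtX_inv @ meat @ XtX_inv

print("classical SE:", np.sqrt(np.diag(cov_classic)))
print("White SE    :", np.sqrt(np.diag(cov_white)))
```

Scaling the rows of `X` by `e**2` gives the same result as forming `np.diag(e**2)` explicitly, but needs O(nk) memory instead of O(n^2).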

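3. Variance estimation in LR (GLM) [1]

The concept of deviance:
When fitting a GLM we use the deviance rather than the MSE. [3]
The deviance generalizes the RSS (residual sum of squares) of OLS to GLMs.
The unit deviance $d(y, \mu)$ satisfies:
$d(y, y) = 0$ and $d(y, \mu) > 0$ for $y \neq \mu$, with total deviance $D(y, \hat{\mu}) = \sum_i d(y_i, \hat{\mu}_i)$.
Building the deviance from the likelihood:
$D(y, \hat{\mu}) = 2\left(\log p(y \mid \hat{\theta}_s) - \log p(y \mid \hat{\theta}_0)\right)$
$\hat{\theta}_s$ are the parameters of the saturated model (one parameter per sample), and $\hat{\theta}_0$ are the parameters estimated by the fitted model.
- a. For the normal distribution the usual choice is
  $d(y, \mu) = (y - \mu)^2$, which is just the squared error behind the MSE.
- b. For the Bernoulli distribution the usual choice is
  $d(y, \mu) = -2\left(y \log \mu + (1 - y) \log(1 - \mu)\right)$
  where $\mu = P(y = 1 \mid x)$ [for logistic regression, $\mu = 1 / (1 + e^{-x^T \beta})$]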
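Both deviances are easy to compute directly. The sketch below assumes 0/1 outcomes for the Bernoulli case (so the saturated log-likelihood is 0) and unit variance for the normal case; the function names and the toy data are hypothetical.

```python
import numpy as np

def bernoulli_deviance(y, mu, eps=1e-12):
    """Total deviance 2 * (loglik_saturated - loglik_model) for 0/1 outcomes."""
    mu = np.clip(mu, eps, 1 - eps)  # guard against log(0)
    # For y in {0, 1} the saturated log-likelihood is 0, so only the model term remains.
    loglik = np.sum(y * np.log(mu) + (1 - y) * np.log(1 - mu))
    return -2.0 * loglik

def normal_deviance(y, mu):
    """Total deviance for the normal case with unit variance: the residual sum of squares."""
    return np.sum((y - mu) ** 2)

y = np.array([1, 0, 1, 1, 0])
mu = np.array([0.9, 0.2, 0.6, 0.7, 0.1])

print("Bernoulli deviance:", bernoulli_deviance(y, mu))
print("perfect fit       :", bernoulli_deviance(y, y.astype(float)))
print("normal deviance   :", normal_deviance(y, mu))
```

A perfect fit drives the Bernoulli deviance to (numerically) zero, mirroring how the RSS is zero when the fitted values equal the observations.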
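Sources of difference from OLS:
The assumption differs: $E[y \mid X] = \mu = g^{-1}(X\beta)$, where $g$ is the link function. [1]
1. Heteroscedasticity:
For example, the logistic model is a GLM. Since $y \sim \mathrm{Bernoulli}(p)$, it ties the variance to the mean by construction: $E[y] = p$ and $\mathrm{Var}(y) = p(1 - p)$. No such relationship exists in OLS, and it makes the model heteroscedastic by nature.
2. Because of the link function, a GLM generally has no analytical solution. [11]
3. For the same reason, deriving the variance is also trickier.

Common probability models: [10]
Logit, Probit, cloglog, Poisson [1]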
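Variance computation:
- a. Variance of the parameters: [18]
  Assume the data follow a probability distribution with probability density function (PDF) $p(x; \theta)$.
  $x_1, \dots, x_n$ are iid samples, with likelihood function $L(\theta) = \prod_{i=1}^{n} p(x_i; \theta)$.
  - aa. Score function: the first derivative of the log-likelihood
    $s(\theta) = \nabla_\theta \log L(\theta) = \sum_{i=1}^{n} \nabla_\theta \log p(x_i; \theta)$
    Property: its expectation is 0, $E[s(\theta)] = 0$:
    $E[s(\theta)] = \int s(\theta)\, p(x; \theta)\, dx$ [expectation as an integral against the density]
    $= \int \frac{\nabla_\theta p(x; \theta)}{p(x; \theta)}\, p(x; \theta)\, dx$ [assume sample size = 1 and substitute the expression above]
    $= \nabla_\theta \int p(x; \theta)\, dx$ [the integration variable $x$ does not depend on $\theta$, so the order can be exchanged; Leibniz integral rule]
    $= \nabla_\theta 1 = 0$ [the PDF integrates to the constant 1]
  - ab. Fisher Information Matrix:
    $I(\theta) = \mathrm{Var}(s(\theta))$
    $= E[s(\theta)\, s(\theta)^T]$ [the expectation is 0, so the second moment equals the variance]
    $= E[\nabla_\theta \log p(x; \theta)\, \nabla_\theta \log p(x; \theta)^T]$ [assume sample size = 1 and substitute]
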
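    TODO:
    For the log-likelihood loss it is easy to show that the Fisher information equals the Hessian [20]:
    Expected Fisher Information:
    $I(\theta) = -E[\nabla_\theta^2 \log p(x; \theta)]$
    Observed Fisher Information (sometimes loosely called the empirical Fisher information):
    $J(\hat{\theta}) = -\sum_{i=1}^{n} \nabla_\theta^2 \log p(x_i; \theta)\,\big|_{\theta = \hat{\theta}}$
    In matrix form it can be derived from the Hessian of the log-likelihood loss. [20]
    That is: $J(\hat{\theta}) = H(\hat{\theta})$
    [Note: since we normally minimize the negative log-likelihood, the minus sign is already included in the Hessian $H$]
  - ab2. Hessian in LR:
    TODO. The matrix derivation gives:
    $H = X^T W X$
    where $W = \mathrm{diag}(\hat{p}_i (1 - \hat{p}_i))$, with $\hat{p}_i = 1 / (1 + e^{-x_i^T \hat{\beta}})$, is the diagonal matrix obtained after plugging in $\hat{\beta}$.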
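  - ac. Cramér-Rao bound: [19]
    The Cramér-Rao bound gives a lower bound on the variance of an unbiased estimator: $\mathrm{Var}(\hat{\theta}) \geq I(\theta)^{-1}$
    *Note: this is a lower bound, so "An unbiased estimator which achieves this lower bound is said to be (fully) efficient".
    That is, for the maximum likelihood estimator (which attains the bound asymptotically): $\mathrm{Var}(\hat{\theta}) \approx I(\hat{\theta})^{-1}$
    Note: the variance of the OLS parameters can be derived in the same way, and it is again $\sigma^2 (X^T X)^{-1}$.
  - ad. Final form of the variance: [21]
    So, for Logistic Regression:
    $\mathrm{Var}(\hat{\beta}) = (X^T W X)^{-1}$
    where $W = \mathrm{diag}(\hat{p}_i (1 - \hat{p}_i))$.
    Because $\hat{\beta}$ has no closed form and the expression involves a matrix inverse, there is generally no analytical form either; it is obtained numerically.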
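- b. Variance of the predicted value:
  When the dependent variable is categorical (the outcome Y is a class label), there are four ways to compute its confidence interval.
  - ba. Preliminary: Maximum Likelihood (not usable for probability estimates, since a symmetric interval around a probability can fall outside [0, 1]) [5]
    Usable in the linear model:
    $\hat{y}_0 \pm z_{\alpha/2} \sqrt{x_0^T \hat{V} x_0}$
    where $x_0$ is the sample point and $\hat{V}$ is the covariance matrix of the regression coefficients, i.e. $\mathrm{Var}(x_0^T \hat{\beta}) = x_0^T \hat{V} x_0$ is the variance of the predicted value at $x_0$. For the computation in practice see [4].
  - bb. Endpoint Transformation [8]
    Use Maximum Likelihood to estimate the variance of the linear term, $\mathrm{Var}(x_0^T \hat{\beta}) = x_0^T \hat{V} x_0$, then obtain the confidence interval of the linear term, $[\eta_L, \eta_U] = x_0^T \hat{\beta} \pm z_{\alpha/2} \sqrt{x_0^T \hat{V} x_0}$, and finally map it into probability space [4]. The mapping $F$ only needs to be monotonic, giving $[F(\eta_L), F(\eta_U)]$; for example the logistic function $F(\eta) = 1 / (1 + e^{-\eta})$.
    Note: an interval computed this way never leaves the valid range, but it requires that the outcome of interest is monotonic in the linear combination.
  - bc. Delta method
    TODO.
  - bd. Bootstrap method
    Resample the data many times, refit the model each time, and re-estimate the quantity for the sample point each time; the spread of these repeated estimates simulates the variability caused by sampling from the population. The drawback is that it is very time-consuming.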
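The parameter covariance $(X^T W X)^{-1}$ and the endpoint transformation can be sketched together. The code below fits a logistic regression by Newton-Raphson on synthetic data, inverts the Hessian at the solution, and maps a confidence interval on the linear predictor through the logistic function. The function names, data and true coefficients are hypothetical, and there is no step-size control, so this is a sketch for well-behaved data rather than a robust fitter.

```python
import numpy as np
from statistics import NormalDist

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, n_iter=50, tol=1e-10):
    """Newton-Raphson for logistic regression; returns (beta_hat, covariance)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = sigmoid(X @ beta)
        w = p * (1 - p)                          # diagonal of W
        H = X.T @ (w[:, None] * X)               # Hessian of the negative log-likelihood
        step = np.linalg.solve(H, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = sigmoid(X @ beta)
    H = X.T @ ((p * (1 - p))[:, None] * X)
    return beta, np.linalg.inv(H)                # Var(beta_hat) ~ (X'WX)^{-1}

rng = np.random.default_rng(4)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])                # assumed true coefficients
y = (rng.uniform(size=n) < sigmoid(X @ beta_true)).astype(float)

beta_hat, cov_beta = fit_logistic(X, y)

# Endpoint transformation: CI on the linear predictor, then map through the sigmoid.
x0 = np.array([1.0, 0.3])
eta = x0 @ beta_hat
se_eta = np.sqrt(x0 @ cov_beta @ x0)
z = NormalDist().inv_cdf(0.975)
ci_prob = (sigmoid(eta - z * se_eta), sigmoid(eta + z * se_eta))

print("beta_hat:", beta_hat)
print("std err :", np.sqrt(np.diag(cov_beta)))
print("95% CI for P(y=1 | x0):", ci_prob)
```

Because the sigmoid is monotonic, the transformed endpoints always stay inside (0, 1), which is the property noted for the endpoint transformation; the resulting interval is generally not symmetric around the point estimate.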

Applications

1. When computing a propensity score, how do we assess the impact of our model's variance?
Does it need to be unbiased?
Why use semi-parametric methods?

2. Model computation

References

[0]MSE
https://study.com/academy/lesson/properties-of-point-estimators.html

[1]GLM
Motivation: https://stats.stackexchange.com/questions/402584/why-does-logistic-regression-not-have-variance-but-have-deviance

Where GLM differs from OLS (Modeling probabilities): https://web.stanford.edu/class/stats191/notebooks/Logistic.html
Common modeling choices (see the end of that page): Logit, Probit, cloglog

Common link functions:
https://en.wikipedia.org/wiki/Generalized_linear_model#Link_function

[2]Confidence Interval of Coefficient
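The CI of a parameter matters a lot. For example, when estimating a causal effect, the conclusion is drawn from the coefficient of the treatment variable, so knowing the CI of that coefficient is important.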
https://stats.stackexchange.com/questions/354098/calculating-confidence-intervals-for-a-logistic-regression

[3]Deviance:
https://en.wikipedia.org/wiki/Deviance_(statistics)

[4]Confidence Interval for Binary Classifier (such as Logistic Regression), in Practice
Endpoint Transformation & Delta Method:
https://stats.stackexchange.com/questions/163824/different-ways-to-produce-a-confidence-interval-for-odds-ratio-from-logistic-reg
See also:
Confidence intervals for predicted outcomes in regression models for categorical outcomes
See also:
Confidence Intervals for the Odds Ratio in Logistic Regression with One Binary X

[5]
For the assumptions of the linear model, variable naming, and derivations, see: Applied Linear Models

[6]Prediction Interval
http://web.vu.lt/mif/a.buteikis/wp-content/uploads/PE_Book/3-7-UnivarPredict.html

[7]Confidence Interval
http://web.vu.lt/mif/a.buteikis/wp-content/uploads/PE_Book/3-5-UnivarConfInt.html

[8]
Section 7.1: Endpoint Transformation
Confidence intervals for predicted outcomes in regression models for categorical outcomes

[10]
GLM,Link Function
https://en.wikipedia.org/wiki/Generalized_linear_model

[11]LR has no Analytical (closed-form) Solution
https://stats.stackexchange.com/questions/455698/why-does-logistic-regressions-likelihood-function-have-no-closed-form

[12]Proof
The Hessian matrix is positive semidefinite

1. $X$ is a full-rank matrix. See: https://stats.stackexchange.com/questions/174775/full-rank-assumption-in-the-linear-regression-model-explanation
2. $X^T X$ is positive definite. See: Econometrics (Greene)
Chapter 3 Least square, Page 21

[13] Matrix Derivative
OLS in Matrix Form: bottom of page 2

[14]
OLS in Matrix Form：Omitted Variable Bi

[15]
Omitted Variable bias：
https://statisticsbyjim.com/regression/confounding-variables-bias/#:~:text=Omitted%20variable%20bias%20occurs%20when,which%20biases%20the%20coefficient%20estimates.

[16]
In OLS in Matrix Form
For the Gauss-Markov assumptions see these sections:
4. The Gauss-Markov Assumptions
5. The Gauss-Markov Theorem
For testing homoscedasticity (and remedies under heteroscedasticity) see this section:
6. Robust (Huber of White) Standard Errors

[17]
White Estimator:
https://en.wikipedia.org/wiki/Heteroscedasticity-consistent_standard_errors

[18]
Covariance matrix of the parameters in LR:
David W. Hosmer Applied Logistic Regression
P35

[19]
The meaning of Fisher Information
https://www.zhihu.com/question/26561604

[20]
Fisher Information
score function and I() proof:
https://en.wikipedia.org/wiki/Fisher_information

[21]
Here the variance is the inverse of the Hessian matrix.
Lecture 26 — Logistic regression
