In ordinary least-squares, the residual associated with the i i i-th observation. The term residual comes from the residual sum of squares (RSS), which is defined as
r i = y i − f ( x i ) R S S = ∑ i = 1 n r i 2 r_i = y_i - f(x_i)\\ RSS=\sum_{i=1}^n r_i^2 ri=yi−f(xi)RSS=i=1∑nri2
where the residual r i r_i ri is defined as the difference between predicted values f ( x i ) f(x_i) f(xi) from the observed value y i y_i yi.
For GLMs, there are several ways for specifying residuals. We will cover four types of residuals: response residuals, working residuals, Pearson residuals, and, deviance residuals.
For type = "response"
, the conventional residual on the response level is computed, that is,
r i = y i − f ^ ( x i ) r_i=y_i−\hat f(x_i) ri=yi−f^(xi)
# type = "response"
res.response1 <- residuals(model.pois, type = "response")
res.response2 <- expected - estimates
all.equal(res.response1, res.response2)
## [1] TRUE
For type = "working"
, the residuals are normalized by the estimates f ^ ( x i ) \hat f(x_i) f^(xi):
r i = y i − f ^ ( x i ) f ^ ( x i ) r_i=\frac{y_i−\hat f(x_i)}{\hat f(x_i)} ri=f^(xi)yi−f^(xi)
# type = "working"
res.working1 <- residuals(model.pois, type="working")
res.working2 <- (expected - estimates) / estimates
all.equal(res.working1, res.working2)
## [1] TRUE
For type = "pearson"
, the Pearson residuals are computed. They are obtained by normalizing the residuals by the square root of the estimate:
r i = y i − f ^ ( x i ) f ( x i ) r_i=\frac{y_i−\hat f(x_i)}{\sqrt{f(x_i)}} ri=f(xi)yi−f^(xi)
# type = "pearson"
res.pearson1 <- residuals(model.pois, type="pearson")
res.pearson2 <- (expected - estimates) / sqrt(estimates)
all.equal(res.pearson1, res.pearson2)
## [1] TRUE
Deviance residuals are defined by the deviance.
The deviance of a model is given by
D ( y , μ ^ ) = 2 ( log ( p ( y ∣ θ ^ s ) ) − log ( p ( y ∣ θ ^ 0 ) ) ) D(y,\hat μ)=2(\log (p(y∣\hat θ_s))−\log (p(y∣\hat θ_0))) D(y,μ^)=2(log(p(y∣θ^s))−log(p(y∣θ^0)))
where
The deviance indicates the extent to which the likelihood of the saturated model exceeds the likelihood of the proposed model. If the proposed model has a good fit, the deviance will be small. If the proposed model has a bad fit, the deviance will be high. For example, for the Poisson model, the deviance is
D = 2 ⋅ ∑ i = 1 n [ y i ⋅ log ( y i μ ^ i ) − ( y i − μ ^ i ) ] D=2 \cdot \sum_{i=1}^n \left[ y_i \cdot \log (\frac{y_i}{\hat μ_i})−(y_i−\hat μ_i) \right] D=2⋅i=1∑n[yi⋅log(μ^iyi)−(yi−μ^i)]
具体推导过程参考: Poission regression deviance 的推导
首先假设 x x x只有一个位置参数, Y Y Y 服从Poission distribution,并且 Y Y Y 的均值是一个关于 x x x 的函数,即
E ( Y ∣ x ) = λ ( x ) \mathbb E(Y|x) = \lambda (x) E(Y∣x)=λ(x)
对于Poission regression用 log \log log 作为link function,即 log ( λ ( x ) ) = β 0 + β 1 x \log(\lambda(x))=\beta_0 + \beta_1 x log(λ(x))=β0+β1x
那么Likelihood function就是 L ( β 0 , β 1 ; y i ) = ∏ i = 1 n e − λ ( x i ) [ λ ( x i ) y i ] y i ! = ∏ i = 1 n e e − β 0 + β 1 x i [ e β 0 + β 1 x i ] y i ] y i ! L(\beta_0, \beta_1; y_i) = \prod \limits_{i=1}^n \frac{e^{-\lambda(x_i)}[\lambda(x_i)^{y_i}]}{y_i !} = \prod \limits_{i=1}^n \frac{e^{e^{-\beta_0 + \beta_1 x_i}}[e^{\beta_0 + \beta_1 x_i}]^{y_i}]}{y_i !} L(β0,β1;yi)=i=1∏nyi!e−λ(xi)[λ(xi)yi]=i=1∏nyi!ee−β0+β1xi[eβ0+β1xi]yi]
Log likelihood function就是:
l ( β 0 , β 1 ; y i ) = ∑ i = 1 n y i ( β 0 + β 1 x i ) − ∑ i = 1 n e β 0 + β 1 x i − ∑ i = 1 n log ( y i ! ) ( 1 ) l(\beta_0, \beta_1; y_i) = \sum \limits_{i=1}^n y_i(\beta_0 + \beta_1 x_i) - \sum \limits_{i=1}^n e^{\beta_0 + \beta_1 x_i} - \sum \limits_{i=1}^n \log (y_i !) \quad \quad (1) l(β0,β1;yi)=i=1∑nyi(β0+β1xi)−i=1∑neβ0+β1xi−i=1∑nlog(yi!)(1)
写成用 μ ^ i \hat \mu_i μ^i表示均值预测值的形式,即
l ( μ ^ ) = ∑ i = 1 n y i log ( μ ^ i ) − ∑ i = 1 n μ ^ i − ∑ i = 1 n log ( y i ! ) ( 1 ′ ) l(\hat \mu) = \sum \limits_{i=1}^n y_i \log(\hat \mu_i) - \sum \limits_{i=1}^n {\hat \mu_i} - \sum \limits_{i=1}^n \log (y_i !) \quad \quad (1') l(μ^)=i=1∑nyilog(μ^i)−i=1∑nμ^i−i=1∑nlog(yi!)(1′)
那么,saturated model 的 log likelihood function 即为:
l ( μ ) = ∑ i = 1 n y i log ( μ i ) − ∑ i = 1 n μ i − ∑ i = 1 n log ( y i ! ) ( 2 ) l(\mu) = \sum \limits_{i=1}^n y_i \log(\mu_i) - \sum \limits_{i=1}^n {\mu_i} - \sum \limits_{i=1}^n \log (y_i !) \quad \quad (2) l(μ)=i=1∑nyilog(μi)−i=1∑nμi−i=1∑nlog(yi!)(2)
欲求(2)式中 μ i \mu_i μi的最优估计,即对(2)求偏导数,并令其等于0,即:
∂ ∂ μ i l ( μ ) = y i μ i − 1 = 0 \frac{\partial}{\partial \mu_i}l(\mu) = \frac{y_i}{\mu_i} - 1=0 ∂μi∂l(μ)=μiyi−1=0
解得,’‘saturated model’’ 的参数估计值为:
μ ^ i = y i \hat \mu_i = y_i μ^i=yi
将上面的结果带入到(2)式中,即得“saturated model”的likelihood function:
l ( μ ^ ) = ∑ i = 1 n y i log ( y i ) − ∑ i = 1 n y i − ∑ i = 1 n log ( y i ! ) ( 3 ) l(\hat \mu) = \sum \limits_{i=1}^n y_i \log(y_i) - \sum \limits_{i=1}^n {y_i} - \sum \limits_{i=1}^n \log (y_i !) \quad \quad (3) l(μ^)=i=1∑nyilog(yi)−i=1∑nyi−i=1∑nlog(yi!)(3)
注意:这仅仅对于分类模型,有时也用 (D)表示Deviance, (D∗)表示Scaled Deviance,即D除以上一个尺度参数。
即
D ∗ = D ⋅ ϕ D^{*}=D \cdot \phi D∗=D⋅ϕ
例如,Poission分布的尺度参数就为1,Nomal distribution的尺度参数为 σ 2 \sigma^2 σ2.
所以,Nomal distribution的Deviance推出来是
D ( y , μ ) = ( y − μ ) 2 D(y,\mu)=(y−\mu)^2 D(y,μ)=(y−μ)2
Scaled Deviance是
D ∗ ( y , μ ) = ( y − μ ) 2 σ 2 D^{*}(y,\mu)=\frac{(y−\mu)^2}{\sigma^2} D∗(y,μ)=σ2(y−μ)2
详情https://stats.stackexchange.com/questions/315788/deviance-in-simple-linear-regression
In R, the deviance residuals represent the contributions of individual samples to the deviance D. More specifically, they are defined as the signed square roots of the unit deviances.
How does such a deviance look like in practice? For example, for the Poisson distribution, the deviance residuals are defined as:(这里我个人有个小疑问暂时没解决,根号下的数不一定是正的呀)
r i = s i g n ( y − μ ^ i ) ⋅ 2 ⋅ [ y i ⋅ log ( y i μ ^ i ) − ( y i − μ ^ i ) ] r_i=sign (y−\hat μ_i) \cdot \sqrt{2 \cdot \left[ y_i⋅\log \left( \frac{y_i} {\hat μ_i} \right)−(y_i−\hat μ_i) \right]} ri=sign(y−μ^i)⋅2⋅[yi⋅log(μ^iyi)−(yi−μ^i)]
# type = "deviance"
res.dev1 <- residuals(model.pois, type = "deviance")
res.dev2 <- residuals(model.pois)
poisson.dev <- function (y, mu)
# unit deviance
2 * (y * log(ifelse(y == 0, 1, y/mu)) - (y - mu))
res.dev3 <- sqrt(poisson.dev(expected, estimates)) *
ifelse(expected > estimates, 1, -1)
all.equal(res.dev1, res.dev2, res.dev3)
## [1] TRUE
We can obtain the deviance residuals of our model using the residuals
function:
summary(residuals(model.pois))
如果有以下输出
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.0451 -2.4138 -0.8169 -0.1840 1.4265 8.7946
Since the median deviance residual is close to zero, this means that our model is not biased in one direction (i.e. the out come is neither over- nor underestimated).
Tips:
Note that, for ordinary least-squares models, the deviance residual is identical to the conventional residual.
推导见:
https://bookdown.org/egarpor/PM-UC3M/glm-deviance.html#eq:dev
或者
https://stats.stackexchange.com/questions/315788/deviance-in-simple-linear-regression
名称 | Response residuals(conventional residuals) | Working residuals | Pearson residuals | Deviance residuals |
---|---|---|---|---|
公式 | r i = y i − f ^ ( x i ) r_i=y_i−\hat f(x_i) ri=yi−f^(xi) | r i = y i − f ^ ( x i ) f ^ ( x i ) r_i=\frac{y_i−\hat f(x_i)}{\hat f(x_i)} ri=f^(xi)yi−f^(xi) | r i = y i − f ^ ( x i ) f ( x i ) r_i=\frac{y_i−\hat f(x_i)}{\sqrt{f(x_i)}} ri=f(xi)yi−f^(xi) | r i = s i g n ( y − μ ^ i ) ⋅ 2 ⋅ [ y i ⋅ log ( y i μ ^ i ) − ( y i − μ ^ i ) ] r_i=sign (y−\hat μ_i) \cdot \sqrt{2 \cdot \left[ y_i⋅\log \left( \frac{y_i} {\hat μ_i} \right)−(y_i−\hat μ_i) \right]} ri=sign(y−μ^i)⋅2⋅[yi⋅log(μ^iyi)−(yi−μ^i)] |
意义 | 常规 | Normalize by the estimates | Normalize the residuals by the square root of the estimate | represent the contributions of individual samples to the deviance D D D |
这部分参考了视频:https://www.youtube.com/watch?v=9T0wlKdew6I&t=0s
我们假设只有均值这一种未知参数(即方差已知)。
如图,左上角的就是Null model,即这些数据只用一个参数去刻画;右上角的就是Saturated Model,即每个数据都有一个参数刻画它,模型很饱和;下方的是Proposed Model,即我们使用的模型,需要去评估的模型,它可以是任何介于Null Model和Saturated Model之间的模型。
Since we have already introduced the deviance, understanding the null and residual deviance is not a challenge anymore. Let us repeat the definition of the deviance once again:
D ( y , μ ) = 2 ( log ( p ( y ∣ θ s ) ) − log ( p ( y ∣ θ 0 ) ) ) D(y,\mu)=2(\log(p(y∣\theta^s))−\log(p(y∣\theta^0))) D(y,μ)=2(log(p(y∣θs))−log(p(y∣θ0)))
The null and residual deviance differ in θ 0 θ_0 θ0:
How can we interpret these two quantities?
名称 | Deviance Residuals | Residual Deviance |
---|---|---|
公式 | r i = s i g n ( y − μ ^ i ) ⋅ 2 ⋅ [ y i ⋅ log ( y i μ ^ i ) − ( y i − μ ^ i ) ] r_i=sign (y−\hat μ_i) \cdot \sqrt{2 \cdot \left[ y_i⋅\log \left( \frac{y_i} {\hat μ_i} \right)−(y_i−\hat μ_i) \right]} ri=sign(y−μ^i)⋅2⋅[yi⋅log(μ^iyi)−(yi−μ^i)] | R e s i d u a l D e v i a n c e = ∑ [ ( D e v i a n c e R e s i d u a l ) i ] 2 Residual Deviance =\sum[(Deviance \, \, Residual)_i]^2 ResidualDeviance=∑[(DevianceResidual)i]2 D ( y , μ ^ ) = 2 ( log ( p ( y ∣ θ ^ s ) ) − log ( p ( y ∣ θ ^ 0 ) ) ) D(y,\hat μ)=2(\log (p(y∣\hat θ_s))−\log (p(y∣\hat θ_0))) D(y,μ^)=2(log(p(y∣θ^s))−log(p(y∣θ^0))) |
区别 | Deviance Residuals是Residual的一种,重点关注每个样本点,区别于上文提到的另外三种Residual | Residual Deviance是Deviance的一种,重点关注模型整体,是‘saturated model’与‘proposed model’的log likelihood差的二倍,区别于Null Deviance( ‘saturated model’ 与 ‘null model’ 的log likelihood差的二倍) |
Here, I deal with the other outputs of the GLM summary fuction: the dispersion parameter, the AIC, and the statement about Fisher scoring iterations.
Dispersion (variability/scatter/spread) simply indicates whether a distribution is wide or narrow. The GLM function can use a dispersion parameter to model the variability.
However, for likelihood-based model, the dispersion parameter is always fixed to 1. It is adjusted only for methods that are based on quasi-likelihood estimation such as when family = "quasipoisson"
or family = "quasibinomial"
. These methods are particularly suited for dealing with overdispersion.
The Akaike information criterion (AIC) is an information-theoretic measure that describes the quality of a model. It is defined as
A I C = 2 p − 2 ln ( L ^ ) AIC=2p−2 \ln (\hat L) AIC=2p−2ln(L^)
where p p p is the number of model parameters and L ^ \hat L L^ is the maximum of the likelihood function. A model with a low AIC is characterized by low complexity (minimizes p p p) and a good fit (maximizes L ^ \hat L L^).
主要参考:Interpreting Generalized Linear Models
次要参考:Notes for Predictive Modeling 5.5
youtube视频:Deviance Residuals
BOOK:Beyond Multiple Linear Regression——Applied Generalized Linear Models and Multilevel Models in R 第四章
Poission regression deviance 的推导
wikipedia:Deviance
https://stats.stackexchange.com/questions/5135/interpretation-of-rs-lm-output
https://stats.stackexchange.com/questions/315788/deviance-in-simple-linear-regression
https://stats.stackexchange.com/questions/108995/interpreting-residual-and-null-deviance-in-glm-r
Data science in insurance: an R intro