A Simple Example of an Additive Model

Additive Models

To avoid the curse of dimensionality and to gain interpretability, we assume
$$m(\boldsymbol{x}) = E(Y \mid \boldsymbol{X} = \boldsymbol{x}) = c + \sum_{j=1}^{d} g_j(x_j)$$
$\Longrightarrow$ the additive component functions $g_j$ can be estimated at the optimal one-dimensional rate
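As a quick, self-contained illustration (my own sketch, not part of the original notes), an additive model of exactly this form can be fitted with penalized splines via the mgcv package; the simulated data below are an arbitrary assumption for demonstration:

# Hedged sketch: fitting an additive model y = c + g1(x1) + g2(x2) + error
# with mgcv (assumed package, not used in the original notes)
library(mgcv)

set.seed(1)
n   <- 200
dat <- data.frame(x1 = runif(n, -2.5, 2.5),
                  x2 = runif(n, -2.5, 2.5))
dat$y <- -sin(2 * dat$x1) + dat$x2^2 + rnorm(n)

fit <- gam(y ~ s(x1) + s(x2), data = dat)   # one smooth term per additive component
summary(fit)          # the intercept estimates c, the smooth terms estimate g1, g2
plot(fit, pages = 1)  # centered estimates of g1 and g2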

Two possible methods for estimating an additive model:

  • backfitting estimator
  • marginal integration estimator (a sketch follows the identification condition below)

Identification condition for both methods:

$$E_{X_j}\{ g_j(X_j) \} = 0 \quad \forall\, j = 1, \dots, d \quad \Longrightarrow \quad E(Y) = c$$
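To make the marginal integration estimator concrete, here is a minimal sketch for $d = 2$: estimate the full regression surface with a bivariate Nadaraya–Watson pilot estimator, then average out the nuisance direction and center. The function names, Gaussian kernel, and bandwidth are illustrative assumptions of mine, not from the original notes.

# Sketch of the marginal integration estimator for d = 2 (illustrative only):
# g1_hat(x1) = (1/n) * sum_i m_hat(x1, X_i2) - mean(Y),
# where m_hat is a bivariate Nadaraya-Watson estimator
nw2 <- function(x1, x2, X, Y, h = 0.5) {
  # Nadaraya-Watson estimate of E(Y | X1 = x1, X2 = x2) with a Gaussian kernel
  w <- dnorm((X[, 1] - x1) / h) * dnorm((X[, 2] - x2) / h)
  sum(w * Y) / sum(w)
}

marg_int_g1 <- function(x1, X, Y, h = 0.5) {
  # average the pilot surface over the observed X2 values, then center
  mean(sapply(X[, 2], function(x2) nw2(x1, x2, X, Y, h))) - mean(Y)
}

# usage (X is an n x 2 design matrix, Y the response vector):
# grid   <- seq(-2.5, 2.5, length.out = 50)
# g1_hat <- sapply(grid, marg_int_g1, X = X, Y = Y)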

Formulation in a Hilbert space framework:

  • let $\mathcal{H}_{Y\boldsymbol{X}}$ be the Hilbert space of random variables that are functions of $Y, \boldsymbol{X}$
  • let $\langle U, V \rangle = E(UV)$ be the scalar product
  • define $\mathcal{H}_{\boldsymbol{X}}$ and $\mathcal{H}_{X_j}$, $j = 1, \dots, d$, as the corresponding subspaces

$\Longrightarrow$ we aim to find the element of $\mathcal{H}_{X_1} \oplus \cdots \oplus \mathcal{H}_{X_d}$ closest to $Y \in \mathcal{H}_{Y\boldsymbol{X}}$, or equivalently to $m \in \mathcal{H}_{\boldsymbol{X}}$
By the projection theorem, there exists a unique solution satisfying
$$E[\{ Y - m(\boldsymbol{X}) \} \mid X_{\alpha}] = 0 \iff g_{\alpha}(X_{\alpha}) = E\Big[\Big\{ Y - \sum_{j \neq \alpha} g_j(X_j) \Big\} \,\Big|\, X_{\alpha}\Big], \quad \alpha = 1, \dots, d$$
Denote the projection $P_{\alpha}(\bullet) = E(\bullet \mid X_{\alpha})$.
$$\Longrightarrow \begin{pmatrix} I & P_{1} & \cdots & P_{1} \\ P_{2} & I & \cdots & P_{2} \\ \vdots & & \ddots & \vdots \\ P_{d} & \cdots & P_{d} & I \end{pmatrix} \begin{pmatrix} g_{1}(X_{1}) \\ g_{2}(X_{2}) \\ \vdots \\ g_{d}(X_{d}) \end{pmatrix} = \begin{pmatrix} P_{1} Y \\ P_{2} Y \\ \vdots \\ P_{d} Y \end{pmatrix}$$
Denote by $\mathbf{S}_{\alpha}$ the $(n \times n)$ smoother matrix such that $\mathbf{S}_{\alpha} \boldsymbol{Y}$ is an estimate of the vector $\{ E(Y_1 \mid X_{\alpha 1}), \dots, E(Y_n \mid X_{\alpha n}) \}^{\top}$.
$$\Longrightarrow \underbrace{\begin{pmatrix} \mathbf{I} & \mathbf{S}_{1} & \cdots & \mathbf{S}_{1} \\ \mathbf{S}_{2} & \mathbf{I} & \cdots & \mathbf{S}_{2} \\ \vdots & & \ddots & \vdots \\ \mathbf{S}_{d} & \cdots & \mathbf{S}_{d} & \mathbf{I} \end{pmatrix}}_{nd \times nd} \begin{pmatrix} \boldsymbol{g}_{1} \\ \boldsymbol{g}_{2} \\ \vdots \\ \boldsymbol{g}_{d} \end{pmatrix} = \begin{pmatrix} \mathbf{S}_{1} \boldsymbol{Y} \\ \mathbf{S}_{2} \boldsymbol{Y} \\ \vdots \\ \mathbf{S}_{d} \boldsymbol{Y} \end{pmatrix}$$
Note: in finite samples the matrix on the left-hand side can be singular.
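As a toy numerical illustration of this stacked system (my own sketch, not the procedure used in practice), one can build Gaussian-kernel Nadaraya–Watson smoother matrices and solve the $nd \times nd$ system directly for $d = 2$; all names and the bandwidth below are illustrative assumptions:

# Toy illustration of the nd x nd smoother system for d = 2 (illustrative only)
smoother_matrix <- function(x, h = 0.5) {
  # row i of S contains the Nadaraya-Watson weights for estimating E(Y | x_i)
  K <- outer(x, x, function(a, b) dnorm((a - b) / h))
  sweep(K, 1, rowSums(K), "/")
}

n  <- 50
X1 <- runif(n, -2.5, 2.5); X2 <- runif(n, -2.5, 2.5)
Y  <- -sin(2 * X1) + X2^2 - mean(X2^2) + rnorm(n)

S1 <- smoother_matrix(X1); S2 <- smoother_matrix(X2)
Id <- diag(n)

# stacked system  [ Id  S1 ; S2  Id ] (g1; g2) = (S1 (Y - mean(Y)); S2 (Y - mean(Y)))
A   <- rbind(cbind(Id, S1), cbind(S2, Id))
rhs <- c(S1 %*% (Y - mean(Y)), S2 %*% (Y - mean(Y)))
g   <- solve(A, rhs)    # may fail: the matrix can be singular in finite samples
g1_hat <- g[1:n]; g2_hat <- g[(n + 1):(2 * n)]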

Backfitting algorithm

In practice, the following backfitting algorithm (a simplification of the Gauss–Seidel procedure) is used; an illustrative R sketch follows the steps below:


  • initialize $\hat{\boldsymbol{g}}_{\alpha}^{(0)} \equiv 0 \ \forall \alpha$, $\hat{c} = \bar{Y}$
  • repeat for $\alpha = 1, \dots, d$:
$$\begin{aligned} \boldsymbol{r}_\alpha &= \boldsymbol{Y} - \hat{c} - \sum_{j=1}^{\alpha-1} \hat{\boldsymbol{g}}_j^{(\ell+1)} - \sum_{j=\alpha+1}^{d} \hat{\boldsymbol{g}}_j^{(\ell)} \\ \hat{\boldsymbol{g}}_\alpha^{(\ell+1)} &= \mathbf{S}_\alpha \boldsymbol{r}_\alpha \end{aligned}$$
  • proceed until convergence is reached
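A compact R sketch of this loop, using loess fits as the smoothers $\mathbf{S}_\alpha$; the function name, span, tolerance, and centering step are illustrative choices of mine, not from the original notes:

# Sketch of the backfitting algorithm with loess smoothers (illustrative)
backfit <- function(X, Y, span = 0.7, tol = 1e-6, max_iter = 50) {
  n <- nrow(X); d <- ncol(X)
  c_hat <- mean(Y)                        # intercept estimate c-hat = Y-bar
  G <- matrix(0, n, d)                    # fitted components g_j(X_ij), initialized at 0
  for (iter in 1:max_iter) {
    G_old <- G
    for (alpha in 1:d) {
      # partial residuals: remove the intercept and all other components
      r_alpha <- Y - c_hat - rowSums(G[, -alpha, drop = FALSE])
      fit <- loess(r_alpha ~ x,
                   data = data.frame(x = X[, alpha], r_alpha = r_alpha),
                   span = span, degree = 1)
      G[, alpha] <- fitted(fit) - mean(fitted(fit))  # recenter for identification
    }
    if (max(abs(G - G_old)) < tol) break  # proceed until convergence
  }
  list(constant = c_hat, components = G)
}

On the simulated data from the example below, backfit(X, Y) would return a list whose element constant holds the intercept and whose element components holds the centered component estimates, column by column.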

Example: smoother performance in additive models

Simulated sample of $n = 75$ regression observations with regressors $X_j$ i.i.d. uniform on $[-2.5, 2.5]$, generated from
$$Y = \sum_{j=1}^{4} g_j(X_j) + \varepsilon, \quad \varepsilon \sim N(0, 1)$$
where
$$\begin{array}{ll} g_1(X_1) = -\sin(2 X_1) & g_2(X_2) = X_2^2 - E(X_2^2) \\ g_3(X_3) = X_3 & g_4(X_4) = \exp(-X_4) - E\{\exp(-X_4)\} \end{array}$$
Plotted results for this example:

[Figure 1]

Code:

library(ggplot2)  # needed for the ggplot() calls below

# simulate n = 75 observations of four i.i.d. U(-2.5, 2.5) regressors
n = 75
X = matrix(NA, n, 4)
for (i in 1:4) {
  X[, i] = runif(n, min = -2.5, max = 2.5)
}

# the four true additive component functions (g2 and g4 are centered empirically)
g1 = function(x) {
  return(-sin(2 * x))
}

g2 = function(x) {
  return(x ^ 2 - mean(x ^ 2))
}

g3 = function(x) {
  return(x)
}

g4 = function(x) {
  return(exp(-x) - mean(exp(-x)))
}
eps = rnorm(n)  # N(0, 1) errors

### switch: include a component (index = 1) or exclude it (index = 0);
### note that this definition masks base R's I()
I = function(x, index) {
  if (index == 1) {
    return(x)
  }
  if (index == 0) {
    return(0)
  }
}

x <- seq(-2.5, 2.5, length.out = 100)  # evaluation grid for the loess predictions

# response with only g1 active (the other components are switched off)
Y = I(g1(X[, 1]), 1) + I(g2(X[, 2]), 0) + I(g3(X[, 3]), 0) + I(g4(X[, 4]), 0) + eps
fit_g1 <- loess(
  Y ~ x,
  family = 'symmetric',
  degree = 2,
  span = 0.7,
  data = data.frame(x = X[, 1], Y = Y),
  surface = "direct"
)
out_g1 <- predict(fit_g1,
                  newdata = data.frame(x = x),
                  se = TRUE)
# pointwise 95% confidence bands around the loess fit
low_g1 <- out_g1$fit - qnorm(0.975) * out_g1$se.fit
high_g1 <- out_g1$fit + qnorm(0.975) * out_g1$se.fit
df.low_g1 <- data.frame(x = x, y = low_g1)
df.high_g1 <- data.frame(x = x, y = high_g1)
P1 = ggplot(data = data.frame(X1 = X[, 1], g1 = Y),
            aes(X1, g1)) +
  geom_point() +
  geom_smooth(method = "loess", show.legend = TRUE) +
  geom_line(data = df.low_g1, aes(x, y), color = "red") +
  geom_line(data = df.high_g1, aes(x, y), color = "red")

# response with only g2 active; fit loess to the partial residuals Y - g1-hat
Y = I(g1(X[, 1]), 0) + I(g2(X[, 2]), 1) + I(g3(X[, 3]), 0) + I(g4(X[, 4]), 0) + eps
fit_g2 <- loess(
  Y ~ x,
  family = 'symmetric',
  degree = 2,
  span = 0.9,
  data = data.frame(x = X[, 2],
                    Y = (Y - fit_g1$fitted)),
  surface = "direct"
)
out_g2 <- predict(fit_g2,
                  newdata = data.frame(x = x),
                  se = TRUE)
low_g2 <- out_g2$fit - qnorm(0.975) * out_g2$se.fit
high_g2 <- out_g2$fit + qnorm(0.975) * out_g2$se.fit
df.low_g2 <- data.frame(x = x, y = low_g2)
df.high_g2 <- data.frame(x = x, y = high_g2)
P2 = ggplot(data = data.frame(X2 = X[, 2], g2 = (Y - fit_g1$fitted)),
            aes(X2, g2)) +
  geom_point() +
  geom_smooth(method = "loess", show.legend = TRUE) +
  geom_line(data = df.low_g2, aes(x, y), color = "red") +
  geom_line(data = df.high_g2, aes(x, y), color = "red")

# response with only g3 active; fit loess to Y minus the previous fits
Y = I(g1(X[, 1]), 0) + I(g2(X[, 2]), 0) + I(g3(X[, 3]), 1) + I(g4(X[, 4]), 0) + eps
fit_g3 <- loess(
  Y ~ x,
  family = 'symmetric',
  degree = 2,
  span = 0.9,
  data = data.frame(x = X[, 3],
                    Y = (Y - fit_g1$fitted - fit_g2$fitted)),
  surface = "direct"
)
out_g3 <- predict(fit_g3,
                  newdata = data.frame(x = x),
                  se = TRUE)
low_g3 <- out_g3$fit - qnorm(0.975) * out_g3$se.fit
high_g3 <- out_g3$fit + qnorm(0.975) * out_g3$se.fit
df.low_g3 <- data.frame(x = x, y = low_g3)
df.high_g3 <- data.frame(x = x, y = high_g3)
P3 = ggplot(data = data.frame(X3 = X[, 3], g3 = (Y - fit_g1$fitted - fit_g2$fitted)),
            aes(X3, g3)) +
  geom_point() +
  geom_smooth(method = "loess", show.legend = TRUE) +
  geom_line(data = df.low_g3, aes(x, y), color = "red") +
  geom_line(data = df.high_g3, aes(x, y), color = "red")

# response with only g4 active; fit loess to Y minus the previous fits
Y = I(g1(X[, 1]), 0) + I(g2(X[, 2]), 0) + I(g3(X[, 3]), 0) + I(g4(X[, 4]), 1) + eps
fit_g4 <- loess(
  Y ~ x,
  family = 'symmetric',
  degree = 2,
  span = 0.9,
  data = data.frame(x = X[, 4],
                    Y = (Y - fit_g1$fitted - fit_g2$fitted - fit_g3$fitted)),
  surface = "direct"
)
out_g4 <- predict(fit_g4,
                  newdata = data.frame(x = x),
                  se = TRUE)
low_g4 <- out_g4$fit - qnorm(0.975) * out_g4$se.fit
high_g4 <- out_g4$fit + qnorm(0.975) * out_g4$se.fit
df.low_g4 <- data.frame(x = x, y = low_g4)
df.high_g4 <- data.frame(x = x, y = high_g4)
P4 = ggplot(data = data.frame(
  X4 = X[, 4],
  g4 = (Y - fit_g1$fitted - fit_g2$fitted - fit_g3$fitted)
),
aes(X4, g4)) +
  geom_point() +
  geom_smooth(method = "loess", show.legend = TRUE) +
  geom_line(data = df.low_g4, aes(x, y), color = "red") +
  geom_line(data = df.high_g4, aes(x, y), color = "red")

cowplot::plot_grid(P1, P2, P3, P4, align = "vh")  # arrange the four component plots in a grid

Result:

[Figure 2: the four component fits in a 2 × 2 grid, each showing the data, the loess smooth, and red 95% confidence bands]

References

https://academic.uprm.edu/wrolke/esma6836/smooth.html

Hastie, T. J. and Tibshirani, R. J. (1990). Generalized Additive Models, Vol. 43 of Monographs on Statistics and Applied Probability, Chapman and Hall, London.

Opsomer, J. and Ruppert, D. (1997). Fitting a bivariate additive model by local polynomial regression, Annals of Statistics 25: 186-211.

Mammen, E., Linton, O. and Nielsen, J. P. (1999). The existence and asymptotic properties of a backfitting projection algorithm under weak conditions, Annals of Statistics 27: 1443-1490.
