to avoid the curse of dimensionality and for better interpretability we assume
$$m(\boldsymbol{x})=E(Y|\boldsymbol{X}=\boldsymbol{x})=c+\sum_{j=1}^d g_j(x_j)$$
$\Longrightarrow$ the additive functions $g_j$ can be estimated with the optimal one-dimensional rate
two possible methods for estimating an additive model:
$$E_{X_j}\{ g_j(X_j) \}=0, \quad \forall j=1,\dots,d \;\Longrightarrow\; E(Y)=c$$
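taking expectations of the model and using these centering constraints identifies the constant:
$$E(Y)=c+\sum_{j=1}^d E_{X_j}\{ g_j(X_j) \}=c$$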
formulation in a Hilbert space framework:
$\Longrightarrow$ we aim to find the element of $\mathcal{H}_{X_1}\oplus \cdots \oplus\mathcal{H}_{X_d}$ closest to $Y\in\mathcal{H}_{Y\boldsymbol{X}}$ or $m\in\mathcal{H}_{\boldsymbol{X}}$
by the projection theorem, there exists a unique solution with
$$E[\{ Y-m(\boldsymbol{X}) \}|X_{\alpha}]=0 \iff g_{\alpha}(X_{\alpha})=E\Big[\Big\{ Y-\sum_{j\neq\alpha}g_j(X_j) \Big\}\,\Big|\,X_{\alpha}\Big], \quad\alpha=1,\dots,d$$
denote the projection $P_{\alpha}(\bullet)=E(\bullet|X_{\alpha})$
$$\Longrightarrow\left(\begin{array}{cccc}
I & P_{1} & \cdots & P_{1} \\
P_{2} & I & \cdots & P_{2} \\
\vdots & & \ddots & \vdots \\
P_{d} & \cdots & P_{d} & I
\end{array}\right)\left(\begin{array}{c}
g_{1}(X_{1}) \\
g_{2}(X_{2}) \\
\vdots \\
g_{d}(X_{d})
\end{array}\right)=\left(\begin{array}{c}
P_{1} Y \\
P_{2} Y \\
\vdots \\
P_{d} Y
\end{array}\right)$$
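for instance, with $d=2$ the system is just the pair of equations
$$g_1(X_1)=E[\{Y-g_2(X_2)\}|X_1], \qquad g_2(X_2)=E[\{Y-g_1(X_1)\}|X_2],$$
which the backfitting iteration below alternates between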
denote by $\mathbf{S}_{\alpha}$ the $(n\times n)$ smoother matrix such that $\mathbf{S}_{\alpha}\boldsymbol{Y}$ is an estimate of the vector $\{ E(Y_1|X_{\alpha 1}),\dots,E(Y_n|X_{\alpha n}) \}^{\top}$
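as a concrete illustration (not part of the original notes), such a smoother matrix could be built from Nadaraya-Watson weights; the Gaussian kernel, the bandwidth h, and the helper name smoother_matrix below are illustrative choices:
smoother_matrix = function(x, h) {
  # kernel weights K{(x_i - x_k)/h} between all pairs of design points
  W = outer(x, x, function(a, b) dnorm((a - b) / h))
  W / rowSums(W)  # row-normalize: each row of S sums to one
}
# S_alpha %*% Y then estimates {E(Y_1|X_alpha1), ..., E(Y_n|X_alphan)}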
$$\Longrightarrow \underbrace{\left(\begin{array}{cccc}
\mathbf{I} & \mathbf{S}_{1} & \cdots & \mathbf{S}_{1} \\
\mathbf{S}_{2} & \mathbf{I} & \cdots & \mathbf{S}_{2} \\
\vdots & & \ddots & \vdots \\
\mathbf{S}_{d} & \cdots & \mathbf{S}_{d} & \mathbf{I}
\end{array}\right)}_{nd \times nd}\left(\begin{array}{c}
\boldsymbol{g}_{1} \\
\boldsymbol{g}_{2} \\
\vdots \\
\boldsymbol{g}_{d}
\end{array}\right)=\left(\begin{array}{c}
\mathbf{S}_{1} \boldsymbol{Y} \\
\mathbf{S}_{2} \boldsymbol{Y} \\
\vdots \\
\mathbf{S}_{d} \boldsymbol{Y}
\end{array}\right)$$
note: in finite samples the matrix on the left-hand side can be singular
in practice, a backfitting algorithm (a simplification of the Gauss-Seidel procedure) is used, sketched below:
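a minimal R sketch of this iteration (an illustration, not the exact algorithm from the notes): cycle over the components, each time smoothing the partial residuals against $X_j$ and recentering, until the fits stabilize; loess is used as the univariate smoother, and span, tol, maxit are illustrative tuning choices
backfit = function(X, Y, span = 0.7, tol = 1e-6, maxit = 50) {
  n = nrow(X)
  d = ncol(X)
  c_hat = mean(Y)                       # estimate of the constant c
  G = matrix(0, n, d)                   # column j holds the current fit g_j(X_ij)
  for (it in 1:maxit) {
    G_old = G
    for (j in 1:d) {
      r = Y - c_hat - rowSums(G[, -j, drop = FALSE])  # partial residuals
      fit = loess(r ~ x, data = data.frame(x = X[, j], r = r), span = span)
      G[, j] = fit$fitted - mean(fit$fitted)          # recenter so mean(g_j) = 0
    }
    if (max(abs(G - G_old)) < tol) break              # stop when the fits no longer change
  }
  list(constant = c_hat, components = G)
}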
simulated sample of $n = 75$ regression observations with regressors $X_j$ i.i.d. uniform on $[-2.5, 2.5]$, generated from
$$Y=\sum_{j=1}^4 g_j(X_j)+\varepsilon, \quad \varepsilon\sim N(0,1)$$
where
$$\begin{array}{ll}
g_1(X_1)=-\sin(2 X_1) & g_2(X_2)=X_2^2-E(X_2^2) \\
g_3(X_3)=X_3 & g_4(X_4)=\exp(-X_4)-E\{\exp(-X_4)\}
\end{array}$$
R code for fitting and plotting the component estimates in this example:
library(ggplot2)  # needed for the ggplot-based panels below; cowplot is called via :: at the end
n = 75
X = matrix(NA, n, 4)
for (i in 1:4) {
X[, i] = runif(n, min = -2.5, max = 2.5)
}
g1 = function(x) {
return(-sin(2 * x))
}
g2 = function(x) {
return(x ^ 2 - mean(x ^ 2))
}
g3 = function(x) {
return(x)
}
g4 = function(x) {
return(exp(-x) - mean(exp(-x)))
}
eps = rnorm(n)
### indicator function: switches a component on (index = 1) or off (index = 0)
I = function(x, index) {
if (index == 1) {
return(x)
}
if (index == 0) {
return(0)
}
}
x <- seq(-2.5, 2.5, l = 100)
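# response with only the g1 component switched on (plus noise); fit_g1 smooths it against X1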
Y = I(g1(X[, 1]), 1) + I(g2(X[, 2]), 0) + I(g3(X[, 3]), 0) + I(g4(X[, 4]), 0) + eps
fit_g1 <- loess(
Y ~ x,
family = 'symmetric',
degree = 2,
span = 0.7,
data = data.frame(x = X[, 1], Y = Y),
surface = "direct"
)
out_g1 <- predict(fit_g1,
                  newdata = data.frame(x = x),  # evaluation grid; column must be named like the predictor
                  se = TRUE)
low_g1 <- out_g1$fit - qnorm(0.975) * out_g1$se.fit
high_g1 <- out_g1$fit + qnorm(0.975) * out_g1$se.fit
df.low_g1 <- data.frame(x = x, y = low_g1)
df.high_g1 <- data.frame(x = x, y = high_g1)
P1 = ggplot(data = data.frame(X1 = X[, 1], g1 = Y),
aes(X1, g1)) +
geom_point() +
geom_smooth(method = "loess", show.legend = TRUE) +
geom_line(data = df.low_g1, aes(x, y), color = "red") +
geom_line(data = df.high_g1, aes(x, y), color = "red")
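# second component: response with only g2 switched on; smooth Y - fitted(g1) against X2 (a backfitting-style residual step)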
Y = I(g1(X[, 1]), 0) + I(g2(X[, 2]), 1) + I(g3(X[, 3]), 0) + I(g4(X[, 4]), 0) + eps
fit_g2 <- loess(
Y ~ x,
family = 'symmetric',
degree = 2,
span = 0.9,
  data = data.frame(x = X[, 2],
                    Y = (Y - fit_g1$fitted)),
  surface = "direct"  # loess() argument, not a data column
)
out_g2 <- predict(fit_g2,
                  newdata = data.frame(x = x),
                  se = TRUE)
low_g2 <- out_g2$fit - qnorm(0.975) * out_g2$se.fit
high_g2 <- out_g2$fit + qnorm(0.975) * out_g2$se.fit
df.low_g2 <- data.frame(x = x, y = low_g2)
df.high_g2 <- data.frame(x = x, y = high_g2)
P2 = ggplot(data = data.frame(X2 = X[, 2], g2 = (Y - fit_g1$fitted)),
aes(X2, g2)) +
geom_point() +
geom_smooth(method = "loess", show.legend = TRUE) +
geom_line(data = df.low_g2, aes(x, y), color = "red") +
geom_line(data = df.high_g2, aes(x, y), color = "red")
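# third component: response with only g3 switched on; smooth the residuals after removing the first two fits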
Y = I(g1(X[, 1]), 0) + I(g2(X[, 2]), 0) + I(g3(X[, 3]), 1) + I(g4(X[, 4]), 0) + eps
fit_g3 <- loess(
Y ~ x,
family = 'symmetric',
degree = 2,
span = 0.9,
  data = data.frame(x = X[, 3],
                    Y = (Y - fit_g1$fitted - fit_g2$fitted)),
  surface = "direct"
)
out_g3 <- predict(fit_g3,
                  newdata = data.frame(x = x),
                  se = TRUE)
low_g3 <- out_g3$fit - qnorm(0.975) * out_g3$se.fit
high_g3 <- out_g3$fit + qnorm(0.975) * out_g3$se.fit
df.low_g3 <- data.frame(x = x, y = low_g3)
df.high_g3 <- data.frame(x = x, y = high_g3)
P3 = ggplot(data = data.frame(X3 = X[, 3], g3 = (Y - fit_g1$fitted - fit_g2$fitted)),
aes(X3, g3)) +
geom_point() +
geom_smooth(method = "loess", show.legend = TRUE) +
geom_line(data = df.low_g3, aes(x, y), color = "red") +
geom_line(data = df.high_g3, aes(x, y), color = "red")
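# fourth component: response with only g4 switched on; smooth the residuals after removing the first three fits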
Y = I(g1(X[, 1]), 0) + I(g2(X[, 2]), 0) + I(g3(X[, 3]), 0) + I(g4(X[, 4]), 1) + eps
fit_g4 <- loess(
Y ~ x,
family = 'symmetric',
degree = 2,
span = 0.9,
  data = data.frame(x = X[, 4],
                    Y = (Y - fit_g1$fitted - fit_g2$fitted - fit_g3$fitted)),
  surface = "direct"
)
out_g4 <- predict(fit_g4,
                  newdata = data.frame(x = x),
                  se = TRUE)
low_g4 <- out_g4$fit - qnorm(0.975) * out_g4$se.fit
high_g4 <- out_g4$fit + qnorm(0.975) * out_g4$se.fit
df.low_g4 <- data.frame(x = x, y = low_g4)
df.high_g4 <- data.frame(x = x, y = high_g4)
P4 = ggplot(data = data.frame(
X4 = X[, 4],
g4 = (Y - fit_g1$fitted - fit_g2$fitted - fit_g3$fitted)
),
aes(X4, g4)) +
geom_point() +
geom_smooth(method = "loess", show.legend = TRUE) +
geom_line(data = df.low_g4, aes(x, y), color = "red") +
geom_line(data = df.high_g4, aes(x, y), color = "red")
cowplot::plot_grid(P1, P2, P3, P4, align = "vh")
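for comparison (not part of the original example), the same additive model can also be fitted in one call with a penalized-spline GAM from the mgcv package; the sketch below reuses X, eps and g1-g4 from above and generates the response from the full four-component model stated earlier:
library(mgcv)
# response from the full data-generating process Y = g1 + g2 + g3 + g4 + eps
dat = data.frame(Y = g1(X[, 1]) + g2(X[, 2]) + g3(X[, 3]) + g4(X[, 4]) + eps,
                 x1 = X[, 1], x2 = X[, 2], x3 = X[, 3], x4 = X[, 4])
fit_gam = gam(Y ~ s(x1) + s(x2) + s(x3) + s(x4), data = dat)
plot(fit_gam, pages = 1)  # one panel per estimated component with confidence bands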
result: [figure: four panels with the estimated components $\hat g_1,\dots,\hat g_4$ and 95% pointwise confidence bands]
https://academic.uprm.edu/wrolke/esma6836/smooth.html