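Questions:
- I still don't quite see why (the contribution of z to y) can be multiplied by (the contribution of x to z), and why the product is then the contribution of x to y.
- x'z = z'x, so after transposing, lambda really carries no correlation any more? It looks a lot like manipulating the data; are the variables also uncorrelated in the statistical sense?
- How do you prove that $R^2$ equals the sum of the elements of $\beta^2$?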
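R program example
ref: R in Action, 2nd edition, page: 209
ps: I have checked myself that this edition of R in Action contains printing errors, and some of its code no longer runs after package updates. It is a free download and I could not be bothered to look for another edition, so consult it with caution.
relative weights: find the contribution each predictor makes to R-square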
relweights <- function(fit, ...) {
  R <- cor(fit$model)              # correlation matrix of the response and all predictors
  nvar <- ncol(R)
  rxx <- R[2:nvar, 2:nvar]         # predictor correlation matrix R_XX
  rxy <- R[2:nvar, 1]              # predictor-response correlations R_XY
  svd <- eigen(rxx)                # eigendecomposition of rxx (not an SVD, despite the name)
  evec <- svd$vectors              # eigenvectors
  ev <- svd$values                 # eigenvalues
  delta <- diag(sqrt(ev))          # diagonal matrix of the square roots of the eigenvalues
  # correlations between original predictors and new orthogonal variables;
  # evec %*% delta %*% t(evec) is the symmetric square root of rxx
  lambda <- evec %*% delta %*% t(evec)
  lambdasq <- lambda ^ 2
  # regression coefficients of Y on the orthogonal variables:
  # the inverse of lambda times rxy (a matrix A is orthogonal when A'A = I)
  beta <- solve(lambda) %*% rxy
  rsquare <- colSums(beta ^ 2)     # R^2: share of the variance in Y explained by the model
  rawwgt <- lambdasq %*% beta ^ 2  # raw relative weight of each predictor
  import <- (rawwgt / rsquare) * 100
  import <- as.data.frame(import)
  row.names(import) <- names(fit$model[2:nvar])
  names(import) <- "Weights"       # set the column name
  import <- import[order(import$Weights), 1, drop = FALSE]  # sort ascending by weight
  dotchart(import$Weights, labels = row.names(import),
           xlab = "% of R-Square", pch = 19,
           main = "Relative Importance of Predictor Variables",
           sub = paste("Total R-Square=", round(rsquare, digits = 3)),  # subtitle
           ...)
  return(import)
}
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)
relweights(fit, col = "blue")
Background: solving OLS in matrix form
Given $y = X\beta + e$, find $\hat\beta$:
$$\hat\beta = (X'X)^{-1}X'y$$
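As a quick check of the matrix-form solution, here is a minimal sketch that solves the normal equations by hand and compares the result with `lm()`. It assumes the same built-in `state.x77` data used in the program above; the names `X`, `y`, `beta_hat` and `max_diff` are only illustrative.

```r
# Sketch: OLS via the normal equations, compared against lm()
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])
X <- cbind(Intercept = 1, as.matrix(states[, -1]))  # design matrix with an intercept column
y <- states$Murder

# beta_hat = (X'X)^{-1} X'y; solve(A, b) avoids forming the inverse explicitly
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Reference fit for comparison
fit <- lm(Murder ~ Population + Illiteracy + Income + Frost, data = states)
max_diff <- max(abs(drop(beta_hat) - coef(fit)))
print(max_diff)  # should be close to zero, up to floating-point error
```

`lm()` itself uses a QR decomposition rather than the normal equations, which is numerically more stable; the hand-rolled version is only meant to mirror the formula.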
calculation of relative weight
ref: Jeff Johnson, 2000. A Heuristic Method for Estimating the Relative Weight of Predictor Variables in Multiple Regression. Multivariate Behavioral Research, 35:1-19
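Each variable's contribution consists of its own unique contribution plus the contribution it shares with other variables through their correlation.
The main step is to transform the original predictor matrix into a set of mutually uncorrelated (orthogonal) variables, that is, to obtain the best-fitting (in the least squares sense) set of orthogonal variables.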
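- First compute the eigenvectors and eigenvalues of $X'X$.
- Then take the singular value decomposition of $X$, a decomposition similar to the one used in principal component analysis (PCA):
$$X = P\Delta Q' \tag{1}$$
Here P and Q both contain eigenvectors: P those of $XX'$ and Q those of $X'X$. $\Delta$ is the diagonal matrix of singular values, i.e. the square roots of the eigenvalues of $X'X$.
ps: If no two predictor variables in X are perfectly correlated with each other, X is of full rank and no diagonal elements of $\Delta$ will be equal to zero.
- Find the orthogonal matrix closest to $X$, namely $Z = PQ'$, because the columns of Z are the best-fitting approximations to the columns of X in that they minimize the residual sum of squares between the original variables and the orthogonal variables (Johnson, 1966).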
- Regress X on Z: $\Lambda = (Z'Z)^{-1}Z'X = Z'X = Q\Delta Q'$, therefore $X = Z\Lambda$, that is,
X is a linear transformation of Z.
Because the Z variables are uncorrelated, the relative contribution of each z to each x is represented by the squared standardized regression coefficient (which is the same as the squared zero-order correlation) of each z for each x, represented by the squared column elements of $\Lambda$.
Since $\Lambda$ is symmetric ($\Lambda = \Lambda'$), any particular $\lambda_{jk}^2$ represents the proportion of variance in $z_k$ accounted for by $x_j$, just as it represents the proportion of variance in $x_j$ accounted for by $z_k$.
- Find the part of Y explained by Z. The vector of beta weights when regressing y on Z is obtained by $\beta = (Z'Z)^{-1}Z'y = Z'y = QP'y$.
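The construction above can be checked numerically. The following is a minimal sketch, assuming the `state.x77` data from the program above and assuming the predictors and response are standardized and divided by sqrt(n - 1), so that X'X is the predictor correlation matrix. The variable names are illustrative and are not taken from the paper.

```r
# Sketch: build the orthogonal counterpart Z of X from the SVD and verify its properties
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])
n <- nrow(states)
X <- scale(as.matrix(states[, -1])) / sqrt(n - 1)  # X'X is now the correlation matrix
y <- scale(states$Murder) / sqrt(n - 1)            # y'y is now 1

s <- svd(X)                    # X = P Delta Q'
P <- s$u
Q <- s$v
Delta <- diag(s$d)

Z      <- P %*% t(Q)           # orthonormal matrix closest to X
Lambda <- t(Z) %*% X           # regression of X on Z (Z'Z = I, so no inverse is needed)
beta   <- t(Z) %*% y           # regression of y on Z

ortho_err <- max(abs(crossprod(Z) - diag(ncol(X))))      # Z'Z = I
sym_err   <- max(abs(Lambda - t(Lambda)))                # Lambda is symmetric
form_err  <- max(abs(Lambda - Q %*% Delta %*% t(Q)))     # Lambda = Q Delta Q'
recon_err <- max(abs(Z %*% Lambda - X))                  # X = Z Lambda
print(c(ortho_err, sym_err, form_err, recon_err))        # all should be near zero
```

The symmetry check bears on the second question at the top: Z'X equals its own transpose X'Z here because of how Z is built from the SVD, and the columns of Z are uncorrelated because Z'Z = I.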
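Compute the relative weights
Square the coefficients so that they are in variance units, combine them, and then scale the result by $R^2$:
$$\varepsilon = \Lambda^{[2]}\beta^{[2]} \tag{2}$$
where the superscript $[2]$ means that every element is squared.
Here Z is an approximate stand-in for X and spans the same column space, so the model's explanatory power $R^2$ should not change; because the columns of Z are orthonormal, $R^2 = \sum_k \beta_k^2$. Hence, when computing relative weights, the correlation matrix of X is $R_{XX} = X'X$.
Q holds exactly the eigenvectors of $R_{XX}$, and the diagonal of $\Delta^2$ holds its eigenvalues.
From (1), $X = P\Delta Q'$, we get $R_{XX} = Q\Delta P'P\Delta Q' = Q\Delta^2 Q'$, so
$$\Lambda = Q\Delta Q' = R_{XX}^{1/2} \tag{3}$$
Since $X = Z\Lambda$, i.e. $R_{XY} = X'y = \Lambda Z'y = \Lambda\beta$, therefore
$$\beta = \Lambda^{-1}R_{XY} \tag{4}$$
Compute $\Lambda$ and $\beta$ from (3) and (4) and substitute them into (2), $\varepsilon = \Lambda^{[2]}\beta^{[2]}$.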
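The whole calculation needs only the correlation matrix. Below is a minimal sketch, again assuming the `state.x77` data. It computes the raw weights and checks numerically that the sum of the squared beta weights, and the sum of the raw weights, both equal the $R^2$ reported by `lm()`. The names `Lambda`, `beta`, `eps`, `r2_beta` and `r2_lm` are illustrative.

```r
# Sketch: relative weights from correlations only, with an R^2 sanity check
states <- as.data.frame(state.x77[, c("Murder", "Population",
                                      "Illiteracy", "Income", "Frost")])
R   <- cor(states)
rxx <- R[-1, -1]               # predictor correlation matrix R_XX
rxy <- R[-1, 1]                # predictor-response correlations R_XY

e      <- eigen(rxx)           # R_XX = Q Delta^2 Q'
Lambda <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)  # symmetric square root of R_XX
beta   <- solve(Lambda, rxy)   # beta = Lambda^{-1} R_XY
eps    <- drop(Lambda^2 %*% beta^2)   # raw relative weights (element-wise squares)
names(eps) <- colnames(rxx)

r2_beta <- sum(beta^2)
r2_lm   <- summary(lm(Murder ~ ., data = states))$r.squared
print(c(r2_beta = r2_beta, r2_lm = r2_lm, sum_eps = sum(eps)))
print(round(100 * eps / r2_beta, 2))  # weights as a percentage of R^2
```

The weights add up to $R^2$ because each column of $\Lambda^{[2]}$ sums to the corresponding diagonal element of $\Lambda\Lambda = R_{XX}$, which is 1.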
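Math symbol reference:
http://www.mohu.org/info/symbols/symbols.htm
Here, every inline symbol floats too high above the baseline, and equations with \tag also come out at an odd font size. What is going on?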