典范分析结合了排序和回归的思想,其包含一个响应变量矩阵Y和解释变量矩阵X。典范分析通常形成正交轴上分布的散点图(排序图)。典范分析,尤其是RDA和CCA,是与多元回归密切相关的:
多元回归模型预测与Y的排序是不同的,二者相关系数的平方根就是多元回归模型的决定系数。因此,多元回归模型建立了排序与Y排序之间的对应,因为回归拟合值排序受限于X,并且与X是最优地且线性相关的。这个特性在典范分析中也很重要。这种限制是在最小均方(least-square)意义上最优的,意味着多元线性回归模型得到了最大的(决定系数)。因此,典范分析结合了排序和回归的方法,在X的约束下,形成与X密切相关的Y排序。而X与Y的关系建立方式依据典范分析方法差异而不同。
冗余分析的特征方程(如下)来自于多元回归,而后进行主成分分解。
响应变量矩阵Y大小为n × p,n是对象数量,p是变量数目。如果变量之间的尺度差异较大,对数据进行中心化(均值)或以列进行标准化。解释变量矩阵大小为n × m,m <= n。变量进行标准化以方便计算。矩阵X和Y进行中心化能够消除回归截距,这样简化解释又不损失相关信息。X也可以进行标准化。这些并不是做RDA所必须的,但是提出解释变量量纲之间的尺度差异,能够将回归系数转换为标准回归系数以方便比较。而矩阵X标准化或中心化过程不改变解释变异量或以及回归的拟合值。
Vegan包进行RDA分析
library(vegan)
library(magrittr)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.5-6
spe <- read.csv("2020otuabund.csv",header=T,row.names = 1) %$% .[,-10]
spe <- t(spe)[,1:16]
env <- read.csv("waterc.csv",header=T,row.names = NULL) %$% .[,-(1:2)]
sp <- scale(spe, center = TRUE)
water <- scale(env, center = TRUE)
water <- water[,c(1,2,7,8)]
mod11<-rda(sp ~ TotalC+TotalN+SAL+pH, env)
summary(mod11)
##
## Call:
## rda(formula = sp ~ TotalC + TotalN + SAL + pH, data = env)
##
## Partitioning of variance:
## Inertia Proportion
## Total 16.000 1.0000
## Constrained 9.832 0.6145
## Unconstrained 6.168 0.3855
##
## Eigenvalues, and their contribution to the variance
##
## Importance of components:
## RDA1 RDA2 RDA3 RDA4 PC1 PC2 PC3
## Eigenvalue 6.5226 2.1684 0.78380 0.35745 2.7334 1.7448 1.07726
## Proportion Explained 0.4077 0.1355 0.04899 0.02234 0.1708 0.1091 0.06733
## Cumulative Proportion 0.4077 0.5432 0.59218 0.61452 0.7854 0.8944 0.96173
## PC4
## Eigenvalue 0.61228
## Proportion Explained 0.03827
## Cumulative Proportion 1.00000
##
## Accumulated constrained eigenvalues
## Importance of components:
## RDA1 RDA2 RDA3 RDA4
## Eigenvalue 6.5226 2.1684 0.78380 0.35745
## Proportion Explained 0.6634 0.2205 0.07972 0.03635
## Cumulative Proportion 0.6634 0.8839 0.96365 1.00000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 3.363586
##
##
## Species scores
##
## RDA1 RDA2 RDA3 RDA4 PC1 PC2
## OTU72 -0.15864 -0.26192 0.432768 -0.04309 -0.5901195 -0.02104
## OTU401 -0.15190 0.41764 -0.070863 0.11729 0.4254556 0.32065
## OTU822 0.41641 -0.27951 0.074570 -0.18489 -0.0078372 0.55600
## OTU334 -0.67922 0.07835 0.088613 0.04723 0.3248174 0.20553
## OTU815 0.60617 -0.41794 0.192550 0.09392 -0.0280372 0.05853
## OTU828 0.64631 -0.10303 0.240456 -0.02145 0.2288193 -0.29497
## OTU461 -0.68355 -0.32370 0.189123 0.06208 -0.0008486 0.15978
## OTU361 -0.70671 -0.23541 -0.055085 -0.03725 0.0061413 -0.04282
## OTU912 0.24139 -0.18079 -0.250814 -0.27749 -0.5223089 0.40881
## OTU83 0.46548 0.35236 0.115120 0.20873 -0.3058090 -0.44886
## OTU638 0.52096 -0.43189 -0.328720 0.17608 0.3003277 -0.10085
## OTU75 -0.78599 -0.08045 0.023342 -0.03683 -0.0086104 0.02925
## OTU643 0.70966 0.14607 0.147374 -0.07273 0.1339911 -0.36211
## OTU868 0.59002 0.13248 0.001474 0.01602 -0.4618344 -0.33728
## OTU673 0.09015 -0.67743 -0.090516 0.14908 0.3240885 -0.06191
## OTU651 0.39161 0.11345 0.011508 -0.07418 0.6722584 -0.18163
##
##
## Site scores (weighted sums of species scores)
##
## RDA1 RDA2 RDA3 RDA4 PC1 PC2
## H1 -1.5505 -0.004236 -1.50882 -1.6763 -0.39559 -0.61280
## H2 -1.6591 1.878655 -0.59495 0.7410 0.62023 1.45094
## H3 -1.4895 -2.330115 1.21685 0.9664 0.35000 0.01861
## M1 0.2531 -0.405126 -0.35016 -1.7639 -0.20189 0.60985
## M2 0.7313 1.079749 2.39744 -0.2944 -2.09920 -0.98534
## M3 0.3798 0.908631 0.95080 1.3976 -0.06154 -1.47745
## N1 1.2253 -0.936031 -1.90587 1.5087 -0.37623 0.01875
## N2 1.2686 0.597322 -0.28386 2.5384 2.44508 -1.06512
## N3 0.8410 -0.788848 0.07857 -3.4175 -0.28087 2.04257
##
##
## Site constraints (linear combinations of constraining variables)
##
## RDA1 RDA2 RDA3 RDA4 PC1 PC2
## H1 -1.5880 0.3692 -1.461026 -1.2108 -0.39559 -0.61280
## H2 -1.3262 1.7883 -0.071104 1.1527 0.62023 1.45094
## H3 -1.4864 -2.2507 1.397221 0.8040 0.35000 0.01861
## M1 0.4777 -0.9880 -0.510043 -1.8351 -0.20189 0.60985
## M2 0.6579 0.8215 1.415913 0.4964 -2.09920 -0.98534
## M3 -0.2408 0.5166 0.006138 -0.3785 -0.06154 -1.47745
## N1 1.2080 -0.8749 -2.035604 1.8738 -0.37623 0.01875
## N2 1.2022 0.4497 0.526211 -0.1378 2.44508 -1.06512
## N3 1.0956 0.1684 0.732294 -0.7648 -0.28087 2.04257
##
##
## Biplot scores for constraining variables
##
## RDA1 RDA2 RDA3 RDA4 PC1 PC2
## TotalC -0.8196 -0.40984 0.33625 0.21740 0 0
## TotalN -0.3354 0.35041 0.05323 0.87287 0 0
## SAL -0.7938 -0.38601 -0.44072 0.16332 0 0
## pH 0.9945 -0.07912 -0.02377 -0.06469 0 0
原理验证
# 响应变量矩阵,拟合值矩阵,残差矩阵在典范轴的排序(坐标)
XtX = t(water) %*% water
invers = solve(XtX)
fitY <- water %*% invers %*% t(water) %*% sp
resY <- sp - fitY
U <- eigen(cov(fitY))$vectors
Ures <- eigen(cov(resY))$vectors
FY <- sp %*% U
ZY <- fitY %*% U
NYres <- resY %*% Ures
B = invers %*% t(water) %*% sp
C = B %*% U
RDA模型各个典范轴的方差贡献
var <- eigen(cov(fitY))$values
var
## [1] 6.522593e+00 2.168415e+00 7.837967e-01 3.574501e-01 2.949718e-16
## [6] 1.517841e-16 1.197628e-16 1.173570e-16 6.710281e-17 3.868146e-17
## [11] 9.735283e-18 -1.314287e-17 -2.619791e-17 -6.737024e-17 -1.219536e-16
## [16] -2.560984e-16
varres <- eigen(cov(resY))$values
varres
## [1] 2.733406e+00 1.744801e+00 1.077257e+00 6.122818e-01 4.778527e-16
## [6] 1.540008e-17 -9.321812e-18 -1.569721e-17 -1.704919e-17 -3.053562e-17
## [11] -4.060141e-17 -4.277304e-17 -4.325843e-17 -6.131568e-17 -7.034357e-17
## [16] -1.931979e-16