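First, what is symmetrical analysis? In a symmetrical analysis the two matrices being analysed play the same role: neither is designated as the response matrix and neither as the explanatory matrix. The difference between symmetrical and asymmetrical ordination resembles the difference between correlation analysis and (model I) regression analysis. The former is a descriptive, exploratory tool suited to models without an implied causal direction; the latter involves inference, explaining the variation of the response variables through linear combinations of the explanatory variables. Most ecological studies analyse data under a "treatment-response" hypothesis, so asymmetrical analysis is usually more appropriate. Only a few symmetrical methods are briefly introduced here.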
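The contrast between correlation and model I regression can be illustrated with a small base R sketch. The data are simulated (an assumption for illustration, unrelated to the Doubs data used later), and the variable names are hypothetical.

```r
# Simulated data (assumption: not taken from the Doubs data set)
set.seed(42)
x <- rnorm(50)
y <- 0.6 * x + rnorm(50, sd = 0.8)

# Correlation is symmetrical: swapping the two variables changes nothing
r_xy <- cor(x, y)
r_yx <- cor(y, x)

# Model I regression is asymmetrical: the slope depends on which
# variable is treated as the response
slope_y_on_x <- unname(coef(lm(y ~ x))[2])
slope_x_on_y <- unname(coef(lm(x ~ y))[2])

# The two slopes differ, but their product equals the squared correlation
print(c(r_xy = r_xy, r_yx = r_yx,
        slope_y_on_x = slope_y_on_x, slope_x_on_y = slope_x_on_y))
print(all.equal(slope_y_on_x * slope_x_on_y, r_xy^2))
```

The same logic carries over to tables of variables: a symmetrical method gives the same answer whichever table is listed first, whereas an asymmetrical method such as RDA does not.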
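Canonical correlation analysis (CCorA)
The aim of CCorA is to position the observations on canonical axes that maximise the correlation between the two tables of the data set. It can be used to test the hypothesis that two multivariate data sets are linearly independent.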
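To make the idea of "axes that maximise the correlation between two tables" concrete, here is a minimal base R sketch on simulated tables (an assumption; the tables X and Y and the coefficient matrix B are hypothetical). It uses `cancor()` from the stats package rather than the `CCorA()` function used later, and checks the result against the classical eigen-decomposition of Sxx⁻¹ Sxy Syy⁻¹ Syx.

```r
# Two simulated tables sharing some linear structure (hypothetical data)
set.seed(1)
n <- 29
X <- matrix(rnorm(n * 3), n, 3)
B <- matrix(rnorm(3 * 4), 3, 4)
Y <- X %*% B + matrix(rnorm(n * 4), n, 4)

# Canonical correlations from stats::cancor()
cc <- cancor(X, Y)$cor

# The same values from the eigen-decomposition of Sxx^-1 Sxy Syy^-1 Syx
Sxx <- cov(X)
Syy <- cov(Y)
Sxy <- cov(X, Y)
M <- solve(Sxx) %*% Sxy %*% solve(Syy) %*% t(Sxy)
ev <- sort(Re(eigen(M)$values), decreasing = TRUE)
cc_eigen <- sqrt(pmax(ev, 0))

# Pillai's trace is the sum of the squared canonical correlations
pillai <- sum(cc^2)

print(rbind(cancor = cc, eigen = cc_eigen))
print(pillai)
```

The relation between Pillai's trace and the canonical correlations can also be checked on the CCorA output reported further down: 0.92100² + 0.76372² + 0.6242² is about 1.821, which matches the printed Pillai's trace.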
# Load the required packages
library(ade4)
library(vegan)
# library(packfor)
# The packfor package can be downloaded from https://r-forge.r-project.org/R/?group_id=195
# On MacOS X, running the forward.sel function of packfor requires the
# gfortran package. Users must select "MacOS X" on the "cran.r-project.org"
# website, then select "tools" to install gfortran.
rm(list = ls())
setwd("D:\\Users\\Administrator\\Desktop\\RStudio\\数量生态学\\DATA")
library(MASS)
library(ellipse)
library(FactoMineR)
# Additional functions
source("evplot.R")
source("hcoplot.R")
# Import the CSV data files
spe <- read.csv("DoubsSpe.csv", row.names=1)
env <- read.csv("DoubsEnv.csv", row.names=1)
spa <- read.csv("DoubsSpa.csv", row.names=1)
# Remove site 8, which has no data
spe <- spe[-8, ]
env <- env[-8, ]
spa <- spa[-8, ]
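# Extract the environmental variable das (distance from the source) for later use
das <- env[, 1]
# Remove the das variable from the environmental matrix
env <- env[, -1]
# Convert the slope variable (pen) into a factor (qualitative) variable
pen2 <- rep("very_steep", nrow(env))
pen2[env$pen <= quantile(env$pen)[4]] <- "steep"
pen2[env$pen <= quantile(env$pen)[3]] <- "moderate"
pen2[env$pen <= quantile(env$pen)[2]] <- "low"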
pen2 <- factor(pen2, levels=c("low", "moderate", "steep", "very_steep"))
table(pen2)
# Create a data frame env2 containing the qualitative slope variable
env2 <- env
env2$pen <- pen2
# Split the explanatory variables into two subsets
# Subset of topographic variables (upstream-downstream gradient)
envtopo <- env[, c(1:3)]
names(envtopo)
# Subset of water chemistry variables
envchem <- env[, c(4:10)]
names(envchem)
# Hellinger transformation of the species data
spe.hel <- decostand(spe, "hellinger")
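> # Canonical correlation analysis (CCorA)
> # **********************
> # Data preparation (transform the variables so that their distributions are approximately symmetrical)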
> envchem2 <- envchem
> envchem2$pho <- log(envchem$pho)
> envchem2$nit <- sqrt(envchem$nit)
> envchem2$amm <- log1p(envchem$amm)
> envchem2$dbo <- log(envchem$dbo)
> envtopo2 <- envtopo
> envtopo2$alt <- log(envtopo$alt)
> envtopo2$pen <- log(envtopo$pen)
> envtopo2$deb <- sqrt(envtopo$deb)
> # CCorA (on standardised variables)
> chem.topo.CCorA <- CCorA(envchem2, envtopo2, stand.Y=TRUE, stand.X=TRUE, permutations=999)
> chem.topo.CCorA
Canonical Correlation Analysis
Call:
CCorA(Y = envchem2, X = envtopo2, stand.Y = TRUE, stand.X = TRUE, permutations = 999)
             Y X
Matrix Ranks 7 3
Pillai's trace: 1.821107
Significance of Pillai's trace:
from F-distribution: 1.1352e-06
based on permutations: 0.001
Permutation: free
Number of permutations: 999
                       CanAxis1 CanAxis2 CanAxis3
Canonical Correlations  0.92100  0.76372   0.6242
                     Y | X  X | Y
RDA R squares      0.49174 0.7821
adj. RDA R squares 0.43075 0.7095
> biplot(chem.topo.CCorA, plot.type="biplot")
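In the biplot, oxy is positively correlated with alt; the other variables can be read in the same way.
Co-inertia analysis (CoIA)
CoIA of two data matrices proceeds as follows:
- Compute the covariance matrix crossing the variables of the two data tables. The sum of its squared values is called the total co-inertia. Compute the eigenvalues and eigenvectors of this matrix (in practice, of its cross-product, since the matrix is generally rectangular); the eigenvalues represent a partition of the total co-inertia.
- Project the objects and variables of the two original matrices onto the co-inertia ordination plot, and judge the relationship between the two data sets from their projections.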
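The two steps above can be sketched by hand in base R. This is only an illustration on simulated tables (an assumption; X and Y are hypothetical), and it standardises the columns with `scale()`, which divides by n - 1, so the numbers need not match those of the ade4 functions used below.

```r
# Two simulated tables sharing part of their structure (hypothetical data)
set.seed(7)
n <- 29
X <- matrix(rnorm(n * 4), n, 4)
Y <- cbind(X[, 1:2] + matrix(rnorm(n * 2, sd = 0.5), n, 2), rnorm(n))

# Standardise the columns of both tables
Xs <- scale(X)
Ys <- scale(Y)

# Step 1: cross-covariance matrix between the variables of the two tables
C <- crossprod(Ys, Xs) / (n - 1)      # 3 x 4 matrix

# Total co-inertia = sum of the squared cross-covariances
total_coinertia <- sum(C^2)

# The eigenvalues are the squared singular values of C;
# they partition the total co-inertia
sv <- svd(C)
eig <- sv$d^2

# Step 2: project the sites of each table on the first two co-inertia axes
scores_X <- Xs %*% sv$v[, 1:2]
scores_Y <- Ys %*% sv$u[, 1:2]

# The covariance between the paired site scores on axis 1
# equals the first singular value (square root of the first eigenvalue)
cov_axis1 <- cov(scores_X[, 1], scores_Y[, 1])

print(eig / total_coinertia)          # share of co-inertia per axis
print(c(cov_axis1 = cov_axis1, sqrt_eig1 = sqrt(eig[1])))
```

The same relation is visible in the `summary()` output of the ade4 analysis below, where the `covar` value of axis 1 (2.6039391) is the square root of the first eigenvalue (6.78049894).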
> # Co-inertia analysis (CoIA)
> # *******************
> # PCA of each of the two environmental matrices
> dudi.chem <- dudi.pca(envchem2, scale=TRUE, scannf=FALSE, nf=3)
> dudi.topo <- dudi.pca(envtopo2, scale=TRUE, scannf=FALSE, nf=2)
> dudi.chem$eig/sum(dudi.chem$eig) # relative eigenvalue of each axis (water chemistry)
[1] 0.62085880 0.17514557 0.10173342 0.04784425 0.02301475 0.01791889 0.01348432
> dudi.topo$eig/sum(dudi.topo$eig) # relative eigenvalue of each axis (topography)
> all.equal(dudi.chem$lw,dudi.topo$lw) # are the row weights equal in the two analyses?
[1] TRUE
> # Co-inertia analysis
> coia.chem.topo <- coinertia(dudi.chem,dudi.topo, scannf=FALSE, nf=2)
> coia.chem.topo
Coinertia analysis
call: coinertia(dudiX = dudi.chem, dudiY = dudi.topo, scannf = FALSE,
nf = 2)
class: coinertia dudi
$rank (rank) : 3
$nf (axis saved) : 2
$RV (RV coeff) : 0.5537773
eigenvalues: 6.78 0.05645 0.01618
vector length mode content
1 $eig 3 numeric Eigenvalues
2 $lw 3 numeric Row weigths (for dudi.topo cols)
3 $cw 7 numeric Col weigths (for dudi.chem cols)
data.frame nrow ncol content
1 $tab 3 7 Crossed Table (CT): cols(dudi.topo) x cols(dudi.chem)
2 $li 3 2 CT row scores (cols of dudi.topo)
3 $l1 3 2 Principal components (loadings for dudi.topo cols)
4 $co 7 2 CT col scores (cols of dudi.chem)
5 $c1 7 2 Principal axes (loadings for dudi.chem cols)
6 $lX 29 2 Row scores (rows of dudi.chem)
7 $mX 29 2 Normed row scores (rows of dudi.chem)
8 $lY 29 2 Row scores (rows of dudi.topo)
9 $mY 29 2 Normed row scores (rows of dudi.topo)
10 $aX 3 2 Corr dudi.chem axes / coinertia axes
11 $aY 2 2 Corr dudi.topo axes / coinertia axes
CT rows = cols of dudi.topo (3) / CT cols = cols of dudi.chem (7)
> coia.chem.topo$eig[1]/sum(coia.chem.topo$eig) # proportion of the total co-inertia explained by the first eigenvalue
[1] 0.9894016
> summary(coia.chem.topo)
Coinertia analysis
Class: coinertia dudi
Call: coinertia(dudiX = dudi.chem, dudiY = dudi.topo, scannf = FALSE,
nf = 2)
Total inertia: 6.853
Eigenvalues:
Ax1 Ax2 Ax3
6.78050 0.05645 0.01618
Projected inertia (%):
Ax1 Ax2 Ax3
98.9402 0.8237 0.2361
Cumulative projected inertia (%):
Ax1 Ax1:2 Ax1:3
98.94 99.76 100.00
Eigenvalues decomposition:
         eig     covar       sdX       sdY      corr
1 6.78049894 2.6039391 1.9994990 1.6364535 0.7958037
2 0.05644986 0.2375918 0.8714496 0.5354616 0.5091676
Inertia & coinertia X (dudi.chem):
    inertia      max     ratio
1  3.997996 4.346012 0.9199230
12 4.757421 5.572031 0.8538037
Inertia & coinertia Y (dudi.topo):
    inertia      max     ratio
1  2.677980 2.681151 0.9988172
12 2.964699 2.967525 0.9990479
RV:
0.5537773
> randtest(coia.chem.topo, nrepet=999) # permutation test
Monte-Carlo test
Call: randtest.coinertia(xtest = coia.chem.topo, nrepet = 999)
Observation: 0.5537773
Based on 999 replicates
Simulated p-value: 0.001
Alternative hypothesis: greater
     Std.Obs  Expectation     Variance
11.211685860  0.059422647  0.001944175
> plot(coia.chem.topo)
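The RV coefficient reported above and the idea behind its permutation test can be sketched in base R. This is a hand-rolled illustration on simulated tables (an assumption; `rv_coef`, X and Y are hypothetical names), not the code that ade4 runs internally.

```r
# RV coefficient between two tables describing the same n objects:
# RV = tr(Wx Wy) / sqrt(tr(Wx Wx) * tr(Wy Wy)), with Wx = XX' and Wy = YY'
rv_coef <- function(X, Y) {
  X <- scale(X, center = TRUE, scale = FALSE)   # centre the columns
  Y <- scale(Y, center = TRUE, scale = FALSE)
  Wx <- tcrossprod(X)                           # n x n cross-product matrices
  Wy <- tcrossprod(Y)
  # Wx and Wy are symmetric, so tr(Wx %*% Wy) equals sum(Wx * Wy)
  sum(Wx * Wy) / sqrt(sum(Wx^2) * sum(Wy^2))
}

# Simulated tables sharing one gradient (hypothetical data)
set.seed(123)
n <- 29
X <- matrix(rnorm(n * 3), n, 3)
Y <- cbind(X[, 1] + rnorm(n, sd = 0.5), matrix(rnorm(n * 2), n, 2))

rv_obs <- rv_coef(X, Y)

# Permutation test: shuffle the rows of one table to break the pairing
nperm <- 999
rv_perm <- replicate(nperm, rv_coef(X, Y[sample(n), , drop = FALSE]))
p_value <- (sum(rv_perm >= rv_obs) + 1) / (nperm + 1)

print(c(RV = rv_obs, p = p_value))
```

With 999 permutations the smallest attainable p-value under this formula is 0.001, which is the value reported by `randtest()` above.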
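Multiple factor analysis (MFA)
> # Multiple factor analysis (MFA)
> # *******************
> # MFA on three groups of variables
> # Combine the three tables (Hellinger-transformed species data, topographic variables and water chemistry variables)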
> tab3 <- data.frame(spe.hel, envtopo, envchem)
> dim(tab3)
[1] 29 37
> (grn <- c(ncol(spe), ncol(envtopo), ncol(envchem)))
[1] 27 3 7
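> # Compute the MFA (produces several plots)
> # Group names used below: 鱼类群落 = fish community, 地形 = topography, 水质 = water chemistry
> # graphics.off() # close the previously opened graphics windows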
> t3.mfa <- MFA(tab3, group=grn, type=c("c","s","s"), ncp=2,
+ name.group=c("鱼类群落","地形","水质"))
> t3.mfa
**Results of the Multiple Factor Analysis (MFA)**
The analysis was performed on 29 individuals, described by 37 variables
*Results are available in the following objects :
name description
1 "$eig" "eigenvalues"
2 "$separate.analyses" "separate analyses for each group of variables"
3 "$group" "results for all the groups"
4 "$partial.axes" "results for the partial axes"
5 "$inertia.ratio" "inertia ratio"
6 "$ind" "results for the individuals"
7 "$quanti.var" "results for the quantitative variables"
8 "$summary.quanti" "summary for the quantitative variables"
9 "$global.pca" "results for the global PCA"
plot(t3.mfa, choix="ind", habillage="none")
plot(t3.mfa, choix="ind", habillage="none", partial="all")
plot(t3.mfa, choix="var", habillage="group")
plot(t3.mfa, choix="axes")
> # RV coefficients and their tests (in the final table, p-values replace the upper triangle of the RV matrix)
> (rvp <- t3.mfa$group$RV)
鱼类群落 地形 水质 MFA
鱼类群落 1.0000000 0.5802803 0.5053235 0.8604235
地形 0.5802803 1.0000000 0.3618737 0.8002752
水质 0.5053235 0.3618737 1.0000000 0.7672412
MFA 0.8604235 0.8002752 0.7672412 1.0000000
> rvp[1,2] <- coeffRV(spe.hel, scale(envtopo))$p.value
> rvp[1,3] <- coeffRV(spe.hel, scale(envchem))$p.value
> rvp[2,3] <- coeffRV(scale(envtopo), scale(envchem))$p.value
> round(rvp[-4,-4], 6)
鱼类群落 地形 水质
鱼类群落 1.000000 0.000002 0.000002
地形 0.580280 1.000000 0.002811
水质 0.505324 0.361874 1.000000
> # Eigenvalues and percentage of variance
> t3.mfa$eig
eigenvalue percentage of variance cumulative percentage of variance
comp 1 2.3406761135 47.703588747 47.70359
comp 2 0.7725455009 15.744678493 63.44827
comp 3 0.5134558271 10.464363469 73.91263
comp 4 0.3773033186 7.689539890 81.60217
comp 5 0.2006821321 4.089954113 85.69212
comp 6 0.1537343955 3.133147015 88.82527
comp 7 0.1336609567 2.724045105 91.54932
comp 8 0.1075494099 2.191884981 93.74120
comp 9 0.0568909440 1.159452254 94.90065
comp 10 0.0471241618 0.960402690 95.86106
comp 11 0.0411904958 0.839473032 96.70053
comp 12 0.0301940493 0.615362589 97.31589
comp 13 0.0278536386 0.567664408 97.88356
comp 14 0.0200600195 0.408828422 98.29239
comp 15 0.0189179899 0.385553561 98.67794
comp 16 0.0152012570 0.309805577 98.98774
comp 17 0.0119765023 0.244084236 99.23183
comp 18 0.0093145099 0.189832137 99.42166
comp 19 0.0068523708 0.139653101 99.56131
comp 20 0.0060259428 0.122810283 99.68412
comp 21 0.0054933843 0.111956602 99.79608
comp 22 0.0032748975 0.066743264 99.86282
comp 23 0.0027672695 0.056397672 99.91922
comp 24 0.0016028422 0.032666342 99.95189
comp 25 0.0011209475 0.022845203 99.97473
comp 26 0.0007359563 0.014998981 99.98973
comp 27 0.0003982184 0.008115794 99.99785
comp 28 0.0001055944 0.002152041 100.00000
> ev <- t3.mfa$eig[,1]
> names(ev) <- 1:nrow(t3.mfa$eig)
evplot(ev)
aa <- dimdesc(t3.mfa, axes=1:2, proba=0.0001)
# Ordination plot keeping only the variables most significantly correlated with the axes
varsig <- t3.mfa$quanti.var$cor[unique(c(rownames(aa$Dim.1$quanti),
rownames(aa$Dim.2$quanti))),]
plot(varsig[,1:2], asp=1, type="n", xlim=c(-1,1), ylim=c(-1,1))
abline(h=0, lty=3)
abline(v=0, lty=3)
symbols(0, 0, circles=1, inches=FALSE, add=TRUE)
arrows(0, 0, varsig[,1], varsig[,2], length=0.08, angle=20)
for (v in 1:nrow(varsig)) {
  if (abs(varsig[v,1]) > abs(varsig[v,2])) {
    if (varsig[v,1] >= 0) pos <- 4
    else pos <- 2
  }
  else {
    if (varsig[v,2] >= 0) pos <- 3
    else pos <- 1
  }
  text(varsig[v,1], varsig[v,2], labels=rownames(varsig)[v], pos=pos)
}