技能树直播课程学习-WGCNA-2-网络构建和模块检测

1. 挑选软阈值

软阈值的要求：无尺度拓扑网络系数 R^2 高于 0.85，或者或平均连接度降到 100 以下

# Choose a set of soft-thresholding powers
powers = c(c(1:10), seq(from = 12, to=20, by=1))
# Call the network topology analysis function
sft = pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)
# Plot the results:
sizeGrWindow(9, 5)
png("step2-beta-value.png",width = 800,height = 600)
par(mfrow = c(1,2));
cex1 = 0.9 # 图里面字体大小
# Scale-free topology fit index as a function of the soft-thresholding power
plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n",
     main = paste("Scale independence"));
text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2],
     labels=powers,cex=cex1,col="red")+
  abline(h=0.70,col="red")# this line corresponds to using an R^2 cut-off of
# Mean connectivity as a function of the soft-thresholding power
plot(sft$fitIndices[,1], sft$fitIndices[,5],
     xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n",
     main = paste("Mean connectivity"))
text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")+abline(h=100,col="red")
dev.off()
sft$powerEstimate # 自动评估的软阈值

如果没有合适的阈值，可以使用 R 包作者提供的经验阈值，如下

2. 构建网络（auto）

net = blockwiseModules(datExpr, power = 7,
                       TOMType = "unsigned",
                       minModuleSize = 30,
                       reassignThreshold = 0,
                       mergeCutHeight = 0.25,
                       numericLabels = TRUE,
                       pamRespectsDendro = FALSE,
                       saveTOMs = FALSE,
                       saveTOMFileBase = "WGCNA TOM",
                       verbose = 3)
# 参数mergeCutHeight是模块合并的阈值。net$Colors包含模块赋值，net$MES包含模块的模块特征(module eigengenes of the modules)。

table(net$colors)
# 0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
# 10 1215 1113  511  334  325  272  192  190  174  171  160  127   74   72   60
# 共15个模块，按照大小从大到小的顺序标记为1到15。0表示没有聚到另外15个模块里
#用于模块标识的层次聚类树形图(树)在net$Dendrogram[[1]]；$中返回。

sizeGrWindow(12, 9)
# Convert labels to colors for plotting
mergedColors = labels2colors(net$colors)
# Plot the dendrogram and the module colors underneath
png("step2-genes-modules.png",width = 800,height = 500)
plotDendroAndColors(net$dendrograms[[1]], mergedColors[net$blockGenes[[1]]],
                    "Module colors",
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05)
dev.off()

最后保存数据

moduleLabels = net$colors
moduleColors = labels2colors(net$colors)
MEs = net$MEs
geneTree = net$dendrograms[[1]] #用于模块标识的层次聚类树形图(树)保存在geneTree里面
save(MEs, moduleLabels, moduleColors, geneTree,
     file = "WGCNA-02-networkConstruction-auto.RData")

3. 构建网络（step-by-step）

3.1. 生成邻接矩阵和 TOM 矩阵

## Co-expression similarity and adjacency
softPower = 7
# 若有最佳的power的估计值，则用“sft$powerEstimate”
adjacency = adjacency(datExpr, power = softPower)

## To minimize effects of noise and spurious associations (最小化噪声和伪关联的影响), we transform the adjacency into Topological Overlap Matrix, and calculate the corresponding dissimilarity (并计算相应的相异性):
# Turn adjacency into topological overlap
TOM = TOMsimilarity(adjacency);
dissTOM = 1-TOM

3.2. 聚类

# Call the hierarchical clustering function
geneTree = hclust(as.dist(dissTOM), method = "average")
# Plot the resulting clustering tree (dendrogram)
sizeGrWindow(12,9)
plot(geneTree, xlab="", sub="", main = "Gene clustering on TOM-based dissimilarity",
     labels = FALSE, hang = 0.04);

3.3. 识别模块

# 在聚类树（树状图）中，每一片叶子，即一条短的垂直线，对应一个基因。树状图的分支群紧密相连，高度共表达基因。模块识别相当于单个分支的识别（“从树状图上切下分支”）。有几种分支切割方法；我们的标准方法是从包Dynamic Tree Cut中切割动态树。下一段代码说明了它的用途。
# We like large modules, so we set the minimum module size relatively high:
minModuleSize = 30;
# Module identification using dynamic tree cut:
dynamicMods = cutreeDynamic(dendro = geneTree, distM = dissTOM,
                            deepSplit = 2, pamRespectsDendro = FALSE,
                            minClusterSize = minModuleSize);
table(dynamicMods)
# 0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22
# 1 471 419 240 227 216 193 190 185 174 167 151 151 151 131 116 110 108 103  98  96  93  89
# 23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
# 88  87  80  74  73  73  72  66  65  63  60  60  54  48  47  39  37  34

# 函数返回40个模块，标记为1–40从大到小。标签0是为未分配的基因保留的。上面的命令列出了模块的大小。我们现在在基因树状图下绘制模块分配

# Convert numeric lables into colors
dynamicColors = labels2colors(dynamicMods)
table(dynamicColors)
# Plot the dendrogram and colors underneath
sizeGrWindow(8,6)
plotDendroAndColors(geneTree, dynamicColors, "Dynamic Tree Cut",
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05,
                    main = "Gene dendrogram and module colors")

3.4. 合并共表达模块

# 动态树切割可以识别其表达谱非常相似的模块。我们应当合并高度共表达的模块。为了量化整个模块的共表达相似性，我们计算它们的特征基因并根据它们的相关性对它们进行聚类：
# Calculate eigengenes
MEList = moduleEigengenes(datExpr, colors = dynamicColors)
MEs = MEList$eigengenes
# Calculate dissimilarity of module eigengenes
MEDiss = 1-cor(MEs);
# Cluster module eigengenes
METree = hclust(as.dist(MEDiss), method = "average");
# Plot the result
sizeGrWindow(7, 6)
plot(METree, main = "Clustering of module eigengenes",
     xlab = "", sub = "")

# 我们选择0.25的高度切割，相当于对相互之间相关性为0.75进行合并
MEDissThres = 0.25
# Plot the cut line into the dendrogram
abline(h=MEDissThres, col = "red")
# Call an automatic merging function
merge = mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)
# The merged module colors
mergedColors = merge$colors;

length(table(mergedColors))
# Eigengenes of the new merged modules:
mergedMEs = merge$newMEs;

sizeGrWindow(12, 9)
#pdf(file = "Plots/geneDendro-3.pdf", wi = 9, he = 6)
plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors),
                    c("Dynamic Tree Cut", "Merged dynamic"),
                    dendroLabels = FALSE, hang = 0.03,
                    addGuide = TRUE, guideHang = 0.05)
#dev.off()

3.5. 保存数据

# Rename to moduleColors
moduleColors = mergedColors
# Construct numerical labels corresponding to the colors
colorOrder = c("grey", standardColors(50));
moduleLabels = match(moduleColors, colorOrder)-1;
MEs = mergedMEs;
# Save module colors and labels for use in subsequent parts
save(MEs, moduleLabels, moduleColors, geneTree, file = "WGCNA-02-networkConstruction-stepByStep.RData")

友情宣传

全国巡讲全球听（买一得五），第二期，你的生物信息学入门课
生信技能树的2019年终总结，你的生物信息学成长宝藏
2020学习主旋律，B站74小时免费教学视频为你领路

技能树直播课程学习-WGCNA-2-网络构建和模块检测

1. 挑选软阈值

2. 构建网络（auto）

3. 构建网络（step-by-step）

3.1. 生成邻接矩阵和 TOM 矩阵

3.2. 聚类

3.3. 识别模块

3.4. 合并共表达模块

3.5. 保存数据

友情宣传

你可能感兴趣的:(技能树直播课程学习-WGCNA-2-网络构建和模块检测)