GSVA Analysis with Custom Gene Sets

WeChat official account: 研平方

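It has been a long time since I last ran data through R. A friend recently needed a GSVA analysis, so I took the chance to brush up on R, and I am sharing the workflow here.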

1. Introduction to GSVA

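GSVA stands for Gene Set Variation Analysis, a non-parametric, unsupervised method. Unlike GSEA, GSVA does not require the samples to be grouped in advance; it computes an enrichment score for each gene set in every sample. In other words, GSVA transforms the expression data from a matrix whose features are individual genes into a matrix whose features are gene sets. Because enrichment is quantified per sample, downstream statistical analysis becomes much more convenient. Just as the limma package can be run on an expression matrix to find genes that are differentially expressed between sample groups, running the same analysis on the GSVA output (which is still a matrix) finds gene sets that differ significantly between sample groups. Compared with single genes, these "differentially expressed" gene sets are easier to interpret biologically, and they can feed into biology-driven questions such as tumor subtype classification.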
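
To make the "gene matrix in, gene-set matrix out" idea concrete, here is a minimal sketch in base R. Note that it does not implement the GSVA algorithm: it swaps in a much simpler score (the mean per-gene z-score over the genes of each set) purely to show the change of shape. The gene names, sample names, set names, and values are all made up for illustration.

```r
# Toy expression matrix: 6 genes (rows) x 4 samples (columns); values are invented.
set.seed(1)
expr_toy <- matrix(rnorm(24), nrow = 6,
                   dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# Two hypothetical gene sets.
sets_toy <- list(setA = c("gene1", "gene2", "gene3"),
                 setB = c("gene4", "gene5", "gene6"))

# Standardise each gene across samples (scale() works on columns, hence the t()).
z <- t(scale(t(expr_toy)))

# Simplified per-sample score: mean z-score of the genes in each set.
# This is a stand-in, NOT the GSVA statistic.
score_toy <- t(sapply(sets_toy, function(genes) colMeans(z[genes, , drop = FALSE])))

dim(expr_toy)   # genes x samples
dim(score_toy)  # gene sets x samples
```

The point is only the layout of the output: one row per gene set and one column per sample, which is the kind of matrix that limma can then take as input.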

Figure: GSVA

I won't go into the theory behind GSVA here; there are plenty of resources online.

2. Preparing the data

2.1 Loading the required packages

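setwd(" ")                          # set your working directory (path left blank in the original)
rm(list = ls())                     # start from a clean workspace
options(stringsAsFactors = FALSE)
library(readr)                      # provides read_delim(), used in section 2.3
library(GSVA)
library(GSEABase)
library(msigdbr)
library(clusterProfiler)
library(org.Hs.eg.db)
library(enrichplot)
library(limma)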

2.2 Expression Data

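# Comma-separated file; the first column (read in as "X") holds the gene identifiers
exprSet <- read.table("exprSet.txt", header = TRUE, sep = ",")
rownames(exprSet) <- exprSet$X
exprSet <- exprSet[, -1]   # drop the identifier column, leaving only sample columns
str(exprSet)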
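
The code above relies on the gene identifiers sitting in a first column that R names X. Here is a small sketch of why that happens, assuming the file has an empty first header field (as a table exported with write.csv() would). The genes and values below are invented.

```r
# Invented stand-in for exprSet.txt: empty first header field, genes in column 1.
csv_text <- ',S1,S2,S3
TP53,5.1,4.8,6.0
CD4,7.2,7.9,6.5
CD8A,3.3,2.9,4.1'

expr_demo <- read.table(text = csv_text, header = TRUE, sep = ",")
colnames(expr_demo)            # the unnamed first column shows up as "X"

# Same steps as above: move the identifiers into row names, drop the column.
rownames(expr_demo) <- expr_demo$X
expr_demo <- expr_demo[, -1]

# The original passes as.matrix(exprSet) to gsva(), so check that the result
# is numeric and that the row names are unique.
expr_mat <- as.matrix(expr_demo)
stopifnot(is.numeric(expr_mat), !anyDuplicated(rownames(expr_mat)))
```

If the numeric check fails on real data, a stray text column is the usual suspect, and str(exprSet) will show which one.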

2.3 Custom gene sets

2.3.1 Version 1: hard on the eyes

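# Tab-delimited table with one column per gene set (read_delim() comes from the readr package)
pathway <- readr::read_delim("pathway.txt", delim = "\t",
                             escape_double = FALSE, trim_ws = TRUE)
pathway <- as.data.frame(pathway)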

if (TRUE) {
  T_cell_activation <- unique(na.omit(pathway$`T cell activation`))
  toll_like_receptor_signaling_pathway <- unique(na.omit(pathway$`toll-like receptor signaling pathway`))
  leukocyte_differentiation <- unique(na.omit(pathway$`leukocyte differentiation`))
  positive_regulation_of_cell_death <- unique(na.omit(pathway$`positive regulation of cell death`))
  neutrophil_activation <- unique(na.omit(pathway$`neutrophil activation`))
  positive_regulation_of_immune_response <- unique(na.omit(pathway$`positive regulation of immune response`))
}

pathway_list <- list(T_cell_activation,toll_like_receptor_signaling_pathway,leukocyte_differentiation,
                     positive_regulation_of_cell_death,neutrophil_activation,positive_regulation_of_immune_response)

names(pathway_list) <- c("T cell activation","toll-like receptor signaling pathway","leukocyte differentiation",
                         "positive regulation of cell death","neutrophil activation","positive regulation of immune response")

2.3.2 Version 2: a for loop

pathway_list <- vector("list",length(pathway))

for (i in seq_along(pathway)) {
  pathway_list[[i]] <- unique(na.omit(pathway[,i]))
}

names(pathway_list) <- c("T cell activation","toll-like receptor signaling pathway","leukocyte differentiation",
                         "positive regulation of cell death","neutrophil activation","positive regulation of immune response")

2.3.3 Version 3: lapply()

pathway_list <- lapply(pathway, function(x) {
  unique(na.omit(x)) 
})
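
For readers without the original pathway.txt, the following sketch shows what the lapply() version does to a table whose columns have unequal lengths and are padded with NA. The two columns and their gene symbols are invented for illustration; check.names = FALSE is only there to keep the spaces in the column names.

```r
# Invented stand-in for the pathway table: one column per gene set, NA-padded.
pathway_demo <- data.frame(
  "T cell activation"     = c("CD4", "CD8A", "CD4", NA),
  "neutrophil activation" = c("ELANE", "MPO", NA, NA),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# A data frame is a list of columns, so lapply() visits each gene set in turn
# and carries the column names over to the result.
pathway_list_demo <- lapply(pathway_demo, function(x) unique(na.omit(x)))

names(pathway_list_demo)     # taken from the column names, no manual names() call
lengths(pathway_list_demo)   # duplicates and NA padding are gone
```

Because the names come along automatically, this version needs no separate names() assignment, unlike the first two.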

I have to say, the apply() family really is a joy to use!

3. Hands-on

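# Classic interface from older GSVA releases; from GSVA 1.50 on, gsva() takes a parameter object built with gsvaParam() instead
gsva_matrix_BD <- gsva(as.matrix(exprSet), pathway_list, method = "gsva",
                       kcdf = "Gaussian", abs.ranking = TRUE)
# The result has one row per gene set and one column per sample
write.csv(gsva_matrix_BD, file = "gsva_matrix_BD.csv")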
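
A check worth running ahead of the gsva() call above is to count how many genes of each custom set are actually present in the expression matrix, since genes missing from the row names cannot contribute to the score. The helper below is a hypothetical base-R sketch (set_overlap is not part of the GSVA package), and the toy inputs are invented.

```r
# Hypothetical helper: per gene set, count genes found among the matrix row names.
set_overlap <- function(expr, gene_sets) {
  data.frame(
    gene_set  = names(gene_sets),
    n_genes   = lengths(gene_sets, use.names = FALSE),
    n_matched = vapply(gene_sets, function(g) sum(g %in% rownames(expr)),
                       integer(1), USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Invented toy inputs.
expr_chk <- matrix(0, nrow = 3, ncol = 2,
                   dimnames = list(c("CD4", "CD8A", "MPO"), c("S1", "S2")))
sets_chk <- list("T cell activation"     = c("CD4", "CD8A", "CD3E"),
                 "neutrophil activation" = c("ELANE", "MPO"))

overlap <- set_overlap(expr_chk, sets_chk)
overlap
```

With the real data the call would be set_overlap(as.matrix(exprSet), pathway_list). Sets with very few matched genes deserve a second look, for example for mismatched identifier types.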

4. Results

GSVA enrichment analysis results