GSVA Analysis with Custom Gene Sets

WeChat official account: 研平方

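It has been a long time since I last ran any data through R. A friend recently needed a GSVA run, which gave me a chance to brush up on R, so here is what I did.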

1. Introduction to GSVA

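GSVA stands for Gene Set Variation Analysis, a non-parametric, unsupervised method. Unlike GSEA, GSVA does not require the samples to be grouped in advance; it computes an enrichment score for each gene set in each individual sample. In other words, GSVA transforms the expression data from a matrix whose features are single genes into a matrix whose features are gene sets. Because the enrichment results are quantified per sample, downstream statistical analysis becomes much easier. Just as the limma package can be run on an expression matrix to find genes that are differentially expressed between samples, running the same limma analysis on the GSVA output (which is still a matrix) finds gene sets that differ significantly between samples. Compared with single genes, these "differentially expressed" gene sets tend to be more biologically meaningful and easier to interpret, and they can feed into further biology-driven work such as tumor subtype classification.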
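To make the "genes by samples" to "gene sets by samples" idea concrete, here is a minimal base R sketch. Everything in it is made up for illustration: the toy matrix, the gene names and the two gene sets are hypothetical, and the score used (the mean z-score of the genes in a set) is a deliberately simple stand-in, not the GSVA statistic itself. It only shows how the shape of the data changes.

```r
# Toy expression matrix: 6 genes (rows) x 4 samples (columns); values are random.
set.seed(1)
expr <- matrix(rnorm(24, mean = 8, sd = 2), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("sample", 1:4)))

# Two hypothetical gene sets.
gene_sets <- list(setA = c("gene1", "gene2", "gene3"),
                  setB = c("gene4", "gene5", "gene6"))

# Standardise each gene across samples, then average the z-scores per gene set.
# NOTE: a simplified stand-in score, NOT the GSVA algorithm.
z <- t(scale(t(expr)))
set_scores <- t(sapply(gene_sets, function(genes) {
  colMeans(z[genes, , drop = FALSE])
}))

dim(expr)        # genes x samples
dim(set_scores)  # gene sets x samples
```

The result has one row per gene set and one column per sample, which is the same layout that gsva() returns and the reason the output can be passed to matrix-based tools such as limma.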


[Figure: GSVA]

I won't go into the principles behind GSVA here; there are plenty of resources online.

2. Preparing the data

2.1 Load the required packages

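setwd(" ")                         # set this to your own working directory
rm(list = ls())                    # clear the workspace
options(stringsAsFactors = FALSE)
library(readr)                     # provides read_delim(), used in section 2.3
library(GSVA)
library(GSEABase)
library(msigdbr)
library(clusterProfiler)
library(org.Hs.eg.db)
library(enrichplot)
library(limma)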
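If any of the library() calls above fails, it can help to see which packages are missing before going further. The sketch below is one way to check; the helper name missing_pkgs is my own, and readr is included on the assumption that it is where read_delim() (used later) comes from. The install commands are left as comments because how you install depends on your setup.

```r
# Packages used in this post (readr is an assumption, see the note above).
pkgs <- c("GSVA", "GSEABase", "msigdbr", "clusterProfiler",
          "org.Hs.eg.db", "enrichplot", "limma", "readr")

# Hypothetical helper: return the packages that cannot be loaded.
missing_pkgs <- function(p) {
  p[!vapply(p, requireNamespace, logical(1), quietly = TRUE)]
}

to_install <- missing_pkgs(pkgs)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # Most of these live on Bioconductor, so one common route is:
  # install.packages("BiocManager")
  # BiocManager::install(to_install)
}
```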

2.2 Expression Data

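# exprSet.txt is comma-separated; its first column (read in as "X") holds the gene names
exprSet <- read.table("exprSet.txt", header = TRUE, sep = ",")
rownames(exprSet) <- exprSet$X     # use the gene names as row names
exprSet <- exprSet[, -1]           # drop the gene-name column, keep only the samples
str(exprSet)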
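In case the exprSet$X step looks mysterious: when a matrix is saved with write.csv(), the row names go into a first column with an empty header, and reading it back gives that column the name "X". The sketch below reproduces this with a tiny made-up matrix written to a temporary file (the gene and sample names are only for illustration), and also shows read.csv(row.names = 1) as a shorter alternative.

```r
# Tiny made-up expression matrix, written out the way write.csv() would do it.
m <- matrix(c(5.1, 6.2, 7.3, 8.4, 2.0, 3.5), nrow = 3, byrow = TRUE,
            dimnames = list(c("TP53", "CD3E", "TLR4"), c("S1", "S2")))
tmp <- tempfile(fileext = ".txt")
write.csv(m, tmp)

# Same steps as in the post: the unnamed first column is read in as "X".
raw <- read.table(tmp, header = TRUE, sep = ",")
colnames(raw)
expr_a <- raw
rownames(expr_a) <- expr_a$X
expr_a <- expr_a[, -1]

# Alternative: let read.csv() take the row names from the first column directly.
expr_b <- read.csv(tmp, row.names = 1)

all.equal(as.matrix(expr_a), as.matrix(expr_b))
```

Either way, it is worth confirming with str() that every remaining column is numeric before handing the matrix to gsva().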

2.3 Custom gene sets

2.3.1 Version 1: painful to look at

# read_delim() comes from the readr package; each column of pathway.txt is one gene set
pathway <- read_delim("pathway.txt", "\t", 
                      escape_double = FALSE, trim_ws = TRUE)
pathway <- as.data.frame(pathway)

# One vector per column: drop NA entries and duplicated genes
if (TRUE) {
  T_cell_activation <- unique(na.omit(pathway$`T cell activation`))
  toll_like_receptor_signaling_pathway <- unique(na.omit(pathway$`toll-like receptor signaling pathway`))
  leukocyte_differentiation <- unique(na.omit(pathway$`leukocyte differentiation`))
  positive_regulation_of_cell_death <- unique(na.omit(pathway$`positive regulation of cell death`))
  neutrophil_activation <- unique(na.omit(pathway$`neutrophil activation`))
  positive_regulation_of_immune_response <- unique(na.omit(pathway$`positive regulation of immune response`))
}

pathway_list <- list(T_cell_activation,toll_like_receptor_signaling_pathway,leukocyte_differentiation,
                     positive_regulation_of_cell_death,neutrophil_activation,positive_regulation_of_immune_response)

names(pathway_list) <- c("T cell activation","toll-like receptor signaling pathway","leukocyte differentiation",
                         "positive regulation of cell death","neutrophil activation","positive regulation of immune response")

2.3.2 Version 2: a for loop

pathway_list <- vector("list",length(pathway))

for (i in seq_along(pathway)) {
  pathway_list[[i]] <- unique(na.omit(pathway[,i]))
}

names(pathway_list) <- c("T cell activation","toll-like receptor signaling pathway","leukocyte differentiation",
                         "positive regulation of cell death","neutrophil activation","positive regulation of immune response")

2.3.3 Version 3: lapply()

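# lapply() over a data frame walks through its columns and keeps the column
# names, so the list names do not need to be typed in by hand
pathway_list <- lapply(pathway, function(x) {
  unique(na.omit(x)) 
})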
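To check that the loop and lapply() approaches really give the same thing, here is a small self-contained sketch. The pathway table below is a made-up stand-in for pathway.txt (two columns of unequal length, padded with NA, with one duplicated gene); the gene memberships are illustrative only. In the loop version the names are taken from colnames() rather than typed by hand, which is my own shortcut and not what the post does.

```r
# Made-up stand-in for the pathway table: columns are gene sets, NA is padding.
pathway <- data.frame(
  "T cell activation"     = c("CD3E", "CD4", "CD8A", "CD4"),
  "neutrophil activation" = c("ELANE", "MPO", NA, NA),
  check.names = FALSE, stringsAsFactors = FALSE
)

# for-loop version, naming the list from the column names.
list_loop <- vector("list", length(pathway))
for (i in seq_along(pathway)) {
  list_loop[[i]] <- unique(na.omit(pathway[, i]))
}
names(list_loop) <- colnames(pathway)

# lapply() version: names come along automatically.
list_lapply <- lapply(pathway, function(x) unique(na.omit(x)))

identical(list_loop, list_lapply)
str(list_lapply)
```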

I have to say, the apply() family really is a joy to use!
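One thing that may be worth doing before running GSVA is checking how many genes of each set actually appear in the row names of the expression matrix, since a mismatch in ID type (symbols versus Ensembl IDs, for example) leaves a gene set with nothing to score. The sketch below uses hypothetical stand-ins for exprSet and pathway_list; the third set is deliberately given an Ensembl-style ID to mimic such a mismatch.

```r
# Hypothetical stand-in for the expression matrix (only the row names matter here).
expr <- matrix(0, nrow = 4, ncol = 2,
               dimnames = list(c("CD3E", "CD4", "TLR4", "MYD88"), c("S1", "S2")))

# Hypothetical stand-in for pathway_list.
gene_sets <- list(
  "T cell activation"                    = c("CD3E", "CD4", "CD8A"),
  "toll-like receptor signaling pathway" = c("TLR4", "MYD88"),
  "unmatched set"                        = c("ENSG00000141510")
)

# Count, per gene set, how many genes are present in the expression matrix.
overlap <- data.frame(
  n_genes = lengths(gene_sets),
  n_found = vapply(gene_sets, function(g) sum(g %in% rownames(expr)), integer(1))
)
overlap$fraction <- overlap$n_found / overlap$n_genes
print(overlap)

# Keep only the sets that have at least one gene in the matrix.
kept <- gene_sets[overlap$n_found > 0]
names(kept)
```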

3. Hands-on

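# kcdf = 'Gaussian' suits continuous (e.g. log-scale) expression values; use 'Poisson' for integer counts.
# This is the classic call; GSVA >= 1.50 moved to parameter objects, e.g. gsva(gsvaParam(expr, sets, kcdf = "Gaussian", absRanking = TRUE)), so check your installed version.
gsva_matrix_BD <- gsva(as.matrix(exprSet), pathway_list,method='gsva',
                    kcdf='Gaussian',abs.ranking=TRUE)
# Rows are gene sets, columns are samples
write.csv(gsva_matrix_BD,file = "gsva_matrix_BD.csv")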
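As mentioned in the introduction, the GSVA score matrix can be handed to limma to look for gene sets that differ between groups. The following is a rough sketch of that step, not the analysis from this post: the scores matrix is random data standing in for gsva_matrix_BD, the control/case grouping is hypothetical, and a shift is planted in one set so that there is something to find.

```r
library(limma)

set.seed(42)
# Stand-in for gsva_matrix_BD: 6 gene sets x 8 samples of made-up scores.
scores <- matrix(rnorm(48, sd = 0.3), nrow = 6,
                 dimnames = list(paste0("set", 1:6), paste0("s", 1:8)))

# Hypothetical grouping: first 4 samples are controls, last 4 are cases.
group <- factor(rep(c("control", "case"), each = 4),
                levels = c("control", "case"))

# Plant a difference in set1 so the example has a signal.
scores["set1", group == "case"] <- scores["set1", group == "case"] + 1

# Standard limma workflow: design matrix, linear fit, moderated statistics.
design <- model.matrix(~ group)
fit <- eBayes(lmFit(scores, design))

# One row per gene set, sorted by evidence for a case-vs-control difference.
res <- topTable(fit, coef = "groupcase", number = Inf)
head(res)
```

With real data, replace scores with gsva_matrix_BD and build group from your own sample annotation, making sure its order matches the columns of the matrix.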

4. Results

[Figure: GSVA enrichment analysis results]
