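Finding differential gene sets with GSVA (gene set variation analysis) and limma
Gene set variation analysis (GSVA) is a non-parametric, unsupervised method for estimating gene set enrichment from microarray and RNA-seq data, and it is also suited to differential gene set analysis in single-cell RNA-seq data [1] [2]. It converts a gene-by-sample expression matrix into a gene-set-by-sample score matrix, which lets you assess whether a given metabolic or signaling pathway is enriched in some samples relative to others. Put simply, it is a way to study how gene sets of interest differ between samples, or to find the gene sets that matter most.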
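To make the matrix transformation concrete, here is a minimal base-R sketch of the combined z-score idea, which is the simpler scoring scheme selected later with method = "zscore". It is not GSVA's default kernel-based statistic. The toy matrix, the gene names, and the two gene sets are all made up for illustration.

```r
# Toy expression matrix: 6 genes (rows) x 4 samples (columns); values are made up.
set.seed(1)
expr_toy <- matrix(rnorm(24), nrow = 6,
                   dimnames = list(paste0("gene", 1:6), paste0("s", 1:4)))

# Two hypothetical gene sets, stored as a named list of gene identifiers.
gene_sets <- list(setA = c("gene1", "gene2", "gene3"),
                  setB = c("gene4", "gene5", "gene6"))

# Standardize each gene across samples (scale() works on columns, hence the t()).
z <- t(scale(t(expr_toy)))

# Combined z-score per set and sample: sum of member z-scores / sqrt(set size).
score_toy <- t(sapply(gene_sets, function(genes) {
  genes <- intersect(genes, rownames(z))
  colSums(z[genes, , drop = FALSE]) / sqrt(length(genes))
}))

dim(score_toy)  # gene sets in rows, samples in columns
```

The result has one row per gene set and one column per sample, which is the shape that the downstream limma step expects.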
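Computing GSVA scores
The data come from a single-cell RNA-seq Seurat object named object. First load the msigdbr and GSVA packages. msigdbr is used to fetch the KEGG gene sets from MSigDB (Molecular Signatures Database). MSigDB groups its gene sets into nine major collections (H and C1 through C8), which can be downloaded directly or loaded through R packages.
The gsva function computes the GSVA enrichment scores. parallel.sz sets the number of parallel processes. It is set to 1 here because, in the author's experience, the default of 0 or any value above 1 actually ran slower, which makes the option rather pointless for this use; a value of 1 was much faster. The input is the log-transformed expression matrix stored in object, so kcdf is set to "Gaussian"; for a raw counts matrix, use kcdf = "Poisson" instead.
The method argument can be set to "gsva" or "zscore", among others. Single-cell datasets are large, typically tens of thousands of cells, and method = "zscore" is much faster. Note that kcdf only affects method = "gsva", so it has no effect on the z-score call below.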
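library(msigdbr)
library(GSVA)

## Expression matrix: log-normalized data from the Seurat object
expr <- as.matrix(object@assays$RNA@data)

## Pathway gene sets: KEGG, from the MSigDB C2 collection.
## msigdbr 7.x names this sub-collection "CP:KEGG"; the name may differ in other releases.
msgdC2 <- msigdbr(species = "Homo sapiens", category = "C2", subcategory = "CP:KEGG")
## Named list: one character vector of gene symbols per pathway
keggSet <- split(x = msgdC2$gene_symbol, f = msgdC2$gs_description)

## Classic gsva() interface as used in the original post;
## newer GSVA releases replace these arguments with parameter objects.
kegg <- gsva(expr, gset.idx.list = keggSet, kcdf = "Gaussian", method = "zscore",
             parallel.sz = 1, parallel.type = "snow")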
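Before scoring, it can help to check how many genes of each set are actually present in the expression matrix, since very small sets give noisy scores. The sketch below uses only base R; the table, the gene list, and the min_size cutoff are hypothetical stand-ins, not values from the original analysis.

```r
# Hypothetical long-format table with the same two columns used for the KEGG sets.
gs_tbl <- data.frame(
  gs_description = c("Pathway A", "Pathway A", "Pathway A", "Pathway B", "Pathway B"),
  gene_symbol    = c("TP53", "MDM2", "CDKN1A", "IL6", "STAT3"),
  stringsAsFactors = FALSE
)

# One character vector of gene symbols per pathway.
gs_list <- split(gs_tbl$gene_symbol, gs_tbl$gs_description)

# Genes present in a made-up expression matrix (normally rownames(expr)).
expr_genes <- c("TP53", "MDM2", "IL6", "GAPDH")

# Keep only the genes that were measured, then drop sets that became too small.
gs_list <- lapply(gs_list, intersect, expr_genes)
min_size <- 2  # assumed cutoff, chosen only for this example
gs_kept <- gs_list[lengths(gs_list) >= min_size]

names(gs_kept)
```

The filtered list has the same structure as keggSet and could be passed to the scoring step in its place.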
Linear model
Use the limma package to identify differential gene sets.
library(limma)

## Differential pathway activity on GSVA scores with limma.
## exprSet: gene set x sample score matrix
## meta:    group label of each sample, in the same order as the columns of exprSet
## compare: contrast string such as "Asthma-Control" (required when there are two groups)
de_gsva <- function(exprSet, meta, compare = NULL) {
  allDiff <- list()
  group <- factor(meta)
  design <- model.matrix(~ 0 + group)
  colnames(design) <- levels(group)
  rownames(design) <- colnames(exprSet)
  fit <- lmFit(exprSet, design)
  if (nlevels(group) == 2) {
    if (is.null(compare)) {
      stop("There are 2 groups; please set the compare argument")
    }
    contrast.matrix <- makeContrasts(contrasts = compare, levels = design)
    fit2 <- contrasts.fit(fit, contrast.matrix)
    fit2 <- eBayes(fit2)
    tempOutput <- topTable(fit2, adjust = "fdr", coef = 1, number = Inf)
    allDiff[[compare]] <- na.omit(tempOutput)
  } else if (nlevels(group) > 2) {
    ## More than two groups: compare each group with the mean of all the others
    for (g in colnames(design)) {
      others <- setdiff(colnames(design), g)
      ## e.g. "AVsOthers = A-(B+C)/2"
      fm <- paste0(g, "VsOthers = ", g, "-(", paste(others, collapse = "+"), ")/", length(others))
      contrast.matrix <- makeContrasts(contrasts = fm, levels = design)
      fit2 <- contrasts.fit(fit, contrast.matrix)
      fit2 <- eBayes(fit2)
      allDiff[[g]] <- topTable(fit2, adjust = "fdr", coef = 1, number = Inf)
    }
  } else {
    stop("Only one group found; at least two are required")
  }
  return(allDiff)
}
meta <- object@meta.data[, "Group"]
Diff <- de_gsva(exprSet = kegg, meta = meta, compare = "Asthma-Control")
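When there are more than two groups, the function above builds a "one group versus the mean of the others" formula for each group. The base-R sketch below shows what such a formula means as contrast weights, by evaluating it with each group replaced by its indicator vector. This is conceptually how a contrast string maps onto design columns. The group labels A, B, and C are hypothetical.

```r
groups <- c("A", "B", "C")  # hypothetical group labels

# Build one "A-(B+C)/2"-style formula per group.
one_vs_rest <- sapply(groups, function(g) {
  others <- setdiff(groups, g)
  paste0(g, "-(", paste(others, collapse = "+"), ")/", length(others))
})

# Represent each group by its indicator vector: A = (1,0,0), B = (0,1,0), C = (0,0,1).
indicators <- as.list(as.data.frame(diag(length(groups))))
names(indicators) <- groups

# Evaluating a formula on the indicator vectors yields its contrast weights.
weights <- sapply(one_vs_rest, function(f) eval(parse(text = f), indicators))
rownames(weights) <- groups

one_vs_rest
weights
```

Each column of weights sums to zero, and the group of interest gets weight 1 while the remaining groups share weight -1 equally, so the resulting t statistic tests that group against the average of the rest.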
Visualization with ggplot2
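library(ggplot2)

## Plot settings. The original post does not define these values,
## so everything in this block is an assumption made to keep the code runnable.
padj.cutoff <- 0.05              # assumed significance threshold on adj.P.Val
compare <- "Asthma-Control"
type <- "KEGG"
title <- "GSVA"
tex.size <- 3
legend.position <- "right"

idiff <- Diff[[compare]]
df <- data.frame(ID = rownames(idiff), score = idiff$t)
## Label each pathway as up, down, or not significant
df$group <- "noSig"
df$group[idiff$logFC > 0 & idiff$adj.P.Val < padj.cutoff] <- "up"
df$group[idiff$logFC < 0 & idiff$adj.P.Val < padj.cutoff] <- "down"
## Put each label next to the zero line, on the side opposite its bar
df$hjust <- ifelse(df$score > 0, 1, 0)
df$nudge_y <- ifelse(df$score > 0, -0.1, 0.1)
sortdf <- df[order(df$score), ]
sortdf$ID <- factor(sortdf$ID, levels = sortdf$ID)
limt <- max(abs(df$score))

ggplot(sortdf, aes(ID, score, fill = group)) +
  geom_bar(stat = "identity", alpha = 0.7) +
  scale_fill_manual(breaks = c("down", "noSig", "up"),
                    values = c(down = "#008020", noSig = "grey", up = "#08519C")) +
  geom_text(aes(label = ID, y = nudge_y, hjust = hjust), size = tex.size) +
  labs(x = paste0(type, " pathways"),
       y = paste0("t value of GSVA score\n", compare),
       title = title) +
  scale_y_continuous(limits = c(-limt, limt)) +
  coord_flip() +
  theme_bw() +  # white background
  theme(panel.grid = element_blank(),
        panel.border = element_rect(linewidth = 0.6),  # use size = 0.6 with ggplot2 < 3.4
        plot.title = element_text(hjust = 0.5, size = 18),
        axis.text.y = element_blank(),
        axis.title = element_text(hjust = 0.5, size = 18),
        axis.line = element_blank(),
        axis.ticks.y = element_blank(),
        legend.position = legend.position)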
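The result is shown in the figure below.
A GSVA heatmap is shown in the figure below. When several subtypes are compared, a heatmap can display the t values from the comparisons between subtype groups. The original post does not include heatmap code.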
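Since no heatmap code is given, here is one way such a plot could be assembled, as a minimal sketch only. It assumes a list shaped like the output of de_gsva() for more than two groups, with one table per subtype, pathways as row names, and a t column. The list here is simulated, and the subtype and pathway names are made up.

```r
# Simulated stand-in for the list returned by de_gsva(); all values are made up.
set.seed(42)
pathways <- paste0("pathway", 1:5)
subtypes <- c("SubtypeA", "SubtypeB", "SubtypeC")
diff_list <- lapply(subtypes, function(s) {
  tab <- data.frame(t = rnorm(length(pathways)), row.names = pathways)
  tab[sample(nrow(tab)), , drop = FALSE]  # tables may come back in different row orders
})
names(diff_list) <- subtypes

# Align rows by pathway name and bind the t columns side by side.
t_mat <- sapply(diff_list, function(tab) tab[pathways, "t"])
rownames(t_mat) <- pathways

# Base-R heatmap of the t values: low values in blue, high values in red.
heatmap(t_mat, scale = "none", margins = c(8, 8),
        col = colorRampPalette(c("#08519C", "white", "#B2182B"))(51))
```

Indexing every table by the same pathway vector matters because each result table is sorted by significance, so the row order differs between subtypes.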
[1] Lambrechts D, Wauters E, Boeckx B, et al. Phenotype molding of stromal cells in the lung tumor microenvironment[J]. Nature Medicine, 2018, 24(8): 1277-1289.
[2] Chen Y P, Yin J H, Li W F, et al. Single-cell transcriptomics reveals regulators underlying immune cell diversity and immune subtypes associated with prognosis in nasopharyngeal carcinoma[J]. Cell Research, 2020, 30(11): 1024-1042.