[R >> DepMap] Expression of a single gene across cancer cell lines

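CCLE is the encyclopedia of gene expression in cancer cell lines; I briefly covered its usage in an earlier post, "CCLE: correlation for any gene". When I opened CCLE again today, the page just kept spinning. Has it shut down?
So let's explore another site, DepMap, whose main work is using RNAi and CRISPR-Cas9 screens to assess how essential each gene is to cancer cells (i.e., the Achilles project).
DepMap also collects data from many other databases, and it is organized quite well: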


The Download module offers three options: Top Downloads, Custom Downloads and All Downloads.

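You will find that the CCLE and CRISPR datasets are the most downloaded.
We can also look up here how a given gene is expressed across cell lines overall: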

1. Data download

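Taking ALKBH5 as an example: type "ALKBH5" into the search box, select the Characterization option on the results page, and download the gene's expression matrix from the top right corner: ALKBH5 Expression 21Q2 Public.csv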
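Before plotting, it can help to confirm that the downloaded table has the columns the plotting code relies on. The following is a minimal base R sketch, assuming the column names `Cell Line Name`, `Primary Disease` and `Expression 21Q2 Public` used in the code in this post. The helper name `check_depmap_columns` is hypothetical, and the toy rows are made-up values standing in for the CSV, not real DepMap data.

```r
# Columns the plots in this post rely on (taken from the plotting code).
required_cols <- c("Cell Line Name", "Primary Disease", "Expression 21Q2 Public")

# Hypothetical helper: stop with a clear message if a required column is absent.
check_depmap_columns <- function(df, required = required_cols) {
  absent <- setdiff(required, colnames(df))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy rows (made-up values) standing in for the downloaded CSV.
toy <- data.frame(
  `Cell Line Name` = c("A", "B", "C"),
  `Primary Disease` = c("Kidney Cancer", "Kidney Cancer", "Lung Cancer"),
  `Expression 21Q2 Public` = c(5.1, 6.3, 4.8),
  check.names = FALSE  # keep the spaces in the column names
)

check_depmap_columns(toy)

# Number of cell lines per disease, a quick look at group sizes.
n_per_disease <- table(toy[["Primary Disease"]])
print(n_per_disease)
```

With the real file, the same check could be run on the data frame returned by `data.table::fread()`.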


2. Pan-cancer boxplot

rm(list = ls())  # clear the workspace (optional)
library(tidyverse)
library(ggpubr)
# Read the downloaded CSV as a plain data.frame
rt <- data.table::fread("ALKBH5 Expression 21Q2 Public.csv", data.table = FALSE)
ggplot(rt, aes(x = reorder(`Primary Disease`, `Expression 21Q2 Public`, FUN = median), # order diseases by median expression
               y = `Expression 21Q2 Public`, color = `Primary Disease`)) +
  geom_boxplot() +
  geom_jitter(width = 0.15) +
  geom_hline(yintercept = mean(rt$`Expression 21Q2 Public`), linetype = 2) + # dashed line at the overall mean
  theme_classic(base_size = 12) +
  rotate_x_text(45) +
  labs(x = "", y = "ALKBH5 expression \nLog2(TPM+1)") +
  theme(legend.position = "none") +
  stat_summary(fun.data = "mean_sd", geom = "errorbar", width = 0.3, position = position_dodge(0.9)) + # mean +/- SD error bars
  stat_compare_means(method = "anova", label.x = 3, label.y = 7) # one-way ANOVA across diseases
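The plot relies on two computations: `reorder(..., FUN = median)` sorts the diseases by median expression, and `stat_compare_means(method = "anova")` reports a one-way ANOVA p-value. As a rough sketch, the same two numbers can be obtained outside the plot with base R. The toy data frame below uses made-up values, and the `aov()` call is my assumption of an equivalent test, so the p-value should correspond to the one ggpubr prints but the two are not guaranteed to be formatted identically.

```r
# Toy data (made-up values) with the same column names as the DepMap CSV.
toy <- data.frame(
  `Primary Disease` = rep(c("Kidney Cancer", "Lung Cancer", "Leukemia"), each = 3),
  `Expression 21Q2 Public` = c(5.0, 5.5, 6.0, 4.0, 4.2, 4.4, 6.5, 7.0, 7.5),
  check.names = FALSE
)

# Median expression per disease, sorted ascending:
# this is the left-to-right order that reorder(..., FUN = median) gives the x axis.
med <- tapply(toy[["Expression 21Q2 Public"]], toy[["Primary Disease"]], median)
disease_order <- names(sort(med))
print(disease_order)

# One-way ANOVA of expression by disease.
anova_df <- data.frame(
  expr = toy[["Expression 21Q2 Public"]],
  disease = factor(toy[["Primary Disease"]])
)
fit <- aov(expr ~ disease, data = anova_df)
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
print(p_value)
```

A table of medians like `med` is also a convenient way to report which cancer types sit at the high and low ends of the boxplot.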

3. Expression within a single cancer type

# Keep only the kidney cancer cell lines
rt1 <- rt[rt$`Primary Disease` == "Kidney Cancer", ]
ggplot(rt1, aes(x = reorder(`Cell Line Name`, `Expression 21Q2 Public`, FUN = median), # order cell lines by expression
                y = `Expression 21Q2 Public`)) +
  geom_segment(aes(y = mean(`Expression 21Q2 Public`), # each segment starts at the group mean
                   xend = `Cell Line Name`,
                   yend = `Expression 21Q2 Public`)) + # and ends at the cell line's own value
  geom_point(aes(size = `Expression 21Q2 Public`,
                 color = `Expression 21Q2 Public`)) +
  geom_hline(yintercept = mean(rt1$`Expression 21Q2 Public`), linetype = 2) + # dashed line at the group mean
  theme_bw(base_size = 12) +
  labs(x = "", y = "ALKBH5 expression",
       color = "ALKBH5 expression",
       size = "ALKBH5 expression") +
  scale_color_viridis_c(alpha = 1, begin = 0.6, end = 0.9, direction = -1) +
  coord_flip() # horizontal lollipop chart
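The lollipop chart shows each cell line's distance from the mean of its cancer type. If a table of those values is wanted alongside the figure, something like the sketch below may do. The function name `deviation_from_mean` is hypothetical, the toy rows are made-up values, and the column names are assumed to match the DepMap CSV used above.

```r
# Toy data (made-up values) with the same column names as the DepMap CSV.
toy <- data.frame(
  `Cell Line Name` = c("K1", "K2", "K3", "L1"),
  `Primary Disease` = c("Kidney Cancer", "Kidney Cancer", "Kidney Cancer", "Lung Cancer"),
  `Expression 21Q2 Public` = c(5.0, 6.0, 7.0, 4.0),
  check.names = FALSE
)

# Hypothetical helper: per-cell-line deviation from the mean of one cancer type.
deviation_from_mean <- function(df, disease) {
  sub <- df[df[["Primary Disease"]] %in% disease, , drop = FALSE]
  if (nrow(sub) == 0) {
    stop("No cell lines found for: ", disease)
  }
  expr <- sub[["Expression 21Q2 Public"]]
  out <- data.frame(
    cell_line = sub[["Cell Line Name"]],
    expression = expr,
    deviation = expr - mean(expr)  # length of the lollipop stick
  )
  # Highest expression first, matching the top of the flipped chart.
  out[order(out$expression, decreasing = TRUE), ]
}

kidney <- deviation_from_mean(toy, "Kidney Cancer")
print(kidney)
```

Stopping on an empty subset guards against a misspelled disease name, which would otherwise silently produce an empty plot.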

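Side note: among the four modules returned by the search, the essentiality module (Perturbation effects) shows how removing the gene affects cell survival. The more negative the CERES score, the greater the effect of knocking out the gene on cell survival/proliferation.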

For example, here is the screening criterion used in the paper listed in the references:


Candidate genes were defined as essential genes with a CERES score of < −1 across 75% of LUAD cell lines (n = 31).

function(x) {sum(x < -1) > 0.75 * ncol(ceres.luad)}
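The one-line function above is presumably meant to be applied to each gene in turn. Here is a minimal sketch of how that might look, assuming `ceres.luad` is a numeric matrix with genes in rows and LUAD cell lines in columns (inferred from the use of `ncol()`, not confirmed by the paper). The gene names, cell line names and scores are made up for illustration.

```r
# Toy CERES matrix (made-up values): genes in rows, cell lines in columns.
ceres.luad <- matrix(
  c(-1.5, -1.2, -1.1, -1.3,   # GENE_A: below -1 in 4 of 4 lines
    -1.5, -0.2, -1.1, -1.3,   # GENE_B: below -1 in 3 of 4 lines (exactly 75%)
     0.1, -0.3,  0.0, -0.2),  # GENE_C: below -1 in 0 of 4 lines
  nrow = 3, byrow = TRUE,
  dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), paste0("LINE_", 1:4))
)

# Apply the criterion row by row: CERES < -1 in more than 75% of the cell lines.
is_essential <- apply(ceres.luad, 1, function(x) {
  sum(x < -1) > 0.75 * ncol(ceres.luad)
})

candidate_genes <- names(is_essential)[is_essential]
print(candidate_genes)
```

Note that the strict `>` excludes a gene that passes in exactly 75% of the lines, as GENE_B does here. Real CERES tables may also contain missing values, in which case `sum(x < -1, na.rm = TRUE)` would be one way to handle them.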

References:
How to check a single gene's expression in cancer cell lines? CCLE tells you

CCLE: correlation for any gene

2021-03-15 DepMap database

Cancer Essential Genes Stratified Lung Adenocarcinoma Patients with Distinct Survival Outcomes and Identified a Subgroup from the Terminal Respiratory Unit Type with Different Proliferative Signatures in Multiple Cohorts
