Analysis of single cell RNA-seq data 学习笔记(六)

这一节,我们介绍一下单细胞生物学分析的内容

PCA

#加载包
library(pcaMethods) 
library(pcaReduce) 
library(SC3) library(scater) 
library(SingleCellExperiment) 
library(pheatmap) 
library(mclust) 

#读取数据
deng <- readRDS("deng/deng-reads.rds") 

接下来,我们看一下同一类型的细胞的聚类情况

plotPCA(deng, colour_by = "cell_type2")

该例子中我们已知每个细胞的细胞类型,根据表达矩阵来进行聚类
看一下是否通过表达矩阵能够准确的细胞类型,而不是交错在一起

接下来可以画一下表达谱,看看那些基因表达相近


t-SNE

这次我们不事先标记好细胞类型,而是用表达矩阵表达情况直接聚类,再标记细胞

colData(deng)$tSNE_kmeans <- as.character(kmeans(deng@reducedDims$TSNE, centers = 8)$clust) 
plotTSNE(deng, rand_seed = 1, colour_by = "tSNE_kmeans")

我们假设用kmeans分为8类,然后自然聚类,我们看图可以清楚看到可以很好的分成8类,接下来根据不同类型的细胞marker鉴定细胞

Pseudotime analysis

这一块我们讨论下细胞轨迹的追踪,特别是在发育过程中,不同时期的细胞类型会不同,也就是说在一个时期,某细胞的细胞类型为A,在下一个时期,可能就会发育成B细胞类型

还是上面的数据,我们在某一时期已知这些细胞不同时期的细胞类型

deng_SCE <- readRDS("deng/deng-reads.rds") 
deng_SCE$cell_type2 <- factor( deng_SCE$cell_type2, levels = c("zy", "early2cell", "mid2cell", "late2cell", "4cell", "8cell", "16cell", "earlyblast", "midblast", "lateblast") ) 
cellLabels <- deng_SCE$cell_type2 deng <- counts(deng_SCE) 
colnames(deng) <- cellLabels deng_SCE <- plotPCA(deng_SCE, colour_by = "cell_type2", return_SCE = TRUE)

我们提取PC1的数值,来看不同时期的细胞类型:

deng_SCE$PC1 <- reducedDim(deng_SCE, "PCA")[,1] 
ggplot(as.data.frame(colData(deng_SCE)), aes(x = PC1, y = cell_type2,colour = cell_type2)) 
+ geom_quasirandom(groupOnX = FALSE) 
+ scale_color_tableau() + theme_classic() 
+ xlab("First principal component") 
+ ylab("Timepoint") 
+ ggtitle("Cells ordered by first principal component")

由这幅图我们可以知道,每种细胞类型的数量,以及表达谱计算的每个细胞PC1的贡献

TSCAN

接下来就是对不同时期的细胞类型进行细胞轨迹的追踪,看一下细胞的发育轨迹:

procdeng <- TSCAN::preprocess(deng) 
colnames(procdeng) <- 1:ncol(deng) 
dengclust <- TSCAN::exprmclust(procdeng, clusternum = 10) TSCAN::plotmclust(dengclust)
dengorderTSCAN <- TSCAN::TSCANorder(dengclust, orderonly = FALSE) pseudotime_order_tscan <- as.character(dengorderTSCAN$sample_name) 
deng_SCE$pseudotime_order_tscan <- NA 
deng_SCE$pseudotime_order_tscan[as.numeric(dengorderTSCAN$sample_name)] <-dengorderTSCAN$Pseudotime

#可视化
ggplot(as.data.frame(colData(deng_SCE)), aes(x = pseudotime_order_tscan, y = cell_type2, colour = cell_type2)) 
+ geom_quasirandom(groupOnX = FALSE) 
+ scale_color_tableau() 
+ theme_classic() 
+ xlab("TSCAN pseudotime") 
+ ylab("Timepoint") 
+ ggtitle("Cells ordered by TSCAN pseudotime")

这样的话我们可以看到不同时期的细胞的发育时长,是否和对应时期一致

monocle:

这个包也可以追踪细胞轨迹:

m3dGenes <- as.character( M3DropFeatureSelection(deng)$Gene )
d <- deng[which(rownames(deng) %in% m3dGenes), ] 
d <- d[!duplicated(rownames(d)), ]
d <- deng[which(rownames(deng) %in% m3dGenes), ] 
d <- d[!duplicated(rownames(d)), ]
colnames(d) <- 1:ncol(d) 
geneNames <- rownames(d) 
rownames(d) <- 1:nrow(d) 
pd <- data.frame(timepoint = cellLabels) 
pd <- new("AnnotatedDataFrame", data=pd) 
fd <- data.frame(gene_short_name = geneNames) 
fd <- new("AnnotatedDataFrame", data=fd)
dCellData <- newCellDataSet(d, phenoData = pd, featureData = fd, expressionFamily = tobit()) 
dCellData <- setOrderingFilter(dCellData, which(geneNames %in% m3dGenes)) 
dCellData <- estimateSizeFactors(dCellData) 
dCellDataSet <- reduceDimension(dCellData, pseudo_expr = 1) 
dCellDataSet <- orderCells(dCellDataSet, reverse = FALSE) plot_cell_trajectory(dCellDataSet)

在PCA的聚类图中画出细胞轨迹



接下来我们同样看一下不同时期细胞类型与发育时间的关系:

pseudotime_monocle <-data.frame( Timepoint = phenoData(dCellDataSet)$timepoint, pseudotime = phenoData(dCellDataSet)$Pseudotime, State = phenoData(dCellDataSet)$State ) 
rownames(pseudotime_monocle) <- 1:ncol(d) 
pseudotime_order_monocle <-rownames(pseudotime_monocle[order(pseudotime_monocle$pseudotime), ])

#可视化
ggplot(as.data.frame(colData(deng_SCE)), aes(x = pseudotime_monocle, y = cell_type2, colour = cell_type2)) 
+ geom_quasirandom(groupOnX = FALSE) 
+ scale_color_tableau() 
+ theme_classic() 
+ xlab("monocle pseudotime")
+ ylab("Timepoint") 
+ ggtitle("Cells ordered by monocle pseudotime")

原理与上面的包一致,这里不再赘述

假设我想比较不同时期我们想看的基因的表达情况怎么办呢?

ouija_markers_down <- c("Dazl", "Rnf17", "Sycp3", "Fgf8", "Egfr", "Bmp5", "Bmp15", "Pou5f1") 
ouija_markers_up <- c("Creb3", "Gpx4", "Krt8", "Elf5", "Cdx2", "Tdgf1", "Gdf3", "Eomes") 
ouija_markers_transient <- c("Zscan4b", "Foxa1", "Prdm14", "Sox21") 
ouija_markers <- c(ouija_markers_down, ouija_markers_up, ouija_markers_transient) 
plotExpression(deng_SCE, ouija_markers, x = "cell_type2", colour_by = "cell_type2") + theme(axis.text.x = element_text(angle = 60, hjust = 1))

这个来自于Ouija (http://kieranrcampbell.github.io/ouija/)



这样表达量情况和分布情况就可以很清楚的看到了

你可能感兴趣的:(Analysis of single cell RNA-seq data 学习笔记(六))