单细胞转录组基础分析五：细胞再聚类

本文是参考学习单细胞转录组基础分析五：细胞再聚类的学习笔记。可能根据学习情况有所改动。

单细胞数据分析中，一般需要对可以细分的细胞再聚类，比如本次分析中的T细胞群体可以细分为Naive T cells、CD8+ T cells、Treg cells、Tmemory cells等。细胞子集的提取使用subset函数，主要依据meta.data的分类项目选择，也可以自定义细胞barcode提取。

提取细胞子集

library(Seurat)

重新降维聚类

因为再聚类的细胞之间差异比较小，所以聚类函数FindClusters()控制分辨率的参数建议调高到resolution = 0.9。

##PCA降维

图片

Cluster差异分析

diff.wilcox = FindAllMarkers(scRNAsub)

SingleR细胞鉴定

Subcluster的细胞同样可以使用SingleR鉴定细胞类型。使用的时候注意调整参考数据库和分类标签，以便鉴定结果更有针对性。上节使用SingleR时使用的参考数据库是人类主要细胞图谱（HumanPrimaryCellAtlasData），采用分类标签是主分类标签（label.main）；这次建议使用人类免疫细胞数据（MonacoImmuneData），分类标签采用精细分类标签（label.fine）。希望详细了解SingleR的朋友可以到github看看：

https://github.com/dviraran/singler

##细胞类型鉴定

图片

> library(Seurat)
> library(monocle)
载入需要的程辑包：Matrix

载入程辑包：‘Matrix’

The following object is masked from ‘package:S4Vectors’:

    expand

The following objects are masked from ‘package:tidyr’:

    expand, pack, unpack

载入需要的程辑包：VGAM
载入需要的程辑包：splines

载入程辑包：‘VGAM’

The following object is masked from ‘package:tidyr’:

    fill

载入需要的程辑包：DDRTree
载入需要的程辑包：irlba
Warning messages:
1: 程辑包‘monocle’是用R版本4.0.3 来建造的 
2: 程辑包‘Matrix’是用R版本4.0.3 来建造的 
3: 程辑包‘VGAM’是用R版本4.0.3 来建造的 
4: 程辑包‘DDRTree’是用R版本4.0.3 来建造的 
5: 程辑包‘irlba’是用R版本4.0.3 来建造的 
> library(tidyverse)
> library(patchwork)
> rm(list=ls())
> dir.create("subcluster")
> scRNA <- readRDS("scRNA.rds")
> ##提取细胞子集
> Cells.sub <- subset([email protected], celltype=="T_cells")
Error in eval(e, x, parent.frame()) : object 'celltype' not found
> scRNAsub <- subset(scRNA, cells=row.names(Cells.sub))
Error in row.names(Cells.sub) : object 'Cells.sub' not found
> library(SingleR)
> refdata <- HumanPrimaryCellAtlasData()
Cannot connect to ExperimentHub server, using 'localHub=TRUE' instead
snapshotDate(): 2021-03-04
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
> testdata <- GetAssayData(scRNA, slot="data")
> clusters <- [email protected]$seurat_clusters
> cellpred <- SingleR(test = testdata, ref = refdata, labels = refdata$label.main, 
+                     method = "cluster", clusters = clusters, 
+                     assay.type.test = "logcounts", assay.type.ref = "logcounts")
Warning message:
'method="cluster"' is no longer necessary when 'cluster=' is specified 
> celltype = data.frame(ClusterID=rownames(cellpred), celltype=cellpred$labels, stringsAsFactors = F)
> dim(celltype)
[1] 10  2
> celltype
   ClusterID celltype
1          0 Monocyte
2          1  T_cells
3          2   B_cell
4          3  T_cells
5          4  T_cells
6          5   B_cell
7          6  NK_cell
8          7  T_cells
9          8  T_cells
10         9 Monocyte
> ##提取细胞子集
> Cells.sub <- subset([email protected], celltype=="T_cells")
> scRNAsub <- subset(scRNA, cells=row.names(Cells.sub))
> ##PCA降维
> scRNAsub <- FindVariableFeatures(scRNAsub, selection.method = "vst", nfeatures = 2000)
Calculating gene variances
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Calculating feature variances of standardized and clipped values
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
> scale.genes <-  rownames(scRNAsub)
> scRNAsub <- ScaleData(scRNAsub, features = scale.genes)
Centering and scaling data matrix
  |==================================================================================| 100%
> scRNAsub <- RunPCA(scRNAsub, features = VariableFeatures(scRNAsub))
PC_ 1 
Positive:  RPS12, LTB, CD3E, TRAC, TRBC2, CD3D, IL32, TCF7, CD3G, IL7R 
       LDHB, ARL4C, CD69, CD247, CD7, NOSIP, CD27, RHOH, SPOCK2, TRBC1 
       BCL11B, GZMM, SYNE2, CD6, RORA, CTSW, TRABD2A, CCR7, ZAP70, AQP3 
Negative:  LYZ, FCN1, S100A9, MNDA, FGL2, CST3, VCAN, S100A8, NCF2, SERPINA1 
       GRN, KLF4, TYMP, CTSS, MS4A6A, CSTA, PSAP, CD36, MPEG1, RNF130 
       CPVL, TGFBI, CSF3R, SLC7A7, CLEC7A, CD68, AIF1, LST1, LGALS1, S100A12 
PC_ 2 
Positive:  CD79A, MS4A1, BANK1, IGHM, HLA-DQA1, LINC00926, IGHD, CD79B, TCL1A, CD22 
       HLA-DQA2, HLA-DQB1, BCL11A, CD74, HLA-DRB1, SPIB, FCRL1, HLA-DPA1, HLA-DRA, HLA-DPB1 
       TNFRSF13C, MEF2C, FAM129C, ARHGAP24, FCRLA, FCER2, PLPP5, HLA-DOB, HVCN1, IGKC 
Negative:  CD247, S100A4, CTSW, CD7, CD3E, GZMM, IL32, ARL4C, ANXA1, GZMA 
       CD3D, NKG7, PRF1, CST7, DOK2, CCL5, ZAP70, KLRB1, CD3G, S100A10 
       MT2A, GNLY, TRAC, GAPDH, IL7R, RORA, KLRD1, APMAP, TRBC1, MATK 
PC_ 3 
Positive:  RPS12, IL7R, LDHB, TRABD2A, CD3D, RCAN3, CD3G, TCF7, TRAC, LTB 
       MAL, LEF1, VIM, CCR7, PRKCA, CD27, NOSIP, FOSB, PASK, CD5 
       INPP4B, NELL2, SUSD3, IL6ST, CD40LG, CHRM3-AS2, AQP3, LINC01550, CD3E, BCL11B 
Negative:  GNLY, GZMB, SPON2, KLRD1, NKG7, KLRF1, FGFBP2, PRF1, CST7, ADGRG1 
       TTC38, GZMA, CLIC3, FCGR3A, CCL4, HOPX, APOBEC3G, MATK, IL2RB, PTGDR 
       TRDC, TBX21, CTSW, GZMH, S1PR5, IGFBP7, APMAP, EFHD2, SH2D1B, XCL2 
PC_ 4 
Positive:  S100A12, RAB27A, NCF1, AC020656.1, CYP1B1, RBP7, QPCT, PADI4, MCEMP1, S100A8 
       IRS2, RETN, TREM1, CLEC4D, VNN3, F5, VCAN, ALOX5AP, PLBD1, ITGAM 
       BPI, AZI2, BST1, UBE2D1, FNDC3B, VNN2, PGD, FES, AQP9, GAB2 
Negative:  NOTCH4, CDKN1C, PPP1R17, HES4, TCF7L2, CKB, ABCC3, ZNF703, CDH23, SMIM25 
       KCNAB3, TNFRSF8, SECTM1, SIGLEC10, DUSP5, MGLL, MS4A4A, PTP4A3, CSF1R, LINC01503 
       FCGR3A, MAPKAPK3, MTSS1, RNASET2, RHOC, RRAS, SLC35F3, CXCL16, XYLB, ZGRF1 
PC_ 5 
Positive:  RHEX, SLC35F3, CDH1, MYB, LINC00996, KCNK17, IL3RA, NRP1, IGFBP3, LRRC36 
       CLEC4C, AC002553.1, SMPD3, PACSIN1, DRD4, AR, CIB2, CCDC102B, SERPINF1, PLXNA4 
       LHFPL2, LGMN, DNASE1L3, DERL3, JCHAIN, PPM1J, PLD4, SLC7A5, MZB1, LDLRAD4 
Negative:  PPP1R17, CDKN1C, ABCC3, CKB, SIGLEC10, HES4, TCF7L2, FCGR3A, SMIM25, SECTM1 
       TNFRSF8, HSPA6, ZNF703, MGLL, PTP4A3, MTSS1, MS4A7, RRAS, IFITM3, MAFB 
       HMOX1, KCNAB3, GPR137B, LINC01503, CSF1R, ABI3, E2F2, CD300E, TSC22D3, MARCKS 
> ElbowPlot(scRNAsub, ndims=20, reduction="pca")
> pc.num=1:10
> ##细胞聚类
> scRNAsub <- FindNeighbors(scRNAsub, dims = pc.num)
Computing nearest neighbor graph
Computing SNN
> scRNAsub <- FindClusters(scRNAsub, resolution = 0.9)
Modularity Optimizer version 1.3.0 by Ludo Waltman and Nees Jan van Eck

Number of nodes: 251
Number of edges: 6245

Running Louvain algorithm...
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
Maximum modularity in 10 random starts: 0.7669
Number of communities: 6
Elapsed time: 0 seconds
> table([email protected]$seurat_clusters)

 0  1  2  3  4  5 
76 50 41 39 28 17 
> metadata <- [email protected]
> cell_cluster <- data.frame(cell_ID=rownames(metadata), cluster_ID=metadata$seurat_clusters)
> write.csv(cell_cluster,'subcluster/cell_cluster.csv',row.names = F)
> ##非线性降维
> #tSNE
> scRNAsub = RunTSNE(scRNAsub, dims = pc.num)
> embed_tsne <- Embeddings(scRNAsub, 'tsne')
> write.csv(embed_tsne,'subcluster/embed_tsne.csv')
> plot1 = DimPlot(scRNAsub, reduction = "tsne")
> ggsave("subcluster/tSNE.pdf", plot = plot1, width = 8, height = 7)
> ggsave("subcluster/tSNE.png", plot = plot1, width = 8, height = 7)
> plot1
> ggsave("subcluster/tSNE.pdf", plot = plot1, width = 8, height = 7)
> ggsave("subcluster/tSNE.png", plot = plot1, width = 8, height = 7)
> #UMAP
> scRNAsub <- RunUMAP(scRNAsub, dims = pc.num)
Warning: The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session
16:09:00 UMAP embedding parameters a = 0.9922 b = 1.112
16:09:00 Read 251 rows and found 10 numeric columns
16:09:00 Using Annoy for neighbor search, n_neighbors = 30
16:09:00 Building Annoy index with metric = cosine, n_trees = 50
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:09:01 Writing NN index file to temp file C:\Users\Nano\AppData\Local\Temp\Rtmp8sLN3y\file52412ae23ef
16:09:01 Searching Annoy index using 1 thread, search_k = 3000
16:09:01 Annoy recall = 100%
16:09:01 Commencing smooth kNN distance calibration using 1 thread
16:09:03 Initializing from normalized Laplacian + noise
16:09:03 Commencing optimization for 500 epochs, with 8594 positive edges
0%   10   20   30   40   50   60   70   80   90   100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
16:09:05 Optimization finished
> embed_umap <- Embeddings(scRNAsub, 'umap')
> write.csv(embed_umap,'subcluster/embed_umap.csv')
> plot2 = DimPlot(scRNAsub, reduction = "umap")
> plot2
> ggsave("subcluster/UMAP.pdf", plot = plot2, width = 8, height = 7)
> ggsave("subcluster/UMAP.png", plot = plot2, width = 8, height = 7)
> #合并tSNE与UMAP
> plotc <- plot1+plot2+ plot_layout(guides = 'collect')
> plotc
> plotc
> ggsave("subcluster/tSNE_UMAP.pdf", plot = plotc, width = 10, height = 5)
> ggsave("subcluster/tSNE_UMAP.png", plot = plotc, width = 10, height = 5)
> diff.wilcox = FindAllMarkers(scRNAsub)
Calculating cluster 0
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=05s  
Calculating cluster 1
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=02s  
Calculating cluster 2
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=04s  
Calculating cluster 3
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03s  
Calculating cluster 4
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=03s  
Calculating cluster 5
  |++++++++++++++++++++++++++++++++++++++++++++++++++| 100% elapsed=06s  
> all.markers = diff.wilcox %>% select(gene, everything()) %>% subset(p_val<0.05)
> top10 = all.markers %>% group_by(cluster) %>% top_n(n = 10, wt = avg_logFC)
> write.csv(all.markers, "subcluster/diff_genes_wilcox.csv", row.names = F)
> write.csv(top10, "subcluster/top10_diff_genes_wilcox.csv", row.names = F)
> ##细胞类型鉴定
> library(SingleR)
> refdata <- MonacoImmuneData()
snapshotDate(): 2020-10-27
see ?celldex and browseVignettes('celldex') for documentation
Could not check id: EH3496 for updates.
  Using previously cached version.
loading from cache
see ?celldex and browseVignettes('celldex') for documentation
loading from cache
Warning message:
Could not check database for updates.
  Database source currently unreachable.
  This should only be a temporary interruption. 
  Using previously cached version. 
> testdata <- GetAssayData(scRNAsub, slot="data")
> clusters <- [email protected]$seurat_clusters
> cellpred <- SingleR(test = testdata, ref = refdata, labels = refdata$label.fine, 
+                     method = "cluster", clusters = clusters, 
+                     assay.type.test = "logcounts", assay.type.ref = "logcounts")
Warning message:
'method="cluster"' is no longer necessary when 'cluster=' is specified 
> dim(celltype)
[1] 10  2
> celltype = data.frame(ClusterID=rownames(cellpred), celltype=cellpred$labels, stringsAsFactors = F)
> dim(celltype)
[1] 6 2
> celltype
  ClusterID             celltype
1         0  Classical monocytes
2         1           Th17 cells
3         2    Naive CD4 T cells
4         3        Naive B cells
5         4           MAIT cells
6         5 Natural killer cells
> write.csv(celltype,"subcluster/celltype_singleR.csv",row.names = F)
> [email protected]$celltype = "NA"
> for(i in 1:nrow(celltype)){
+   [email protected][which([email protected]$seurat_clusters == celltype$ClusterID[i]),'celltype'] <- celltype$celltype[i]}
> p1 = DimPlot(scRNAsub, group.by="celltype", label=T, label.size=5, reduction='tsne')
> p2 = DimPlot(scRNAsub, group.by="celltype", label=T, label.size=5, reduction='umap')
> p3 = plotc <- p1+p2+ plot_layout(guides = 'collect')
> ggsave("subcluster/tSNE_celltype.pdf", p1, width=7 ,height=6)
> ggsave("subcluster/UMAP_celltype.pdf", p2, width=7 ,height=6)
> ggsave("subcluster/celltype.pdf", p3, width=10 ,height=5)
> ggsave("subcluster/celltype.png", p3, width=10 ,height=5)

单细胞转录组基础分析五：细胞再聚类

你可能感兴趣的:(单细胞转录组基础分析五：细胞再聚类)