单细胞免疫组库数据分析||Seurat整合单细胞转录组与VDJ数据

在做10X单细胞免疫组库分析的是往往是做一部分BCR、TCR做一部分5‘转录组,那么怎样才能把两者结合到一起呢?

今天我们尝试用我们的趁手工具做一下整合分析。

首先是下载数据,我们从10X官方的dataset中下载数据:https://support.10xgenomics.com/single-cell-vdj/datasets/3.1.0/vdj_v1_hs_pbmc3

单细胞免疫组库数据分析||Seurat整合单细胞转录组与VDJ数据_第1张图片

在下载页面有关于这个样本的基本介绍,如这个数据集根据单细胞V(D)J试剂试剂盒使用指南和细胞表面蛋白特征条形码技术(CG000186),从标记的细胞中扩增出cDNA,生成5’基因表达、细胞表面蛋白、TCR富集和Ig富集文库。这里是不是应该把本文的题目改成:Seurat整合单细胞转录组与VDJ数据与膜蛋白数据啊。

读入数据:

rm(list =ls())
library(Seurat)
data <-   Read10X(data.dir = "F:\\VDJ\\filtered_feature_bc_matrix")
10X data contains more than one type and is being returned as a list containing matrices of each type.

上来就报错:

seurat_object = CreateSeuratObject(counts = data)
Error in CreateAssayObject(counts = counts, min.cells = min.cells, min.features = min.features) : 
  No cell names (colnames) names present in the input matrix
> 

原来我们的feature中不仅有基因表达还有膜蛋白的信息,分开来读好了:

seurat_object = CreateSeuratObject(counts = data$`Gene Expression`)
seurat_object[['Protein']] = CreateAssayObject(counts = data$`Antibody Capture`)

我们熟悉的seurat对象又回来了:

seurat_object

An object of class Seurat 
33555 features across 7231 samples within 2 assays 
Active assay: RNA (33538 features)
 1 other assay present: Protein

注意这里是两个数据, RNA ,Protein:

当我们需要分析哪一部分的时候,用DefaultAssay(seurat_object)来指定,这里我们只分析RNA的数据啊,茂密的森林伸出两条路,我们也看一眼膜蛋白的数据吧:

 seurat_object@assays$Protein
Assay data with 17 features for 7231 cells
First 10 features:
 CD3-UCHT1-TotalC, CD19-HIB19-TotalC, CD45RA-HI100-TotalC, CD4-RPA-T4-TotalC, CD8a-RPA-T8-TotalC, CD14-M5E2-TotalC, CD16-3G8-TotalC, CD56-QA17A16-TotalC, CD25-BC96-TotalC,
CD45RO-UCHL1-TotalC 

接着我们读入BCR的数据:

bcr <- read.csv("F:\\VDJ\\filtered_contig_annotations.csv")
bcr[1:4,]
 barcode is_cell                   contig_id high_confidence length chain    v_gene d_gene j_gene c_gene full_length productive             cdr3
1 AAACCTGTCACTGGGC-1    True AAACCTGTCACTGGGC-1_contig_1            True    556   IGK IGKV1D-33   None  IGKJ4   IGKC        True       True      CQQYDNLPLTF
2 AAACCTGTCACTGGGC-1    True AAACCTGTCACTGGGC-1_contig_2            True    516   IGH  IGHV4-59   None  IGHJ5   IGHM        True       True     CARGGNSGLDPW
3 AAACCTGTCACTGGGC-1    True AAACCTGTCACTGGGC-1_contig_3            True    569   IGK IGKV2D-30   None  IGKJ1   IGKC        True      False CWGLLLHARYTLAWTF
4 AAACCTGTCAGGTAAA-1    True AAACCTGTCAGGTAAA-1_contig_1            True    548   IGK  IGKV1-12   None  IGKJ4   IGKC        True       True      CQQANSFPLTF
                                           cdr3_nt reads umis raw_clonotype_id       raw_consensus_id
1                TGTCAACAGTATGATAATCTCCCGCTCACTTTC  7410   61       clonotype7 clonotype7_consensus_1
2             TGTGCGAGAGGCGGGAACAGTGGCTTAGACCCCTGG  1458   18       clonotype7 clonotype7_consensus_2
3 TGTTGGGGTTTATTACTGCATGCAAGGTACACACTGGCCTGGACGTTC  2045   17       clonotype7                   None
4                TGTCAACAGGCTAACAGTTTCCCGCTCACTTTC   140    1       clonotype2 clonotype2_consensus_2

为了保持barcode一致,处理一下barcode的命名:

bcr$barcode <- gsub("-1", "", bcr$barcode)
bcr <- bcr[!duplicated(bcr$barcode), ]
names(bcr)[names(bcr) == "raw_clonotype_id"] <- "clonotype_id"
table(bcr$v_gene)

    IGHV1-18      IGHV1-2     IGHV1-24      IGHV1-3     IGHV1-45     IGHV1-46     IGHV1-58     IGHV1-69    IGHV1-69D      IGHV1-8 IGHV1/OR15-1 IGHV1/OR21-1     IGHV2-26      IGHV2-5     IGHV2-70 
          30           48            5           14            1           18            2            1           26            8            1            1            9           28            8 
    IGHV3-11     IGHV3-13     IGHV3-15     IGHV3-16     IGHV3-21     IGHV3-23     IGHV3-30     IGHV3-33     IGHV3-43     IGHV3-48     IGHV3-49     IGHV3-53     IGHV3-64    IGHV3-64D     IGHV3-66 
          20            7           29            1           43           95           40           50           10           25           10           13            5            4            2 
     IGHV3-7     IGHV3-72     IGHV3-73     IGHV3-74   IGHV4-30-2     IGHV4-31     IGHV4-34     IGHV4-39      IGHV4-4     IGHV4-59     IGHV4-61   IGHV5-10-1     IGHV5-51      IGHV6-1    IGHV7-4-1 
          15            6            7           25            9            4           30           55           22           76            9            5           21            4            8 
    IGKV1-12     IGKV1-16     IGKV1-17     IGKV1-27     IGKV1-33     IGKV1-37     IGKV1-39      IGKV1-5      IGKV1-6      IGKV1-8      IGKV1-9    IGKV1D-13    IGKV1D-33    IGKV1D-37    IGKV1D-39 
          29           10            8            9            9            2            6           42            3           16           11            1           15            6           65 
   IGKV1D-43     IGKV1D-8     IGKV2-24     IGKV2-28     IGKV2-30    IGKV2D-26    IGKV2D-28    IGKV2D-29    IGKV2D-30    IGKV2D-40     IGKV3-11     IGKV3-15     IGKV3-20      IGKV3-7    IGKV3D-11 
           5            4            6           26           14            5            1            9            1            2           39           58           89            2            1 
   IGKV3D-15    IGKV3D-20      IGKV4-1      IGKV5-2     IGKV6-21    IGKV6D-21     IGLV1-40     IGLV1-44     IGLV1-47     IGLV1-51    IGLV10-54     IGLV2-11     IGLV2-14     IGLV2-18     IGLV2-23 
           9            2           60            8            2            1           31           37           26           24            3           24           60            1           20 
     IGLV2-8      IGLV3-1     IGLV3-10     IGLV3-12     IGLV3-16     IGLV3-19     IGLV3-21     IGLV3-25     IGLV3-27      IGLV3-9      IGLV4-3     IGLV4-60     IGLV4-69     IGLV5-37     IGLV5-39 
          20           35           15            1            2           15           27           16            4            4            2            5           15            7            1 
    IGLV5-45     IGLV5-48     IGLV5-52     IGLV6-57     IGLV7-43     IGLV7-46     IGLV8-61     IGLV9-49         None 
           9            1            2           14            5           10           15            9          173 
>

读入克隆型信息:

clono <- read.csv("F:\\VDJ\\vdj_v1_hs_pbmc3_b_clonotypes.csv")
head(clono)

 clonotype_id frequency  proportion                            cdr3s_aa                                                                                cdr3s_nt
1   clonotype1         7 0.009067358                    IGK:CQQYDNWPPYTF                                                IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
2   clonotype2         3 0.003886010    IGK:CQQANSFPLTF;IGK:CQQYDNWPPYTF          IGK:TGTCAACAGGCTAACAGTTTCCCGCTCACTTTC;IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
3   clonotype5         2 0.002590674                     IGL:CQARDSSTVVF                                                   IGL:TGTCAGGCGCGGGACAGCAGCACTGTGGTATTC
4   clonotype6         2 0.002590674 IGH:CARPGTTGTTGLKNW;IGK:CQQYNNWPLTF IGH:TGTGCGAGACCCGGTACAACTGGAACGACGGGTTTAAAAAACTGG;IGK:TGTCAGCAGTATAATAACTGGCCTCTCACCTTC
5   clonotype4         2 0.002590674 IGH:CAKSFFTGTGQFHYW;IGK:CQQYSTYSQTF IGH:TGTGCGAAATCATTTTTTACCGGGACAGGACAGTTTCACTATTGG;IGK:TGCCAACAGTATAGTACTTATTCTCAAACGTTC
6   clonotype3         2 0.002590674                    IGL:CQSYDSSLSGVF                                                IGL:TGCCAGTCCTATGACAGCAGCCTGAGTGGGGTGTTC

整合到seurat之中:

# Slap the AA sequences onto our original table by clonotype_id.
bcr <- merge(bcr, clono)

# Reorder so barcodes are first column and set them as rownames.
rownames(bcr) <- bcr[,2]
#tcr[,1] <- NULL

# Add to the Seurat object's metadata.
clono_seurat <- AddMetaData(object=seurat_object, metadata=bcr)

head([email protected])
                    orig.ident nCount_RNA nFeature_RNA nCount_Protein nFeature_Protein clonotype_id barcode is_cell contig_id high_confidence length chain v_gene d_gene j_gene c_gene full_length
AAACCTGAGATCTGAA SeuratProject       4386         1206           5092               17                                        NA                      
AAACCTGAGGAACTGC SeuratProject       6258         1723           6480               17                                        NA                      
AAACCTGAGGAGTCTG SeuratProject       4243         1404           3830               17                                        NA                      
AAACCTGAGGCTCTTA SeuratProject       5205         1622           3589               17                                        NA                      
AAACCTGAGTACGTTC SeuratProject       7098         2123           5250               17                                        NA                      
AAACCTGGTATCACCA SeuratProject       1095          604           3573               17                                        NA                      
                 productive cdr3 cdr3_nt reads umis raw_consensus_id frequency proportion cdr3s_aa cdr3s_nt
AAACCTGAGATCTGAA                NA   NA                     NA         NA          
AAACCTGAGGAACTGC                NA   NA                     NA         NA          
AAACCTGAGGAGTCTG                NA   NA                     NA         NA          
AAACCTGAGGCTCTTA                NA   NA                     NA         NA          
AAACCTGAGTACGTTC                NA   NA                     NA         NA          
AAACCTGGTATCACCA                NA   NA                     NA         NA          

为什么这么多NA?因为转录组中的细胞很多,而免疫细胞(BCR)只有一部分。

 table(rownames([email protected])  %in% rownames(bcr))

FALSE  TRUE 
 6469   762 
> clono_seurat<- subset(clono_seurat,cells =  rownames(bcr))
> head([email protected])
                    orig.ident nCount_RNA nFeature_RNA nCount_Protein nFeature_Protein clonotype_id          barcode is_cell                   contig_id high_confidence length chain   v_gene d_gene
CTTTGCGCAAGGACTG SeuratProject       1266          642            696               17   clonotype1 CTTTGCGCAAGGACTG    True CTTTGCGCAAGGACTG-1_contig_1            True    479   IGK IGKV3-15   None
CAGCTAACACCGATAT SeuratProject        935          484            545               17   clonotype1 CAGCTAACACCGATAT    True CAGCTAACACCGATAT-1_contig_1            True    549   IGK IGKV3-15   None
TATTACCGTCGCTTCT SeuratProject       1352          594            422               17   clonotype1 TATTACCGTCGCTTCT    True TATTACCGTCGCTTCT-1_contig_1            True    565   IGK IGKV3-15   None
CTGAAGTGTGAAGGCT SeuratProject       1259          626            450               17   clonotype1 CTGAAGTGTGAAGGCT    True CTGAAGTGTGAAGGCT-1_contig_1            True    554   IGK IGKV3-15   None
CCATGTCAGACCGGAT SeuratProject       1088          555            741               17   clonotype1 CCATGTCAGACCGGAT    True CCATGTCAGACCGGAT-1_contig_1            True    556   IGK IGKV3-15   None
CGTTCTGAGCTCAACT SeuratProject        808          461           4896               17   clonotype1 CGTTCTGAGCTCAACT    True CGTTCTGAGCTCAACT-1_contig_1            True    565   IGK IGKV3-15   None
                 j_gene c_gene full_length productive         cdr3                              cdr3_nt reads umis       raw_consensus_id frequency  proportion         cdr3s_aa
CTTTGCGCAAGGACTG  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   197    2 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
CAGCTAACACCGATAT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   274    2 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
TATTACCGTCGCTTCT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   299    2 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
CTGAAGTGTGAAGGCT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   250    3 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
CCATGTCAGACCGGAT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   324    2 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
CGTTCTGAGCTCAACT  IGKJ2   IGKC        True       True CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   187    3 clonotype1_consensus_1         7 0.009067358 IGK:CQQYDNWPPYTF
                                                 cdr3s_nt
CTTTGCGCAAGGACTG IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
CAGCTAACACCGATAT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
TATTACCGTCGCTTCT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
CTGAAGTGTGAAGGCT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
CCATGTCAGACCGGAT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT
CGTTCTGAGCTCAACT IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT

seurat标准过程:

library(tidyverse)
clono_seurat %>% NormalizeData() %>% FindVariableFeatures() %>% 
 ScaleData(assay="RNA")  %>% 
 RunPCA(npcs = 50,assay="RNA")%>% 
 FindNeighbors(assay="RNA")%>%
 FindClusters(assay="RNA")%>%
 RunUMAP(1:30) -> clono_seurat

head([email protected])

                  orig.ident nCount_RNA nFeature_RNA nCount_Protein nFeature_Protein clonotype_id          barcode is_cell
CTTTGCGCAAGGACTG SeuratProject       1266          642            696               17   clonotype1 CTTTGCGCAAGGACTG    True
CAGCTAACACCGATAT SeuratProject        935          484            545               17   clonotype1 CAGCTAACACCGATAT    True
TATTACCGTCGCTTCT SeuratProject       1352          594            422               17   clonotype1 TATTACCGTCGCTTCT    True
CTGAAGTGTGAAGGCT SeuratProject       1259          626            450               17   clonotype1 CTGAAGTGTGAAGGCT    True
CCATGTCAGACCGGAT SeuratProject       1088          555            741               17   clonotype1 CCATGTCAGACCGGAT    True
CGTTCTGAGCTCAACT SeuratProject        808          461           4896               17   clonotype1 CGTTCTGAGCTCAACT    True
                                  contig_id high_confidence length chain   v_gene d_gene j_gene c_gene full_length productive
CTTTGCGCAAGGACTG CTTTGCGCAAGGACTG-1_contig_1            True    479   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
CAGCTAACACCGATAT CAGCTAACACCGATAT-1_contig_1            True    549   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
TATTACCGTCGCTTCT TATTACCGTCGCTTCT-1_contig_1            True    565   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
CTGAAGTGTGAAGGCT CTGAAGTGTGAAGGCT-1_contig_1            True    554   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
CCATGTCAGACCGGAT CCATGTCAGACCGGAT-1_contig_1            True    556   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
CGTTCTGAGCTCAACT CGTTCTGAGCTCAACT-1_contig_1            True    565   IGK IGKV3-15   None  IGKJ2   IGKC        True       True
                        cdr3                              cdr3_nt reads umis       raw_consensus_id frequency  proportion
CTTTGCGCAAGGACTG CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   197    2 clonotype1_consensus_1         7 0.009067358
CAGCTAACACCGATAT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   274    2 clonotype1_consensus_1         7 0.009067358
TATTACCGTCGCTTCT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   299    2 clonotype1_consensus_1         7 0.009067358
CTGAAGTGTGAAGGCT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   250    3 clonotype1_consensus_1         7 0.009067358
CCATGTCAGACCGGAT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   324    2 clonotype1_consensus_1         7 0.009067358
CGTTCTGAGCTCAACT CQQYDNWPPYTF TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT   187    3 clonotype1_consensus_1         7 0.009067358
                        cdr3s_aa                                 cdr3s_nt RNA_snn_res.0.8 seurat_clusters
CTTTGCGCAAGGACTG IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
CAGCTAACACCGATAT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
TATTACCGTCGCTTCT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
CTGAAGTGTGAAGGCT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
CCATGTCAGACCGGAT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
CGTTCTGAGCTCAACT IGK:CQQYDNWPPYTF IGK:TGTCAGCAGTATGATAACTGGCCTCCGTACACTTTT               5               5
p1 <- DimPlot(clono_seurat)
p2 <-DimPlot(clono_seurat,group.by = "chain")
p3 <-FeaturePlot(clono_seurat,features="frequency")

library(patchwork)
p1+p2+p3
单细胞免疫组库数据分析||Seurat整合单细胞转录组与VDJ数据_第2张图片
library(sankeywheel)
sankeywheel(
  from = [email protected]$chain, to = [email protected]$v_gene,
  weight = [email protected]$nCount_Protein,#type = "sankey",
  title = "sk",
  subtitle = "chain &  v_gene",
  seriesName = "", width = "100%", height = "600px"
)
单细胞免疫组库数据分析||Seurat整合单细胞转录组与VDJ数据_第3张图片

计算不同链的差异基因:

DT::datatable(FindMarkers(clono_seurat,ident.1 = 'IGK',group.by = "chain"))
mk<-FindMarkers(clono_seurat,ident.1 = 'IGK',group.by = "chain")
单细胞免疫组库数据分析||Seurat整合单细胞转录组与VDJ数据_第4张图片
DotPlot(clono_seurat,features=rownames(mk),group.by = "chain") + RotatedAxis()
单细胞免疫组库数据分析||Seurat整合单细胞转录组与VDJ数据_第5张图片
RidgePlot(clono_seurat,features=rownames(mk),group.by = "chain")
单细胞免疫组库数据分析||Seurat整合单细胞转录组与VDJ数据_第6张图片


Tutorial: Integrating VDJ sequencing data with Seurat
Tutorial: Associating VDJ clonotyping data with scRNA-seq in Seurat

你可能感兴趣的:(单细胞免疫组库数据分析||Seurat整合单细胞转录组与VDJ数据)