Immune repertoire data analysis || immunarch tutorial: loading 10x data

immunarch — Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R

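The first step of any data analysis should be getting to know your data. For R users, the next step is importing that data into the R environment. As mentioned earlier, immunarch supports almost all immune repertoire data formats; today we use 10x Genomics VDJ data as an example of how to load data.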

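10x Genomics offers several pipelines for single-cell and spatial views of biological systems, including single-cell immune profiling. The 10x Genomics Chromium Single Cell Immune Profiling solution can analyse the following simultaneously: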

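  • V(D)J transcripts and clonotypes of T cells and B cells.
  • 5' gene expression.
  • Cell surface proteins / antigen specificity (Feature Barcoding), in the same set of cells at single-cell resolution.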

Their end-to-end pipeline includes the familiar Cell Ranger software, which provides the following pipelines for immune profiling:

  • cellranger mkfastq
  • cellranger vdj
  • cellranger count

When processing data, cellranger produces many output files. You should use the filtered contigs csv files, because they contain the barcode information.

.
├── vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv <-- This contains the count data we want!
├── vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv 
├── vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv
├── vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv 
├── vdj_v1_mm_c57bl6_pbmc_t_matrix.h5
├── vdj_v1_mm_c57bl6_pbmc_t_bam.bam.bai
├── vdj_v1_mm_c57bl6_pbmc_t_molecule_info.h5
├── vdj_v1_mm_c57bl6_pbmc_t_raw_feature_bc_matrix.tar.gz
├── vdj_v1_mm_c57bl6_pbmc_t_analysis.tar.gz
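
As a minimal sketch of picking out just the filtered contigs files from such a folder, base R's list.files can match on the file-name suffix. The folder below is a throwaway one filled with empty placeholder files named after the listing above; `cellranger_dir` and `filtered_files` are illustrative names and not part of immunarch.

```r
# Create a temporary folder with empty placeholder files named like the
# Cell Ranger outputs listed above (illustration only, no real data).
cellranger_dir <- tempfile("cellranger_out_")
dir.create(cellranger_dir)
placeholder_names <- c(
  "vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv",
  "vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv"
)
invisible(file.create(file.path(cellranger_dir, placeholder_names)))

# Keep only the files whose names end in filtered_contig_annotations.csv.
filtered_files <- list.files(
  cellranger_dir,
  pattern = "filtered_contig_annotations\\.csv$",
  full.names = TRUE
)
print(basename(filtered_files))
```

Anchoring the pattern with `$` keeps the all_contig and consensus files out of the result even though their names share most of the same prefix.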

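Run the code below in your R environment to load the data into Immunarch's format. You can run it on the whole folder containing the Cell Ranger output files; repLoad will skip unsupported file formats.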

library(immunarch)
# file_path: path to the folder containing the Cell Ranger output files
immdata_10x <- repLoad(file_path)

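What matters is what should sit under file_path:

  • The filtered contigs csv files of your samples, each renamed to its sample name (one folder cannot hold two files with the same name)
  • metadata.txt (sample grouping information)

The metadata looks like this: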

Sample  Sex     Age     Status
immunoseq_1     M   1   C
immunoseq_2     M   2   C
immunoseq_3     F   3   A

The folder will most likely look like this:

# For instance, suppose you have the following structure in your folder:
# >_ ls
# immunoseq_1.txt
# immunoseq_2.txt
# immunoseq_3.txt
# metadata.txt
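
A minimal sketch of building such a folder by hand, using only base R. It assumes, as the immunarch documentation describes it, that metadata.txt is tab-delimited and that its Sample column holds the file names without their extensions. The sample files here are empty placeholders and `input_dir` is a hypothetical name; in practice you would copy each sample's filtered contigs csv into the folder instead.

```r
# Hypothetical layout: one file per sample, named after the sample, plus a
# tab-delimited metadata.txt in the same folder.
input_dir <- tempfile("immunarch_input_")
dir.create(input_dir)

# Empty placeholder sample files; in practice, copy each sample's
# filtered_contig_annotations.csv here with file.copy() under a unique name.
sample_names <- c("immunoseq_1", "immunoseq_2", "immunoseq_3")
invisible(file.create(file.path(input_dir, paste0(sample_names, ".csv"))))

# Metadata table matching the example above; Sample holds the file names
# without their extensions (assumption taken from the immunarch docs).
metadata <- data.frame(
  Sample = sample_names,
  Sex    = c("M", "M", "F"),
  Age    = c(1, 2, 3),
  Status = c("C", "C", "A"),
  stringsAsFactors = FALSE
)
write.table(metadata, file.path(input_dir, "metadata.txt"),
            sep = "\t", quote = FALSE, row.names = FALSE)

print(list.files(input_dir))
```

With real contigs files in place of the placeholders, `input_dir` would be the value passed to repLoad as file_path.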

If this is still unclear, look at the example files and build your own. Loading itself is straightforward:

> immdata_10x <- repLoad(file_path)

== Step 1/3: loading repertoire files... ==

Processing "/filepath/C57BL_mice_igenrichment" ...
  -- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_all_contig_annotations.csv" -- 10x (filt.contigs)
  [!] Removed 2917 clonotypes with no nucleotide and amino acid CDR3 sequence.
  -- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_clonotypes.csv" -- unsupported format, skipping
  -- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_consensus_annotations.csv" -- 10x (consensus)
  -- Parsing "/filepath/vdj_v1_mm_c57bl6_pbmc_t_filtered_contig_annotations.csv" -- 10x (filt.contigs)
  [!] Removed 1198 clonotypes with no nucleotide and amino acid CDR3 sequence.

== Step 2/3: checking metadata files and merging... ==

Processing "" ...
  -- Metadata file not found; creating a dummy metadata...

== Step 3/3: splitting data by barcodes and chains... ==

Done!

The data is now ready to use:

> immdata_10x
$data$vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRA
# A tibble: 710 x 17
   Clones Proportion CDR3.nt      CDR3.aa  V.name  D.name J.name V.end D.start D.end J.start VJ.ins VD.ins DJ.ins chain ClonotypeID  ConsensusID 
                                                            
 1     55    0.00414 TGTGCTATGGC… CAMATGG… TRAV13… None   TRAJ56    NA      NA    NA      NA     NA     NA     NA TRA   clonotype306 clonotype30…
 2     55    0.00414 TGTGCAGCTAG… CAASGNT… TRAV7-4 None   TRAJ27    NA      NA    NA      NA     NA     NA     NA TRA   clonotype338 clonotype33…
 3     53    0.00399 TGTGCAGCAAG… CAARDSG… TRAV14… None   TRAJ11    NA      NA    NA      NA     NA     NA     NA TRA   clonotype617 clonotype61…
 4     45    0.00339 TGCGCAGTCAG… CAVSNNT… TRAV3-3 None   TRAJ27    NA      NA    NA      NA     NA     NA     NA TRA   clonotype435 clonotype43…
 5     43    0.00324 TGTGCAGTCAG… CAVSNMG… TRAV7D… None   TRAJ9     NA      NA    NA      NA     NA     NA     NA TRA   clonotype401 clonotype40…
 6     42    0.00316 TGTGCAGCAAG… CAASPNY… TRAV14… None   TRAJ21    NA      NA    NA      NA     NA     NA     NA TRA   clonotype5   clonotype5_…
 7     37    0.00279 TGTGCAGTGAG… CAVSSGG… TRAV7D… None   TRAJ6     NA      NA    NA      NA     NA     NA     NA TRA   clonotype453 clonotype45…
 8     35    0.00264 TGTGCAGCAAG… CAASATS… TRAV14… None   TRAJ22    NA      NA    NA      NA     NA     NA     NA TRA   clonotype809 clonotype80…
 9     32    0.00241 TGTGCAGCAAG… CAASPNY… TRAV14… None   TRAJ21    NA      NA    NA      NA     NA     NA     NA TRA   clonotype150 clonotype15…
10     32    0.00241 TGTGCTCTGGG… CALGDEA… TRAV6-… None   TRAJ30    NA      NA    NA      NA     NA     NA     NA TRA   clonotype393 clonotype39…
# … with 700 more rows

$meta
                                                       Sample Chain                                                Source
1 vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_Multi Multi vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
2   vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_TRA   TRA vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
3   vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations_TRB   TRB vdj_v1_mm_c57bl6_splenocytes_t_all_contig_annotations
5    vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRA   TRA  vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations
6    vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations_TRB   TRB  vdj_v1_mm_c57bl6_splenocytes_t_consensus_annotations
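
The printed object is a list with a $data element (a named list of tables, one per repertoire) and a $meta data frame. A rough sketch of navigating that shape is given below, using a hand-built stand-in so it runs without immunarch or any data files. `mock_immdata` and every value in it are made up for illustration; only the $data / $meta layout and the Sample, Chain and Source columns come from the output above.

```r
# Hand-built stand-in with the same shape as the repLoad() result shown
# above: a named list of tables in $data and a data frame in $meta.
# All values here are made up for illustration.
mock_immdata <- list(
  data = list(
    sample_TRA = data.frame(Clones = c(55, 45), CDR3.aa = c("CAAA", "CAVV"),
                            stringsAsFactors = FALSE),
    sample_TRB = data.frame(Clones = 30, CDR3.aa = "CASS",
                            stringsAsFactors = FALSE)
  ),
  meta = data.frame(
    Sample = c("sample_TRA", "sample_TRB"),
    Chain  = c("TRA", "TRB"),
    Source = c("sample", "sample"),
    stringsAsFactors = FALSE
  )
)

# List the available repertoires, then pull out the TRA ones via $meta.
print(names(mock_immdata$data))
tra_samples <- mock_immdata$meta$Sample[mock_immdata$meta$Chain == "TRA"]
tra_data <- mock_immdata$data[tra_samples]
print(sapply(tra_data, nrow))
```

The same indexing should carry over to a real immdata_10x object, since it relies only on the list structure printed above.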

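Congratulations! Your data is now ready for exploration. Follow the steps in the immunarch documentation to learn more about how to explore the dataset. One important note: some contigs files lack the barcode column, the unique identifier of a cell.
These files can be used to analyse single-chain data (alpha-only or beta-only TCRs), but to analyse paired-chain data and make full use of single-cell technology, you should load the files that contain barcodes into Immunarch.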
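
One quick way to check for the barcode column before loading is to read only the header of a contigs file. The sketch below assumes the column is named `barcode`, as in Cell Ranger's contig annotation csv files. `has_barcode_column` is a hypothetical helper rather than an immunarch function, and the demo table is made up.

```r
# Write a tiny made-up contigs table (not real Cell Ranger output) so the
# check below can run without any data files.
demo_file <- tempfile(fileext = ".csv")
writeLines(c(
  "barcode,chain,cdr3",
  "AAACCTGAGAAACCAT-1,TRA,CAASGNTF",
  "AAACCTGAGAAACCAT-1,TRB,CASSLGQF"
), demo_file)

# Read the header plus one data row and test for a column named "barcode".
has_barcode_column <- function(path) {
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  "barcode" %in% header
}

print(has_barcode_column(demo_file))
```

Files for which this returns FALSE can still be loaded, but only for single-chain analysis as described above.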

Reference:
https://immunarch.com/articles/web_only/load_10x.html
