immunarch — Fast and Seamless Exploration of Single-cell and Bulk T-cell/Antibody Immune Repertoires in R
Repertoire overlap and public clonotypes
免疫组重叠(Repertoire overlap)是最常用的度量Repertoire 相似度的方法。它是通过计算在给定的Repertoire 之间共享的克隆类型的特定统计量来实现的,也称为“公共”克隆类型。immunarch 提供了几个指标:-公共克隆型数量(.method = "public")) -一个经典的重叠相似性度量。
- overlap coefficient (.method = "overlap") : 重叠相似度的标准化度量。它被定义为交集的大小除以两个集合中较小的部分。
- Jaccard index (.method = "jaccard") : 它度量有限样本集之间的相似性,定义为交集的大小除以样本集并集的大小。
- Tversky index (.method = "tversky") : 一种对集合的非对称相似度量,用来比较一个变体和一个原型。如果使用默认参数,它类似于Dice的系数。
- cosine similarity (.method = "cosine") : 两个非零向量之间的相似性度量
- Morisita’s overlap index (.method = "morisita") : 个体在总体中分散的统计方法。它用于比较样本之间的重叠。
- incremental overlap : overlaps of the N most abundant clonotypes with incrementally growing N (.method = "inc+METHOD", e.g., "inc+public" or "inc+morisita").
包含所描述方法的函数是repOverlap。同样,当输出被传递到vis()函数时,输出很容易被可视化,它完成了所有的工作:
library(immunarch) # Load the package into R
data(immdata) # Load the test dataset
imm_ov1 <- repOverlap(immdata$data, .method = "public", .verbose = F)
imm_ov2 <- repOverlap(immdata$data, .method = "morisita", .verbose = F)
p1 <- vis(imm_ov1)
p2 <- vis(imm_ov2, .text.size = 2)
p1 + p2
vis(imm_ov1, "heatmap2")
您可以轻松更改有效数字的数量:
p1 <- vis(imm_ov2, .text.size = 2.5, .signif.digits = 1)
p2 <- vis(imm_ov2, .text.size = 2, .signif.digits = 2)
p1 + p2
repOverlapAnalysis可以对计算得到的重叠测度函数进行分析。
# Apply different analysis algorithms to the matrix of public clonotypes:
# "mds" - Multi-dimensional Scaling
repOverlapAnalysis(imm_ov1, "mds")
Standard deviations (1, .., p=4):
[1] 0 0 0 0
Rotation (n x k) = (12 x 2):
[,1] [,2]
A2-i129 -18.7767715 -18.360817
A2-i131 29.9586985 -7.870441
A2-i133 28.1148594 22.629093
A2-i132 -44.3435640 6.221812
A4-i191 13.8586515 7.452149
A4-i192 -14.0065477 27.068830
MS1 -8.8469009 -8.151574
MS2 -0.9712073 -1.297017
MS3 -10.4398629 4.894354
MS4 0.5131505 10.471309
MS5 18.5153823 -12.628029
MS6 6.4241122 -30.429669
# "tsne" - t-Stochastic Neighbor Embedding
repOverlapAnalysis(imm_ov1, "tsne")
DimI DimII
A2-i129 -11.893405 70.95531
A2-i131 112.806943 -229.78268
A2-i133 -34.283164 47.07587
A2-i132 -44.726418 11.90656
A4-i191 -13.979182 10.05010
A4-i192 -13.316741 89.16606
MS1 -30.856320 78.41378
MS2 -32.951243 16.06630
MS3 -18.041903 75.90590
MS4 -24.965529 16.01290
MS5 120.335521 -229.87194
MS6 -8.128559 44.10184
attr(,"class")
[1] "immunr_tsne" "matrix"
# Visualise the results
repOverlapAnalysis(imm_ov1, "mds") %>% vis()
同样可以基于结果聚类。
# Clusterise the MDS resulting components using K-means
repOverlapAnalysis(imm_ov1, "mds+kmeans") %>% vis()
为了从repertoires 列表中构建一个包含所有clonotypes的庞大表,使用pubRep函数。
# Pass "nt" as the second parameter to build the public repertoire table using CDR3 nucleotide sequences
pr.nt <- pubRep(immdata$data, "nt", .verbose = F)
pr.nt
CDR3.nt Samples A2-i129 A2-i131 A2-i133 A2-i132
1: TGCGCCAGCAGCTTGGAAGAGACCCAGTACTTC 8 1 NA 1 1
2: TGTGCCAGCAGCTTCCAAGAGACCCAGTACTTC 7 NA 1 1 2
3: TGTGCCAGCAGTTACCAAGAGACCCAGTACTTC 7 1 1 1 NA
4: TGCGCCAGCAGCTTCCAAGAGACCCAGTACTTC 6 2 NA 1 1
5: TGTGCCAGCAGCCAAGAGACCCAGTACTTC 6 4 2 NA 2
---
75101: TGTGCTTCACAACTCTTATTGGACGAGACCCAGTACTTC 1 NA 1 NA NA
75102: TGTGCTTCACAAGCCCTACAGGGCACTTTCCATAATTCACCCCTCCACTTT 1 NA NA NA NA
75103: TGTGCTTCAGGGCGGGCCTACGAGCAGTACTTC 1 NA NA NA NA
75104: TGTGCTTCCGCCGGACCGGACCGGGAGACCCAGTACTTC 1 NA NA 1 NA
75105: TGTGCTTGCGGGACAGATAACTATGGCTACACCTTC 1 NA NA NA NA
A4-i191 A4-i192 MS1 MS2 MS3 MS4 MS5 MS6
1: NA 1 NA NA 1 1 1 1
2: 1 NA 1 NA NA 2 NA 1
3: 1 1 1 NA 2 NA NA NA
4: NA NA NA 1 NA 1 NA 1
5: 3 1 NA NA NA NA 4 NA
---
75101: NA NA NA NA NA NA NA NA
75102: NA NA NA NA NA NA 1 NA
75103: NA NA 1 NA NA NA NA NA
75104: NA NA NA NA NA NA NA NA
75105: NA 1 NA NA NA NA NA NA
# Pass "aa+v" as the second parameter to build the public repertoire table using CDR3 aminoacid sequences and V alleles
# In order to use only CDR3 aminoacid sequences, just pass "aa"
pr.aav <- pubRep(immdata$data, "aa+v", .verbose = F)
pr.aav
CDR3.aa V.name Samples A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1
1: CASSLEETQYF TRBV5-1 8 1 NA 2 1 NA 2 NA
2: CASSDSSGGANEQFF TRBV6-4 6 1 1 2 NA 3 NA NA
3: CASSFQETQYF TRBV5-1 6 3 NA 1 1 NA NA NA
4: CASSLGETQYF TRBV12-4 6 2 NA NA 4 3 NA 1
5: CASSDSGGSYNEQFF TRBV6-4 5 NA NA NA 3 NA 1 1
---
74440: CTSSRPTQGAYEQYF TRBV7-2 1 NA NA NA NA NA NA NA
74441: CTSSSRAGAGTDTQYF TRBV7-2 1 NA NA NA NA NA NA NA
74442: CTSSYPGLAGLKRKETQYF TRBV7-2 1 NA NA NA 1 NA NA NA
74443: CTSSYRQRPYQETQYF TRBV7-2 1 NA NA NA NA NA NA NA
74444: CTSSYSTSGVGQFF TRBV7-2 1 NA NA NA NA NA NA NA
MS2 MS3 MS4 MS5 MS6
1: NA 1 1 1 1
2: NA 2 NA NA 12
3: 1 NA 1 NA 1
4: NA NA NA 2 1
5: NA 1 NA NA 1
---
74440: NA NA NA NA 1
74441: NA 1 NA NA NA
74442: NA NA NA NA NA
74443: NA 1 NA NA NA
74444: NA NA 1 NA NA
# You can also pass the ".coding" parameter to filter out all noncoding sequences first:
pr.aav.cod <- pubRep(immdata$data, "aa+v", .coding = T)
pr.aav.cod
CDR3.aa V.name Samples A2-i129 A2-i131 A2-i133 A2-i132 A4-i191 A4-i192 MS1
1: CASSLEETQYF TRBV5-1 8 1 NA 2 1 NA 2 NA
2: CASSDSSGGANEQFF TRBV6-4 6 1 1 2 NA 3 NA NA
3: CASSFQETQYF TRBV5-1 6 3 NA 1 1 NA NA NA
4: CASSLGETQYF TRBV12-4 6 2 NA NA 4 3 NA 1
5: CASSDSGGSYNEQFF TRBV6-4 5 NA NA NA 3 NA 1 1
---
74440: CTSSRPTQGAYEQYF TRBV7-2 1 NA NA NA NA NA NA NA
74441: CTSSSRAGAGTDTQYF TRBV7-2 1 NA NA NA NA NA NA NA
74442: CTSSYPGLAGLKRKETQYF TRBV7-2 1 NA NA NA 1 NA NA NA
74443: CTSSYRQRPYQETQYF TRBV7-2 1 NA NA NA NA NA NA NA
74444: CTSSYSTSGVGQFF TRBV7-2 1 NA NA NA NA NA NA NA
MS2 MS3 MS4 MS5 MS6
1: NA 1 1 1 1
2: NA 2 NA NA 12
3: 1 NA 1 NA 1
4: NA NA NA 2 1
5: NA 1 NA NA 1
---
74440: NA NA NA NA 1
74441: NA 1 NA NA NA
74442: NA NA NA NA NA
74443: NA 1 NA NA NA
74444: NA NA 1 NA NA
# Create a public repertoire with coding-only sequences using both CDR3 amino acid sequences and V genes
pr <- pubRep(immdata$data, "aa+v", .coding = T, .verbose = F)
# Apply the filter subroutine to leave clonotypes presented only in healthy individuals
pr1 <- pubRepFilter(pr, immdata$meta, c(Status = "C"))
# Apply the filter subroutine to leave clonotypes presented only in diseased individuals
pr2 <- pubRepFilter(pr, immdata$meta, c(Status = "MS"))
# Divide one by another
pr3 <- pubRepApply(pr1, pr2)
# Plot it
p <- ggplot() +
geom_jitter(aes(x = "Treatment", y = Result), data = pr3)
p
参考:
https://immunarch.com/articles/web_only/v4_overlap.html