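MiXCR and VDJtools are Java-based immune repertoire analysis tools that process large volumes of immune repertoire data, from raw sequences to quantified clonotypes. Make sure a working Java environment is available before using them.
Download the Java Runtime Environment (JRE), the runtime that Java programs need, from the official website.
java -version # check that the Java environment works
Download and install the latest release of VDJtools.
VDJtools relies on several R packages for plotting, so install the required R packages.
Install them with the command built into VDJtools:
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar Rinstall
They can also be installed manually in R.
Convert the analyzed data into a format that VDJtools recognizes. For the upstream analysis, see "Using MiXCR to build an immune repertoire and downstream analysis".
Build the metadata (grouping) file
The metadata file should list every sample name and the location of each sample file.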
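A minimal sketch of generating such a metadata file with Python. The header names `#file.name` and `sample.id`, the sample paths, and the factor values are assumptions for illustration; check the VDJtools documentation for the exact header convention before relying on it.

```python
import csv
import io


def build_metadata(samples, factor_names):
    """Return tab-delimited metadata text: file path, sample id, then factor columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumed header layout: file column, sample id column, then one column per factor.
    writer.writerow(["#file.name", "sample.id", *factor_names])
    for sample in samples:
        factors = [sample[name] for name in factor_names]
        writer.writerow([sample["path"], sample["id"], *factors])
    return buffer.getvalue()


# Hypothetical samples; replace the paths and factor values with your own.
samples = [
    {"path": "sample1.txt", "id": "sample1", "disease_state": "healthy", "Sex": "F"},
    {"path": "sample2.txt", "id": "sample2", "disease_state": "patient", "Sex": "M"},
]

metadata_text = build_metadata(samples, ["disease_state", "Sex"])
print(metadata_text)
# To save it: open("metadata.txt", "w", newline="").write(metadata_text)
```

The factor columns (`disease_state`, `Sex`) are the ones later passed to `-f` in the commands of this post.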
# convert
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar Convert -S mixcr -m metadata.txt output_prefix
# or
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar Convert -S mixcr sample1.txt sample2.txt ... output_prefix
# /path/to/vdjtools/: the VDJtools installation path
# output_prefix: output path prefix
The table after conversion
1.Basic analysis
1.1 CalcBasicStats
This routine computes a set of basic sample statistics, such as read counts, number of clonotypes, etc.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcBasicStats sample1.txt sample2.txt ... output_prefix
# or
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcBasicStats -m metadata.txt output_prefix
# /path/to/vdjtools/: the VDJtools installation path
# output_prefix: output path prefix
Tabular output
The following table with the .basicstats.txt suffix is generated:
Column | Description |
---|---|
sample_id | Sample unique identifier |
… | Metadata columns. See Metadata section |
count | Number of reads in a given sample |
diversity | Number of clonotypes in a given sample |
mean_frequency | Mean clonotype frequency |
geomean_frequency | Geometric mean of clonotype frequency |
nc_diversity | Number of non-coding clonotypes |
nc_frequency | Frequency of reads that belong to non-coding clonotypes |
mean_cdr3nt_length | Mean length of CDR3 nucleotide sequence. Weighted by clonotype frequency |
mean_insert_size | Mean number of inserted random nucleotides in CDR3 sequence. Characterizes V-J insert for receptor chains without D segment, or a sum of V-D and D-J insert sizes |
mean_ndn_size | Mean number of nucleotides that lie between V and J segment sequences in CDR3 |
convergence | Mean number of unique CDR3 nucleotide sequences that code for the same CDR3 amino acid sequence |
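To make a few of these columns concrete, here is a rough sketch that computes them from a toy clonotype table. It follows the column descriptions above and is not VDJtools' own implementation; the toy sequences and the `basic_stats` helper are made up for illustration.

```python
import math
from collections import defaultdict

# Toy clonotype table: (read count, CDR3 nucleotide sequence, CDR3 amino acid sequence).
clonotypes = [
    (6, "TGTGCCAGCAGTTTC", "CASSF"),
    (3, "TGTGCTAGCAGTTTC", "CASSF"),
    (1, "TGTGCCAGCAGCTTATTC", "CASSLF"),
]


def basic_stats(clonotypes):
    """Compute a subset of the basic statistics described in the table."""
    total_reads = sum(count for count, _, _ in clonotypes)
    frequencies = [count / total_reads for count, _, _ in clonotypes]

    # Group distinct nucleotide variants by the amino acid sequence they encode.
    nt_variants = defaultdict(set)
    for _, cdr3nt, cdr3aa in clonotypes:
        nt_variants[cdr3aa].add(cdr3nt)

    return {
        "count": total_reads,
        "diversity": len(clonotypes),
        "mean_frequency": sum(frequencies) / len(frequencies),
        "geomean_frequency": math.exp(
            sum(math.log(f) for f in frequencies) / len(frequencies)
        ),
        # CDR3 nucleotide length weighted by clonotype frequency.
        "mean_cdr3nt_length": sum(
            f * len(cdr3nt) for f, (_, cdr3nt, _) in zip(frequencies, clonotypes)
        ),
        # Mean number of distinct nucleotide variants per amino acid sequence.
        "convergence": sum(len(v) for v in nt_variants.values()) / len(nt_variants),
    }


stats = basic_stats(clonotypes)
for name, value in stats.items():
    print(name, value)
```

In this toy sample two nucleotide variants encode CASSF and one encodes CASSLF, which is what the convergence column summarizes.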
1.2 CalcSegmentUsage
This routine computes Variable (V) and Joining (J) segment usage vectors.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "disease_state" -m metadata.txt ./results/disease_state
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "Sex" -m metadata.txt ./results/Sex
#-p: plot; requires the R packages
#-f: the factor to group by; the grouping information comes from the metadata file
#--plot-type png: output PNG images
1.3 CalcSpectratype
Calculates spectratype, that is, histogram of read counts by CDR3 nucleotide length.
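java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcSpectratype -a -m metadata.txt output_prefix
#-a: Will use CDR3 amino acid sequences for calculation instead of nucleotide ones
aa: frequency distribution of CDR3 amino acid sequence lengths
insert: frequency distribution of the lengths of the nucleotides inserted at the V-J/V-D/D-J junctions of the CDR3 sequence
ndn: frequency distribution of the length of the nucleotide sequence lying between the V and J segments in the CDR3 sequence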
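The idea of a spectratype can be sketched in a few lines of Python: a histogram of read counts keyed by CDR3 length. This is only an illustration of the concept; the toy data, the `spectratype` helper, and its `use_aa` and `weighted` parameters are hypothetical names, not part of VDJtools.

```python
from collections import Counter

# Toy clonotype table: (read count, CDR3 nucleotide sequence, CDR3 amino acid sequence).
clonotypes = [
    (6, "TGTGCCAGCAGTTTC", "CASSF"),
    (3, "TGTGCTAGCAGTTTC", "CASSF"),
    (1, "TGTGCCAGCAGCTTATTC", "CASSLF"),
]


def spectratype(clonotypes, use_aa=False, weighted=True):
    """Histogram of CDR3 lengths.

    weighted=True sums read counts per length; weighted=False counts
    each clonotype once regardless of its read count.
    """
    histogram = Counter()
    for count, cdr3nt, cdr3aa in clonotypes:
        length = len(cdr3aa) if use_aa else len(cdr3nt)
        histogram[length] += count if weighted else 1
    return dict(sorted(histogram.items()))


nt_spectratype = spectratype(clonotypes)
aa_spectratype = spectratype(clonotypes, use_aa=True)
unweighted_spectratype = spectratype(clonotypes, weighted=False)

print("nucleotide:", nt_spectratype)
print("amino acid:", aa_spectratype)
print("unweighted:", unweighted_spectratype)
```

Switching `use_aa` mirrors the effect described for `-a`: the same reads are binned by amino acid length instead of nucleotide length.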
1.4 PlotFancySpectratype
Plots a spectratype that also displays CDR3 lengths for top N clonotypes in a given sample. This plot makes it possible to detect highly expanded clonotypes.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar PlotFancySpectratype -t 5 sample1.txt output_prefix
#-t: Number of top clonotypes to visualize. Should not exceed 20, default is 10
# works on a single sample
1.5 PlotFancyVJUsage
Plots a circos-style V-J usage plot displaying the frequency of various V-J junctions.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar PlotFancyVJUsage sample.txt output_prefix
# -u: Instead of counting read frequency, will count the number of unique clonotypes
1.6 PlotSpectratypeV
Plots a detailed spectratype that additionally displays the CDR3 length distribution for clonotypes from the top N Variable segment families. This plot is useful for detecting type 1 and type 2 repertoire biases, which can arise under pathological conditions.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar PlotSpectratypeV sample.txt output_prefix
# -u: Instead of counting read frequency, will count the number of unique clonotypes
# -t: Number of top (by frequency) V segments to visualize. Should not exceed 12, default is 12
2.Diversity estimation
2.1 PlotQuantileStats
Plots a three-layer donut chart to visualize the repertoire clonality.
• First layer (“set”) includes the frequency of singleton (“1”, met once), doubleton (“2”, met twice) and high-order (“3+”, met three or more times) clonotypes.
• The second layer (“quantile”) displays the abundance of top 20% (“Q1”), next 20% (“Q2”), ... (up to “Q5”) clonotypes for clonotypes from the “3+” set.
• The last layer (“top”) displays the individual abundances of top N clonotypes.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar PlotQuantileStats -t 10 sample.txt output_prefix
#-t: Number of top clonotypes to visualize. Should not exceed 10, default is 5
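The three layers can be approximated from a list of clonotype read counts, as in the sketch below. How VDJtools handles ties and uneven group sizes is not stated here, so the rank-based splitting rule, the `quantile_stats` helper, and the toy counts are assumptions for illustration only.

```python
def quantile_stats(counts, top_n=5, n_quantiles=5):
    """Return the (set, quantile, top) layers as fractions of all reads."""
    total = sum(counts)
    ordered = sorted(counts, reverse=True)

    # Layer 1: share of reads in singleton, doubleton and high-order clonotypes.
    set_layer = {
        "1": sum(c for c in ordered if c == 1) / total,
        "2": sum(c for c in ordered if c == 2) / total,
        "3+": sum(c for c in ordered if c >= 3) / total,
    }

    # Layer 2: split the "3+" clonotypes by rank into equal-sized groups
    # (assumed rule) and sum the read share of each group.
    high_order = [c for c in ordered if c >= 3]
    quantile_layer = {}
    for q in range(n_quantiles):
        start = q * len(high_order) // n_quantiles
        end = (q + 1) * len(high_order) // n_quantiles
        quantile_layer[f"Q{q + 1}"] = sum(high_order[start:end]) / total

    # Layer 3: individual read share of the top N clonotypes.
    top_layer = [c / total for c in ordered[:top_n]]
    return set_layer, quantile_layer, top_layer


# Toy read counts, one entry per clonotype (100 reads in total).
counts = [50, 20, 10, 5, 3, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1]
set_layer, quantile_layer, top_layer = quantile_stats(counts, top_n=3)

print(set_layer)
print(quantile_layer)
print(top_layer)
```

A sample dominated by a few expanded clonotypes shows a large “3+” share concentrated in Q1, while a diverse sample spreads more of its reads over singletons and doubletons.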
2.2 RarefactionPlot
Plots rarefaction curves for specified list of samples, that is, the dependencies between sample diversity and sample size.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar RarefactionPlot -m metadata.txt output_prefix
#-f: factor
Solid and dashed lines mark interpolated and extrapolated regions of rarefaction curves respectively, points mark exact sample size and diversity. Shaded areas mark 95% confidence intervals.
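For intuition about the interpolated part of a rarefaction curve, the sketch below computes the expected number of distinct clonotypes when n reads are drawn without replacement from a sample. It uses the classical hypergeometric rarefaction formula, which is a standard technique swapped in for illustration and not necessarily what VDJtools computes; extrapolation and confidence intervals are not covered. The `rarefied_diversity` helper and toy counts are hypothetical.

```python
from math import comb


def rarefied_diversity(counts, n):
    """Expected number of distinct clonotypes in a random subsample of n reads.

    For each clonotype with c reads out of N, the probability of missing it
    entirely is C(N - c, n) / C(N, n); the expectation sums the complements.
    """
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of reads")
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)


# Toy read counts, one entry per clonotype (10 reads in total).
counts = [5, 3, 1, 1]
curve = [rarefied_diversity(counts, n) for n in range(sum(counts) + 1)]

for n, expected in enumerate(curve):
    print(n, round(expected, 3))
```

The curve rises quickly at first and flattens as the subsample approaches the full sample size, where it reaches the observed diversity.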
2.3 CalcDiversityStats
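Estimates diversity and outputs two tables: one with diversity computed on the original data, the other with diversity extrapolated from the original data.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcDiversityStats -m metadata.txt output_prefix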
3.Repertoire overlap analysis
Clonotype sharing between samples
3.1 OverlapPair
Performs a comprehensive analysis of clonotype sharing for a pair of samples.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar OverlapPair -p --plot-area-v2 sample1.txt sample2.txt output_prefix
#-p: plot
#--plot-area-v2:Alternative plotting mode, clonotype CDR3 sequences are shown at plot sides and connected to corresponding areas with lines.
Overlap type
Shorthand | Rule | Note |
---|---|---|
strict | CDR3nt (AND) V (AND) J (AND) SHMs | Require full match for receptor nucleotide sequence |
nt | CDR3nt | |
ntV | CDR3nt (AND) V | |
ntVJ | CDR3nt (AND) V (AND) J | |
aa | CDR3aa | |
aaV | CDR3aa (AND) V | |
aaVJ | CDR3aa (AND) V (AND) J | |
aa!nt | CDR3aa (AND)((NOT) CDR3nt ) | Removes nearly all contamination bias from overlap results. Should not be used for samples from the same donor/tracking experiments |
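The matching rules in the table can be read as different comparison keys. The sketch below is one way to express them in Python; it omits the `strict` rule (which needs SHM information) and is not VDJtools' implementation. The field names, the toy clonotypes, and the `shared_keys` helper are assumptions for illustration.

```python
# Which clonotype fields make up the comparison key for each rule.
RULE_FIELDS = {
    "nt": ("cdr3nt",),
    "ntV": ("cdr3nt", "v"),
    "ntVJ": ("cdr3nt", "v", "j"),
    "aa": ("cdr3aa",),
    "aaV": ("cdr3aa", "v"),
    "aaVJ": ("cdr3aa", "v", "j"),
}


def overlap_key(clonotype, rule):
    """Build the tuple that two clonotypes must share to match under a rule."""
    return tuple(clonotype[field] for field in RULE_FIELDS[rule])


def shared_keys(sample_a, sample_b, rule):
    """Return the keys matched between two samples under the given rule."""
    if rule == "aa!nt":
        # Same amino acid sequence, but encoded by different nucleotide sequences.
        return {
            (a["cdr3aa"],)
            for a in sample_a
            for b in sample_b
            if a["cdr3aa"] == b["cdr3aa"] and a["cdr3nt"] != b["cdr3nt"]
        }
    keys_b = {overlap_key(b, rule) for b in sample_b}
    return {overlap_key(a, rule) for a in sample_a} & keys_b


# Toy samples with hypothetical segment assignments.
sample_a = [
    {"cdr3nt": "TGTGCCAGCAGTTTC", "cdr3aa": "CASSF", "v": "TRBV5-1", "j": "TRBJ1-1"},
    {"cdr3nt": "TGTGCCAGCAGCTTATTC", "cdr3aa": "CASSLF", "v": "TRBV7-2", "j": "TRBJ2-1"},
]
sample_b = [
    {"cdr3nt": "TGTGCTAGCAGTTTC", "cdr3aa": "CASSF", "v": "TRBV5-1", "j": "TRBJ1-1"},
    {"cdr3nt": "TGTGCCAGCAGCTTATTC", "cdr3aa": "CASSLF", "v": "TRBV7-2", "j": "TRBJ2-7"},
]

overlap_sizes = {
    rule: len(shared_keys(sample_a, sample_b, rule))
    for rule in ["nt", "ntV", "ntVJ", "aa", "aaV", "aaVJ", "aa!nt"]
}
print(overlap_sizes)
```

Stricter keys can only shrink the overlap, and `aa!nt` keeps only the convergent matches, which is why it is described as filtering out contamination between samples.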
Clonotype scatterplot. Main frame contains a scatterplot of clonotype abundances (overlapping clonotypes only) and a linear regression. Point size is scaled to the geometric mean of clonotype frequency in both samples. Scatterplot axes represent log10 clonotype frequencies in each sample. Two marginal histograms show the overlapping (red) and total clonotype (grey) abundance distributions in corresponding sample. Histograms are weighted by clonotype abundance, i.e. they display read distribution by clonotype size.
Shared clonotype abundance plot. Plot shows details for top 20 clonotypes shared between samples, as well as collapsed (“NotShown”) and non-overlapping (“NonOverlapping”) clonotypes. Clonotype CDR3 amino acid sequence is plotted against the sample where the clonotype reaches maximum abundance.
3.2 CalcPairwiseDistances
Performs an all-versus-all pairwise overlap for a list of samples and computes a set of repertoire similarity measures. At least 3 samples should be provided.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcPairwiseDistances -p [sample1.txt sample2.txt sample3.txt or -m metadata.txt] output_prefix
#-p: plot
Pairwise overlap circos plot. Count, frequency and diversity panels correspond to the read count, frequency (both non-symmetric) and the total number of clonotypes that are shared between samples. Pairwise overlaps are stacked, i.e. segment arc length is not equal to sample size.
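An all-versus-all comparison boils down to iterating over every unordered pair of samples. The sketch below uses the Jaccard index purely as a stand-in similarity measure; it is not one of the measures claimed for VDJtools, and the `pairwise_jaccard` helper and toy samples are hypothetical.

```python
from itertools import combinations


def pairwise_jaccard(samples):
    """Compute the Jaccard index for every unordered pair of samples.

    samples maps a sample name to the set of its clonotype keys
    (here, CDR3 amino acid sequences).
    """
    if len(samples) < 3:
        raise ValueError("at least 3 samples should be provided")
    result = {}
    for name_a, name_b in combinations(sorted(samples), 2):
        shared = samples[name_a] & samples[name_b]
        union = samples[name_a] | samples[name_b]
        result[(name_a, name_b)] = len(shared) / len(union) if union else 0.0
    return result


# Toy samples: each set holds the CDR3 amino acid sequences seen in that sample.
samples = {
    "s1": {"CASSF", "CASSLF", "CASSQF"},
    "s2": {"CASSF", "CASSLF"},
    "s3": {"CASSQF", "CASSYF"},
}

similarities = pairwise_jaccard(samples)
for pair, value in similarities.items():
    print(pair, round(value, 3))
```

With n samples this produces n(n-1)/2 values, which is the kind of pairwise table a clustering step can consume.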
3.3 ClusterSamples
Runs a cluster analysis using the text output of CalcPairwiseDistances as input.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar ClusterSamples -p input_prefix output_prefix
# input_prefix is the output_prefix that was passed to CalcPairwiseDistances (do not add a suffix)
#-p: plot
#-f: factor
#-n: Specifies if plotting factor is continuous
For example:
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar CalcPairwiseDistances -p e:/data/ -m metadata.txt e:/results/all
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar ClusterSamples -p -f "Sex" e:/results/all e:/results/Sex
Reference figure from the official documentation
3.4 TestClusters
This routine tests whether a given factor influences repertoire clustering. It assesses the compactness of samples that have the same factor level and the separation between samples with distinct factor levels, for the factor specified in ClusterSamples.
(This routine can only be used when -f was specified for ClusterSamples; it checks how the factor affects the clustering.)
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar TestClusters input_prefix output_prefix
Figure from the official documentation
3.5 TrackClonotypes
This routine performs an all-vs-all intersection between an ordered list of samples for clonotype tracking purposes. The user can specify the sample whose clonotypes will be traced, e.g. the pre-therapy sample.
java -jar /path/to/vdjtools/vdjtools-1.2.1.jar TrackClonotypes [options] [sample1.txt sample2.txt sample3.txt ... if -m is not specified] output_prefix
#-m:metadata
#-f:factor
#-p:plot
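Conceptually, tracking means following the clonotypes of one chosen sample through an ordered series of samples. The sketch below shows that idea on toy data; the `track_clonotypes` helper, its `tracked_index` parameter, and the sample names are hypothetical and do not reflect VDJtools' options or defaults.

```python
def track_clonotypes(ordered_samples, tracked_index=0):
    """Follow the clonotypes of one sample across an ordered list of samples.

    ordered_samples is a list of dicts mapping a clonotype key (here, the
    CDR3 amino acid sequence) to its frequency in that sample. Clonotypes
    absent from a sample get frequency 0.0.
    """
    tracked = ordered_samples[tracked_index]
    return {
        clonotype: [sample.get(clonotype, 0.0) for sample in ordered_samples]
        for clonotype in tracked
    }


# Toy time course: pre-therapy, week 4, week 8.
pre_therapy = {"CASSF": 0.6, "CASSLF": 0.4}
week_4 = {"CASSF": 0.2, "CASSQF": 0.8}
week_8 = {"CASSLF": 0.1, "CASSQF": 0.9}

tracks = track_clonotypes([pre_therapy, week_4, week_8], tracked_index=0)
for clonotype, frequencies in tracks.items():
    print(clonotype, frequencies)
```

Clonotypes that first appear after the tracked sample (CASSQF here) are left out, because only the clonotypes of the chosen sample are traced.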