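Yesterday my advisor sent me a paper from bioinformatics star Shirley Liu, and reading it got me pretty excited. The tool it describes, TRUST4, reconstructs immune repertoire information directly from bulk RNA-seq or scRNA-seq data, with no need for dedicated immune repertoire sequencing.
Chinese translation
Key points of the paper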
- Although less sensitive than TCR-seq and BCR-seq, TRUST is able to identify the abundantly expressed and potentially more clonally expanded TCRs/BCRs in the RNA-seq data that are more likely to be involved in antigen binding
- Recent years have also seen other computational methods introduced for immune repertoire construction from RNA-seq data, including V’DJer, MiXCR, CATT and ImRep. These methods focus on reconstruction of complementary-determining region 3 (CDR3), with limited ability to assemble full-length V(D)J receptor sequences, although CDR1 and CDR2 on the V sequence still contribute considerably to antigen recognition and binding.
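Compared with other reconstruction algorithms, TRUST4 stands out in these ways:
- It accepts either FASTQ or BAM files as input
- It can reconstruct longer, even full-length, TCR or BCR sequences
- It is faster and more sensitive
TRUST4 can also reconstruct repertoires from single-cell data, but today I mainly want to try it on bulk data.
1. Installation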
git clone https://github.com/liulab-dfci/TRUST4.git
cd TRUST4
make
# I tried adding TRUST4 to my PATH, but for some reason it kept failing,
# so I created a symlink to run-trust4 in the target folder instead
ln -s /home/user/myh/install/TRUST4/run-trust4 /home/user/myh/**/TRUST4_outs
cd /home/user/myh/**/TRUST4_outs
./run-trust4
# it works
2. Usage
Official usage:
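As an alternative to the symlink, here is a minimal sketch of putting the TRUST4 folder on PATH for the current shell. `add_to_path` is a hypothetical helper name of my own, and the install directory is just the one from the symlink command above, so adjust it to your setup.

```shell
# Hypothetical helper: append a directory to PATH only if it is not already there.
add_to_path() {
    dir="$1"
    case ":$PATH:" in
        *":$dir:"*) ;;              # already on PATH, nothing to do
        *) PATH="$PATH:$dir" ;;     # append to the end of PATH
    esac
    export PATH
}

# Example (assumed install location, taken from the symlink command above):
# add_to_path /home/user/myh/install/TRUST4
# run-trust4        # should now be found from any directory
```

This only lasts for the current session. To make it permanent, the usual approach is to put the equivalent `export PATH=...` line in `~/.bashrc` and open a new shell.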
Usage: ./run-trust4 [OPTIONS]
Required:
-b STRING: path to bam file
-1 STRING -2 STRING: path to paired-end read files
-u STRING: path to single-end read file
-f STRING: path to the fasta file coordinate and sequence of V/D/J/C genes
Optional:
--ref STRING: path to detailed V/D/J/C gene reference file, such as from IMGT database. (default: not used). (recommended)
-o STRING: prefix of output files. (default: inferred from file prefix)
--od STRING: the directory for output files. (default: ./)
-t INT: number of threads (default: 1)
--barcode STRING: if -b, bam field for barcode; if -1 -2/-u, file containing barcodes (default: not used)
--barcodeRange INT INT CHAR: start, end(-1 for length-1), strand in a barcode is the true barcode (default: 0 -1 +)
--barcodeWhitelist STRING: path to the barcode whitelist (default: not used)
--read1Range INT INT: start, end(-1 for length-1) in -1/-u files for genomic sequence (default: 0 -1)
--read2Range INT INT: start, end(-1 for length-1) in -2 files for genomic sequence (default: 0 -1)
--UMI STRING: if -b, bam field for UMI; if -1 -2/-u, file containing UMIs (default: not used)
--umiRange INT INT CHAR: start, end(-1 for length-1), strand in a UMI is the true UMI (default: 0 -1 +)
--mateIdSuffixLen INT: the suffix length in read id for mate. (default: not used)
--skipMateExtension: do not extend assemblies with mate information, useful for SMART-seq (default: not used)
--abnormalUnmapFlag: the flag in BAM for the unmapped read-pair is nonconcordant (default: not set)
--noExtraction: directly use the files from provided -1 -2/-u to assemble (default: extraction first)
--repseq: the data is from TCR-seq or BCR-seq (default: not set)
--outputReadAssignment: output read assignment results to the prefix_assign.out file (default: no output)
--stage INT: start TRUST4 on specified stage (default: 0)
0: start from beginning (candidate read extraction)
1: start from assembly
2: start from annotation
3: start from generating the report table
My data are from mouse, so I first tried a single sample (one pair of FASTQ files):
./run-trust4 -f /home/user/myh/install/TRUST4/mouse/GRCm38_bcrtcr.fa --ref /home/user/myh/install/TRUST4/mouse/mouse_IMGT+C.fa -1 /home/user/myh/raw_data/AEKIBULK/inputs/clean_data/KI_T/KIT11_1.clean.fq.gz -2 /home/user/myh/raw_data/AEKIBULK/inputs/clean_data/KI_T/KIT11_2.clean.fq.gz -o KIT11
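The number of threads can be adjusted with -t.
My samples were sorted into T cells and B cells, yet running TRUST4 on the T-cell data still reconstructs some BCRs. Hmm.
Note that TRUST4 does not create an output folder on its own: unless you set --od, all result files end up scattered in the current directory.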
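To keep each sample's results together, the `--od` and `-t` options from the usage text above can be combined. The sketch below only prints the command so it can be checked before running. `trust4_cmd` and the `TRUST4_DIR` variable are hypothetical names of mine, and the FASTQ naming (`<sample>_1.clean.fq.gz`, `<sample>_2.clean.fq.gz`) is assumed from my own files.

```shell
# Hypothetical helper: print a run-trust4 command that writes into its own folder.
# Arguments: sample name, FASTQ directory, output directory, thread count (default 1).
trust4_cmd() {
    sample="$1"; fq_dir="$2"; out_dir="$3"; threads="${4:-1}"
    # Assumed location of the mouse reference files shipped with TRUST4.
    ref_dir="${TRUST4_DIR:-/home/user/myh/install/TRUST4}/mouse"
    # printf reuses the '%s ' format for every argument, giving one space-separated line.
    printf '%s ' ./run-trust4 \
        -f "$ref_dir/GRCm38_bcrtcr.fa" \
        --ref "$ref_dir/mouse_IMGT+C.fa" \
        -1 "$fq_dir/${sample}_1.clean.fq.gz" \
        -2 "$fq_dir/${sample}_2.clean.fq.gz" \
        -o "$sample" --od "$out_dir" -t "$threads"
    printf '\n'
}

# Example:
# mkdir -p results/KIT11                      # create the folder up front to be safe
# trust4_cmd KIT11 /path/to/clean_data/KI_T results/KIT11 8
```

Printing instead of executing is a deliberate choice here: the paths are long, and it is easier to eyeball the line once before pasting it into the terminal.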
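XX_report.tsv contains the clonotype information and can be used directly with immunarch.
An AIRR file is generated as well, and it can also be analyzed with immunarch: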
- "airr" - adaptive immune receptor repertoire (AIRR) data format. http://docs.airr-community.org/en/latest/datarep/overview.html
For the T-cell results, I removed the BCR chains and then did the downstream analysis with immunarch.
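One way to drop the BCR chains is a small awk filter on the report file. This is a sketch, not the exact command I ran: `keep_tcr_rows` is a hypothetical name, and it assumes the report is tab-separated with one header line and the columns count, frequency, CDR3nt, CDR3aa, V, D, J, C, so check your own file first.

```shell
# Keep the header line plus rows whose V, J or C gene is a T-cell receptor gene
# (names starting with TR, e.g. TRAV, TRBJ, TRBC).
# Assumed column layout: count, frequency, CDR3nt, CDR3aa, V, D, J, C, ...
keep_tcr_rows() {
    awk -F'\t' 'NR == 1 || $5 ~ /^TR/ || $7 ~ /^TR/ || $8 ~ /^TR/' "$1"
}

# Example:
# keep_tcr_rows KIT11_report.tsv > KIT11_report.TCR.tsv
```

The frequency column is left untouched, so after filtering it still refers to the full TCR plus BCR table.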
A supplementary note on analysis with VDJtools
After downloading VDJtools, the steps below follow the reference documentation.
1. Basic analysis
1.1 CalcBasicStats
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcBasicStats -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
# /path to vdjtools/: the VDJtools installation path
# output_prefix: the output path
VDJtools format
Note: rows whose CDR3aa is out_of_frame must be removed, otherwise VDJtools cannot parse the file.
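A minimal sketch of that cleanup step: drop the out_of_frame rows and keep only the first seven columns under a VDJtools-style header. `to_vdjtools` is a hypothetical name, the input column order (count, frequency, CDR3nt, CDR3aa, V, D, J) is assumed from the TRUST4 report, and the header names are my reading of the VDJtools format, so verify both against your files.

```shell
# Convert a TRUST4-style report into a 7-column table for VDJtools,
# skipping clonotypes whose CDR3aa field is "out_of_frame".
to_vdjtools() {
    awk -F'\t' -v OFS='\t' '
        NR == 1 { print "count", "freq", "cdr3nt", "cdr3aa", "v", "d", "j"; next }  # replace header
        $4 == "out_of_frame" { next }                                              # drop unusable rows
        { print $1, $2, $3, $4, $5, $6, $7 }                                       # keep first 7 columns
    ' "$1"
}

# Example:
# to_vdjtools AE_T_5_report.tsv > AE_T_5.txt
```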
1.2 CalcSegmentUsage
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcSegmentUsage -p -f "group" -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
# -p: plot the results; depends on R packages
# -f: the factor to group by; the grouping information comes from the metadata file
# --plot-type png: output PNG images
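Since `-m` and `-f "group"` both rely on the metadata file, here is a sketch of how such a file could be generated. It assumes a tab-separated table with a header line, file path and sample id in the first two columns, and a `group` column after that. The helper names, the `#file.name` / `sample.id` header spelling and the group labels in the example are my assumptions, so check them against the VDJtools documentation.

```shell
# Print the metadata header once (assumed column names).
metadata_header() {
    printf '#file.name\tsample.id\tgroup\n'
}

# Print one row per clonotype file, all tagged with the same group label.
# Arguments: group label, then one or more .txt files.
metadata_rows() {
    group="$1"; shift
    for f in "$@"; do
        id=$(basename "$f" .txt)      # sample id = file name without .txt
        printf '%s\t%s\t%s\n' "$f" "$id" "$group"
    done
}

# Example (hypothetical group labels AE and KI):
# { metadata_header
#   metadata_rows AE inputs/AE_T_*.txt
#   metadata_rows KI inputs/KI_T_*.txt
# } > inputs/metadata.txt
```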
1.3 CalcSpectratype
Calculates spectratype, that is, histogram of read counts by CDR3 nucleotide length.
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcSpectratype -a -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
#-a :Will use CDR3 amino acid sequences for calculation instead of nucleotide ones
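To make the definition concrete, the same kind of histogram can be sketched by hand with awk: sum the read counts of all clonotypes that share a CDR3 nucleotide length. This is only an illustration of the idea, not what VDJtools does internally. `spectratype` is a hypothetical name, and the input is assumed to be tab-separated with one header line, the count in column 1 and the CDR3 nucleotide sequence in column 3.

```shell
# Histogram of read counts by CDR3 nucleotide length.
# Output: one "length<TAB>reads" line per length, sorted by length.
spectratype() {
    awk -F'\t' '
        NR > 1 { reads[length($3)] += $1 }                     # skip header, add count to its length bin
        END    { for (len in reads) print len "\t" reads[len] }
    ' "$1" | sort -n
}

# Example:
# spectratype AE_T_5.txt
```

Switching `$3` to `$4` would bin by amino acid length instead, which is the idea behind the `-a` flag.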
1.4 PlotFancySpectratype
Plots a spectratype that also displays CDR3 lengths for top N clonotypes in a given sample. This plot allows to detect the highly-expanded clonotypes.
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar PlotFancySpectratype -t 5 /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/AE_T_5.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
#-t:Number of top clonotypes to visualize. Should not exceed 20, default is 10
# runs on a single sample
The next command did not produce any output for me, and I have not figured out why:
java -jar /home/user/myh/install/VDJtools/vdjtools-1.2.1/vdjtools-1.2.1.jar CalcPairwiseDistances -p -m /home/user/myh/raw_data/AEKIBULK/vdjtools/inputs/metadata.txt /home/user/myh/raw_data/AEKIBULK/vdjtools/outs
#-p: plot
For single-cell data:
./run-trust4 -b /home/user/myh/raw_data/***/possorted_genome_bam.bam -f /home/user/myh/install/TRUST4/human/hg38_bcrtcr.fa --ref /home/user/myh/install/TRUST4/human/human_IMGT+C.fa --barcode CB -o XXX