HLAscan: genotyping of the HLA region using next-generation sequencing data
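Original paper: https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-017-1671-3
Software:
https://github.com/SyntekabioTools/HLAscan/releases/
I downloaded the precompiled package directly; hla_scan_r_v2.1.4 runs as-is.
Download the dataset from https://github.com/SyntekabioTools/HLAscan and place it in the database directory.
Published in 2017.
Works with WGS and WES data.
Step 1: align reads to the IMGT/HLA database.
Step 2: score the candidate alleles with a score function.
False-positive alleles are then removed to determine the correct type.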
HLAscan performs alignment of reads to HLA sequences from the international ImMunoGeneTics project/human leukocyte antigen (IMGT/HLA) database.
The distribution of aligned reads was used to calculate a score function to determine correctly phased alleles by progressively removing false-positive alleles.
Comparative HLA typing tests using public datasets from the 1000 Genomes Project and the International HapMap Project demonstrated that HLAscan could perform HLA typing more accurately than previously reported NGS-based methods such as HLAreporter and PHLAT.
In addition, the results of HLA-A, -B, and -DRB1 typing by HLAscan using data generated by NextGen were identical to those obtained using a Sanger sequencing–based method.
We also applied HLAscan to a family dataset with various coverage depths generated on the Illumina HiSeq X-TEN platform.
HLAscan identified allele types of HLA-A, -B, -C, -DQB1, and -DRB1 with 100% accuracy for sequences at ≥ 90× depth, and the overall accuracy was 96.9%.
Usage (examples collected from others):
./hla_scan_r_v2.1.2 -t 6 -l R1.fastq.gz.paired.fq -r R2.fastq.gz.paired.fq -d data-set/db/HLA-ALL.IMGT -g HLA-A HLA-B HLA-C &>>eo &
./hla_scan_r_v2.0 -l normal_wes_R1.fq -r normal_wes_R2.fq -d ./db/HLA-ALL.IMGT -v 37 -g HLA-E -t 2 > HLA-E.log
/hla_scan -l NS_L001_R1_001.fastq.gz -d A_gen.fasta
hla_scan_r_v2.1.2 \
-b NA12155.chr6.bam \
-d HLA-ALL.IMGT \
-v 38
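-b specifies the input BAM file,
-d specifies the database name (the database ships with the software),
-v specifies the genome build, here hg38.
In my tests, compressed FASTQ files caused an error, so decompress them beforehand.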
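Since compressed FASTQ input failed in testing, the files can be unpacked ahead of time. Below is a minimal sketch using only the Python standard library; the function name `gunzip_fastq` and the choice to write the output next to the input are assumptions for illustration, not part of HLAscan.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_fastq(gz_path, out_dir=None):
    """Decompress one .gz FASTQ file and return the path of the plain copy.

    Hypothetical helper (not part of HLAscan). Files that do not end in
    ".gz" are returned unchanged.
    """
    gz_path = Path(gz_path)
    if gz_path.suffix != ".gz":
        return gz_path  # assumed to be uncompressed already
    out_dir = Path(out_dir) if out_dir else gz_path.parent
    out_path = out_dir / gz_path.stem  # "x.fastq.gz" -> "x.fastq"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # copies in chunks, so large files are fine
    return out_path
```

The returned paths can then be passed to the -l and -r options. The original .gz file is left in place, so disk usage grows accordingly.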
hla_scan -t 60 \
-l CP60005015-C01-WES_R1.fastq \
-r CP60005015-C01-WES_R2.fastq \
-d /database/imgt-hla/db/HLA-ALL.IMGT \
-g HLA-A HLA-B HLA-C HLA-E HLA-F HLA-G MICA MICB HLA-DMA HLA-DMB HLA-DOA HLA-DOB HLA-DPA1 HLA-DPB1 HLA-DQA1 HLA-DQB1 HLA-DRA HLA-DRB1 HLA-DRB5 TAP1 TAP2
When the input is a BAM file, the -v option must be given (38 or 37).
By default the software types the HLA-A gene. The full list of supported genes is:
1. HLA-A
2. HLA-B
3. HLA-C
4. HLA-E
5. HLA-F
6. HLA-G
7. MICA
8. MICB
9. HLA-DMA
10. HLA-DMB
11. HLA-DOA
12. HLA-DOB
13. HLA-DPA1
14. HLA-DPB1
15. HLA-DQA1
16. HLA-DQB1
17. HLA-DRA
18. HLA-DRB1
19. HLA-DRB5
20. TAP1
21. TAP2
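The gene list above can be turned into one command per gene, in the style of the HLA-E example earlier in these notes (one gene per run, one log file per gene). The following is only a sketch: the function names and the `<gene>.log` naming are assumptions, and it uses only the options that appear in these notes (-l, -r, -d, -g, -t). For BAM input, -b and -v would be needed instead of -l and -r.

```python
import subprocess

# Gene names exactly as listed in the notes above.
GENES = [
    "HLA-A", "HLA-B", "HLA-C", "HLA-E", "HLA-F", "HLA-G", "MICA", "MICB",
    "HLA-DMA", "HLA-DMB", "HLA-DOA", "HLA-DOB", "HLA-DPA1", "HLA-DPB1",
    "HLA-DQA1", "HLA-DQB1", "HLA-DRA", "HLA-DRB1", "HLA-DRB5", "TAP1", "TAP2",
]


def build_fastq_commands(binary, r1, r2, db, genes=GENES, threads=2):
    """Return one (argv, log_name) pair per gene for paired-end FASTQ input."""
    jobs = []
    for gene in genes:
        argv = [binary, "-l", r1, "-r", r2, "-d", db,
                "-g", gene, "-t", str(threads)]
        jobs.append((argv, f"{gene}.log"))  # log naming is an assumption
    return jobs


def run_jobs(jobs):
    """Run each command in turn, redirecting stdout to its log file."""
    for argv, log_name in jobs:
        with open(log_name, "w") as log:
            # The exit status is not checked because HLAscan's exit codes
            # are not documented in these notes.
            subprocess.run(argv, stdout=log)
```

For example, `run_jobs(build_fastq_commands("./hla_scan_r_v2.1.4", "R1.fq", "R2.fq", "./db/HLA-ALL.IMGT"))` would produce 21 log files, one per gene.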
The software also accepts FASTQ input directly; see the official site for detailed usage.
The following log information is shown during the run:
=====================================================
HLAscan v2.1
Report created
2018. 7. 23. 11:25:32
========================================================
HLA gene : HLA-A
# of considered types : 3182
----------- HLA-Types -----------
[Type 1] 01:11N EX3_3.29348_45 EX2_3.75926_100 EX4_24.0471_100 EX5_35.1966_100
[Type 2] 01:11N EX3_3.29348_45 EX2_3.75926_100 EX4_24.0471_100 EX5_35.1966_100
As shown, the genotyping result for the HLA-A gene is HLA-A*01:11N.
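Reading the gene and the two called types out of such a log is easy to automate. Here is a minimal sketch that assumes the log always looks like the sample above (an "HLA gene : ..." line followed by "[Type 1]" and "[Type 2]" lines). The function name is hypothetical, and the per-exon fields after the allele name are ignored because their meaning is not described in these notes.

```python
import re

GENE_RE = re.compile(r"^HLA gene\s*:\s*(\S+)")
TYPE_RE = re.compile(r"^\[Type (\d+)\]\s+(\S+)")


def parse_hlascan_log(text):
    """Return (gene, [type 1, type 2]) from the text of one HLAscan log.

    Missing entries are returned as None.
    """
    gene = None
    alleles = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = GENE_RE.match(line)
        if match:
            gene = match.group(1)
            continue
        match = TYPE_RE.match(line)
        if match:
            # Keep only the allele name, i.e. the first field after "[Type N]".
            alleles[int(match.group(1))] = match.group(2)
    return gene, [alleles.get(1), alleles.get(2)]
```

Applied to the sample log above, this returns the gene HLA-A with 01:11N for both types.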
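Result summary script:
This script is very handy: it converts a pile of HLAscan result txt files into a single file that gathers all the HLA information in one place.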
python3 merge_HLA_result.py ./*.txt
Output:
HLA-A-1 11:01:01:02
HLA-A-2 11:01:53
HLA-B-1 46:01:01
HLA-B-2 13:01:01
HLA-C-1 01:02:30
HLA-C-2 03:04:01:01
HLA-E-1 01:03:01:01
HLA-E-2 01:03:01:01
HLA-F-1 01:01:02:02
HLA-F-2 01:01:02:02
HLA-G-1 01:01:03:03
HLA-G-2 01:01:01:03
MICA-1 010:01
MICA-2 010:01
MICB-1 NULL
MICB-2 NULL
HLA-DMA-1 01:01:01:01
HLA-DMA-2 01:02
HLA-DMB-1 01:01:01:01
HLA-DMB-2 01:01:01:01
HLA-DOA-1 01:01:01
HLA-DOA-2 01:01:01
HLA-DOB-1 01:01:01:01
HLA-DOB-2 01:01:01:01
HLA-DPA1-1 02:01:01
HLA-DPA1-2 02:02:02
HLA-DPB1-1 05:01:01
HLA-DPB1-2 14:01:01
HLA-DQA1-1 01:02:01:02
HLA-DQA1-2 03:03:01
HLA-DQB1-1 06:01:15
HLA-DQB1-2 03:03:02:01
HLA-DRA-1 01:01:01:03
HLA-DRA-2 01:01:01:03
HLA-DRB1-1 09:01:02
HLA-DRB1-2 15:01:01:04
HLA-DRB5-1 01:01:01
HLA-DRB5-2 01:01:01
TAP1-1 01:01:01:02
TAP1-2 01:01:01:02
TAP2-1 01:01:03:03
TAP2-2 02:01:02:02
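The merged table is convenient to post-process further. The sketch below groups the rows into one genotype per gene. It assumes each row is a "GENE-N" label followed by whitespace and an allele, as in the output above, and it treats NULL as "no call", which is my reading rather than documented behaviour. Both function names are hypothetical, and this is not the merge_HLA_result.py script itself.

```python
def group_genotypes(text):
    """Map gene -> [allele 1, allele 2] from 'GENE-N ALLELE' rows."""
    genotypes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        label, allele = fields
        # "HLA-DPA1-2" -> gene "HLA-DPA1", index "2"
        gene, _, index = label.rpartition("-")
        if not gene or index not in ("1", "2"):
            continue
        pair = genotypes.setdefault(gene, [None, None])
        pair[int(index) - 1] = None if allele == "NULL" else allele
    return genotypes


def describe(gene, pair):
    """Format one genotype, e.g. 'HLA-A*11:01:01:02 / HLA-A*11:01:53'."""
    called = [allele for allele in pair if allele is not None]
    if not called:
        return f"{gene}: no call"
    return " / ".join(f"{gene}*{allele}" for allele in called)
```

With the table above, `describe` would give "HLA-A*11:01:01:02 / HLA-A*11:01:53" for HLA-A and "MICB: no call" for MICB.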
Other HLA typing tools:
HLA typing for 10x single-cell RNA-seq data: scHLAcount
For WES data, types only HLA-A, -B, and -C: OptiType
Still being updated as of 2022: HLA-HD