NGMLR, a Genome Aligner, and Sniffles, a Structural Variant Caller

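Preface

Structural variation in the genome is an important contributing factor in many cancers, inherited disorders, and other diseases. Detecting structural variants with second-generation (short-read) sequencing has serious limitations, while third-generation (long-read) sequencing brings problems of its own, such as a higher error rate, and most software performs poorly on complex structural variants in particular. To address this, researchers developed the genome aligner NGMLR and the structural variant caller Sniffles. Their authors describe the sensitivity and precision of the two tools as unprecedented (Sedlazeck et al., 2018); the tools also automatically filter out false events and work on low-coverage data, which reduces sequencing cost.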

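Introduction

NGMLR and Sniffles are tools for detecting structural variants from long-read sequencing data. NGMLR, the aligner, builds on a short-read alignment method and is adapted to the kind of data produced by the PacBio and Oxford Nanopore platforms. Sniffles, the variant caller, scans the resulting alignments to detect structural variants.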


Figure: main steps of NGMLR (left) and Sniffles (right)

NGMLR

Installation

Installing with conda from the bioconda channel is recommended:

conda install -c bioconda ngmlr

Usage

For PacBio data:

ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam

For Oxford Nanopore data:

ngmlr -t 4 -r reference.fasta -q reads.fastq -o test.sam -x ont
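
Sniffles, covered further down, reads a sorted BAM file, whereas the commands above write SAM. The following is a minimal sketch of the conversion, assuming samtools is installed (it is not mentioned in the original text). The file names and the `run` helper with its `DRY_RUN` variable are illustrative, not part of NGMLR. By default the script only prints the commands so they can be checked first; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
set -eu

# Hypothetical inputs; replace with your own paths.
REF="reference.fasta"
READS="reads.fastq"
PREFIX="mapped"
THREADS=4

# Print each command; execute it only when DRY_RUN=0 is set.
run() {
    printf '+ %s\n' "$*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# 1. Align the reads (add "-x ont" for Oxford Nanopore data).
run ngmlr -t "$THREADS" -r "$REF" -q "$READS" -o "$PREFIX.sam"
# 2. Coordinate-sort the alignments and write them as BAM.
run samtools sort -@ "$THREADS" -o "$PREFIX.sort.bam" "$PREFIX.sam"
# 3. Index the sorted BAM file.
run samtools index "$PREFIX.sort.bam"
```

Saved as, say, `align.sh`, the sketch can be previewed with `sh align.sh` and executed with `DRY_RUN=0 sh align.sh`.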

Parameters

Usage: ngmlr [options] -r <reference> -q <reads> [-o <output>]

Input/output options:
    -r <file>,  --reference <file>
        (required)  Path to the reference genome (FASTA/Q, can be gzipped)
    -q <file>,  --query <file>
        Path to the read file (FASTA/Q) [/dev/stdin]
    -o <string>,  --output <string>
        Path to output file [stdout]
    --skip-write
        Don't write reference index to disk [false]
    --bam-fix
        Report reads with > 64k CIGAR operations as unmapped. Required to be compatible with the BAM format [false]
    --rg-id <string>
        Adds RG:Z:<string> to all alignments in SAM/BAM [none]
    --rg-sm <string>
        RG header: Sample [none]
    --rg-lb <string>
        RG header: Library [none]
    --rg-pl <string>
        RG header: Platform [none]
    --rg-ds <string>
        RG header: Description [none]
    --rg-dt <string>
        RG header: Date (format: YYYY-MM-DD) [none]
    --rg-pu <string>
        RG header: Platform unit [none]
    --rg-pi <string>
        RG header: Median insert size [none]
    --rg-pg <string>
        RG header: Programs [none]
    --rg-cn <string>
        RG header: Sequencing center [none]
    --rg-fo <string>
        RG header: Flow order [none]
    --rg-ks <string>
        RG header: Key sequence [none]

General options:
    -t <int>,  --threads <int>
        Number of threads [1]
    -x <pacbio, ont>,  --presets <pacbio, ont>
        Parameter presets for different sequencing technologies [pacbio]
    -i <0-1>,  --min-identity <0-1>
        Alignments with an identity lower than this threshold will be discarded [0.65]
    -R <int/float>,  --min-residues <int/float>
        Alignments containing fewer than <int> or (<float> * read length) residues will be discarded [0.25]
    --no-smallinv
        Don't detect small inversions [false]
    --no-lowqualitysplit
        Split alignments with poor quality [false]
    --verbose
        Debug output [false]
    --no-progress
        Don't print progress info while mapping [false]

Advanced options:
    --match <float>
        Match score [2]
    --mismatch <float>
        Mismatch score [-5]
    --gap-open <float>
        Gap open score [-5]
    --gap-extend-max <float>
        Gap open extend max [-5]
    --gap-extend-min <float>
        Gap open extend min [-1]
    --gap-decay <float>
        Gap extend decay [0.15]
    -k <10-15>,  --kmer-length <10-15>
        K-mer length in bases [13]
    --kmer-skip <int>
        Number of k-mers to skip when building the lookup table from the reference [2]
    --bin-size <int>
        Sets the size of the grid used during candidate search [4]
    --max-segments <int>
        Max number of segments allowed for a read per kb [1]
    --subread-length <int>
        Length of fragments reads are split into [256]
    --subread-corridor <int>
        Length of corridor sub-reads are aligned with [40]
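
Several of the options listed above can be combined in a single call. As a sketch, the hypothetical helper `map_sample` below selects the preset, raises the identity threshold, enables `--bam-fix`, and attaches read-group tags. The function name, the `NGMLR_BIN` variable, the file names, and the values 8 and 0.75 are illustrative assumptions, not recommendations from the NGMLR authors. The helper only echoes the command unless `NGMLR_BIN` is set.

```shell
#!/bin/sh
# map_sample PRESET SAMPLE READS
# Echo an ngmlr call by default; set NGMLR_BIN=ngmlr to execute it instead.
map_sample() {
    preset=$1
    sample=$2
    reads=$3
    # Left unquoted on purpose: the default "echo ngmlr" must split into two words.
    ${NGMLR_BIN:-echo ngmlr} -t 8 -x "$preset" -i 0.75 --bam-fix \
        --rg-id "$sample" --rg-sm "$sample" \
        -r reference.fasta -q "$reads" -o "$sample.sam"
}

# Oxford Nanopore reads of a hypothetical sample "sampleA".
map_sample ont sampleA reads.fastq
```

Using the sample name for both `--rg-id` and `--rg-sm` is just the simplest choice for one run per sample; with several runs per sample the two would differ.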

Sniffles

Installation

Installing with conda from the bioconda channel is recommended:

conda install -c bioconda sniffles

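Usage

sniffles -m mapped.sort.bam -v output.vcf

mapped.sort.bam is a sorted BAM file of alignments produced by NGMLR or BWA. If the alignments come from BWA, run bwa mem with the -M option so that primary and secondary alignments are marked. The -m and -v options shown here belong to Sniffles 1.x; newer releases (Sniffles2) use a different command line, so check sniffles --help for the installed version.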
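
For the BWA route, the steps could look like the sketch below. It assumes bwa and samtools are installed, uses the `-x pacbio` preset of bwa mem as an example, and uses the Sniffles 1.x options from the command above. The function name `bwa_workflow` and the file names are hypothetical. The function only prints the commands; pipe its output to `sh -e` to run them. File names must not contain spaces, because they are pasted into the commands unquoted.

```shell
#!/bin/sh
# bwa_workflow REF READS PREFIX
# Print the commands for aligning with BWA and calling variants with Sniffles.
bwa_workflow() {
    ref=$1
    reads=$2
    prefix=$3
    cat <<EOF
bwa index ${ref}
bwa mem -M -x pacbio -t 4 ${ref} ${reads} | samtools sort -o ${prefix}.sort.bam -
samtools index ${prefix}.sort.bam
sniffles -m ${prefix}.sort.bam -v ${prefix}.vcf
EOF
}

# Preview the workflow; append "| sh -e" to execute it.
bwa_workflow reference.fasta reads.fastq mapped
```

Printing the commands first makes it easy to check that `-M` is present before a long alignment job is started.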

References

  • Sedlazeck F J, Rescheneder P, Smolka M, et al. Accurate detection of complex structural variations using single-molecule sequencing[J]. Nature Methods, 2018.
  • Sniffles
  • NGMLR
