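Pisces Variant Calling

Pisces calls both somatic and germline variants; it is recommended for tumor-only variant calling.

Advantages:

    Detects low-frequency variants and runs on both Linux and Windows.

Description:

Pisces was originally developed for tumor-only samples and can detect SNVs, MNVs, and small indels. Input is BAM; output is VCF or gVCF. It can also be used for tumor/normal analysis, but germline variants then have to be filtered out, which Illumina's Strelka can do. The package contains the following executables: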

Stitcher - Stitches two paired reads together into a single read

Pisces - Calls small variants

Scylla - Detects multiple nucleotide variants (MNVs) in a given sample and phases the variants in complex regions into subpopulations

VariantQualityRecalibration - Recalibrates the variant quality scores (Q scores) if the particular variants are overrepresented

Quick start:

(1) Download:

  https://github.com/Illumina/Pisces

    Set up Microsoft .NET Core 2.0 or later; the release can then be run directly.
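
Before running any of the DLLs it can help to confirm that the dotnet host is on PATH. A minimal sketch; have_dotnet is a hypothetical helper name, and the check only looks for the executable, not for a particular runtime version.

```shell
# Return success if a "dotnet" executable can be found on PATH.
# (have_dotnet is an illustrative helper, not part of Pisces or .NET.)
have_dotnet() {
    command -v dotnet >/dev/null 2>&1
}

if have_dotnet; then
    echo "dotnet found at: $(command -v dotnet)"
else
    echo "dotnet not found; install .NET Core 2.0 or later first" >&2
fi
```

If the host is found, `dotnet --info` lists the installed runtimes so the version can be checked by eye.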

(2) Version:

      v5.2.5

(3) Genome index

    dotnet CreateGenomeSizeFile.dll ***

    REQUIRED:

      -g                 FOLDER Genome folder. Example folder structure:
                               \\Genomes\Homo_sapiens\UCSC\hg19\Sequence\WholeGenomeFASTA

      -s                 STRING Species and build, in quotes. Example
                               format: Genus Species (Source Build), e.g.
                               "Rattus norvegicus (UCSC rn4)"

    COMMON:

      -o, --out, --outfolder
                             FOLDER output directory


Suggested parameters:

     dotnet CreateGenomeSizeFile.dll -g Reference_Genome/hg19/ -s "Homo sapiens (UCSC hg19)" -o Reference_Genome/hg19/
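
The -s value follows the "Genus Species (Source Build)" format from the help text, and it is easy to leave a stale build name in it. The sketch below assembles the label from its parts and prints the indexing command for review instead of running it. species_label is a hypothetical helper, and the folder is the example path used in this post.

```shell
# Format the -s label as: Genus species (Source Build)
# (species_label is an illustrative helper, not part of Pisces.)
species_label() {
    printf '%s (%s %s)\n' "$1" "$2" "$3"
}

genome_dir=Reference_Genome/hg19/
label=$(species_label "Homo sapiens" UCSC hg19)

# Print the command only; nothing is executed here.
echo "dotnet CreateGenomeSizeFile.dll -g $genome_dir -s \"$label\" -o $genome_dir"
```

Keeping the build name in one variable means the label and the genome folder are less likely to drift apart when switching references.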


(4) Variant calling

     dotnet Pisces.dll ***

Suggested parameters:

    a. Somatic:

         -bam {Bam} -CallMNVs false -g {genome folder} -gVCF false -i {interval file} -OutFolder {outfolder}

    b. Germline:

         -bam {Bam} -CallMNVs false -crushvcf true -g {genome folder} -gVCF false -i {interval file} -ploidy diploid -OutFolder {outfolder}

    c. Ultra-low frequency:

         -bam {Bam} -g {genome folder} -OutFolder {outfolder} -MinVF 0.0005 -SSFilter false -MinBQ 65 -MaxVQ 100 -MinDepthFilter 500 -MinVQ 0 -VQFilter 20 -ReportNoCalls True -CallMNVs False -MinDepth 5 -threadbychr true

    d. High speed:

         -bam {Bam} -CallMNVs false -g {genome folder} -gVCF false -OutFolder {outfolder} -ThreadByChr True
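
The presets above differ only in a handful of flags, so they can be wrapped in a small helper that fills in the placeholders. This is a rough sketch that prints the command rather than running it. pisces_cmd and the preset names somatic, germline, and fast are made up for illustration; the flags themselves are copied from the presets above (the ultra-low-frequency preset is left out for brevity). Paths containing spaces are not handled.

```shell
# Print (do not run) a Pisces command line for one of the presets.
# Usage: pisces_cmd PRESET BAM GENOME_FOLDER OUT_FOLDER [INTERVAL_FILE]
# (pisces_cmd and the preset names are illustrative, not part of Pisces.)
pisces_cmd() {
    preset=$1
    bam=$2
    genome=$3
    out=$4
    interval=${5:-}

    base="dotnet Pisces.dll -bam $bam -g $genome -OutFolder $out"
    case $preset in
        somatic)  opts="-CallMNVs false -gVCF false" ;;
        germline) opts="-CallMNVs false -crushvcf true -gVCF false -ploidy diploid" ;;
        fast)     opts="-CallMNVs false -gVCF false -ThreadByChr true" ;;
        *)        echo "unknown preset: $preset" >&2; return 1 ;;
    esac

    # The interval file is appended only when one is supplied.
    if [ -n "$interval" ]; then
        opts="$opts -i $interval"
    fi
    echo "$base $opts"
}

pisces_cmd somatic tumor.bam Reference_Genome/hg19 results panel.txt
```

Printing the command first makes it easy to check the flags before committing to a long run.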

(5) Parameter reference

-ver/-v: Print version.

-MinVariantQScore / -MinVQ: MinimumVariantQScore to report variant

    Minimum variant Q score.

-MinBaseCallQuality / -MinBQ: MinimumBaseCallQuality to use a base of the read

    Minimum base-call quality for a read base to be used.

-BamPaths / -Bam: BAMPath(s), single value or comma delimited list

    Path to the BAM file; separate multiple BAMs with commas.

-MinDepth / -MinDP: Minimum depth to call a variant

    Minimum depth threshold.

-MinimumFrequency / -MinVF: MinimumFrequency to call a variant

    Minimum variant frequency threshold.

-TargetLODFrequency / -TargetVF: Target Frequency to call a variant. I.e., to target a 5% allele frequency, we must call down to 2.6%, to capture that 5% allele 95% of the time. This parameter is used by the Somatic Genotyping Model


-EnableSingleStrandFilter / -SSFilter: Flag variants as filtered if coverage limited to one strand

    Filter variants supported on only one strand.

-VariantQualityFilter / -VQFilter: FilteredVariantQScore to report variant as filtered

    Variant Q-score filter.

-MinVariantFrequencyFilter / -VFFilter: FilteredVariantFrequency to report variant as filtered

    Variant frequency filter.

-RepeatFilter: FilteredIndelRepeats to report variant as filtered

    Repeat filter.

-MinDepthFilter / -MinDPFilter: FilteredLowDepth to report variant as filtered

    Low-depth variant filter.

-IntervalPaths / -I: IntervalPath(s), single value or comma delimited list corresponding to BAMPath(s). At most one value should be provided if BAM folder is specified

    Interval file path.

-MinMapQuality / -MinMQ: MinimumMapQuality required to use a read

    Mapping-quality threshold for reads.

-GenomePaths / -G: GenomePath(s), single value or comma delimited list corresponding to BAMPath(s). Must be single value if BAM folder is specified

    Reference genome path.

-OutputSBFiles: Output strand bias files, 'true' or 'false'

    Whether to output strand-bias files.

-OnlyUseProperPairs / -PP: Only use proper pairs, 'true' or 'false'

    Use only properly paired reads.

-MaxVariantQScore / -MaxVQ: MaximumVariantQScore to cap output variant Qscores

    Maximum variant Q score.

-MaxAcceptableStrandBiasFilter / -SBFilter: Strand bias cutoff

    Strand-bias threshold.

-MaxNumThreads / -t: ThreadCount

    Number of threads.

-ThreadByChr: Thread by chr. More memory intensive.  This will temporarily create output per chr.

    Parallelize by chromosome.

-gVCF: Output gVCF files, 'true' or 'false'

    Whether to output gVCF files.

-CallMNVs: Call MNVs (a.k.a. phased SNPs) 'true' or 'false'

    Whether to call MNVs.

-MaxMNVLength: Max length phased SNPs that can be called

    Maximum MNV length.

-MaxRefGapInMNV or -MaxGapBetweenMNV: Max allowed gap between phased SNPs that can be called


-ReportNoCalls: 'true' or 'false'. default, false


-Collapse: Whether or not to collapse variants together, 'true' or 'false'. default, false

    Whether to collapse variants; default is no.

-PriorsPath: PriorsPath for vcf file containing known variants, used with -collapse to preferentially reconcile variants

    VCF file of known variants.

-TrimMnvPriors: Whether or not to trim preceding base from MNVs in priors file.  Note: COSMIC convention is to include preceding base for MNV. Default is false.


-ReportRcCounts: Report collapsed read count, When BAM files contain XW and XV tags, output read counts for duplex-stitched, duplex-nonstitched, simplex-stitched, and simplex-nonstitched.  'true' or 'false'. default, false

    Report collapsed read counts.

-ReportTsCounts: Report collapsed read count by different template strands, Conditional on ReportRcCounts, output read counts for duplex-stitched, duplex-nonstitched, simplex-forward-stitched, simplex-forward-nonstitched, simplex-reverse-stitched, simplex-reverse-nonstitched.  'true' or 'false'. default, false


-Ploidy: 'somatic' or 'diploid'. default, somatic.

    Somatic or germline; default is somatic.

-DiploidGenotypeParameters: A,B,C. default 0.20,0.70,0.80

    Genotype parameters for germline variants.

-RMxNFilter: M,N,F. Comma-separated list of integers indicating max length of the repeat section (M), the minimum number of repetitions of that repeat (N), to be applied if the variant frequency is less than (F). Default is R5x9,F=20.


-CoverageMethod: 'approximate' or 'exact'. Exact is more precise but requires more memory (minimum 8 GB).  Default approximate

    Coverage method; default is approximate. Exact needs more memory, at least 8 GB.

-CollapseFreqThreshold: When collapsing, minimum frequency required for target variants. Default '0'


-CollapseFreqRatioThreshold: When collapsing, minimum ratio required of target variant frequency to collapsible variant frequency. Default '0.5f'


-NoiseModel: Window/Flat. Default Flat

    Noise model, Window or Flat; default is Flat.

-ForcedAlleles : vcf path for alleles that are forced to report

-BamPaths : BAMPath(s), single value or comma delimited list

-BAMFolder : BAM parent folder

-MultiProcess : When threading by chr, launch separate processes to parallelize. Default true

-ChrFilter : Chromosome to process. If provided, other chromosomes are filtered out of output.  No default value.

-OutFolder      : Output folder.  No default value.

-MaxNumThreads   : Maximum number of threads. Default 20
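
Several of the options above mark variants as filtered instead of dropping them, so a quick sanity check on a finished run is to count the records that passed every filter. A minimal sketch, assuming an uncompressed, tab-delimited VCF in which passing records carry PASS in the FILTER column, as the VCF specification prescribes. count_pass is a hypothetical helper and the file name in the usage line is a placeholder.

```shell
# Count VCF records whose FILTER column (7th field) is PASS.
# Header lines starting with "#" are skipped.
# (count_pass is an illustrative helper, not part of Pisces.)
count_pass() {
    awk -F '\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }' "$1"
}

# Usage (placeholder file name):
#   count_pass results/sample.vcf
```

Comparing this count across runs with different -VQFilter or -MinDepthFilter values gives a rough feel for how aggressive each threshold is.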


Paper: Pisces: An Accurate and Versatile Variant Caller for Somatic and Germline Next-Generation Sequencing Data

Compiled by: 浩渺予怀
