A First Look at GATK

I have finally finished presenting at Journal club, so I can settle down and get back to learning bioinformatics!

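This study note is only meant to give a first impression of GATK, that is, to work out what it is for. GATK has a huge number of features, and after browsing the official site I did not know where to begin. As someone who has never used GATK, how am I supposed to learn all of this? I took my cue from an expert's article, "GATK入门的最佳姿势,别犹豫,直接上,就对了" (roughly: the best way to get started with GATK is not to hesitate and just dive in).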

First, let's look at the official GATK site: https://gatk.broadinstitute.org/hc/en-us

GATK stands for Genome Analysis Toolkit. According to the official site, it is mainly used for variant discovery.

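GATK is the "industry standard" for identifying SNPs and indels in germline DNA and RNAseq data. Its scope is gradually expanding to include somatic short variant calling, as well as copy number variation (CNV) and structural variation (SV). Beyond the variant callers, GATK also includes many utilities for related tasks, such as processing and quality control of high-throughput sequencing data, and it bundles the popular Picard toolkit.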

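These tools were designed primarily to process exomes and whole genomes generated with Illumina sequencing technology, but they can also be used with a variety of other technologies and experimental designs. Although GATK was originally developed for human genetics, it has since evolved to handle genome data from any organism, at any level of ploidy (not only polyploid organisms).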

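At the time of writing, the latest GATK release is 4.1.8.1. The official site provides the tool documentation for every version, including a description of each parameter, and it is very detailed.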

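GATK runs on Linux and other POSIX-compatible platforms, including macOS. Windows is not supported. The main system requirement is Java 8 (JDK 1.8), and some tools additionally depend on R or Python. Download and installation instructions are on the Download page of the official site. I won't go through the installation here: data of this kind is usually only processed on a server anyway, and GATK is already installed on our university's server, so I can simply call it directly.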
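Since Java 8 is the main requirement, it is worth checking which Java a machine has before trying to run GATK there. Below is a minimal sketch of such a check. The function names (installed_java_version, java_major_from_string, is_java8) are hypothetical helpers of my own and not part of GATK, and the sketch assumes that `java -version` reports the version in double quotes, as the JDKs I have seen do.

```shell
# Print the version string reported by the java found on PATH, e.g. 1.8.0_252.
# `java -version` writes to stderr, so stderr is redirected into the pipe.
installed_java_version() {
    java -version 2>&1 | awk -F'"' '/version/ { print $2; exit }'
}

# Reduce a Java version string to its major version number.
# Old-style strings such as 1.8.0_252 mean Java 8; newer ones such as 11.0.7 mean Java 11.
java_major_from_string() {
    _v="$1"
    case "$_v" in
        1.*) _v="${_v#1.}" ;;       # drop the legacy "1." prefix
    esac
    printf '%s\n' "${_v%%[._-]*}"   # keep everything before the first ".", "_" or "-"
}

# Succeed only if the given version string denotes Java 8.
is_java8() {
    [ "$(java_major_from_string "$1")" = "8" ]
}
```

On a machine with Java installed, is_java8 "$(installed_java_version)" && echo "Java 8 found" would then report whether the requirement is met. The parsing is kept separate from the call to java so that it can be tried on plain strings first.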

Let's take a quick look at which tools GATK provides. On the command line, enter:

$ gatk --list

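This prints a long listing (the first line shows that our server has version 4.1.7.0 installed). The tools are grouped into categories separated by dashed lines, and each tool name is followed by a short description of what it does: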

Using GATK jar /gpfs/share/apps/gatk/4.1.7.0/gatk-package-4.1.7.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gpfs/share/apps/gatk/4.1.7.0/gatk-package-4.1.7.0-local.jar --help
USAGE:  <program name> [-h]

Available Programs:
--------------------------------------------------------------------------------------
Base Calling:                                    Tools that process sequencing machine data, e.g. Illumina base calls, and detect sequencing level attributes, e.g. adapters
    CheckIlluminaDirectory (Picard)              Asserts the validity for specified Illumina basecalling data.
    CollectIlluminaBasecallingMetrics (Picard)   Collects Illumina Basecalling metrics for a sequencing run.
    CollectIlluminaLaneMetrics (Picard)          Collects Illumina lane metrics for the given BaseCalling analysis directory.
    ExtractIlluminaBarcodes (Picard)             Tool determines the barcode for each read in an Illumina lane.
    IlluminaBasecallsToFastq (Picard)            Generate FASTQ file(s) from Illumina basecall read data.
    IlluminaBasecallsToSam (Picard)              Transforms raw Illumina sequencing data into an unmapped SAM or BAM file.
    MarkIlluminaAdapters (Picard)                Reads a SAM or BAM file and rewrites it with new adapter-trimming tags.

--------------------------------------------------------------------------------------
Copy Number Variant Discovery:                   Tools that analyze read coverage to detect copy number variants.
    AnnotateIntervals                            Annotates intervals with GC content, mappability, and segmental-duplication content
    CallCopyRatioSegments                        Calls copy-ratio segments as amplified, deleted, or copy-number neutral
    CombineSegmentBreakpoints                    (EXPERIMENTAL Tool) Combine the breakpoints of two segment files and annotate the resulting intervals with chosen columns from each file.
    CreateReadCountPanelOfNormals                Creates a panel of normals for read-count denoising
    DenoiseReadCounts                            Denoises read counts to produce denoised copy ratios
    DetermineGermlineContigPloidy                Determines the baseline contig ploidy for germline samples given counts data
    FilterIntervals                              Filters intervals based on annotations and/or count statistics
    GermlineCNVCaller                            Calls copy-number variants in germline samples given their counts and the output of DetermineGermlineContigPloidy
    MergeAnnotatedRegions                        (EXPERIMENTAL Tool) Merge annotated genomic regions based entirely on touching/overlapping intervals.
    MergeAnnotatedRegionsByAnnotation            (EXPERIMENTAL Tool) Merge annotated genomic regions within specified distance if annotation value(s) are exactly the same.
    ModelSegments                                Models segmented copy ratios from denoised read counts and segmented minor-allele fractions from allelic counts
    PlotDenoisedCopyRatios                       Creates plots of denoised copy ratios
    PlotModeledSegments                          Creates plots of denoised and segmented copy-ratio and minor-allele-fraction estimates
    PostprocessGermlineCNVCalls                  Postprocesses the output of GermlineCNVCaller and generates VCFs and denoised copy ratios
    TagGermlineEvents                            (EXPERIMENTAL Tool) Do a simplistic tagging of germline events in a tumor segment file.

--------------------------------------------------------------------------------------
Coverage Analysis:                               Tools that count coverage, e.g. depth per allele
    ASEReadCounter                               Generates table of filtered base counts at het sites for allele specific expression
    AnalyzeSaturationMutagenesis                 (BETA Tool) (EXPERIMENTAL) Processes reads from a MITESeq or other saturation mutagenesis experiment.
    CollectAllelicCounts                         Collects reference and alternate allele counts at specified sites
    CollectAllelicCountsSpark                    Collects reference and alternate allele counts at specified sites
    CollectF1R2Counts                            Collect F1R2 read counts for the Mutect2 orientation bias mixture model filter
    CollectReadCounts                            Collects read counts at specified intervals
    CountBases                                   Count bases in a SAM/BAM/CRAM file
    CountBasesSpark                              Counts bases in the input SAM/BAM
    CountReads                                   Count reads in a SAM/BAM/CRAM file
    CountReadsSpark                              Counts reads in the input SAM/BAM
    DepthOfCoverage                              (BETA Tool) Generate coverage summary information for reads data
    GetPileupSummaries                           Tabulates pileup metrics for inferring contamination
    Pileup                                       Prints read alignments in samtools pileup format
    PileupSpark                                  (BETA Tool) Prints read alignments in samtools pileup format

--------------------------------------------------------------------------------------
Diagnostics and Quality Control:                 Tools that collect sequencing quality related and comparative metrics
    AccumulateVariantCallingMetrics (Picard)     Combines multiple Variant Calling Metrics files into a single file
    AnalyzeCovariates                            Evaluate and compare base quality score recalibration (BQSR) tables
    BamIndexStats (Picard)                       Generate index statistics from a BAM file
    CalcMetadataSpark                            (BETA Tool) (Internal) Collects read metrics relevant to structural variant discovery
    CalculateContamination                       Calculate the fraction of reads coming from cross-sample contamination
    CalculateFingerprintMetrics (Picard)         Calculate statistics on fingerprints, checking their viability
    CalculateReadGroupChecksum (Picard)          Creates a hash code based on the read groups (RG).
    CheckFingerprint (Picard)                    Computes a fingerprint from the supplied input (SAM/BAM or VCF) file and compares it to the provided genotypes
    CheckPileup                                  Compare GATK's internal pileup to a reference Samtools mpileup
    CheckTerminatorBlock (Picard)                Asserts the provided gzip file's (e.g., BAM) last block is well-formed; RC 100 otherwise
    ClusterCrosscheckMetrics (Picard)            Clusters the results of a CrosscheckFingerprints run by LOD score
    CollectAlignmentSummaryMetrics (Picard)      Produces a summary of alignment metrics from a SAM or BAM file.
    CollectArraysVariantCallingMetrics (Picard)  Collects summary and per-sample from the provided arrays VCF file
    CollectBaseDistributionByCycle (Picard)      Chart the nucleotide distribution per cycle in a SAM or BAM file
    CollectBaseDistributionByCycleSpark          (BETA Tool) Collects base distribution per cycle in SAM/BAM/CRAM file(s).
    CollectGcBiasMetrics (Picard)                Collect metrics regarding GC bias.
    CollectHiSeqXPfFailMetrics (Picard)          Classify PF-Failing reads in a HiSeqX Illumina Basecalling directory into various categories.
    CollectHsMetrics (Picard)                    Collects hybrid-selection (HS) metrics for a SAM or BAM file.
    CollectIndependentReplicateMetrics (Picard)  (EXPERIMENTAL Tool) Estimates the rate of independent replication rate of reads within a bam.
    CollectInsertSizeMetrics (Picard)            Collect metrics about the insert size distribution of a paired-end library.
    CollectInsertSizeMetricsSpark                (BETA Tool) Collects insert size distribution information on alignment data
    CollectJumpingLibraryMetrics (Picard)        Collect jumping library metrics.
    CollectMultipleMetrics (Picard)              Collect multiple classes of metrics.
    CollectMultipleMetricsSpark                  (BETA Tool) Runs multiple metrics collection modules for a given alignment file
    CollectOxoGMetrics (Picard)                  Collect metrics to assess oxidative artifacts.
    CollectQualityYieldMetrics (Picard)          Collect metrics about reads that pass quality thresholds and Illumina-specific filters.
    CollectQualityYieldMetricsSpark              (BETA Tool) Collects quality yield metrics from SAM/BAM/CRAM file(s).
    CollectRawWgsMetrics (Picard)                Collect whole genome sequencing-related metrics.
    CollectRnaSeqMetrics (Picard)                Produces RNA alignment metrics for a SAM or BAM file.
    CollectRrbsMetrics (Picard)                  Collects metrics from reduced representation bisulfite sequencing (Rrbs) data.
    CollectSamErrorMetrics (Picard)              Program to collect error metrics on bases stratified in various ways.
    CollectSequencingArtifactMetrics (Picard)    Collect metrics to quantify single-base sequencing artifacts.
    CollectTargetedPcrMetrics (Picard)           Calculate PCR-related metrics from targeted sequencing data.
    CollectVariantCallingMetrics (Picard)        Collects per-sample and aggregate (spanning all samples) metrics from the provided VCF file
    CollectWgsMetrics (Picard)                   Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
    CollectWgsMetricsWithNonZeroCoverage (Picard)(EXPERIMENTAL Tool) Collect metrics about coverage and performance of whole genome sequencing (WGS) experiments.
    CompareBaseQualities                         Compares the base qualities of two SAM/BAM/CRAM files
    CompareDuplicatesSpark                       (BETA Tool) Determine if two potentially identical BAMs have the same duplicate reads
    CompareMetrics (Picard)                      Compare two metrics files.
    CompareSAMs (Picard)                         Compare two input ".sam" or ".bam" files.
    ConvertSequencingArtifactToOxoG (Picard)     Extract OxoG metrics from generalized artifacts metrics.
    CrosscheckFingerprints (Picard)              Checks that all data in the input files appear to have come from the same individual
    CrosscheckReadGroupFingerprints (Picard)     DEPRECATED: USE CrosscheckFingerprints.
    EstimateLibraryComplexity (Picard)           Estimates the numbers of unique molecules in a sequencing library.
    FlagStat                                     Accumulate flag statistics given a BAM file
    FlagStatSpark                                Spark tool to accumulate flag statistics
    GatherPileupSummaries                        Combine output files from GetPileupSummary in the order defined by a sequence dictionary
    GetSampleName                                Emit a single sample name
    IdentifyContaminant (Picard)                 Computes a fingerprint from the supplied SAM/BAM file, given a contamination estimate.
    MeanQualityByCycle (Picard)                  Collect mean quality by cycle.
    MeanQualityByCycleSpark                      (BETA Tool) MeanQualityByCycle on Spark
    QualityScoreDistribution (Picard)            Chart the distribution of quality scores.
    QualityScoreDistributionSpark                (BETA Tool) QualityScoreDistribution on Spark
    ValidateSamFile (Picard)                     Validates a SAM or BAM file.
    ViewSam (Picard)                             Prints a SAM or BAM file to the screen

--------------------------------------------------------------------------------------
Genotyping Arrays Manipulation:                  Tools that manipulate data generated by Genotyping arrays
    CombineGenotypingArrayVcfs (Picard)          Program to combine multiple genotyping array VCF files into one VCF.
    CreateVerifyIDIntensityContaminationMetricsFile (Picard)    Program to generate a picard metrics file from the output of the VerifyIDIntensity tool.
    GtcToVcf (Picard)                            Program to convert a GTC file to a VCF
    MergePedIntoVcf (Picard)                     Program to merge a single-sample ped file from zCall into a single-sample VCF.
    VcfToAdpc (Picard)                           Program to convert an Arrays VCF to an ADPC file.

--------------------------------------------------------------------------------------
Intervals Manipulation:                          Tools that process genomic intervals in various formats
    BedToIntervalList (Picard)                   Converts a BED file to a Picard Interval List.
    CompareIntervalLists                         Compare two interval lists for equality
    IntervalListToBed (Picard)                   Converts an Picard IntervalList file to a BED file.
    IntervalListTools (Picard)                   A tool for performing various IntervalList manipulations
    LiftOverIntervalList (Picard)                Lifts over an interval list from one reference build to another.
    PreprocessIntervals                          Prepares bins for coverage collection
    SplitIntervals                               Split intervals into sub-interval files.

--------------------------------------------------------------------------------------
Metagenomics:                                    Tools that perform metagenomic analysis, e.g. microbial community composition and pathogen detection
    PathSeqBuildKmers                            Builds set of host reference k-mers
    PathSeqBuildReferenceTaxonomy                Builds a taxonomy datafile of the microbe reference
    PathSeqBwaSpark                              Step 2: Aligns reads to the microbe reference
    PathSeqFilterSpark                           Step 1: Filters low quality, low complexity, duplicate, and host reads
    PathSeqPipelineSpark                         Combined tool that performs all steps: read filtering, microbe reference alignment, and abundance scoring
    PathSeqScoreSpark                            Step 3: Classifies pathogen-aligned reads and generates abundance scores

--------------------------------------------------------------------------------------
Methylation-Specific Tools:                      Tools that perform methylation calling, processing bisulfite sequenced, methylation-aware aligned BAM
    MethylationTypeCaller                        (EXPERIMENTAL Tool) Identify methylated bases from bisulfite sequenced, methylation-aware BAMs

--------------------------------------------------------------------------------------
Other:                                           Miscellaneous tools, e.g. those that aid in data streaming
    CreateHadoopBamSplittingIndex                (BETA Tool) Create a Hadoop BAM splitting index
    FifoBuffer (Picard)                          Provides a large, FIFO buffer that can be used to buffer input and output streams between programs.
    GatherBQSRReports                            Gathers scattered BQSR recalibration reports into a single file
    GatherTranches                               (BETA Tool) Gathers scattered VQSLOD tranches into a single file
    IndexFeatureFile                             Creates an index for a feature file, e.g. VCF or BED file.
    ParallelCopyGCSDirectoryIntoHDFSSpark        (BETA Tool) Parallel copy a file or directory from Google Cloud Storage into the HDFS file system used by Spark
    PrintBGZFBlockInformation                    (EXPERIMENTAL Tool) Print information about the compressed blocks in a BGZF format file

--------------------------------------------------------------------------------------
Read Data Manipulation:                          Tools that manipulate read data in SAM, BAM or CRAM format
    AddCommentsToBam (Picard)                    Adds comments to the header of a BAM file.
    AddOATag (Picard)                            Record current alignment information to OA tag.
    AddOrReplaceReadGroups (Picard)              Assigns all the reads in a file to a single new read-group.
    AddOriginalAlignmentTags                     (EXPERIMENTAL Tool) Adds Original Alignment tag and original mate contig tag
    ApplyBQSR                                    Apply base quality score recalibration
    ApplyBQSRSpark                               (BETA Tool) Apply base quality score recalibration on Spark
    BQSRPipelineSpark                            (BETA Tool) Both steps of BQSR (BaseRecalibrator and ApplyBQSR) on Spark
    BamToBfq (Picard)                            Converts a BAM file into a BFQ (binary fastq formatted) file
    BaseRecalibrator                             Generates recalibration table for Base Quality Score Recalibration (BQSR)
    BaseRecalibratorSpark                        (BETA Tool) Generate recalibration table for Base Quality Score Recalibration (BQSR) on Spark
    BuildBamIndex (Picard)                       Generates a BAM index ".bai" file.
    BwaAndMarkDuplicatesPipelineSpark            (BETA Tool) Takes name-sorted file and runs BWA and MarkDuplicates.
    BwaSpark                                     (BETA Tool) Align reads to a given reference using BWA on Spark
    CleanSam (Picard)                            Cleans the provided SAM/BAM, soft-clipping beyond-end-of-reference alignments and setting MAPQ to 0 for unmapped reads
    ClipReads                                    Clip reads in a SAM/BAM/CRAM file
    CollectDuplicateMetrics (Picard)             Collect Duplicate metrics from marked file.
    ConvertHeaderlessHadoopBamShardToBam         (BETA Tool) Convert a headerless BAM shard into a readable BAM
    DownsampleByDuplicateSet                     (BETA Tool) Discard a set fraction of duplicate sets from a UMI-grouped bam
    DownsampleSam (Picard)                       Downsample a SAM or BAM file.
    ExtractOriginalAlignmentRecordsByNameSpark   (BETA Tool) Subsets reads by name
    FastqToSam (Picard)                          Converts a FASTQ file to an unaligned BAM or SAM file
    FilterSamReads (Picard)                      Subsets reads from a SAM or BAM file by applying one of several filters.
    FixMateInformation (Picard)                  Verify mate-pair information between mates and fix if needed.
    FixMisencodedBaseQualityReads                Fix Illumina base quality scores in a SAM/BAM/CRAM file
    GatherBamFiles (Picard)                      Concatenate efficiently BAM files that resulted from a scattered parallel analysis
    LeftAlignIndels                              Left-aligns indels from reads in a SAM/BAM/CRAM file
    MarkDuplicates (Picard)                      Identifies duplicate reads.
    MarkDuplicatesSpark                          MarkDuplicates on Spark
    MarkDuplicatesWithMateCigar (Picard)         Identifies duplicate reads, accounting for mate CIGAR.
    MergeBamAlignment (Picard)                   Merge alignment data from a SAM or BAM with data in an unmapped BAM file.
    MergeSamFiles (Picard)                       Merges multiple SAM and/or BAM files into a single file.
    PositionBasedDownsampleSam (Picard)          Downsample a SAM or BAM file to retain a subset of the reads based on the reads location in each tile in the flowcell.
    PrintReads                                   Print reads in the SAM/BAM/CRAM file
    PrintReadsHeader                             Print the header from a SAM/BAM/CRAM file
    PrintReadsSpark                              PrintReads on Spark
    ReorderSam (Picard)                          Reorders reads in a SAM or BAM file to match ordering in a second reference file.
    ReplaceSamHeader (Picard)                    Replaces the SAMFileHeader in a SAM or BAM file.
    RevertBaseQualityScores                      Revert Quality Scores in a SAM/BAM/CRAM file
    RevertOriginalBaseQualitiesAndAddMateCigar (Picard)Reverts the original base qualities and adds the mate cigar tag to read-group BAMs
    RevertSam (Picard)                           Reverts SAM or BAM files to a previous state.
    RevertSamSpark                               (BETA Tool) Reverts SAM or BAM files to a previous state.
    SamFormatConverter (Picard)                  Convert a BAM file to a SAM file, or a SAM to a BAM
    SamToFastq (Picard)                          Converts a SAM or BAM file to FASTQ.
    SamToFastqWithTags (Picard)                  Converts a SAM or BAM file to FASTQ alongside FASTQs created from tags.
    SetNmAndUqTags (Picard)                      DEPRECATED: Use SetNmMdAndUqTags instead.
    SetNmMdAndUqTags (Picard)                    Fixes the NM, MD, and UQ tags in a SAM file
    SimpleMarkDuplicatesWithMateCigar (Picard)   (EXPERIMENTAL Tool) Examines aligned records in the supplied SAM or BAM file to locate duplicate molecules.
    SortSam (Picard)                             Sorts a SAM or BAM file
    SortSamSpark                                 (BETA Tool) SortSam on Spark (works on SAM/BAM/CRAM)
    SplitNCigarReads                             Split Reads with N in Cigar
    SplitReads                                   Outputs reads from a SAM/BAM/CRAM by read group, sample and library name
    SplitSamByLibrary (Picard)                   Splits a SAM or BAM file into individual files by library
    SplitSamByNumberOfReads (Picard)             Splits a SAM or BAM file to multiple BAMs.
    UmiAwareMarkDuplicatesWithMateCigar (Picard) (EXPERIMENTAL Tool) Identifies duplicate reads using information from read positions and UMIs.
    UnmarkDuplicates                             Clears the 0x400 duplicate SAM flag

--------------------------------------------------------------------------------------
Reference:                                       Tools that analyze and manipulate FASTA format references
    BaitDesigner (Picard)                        Designs oligonucleotide baits for hybrid selection reactions.
    BwaMemIndexImageCreator                      Create a BWA-MEM index image file for use with GATK BWA tools
    CountBasesInReference                        Count the numbers of each base in a reference file
    CreateSequenceDictionary (Picard)            Creates a sequence dictionary for a reference sequence.
    ExtractSequences (Picard)                    Subsets intervals from a reference sequence to a new FASTA file.
    FastaAlternateReferenceMaker                 Create an alternative reference by combining a fasta with a vcf.
    FastaReferenceMaker                          Create snippets of a fasta file
    FindBadGenomicKmersSpark                     (BETA Tool) Identifies sequences that occur at high frequency in a reference
    NonNFastaSize (Picard)                       Counts the number of non-N bases in a fasta file.
    NormalizeFasta (Picard)                      Normalizes lines of sequence in a FASTA file to be of the same length.
    ScatterIntervalsByNs (Picard)                Writes an interval list created by splitting a reference at Ns.

--------------------------------------------------------------------------------------
Short Variant Discovery:                         Tools that perform variant calling and genotyping for short variants (SNPs, SNVs and Indels)
    CombineGVCFs                                 Merges one or more HaplotypeCaller GVCF files into a single GVCF with appropriate annotations
    GenomicsDBImport                             Import VCFs to GenomicsDB
    GenotypeGVCFs                                Perform joint genotyping on one or more samples pre-called with HaplotypeCaller
    GnarlyGenotyper                              (BETA Tool) Perform "quick and dirty" joint genotyping on one or more samples pre-called with HaplotypeCaller
    HaplotypeCaller                              Call germline SNPs and indels via local re-assembly of haplotypes
    HaplotypeCallerSpark                         (BETA Tool) HaplotypeCaller on Spark
    LearnReadOrientationModel                    Get the maximum likelihood estimates of artifact prior probabilities in the orientation bias mixture model filter
    MergeMutectStats                             Merge the stats output by scatters of a single Mutect2 job
    Mutect2                                      Call somatic SNVs and indels via local assembly of haplotypes
    ReadsPipelineSpark                           (BETA Tool) Runs BWA (if specified), MarkDuplicates, BQSR, and HaplotypeCaller on unaligned or aligned reads to generate a VCF.

--------------------------------------------------------------------------------------
Structural Variant Discovery:                    Tools that detect structural variants
    CpxVariantReInterpreterSpark                 (BETA Tool) (Internal) Tries to extract simple variants from a provided GATK-SV CPX.vcf
    DiscoverVariantsFromContigAlignmentsSAMSpark (BETA Tool) (Internal) Examines aligned contigs from local assemblies and calls structural variants
    ExtractSVEvidenceSpark                       (BETA Tool) (Internal) Extracts evidence of structural variations from reads
    FindBreakpointEvidenceSpark                  (BETA Tool) (Internal) Produces local assemblies of genomic regions that may harbor structural variants
    PairedEndAndSplitReadEvidenceCollection      (BETA Tool) Gathers paired-end and split read evidence files for use in the GATK-SV pipeline.
    StructuralVariationDiscoveryPipelineSpark    (BETA Tool) Runs the structural variation discovery workflow on a single sample
    SvDiscoverFromLocalAssemblyContigAlignmentsSpark    (BETA Tool) (Internal) Examines aligned contigs from local assemblies and calls structural variants or their breakpoints

--------------------------------------------------------------------------------------
Variant Evaluation and Refinement:               Tools that evaluate and refine variant calls, e.g. with annotations not offered by the engine
    AlleleFrequencyQC                            (BETA Tool) General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more)
    AnnotateVcfWithBamDepth                      (Internal) Annotate a vcf with a bam's read depth at each variant locus
    AnnotateVcfWithExpectedAlleleFraction        (Internal) Annotate a vcf with expected allele fractions in pooled sequencing
    CalculateGenotypePosteriors                  Calculate genotype posterior probabilities given family and/or known population genotypes
    CalculateMixingFractions                     (Internal) Calculate proportions of different samples in a pooled bam
    Concordance                                  Evaluate concordance of an input VCF against a validated truth VCF
    CountFalsePositives                          (BETA Tool) Count PASS variants
    CountVariants                                Counts variant records in a VCF file, regardless of filter status.
    CountVariantsSpark                           CountVariants on Spark
    EvaluateInfoFieldConcordance                 (BETA Tool) Evaluate concordance of info fields in an input VCF against a validated truth VCF
    FilterFuncotations                           (EXPERIMENTAL Tool) Filter variants based on clinically-significant Funcotations.
    FindMendelianViolations (Picard)             Finds mendelian violations of all types within a VCF
    FuncotateSegments                            (BETA Tool) Functional annotation for segment files.  The output formats are not well-defined and subject to change.
    Funcotator                                   Functional Annotator
    FuncotatorDataSourceDownloader               Data source downloader for Funcotator.
    GenotypeConcordance (Picard)                 Calculates the concordance between genotype data of one sample in each of two VCFs - truth (or reference) vs. calls.
    MergeMutect2CallsWithMC3                     (EXPERIMENTAL Tool) UNSUPPORTED.  FOR EVALUATION ONLY. Merge M2 calls with MC
    ValidateBasicSomaticShortMutations           (EXPERIMENTAL Tool) Check variants against tumor-normal bams representing the same samples, though not the ones from the actual calls.
    ValidateVariants                             Validate VCF
    VariantEval                                  (BETA Tool) General-purpose tool for variant evaluation (% in dbSNP, genotype concordance, Ti/Tv ratios, and a lot more)
    VariantsToTable                              Extract fields from a VCF file to a tab-delimited table

--------------------------------------------------------------------------------------
Variant Filtering:                               Tools that filter variants by annotating the FILTER column
    ApplyVQSR                                     Apply a score cutoff to filter variants based on a recalibration table
    CNNScoreVariants                             Apply a Convolutional Neural Net to filter annotated variants
    CNNVariantTrain                              (EXPERIMENTAL Tool) Train a CNN model for filtering variants
    CNNVariantWriteTensors                       (EXPERIMENTAL Tool) Write variant tensors for training a CNN to filter variants
    CreateSomaticPanelOfNormals                  (BETA Tool) Make a panel of normals for use with Mutect2
    FilterAlignmentArtifacts                     (EXPERIMENTAL Tool) Filter alignment artifacts from a vcf callset.
    FilterMutectCalls                            Filter somatic SNVs and indels called by Mutect2
    FilterVariantTranches                        Apply tranche filtering
    FilterVcf (Picard)                           Hard filters a VCF.
    MTLowHeteroplasmyFilterTool                  If too many low het sites, filter all low het sites
    NuMTFilterTool                               Uses the median autosomal coverage and the allele depth to determine whether the allele might be a NuMT
    VariantFiltration                            Filter variant calls based on INFO and/or FORMAT annotations
    VariantRecalibrator                          Build a recalibration model to score variant quality for filtering purposes

--------------------------------------------------------------------------------------
Variant Manipulation:                            Tools that manipulate variant call format (VCF) data
    FixVcfHeader (Picard)                        Replaces or fixes a VCF header.
    GatherVcfs (Picard)                          Gathers multiple VCF files from a scatter operation into a single VCF file
    GatherVcfsCloud                              (BETA Tool) Gathers multiple VCF files from a scatter operation into a single VCF file
    LeftAlignAndTrimVariants                     Left align and trim vairants
    LiftoverVcf (Picard)                         Lifts over a VCF file from one reference build to another.
    MakeSitesOnlyVcf (Picard)                    Creates a VCF that contains all the site-level information for all records in the input VCF but no genotype information.
    MakeVcfSampleNameMap (Picard)                Creates a TSV from sample name to VCF/GVCF path, with one line per input.
    MergeVcfs (Picard)                           Combines multiple variant files into a single variant file
    PrintVariantsSpark                           Prints out variants from the input VCF.
    RemoveNearbyIndels                           (Internal) Remove indels from the VCF file that are close to each other.
    RenameSampleInVcf (Picard)                   Renames a sample within a VCF or BCF.
    SelectVariants                               Select a subset of variants from a VCF file
    SortVcf (Picard)                             Sorts one or more VCF files.
    SplitVcfs (Picard)                           Splits SNPs and INDELs into separate files.
    UpdateVCFSequenceDictionary                  Updates the sequence dictionary in a variant file.
    UpdateVcfSequenceDictionary (Picard)         Takes a VCF and a second file that contains a sequence dictionary and updates the VCF with the new sequence dictionary.
    VariantAnnotator                             Tool for adding annotations to VCF files
    VcfFormatConverter (Picard)                  Converts VCF to BCF or BCF to VCF.
    VcfToIntervalList (Picard)                   Converts a VCF or BCF file to a Picard Interval List

--------------------------------------------------------------------------------------
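Because the listing has such a regular layout (a dashed rule, an unindented category heading ending in a colon, then indented tool lines), it is easy to filter instead of scrolling through all of it. Below is a minimal sketch of that idea. tools_in_category is a hypothetical helper name of my own and not part of GATK, and it assumes the layout shown above, which may differ in other versions.

```shell
# Read a `gatk --list` style listing on stdin and print the tool names
# that belong to the category given as the first argument.
tools_in_category() {
    awk -v want="$1" '
        /^-+$/ { in_cat = 0; next }              # a dashed rule closes the current category
        /^[^ \t].*:/ {                           # unindented heading: "Category name:   description"
            name = $0
            sub(/:.*/, "", name)                 # keep only the text before the first colon
            in_cat = (name == want)
            next
        }
        in_cat && /^[ \t]+[A-Za-z]/ { print $1 } # indented tool line: first field is the tool name
    '
}
```

For example, gatk --list 2>&1 | tools_in_category "Short Variant Discovery" should print just the names from CombineGVCFs through ReadsPipelineSpark. I redirect stderr into the pipe in case the listing is written there; I have not checked which stream GATK uses. Once a tool looks interesting, gatk HaplotypeCaller --help (or the same with any other tool name) prints that tool's own parameters.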

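As you can see, this software is extremely powerful. For GATK beginners, there is a very good and very detailed article that gives an overall picture of the GATK analysis workflow: "从零开始完整学习全基因组测序(WGS)数据分析:第4节 构建WGS主流程" (learning whole-genome sequencing (WGS) data analysis from scratch, part 4: building the main WGS pipeline).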

I will record my further study of GATK in later notes.
