2020.10.12 | Variant detection with second-generation sequencing

  • Differences at the gene or genome level
    • Gene transcription and translation [Figure 1]
    • Variant types [Figure 2]
  • Calling SNPs (and Indels)
    • Overall procedure
      • [Figure 3]
    • step.1 mapping
      • BWA(https://github.com/lh3/bwa)
      • Build the index
        • bwa index hg19.fa
      • Align
        • [Figure 4]
      • SAM Format(https://samtools.github.io/hts-specs/)
        • [Figure 5]
        • Convert to BAM format
          • samtools view -b A.sam > A.bam
        • Sort
          • samtools sort -O BAM A.bam > A.sorted.bam
        • Build the index (.bam.bai)
          • samtools index A.sorted.bam
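The conversion, sorting and indexing steps above can also be chained so that no intermediate SAM file is written. The following is a minimal sketch, not the author's exact command (that one was only shown as an image). It assumes paired-end reads in `A_1.fq.gz` and `A_2.fq.gz` (hypothetical file names) and a read group chosen purely for illustration; the GATK tools used in later steps expect read groups in the BAM.

```shell
#!/usr/bin/env bash
# Sketch: align one sample and produce a sorted, indexed BAM.
# Arguments: sample name, reference FASTA, read 1 FASTQ, read 2 FASTQ.
# File names and read-group fields are illustrative assumptions.
align_sample() {
    local sample=$1 ref=$2 fq1=$3 fq2=$4
    # -R adds a read group; bwa mem turns the literal "\t" into tabs.
    bwa mem -R "@RG\tID:${sample}\tSM:${sample}\tPL:ILLUMINA" "$ref" "$fq1" "$fq2" \
        | samtools sort -O BAM -o "${sample}.sorted.bam" - \
        && samtools index "${sample}.sorted.bam"
}

# Usage (needs bwa and samtools on PATH, and `bwa index hg19.fa` already run):
#   align_sample A hg19.fa A_1.fq.gz A_2.fq.gz
```

Wrapping the commands in a function keeps the sample name in one place, which helps when the same step is repeated for several samples.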
    • step.2 Remove duplicates
      • [Figure 6]
      • GATK https://github.com/broadinstitute/gatk/releases
      • code 
        • # -I: input BAM from alignment; -O: output BAM; -M: output metrics
          # Duplicates are flagged, not removed, unless --REMOVE_DUPLICATES true is given
          gatk MarkDuplicates \
            -I A.sorted.bam \
            -O A.dedup.bam \
            -M A.marked_dup_metrics.txt
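The metrics file written by `-M` can be checked quickly for the duplication rate. The sketch below assumes the usual Picard-style layout: comment lines starting with `#`, a tab-separated header row containing a `PERCENT_DUPLICATION` column, one row per library, then a blank line. The toy file it builds is abbreviated and its values are made up.

```shell
#!/usr/bin/env bash
# Sketch: print library name and PERCENT_DUPLICATION from a MarkDuplicates metrics file.
dup_rate() {
    awk -F'\t' '
        /^#/ { next }                      # skip comment lines
        !col {                             # look for the header row
            for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
            next
        }
        NF == 0 { exit }                   # a blank line ends the metrics table
        { print $1 "\t" $col }
    ' "$1"
}

# Toy metrics file (abbreviated columns, invented values).
metrics=$(mktemp)
{
    printf '## METRICS CLASS\tpicard.sam.DuplicationMetrics\n'
    printf 'LIBRARY\tREAD_PAIRS_EXAMINED\tPERCENT_DUPLICATION\tESTIMATED_LIBRARY_SIZE\n'
    printf 'lib1\t1000\t0.25\t5000\n'
    printf '\n'
} > "$metrics"

dup_rate "$metrics"    # prints: lib1<TAB>0.25
```

On real data the call would be `dup_rate A.marked_dup_metrics.txt`.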
    • step.3 Call SNP for each sample
      • [Figure 7]
      • 3.1 Build index
        • Two reference indices are needed:
          • .fai (from samtools faidx)
          • .dict (from gatk CreateSequenceDictionary)
        • samtools faidx hg19.fa                       # creates hg19.fa.fai
          gatk CreateSequenceDictionary -R hg19.fa     # creates hg19.dict
      • 3.2 Call SNPs for each sample using HaplotypeCaller
        • code
          • # -I: the duplicate-marked BAM from step 2 (it must be indexed)
            # -ploidy: adjust for the species or for pooled samples
            gatk HaplotypeCaller \
              -R hg19.fa \
              -I A.dedup.bam \
              -O A.raw.gvcf \
              -ERC GVCF \
              -ploidy 2
    • step.4 Combine GVCF from all the samples and genotype
      • gatk CombineGVCFs \
          -R hg19.fa \
          -O combine_variants.raw.gvcf \
          --variant A.raw.gvcf \
          --variant B.raw.gvcf
      • gatk GenotypeGVCFs \
          -R hg19.fa \
          -O combine_variants.raw.vcf \
          --variant combine_variants.raw.gvcf
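With more than a couple of samples, typing one `--variant` per file gets error-prone. A small sketch of building the argument list in a bash array follows; the sample names are placeholders, and the command is only printed so the snippet runs without GATK installed.

```shell
#!/usr/bin/env bash
# Sketch: assemble one "--variant <file>" pair per sample for CombineGVCFs.
# Sample names are placeholders; replace them with your own.
samples=(A B C)
variant_args=()
for s in "${samples[@]}"; do
    variant_args+=(--variant "${s}.raw.gvcf")
done

# Print the command instead of running it.
echo gatk CombineGVCFs -R hg19.fa -O combine_variants.raw.gvcf "${variant_args[@]}"
```

Dropping the leading `echo` runs the command. Keeping the arguments in an array rather than a string means file names with spaces are still passed intact.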
      • GVCF vs VCF
        • GVCF: a record for every site, including non-variant sites [Figure 8]
        • VCF: variant sites only
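The difference can be seen by counting record types in a GVCF. This sketch assumes the usual HaplotypeCaller convention in which a non-variant block has `<NON_REF>` as its only ALT allele and carries an `END` key in INFO; the three toy records are invented.

```shell
#!/usr/bin/env bash
# Sketch: count reference blocks versus variant records in a GVCF.
count_gvcf_records() {
    awk -F'\t' '
        /^#/ { next }                          # skip header lines
        $5 == "<NON_REF>" { blocks++; next }   # ALT is only <NON_REF>: non-variant block
        { variants++ }                         # anything else carries a real ALT allele
        END { printf "reference_blocks=%d variant_records=%d\n", blocks, variants }
    ' "$1"
}

# Toy GVCF body (invented records).
gvcf=$(mktemp)
{
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tA\n'
    printf 'chr1\t100\t.\tA\t<NON_REF>\t.\t.\tEND=199\tGT:DP\t0/0:30\n'
    printf 'chr1\t200\t.\tG\tT,<NON_REF>\t50\t.\tDP=28\tGT:DP\t0/1:28\n'
    printf 'chr1\t201\t.\tC\t<NON_REF>\t.\t.\tEND=300\tGT:DP\t0/0:31\n'
} > "$gvcf"

count_gvcf_records "$gvcf"    # prints: reference_blocks=2 variant_records=1
```

After GenotypeGVCFs, the resulting VCF would contain only records of the second kind.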
      • 4.1 Select and filter SNPs
        • code
          • gatk SelectVariants \
              -R hg19.fa \
              -O combine_SNP.raw.vcf \
              --variant combine_variants.raw.vcf \
              --select-type-to-include SNP
          • gatk VariantFiltration \
              -R hg19.fa \
              -O combine_SNP.filtered.vcf \
              --variant combine_SNP.raw.vcf \
              --filter-name "snp_filter" \
              --filter-expression "QD < 2.0 || FS > 60.0 || SOR > 3.0 || MQ < 40.0 || MQRankSum < -12.5"
      • 4.2 Select and filter indels
        • code
          • gatk SelectVariants \
              -R hg19.fa \
              -O combine_INDEL.raw.vcf \
              --variant combine_variants.raw.vcf \
              --select-type-to-include INDEL
          • gatk VariantFiltration \
              -R hg19.fa \
              -O combine_INDEL.filtered.vcf \
              --variant combine_INDEL.raw.vcf \
              --filter-name "indel_filter" \
              --filter-expression "QD < 2.0 || FS > 200.0 || SOR > 10.0 || MQ < 40.0 || MQRankSum < -12.5"
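VariantFiltration does not delete records: a site that matches the expression keeps its line and gets the filter name (here `snp_filter` or `indel_filter`) in the FILTER column, while the rest are marked `PASS`. A minimal sketch of keeping only the passing records with awk follows; the toy VCF is invented.

```shell
#!/usr/bin/env bash
# Sketch: keep header lines and records whose FILTER column (field 7) is PASS.
keep_pass() {
    awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Toy filtered VCF (invented records).
vcf=$(mktemp)
{
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t100\t.\tA\tG\t80\tPASS\tQD=20.1\n'
    printf 'chr1\t250\t.\tC\tT\t12\tsnp_filter\tQD=1.2\n'
    printf 'chr1\t400\t.\tG\tA\t95\tPASS\tQD=25.7\n'
} > "$vcf"

keep_pass "$vcf" | grep -vc '^#'    # prints: 2
```

GATK can do the same selection itself with `gatk SelectVariants --exclude-filtered`; the awk version is shown only because it makes the FILTER column explicit.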
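      • 4.3 Rare-SNP (minor allele frequency) and call-rate filter
        • VCFtools
        • code
          • # --out is a prefix; with --recode the filtered VCF is written to final.snp.recode.vcf
            vcftools \
              --vcf combine_SNP.filtered.vcf \
              --max-missing 0.8 \
              --maf 0.05 \
              --minDP 4 \
              --recode \
              --out final.snp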
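To make the two site-level thresholds concrete, the sketch below computes call rate and minor allele frequency per site for a toy VCF and labels each site against `--max-missing 0.8` and `--maf 0.05`. It is an approximation for illustration only: it assumes biallelic diploid genotypes, ignores the per-genotype `--minDP` step, and is not a reimplementation of vcftools.

```shell
#!/usr/bin/env bash
# Sketch: per-site call rate and minor allele frequency from the GT field.
# Arguments: VCF file, minimum call rate, minimum MAF.
site_stats() {
    awk -F'\t' -v min_rate="$2" -v min_maf="$3" '
        /^#/ { next }
        {
            called = 0; alt = 0; n = NF - 9        # samples start at column 10
            for (i = 10; i <= NF; i++) {
                split($i, f, ":")                  # GT is the first FORMAT field
                if (f[1] ~ /\./) continue          # missing genotype such as ./.
                called++
                m = split(f[1], a, "[/|]")
                for (j = 1; j <= m; j++) if (a[j] != "0") alt++
            }
            rate = called / n
            af = called ? alt / (2 * called) : 0
            maf = (af > 0.5) ? 1 - af : af
            verdict = (rate >= min_rate && maf >= min_maf) ? "keep" : "drop"
            printf "%s:%s\tcall_rate=%.2f\tmaf=%.3f\t%s\n", $1, $2, rate, maf, verdict
        }
    ' "$1"
}

# Toy VCF with five samples (invented genotypes).
snps=$(mktemp)
{
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\tS4\tS5\n'
    printf 'chr1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/0\t0/1\t1/1\t0/0\t./.\n'
    printf 'chr1\t200\t.\tC\tT\t50\tPASS\t.\tGT\t0/0\t./.\t./.\t0/1\t0/0\n'
    printf 'chr1\t300\t.\tG\tA\t50\tPASS\t.\tGT\t0/0\t0/0\t0/0\t0/0\t0/0\n'
} > "$snps"

site_stats "$snps" 0.8 0.05
```

In this toy input the first site is kept, the second is dropped for low call rate (3 of 5 genotypes present), and the third is dropped because no alternate allele is observed.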
      • Visualize SNPs and indels in IGV
        • Integrative Genomics Viewer: http://software.broadinstitute.org/software/igv/
  • Calling SV
    • SV types
      • [Figure 9]
    • Manta https://github.com/Illumina/manta
    • code
      • # Step 1: configure the run; step 2: execute the generated workflow script
        /path/to/configManta.py \
          --bam ../02.dedup/A.dedup.bam \
          --referenceFasta ../ref/hg19.fa \
          --runDir A_manta
        A_manta/runWorkflow.py
    • [Figure 10]
    • Manta VCF [Figure 11]
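A quick way to summarize an SV call set is to tally the `SVTYPE` key in the INFO column. The sketch below reads a VCF on standard input; the toy records are invented and heavily abbreviated. It assumes the Manta results are gzipped VCF files under the run directory (for a diploid run, typically `A_manta/results/variants/diploidSV.vcf.gz`).

```shell
#!/usr/bin/env bash
# Sketch: count records per SVTYPE in a VCF read from standard input.
count_sv_types() {
    awk -F'\t' '
        /^#/ { next }
        {
            n = split($8, kv, ";")                  # INFO is field 8
            for (i = 1; i <= n; i++)
                if (kv[i] ~ /^SVTYPE=/) count[substr(kv[i], 8)]++
        }
        END { for (t in count) print t "\t" count[t] }
    ' | sort
}

# Toy SV records (invented, abbreviated INFO).
{
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr1\t1000\tsv1\tA\t<DEL>\t.\tPASS\tEND=2000;SVTYPE=DEL;SVLEN=-1000\n'
    printf 'chr2\t5000\tsv2\tC\t<DEL>\t.\tPASS\tEND=5400;SVTYPE=DEL;SVLEN=-400\n'
    printf 'chr3\t700\tsv3\tG\tG]chr5:900]\t.\tPASS\tSVTYPE=BND\n'
} | count_sv_types
# prints:
# BND<TAB>1
# DEL<TAB>2
```

On real output the input would come from something like `gzip -dc A_manta/results/variants/diploidSV.vcf.gz | count_sv_types`.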