GTAK4分析篇 -- 1.数据前处理

最新版的GATK4跟之前GATK2/3用法上很大的不同,下面是改版后的代码

1.创建bwa比对的index

bwa index -a bwtsw -p hg38 /home/NGS/gatk_ref/Homo_sapiens_assembly38.fasta

2.fastqc测序质量监控

fastqc /home/NGS/raw_data/A921C10.R1.fastq.gz /home/NGS/raw_data/A921C10.R2.fastq.gz  -t 2 --outdir /home/NGS/1.fastqc/

3.去除低质量reads和接头

trim_galore -q 30   --phred33 --length 100 --stringency 3 --paired -o /home/NGS/2.trim-galore/ /home/NGS/raw_data/A921C10.R1.fastq.gz /home/NGS/raw_data/A921C10.R2.fastq.gz

4.比对参考基因组 + 按坐标排序

bwa mem /home/NGS/bwa_index/human /home/NGS/raw_data/A921C10.R1_trimmed.fq.gz /home/NGS/raw_data/A921C10.R2_trimmed.fq.gz -t 4 -R "@RG\tID:A921C10\tSM:A921C10\tLB:WGS\tPL:Illumina" | samtools sort -@ 4 -O BAM -o A921C10.bam

5.创建fasta的dict(用于GATK)

/home/gatk-4.1.4.1/gatk CreateSequenceDictionary -R /home/NGS/gatk_ref/Homo_sapiens_assembly38.fasta -O Homo_sapiens_assembly38.dict

6.创建fasta的fai(用于GATK)

samtools faidx /home/NGS/gatk_ref/Homo_sapiens_assembly38.fasta

7.统计比对结果

java -jar /home/picard/picard/build/libs/picard.jar CollectAlignmentSummaryMetrics R=/home/NGS/gatk_ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa I=/home/NGS/3.bwa+sort/A921C10.bam  O=./alignment_metrics.txt
java -jar /home/picard/picard/build/libs/picard.jar CollectInsertSizeMetrics INPUT=/home/NGS/3.bwa+sort/A921C10.bam OUTPUT=./insert_metrics.txt HISTOGRAM_FILE=insert_size_histogram.pdf
samtools depth -a /home/NGS/3.bwa+sort/A921C10.bam > depth_out.txt

8.标记重复序列(PCR)

java -jar /home/picard/picard/build/libs/picard.jar MarkDuplicates I=/home/NGS/3.bwa+sort/A921C10.bam M=./metrics.txt O=./A921C10_dedup.bam

9.修正且确保读取和匹配是同步的

java -jar /home/picard/picard/build/libs/picard.jar FixMateInformation I=/home/NGS/3.bwa+sort/A921C10.bam O=./A921C10_dedup_fix.bam ADD_MATE_CIGAR=true

10.校正碱基质量得分

input="/home/NGS/5.MarkDuplicates/A921C10_dedup.bam"
output="/home/NGS/7.Base_QualityScore_Recalibration"
ref="/home/NGS/gatk_ref/Homo_sapiens_assembly38.fasta"
snp="/home/NGS/gatk_ref/dbsnp_146.hg38.vcf"
indel="/home/NGS/gatk_ref/Mills_and_1000G_gold_standard.indels.hg38.vcf"
gatk="/home/gatk-4.1.4.1/gatk"
    
$gatk BaseRecalibrator -R $ref -I $input -O $ouput/bqsr.table --known-sites $snp --known-sites $indel
echo -e "\n\n\nBaseRecalibrator has finished!!!!!!!!\n\n\n"
    
$gatk ApplyBQSR -R $ref -I $input  -bqsr-recal-file bqsr.table  -O $ouput/A921C10_dedup_fix_bqsr.bam
echo -e "\n\n\nApplyBQSR has finished!!!!!!!!\n\n\n"

注意:所用的fasta和vcf最好在同一个平台下载,实践时出现了一个错误:fasta的染色体标记和vcf不一样。有的是1、2、Y、X、M,而有的是chr1、chr2、chrX、chrY、chrM,比对是就会没有overlap的reads

你可能感兴趣的:(GTAK4分析篇 -- 1.数据前处理)