Finding variants in targeted capture sequencing data

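This dataset contains 8 samples whose adapters correspond one-to-one to indexes A01-H01; 11 specific genes were captured. The file sizes are as follows: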

407M Aug 19 22:05 1_S1_L001_R1_001.fastq.gz
434M Aug 19 22:00 1_S1_L001_R2_001.fastq.gz
392M Aug 19 22:05 2_S2_L001_R1_001.fastq.gz
416M Aug 19 22:00 2_S2_L001_R2_001.fastq.gz
 34M Aug 19 22:05 3_S3_L001_R1_001.fastq.gz
 36M Aug 19 22:00 3_S3_L001_R2_001.fastq.gz
387M Aug 19 22:05 4_S4_L001_R1_001.fastq.gz
412M Aug 19 22:00 4_S4_L001_R2_001.fastq.gz
475M Aug 19 22:05 5_S5_L001_R1_001.fastq.gz
502M Aug 19 22:00 5_S5_L001_R2_001.fastq.gz
 37M Aug 19 22:05 6_S6_L001_R1_001.fastq.gz
 38M Aug 19 22:00 6_S6_L001_R2_001.fastq.gz
 35M Aug 19 22:05 7_S7_L001_R1_001.fastq.gz
 37M Aug 19 22:00 7_S7_L001_R2_001.fastq.gz
 38M Aug 19 22:05 8_S8_L001_R1_001.fastq.gz
 41M Aug 19 22:00 8_S8_L001_R2_001.fastq.gz

The capture kit used here is the SureSelect XT Target Enrichment System for Illumina Paired-End Multiplexed Sequencing Library.

First, run QC

ls *gz |xargs ~/biosoft/fastqc/FastQC/fastqc -t 10
mkdir fastqc_results
mv *_fastqc.zip *html fastqc_results/
cd fastqc_results
multiqc ./

The resulting QC folder can be copied to your own computer for viewing. If the sequencing looks fine, you can go straight to alignment.

Run the standard GATK pipeline

sbatch target_variation.sh 1_S1_L001_R1_001.fastq.gz    1_S1_L001_R2_001.fastq.gz   1
sbatch target_variation.sh 2_S2_L001_R1_001.fastq.gz    2_S2_L001_R2_001.fastq.gz   2
sbatch target_variation.sh 3_S3_L001_R1_001.fastq.gz    3_S3_L001_R2_001.fastq.gz   3
sbatch target_variation.sh 4_S4_L001_R1_001.fastq.gz    4_S4_L001_R2_001.fastq.gz   4
sbatch target_variation.sh 5_S5_L001_R1_001.fastq.gz    5_S5_L001_R2_001.fastq.gz   5
sbatch target_variation.sh 6_S6_L001_R1_001.fastq.gz    6_S6_L001_R2_001.fastq.gz   6
sbatch target_variation.sh 7_S7_L001_R1_001.fastq.gz    7_S7_L001_R2_001.fastq.gz   7
sbatch target_variation.sh 8_S8_L001_R1_001.fastq.gz    8_S8_L001_R2_001.fastq.gz   8
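
The eight submissions above differ only in the sample number, so they can also be generated with a loop. The following is a minimal sketch, assuming the file naming pattern shown in the listing; the function name `print_submit_commands` is hypothetical and not part of the original workflow. It only prints the commands, so nothing is submitted until you choose to run them.

```shell
#!/usr/bin/env bash
# Print one sbatch command per sample instead of typing each line by hand.
# Assumes the naming pattern <n>_S<n>_L001_R{1,2}_001.fastq.gz seen above.
print_submit_commands() {
    local first=$1 last=$2 n
    for n in $(seq "$first" "$last"); do
        echo "sbatch target_variation.sh ${n}_S${n}_L001_R1_001.fastq.gz ${n}_S${n}_L001_R2_001.fastq.gz ${n}"
    done
}

# Dry run: review the printed commands first.
print_submit_commands 1 8
```

Once the printed commands look right, piping them to `bash` (`print_submit_commands 1 8 | bash`) submits the jobs.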

This produces the basic alignment files in BAM format, along with the SNVs and INDELs in VCF format.

Alignment QC

It is not yet clear what kind of capture sequencing this is, so for now treat it as WES and try a simple QC:

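# Per-sample depth histogram over exon regions, from the recalibrated BAMs
for id in *_recal.bam
do
    file=$(basename "$id")
    sample=${file%%.*}    # strip everything from the first dot onward
    echo "$sample"
    # Argument order assumes older bedtools (before 2.24), where coverage is reported for the -b intervals
    bedtools coverage -hist -abam "$id" \
        -b ~/annotation/CCDS/human/hg19_exon.bed | grep '^all' > "${sample}.exome.coverage.hist.txt"
done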
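
Each histogram file holds the `all` lines of the `bedtools coverage -hist` output, whose columns are: the label `all`, a depth, the number of bases at that depth, the total number of bases in the regions, and the fraction. A rough way to turn these into one summary line per sample is sketched below. The function name `summarize_hist` and the 10x threshold are illustrative assumptions, not part of the original analysis.

```shell
#!/usr/bin/env bash
# Summarize the "all" histogram lines written by bedtools coverage -hist.
# Columns: all, depth, bases at that depth, total bases, fraction.
# min_depth (default 10) is an illustrative threshold, not from the original post.
summarize_hist() {
    local hist=$1 min_depth=${2:-10}
    awk -v min="$min_depth" '
        $1 == "all" {
            mean += $2 * $3 / $4            # depth weighted by share of bases
            if ($2 >= min) frac += $3 / $4  # share of bases at or above min
        }
        END { printf "mean_depth=%.2f frac_ge_%dx=%.4f\n", mean, min, frac }
    ' "$hist"
}

# Print one summary line per sample histogram in the current directory.
for f in *.exome.coverage.hist.txt; do
    [ -e "$f" ] || continue    # skip when no histogram files exist yet
    printf '%s\t' "${f%%.*}"
    summarize_hist "$f"
done
```

Because only 11 genes were captured, the mean depth over all exons is expected to be low; the numbers become more informative if the exon BED file is replaced with the actual capture regions.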

Variant filtering and annotation
