Reference: VCFtools manual
Output only the sites on a specified chromosome (--chr)
[sunchengquan 10:40:42 ~/scq/GWAS/vcftools_filtering]
$ vcftools --vcf genotype_id.vcf --chr A1 --recode --out A1_analysis
VCFtools - v0.1.13
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf genotype_id.vcf
--chr A1
--out A1_analysis
--recode
After filtering, kept 120 out of 120 Individuals
Outputting VCF file...
After filtering, kept 1889 out of a possible 17848 Sites
Run Time = 1.00 seconds
[sunchengquan 10:43:51 ~/scq/GWAS/vcftools_filtering]
$ grep -v '^##' A1_analysis.recode.vcf |cut -f 1-9|tail -4
A1 33781370 A1__33781370 C T 60.09 PASS . GT:AD:DP:GQ:PL
A1 33844521 A1__33844521 G T 38.44 PASS . GT:AD:DP:GQ:PL
A1 33870037 A1__33870037 AG A 2747.53 PASS . GT:AD:DP:GQ:PL
A1 34012584 A1__34012584 A G 42.21 PASS . GT:AD:DP:GQ:PL
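As a quick cross-check on the number of sites vcftools reports for one chromosome, the data lines can be counted per chromosome directly. This is a minimal sketch and not part of the original session: it builds a tiny made-up VCF (the vcftools_demo/ paths and all records are hypothetical) so that it runs on its own. On real data the same awk command would be pointed at genotype_id.vcf.

```shell
# Build a tiny, made-up VCF so the sketch runs without genotype_id.vcf.
mkdir -p vcftools_demo
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n'
  printf 'A1\t100\tA1__100\tC\tT\t60\tPASS\t.\tGT\t0/1\n'
  printf 'A1\t200\tA1__200\tG\tA\t50\tPASS\t.\tGT\t0/0\n'
  printf 'A2\t150\tA2__150\tA\tG\t40\tPASS\t.\tGT\t1/1\n'
} > vcftools_demo/toy.vcf

# Count data lines (sites) per chromosome, skipping header lines that start with '#'.
awk -F'\t' '!/^#/ { n[$1]++ } END { for (c in n) printf "%s\t%d\n", c, n[c] }' \
  vcftools_demo/toy.vcf | sort > vcftools_demo/sites_per_chr.txt
cat vcftools_demo/sites_per_chr.txt
```

For the toy file this lists 2 sites on A1 and 1 on A2. The count for a chromosome should match the "kept N out of a possible M Sites" line when --chr is the only site filter.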
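The --from-bp and --to-bp parameters must be used together with --chr; they set the lower and upper bounds of the range of sites to process.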
[sunchengquan 10:44:17 ~/scq/GWAS/vcftools_filtering]
$ vcftools --vcf genotype_id.vcf --chr A1 --from-bp 33000000 --to-bp 33781370 --recode --out A1_analysis_pos_bp
VCFtools - v0.1.13
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--vcf genotype_id.vcf
--chr A1
--to-bp 33781370
--out A1_analysis_pos_bp
--recode
--from-bp 33000000
After filtering, kept 120 out of 120 Individuals
Outputting VCF file...
After filtering, kept 32 out of a possible 17848 Sites
Run Time = 0.00 seconds
[sunchengquan 10:52:21 ~/scq/GWAS/vcftools_filtering]
$ grep -v '^##' A1_analysis_pos_bp.recode.vcf |cut -f 1-9|tail -4
A1 33757525 A1__33757525 C T 12582.3 PASS . GT:AD:DP:GQ:PL
A1 33767507 A1__33767507 C T 103768 PASS . GT:AD:DP:GQ:PL
A1 33773748 A1__33773748 A G 49502.1 PASS . GT:AD:DP:GQ:PL
A1 33781370 A1__33781370 C T 60.09 PASS . GT:AD:DP:GQ:PL
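The last site kept above sits at 33781370, which equals the --to-bp value, so the upper bound is inclusive. The VCFtools manual describes the lower bound the same way. The same inclusive region filter can be sketched in awk. The example runs on a made-up VCF so that it is self-contained; the file names, records, and the 150-300 window are hypothetical.

```shell
# Build a tiny, made-up VCF with sites inside and just outside the window.
mkdir -p vcftools_demo
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n'
  printf 'A1\t100\tA1__100\tC\tT\t60\tPASS\t.\tGT\t0/1\n'
  printf 'A1\t150\tA1__150\tG\tA\t50\tPASS\t.\tGT\t0/0\n'
  printf 'A1\t300\tA1__300\tA\tG\t40\tPASS\t.\tGT\t1/1\n'
  printf 'A1\t301\tA1__301\tT\tC\t30\tPASS\t.\tGT\t0/1\n'
  printf 'A2\t200\tA2__200\tC\tG\t20\tPASS\t.\tGT\t0/1\n'
} > vcftools_demo/toy_region.vcf

# Keep header lines plus sites on chromosome A1 with from <= POS <= to.
awk -F'\t' -v chr=A1 -v from=150 -v to=300 \
  '/^#/ || ($1 == chr && $2 >= from && $2 <= to)' \
  vcftools_demo/toy_region.vcf > vcftools_demo/region.vcf

# Show chromosome and position of the sites that survived.
grep -v '^#' vcftools_demo/region.vcf | cut -f 1-2
```

Only the A1 sites at 150 and 300 survive: both boundary positions are kept, while 100, 301, and the A2 site are dropped.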
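Include or exclude a set of sites based on a list of positions in a file (--positions / --exclude-positions). Each line of the input file should contain a tab-separated chromosome and position.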
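A minimal sketch of preparing such a positions file: it takes the CHROM and POS columns of the first two records of a VCF. The toy VCF, the vcftools_demo/ paths, and the subset.positions output prefix are made up for illustration, and the vcftools call runs only if vcftools is installed.

```shell
# Build a tiny, made-up VCF so the sketch runs without genotype_id.vcf.
mkdir -p vcftools_demo
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n'
  printf 'A1\t100\tA1__100\tC\tT\t60\tPASS\t.\tGT\t0/1\n'
  printf 'A1\t200\tA1__200\tG\tA\t50\tPASS\t.\tGT\t0/0\n'
  printf 'A2\t150\tA2__150\tA\tG\t40\tPASS\t.\tGT\t1/1\n'
} > vcftools_demo/toy_pos.vcf

# One "chromosome<TAB>position" pair per line, taken from the first two sites.
grep -v '^#' vcftools_demo/toy_pos.vcf | cut -f 1,2 | head -n 2 \
  > vcftools_demo/positions.txt
cat vcftools_demo/positions.txt

# Keep only the listed sites (use --exclude-positions to drop them instead).
if command -v vcftools >/dev/null 2>&1; then
  vcftools --vcf vcftools_demo/toy_pos.vcf \
    --positions vcftools_demo/positions.txt \
    --recode --recode-INFO-all --out vcftools_demo/subset.positions
fi
```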
Filter the VCF file based on a BED file (--bed / --exclude-bed).
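A minimal sketch of a BED-based filter. According to the VCFtools manual, only the first three columns (chrom, chromStart, chromEnd) are required, and the file is expected to start with a header line. BED intervals are conventionally 0-based and half-open, so the 1-based inclusive region A1:150-300 is written here as 149 to 300. The toy VCF, the region, and the file names are hypothetical, and the vcftools call runs only if vcftools is installed.

```shell
# Build a tiny, made-up VCF so the sketch runs without genotype_id.vcf.
mkdir -p vcftools_demo
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\n'
  printf 'A1\t100\tA1__100\tC\tT\t60\tPASS\t.\tGT\t0/1\n'
  printf 'A1\t150\tA1__150\tG\tA\t50\tPASS\t.\tGT\t0/0\n'
  printf 'A1\t300\tA1__300\tA\tG\t40\tPASS\t.\tGT\t1/1\n'
} > vcftools_demo/toy_bed.vcf

# Write a header line, then convert the 1-based inclusive region
# (chr, start, end) to a BED line by subtracting 1 from the start.
{
  printf 'chrom\tchromStart\tchromEnd\n'
  printf 'A1\t150\t300\n' | awk -F'\t' -v OFS='\t' '{ print $1, $2 - 1, $3 }'
} > vcftools_demo/regions.bed
cat vcftools_demo/regions.bed

# Keep only sites that fall inside the BED intervals
# (use --exclude-bed to drop them instead).
if command -v vcftools >/dev/null 2>&1; then
  vcftools --vcf vcftools_demo/toy_bed.vcf --bed vcftools_demo/regions.bed \
    --recode --recode-INFO-all --out vcftools_demo/subset.bed
fi
```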
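Keep only the sites whose IDs are listed in a file, one ID per line (--snps); here the list holds the first 50 site IDs in the VCF
[sunchengquan 14:29:40 ~/scq/GWAS/vcftools_filtering]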
$ cut -f3 genotype_id.vcf|grep -v '^ID\|#'|head -50 > snp.list.txt
[sunchengquan 14:32:39 ~/scq/GWAS/vcftools_filtering]
$ vcftools --vcf genotype_id.vcf --snps snp.list.txt --recode --recode-INFO-all --out subset.snp
[sunchengquan 14:32:39 ~/scq/GWAS/vcftools_filtering]
$ grep -v '^#' subset.snp.recode.vcf |wc -l
50
Keep only indels
[sunchengquan 14:39:40 ~/scq/GWAS/vcftools_filtering]
$ vcftools --vcf genotype_id.vcf --keep-only-indels --recode --recode-INFO-all --out genotype_id_indel
Remove indels, keeping SNPs only
[sunchengquan 14:40:40 ~/scq/GWAS/vcftools_filtering]
$ vcftools --vcf genotype_id.vcf --remove-indels --recode --recode-INFO-all --out genotype_id_snp
Keep biallelic sites with a minor allele frequency of at least 0.05
$ vcftools --vcf genotype_id.vcf --maf 0.05 --min-alleles 2 --max-alleles 2 --recode --recode-INFO-all --out genotype.id.maf0.05.allele2
Filter on read depth (per-genotype depth between 3 and 100, per-site mean depth of at least 3)
[sunchengquan 14:41:40 ~/scq/GWAS/vcftools_filtering]
$ vcftools --vcf genotype_id.vcf --minDP 3 --maxDP 100 --min-meanDP 3 --recode-INFO-all --recode --out genotype_id_filter_DP
Keep biallelic SNPs with at most 20% missing genotypes, a minor allele frequency of at least 0.05, and a Hardy-Weinberg test p-value of at least 0.01
[sunchengquan 14:42:40 ~/scq/GWAS/vcftools_filtering]
$ vcftools --vcf genotype_id.vcf --remove-indels --max-missing 0.8 --maf 0.05 --min-alleles 2 --max-alleles 2 --hwe 0.01 --recode --recode-INFO-all --out genotype.id.snp.hwe0.01
Keep biallelic sites with at most 20% missing genotypes and a minor allele frequency of at least 0.05
[sunchengquan 14:43:40 ~/scq/GWAS/vcftools_filtering]
$ vcftools --vcf genotype_id.vcf --max-missing 0.8 --maf 0.05 --min-alleles 2 --max-alleles 2 --recode --recode-INFO-all --out genotype.id.int0.8maf0.05.allele2
Write the IDs of the first 41 samples (VCF columns 10-50) to a file, one per line
[sunchengquan 14:43:40 ~/scq/GWAS/vcftools_filtering]
$ grep "#CHROM" genotype_id.vcf |cut -f 10-50 |tr '\t' '\n' > sample_id.txt
Keep only the samples listed in that file (--keep)
[sunchengquan 14:44:40 ~/scq/GWAS/vcftools_filtering]
$ vcftools --vcf genotype_id.vcf --keep sample_id.txt --recode --recode-INFO-all --out subset.sample