1. Software installation
conda install -c bioconda hisat2
conda install -c bioconda bowtie
conda install -c bioconda bowtie2
conda install -c bioconda bwa
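After installation it can help to confirm that the index builders are actually on PATH. The following is a minimal sketch, not part of the original notes: the helper name missing_tools is hypothetical, and the executable names are the ones used in the build commands later in this document.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Executables the four conda packages above are expected to provide.
missing=$(missing_tools hisat2-build bowtie-build bowtie2-build bwa)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line.
    echo "Not found on PATH:" $missing
else
    echo "All index builders are available."
fi
```

If something is reported as missing, the usual cause is that the conda environment holding the packages is not activated.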
2. Building genome index files
2.1 Hisat2
hisat2-build -p 2 genome.fa genome
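hisat2-build does not accept a compressed genome file as input. When the run finishes, it produces eight files with the .ht2 suffix (genome.1.ht2 through genome.8.ht2).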
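Because compressed input is not accepted, a gzipped genome has to be decompressed first. Below is a minimal sketch under stated assumptions: the helper name plain_fasta and the input name genome.fa.gz are hypothetical, and the compression is assumed to be gzip.

```shell
# Print the path of an uncompressed FASTA for the given input.
# A .gz input is decompressed next to the original (the .gz file is kept);
# an already decompressed copy is reused; any other input is passed through.
plain_fasta() {
    case "$1" in
        *.gz)
            out="${1%.gz}"
            [ -f "$out" ] || gzip -dc "$1" > "$out" || { rm -f "$out"; return 1; }
            printf '%s\n' "$out"
            ;;
        *)
            printf '%s\n' "$1"
            ;;
    esac
}

# Example (genome.fa.gz is a hypothetical file name):
#   fa=$(plain_fasta genome.fa.gz)
#   hisat2-build -p 2 "$fa" genome
```

Keeping the original .gz file means the decompressed copy can be deleted once the index has been built.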
Usage help
hisat2-build --usage
Usage: hisat2-build [options]* <reference_in> <ht2_index_base>
reference_in comma-separated list of files with ref sequences
hisat2_index_base write ht2 data to files with this dir/basename
Options:
-c reference sequences given on cmd line (as <reference_in>)
--large-index force generated index to be 'large', even if ref
has fewer than 4 billion nucleotides
-a/--noauto disable automatic -p/--bmax/--dcv memory-fitting
-p number of threads
--bmax max bucket sz for blockwise suffix-array builder
--bmaxdivn max bucket sz as divisor of ref len (default: 4)
--dcv diff-cover period for blockwise (default: 1024)
--nodc disable diff-cover (algorithm becomes quadratic)
-r/--noref don't build .3/.4.ht2 (packed reference) portion
-3/--justref just build .3/.4.ht2 (packed reference) portion
-o/--offrate SA is sampled every 2^offRate BWT chars (default: 5)
-t/--ftabchars # of chars consumed in initial lookup (default: 10)
--localoffrate SA (local) is sampled every 2^offRate BWT chars (default: 3)
--localftabchars # of chars consumed in initial lookup in a local index (default: 6)
--snp SNP file name
--haplotype haplotype file name
--ss Splice site file name
--exon Exon file name
--seed seed for random number generator
-q/--quiet verbose output (for debugging)
-h/--help print detailed description of tool and its options
--usage print this usage message
--version print version information and quit
2.2 BWA
bwa index -p genome genome.fa
Once the index has been built, five files are generated, with the suffixes bwt, pac, ann, amb, and sa.
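The list of five suffixes makes it easy to check whether an index is complete before aligning. This is a minimal sketch, not a feature of BWA itself: the function name bwa_index_complete is hypothetical, and it assumes the index was built without the -6 option (which changes the file names).

```shell
# Succeed only if all five BWA index files exist for the given prefix.
bwa_index_complete() {
    for ext in amb ann bwt pac sa; do
        [ -f "$1.$ext" ] || { echo "missing: $1.$ext" >&2; return 1; }
    done
}

# Build the index only when it is incomplete. The inner guard keeps the
# snippet harmless when bwa or genome.fa is not present.
if ! bwa_index_complete genome 2>/dev/null; then
    if command -v bwa >/dev/null 2>&1 && [ -f genome.fa ]; then
        bwa index -p genome genome.fa
    fi
fi
```

A check like this is mainly useful in scripts that may be re-run, so that an existing index is not rebuilt every time.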
Usage help
bwa index
Usage: bwa index [options] <in.fasta>
Options: -a STR BWT construction algorithm: bwtsw, is or rb2 [auto]
-p STR prefix of the index [same as fasta name]
-b INT block size for the bwtsw algorithm (effective with -a bwtsw) [10000000]
-6 index files named as <in.fasta>.64.* instead of <in.fasta>.*
Warning: `-a bwtsw' does not work for short genomes, while `-a is' and
`-a div' do not work not for long genomes.
2.3 Bowtie
Bowtie 1 is the older tool; it works well for reads shorter than 50 bp.
bowtie-build --threads 2 genome.fa genome
Usage help
bowtie-build --usage
Usage: bowtie-build [options]* <reference_in> <ebwt_outfile_base>
reference_in comma-separated list of files with ref sequences
ebwt_outfile_base write Ebwt data to files with this dir/basename
Options:
-f reference files are Fasta (default)
-c reference sequences given on cmd line (as <seq_in>)
--large-index force generated index to be 'large', even if ref
has fewer than 4 billion nucleotides
-C/--color build a colorspace index
-a/--noauto disable automatic -p/--bmax/--dcv memory-fitting
-p/--packed use packed strings internally; slower, uses less mem
--bmax max bucket sz for blockwise suffix-array builder
--bmaxdivn max bucket sz as divisor of ref len (default: 4)
--dcv diff-cover period for blockwise (default: 1024)
--nodc disable diff-cover (algorithm becomes quadratic)
-r/--noref don't build .3/.4.ebwt (packed reference) portion
-3/--justref just build .3/.4.ebwt (packed reference) portion
-o/--offrate SA is sampled every 2^offRate BWT chars (default: 5)
-t/--ftabchars # of chars consumed in initial lookup (default: 10)
--threads # of threads
--ntoa convert Ns in reference to As
--seed seed for random number generator
-q/--quiet verbose output (for debugging)
-h/--help print detailed description of tool and its options
--usage print this usage message
--version print version information and quit
2.4 Bowtie2
bowtie2-build --threads 2 genome.fa genome
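When the run finishes, it produces six files with the .bt2 suffix (genome.1.bt2 through genome.4.bt2, plus genome.rev.1.bt2 and genome.rev.2.bt2).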
Usage help
bowtie2-build --usage
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
reference_in comma-separated list of files with ref sequences
bt2_index_base write bt2 data to files with this dir/basename
*** Bowtie 2 indexes work only with v2 (not v1). Likewise for v1 indexes. ***
Options:
-f reference files are Fasta (default)
-c reference sequences given on cmd line (as <reference_in>)
--large-index force generated index to be 'large', even if ref
has fewer than 4 billion nucleotides
--debug use the debug binary; slower, assertions enabled
--sanitized use sanitized binary; slower, uses ASan and/or UBSan
--verbose log the issued command
-a/--noauto disable automatic -p/--bmax/--dcv memory-fitting
-p/--packed use packed strings internally; slower, less memory
--bmax max bucket sz for blockwise suffix-array builder
--bmaxdivn max bucket sz as divisor of ref len (default: 4)
--dcv diff-cover period for blockwise (default: 1024)
--nodc disable diff-cover (algorithm becomes quadratic)
-r/--noref don't build .3/.4 index files
-3/--justref just build .3/.4 index files
-o/--offrate <int> SA is sampled every 2^<int> BWT chars (default: 5)
-t/--ftabchars # of chars consumed in initial lookup (default: 10)
--threads # of threads
--seed seed for random number generator
-q/--quiet verbose output (for debugging)
-h/--help print detailed description of tool and its options
--usage print this usage message
--version print version information and quit
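The four build commands in this section differ in how the thread count is given: the help output above shows -p as the thread option for hisat2-build but as --packed for bowtie-build and bowtie2-build, which use --threads instead, and the bwa index help lists no thread option at all. The sketch below collects the commands in one place; the function name index_command is hypothetical, and it only prints each command rather than running it.

```shell
# Print (do not run) the index-build command for one tool.
# Usage: index_command TOOL FASTA PREFIX THREADS
# Note: paths containing spaces would need extra quoting before the printed
# command could be executed.
index_command() {
    tool=$1 fasta=$2 prefix=$3 threads=$4
    case "$tool" in
        hisat2)  echo "hisat2-build -p $threads $fasta $prefix" ;;
        bowtie)  echo "bowtie-build --threads $threads $fasta $prefix" ;;
        bowtie2) echo "bowtie2-build --threads $threads $fasta $prefix" ;;
        bwa)     echo "bwa index -p $prefix $fasta" ;;  # -p is the prefix here
        *)       echo "unknown tool: $tool" >&2; return 1 ;;
    esac
}

# Show the four commands used in this section.
for t in hisat2 bowtie bowtie2 bwa; do
    index_command "$t" genome.fa genome 2
done
```

Printing instead of executing makes it possible to review the commands first, or to paste them into a job script.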
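Differences between Bowtie 1 and Bowtie 2:
1. Bowtie 1 is the older tool and works well for reads shorter than 50 bp, whereas Bowtie 2 mainly targets reads longer than 50 bp.
2. Bowtie 2 supports gapped alignment.
3. Bowtie 2 supports local alignment as well as end-to-end (global) alignment.
4. Bowtie 2 places no upper limit on read length, whereas Bowtie 1 reads cannot exceed about 1000 bp.
5. Bowtie 2 allows alignments to overlap ambiguous characters (e.g. Ns) in the reference. Bowtie 1 does not.
6. Bowtie 2 cannot align colorspace reads.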
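The 50 bp rule of thumb in the first item can be turned into a quick check on a read file. This is a rough sketch under stated assumptions: the function names mean_read_length and suggest_aligner are hypothetical, the input is an uncompressed FASTQ with four lines per record, and read length is the only criterion considered.

```shell
# Print the mean read length (truncated to an integer) of a FASTQ file.
# The sequence is taken from the second line of every four-line record.
mean_read_length() {
    awk 'NR % 4 == 2 { total += length($0); n++ }
         END { if (n > 0) printf "%d\n", total / n; else print 0 }' "$1"
}

# Suggest bowtie for mean read lengths below 50 bp, bowtie2 otherwise.
suggest_aligner() {
    len=$(mean_read_length "$1")
    if [ "$len" -lt 50 ]; then
        echo bowtie
    else
        echo bowtie2
    fi
}

# Example (reads.fq is a hypothetical file name):
#   suggest_aligner reads.fq
```

The other items in the list, such as the need for gapped or local alignment, can outweigh read length, so the output is a starting point rather than a decision.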