Building genome indexes with Hisat2, Bowtie, Bowtie2, and BWA

1. Software installation

conda install -c bioconda hisat2
conda install -c bioconda bowtie
conda install -c bioconda bowtie2
conda install -c bioconda bwa
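
After installation, it can help to confirm that each index builder is actually visible on PATH. The loop below is a minimal sketch for a POSIX shell; the `missing` counter is only an illustrative helper and not part of any of the four tools.

```shell
# Report which of the four index builders are visible on PATH.
missing=0
for tool in hisat2-build bowtie-build bowtie2-build bwa; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "found:   $tool -> $(command -v "$tool")"
    else
        echo "missing: $tool"
        missing=$((missing + 1))
    fi
done
echo "$missing of 4 tools missing"
```

If a tool is reported missing, the conda environment it was installed into is probably not activated.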

2. Building genome index files

2.1 Hisat2

hisat2-build -p 2 genome.fa genome

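hisat2-build does not accept a compressed genome file as input. When the run finishes, it produces 8 files with the .ht2 suffix (genome.1.ht2 through genome.8.ht2).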
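
Since compressed input is not accepted, a gzipped reference has to be decompressed first. The following is a rough sketch of that workflow, assuming a POSIX shell with gzip available; the toy sequence `chr_toy` and the scratch directory are hypothetical stand-ins for a real genome.fa.gz, and the build step only runs if hisat2-build is installed.

```shell
set -eu

# Toy reference in a scratch directory (hypothetical data; use a real genome.fa.gz instead).
workdir=$(mktemp -d)
printf '>chr_toy\nACGTACGTTAGCTAGCTAGGATCCGATCGATTACGCGCTAGCTAGGCTA\n' | gzip -c > "$workdir/genome.fa.gz"

# hisat2-build needs plain FASTA, so decompress to a new file and keep the .gz.
gunzip -c "$workdir/genome.fa.gz" > "$workdir/genome.fa"

if command -v hisat2-build >/dev/null 2>&1; then
    if hisat2-build -p 2 "$workdir/genome.fa" "$workdir/genome" >/dev/null 2>&1; then
        # Count the resulting index files (genome.1.ht2 ... genome.8.ht2 are expected).
        n_index=$(find "$workdir" -maxdepth 1 -name 'genome.*.ht2' | wc -l)
        echo "ht2 files written: $n_index"
    else
        echo "hisat2-build failed on the toy reference"
    fi
else
    echo "hisat2-build not on PATH; only the decompression step was run"
fi
```

Writing the decompressed copy to a separate file keeps the original archive untouched, at the cost of extra disk space for large genomes.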

Usage help

hisat2-build  --usage
Usage: hisat2-build [options]* <reference_in> <ht2_index_base>
    reference_in            comma-separated list of files with ref sequences
    hisat2_index_base       write ht2 data to files with this dir/basename
Options:
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p                      number of threads
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4.ht2 (packed reference) portion
    -3/--justref            just build .3/.4.ht2 (packed reference) portion
    -o/--offrate <int>      SA is sampled every 2^offRate BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --localoffrate <int>    SA (local) is sampled every 2^offRate BWT chars (default: 3)
    --localftabchars <int>  # of chars consumed in initial lookup in a local index (default: 6)
    --snp <path>            SNP file name
    --haplotype <path>      haplotype file name
    --ss <path>             Splice site file name
    --exon <path>           Exon file name
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit
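
The `--ss` and `--exon` options in the usage text above take splice-site and exon files. One way to produce them is with the helper scripts shipped with HISAT2 (`hisat2_extract_splice_sites.py` and `hisat2_extract_exons.py`). The sketch below assumes a hypothetical annotation file genome.gtf matching genome.fa, and an index name `genome_tran` chosen only for illustration; it skips the build when the tools or inputs are absent. For large genomes, annotation-aware builds reportedly need far more memory than a plain index.

```shell
# Hypothetical file names: genome.gtf is an annotation matching genome.fa.
gtf=genome.gtf
fasta=genome.fa
index_base=genome_tran

if command -v hisat2-build >/dev/null 2>&1 \
   && command -v hisat2_extract_splice_sites.py >/dev/null 2>&1 \
   && command -v hisat2_extract_exons.py >/dev/null 2>&1 \
   && [ -f "$gtf" ] && [ -f "$fasta" ]; then
    # Derive the --ss and --exon inputs from the GTF annotation.
    hisat2_extract_splice_sites.py "$gtf" > genome.ss
    hisat2_extract_exons.py "$gtf" > genome.exon
    hisat2-build -p 2 --ss genome.ss --exon genome.exon "$fasta" "$index_base"
    status=built
else
    echo "HISAT2 scripts or input files not found; nothing built"
    status=skipped
fi
echo "status: $status"
```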

2.2 BWA

bwa index -p genome genome.fa

Once the index is built, five files are produced, with the suffixes bwt, pac, ann, amb, and sa.
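
Because a complete BWA index consists of exactly these five files, a script can test for them before deciding whether to re-run the indexing step. The function name `bwa_index_complete` below is a hypothetical helper, not part of BWA, and it assumes the default file naming (not the `.64.*` names produced by `-6`).

```shell
# Return 0 if all five BWA index files exist and are non-empty for the given prefix.
bwa_index_complete() {
    prefix=$1
    for ext in amb ann bwt pac sa; do
        [ -s "$prefix.$ext" ] || return 1
    done
    return 0
}

# Build only when the index is missing and the inputs are available.
if bwa_index_complete genome; then
    echo "BWA index for 'genome' already present"
elif command -v bwa >/dev/null 2>&1 && [ -f genome.fa ]; then
    bwa index -p genome genome.fa
else
    echo "index missing, and bwa or genome.fa is not available here"
fi
```

Checking for non-empty files (`-s`) rather than mere existence also catches an index that was interrupted right after a file was created.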

Usage help

bwa index
Usage:   bwa index [options] <in.fasta>

Options: -a STR    BWT construction algorithm: bwtsw, is or rb2 [auto]
         -p STR    prefix of the index [same as fasta name]
         -b INT    block size for the bwtsw algorithm (effective with -a bwtsw) [10000000]
         -6        index files named as .64.* instead of .* 

Warning: `-a bwtsw' does not work for short genomes, while `-a is' and
         `-a div' do not work not for long genomes.
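
The warning above ties the choice of `-a` to genome size. As an illustration, the total reference length can be computed from the FASTA file and used to suggest an algorithm. This is only a sketch: the 2-billion-base cutoff is an assumption based on the BWA manual's remark that `is` does not work for databases larger than 2GB, the toy FASTA is made up, and leaving `-a` unset lets bwa choose by itself ([auto]).

```shell
set -eu

# Toy FASTA (hypothetical data); point "fasta" at a real genome.fa instead.
fasta=$(mktemp)
printf '>chr_toy\nACGTACGT\nACGT\n' > "$fasta"

# Total reference length = sum of the lengths of all non-header lines.
total=$(awk '!/^>/ { n += length($0) } END { printf "%.0f\n", n }' "$fasta")

# Assumed cutoff: above it suggest bwtsw, otherwise is.
limit=2000000000
if [ "$total" -gt "$limit" ]; then
    algo=bwtsw
else
    algo=is
fi
echo "reference length: $total bp, suggested -a $algo"

# The suggestion could then be passed on, e.g.:
#   bwa index -a "$algo" -p genome genome.fa
```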

2.3 Bowtie

Bowtie 1 was released earlier and works well for reads shorter than about 50 bp.

bowtie-build --threads 2 genome.fa genome

Usage help

bowtie-build  --usage
Usage: bowtie-build [options]* <reference_in> <ebwt_outfile_base>
    reference_in            comma-separated list of files with ref sequences
    ebwt_outfile_base       write Ebwt data to files with this dir/basename
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as <seq_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    -C/--color              build a colorspace index
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, uses less mem
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4.ebwt (packed reference) portion
    -3/--justref            just build .3/.4.ebwt (packed reference) portion
    -o/--offrate <int>      SA is sampled every 2^offRate BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --threads <int>         # of threads
    --ntoa                  convert Ns in reference to As
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit
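
The usage text describes reference_in as a comma-separated list of files, which is handy when the genome is stored as one FASTA file per chromosome. A small sketch of assembling that list follows; `join_by_comma` and the file names chr1.fa, chr2.fa, and chrM.fa are hypothetical.

```shell
# Join all arguments with commas, as expected for the reference_in argument.
join_by_comma() {
    out=""
    for f in "$@"; do
        if [ -z "$out" ]; then
            out=$f
        else
            out="$out,$f"
        fi
    done
    printf '%s\n' "$out"
}

refs=$(join_by_comma chr1.fa chr2.fa chrM.fa)
echo "$refs"   # prints chr1.fa,chr2.fa,chrM.fa

# With the real files in place, the list can be passed straight to the builder:
#   bowtie-build --threads 2 "$refs" genome
```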

2.4 Bowtie2

bowtie2-build  --threads 2 genome.fa genome

This produces 6 files with the .bt2 suffix: genome.1.bt2 to genome.4.bt2, genome.rev.1.bt2, and genome.rev.2.bt2.
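
A Bowtie 2 index consists of six files: PREFIX.1.bt2 to PREFIX.4.bt2, PREFIX.rev.1.bt2, and PREFIX.rev.2.bt2. Large indexes use the .bt2l extension instead. The following sketch checks that a full set is present under either extension; `bt2_index_complete` is a hypothetical helper, not a Bowtie 2 command.

```shell
# Return 0 if all six Bowtie 2 index files exist for the prefix,
# in either the small (.bt2) or the large (.bt2l) flavour.
bt2_index_complete() {
    prefix=$1
    for ext in bt2 bt2l; do
        ok=1
        for part in 1 2 3 4 rev.1 rev.2; do
            [ -s "$prefix.$part.$ext" ] || ok=0
        done
        if [ "$ok" -eq 1 ]; then
            return 0
        fi
    done
    return 1
}

if bt2_index_complete genome; then
    echo "Bowtie 2 index for 'genome' looks complete"
else
    echo "Bowtie 2 index for 'genome' is missing or incomplete"
fi
```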

Usage help

bowtie2-build  --usage
Usage: bowtie2-build [options]* <reference_in> <bt2_index_base>
    reference_in            comma-separated list of files with ref sequences
    bt2_index_base          write bt2 data to files with this dir/basename
*** Bowtie 2 indexes work only with v2 (not v1).  Likewise for v1 indexes. ***
Options:
    -f                      reference files are Fasta (default)
    -c                      reference sequences given on cmd line (as
                            <reference_in>)
    --large-index           force generated index to be 'large', even if ref
                            has fewer than 4 billion nucleotides
    --debug                 use the debug binary; slower, assertions enabled
    --sanitized             use sanitized binary; slower, uses ASan and/or UBSan
    --verbose               log the issued command
    -a/--noauto             disable automatic -p/--bmax/--dcv memory-fitting
    -p/--packed             use packed strings internally; slower, less memory
    --bmax <int>            max bucket sz for blockwise suffix-array builder
    --bmaxdivn <int>        max bucket sz as divisor of ref len (default: 4)
    --dcv <int>             diff-cover period for blockwise (default: 1024)
    --nodc                  disable diff-cover (algorithm becomes quadratic)
    -r/--noref              don't build .3/.4 index files
    -3/--justref            just build .3/.4 index files
    -o/--offrate <int>      SA is sampled every 2^<int> BWT chars (default: 5)
    -t/--ftabchars <int>    # of chars consumed in initial lookup (default: 10)
    --threads <int>         # of threads
    --seed <int>            seed for random number generator
    -q/--quiet              verbose output (for debugging)
    -h/--help               print detailed description of tool and its options
    --usage                 print this usage message
    --version               print version information and quit

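Differences between Bowtie 1 and Bowtie 2:

1. Bowtie 1 came first and works well for reads shorter than about 50 bp, whereas Bowtie 2 is aimed mainly at reads longer than 50 bp.
2. Bowtie 2 supports gapped alignment; Bowtie 1 finds only ungapped alignments.
3. Bowtie 2 supports local alignment as well as end-to-end (global) alignment.
4. Bowtie 2 places no upper limit on read length, whereas Bowtie 1 is limited to reads of about 1000 bp.
5. Bowtie 2 allows alignments to overlap ambiguous characters (e.g. Ns) in the reference. Bowtie 1 does not.
6. Bowtie 2 cannot align colorspace reads.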
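
The 50 bp rule of thumb from the first point can be turned into a quick check on a FASTQ file. The sketch below computes the mean read length with awk and suggests an aligner accordingly; the two toy reads are made up, and the hard 50 bp threshold is only a heuristic taken from the list above, not a rule enforced by either tool.

```shell
set -eu

# Toy FASTQ with two 36 bp reads (hypothetical data; use a real reads.fq instead).
reads=$(mktemp)
cat > "$reads" <<'EOF'
@read1
ACGTACGTACGTACGTACGTACGTACGTACGTACGT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
@read2
TTGCATTGCATTGCATTGCATTGCATTGCATTGCAT
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII
EOF

# In a four-line FASTQ record, the sequence is line 2 of each group of four lines.
mean_len=$(awk 'NR % 4 == 2 { total += length($0); n++ }
                END { if (n > 0) printf "%d\n", total / n; else print 0 }' "$reads")

# Heuristic from the comparison above: short reads -> bowtie, longer reads -> bowtie2.
if [ "$mean_len" -lt 50 ]; then
    aligner=bowtie
else
    aligner=bowtie2
fi
echo "mean read length: $mean_len bp, suggested aligner: $aligner"
```

This assumes the FASTQ file uses exactly four lines per record, which holds for typical sequencer output but not for line-wrapped files.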
