Metagenomics: Genome Assembly with MEGAHIT


Installing MEGAHIT

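git clone https://github.com/voutcn/megahit.git
cd megahit
git submodule update --init   # MEGAHIT v1.2+ builds with CMake rather than a bare `make`
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
make -j4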
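
Once the build finishes, a quick sanity check can confirm that the binary actually runs. The sketch below relies only on the `-v/--version` and `--test` flags that appear in MEGAHIT's own help output; the helper name `check_megahit` and its default argument are illustrative and not part of MEGAHIT.

```shell
#!/bin/sh
# check_megahit [BINARY]: hypothetical helper, not part of MEGAHIT.
# Prints the version and runs the bundled toy assembly if the binary is found.
check_megahit() {
    megahit_bin=${1:-megahit}
    if ! command -v "$megahit_bin" >/dev/null 2>&1; then
        echo "not found: $megahit_bin (add it to PATH or pass the full path)" >&2
        return 1
    fi
    "$megahit_bin" --version && "$megahit_bin" --test
}

# Example (path is a placeholder): check_megahit /path/to/megahit
```

Passing the path explicitly avoids depending on where the build placed the executable.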

MEGAHIT usage help

>/data/software/MEGAHIT-1.2.4-beta-Linux-static/bin/megahit
megahit: MEGAHIT v1.2.4-betas

contact: Dinghua Li 

Usage:
  megahit [options] {-1 <pe1> -2 <pe2> | --12 <pe12> | -r <se>} [-o <out_dir>]

  Input options that can be specified for multiple times (supporting plain text and gz/bz2 extensions)
    -1                       <pe1>          comma-separated list of fasta/q paired-end #1 files, paired with files in <pe2>
    -2                       <pe2>          comma-separated list of fasta/q paired-end #2 files, paired with files in <pe1>
    --12                     <pe12>         comma-separated list of interleaved fasta/q paired-end files
    -r/--read                <se>           comma-separated list of fasta/q single-end files

  Input options that can be specified for at most ONE time (not recommended):
    --input-cmd              <cmd>          command that outputs fasta/q reads to stdout; taken by MEGAHIT as SE reads

Optional Arguments:
  Basic assembly options:
    --min-count              <int>          minimum multiplicity for filtering (k_min+1)-mers [2]
    --k-list                 <int,int,..>   comma-separated list of kmer size
                                            all must be odd, in the range 15-255, increment <= 28)
                                            [21,29,39,59,79,99,119,141]

  Another way to set --k-list (overrides --k-list if one of them set):
    --k-min                  <int>          minimum kmer size (<= 255), must be odd number [21]
    --k-max                  <int>          maximum kmer size (<= 255), must be odd number [141]
    --k-step                 <int>          increment of kmer size of each iteration (<= 28), must be even number [12]

  Advanced assembly options:
    --no-mercy                              do not add mercy kmers
    --bubble-level           <int>          intensity of bubble merging (0-2), 0 to disable [2]
    --merge-level            <l,s>          merge complex bubbles of length <= l*kmer_size and similarity >= s [20,0.95]
    --prune-level            <int>          strength of low depth pruning (0-3) [2]
    --prune-depth            <int>          remove unitigs with avg kmer depth less than this value [2]
    --low-local-ratio        <float>        ratio threshold to define low local coverage contigs [0.2]
    --max-tip-len            <int>          remove tips less than this value [2*k]
    --no-local                              disable local assembly
    --kmin-1pass                            use 1pass mode to build SdBG of k_min

  Presets parameters:
    --presets                <str>          override a group of parameters; possible values:
                                            meta-sensitive: '--min-count 1 --k-list 21,29,39,49,...,129,141'
                                            meta-large: '--k-min 27 --k-max 127 --k-step 10'
                                            (large & complex metagenomes, like soil)

  Hardware options:
    -m/--memory              <float>        max memory in byte to be used in SdBG construction
                                            (if set between 0-1, fraction of the machine's total memory) [0.9]
    --mem-flag               <int>          SdBG builder memory mode
                                            0: minimum; 1: moderate; others: use all memory specified by '-m/--memory' [1]
    -t/--num-cpu-threads     <int>          number of CPU threads [# of logical processors]
    --no-hw-accel                           run MEGAHIT without BMI2 and POPCNT hardware instructions

  Output options:
    -o/--out-dir             <string>       output directory [./megahit_out]
    --out-prefix             <string>       output prefix (the contig file will be OUT_DIR/OUT_PREFIX.contigs.fa)
    --min-contig-len         <int>          minimum length of contigs to output [200]
    --keep-tmp-files                        keep all temporary files
    --tmp-dir                <string>       set temp directory

Other Arguments:
    --continue                              continue a MEGAHIT run from its last available check point.
                                            please set the output directory correctly when using this option.
    --test                                  run MEGAHIT on a toy test dataset
    -h/--help                               print the usage message
    -v/--version                            print version
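
The help text offers `--k-min`, `--k-max` and `--k-step` as an alternative to spelling out `--k-list`. To get a feel for which k-mer sizes a given triple implies, the sketch below prints the plain arithmetic progression. The helper name `expand_k_range` is made up for illustration, and the progression is an assumption about how the range is expanded: how MEGAHIT itself treats a `--k-max` that the step does not land on exactly is not covered here.

```shell
#!/bin/sh
# expand_k_range K_MIN K_MAX K_STEP: hypothetical helper, not part of MEGAHIT.
# Prints K_MIN, K_MIN+K_STEP, ... (never exceeding K_MAX) as a comma-separated list.
expand_k_range() {
    seq "$1" "$3" "$2" | paste -sd, -
}

expand_k_range 21 141 12   # bracketed defaults of --k-min/--k-max/--k-step
# → 21,33,45,57,69,81,93,105,117,129,141
expand_k_range 27 127 10   # values used by the meta-large preset
# → 27,37,47,57,67,77,87,97,107,117,127
```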

Running MEGAHIT

megahit --min-contig-len 500 --k-list 119,141 -1 test_1.fastq.gz -2 test_2.fastq.gz -o out
# --min-contig-len 500: discard contigs shorter than 500 bp
# --k-list 119,141: adjust the k-mer sizes, and how many of them are used, to suit your data
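
When adjusting `--k-list` by hand, it is easy to violate the constraints stated in the help text (every k odd, within 15-255, increment at most 28). A minimal pre-flight check might look like the sketch below. The function name `validate_k_list` is hypothetical, and the requirement that the list be strictly increasing is an assumption inferred from the word "increment" rather than something the help text states outright.

```shell
#!/bin/sh
# validate_k_list LIST: hypothetical helper, not part of MEGAHIT.
# Exits 0 if the comma-separated LIST satisfies the rules quoted in the help text.
# The body runs in a subshell (note the parentheses), so IFS and `set -f`
# changes do not leak, and `exit` only leaves the function.
validate_k_list() (
    prev=""
    IFS=','
    set -f              # no glob expansion while splitting on commas
    set -- $1
    [ "$#" -gt 0 ] || { echo "empty k-list" >&2; exit 1; }
    for k in "$@"; do
        case "$k" in
            ''|0*|*[!0-9]*) echo "not a plain decimal integer: '$k'" >&2; exit 1 ;;
        esac
        [ $((k % 2)) -eq 1 ] || { echo "k must be odd: $k" >&2; exit 1; }
        { [ "$k" -ge 15 ] && [ "$k" -le 255 ]; } ||
            { echo "k outside 15-255: $k" >&2; exit 1; }
        if [ -n "$prev" ]; then
            # Assumption: sizes must strictly increase.
            [ $((k - prev)) -gt 0 ] || { echo "not increasing: $prev,$k" >&2; exit 1; }
            [ $((k - prev)) -le 28 ] || { echo "increment > 28: $prev,$k" >&2; exit 1; }
        fi
        prev=$k
    done
)

# Example: validate_k_list 119,141 && echo "k-list looks valid"
```

Running the check before a long assembly job catches a mistyped list early instead of after the job has been queued.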

Inspecting the assembly results

$ ls out
done  final.contigs.fa  intermediate_contigs  log  opts.txt
# final.contigs.fa is the final assembled FASTA file
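
A quick way to judge the assembly is to summarize the contig lengths in `final.contigs.fa`. The sketch below uses only standard `awk` and `sort`; the function name `contig_stats` and its output format are illustrative choices, not something MEGAHIT provides. N50 is computed in the usual way: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size.

```shell
#!/bin/sh
# contig_stats FASTA: hypothetical helper, not part of MEGAHIT.
# Prints contig count, total bases, longest contig and N50 for a FASTA file.
contig_stats() {
    # Pass 1: one length per record (sequence lines may be wrapped).
    awk '/^>/ { if (seen) print len; seen = 1; len = 0; next }
         { gsub(/[ \t\r]/, ""); len += length($0) }
         END { if (seen) print len }' "$1" |
    sort -rn |
    # Pass 2: lengths arrive longest first; accumulate until half the total.
    awk '{ lens[NR] = $1; total += $1 }
         END {
             if (NR == 0) { print "contigs=0"; exit }
             half = total / 2; acc = 0
             for (i = 1; i <= NR; i++) {
                 acc += lens[i]
                 if (acc >= half) { n50 = lens[i]; break }
             }
             printf "contigs=%d total_bp=%d longest=%d N50=%d\n", NR, total, lens[1], n50
         }'
}

# Example: contig_stats out/final.contigs.fa
```

Comparing these numbers across runs with different `--k-list` or `--min-contig-len` settings gives a rough, first-pass indication of which parameters suit the data.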