






checkm taxonomy_wf -h shows the help:

usage: checkm taxonomy_wf [-h] [--ali] [--nt] [-g] [--individual_markers]
                          [--skip_adj_correction]
                          [--skip_pseudogene_correction]
                          [--aai_strain AAI_STRAIN] [-a ALIGNMENT_FILE]
                          [--ignore_thresholds] [-e E_VALUE] [-l LENGTH]
                          [-c COVERAGE_FILE] [-f FILE] [--tab_table]
                          [-x EXTENSION] [-t THREADS] [-q] [--tmpdir TMPDIR]
                          {life,domain,phylum,class,order,family,genus,species}
                          taxon bin_folder out_folder

Runs taxon_set, analyze, qa

positional arguments:
  {life,domain,phylum,class,order,family,genus,species}
                        taxonomic rank
  taxon                 taxon of interest
  bin_folder            folder containing bins (fasta format)
  out_folder            folder to write output files

optional arguments:
  -h, --help            show this help message and exit
  --ali                 generate HMMER alignment file for each bin
  --nt                  generate nucleotide gene sequences for each bin
  -g, --genes           bins contain genes as amino acids instead of nucleotide contigs
  --individual_markers  treat marker as independent (i.e., ignore co-located set structure)
  --skip_adj_correction
                        do not exclude adjacent marker genes when estimating contamination
  --skip_pseudogene_correction
                        skip identification and filtering of pseudogenes
  --aai_strain AAI_STRAIN
                        AAI threshold used to identify strain heterogeneity (default: 0.9)
  -a, --alignment_file ALIGNMENT_FILE
                        produce file showing alignment of multi-copy genes and their AAI identity
  --ignore_thresholds   ignore model-specific score thresholds
  -e, --e_value E_VALUE
                        e-value cut off (default: 1e-10)
  -l, --length LENGTH   percent overlap between target and query (default: 0.7)
  -c, --coverage_file COVERAGE_FILE
                        file containing coverage of each sequence; coverage information added to table type 2 (see coverage command)
  -f, --file FILE       print results to file (default: stdout)
  --tab_table           print tab-separated values table
  -x, --extension EXTENSION
                        extension of bins (other files in folder are ignored) (default: fna)
  -t, --threads THREADS
                        number of threads (default: 1)
  -q, --quiet           suppress console output
  --tmpdir TMPDIR       specify an alternative directory for temporary files

Example: checkm taxonomy_wf domain Bacteria ./bins ./output
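
The example invocation above can also be assembled from a script. The sketch below is only an illustration: the helper name build_taxonomy_wf_cmd and the file name qa.tsv are hypothetical, it uses only options listed in the help text above, and it assumes checkm is on the PATH if the command is actually executed.

```python
import shlex
import subprocess


def build_taxonomy_wf_cmd(rank, taxon, bin_folder, out_folder,
                          extension="fna", threads=1, table_file=None):
    """Build the argument list for `checkm taxonomy_wf` (hypothetical helper)."""
    cmd = ["checkm", "taxonomy_wf", "-x", extension, "-t", str(threads)]
    if table_file is not None:
        # --tab_table and -f are both listed in the help text above.
        cmd += ["--tab_table", "-f", table_file]
    # Positional arguments come last: rank, taxon, bin folder, output folder.
    cmd += [rank, taxon, bin_folder, out_folder]
    return cmd


cmd = build_taxonomy_wf_cmd("domain", "Bacteria", "./bins", "./output",
                            extension="fa", threads=4, table_file="qa.tsv")
print(shlex.join(cmd))

# To actually run it (requires a working CheckM installation):
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when bin folders contain spaces.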

[2019-10-17 02:51:12] INFO: CheckM v1.0.18
[2019-10-17 02:51:12] INFO: checkm taxonomy_wf domain Bacteria . out
[2019-10-17 02:51:12] INFO: [CheckM - taxon_set] Generate taxonomic-specific marker set.
[2019-10-17 02:51:15] INFO: Marker set for Bacteria contains 104 marker genes arranged in 58 sets.
[2019-10-17 02:51:15] INFO: Marker set inferred from 5449 reference genomes.
[2019-10-17 02:51:15] INFO: Marker set written to: out/
[2019-10-17 02:51:15] INFO: { Current stage: 0:00:03.175 || Total: 0:00:03.175 }
[2019-10-17 02:51:15] INFO: [CheckM - analyze] Identifying marker genes in bins.
[2019-10-17 02:51:15] INFO: Identifying marker genes in 1 bins with 1 threads:
   Finished processing 1 of 1 (100.00%) bins.
[2019-10-17 02:52:05] INFO: Saving HMM info to file.
[2019-10-17 02:52:05] INFO: { Current stage: 0:00:49.981 || Total: 0:00:53.156 }
[2019-10-17 02:52:05] INFO: Parsing HMM hits to marker genes:
   Finished parsing hits for 1 of 1 (100.00%) bins.
[2019-10-17 02:52:05] INFO: Aligning marker genes with multiple hits in a single bin:
   Finished processing 1 of 1 (100.00%) bins.
[2019-10-17 02:52:05] INFO: { Current stage: 0:00:00.343 || Total: 0:00:53.500 }
[2019-10-17 02:52:05] INFO: Calculating genome statistics for 1 bins with 1 threads:
   Finished processing 1 of 1 (100.00%) bins.
[2019-10-17 02:52:05] INFO: { Current stage: 0:00:00.246 || Total: 0:00:53.747 }
[2019-10-17 02:52:05] INFO: [CheckM - qa] Tabulating genome statistics.
[2019-10-17 02:52:05] INFO: Calculating AAI between multi-copy marker genes.
[2019-10-17 02:52:05] INFO: Reading HMM info from file.
[2019-10-17 02:52:05] INFO: Parsing HMM hits to marker genes:
   Finished parsing hits for 1 of 1 (100.00%) bins.
 Bin Id          Marker lineage   # genomes   # markers   # marker sets   0    1    2   3   4   5+   Completeness   Contamination   Strain heterogeneity
 GCA_900517465      Bacteria         5449        104            58        3   101   0   0   0   0       97.07            0.00               0.00
[2019-10-17 02:52:06] INFO: { Current stage: 0:00:00.330 || Total: 0:00:54.077 }

In this example, Completeness is 97.07% and Contamination is 0.00%, which is a good result.
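
When there are many bins, reading the table by eye gets tedious. The following is a minimal sketch of filtering the table in Python. It assumes the table was written tab-separated (via --tab_table) with the same column headers as shown above; the function name passing_bins and the cutoffs (at least 90% completeness, at most 5% contamination) are illustrative assumptions, not values taken from CheckM.

```python
import csv
import io

# One row copied from the table above, laid out as tab-separated text.
# Assumption: --tab_table output uses these same column headers.
QA_TABLE = (
    "Bin Id\tMarker lineage\t# genomes\t# markers\t# marker sets\t"
    "0\t1\t2\t3\t4\t5+\tCompleteness\tContamination\tStrain heterogeneity\n"
    "GCA_900517465\tBacteria\t5449\t104\t58\t3\t101\t0\t0\t0\t0\t"
    "97.07\t0.00\t0.00\n"
)


def passing_bins(handle, min_completeness=90.0, max_contamination=5.0):
    """Return (bin id, completeness, contamination) for bins within the cutoffs."""
    keep = []
    for row in csv.DictReader(handle, delimiter="\t"):
        completeness = float(row["Completeness"])
        contamination = float(row["Contamination"])
        if completeness >= min_completeness and contamination <= max_contamination:
            keep.append((row["Bin Id"], completeness, contamination))
    return keep


print(passing_bins(io.StringIO(QA_TABLE)))
# prints [('GCA_900517465', 97.07, 0.0)]
```

For a real run, replace io.StringIO(QA_TABLE) with an open file handle on the file given to -f.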


checkm coverage  -h
usage: checkm coverage [-h] [-x EXTENSION] [-r] [-a MIN_ALIGN]
                       [-e MAX_EDIT_DIST] [-m MIN_QC] [-t THREADS] [-q]
                       bin_dir output_file bam_files [bam_files ...]

Calculate coverage of sequences.

positional arguments:
  bin_dir               directory containing bins (fasta format)
  output_file           print results to file
  bam_files             BAM files to parse

optional arguments:
  -h, --help            show this help message and exit
  -x, --extension EXTENSION
                        extension of bins (other files in directory are ignored) (default: fna)
  -r, --all_reads       use all reads to estimate coverage instead of just those in proper pairs
  -a, --min_align MIN_ALIGN
                        minimum alignment length as percentage of read length (default: 0.98)
  -e, --max_edit_dist MAX_EDIT_DIST
                        maximum edit distance as percentage of read length (default: 0.02)
  -m, --min_qc MIN_QC   minimum quality score (in phred) (default: 15)
  -t, --threads THREADS
                        number of threads (default: 1)
  -q, --quiet           suppress console output

Example: checkm coverage ./bins coverage.tsv example_1.bam example_2.bam

checkm coverage -x fa bins/ ~/cov.tsv sorted.bam

Sequence Id     Bin Id  Sequence length (bp)    Bam Id  Coverage        Mapped reads
k141_14826      unbinned        522     test    0.000000        0
k141_8961       unbinned        539     test    18.918367       68
k141_66057      unbinned        597     test    8.040201        32
k141_28930      unbinned        978     test    50.460123       329
k141_25826      unbinned        1090    test    8.669725        63
k141_63496      unbinned        510     test    2.352941        8
k141_25829      unbinned        1023    test    12.167155       83
k141_63494      unbinned        558     test    6.182796        23
k141_63495      unbinned        1399    test    448.277341      4181
k141_63493      unbinned        703     test    6.827881        32
k141_63490      unbinned        841     test    9.096314        51
k141_66058      bin.39  6959    test    18.923983       878
k141_28933      unbinned        1195    test    12.175732       97
k141_63498      unbinned        557     test    5.655296        21
k141_63499      unbinned        614     test    8.550489        35
k141_14821      unbinned        1130    test    82.559292       622
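
The per-sequence table above can be rolled up to one line per bin. The sketch below is an assumption-laden illustration, not CheckM's own code: it assumes the coverage file is tab-separated with a single BAM file (one Bam Id / Coverage / Mapped reads column group), the function name summarise_bins is hypothetical, and the sample rows are a subset copied from the output above. It computes a length-weighted mean coverage per bin and each bin's share of all mapped reads.

```python
import csv
import io
from collections import defaultdict

# Three rows copied from the coverage table above (single BAM file assumed).
COVERAGE_TSV = (
    "Sequence Id\tBin Id\tSequence length (bp)\tBam Id\tCoverage\tMapped reads\n"
    "k141_8961\tunbinned\t539\ttest\t18.918367\t68\n"
    "k141_63495\tunbinned\t1399\ttest\t448.277341\t4181\n"
    "k141_66058\tbin.39\t6959\ttest\t18.923983\t878\n"
)


def summarise_bins(handle):
    """Aggregate per-sequence coverage rows into per-bin totals."""
    totals = defaultdict(lambda: {"length": 0, "reads": 0, "cov_x_len": 0.0})
    for row in csv.DictReader(handle, delimiter="\t"):
        length = int(row["Sequence length (bp)"])
        entry = totals[row["Bin Id"]]
        entry["length"] += length
        entry["reads"] += int(row["Mapped reads"])
        # Weight each sequence's coverage by its length.
        entry["cov_x_len"] += float(row["Coverage"]) * length

    all_reads = sum(e["reads"] for e in totals.values())
    summary = {}
    for bin_id, e in totals.items():
        summary[bin_id] = {
            "length": e["length"],
            "mapped_reads": e["reads"],
            "mean_coverage": e["cov_x_len"] / e["length"] if e["length"] else 0.0,
            "pct_mapped_reads": 100.0 * e["reads"] / all_reads if all_reads else 0.0,
        }
    return summary


summary = summarise_bins(io.StringIO(COVERAGE_TSV))
for bin_id, stats in sorted(summary.items()):
    print(bin_id, stats)
```

Note that "unbinned" is treated here as just another group, so its reads count toward the total number of mapped reads.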

checkm profile ~/cov.tsv
# Reports % mapped reads: (reads mapped to bin) / (total number of reads mapped to assembly)
[2019-10-17 03:25:40] INFO: CheckM v1.0.18
[2019-10-17 03:25:40] INFO: checkm profile /data/home/liufei/cov.tsv
[2019-10-17 03:25:40] INFO: [CheckM - profile] Calculating percentage of reads mapped to each bin.
[2019-10-17 03:25:40] INFO: Determining number of reads mapped to each bin.
  Bin Id   Bin size (Mbp)   test: mapped reads   test: % mapped reads   test: % binned populations   test: % community
  bin.1         0.22              586765                 0.59                      2.15                      1.36
  bin.10        0.86              542830                 0.54                      0.52                      0.33
  bin.11        0.29             2445260                 2.45                      6.93                      4.39
  bin.12        0.24               70318                 0.07                      0.24                      0.15
  bin.13        0.33              815391
