

conda install kneaddata

kneaddata_database --available

KneadData Databases ( database : build = location )
human_genome : bmtagger = http://huttenhower.sph.harvard.edu/kneadData_databases/Homo_sapiens_BMTagger_v0.1.tar.gz
human_genome : bowtie2 = http://huttenhower.sph.harvard.edu/kneadData_databases/Homo_sapiens_Bowtie2_v0.1.tar.gz
mouse_C57BL : bowtie2 = http://huttenhower.sph.harvard.edu/kneadData_databases/mouse_C57BL_6NJ_Bowtie2_v0.1.tar.gz
human_transcriptome : bowtie2 = http://huttenhower.sph.harvard.edu/kneadData_databases/Homo_sapiens_hg38_transcriptome_Bowtie2_v0.1.tar.gz
ribosomal_RNA : bowtie2 = http://huttenhower.sph.harvard.edu/kneadData_databases/SILVA_128_LSUParc_SSUParc_ribosomal_RNA_v0.1.tar.gz
kneaddata_database --download human_genome bowtie2 .

Download URL: http://huttenhower.sph.harvard.edu/kneadData_databases/Homo_sapiens_Bowtie2_v0.1.tar.gz
Downloading file of size: 3.44 GB
kneaddata -h  # show help

usage: kneaddata [-h] [--version] [-v] -i INPUT -o OUTPUT_DIR
[-db REFERENCE_DB] [--bypass-trim]
[--output-prefix OUTPUT_PREFIX] [-t <1>] [-p <1>]
[-q {phred33,phred64}] [--run-bmtagger] [--run-trf]
[--run-fastqc-start] [--run-fastqc-end] [--store-temp-output]
[--remove-intermediate-output] [--cat-final-output]
[--trimmomatic TRIMMOMATIC_PATH] [--max-memory MAX_MEMORY]
[--trimmomatic-options TRIMMOMATIC_OPTIONS]
[--bowtie2 BOWTIE2_PATH] [--bowtie2-options BOWTIE2_OPTIONS]
[--no-discordant] [--cat-pairs] [--reorder] [--serial]
[--bmtagger BMTAGGER_PATH] [--trf TRF_PATH] [--match MATCH]
[--mismatch MISMATCH] [--delta DELTA] [--pm PM] [--pi PI]
[--minscore MINSCORE] [--maxperiod MAXPERIOD]
[--fastqc FASTQC_PATH]


optional arguments:
-h, --help show this help message and exit
-v, --verbose additional output is printed

global options:
--version show program's version number and exit
-i INPUT, --input INPUT
input FASTQ file (add a second argument instance to run with paired input files)
-o OUTPUT_DIR, --output OUTPUT_DIR
directory to write output files
-db REFERENCE_DB, --reference-db REFERENCE_DB
location of reference database (additional arguments add databases)
--bypass-trim bypass the trim step
--output-prefix OUTPUT_PREFIX
prefix for all output files
[ DEFAULT : SAMPLE_kneaddata ]
-t <1>, --threads <1>
number of threads
[ Default : 1 ]
-p <1>, --processes <1>
number of processes
[ Default : 1 ]
-q {phred33,phred64}, --quality-scores {phred33,phred64}
quality scores
[ DEFAULT : phred33 ]
--run-bmtagger run BMTagger instead of Bowtie2 to identify contaminant reads
--run-trf run TRF to remove tandem repeats
--run-fastqc-start run fastqc at the beginning of the workflow
--run-fastqc-end run fastqc at the end of the workflow
--store-temp-output store temp output files
[ DEFAULT : temp output files are removed ]
--remove-intermediate-output
remove intermediate output files
[ DEFAULT : intermediate output files are stored ]
--cat-final-output concatenate all final output files
[ DEFAULT : final output is not concatenated ]
--log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
level of log messages
[ DEFAULT : DEBUG ]
--log LOG log file
[ DEFAULT : OUTPUT_DIR/$SAMPLE_kneaddata.log ]

trimmomatic arguments:
--trimmomatic TRIMMOMATIC_PATH
path to trimmomatic
--max-memory MAX_MEMORY
max amount of memory
[ DEFAULT : 500m ]
--trimmomatic-options TRIMMOMATIC_OPTIONS
options for trimmomatic
MINLEN is set to 70 percent of total input read length

bowtie2 arguments:
--bowtie2 BOWTIE2_PATH
path to bowtie2
--bowtie2-options BOWTIE2_OPTIONS
options for bowtie2
[ DEFAULT : --very-sensitive ]
--no-discordant do not include discordant alignments for pairs (ie one of the two pairs aligns)
[ DEFAULT : Discordant alignments are included ]
--cat-pairs concatenate pair files before aligning so reads are aligned as single end
[ DEFAULT : paired reads are aligned as pairs ]
--reorder order the sequences in the same order as the input
[ DEFAULT : With discordant paired alignments sequences are not ordered ]
--serial filter the input in serial for multiple databases so a subset of reads are processed in each database search

bmtagger arguments:
--bmtagger BMTAGGER_PATH
path to BMTagger

trf arguments:
--trf TRF_PATH path to TRF
--match MATCH matching weight
[ DEFAULT : 2 ]
--mismatch MISMATCH mismatching penalty
[ DEFAULT : 7 ]
--delta DELTA indel penalty
[ DEFAULT : 7 ]
--pm PM match probability
[ DEFAULT : 80 ]
--pi PI indel probability
[ DEFAULT : 10 ]
--minscore MINSCORE minimum alignment score to report
[ DEFAULT : 50 ]
--maxperiod MAXPERIOD
maximum period size to report
[ DEFAULT : 500 ]

fastqc arguments:
--fastqc FASTQC_PATH path to fastqc
[ DEFAULT : $PATH ]

kneaddata -i $name_1.fastq.gz -i $name_2.fastq.gz -o kneaddata_out --trimmomatic Trimmomatic-0.36/ --remove-intermediate-output -db Homo_sapiens_Bowtie2

--remove-intermediate-output removes intermediate files
-db Bowtie2 index of the human genome
--trimmomatic location of the Trimmomatic quality-control program
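
When several paired-end samples need the same treatment, the flags annotated above can be wrapped in a small helper. The following is only a dry-run sketch: the helper name kneaddata_cmd, the sample names sampleA and sampleB, and the assumption that files are named <sample>_1.fastq.gz and <sample>_2.fastq.gz are all hypothetical. It prints the commands instead of running them, so they can be checked first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the kneaddata command for one sample prefix.
# It uses only the flags shown earlier; paths are the ones from this walkthrough.
kneaddata_cmd() {
    local name="$1"
    printf 'kneaddata -i %s_1.fastq.gz -i %s_2.fastq.gz -o kneaddata_out --trimmomatic Trimmomatic-0.36/ --remove-intermediate-output -db Homo_sapiens_Bowtie2\n' "$name" "$name"
}

# Dry run: one command per sample. The sample names are placeholders.
for name in sampleA sampleB; do
    kneaddata_cmd "$name"
done
```

Once the printed commands look right, they can be pasted into a terminal or the output piped to bash.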

kneaddata_read_count_table --input kneaddata_out --output kneaddata_read_counts.out

cat kneaddata_read_counts.out

Sample	raw pair1	raw pair2	trimmed pair1	trimmed pair2	trimmed orphan1	trimmed orphan2	decontaminated Homo_sapiens pair1	decontaminated Homo_sapiens pair2	decontaminated Homo_sapiens orphan1	decontaminated Homo_sapiens orphan2	final pair1	final pair2	final orphan1	final orphan2
kneaddata	72577172.0	72577172.0	49961458.0	49961458.0	20031875.0	955031.0	48388320.0	48388320.0	21348792.0	901878.0	48388320.0	48388320.0	21348792.0	901878.0
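In this example, the raw metagenomic sequencing data contain 72577172 paired-end reads. After low-quality sequences are filtered out, 49961458 read pairs remain, and after human genome reads are removed, 48388320 read pairs remain.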
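
To put these counts in proportion, the share of read pairs kept at each step can be computed relative to the raw count. This is a minimal sketch using awk; the function name pct_retained is hypothetical and is not part of KneadData.

```shell
# Hypothetical helper: percentage of read pairs kept, relative to the raw count.
# Usage: pct_retained KEPT RAW
pct_retained() {
    awk -v kept="$1" -v raw="$2" 'BEGIN { printf "%.2f\n", 100 * kept / raw }'
}

pct_retained 49961458 72577172   # after quality trimming      -> 68.84
pct_retained 48388320 72577172   # after removing human reads  -> 66.67
```

So roughly two thirds of the raw read pairs survive both trimming and host removal in this sample.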

