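MetaPhlAn2: taxonomic profiling of metagenomes

Introduction

The previous post introduced MetaPhlAn (a tutorial on taxonomic profiling of metagenomic microbial communities). This one covers how to use MetaPhlAn2.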

Bitbucket page: https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2

Dependencies:
Python (version >= 2.7)
Bowtie2
Numpy
Pandas (optional, only required by utility scripts)
BioPython (optional, only required by utility scripts)
SciPy (optional, only required by utility scripts)
Matplotlib (optional, only required by utility scripts)
biom (optional, only required for biom format input/output)

I. Installation with conda

conda install -c bioconda metaphlan2
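To confirm that installation put the main scripts on PATH, a small check such as the following can help. `missing_tools` is a hypothetical helper, not part of MetaPhlAn2; it only relies on the shell built-in `command -v`.

```shell
# Print the name of every command passed as an argument that is NOT on PATH.
# missing_tools is a hypothetical helper, not part of MetaPhlAn2.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}
```

Usage: `missing_tools metaphlan2.py bowtie2 merge_metaphlan_tables.py` prints nothing when all three are found.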

II. Sequencing data

Download on Windows (the six demo files):
SRS014476-Supragingival_plaque.fasta.gz
SRS014494-Posterior_fornix.fasta.gz
SRS014459-Stool.fasta.gz
SRS014464-Anterior_nares.fasta.gz
SRS014470-Tongue_dorsum.fasta.gz
SRS014472-Buccal_mucosa.fasta.gz

Download on Linux (three of the six files are shown; the other three are in the same directory):

curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014476-Supragingival_plaque.fasta.gz
curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014494-Posterior_fornix.fasta.gz
curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014459-Stool.fasta.gz
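The three curl commands above can be generalized to all six samples. This is a sketch that assumes the remaining three files sit in the same directory as the three shown; `BASE_URL`, `SAMPLES`, `demo_url` and `download_demo` are hypothetical names, and the Bitbucket links may no longer resolve, in which case `BASE_URL` has to be adjusted.

```shell
# Directory taken from the curl commands above (assumed to hold all six files).
BASE_URL="https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input"

# The six demo samples listed in the Windows section.
SAMPLES="SRS014476-Supragingival_plaque SRS014494-Posterior_fornix SRS014459-Stool SRS014464-Anterior_nares SRS014470-Tongue_dorsum SRS014472-Buccal_mucosa"

# Build the download URL for one sample name.
demo_url() {
    printf '%s/%s.fasta.gz\n' "$BASE_URL" "$1"
}

# Download every sample that is not already present.
# -f: fail on HTTP errors, -L: follow redirects, -O: keep the remote file name.
download_demo() {
    for s in $SAMPLES; do
        [ -f "$s.fasta.gz" ] || curl -fLO "$(demo_url "$s")"
    done
}
```

Calling `download_demo` fetches only the files that are missing, so it can be re-run safely after an interrupted download.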

III. MetaPhlAn2 analysis

1. Preparation

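# Create a working directory
mkdir metaphlan2_analysis
# Move the downloaded files into it (adjust ~/Downloads to wherever you saved them)
mv ~/Downloads/SRS*.fasta.gz metaphlan2_analysis/
cd metaphlan2_analysis
ls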
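Before profiling, it can be worth checking that every download is intact and seeing how many reads each file holds. `fasta_summary` is a hypothetical helper built on `gzip -t` (integrity test) and a count of FASTA header lines starting with `>`.

```shell
# For each gzipped FASTA file: test gzip integrity, then count the
# sequences (lines starting with '>'). fasta_summary is a hypothetical helper.
fasta_summary() {
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            printf '%s\t%s\n' "$f" "$(gzip -dc "$f" | grep -c '^>')"
        else
            printf '%s\tCORRUPT\n' "$f"
        fi
    done
}
```

Usage: `fasta_summary SRS*.fasta.gz` prints one tab-separated line per file.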

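2. Single-sample analysis

# Profile the first sample
metaphlan2.py SRS014476-Supragingival_plaque.fasta.gz --input_type fasta > SRS014476-Supragingival_plaque_profile.txt
# Inspect the Bowtie2 mapping result (written next to the input as <input>.bowtie2out.txt)
less -S SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt
# Inspect the species abundance profile of this sample
less -S SRS014476-Supragingival_plaque_profile.txt
# Second sample, using 4 processors (--nproc 4)
metaphlan2.py SRS014459-Stool.fasta.gz --input_type fasta --nproc 4 > SRS014459-Stool_profile.txt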
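Each profile line holds a `|`-separated lineage (k__ ... s__ ... t__) and its relative abundance. Assuming the default two-column layout (clade name, relative abundance in %), the hypothetical helper `top_species` below lists the most abundant species of one profile, skipping comment lines and strain-level (t__) rows.

```shell
# List the N most abundant species in a single MetaPhlAn2 profile.
# Assumes column 1 = lineage, column 2 = relative abundance (%).
# top_species is a hypothetical helper.
top_species() {
    local profile=$1 n=${2:-10} tab
    tab=$(printf '\t')
    grep -v '^#' "$profile" \
        | awk -F '\t' -v OFS='\t' '$1 ~ /s__/ && $1 !~ /t__/ { sub(/.*\|/, "", $1); print $1, $2 }' \
        | sort -t "$tab" -k2,2gr \
        | head -n "$n"
}
```

Usage: `top_species SRS014476-Supragingival_plaque_profile.txt 10` prints species name and abundance, highest first.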

3. Multi-sample analysis

# The remaining four samples
metaphlan2.py SRS014464-Anterior_nares.fasta.gz --input_type fasta --nproc 4 > SRS014464-Anterior_nares_profile.txt
metaphlan2.py SRS014470-Tongue_dorsum.fasta.gz --input_type fasta --nproc 4 > SRS014470-Tongue_dorsum_profile.txt
metaphlan2.py SRS014472-Buccal_mucosa.fasta.gz --input_type fasta --nproc 4 > SRS014472-Buccal_mucosa_profile.txt
metaphlan2.py SRS014494-Posterior_fornix.fasta.gz --input_type fasta --nproc 4 > SRS014494-Posterior_fornix_profile.txt

Alternatively:

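# Profile all six samples in one loop. Use this instead of the per-sample
# commands above, not after them: metaphlan2.py exits with an error if
# <input>.bowtie2out.txt already exists from a previous run.
for f in SRS*.fasta.gz
do
    metaphlan2.py "$f" --input_type fasta --nproc 4 > "${f%.fasta.gz}_profile.txt"
done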
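If some samples were already aligned, the loop can be made resumable by reusing each existing `<input>.bowtie2out.txt` through `--input_type bowtie2out` instead of re-running Bowtie2. This is a sketch; `profile_name` and `profile_sample` are hypothetical helper names, and the output naming matches the loop above.

```shell
# Output name used in the loop above: strip .fasta.gz, append _profile.txt.
profile_name() {
    printf '%s_profile.txt\n' "${1%.fasta.gz}"
}

# Profile one sample: skip it if a non-empty profile exists, reuse an
# existing bowtie2out file if present, otherwise start from the FASTA.
profile_sample() {
    local f=$1 out bt2
    out=$(profile_name "$f")
    bt2="$f.bowtie2out.txt"
    if [ -s "$out" ]; then
        echo "skip $f: $out already exists" >&2
    elif [ -f "$bt2" ]; then
        metaphlan2.py "$bt2" --input_type bowtie2out > "$out"
    else
        metaphlan2.py "$f" --input_type fasta --nproc 4 > "$out"
    fi
}
```

Usage: `for f in SRS*.fasta.gz; do profile_sample "$f"; done`.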

4. Species abundance profiles of the six samples
SRS014459-Stool_profile.txt
SRS014464-Anterior_nares_profile.txt
SRS014470-Tongue_dorsum_profile.txt
SRS014472-Buccal_mucosa_profile.txt
SRS014476-Supragingival_plaque_profile.txt
SRS014494-Posterior_fornix_profile.txt

5. Bowtie2 mapping results of the six samples
SRS014459-Stool.fasta.gz.bowtie2out.txt
SRS014464-Anterior_nares.fasta.gz.bowtie2out.txt
SRS014470-Tongue_dorsum.fasta.gz.bowtie2out.txt
SRS014472-Buccal_mucosa.fasta.gz.bowtie2out.txt
SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt
SRS014494-Posterior_fornix.fasta.gz.bowtie2out.txt
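A rough per-sample count of marker hits can be read from these files. The sketch below assumes each non-comment line is one tab-separated "read, marker" hit and ignores any lines starting with `#`; `marker_hits` is a hypothetical helper.

```shell
# Count marker hits (non-comment lines) in each *.bowtie2out.txt file.
# Assumes one "read<TAB>marker" line per hit. marker_hits is hypothetical.
marker_hits() {
    for f in "$@"; do
        printf '%s\t%s\n' "$f" "$(grep -vc '^#' "$f")"
    done
}
```

Usage: `marker_hits SRS*.bowtie2out.txt`.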

6. Merge the abundance profiles of the six samples

merge_metaphlan_tables.py *_profile.txt > merged_abundance_table.txt

This produces the combined table merged_abundance_table.txt.

# View the combined table
less -S merged_abundance_table.txt
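As a sanity check on the merged table, the kingdom-level rows (k__ with no `|`) of each sample should add up to roughly 100, assuming the profiles hold relative abundances in percent and the header line starts with `ID` (the same assumption the grep in section IV makes). `kingdom_sums` is a hypothetical helper.

```shell
# Sum kingdom-level rows for every sample column of a merged table.
# Header line assumed to start with "ID"; lines starting with '#' are ignored.
# kingdom_sums is a hypothetical helper.
kingdom_sums() {
    awk -F '\t' '
        $1 == "ID" { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
        $1 ~ /^k__/ && $1 !~ /\|/ { for (i = 2; i <= NF; i++) sum[i] += $i }
        END { for (i = 2; i <= ncol; i++) printf "%s\t%.2f\n", name[i], sum[i] }
    ' "$1"
}
```

Usage: `kingdom_sums merged_abundance_table.txt` prints one "sample, total" line per column.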

IV. Heatmap with hclust2

1. Install hclust2 with conda

conda install -c biobakery hclust2

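2. Extract species-level abundances

Keep the header line (ID) and all species rows (s__), drop the strain-level rows (t__), and strip the lineage prefix so that only the species name remains: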

grep -E "(s__)|(^ID)" merged_abundance_table.txt | grep -v "t__" | sed 's/^.*s__//g' > merged_abundance_table_species.txt
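The same grep/sed pattern works for other taxonomic levels if the level prefix and the next-lower prefix are swapped in. A sketch, where `extract_level` is a hypothetical helper: the second argument is the level to keep (for example `g` for genus) and the third is the lower level whose rows are dropped (for example `s`).

```shell
# Extract one taxonomic level from a merged MetaPhlAn2 table, mirroring
# the species command above. extract_level is a hypothetical helper.
#   $1 = merged table, $2 = level prefix to keep, $3 = lower prefix to drop
extract_level() {
    grep -E "(${2}__)|(^ID)" "$1" | grep -v "${3}__" | sed "s/^.*${2}__//"
}
```

Usage: `extract_level merged_abundance_table.txt g s > merged_abundance_table_genus.txt`; `extract_level merged_abundance_table.txt s t` reproduces the species table above.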

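3. Draw the heatmap

The command keeps the top 25 species (--ftop 25), clusters both species and samples by Bray-Curtis distance (--f_dist_f, --s_dist_f), uses a log color scale (-l), and writes a 300 dpi PNG: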

hclust2.py -i merged_abundance_table_species.txt -o abundance_heatmap_species.png --ftop 25 --f_dist_f braycurtis --s_dist_f braycurtis --cell_aspect_ratio 0.5 -l --flabel_size 6 --slabel_size 6 --max_flabel_len 100 --max_slabel_len 100 --minv 0.1 --dpi 300

V. Taxonomic tree with GraPhlAn

1. Install GraPhlAn with conda

conda install -c biobakery graphlan

2. Prepare the input files

Generate merged_abundance.tree.txt and merged_abundance.annot.txt:

export2graphlan.py --skip_rows 1,2 -i merged_abundance_table.txt --tree merged_abundance.tree.txt --annotation merged_abundance.annot.txt --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1

3. Draw the tree

graphlan_annotate.py --annot merged_abundance.annot.txt merged_abundance.tree.txt merged_abundance.xml
graphlan.py --dpi 300 merged_abundance.xml merged_abundance.png --external_legends

Output:
merged_abundance.xml
merged_abundance.png
merged_abundance_legend.png
merged_abundance_annot.png

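VI. Strain-tracking heatmap for a single species

See also: PanPhlAn tutorial. The steps below use MetaPhlAn2's clade-specific strain tracker and hclust2.

1. Input data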

MetaPhlAn intermediate bowtie2 output files

13530241_SF05.fasta.gz.bowtie2out.txt
13530241_SF06.fasta.gz.bowtie2out.txt
19272639_SF05.fasta.gz.bowtie2out.txt
19272639_SF06.fasta.gz.bowtie2out.txt
40476924_SF05.fasta.gz.bowtie2out.txt
40476924_SF06.fasta.gz.bowtie2out.txt

2. Create the table for the selected species

Species: s__Eubacterium_siraeum
Minimum abundance: 1% (--min_ab 1.0)

metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 13530241_SF05.fasta.gz.bowtie2out.txt > 13530241_SF05.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 13530241_SF06.fasta.gz.bowtie2out.txt > 13530241_SF06.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 19272639_SF05.fasta.gz.bowtie2out.txt > 19272639_SF05.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 19272639_SF06.fasta.gz.bowtie2out.txt > 19272639_SF06.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 40476924_SF05.fasta.gz.bowtie2out.txt > 40476924_SF05.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 40476924_SF06.fasta.gz.bowtie2out.txt > 40476924_SF06.siraeum.txt
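The six near-identical commands above can be collapsed into a loop. A sketch using the same flags; `tracker_name` and `track_clade` are hypothetical helper names, and the second argument is just the tag used in the output file name.

```shell
# Output name used above, e.g.
# 13530241_SF05.fasta.gz.bowtie2out.txt -> 13530241_SF05.siraeum.txt
tracker_name() {
    printf '%s.%s.txt\n' "${1%.fasta.gz.bowtie2out.txt}" "$2"
}

# Run the clade-specific strain tracker on every bowtie2out file given,
# with the same minimum abundance (1%) as the commands above.
#   $1 = clade, $2 = tag for output names, remaining args = bowtie2out files
track_clade() {
    local clade=$1 tag=$2 f
    shift 2
    for f in "$@"; do
        metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker \
            --clade "$clade" --min_ab 1.0 "$f" > "$(tracker_name "$f" "$tag")"
    done
}
```

Usage: `track_clade s__Eubacterium_siraeum siraeum *_SF0?.fasta.gz.bowtie2out.txt`.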

Output:
13530241_SF05.siraeum.txt
13530241_SF06.siraeum.txt
19272639_SF05.siraeum.txt
19272639_SF06.siraeum.txt
40476924_SF05.siraeum.txt
40476924_SF06.siraeum.txt

3. Merge the results

merge_metaphlan_tables.py *.siraeum.txt > siraeum_tracker.txt

4. Draw the heatmap

hclust2.py -i siraeum_tracker.txt -o siraeum_tracker.png --skip_rows 1 --f_dist_f hamming --no_flabels --dpi 300 --cell_aspect_ratio 0.01
