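Introduction
The previous post, "MetaPhlAn: a tutorial on taxonomic profiling of metagenomes", introduced MetaPhlAn; this post walks through how to use MetaPhlAn2.
Bitbucket page: https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2
Dependencies: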
Python (version >= 2.7)
Bowtie2
Numpy
Pandas (optional, only required by utility scripts)
BioPython (optional, only required by utility scripts)
SciPy (optional, only required by utility scripts)
Matplotlib (optional, only required by utility scripts)
biom (optional, only required for biom format input/output)
I. Installation with conda
conda install -c bioconda metaphlan2
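After installing, it can help to confirm that the command-line tools from the dependency list are on PATH. The following is a minimal POSIX shell sketch; the function name check_cmds is hypothetical and not part of MetaPhlAn2. Python modules such as NumPy are not commands, so they are checked separately with an import.

```shell
# Hypothetical helper: report which of the given commands are on PATH.
# Returns non-zero if at least one of them is missing.
check_cmds() {
  status=0
  for cmd in "$@"; do
    if command -v "$cmd" >/dev/null 2>&1; then
      echo "found:   $cmd"
    else
      echo "missing: $cmd"
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_cmds python bowtie2 metaphlan2.py && python -c 'import numpy'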
II. Sequencing data
Download on Windows:
SRS014476-Supragingival_plaque.fasta.gz
SRS014494-Posterior_fornix.fasta.gz
SRS014459-Stool.fasta.gz
SRS014464-Anterior_nares.fasta.gz
SRS014470-Tongue_dorsum.fasta.gz
SRS014472-Buccal_mucosa.fasta.gz
Download on Linux:
curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014476-Supragingival_plaque.fasta.gz
curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014494-Posterior_fornix.fasta.gz
curl -O https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input/SRS014459-Stool.fasta.gz
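The curl commands above fetch only three of the six files listed for Windows. A minimal sketch that prints a download URL for all six, assuming (the source does not confirm it) that the other three files live in the same directory as the first three. BASE_URL, SAMPLES and sample_urls are hypothetical names introduced for illustration.

```shell
# Directory taken from the curl commands above. ASSUMPTION: the Anterior_nares,
# Tongue_dorsum and Buccal_mucosa files sit in the same directory.
BASE_URL="https://bitbucket.org/biobakery/biobakery/raw/tip/demos/biobakery_demos/data/metaphlan2/input"

# The six sample names from the Windows download list.
SAMPLES="SRS014476-Supragingival_plaque SRS014494-Posterior_fornix SRS014459-Stool
SRS014464-Anterior_nares SRS014470-Tongue_dorsum SRS014472-Buccal_mucosa"

# Hypothetical helper: print one download URL per sample.
sample_urls() {
  for s in $SAMPLES; do   # unquoted on purpose: split on spaces and newlines
    printf '%s/%s.fasta.gz\n' "$BASE_URL" "$s"
  done
}
```

To download everything: sample_urls | while read -r url; do curl -O "$url"; done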
III. MetaPhlAn2 analysis
1. Preparation
mkdir metaphlan2_analysis
mv ~/Downloads/SRS*.fasta.gz metaphlan2_analysis/
cd metaphlan2_analysis
ls
2. Single-sample analysis
# Profile the first sample
metaphlan2.py SRS014476-Supragingival_plaque.fasta.gz --input_type fasta > SRS014476-Supragingival_plaque_profile.txt
# Inspect the intermediate Bowtie2 alignment output
less -S SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt
# Inspect the single-sample taxonomic abundance profile
less -S SRS014476-Supragingival_plaque_profile.txt
# Second sample, using 4 threads (--nproc 4)
metaphlan2.py SRS014459-Stool.fasta.gz --input_type fasta --nproc 4 > SRS014459-Stool_profile.txt
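The .bowtie2out.txt file opened above is MetaPhlAn2's intermediate alignment output. A quick sketch that counts the distinct reads and markers in it, assuming (not stated in the source) that each line holds a read ID and a marker name separated by a tab, and that lines starting with # carry metadata rather than alignments. The function name bowtie2out_summary is hypothetical.

```shell
# Hypothetical helper: count distinct read IDs (column 1) and marker names
# (column 2) in a MetaPhlAn2 *.bowtie2out.txt file, ignoring '#' lines.
bowtie2out_summary() {
  reads=$(grep -v '^#' "$1" | cut -f1 | sort -u | wc -l | tr -d ' ')
  markers=$(grep -v '^#' "$1" | cut -f2 | sort -u | wc -l | tr -d ' ')
  printf 'reads=%s markers=%s\n' "$reads" "$markers"
}
```

Usage: bowtie2out_summary SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt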
3. Multi-sample analysis
# The remaining 4 samples
metaphlan2.py SRS014464-Anterior_nares.fasta.gz --input_type fasta --nproc 4 > SRS014464-Anterior_nares_profile.txt
metaphlan2.py SRS014470-Tongue_dorsum.fasta.gz --input_type fasta --nproc 4 > SRS014470-Tongue_dorsum_profile.txt
metaphlan2.py SRS014472-Buccal_mucosa.fasta.gz --input_type fasta --nproc 4 > SRS014472-Buccal_mucosa_profile.txt
metaphlan2.py SRS014494-Posterior_fornix.fasta.gz --input_type fasta --nproc 4 > SRS014494-Posterior_fornix_profile.txt
Or:
# Profile all 6 samples in one loop
# ${f%.fasta.gz} strips the .fasta.gz suffix to build the output name
for f in SRS*.fasta.gz
do
  metaphlan2.py "$f" --input_type fasta --nproc 4 > "${f%.fasta.gz}_profile.txt"
done
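The first two samples were already profiled above, so the loop repeats them, and MetaPhlAn2 may stop with an error when a sample's .bowtie2out.txt file already exists. A sketch of the same loop that skips any sample whose profile is already present and non-empty; profile_name is a hypothetical helper that reproduces the naming used above.

```shell
# Hypothetical helper: SRS014459-Stool.fasta.gz -> SRS014459-Stool_profile.txt
profile_name() {
  printf '%s_profile.txt\n' "${1%.fasta.gz}"
}

for f in SRS*.fasta.gz; do
  [ -e "$f" ] || continue          # glob matched nothing: skip the literal pattern
  out=$(profile_name "$f")
  if [ -s "$out" ]; then           # profile exists and is non-empty
    echo "skip: $out"
    continue
  fi
  metaphlan2.py "$f" --input_type fasta --nproc 4 > "$out"
done
```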
4. Abundance profiles of the six samples
SRS014459-Stool_profile.txt
SRS014464-Anterior_nares_profile.txt
SRS014470-Tongue_dorsum_profile.txt
SRS014472-Buccal_mucosa_profile.txt
SRS014476-Supragingival_plaque_profile.txt
SRS014494-Posterior_fornix_profile.txt
5. Bowtie2 alignment outputs of the six samples
SRS014459-Stool.fasta.gz.bowtie2out.txt
SRS014464-Anterior_nares.fasta.gz.bowtie2out.txt
SRS014470-Tongue_dorsum.fasta.gz.bowtie2out.txt
SRS014472-Buccal_mucosa.fasta.gz.bowtie2out.txt
SRS014476-Supragingival_plaque.fasta.gz.bowtie2out.txt
SRS014494-Posterior_fornix.fasta.gz.bowtie2out.txt
6. Merge the six abundance profiles
merge_metaphlan_tables.py *_profile.txt > merged_abundance_table.txt
This produces the merged table merged_abundance_table.txt.
# Inspect the merged table
less -S merged_abundance_table.txt
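A quick sanity check is that the merged table has one column per input profile (six here). A minimal sketch, assuming the first row is the header with a clade ID column followed by one column per sample; count_samples is a hypothetical name.

```shell
# Hypothetical helper: print the number of sample columns in a merged table,
# i.e. the tab-separated fields of the first (header) row minus the ID column.
count_samples() {
  head -n 1 "$1" | awk -F '\t' '{ print NF - 1 }'
}
```

Usage: count_samples merged_abundance_table.txt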
IV. Heatmap with hclust2
1. Install hclust2 with conda
conda install -c biobakery hclust2
2. Extract species-level abundances
grep -E "(s__)|(^ID)" merged_abundance_table.txt | grep -v "t__" | sed 's/^.*s__//g' > merged_abundance_table_species.txt
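The grep/sed pipeline above keeps the header row (starting with ID) and every species-level row (containing s__), drops strain-level rows (t__), and strips the lineage prefix up to s__ so that only the species name remains. Wrapping the same pipeline in a function (the name species_only is hypothetical) makes it easy to try on a few hand-made rows before running it on the real table.

```shell
# Same filter as above, reading stdin and writing stdout:
#   1) keep the header (^ID) and every row containing a species tag (s__)
#   2) drop strain-level rows (t__)
#   3) strip everything up to and including the last "s__"
species_only() {
  grep -E "(s__)|(^ID)" | grep -v "t__" | sed 's/^.*s__//g'
}
```

Usage: species_only < merged_abundance_table.txt > merged_abundance_table_species.txt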
3. Draw the heatmap
hclust2.py -i merged_abundance_table_species.txt -o abundance_heatmap_species.png --ftop 25 --f_dist_f braycurtis --s_dist_f braycurtis --cell_aspect_ratio 0.5 -l --flabel_size 6 --slabel_size 6 --max_flabel_len 100 --max_slabel_len 100 --minv 0.1 --dpi 300
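Here --f_dist_f and --s_dist_f select the distance used to cluster the features (species rows) and the samples (columns); both are set to braycurtis. For reference, the standard Bray–Curtis dissimilarity between samples j and k, where x_{ij} is the abundance of feature i in sample j, with a worked example on made-up abundances:

```latex
BC_{jk} = \frac{\sum_i \lvert x_{ij} - x_{ik} \rvert}{\sum_i \left( x_{ij} + x_{ik} \right)}

% Hypothetical abundances: x_{\cdot j} = (50, 30, 20), x_{\cdot k} = (20, 30, 50)
BC_{jk} = \frac{30 + 0 + 30}{70 + 60 + 70} = \frac{60}{200} = 0.3
```

For non-negative abundances the value ranges from 0 (identical profiles) to 1 (no shared features).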
V. Taxonomic tree (cladogram) with GraPhlAn
1. Install GraPhlAn with conda
conda install -c biobakery graphlan
2. Prepare the input files
Generate merged_abundance.tree.txt and merged_abundance.annot.txt:
export2graphlan.py --skip_rows 1,2 -i merged_abundance_table.txt --tree merged_abundance.tree.txt --annotation merged_abundance.annot.txt --most_abundant 100 --abundance_threshold 1 --least_biomarkers 10 --annotations 5,6 --external_annotations 7 --min_clade_size 1
3. Draw the tree
Outputs of the two commands below:
merged_abundance.xml
merged_abundance.png
merged_abundance_legend.png
merged_abundance_annot.png
graphlan_annotate.py --annot merged_abundance.annot.txt merged_abundance.tree.txt merged_abundance.xml
graphlan.py --dpi 300 merged_abundance.xml merged_abundance.png --external_legends
VI. Species-level heatmap (input data from the PanPhlAn tutorial)
1. Input data
MetaPhlAn intermediate bowtie2 output files:
13530241_SF05.fasta.gz.bowtie2out.txt
13530241_SF06.fasta.gz.bowtie2out.txt
19272639_SF05.fasta.gz.bowtie2out.txt
19272639_SF06.fasta.gz.bowtie2out.txt
40476924_SF05.fasta.gz.bowtie2out.txt
40476924_SF06.fasta.gz.bowtie2out.txt
2. Build the abundance table for the selected species
Species: s__Eubacterium_siraeum
Abundance threshold: 1% (--min_ab 1.0)
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 13530241_SF05.fasta.gz.bowtie2out.txt > 13530241_SF05.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 13530241_SF06.fasta.gz.bowtie2out.txt > 13530241_SF06.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 19272639_SF05.fasta.gz.bowtie2out.txt > 19272639_SF05.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 19272639_SF06.fasta.gz.bowtie2out.txt > 19272639_SF06.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 40476924_SF05.fasta.gz.bowtie2out.txt > 40476924_SF05.siraeum.txt
metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker --clade s__Eubacterium_siraeum --min_ab 1.0 40476924_SF06.fasta.gz.bowtie2out.txt > 40476924_SF06.siraeum.txt
Output:
13530241_SF05.siraeum.txt
13530241_SF06.siraeum.txt
19272639_SF05.siraeum.txt
19272639_SF06.siraeum.txt
40476924_SF05.siraeum.txt
40476924_SF06.siraeum.txt
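The six commands above differ only in the sample prefix. A sketch of the same step as a loop over every *.fasta.gz.bowtie2out.txt file in the current directory, using only the flags shown above; tracker_out is a hypothetical helper that builds the output name from the input name.

```shell
# Hypothetical helper:
# 13530241_SF05.fasta.gz.bowtie2out.txt -> 13530241_SF05.siraeum.txt
tracker_out() {
  printf '%s.siraeum.txt\n' "${1%.fasta.gz.bowtie2out.txt}"
}

for f in *.fasta.gz.bowtie2out.txt; do
  [ -e "$f" ] || continue   # glob matched nothing: skip the literal pattern
  metaphlan2.py --input_type bowtie2out -t clade_specific_strain_tracker \
    --clade s__Eubacterium_siraeum --min_ab 1.0 "$f" > "$(tracker_out "$f")"
done
```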
3. Merge the results
merge_metaphlan_tables.py *.siraeum.txt > siraeum_tracker.txt
4. Draw the heatmap
hclust2.py -i siraeum_tracker.txt -o siraeum_tracker.png --skip_rows 1 --f_dist_f hamming --no_flabels --dpi 300 --cell_aspect_ratio 0.01