导读

使用 HUMAnN2（The HMP Unified Metabolic Analysis Network 2）进行宏基
因组学分析。 HUMAnN2 在宏基因组研究中非常有用，通过这个分析，不仅能获
得微生物的物种丰度信息，还能准确有效地获得微生物代谢途径和功能模块信息。

HUMAnN2官网：https://huttenhower.sph.harvard.edu/humann
HUMAnN2 SOP：Metagenomics standard operating procedure v2
HUMAnN2 GitHub：https://github.com/biobakery/humann

一、特点

能识别宏基因组数据中的细菌、真菌、古菌、病毒、基因家族和通路。

二、分析流程图

三、安装依赖

下载并安装配置所需要的所有软件，包括 FastQC、 HUMAnN2、 MetaPhlAn2、
Bowtie2、 Diamond、 MinPath、 Python、 perl、 cutadapt 等，安装环境为 redhat/ubuntu/centos操作系统，为了避免部分软件安装不成功，需要使用系统的 root 账户进行安装。

部分软件安装：HUMAnN2、cutadapt、metaphlan2

# HUMAnN2
# 安装过程见： http://huttenhower.sph.harvard.edu/humann2
pip install humann2 #下载软件
humann2 --version #测试安装是否成功
humann2_databases --download chocophlan full $DIR #下载 chocophlan 数据库
humann2_databases --download uniref uniref90_diamond $DIR # 下载 full UniRef90 database (approx. size = 11 GB)
humann2_databases --download uniref uniref90_ec_filtered_diamond $DIR # 下 载 UniRef90 EC filtered database (RECOMMENDED, approx. size = 846 MB)

# cutadapt
# 安装过程见： https://cutadapt.readthedocs.io/en/stable/installation.html
pip install --user --upgrade cutadapt #下载并安装软件HUMAnN2 

# metaphlan2
# 软件安装过程见：https://bitbucket.org/biobakery/metaphlan2#markdown-header-installation
wget https://bitbucket.org/biobakery/metaphlan2/get/default.zip #下载
unzip default.zip #解压
mv biobakery-metaphlan2-5175d8783801 metaphlan2
cd metaphlan2/
export PATH=`pwd`:`pwd`/utils:$PATH #设置路径
export mpa_dir=`pwd` #设置路径
metaphlan2.py #测试

安装软件版本：

四、分析流程

1 质控和质检

# 切除引物序列：
cutadapt -g read1_5’primer -G read2_5’primer -o R1_cut.fastq -p R2_cut.fastq R1.fastq R2.fastq

# 质量检查：
fastqc *_cut.fastq

# 去除宿主序列；SLIDINGWINDOW 质检
kneaddata -i R1_cut.fastq -i R2_cut.fastq -o output_dir(kneaddata_out) \
-db /hg19/ --trimmomatic /soft_route/ \
-t n --trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" \
--bowtie2-options "--very-sensitive --dovetail" --remove-intermediate-output

# 质检结果统计
kneaddata_read_count_table –input input_dir(kneaddata_out) --output file.txt

2 数据整理

# 合并宿主污染序列到指定目录：
mkdir kneaddata_out/contam_seq
mv *_contam_*fastq contam_seq

# 合并正反义链：使用 perl 脚本，脚本下载于 github
perl concat_paired_end.pl -p 4 --no_R_match \
-o output_dir cat_reads kneaddataout/*_paired_*.fastq

3 使用 HUMAnN2 进行微生物分类单元、基因家族、通路注释

该步骤为本历程最关键和最耗时间的步骤，使用软件推荐的更小的参考数据库可提高效率

humann2 --threads n --protein-database /route/humann2_full_uniref90/ \
--input /route/cat_reads/*.fastq \
--output /route/humann2_out_full/

humann2 --threads n /route/humann2_ec_uniref90/ \
--input /route/cat_reads/*.fastq \
--output /route/humann2_out_ec/

过程：
Creating output directory: /data/huty/Anno73/humann2_out/332
Output files will be written to: /data/huty/Anno73/humann2_out/332

物种注释：
Running metaphlan2.py ........
Found g__Viruses_noname.s__Vibrio_phage_pYD38_A : 38.99% of mapped reads
Found g__Bacteroides.s__Bacteroides_coprocola : 29.31% of mapped reads
Total species selected from prescreen: 40
Selected species explain 99.89% of predicted community composition

基因家族注释：
Creating custom ChocoPhlAn database ........
Running bowtie2-build ........
Running bowtie2 ........
Total bugs from nucleotide alignment: 33
g__Bacteroides.s__Bacteroides_coprocola: 14754780 hits
g__Roseburia.s__Roseburia_inulinivorans: 116636 hits
Total gene families from nucleotide alignment: 81939
Unaligned reads after nucleotide alignment: 50.0360338749 %

蛋白注释：
Running diamond ........
Aligning to reference database: uniref90.ec_filtered.1.1.dmnd
Total bugs after translated alignment: 34
g__Bacteroides.s__Bacteroides_coprocola: 14754780 hits
g__Roseburia.s__Roseburia_inulinivorans: 116636 hits
Total gene families after translated alignment: 92213
Unaligned reads after translated alignment: 48.6624332243 %

统计基因家族，计算通路丰度和覆盖度
Computing gene families ...
Computing pathways abundance and coverage ...
Output files created:
/data/huty/Anno73/humann2_out/332/332_genefamilies.tsv
/data/huty/Anno73/humann2_out/332/332_pathabundance.tsv
/data/huty/Anno73/humann2_out/332/332_pathcoverage.tsv

4 将每个样本的 humann2 输出导入到一个表中

mkdir humann2_final_out

humann2_join_tables -s --input humann2_out/ --file_name pathabundance \
--output humann2_final_out/humann2_pathabundance.tsv

humann2_join_tables -s --input humann2_out/ --file_name pathcoverage \
--output humann2_final_out/humann2_pathcoverage.tsv

humann2_join_tables -s --input humann2_out/ --file_name genefamilies \
--output humann2_final_out/humann2_genefamilies.tsv

5 重新标准化基因家族和通路的丰度（标准化为百分比）

humann2_renorm_table \
--input humann2_final_out/humann2_pathabundance.tsv \
--units relab \
--output humann2_final_out/humann2_pathabundance_relab.tsv

humann2_renorm_table \
--input humann2_final_out/humann2_genefamilies.tsv \
--units relab \
--output humann2_final_out/humann2_genefamilies_relab.tsv

6 分割 humann2 输出的丰度表为分层（unclassified）和非分层表

分层的就是每个功能注明到特定菌的某个功能，不分层的只是看每个样本的情况。

humann2_split_stratified_table \
--input humann2_final_out/humann2_pathabundance_relab.tsv \
--output humann2_final_out

humann2_split_stratified_table \
--input humann2_final_out/humann2_genefamilies_relab.tsv \
--output humann2_final_out

humann2_split_stratified_table \
--input humann2_final_out/humann2_pathcoverage.tsv \
--output humann2_final_out

7 合并 Metaphlan2 结果

mkdir metaphlan2_out
cp `find ./ -name "*_metaphlan_bugs*"` ./metaphlan2_out

# 需要进入 python2.7 环境
merge_metaphlan_tables.py metaphlan2_out/*metaphlan_bugs_list.tsv > metaphlan2_merged.txt

# 每个样本名字太长，需要精简一点
sed -i 's/_1_kneaddata_paired_1.assembled_metaphlan_bugs_list//g' metaphlan2_merged.txt

# 取出属水平的丰度数据，可以导入 R 进行下游分析
head -n1 metaphlan2_merged.txt >> genus.txt
grep "g__" metaphlan2_merged.txt | grep -v "|s__" | grep -v "unclassified" >> genus.txt

# 取出种水平的丰度数据，可以导入 R 进行下游分析
head -n 1 metaphlan2_merged.txt >> species.txt
grep "s__" metaphlan2_merged.txt | grep -v "unclassified" >> species.txt

8 Metaphlan2 结果的可视化

# 需要进入 python2.7 环境
# heatmap 按各样本丰度绘制热图，显示前 25 种高丰度菌种类，样品默认采用 bray_curtis 距离聚类，微生物采用相关性距离聚类；数值采用对数 10 进行变换
metaphlan_hclust_heatmap.py -c bbcry --top 25 --minv 0.1 -s log --in metaphlan2_merged.txt --out abundance_heatmap.pdf

HUMAnN2 宏基因组分析 | 小白版

导读

一、特点

二、分析流程图

三、安装依赖

四、分析流程

你可能感兴趣的:(HUMAnN2 宏基因组分析 | 小白版)