Amplicon Sequencing, Simplified (Linux Part)

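The 16S rRNA gene encodes the RNA of the prokaryotic small (30S) ribosomal subunit. It is about 1,542 bp long (in E. coli). Its moderate size and low mutation rate make it the most widely used and most useful marker in bacterial systematics. The gene contains 9 variable regions (V1-V9) and 10 conserved regions. The conserved regions reflect how closely species are related, and the variable regions capture the differences between species.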


[Figure: bacterial 16S rRNA gene]

[Figure: sequencing primers]

Paired-end (PE) reads

1. Generate the sample manifest (file-path list)

echo 'sample-id,absolute-filepath,direction' > manifest_pe.txt
# assumes <sample>_1.fastq.gz / <sample>_2.fastq.gz naming, with no "_" inside sample names
ls *_1.fastq.gz | while read id;
do
echo "${id%%_*},$PWD/$id,forward" >> manifest_pe.txt;
echo "${id%%_*},$PWD/${id%%_*}_2.fastq.gz,reverse" >> manifest_pe.txt;
done
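Before importing, it can help to confirm that the manifest is complete. The sketch below assumes the same <sample>_1.fastq.gz / <sample>_2.fastq.gz naming as the loop above. It wraps that loop in a function (make_pe_manifest, a hypothetical name) and adds a checker (check_pe_manifest, also hypothetical) that reports missing files and samples that do not have exactly one forward and one reverse entry.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the paired-end manifest and sanity-check it before import.
# Assumption: reads are named <sample>_1.fastq.gz / <sample>_2.fastq.gz and
# sample names contain no "_" (${f%%_*} cuts at the first underscore).

make_pe_manifest() {
  out=$1
  echo 'sample-id,absolute-filepath,direction' > "$out"
  for f in *_1.fastq.gz; do
    s=${f%%_*}
    echo "$s,$PWD/$f,forward" >> "$out"
    echo "$s,$PWD/${s}_2.fastq.gz,reverse" >> "$out"
  done
}

check_pe_manifest() {
  m=$1
  status=0
  # 1) every absolute path listed in the manifest must exist
  while IFS=, read -r sid path dir; do
    [ "$sid" = "sample-id" ] && continue   # skip the header line
    if [ ! -e "$path" ]; then
      echo "missing file: $path ($sid, $dir)" >&2
      status=1
    fi
  done < "$m"
  # 2) every sample needs exactly one forward and one reverse entry
  bad=$(awk -F, 'NR > 1 { c[$1 "," $3]++; s[$1] = 1 }
        END { for (x in s)
                if (c[x ",forward"] != 1 || c[x ",reverse"] != 1) print x }' "$m")
  if [ -n "$bad" ]; then
    echo "samples without one forward + one reverse entry: $bad" >&2
    status=1
  fi
  return "$status"
}

# Usage, in the directory holding the reads:
#   make_pe_manifest manifest_pe.txt && check_pe_manifest manifest_pe.txt
```

A non-zero return from check_pe_manifest means the import would probably fail or pick up the wrong files. The stderr messages say which sample or path is at fault.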

2. Import into QIIME 2

conda activate qiime2-2020.11
time qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path manifest_pe.txt \
--output-path summary_pe.qza \
--input-format PairedEndFastqManifestPhred33 

Generate a visualization file

time qiime demux summarize \
--i-data summary_pe.qza \
--o-visualization summary_pe.qzv

3. Trim primers with Cutadapt

V3-V4 region (run only the command that matches your amplicon; both write cutadapt-summary.qza)
time qiime cutadapt trim-paired \
--p-cores 4 \
--i-demultiplexed-sequences summary_pe.qza \
--p-front-f CGACCTACGGGNGGCWGCAG \
--p-front-r TCGACTACHVGGGTATCTAATCC \
--o-trimmed-sequences cutadapt-summary.qza \
--verbose \
&>cutadapt.log
V4-V5 region
time qiime cutadapt trim-paired \
--p-cores 4 \
--i-demultiplexed-sequences summary_pe.qza \
--p-front-f GTGCCAGCMGCCGCGGTAA \
--p-front-r CCGTCAATTCMTTTRAGTTT  \
--o-trimmed-sequences cutadapt-summary.qza \
--verbose \
&>cutadapt.log
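Only one of the two primer pairs above should match your reads. As a quick check, the sketch below expands the IUPAC ambiguity codes in a primer (M, R, W, N and so on) into grep bracket expressions. It then counts how many of the first N reads contain the primer. Run the forward primer against *_1.fastq.gz and the reverse primer against *_2.fastq.gz. The helper names are hypothetical, not QIIME 2 or cutadapt commands. The count is only a rough indicator: it allows no mismatches, whereas cutadapt tolerates an error rate of 0.1 by default.

```shell
#!/usr/bin/env bash
# Sketch: count how many reads contain a (degenerate) primer, to confirm which
# primer pair matches the data before choosing a cutadapt command.
# iupac_to_regex and count_primer_hits are hypothetical helpers.

# Expand IUPAC ambiguity codes into grep bracket expressions.
iupac_to_regex() {
  printf '%s\n' "$1" | sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' \
    -e 's/W/[AT]/g' -e 's/K/[GT]/g' -e 's/M/[AC]/g' \
    -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' -e 's/H/[ACT]/g' \
    -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# $1 = reads (.fastq.gz), $2 = primer 5'->3', $3 = number of reads to inspect
count_primer_hits() {
  re=$(iupac_to_regex "$2")
  # line 2 of every 4-line FASTQ record is the sequence; match anywhere in it
  gzip -dc "$1" | head -n "$(( $3 * 4 ))" | awk 'NR % 4 == 2' | grep -c "$re"
}

# Usage (hypothetical file name; inspects the first 1000 reads):
#   count_primer_hits sampleA_1.fastq.gz GTGCCAGCMGCCGCGGTAA 1000   # V4-V5 forward
#   count_primer_hits sampleA_1.fastq.gz CGACCTACGGGNGGCWGCAG 1000  # V3-V4 forward
```

The primer pair whose count is close to the number of inspected reads is most likely the one your library was built with.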

4. Denoise with DADA2

mkdir -p R
time qiime dada2 denoise-paired \
--p-n-threads 4 \
--i-demultiplexed-seqs cutadapt-summary.qza \
--o-table ./R/1_table-dada2_pe.qza \
--o-representative-sequences rep-seqs-dada2_pe.qza \
--o-denoising-stats dada2-stats_pe.qza \
--p-trunc-len-f 0 \
--p-trunc-len-r 0
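Runtime reported by time on the author's data:
real    111m17.195s
user    111m4.600s
sys     0m9.834s
Setting --p-trunc-len-f 0 and --p-trunc-len-r 0 disables truncation, so reads are kept at their full (primer-trimmed) length.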

5. Build a phylogenetic tree from the representative sequences

cp rep-seqs-dada2_pe.qza ./R/2_rep-seqs-dada2_pe.qza
time qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences rep-seqs-dada2_pe.qza  \
--o-alignment aligned-rep-seqs_pe.qza \
--o-masked-alignment masked-aligned-rep-seqs_pe.qza \
--o-tree unrooted-tree_pe.qza \
--o-rooted-tree ./R/3_rooted-tree_pe.qza

6. Assign taxonomy to the representative sequences

Greengenes database

time qiime feature-classifier classify-sklearn \
--i-classifier /media/lzx/0000678400004823/Qiime2/gg-13-8-99-nb-classifier.qza \
--i-reads rep-seqs-dada2_pe.qza \
--o-classification ./R/4_taxonomy_gg.qza

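The --i-classifier argument is a pre-trained naive Bayes taxonomy classifier. gg-13-8-99-nb-classifier.qza is trained on full-length Greengenes 13_8 sequences (99% OTUs); gg-13-8-99-515-806-nb-classifier.qza is trained only on the 515F/806R (V4) region.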

SILVA database

time qiime feature-classifier classify-sklearn \
--i-classifier /media/lzx/0000678400004823/Qiime2/silva-138-99-nb-classifier.qza \
--i-reads rep-seqs-dada2_pe.qza \
--o-classification ./R/4_taxonomy_silva.qza

4_taxonomy_*.qza: the taxonomy assignment results.
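You can get a quick overview of the assignments without leaving the shell. First export the taxonomy artifact with qiime tools export, for example qiime tools export --input-path ./R/4_taxonomy_gg.qza --output-path taxonomy_gg. This writes a taxonomy.tsv with the columns Feature ID, Taxon and Confidence, which you can then summarize with awk. The sketch below assumes that export layout and the "p__" phylum prefix used by both the Greengenes and the SILVA 138 classifiers. phylum_counts is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: count ASVs per phylum in an exported taxonomy.tsv
# (tab-separated: Feature ID, Taxon, Confidence; first line is a header).
# phylum_counts is a hypothetical helper, not part of QIIME 2.

phylum_counts() {
  awk -F'\t' 'NR > 1 {
      p = "Unassigned"                  # default when no phylum is named
      n = split($2, lv, /; */)          # ranks are separated by "; "
      for (i = 1; i <= n; i++)
        if (lv[i] ~ /^p__./) p = substr(lv[i], 4)   # p__Firmicutes -> Firmicutes
      c[p]++
    }
    END { for (k in c) print c[k] "\t" k }' "$1" | sort -k1,1nr -k2,2
}

# Usage:
#   phylum_counts taxonomy_gg/taxonomy.tsv | head
```

Empty phylum ranks such as a bare "p__" are counted as Unassigned, the same as features the classifier could not assign at all.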

7. Import into R

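library("phyloseq")
library("ggpubr")             # significance labels for post-hoc tests
library("MicrobiotaProcess")  # provides import_qiime2()
library("tidyverse")
# run from inside the R/ directory, where the numbered .qza files were written
otu <- "1_table-dada2_pe.qza"
rep <- "2_rep-seqs-dada2_pe.qza"
tree <- "3_rooted-tree_pe.qza"
tax <- "4_taxonomy_gg.qza"      # or the SILVA taxonomy file
sample <- "metadata_pe.txt"     # sample metadata prepared by the user
ps_dada2 <- import_qiime2(otuqza = otu, taxaqza = tax, refseqqza = rep,
                          mapfilename = sample, treeqza = tree)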
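The manifest determines the sample IDs in the feature table. If the IDs in the metadata file do not match them, samples may not line up after import. The sketch below compares the two ID lists with comm. It assumes the metadata is tab-separated with the sample ID in the first column (QIIME 2 style, with an optional #q2:types line). compare_ids is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: check that the sample IDs in the manifest (CSV) and the metadata
# file (tab-separated, first column = sample ID) agree before importing.
# compare_ids is a hypothetical helper; the metadata layout is an assumption.

compare_ids() {
  a=$(mktemp); b=$(mktemp)
  # manifest: skip the header, take column 1, one entry per sample
  tail -n +2 "$1" | cut -d, -f1 | sort -u > "$a"
  # metadata: skip the header and any "#" directive lines such as #q2:types
  awk -F'\t' 'NR > 1 && $1 !~ /^#/ { print $1 }' "$2" | sort -u > "$b"
  echo "only in manifest:"; comm -23 "$a" "$b"
  echo "only in metadata:"; comm -13 "$a" "$b"
  # succeed only when both ID lists are identical
  if [ -z "$(comm -3 "$a" "$b")" ]; then status=0; else status=1; fi
  rm -f "$a" "$b"
  return "$status"
}

# Usage (adjust the paths to where your files live):
#   compare_ids manifest_pe.txt metadata_pe.txt
```

The same check works for the single-end data by passing manifest_se.txt and the corresponding metadata file.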

Single-end reads (third-generation sequencing data)

Generate the sample manifest (file-path list)

echo 'sample-id,absolute-filepath,direction' > manifest_se.txt
# sample ID = file name up to the first "."
ls *.fastq.gz | while read id;
do
echo "${id%%.*},$PWD/$id,forward" >> manifest_se.txt;
done
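Here the sample ID is everything before the first "." in the file name. That works for names like S01.fastq.gz, but it gives S01.hifi.fastq.gz and S01.ccs.fastq.gz the same ID, which is almost certainly not intended. The sketch below shows the difference between the two parameter expansions. It also adds a hypothetical se_duplicate_ids helper that lists IDs occurring more than once in the manifest.

```shell
#!/usr/bin/env bash
# Sketch: how the sample ID is derived from the file name, plus a check for
# duplicate IDs in the single-end manifest.
# se_duplicate_ids is a hypothetical helper.

id=S01.hifi.fastq.gz
echo "${id%%.*}"         # S01       (cut at the first ".", as in the loop above)
echo "${id%.fastq.gz}"   # S01.hifi  (strip only the .fastq.gz suffix)

# Print every sample ID that occurs more than once (header line skipped).
se_duplicate_ids() {
  tail -n +2 "$1" | cut -d, -f1 | sort | uniq -d
}

# Usage:
#   se_duplicate_ids manifest_se.txt   # no output means every ID is unique
```

If duplicates show up, switching the loop to ${id%.fastq.gz} keeps the full stem of each file name as its sample ID.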

Import into QIIME 2

conda activate qiime2-2020.11
qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path manifest_se.txt \
--output-path summary_se.qza  \
--input-format SingleEndFastqManifestPhred33 

Generate a visualization file

time qiime demux summarize \
--i-data summary_se.qza \
--o-visualization summary_se.qzv

Denoise with DADA2

mkdir -p R
time qiime dada2 denoise-single \
--p-n-threads 4 \
--i-demultiplexed-seqs summary_se.qza \
--o-table ./R/1_table-dada2_se.qza \
--o-representative-sequences rep-seqs-dada2_se.qza \
--o-denoising-stats dada2-stats_se.qza \
--p-trunc-len 0

Build a phylogenetic tree from the representative sequences

cp rep-seqs-dada2_se.qza ./R/2_rep-seqs-dada2_se.qza
time qiime phylogeny align-to-tree-mafft-fasttree \
--i-sequences rep-seqs-dada2_se.qza  \
--o-alignment aligned-rep-seqs_se.qza \
--o-masked-alignment masked-aligned-rep-seqs_se.qza \
--o-tree unrooted-tree_se.qza \
--o-rooted-tree ./R/3_rooted-tree_se.qza

Assign taxonomy to the representative sequences (Greengenes database)

time qiime feature-classifier classify-sklearn \
--i-classifier /media/lzx/0000678400004823/Qiime2/gg-13-8-99-nb-classifier.qza \
--i-reads rep-seqs-dada2_se.qza \
--o-classification ./R/4_taxonomy_gg.qza 

Assign taxonomy to the representative sequences (SILVA database; run this command from inside the R/ directory)

time qiime feature-classifier classify-sklearn \
--i-classifier ../silva-138-99-nb-classifier.qza \
--i-reads 2_rep-seqs-dada2_se.qza \
--o-classification 4_taxonomy_silva.qza 

Import into R

library(MicrobiotaProcess)
library(tidyverse)
library(ggsci)  
library(RColorBrewer)
otu <- "1_table-dada2_se.qza"
rep <- "2_rep-seqs-dada2_se.qza"
tree <- "3_rooted-tree_se.qza"
tax <- "4_taxonomy_gg.qza"
sample <- "5_metadata_se.txt"
qiimedata <- import_qiime2(otuqza=otu,taxaqza=tax,refseqqza=rep,
                           mapfilename=sample,treeqza=tree)
