http://www.drive5.com/usearch/manual/cmd_otutab_core.html
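The otutab_core command identifies a possible “core microbiome”: OTUs which are present in more samples than others, in practice the OTUs found in most samples. It is one of the features added in USEARCH 11.
In essence, the command counts how often each OTU occurs across a large set of samples. An OTU found in every sample has a frequency of 100%. With a very large number of samples, for example several thousand, few OTUs may be present in all of them, so OTUs present in 95% or 90% of samples can be chosen as the core OTUs instead.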
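To make the idea of "frequency across samples" concrete, here is a minimal Python sketch that computes, for each OTU, the fraction of samples with a nonzero count and then keeps the OTUs above a chosen cutoff. It is not how usearch implements otutab_core; the function names `otu_prevalence` and `core_otus`, the toy table, and the assumption that the table is tab-separated with no trailing taxonomy column are all illustrative.

```python
def otu_prevalence(table_text):
    """Map each OTU ID to the fraction of samples in which its count is nonzero."""
    prevalence = {}
    for line in table_text.splitlines():
        # Skip blank lines and '#' lines (the "#OTU ID" header and any comments).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        # Assumption: every column after the OTU ID is a numeric sample count.
        counts = [float(x) for x in fields[1:]]
        present = sum(1 for c in counts if c > 0)
        prevalence[fields[0]] = present / len(counts)
    return prevalence


def core_otus(prevalence, min_fraction=0.95):
    """Return the sorted OTU IDs whose prevalence is at least min_fraction."""
    return sorted(otu for otu, frac in prevalence.items() if frac >= min_fraction)


# Toy table: 3 OTUs across 4 samples (made-up counts).
toy_table = (
    "#OTU ID\tS1\tS2\tS3\tS4\n"
    "OTU_1\t10\t3\t7\t1\n"
    "OTU_2\t0\t5\t0\t2\n"
    "OTU_3\t4\t0\t9\t6\n"
)
prev = otu_prevalence(toy_table)
print(prev)
print(core_otus(prev, min_fraction=0.75))
```

The cutoff is left as a parameter because, as noted above, 100% is often too strict for large studies and 95% or 90% may be more practical.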
Input is an OTU table in QIIME classic format.
The presence of an OTU in some or many samples can be spurious, either because of cross-talk or because the OTU itself is spurious (an amplification or sequencing artifact). To enable manual review, the otutab_core command generates a report indicating cases where the presence of an OTU may be spurious due to cross-talk, and where an OTU may be spurious due to sequence errors.
If a sintax tabbed file is provided using the -sintaxin option, then the taxonomy of the core OTUs is included in the report.
If a distance matrix is provided using the -distmxin option, it is used to identify possible dominant OTUs, i.e. high-abundance OTUs which are similar to low-abundance OTUs in the report. If there is a dominant OTU, this may indicate that the low-abundance OTU is spurious.
The -tabbedout option specifies the output file. OTUs are sorted in order of decreasing number of samples where they are present. There are 12 fields, described on the manual page linked above.
If the minimum or LoQ count is much smaller than the maximum count, this suggests that the smaller counts may be due to cross-talk.
If the size of an OTU is much smaller than a neighboring “dominant” OTU, then the OTU itself may be spurious due to sequence error.
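The two review rules above can be sketched as a small Python check on one row of the report. This is only an illustration: the manual does not define "much smaller", so the `ratio` cutoff of 0.01 is an arbitrary assumption, and the function name `review_flags`, the dict layout, and the toy rows are hypothetical.

```python
def review_flags(row, ratio=0.01):
    """Return review hints for one report row.

    row is a dict with numeric Min, LoQ, Max and Size values, plus DomSize,
    which is None when no dominant OTU was reported.
    ratio is an assumed cutoff for "much smaller"; it is not a usearch setting.
    """
    flags = []
    # Rule 1: minimum or LoQ count far below the maximum count.
    if min(row["Min"], row["LoQ"]) < ratio * row["Max"]:
        flags.append("possible cross-talk")
    # Rule 2: OTU size far below the size of its dominant neighbor.
    dom_size = row.get("DomSize")
    if dom_size is not None and row["Size"] < ratio * dom_size:
        flags.append("possibly spurious OTU")
    return flags


# Toy rows with made-up numbers.
suspicious = {"Min": 2, "LoQ": 40, "Max": 5000, "Size": 300, "DomSize": 90000}
clean = {"Min": 100, "LoQ": 150, "Max": 400, "Size": 5000, "DomSize": None}
print(review_flags(suspicious))
print(review_flags(clean))
```

A flag here is only a prompt for manual review, in the same spirit as the report itself; it does not prove that an OTU or a count is spurious.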
Example
usearch -calc_distmx otus.fa -tabbedout distmx.txt \
-sparsemx_minid 0.9 -termid 0.8
usearch -sintax otus.fa -strand both -db ref16s.txt \
-tabbedout sintax.txt
usearch -otutab_core otutab.txt -distmxin distmx.txt \
-sintaxin sintax.txt -tabbedout core.txt
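When I ran this, I got an error. Removing -distmxin distmx.txt from the otutab_core command let it run normally and produce results.
The result file looks like this: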
OTUID Samples Size Freq DomOTU DomSize DomId Min LoQ Med HiQ Max Taxonomy Core
OTU_2 1000 5079019 0.131 . . . 162 1915 3270 5470 23217 d:Bacteria,p:"Proteobacteria",c:Betaproteobacteria,o:Burkholderiales,f:Burkholderiaceae,g:Ralstonia,s:Ralstonia_mannitolilytica 100
OTU_34 999 180434 0.00466 . . . 1 40 83 174 2484 d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:Clostridiales,f:Chloroplast,g:Streptophyta,s:Porticoccus_litoralis 99.9154
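For the meaning of each column, see the manual page linked above. The most important result is the Samples column, i.e. the number of samples in which the OTU was detected. This value still has to be divided by the total number of samples to get the proportion of samples containing the OTU, which makes it easy to filter for core OTUs.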
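As a rough illustration of that last step, the following Python sketch reads a report with an OTUID and a Samples column and divides Samples by a total that you supply. It assumes the report is tab-separated with a header row like the one shown above; the function name `sample_fractions`, the toy report, and the total of 50 samples are made up for the example.

```python
import csv
import io


def sample_fractions(report_text, total_samples):
    """Map each OTU ID to Samples / total_samples from a tabbed report."""
    reader = csv.DictReader(
        io.StringIO(report_text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # taxonomy strings may contain literal quotes
    )
    return {row["OTUID"]: int(row["Samples"]) / total_samples for row in reader}


# Toy report with made-up values; a real core.txt has more columns.
toy_report = (
    "OTUID\tSamples\tSize\n"
    "OTU_a\t48\t9000\n"
    "OTU_b\t30\t1200\n"
)
fractions = sample_fractions(toy_report, total_samples=50)
core = [otu for otu, frac in fractions.items() if frac >= 0.95]
print(fractions)
print(core)
```

The total number of samples is passed in explicitly because it comes from the OTU table, not from the report itself.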