Identifies a possible “core microbiome” of OTUs which are present in more samples than others.
Input is an OTU table in QIIME classic format.
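To make "present in more samples" concrete, the shell sketch below counts the samples with a non-zero count for each OTU and sorts OTUs by that number in decreasing order. It assumes a tab-separated QIIME classic table with OTU IDs in the first column, header lines starting with "#", and only count columns after the IDs. The function name presence_counts is hypothetical, and whether usearch defines presence exactly as "count above zero" is also an assumption. Treat the output as a cross-check, not a replacement for otutab_core.

```shell
#!/usr/bin/env bash
# Sketch: per-OTU number of samples with a non-zero count, sorted descending.
# Assumptions: tab-separated QIIME classic table; header/comment lines start
# with "#"; column 1 is the OTU ID; all other columns are counts.

presence_counts() {
  awk -F'\t' -v OFS='\t' '
    /^#/ { next }                       # skip "#OTU ID" and comment lines
    {
      n = 0
      for (i = 2; i <= NF; i++) if ($i > 0) n++
      print $1, n
    }
  ' "$1" | sort -t $'\t' -k2,2nr -k1,1  # most widespread OTUs first
}
```

Usage: presence_counts otutab.txt | head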
The presence of an OTU in some or many samples can be spurious because of cross-talk or because the OTU itself is spurious. To enable manual review, the otutab_core command generates a report indicating cases where the presence of an OTU may be spurious due to cross-talk, and where an OTU may be spurious due to sequence errors.
If a sintax tabbed file is provided using the -sintaxin option, then the taxonomy of the core OTUs is included in the report.
If a distance matrix is provided using the -distmxin option, it is used to identify possible dominant OTUs, i.e. high-abundance OTUs that are similar to a low-abundance OTU in the report. If there is a dominant OTU, this may indicate that the low-abundance OTU is spurious.
The -tabbedout option specifies the output file. OTUs are sorted in order of decreasing number of samples where they are present. Fields are:
If the minimum or LoQ count is much smaller than the maximum count, this suggests that the smaller counts may be due to cross-talk.
If the size of an OTU is much smaller than a neighboring “dominant” OTU, then the OTU itself may be spurious due to sequence error.
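The two heuristics above can be turned into a quick screen over the -tabbedout report. This is a hypothetical sketch, not a usearch feature. It finds columns by the header names in the report (OTUID, Size, DomSize, LoQ, Max), and it treats "much smaller" as a ratio below arbitrary cutoffs: 0.05 for LoQ/Max and 0.01 for Size/DomSize. Rows whose DomSize is "." (no dominant OTU) are skipped for the second check. It uses LoQ for the cross-talk check, and Min could be substituted. Flagged OTUs are candidates for manual review, not confirmed artifacts.

```shell
#!/usr/bin/env bash
# Sketch: flag otutab_core rows for manual review (not part of usearch).
# Assumptions: tab-separated report whose header names OTUID, Size, DomSize,
# LoQ and Max; both cutoffs are arbitrary, hypothetical defaults.

flag_core() {
  local core=$1 xt_ratio=${2:-0.05} err_ratio=${3:-0.01}
  awk -F'\t' -v OFS='\t' -v xt="$xt_ratio" -v er="$err_ratio" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }  # header name -> column
    {
      flags = ""
      lo = $col["LoQ"]; mx = $col["Max"]; sz = $col["Size"]; dom = $col["DomSize"]
      # Possible cross-talk: lower-quartile count far below the maximum count
      if (mx + 0 > 0 && lo / mx < xt + 0) flags = "crosstalk"
      # Possible sequence error: OTU far smaller than its dominant neighbor.
      # "." (no dominant OTU) converts to 0, so those rows are skipped here.
      if (dom + 0 > 0 && sz / dom < er + 0)
        flags = flags (flags == "" ? "" : ",") "seq_error"
      if (flags != "") print $col["OTUID"], flags
    }
  ' "$core"
}
```

Usage: flag_core core.txt 0.05 0.01. Checking columns by name instead of position keeps the sketch working if the column order differs between usearch versions.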
usearch -calc_distmx otus.fa -tabbedout distmx.txt \
-sparsemx_minid 0.9 -termid 0.8
usearch -sintax otus.fa -strand both -db ref16s.txt \
-tabbedout sintax.txt
usearch -otutab_core otutab.txt -distmxin distmx.txt \
-sintaxin sintax.txt -tabbedout core.txt
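When I ran this, the command reported an error. Removing -distmxin distmx.txt let it run normally and produce results:
usearch -otutab_core otutab.txt -sintaxin sintax.txt -tabbedout core.txt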
OTUID Samples Size Freq DomOTU DomSize DomId Min LoQ Med HiQ Max Taxonomy Core
OTU_2 1000 5079019 0.131 . . . 162 1915 3270 5470 23217 d:Bacteria,p:"Proteobacteria",c:Betaproteobacteria,o:Burkholderiales,f:Burkholderiaceae,g:Ralstonia,s:Ralstonia_mannitolilytica 100
OTU_34 999 180434 0.00466 . . . 1 40 83 174 2484 d:Bacteria,p:"Proteobacteria",c:Gammaproteobacteria,o:Clostridiales,f:Chloroplast,g:Streptophyta,s:Porticoccus_litoralis 99.9154
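See the help documentation above for the meaning of each column. The most important result is the Samples column, i.e. the number of samples in which the OTU was detected. We still need to divide this value by the total number of samples to get each OTU's core proportion, which makes it easier to screen for core OTUs.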
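The division described above can be done with awk. This is a minimal sketch with several assumptions. otutab.txt must have a "#OTU ID" header line whose remaining fields are sample names, with no extra metadata columns. core.txt is the otutab_core report with one header line and Samples in column 2. The function names and the 0.8 default cutoff are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: append Frac = Samples / total samples to the otutab_core report
# and keep OTUs whose Frac is at least the given cutoff.
# Assumptions: otutab.txt is tab-separated with a "#OTU ID" header line and
# no metadata columns; core.txt has one header line and Samples in column 2.

count_samples() {
  # Sample columns = fields on the "#OTU ID" header line minus the ID column
  awk -F'\t' '/^#OTU ID/ { print NF - 1; exit }' "$1"
}

core_filter() {
  local otutab=$1 core=$2 min_frac=${3:-0.8}   # 0.8 is a hypothetical default
  local n
  n=$(count_samples "$otutab")
  awk -F'\t' -v OFS='\t' -v n="$n" -v min="$min_frac" '
    NR == 1 { print $0, "Frac"; next }         # extend the header
    {
      frac = $2 / n
      if (frac >= min + 0) printf "%s\t%.4f\n", $0, frac
    }
  ' "$core"
}
```

Usage: core_filter otutab.txt core.txt 0.8 > core_filtered.txt. The total is read from the OTU table rather than typed in by hand, so the proportion stays correct if samples are added or removed.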