龙star180

Algorithm literature reading: dbCAN2, a meta server for automated carbohydrate-active enzyme annotation

Journal

Nucleic Acids Research (19.160/Q1)

dbCAN2: a meta server for automated carbohydrate-active enzyme annotation

ABSTRACT

Complex carbohydrates of plants are the main food sources of animals and microbes, and serve as promising renewable feedstock for biofuel and biomaterial production. Carbohydrate active enzymes (CAZymes) are the most important enzymes for complex carbohydrate metabolism. With an increasing number of plant and plant-associated microbial genomes and metagenomes being sequenced, there is an urgent need for automatic tools for genomic data mining of CAZymes. We developed the dbCAN web server in 2012 to provide a public service for automated CAZyme annotation for newly sequenced genomes. Here, dbCAN2 (http://cys.bios.niu.edu/dbCAN2) is presented as an updated meta server, which integrates three state-of-the-art tools for CAZome (all CAZymes of a genome) annotation: (i) HMMER search against the dbCAN HMM (hidden Markov model) database; (ii) DIAMOND search against the CAZy pre-annotated CAZyme sequence database and (iii) Hotpep search against the conserved CAZyme short peptide database. Combining the three outputs and removing CAZymes found by only one tool can significantly improve the CAZome annotation accuracy. In addition, dbCAN2 now also accepts nucleotide sequence submission, and offers the service to predict physically linked CAZyme gene clusters (CGCs), which will be a very useful online tool for identifying putative polysaccharide utilization loci (PULs) in microbial genomes or metagenomes.

INTRODUCTION

Importance of complex carbohydrates

Carbohydrates are one of the four major classes of large biopolymers found in all cells together with nucleic acids, proteins, and lipids. Carbohydrates include monosaccharides, oligosaccharides, and polysaccharides. Hybrid biopolymers with carbohydrates covalently linked to other biopolymers, such as glycoproteins and glycolipids, are called glycoconjugates. Complex carbohydrates and glycoconjugates are synthesized, degraded, and modified by carbohydrate active enzymes (CAZymes) in all organisms (1). Particularly, plants use photosynthesis to convert carbon dioxide and water into sugars, which are further turned into carbohydrates such as starches and celluloses with the help of CAZymes. Therefore, CAZymes are vitally important for plants and plant-associated animals and microbes, and not surprisingly CAZyme genes are particularly abundant in genomes of plants and plant-degrading microbes (2,3).

Importance of CAZymes

In addition to their significance in the bioenergy and agricultural industries (4), CAZymes are also extremely important for human health (5). This is because humans and other animals depend on bacteria living in the digestive tract to degrade various indigestible carbohydrates and salvage nutrients (6). It has been shown that the genomes of animal gut bacteria encode hundreds of carbohydrate-degrading GH (glycoside hydrolase) genes, in contrast to only 17 digestive GH genes encoded in the human genome (7). Recent research has suggested that altering the dietary carbohydrate composition has a profound impact on the gut microbiota structure, which in turn influences human health (8,9).

CAZy database

Since the 1990s, over 360 CAZyme families have been defined and classified by the CAZy database (10), forming six major classes: glycosyltransferases [GTs], glycoside hydrolases [GHs], polysaccharide lyases [PLs], carbohydrate esterases [CEs], carbohydrate-binding modules [CBMs] and enzymes for auxiliary activities [AAs]. CAZy also assigns GenBank proteins to CAZyme families, and these CAZy pre-annotated proteins are the foundation for sequence similarity-based CAZyme annotation.

Methods for CAZyme annotation

Owing to the importance of CAZymes, newly sequenced genomes are often analyzed for putative CAZymes (collectively named the CAZome). Two approaches to CAZome annotation exist in the literature:

(A) Users contact the CAZy database team for collaboration, and the team performs semi-automatic CAZome annotation for them (11); because expert manual curation is involved, CAZy annotation is regarded as the gold standard method.

(B) Users run automatic tools such as HMMER (12) or BLAST (13) by themselves for CAZome annotation on their own computers or on the web (see below). Before 2012, BLAST was often used to search against CAZy pre-annotated proteins on users’ own computers.

In 2010, CAT (CAZyme Analysis Toolkit) was developed as a web server that allows users to run both BLAST and HMMER searches remotely (14). The HMMER search is run against Pfam HMMs (hidden Markov models) that are associated with CAZy pre-annotated CAZymes.

In 2012, we developed dbCAN, a database of HMMs for CAZyme family-specific signature domains (4). Different from CAT, for each CAZyme family we retrieved its signature domains from CAZy pre-annotated members, by searching against the CDD (conserved domain database of NCBI) database and manual literature curation; we then built our own HMMs for most CAZyme families instead of using Pfam HMMs.

We update dbCAN almost once a year, by creating HMMs for CAZyme families and subfamilies newly created in the CAZy database (Figure 1). Users can download our HMMs and run HMMER locally for automated CAZome annotation. We also provide a Perl script to help parse the HMMER output, which returns CAZyme signature domains, their boundaries, E-values, and HMM domain coverage. Such domain-based annotation is particularly useful for CAZymes, as they tend to be modular proteins with multiple CAZyme domains and sometimes domain repeats (e.g. multiple CBMs of the same family).
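
To illustrate what such a parser extracts, the sketch below reads one line of the per-domain table that `hmmscan` writes with `--domtblout` and reports the domain boundaries, the independent E-value and the HMM coverage. This is not the authors' Perl script. The names `DomainHit` and `parse_domtblout_line`, the example line, and the coverage definition (aligned HMM span divided by HMM length) are assumptions made for illustration.

```python
from typing import NamedTuple


class DomainHit(NamedTuple):
    hmm_name: str      # target HMM, e.g. a CAZyme family model
    protein_id: str    # query protein
    i_evalue: float    # independent E-value of this domain
    ali_from: int      # domain start on the protein (1-based)
    ali_to: int        # domain end on the protein
    coverage: float    # fraction of the HMM spanned by the alignment


def parse_domtblout_line(line: str) -> DomainHit:
    """Parse one non-comment line of an hmmscan --domtblout file."""
    f = line.split()
    hmm_len = int(f[2])                        # tlen: length of the HMM
    hmm_from, hmm_to = int(f[15]), int(f[16])  # alignment span on the HMM
    return DomainHit(
        hmm_name=f[0],
        protein_id=f[3],
        i_evalue=float(f[12]),
        ali_from=int(f[17]),
        ali_to=int(f[18]),
        coverage=(hmm_to - hmm_from) / hmm_len,
    )


# Made-up example line; the values are illustrative, not real HMMER output.
example = ("GT5.hmm - 234 protA - 1040 1.2e-80 270.1 0.0 1 1 "
           "3.0e-83 1.5e-80 269.8 0.0 2 230 595 1038 594 1040 0.98 -")
hit = parse_domtblout_line(example)
print(hit.hmm_name, hit.ali_from, hit.ali_to, hit.i_evalue, round(hit.coverage, 2))
```

Reporting one record per domain rather than per protein is what lets repeated domains of the same family appear as separate entries with their own boundaries.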

Figure 1. dbCAN is updated every year and now has 575 HMMs. X-axis: year; Y-axis: number of HMMs of families (blue) and subfamilies (red).

To help users who do not have programming experience, we also developed a web server that allows users to submit protein sequences and run HMMER on our server to identify CAZymes. With the CAT website no longer maintained since 2013 and eventually becoming obsolete in 2017, dbCAN has become the only web server that is still actively updated and offers an online CAZyme annotation service.

In 2017, a new tool named Hotpep (15) was introduced, which annotates CAZymes by searching against the PPR (peptide pattern recognition) library for conserved short peptide motifs (16) present in different CAZyme families. In the PPR library, each CAZyme family has a set of 6-mer peptides that are conserved in that family, and Hotpep scans new proteins for the presence of these peptides in order to assign the query proteins to existing CAZyme families.
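
The idea of scanning a protein for family-specific conserved 6-mers can be sketched as follows. This is only a toy illustration of the principle, not Hotpep's actual implementation. The function name `scan_conserved_peptides`, the tiny peptide library and its frequency values are all hypothetical.

```python
def scan_conserved_peptides(protein: str,
                            library: dict[str, dict[str, float]],
                            k: int = 6) -> dict[str, tuple[int, float]]:
    """Count conserved k-mer hits per family in a protein sequence.

    `library` maps family -> {peptide: frequency}. The result maps
    family -> (number of distinct peptides found, sum of their frequencies).
    """
    # All distinct k-mers present in the query protein.
    kmers = {protein[i:i + k] for i in range(len(protein) - k + 1)}
    result = {}
    for family, peptides in library.items():
        hits = [p for p in peptides if p in kmers]
        if hits:
            result[family] = (len(hits), sum(peptides[p] for p in hits))
    return result


# Hypothetical library and query sequence, for illustration only.
toy_library = {
    "GH9": {"DAGDHV": 0.8, "GGWYDA": 0.5},
    "GT5": {"KTGGLG": 0.9},
}
toy_protein = "MKDAGDHVAAGGWYDAP"
peptide_hits = scan_conserved_peptides(toy_protein, toy_library)
print(peptide_hits)
```

A family assignment would then depend on cutoffs applied to the two returned numbers (hit count and frequency sum).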

Importance of automated CAZyme annotation

It should be mentioned that approach B is actually also part of approach A, but it can be fully automated and carried out by the users themselves. Benchmarking automated CAZyme annotation against CAZomes already annotated by CAZy typically found an accuracy of >90% for model bacterial genomes (3). Clearly, as more and more genomes and metagenomes become available, such automated CAZome annotation has a clear advantage over annotation by CAZy through collaboration, in that users can quickly obtain the candidate CAZyme gene list by themselves as part of their bioinformatics pipeline for genome annotation.

Indeed, the popularity of automated CAZome annotation can be manifested by citations of the two approaches. Specifically, ∼100 papers have been published since 2012 with CAZomes annotated by collaboration with CAZy (according to http://www.cazy.org/Genomes.html). As a comparison, more than 300 papers have been published since 2012 using dbCAN for CAZome annotation (according to Google Scholar: https://scholar.google.com/scholar?cites=5112424923296812233 , only counted papers that used the tool for finding CAZymes), and more than 100 papers have been published since 2012 using CAT for CAZome annotation (according to Google Scholar: https://scholar.google.com/scholar?cites=12948408578800903520, also only counted papers that used the tool for finding CAZymes).

Lastly, the availability of dbCAN HMMs has also enabled other bioinformatics tools to incorporate a CAZyme annotation step into their data analysis workflows, e.g., MOCAT2 (17), DemaDb (18), proGenomes (19) and SACCHARIS (20).

NEW FUNCTIONS AND UPDATES

Figure 2 shows the overall design of dbCAN2, an updated meta server version of the dbCAN server, which has the following new functions: (i) allows submission of DNA sequences in addition to protein sequences; (ii) integrates three state-of-the-art tools/databases for automated CAZyme annotation; (iii) can identify transcription factors (TFs), transporters (TCs), and further CAZyme gene clusters (CGCs) using CGC-Finder (3); (iv) combines the results from the three tools, allows visualization as a Venn diagram and detailed results as graphs, and offers an easy solution to download results as text files.

Figure 2. Overall design of dbCAN2 meta server. GCPU (gene cluster plot utility) and CGC-Finder (CAZyme gene cluster finder) are two tools developed for dbCAN2.

DNA sequence submission

In addition to protein submission, dbCAN2 now also accepts nucleotide sequences, e.g. the complete or draft genomes and metagenomes of prokaryotes. Protein sequences are predicted by calling Prodigal (21) if the query is genomes, or FragGeneScan (22) if the query is short DNAs from metagenomes or mRNAs or coding sequences of proteins. As eukaryotic gene prediction is more complex and often needs additional input data (e.g. transcriptome data), users should perform gene predictions for eukaryotic genomes elsewhere and only submit protein sequences to dbCAN2.

Meta server of three tools/databases

The dbCAN web server (http://csbl.bmb.uga.edu/dbCAN/) currently provides HMMER search against dbCAN HMM database, and also DIAMOND (23) search against CAZy pre-annotated CAZyme sequence database. However, the results from the two tools are presented on two separate pages and not integrated at any level. In dbCAN2, we have added the third tool: Hotpep search against the PPR short peptide library. We have also systematically compared the outputs of the three tools against the CAZy pre-annotated CAZomes (i.e. as the gold standard sets) of three bacterial genomes and three eukaryotic genomes (Supplementary Table S1), in order to: (i) find the best parsing thresholds (e.g. E-value) for each tool, (ii) evaluate the annotation performance of the three tools and (iii) find the best way to aggregate the three outputs to achieve the best annotation performance.

The accuracy is calculated as an F-score = 2 × (Recall × Precision)/(Recall + Precision) for the three tools on each examined genome, following the method presented in our previous papers (2,3). We removed unclassified CAZymes (e.g. GH0) and families not in the PPR library when calculating F-scores. Supplementary Table S1 presents the best parsing thresholds that we selected to use for the web server: (i) for HMMER + dbCAN, we use E-value <1e-15 and coverage >0.35; (ii) for DIAMOND + CAZy, we use E-value <1e-102 and (iii) for Hotpep + PPR, we use the number of conserved peptide hits >6 and the sum of conserved peptide frequencies >2.6. Table 1 shows that DIAMOND + CAZy has the highest F-score (0.89) for bacteria but the lowest F-score for eukaryotes (0.84); in contrast, Hotpep + PPR has the highest F-score (0.94) for eukaryotes but the lowest F-score for bacteria (0.80). HMMER + dbCAN performs very well for both eukaryotes (0.86) and bacteria (0.88) and has a slightly higher overall F-score than the other two tools (Supplementary Table S1). In terms of running time, DIAMOND runs the fastest, followed by Hotpep and HMMER.
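
The F-score formula and the three sets of cutoffs quoted above can be written down as a short Python sketch. The function names are hypothetical, and the predicted and gold-standard sets are toy data. Only the formula and the threshold values come from the text.

```python
def f_score(predicted: set[str], gold: set[str]) -> float:
    """F-score = 2 * (recall * precision) / (recall + precision)."""
    true_pos = len(predicted & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * (recall * precision) / (recall + precision)


def passes_hmmer_cutoff(evalue: float, coverage: float) -> bool:
    """HMMER + dbCAN: E-value < 1e-15 and HMM coverage > 0.35."""
    return evalue < 1e-15 and coverage > 0.35


def passes_diamond_cutoff(evalue: float) -> bool:
    """DIAMOND + CAZy: E-value < 1e-102."""
    return evalue < 1e-102


def passes_hotpep_cutoff(n_hits: int, freq_sum: float) -> bool:
    """Hotpep + PPR: conserved peptide hits > 6 and frequency sum > 2.6."""
    return n_hits > 6 and freq_sum > 2.6


# Toy protein ID sets, for illustration only.
toy_predicted = {"p1", "p2", "p3", "p4"}
toy_gold = {"p2", "p3", "p4", "p5", "p6"}
toy_f = f_score(toy_predicted, toy_gold)
print(round(toy_f, 3))
```

In the toy data three of four predictions are correct (precision 0.75) and three of five gold-standard proteins are recovered (recall 0.6), so the F-score sits between the two values.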

More importantly, we found that the best performance of automated CAZyme annotation is achieved by aggregating the outputs of the three tools and keeping candidates found by at least two tools. Table 1 shows that the F-score can be increased to 0.93 when keeping proteins found by at least two tools.
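
The "found by at least two tools" rule amounts to a simple vote count over the three result sets. The sketch below shows one way to express it. The function name `consensus_cazymes` and the protein IDs are hypothetical, and each tool's output is assumed to have already been filtered by its own cutoffs.

```python
from collections import Counter


def consensus_cazymes(hmmer: set[str], diamond: set[str], hotpep: set[str],
                      min_tools: int = 2) -> dict[str, int]:
    """Return protein ID -> number of supporting tools, keeping only
    proteins reported by at least `min_tools` of the three tools."""
    votes = Counter()
    for tool_hits in (hmmer, diamond, hotpep):
        votes.update(tool_hits)   # each set contributes one vote per protein
    return {pid: n for pid, n in votes.items() if n >= min_tools}


# Toy result sets, for illustration only.
kept = consensus_cazymes(
    hmmer={"p1", "p2", "p3"},
    diamond={"p2", "p3", "p4"},
    hotpep={"p3", "p5"},
)
print(sorted(kept.items()))
```

Keeping the vote count alongside each protein makes it easy to tighten the filter later, for example to proteins supported by all three tools.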

Table 1

(a) Twenty-four CAZyme families are classified into 207 subfamilies by phylogenetic clustering and CAZy expert curation (10).

(b) Three hundred and forty-two CAZyme families are classified into 7036 groups by PPR (15,16).

(c) The time is in seconds and calculated on the Escherichia coli K-12 MG1655 proteome (4140 proteins).

The detailed calculations on accuracy and speed are available in Supplementary Table S1. No correspondence has been established between PPR groups and CAZy subfamilies, and in the dbCAN web server we only report the CAZy subfamily annotation, whenever it is available.

However, the above F-score calculation only considered whether a protein is found by any of the three tools. When considering whether a protein is assigned to the correct family or families, we found that the F-scores for all three tools dropped slightly (Supplementary Table S2), with Hotpep + PPR dropping the most (to 0.86 for eukaryotes and 0.70 for bacteria) and HMMER + dbCAN dropping the least (to 0.85 for eukaryotes and 0.82 for bacteria). Additionally, proteins can have multiple CAZyme domains, and it is also interesting to know where the domain boundaries are. Figure 3 shows two example CAZyme proteins found by all three tools. Both proteins have multiple CAZyme domains according to dbCAN annotation (Figure 3A). According to the HMMER + dbCAN output (Figure 3C), AT1G11720.1 is annotated as CBM53(154–237) + CBM53(329–423) + CBM53(496–584) + GT5(595–1038) and YP_002573728.1 as GH9(36–466) + CBM3(491–576) + CBM3(724–804) + CBM3(923–1003) + GH48(1134–1753), i.e. all the CAZyme domains and domain repeats and their positions are reported (Table 1). However, according to both Hotpep + PPR and DIAMOND + CAZy, AT1G11720.1 is annotated as GT5 + CBM53 and YP_002573728.1 as GH9 + GH48 + CBM3, i.e. proteins are assigned to the multiple families correctly, though without reporting domain repeats and positions (Table 1).
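
The relationship between the two annotation styles can be made concrete with a small sketch that parses a domain architecture string of the form shown above into per-domain records, then collapses it to the family-level set that Hotpep and DIAMOND report. The regular expression and the function names are assumptions made for illustration and are not part of dbCAN2.

```python
import re

# Family name, then "(start-end)"; accepts a hyphen or an en dash in the range.
DOMAIN_RE = re.compile(r"([A-Z]+\d+(?:_\d+)?)\((\d+)[-–](\d+)\)")


def parse_architecture(arch: str) -> list[tuple[str, int, int]]:
    """Split an architecture string into (family, start, end) records."""
    return [(fam, int(start), int(end))
            for fam, start, end in DOMAIN_RE.findall(arch)]


def family_set(arch: str) -> set[str]:
    """Collapse domains to families, dropping repeats and positions."""
    return {fam for fam, _, _ in parse_architecture(arch)}


# The two architectures quoted in the text.
at1g = "CBM53(154–237) + CBM53(329–423) + CBM53(496–584) + GT5(595–1038)"
yp = ("GH9(36–466) + CBM3(491–576) + CBM3(724–804) + "
      "CBM3(923–1003) + GH48(1134–1753)")

print(parse_architecture(at1g))
print(sorted(family_set(at1g)), sorted(family_set(yp)))
```

The conversion only works in one direction: the family set can be derived from the domain list, but repeats and boundaries cannot be recovered from the family set.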

Figure 3. Comparison of annotation results for multi-domain CAZymes using three different tools. (A) Two example proteins (AT1G11720.1 and YP_002573728.1) are illustrated with their CAZyme domain architecture based on dbCAN search. (B) DIAMOND search result for the two proteins showing the best CAZy protein hit; (C) HMMER search result against dbCAN HMM database, from which (A) is derived; (D) Hotpep search result against PPR library; Frequency means the sum of conserved peptide frequencies and Hits means the number of conserved peptide hits (15).

It should be mentioned that DIAMOND + CAZy has a much higher risk than the other two tools of giving a wrong CAZyme family annotation. For example, if a query protein only has a GT5 domain and has AAD30251.1 as its best CAZy hit, transferring the family assignment of AAD30251.1 (GT5 + CBM53) to the query would be wrong (as there is no CBM53 in the query). However, such mistakes will not happen in HMMER and Hotpep searches, as they are conserved domain-based and motif-based methods.

CAZyme gene clusters (CGCs)

Another important new function of dbCAN2 is that it allows identification of CGCs when the genomic locations of all genes of the query genome are given. In the literature, CGCs are also known as polysaccharide utilization loci (PULs), which are defined as physically linked genes specializing in the degradation of various complex carbohydrates (24). Most experimentally characterized PULs are found in Bacteroidetes genomes (25), but PULs have also been reported in Proteobacteria and Firmicutes from various carbohydrate-rich environments (26). The PULDB of CAZy initially focused on susCD (starch utilization system C and D transporters) associated PULs, and more recently expanded to present CAZyme clusters (three or more CAZyme genes clustered in the genome) on its website (25). However, PULDB focuses on Bacteroidetes genomes and does not allow online genome submissions for PUL predictions. Recently, we defined CGCs as a more general term for PULs (3), which must contain three classes of signature genes: at least one CAZyme gene, one transporter (TC) gene, and one transcription factor (TF) gene. Between two adjacent signature genes, a certain number of non-signature genes can be inserted. We have developed a Python program (CGC-Finder) that can automatically identify CGCs (3).

In the dbCAN2 job submission page, we provide the ‘Find CAZyme gene clusters’ option. When users submit a protein query file, they must also provide a gene position file in order to predict CGCs. This gene position file is not required if users submit a nucleotide query file, because the gene prediction programs can generate the gene position file internally. With protein sequences, our server will predict TFs and TCs by DIAMOND search against TF and TC databases (explained in (3)), and then CGC-Finder will be called to locate genes of CAZymes, TFs, TCs in the genome, and identify CGCs.

Web design

For the job submission page, we have options to allow users to specify if they would: (i) use one of the three tools or all three tools for CAZyme annotation; (ii) use protein or nucleotide sequences as input; (iii) use CGC-Finder to predict CGCs. As shown in Figure 2, if nucleotide sequences are submitted, gene prediction programs will be first called to predict protein-coding genes and then protein sequences will be used for CAZyme annotation. If CGC-Finder option is selected, TFs and TCs will also be predicted and the gene location file will be used to predict CGCs.

For the result page (Figure 4), five tabs are shown, each with a data table: (i) HMMER result table; (ii) DIAMOND result table; (iii) Hotpep result table; (iv) Overview table; (v) CGC-Finder table. Above the tabs, a Venn diagram is shown to illustrate the overlaps among the outputs of the three tools (Figure 4A). Clicking on any number in the diagram opens a pop-out window displaying the protein IDs in that region.

Figure 4. Screenshots of dbCAN2 result pages. (A) Venn diagram to show overlaps among the results of the three tools; (B) CGC-Finder result tab; (C) Overview tab combining results from the three tools and SignalP; (D) genomic location plot of an example CGC (signature genes are in red, green and blue colors, while non-signature genes are in gray); (E) detailed information of an example CGC.

The Overview tab combines the results of the three CAZyme annotation tools plus SignalP (27) prediction result (Figure 4C). The number of tools that find a CAZyme protein is also shown as a column, in addition to the CAZyme family assignment (for DIAMOND and Hotpep) and domain assignment (for HMMER). Users can sort the Table according to the number of tools column and easily filter out proteins found by only one tool to get the most accurate CAZyme list.

The CGC-Finder tab presents the CGCs identified in the query genome/proteome, with columns such as the genomic locations of the CGC and the three classes of signature genes in the CGCs (Figure 4B). The default parameters in running CGC-Finder include: (i) at least one CAZyme gene and one TC gene and (ii) the number of non-signature genes that are allowed to be inserted between two adjacent signature genes is ≤2. The two parameters can be changed underneath the CGC table to rerun CGC-Finder, and then the CGC-Finder tab will be updated to display the new CGC list.
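
A minimal sketch of this clustering rule is given below. It is not the CGC-Finder program itself. It assumes the genes of a single contig are already listed in genomic order and labelled as "CAZyme", "TC", "TF" or `None` (non-signature), and it ignores strand, contig boundaries and base-pair distances. The function name `find_cgcs` and the label strings are hypothetical.

```python
def find_cgcs(gene_types, max_spacer=2, required=("CAZyme", "TC")):
    """Group signature genes into clusters along one contig.

    gene_types: labels in genomic order ("CAZyme", "TC", "TF" or None).
    max_spacer: maximum number of non-signature genes allowed between
                two adjacent signature genes.
    required:   signature classes every reported cluster must contain.
    Returns a list of (first_index, last_index) pairs, both inclusive.
    """
    clusters = []
    start = last = None
    for i, label in enumerate(gene_types):
        if label is None:
            continue                      # non-signature gene, skip
        if last is not None and i - last - 1 <= max_spacer:
            last = i                      # close enough: extend the cluster
        else:
            if start is not None:
                clusters.append((start, last))
            start = last = i              # open a new cluster
    if start is not None:
        clusters.append((start, last))
    # Keep only clusters that contain every required signature class.
    return [(s, e) for s, e in clusters
            if all(r in gene_types[s:e + 1] for r in required)]


# Toy contig, for illustration only.
toy_genes = ["TF", None, "CAZyme", "TC", None, None, None, "CAZyme", None, "TC"]
print(find_cgcs(toy_genes))
print(find_cgcs(toy_genes, max_spacer=3))
print(find_cgcs(toy_genes, required=("CAZyme", "TC", "TF")))
```

Loosening `max_spacer` merges neighbouring clusters, while adding "TF" to `required` discards clusters that lack a transcription factor, which mirrors the two adjustable parameters described above.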

Clicking on each CGC opens a new page showing the CGC genomic context plot using GCPU (gene cluster plotting utility), a Python script we developed to plot the genes in the CGCs as arrows in different colors (Figure 4D). Below the plot is a Table (Figure 4E), which shows the detailed genomic location of each member gene in the CGC, including the distance of a signature gene from its upstream signature gene (Upstream distance) and the distance from its downstream signature gene (Downstream distance), as well as their best DIAMOND hits in the CAZy, TF and TC databases.

In all five tabs and the individual CGC page, links to tab-delimited plain text files are provided so that users can conveniently download them and open them on their local computers in an Excel spreadsheet for further analysis. The Venn diagram and the CGC plot can also be downloaded as image files (e.g. SVG and PDF) and further edited by the users using Illustrator.

Lastly, we also provide a web page for each CAZyme protein to plot its dbCAN domains and PPR conserved peptides in the sequence. We also allow users to download a master script to run all tools as well as the CGC-Finder program on their local computers.

CONCLUSIONS

dbCAN2 is a web server for automated carbohydrate-active enzyme annotation. It is an updated version of the original dbCAN web server, and has the following new features:

(1) dbCAN2 allows submission of nucleotide sequences: genomic sequences of prokaryotic draft genomes and metagenomes;

(2) dbCAN2 integrates three state-of-the-art tools/databases for automated CAZyme annotation: (i) HMMER for determining CAZyme domain boundaries according to the dbCAN CAZyme domain HMM database; (ii) DIAMOND for fast BLAST-like hits in the CAZy database; (iii) Hotpep for short conserved motifs in the PPR library;

(3) dbCAN2 can also identify transcription factors (TFs), transporters (TCs), and further CAZyme gene clusters (CGCs) using CGC-Finder if users submit protein sequences plus gene location files or genomic DNA sequence file;

(4) dbCAN2 combines the results from the three tools and allows visualization of the overlaps as a Venn diagram and the detailed results as graphs. The dbCAN2 meta server will be updated once a year to use the most recent CAZy database, dbCAN HMM database and Hotpep peptide database.

1046. 【USACO题库】3.2.2 Stringsobits__01串 (❁´◡`❁)Jimmy(❁´◡`❁) 粉丝才可以看的NC题解 C++算法
题目:题目描述考虑排好序的N(N<=31)位二进制数。你会发现，这很有趣。因为他们是排列好的，而且包含所有可能的长度为N且含有1的个数小于等于L(L<=N)的数。你的任务是输出第I（1<=I<=长度为N的二进制数的个数）大的，长度为N，且含有1的个数小于等于L的那个二进制数。输入从文件kimbits.in中读入数据。共一行，用空格分开的三个整数N，L，I。输出输出到文件kimbits.out中。共
3251: 【基础】卒的遍历 (❁´◡`❁)Jimmy(❁´◡`❁) #oj题解算法数据结构 c++
题目描述在一张n*m的棋盘上（如6行7列）的最左上角（1,1）的位置有一个卒。该卒只能向下或者向右走，且卒采取的策略是先向下，下边走到头就向右，请问从（1,1）点走到（n,m）点可以怎样走，输出这些走法。输入两个整数n，m代表棋盘大小（3=2,1->3,1->3,2->3,32:1,1->2,1->2,2->3,2->3,33:1,1->2,1->2,2->2,3->3,34:1,1->1,2->
STMicroelectronics 系列：STM32H7 系列_（1）.STM32H7系列概述 kkchenkx 机器人控制系统和单片机开发 stm32 嵌入式硬件单片机
STM32H7系列概述1.引言STM32H7系列是STMicroelectronics公司推出的一款高性能、低功耗的32位微控制器系列。该系列基于ArmCortex-M7内核，具有强大的处理能力、丰富的外设和先进的安全性特性，适用于需要高性能计算和复杂算法处理的应用场景。本节将详细介绍STM32H7系列的主要特点、架构和应用场景，帮助读者快速了解该系列微控制器的基本信息。
Python字符串 DDD小小小宇宙 python 开发语言
字符串1.程序中需要加上双引号或者双引号来表示字符串2.字符串可以存放任意数量的字符，无法修改的数据容器字符串运算：加法：多个字符串按照次序合并为一个字符串在实际使用的时候，数字和字符串的加法通常需要将数字的类型转换成str乘法：1个字符串乘以n，可以得到n个复制的字符串例子：输入一个字符，使用该字符打印一个3层的金字塔x=input(':')print(""+x)print(""+x+x+x)p
TEX Quotes(UVA 272) (❁´◡`❁)Jimmy(❁´◡`❁) #oj题解 UVA的题目 c++算法
题目标签:点这里懒人题干给你一文本，将其中奇数个"替换为``（两个`），偶数个"替换为''（两个'）。DescriptionTEX是由DonaldKnuth开发的一种排版语言。它将源文本与一些排版指令结合在一起，希望能产生一个漂亮的文件。排版好看文件使用“和“来限定引号，而不是使用大多数键盘提供的无聊的"来限定。键盘通常没有有向双引号，但它们有一个左单引号`和一个右单引号'。现在来检查你的键盘，找
Python入门指南：从简介到安装小团团0 开发语言 python
Python简介Python是一种高级编程语言，由荷兰程序员GuidovanRossum于1989年圣诞节期间开始设计，并于1991年发布了第一个公开发行版。Python的命名源于英国喜剧团体MontyPython，Guido以此表达对该喜剧团体的喜爱。Python的特点主要体现在以下几个方面：解释型语言：Python是一种解释型语言，这意味着在开发过程中无需编译，可以直接运行源代码。交互式语言：
大语言模型的潜力是否被高估 dev.null AI #NLP 语言模型人工智能机器学习
关于大语言模型（LLM）的潜力是否被高估，目前学术界和产业界存在显著分歧。以下从技术能力、应用局限性和未来发展方向三个方面综合分析：一、技术能力的争议：潜力与局限并存对现实世界的理解与模拟MIT的研究表明，LLM在训练过程中可能自发形成对现实世界的内部模拟。例如，通过解决卡雷尔编程谜题（KarelPuzzle），模型在没有直接接触环境信息的情况下，正确率从初始的随机指令提升至92.4%，并展现出对
Transformer架构在生成式AI中的应用解析二进制独立开发非纯粹GenAI 人工智能 transformer 架构深度学习机器学习 tensorflow 迁移学习
文章目录1.Transformer架构概述1.1Transformer的核心思想1.2Transformer架构的优势2.Transformer在文本生成中的应用2.1GPT系列：基于Transformer的自回归文本生成2.2BERT系列：基于Transformer的双向编码器3.Transformer在图像生成中的应用3.1VisionTransformer（ViT）3.2DALL·E：基于T
TCP 采用三次握手建立连接的原因 27xixi java高频 tcp/ip 网络
TCP采用三次握手建立连接的根本原因是为了解决网络通信中的两个核心问题：可靠性和历史连接的消除。两次握手无法满足这些需求，而四次握手虽然理论上可行，但会引入冗余和效率问题。以下是详细分析：一、两次握手的问题如果只用两次握手（客户端发送SYN，服务端回复SYN-ACK后直接建立连接），会引发以下问题：无法防止历史连接的干扰场景：客户端发送了一个旧的SYN报文（例如网络延迟导致的重传），服务端收到后回
HashMap的奇幻漂流：当一个数组决定去整容桃木山人深挖面经哈希算法算法数据结构
标准答案（面试官最爱版）HashMap实现原理：数据结构：数组+链表/红黑树（Java8+）哈希算法：(h=key.hashCode())^(h>>>16)索引计算：(n-1)&hash（n为数组长度）冲突解决：链表→红黑树（阈值=8），树→链表（阈值=6）扩容机制：2倍扩容，负载因子默认0.75用程序员黑话：“它就是个会变形的瑞士卷——平时是夹心饼干（数组+链表），吃撑了变千层蛋糕（红黑树）”一
1141. 【贪心算法】排队打水 (❁´◡`❁)Jimmy(❁´◡`❁) 粉丝才可以看的NC题解贪心算法算法
题目描述有n（nusingnamespacestd;typedefpairIpair;arrayArrayMan;intn;intmain(){scanf("%d",&n);for(inti=0;i
系统架构设计师——架构风格庄隐 #系统架构设计师系统架构架构系统架构设计师
概述软件体系结构风格是指在软件架构设计中，针对特定应用领域所采用的一套惯用模式，这些模式定义了系统的组织方式。以下是对软件体系结构风格的详细解析：1.体系结构风格的概念目的：简化设计过程，提高设计的重用性和可维护性。特点：每种风格都有其特定的适用范围和优势，适用于不同的应用场景和需求。2.词汇表构件：系统中的基本功能单元，如客户端、服务器、数据库等。连接件：用于构件间交互的桥梁，如管道、总线、过滤