Paper Notes 01 - Neural Relation Extraction for Knowledge Base Enrichment

Paper info: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 229-240.

Contents

Abstract

1、Introduction

(1)Large KB

(2) Prior work

(3) Contributions of this paper

2、Related Work

2.1 Open Information Extraction

2.2 Entity-aware Relation Extraction

3、Proposed Model

3.1 Solution Framework 

3.2 Dataset Collection

3.3 Joint Learning of Word and Entity Embeddings

3.4 N-gram Based Attention Model

3.5 Triple Generation 

4、Experiments

4.1 Hyperparameters

4.2 Models

4.3 Results

Discussion

5、Conclusions


Abstract

We study relation extraction for knowledge base (KB) enrichment. Specifically, we aim to extract entities and their relationships from sentences in the form of triples and map the elements of the extracted triples to an existing KB in an end-to-end manner. Previous studies focus on the extraction itself and rely on Named Entity Disambiguation (NED) to map triples into the KB space. This way, NED errors may cause extraction errors that affect the overall precision and recall. To address this problem, we propose an end-to-end relation extraction model for KB enrichment based on a neural encoder-decoder model. We collect high-quality training data by distant supervision with co-reference resolution and paraphrase detection. We propose an n-gram based attention model that captures multi-word entity names in a sentence. Our model employs jointly learned word and entity embeddings to support named entity disambiguation. Finally, our model uses a modified beam search and a triple classifier to help generate high-quality triples. Our model outperforms state-of-the-art baselines by 15.51% and 8.38% in terms of F1 score on two real-world datasets.

The authors study relation extraction for knowledge base (KB) enrichment.

Goal: extract entities and relations from sentences in the form of triples and map them onto an existing KB in an end-to-end manner.

Problem: NED errors propagate into extraction errors, hurting overall precision and recall. To address this, the authors make two proposals:

(1) We propose an end-to-end relation extraction model for KB enrichment based on a neural encoder-decoder model.

(2) We propose an n-gram based attention model that captures multi-word entity names in a sentence.

Techniques used: jointly learned word and entity embeddings, a modified beam search, and a triple classifier.

Results: in terms of F1 score on two real-world datasets, the proposed model outperforms state-of-the-art baselines by 15.51% and 8.38%, respectively.

Datasets: the model is evaluated on two real datasets, the WIKI and GEO test sets.

Evaluation metrics: precision, recall, and F1 score. For how these metrics are computed, see: https://blog.csdn.net/qq_30507287/article/details/121418944

In short: extract entities and relations, embed and disambiguate them, then enrich the KB.

1、Introduction

(1)Large KB

DBpedia (Auer et al., 2007), Wikidata (Vrandecic and Krötzsch, 2014), YAGO (Suchanek et al., 2007).

Limitation: these KBs are far from complete and mandate continuous enrichment and curation.

(2) Prior work

1) Embedding-based models (Nguyen et al., 2018; Wang et al., 2015)

2) Entity alignment models (Chen et al., 2017; Sun et al., 2017; Trisedya et al., 2019)

Both lines of work above are used to enrich knowledge bases.

3) Unsupervised approaches

Open IE (Open Information Extraction); see Banko et al., 2007; Corro and Gemulla, 2013; Gashteovski et al., 2017.

4) Supervised approaches

Supervised approaches train statistical and neural models for inferring the relationship between two known entities in a sentence (Mintz et al., 2009; Riedel et al., 2010, 2013; Zeng et al., 2015; Lin et al., 2016).

Only few studies have fully integrated the mapping of extracted triples onto uniquely identified KB entities by using logical reasoning on the existing KB to disambiguate the extracted entities (e.g., Suchanek et al., 2009; Sa et al., 2017).

5) Named Entity Disambiguation (NED) (cf. the survey by Shen et al., 2015)

6) This paper generates triples from sentences based on the encoder-decoder framework (Cho et al., 2014). Triples take the form ⟨head entity, predicate, tail entity⟩. The task is to find triples that do not yet exist in the KB and add them to it.

7) A standard encoder-decoder model with attention (Bahdanau et al., 2015) is the starting point, which the paper extends with n-gram attention.

8) Skip-gram (Mikolov et al., 2013) and TransE (Bordes et al., 2013): the paper uses joint learning for pre-training, with two goals: ① the embeddings capture the relationships between words and entities; ② the entity embeddings preserve the relationships between entities.

9) To deal with the lack of training data, distant supervision is used to generate aligned sentence-triple pairs as training data.

Data augmentation methods:

- co-reference resolution (Clark and Manning, 2016) and dictionary-based paraphrase detection (Ganitkevitch et al., 2013; Grycner and Weikum, 2016)

(3) Contributions of this paper

① An end-to-end model for extracting and canonicalizing triples to be added to a KB. The model reduces the error propagation between relation extraction and NED that existing approaches are prone to.

② An n-gram based attention model:

- effectively maps multi-word mentions of entities and their relations onto uniquely identified entities and predicates in the KB;

- a joint learning model for word and entity embeddings that captures the relationships between words and entities, aimed at named entity disambiguation;

- a modified beam search and a triple classifier to generate high-quality triples.

③ Evaluation

- evaluation on two real-world test datasets;

- distant supervision combined with co-reference resolution and paraphrase detection to generate a high-quality training dataset;

- experimental results outperform neural relation extraction (Lin et al., 2016) combined with NED models (Hoffart et al., 2011; Kolitsas et al., 2018).

2、Related Work

2.1 Open Information Extraction

(1) Banko et al. (2007) introduced Open Information Extraction (Open IE) and proposed a three-stage pipeline of learner, extractor, and assessor.

Learner: learns extraction patterns in an unsupervised manner;

Extractor: generates candidate triples by treating noun phrases as arguments and the linking phrases as predicates;

Assessor: assigns a probability to each candidate triple based on statistics.

(2) Later work (Fader et al., 2011; Mausam et al., 2012; Angeli et al., 2015; Mausam, 2016) improved the precision of Open IE using approaches such as distant supervision and hand-crafted patterns.

(3) ClausIE: Corro and Gemulla (2013) developed ClausIE, which can extract triples from the clauses of a sentence.

(4) MinIE: Gashteovski et al. (2017) developed MinIE, which generates more concise triples than ClausIE.

(5) Stanovsky et al. (2018) proposed supervised learning for Open IE by converting the extracted relations into sequence labels and using a bi-LSTM model to predict those labels.

(6) Most closely related to this work is Neural Open IE (Cui et al., 2018), which proposed an encoder-decoder with attention model to extract triples. However, it is not suitable for extracting relations between canonicalized entities.

(7) Another line of studies uses neural learning for semantic role labeling (He et al., 2018). That approach identifies the predicate-argument structure of a single input sentence rather than performing relation extraction over a corpus for KB enrichment.

(8) Different names and phrases for the same entity lead to multiple triples being generated; adding these to the KB as-is pollutes the KB. Shen et al. (2015) address this with entity linking (NED), while Galárraga et al. (2014) use clustering.

2.2 Entity-aware Relation Extraction

(1) Inspired by Brin (1998), Mintz et al. (2009), Suchanek et al. (2009), and Carlson et al. (2010) use seed facts from an existing KB for distant supervision: they learn extraction patterns from the seed facts, apply the patterns to extract new fact candidates, iterate this process, and finally use statistical inference (e.g., a classifier) to reduce the error rate. This work rests on the assumption that the co-occurrence of both entities of a seed fact in the same sentence indicates that the sentence expresses the semantic relation between them, which is a potential source of incorrect labels.

(2) Hoffmann et al. (2010), Riedel et al. (2010, 2013), and Surdeanu et al. (2012) overcome the above limitation, but ignore the mapping of extracted entities onto entities in the KB.

(3) Suchanek et al. (2009) and Sa et al. (2017) used probabilistic-logical inference to eliminate false positives, based on constraint solving and on Monte Carlo sampling over probabilistic graphical models, respectively. The drawbacks are high computational complexity and a dependence on modeling the constraints and appropriate priors.

(4) Recent work:

① Nguyen and Grishman (2015) proposed convolutional networks with multi-sized window kernels;

② Zeng et al. (2015) proposed piecewise convolutional neural networks (PCNN);

③ Lin et al. (2016, 2017) improved this approach by adding sentence-level attention over the PCNN; it performs best in experimental studies, so this paper uses it as the main baseline for comparison. Follow-up studies considered further variations:

④ Zhou et al. (2018) proposed hierarchical attention;

⑤ Ji et al. (2017) incorporated entity descriptions;

⑥ Miwa and Bansal (2016) incorporated syntactic features;

⑦ Sorokin and Gurevych (2017) used background knowledge for contextualization.

In summary: none of these models is suitable for KB enrichment, because none performs entity canonicalization.

3、Proposed Model


3.1 Solution Framework 

(1) Data collection module

The paper aligns triples that already exist in the KB with sentences in a text corpus that contain those triples; the aligned pairs later serve as training data for the neural relation extraction module. The alignment is done by distant supervision.

To obtain a large number of high-quality alignments, co-reference resolution is used to recover implicit entity names in sentences, which enlarges the set of candidate sentences for alignment.

Dictionary-based paraphrase detection is used to filter out sentences that do not express any relationship between the entities.

(2) Embedding module

The paper proposes joint learning of word and entity embeddings: skip-gram (Mikolov et al., 2013) computes the word embeddings, and TransE (Bordes et al., 2013) computes the entity embeddings.

The goal of the joint learning is to capture the similarity between words and entities so that entity names can be mapped to the corresponding entity IDs.

In addition, the resulting entity embeddings are used to train a triple classifier that helps filter out invalid triples generated by the neural relation extraction model.

(3) Neural relation extraction module

The paper proposes an n-gram based attention model by extending the attention mechanism to the n-gram tokens of a sentence.

It computes attention weights over n-gram combinations to capture verb- or noun-phrase context that complements the word-level attention of the standard attention model. This extension helps the model better capture the multi-word context of entities and relations.

The encoder-decoder model outputs a sequence of entity and predicate IDs, where every three IDs form a triple. To generate high-quality triples, two strategies are proposed:

① a modified beam search, which computes the similarity between the extracted entities and the surface forms of entity names in the input sentence to ensure correct entity predictions;

② a triple classifier, trained with the jointly learned entity embeddings, to filter out invalid triples.

3.2 Dataset Collection

Goal: extract triples from a sentence for KB enrichment by proposing a supervised relation extraction model.

Following Sorokin and Gurevych (2017), we use distant supervision (Mintz et al., 2009) to align sentences in Wikipedia with triples in Wikidata (Vrandecic and Krötzsch, 2014).

We map an entity mention in a sentence to the corresponding entity entry (i.e., Wikidata ID) in Wikidata via the hyperlink associated to the entity mention, which is recorded in Wikidata as the url property of the entity entry.

Each pair may contain one sentence and multiple triples.

We sort the order of the triples based on the order of the predicate paraphrases that indicate the relationships between entities in the sentence.

We collect sentence-triple pairs by extracting sentences that contain both the head and tail entities of Wikidata triples. To generate high-quality sentence-triple pairs, we propose two additional steps: (1) extracting sentences that contain implicit entity names using co-reference resolution, and (2) filtering sentences that do not express any relationships using paraphrase detection. We detail these steps below.

Prior to aligning the sentences with triples, in Step (1), we find the implicit entity names to increase the number of candidate sentences to be aligned. We apply co-reference resolution (Clark and Manning, 2016) to each paragraph in a Wikipedia article and replace the extracted co-references with the proper entity name.

We observe that the first sentence of a paragraph in a Wikipedia article may contain a pronoun that refers to the main entity. For example, there is a paragraph in the Barack Obama article that starts with the sentence "He was reelected to the Illinois Senate in 1998". This may cause the standard co-reference resolution to miss the implicit entity names for the rest of the paragraph. To address this problem, we heuristically replace the pronouns in the first sentence of a paragraph if the main entity name of the Wikipedia page is not mentioned. For the sentence in the previous example, we replace "He" with "Barack Obama".

The intuition is that a Wikipedia article contains content about a single entity of interest, and that the pronouns mentioned in the first sentence of a paragraph mostly relate to the main entity.
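A minimal Python sketch of this pronoun-replacement heuristic (not the authors' code; the pronoun set and the naive sentence splitting are my own simplifications):

```python
PRONOUNS = {"He", "She", "It", "They"}   # assumed pronoun set

def replace_first_sentence_pronoun(paragraph: str, main_entity: str) -> str:
    """Heuristic from Section 3.2: if the first sentence of a paragraph does not
    mention the article's main entity, replace its leading pronoun with it."""
    sentences = paragraph.split(". ")
    first = sentences[0]
    if main_entity.lower() in first.lower():
        return paragraph                  # main entity already mentioned
    tokens = first.split()
    if tokens and tokens[0] in PRONOUNS:
        tokens[0] = main_entity
        sentences[0] = " ".join(tokens)
    return ". ".join(sentences)

paragraph = "He was reelected to the Illinois Senate in 1998. He ran for ..."
print(replace_first_sentence_pronoun(paragraph, "Barack Obama"))
# -> "Barack Obama was reelected to the Illinois Senate in 1998. He ran for ..."
```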

In Step (2), we use a dictionary-based paraphrase detection to capture relationships between entities in a sentence. First, we create a dictionary by populating predicate paraphrases from three sources, PATTY (2012), POLY (2016), and PPDB (2013), which yield 540 predicates and 24,013 unique paraphrases. For example, the predicate paraphrases for the relationship "place of birth" are {born in, was born in, ...}. Then we use this dictionary to filter out sentences that do not express any relationships between entities. We use exact string matching to find verbal or noun phrases in a sentence that are a paraphrase of a predicate of a triple. For example, for the triple ⟨Barack Obama, place of birth, Honolulu⟩, the sentence "Barack Obama was born in 1961 in Honolulu, Hawaii" will be retained, while the sentence "Barack Obama visited Honolulu in 2010" will be removed (the sentence may be retained if there is another valid triple). This helps filter noise for the sentence-triple alignment.
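A minimal sketch of the Step (2) filtering, assuming a toy paraphrase dictionary (the real one combines PATTY, POLY, and PPDB); it keeps a sentence-triple pair only if a paraphrase of the triple's predicate occurs in the sentence, mirroring the Barack Obama example above:

```python
# Toy predicate-paraphrase dictionary; the real one combines PATTY, POLY and
# PPDB (540 predicates, 24,013 unique paraphrases).
PARAPHRASES = {"place of birth": ["was born in", "born in"]}

def expresses_relation(sentence: str, predicate: str) -> bool:
    """True if the sentence contains a paraphrase of the predicate
    (exact string matching, as in the paper's filtering step)."""
    return any(p in sentence.lower() for p in PARAPHRASES.get(predicate, []))

def filter_pairs(pairs):
    """Keep only sentence-triple pairs whose predicate paraphrase occurs in the sentence."""
    kept = []
    for sentence, triples in pairs:
        valid = [t for t in triples if expresses_relation(sentence, t[1])]
        if valid:
            kept.append((sentence, valid))
    return kept

pairs = [
    ("Barack Obama was born in 1961 in Honolulu, Hawaii.",
     [("Barack Obama", "place of birth", "Honolulu")]),
    ("Barack Obama visited Honolulu in 2010.",
     [("Barack Obama", "place of birth", "Honolulu")]),
]
print(filter_pairs(pairs))   # only the first pair is retained
```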

The collected dataset contains 255,654 sentence-triple pairs. For each pair, the maximum number of triples is four (i.e., a sentence can produce at most four triples). We split the dataset into a train set (80%), a dev set (10%), and a test set (10%) (we call it the WIKI test dataset). For stress testing (to test the proposed model on a different style of text than the training data), we also collect another test dataset outside Wikipedia. We apply the same procedure to the user reviews of a travel website. First, we collect user reviews on 100 popular landmarks in Australia. Then, we apply the adapted distant supervision to the reviews and collect 1,000 sentence-triple pairs (we call it the GEO test dataset). Table 2 summarizes the statistics of our datasets.

3.3 Joint Learning of Word and Entity Embeddings

Our relation extraction model is based on the encoder-decoder framework, which has been widely used in Neural Machine Translation to translate text from one language to another. In our setup, we aim to translate a sentence into triples, and hence the vocabulary of the source input is a set of English words while the vocabulary of the target output is a set of entity and predicate IDs in an existing KB. To compute the embeddings of the source and target vocabularies, we propose a joint learning of word and entity embeddings that is effective in capturing the similarity between words and entities for named entity disambiguation (Yamada et al., 2016).

Note that our method differs from that of Yamada et al. (2016). We use joint learning by combining skip-gram (Mikolov et al., 2013) to compute the word embeddings and TransE (Bordes et al., 2013) to compute the entity embeddings (including the relationship embeddings), while Yamada et al. (2016) use the Wikipedia Link-based Measure (WLM), which does not consider the relationship embeddings.


To establish the interaction between the entity and word embeddings, we follow the Anchor Context Model proposed by Yamada et al. (2016). First, we generate a text corpus by combining the original text and the modified anchor text of Wikipedia. This is done by replacing the entity names in a sentence with the related entity or predicate IDs.

Then we use the skip-gram method to compute the word embeddings from the generated corpus.
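A rough sketch of the two ingredients, under my own simplifications: building the anchor-context corpus by replacing entity mentions with their KB IDs (so skip-gram sees words and entity IDs in one vocabulary), and the TransE plausibility score used for the entity embeddings. The Wikidata-style IDs are illustrative only:

```python
import numpy as np

def build_anchor_corpus(sentence: str, anchors: dict) -> str:
    """Replace anchored entity mentions with their KB IDs so that words and
    entity IDs share one corpus for skip-gram training (Anchor Context Model)."""
    for mention, entity_id in anchors.items():
        sentence = sentence.replace(mention, entity_id)
    return sentence

def transe_score(h, r, t):
    """TransE plausibility ||h + r - t||: small when the triple holds."""
    return np.linalg.norm(h + r - t)

# The Wikidata-style IDs below are used purely for illustration.
print(build_anchor_corpus("Barack Obama was born in Honolulu",
                          {"Barack Obama": "Q76", "Honolulu": "Q18094"}))

h, r, t = np.random.randn(3, 64)   # 64-dim embeddings, matching Section 4.1
print(transe_score(h, r, t))       # lower score = more plausible triple
```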


3.4 N-gram Based Attention Model

Our proposed relation extraction model integrates the extraction and canonicalization tasks for KB enrichment in an end-to-end manner. To build such a model, we employ an encoder-decoder model (Cho et al., 2014) to translate a sentence into a sequence of triples. The encoder encodes a sentence into a vector that is used by the decoder as a context to generate a sequence of triples. Because we treat the input and output as sequences, we use LSTM networks (Hochreiter and Schmidhuber, 1997) in both the encoder and the decoder.

The encoder-decoder with attention model (Bahdanau et al., 2015) has been used in machine translation. However, in the relation extraction task, the attention model cannot capture multi-word entity names. In our preliminary investigation, we found that the attention model yields misalignment between the word and the entity.

The above problem is due to the same words appearing in the names of different entities.

We address the above problem by proposing an n-gram based attention model. This model computes the attention over all possible n-grams of the sentence input. The attention weights are computed over the n-gram combinations of the word embeddings, and hence the context vector for the decoder is computed as follows.

[Figure: equation for the n-gram attention context vector]
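A hedged NumPy sketch of the idea: attention weights are computed over embeddings of all n-grams of the input, and their weighted sum gives the decoder's context vector (averaging word embeddings per n-gram is my simplification, not the paper's exact formulation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def ngram_attention_context(word_embs, h_enc, max_n=3):
    """Score every n-gram (n = 1..max_n) of the sentence against the encoder
    state and return the attention-weighted sum as the decoder context vector.
    An n-gram embedding is approximated here by the mean of its word embeddings
    (the paper applies a learned transformation per n-gram size)."""
    T, _ = word_embs.shape
    grams = np.stack([word_embs[i:i + n].mean(axis=0)
                      for n in range(1, max_n + 1)
                      for i in range(T - n + 1)])   # (num_ngrams, dim)
    weights = softmax(grams @ h_enc)                # attention over all n-grams
    return weights @ grams                          # context vector

word_embs = np.random.randn(6, 64)   # a 6-word sentence, 64-dim embeddings
h_enc = np.random.randn(64)          # last hidden state of the LSTM encoder
print(ngram_attention_context(word_embs, h_enc).shape)   # (64,)
```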

3.5 Triple Generation 

The output of the encoder-decoder model is a sequence of entity and predicate IDs where every three tokens indicate a triple. Therefore, to extract a triple, we simply group every three tokens of the generated output.

However, the greedy approach (i.e., picking the entity with the highest probability at the last softmax layer of the decoder) may lead the model to extract incorrect entities due to the similarity between entity embeddings (e.g., the embeddings of New York City and Chicago may be similar because both are cities in the USA).

To address this problem, we propose two strategies: re-ranking the predicted entities using a modified beam search, and filtering invalid triples using a triple classifier.

The modified beam search re-ranks the top-k (k=10 in our experiments) entity IDs predicted by the decoder by computing the edit distance between the entity names (obtained from the KB) and every n-gram token of the input sentence. The intuition is that the entity name should be mentioned in the sentence, so the entity with the highest similarity will be chosen as the output.
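A minimal sketch of this re-ranking step (my own simplification; the decoder output and KB names below are hypothetical):

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def rerank(candidates, sentence, max_n=3):
    """Re-rank the decoder's top-k candidates (entity ID, entity name) by the
    minimum edit distance between the name and any n-gram of the sentence."""
    tokens = sentence.split()
    ngrams = [" ".join(tokens[i:i + n])
              for n in range(1, max_n + 1)
              for i in range(len(tokens) - n + 1)]
    return sorted(candidates,
                  key=lambda c: min(edit_distance(c[1].lower(), g.lower()) for g in ngrams))

# Hypothetical top-k output of the decoder (KB ID, entity name from the KB):
candidates = [("Q1297", "Chicago"), ("Q60", "New York City")]
print(rerank(candidates, "New York City is the most populous city in the USA"))
# "New York City" is ranked first because its name is mentioned in the sentence
```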

Our triple classifier is trained with entity embeddings from the joint learning (see Section 3.3). Triple classification is one of the metrics used to evaluate the quality of entity embeddings (2013). We build a classifier to determine the validity of a triple. We train a binary classifier based on the plausibility score (h + r - t) (the score used to compute the entity embeddings). We create negative samples by corrupting valid triples (i.e., replacing the head or tail entity with a random entity). The triple classifier is effective at filtering out invalid triples.
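A toy sketch of the triple classifier idea, assuming the TransE plausibility score and negative sampling by corrupting the tail; the threshold-based decision stands in for the binary classifier, whose exact form the notes do not specify, and the IDs are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def plausibility(h, r, t):
    """TransE-style plausibility score ||h + r - t||: small for valid triples."""
    return np.linalg.norm(h + r - t)

# Toy 64-dim embeddings standing in for the jointly learned ones (Section 3.3).
entities = {e: rng.normal(size=64) for e in ["Q76", "Q18094", "Q1297"]}
relations = {"P19": rng.normal(size=64)}   # assumed ID for "place of birth"

def score(triple):
    h, r, t = triple
    return plausibility(entities[h], relations[r], entities[t])

# Negative sampling: corrupt the tail of each valid triple with a random entity.
valid = [("Q76", "P19", "Q18094")]
corrupted = [(h, r, rng.choice([e for e in entities if e != t])) for h, r, t in valid]

# Minimal stand-in for the binary classifier: accept a triple when its score is
# below a threshold fitted between the positive and negative score means.
threshold = (np.mean([score(x) for x in valid]) +
             np.mean([score(x) for x in corrupted])) / 2

def is_valid(triple):
    return score(triple) < threshold

# With trained TransE embeddings, valid triples score well below the threshold;
# the random toy embeddings above only illustrate the mechanics.
print(is_valid(("Q76", "P19", "Q18094")))
```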

4、Experiments

We evaluate our model on two real datasets, the WIKI and GEO test datasets (see Section 3.2). We use precision, recall, and F1 score as the evaluation metrics.
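For reference, a small sketch of how triple-level precision, recall, and F1 can be computed; the exact matching criterion is my assumption, and the triples are hypothetical:

```python
def precision_recall_f1(predicted, gold):
    """Micro triple-level metrics: a predicted triple counts as correct only if
    it exactly matches a gold triple (the matching criterion used in the paper
    may be more refined; this is an assumption)."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [("Q76", "P19", "Q18094")]                          # hypothetical gold triple
predicted = [("Q76", "P19", "Q18094"), ("Q76", "P19", "Q1297")]
print(precision_recall_f1(predicted, gold))                # (0.5, 1.0, 0.666...)
```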

4.1 Hyperparameters

We use grid search to find the best hyperparameters for the networks. We use 512 hidden units for both the encoder and the decoder. We use 64 dimensions for the pre-trained word and entity embeddings (see Section 3.3). We use a dropout rate of 0.5 for regularization on both the encoder and the decoder. We use Adam (Kingma and Ba, 2015) with a learning rate of 0.0002.

LSTM hidden units: 512

Embedding dim: 64

Dropout: 0.5

Optimizer: Adam, lr = 0.0002

4.2 Models

We compare our proposed model with three existing models: CNN (Lin et al., 2016), MinIE (Gashteovski et al., 2017), and ClausIE (Corro and Gemulla, 2013). To map the entities extracted by these models onto the KB, we use two state-of-the-art NED systems, AIDA and NeuralEL.

The precision (tested on our dataset) of AIDA and NeuralEL is 70% and 61%, respectively. To map the extracted predicates (relationships) of the unsupervised approaches' output, we use the dictionary-based paraphrase detection.

We use the same dictionary that was used to collect the dataset (i.e., the combination of three paraphrase dictionaries: PATTY (2012), POLY (2016), and PPDB (2013)). We replace the extracted predicate with the correct predicate ID if one of the paraphrases of the correct predicate (i.e., the gold standard) appears in the extracted predicate.

Otherwise, we replace the extracted predicate with "NA" to indicate an unrecognized predicate. We also compare our n-gram attention model with two encoder-decoder based models, the Single Attention model (Bahdanau et al., 2015) and the Transformer model (Vaswani et al., 2017).
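A small sketch of this predicate-mapping rule for the baselines' output, assuming a toy paraphrase dictionary and a hypothetical predicate ID:

```python
# Same dictionary idea as in Section 3.2 (PATTY, POLY and PPDB in the paper);
# the predicate ID is hypothetical.
PARAPHRASES = {"P19": ["was born in", "born in"]}

def map_predicate(extracted_phrase: str, gold_predicate: str) -> str:
    """Replace a baseline's extracted predicate phrase with the gold predicate ID
    if one of the gold predicate's paraphrases appears in the phrase, else "NA"."""
    phrase = extracted_phrase.lower()
    if any(p in phrase for p in PARAPHRASES.get(gold_predicate, [])):
        return gold_predicate
    return "NA"

print(map_predicate("was born in", "P19"))   # -> "P19"
print(map_predicate("visited", "P19"))       # -> "NA"
```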

4.3 Results

Table 3 shows that the end-to-end models outperform the existing models. In particular, our proposed n-gram attention model achieves the best results in terms of precision, recall, and F1 score. Our proposed model outperforms the best existing model (MinIE) by 33.39% and 34.78% in terms of F1 score on the WIKI and GEO test datasets, respectively.

These results are expected since the existing models are affected by the error propagation of the NED. As expected, the combination of the existing models with AIDA achieves higher F1 scores than the combination with NeuralEL, as AIDA achieves a higher precision than NeuralEL.

To further show the effect of error propagation, we set up an experiment without the canonicalization task (i.e., the objective is predicting a relationship between known entities). We remove the NED pre-processing step by allowing the CNN model to access the correct entities. Meanwhile, we provide the correct entities to the decoder of our proposed model. In this setup, our proposed model achieves 86.34% and 79.11% precision, while CNN achieves 81.92% and 75.82%, over the WIKI and GEO test datasets, respectively.

Table 3 also shows that the pre-trained embeddings improve the performance of the model on all measures. Moreover, the pre-trained embeddings help the model converge faster. In our experiments, the models that use the pre-trained embeddings converge in 20 epochs on average, while the models that do not use them converge in 30-40 epochs. Our triple classifier combined with the modified beam search boosts the performance of the model. The modified beam search provides a high recall by extracting the correct entities based on their surface forms in the input sentence, while the triple classifier provides a high precision by filtering out invalid triples.

Discussion

We further perform manual error analysis. We found that the incorrect outputs of our model are caused by two different entities sharing the same entity name. The modified beam search cannot disambiguate those entities as it only considers the lexical similarity. We consider using context-based similarity as future work.

5、Conclusions

We proposed an end-to-end relation extraction model for KB enrichment that integrates the extraction and canonicalization tasks. Our model thus reduces the error propagation between relation extraction and NED that existing approaches are prone to. To obtain high-quality training data, we adapt distant supervision and augment it with co-reference resolution and paraphrase detection. We propose an n-gram based attention model that better captures the multi-word entity names in a sentence. Moreover, we propose a modified beam search and a triple classifier that help the model generate high-quality triples.

Experimental results show that our proposed model outperforms the existing models by 33.39% and 34.78% in terms of F1 score on the WIKI and GEO test datasets, respectively. These results confirm that our model reduces the error propagation between NED and relation extraction. Our proposed n-gram attention model outperforms the other encoder-decoder models by 15.51% and 8.38% in terms of F1 score on the two real-world datasets. These results confirm that our model better captures the multi-word entity names in a sentence. In the future, we plan to explore context-based similarity to complement the lexical similarity and improve the overall performance.

Open problems:

(1) Handling different entities that share the same name;

(2) Complementing lexical similarity with context-based similarity.
