Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns -- Paper Notes

Mind the GAP: A Balanced Corpus of Gendered Ambiguous Pronouns

Abstract

Coreference resolution is an important task for natural language understanding, and the resolution of ambiguous pronouns is a longstanding challenge. Nonetheless, existing corpora do not capture ambiguous pronouns in sufficient volume or diversity to accurately indicate the practical utility of models. Furthermore, we find gender bias in existing corpora and systems favoring masculine entities. To address this, we present and release GAP, a gender-balanced labeled corpus of 8,908 ambiguous pronoun–name pairs sampled to provide diverse coverage of challenges posed by real-world text. We explore a range of baselines that demonstrate the complexity of the challenge, the best achieving just 66.9% F1. We show that syntactic structure and continuous neural models provide promising, complementary cues for approaching the challenge.

1 Introduction

Coreference resolution involves linking referring expressions that evoke the same discourse entity, as defined in shared tasks such as CoNLL 2011/2012 (Pradhan et al., 2012) and MUC (Grishman and Sundheim, 1996). Unfortunately, high scores on these tasks do not necessarily translate into acceptable performance for downstream applications such as machine translation (Guillou, 2012) and fact extraction (Nakayama, 2008). In particular, high-scoring systems successfully identify coreference relationships between string-matching proper names, but fare worse on anaphoric mentions such as pronouns and common noun phrases (Stoyanov et al., 2009; Rahman and Ng, 2012; Durrett and Klein, 2013).

We consider the problem of resolving gendered ambiguous pronouns in English, such as she in:

In May, Fujisawa joined Mari Motohashi’s rink as the team’s skip, moving back from Karuizawa to Kitami where she had spent her junior days.

With this scope, we make three key contributions:

  • We design an extensible, language-independent mechanism for extracting challenging ambiguous pronouns from text.

  • We build and release GAP, a human-labeled corpus of 8,908 ambiguous pronoun–name pairs derived from Wikipedia. This data set targets the challenges of resolving naturally occurring ambiguous pronouns and rewards systems that are gender-fair.

  • We run four state-of-the-art coreference resolvers and several competitive simple baselines on GAP to understand limitations in current modeling, including gender bias. We find that syntactic structure and transformer models (Vaswani et al., 2017) provide promising, complementary cues for approaching GAP.

Coreference resolution decisions can drastically alter how automatic systems process text. Biases in automatic systems have caused a wide range of underrepresented groups to be served in an inequitable way by downstream applications (Hardt, 2014). We take the construction of the new GAP corpus as an opportunity to reduce gender bias in coreference data sets; in this way, GAP can promote equitable modeling of reference phenomena complementary to the recent work of Zhao et al. (2018) and Rudinger et al. (2018). Such approaches promise to improve equity of downstream models, such as triple extraction for knowledge base population.

2 Background

Existing datasets do not capture ambiguous pronouns in sufficient volume or diversity to benchmark systems for practical applications.

2.1 Data Sets with Ambiguous Pronouns

Winograd schemas (Levesque et al., 2012) are closely related to our work as they contain ambiguous pronouns. These are pairs of short texts with an ambiguous pronoun and a special word (in square brackets) that switches its referent:

The trophy would not fit in the brown suitcase because it was too [big/small].

The Definite Pronoun Resolution Data Set (Rahman and Ng, 2012) comprises 943 Winograd schemas written by undergraduate students and later extended by Peng et al. (2015). The First Winograd Schema Challenge (Morgenstern et al., 2016) released 60 examples adapted from published literary works (Pronoun Disambiguation Problem) and 285 manually constructed schemas (Winograd Schema Challenge). More recently, Rudinger et al. (2018) and Zhao et al. (2018) have created two Winograd schema-style datasets containing 720 and 3,160 sentences, respectively, where each sentence contains a gendered pronoun and two occupation (or participant) antecedent candidates that break occupational gender stereotypes. Overall, ambiguous pronoun datasets have been limited in size and, most notably, consist only of manually constructed examples that do not necessarily reflect the challenges faced by systems in the wild.

In contrast, the largest and most widely used coreference corpus, OntoNotes (Pradhan et al., 2007), is general purpose. In OntoNotes, simpler high-frequency coreference examples (e.g., those captured by string matching) greatly outnumber examples of ambiguous pronouns, which obscures performance results on that key class (Stoyanov et al., 2009; Rahman and Ng, 2012). Ambiguous pronouns greatly impact main entity resolution in Wikipedia, the focus of Ghaddar and Langlais (2016a), who use WikiCoref, a corpus of 30 full articles annotated with coreferences (Ghaddar and Langlais, 2016b).

GAP examples are not strictly Winograd schemas because they have no reference-flipping word. Nonetheless, they contain two person named entities of the same gender and an ambiguous pronoun that may refer to either (or neither). As such, they represent a similarly difficult challenge and require the same inferential capabilities. More importantly, GAP is larger than existing Winograd schema datasets, and the examples are from naturally occurring Wikipedia text. GAP complements OntoNotes by providing an extensive targeted dataset of naturally occurring ambiguous pronouns.

2.2 Modeling Ambiguous Pronouns

State-of-the-art coreference systems struggle to resolve ambiguous pronouns that require world knowledge and commonsense reasoning (Durrett and Klein, 2013). Past efforts have tried to mine semantic preferences and inferential knowledge from corpora via predicate–argument statistics (Dagan and Itai, 1990; Yang et al., 2005), semantic roles (Kehler et al., 2004; Ponzetto and Strube, 2006), contextual compatibility features (Liao and Grishman, 2010; Bansal and Klein, 2012), and event role sequences (Bean and Riloff, 2004; Chambers and Jurafsky, 2008). These usually bring small improvements on general coreference datasets and larger improvements on targeted Winograd datasets.

Rahman and Ng (2012) scored 73.05% precision on their Winograd dataset after incorporating targeted features such as narrative chains, Web-based counts, and selectional preferences. Peng et al.’s (2015) system improved the state of the art to 76.41% by acquiring ⟨subject, verb, object⟩ and ⟨subject/object, verb, verb⟩ knowledge triples.

In the First Winograd Schema Challenge (Morgenstern et al., 2016), participants used methods ranging from logical axioms and inference to neural network architectures enhanced with commonsense knowledge (Liu et al., 2017), but no system qualified for the second round. Recently, Trinh and Le (2018) have achieved the best results on the Pronoun Disambiguation Problem and Winograd Schema Challenge datasets, achieving 70% and 63.7%, respectively, which are 3 percentage points and 11 percentage points above Liu et al.’s (2017) previous state of the art. Their model is an ensemble of word-level and character-level recurrent language models, which, despite not being trained on coreference data, encode commonsense as part of the more general language modeling task. It is unclear how these systems perform on naturally occurring ambiguous pronouns. For example, Trinh and Le’s (2018) system relies on choosing a candidate from a pre-specified list, and it would need to be extended to handle the case that the pronoun does not corefer with any given candidate. By releasing GAP, we aim to foster research in this direction, and set several competitive baselines without using targeted resources.

2.3 Bias in Machine Learning

Although existing corpora have promoted research into coreference resolution, they suffer from gender bias. Specifically, of the over 2,000 gendered pronouns in the OntoNotes test corpus, less than 25% are feminine (Zhao et al., 2018). The imbalance is more pronounced on the development and training sets, with less than 20% feminine pronouns each. WikiCoref contains only 12% feminine pronouns. In the Definite Pronoun Resolution Dataset training data, 27% of the gendered pronouns are feminine, and the Winograd Schema Challenge datasets contain 28% and 33% feminine examples. Two exceptions are the recent WinoBias (Zhao et al., 2018) and Winogender schemas (Rudinger et al., 2018) datasets, which reveal how occupation-specific gender bias pervades the majority of publicly available coreference resolution systems by including a balanced number of feminine pronouns that corefer with anti-stereotypical occupations (see Example (3), from WinoBias). These datasets focus on pronominal coreference where the antecedent is a nominal mention, whereas GAP focuses on relations where the antecedent is a named entity.

(3) The salesperson sold some books to the librarian because she was trying to sell them.

The pervasive bias in existing datasets is concerning given that learned NLP systems often reflect and even amplify training biases (Bolukbasi et al., 2016; Caliskan et al., 2017; Zhao et al., 2017). A growing body of work defines notions of fairness, bias, and equality in data and machine-learned systems (Pedreshi et al., 2008; Hardt et al., 2016; Skirpan and Gorelick, 2017; Zafar et al., 2017), and debiasing strategies include expanding and rebalancing data (Torralba and Efros, 2011; Buda, 2017; Ryu et al., 2017; Shankar et al., 2017), and balancing performance across subgroups (Dwork et al., 2012). In the context of coreference resolution, Zhao et al. (2018) have shown how debiasing techniques (e.g., swapping the gender of male pronouns and antecedents in OntoNotes, using debiased word embeddings, balancing Bergsma and Lin’s [2006] gender list) succeed at reducing the gender bias of multiple off-the-shelf coreference systems.

We work towards fairness in coreference by releasing a diverse, gender-balanced corpus for ambiguous pronoun resolution and further investigating performance differences by gender, not specifically on pronouns with an occupation antecedent but more generally on gendered pronouns.

3 GAP Corpus

We create a corpus of 8,908 human-annotated ambiguous pronoun–name examples from Wikipedia. Examples are obtained from a large set of candidate contexts and are filtered through a multistage process designed to improve quality and diversity.

We choose Wikipedia as our base dataset given its wide use in natural language understanding tools, but are mindful of its well-known gender biases. Specifically, less than 15% of biographical Wikipedia pages are about women. Furthermore, women are written about differently than men: For example, women’s biographies are more likely to mention marriage or divorce (Bamman and Smith, 2014), abstract terms are more positive in male biographies than female biographies (Wagner et al., 2016), and articles about women are less central to the article graph (Graells-Garrido et al., 2015).

3.1 Extraction and Filtering

Extraction targets three patterns, given in Table 1, that characterize locally ambiguous pronoun contexts. We limit to singular mentions, gendered non-reflexive pronouns, and names whose head tokens are different from one another. Additionally, we do not allow intruders: There can be no other compatible mention (by gender, number, and entity type) between the pronoun and the two names.

To limit the success of naïve resolution heuristics, we apply a small set of constraints to focus on those pronouns that are truly hard to resolve.

  • FINALPRO. Both names must be in the same sentence, and the pronoun may appear in the same or directly following sentence.

  • MEDIALPRO. The first name must be in the sentence directly preceding the pronoun and the second name, which appear together in the same sentence. To decrease the bias for the pronoun to be coreferential with the first name, the pronoun must be in an initial subordinate clause or be a possessive in an initial prepositional phrase.

  • INITIALPRO. All three mentions must be in the same sentence and the pronoun must be in an initial subordinate clause or a possessive in an initial prepositional phrase.

[Table 1: the three extraction patterns (image not available)]
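
As a concrete illustration of the extraction constraints above, here is a minimal Python sketch of the "no intruders" check from §3.1. The `Mention` type and its attribute names are hypothetical stand-ins (the paper does not release its pipeline code); the point is the shape of the filter, not a definitive implementation.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    start: int        # token offset where the mention begins
    end: int          # token offset where the mention ends
    gender: str       # "masc" / "fem" / "unknown"  (hypothetical values)
    number: str       # "sg" / "pl"
    entity_type: str  # e.g., "PERSON"

def compatible(m: Mention, pronoun: Mention) -> bool:
    # An intruder must match the pronoun in gender, number, and entity type.
    return (m.gender == pronoun.gender
            and m.number == pronoun.number
            and m.entity_type == "PERSON")

def no_intruders(pronoun: Mention, name_a: Mention, name_b: Mention,
                 mentions: list) -> bool:
    """Reject the context if any compatible mention other than the two
    candidate names falls between the pronoun and the names."""
    lo = min(pronoun.start, name_a.start, name_b.start)
    hi = max(pronoun.end, name_a.end, name_b.end)
    return not any(
        lo <= m.start and m.end <= hi and compatible(m, pronoun)
        for m in mentions
        if m not in (pronoun, name_a, name_b)
    )
```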

From the extracted contexts, we sub-sample those to send for annotation. We do this to improve diversity in five dimensions (a sampling sketch follows Table 2):

  • Page Coverage. We retain at most three examples per page–gender pair to ensure a broad coverage of domains.

  • Gender. The raw pipeline extracts contexts with an m:f ratio of 9:1. We oversampled feminine pronouns to achieve a 1:1 ratio.

  • Extraction Pattern. The raw pipeline output contains seven times more FINALPRO contexts than MEDIALPRO and INITIALPRO combined, so we oversampled the latter two to lower the ratio to 6:1:1.

  • Page Entity. Pronouns in a Wikipedia page often refer to the entity the page is about. We include such examples in our dataset but balance them 1:1 against examples that do not include mentions of the page entity.

  • Coreferent Name. To ensure that mention order is not a cue for systems, our final dataset is balanced for label — namely, whether Name A or Name B is the pronoun’s referent.

[Table 2: diversity ratios in the final dataset (image not available)]
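
The rebalancing arithmetic above is straightforward; the sketch below derives per-gender keep probabilities for the 9:1 → 1:1 gender rebalancing. The raw counts are invented for illustration, and the paper does not specify its exact sampling procedure, so treat this only as one plausible reading.

```python
# Hypothetical raw extraction counts with the reported 9:1 m:f ratio.
raw_counts = {"masc": 9000, "fem": 1000}
target_share = {"masc": 0.5, "fem": 0.5}   # desired 1:1 balance

# Keep every example of the rarer gender; sample the other down to match.
budget = min(raw_counts[g] / target_share[g] for g in raw_counts)
keep_prob = {g: budget * target_share[g] / raw_counts[g] for g in raw_counts}
print(keep_prob)  # {'masc': 0.111..., 'fem': 1.0}
```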

We applied these constraints to the raw extractions to select 8,604 contexts (17,208 examples) for annotation that were globally balanced in all dimensions (e.g., 1:1 gender ratio in MEDIALPRO extractions). Table 2 summarizes the diversity ratios obtained in the final dataset, whose compilation is described next.

3.2 Annotation

We used a pool of in-house raters for human annotation of our examples. Each example was presented to three workers, who selected one of five labels (Table 3). Full sentences of at least 50 tokens preceding each example were presented as context (prior context beyond a section break is not included). Rating instructions accompany the dataset release.

Despite workers not being expert linguists, we find good agreement both within workers and between workers and an expert. Inter-annotator agreement was κ = 0.74 on the Fleiss et al. (2003) kappa statistic; in 73% of cases there was full agreement between workers, in 25% of cases two of three workers agreed, and only in 2% of cases was there no consensus. We discard the 194 cases with no consensus. On 30 examples rated by an expert linguist, there was agreement on 28 and one was deemed to be truly ambiguous with the given context.

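To reproduce an agreement figure like the κ = 0.74 above, Fleiss' kappa is available in statsmodels; here is a toy example on a made-up ratings matrix (not the GAP annotations).

```python
import numpy as np
from statsmodels.stats.inter_rater import fleiss_kappa

# Rows are examples, columns are the five labels; each cell counts how many
# of the 3 raters chose that label for that example.
ratings = np.array([
    [3, 0, 0, 0, 0],   # full agreement
    [2, 1, 0, 0, 0],   # two-of-three agreement
    [0, 3, 0, 0, 0],
    [0, 0, 2, 0, 1],
])
print(fleiss_kappa(ratings))
```
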
[Table 3: annotation labels and final example counts (image not available)]

To produce our final dataset, we applied additional high-precision filtering to remove some error cases identified by workers, and discarded the “Both” (no ambiguity) and “Not Sure” contexts. Given that many of the feminine examples received the “Both” label from referents having stage and married names (Example (4)), this unbalanced the number of masculine and feminine examples.

(4) Ruby Buckton is a fictional character from the Australian Channel Seven soap opera Home and Away, played by Rebecca Breeds. She debuted…

To correct this, we discarded masculine examples to re-achieve 1:1 gender balance. Additionally, we imposed the constraint that there be one example per Wikipedia article per pronoun form (e.g., his), to reduce similarity between examples. The final counts for each label are given in the second column of Table 3. Given that the 4,454 contexts each contain two annotated names, this constitutes 8,908 pronoun–name pair labels.

4 Experiments

We set up the GAP challenge and analyze the applicability of a range of off-the-shelf tools. We find that existing resolvers do not perform well and are biased to favor better resolution of masculine pronouns. We empirically validate the observation that Transformer models (Vaswani et al., 2017) encode coreference relationships, adding to the results by Voita et al. (2018) on machine translation, and Trinh and Le (2018) on language modeling. Furthermore, we show they complement traditional linguistic cues such as syntactic distance and parallelism.

All experiments use the Google Cloud NL API for pre-processing, unless otherwise noted.

4.1 GAP Challenge

GAP is an evaluation corpus, and we segment the final dataset into development and test sets of 4,000 examples each; we reserve the remaining 908 examples as a small validation set for parameter tuning. All examples are presented with the URL of the source Wikipedia page, allowing us to define two task settings: snippet-context, in which the URL may not be used, and page-context, in which it may. Although name spans are given in the data, we urge the community not to treat this as a gold-mention or Winograd-style task. That is, systems should detect mentions for inference automatically, and access labeled spans only to output predictions.

To reward unbiased modeling, we define two evaluation metrics: F1 score and Bias. Concretely, we calculate F1 score Overall as well as by the gender of the pronoun (Masculine and Feminine). Bias is calculated by taking the ratio of feminine to masculine F1 scores, which is typically less than 1.

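A minimal sketch of the two metrics with invented counts (the official scorer's exact TP/FP/FN bookkeeping may differ):

```python
def f1(tp: int, fp: int, fn: int) -> float:
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

masc = f1(tp=700, fp=250, fn=300)      # hypothetical masculine counts
fem = f1(tp=620, fp=280, fn=380)       # hypothetical feminine counts
overall = f1(tp=1320, fp=530, fn=680)  # pooled counts
bias = fem / masc                      # Bias metric: typically < 1
print(f"M={masc:.3f}  F={fem:.3f}  O={overall:.3f}  Bias={bias:.2f}")
```
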
4.2 Off-the-Shelf Resolvers

The first set of baselines we explore are four representative off-the-shelf coreference systems: the rule-based system of Lee et al. (2013) and three neural resolvers, Clark and Manning (2015), Wiseman et al. (2016), and Lee et al. (2017). All were trained on OntoNotes and run in as close to their out-of-the-box configuration as possible.

[Table 4: off-the-shelf resolver performance on GAP (image not available)]

System clusters were scored against GAP examples according to whether the cluster containing the target pronoun also contained the correct name (TP) or the incorrect name (FP), using mention heads for alignment. We report here their performance on GAP as informative baselines, but expect retraining on Wikipedia-like texts to yield an overall improvement in performance. (This remains as future work.)

Table 4 shows that all systems struggle on GAP. That is, despite modeling improvements in recent years, ambiguous pronoun resolution remains a challenge. We note particularly the large difference in performance between genders, which traditionally has not been tracked but has fairness implications for downstream tasks using these publicly available models.

Table 5 provides evidence that this low performance is not solely due to domain and task differences between GAP and OntoNotes. Specifically, with the exception of Clark and Manning (2015), the table shows that system performance on pronoun–name coreference relations in the OntoNotes test set is not vastly better than GAP. One possible reason that in-domain OntoNotes performance and out-of-domain GAP performance are not very different could be that state-of-the-art systems are highly tuned for resolving names rather than ambiguous pronouns.

[Table 5: performance on pronoun–name coreference relations in the OntoNotes test set (image not available)]

Further, the relative performance of the four systems is different on GAP than on OntoNotes. Particularly interesting is that the current strongest system overall for OntoNotes, namely, Lee et al. (2017), scores best on GAP pronouns but has the largest gender bias on OntoNotes. This perhaps is not surprising given the dominance of masculine examples in that corpus. It is outside the scope of this paper to provide an in-depth analysis of the data and modeling decisions that cause this bias; instead, we release GAP to address the measurement problem behind the bias.

Figure 1 compares the recall/precision trade-off for each system split by Masculine and Feminine examples, as well as combined (Overall). Also shown is a simple syntactic Parallelism heuristic in which subject and direct object pronouns are resolved to names with the same grammatical role (see §4.3). In this visualization, we see a further factor contributing to the low performance of off-the-shelf systems, namely, their low recall. That is, whereas personal pronouns are overwhelmingly anaphoric in both OntoNotes and Wikipedia texts, OntoNotes-trained models are conservative. This observation is consistent with the results for Lee et al. (2013) on the Definite Pronoun Resolution Dataset (Rahman and Ng, 2012), on which the system scored 47.2% F1, failing to beat a random baseline due to conservativeness.

[Figure 1: recall/precision trade-off for each system, split by Masculine, Feminine, and Overall (image not available)]

4.3 Coreference-Cue Baselines

To understand the shortcomings of state-of-the-art coreference systems on GAP, the upper sections of Table 6 consider several simple baselines based on traditional cues for coreference.

To calculate these baselines, we first detect candidate antecedents by finding all mentions of PERSON entity type and NAME mention type (headed by a proper noun) which, for structural cues, are not in a syntactic position that precludes coreference with the pronoun. We do not require gender match because gender annotations are not provided by the Google Cloud NL API and, even if they were, gender predictions on last names (without the first name) are not reliable in the snippet-context setting. Second, we select among the candidates using one of the heuristics described next.

For scoring purposes, we do not require exact string match for mention alignment—that is, if the selected candidate is a substring of a given name (or vice versa), we infer a coreference relation between that name and the target pronoun.

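A sketch of this alignment rule (helper names are illustrative, not from the released scorer):

```python
def aligns(candidate: str, gold_name: str) -> bool:
    """A predicted candidate matches a labeled name if either string
    contains the other, e.g. 'Fujisawa' vs. 'Satsuki Fujisawa'."""
    c, g = candidate.lower(), gold_name.lower()
    return c in g or g in c

def judge(candidate: str, name_a: str, name_b: str, gold: str) -> str:
    """gold is 'A', 'B', or 'NEITHER'; returns a confusion-matrix cell."""
    if aligns(candidate, name_a):
        return "TP" if gold == "A" else "FP"
    if aligns(candidate, name_b):
        return "TP" if gold == "B" else "FP"
    return "FN" if gold in ("A", "B") else "TN"
```
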
Surface Cues. Baseline cues that require only access to the input text are the following (a code sketch comes after the list):

  • RANDOM. Select a candidate uniformly at random.

  • TOKEN DISTANCE. Select the closest candidate to the pronoun, with distance measured as the number of tokens between spans.

  • TOPICAL ENTITY. Select the closest candidate that contains the most frequent token string among extracted candidates.
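
A rough sketch of these three surface cues, assuming each candidate is a (token position, text) pair; the real pipeline operates on parser output, so the tuple representation is only illustrative.

```python
import random
from collections import Counter

def random_cue(cands):
    # RANDOM: uniform choice among all detected candidates.
    return random.choice(cands)

def token_distance(cands, pron_pos):
    # TOKEN DISTANCE: closest candidate to the pronoun, in tokens.
    return min(cands, key=lambda c: abs(c[0] - pron_pos))

def topical_entity(cands, pron_pos):
    # TOPICAL ENTITY: closest candidate bearing the most frequent token string.
    counts = Counter(text for _, text in cands)
    top = max(counts, key=counts.get)
    return token_distance([c for c in cands if c[1] == top], pron_pos)
```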

The performance of RANDOM (41.5 Overall) is lower than an otherwise possible guess rate of ∼50%. This is because the baseline considers all possible candidates, not just the two annotated names. Moreover, the difference between masculine and feminine examples suggests that there are more distractor mentions in the context of feminine pronouns in GAP. To measure the impact of pronoun context, we include performance on the artificial gold-two-mention setting, where only the two name spans are candidates for inference (Table 7). RANDOM is indeed closer here to the expected 50% and other baselines are closer to gender-parity.

TOKEN DISTANCE and TOPICAL ENTITY are only weak improvements above RANDOM, validating that our dataset creation methodology controlled for these factors.

Structural Cues. Baseline cues that may additionally access syntactic structure are the following (a code sketch comes after the list):

  • SYNTACTIC DISTANCE. Select the syntactically closest candidate to the pronoun. Back off to TOKEN DISTANCE.

  • PARALLELISM. If the pronoun is a subject or direct object, select the closest candidate with the same grammatical argument. Back off to SYNTACTIC DISTANCE.

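The two structural cues can be sketched as follows, assuming each mention carries a grammatical role from a dependency parse (the dict keys are illustrative). Token distance stands in for a true syntactic tree distance here, matching the back-off chain rather than the exact implementation.

```python
def syntactic_distance(cands, pronoun):
    # Stand-in for tree distance; this sketch uses token distance instead.
    return min(cands, key=lambda c: abs(c["pos"] - pronoun["pos"]))

def parallelism(cands, pronoun):
    """cands: dicts with 'pos' and 'role' (e.g., 'nsubj', 'dobj')."""
    if pronoun["role"] in ("nsubj", "dobj"):
        same_role = [c for c in cands if c["role"] == pronoun["role"]]
        if same_role:
            return syntactic_distance(same_role, pronoun)
    return syntactic_distance(cands, pronoun)  # back off
```
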
Both cues yield strong baselines comparable to the strongest OntoNotes-trained systems (cf. Table 4). In fact, Lee et al. (2017) and PARALLELISM produce remarkably similar output: of the 2,000 example pairs in the development set, the two have completely opposing predictions (i.e., Name A vs. Name B) on only 325 examples. Further, the cues are markedly gender-neutral, improving the Bias metric by 9 percentage points in the standard task formulation and to parity in the gold-two-mention case. In contrast to surface cues, having the full candidate set is helpful: mention alignment via a non-indicated candidate successfully scores 69% of PARALLELISM predictions.

Wikipedia Cues. To explore the page-context setting, we consider a Wikipedia-specific cue:

  • URL. Select the syntactically closest candidate that has a token overlap with the page title. Back off to PARALLELISM.

The heuristic gives a performance gain of 2% overall compared to PARALLELISM. That the feature is not more helpful again validates our methodology for extracting diverse examples. We expect future work to greatly improve on this baseline by using the wealth of cues in Wikipedia articles, including page text.

4.4 Transformer Models for Coreference

The recent Transformer model (Vaswani et al., 2017) demonstrated tantalizing representations for coreference: When trained for machine translation, some self-attention layers appear to show stronger attention weights between coreferential elements. Voita et al. (2018) found evidence for this claim for the English pronouns it, you, and I in a movie subtitles dataset (Lison et al., 2018). GAP allows us to explore this claim on Wikipedia for ambiguous personal pronouns. To do so, we investigate the heuristic:

  • TRANSFORMER. Select the candidate that attends most to the pronoun.

The Transformer model underlying our experiments is trained for 350k steps on the 2014 English-German NMT task, using the same settings as Vaswani et al. (2017). The model processes texts as a series of subtokens (text fragments the size of a token or smaller) and learns three multi-head attention matrices over these, two self-attention matrices (one over the subtokens of the source sentences and one over those of the target sentences), and a cross-attention matrix between the source and target. Each attention matrix is decomposed into a series of feedforward layers, each composed of discrete heads designed to specialize for different dimensions in the training signal. We input GAP snippets as English source text and extract attention values from the source self-attention matrix; the target side (German translations) is not used.

We calculate the attention between a name and pronoun to be the mean over all subtokens in these spans; the attention between two subtokens is the sum of the raw attention values between all occurrences of those subtoken strings in the input snippet. These two factors control for variation between Transformer models and the spreading of attention between different mentions of the same entity.

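Written out directly, the aggregation looks like the sketch below, over a single head's raw subtoken-by-subtoken attention matrix (a NumPy array; the helper names are mine, not the paper's).

```python
import numpy as np

def span_attention(attn: np.ndarray, name_subtoks: list, pron_subtoks: list,
                   occurrences: dict) -> float:
    """attn[i, j] is the raw attention from subtoken position i to j;
    occurrences maps a subtoken string to every position where it appears.
    Name-to-pronoun attention is the mean, over subtoken-string pairs, of
    the summed attention between all occurrences of those strings."""
    vals = []
    for ns in name_subtoks:
        for ps in pron_subtoks:
            vals.append(sum(attn[i, j]
                            for i in occurrences[ns]
                            for j in occurrences[ps]))
    return float(np.mean(vals))
```
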
TRANSFORMER-SINGLE. Table 8 gives the performance of the TRANSFORMER heuristic over each self-attention head on the development dataset. Consistent with the observations by Vaswani et al. (2017), we observe that the coreference signal is localized on specific heads and that these heads are in the deep layers of the network (e.g., L3H7). During development, we saw that the specific heads which specialize for coreference differ between models.

The TRANSFORMER-SINGLE baseline in Table 6 is the one set by L3H7 in Table 8. Despite not having access to syntactic structure, TRANSFORMER-SINGLE far outperforms all surface cues above.

[Table 8: per-head performance of the TRANSFORMER heuristic (image not available)]

That is, we find evidence for the claim that Transformer models implicitly learn language understanding relevant to coreference resolution. Even more promising, we find that the instances of coreference that TRANSFORMER-SINGLE can handle is substantially different from those of PARALLELISM; see Table 9.

TRANSFORMER-MULTI. We learn to compose the signals from different self-attention heads using extra-trees classifiers (Geurts et al., 2006). We choose this classifier because we have little available training data and a small feature set. Specifically, for each candidate antecedent, we take the steps below (a code sketch follows the list):

  • Extract one feature for each of the 48 Transformer heads. The feature value is True if there is a substring overlap between the candidate and the prediction of TRANSFORMER-SINGLE.

  • Use the χ² statistic to reduce dimensionality. We found k = 3 worked well.

  • Learn an extra-trees classifier over these three features with the validation dataset.

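This three-step recipe maps naturally onto scikit-learn; below is a sketch with random stand-in features in place of the real 48 per-head booleans, so the numbers it produces are meaningless (only the pipeline shape is the point).

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X_val = rng.integers(0, 2, size=(908, 48))  # stand-in for the 48 head features
y_val = rng.integers(0, 2, size=908)        # stand-in pair labels

model = make_pipeline(
    SelectKBest(chi2, k=3),                 # chi-squared selection, k = 3
    ExtraTreesClassifier(n_estimators=100, random_state=0),
)
model.fit(X_val, y_val)
print(model.predict(X_val[:5]))
```
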
That TRANSFORMER-MULTI is stronger than TRANSFORMER-SINGLE in Table 6 suggests that different self-attention heads encode different dimensions of the coreference problem. Though the gain is modest when all mentions are under consideration, Table 7 shows a 4.2 percentage point overall improvement over TRANSFORMER-SINGLE for the gold-two-mention task. Future work could explore filtering the candidate list presented to Transformer models to reduce the impact of distractor mentions in a pronoun's context, for example, by gender in the page-context setting. It is also worth stressing that these models are trained on very little data (the GAP validation set). These preliminary results suggest that learned models incorporating such features from the Transformer and using more data are worth exploring further.

4.5 GAP Benchmarks

Table 10 sets the baselines for the GAP challenge. We include the off-the-shelf system that performed best Overall on the development set (Lee et al., 2017), as well as our strongest baselines for the two task settings, PARALLELISM and URL.

We note that strict comparisons cannot be made among the snippet-context baselines, given that Lee et al. (2017) has access to OntoNotes annotations that we do not, while we have access to pronoun ambiguity annotations that Lee et al. (2017) does not.

5 Error Analysis

We have shown that GAP is challenging for both off-the-shelf systems and our baselines. To assess the variance between these systems and gain a more qualitative understanding of what aspects of GAP are challenging, we use the number of off-the-shelf systems that agree with the rater-provided labels (Agreement with Gold) as a proxy for difficulty. Table 11 breaks down the name-pronoun examples in the development set by Agreement with Gold (the smaller the agreement the harder the example).

Agreement with Gold is low (average 2.1) and widely spread. Less than 30% of the examples are successfully solved by all systems (labeled Green), and just under 15% are so challenging that none of the systems gets them right (Red). The majority are in between (Yellow). Many Green cases have syntactic cues for coreference, but we find no systematic trends within Yellow.

[Table 11: development examples broken down by Agreement with Gold (image not available)]

Table 12 provides a fine-grained analysis of 75 Red cases. When labeling these cases, two important considerations emerged: (1) labels often overlap, with one example possibly fitting into multiple categories; and (2) GAP requires global reasoning—cues from different entity mentions work together to build a snippet’s interpretation. The Red examples in particular exemplify the challenge of GAP, and point toward the need for multiple modeling strategies to achieve significantly higher scores on the data set.

6 Conclusions

We have presented a data set and a set of strong baselines for a new coreference task, GAP. We designed GAP to represent the challenges posed by real-world text, in which ambiguous pronouns are important and difficult to resolve. We highlighted gaps in the existing state of the art, and proposed the application of Transformer models to address these. Specifically, we show how traditional linguistic features and modern sentence encoder technology are complementary.

Our work contributes to the emerging body of work on the impact of bias in machine learning. We saw systematic differences between genders in analysis; this is consistent with many studies that have called out differences in how men and women are discussed publicly. By rebalancing our data set for gender, we hope to reward systems that are able to capture these complexities fairly.

It has been outside the scope of this paper to explore bias in other dimensions, to analyze coreference in other languages, and to study the impact on downstream systems of improved coreference resolution. We look forward to future work in these directions.
