NATURAL LANGUAGE INFERENCE OVER INTERACTION SPACE

Contents

    • ABSTRACT
    • 1 INTRODUCTION
    • 2 RELATED WORK
    • 3 MODEL
      • 3.1 INTERACTIVE INFERENCE NETWORK
      • 3.2 DENSELY INTERACTIVE INFERENCE NETWORK
    • 4 EXPERIMENTS
      • 4.1 DATA
      • 4.2 EXPERIMENTS SETTING
      • 4.3 EXPERIMENT ON MULTINLI
      • 4.4 EXPERIMENT ON SNLI
      • 4.5 EXPERIMENT ON QUORA QUESTION PAIR DATASET

ABSTRACT

The Natural Language Inference (NLI) task requires an agent to determine the logical relationship between a natural language premise and a natural language hypothesis. We introduce the Interactive Inference Network (IIN), a novel class of neural network architectures that achieves a high-level understanding of a sentence pair by hierarchically extracting semantic features from interaction space. We show that an interaction tensor (attention weight) contains the semantic information needed to solve natural language inference, and that a denser interaction tensor contains richer semantic information. One instance of this architecture, the Densely Interactive Inference Network (DIIN), demonstrates state-of-the-art performance on large-scale NLI corpora and a large-scale NLI-like corpus. Notably, DIIN achieves a greater than 20% error reduction on the challenging Multi-Genre NLI (MultiNLI; Williams et al., 2017) dataset with respect to the strongest published system.

1 INTRODUCTION

The Natural Language Inference (NLI, also known as recognizing textual entailment, or RTE) task requires one to determine whether the logical relationship between two sentences is entailment (if the premise is true, then the hypothesis must be true), contradiction (if the premise is true, then the hypothesis must be false) or neutral (neither entailment nor contradiction). NLI is known as a fundamental yet challenging task for natural language understanding (Williams et al., 2017), not only because it requires one to identify language patterns, but also because it requires some common sense knowledge. In Table 1, three samples from the MultiNLI corpus show that solving the task requires one to handle the full complexity of lexical and compositional semantics. Previous work on NLI (or RTE) has extensively studied conventional approaches (Fyodorov et al., 2000; Bos & Markert, 2005; MacCartney & Manning, 2009). Recent progress on NLI has been enabled by the availability of a 570k human-annotated dataset (Bowman et al., 2015) and advances in representation learning.

Among the core representation learning techniques, the attention mechanism has been broadly applied in many NLU tasks since its introduction: machine translation (Bahdanau et al., 2014), abstractive summarization (Rush et al., 2015), reading comprehension (Hermann et al., 2015), dialog systems (Mei et al., 2016), etc. As described by Vaswani et al. (2017), "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key." The attention mechanism is known for aligning representations, focusing one part of a representation over another, and modeling dependencies regardless of sequence length. Observing attention's powerful capability, we hypothesize that the attention weight can assist a machine in understanding the text.
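To make the quoted definition concrete, the following minimal NumPy sketch computes attention with a dot product as one possible compatibility function. It is an illustration of the general definition only, not any particular paper's implementation, and the names are illustrative.

```python
import numpy as np

def attend(query, keys, values):
    """Weighted sum of values; the weight of each value comes from a
    compatibility function (here: a dot product) of the query with its key."""
    scores = keys @ query                      # one compatibility score per key
    weights = np.exp(scores - scores.max())    # softmax, shifted for numerical stability
    weights /= weights.sum()
    return weights @ values                    # output: weighted sum of the values

# toy usage: 3 key-value pairs of dimension 4
keys = np.random.randn(3, 4)
values = np.random.randn(3, 4)
query = np.random.randn(4)
output = attend(query, keys, values)           # shape (4,)
```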

A regular attention weight, the core component of the attention mechanism, encodes the cross-sentence word relationship into an alignment matrix. A multi-head attention weight (Vaswani et al., 2017), however, can encode such interaction into multiple alignment matrices, which yields a more powerful alignment. In this work, we push multi-head attention to an extreme by building a word-by-word, dimension-wise alignment tensor, which we call the interaction tensor. The interaction tensor encodes the high-order alignment relationship between the sentence pair. Our experiments demonstrate that by capturing the rich semantic features in the interaction tensor, we are able to solve the natural language inference task well, especially in cases with paraphrase, antonyms and overlapping words.
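The NumPy sketch below (an illustration, not the paper's code) contrasts a single dot-product alignment matrix with the dimension-wise interaction tensor described above. When the alignment score is a plain dot product, summing the tensor over the feature dimension recovers the ordinary alignment matrix, so the tensor strictly contains more information.

```python
import numpy as np

p, h, d = 5, 7, 8                      # toy premise length, hypothesis length, feature dim
P = np.random.randn(p, d)              # premise word vectors
H = np.random.randn(h, d)              # hypothesis word vectors

# regular attention weight: one scalar alignment per word pair -> (p, h)
alignment = P @ H.T

# dimension-wise interaction tensor: one value per word pair per feature -> (p, h, d)
interaction = P[:, None, :] * H[None, :, :]

# summing over the feature dimension collapses the tensor back to the alignment matrix
assert np.allclose(interaction.sum(axis=-1), alignment)
```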

We dub the general framework the Interactive Inference Network (IIN). To the best of our knowledge, it is the first attempt to solve the natural language inference task in interaction space. We further explore one instance of the Interactive Inference Network, the Densely Interactive Inference Network (DIIN), which achieves new state-of-the-art performance on both the SNLI and MultiNLI corpora. To test the generality of the architecture, we interpret the paraphrase identification task as a natural language inference task, where matching corresponds to entailment and non-matching to neutral. We test the model on the Quora Question Pair dataset, which contains over 400k real-world question pairs, and achieve new state-of-the-art performance.

We introduce the related work in Section 2, and discuss the general framework of IIN along with a specific instance that enjoys state-of-the-art performance on multiple datasets in Section 3. We describe experiments and analysis in Section 4. Finally, we conclude and discuss future work in Section 5.
[Table 1: Three example sentence pairs from the MultiNLI corpus]

2 RELATED WORK

Early exploration of NLI mainly relied on conventional methods and small-scale datasets (Marelli et al., 2014). The availability of the SNLI dataset, with 570k human-annotated sentence pairs, has enabled a good deal of progress on natural language understanding. Essential representation learning techniques for NLU, such as attention (Wang & Jiang, 2015), memory (Munkhdalai & Yu, 2016) and the use of parse structure (Bowman et al., 2016; Mou et al., 2015), have been studied on SNLI, which serves as an important benchmark for sentence understanding. The models trained on the NLI task can be divided into two categories:
(i) sentence encoding-based models, which aim to find a vector representation for each sentence and classify the relation using the concatenation of the two vector representations along with their absolute element-wise difference and element-wise product (Bowman et al., 2016; Vendrov et al., 2015; Mou et al., 2015; Liu et al., 2016; Munkhdalai & Yu, 2016);
(ii) joint feature models, which use cross-sentence features or attention from one sentence to another (Rocktaschel et al., 2015; Wang & Jiang, 2015; Cheng et al., 2016; Parikh et al., 2016; Wang et al., 2017; Yu & Munkhdalai, 2017; Sha et al., 2016).

After the neural attention mechanism was successfully applied to the machine translation task, the technique became widely used in both the natural language processing and computer vision domains. Many variants of the attention technique, such as hard attention (Xu et al., 2015), self-attention (Parikh et al., 2016), multi-hop attention (Gong & Bowman, 2017), bidirectional attention (Seo et al., 2016) and multi-head attention (Vaswani et al., 2017), have also been introduced to tackle more complicated tasks. Before this work, the neural attention mechanism was mainly used for alignment, focusing on a specific part of the representation. In this work, we want to show that the attention weight contains rich semantic information required for understanding the logical relationship between a sentence pair.

Though RNNs and LSTMs are very good for variable-length sequence modeling, using convolutional neural networks for NLU tasks is desirable because of their parallelism in computation. Convolutional structures have been successfully applied in various domains such as machine translation (Gehring et al., 2017), sentence classification (Kim, 2014), text matching (Hu et al., 2014) and sentiment analysis (Kalchbrenner et al., 2014). The convolution structure has also been applied at different levels of granularity, such as the byte (Zhang & LeCun, 2017), character (Zhang et al., 2015), word (Gehring et al., 2017) and sentence (Mou et al., 2015) levels.

3 MODEL

3.1 INTERACTIVE INFERENCE NETWORK

The Interactive Inference Network (IIN) is a hierarchical multi-stage process and consists of five components. Each component is compatible with different types of implementations. Potentially, all existing approaches in machine learning, such as decision trees, support vector machines and neural networks, can be adapted to replace a given component in this architecture. We focus on neural network approaches below. Figure 1 provides a visual illustration of the Interactive Inference Network, and a high-level skeleton of the five components is sketched after the list below.
[Figure 1: Overview of the Interactive Inference Network architecture]

  1. Embedding Layer converts each word or phrase to a vector representation and constructs the representation matrix for each sentence. In the embedding layer, a model can map tokens to vectors with pre-trained word representations such as GloVe (Pennington et al., 2014), word2vec (Mikolov et al., 2013) and fastText (Joulin et al., 2016). It can also utilize preprocessing tools, e.g. a named entity recognizer, part-of-speech tagger, lexical parser or coreference identifier, to incorporate more lexical and syntactical information into the feature vector.

  2. Encoding Layer encodes the representations by incorporating context information or enriching the representation with desirable features for later use. For instance, a model can adopt a bidirectional recurrent neural network to model the temporal interaction in both directions, a recursive neural network (Socher et al., 2011) (also known as a TreeRNN) to model the compositionality and recursive structure of language, or self-attention to model long-term dependencies within a sentence. Different encoder components can be combined to obtain a better sentence matrix representation.

  3. Interaction Layer creates a word-by-word interaction tensor from the premise and hypothesis representation matrices. The interaction can be modeled in different ways. A common approach is to compute the cosine similarity or dot product between each pair of feature vectors. Alternatively, a high-order interaction tensor can be constructed from the outer product between the two matrix representations.

  4. Feature Extraction Layer adopts a feature extractor to extract semantic features from the interaction tensor. Convolutional feature extractors such as AlexNet (Krizhevsky et al., 2012), VGG (Simonyan & Zisserman, 2014), Inception (Szegedy et al., 2014), ResNet (He et al., 2016) and DenseNet (Huang et al., 2016), proven to work well on image recognition, are fully compatible with this architecture. Unlike work that employs a 1-D sliding window (Kim, 2014; Zhang et al., 2015), our CNN architecture allows a 2-D kernel to extract semantic interaction features from the word-by-word interaction between n-gram pairs. Sequential or tree-like feature extractors are also applicable in the feature extraction layer.

  5. Output Layer decodes the acquired features to give a prediction. Under the NLI setting, the output layer predicts the confidence for each class.
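As referenced above, the following toy sketch strings the five components together end to end. Every function here (`embed`, `encode`, `interact`, `extract_features`, `classify`) is a deliberately trivial placeholder that a real IIN instance would replace, and the vocabulary, dimensions and weights are arbitrary; the concrete DIIN choices appear in Section 3.2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Deliberately trivial stand-ins for the five IIN components.
vocab = {"a": 0, "boy": 1, "runs": 2, "child": 3, "sleeps": 4}
E = rng.standard_normal((len(vocab), 16))        # 1. embedding table (random, untrained)
W_out = rng.standard_normal((16, 3))             # 5. output-layer weights (random, untrained)

def embed(tokens):       return E[[vocab[t] for t in tokens]]        # (length, d)
def encode(X):           return X                                    # 2. identity "encoder"
def interact(P, H):      return P[:, None, :] * H[None, :, :]        # 3. (p, h, d) interaction tensor
def extract_features(I): return I.mean(axis=(0, 1))                  # 4. trivial pooled "features"
def classify(f):
    logits = f @ W_out                                               # 5. linear output layer
    return np.exp(logits) / np.exp(logits).sum()

premise, hypothesis = ["a", "boy", "runs"], ["a", "child", "sleeps"]
probs = classify(extract_features(interact(encode(embed(premise)), encode(embed(hypothesis)))))
print(probs)  # confidences over {entailment, neutral, contradiction}; arbitrary since untrained
```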

3.2 DENSELY INTERACTIVE INFERENCE NETWORK

Here we introduce the Densely Interactive Inference Network (DIIN), which is a relatively simple instantiation of IIN but produces state-of-the-art performance on multiple datasets.

Embedding Layer:
For DIIN, we use the concatenation of word embedding, character features and syntactical features. The word embedding is obtained by mapping each token to a high-dimensional vector space with pre-trained word vectors (840B GloVe). The word embedding is updated during training. As in (Kim et al., 2016; Lee et al., 2016), we filter the character embedding with a 1D convolution kernel. The character convolutional feature maps are then max-pooled over the time dimension for each token to obtain a vector. The character features supply extra information for some out-of-vocabulary (OOV) words. Syntactical features include a one-hot part-of-speech (POS) tagging feature and a binary exact match (EM) feature. The EM value is activated if the other sentence contains a token with the same stem or lemma as the corresponding token. The EM feature is simple yet useful, as found in the reading comprehension task (Chen et al., 2017a). In the analysis section, we study how the EM feature helps text understanding. We now have a premise representation $P \in R^{p \times d}$ and a hypothesis representation $H \in R^{h \times d}$, where $p$ is the sequence length of the premise, $h$ is the sequence length of the hypothesis and $d$ is the dimension of both representations. The 1-D convolutional neural network and character feature weights share the same set of parameters between premise and hypothesis.
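A minimal NumPy sketch of two of the feature types described above: the character-level 1D convolution with max pooling over time, and the binary exact-match feature. The array shapes, filter count and helper names are illustrative assumptions, not the paper's released code.

```python
import numpy as np

def char_conv_feature(char_embs, kernel):
    """char_embs: (n_chars, c_dim) character embeddings of one token (cropped/padded to 16 chars).
    kernel: (width, c_dim, n_filters) 1D convolution kernel (width 5 in the paper).
    Returns one vector per token: convolution over the character sequence, max-pooled over time."""
    width = kernel.shape[0]
    positions = char_embs.shape[0] - width + 1
    conv = np.stack([np.tensordot(char_embs[i:i + width], kernel, axes=([0, 1], [0, 1]))
                     for i in range(positions)])      # (positions, n_filters)
    return conv.max(axis=0)                           # max pooling over the time dimension

def exact_match_feature(lemmas, other_lemmas):
    """Binary EM feature: 1.0 if a token's lemma (or stem) also appears in the other sentence."""
    other = set(other_lemmas)
    return np.array([1.0 if lemma in other else 0.0 for lemma in lemmas])

# toy usage
token_chars = np.random.randn(16, 100)               # 16 characters, 100-D character embeddings
kernel = np.random.randn(5, 100, 64)                 # width-5 kernel with 64 filters (filter count assumed)
char_vec = char_conv_feature(token_chars, kernel)    # (64,)
em = exact_match_feature(["a", "boy", "run"], ["the", "child", "run"])   # [0., 0., 1.]
```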

Encoding Layer:
In the encoding layer, the premise representation $P$ and the hypothesis representation $H$ are passed through a two-layer highway network, yielding a new premise representation $\hat P \in R^{p \times d}$ and a new hypothesis representation $\hat H \in R^{h \times d}$. These new representations are then passed to a self-attention layer to take word order and context information into account. Taking the premise as an example, we model self-attention by

$$A_{ij}=\alpha(\hat P_i, \hat P_j) \in R \tag{1}$$

$$\overline P_i = \sum^p_{j=1}\frac{exp(A_{ij})}{\sum^p_{k=1}exp(A_{kj})} \hat P_j, \quad \forall i, j \in [1, ..., p] \tag{2}$$

where $\overline P_i$ is a weighted summation of $\hat P$. We choose $\alpha(a, b)=w^T_a[a;b;a \circ b]$, where $w_a \in R^{3d}$ is a trainable weight, $\circ$ is element-wise multiplication, $[;]$ is vector concatenation across rows, and the implicit multiplication is matrix multiplication. Then both $\hat P$ and $\overline P$ are fed into a semantic composite fuse gate (fuse gate for short), which acts as a skip connection. The fuse gate is implemented as

$$z_i=tanh(W^{1T}[\hat P_i;\overline P_i]+b^1) \tag{3}$$

$$r_i=\sigma(W^{2T}[\hat P_i; \overline P_i]+b^2) \tag{4}$$

$$f_i=\sigma(W^{3T}[\hat P_i;\overline P_i]+b^3) \tag{5}$$

$$\tilde P_i=r_i \circ \hat P_i+f_i \circ z_i \tag{6}$$

where $W^1$, $W^2$, $W^3 \in R^{2d \times d}$ and $b^1$, $b^2$, $b^3 \in R^d$ are trainable weights, and $\sigma$ is the sigmoid nonlinearity.

We perform the same operations on the hypothesis representation, obtaining $\tilde H$. The weights of the intra-attention and fuse gate for the premise and the hypothesis are not shared, but the difference between the two sets of weights is penalized. The penalization aims to ensure that the parallel structures learn similar functionality while remaining aware of the subtle semantic difference between premise and hypothesis.
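Below is a sketch of equations (1)-(6) for one sentence, written in PyTorch for brevity (the paper's own implementation is in TensorFlow, so this is a reconstruction under stated assumptions). The `nn.Linear` modules bundle the weights $W^{1..3}$ with their biases $b^{1..3}$, and the normalization follows equation (2) exactly as written above, i.e. over the first index of $A$.

```python
import torch
import torch.nn as nn

class SelfAttentionFuseGate(nn.Module):
    """Encoding-layer sketch: self-attention with alpha(a, b) = w_a^T [a; b; a*b],
    followed by the semantic composite fuse gate (equations 1-6)."""
    def __init__(self, d):
        super().__init__()
        self.w_a = nn.Parameter(torch.randn(3 * d))   # weight of the compatibility function alpha
        self.W1 = nn.Linear(2 * d, d)                 # z gate, eq. (3)
        self.W2 = nn.Linear(2 * d, d)                 # r gate, eq. (4)
        self.W3 = nn.Linear(2 * d, d)                 # f gate, eq. (5)

    def forward(self, P_hat):                         # P_hat: (p, d) highway-network output
        p, d = P_hat.shape
        a = P_hat.unsqueeze(1).expand(p, p, d)        # row i broadcast over j
        b = P_hat.unsqueeze(0).expand(p, p, d)        # row j broadcast over i
        A = torch.cat([a, b, a * b], dim=-1) @ self.w_a                  # (p, p), eq. (1)
        weights = torch.exp(A) / torch.exp(A).sum(dim=0, keepdim=True)   # eq. (2) normalization
        P_bar = weights @ P_hat                       # (p, d) weighted summation of P_hat
        pair = torch.cat([P_hat, P_bar], dim=-1)      # [P_hat_i ; P_bar_i]
        z = torch.tanh(self.W1(pair))                 # eq. (3)
        r = torch.sigmoid(self.W2(pair))              # eq. (4)
        f = torch.sigmoid(self.W3(pair))              # eq. (5)
        return r * P_hat + f * z                      # eq. (6): P_tilde

# toy usage: a premise of 7 tokens with d = 32
encoder = SelfAttentionFuseGate(d=32)
P_tilde = encoder(torch.randn(7, 32))                 # (7, 32)
```

As described above, the premise and hypothesis would each get their own instance of this module, with an L2 penalty on the difference between the two instances' weights keeping them similar.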

Interaction Layer:
The interaction layer models the interaction between the encoded premise representation $\tilde P$ and the encoded hypothesis representation $\tilde H$ as follows:

$$I_{ij}=\beta (\tilde P_i, \tilde H_j) \in R^d, \quad \forall i \in [1, ..., p], \forall j \in [1, ..., h] \tag{7}$$

where $\tilde P_i$ is the $i$-th row vector of $\tilde P$, and $\tilde H_j$ is the $j$-th row vector of $\tilde H$. Though there are many possible implementations of the interaction, we find $\beta(a, b)=a \circ b$ very useful.

Feature Extraction Layer:
We adopt DenseNet (Huang et al., 2016) as the convolutional feature extractor in DIIN. Though our experiments show that ResNet (He et al., 2016) works well in the architecture, we choose DenseNet because it is effective in saving parameters. One interesting observation with ResNet is that if we remove the skip connection in the residual structure, the model does not converge at all. We found that batch normalization delays convergence without contributing to accuracy, so we do not use it in our case. A ReLU activation function is applied after every convolution unless otherwise noted. Once we have the interaction tensor $I$, we use a convolution with a 1 × 1 kernel to scale down the tensor by a ratio $\eta$, without a following ReLU. If the input has $k$ channels, then the output has $floor(k \times \eta)$ channels. The generated feature map is then fed into three pairs of dense blocks (Huang et al., 2016) and transition blocks. Each dense block contains $n$ layers of 3 × 3 convolutions with growth rate $g$. The transition layer has a convolution layer with a 1 × 1 kernel for scaling down, followed by a max pooling layer with stride 2. The scale-down ratio in the transition layer is $\theta$.
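A hedged PyTorch sketch of this feature extraction layer, using the hyper-parameters quoted later in Section 4.2 (n = 8 layers per dense block, growth rate g = 20, η = 0.3, θ = 0.5, three dense/transition pairs). The class names, padding choices and the toy channel count d = 448 are assumptions for illustration, not the paper's released code.

```python
import math
import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    """n stacked 3x3 convolutions; each layer's output (growth-rate channels) is
    concatenated onto its input, DenseNet style."""
    def __init__(self, in_ch, n_layers=8, growth=20):
        super().__init__()
        self.layers = nn.ModuleList()
        ch = in_ch
        for _ in range(n_layers):
            self.layers.append(nn.Sequential(
                nn.Conv2d(ch, growth, kernel_size=3, padding=1), nn.ReLU()))
            ch += growth
        self.out_ch = ch

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)        # dense connectivity
        return x

class Transition(nn.Module):
    """1x1 convolution scaling channels down by theta, then stride-2 max pooling."""
    def __init__(self, in_ch, theta=0.5):
        super().__init__()
        self.out_ch = int(math.floor(in_ch * theta))
        self.conv = nn.Conv2d(in_ch, self.out_ch, kernel_size=1)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(torch.relu(self.conv(x)))

class FeatureExtractor(nn.Module):
    """1x1 scale-down (ratio eta, no ReLU) followed by three dense/transition pairs."""
    def __init__(self, d, eta=0.3, n_pairs=3):
        super().__init__()
        ch = int(math.floor(d * eta))
        self.scale_down = nn.Conv2d(d, ch, kernel_size=1)
        stages = []
        for _ in range(n_pairs):
            dense = DenseBlock(ch)
            trans = Transition(dense.out_ch)
            stages += [dense, trans]
            ch = trans.out_ch
        self.stages = nn.Sequential(*stages)

    def forward(self, interaction):                    # interaction tensor: (batch, d, p, h)
        return self.stages(self.scale_down(interaction))

# toy usage: batch of 2, d = 448 channels (assumed), premise/hypothesis length 48
features = FeatureExtractor(d=448)(torch.randn(2, 448, 48, 48))
```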

Output Layer:
DIIN uses a linear layer to classify the final flattened feature representation into three classes.

4 EXPERIMENTS

In this section, we present the evaluation of our model. We first perform a quantitative evaluation, comparing our model with other competitive models. We then conduct some qualitative analyses to understand how DIIN achieves high-level understanding through interaction.

4.1 DATA

Here we introduce the three datasets we evaluate our model on. The evaluation metric for all datasets is accuracy.

SNLI
The Stanford Natural Language Inference corpus (SNLI; Bowman et al. 2015) has 570k human-annotated sentence pairs. The premise data is drawn from the captions of the Flickr30k corpus, and the hypothesis data is manually composed. The provided labels are "entailment", "neutral", "contradiction" and "-". "-" indicates that the annotators could not reach consensus, so these pairs are removed during training and testing, as in other works. We use the same data split as in Bowman et al. (2015).

MultiNLI
The Multi-Genre NLI Corpus (MultiNLI; Williams et al. 2017) has 433k sentence pairs, whose collection process and task details are modeled closely on SNLI. The premise data is collected from a maximally broad range of genres of American English, such as written non-fiction genres (SLATE, OUP, GOVERNMENT, VERBATIM, TRAVEL), spoken genres (TELEPHONE, FACE-TO-FACE), less formal written genres (FICTION, LETTERS) and a specialized genre for 9/11. Half of these selected genres appear in the training set while the rest do not, creating in-domain (matched) and cross-domain (mismatched) development/test sets. We use the same data split as provided by Williams et al. (2017). Since test set labels are not provided, test performance is obtained through submission on Kaggle.com. Each team is limited to two submissions per day.

Quora question pair
The Quora question pair dataset contains over 400k real-world question pairs selected from Quora.com. A binary annotation indicating match (duplicate) or no match (not duplicate) is provided for each question pair. In our case, a duplicate question pair can be interpreted as the entailment relation and a non-duplicate pair as neutral. We use the same split ratio as mentioned in (Wang et al., 2017).

4.2 EXPERIMENTS SETTING

We implement our algorithm with the TensorFlow (Abadi et al., 2016) framework. An Adadelta optimizer (Zeiler, 2012) with $\rho = 0.95$ and $\epsilon = 1e{-}8$ is used to optimize all the trainable weights. The initial learning rate is set to 0.5 and the batch size to 70. When the model does not improve the best in-domain performance for 30,000 steps, an SGD optimizer with learning rate 3e−4 is used to help the model find a better local optimum. Dropout layers are applied before all linear layers and after the word-embedding layer. We use an exponentially decayed keep rate during training, where the initial keep rate is 1.0 and the decay rate is 0.977 for every 10,000 steps. We initialize our word embeddings with pre-trained 300D GloVe 840B vectors (Pennington et al., 2014), while out-of-vocabulary words are randomly initialized with a uniform distribution. The character embeddings are randomly initialized with 100 dimensions. We crop or pad each token to 16 characters. The 1D convolution kernel size for the character embedding is 5. All weights are constrained by L2 regularization, and the L2 regularization ratio at step $t$ is calculated as follows:

$$L2Ratio_t=\sigma\left(\frac{(t-L2FullStep/2)*8}{L2FullStep/2}\right)*L2FullRatio \tag{8}$$

where $L2FullRatio$ determines the maximum L2 regularization ratio, and $L2FullStep$ determines at which step the maximum L2 regularization ratio is reached. We choose $L2FullRatio$ as 0.9e−5 and $L2FullStep$ as 100,000. The ratio of the L2 penalty on the difference between the two encoder weights is set to 1e−3. For a dense block in the feature extraction layer, the number of layers $n$ is set to 8 and the growth rate $g$ is set to 20. The first scale-down ratio $\eta$ in the feature extraction layer is set to 0.3 and the transition scale-down ratio $\theta$ is set to 0.5. The sequence length is set as a hard cutoff in all experiments: 48 for MultiNLI, 32 for SNLI and 24 for the Quora Question Pair dataset. During the experiments on MultiNLI, we use 15% of the data from SNLI, as in Williams et al. (2017). We select the parameters based on the best development accuracy. Our ensembling approach takes the majority vote of the predictions given by multiple runs of the same model under different random parameter initializations.
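A small sketch of the two training schedules described above: equation (8) for the L2 regularization ratio, and the exponentially decayed dropout keep rate. Whether the keep-rate decay is applied continuously or in discrete 10,000-step jumps is not specified, so the continuous form here is an assumption.

```python
import math

def l2_ratio(t, l2_full_step=100_000, l2_full_ratio=0.9e-5):
    """Equation (8): a sigmoid ramp that approaches l2_full_ratio around step l2_full_step."""
    x = (t - l2_full_step / 2) * 8 / (l2_full_step / 2)
    return l2_full_ratio / (1.0 + math.exp(-x))          # sigma(x) * L2FullRatio

def dropout_keep_rate(t, initial=1.0, decay=0.977, every=10_000):
    """Exponentially decayed keep rate: multiplied by `decay` every `every` steps."""
    return initial * decay ** (t / every)

print(l2_ratio(0), l2_ratio(50_000), l2_ratio(100_000))  # ~0, half of the maximum, ~maximum
print(dropout_keep_rate(100_000))                        # ~0.79 after 100k steps
```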

4.3 EXPERIMENT ON MULTINLI

We compare our result with all other published systems in Table 2. Besides ESIM, the state-of-the-art model on SNLI, all other models appeared at the RepEval 2017 workshop. The RepEval 2017 workshop required all submitted models to be sentence encoding-based models, so alignment between sentences and memory modules were not eligible for the competition. All models except ours share one common feature: they use LSTM as an essential building block of the encoder. Our approach, without using any recurrent structure, achieves a new state-of-the-art performance of 80.0%, exceeding the previous state-of-the-art performance by more than 5%. Unlike the observation from Nangia et al. (2017), we find that out-of-domain test performance is consistently lower than in-domain test performance. Selecting parameters by the best in-domain development accuracy partially contributes to this result.
[Table 2: MultiNLI test accuracy compared with other published systems]

4.4 EXPERIMENT ON SNLI

In Table 3, we compare our model with other models' performance on SNLI. Experiments (2-7) are sentence encoding-based models. Bowman et al. (2016) provide a BiLSTM baseline. Vendrov et al. (2015) adopt a two-layer GRU encoder with pre-trained "skip-thoughts" vectors. To capture sentence-level semantics, Mou et al. (2015) use a tree-based CNN and Bowman et al. (2016) propose a stack-augmented parser-interpreter neural network (SPINN) which incorporates parsing information in a sequential manner. Liu et al. (2016) use intra-attention on top of a BiLSTM to generate sentence representations, and Munkhdalai & Yu (2016) propose a memory-augmented neural network to encode the sentence. The next group of models, experiments (8-18), use cross-sentence features. Rocktaschel et al. (2015) align each sentence word-by-word with attention on top of LSTMs. Wang & Jiang (2015) enforce cross-sentence attention with word-by-word matching using the proposed mLSTM model. Cheng et al. (2016) propose a long short-term memory-network (LSTMN) with deep attention fusion that links the current word to previous words stored in memory. Parikh et al. (2016) decompose the task into sub-problems and conquer them respectively. Yu & Munkhdalai (2017) propose the neural tree indexer, a full n-ary tree whose subtrees can be overlapped. The re-read LSTM proposed by Sha et al. (2016) considers the attention vector of one sentence as the inner state of an LSTM for the other sentence. Chen et al. (2016) propose a sequential model that infers locally, and an ensemble with a tree-like inference module that further improves performance. We show that our model, DIIN, achieves state-of-the-art performance on the competitive leaderboard.

[Table 3: SNLI test accuracy compared with other models]

4.5 EXPERIMENT ON QUORA QUESTION PAIR DATASET

In this subsection, we evaluate the effectiveness of our model for paraphrase identification framed as a natural language inference task. Besides our baselines, we compare with Wang et al. (2017) and Tomar et al. (2017). BiMPM models different perspectives of matching between the sentence pair in both directions, then aggregates the matching vectors with an LSTM. DECATTword and DECATTchar use automatically collected in-domain paraphrase data to noisily pretrain n-gram word embeddings and n-gram subword embeddings, respectively, on the decomposable attention model proposed by Parikh et al. (2016). In Table 4, our experiments show that DIIN has better performance than all other models, and the ensemble score exceeds the former best result by more than 1 percent.
[Table 4: Accuracy on the Quora question pair dataset]
