【论文学习】Bidirectional LSTM-CRF Models for Sequence Tagging(论文翻译)

Bidirectional LSTM-CRF Models for Sequence Tagging(论文翻译)

Abstract

In this paper, we propose a variety of Long Short-Term Memory (LSTM) based models for sequence tagging. These models include LSTM networks, bidirectional LSTM (BI-LSTM) networks, LSTM with a Conditional Random Field (CRF) layer (LSTM-CRF) and bidirectional LSTM with a CRF layer (BI-LSTM-CRF). Our work is the first to apply a bidirectional LSTM CRF (denoted as BI-LSTM-CRF) model to NLP benchmark sequence tagging data sets. We show that the BI-LSTM-CRF model can efficiently use both past and future input features thanks to a bidirectional LSTM component. It can also use sentence level tag information thanks to a CRF layer. The BI-LSTM-CRF model can produce state of the art (or close to) accuracy on POS, chunking and NER data sets. In addition, it is robust and has less dependence on word embedding as compared to previous observations.

在本文中,我们提出了一系列基于长短期记忆(LSTM)网络的序列标注模型,包括LSTM网络、双向LSTM(BI-LSTM)网络、带条件随机场(CRF)层的LSTM(LSTM-CRF),以及带CRF层的双向LSTM(BI-LSTM-CRF)。我们的工作首次将双向LSTM CRF(简称BI-LSTM-CRF)模型应用于NLP基准序列标注数据集。我们证明,得益于双向LSTM组件,BI-LSTM-CRF模型可以有效地利用过去和未来的输入特征;得益于CRF层,它还可以利用句子级的标签信息。BI-LSTM-CRF模型可以在POS、分块和NER数据集上取得最先进(或接近最先进)的精度。此外,与以往的观察相比,该模型更加鲁棒,对词嵌入的依赖更小。

1.Introduction

Sequence tagging including part of speech tagging (POS), chunking, and named entity recognition (NER) has been a classic NLP task. It has drawn research attention for a few decades. The output of taggers can be used for downstream applications. For example, a named entity recognizer trained on user search queries can be utilized to identify which spans of text are products, thus triggering certain products ads. Another example is that such tag information can be used by a search engine to find relevant webpages.

序列标注包括词性标注(POS)、分块和命名实体识别(NER),一直是经典的NLP任务,几十年来持续受到研究关注。标注器的输出可以用于下游应用。例如,一个在用户搜索查询上训练好的命名实体识别器,可以用来识别哪些文本片段是产品,从而触发相应的产品广告;另一个例子是,这类标签信息可以被搜索引擎用来查找相关网页。

Most existing sequence tagging models are linear statistical models which include Hidden Markov Models (HMM), Maximum entropy Markov models (MEMMs) (McCallum et al., 2000), and Conditional Random Fields (CRF) (Lafferty et al., 2001). Convolutional network based models (Collobert et al., 2011) have been recently proposed to tackle sequence tagging problem. We denote such a model as Conv-CRF as it consists of a convolutional network and a CRF layer on the output (the term of sentence level loglikelihood (SSL) was used in the original paper). The Conv-CRF model has generated promising results on sequence tagging tasks. In speech language understanding community, recurrent neural network (Mesnil et al., 2013; Yao et al., 2014) and convolutional nets (Xu and Sarikaya, 2013) based models have been recently proposed. Other relevant work includes (Graves et al., 2005; Graves et al., 2013) which proposed a bidirectional recurrent neural network for speech recognition.

现有的序列标注模型大多是线性统计模型,包括隐马尔可夫模型(HMM)、最大熵马尔可夫模型(MEMMs)(McCallum et al., 2000)和条件随机场(CRF)(Lafferty et al., 2001)。最近有人提出了基于卷积网络的模型(Collobert et al., 2011)来解决序列标注问题,我们把这种模型称为Conv-CRF,因为它由卷积网络和输出端的CRF层组成(原论文使用的术语是句子级对数似然,SSL)。Conv-CRF模型在序列标注任务上取得了不错的结果。在口语语言理解领域,最近也有人提出了基于循环神经网络(Mesnil et al., 2013; Yao et al., 2014)和卷积网络(Xu and Sarikaya, 2013)的模型。其他相关工作包括(Graves et al., 2005; Graves et al., 2013),它们提出了用于语音识别的双向循环神经网络。

In this paper, we propose a variety of neural network based models to sequence tagging task. These models include LSTM networks, bidirectional LSTM networks (BI-LSTM), LSTM networks with a CRF layer (LSTM-CRF), and bidirectional LSTM networks with a CRF layer (BI-LSTM-CRF). Our contributions can be summarized as follows. 1) We systematically compare the performance of aforementioned models on NLP tagging data sets; 2) Our work is the first to apply a bidirectional LSTM CRF (denoted as BI-LSTM-CRF) model to NLP benchmark sequence tagging data sets. This model can use both past and future input features thanks to a bidirectional LSTM component. In addition, this model can use sentence level tag information thanks to a CRF layer. Our model can produce state of the art (or close to) accuracy on POS, chunking and NER data sets; 3) We show that the BI-LSTM-CRF model is robust and it has less dependence on word embedding as compared to previous observations (Collobert et al., 2011). It can produce accurate tagging performance without resorting to word embedding.

本文提出了一系列基于神经网络的序列标注模型,包括LSTM网络、双向LSTM网络(BI-LSTM)、带CRF层的LSTM网络(LSTM-CRF)和带CRF层的双向LSTM网络(BI-LSTM-CRF)。我们的贡献可以总结如下:1)系统比较了上述模型在NLP标注数据集上的性能;2)首次将双向LSTM CRF(简称BI-LSTM-CRF)模型应用于NLP基准序列标注数据集。得益于双向LSTM组件,该模型可以同时利用过去和未来的输入特征;得益于CRF层,该模型还可以利用句子级的标签信息。我们的模型可以在POS、分块和NER数据集上取得最先进(或接近最先进)的精度;3)我们证明BI-LSTM-CRF模型是鲁棒的,与之前的观察(Collobert et al., 2011)相比,它对词嵌入的依赖更小,即使不借助词嵌入也能取得准确的标注性能。

The remainder of the paper is organized as follows. Section 2 describes sequence tagging models used in this paper. Section 3 shows the training procedure. Section 4 reports the experiments results. Section 5 discusses related research. Finally Section 6 draws conclusions.

本文的其余部分内容如下。第2节描述了本文中使用的序列标记模型。第3节展示了训练过程。第4节报告了实验结果。第5节讨论了相关研究。最后第6节得出结论。

2.Models

In this section, we describe the models used in this paper: LSTM, BI-LSTM, CRF, LSTM-CRF and BI-LSTM-CRF.

在本节中,我们描述本文使用的模型:LSTM、BI-LSTM、CRF、LSTM-CRF和BI-LSTM-CRF。

2.1 LSTM Networks

Recurrent neural networks (RNN) have been employed to produce promising results on a variety of tasks including language model (Mikolov et al., 2010; Mikolov et al., 2011) and speech recognition (Graves et al., 2005). A RNN maintains a memory based on history information, which enables the model to predict the current output conditioned on long distance features.

循环神经网络(RNN)已经在语言模型(Mikolov et al., 2010; Mikolov et al., 2011)和语音识别(Graves et al., 2005)等多种任务上取得了良好的效果。RNN基于历史信息维持一个记忆,使模型能够依据长距离特征预测当前输出。

Figure 1 shows the RNN structure (Elman, 1990) which has an input layer x, hidden layer h and output layer y. In named entity tagging context, x represents input features and y represents tags. Figure 1 illustrates a named entity recognition system in which each word is tagged with other (O) or one of four entity types: Person (PER), Location (LOC), Organization (ORG), and Miscellaneous (MISC). The sentence of EU rejects German call to boycott British lamb . is tagged as B-ORG O B-MISC O O O B-MISC O O, where B-, I- tags indicate beginning and intermediate positions of entities.
(图1:一个简单的RNN模型)

图1为RNN结构(Elman, 1990),其中输入层为x,隐藏层为h,输出层为y。在命名实体标注的语境下,x代表输入特征,y代表标签。图1展示了一个命名实体识别系统,其中每个词被标注为其他(O)或四种实体类型之一:人名(PER)、地名(LOC)、机构名(ORG)和其他类(MISC)。句子 "EU rejects German call to boycott British lamb ." 被标注为 B-ORG O B-MISC O O O B-MISC O O,其中 B-、I- 标记分别表示实体的起始位置和中间位置。

An input layer represents features at time t. They could be one-hot-encoding for word feature, dense vector features, or sparse features. An input layer has the same dimensionality as feature size. An output layer represents a probability distribution over labels at time t. It has the same dimensionality as size of labels. Compared to feedforward network, a RNN introduces the connection between the previous hidden state and current hidden state (and thus the recurrent layer weight parameters). This recurrent layer is designed to store history information. The values in the hidden and output layers are computed as follows:

h(t) = f(Ux(t) + Wh(t-1)); (1)

y(t) = g(Vh(t)); (2)

where U, W, and V are the connection weights to be computed in training time, and f(z) and g(z) are sigmoid and softmax activation functions as follows:
$$f(z) = \frac{1}{1+e^{-z}}$$

$$g(z_m) = \frac{e^{z_m}}{\sum_{k} e^{z_k}}$$

输入层表示时刻t的特征,它们可以是词特征的独热编码、稠密向量特征或稀疏特征。输入层的维数与特征数相同。输出层表示时刻t标签上的概率分布,其维数与标签数相同。与前馈网络相比,RNN引入了上一隐状态与当前隐状态之间的连接(从而引入了循环层的权重参数)。该循环层用于存储历史信息。隐藏层和输出层的值计算如下:

h(t) = f(Ux(t) + Wh(t-1)); (1)

y(t) = g(Vh(t)); (2)

其中U、W、V为训练时需要计算的连接权值,f(z)、g(z)为sigmoid、softmax激活函数,如下所示 :
$$f(z) = \frac{1}{1+e^{-z}}$$

$$g(z_m) = \frac{e^{z_m}}{\sum_{k} e^{z_k}}$$
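
下面用几行NumPy给出式(1)(2)的一个最小示意(并非原文内容;维度、初始化均为假设):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))      # 减去最大值,保证数值稳定
    return e / e.sum()

# 假设的维度:输入特征维 4,隐藏层维 3,标签数 2
U = np.random.randn(3, 4)          # 输入 -> 隐藏层权重
W = np.random.randn(3, 3)          # 上一隐状态 -> 当前隐状态权重
V = np.random.randn(2, 3)          # 隐藏层 -> 输出层权重

def rnn_step(x_t, h_prev):
    """对应式(1)(2):h(t) = f(Ux(t) + Wh(t-1)),y(t) = g(Vh(t))。"""
    h_t = sigmoid(U @ x_t + W @ h_prev)
    y_t = softmax(V @ h_t)
    return h_t, y_t

h = np.zeros(3)
for x_t in np.random.randn(5, 4):  # 长度为5的输入特征序列
    h, y = rnn_step(x_t, h)
    print(y)                       # 每个时间步上标签的概率分布
```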

In this paper, we apply Long Short-Term Memory (Hochreiter and Schmidhuber, 1997; Graves et al., 2005) to sequence tagging. Long Short-Term Memory networks are the same as RNNs, except that the hidden layer updates are replaced by purpose-built memory cells. As a result, they may be better at finding and exploiting long range dependencies in the data. Fig. 2 illustrates a single LSTM memory cell (Graves et al., 2005). The LSTM memory cell is implemented as the following:

(图2:一个LSTM记忆单元)

在本文中,我们将长短期记忆网络(Hochreiter and Schmidhuber, 1997; Graves et al., 2005)应用于序列标注。长短期记忆网络与RNN基本相同,只是隐藏层的更新被专门设计的记忆单元所取代。因此,它们可能更善于发现和利用数据中的长距离依赖关系。图2展示了单个LSTM记忆单元(Graves et al., 2005)。LSTM记忆单元的实现如下:

(原文此处为LSTM记忆单元实现公式的图片)
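
原公式图片无法显示。按照Graves et al. (2005)的标准LSTM公式(即正文所描述的带peephole连接、门向量与隐藏向量同维的形式),记忆单元的计算大致如下,符号含义以正文说明为准:

```latex
\begin{aligned}
i_t &= \sigma(W_{xi}x_t + W_{hi}h_{t-1} + W_{ci}c_{t-1} + b_i)\\
f_t &= \sigma(W_{xf}x_t + W_{hf}h_{t-1} + W_{cf}c_{t-1} + b_f)\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc}x_t + W_{hc}h_{t-1} + b_c)\\
o_t &= \sigma(W_{xo}x_t + W_{ho}h_{t-1} + W_{co}c_{t-1} + b_o)\\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```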

where σ is the logistic sigmoid function, and i, f, o and c are the input gate, forget gate, output gate and cell vectors, all of which are the same size as the hidden vector h. The weight matrix subscripts have the meaning as the name suggests. For example, Whi is the hidden-input gate matrix, Wxo is the input-output gate matrix etc. The weight matrices from the cell to gate vectors (e.g. Wci) are diagonal, so element m in each gate vector only receives input from element m of the cell vector.

其中σ是logistic sigmoid函数,i、f、o和c分别是输入门、遗忘门、输出门和单元(cell)向量,它们的大小都与隐藏向量h相同。权重矩阵的下标含义如其名称所示,例如Whi是隐藏层-输入门矩阵,Wxo是输入-输出门矩阵等。从单元到门向量的权重矩阵(如Wci)是对角矩阵,因此每个门向量中的元素m只接收单元向量中元素m的输入。

Fig. 3 shows a LSTM sequence tagging model which employs aforementioned LSTM memory cells (dashed boxes with rounded corners).

图3所示为一个LSTM序列标注模型,它采用了前文所述的LSTM记忆单元(图中带圆角的虚线框)。

(图3:LSTM序列标注模型)

2.2 Bidirectional LSTM Networks

In sequence tagging task, we have access to both past and future input features for a given time, we can thus utilize a bidirectional LSTM network (Figure 4) as proposed in (Graves et al., 2013). In doing so, we can efficiently make use of past features (via forward states) and future features (via backward states) for a specific time frame. We train bidirectional LSTM networks using backpropagation through time (BPTT) (Boden., 2002). The forward and backward passes over the unfolded network over time are carried out in a similar way to regular network forward and backward passes, except that we need to unfold the hidden states for all time steps. We also need a special treatment at the beginning and the end of the data points. In our implementation, we do forward and backward for whole sentences and we only need to reset the hidden states to 0 at the beginning of each sentence. We have batch implementation which enables multiple sentences to be processed at the same time.

在序列标注任务中,对于给定时刻,我们可以同时获得过去和未来的输入特征,因此可以利用(Graves et al., 2013)提出的双向LSTM网络(图4)。这样,对于特定的时间步,我们可以有效地利用过去的特征(通过前向状态)和未来的特征(通过后向状态)。我们使用随时间反向传播(BPTT)(Boden., 2002)来训练双向LSTM网络。对随时间展开的网络所做的前向和后向传递,与常规网络的前向和后向传递类似,只是需要对所有时间步展开隐藏状态。我们还需要在数据点的开始和结束处做特殊处理。在我们的实现中,我们对整个句子做前向和后向计算,只需要在每个句子开始时将隐藏状态重置为0。我们实现了批处理,可以同时处理多个句子。
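
利用PyTorch内置的双向LSTM可以得到一个最小示意(并非原文实现;原文按句子重置隐状态并自行实现了批处理,这里只演示前向/后向状态的拼接,维度均为假设):

```python
import torch
import torch.nn as nn

# 假设:词表大小 1000,嵌入维 50,每个方向隐藏维 100,标签数 9
emb = nn.Embedding(1000, 50)
bilstm = nn.LSTM(input_size=50, hidden_size=100,
                 batch_first=True, bidirectional=True)
proj = nn.Linear(2 * 100, 9)               # 前向/后向状态拼接后映射为标签分数

sentence = torch.randint(0, 1000, (1, 7))  # 一个长度为7的句子(batch=1)
h, _ = bilstm(emb(sentence))               # h: (1, 7, 200),每个位置同时包含过去与未来信息
scores = proj(h)                           # (1, 7, 9):每个词在每个标签上的分数
print(scores.shape)
```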

(图4:双向LSTM序列标注模型)

2.3 CRF networks

There are two different ways to make use of neighbor tag information in predicting current tags. The first is to predict a distribution of tags for each time step and then use beam-like decoding to find optimal tag sequences. The work of maximum entropy classifier (Ratnaparkhi, 1996) and Maximum entropy Markov models (MEMMs) (McCallum et al., 2000) fall in this category. The second one is to focus on sentence level instead of individual positions, thus leading to Conditional Random Fields (CRF) models (Lafferty et al., 2001) (Fig. 5). Note that the inputs and outputs are directly connected, as opposed to LSTM and bidirectional LSTM networks where memory cells/recurrent components are employed .

在预测当前标签时,有两种利用相邻标签信息的不同方法。第一种是对每个时间步预测标签的分布,然后用类似束搜索(beam)的解码来寻找最优标签序列,最大熵分类器(Ratnaparkhi, 1996)和最大熵马尔可夫模型(MEMMs)(McCallum et al., 2000)都属于这一类。第二种是关注整个句子层面而不是单个位置,由此得到条件随机场(CRF)模型(Lafferty et al., 2001)(图5)。注意,这里输入和输出是直接相连的,这与使用记忆单元/循环组件的LSTM和双向LSTM网络不同。

(图5:CRF序列标注模型)

2.4 LSTM-CRF networks

We combine a LSTM network and a CRF network to form a LSTM-CRF model, which is shown in Fig. 6. This network can efficiently use past input features via a LSTM layer and sentence level tag information via a CRF layer. A CRF layer is represented by lines which connect consecutive output layers. A CRF layer has a state transition matrix as parameters. With such a layer, we can efficiently use past and future tags to predict the current tag, which is similar to the use of past and future input features via a bidirectional LSTM network. We consider the matrix of scores $f_\theta([x]_1^T)$ which is output by the network. We drop the input $[x]_1^T$ for notation simplification. The element $[f_\theta]_{i,t}$ of the matrix is the score output by the network with parameters $\theta$, for the sentence $[x]_1^T$ and for the $i$-th tag, at the $t$-th word. We introduce a transition score $[A]_{i,j}$ to model the transition from the $i$-th state to the $j$-th state for a pair of consecutive time steps. Note that this transition matrix is position independent. We now denote the new parameters for our network as $\tilde{\theta} = \theta \cup \{[A]_{i,j}\ \forall i,j\}$. The score of a sentence $[x]_1^T$ along with a path of tags $[i]_1^T$ is then given by the sum of transition scores and network scores:

我们将LSTM网络与CRF网络相结合,形成LSTM-CRF模型,如图6所示。该网络可以通过LSTM层有效地利用过去的输入特征,通过CRF层利用句子级的标签信息。CRF层在图中表示为连接相邻输出层的连线,它以状态转移矩阵作为参数。有了这样一层,我们可以有效地利用过去和未来的标签来预测当前标签,这类似于通过双向LSTM网络利用过去和未来的输入特征。我们考虑由网络输出的分数矩阵 $f_\theta([x]_1^T)$,为简化记号省略输入 $[x]_1^T$。矩阵元素 $[f_\theta]_{i,t}$ 是参数为 $\theta$ 的网络对句子 $[x]_1^T$ 在第t个词上取第i个标签所输出的分数。我们引入转移分数 $[A]_{i,j}$,用来建模相邻两个时间步从第i个状态到第j个状态的转移;注意该转移矩阵与位置无关。现在我们把网络的新参数记为 $\tilde{\theta} = \theta \cup \{[A]_{i,j}\ \forall i,j\}$。一个句子 $[x]_1^T$ 连同一条标签路径 $[i]_1^T$ 的得分由转移分数与网络分数之和给出:

(原文此处为句子得分公式的图片)
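
按照Collobert et al. (2011)中句子级对数似然的写法,被替换的公式大致为(符号以正文为准):

```latex
s\big([x]_1^T,\, [i]_1^T,\, \tilde{\theta}\big)
  \;=\; \sum_{t=1}^{T} \Big( [A]_{[i]_{t-1},[i]_t} \;+\; [f_\theta]_{[i]_t,\, t} \Big)
```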

Dynamic programming (Rabiner, 1989) can be used to efficiently compute $[A]_{i,j}$ and the optimal tag sequences for inference. See (Lafferty et al., 2001) for details.

(图6:LSTM-CRF模型)

动态规划(Rabiner, 1989)可以被用来高效地计算 $[A]_{i,j}$ 以及推理时的最优标签序列,详见(Lafferty et al., 2001)。
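
下面是一个用动态规划(Viterbi)在网络输出分数 f 与转移矩阵 A 上解码最优标签序列的Python示意(并非原文代码,输入数据为随机假设):

```python
import numpy as np

def viterbi_decode(f, A):
    """f: (T, K) 每个位置、每个标签的网络分数;A: (K, K) 转移分数 [A]_{i,j}。
    返回总分(转移分数 + 网络分数)最高的标签路径。"""
    T, K = f.shape
    dp = np.zeros((T, K))                  # dp[t, j]: 以标签 j 结尾的前 t+1 个词的最优得分
    back = np.zeros((T, K), dtype=int)     # 回溯指针
    dp[0] = f[0]
    for t in range(1, T):
        cand = dp[t - 1][:, None] + A      # cand[i, j] = dp[t-1, i] + A[i, j]
        back[t] = cand.argmax(axis=0)
        dp[t] = cand.max(axis=0) + f[t]
    best = [int(dp[-1].argmax())]
    for t in range(T - 1, 0, -1):          # 回溯得到完整路径
        best.append(int(back[t][best[-1]]))
    return best[::-1]

path = viterbi_decode(np.random.randn(5, 3), np.random.randn(3, 3))   # 5个词、3种标签
print(path)
```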

2.5 BI-LSTM-CRF networks

Similar to a LSTM-CRF network, we combine a bidirectional LSTM network and a CRF network to form a BI-LSTM-CRF network (Fig. 7). In addition to the past input features and sentence level tag information used in a LSTM-CRF model, a BILSTM-CRF model can use the future input features. The extra features can boost tagging accuracy as we will show in experiments.

与LSTM-CRF网络类似,我们将双向LSTM网络与CRF网络相结合,形成BI-LSTM-CRF网络(图7)。除了LSTM-CRF模型所用的过去输入特征和句子级标签信息之外,BI-LSTM-CRF模型还可以利用未来的输入特征。正如我们将在实验中展示的,这些额外的特征可以提升标注准确率。

(图7:BI-LSTM-CRF模型)

3.Training procedure

All models used in this paper share a generic SGD forward and backward training procedure. We choose the most complicated model, BI-LSTM-CRF, to illustrate the training algorithm as shown in Algorithm 1. In each epoch, we divide the whole training data to batches and process one batch at a time. Each batch contains a list of sentences which is determined by the parameter of batch size. In our experiments, we use batch size of 100 which means to include sentences whose total length is no greater than 100. For each batch, we first run bidirectional LSTM-CRF model forward pass which includes the forward pass for both forward state and backward state of LSTM. As a result, we get the output score $f_\theta([x]_1^T)$ for all tags at all positions. We then run CRF layer forward and backward pass to compute gradients for network output and state transition edges. After that, we can back propagate the errors from the output to the input, which includes the backward pass for both forward and backward states of LSTM. Finally we update the network parameters which include the state transition matrix $[A]_{i,j}\ \forall i,j$, and the original bidirectional LSTM parameters $\theta$.

本文使用的所有模型都共享一个通用的SGD前向与后向训练过程。我们选择最复杂的BI-LSTM-CRF模型来说明训练算法,如算法1所示。在每个epoch中,我们把全部训练数据分成若干批,每次处理一批。每批包含若干句子,句子数量由batch size参数决定;在我们的实验中batch size取100,即每批包含总长度不超过100的句子。对于每一批,我们首先运行双向LSTM-CRF模型的前向传递,其中包括LSTM前向状态和后向状态的前向传递,从而得到所有位置上所有标签的输出分数 $f_\theta([x]_1^T)$。然后运行CRF层的前向和后向传递,计算网络输出和状态转移边的梯度。之后,我们把误差从输出反向传播到输入,其中包括LSTM前向和后向状态的后向传递。最后更新网络参数,包括状态转移矩阵 $[A]_{i,j}\ \forall i,j$ 和双向LSTM原有的参数 $\theta$。

(算法1:BI-LSTM-CRF模型的训练过程)
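
与算法1对应,下面给出一个可运行的极简训练示意(并非原文实现;词表、维度与数据均为随机假设,CRF负对数似然用对数域前向算法计算):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
V, D, H, K = 1000, 50, 100, 9              # 词表大小、嵌入维、每方向隐藏维、标签数(假设)

emb = nn.Embedding(V, D)
bilstm = nn.LSTM(D, H, batch_first=True, bidirectional=True)
proj = nn.Linear(2 * H, K)                 # 每个位置、每个标签的分数 f_theta
A = nn.Parameter(torch.randn(K, K))        # CRF 状态转移矩阵 [A]_{i,j}
params = list(emb.parameters()) + list(bilstm.parameters()) + list(proj.parameters()) + [A]
optimizer = torch.optim.SGD(params, lr=0.1)    # 原文使用学习率 0.1

def crf_nll(f, tags):
    """CRF 负对数似然 = log Z - 金标签路径得分(前向算法在对数域计算 log Z)。"""
    T = f.size(0)
    gold = f[torch.arange(T), tags].sum() + A[tags[:-1], tags[1:]].sum()
    alpha = f[0]
    for t in range(1, T):
        alpha = torch.logsumexp(alpha.unsqueeze(1) + A, dim=0) + f[t]
    return torch.logsumexp(alpha, dim=0) - gold

for step in range(3):                      # 仅演示几步 SGD
    sent = torch.randint(0, V, (1, 7))     # 随机"句子",长度 7
    tags = torch.randint(0, K, (7,))       # 随机"金标签"
    optimizer.zero_grad()
    h, _ = bilstm(emb(sent))               # 双向LSTM前向传递(前向 + 后向状态)
    f = proj(h).squeeze(0)                 # (7, K)
    loss = crf_nll(f, tags)                # CRF 层前向,得到损失
    loss.backward()                        # 误差反向传播到 LSTM 与转移矩阵
    optimizer.step()                       # 更新 theta 与 [A]_{i,j}
    print(float(loss))
```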

4.Experiments

4.1 Data

We test LSTM, BI-LSTM, CRF, LSTM-CRF, and BI-LSTM-CRF models on three NLP tagging tasks: Penn TreeBank (PTB) POS tagging, CoNLL 2000 chunking, and CoNLL 2003 named entity tagging. Table 1 shows the size of sentences, tokens, and labels for training, validation and test sets respectively.

我们在三个NLP标注任务上测试LSTM、BI-LSTM、CRF、LSTM-CRF和BI-LSTM-CRF模型:Penn TreeBank (PTB) 词性标注、CoNLL 2000分块和CoNLL 2003命名实体标注。表1分别列出了训练集、验证集和测试集的句子数、词(token)数和标签数。

(表1:训练集、验证集与测试集的句子数、词数与标签数)

POS assigns each word with a unique tag that indicates its syntactic role. In chunking, each word is tagged with its phrase type. For example, tag B-NP indicates a word starting a noun phrase. In NER task, each word is tagged with other or one of four entity types: Person, Location, Organization, or Miscellaneous. We use the BIO2 annotation standard for chunking and NER tasks.

POS为每个词分配一个唯一的标签来指示其句法角色。在分块任务中,每个词被标注上其所属短语的类型,例如标签B-NP表示该词是一个名词短语的开始。在NER任务中,每个词被标注为其他(O)或四种实体类型之一:人名(Person)、地名(Location)、机构名(Organization)或其他类(Miscellaneous)。我们在分块和NER任务中使用BIO2标注标准。

4.2 Features

We extract the same types of features for three data sets. The features can be grouped as spelling features and context features. As a result, we have 401K, 76K, and 341K features extracted for POS, chunking and NER data sets respectively. These features are similar to the features extracted from Stanford NER tool (Finkel et al., 2005; Wang and Manning, 2013). Note that we did not use extra data for POS and chunking tasks, with the exception of using Senna embedding (see Section 4.2.3). For NER task, we report performance with spelling and context features, and also incrementally with Senna embedding and Gazetteer features.

我们为三个数据集提取相同类型的特征,这些特征可分为拼写特征和上下文特征。我们为POS、分块和NER数据集分别提取了401K、76K和341K个特征,这些特征与Stanford NER工具所提取的特征类似(Finkel et al., 2005; Wang and Manning, 2013)。注意,除了使用Senna词嵌入外(见4.2.3节),我们在POS和分块任务上没有使用额外数据。对于NER任务,我们报告使用拼写和上下文特征的性能,并逐步加入Senna词嵌入和Gazetteer特征后的性能。

4.2.1 Spelling features

We extract the following features for a given word in addition to the lower case word features.

• whether start with a capital letter

• whether has all capital letters

• whether has all lower case letters

• whether has non initial capital letters

• whether mix with letters and digits

• whether has punctuation

• letter prefixes and suffixes (with window size of 2 to 5)

• whether has apostrophe end (’s)

• letters only, for example, I. B. M. to IBM

• non-letters only, for example, A. T. &T. to …&

• word pattern feature, with capital letters, lower case letters, and digits mapped to ‘A’, ‘a’ and ‘0’ respectively, for example, D56y-3 to A00a-0

• word pattern summarization feature, similar to word pattern feature but with consecutive identical characters removed. For example, D56y-3 to A0a-0

除了小写的单词特征外,我们还提取给定单词的下列特征。

  • 是否以大写字母开头
  • 是否有所有大写字母
  • 是否有所有小写字母
  • 是否有非首字母大写
  • 是否字母和数字混合
  • 是否有标点符号
  • 字母前缀和后缀(窗口大小为2到5)
  • 是否有撇号结尾(’ s)
  • 仅保留字母,例如 I. B. M. → IBM
  • 仅保留非字母,例如 A. T. &T. → …&
  • 单词模式特征:大写字母、小写字母和数字分别映射为“A”、“a”和“0”,例如 D56y-3 → A00a-0
  • 单词模式总结特征:与单词模式特征类似,但去掉连续重复的字符,例如 D56y-3 → A0a-0(这两个特征的构造见下面的示例代码)
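
下面用几行Python演示上面“单词模式”与“单词模式总结”两个特征的构造方式(并非原文代码,仅作示意):

```python
import re

def word_pattern(w):
    """大写字母映射为 'A',小写字母映射为 'a',数字映射为 '0',其余字符保持不变。"""
    out = []
    for ch in w:
        if ch.isupper():
            out.append('A')
        elif ch.islower():
            out.append('a')
        elif ch.isdigit():
            out.append('0')
        else:
            out.append(ch)
    return ''.join(out)

def word_pattern_summary(w):
    """在单词模式的基础上,把连续相同的字符合并为一个。"""
    return re.sub(r'(.)\1+', r'\1', word_pattern(w))

print(word_pattern('D56y-3'))           # A00a-0
print(word_pattern_summary('D56y-3'))   # A0a-0
```
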
4.2.2 Context features

For word features in three data sets, we use unigram features and bi-grams features. For POS features in CoNLL2000 data set and POS & CHUNK features in CoNLL2003 data set, we use unigram, bi-gram and tri-gram features.

对于三个数据集中的词特征,我们使用unigram和bi-gram特征。对于CoNLL2000数据集中的POS特征以及CoNLL2003数据集中的POS和CHUNK特征,我们使用unigram、bi-gram和tri-gram特征。
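
一个构造词的 unigram / bi-gram 上下文特征的简化示意(窗口大小与特征命名均为假设,并非原文实现):

```python
def context_features(words, t, window=2):
    """以位置 t 为中心,抽取窗口内的 unigram 特征和相邻词的 bi-gram 特征。"""
    feats, n = [], len(words)
    for off in range(-window, window + 1):
        if 0 <= t + off < n:
            feats.append(f"w[{off}]={words[t + off]}")                                    # unigram
        if 0 <= t + off < n - 1:
            feats.append(f"w[{off}]|w[{off + 1}]={words[t + off]}|{words[t + off + 1]}")  # bi-gram
    return feats

print(context_features("EU rejects German call".split(), 1))
```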

4.2.3 Word embedding

It has been shown in (Collobert et al., 2011) that word embedding plays a vital role to improve sequence tagging performance. We downloaded the embedding which has 130K vocabulary size and each word corresponds to a 50-dimensional embedding vector. To use this embedding, we simply replace the one hot encoding word representation with its corresponding 50-dimensional vector.

研究表明(Collobert et al., 2011),词嵌入对提高序列标注性能起着至关重要的作用。我们下载了词表大小为130K的词嵌入,每个词对应一个50维的嵌入向量。要使用这种嵌入,只需把词的独热编码表示替换为其对应的50维向量。
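
用嵌入向量替换独热表示的一个小示意(词表与嵌入矩阵均为随机假设;Senna嵌入为50维):

```python
import numpy as np

vocab = {"EU": 0, "rejects": 1, "German": 2, "call": 3}   # 假设的小词表
E = np.random.randn(len(vocab), 50)                        # 每个词对应一个50维嵌入向量

def embed(word):
    """独热向量 x 与嵌入矩阵 E 相乘等价于按行查表:x^T E = E[index]。"""
    return E[vocab[word]]

print(embed("German").shape)   # (50,)
```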

4.2.4 Features connection tricks

We can treat spelling and context features the same as word features. That is, the inputs of networks include both word, spelling and context features. However, we find that direct connections from spelling and context features to outputs accelerate training and they result in very similar tagging accuracy. Fig. 8 illustrates this network in which features have direct connections to outputs of networks. We will report all tagging accuracy using this connection. We note that this usage of features has the same flavor of Maximum Entropy features as used in (Mikolov et al., 2011). The difference is that features collision may occur in (Mikolov et al., 2011) as feature hashing technique has been adopted. Since the output labels in sequence tagging data sets are less than that of language model (usually hundreds of thousands), we can afford to have full connections between features and outputs to avoid potential feature collisions.

我们可以把拼写特征和上下文特征与词特征同样对待,即网络的输入同时包含词特征、拼写特征和上下文特征。然而我们发现,把拼写和上下文特征直接连接到输出层可以加速训练,并且得到非常接近的标注准确率。图8展示了这种特征与网络输出直接相连的网络,我们将使用这种连接方式报告所有标注准确率。我们注意到,这种特征用法与(Mikolov et al., 2011)中的最大熵特征风格类似;不同之处在于(Mikolov et al., 2011)采用了特征哈希技术,可能发生特征冲突(feature collision)。由于序列标注数据集的输出标签数量远小于语言模型(语言模型的输出通常有几十万个),我们可以在特征与输出之间使用全连接,从而避免潜在的特征冲突。
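
特征直连输出层的一个示意:每个位置的标签分数 = 双向LSTM隐状态的线性投影 + 拼写/上下文稀疏特征到输出层的直接线性映射(维度与写法均为假设,并非原文实现):

```python
import torch
import torch.nn as nn

H, F, K = 200, 1000, 9                 # 双向LSTM输出维、稀疏特征数(实际为数十万)、标签数
proj_h = nn.Linear(H, K)               # 隐状态 -> 标签分数
proj_f = nn.Linear(F, K, bias=False)   # 稀疏特征 -> 标签分数(直连,类似最大熵特征)

h_t = torch.randn(H)                   # 某个位置的双向LSTM输出
feat = torch.zeros(F)
feat[[12, 345, 678]] = 1.0             # 该位置被激活的稀疏特征(假设的特征编号)
scores = proj_h(h_t) + proj_f(feat)    # 两路分数直接相加作为该位置的标签分数
print(scores.shape)                    # torch.Size([9])
```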

(图8:拼写与上下文特征直接连接到输出层的BI-LSTM-CRF模型)

4.3 Results

We train LSTM, BI-LSTM, CRF, LSTM-CRF and BI-LSTM-CRF models for each data set. We have two ways to initialize word embedding: Random and Senna. We randomly initialize the word embedding vectors in the first category, and use Senna word embedding in the second category. For each category, we use identical feature sets, thus different results are solely due to different networks. We train models using training data and monitor performance on validation data. As chunking data do not have a validation data set, we use part of training data for validation purpose .

我们为每个数据集训练LSTM、BI-LSTM、CRF、LSTM-CRF和BI-LSTM-CRF模型。我们有两种初始化词嵌入的方式:随机初始化(Random)和Senna词嵌入(Senna)。第一类中我们随机初始化词嵌入向量,第二类中使用Senna词嵌入。对于每一类,我们使用完全相同的特征集,因此结果的差异完全来自网络的不同。我们用训练数据训练模型,并在验证数据上监控性能。由于分块任务没有验证集,我们使用部分训练数据用于验证。

We use a learning rate of 0.1 to train models. We set hidden layer size to 300 and found that model performance is not sensitive to hidden layer sizes. The training for three tasks require less than 10 epochs to converge and it in general takes less than a few hours. We report models’ performance on test datasets in Table 2, which also lists the best results in (Collobert et al., 2011), denoted as Conv-CRF. The POS task is evaluated by computing per-word accuracy, while the chunk and NER tasks are evaluated by computing F1 scores over chunks.

我们使用0.1的学习率训练模型,并把隐藏层大小设为300,发现模型性能对隐藏层大小并不敏感。三个任务的训练都在不到10个epoch内收敛,通常耗时不超过几个小时。我们在表2中报告各模型在测试集上的性能,表中同时列出了(Collobert et al., 2011)的最佳结果,记为Conv-CRF。POS任务以每词准确率评估,分块和NER任务以基于块(chunk)的F1分数评估。

(表2:各模型在POS、分块与NER数据集上的标注性能比较)

4.3.1 Comparison with Conv-CRF networks

We have three baselines: LSTM, BI-LSTM and CRF. LSTM is the weakest baseline for all three data sets. The BI-LSTM performs close to CRF on POS and chunking datasets, but is worse than CRF on NER data set. The CRF forms strong baselines in our experiments. For random category, CRF models outperform Conv-CRF models for all three data sets. For Senna category, CRFs outperform Conv-CRF for POS task, while underperform for chunking and NER task. LSTM-CRF models outperform CRF models for all data sets in both random and Senna categories. This shows the effectiveness of the forward state LSTM component in modeling sequence data. The BI-LSTMCRF models further improve LSTM-CRF models and they lead to the best tagging performance for all cases except for POS data at random category, in which LSTM-CRF model is the winner. The numbers in parentheses for CoNLL 2003 under Senna categories are generated with Gazetteer features.

我们有三个基线:LSTM、BI-LSTM和CRF。LSTM是三个数据集上最弱的基线。BI-LSTM在POS和分块数据集上的性能接近CRF,但在NER数据集上不如CRF。CRF在我们的实验中构成了较强的基线。在随机初始化类别下,CRF模型在全部三个数据集上都优于Conv-CRF模型;在Senna类别下,CRF在POS任务上优于Conv-CRF,而在分块和NER任务上表现不如Conv-CRF。LSTM-CRF模型在随机和Senna两个类别的所有数据集上都优于CRF模型,这说明了前向状态LSTM组件在序列数据建模中的有效性。BI-LSTM-CRF模型进一步改进了LSTM-CRF模型:除随机类别下的POS数据(此时LSTM-CRF最佳)之外,它在所有情况下都取得了最好的标注性能。Senna类别下CoNLL 2003括号中的数字是加入Gazetteer特征后得到的。

It is interesting that our best model BI-LSTMCRF has less dependence on Senna word embedding compared to Conv-CRF model. For example, the tagging difference between BI-LSTMCRF model for random and Senna categories are 0.12%, 0.33%, and 4.57% for POS, chunking and NER data sets respectively. In contrast, the ConvCRF model heavily relies on Senna embedding to get good tagging accuracy. It has the tagging difference of 0.92%, 3.99% and 7.20% between random and Senna category for POS, chunking and NER data sets respectively.

有趣的是,与Conv-CRF模型相比,我们最好的模型BI-LSTM-CRF对Senna词嵌入的依赖更小。例如,BI-LSTM-CRF模型在随机与Senna两个类别之间的标注差异,在POS、分块和NER数据集上分别仅为0.12%、0.33%和4.57%。相比之下,Conv-CRF模型严重依赖Senna词嵌入才能获得良好的标注精度:它在随机与Senna类别之间的差异在POS、分块和NER数据集上分别为0.92%、3.99%和7.20%。

4.3.2 Model robustness

To estimate the robustness of models with respect to engineered features (spelling and context features), we train LSTM, BI-LSTM, CRF, LSTMCRF, and BI-LSTM-CRF models with word features only (spelling and context features removed). Table 3 shows tagging performance of proposed models for POS, chunking, and NER data sets using Senna word embedding. The numbers in parentheses indicate the performance degradation compared to the same models but using spelling and context features. CRF models’ performance is significantly degraded with the removal of spelling and context features. This reveals the fact that CRF models heavily rely on engineered features to obtain good performance. On the other hand, LSTM based models, especially BI-LSTM and BI-LSTM-CRF models are more robust and they are less affected by the removal of engineering features. For all three tasks, BI-LSTM-CRF models result in the highest tagging accuracy. For example, It achieves the F1 score of 94.40 for CoNLL2000 chunking, with slight degradation (0.06) compared to the same model but using spelling and context features.

为了评估模型对人工特征(拼写和上下文特征)的鲁棒性,我们仅使用词特征(去除拼写和上下文特征)来训练LSTM、BI-LSTM、CRF、LSTM-CRF和BI-LSTM-CRF模型。表3展示了各模型在使用Senna词嵌入时,在POS、分块和NER数据集上的标注性能;括号中的数字表示与使用拼写和上下文特征的相同模型相比的性能下降。去除拼写和上下文特征后,CRF模型的性能显著下降,这说明CRF模型严重依赖人工特征才能获得良好性能。另一方面,基于LSTM的模型,尤其是BI-LSTM和BI-LSTM-CRF模型,更加鲁棒,受去除人工特征的影响更小。在全部三个任务上,BI-LSTM-CRF模型都取得了最高的标注精度,例如在CoNLL2000分块任务上取得了94.40的F1分数,与使用拼写和上下文特征的相同模型相比只有轻微下降(0.06)。

(表3:仅使用词特征(去除拼写与上下文特征)时各模型的标注性能)

4.3.3 Comparison with existing systems

For POS data set, we achieved state of the art tagging accuracy with or without the use of extra data resource. POS data set has been extensively tested and the past improvement can be realized in Table 4. Our test accuracy is 97.55% which is significantly better than others in the confidence level of 95%. In addition, our BI-LSTM-CRF model already reaches a good accuracy without the use of the Senna embedding.

对于POS数据集,无论是否使用额外的数据资源,我们都达到了最先进的标注精度。POS数据集已被广泛测试,历史上的改进可以在表4中看到。我们的测试准确率为97.55%,在95%的置信水平上显著优于其他系统。此外,即使不使用Senna词嵌入,我们的BI-LSTM-CRF模型也已达到了很好的准确率。

(表4:POS标注准确率与现有系统的比较)

All chunking systems performance is shown in table 5. Kudo et al. won the CoNLL 2000 challenge with a F1 score of 93.48%. Their approach was a SVM based classifier. They later improved the results up to 93.91%. Recent work include the CRF based models (Sha and Pereira, 2003; Mcdonald et al., 2005; Sun et al., 2008). More recent is (Shen and Sarkar, 2005) which obtained 95.23% accuracy with a voting classifier scheme, where each classifier is trained on different tag representations (IOB, IOE, etc.). Our model outperforms all reported systems except (Shen and Sarkar, 2005).

所有分块系统的性能如表5所示。Kudo等人以93.48%的F1分数赢得了CoNLL 2000评测,他们的方法是基于SVM的分类器,后来又把结果提高到93.91%。近期的工作包括基于CRF的模型(Sha and Pereira, 2003; Mcdonald et al., 2005; Sun et al., 2008)。更近的(Shen and Sarkar, 2005)采用投票分类器方案获得了95.23%的准确率,其中每个分类器在不同的标签表示(IOB、IOE等)上训练。除(Shen and Sarkar, 2005)外,我们的模型优于所有已报告的系统。

(表5:分块F1分数与现有系统的比较)

The performance of all systems for NER is shown in table 6. (Florian et al., 2003) presented the best system at the NER CoNLL 2003 challenge, with 88.76% F1 score. They used a combination of various machine-learning classifiers. The second best performer of CoNLL 2003 (Chieu., 2003) was 88.31% F1, also with the help of an external gazetteer. Later, (Ando and Zhang., 2005) reached 89.31% F1 with a semi-supervised approach. The best F1 score of 90.90% was reported in (Passos et al., 2014) which employed a new form of learning word embeddings that can leverage information from relevant lexicons to improve the representations. Our model can achieve the best F1 score of 90.10 with both Senna embedding and gazetteer features. It has a lower F1 score than (Passos et al., 2014) , which may be due to the fact that different word embeddings were employed. With the same Senna embedding, BI-LSTM-CRF slightly outperforms Conv-CRF (90.10% vs. 89.59%). However, BI-LSTM-CRF significantly outperforms Conv-CRF (84.26% vs. 81.47%) if random embedding is used.

NER各系统的性能如表6所示。(Florian et al., 2003)在CoNLL 2003 NER评测中提出了当时最好的系统,F1为88.76%,他们使用了多种机器学习分类器的组合。CoNLL 2003第二好的系统(Chieu, 2003)F1为88.31%,同样借助了外部gazetteer。后来,(Ando and Zhang, 2005)用半监督方法达到了89.31%的F1。目前报告的最好F1为90.90%(Passos et al., 2014),其采用了一种新的词嵌入学习方式,能够利用相关词典的信息来改进词的表示。我们的模型在同时使用Senna词嵌入和gazetteer特征时可以取得90.10的最佳F1分数,低于(Passos et al., 2014),这可能是因为使用了不同的词嵌入。在同样使用Senna词嵌入的情况下,BI-LSTM-CRF略优于Conv-CRF(90.10% vs. 89.59%);而在使用随机初始化嵌入时,BI-LSTM-CRF显著优于Conv-CRF(84.26% vs. 81.47%)。

(表6:NER F1分数与现有系统的比较)

5.Discussions

Our work is close to the work of (Collobert et al., 2011) as both of them utilized deep neural networks for sequence tagging. While their work used convolutional neural networks, ours used bidirectional LSTM networks.

我们的工作与(Collobert et al., 2011)的工作非常接近,因为都使用深层神经网络进行序列标记。他们的工作使用卷积神经网络,而我们的工作使用双向LSTM网络。

Our work is also close to the work of (Hammerton, 2003; Yao et al., 2014) as all of them employed LSTM network for tagging. The performance in (Hammerton, 2003) was not impressive. The work in (Yao et al., 2014) did not make use of bidirectional LSTM and CRF layers and thus the tagging accuracy may suffer.

我们的工作也与(Hammerton, 2003; Yao et al., 2014)的工作接近,因为它们都使用LSTM网络进行标注。(Hammerton, 2003)的性能并不突出;(Yao et al., 2014)的工作没有使用双向LSTM和CRF层,因此标注精度可能受到影响。

Finally, our work is related to the work of (Wang and Manning, 2013) which concluded that non-linear architecture offers no benefits in a highdimensional discrete feature space. We showed that with the bi-directional LSTM CRF model, we consistently obtained better tagging accuracy than a single CRF model with identical feature sets.

最后,我们的工作与(Wang and Manning, 2013)有关,他们的结论是非线性结构在高维离散特征空间中没有优势。我们的结果表明,使用双向LSTM CRF模型,在相同特征集下,我们始终能获得比单个CRF模型更好的标注精度。

6.Conclusions

In this paper, we systematically compared the performance of LSTM networks based models for sequence tagging. We presented the first work of applying a BI-LSTM-CRF model to NLP benchmark sequence tagging data. Our model can produce state of the art (or close to) accuracy on POS, chunking and NER data sets. In addition, our model is robust and it has less dependence on word embedding as compared to the observation in (Collobert et al., 2011). It can achieve accurate tagging accuracy without resorting to word embedding.

本文系统地比较了基于LSTM网络的序列标注模型的性能。我们首次将BI-LSTM-CRF模型应用于NLP基准序列标注数据。我们的模型可以在POS、分块和NER数据集上取得最先进(或接近最先进)的准确率。此外,与(Collobert et al., 2011)中的观察相比,我们的模型更加鲁棒,对词嵌入的依赖更小;即使不借助词嵌入,它也能实现准确的标注。

(注:若有错误希望大家指出!)
