Reading notes: 《Convolutional Neural Networks for Sentence Classification》

Task: sentence-level classification with a CNN.

Word vectors, wherein words are projected from a sparse, 1-of-V encoding (V is the vocabulary size) onto a lower-dimensional vector space via a hidden layer, are essentially feature extractors that encode semantic features of words in their dimensions. In such dense representations, semantically close words are likewise close (in Euclidean or cosine distance) in the lower-dimensional vector space.
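As a minimal numpy sketch (toy sizes; the matrix `W` and `word_id` are made up for illustration), the 1-of-V projection through a hidden layer reduces to a row lookup in the weight matrix:

```python
import numpy as np

V, k = 10000, 300          # vocabulary size, embedding dimension (toy choices)
W = np.random.randn(V, k)  # hidden-layer projection matrix, one row per word

word_id = 42
one_hot = np.zeros(V)
one_hot[word_id] = 1.0     # sparse 1-of-V encoding

dense = one_hot @ W        # projection onto the k-dimensional space
assert np.allclose(dense, W[word_id])  # identical to a simple row lookup
```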

CNNs use layers with convolving filters that are applied to local features. This paper trains a simple CNN with one layer of convolution on top of word vectors obtained from an unsupervised neural language model; the word vectors come from Mikolov et al. (2013).

The word vectors are first kept static while the other parameters of the model are learned. With very little hyperparameter tuning, this simple model achieves excellent results on multiple benchmarks, suggesting that the pre-trained vectors are "universal" feature extractors that can be reused across classification tasks.

Learning task-specific vectors through fine-tuning yields further improvements.

Finally, the paper describes a simple modification to the architecture that allows the use of both pre-trained and task-specific vectors by having multiple channels.


Model:

Let $x_i \in \mathbb{R}^k$ be the $k$-dimensional word vector corresponding to the $i$-th word in the sentence. A sentence of length $n$ (padded where necessary) is represented as

$$x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n,$$

where $\oplus$ is the concatenation operator.
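A toy numpy sketch of this representation (dimensions chosen arbitrarily), padding with zero vectors where necessary:

```python
import numpy as np

k, n = 4, 7                                     # embedding dim, target length
words = [np.random.randn(k) for _ in range(5)]  # a 5-word sentence
words += [np.zeros(k)] * (n - len(words))       # pad with zero vectors up to n

x = np.concatenate(words)                       # x_{1:n} = x_1 ⊕ x_2 ⊕ ... ⊕ x_n
assert x.shape == (n * k,)
```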

A convolution operation involves a filter $w \in \mathbb{R}^{hk}$, which is applied to a window of $h$ words to produce a new feature. For example, a feature $c_i$ is generated from a window of words $x_{i:i+h-1}$ by:

$$c_i = f(w \cdot x_{i:i+h-1} + b).$$

Here $b \in \mathbb{R}$ is a bias term and $f$ is a non-linear function such as the hyperbolic tangent. This filter is applied to each possible window of words in the sentence $\{x_{1:h}, x_{2:h+1}, \dots, x_{n-h+1:n}\}$ to produce a feature map

$$c = [c_1, c_2, \dots, c_{n-h+1}], \quad c \in \mathbb{R}^{n-h+1}.$$
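A plain-numpy sketch of this convolution (random filter and sentence, tanh as $f$; all values are illustrative):

```python
import numpy as np

k, n, h = 4, 7, 3                 # embedding dim, sentence length, window size
x = np.random.randn(n, k)         # sentence as an n x k matrix of word vectors
w = np.random.randn(h * k)        # filter over a window of h words
b = 0.1                           # bias term

# Slide the filter over every window x_{i:i+h-1}; c has length n - h + 1.
c = np.array([np.tanh(w @ x[i:i + h].ravel() + b) for i in range(n - h + 1)])
assert c.shape == (n - h + 1,)
```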


A max-over-time pooling operation is then applied over the feature map, taking the maximum value $\hat{c} = \max\{c\}$ as the feature corresponding to this particular filter. The idea is to capture the most important feature (the one with the highest value) for each feature map. This pooling scheme also deals naturally with variable sentence lengths, as sketched below.
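The pooling step is a single max over each feature map; because the max is taken "over time", feature maps from sentences of different lengths yield a fixed-size output (a sketch with made-up feature maps):

```python
import numpy as np

# Feature maps from two sentences of different lengths (same three filters).
maps_short = [np.random.randn(5) for _ in range(3)]   # n - h + 1 = 5
maps_long  = [np.random.randn(12) for _ in range(3)]  # n - h + 1 = 12

z_short = np.array([c.max() for c in maps_short])  # ĉ = max{c} per filter
z_long  = np.array([c.max() for c in maps_long])
assert z_short.shape == z_long.shape == (3,)       # fixed size either way
```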

The model uses multiple filters (with varying window sizes) to obtain multiple features. These features form the penultimate layer and are passed to a fully connected softmax layer whose output is the probability distribution over labels.

The model in this paper has two channels of word vectors: one that is kept static throughout training and one that is fine-tuned via backpropagation.
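The following PyTorch sketch puts the pieces together under this two-channel setup. The hyperparameters echo the paper's commonly cited settings (filter windows of 3, 4, 5 with 100 feature maps each, dropout 0.5), but the code itself is an illustrative reimplementation, not the authors' released model:

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, k=300, widths=(3, 4, 5),
                 n_filters=100, n_classes=2, p_drop=0.5):
        super().__init__()
        self.static_emb = nn.Embedding(vocab_size, k)      # frozen channel
        self.static_emb.weight.requires_grad = False
        self.tuned_emb = nn.Embedding(vocab_size, k)       # fine-tuned channel
        # One Conv2d per window size h, over the 2 input channels.
        self.convs = nn.ModuleList(
            [nn.Conv2d(2, n_filters, (h, k)) for h in widths])
        self.drop = nn.Dropout(p_drop)
        self.fc = nn.Linear(n_filters * len(widths), n_classes)

    def forward(self, ids):                                # ids: (batch, n)
        x = torch.stack([self.static_emb(ids),
                         self.tuned_emb(ids)], dim=1)      # (batch, 2, n, k)
        feats = []
        for conv in self.convs:
            c = torch.tanh(conv(x)).squeeze(3)             # (batch, filters, n-h+1)
            feats.append(c.max(dim=2).values)              # max-over-time pooling
        z = torch.cat(feats, dim=1)                        # penultimate layer
        return self.fc(self.drop(z))                       # logits over labels

logits = TextCNN()(torch.randint(0, 10000, (8, 20)))       # toy batch of 8 sentences
```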


Regularization:

For regularization, dropout is employed on the penultimate layer with a constraint on the $l_2$-norms of the weight vectors. Dropout prevents co-adaptation of hidden units by randomly dropping out (i.e., setting to zero) a proportion $p$ of the hidden units during forward-backpropagation.

Given the penultimate layer $z = [\hat{c}_1, \dots, \hat{c}_m]$ (from $m$ filters), instead of computing the output unit $y$ as $y = w \cdot z + b$ in forward propagation, dropout uses

$$y = w \cdot (z \circ r) + b,$$

where $\circ$ is the element-wise multiplication operator and $r \in \mathbb{R}^m$ is a "masking" vector of Bernoulli random variables with probability $p$ of being 1.

Gradients are backpropagated only through the unmasked units.
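A numpy sketch of this masking during training (sizes and values arbitrary; $p$ is the probability of keeping a unit, per the Bernoulli description above):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 300, 0.5                    # penultimate-layer size, keep probability
z = rng.standard_normal(m)         # z = [ĉ_1, ..., ĉ_m]
w, b = rng.standard_normal(m), 0.0

r = rng.binomial(1, p, size=m)     # Bernoulli mask, P(r_i = 1) = p
y_train = w @ (z * r) + b          # y = w · (z ∘ r) + b during training
```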

At test time, the learned weight vectors are scaled by $p$ such that $\hat{w} = pw$, and $\hat{w}$ is used (without dropout) to score unseen sentences. The $l_2$-norms of the weight vectors are additionally constrained by rescaling $w$ to have $\|w\|_2 = s$ whenever $\|w\|_2 > s$ after a gradient descent step.
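A sketch of both the test-time scaling and the norm constraint ($s$ and $p$ chosen for illustration):

```python
import numpy as np

s, p = 3.0, 0.5
w = np.random.randn(300)

# l2-norm constraint: rescale after a gradient step if the norm exceeds s.
norm = np.linalg.norm(w)
if norm > s:
    w = w * (s / norm)

w_test = p * w   # test-time scaling: ŵ = p·w, used without dropout
```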

Dropout proved to be such a good regularizer that it was fine to use a larger than necessary network and simply let dropout regularize it. Dropout consistently added 2%–4% relative performance.

