Paper Summary: "Recurrent Convolutional Neural Networks for Text Classification"

"Recurrent Convolutional Neural Networks for Text Classification"

Source: Lai, S., Xu, L., Liu, K., & Zhao, J. (2015, January). Recurrent Convolutional Neural Networks for Text Classification. In AAAI (Vol. 333, pp. 2267-2273).

Original post: http://blog.csdn.net/rxt2012kc/article/details/73742362

1. Abstract

Text classification is an important, fundamental task in NLP. Traditional text classification relies on hand-crafted feature engineering and thus on human effort, whereas deep learning can extract features automatically. This paper applies a recurrent structure that introduces less noise than conventional window-based convolutional networks, and uses a max-pooling layer to select the most important features in a text.

2. Introduction

  • Text classification is an essential component in many applications, such as web searching, information filtering, and sentiment analysis.

  • feature representation:

    • bag-of-words: where unigrams, bigrams, n-grams or some exquisitely designed patterns are typically extracted as features.
    • several feature selection methods: frequency, MI, pLSA, LDA

    • Traditional feature representations often ignore contextual information, word order, and semantic content.
    • Higher-order n-grams and tree kernels have been applied to enrich the representation, but they suffer from data sparsity, which hurts accuracy.
    • word embedding: word2vec can capture more syntactic and semantic features.
  • Recursive Neural Network

    • Its performance depends entirely on how the textual tree is constructed, and building the tree takes O(n^2) time. Moreover, the relationship between two sentences can hardly be represented by a single tree, so the model is unsuitable for long sentences or documents.
  • Recurrent Neural Network
    • Advantage: captures contextual information.
    • Disadvantage: it is a biased model, in which later words carry more weight than earlier ones. This is problematic, since a key word may appear anywhere in a document.
    • Hence: Thus, it could reduce the effectiveness when it is used to capture the semantics of a whole document, because key components could appear anywhere in a document rather than at the end.
  • Convolutional Neural Network (CNN)
    • Advantage: an unbiased model that can extract the most important features via max-pooling.
    • Thus, the CNN may better capture the semantic of texts compared to recursive or recurrent neural networks.
    • Time complexity: O(n)
    • Disadvantage: the convolution window size is fixed. If it is too small, information is lost; if it is too large, the parameter space becomes enormous (see the sketch after this list).
    • Hence: Therefore, it raises a question: can we learn more contextual information than conventional window-based neural networks and represent the semantic of texts more precisely for text classification.
  • To address the shortcomings of the models above, the paper proposes the Recurrent Convolutional Neural Network (RCNN):

    • Bi-directional recurrent structure: introduces considerably less noise than a traditional window-based network and captures contextual information to the greatest extent possible.

      We apply a bi-directional recurrent structure, which may introduce considerably less noise compared to a traditional window-based neural network, to capture the contextual information to the greatest extent possible when learning word representations. Moreover, the model can reserve a larger range of the word ordering when learning representations of texts.

    • max-pooling layer: automatically determines which features play the more important role.

      We employ a max-pooling layer that automatically judges which features play key roles in text classification, to capture the key component in the texts.

    • Time complexity: O(n)
    • Experiments compare the model against the current state-of-the-art methods and show significant improvements.
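To make the window-size trade-off concrete, here is a small illustration (my own, not from the paper): a 1-D convolution over word embeddings only sees `kernel_size` words at a time, and widening the window grows the parameter count proportionally. The dimensions here are arbitrary assumptions.

```python
import torch
import torch.nn as nn

embeds = torch.randn(1, 50, 20)  # (batch, embed_dim=50, seq_len=20)
for window in (3, 5, 11):
    conv = nn.Conv1d(in_channels=50, out_channels=100, kernel_size=window)
    n_params = sum(p.numel() for p in conv.parameters())
    # Wider windows cover more context but cost proportionally more parameters.
    print(window, tuple(conv(embeds).shape), n_params)
```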

3. Related Work

  • Text classification

    • Traditional text classification work mainly focuses on three topics: feature engineering, feature selection, and different machine learning models.
    • Feature engineering: the most widely used feature is bag-of-words.

      For feature engineering, the most widely used feature is the bag-of-words feature. In addition, some more complex features have been designed, such as part-of-speech tags, noun phrases (Lewis 1992) and tree kernels (Post and Bergsma 2013).

    • Feature selection: delete noisy features, e.g., by removing stop words or by using information gain or L1 regularization.

      Feature selection aims at deleting noisy features and improving the classification performance. The most common feature selection method is removing the stop words (e.g., “the”). Advanced approaches use information gain, mutual information (Cover and Thomas 2012), or L1 regularization (Ng 2004) to select useful features.

    • Machine learning models: LR, naive Bayes, SVM (see the pipeline sketch after this list).

      Machine learning algorithms often use classifiers such as logistic regression (LR), naive Bayes (NB), and support vector machine (SVM). However, these methods have the data sparsity problem.

  • Deep neural networks

    • Research on deep neural networks and word embeddings has addressed the data sparsity problem.
    • Word embeddings allow us to measure the similarity between two words via the similarity of their vectors.

      With the pre-trained word embeddings, neural networks demonstrate their great performance in many NLP tasks. Socher et al. (2011b) use semi-supervised recursive autoencoders to predict the sentiment of a sentence. Socher et al. (2011a) proposed a method for paraphrase detection, also with a recursive neural network. Socher et al. (2013) introduced recursive neural tensor network to analyse sentiment of phrases and sentences. Mikolov (2012) uses recurrent neural network to build language models. Kalchbrenner and Blunsom (2013) proposed a novel recurrent network for dialogue act classification. Collobert et al. (2011) introduce convolutional neural network for semantic role labeling.
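As a concrete reference for the traditional pipeline described above, here is a minimal scikit-learn sketch: bag-of-words features, a feature-selection step, and an L1-regularized logistic regression classifier. This is a sketch only; the dataset choice and hyperparameters are my assumptions, and chi-squared selection stands in for the information-theoretic criteria the paper mentions.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset="train")  # downloads the corpus on first use
pipeline = make_pipeline(
    CountVectorizer(stop_words="english", ngram_range=(1, 2)),  # bag of uni/bigrams
    SelectKBest(chi2, k=20_000),          # keep the 20k highest-scoring features
    LogisticRegression(penalty="l1", solver="liblinear"),  # L1 induces sparsity
)
pipeline.fit(train.data, train.target)
print(pipeline.score(train.data, train.target))
```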

4. The Proposed Model

[Figure 1: the structure of the recurrent convolutional neural network]
As shown in Figure 1, the input first passes through a bi-directional recurrent layer (a plain recurrent structure in the paper, not an LSTM): the words to the left of the current word are fed in forward order to produce a left-context vector, and the words to its right are fed in reverse order to produce a right-context vector. Concatenating these with the word's own embedding yields a 1 × 3k vector (assuming all three parts have dimension k).
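For reference, the paper defines the two context vectors recursively and concatenates them with the word embedding (notation as in the paper: e(w_i) is the embedding of word w_i and f is a non-linear activation):

$$c_l(w_i) = f\left(W^{(l)} c_l(w_{i-1}) + W^{(sl)} e(w_{i-1})\right)$$

$$c_r(w_i) = f\left(W^{(r)} c_r(w_{i+1}) + W^{(sr)} e(w_{i+1})\right)$$

$$x_i = [c_l(w_i);\, e(w_i);\, c_r(w_i)]$$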

This vector is then passed through a fully connected layer with tanh as the non-linearity, giving y^(2).
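In the paper's notation, with W^(2) and b^(2) the weights and bias of this layer:

$$y_i^{(2)} = \tanh\left(W^{(2)} x_i + b^{(2)}\right)$$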

A max-pooling layer then yields the fixed-length document vector y^(3).
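The maximum is taken element-wise across all token positions, so the document vector has the same length regardless of document length:

$$y^{(3)} = \max_{i=1}^{n} y_i^{(2)}$$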

Finally, a fully connected output layer followed by softmax produces the multi-class prediction.
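Putting the steps together, here is a minimal PyTorch sketch of the forward pass. This is my own reconstruction, not the authors' code; layer sizes and names such as `embed_dim` and `hidden_dim` are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RCNN(nn.Module):
    def __init__(self, vocab_size, embed_dim=50, hidden_dim=50, num_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Plain bi-directional recurrent layer (the paper uses a simple
        # recurrent structure rather than an LSTM).
        self.rnn = nn.RNN(embed_dim, hidden_dim, bidirectional=True,
                          batch_first=True)
        # W^(2), b^(2): maps [c_l; e; c_r] to the latent semantic vector y^(2).
        self.proj = nn.Linear(2 * hidden_dim + embed_dim, hidden_dim)
        # W^(4), b^(4): output layer; softmax turns logits into probabilities.
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, tokens):                 # tokens: (batch, seq_len)
        e = self.embed(tokens)                 # (batch, seq_len, embed_dim)
        h, _ = self.rnn(e)                     # (batch, seq_len, 2*hidden_dim)
        fwd, bwd = h.chunk(2, dim=-1)
        # c_l(w_i)/c_r(w_i) summarize words strictly to the left/right of w_i,
        # so shift the forward states right and the backward states left by one.
        c_l = torch.cat([torch.zeros_like(fwd[:, :1]), fwd[:, :-1]], dim=1)
        c_r = torch.cat([bwd[:, 1:], torch.zeros_like(bwd[:, :1])], dim=1)
        x = torch.cat([c_l, e, c_r], dim=-1)   # the 3k-dimensional representation
        y2 = torch.tanh(self.proj(x))          # (batch, seq_len, hidden_dim)
        y3 = y2.max(dim=1).values              # element-wise max over positions
        return self.out(y3)                    # logits; softmax for probabilities

# Usage: two documents of 30 token ids each, over a hypothetical 10k vocabulary.
model = RCNN(vocab_size=10_000)
probs = model(torch.randint(0, 10_000, (2, 30))).softmax(dim=-1)
```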

5. Experiments

  • Datasets
    • 20Newsgroups: This dataset contains messages from twenty newsgroups. We use the bydate version and select four major categories (comp, politics, rec, and religion) followed by Hingmire et al. (2013).
    • Fudan set: The Fudan University document classification set is a Chinese document classification set that consists of 20 classes, including art, education, and energy.
    • ACL Anthology Network: This dataset contains scientific documents published by the ACL and by related organizations. It is annotated by Post and Bergsma (2013) with the five most common native languages of the authors: English, Japanese, German, Chinese, and French.
    • Stanford Sentiment Treebank: The dataset contains movie reviews parsed and labeled by Socher et al. (2013). The labels are Very Negative, Negative, Neutral, Positive, and Very Positive.
  • Results
    [Table: experimental results comparing RCNN with baseline methods on the four datasets]

Conclusion

The model captures contextual information better than previous approaches, which accounts for its stronger classification results.
