Semi-supervised Ensemble Sentiment Classification

The following excerpts from the paper Semi-Stacking for Semi-supervised Sentiment Classification by Shoushan Li et al. summarize the paper's main idea.

Paper name: Semi-Stacking for Semi-supervised Sentiment Classification
Paper authors: Shoushan Li et al.
Key words: Semi-Stacking, Semi-supervised, Sentiment Classification, Ensemble Learning, meta-learning

Overview

Although various semi-supervised learning algorithms are now available and have been shown to be successful in sentiment classification, each algorithm has its own characteristics, with different pros and cons. It is hard to tell which performs best in general, so picking a suitable algorithm for a specific domain remains difficult.

The paper overcomes this challenge by combining two or more algorithms, instead of picking just one, to perform semi-supervised learning.

Framework Overview

In the paper's approach, two member semi-supervised learning algorithms are involved, and the objective is to leverage both to obtain a better-performing semi-supervised learning algorithm.

The basic idea of the framework can be summarized as follows:

  1. A small portion of the samples in the initial labeled data, called meta-samples, is treated as unlabeled and added to the initial unlabeled data to form a new unlabeled data set.

  2. Use the remaining labeled data as the new labeled data to perform semi-supervised learning with each member algorithm.

  3. Collect the meta-samples’ probability results from all member algorithms to train a meta-learning classifier (called meta-classifier).

    Here, meta- means the learning samples are not represented by traditional descriptive features, e.g., bag-of-words features, but by the result features generated from member algorithms. In the paper, the learning samples in meta-learning are represented by the posterior probabilities of the unlabeled samples belonging to the positive and negative categories from member algorithms.

  4. Use the meta-classifier to re-predict the unlabeled samples, producing new automatically labeled samples.
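The steps above can be sketched in code. This is a minimal illustration, not the paper's implementation: the two member algorithms are stand-ins that simulate posterior probability outputs, and the meta-classifier is a hand-rolled logistic regression trained on the concatenated posteriors (the paper's meta-feature representation).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for two member semi-supervised algorithms.
# In the paper these would be real algorithms (e.g., self-training,
# co-training); here each "member" just returns posterior probabilities
# for the meta-samples, simulated as noisy views of the true labels.
def member_posteriors(true_labels, noise, rng):
    """Simulate a member algorithm's [P(negative), P(positive)] outputs."""
    p = np.where(true_labels == 1, 0.8, 0.2)
    p = np.clip(p + rng.normal(0.0, noise, size=p.shape), 0.01, 0.99)
    return np.column_stack([1.0 - p, p])

# Steps 1-2: meta-samples are held-out labeled samples whose
# member-algorithm posteriors we observe alongside their gold labels.
y_meta = rng.integers(0, 2, size=200)
probs_a = member_posteriors(y_meta, noise=0.15, rng=rng)
probs_b = member_posteriors(y_meta, noise=0.25, rng=rng)

# Step 3: meta-features = concatenated posterior probabilities from
# both member algorithms (4 result features per sample, no bag-of-words).
X_meta = np.hstack([probs_a, probs_b])

# Train a logistic-regression meta-classifier by gradient descent.
X1 = np.hstack([X_meta, np.ones((len(X_meta), 1))])  # add bias column
w = np.zeros(X1.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-X1 @ w))
    w -= 0.1 * X1.T @ (p - y_meta) / len(y_meta)

# Step 4: the meta-classifier re-predicts from member posteriors alone.
pred = (1.0 / (1.0 + np.exp(-X1 @ w)) > 0.5).astype(int)
accuracy = (pred == y_meta).mean()
print(f"meta-classifier training accuracy: {accuracy:.2f}")
```

The key design point this illustrates is that the meta-classifier never sees the original text features: its inputs are only the member algorithms' posterior probabilities, so it learns how to weigh and combine the members' opinions.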

Because labeled data is limited in semi-supervised learning, the paper uses N-fold cross-validation to obtain more meta-samples for better training of the meta-classifier.
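A minimal sketch of that N-fold splitting idea (the fold-partitioning function here is illustrative, not from the paper): each labeled sample is held out exactly once as a meta-sample, so the entire labeled set contributes meta-samples while the member algorithms always train on the remaining folds.

```python
def nfold_meta_splits(n_samples, n_folds):
    """Yield (train_indices, meta_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    fold_size = (n_samples + n_folds - 1) // n_folds  # ceiling division
    for start in range(0, n_samples, fold_size):
        meta = indices[start:start + fold_size]       # held-out meta-samples
        train = indices[:start] + indices[start + fold_size:]
        yield train, meta

# Every labeled sample serves as a meta-sample in exactly one fold.
all_meta = [i for _, meta in nfold_meta_splits(10, 3) for i in meta]
print(sorted(all_meta))  # prints [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```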

The above framework is called semi-stacking in the paper.


