
Exploiting BERT for End-to-End Aspect-based Sentiment Analysis



  1. In this paper, we investigate the modeling power of contextualized embeddings from pre-trained language models, e.g., BERT, on the E2E-ABSA task.
  2. Specifically, we build a series of simple yet insightful neural baselines to deal with E2E-ABSA.
  3. The experimental results show that even with a simple linear classification layer, our BERT-based architecture can outperform state-of-the-art works.
  4. Besides, we also standardize the comparative study by consistently utilizing a hold-out development dataset for model selection, which is largely ignored by previous works.



  • In this paper, we focus on the aspect term-level End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA) problem setting.
  • This task can be formulated as a sequence labeling problem.
  • The overall architecture of our model is depicted in Figure 1.
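
To make the sequence labeling formulation concrete, the sketch below turns aspect spans with a sentiment label into one tag per token, using the B/I/E/S-{POS,NEG,NEU} plus O tag set that the paper adopts. The function name `spans_to_tags`, the inclusive `(start, end, sentiment)` span format, and the example sentence are assumptions made for illustration and do not come from the paper.

```python
SENTIMENTS = ("POS", "NEG", "NEU")


def spans_to_tags(num_tokens, spans):
    """Convert aspect spans into per-token tags.

    spans: list of (start, end, sentiment) with inclusive token indices
    (a hypothetical input format chosen for this sketch).
    """
    tags = ["O"] * num_tokens  # every token starts outside any aspect
    for start, end, sentiment in spans:
        if sentiment not in SENTIMENTS:
            raise ValueError(f"unknown sentiment: {sentiment}")
        if start == end:
            # Single-word aspect
            tags[start] = f"S-{sentiment}"
        else:
            tags[start] = f"B-{sentiment}"      # beginning of aspect
            for i in range(start + 1, end):
                tags[i] = f"I-{sentiment}"      # inside of aspect
            tags[end] = f"E-{sentiment}"        # end of aspect
    return tags


# Hypothetical example sentence with two aspects.
tokens = ["The", "battery", "life", "is", "great", "but", "the", "screen", "is", "dim"]
spans = [(1, 2, "POS"), (7, 7, "NEG")]
tags = spans_to_tags(len(tokens), spans)
for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Predicting the boundary and the sentiment in a single tag is what makes the setting end-to-end: one tag sequence encodes both where each aspect is and how it is evaluated.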


  • 2.1 BERT as Embedding Layer
    1). First of all, we pack the input features as H^0 = {e_1, ..., e_T}, where e_t (t ∈ [1, T]) is the combination of the token embedding, position embedding, and segment embedding corresponding to the input token x_t.
  • 2.2 Design of Downstream Model
    1). After obtaining the BERT representations, we design a neural layer, called the E2E-ABSA layer in Figure 1, on top of the BERT embedding layer for solving the task of E2E-ABSA.
    2). We investigate several different designs for the E2E-ABSA layer, namely, a linear layer, recurrent neural networks, self-attention networks, and a conditional random fields layer.
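
The simplest of these designs, the linear layer, can be sketched as follows. This is a rough illustration rather than the paper's implementation: it assumes the layer is an affine map followed by a softmax over the 13 tags, and it uses random vectors as a stand-in for the BERT output. The names `linear_tagger`, `weight`, and `bias` are hypothetical.

```python
import math
import random

# 13 tags: O plus {B, I, E, S} x {POS, NEG, NEU}
TAGS = ["O"] + [f"{p}-{s}" for p in "BIES" for s in ("POS", "NEG", "NEU")]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear_tagger(h, weight, bias):
    """Map each token representation to a distribution over tags.

    h: T x dim_h token representations (stand-in for BERT output).
    weight: len(TAGS) x dim_h, bias: len(TAGS).
    """
    probs = []
    for h_t in h:
        logits = [
            sum(w * x for w, x in zip(w_row, h_t)) + b
            for w_row, b in zip(weight, bias)
        ]
        probs.append(softmax(logits))
    return probs


random.seed(0)
T, dim_h = 5, 8  # toy sizes; real BERT representations are much wider
h = [[random.gauss(0, 1) for _ in range(dim_h)] for _ in range(T)]
weight = [[random.gauss(0, 0.1) for _ in range(dim_h)] for _ in TAGS]
bias = [0.0] * len(TAGS)

probs = linear_tagger(h, weight, bias)
# Pick the most probable tag independently for each token.
predicted = [TAGS[max(range(len(TAGS)), key=p.__getitem__)] for p in probs]
print(predicted)
```

Each token is classified independently here, so nothing prevents an invalid sequence such as I-POS directly after O. A conditional random fields layer, one of the other designs listed above, scores whole tag sequences instead.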


  • 2.1) We first employ the BERT component with L transformer layers to calculate the corresponding contextualized representations H^L = {h^L_1, ..., h^L_T} ∈ R^(T×dim_h) for the input tokens, where dim_h denotes the dimension of the representation vector.
  • 2.2) Then, the contextualized representations are fed to the task-specific layers to predict the tag sequence y = {y_1, ..., y_T}. The possible values of the tag y_t are B-{POS,NEG,NEU}, I-{POS,NEG,NEU}, E-{POS,NEG,NEU}, S-{POS,NEG,NEU}, or O, denoting the beginning of an aspect, the inside of an aspect, the end of an aspect, and a single-word aspect, each with positive, negative, or neutral sentiment, as well as tokens outside any aspect.
  • 3) SAN: One variant is composed of a simple self-attention layer and a residual connection (He et al., 2016), dubbed “SAN”.


  • 4) TFM: Another variant is a transformer layer (dubbed “TFM”), which has the same architecture as the transformer encoder layer in BERT.
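
The SAN idea of self-attention plus a residual connection can be illustrated with the sketch below. It assumes single-head scaled dot-product attention with query, key, and value projections, and it leaves out layer normalization and multiple heads, so it should not be read as the paper's exact layer. All function and variable names are hypothetical, and the random matrices stand in for BERT output and learned weights.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_residual(h, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = matmul(h, w_q), matmul(h, w_k), matmul(h, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]  # T x T attention weights
    attended = matmul(weights, v)
    # Residual connection: add the layer input back to the attention output.
    out = [[a + x for a, x in zip(a_row, h_row)] for a_row, h_row in zip(attended, h)]
    return out, weights


def rand_matrix(rows, cols, std=1.0):
    return [[random.gauss(0, std) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
T, dim_h = 4, 8  # toy sizes
h = rand_matrix(T, dim_h)  # stand-in for BERT output
w_q, w_k, w_v = (rand_matrix(dim_h, dim_h, std=0.1) for _ in range(3))

out, weights = self_attention_residual(h, w_q, w_k, w_v)
print(len(out), len(out[0]))  # same shape as the input: T x dim_h
```

A standard transformer encoder layer, as in the TFM variant, additionally applies layer normalization and a position-wise feed-forward sub-layer with its own residual connection.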






  1. In this paper, we investigate the effectiveness of BERT embedding component on the task of End-to-End Aspect-Based Sentiment Analysis (E2E-ABSA).
  2. Specifically, we couple the BERT embedding component with various neural models and conduct extensive experiments on two benchmark datasets.
  3. The experimental results demonstrate the superiority of BERT-based models in capturing aspect-based sentiment and their robustness to overfitting.


