2020.3 Enhanced meta-learning for cross-lingual named entity recognition with minimal resources: Reading Notes

Motivation

  • Problem Setting:
    • a) One source language with rich labeled data.
    • b) No labeled data in the target language.
  • Existing cross-lingual NER methods fall into two main categories:
    • a) Label projection (generate labeled data in target languages)
      • parallel data + word alignment information + label projection
      • word-to-word or phrase-to-phrase translation + label projection
    • b) Direct model transfer (exploit language-independent features)
      • Cross-lingual word representations/clusters/wikifier features/gazetteers.
    • SOTA: multilingual BERT (direct model transfer)
  • This paper argues that the source-trained model used in direct model transfer can be further improved, because:
    • a) Given a test example, instead of testing on it directly, the source-trained model can first be fine-tuned on similar examples.
      • How to retrieve similar examples? => Use a cross-lingual sentence representation model to compute the cosine similarity between source-target sentence pairs.
      • Similar in what sense? => In structure or semantics.
    • b) Because retrieval happens across different languages, the set of similar examples is small.
    • c) So the model can only be fine-tuned on a small set, with only a few update steps.
      • => Fast adapt to new tasks (languages here) with very limited data.
      • => Apply meta-learning! (Learn a good parameter initialization for the model, one that is more sensitive to the new task/data.)
  • The paper further proposes:
    • a) masking scheme
    • b) a max-loss term

Methodology

  • i) Constructing Pseudo-Meta-NER Tasks:
    • Treat each example as an independent pseudo test set.
    • Use the mBERT [CLS] vector as the sentence representation to compute cosine similarity.
    • The corresponding similar examples serve as the pseudo training set.
    • This yields N pseudo tasks.
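The task construction above can be sketched as follows. This is a minimal illustration, not the paper's code: the short vectors stand in for mBERT [CLS] embeddings, and the function names, the parameter `k` (how many neighbours to keep), and the choice to exclude the example itself from its own pseudo training set are all assumptions.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_pseudo_tasks(embeddings, k):
    """Build one pseudo task per example.

    Example i alone is the pseudo test set; the k most similar *other*
    examples (by cosine similarity) form the pseudo training set.
    Returns a list of (train_indices, test_index) pairs.
    """
    tasks = []
    for i, query in enumerate(embeddings):
        scored = [
            (cosine_similarity(query, other), j)
            for j, other in enumerate(embeddings)
            if j != i  # assumption: an example is not its own neighbour
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        train_indices = [j for _, j in scored[:k]]
        tasks.append((train_indices, i))
    return tasks


# Toy stand-ins for sentence embeddings (hypothetical values).
toy_embeddings = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [0.1, 0.9],
]
tasks = build_pseudo_tasks(toy_embeddings, k=1)
```

With N examples this produces N pseudo tasks, matching the last bullet above.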
  • ii) Meta-training and Adaptation with Pseudo Tasks:
    [Figure 2]
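These notes do not spell out the update rule for this step, so the following is only a generic MAML-style sketch of the idea "learn an initialization that adapts well after a few gradient steps on a small set". Everything here is an assumption for illustration: a scalar model `y_hat = w * x` replaces the NER model, the first-order approximation replaces whatever the paper actually uses for the meta-gradient, and all names and learning rates are hypothetical.

```python
def grad_mse(w, data):
    """Gradient of the mean squared error of y_hat = w * x with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in data) / len(data)


def inner_update(w, train_set, inner_lr, steps=1):
    """Adapt w to one task with a few gradient steps on its small training set."""
    for _ in range(steps):
        w = w - inner_lr * grad_mse(w, train_set)
    return w


def meta_train(w, tasks, inner_lr, meta_lr, epochs):
    """First-order meta-training over (pseudo_train, pseudo_test) tasks.

    The meta-gradient is approximated by the test-set gradient evaluated
    at the adapted parameter (second-order terms are ignored).
    """
    for _ in range(epochs):
        meta_grad = 0.0
        for train_set, test_set in tasks:
            adapted = inner_update(w, train_set, inner_lr)
            meta_grad += grad_mse(adapted, test_set)
        w = w - meta_lr * meta_grad / len(tasks)
    return w


def adapt_and_predict(w, similar_examples, x_query, inner_lr):
    """At test time: fine-tune on the retrieved similar examples, then predict."""
    adapted = inner_update(w, similar_examples, inner_lr)
    return adapted * x_query


# Two toy tasks with different underlying slopes (2 and 4).
toy_tasks = [
    ([(1.0, 2.0)], [(2.0, 4.0)]),
    ([(1.0, 4.0)], [(2.0, 8.0)]),
]
meta_w = meta_train(0.0, toy_tasks, inner_lr=0.1, meta_lr=0.05, epochs=200)
prediction = adapt_and_predict(meta_w, [(1.0, 2.0)], 2.0, inner_lr=0.1)
```

In this toy setup the learned initialization settles between the two task slopes, so a single inner step moves it toward whichever task the retrieved examples come from.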
  • iii) Masking Scheme:
    • Motivation: The learned representations of infrequent entities across different languages are not well aligned in the shared space. (Infrequent entities appear only rarely in mBERT's training corpus.)
    • How to? => Mask entities in each training example with a certain probability, to encourage the model to predict from context information.
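A minimal sketch of such a masking step, under stated assumptions: labels follow a BIO-style scheme where "O" marks non-entity tokens, masking is decided independently per entity token (the notes do not say whether the paper masks per token or per entity span), and `mask_entities`, `MASK_TOKEN`, and `mask_prob` are hypothetical names.

```python
import random

MASK_TOKEN = "[MASK]"


def mask_entities(tokens, labels, mask_prob, rng):
    """Replace each entity token with MASK_TOKEN with probability mask_prob.

    Non-entity tokens (label "O") are never masked, and labels are left
    untouched so the model must predict the entity tag from context.
    """
    masked = []
    for token, label in zip(tokens, labels):
        is_entity = label != "O"
        if is_entity and rng.random() < mask_prob:
            masked.append(MASK_TOKEN)
        else:
            masked.append(token)
    return masked


tokens = ["Alice", "visited", "New", "York", "yesterday"]
labels = ["B-PER", "O", "B-LOC", "I-LOC", "O"]
rng = random.Random(0)  # fixed seed so the example is repeatable
masked_tokens = mask_entities(tokens, labels, mask_prob=0.5, rng=rng)
```

Because only the input tokens change, the same label sequence can be used as the training target for both masked and unmasked copies.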
  • iv) Max Loss
    • Motivation: Averaging the loss over all tokens weakens learning from the token with the highest loss. => Put more effort into learning from high-loss tokens, which would probably be corrected during meta-training.
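One way to read the max-loss idea is as an extra term added to the usual averaged token loss. The sketch below assumes the combined form "mean of token losses plus a weight times the maximum token loss"; the exact formula is not preserved in these notes, so the form, the name `max_weight`, and the helper functions are assumptions.

```python
import math


def token_cross_entropy(probs, gold_index):
    """Cross-entropy loss for one token given its predicted label distribution."""
    return -math.log(probs[gold_index])


def loss_with_max_term(token_losses, max_weight):
    """Averaged token loss plus a weighted term for the single worst token."""
    mean_loss = sum(token_losses) / len(token_losses)
    return mean_loss + max_weight * max(token_losses)


# Per-token predicted distributions over 3 labels, with gold label indices.
predictions = [([0.9, 0.05, 0.05], 0), ([0.8, 0.1, 0.1], 0), ([0.05, 0.9, 0.05], 2)]
token_losses = [token_cross_entropy(p, g) for p, g in predictions]

plain_loss = loss_with_max_term(token_losses, max_weight=0.0)  # ordinary average
boosted_loss = loss_with_max_term(token_losses, max_weight=2.0)
```

With `max_weight=0.0` this reduces to the ordinary average; a larger weight makes the badly predicted third token dominate the total.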

Experimental Results

[Figure 4]
[Figure 5]
