Data Augmentation in NLP

 

Word Substitution

 

  1. Synonym-based substitution

[Figure 1]
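A minimal sketch of synonym-based substitution. It assumes a small, hypothetical hand-written `SYNONYMS` table and `STOP_WORDS` set; a real pipeline would usually look synonyms up in a lexical resource such as WordNet. The function name `synonym_substitute` and its parameters are illustrative only.

```python
import random
from typing import Optional

# Hypothetical toy synonym table; a real system would query a lexical
# resource such as WordNet instead.
SYNONYMS = {
    "awesome": ["amazing", "great", "fantastic"],
    "movie": ["film", "picture"],
    "funny": ["hilarious", "amusing"],
}
# Hypothetical minimal stop-word list; stop words are never replaced.
STOP_WORDS = {"a", "an", "the", "is", "was", "this", "it", "and", "very"}


def synonym_substitute(sentence: str, n: int = 1, seed: Optional[int] = None) -> str:
    """Replace up to n eligible words with a randomly chosen synonym."""
    rng = random.Random(seed)
    words = sentence.split()
    # Eligible positions: not a stop word and present in the synonym table.
    candidates = [
        i for i, w in enumerate(words)
        if w.lower() not in STOP_WORDS and w.lower() in SYNONYMS
    ]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)


print(synonym_substitute("This is a very funny movie", n=2, seed=0))
```

With `n=2`, both "funny" and "movie" are replaced while the stop words stay untouched; which synonyms are picked depends on the seed.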

 

  2. Word embedding substitution

[Figure 2]

[Figure 3]
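Word embedding substitution swaps a word for one of its nearest neighbours in a pre-trained embedding space. The sketch below assumes tiny, made-up 3-dimensional vectors in place of real word2vec, GloVe or fastText embeddings and ranks neighbours by cosine similarity; `EMBEDDINGS`, `nearest_neighbors` and `embedding_substitute` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Optional

# Hypothetical toy embeddings; in practice these would be loaded from
# pre-trained vectors such as word2vec, GloVe or fastText.
EMBEDDINGS: Dict[str, List[float]] = {
    "king": [0.90, 0.80, 0.10],
    "queen": [0.88, 0.82, 0.15],
    "prince": [0.85, 0.70, 0.20],
    "apple": [0.10, 0.20, 0.90],
    "banana": [0.12, 0.25, 0.88],
}


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_neighbors(word: str, k: int = 2) -> List[str]:
    """Return the k words whose vectors are most cosine-similar to `word`."""
    target = EMBEDDINGS[word]
    scored = [(cosine(target, vec), other)
              for other, vec in EMBEDDINGS.items() if other != word]
    scored.sort(reverse=True)
    return [w for _, w in scored[:k]]


def embedding_substitute(sentence: str, k: int = 2, seed: Optional[int] = None) -> str:
    """Replace one random in-vocabulary word with one of its k nearest neighbours."""
    rng = random.Random(seed)
    words = sentence.split()
    candidates = [i for i, w in enumerate(words) if w in EMBEDDINGS]
    if not candidates:
        return sentence
    i = rng.choice(candidates)
    words[i] = rng.choice(nearest_neighbors(words[i], k))
    return " ".join(words)


print(nearest_neighbors("king"))
print(embedding_substitute("the king spoke", seed=0))
```

Restricting the choice to the top `k` neighbours keeps replacements close in meaning; a larger `k` gives more variety at the cost of drifting further from the original word.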

  3. Masked language model

[Figure 4]

  4. TF-IDF-based word substitution

The basic idea is that words with low TF-IDF scores carry little information about the sentence, so they can be replaced without changing its true label.

[Figure 5]
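A minimal sketch of the idea, scoring each token as tf(w, d) · log(N / df(w)) over a toy corpus and replacing the `n` lowest-scoring tokens. Drawing replacements uniformly from the corpus vocabulary is a simplification assumed here; the method described above may sample replacements differently. `idf_table` and `tfidf_substitute` are hypothetical names.

```python
import math
import random
from collections import Counter
from typing import Dict, List, Optional


def idf_table(corpus: List[List[str]]) -> Dict[str, float]:
    """Inverse document frequency, log(N / df), for every word in a tokenized corpus."""
    n_docs = len(corpus)
    df = Counter(w for doc in corpus for w in set(doc))
    return {w: math.log(n_docs / c) for w, c in df.items()}


def tfidf_substitute(tokens: List[str], idf: Dict[str, float], n: int = 1,
                     seed: Optional[int] = None) -> List[str]:
    """Replace the n lowest-TF-IDF tokens with random words from the vocabulary."""
    rng = random.Random(seed)
    tf = Counter(tokens)
    # TF-IDF score per position; words unseen in the corpus get an IDF of 0.
    scores = [tf[w] / len(tokens) * idf.get(w, 0.0) for w in tokens]
    vocab = sorted(idf)
    # Positions of the n least informative tokens (stable sort keeps ties in order).
    lowest = sorted(range(len(tokens)), key=lambda i: scores[i])[:n]
    out = list(tokens)
    for i in lowest:
        out[i] = rng.choice(vocab)
    return out


corpus = [
    ["the", "movie", "was", "great"],
    ["the", "plot", "was", "boring"],
    ["the", "actors", "were", "great"],
]
idf = idf_table(corpus)
print(tfidf_substitute(corpus[0], idf, n=2, seed=0))
```

In this toy corpus "the" occurs in every document, so its IDF is 0 and it is always among the replaced positions, whereas the rarer and more informative "movie" is kept.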

 

Back Translation

[Figure 6]

 

Text Surface Transformation
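Text surface transformations are simple pattern-based rewrites that leave the meaning, and therefore the label, unchanged. One frequently used example is switching between contracted and expanded English forms ("I am" and "I'm"). A minimal sketch using regular expressions, assuming a tiny hypothetical table of unambiguous contractions; forms such as "it's", which may stand for "it is" or "it has", are deliberately excluded because they cannot be expanded safely.

```python
import re

# Hypothetical table of unambiguous contractions (expanded -> contracted).
CONTRACTIONS = {
    "i am": "i'm",
    "do not": "don't",
    "will not": "won't",
    "are not": "aren't",
}
EXPANSIONS = {short: full for full, short in CONTRACTIONS.items()}


def _swap(text: str, table: dict) -> str:
    """Replace every whole-word occurrence of a table key, case-insensitively."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in table) + r")\b", re.IGNORECASE
    )

    def repl(m):
        out = table[m.group(0).lower()]
        # Keep an initial capital, e.g. "I am" -> "I'm", "Don't" -> "Do not".
        return out[0].upper() + out[1:] if m.group(0)[0].isupper() else out

    return pattern.sub(repl, text)


def contract(text: str) -> str:
    return _swap(text, CONTRACTIONS)


def expand(text: str) -> str:
    return _swap(text, EXPANSIONS)


print(contract("I am sure they are not here"))
print(expand("Don't go"))
```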

 

Random Noise Injection

 

  1. Misspelling injection
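A minimal sketch of misspelling injection, assuming a small hypothetical dictionary of common misspellings (a real setup might draw on a published list of frequent spelling errors). Each word found in the dictionary is replaced with probability `p`; the name `inject_misspellings` is illustrative.

```python
import random
from typing import Optional

# Hypothetical table of common misspellings (correct form -> typical errors).
MISSPELLINGS = {
    "receive": ["recieve", "receeve"],
    "separate": ["seperate"],
    "definitely": ["definately", "definitly"],
}


def inject_misspellings(sentence: str, p: float = 0.5, seed: Optional[int] = None) -> str:
    """Replace each known word with one of its misspellings with probability p."""
    rng = random.Random(seed)
    out = []
    for w in sentence.split():
        options = MISSPELLINGS.get(w.lower())
        out.append(rng.choice(options) if options and rng.random() < p else w)
    return " ".join(out)


print(inject_misspellings("I will definitely receive it", p=1.0, seed=0))
```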

  2. QWERTY keyboard error injection

[Figure 7]
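QWERTY keyboard error injection simulates typing mistakes by replacing characters with keys that sit next to them on the keyboard. The neighbour map below is a hand-written approximation (letters only: adjacent keys in the same row plus the nearest keys in the rows above and below), and `keyboard_noise` with its per-character probability `p` is a hypothetical helper.

```python
import random
from typing import Optional

# Approximate QWERTY neighbours for lowercase letters.
QWERTY_NEIGHBORS = {
    "q": "wa", "w": "qeas", "e": "wrsd", "r": "etdf", "t": "ryfg",
    "y": "tugh", "u": "yihj", "i": "uojk", "o": "ipkl", "p": "ol",
    "a": "qwsz", "s": "adwezx", "d": "sferxc", "f": "dgrtcv", "g": "fhtyvb",
    "h": "gjyubn", "j": "hkuinm", "k": "jliom", "l": "kop",
    "z": "asx", "x": "zcsd", "c": "xvdf", "v": "cbfg", "b": "vngh",
    "n": "bmhj", "m": "njk",
}


def keyboard_noise(text: str, p: float = 0.1, seed: Optional[int] = None) -> str:
    """Replace each letter, with probability p, by a random adjacent QWERTY key."""
    rng = random.Random(seed)
    chars = []
    for ch in text:
        neighbors = QWERTY_NEIGHBORS.get(ch.lower())
        if neighbors and rng.random() < p:
            typo = rng.choice(neighbors)
            # Preserve the case of the original character.
            ch = typo.upper() if ch.isupper() else typo
        chars.append(ch)
    return "".join(chars)


print(keyboard_noise("Hello world", p=0.2, seed=7))
```

Non-letter characters such as spaces and punctuation pass through unchanged, so word boundaries are preserved.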

  3. Blank noise injection
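Blank noise injection replaces randomly chosen words with a placeholder token. A minimal sketch, assuming an underscore as the placeholder and an independent replacement probability `p` per word; `blank_noise` is a hypothetical name.

```python
import random
from typing import Optional


def blank_noise(sentence: str, p: float = 0.2, placeholder: str = "_",
                seed: Optional[int] = None) -> str:
    """Replace each word with `placeholder` independently with probability p."""
    rng = random.Random(seed)
    return " ".join(placeholder if rng.random() < p else w for w in sentence.split())


print(blank_noise("the quick brown fox jumps over the lazy dog", p=0.3, seed=42))
```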

  4. Random insertion

Choose a random word in the sentence that is not a stop word, find one of its synonyms, and insert that synonym at a random position in the sentence.

[Figure 8]
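The random insertion step described above can be sketched as follows, repeated `n` times. The `SYNONYMS` and `STOP_WORDS` tables are hypothetical toy stand-ins for a real lexical resource and stop-word list, and `random_insertion` is an illustrative name.

```python
import random
from typing import Optional

# Hypothetical toy resources; a real implementation might use WordNet.
SYNONYMS = {"movie": ["film", "picture"], "funny": ["hilarious", "amusing"]}
STOP_WORDS = {"a", "an", "the", "is", "this", "it"}


def random_insertion(sentence: str, n: int = 1, seed: Optional[int] = None) -> str:
    """Insert a synonym of a random non-stop-word at a random position, n times."""
    rng = random.Random(seed)
    words = sentence.split()
    for _ in range(n):
        # Only non-stop-words that have at least one known synonym qualify.
        candidates = [w for w in words
                      if w.lower() not in STOP_WORDS and w.lower() in SYNONYMS]
        if not candidates:
            break
        synonym = rng.choice(SYNONYMS[rng.choice(candidates).lower()])
        # randint is inclusive, so the synonym may also be appended at the end.
        words.insert(rng.randint(0, len(words)), synonym)
    return " ".join(words)


print(random_insertion("This movie is funny", n=2, seed=0))
```

The original words keep their relative order; only the inserted synonyms are new, which is why this kind of noise tends to preserve the label.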

  5. Sentence reorganization

[Figure 9]
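One simple form of sentence reorganization is sentence shuffling: split a multi-sentence text into sentences and randomly permute them, on the assumption that the label (for example, the sentiment of a review) does not depend on sentence order. The sketch uses a deliberately naive, hypothetical regex splitter that breaks after `.`, `!` or `?` followed by whitespace, so it will mis-split abbreviations such as "e.g.".

```python
import random
import re
from typing import List, Optional


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def shuffle_sentences(text: str, seed: Optional[int] = None) -> str:
    """Return the text with its sentences in a random order."""
    rng = random.Random(seed)
    sentences = split_sentences(text)
    rng.shuffle(sentences)
    return " ".join(sentences)


review = "I loved it. The plot was tight! Would I watch it again? Absolutely."
print(shuffle_sentences(review, seed=1))
```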

 

Syntax Tree Manipulation

[Figure 10]

 

Reference

https://blog.csdn.net/lqfarmer/article/details/107006551
