【 RoBERTa 】
Liu Y, Ott M, Goyal N, et al. RoBERTa: A Robustly Optimized BERT Pretraining Approach[J]. arXiv preprint arXiv:1907.11692, 2019.
{ Chinese-pretrained-model GitHub: https://github.com/brightmart/roberta_zh }
【 OpenAI GPT2 】
Radford, A., et al. (2019). "Language models are unsupervised multitask learners." OpenAI Blog 1(8).
{ GitHub: https://github.com/openai/gpt-2 }
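The official repository ships TensorFlow inference code; as a minimal sketch, the released weights can also be sampled through the Hugging Face Transformers library (the "gpt2" checkpoint name below refers to the 124M model on the Hugging Face hub and is an assumption, not part of the original repo):

```python
# Minimal sketch: sampling a continuation from GPT-2 via Hugging Face Transformers.
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

inputs = tokenizer("Language models are", return_tensors="pt")
# do_sample=True gives stochastic decoding; top_k restricts sampling to the 50 most likely tokens.
outputs = model.generate(**inputs, max_length=40, do_sample=True, top_k=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```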
【 XLNet 】
Yang, Z., et al. (2019). "XLNet: Generalized Autoregressive Pretraining for Language Understanding." arXiv preprint arXiv:1906.08237.
{ GitHub: https://github.com/zihangdai/xlnet }
{ Chinese-pretrained-model GitHub: https://github.com/ymcui/Chinese-PreTrained-XLNet }
【 BERT-wwm-ext 】
{ GitHub: https://github.com/ymcui/Chinese-BERT-wwm }
【 BERT-wwm 】
Cui, Y., et al. (2019). "Pre-Training with Whole Word Masking for Chinese BERT." arXiv preprint arXiv:1906.08101.
Chinese BERT pre-trained models with whole word masking, jointly released by HIT (Harbin Institute of Technology) and iFLYTEK.
{ GitHub: https://github.com/ymcui/Chinese-BERT-wwm }
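A minimal sketch of extracting contextual features from these weights with Hugging Face Transformers; the hub identifier "hfl/chinese-bert-wwm-ext" is assumed to be the mirror of the checkpoint in the repo above:

```python
# Minimal sketch: encoding a Chinese sentence with the whole-word-masking BERT weights.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")
model = BertModel.from_pretrained("hfl/chinese-bert-wwm-ext")

inputs = tokenizer("哈工大讯飞联合实验室", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
# outputs.last_hidden_state has shape [batch, seq_len, 768]: one contextual vector per token.
print(outputs.last_hidden_state.shape)
```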
【 BERT 】
Devlin, J., et al. (2018). "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding." arXiv preprint arXiv:1810.04805.
{ GitHub: https://github.com/google-research/bert }
【 ELMo 】
Peters, M. E., et al. (2018). "Deep contextualized word representations." arXiv preprint arXiv:1802.05365. (NAACL 2018 Best Paper)
【 OpenAI GPT 】
Radford, A., et al. (2018). "Improving language understanding by generative pre-training." URL: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
【 word2vec (Skip-gram model)】
Mikolov, T., et al. (2013). "Distributed representations of words and phrases and their compositionality." Advances in Neural Information Processing Systems (NIPS).
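A minimal sketch of training a Skip-gram model with negative sampling, the setup described in the paper, using gensim 4.x (the parameter names such as vector_size are gensim's, not the paper's; the toy corpus is only for illustration):

```python
# Minimal sketch: Skip-gram (sg=1) with negative sampling in gensim.
from gensim.models import Word2Vec

sentences = [
    ["language", "models", "learn", "word", "representations"],
    ["skip", "gram", "predicts", "context", "words"],
]
model = Word2Vec(
    sentences,
    vector_size=100,  # embedding dimensionality
    window=5,         # context window size
    sg=1,             # 1 = Skip-gram, 0 = CBOW
    negative=5,       # number of negative samples
    min_count=1,
)
print(model.wv["word"].shape)  # (100,)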
【 Attention 】
Vaswani, A., et al. (2017). "Attention is all you need." Advances in Neural Information Processing Systems (NIPS).
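The core operation of the paper is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; the NumPy sketch below is only an illustration of that formula, not the full multi-head Transformer:

```python
# Minimal sketch: scaled dot-product attention from "Attention is all you need".
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)    # [batch, q_len, k_len]
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                   # [batch, q_len, d_v]

# Toy example: batch of 1, 3 query positions, 4 key/value positions, dimension 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(1, 3, 8))
K = rng.normal(size=(1, 4, 8))
V = rng.normal(size=(1, 4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 3, 8)
```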
Chinese word embeddings:
Tencent AI Lab embeddings covering 8 million Chinese words and phrases, produced by Song Yan's team: https://ai.tencent.com/ailab/nlp/embedding.html
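The download is distributed in plain word2vec text format and can be loaded with gensim; the file name Tencent_AILab_ChineseEmbedding.txt below is an assumption about the unpacked archive, and the full file is very large:

```python
# Minimal sketch: loading the Tencent Chinese embeddings with gensim's KeyedVectors.
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format(
    "Tencent_AILab_ChineseEmbedding.txt", binary=False
)
print(wv["腾讯"][:5])                    # first 5 dimensions of one word vector
print(wv.most_similar("北京", topn=3))   # nearest neighbours by cosine similarity
```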