Data source notes

Download links for GloVe pre-trained word vectors

https://github.com/stanfordnlp/GloVe
The links below contain word vectors obtained from the respective corpora. 
If you want word vectors trained on massive web datasets, 
you need only download one of these text files.

Common Crawl (42B tokens, 1.9M vocab, uncased, 300d vectors, 1.75 GB download): glove.42B.300d.zip, https://huggingface.co/stanfordnlp/glove/resolve/main/glove.42B.300d.zip

Common Crawl (840B tokens, 2.2M vocab, cased, 300d vectors, 2.03 GB download): glove.840B.300d.zip, https://huggingface.co/stanfordnlp/glove/resolve/main/glove.840B.300d.zip

Wikipedia 2014 + Gigaword 5 (6B tokens, 400K vocab, uncased, 300d vectors, 822 MB download): glove.6B.zip, https://huggingface.co/stanfordnlp/glove/resolve/main/glove.6B.zip

Twitter (2B tweets, 27B tokens, 1.2M vocab, uncased, 200d vectors, 1.42 GB download): glove.twitter.27B.zip, https://huggingface.co/stanfordnlp/glove/resolve/main/glove.twitter.27B.zip
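
Once one of the archives above is downloaded and unzipped, the vectors can be read with a few lines of Python. The sketch below is only an illustration and is not part of the GloVe repository. It assumes the usual GloVe text layout (one token per line, followed by space-separated floats), and the helper name load_glove and its dim parameter are made up for this note.

```python
from typing import Dict, List


def load_glove(path: str, dim: int) -> Dict[str, List[float]]:
    """Parse a GloVe-style text file into a {token: vector} dictionary.

    Hypothetical helper: `path` points to an unzipped .txt file and `dim`
    is the vector dimensionality of that file (e.g. 300).
    """
    vectors: Dict[str, List[float]] = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            # Split from the right so that a token which itself contains
            # spaces is kept intact as the first field.
            parts = line.rstrip("\n").rsplit(" ", dim)
            if len(parts) != dim + 1:
                continue  # skip lines that do not match the expected width
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors
```

For example, after extracting glove.6B.zip, a call such as load_glove("glove.6B.300d.txt", dim=300) should return a dictionary keyed by token. Plain Python lists are used here to avoid extra dependencies; for the larger files, an array library is likely a better fit for memory reasons.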
