ELMo模型的理解与实践(2)

预训练好的词向量已经released,这里介绍一下,如何直接获取ELMo词向量。在pytorch里可以通过AlenNLP包使用ELMo。

一、环境配置

1) 在conda中创建allennlp环境:

conda create -n allennlp python=3.6

2) 安装allennlp

pip install allennlp

二、下载训练好的参数和模型

参数下载:

https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5

模型下载:

https://s3-us-west-2.amazonaws.com/allennlp/models/elmo/2x4096_512_2048cnn_2xhighway/elmo_2x4096_512_2048cnn_2xhighway_options.json

三、获得词向量

from allennlp.commands.elmo import ElmoEmbedder
options_file = "/files/elmo_2x4096_512_2048cnn_2xhighway_options.json"
weight_file = "/files/elmo_2x4096_512_2048cnn_2xhighway_weights.hdf5"

elmo = ElmoEmbedder(options_file, weight_file)

# use batch_to_ids to convert sentences to character ids
context_tokens = [['I', 'love', 'you', '.'], ['Sorry', ',', 'I', 'don', "'t", 'love', 'you', '.']] #references
elmo_embedding, elmo_mask = elmo.batch_to_embeddings(context_tokens)

print(elmo_embedding)
print(elmo_mask)

因为环境问题,机子一直无法装上allennlp包,流程大概如上。

你可能感兴趣的:(论文阅读,深度学习)