How to download the Hugging Face model files (pytorch_model.bin, config.json, vocab.txt) and use them locally

  1. First, find the URLs of these files, taking the bert-base-uncased model as an example. Go into your .../lib/python3.6/site-packages/transformers/ directory, where you will find the files modeling_bert.py, configuration_bert.py, and tokenization_bert.py. They contain, respectively (a small sketch for printing these URLs follows the list below):
    1. BERT_PRETRAINED_MODEL_ARCHIVE_MAP = {"bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin","bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-pytorch_model.bin","bert-base-cased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased-pytorch_model.bin",....}
    2. BERT_PRETRAINED_CONFIG_ARCHIVE_MAP = {"bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-config.json","bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-config.json",...}
    3. PRETRAINED_VOCAB_FILES_MAP = {"vocab_file": {"bert-base-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt","bert-large-uncased": "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased-vocab.txt",...}}
  2. Download the three files for bert-base-uncased:
    1. Downloading from s3.amazonaws.com is very slow from inside mainland China, so use any method you are familiar with. I used the most straightforward (if slow) approach:
    2. ping s3.amazonaws.com
    3. wget http://52.216.242.246/models.huggingface.co/bert/bert-base-uncased-vocab.txt
    4. wget http://52.216.242.246/models.huggingface.co/bert/bert-base-uncased-config.json
    5. vocab.txt and config.json are both small, so almost any method can download them. Note that https has to be changed to http, since these requests go straight to the IP address returned by ping.
    6. pytorch_model.bin, however, is large. You can keep using wget http://52.216.242.246/models.huggingface.co/bert/bert-base-uncased-pytorch_model.bin, but it may take many days to finish. A workable alternative is the offline-download feature of Baidu Netdisk, which fetches the file slowly in the background.
  3. Once all three files are downloaded, create a new directory named bert-base-uncased, rename the files to pytorch_model.bin, config.json, and vocab.txt, and place them in that directory.
  4. Local usage: point from_pretrained at the directory created in step 3 (a minimal sketch follows the reference links below).
  5. References: https://www.cnblogs.com/lian1995/p/11947522.html
  6. https://zhuanlan.zhihu.com/p/106880488
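
As a supplement to step 1, here is a minimal sketch for printing the three download URLs instead of opening the source files by hand. It assumes an older transformers release (around 2.x) in which configuration_bert.py, modeling_bert.py, and tokenization_bert.py sit directly under the transformers package; newer releases have moved or removed these maps.

    # Sketch: print the download URLs for bert-base-uncased from the archive maps
    # described in step 1. Assumes a transformers 2.x-style module layout.
    from transformers.configuration_bert import BERT_PRETRAINED_CONFIG_ARCHIVE_MAP
    from transformers.modeling_bert import BERT_PRETRAINED_MODEL_ARCHIVE_MAP
    from transformers.tokenization_bert import PRETRAINED_VOCAB_FILES_MAP

    name = "bert-base-uncased"
    print("weights:", BERT_PRETRAINED_MODEL_ARCHIVE_MAP[name])
    print("config :", BERT_PRETRAINED_CONFIG_ARCHIVE_MAP[name])
    print("vocab  :", PRETRAINED_VOCAB_FILES_MAP["vocab_file"][name])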
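
For step 4, a minimal local-usage sketch is given below. The path ./bert-base-uncased is an assumed location for the directory created in step 3; adjust it to wherever you put the three renamed files.

    # Sketch: load the tokenizer and model from the local directory of step 3.
    # "./bert-base-uncased" is an assumed path; change it to your own location.
    import torch
    from transformers import BertTokenizer, BertModel

    model_dir = "./bert-base-uncased"
    tokenizer = BertTokenizer.from_pretrained(model_dir)  # reads vocab.txt
    model = BertModel.from_pretrained(model_dir)          # reads config.json + pytorch_model.bin
    model.eval()

    ids = tokenizer.encode("Hello, BERT!", add_special_tokens=True)
    with torch.no_grad():
        outputs = model(torch.tensor([ids]))
    print(outputs[0].shape)  # last hidden states: (1, sequence_length, hidden_size)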
