Saving a tokenizer trained with BpeTrainer

Note that this is not a plain save call: use tokenizer.save(path="tokenizer.json", pretty=True) to serialize the entire tokenizer (model, pre-tokenizer, special tokens) into a single JSON file, or tokenizer.model.save('.') to export only the BPE model files (vocab.json and merges.txt).

from tokenizers import Tokenizer, pre_tokenizers
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer

# Build a tokenizer with an (untrained) BPE model
tokenizer = Tokenizer(BPE())
# Byte-level pre-tokenization, without prepending a space to the first word
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)

# Configure the trainer: special tokens are added first, then merges are
# learned until the vocabulary reaches vocab_size
trainer = BpeTrainer(special_tokens=["[PAD]", "[BOS]", "[EOS]"], vocab_size=8000, show_progress=True)
tokenizer.train(files=["1.txt"], trainer=trainer)
print("Trained vocab size: {}".format(tokenizer.get_vocab_size()))

# Save the full tokenizer (model + pre-tokenizer + special tokens) as one JSON file
tokenizer.save(path="tokenizer.json", pretty=True)
# Alternatively, export only the model files (vocab.json and merges.txt):
# tokenizer.model.save('.')
