NLP (10): Using BERT

BERT

Digesting BERT's theory plus its source code realistically takes at least a week.
Google's original BERT code is not easy to use directly; fortunately someone wrote a high-level PyTorch wrapper, fast-bert, which this post uses.
Tencent has also released a client/server wrapper, bert_as_service (a minimal client sketch follows).
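A minimal sketch of the bert-as-service workflow, assuming the server has been started in a separate shell; the package names, CLI flags, and BertClient calls below follow the bert-as-service docs, and the model path is a placeholder:

pip install bert-serving-server bert-serving-client
# Start the server in another shell (model_dir is a placeholder pointing
# at an unzipped pretrained BERT checkpoint):
#   bert-serving-start -model_dir /tmp/uncased_L-12_H-768_A-12 -num_worker=1
from bert_serving.client import BertClient

bc = BertClient()                                    # connects to localhost:5555 by default
vecs = bc.encode(['hello world', 'bert as service'])  # returns sentence embeddings
print(vecs.shape)                                    # (2, 768) for a base-size model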
Note: BERT's maximum sequence length is 512 tokens; setting it any higher appears to raise an error, because the pretrained position embeddings only cover 512 positions.
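As a quick illustration of that limit, here is a sketch that truncates at the token level before adding the special tokens (long_text is a placeholder string):

from pytorch_pretrained_bert.tokenization import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
tokens = tokenizer.tokenize(long_text)   # long_text: any raw input string
tokens = tokens[:512 - 2]                # reserve 2 slots for [CLS] and [SEP]
input_ids = tokenizer.convert_tokens_to_ids(["[CLS]"] + tokens + ["[SEP]"])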

pip install fast-bert
from pathlib import Path
import logging
import torch
from pytorch_pretrained_bert.tokenization import BertTokenizer
from fast_bert.data import BertDataBunch
from fast_bert.learner import BertLearner
from fast_bert.metrics import accuracy_thresh

# There are three key steps.
# 1. Load the tokenizer matching the pretrained BERT model
#    (it converts raw text into WordPiece token ids)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased", do_lower_case=True)
# 2. Build the batched data interface
DATA_PATH = Path('./data/')          # placeholder: directory holding train.csv / valid.csv / test.csv
LABEL_PATH = Path('./labels/')       # placeholder: directory holding labels.csv
label_cols = ['label_1', 'label_2']  # placeholder: names of the label columns in the csv
databunch = BertDataBunch(DATA_PATH, LABEL_PATH, tokenizer, train_file='train.csv', val_file='valid.csv',
                          test_data='test.csv', label_file="labels.csv",
                          text_col="comment_text", label_col=label_cols,
                          bs=512, maxlen=512, multi_gpu=False, multi_label=True)
# 3. Create the learner wrapping the pretrained model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
logger = logging.getLogger()
metrics = [{'name': 'accuracy_thresh', 'function': accuracy_thresh}]  # thresholded accuracy suits multi-label
learner = BertLearner.from_pretrained_model(databunch, 'bert-base-uncased', metrics, device, logger,
                                            is_fp16=True, loss_scale=128,
                                            multi_gpu=False, multi_label=True)  # must match the databunch
# Train for 4 epochs: the first positional argument is the epoch count
# (note lr=0.001 is far higher than the 2e-5 to 5e-5 usually used to fine-tune BERT)
learner.fit(4, lr=0.001, schedule_type="warmup_linear")
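
After training, the learner can also run inference and save the fine-tuned model. The sketch below uses predict_batch and save_model, which appear in fast-bert's README; the input texts are placeholders and the exact return format may vary by version:

texts = ['sample comment one', 'sample comment two']  # placeholder inputs
predictions = learner.predict_batch(texts)            # per-text scores for each label
learner.save_model()                                  # persist the fine-tuned weights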

Reference:
https://github.com/wshuyi/demo-fastbert-multi-label-classification
