Reference: https://www.kexue.fm/archives/8373
https://github.com/bojone/CLUE-bert4keras (cluener.py)
Dataset download:
https://github.com/CLUEbenchmark/CLUENER2020/files/6371700/cluener_public.zip
Reference: https://github.com/CLUEbenchmark/CLUENER2020/issues/47
Brief description of the model structure:
BERT + GlobalPointer
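# Imports needed by the snippet below
from bert4keras.tokenizers import Tokenizer
from bert4keras.models import build_transformer_model
from bert4keras.layers import GlobalPointer

# Build the tokenizer
tokenizer = Tokenizer(dict_path, do_lower_case=True)
# Load the pretrained model
base = build_transformer_model(
    config_path, checkpoint_path, application='unilm', return_keras_model=False
)
# Model parameters: name of the last Transformer layer
last_layer = 'Transformer-%s-FeedForward-Norm' % (base.num_hidden_layers - 1)
# Build the model: put a GlobalPointer head (one head per entity type) on top of BERT
output = base.model.get_layer(last_layer).output
output = GlobalPointer(
    heads=num_classes,
    head_size=base.attention_head_size,
    use_bias=False,
    kernel_initializer=base.initializer
)(output)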
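For orientation, GlobalPointer outputs one score matrix per entity type, and a span (start, end) counts as an entity of that type when its score is above 0. The sketch below shows that decoding step in plain Python. The function name decode_spans and the toy scores are made up for illustration; this is not the code from cluener.py.

```python
def decode_spans(scores, labels, threshold=0.0):
    """Collect (start, end, label) triples whose score exceeds the threshold.

    scores[l][i][j] is the score that tokens i..j (inclusive) form an
    entity of type labels[l]; only spans with i <= j are considered.
    """
    entities = []
    for l, matrix in enumerate(scores):
        for i, row in enumerate(matrix):
            for j in range(i, len(row)):
                if row[j] > threshold:
                    entities.append((i, j, labels[l]))
    return entities


# Toy example: 2 entity types, 3 tokens (values are invented).
toy_scores = [
    [[-5.0, 2.1, -3.0], [-9.0, -4.0, -2.0], [-9.0, -9.0, -1.0]],  # "name"
    [[-5.0, -2.0, -3.0], [-9.0, -4.0, -2.0], [-9.0, -9.0, 0.7]],  # "company"
]
print(decode_spans(toy_scores, ["name", "company"]))
# [(0, 1, 'name'), (2, 2, 'company')]
```

Because every (start, end) pair is scored independently, nested entities need no special handling.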
snippets.py file
1. Mainly change the paths to the pretrained BERT model
# Model paths
config_path = r'D:\***ert\chinese_L-12_H-768_A-12\bert_config.json'
checkpoint_path = r'D:\***\chinese_L-12_H-768_A-12\bert_model.ckpt'
dict_path = r'D:\****t\chinese_L-12_H-768_A-12\vocab.txt'
2. Change the path to the data source
# General parameters
data_path = r'D:\clue_bert4keras\\'
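The path settings above can also be derived from a single directory, which avoids typing the same prefix three times. This is only a sketch: the helper bert_paths and the directory D:\path\to\... are hypothetical placeholders, not the real locations. It also shows why data_path is written with a trailing `\\`: a raw string cannot end in a single backslash, so the literal really ends in two backslash characters.

```python
from pathlib import PureWindowsPath


def bert_paths(model_dir):
    """Return (config_path, checkpoint_path, dict_path) under model_dir."""
    root = PureWindowsPath(model_dir)
    return (
        str(root / "bert_config.json"),
        str(root / "bert_model.ckpt"),
        str(root / "vocab.txt"),
    )


# Hypothetical location; replace with wherever the model was unpacked.
model_dir = r"D:\path\to\chinese_L-12_H-768_A-12"
config_path, checkpoint_path, dict_path = bert_paths(model_dir)
print(config_path)

# The raw string below ends in two literal backslashes.
data_path = r"D:\clue_bert4keras\\"
print(data_path.endswith("\\\\"))      # True
# PureWindowsPath collapses the redundant trailing separators.
print(str(PureWindowsPath(data_path)))  # D:\clue_bert4keras
```

PureWindowsPath is used so that the result is the same on any operating system. On Windows itself, os.path.join would work as well.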
cluener.py file
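Before training, the script has to turn each line of the dataset into a text plus a list of labeled spans. The sketch below assumes the JSON-lines layout used by CLUENER2020 (a "text" field and a "label" field mapping entity type to mention to a list of [start, end] positions, with end inclusive). The function name parse_cluener_line and the sample line are invented for illustration and are not taken from cluener.py.

```python
import json


def parse_cluener_line(line):
    """Parse one JSON line into (text, [(start, end, label), ...])."""
    record = json.loads(line)
    text = record["text"]
    spans = []
    # The test split has no "label" field, so default to an empty dict.
    for label, mentions in record.get("label", {}).items():
        for positions in mentions.values():
            for start, end in positions:
                spans.append((start, end, label))
    return text, sorted(spans)


# Made-up sample line in the assumed format.
sample = (
    '{"text": "abcdef", "label": '
    '{"name": {"bc": [[1, 2]]}, "company": {"ef": [[4, 5]]}}}'
)
print(parse_cluener_line(sample))
# ('abcdef', [(1, 2, 'name'), (4, 5, 'company')])
```

The (start, end, label) triples are character offsets into the text. They still need to be mapped to token positions before they can be used as GlobalPointer targets.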