Late on Friday afternoon I came across someone's project that uses a BERT language model as the input for NER: on top of BERT you can stack a CNN or go straight to a CRF layer, with BERT acting as a drop-in replacement for the word2vec model. The original repo is https://github.com/macanv/BERT-BiLSTM-CRF-NER. Note that it requires TensorFlow 1.9.
The overall logic is actually simple. Google wrote a lot of code, but what it boils down to is swapping BERT in for the word2vec part of the original network, then fine-tuning Google's pretrained BERT model on the downstream task. Most of Google's open-source code uses the Estimator API; you don't have to. The concrete flow: convert your raw data into TFRecord form, read it with tf.data (or any other mechanism), embed it with the BERT model initialized from Google's pretrained checkpoint, and feed the resulting tensor into your own network, whether CNN or RNN. You will never need the tf.nn.embedding_lookup step again. The biggest weakness of word2vec is ambiguous words: 苹果 (apple) can be a fruit, a company, or a movie, yet word2vec assigns it a single vector no matter the context, while BERT gives it a different vector in each context. I have implemented bert+lstm+crf for NER with my own code; the class is below, and an input-pipeline sketch follows it:
import tensorflow as tf
import modeling  # Google's BERT code: https://github.com/google-research/bert
from lstm_crf_layer import BLSTM_CRF  # BiLSTM+CRF layer from the repo above
from tensorflow.contrib.layers.python.layers import initializers

# FLAGS are the tf.flags defined in bert_lstm_ner.py
# (lstm_size, cell, num_layers, droupout_rate).

class BertLstmNer(object):
    def __init__(self, bert_config, is_training, input_ids, input_mask,
                 segment_ids, labels, num_labels, use_one_hot_embeddings, init_checkpoint):
        self.bert_config = bert_config
        self.is_training = is_training
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids
        self.labels = labels
        self.num_labels = num_labels
        self.use_one_hot_embeddings = use_one_hot_embeddings
        self.init_checkpoint = init_checkpoint

        # BERT replaces the old word2vec + tf.nn.embedding_lookup step.
        model = modeling.BertModel(
            config=self.bert_config,
            is_training=self.is_training,
            input_ids=self.input_ids,
            input_mask=self.input_mask,
            token_type_ids=self.segment_ids,
            use_one_hot_embeddings=self.use_one_hot_embeddings
        )
        # Contextual token embeddings, shape [batch_size, seq_length, embedding_size].
        embedding = model.get_sequence_output()
        max_seq_length = embedding.shape[1].value
        # Padded positions have input_ids == 0, so this recovers the real
        # length of every sequence in the batch.
        used = tf.sign(tf.abs(self.input_ids))
        lengths = tf.reduce_sum(used, axis=1)  # [batch_size] vector of sequence lengths

        # Load Google's pretrained weights into the BERT variables.
        tvars = tf.trainable_variables()
        assignment_map, _ = modeling.get_assignment_map_from_checkpoint(tvars, self.init_checkpoint)
        tf.train.init_from_checkpoint(self.init_checkpoint, assignment_map)

        # Downstream network: BiLSTM + CRF on top of the BERT output.
        blstm_crf = BLSTM_CRF(embedded_chars=embedding, hidden_unit=FLAGS.lstm_size, cell_type=FLAGS.cell,
                              num_layers=FLAGS.num_layers,
                              droupout_rate=FLAGS.droupout_rate, initializers=initializers, num_labels=num_labels,
                              seq_length=max_seq_length, labels=labels, lengths=lengths, is_training=is_training)
        (self.total_loss, logits, trans, self.pred_ids) = blstm_crf.add_blstm_crf_layer()

        with tf.name_scope("train_op"):
            self.train_op = tf.train.AdamOptimizer().minimize(self.total_loss)
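To make the TFRecord pipeline from the paragraph above concrete, here is a minimal sketch of the reading step that feeds this class. The feature names (input_ids, input_mask, segment_ids, label_ids) and the file name train.tf_record follow the usual BERT convention but are assumptions here; adapt them to however you serialized your data.

import tensorflow as tf

# Minimal tf.data sketch (assumed feature names / file name, not the repo's exact code).
def build_dataset(tfrecord_file, max_seq_length=128, batch_size=32):
    name_to_features = {
        "input_ids": tf.FixedLenFeature([max_seq_length], tf.int64),
        "input_mask": tf.FixedLenFeature([max_seq_length], tf.int64),
        "segment_ids": tf.FixedLenFeature([max_seq_length], tf.int64),
        "label_ids": tf.FixedLenFeature([max_seq_length], tf.int64),
    }

    def _parse(record):
        example = tf.parse_single_example(record, name_to_features)
        # TFRecords store int64; the BERT code expects int32.
        return {k: tf.to_int32(v) for k, v in example.items()}

    return (tf.data.TFRecordDataset(tfrecord_file)
            .map(_parse)
            .shuffle(buffer_size=1000)
            .batch(batch_size)
            .repeat())

features = build_dataset("train.tf_record").make_one_shot_iterator().get_next()
# These batched tensors plug straight into the class above, e.g.:
# ner_model = BertLstmNer(bert_config, True, features["input_ids"], features["input_mask"],
#                         features["segment_ids"], features["label_ids"], num_labels,
#                         False, init_checkpoint)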
To run the code from the GitHub repo I linked above, you need at least TensorFlow 1.9, because it depends on some TPU interfaces and uses the high-level Estimator API. A quick check:
In [1]: import tensorflow as tf
In [2]: tf.__version__
Out[2]: '1.9.0'
Older versions fail with the following error:
AttributeError: module 'tensorflow.contrib.tpu' has no attribute 'InputPipelineConfig'
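If you would rather fail fast than hit that AttributeError deep inside the Estimator setup, a small guard at the top of the script will do; this is my own addition, not part of the repo:

import tensorflow as tf
from distutils.version import LooseVersion

# tf.contrib.tpu.InputPipelineConfig only exists from TF 1.9 on.
if LooseVersion(tf.__version__) < LooseVersion('1.9.0'):
    raise RuntimeError('TensorFlow >= 1.9 required, found %s' % tf.__version__)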
Before running, download the Chinese language model that Google open-sourced:
https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip
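If you'd rather script the download and extraction than click the link, here is a stdlib-only sketch (Python 3; the data/ target directory is my own choice to match the paths below, adjust to taste):

import os
import zipfile
from urllib.request import urlretrieve

url = "https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip"
target_dir = "data"  # chosen to match the bert_path used below
os.makedirs(target_dir, exist_ok=True)
zip_path = os.path.join(target_dir, os.path.basename(url))
if not os.path.exists(zip_path):
    urlretrieve(url, zip_path)  # download the pretrained model once
with zipfile.ZipFile(zip_path) as zf:
    zf.extractall(target_dir)  # yields data/chinese_L-12_H-768_A-12/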
Then point the paths in the code at it:
else:
    bert_path = '/Users/zhoumeixu/Documents/python/BERT-BiLSTM-CRF-NER/data/chinese_L-12_H-768_A-12/'
    root_path = '/Users/zhoumeixu/Documents/python/BERT-BiLSTM-CRF-NER'
Run command:
python bert_lstm_ner.py --task_name="NER" --do_train=True --do_eval=True --do_predict=True --data_dir=NERdata --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-5 --num_train_epochs=3.0 --output_dir=./output/result_dir/
On startup it first converts the raw files into TFRecord files, so create the output/result_dir directory beforehand (e.g. mkdir -p output/result_dir).
Training log:
INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow: name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow: name = rnn_layer/bidirectional_rnn/fw/basic_lstm_cell/kernel:0, shape = (896, 512)
INFO:tensorflow: name = rnn_layer/bidirectional_rnn/fw/basic_lstm_cell/bias:0, shape = (512,)
INFO:tensorflow: name = rnn_layer/bidirectional_rnn/bw/basic_lstm_cell/kernel:0, shape = (896, 512)
INFO:tensorflow: name = rnn_layer/bidirectional_rnn/bw/basic_lstm_cell/bias:0, shape = (512,)
INFO:tensorflow: name = project/hidden/W:0, shape = (256, 128)
INFO:tensorflow: name = project/hidden/b:0, shape = (128,)
INFO:tensorflow: name = project/logits/W:0, shape = (128, 11)
INFO:tensorflow: name = project/logits/b:0, shape = (11,)
INFO:tensorflow: name = crf_loss/transitions:0, shape = (11, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ./output/result_dir/model.ckpt.
Finally, best wishes: may everything run smoothly and your model accuracy climb another level.