Kashgari Learning Notes (1)

1. Using Callbacks

from kashgari.corpus import SMP2018ECDTCorpus
import keras
import kashgari
from kashgari.tasks.classification import BiLSTM_Model
from kashgari.callbacks import EvalCallBack

import logging
logging.basicConfig(level='DEBUG')

# Load the built-in dataset
train_x, train_y = SMP2018ECDTCorpus.load_data('train')
valid_x, valid_y = SMP2018ECDTCorpus.load_data('valid')
test_x, test_y = SMP2018ECDTCorpus.load_data('test')

# # You can also use your own dataset, formatted like this:
# train_x = [['Hello', 'world'], ['Hello', 'Kashgari']]
# train_y = ['a', 'b']

# For this quick demo, reuse the training set as validation and test data
valid_x, valid_y = train_x, train_y
test_x, test_y = train_x, train_y

tf_board_callback = keras.callbacks.TensorBoard(log_dir='./logs', update_freq=1000)
model = BiLSTM_Model()

# Kashgari's built-in callback: computes precision, recall, and F1 during training
eval_callback = EvalCallBack(kash_model=model,
                             valid_x=valid_x,
                             valid_y=valid_y,
                             step=1)

model.fit(train_x,
          train_y,
          valid_x,
          valid_y,
          batch_size=100,
          epochs=150,
          callbacks=[eval_callback, tf_board_callback])

This example uses two callbacks, eval_callback and tf_board_callback.

1. eval_callback is Kashgari's built-in callback; it computes precision, recall, and F1 during training.
Its step parameter defaults to 5, i.e. by default the metrics are computed once every 5 epochs.
2. tf_board_callback writes TensorBoard log files under the current directory.
Run tensorboard in the background, point it at that directory, and you can open the TensorBoard page to inspect training.
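What EvalCallBack reports, and how its step parameter gates evaluation frequency, can be illustrated with a small self-contained sketch. This is plain Python for illustration only, not Kashgari's internal code, and the helper name is mine:

```python
# Toy training loop: every `step` epochs, compute precision/recall/F1
# for one class over a fixed set of predictions, mirroring how
# EvalCallBack evaluates on the validation set every `step` epochs.

def precision_recall_f1(y_true, y_pred, positive):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

y_true = ['a', 'a', 'b', 'a']
y_pred = ['a', 'b', 'b', 'a']

step = 5  # mirrors EvalCallBack's default: evaluate every 5 epochs
for epoch in range(1, 11):
    if epoch % step == 0:
        p, r, f1 = precision_recall_f1(y_true, y_pred, positive='a')
        print(f"epoch {epoch}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

With step=5 and 10 epochs, the metrics print only at epochs 5 and 10; setting step=1 (as in the script above) evaluates after every epoch.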

# tensorboard --logdir=logs

Then open http://localhost:6006/ in your browser.


2. Using a Pretrained Model

First, download a pretrained model. Let's start with the best-known Chinese BERT model.
Download page:
https://github.com/google-research/bert
Find the BERT-Base, Chinese model there and download it.

#!/usr/bin/env python
# -*- encoding: utf-8 -*-
"""
@Author  :   Yang Song
@Time    :   2020/5/20 15:52 
"""
from kashgari.corpus import SMP2018ECDTCorpus
import keras
import kashgari
from kashgari.tasks.classification import BiLSTM_Model
from kashgari.callbacks import EvalCallBack

import logging
logging.basicConfig(level='DEBUG')

# Load the built-in dataset
train_x, train_y = SMP2018ECDTCorpus.load_data('train')
valid_x, valid_y = SMP2018ECDTCorpus.load_data('valid')
test_x, test_y = SMP2018ECDTCorpus.load_data('test')

# # You can also use your own dataset, formatted like this:
# train_x = [['Hello', 'world'], ['Hello', 'Kashgari']]
# train_y = ['a', 'b']

# For this quick demo, reuse the training set as validation and test data
valid_x, valid_y = train_x, train_y
test_x, test_y = train_x, train_y

from kashgari.embeddings import BERTEmbedding

# Point BERTEmbedding at the downloaded Chinese BERT checkpoint directory
bert_embed = BERTEmbedding('chinese_L-12_H-768_A-12',
                           task=kashgari.CLASSIFICATION,
                           sequence_length=100)
model = BiLSTM_Model(bert_embed)
model.fit(train_x, train_y, valid_x, valid_y)

chinese_L-12_H-768_A-12 is the directory of the downloaded Chinese BERT model. The script above lives in the 0_YS_TEST directory, with the model directory alongside it, so the relative path resolves correctly.
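The sequence_length=100 argument fixes every input at 100 tokens: longer token sequences are truncated and shorter ones padded. A rough pure-Python sketch of that behavior (the helper is hypothetical, not Kashgari's actual code):

```python
# Illustrates fixed-length inputs: truncate sequences longer than
# seq_len, pad shorter ones with a pad token.

def pad_or_truncate(tokens, seq_len, pad_token='[PAD]'):
    if len(tokens) >= seq_len:
        return tokens[:seq_len]
    return tokens + [pad_token] * (seq_len - len(tokens))

print(pad_or_truncate(['今', '天', '天', '气'], seq_len=6))
# → ['今', '天', '天', '气', '[PAD]', '[PAD]']
```

Choose sequence_length to cover most of your corpus; anything beyond it is silently cut off.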


Training results:

Epoch 1/5
30/30 [==============================] - 32s 1s/step - loss: 1.5468 - acc: 0.6119 - val_loss: 0.6305 - val_acc: 0.8628
Epoch 2/5
30/30 [==============================] - 26s 856ms/step - loss: 0.5488 - acc: 0.8767 - val_loss: 0.3349 - val_acc: 0.9335
Epoch 3/5
30/30 [==============================] - 26s 850ms/step - loss: 0.2987 - acc: 0.9389 - val_loss: 0.1907 - val_acc: 0.9670
Epoch 4/5
30/30 [==============================] - 26s 862ms/step - loss: 0.2027 - acc: 0.9607 - val_loss: 0.1159 - val_acc: 0.9841
Epoch 5/5
30/30 [==============================] - 26s 865ms/step - loss: 0.1377 - acc: 0.9761 - val_loss: 0.0758 - val_acc: 0.9947
