Using a pretrained BERT model with tensorflow-hub: simple, easy, and reliable

I recently looked into how to use the pretrained BERT models from tensorflow-hub. The official tutorial I found first is blocked in mainland China, so I worked through a lot of blog posts and hit plenty of pitfalls before finally reaching an accessible tutorial, at which point it turned out the whole thing is quite simple.

Environment:

         TensorFlow: 2.3.0

         tensorflow-hub: 0.9.0

         Python: 3.7.6

Data preparation:

         First, anyone familiar with BERT knows it takes three inputs: inputIds, inputMask and segmentIds. Plenty of tutorials cover them, so I won't repeat the details here.
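To make the three inputs concrete, here is a toy sketch of how they relate for a single sentence. Note the hand-rolled word-level vocab and the helper name are made up for illustration; real BERT inputs come from the WordPiece tokenizer shipped with the checkpoint.

```python
def build_bert_inputs(tokens, vocab, max_seq_length=8):
    """Build inputIds, inputMask and segmentIds for one already-tokenized sentence."""
    tokens = ["[CLS]"] + tokens + ["[SEP]"]    # BERT's special boundary tokens
    input_ids = [vocab[t] for t in tokens]
    input_mask = [1] * len(input_ids)          # 1 = real token, 0 = padding
    segment_ids = [0] * len(input_ids)         # all 0 for a single-sentence input
    # right-pad everything up to max_seq_length
    pad_len = max_seq_length - len(input_ids)
    input_ids += [0] * pad_len
    input_mask += [0] * pad_len
    segment_ids += [0] * pad_len
    return input_ids, input_mask, segment_ids

# toy vocab, NOT BERT's real one (real [CLS]/[SEP] ids happen to be 101/102)
vocab = {"[PAD]": 0, "[CLS]": 101, "[SEP]": 102, "hello": 7, "world": 8}
ids, mask, seg = build_bert_inputs(["hello", "world"], vocab)
```

The mask marks which positions hold real tokens, and segmentIds distinguish sentence A from sentence B in pair tasks (all zeros here since there is only one sentence).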

Code to get the BERT outputs directly:

import tensorflow as tf
import tensorflow_hub as hub

# pretrained Chinese BERT; the trailing 2 in the URL is the model version
BERT_URL = "https://hub.tensorflow.google.cn/tensorflow/bert_zh_L-12_H-768_A-12/2"

max_seq_length = 256

input_word_ids = tf.keras.layers.Input(shape=(max_seq_length,),
                                       dtype=tf.int32,name="input_word_ids")
input_mask = tf.keras.layers.Input(shape=(max_seq_length,),
                                   dtype=tf.int32,name="input_mask")
segment_ids = tf.keras.layers.Input(shape=(max_seq_length,),
                                    dtype=tf.int32,name="segment_ids")

# set trainable to False since we only want inference here
module = hub.KerasLayer(BERT_URL, trainable=False)  # ,signature="token"
# note the input order: word ids, then mask, then segment/type ids
pooled_output, sequence_output = module([input_word_ids, input_mask, segment_ids])

# build a model from the three inputs to the two BERT outputs
model = tf.keras.Model(inputs=[input_word_ids, input_mask, segment_ids], outputs=[pooled_output, sequence_output])

# run inference (inputIds, inputMask, segmentIds are numpy arrays of shape [batch, 256])
output = model.predict([inputIds, inputMask, segmentIds])
# output ----> pooled_output: shape=[batch, 768]; sequence_output: shape=[batch, 256, 768]
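The two outputs differ in granularity: pooled_output gives one 768-dim vector per example, while sequence_output gives one per token. If you want a sentence vector derived from sequence_output instead, one common approach (sketched here in numpy with tiny dummy shapes standing in for [batch, 256, 768]) is mask-weighted mean pooling, so padding positions don't dilute the average:

```python
import numpy as np

# tiny stand-ins for (batch, 256, 768); values are just arange for illustration
batch, seq_len, hidden = 2, 4, 3
sequence_out = np.arange(batch * seq_len * hidden, dtype=np.float32).reshape(batch, seq_len, hidden)
input_mask = np.array([[1, 1, 0, 0],
                       [1, 1, 1, 0]], dtype=np.float32)  # 1 = real token, 0 = padding

# zero out the padding positions, then average over the real tokens only
masked = sequence_out * input_mask[:, :, None]
sent_vec = masked.sum(axis=1) / input_mask.sum(axis=1, keepdims=True)
# sent_vec has shape (batch, hidden)
```

The same arithmetic applies unchanged to the real [batch, 256, 768] sequence_output and the [batch, 256] inputMask.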

-------------------------------------------------BUG----------------------------------------------

I also tried getting the BERT outputs the way the third reference link below does, but ran into this problem:

         ValueError: Could not find matching function to call loaded from the SavedModel. Got:
                           Positional arguments (2 total):
                         * False
                         * None
# Attempt 1: parameter names taken from https://hub.tensorflow.google.cn/tensorflow/bert_zh_L-12_H-768_A-12/2
outputs,_ = hub_module(input_word_ids=tf.constant(tmp_inputids),
    input_mask=tf.constant(tmp_inputMask),
    segment_ids=tf.constant(tmp_segmentIds))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
 in 
      2 outputs,_ = hub_module(input_word_ids=tf.constant(tmp_inputids),
      3     input_mask=tf.constant(tmp_inputMask),
----> 4     segment_ids=tf.constant(tmp_segmentIds))
      5 
      6 # # Attempt 2: parameter names from the error message

/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/load.py in _call_attribute(instance, *args, **kwargs)
    507 
    508 def _call_attribute(instance, *args, **kwargs):
--> 509   return instance.__call__(*args, **kwargs)
    510 
    511 

/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in __call__(self, *args, **kwds)
    778       else:
    779         compiler = "nonXla"
--> 780         result = self._call(*args, **kwds)
    781 
    782       new_tracing_count = self._get_tracing_count()

/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in _call(self, *args, **kwds)
    812       # In this case we have not created variables on the first call. So we can
    813       # run the first trace but we should fail if variables are created.
--> 814       results = self._stateful_fn(*args, **kwds)
    815       if self._created_variables:
    816         raise ValueError("Creating variables on a non-first call to a function"

/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in __call__(self, *args, **kwargs)
   2826     """Calls a graph function specialized to the inputs."""
   2827     with self._lock:
-> 2828       graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
   2829     return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
   2830 

/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _maybe_define_function(self, args, kwargs)
   3211 
   3212       self._function_cache.missed.add(call_context_key)
-> 3213       graph_function = self._create_graph_function(args, kwargs)
   3214       self._function_cache.primary[cache_key] = graph_function
   3215       return graph_function, args, kwargs

/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/function.py in _create_graph_function(self, args, kwargs, override_flat_arg_shapes)
   3073             arg_names=arg_names,
   3074             override_flat_arg_shapes=override_flat_arg_shapes,
-> 3075             capture_by_value=self._capture_by_value),
   3076         self._function_attributes,
   3077         function_spec=self.function_spec,

/opt/conda/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py in func_graph_from_py_func(name, python_func, args, kwargs, signature, func_graph, autograph, autograph_options, add_control_dependencies, arg_names, op_return_value, collections, capture_by_value, override_flat_arg_shapes)
    984         _, original_func = tf_decorator.unwrap(python_func)
    985 
--> 986       func_outputs = python_func(*func_args, **func_kwargs)
    987 
    988       # invariant: `func_outputs` contains only Tensors, CompositeTensors,

/opt/conda/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py in wrapped_fn(*args, **kwds)
    598         # __wrapped__ allows AutoGraph to swap in a converted function. We give
    599         # the function a weak reference to itself to avoid a reference cycle.
--> 600         return weak_wrapped_fn().__wrapped__(*args, **kwds)
    601     weak_wrapped_fn = weakref.ref(wrapped_fn)
    602 

/opt/conda/lib/python3.7/site-packages/tensorflow/python/saved_model/function_deserialization.py in restored_function_body(*args, **kwargs)
    255         .format(_pretty_format_positional(args), kwargs,
    256                 len(saved_function.concrete_functions),
--> 257                 "\n\n".join(signature_descriptions)))
    258 
    259   concrete_function_objects = []

ValueError: Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * False
    * None
  Keyword arguments: {'input_word_ids': , 'input_mask': , 'segment_ids': }

Expected these arguments to match one of the following 4 option(s):

Option 1:
  Positional arguments (3 total):
    * [TensorSpec(shape=(None, None), dtype=tf.int32, name='inputs/0'), TensorSpec(shape=(None, None), dtype=tf.int32, name='inputs/1'), TensorSpec(shape=(None, None), dtype=tf.int32, name='inputs/2')]
    * True
    * None
  Keyword arguments: {}

Option 2:
  Positional arguments (3 total):
    * [TensorSpec(shape=(None, None), dtype=tf.int32, name='input_word_ids'), TensorSpec(shape=(None, None), dtype=tf.int32, name='input_mask'), TensorSpec(shape=(None, None), dtype=tf.int32, name='input_type_ids')]
    * True
    * None
  Keyword arguments: {}

Option 3:
  Positional arguments (3 total):
    * [TensorSpec(shape=(None, None), dtype=tf.int32, name='inputs/0'), TensorSpec(shape=(None, None), dtype=tf.int32, name='inputs/1'), TensorSpec(shape=(None, None), dtype=tf.int32, name='inputs/2')]
    * False
    * None
  Keyword arguments: {}

Option 4:
  Positional arguments (3 total):
    * [TensorSpec(shape=(None, None), dtype=tf.int32, name='input_word_ids'), TensorSpec(shape=(None, None), dtype=tf.int32, name='input_mask'), TensorSpec(shape=(None, None), dtype=tf.int32, name='input_type_ids')]
    * False
    * None
  Keyword arguments: {}

The error message shows that the loaded BERT model accepts four calling conventions, each taking three positional arguments: the first is the BERT input (a list of three tensors), the second is the boolean training flag (True/False), and the third, always None here, is presumably the Keras mask argument; the official tutorial doesn't spell this out.

Renaming the keyword arguments as the error message suggests and trying again produced the same problem:

# Attempt 2: parameter names taken from Options 2/4 in the error message
outputs,_ = hub_module(input_word_ids=tf.constant(tmp_inputids),
    input_mask=tf.constant(tmp_inputMask),
    input_type_ids=tf.constant(tmp_segmentIds))
---------------------------------------------------------------------------
ValueError: Could not find matching function to call loaded from the SavedModel. Got:
  Positional arguments (2 total):
    * False
    * None
  Keyword arguments: {'input_word_ids': , 'input_mask': , 'input_type_ids': }

Expected these arguments to match one of the following 4 option(s):
(identical to the four options listed above, so the rest of the traceback is omitted)

The first positional argument is a list, so the next attempt passed the three tensors as one list:

# Attempt 3: pass the three inputs as a single list
outputs,_ = hub_module([tf.constant(tmp_inputids),
    tf.constant(tmp_inputMask),
    tf.constant(tmp_segmentIds)])

Sure enough, the same error again. Searching around turned up yet another calling style, which failed the same way:

# Attempt 4: pass the inputs as a dict
bert_inputs = dict(
    input_word_ids=tf.constant(tmp_inputids),
    input_mask=tf.constant(tmp_inputMask),
    input_type_ids=tf.constant(tmp_segmentIds))

outputs, _ = hub_module(bert_inputs)

After all these attempts, it seems the loaded module simply cannot be called this way (if you know a working variant, please share it). So the honest route is to rebuild the model with Keras and use predict to get the pretrained BERT outputs.

---------------------------------------------------------------------------------------------

 

Pretrained model + custom layers: code to train a new model:

# URL of the pretrained BERT model; the trailing 2 is the model's version number
# Many hub models are still TF 1.x-based, so check which TF version a model targets. Some model pages state it explicitly; otherwise, TF2 SavedModels are loaded with the tfhub >= 0.5.0 API hub.load(url), whereas older TF1 modules used hub.Module(url).

BERT_URL = "https://hub.tensorflow.google.cn/tensorflow/bert_zh_L-12_H-768_A-12/2"

max_seq_length = 256  # sequence length

# define the three inputs
input_word_ids = tf.keras.layers.Input(shape=(max_seq_length,),
                                      dtype=tf.int32,name="input_word_ids")
input_mask = tf.keras.layers.Input(shape=(max_seq_length,),
                                   dtype=tf.int32,name="input_mask")
segment_ids = tf.keras.layers.Input(shape=(max_seq_length,),
                                    dtype=tf.int32,name="segment_ids")
# pretrained model + custom classification head
module = hub.KerasLayer(BERT_URL, trainable=True)
# input order: word ids, then mask, then segment/type ids
pooled_output, sequence_output = module([input_word_ids, input_mask, segment_ids])
# a single logit for binary classification, consistent with from_logits=True
# and the threshold=0.0 metric below
out = tf.keras.layers.Dense(1)(pooled_output)

model = tf.keras.Model(inputs=[input_word_ids, input_mask, segment_ids], outputs=out)
model.compile(optimizer='adam',
              loss=tf.losses.BinaryCrossentropy(from_logits=True),
              metrics=[tf.metrics.BinaryAccuracy(threshold=0.0, name='accuracy')])

# make sure the inputs and labels are numpy arrays
model.fit([inputIds, inputMask, segmentIds],labels,
          batch_size=1,epochs=20,
          validation_data=([inputIds, inputMask, segmentIds],labels),
          verbose=1)
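As the comment above says, fit (and predict) expect numpy arrays of shape (batch, max_seq_length). A small sketch of a padding helper (the helper name is made up) that stacks variable-length id lists into that shape:

```python
import numpy as np

def pad_batch(sequences, max_seq_length=256):
    """Stack variable-length lists of token ids into one (batch, max_seq_length) int32 array."""
    batch = np.zeros((len(sequences), max_seq_length), dtype=np.int32)
    for i, seq in enumerate(sequences):
        seq = seq[:max_seq_length]       # truncate anything too long
        batch[i, :len(seq)] = seq        # right-pad the rest with 0
    return batch

# e.g. two tokenized examples of different lengths
inputIds = pad_batch([[101, 7, 8, 102], [101, 9, 102]])
```

The same helper applies to inputMask and segmentIds, since zero is the correct padding value for all three.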

Fine-tuning approach: load with hub.KerasLayer(..., trainable=False), then, following the standard Keras fine-tuning recipe, grab the relevant layers of the module and flip their trainable flag to True. (Not tried.)

 

Summary:

(1) load the BERT model via tensorflow-hub;

(2) build the model with tensorflow.keras: either wire up BERT's inputs and outputs directly, or use the BERT model as one layer of a larger model;

(3) train the model, or just extract the BERT outputs;

(4) to fine-tune, use the pretrained BERT as one layer of the model and set trainable on the layers you want to update; for details, see the Keras guide to fine-tuning pretrained models.

 

References:

tensorflow-hub official documentation: https://tensorflow.google.cn/hub/installation

tensorflow-hub model usage tutorials (the main reference): https://hub.tensorflow.google.cn/s?module-type=text-embedding,text-classification,text-generation,text-language-model,text-question-answering,text-retrieval-question-answering&tf-version=tf2&q=bert

Where I found the tensorflow-hub mirror for mainland China: https://www.cnblogs.com/xingnie/p/12343601.html

----------

A similar write-up to this one, using a different model: https://cloud.tencent.com/developer/article/1537222
