风度78

BERT源码分析（PART I）

写在前面

[email protected]

最近在看paddle相关，于是就打算仔细过一遍百度ERNIE的源码。之前粗看的时候还没有ERNIE2.0、ERNIE-tiny,整体感觉跟BERT也挺类似的，不知道更新了之后会是啥样~看完也会整理跟下面类似的总结，刚好也在研究paddle或ERNIE的同学可以加我一起讨论哈哈哈

原内容@2019.05.16

BERT 模型也出来很久了，之前有看过论文和一些博客对其做了解读：NLP 大杀器 BERT 模型解读^[1]，但是一直没有细致地去看源码具体实现。最近有用到就抽时间来仔细看看记录下来，和大家一起讨论。

注意，源码阅读系列需要提前对 NLP 相关知识有所了解，比如 attention 机制、transformer 框架以及 python 和 tensorflow 基础等，关于 BERT 的原理不是本文的重点。

附上关于 BERT 的资料汇总：BERT 相关论文、文章和代码资源汇总^[2]

今天要介绍的是 BERT 最主要的模型实现部分-----BertModel，代码位于

modeling.py 模块^[3]

除了代码块外部，在代码块内部也有注释噢

如有解读不正确，请务必指出~

1、配置类（BertConfig）

这部分代码主要定义了 BERT 模型的一些默认参数，另外包括了一些文件处理函数。

class BertConfig(object):
  """BERT模型的配置类."""

  def __init__(self,
               vocab_size,
               hidden_size=768,
               num_hidden_layers=12,
               num_attention_heads=12,
               intermediate_size=3072,
               hidden_act="gelu",
               hidden_dropout_prob=0.1,
               attention_probs_dropout_prob=0.1,
               max_position_embeddings=512,
               type_vocab_size=16,
               initializer_range=0.02):

    self.vocab_size = vocab_size
    self.hidden_size = hidden_size
    self.num_hidden_layers = num_hidden_layers
    self.num_attention_heads = num_attention_heads
    self.hidden_act = hidden_act
    self.intermediate_size = intermediate_size
    self.hidden_dropout_prob = hidden_dropout_prob
    self.attention_probs_dropout_prob = attention_probs_dropout_prob
    self.max_position_embeddings = max_position_embeddings
    self.type_vocab_size = type_vocab_size
    self.initializer_range = initializer_range

  @classmethod
  def from_dict(cls, json_object):
    """Constructs a `BertConfig` from a Python dictionary of parameters."""
    config = BertConfig(vocab_size=None)
    for (key, value) in six.iteritems(json_object):
      config.__dict__[key] = value
    return config

  @classmethod
  def from_json_file(cls, json_file):
    """Constructs a `BertConfig` from a json file of parameters."""
    with tf.gfile.GFile(json_file, "r") as reader:
      text = reader.read()
    return cls.from_dict(json.loads(text))

  def to_dict(self):
    """Serializes this instance to a Python dictionary."""
    output = copy.deepcopy(self.__dict__)
    return output

  def to_json_string(self):
    """Serializes this instance to a JSON string."""
    return json.dumps(self.to_dict(), indent=2, sort_keys=True) + "\n"

「参数具体含义」

vocab_size：词表大小
hidden_size：隐藏层神经元数
num_hidden_layers：Transformer encoder 中的隐藏层数
*num_attention_heads：*multi-head attention 的 head 数
intermediate_size：encoder 的“中间”隐层神经元数（例如 feed-forward layer）
hidden_act：隐藏层激活函数
hidden_dropout_prob：隐层 dropout 率
attention_probs_dropout_prob：注意力部分的 dropout
max_position_embeddings：最大位置编码
type_vocab_size：token_type_ids 的词典大小
initializer_range：truncated_normal_initializer 初始化方法的 stdev

这里要注意一点，可能刚看的时候对type_vocab_size这个参数会有点不理解，其实就是在next sentence prediction任务里的Segment A和 Segment B。在下载的bert_config.json文件里也有说明，默认值应该为 2。参考这个 Issue^[4]

2、获取词向量（Embedding_lookup）

对于输入 word_ids，返回 embedding table。可以选用 one-hot 或者 tf.gather()

def embedding_lookup(input_ids,						# word_id：【batch_size, seq_length】
                     vocab_size,
                     embedding_size=128,
                     initializer_range=0.02,
                     word_embedding_name="word_embeddings",
                     use_one_hot_embeddings=False):

  # 该函数默认输入的形状为【batch_size, seq_length, input_num】
  # 如果输入为2D的【batch_size, seq_length】，则扩展到【batch_size, seq_length, 1】
  if input_ids.shape.ndims == 2:
    input_ids = tf.expand_dims(input_ids, axis=[-1])

  embedding_table = tf.get_variable(
      name=word_embedding_name,
      shape=[vocab_size, embedding_size],
      initializer=create_initializer(initializer_range))

  flat_input_ids = tf.reshape(input_ids, [-1])    #【batch_size*seq_length*input_num】
  if use_one_hot_embeddings:
    one_hot_input_ids = tf.one_hot(flat_input_ids, depth=vocab_size)
    output = tf.matmul(one_hot_input_ids, embedding_table)
  else:	# 按索引取值
    output = tf.gather(embedding_table, flat_input_ids)

  input_shape = get_shape_list(input_ids)

  # output：[batch_size, seq_length, num_inputs]
  # 转成:[batch_size, seq_length, num_inputs*embedding_size]
  output = tf.reshape(output,
  				      input_shape[0:-1] + [input_shape[-1] * embedding_size])
  return (output, embedding_table)

「参数具体含义」

input_ids：word id 【batch_size, seq_length】
vocab_size：embedding 词表
embedding_size：embedding 维度
initializer_range：embedding 初始化范围
word_embedding_name：embeddding table 命名
use_one_hot_embeddings：是否使用 one-hotembedding
Return：【batch_size, seq_length, embedding_size】

3、词向量的后续处理（embedding_postprocessor）

我们知道 BERT 模型的输入有三部分：token embedding ，segment embedding以及position embedding。上一节中我们只获得了 token embedding，这部分代码对其完善信息，正则化，dropout 之后输出最终 embedding。注意，在 Transformer 论文中的position embedding是由 sin/cos 函数生成的固定的值，而在这里代码实现中是跟普通 word embedding 一样随机生成的，可以训练的。作者这里这样选择的原因可能是 BERT 训练的数据比 Transformer 那篇大很多，完全可以让模型自己去学习。

def embedding_postprocessor(input_tensor,				# [batch_size, seq_length, embedding_size]
                            use_token_type=False,
                            token_type_ids=None,
                            token_type_vocab_size=16,		# 一般是2
                            token_type_embedding_name="token_type_embeddings",
                            use_position_embeddings=True,
                            position_embedding_name="position_embeddings",
                            initializer_range=0.02,
                            max_position_embeddings=512,    #最大位置编码，必须大于等于max_seq_len
                            dropout_prob=0.1):

  input_shape = get_shape_list(input_tensor, expected_rank=3)   #【batch_size,seq_length,embedding_size】
  batch_size = input_shape[0]
  seq_length = input_shape[1]
  width = input_shape[2]

  output = input_tensor

  # Segment position信息
  if use_token_type:
    if token_type_ids is None:
      raise ValueError("`token_type_ids` must be specified if"
                       "`use_token_type` is True.")
    token_type_table = tf.get_variable(
        name=token_type_embedding_name,
        shape=[token_type_vocab_size, width],
        initializer=create_initializer(initializer_range))
    # 由于token-type-table比较小，所以这里采用one-hot的embedding方式加速
    flat_token_type_ids = tf.reshape(token_type_ids, [-1])
    one_hot_ids = tf.one_hot(flat_token_type_ids, depth=token_type_vocab_size)
    token_type_embeddings = tf.matmul(one_hot_ids, token_type_table)
    token_type_embeddings = tf.reshape(token_type_embeddings,
                                       [batch_size, seq_length, width])
    output += token_type_embeddings

  # Position embedding信息
  if use_position_embeddings:
    # 确保seq_length小于等于max_position_embeddings
    assert_op = tf.assert_less_equal(seq_length, max_position_embeddings)
    with tf.control_dependencies([assert_op]):
      full_position_embeddings = tf.get_variable(
          name=position_embedding_name,
          shape=[max_position_embeddings, width],
          initializer=create_initializer(initializer_range))

      # 这里position embedding是可学习的参数，[max_position_embeddings, width]
      # 但是通常实际输入序列没有达到max_position_embeddings
      # 所以为了提高训练速度，使用tf.slice取出句子长度的embedding
      position_embeddings = tf.slice(full_position_embeddings, [0, 0],
                                     [seq_length, -1])
      num_dims = len(output.shape.as_list())

      # word embedding之后的tensor是[batch_size, seq_length, width]
	  # 因为位置编码是与输入内容无关，它的shape总是[seq_length, width]
	  # 我们无法把位置Embedding加到word embedding上
	  # 因此我们需要扩展位置编码为[1, seq_length, width]
	  # 然后就能通过broadcasting加上去了。
      position_broadcast_shape = []
      for _ in range(num_dims - 2):
        position_broadcast_shape.append(1)
      position_broadcast_shape.extend([seq_length, width])
      position_embeddings = tf.reshape(position_embeddings,
                                       position_broadcast_shape)
      output += position_embeddings

  output = layer_norm_and_dropout(output, dropout_prob)
  return output

4、构造 attention_mask

该部分代码的作用是构造 attention 可视域的 attention_mask，因为每个样本都经过 padding 过程，在做self-attention的是padding的部分不能attend到其他部分上。输入为形状为 [batch_size, from_seq_length,...] 的 padding 好的 input_ids 和形状为 [batch_size, to_seq_length] 的 mask 标记向量。

def create_attention_mask_from_input_mask(from_tensor, to_mask):
  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])
  batch_size = from_shape[0]
  from_seq_length = from_shape[1]

  to_shape = get_shape_list(to_mask, expected_rank=2)
  to_seq_length = to_shape[1]

  to_mask = tf.cast(
      tf.reshape(to_mask, [batch_size, 1, to_seq_length]), tf.float32)

  broadcast_ones = tf.ones(
      shape=[batch_size, from_seq_length, 1], dtype=tf.float32)

  mask = broadcast_ones * to_mask

  return mask

5、注意力层（attention layer）

这部分代码是「multi-head attention」的实现，主要来自《Attention is all you need》这篇论文。考虑key-query-value形式的 attention，输入的from_tensor当做是 query， to_tensor当做是 key 和 value，当两者相同的时候即为 self-attention。关于 attention 更详细的介绍可以转到【理解 Attention 机制原理及模型^[5]】。

def attention_layer(from_tensor,   # 【batch_size, from_seq_length, from_width】
                    to_tensor,		#【batch_size, to_seq_length, to_width】
                    attention_mask=None,		#【batch_size,from_seq_length, to_seq_length】
                    num_attention_heads=1,		# attention head numbers
                    size_per_head=512,			# 每个head的大小
                    query_act=None,				# query变换的激活函数
                    key_act=None,				# key变换的激活函数
                    value_act=None,				# value变换的激活函数
                    attention_probs_dropout_prob=0.0,		# attention层的dropout
                    initializer_range=0.02,					# 初始化取值范围
                    do_return_2d_tensor=False,				# 是否返回2d张量。
#如果True，输出形状【batch_size*from_seq_length,num_attention_heads*size_per_head】
#如果False，输出形状【batch_size, from_seq_length, num_attention_heads*size_per_head】
                    batch_size=None,						#如果输入是3D的，
#那么batch就是第一维，但是可能3D的压缩成了2D的，所以需要告诉函数batch_size
                    from_seq_length=None,					# 同上
                    to_seq_length=None):					# 同上

  def transpose_for_scores(input_tensor, batch_size, num_attention_heads,
                           seq_length, width):
    output_tensor = tf.reshape(
        input_tensor, [batch_size, seq_length, num_attention_heads, width])

    output_tensor = tf.transpose(output_tensor, [0, 2, 1, 3])	#[batch_size,  num_attention_heads, seq_length, width]
    return output_tensor

  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])
  to_shape = get_shape_list(to_tensor, expected_rank=[2, 3])

  if len(from_shape) != len(to_shape):
    raise ValueError(
        "The rank of `from_tensor` must match the rank of `to_tensor`.")

  if len(from_shape) == 3:
    batch_size = from_shape[0]
    from_seq_length = from_shape[1]
    to_seq_length = to_shape[1]
  elif len(from_shape) == 2:
    if (batch_size is None or from_seq_length is None or to_seq_length is None):
      raise ValueError(
          "When passing in rank 2 tensors to attention_layer, the values "
          "for `batch_size`, `from_seq_length`, and `to_seq_length` "
          "must all be specified.")

  # 为了方便备注shape，采用以下简写:
  #   B = batch size (number of sequences)
  #   F = `from_tensor` sequence length
  #   T = `to_tensor` sequence length
  #   N = `num_attention_heads`
  #   H = `size_per_head`

  # 把from_tensor和to_tensor压缩成2D张量
  from_tensor_2d = reshape_to_matrix(from_tensor)		# 【B*F, hidden_size】
  to_tensor_2d = reshape_to_matrix(to_tensor)			# 【B*T, hidden_size】

  # 将from_tensor输入全连接层得到query_layer
  # `query_layer` = [B*F, N*H]
  query_layer = tf.layers.dense(
      from_tensor_2d,
      num_attention_heads * size_per_head,
      activation=query_act,
      name="query",
      kernel_initializer=create_initializer(initializer_range))

  # 将from_tensor输入全连接层得到query_layer
  # `key_layer` = [B*T, N*H]
  key_layer = tf.layers.dense(
      to_tensor_2d,
      num_attention_heads * size_per_head,
      activation=key_act,
      name="key",
      kernel_initializer=create_initializer(initializer_range))

  # 同上
  # `value_layer` = [B*T, N*H]
  value_layer = tf.layers.dense(
      to_tensor_2d,
      num_attention_heads * size_per_head,
      activation=value_act,
      name="value",
      kernel_initializer=create_initializer(initializer_range))

  # query_layer转成多头：[B*F, N*H]==>[B, F, N, H]==>[B, N, F, H]
  query_layer = transpose_for_scores(query_layer, batch_size,
                                     num_attention_heads, from_seq_length,
                                     size_per_head)

  # key_layer转成多头：[B*T, N*H] ==> [B, T, N, H] ==> [B, N, T, H]
  key_layer = transpose_for_scores(key_layer, batch_size, num_attention_heads,
                                   to_seq_length, size_per_head)

  # 将query与key做点积，然后做一个scale，公式可以参见原始论文
  # `attention_scores` = [B, N, F, T]
  attention_scores = tf.matmul(query_layer, key_layer, transpose_b=True)
  attention_scores = tf.multiply(attention_scores,
                                 1.0 / math.sqrt(float(size_per_head)))

  if attention_mask is not None:
    # `attention_mask` = [B, 1, F, T]
    attention_mask = tf.expand_dims(attention_mask, axis=[1])

    # 如果attention_mask里的元素为1，则通过下面运算有（1-1）*-10000，adder就是0
    # 如果attention_mask里的元素为0，则通过下面运算有（1-0）*-10000，adder就是-10000
    adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0

    # 我们最终得到的attention_score一般不会很大，
    #所以上述操作对mask为0的地方得到的score可以认为是负无穷
    attention_scores += adder

  # 负无穷经过softmax之后为0，就相当于mask为0的位置不计算attention_score
  # `attention_probs` = [B, N, F, T]
  attention_probs = tf.nn.softmax(attention_scores)

  # 对attention_probs进行dropout，这虽然有点奇怪，但是Transforme原始论文就是这么做的
  attention_probs = dropout(attention_probs, attention_probs_dropout_prob)

  # `value_layer` = [B, T, N, H]
  value_layer = tf.reshape(
      value_layer,
      [batch_size, to_seq_length, num_attention_heads, size_per_head])

  # `value_layer` = [B, N, T, H]
  value_layer = tf.transpose(value_layer, [0, 2, 1, 3])

  # `context_layer` = [B, N, F, H]
  context_layer = tf.matmul(attention_probs, value_layer)

  # `context_layer` = [B, F, N, H]
  context_layer = tf.transpose(context_layer, [0, 2, 1, 3])

  if do_return_2d_tensor:
    # `context_layer` = [B*F, N*H]
    context_layer = tf.reshape(
        context_layer,
        [batch_size * from_seq_length, num_attention_heads * size_per_head])
  else:
    # `context_layer` = [B, F, N*H]
    context_layer = tf.reshape(
        context_layer,
        [batch_size, from_seq_length, num_attention_heads * size_per_head])

  return context_layer

总结一下，attention layer 的主要流程：

对输入的 tensor 进行形状校验，提取batch_size、from_seq_length 、to_seq_length；
输入如果是 3d 张量则转化成 2d 矩阵；
from_tensor 作为 query， to_tensor 作为 key 和 value，经过一层全连接层后得到 query_layer、key_layer 、value_layer；
将上述张量通过transpose_for_scores转化成 multi-head；
根据论文公式计算 attention_score 以及 attention_probs（注意 attention_mask 的 trick）：

将得到的 attention_probs 与 value 相乘，返回 2D 或 3D 张量

6、Transformer

接下来的代码就是大名鼎鼎的 Transformer 的核心代码了，可以认为是"Attention is All You Need"原始代码重现。可以参见【原始论文^[6]】和【原始代码^[7]】。

def transformer_model(input_tensor,						# 【batch_size, seq_length, hidden_size】
                      attention_mask=None,				# 【batch_size, seq_length, seq_length】
                      hidden_size=768,
                      num_hidden_layers=12,
                      num_attention_heads=12,
                      intermediate_size=3072,
                      intermediate_act_fn=gelu,			# feed-forward层的激活函数
                      hidden_dropout_prob=0.1,
                      attention_probs_dropout_prob=0.1,
                      initializer_range=0.02,
                      do_return_all_layers=False):

  # 这里注意，因为最终要输出hidden_size， 我们有num_attention_head个区域，
  # 每个head区域有size_per_head多的隐层
  # 所以有 hidden_size = num_attention_head * size_per_head
  if hidden_size % num_attention_heads != 0:
    raise ValueError(
        "The hidden size (%d) is not a multiple of the number of attention "
        "heads (%d)" % (hidden_size, num_attention_heads))

  attention_head_size = int(hidden_size / num_attention_heads)
  input_shape = get_shape_list(input_tensor, expected_rank=3)
  batch_size = input_shape[0]
  seq_length = input_shape[1]
  input_width = input_shape[2]

  # 因为encoder中有残差操作，所以需要shape相同
  if input_width != hidden_size:
    raise ValueError("The width of the input tensor (%d) != hidden size (%d)" %
                     (input_width, hidden_size))

  # reshape操作在CPU/GPU上很快，但是在TPU上很不友好
  # 所以为了避免2D和3D之间的频繁reshape，我们把所有的3D张量用2D矩阵表示
  prev_output = reshape_to_matrix(input_tensor)

  all_layer_outputs = []
  for layer_idx in range(num_hidden_layers):
    with tf.variable_scope("layer_%d" % layer_idx):
      layer_input = prev_output

      with tf.variable_scope("attention"):
      # multi-head attention
        attention_heads = []
        with tf.variable_scope("self"):
        # self-attention
          attention_head = attention_layer(
              from_tensor=layer_input,
              to_tensor=layer_input,
              attention_mask=attention_mask,
              num_attention_heads=num_attention_heads,
              size_per_head=attention_head_size,
              attention_probs_dropout_prob=attention_probs_dropout_prob,
              initializer_range=initializer_range,
              do_return_2d_tensor=True,
              batch_size=batch_size,
              from_seq_length=seq_length,
              to_seq_length=seq_length)
          attention_heads.append(attention_head)

        attention_output = None
        if len(attention_heads) == 1:
          attention_output = attention_heads[0]
        else:
          # 如果有多个head，将他们拼接起来
          attention_output = tf.concat(attention_heads, axis=-1)

        # 对attention的输出进行线性映射, 目的是将shape变成与input一致
        # 然后dropout+residual+norm
        with tf.variable_scope("output"):
          attention_output = tf.layers.dense(
              attention_output,
              hidden_size,
              kernel_initializer=create_initializer(initializer_range))
          attention_output = dropout(attention_output, hidden_dropout_prob)
          attention_output = layer_norm(attention_output + layer_input)

      # feed-forward
      with tf.variable_scope("intermediate"):
        intermediate_output = tf.layers.dense(
            attention_output,
            intermediate_size,
            activation=intermediate_act_fn,
            kernel_initializer=create_initializer(initializer_range))

      # 对feed-forward层的输出使用线性变换变回‘hidden_size’
      # 然后dropout + residual + norm
      with tf.variable_scope("output"):
        layer_output = tf.layers.dense(
            intermediate_output,
            hidden_size,
            kernel_initializer=create_initializer(initializer_range))
        layer_output = dropout(layer_output, hidden_dropout_prob)
        layer_output = layer_norm(layer_output + attention_output)
        prev_output = layer_output
        all_layer_outputs.append(layer_output)

  if do_return_all_layers:
    final_outputs = []
    for layer_output in all_layer_outputs:
      final_output = reshape_from_matrix(layer_output, input_shape)
      final_outputs.append(final_output)
    return final_outputs
  else:
    final_output = reshape_from_matrix(prev_output, input_shape)
    return final_output

配上下图一同使用效果更佳，因为 BERT 里只有 encoder，所有 decoder 没有姓名

7、函数入口（init）

BertModel 类的构造函数，有了上面几节的铺垫，我们就可以来实现 BERT 模型了。

def __init__(self,
               config,							# BertConfig对象
               is_training,
               input_ids,						# 【batch_size, seq_length】
               input_mask=None,					# 【batch_size, seq_length】
               token_type_ids=None,				# 【batch_size, seq_length】
               use_one_hot_embeddings=False,	# 是否使用one-hot；否则tf.gather()
               scope=None):

    config = copy.deepcopy(config)
    if not is_training:
      config.hidden_dropout_prob = 0.0
      config.attention_probs_dropout_prob = 0.0

    input_shape = get_shape_list(input_ids, expected_rank=2)
    batch_size = input_shape[0]
    seq_length = input_shape[1]
	# 不做mask，即所有元素为1
    if input_mask is None:
      input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)

    if token_type_ids is None:
      token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)

    with tf.variable_scope(scope, default_name="bert"):
      with tf.variable_scope("embeddings"):
        # word embedding
        (self.embedding_output, self.embedding_table) = embedding_lookup(
            input_ids=input_ids,
            vocab_size=config.vocab_size,
            embedding_size=config.hidden_size,
            initializer_range=config.initializer_range,
            word_embedding_name="word_embeddings",
            use_one_hot_embeddings=use_one_hot_embeddings)

        # 添加position embedding和segment embedding
        # layer norm + dropout
        self.embedding_output = embedding_postprocessor(
            input_tensor=self.embedding_output,
            use_token_type=True,
            token_type_ids=token_type_ids,
            token_type_vocab_size=config.type_vocab_size,
            token_type_embedding_name="token_type_embeddings",
            use_position_embeddings=True,
            position_embedding_name="position_embeddings",
            initializer_range=config.initializer_range,
            max_position_embeddings=config.max_position_embeddings,
            dropout_prob=config.hidden_dropout_prob)

      with tf.variable_scope("encoder"):

        # input_ids是经过padding的word_ids：[25, 120, 34, 0, 0]
        # input_mask是有效词标记：[1, 1, 1, 0, 0]
        attention_mask = create_attention_mask_from_input_mask(
            input_ids, input_mask)

        # transformer模块叠加
        # `sequence_output` shape = [batch_size, seq_length, hidden_size].
        self.all_encoder_layers = transformer_model(
            input_tensor=self.embedding_output,
            attention_mask=attention_mask,
            hidden_size=config.hidden_size,
            num_hidden_layers=config.num_hidden_layers,
            num_attention_heads=config.num_attention_heads,
            intermediate_size=config.intermediate_size,
            intermediate_act_fn=get_activation(config.hidden_act),
            hidden_dropout_prob=config.hidden_dropout_prob,
            attention_probs_dropout_prob=config.attention_probs_dropout_prob,
            initializer_range=config.initializer_range,
            do_return_all_layers=True)

	  # `self.sequence_output`是最后一层的输出，shape为【batch_size, seq_length, hidden_size】
      self.sequence_output = self.all_encoder_layers[-1]

      # ‘pooler’部分将encoder输出【batch_size, seq_length, hidden_size】
      # 转成【batch_size, hidden_size】
      with tf.variable_scope("pooler"):
        # 取最后一层的第一个时刻[CLS]对应的tensor， 对于分类任务很重要
		# sequence_output[:, 0:1, :]得到的是[batch_size, 1, hidden_size]
		# 我们需要用squeeze把第二维去掉
        first_token_tensor = tf.squeeze(self.sequence_output[:, 0:1, :], axis=1)
        # 然后再加一个全连接层，输出仍然是[batch_size, hidden_size]
        self.pooled_output = tf.layers.dense(
            first_token_tensor,
            config.hidden_size,
            activation=tf.tanh,
            kernel_initializer=create_initializer(config.initializer_range))

总结一哈

有了以上对源码的深入了解之后，我们在使用 BertModel 的时候就会更加得心应手。举个模型使用的简单栗子：

# 假设输入已经经过分词变成word_ids. shape=[2, 3]
input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])
input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])
# segment_emebdding. 表示第一个样本前两个词属于句子1，后一个词属于句子2.
# 第二个样本的第一个词属于句子1， 第二次词属于句子2，第三个元素0表示padding
# 原始代码是下面这样的，但是感觉么必要用 2，不知道是不是我哪里没理解
token_type_ids = tf.constant([[0, 0, 1], [0, 2, 0]])

# 创建BertConfig实例
config = modeling.BertConfig(vocab_size=32000, hidden_size=512,
         num_hidden_layers=8, num_attention_heads=6, intermediate_size=1024)

# 创建BertModel实例
model = modeling.BertModel(config=config, is_training=True,
     input_ids=input_ids, input_mask=input_mask, token_type_ids=token_type_ids)


label_embeddings = tf.get_variable(...)
#得到最后一层的第一个Token也就是[CLS]向量表示，可以看成是一个句子的embedding
pooled_output = model.get_pooled_output()
logits = tf.matmul(pooled_output, label_embeddings)

在 BERT 模型构建这一块的主要流程：

对输入序列进行 Embedding（三个），接下去就是‘Attention is all you need’的内容了
简单一点就是将 embedding 输入 transformer 得到输出结果；
详细一点就是 embedding --> N *【multi-head attention --> Add(Residual) &Norm--> Feed-Forward --> Add(Residual) &Norm】；
哈，是不是很简单~
源码中还有一些其他的辅助函数，不是很难理解，这里就不再啰嗦。

以上~

本文参考资料

[1]

NLP 大杀器 BERT 模型解读: https://blog.csdn.net/Kaiyuan_sjtu/article/details/83991186

[2]

BERT 相关论文、文章和代码资源汇总: http://www.52nlp.cn/bert-paper-%E8%AE%BA%E6%96%87-%E6%96%87%E7%AB%A0-%E4%BB%A3%E7%A0%81%E8%B5%84%E6%BA%90%E6%B1%87%E6%80%BB

[3]

modeling.py 模块: https://github.com/google-research/bert/blob/master/modeling.py

[4]

参考这个 Issue: https://github.com/google-research/bert/issues/16

[5]

理解 Attention 机制原理及模型: https://blog.csdn.net/Kaiyuan_sjtu/article/details/81806123

[6]

原始论文: https://arxiv.org/abs/1706.03762

[7]

原始代码: https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py

往期精彩回顾




适合初学者入门人工智能的路线及资料下载机器学习在线手册深度学习在线手册AI基础下载（pdf更新到25集）备注：加入本站微信群或者qq群，请回复“加群”获取一折本站知识星球优惠券，请回复“知识星球”喜欢文章，点个在看

你可能感兴趣的:(BERT源码分析（PART I）)

回溯 Leetcode 332 重新安排行程 mmaerd Leetcode刷题学习记录 leetcode 算法职场和发展
重新安排行程Leetcode332学习记录自代码随想录给你一份航线列表tickets，其中tickets[i]=[fromi,toi]表示飞机出发和降落的机场地点。请你对该行程进行重新规划排序。所有这些机票都属于一个从JFK（肯尼迪国际机场）出发的先生，所以该行程必须从JFK开始。如果存在多种有效的行程，请你按字典排序返回最小的行程组合。例如，行程[“JFK”,“LGA”]与[“JFK”,“LGB
每日一题——第八十一题互联网打工人no1 C语言程序设计每日一练 c语言
打印如下图案:#includeintmain(){inti,j;charch='A';for(i=1;i<5;i++,ch++){for(j=0;j<5-i;j++){printf("");//控制空格输出}for(j=1;j<2*i;j++)//条件j<2*i{printf("%c",ch);//控制字符输出}printf("\n");}return0;}
每日一题——第八十三题互联网打工人no1 C语言程序设计每日一练 c语言
题目：将输入的整形数字输出,输出1990，输出"1990"#include#defineMAX_INPUT1024intmain(){intarrr_num[MAX_INPUT];intnum,i=0;printf("请输入一个数字：");scanf_s("%d",&num);while(num!=0){arrr_num[i++]=num%10;num/=10;}printf("\"");for(
C#中使用split分割字符串互联网打工人no1 c#
1、用字符串分隔：usingSystem.Text.RegularExpressions;stringstr="aaajsbbbjsccc";string[]sArray=Regex.Split(str,"js",RegexOptions.IgnoreCase);foreach(stringiinsArray)Response.Write(i.ToString()+"");输出结果：aaabbbc
python八股文面试题分享及解析(1) Shawn________ python
#1.'''a=1b=2不用中间变量交换a和b'''#1.a=1b=2a,b=b,aprint(a)print(b)结果：21#2.ll=[]foriinrange(3):ll.append({'num':i})print(11)结果:#[{'num':0},{'num':1},{'num':2}]#3.kk=[]a={'num':0}foriinrange(3):#0,12#可变类型，不仅仅改变
121. 买卖股票的最佳时机薄荷糖的味道_fb40
给定一个数组，它的第i个元素是一支给定股票第i天的价格。如果你最多只允许完成一笔交易（即买入和卖出一支股票），设计一个算法来计算你所能获取的最大利润。注意你不能在买入股票前卖出股票。示例1:输入:[7,1,5,3,6,4]输出:5解释:在第2天（股票价格=1）的时候买入，在第5天（股票价格=6）的时候卖出，最大利润=6-1=5。注意利润不能是7-1=6,因为卖出价格需要大于买入价格。示例2:输入:
libyuv之linux编译 jaronho Linux linux 运维服务器
文章目录一、下载源码二、编译源码三、注意事项1、银河麒麟系统（aarch64）（1）解决armv8-a+dotprod+i8mm指令集支持问题（2）解决armv9-a+sve2指令集支持问题一、下载源码到GitHub网站下载https://github.com/lemenkov/libyuv源码，或者用直接用git克隆到本地，如：gitclonehttps://github.com/lemenko
Rust基础知识 GRKF15 rust 开发语言后端
1.Rust语言简介1.1基础语法变量声明：let关键字用于声明变量，可以指定或不指定类型，如leta=10;和letmutc=30i32;。函数定义：使用fn关键字定义函数，并指定参数类型及返回类型，如fnadd(i:i32,j:i32)->i32{i+j}。控制流：包括if、else等，控制语句后需要使用;来结束语句。1.2数据类型整数类型：i8、i16、i32、i64、i128，以及无符号的
数据结构之哈希表 X同学的开始数据结构数据结构散列表
哈希表(散列表)出现的原因在顺序表中查找时，需要从表头开始，依次遍历比较a[i]与key的值是否相等，直到相等才返回索引i；在有序表中查找时，我们经常使用的是二分查找，通过比较key与a[i]的大小来折半查找，直到相等时才返回索引i。最终通过索引找到我们要找的元素。但是，这两种方法的效率都依赖于查找中比较的次数。我们有一种想法，能不能不经过比较，而是直接通过关键字key一次得到所要的结果呢？这时，
用Python实现读取统计单词个数程序媛了了 python 游戏 java
完整实例代码：fromcollectionsimportCounterdefpythonit():danci={}withopen("pythonit.txt","r",encoding="utf-8")asf:foriinf:words=i.strip().split()forwordinwords:ifwordnotindanci:danci[word]=1else:danci[word]+=
BART&BERT Ambition_LAO 深度学习
BART和BERT都是基于Transformer架构的预训练语言模型。模型架构：BERT(BidirectionalEncoderRepresentationsfromTransformers)主要是一个编码器（Encoder）模型，它使用了Transformer的编码器部分来处理输入的文本，并生成文本的表示。BERT特别擅长理解语言的上下文，因为它在预训练阶段使用了掩码语言模型（MLM）任务，即
python tif转png Python与遥感 python 开发语言
importosfromosgeoimportgdalimportnumpyasnpfromPILimportImage#提取432三波段fromspectralimport*#输入文件夹路径defget_img(dataset_img):width=dataset_img.RasterXSize#获取行列数height=dataset_img.RasterYSizebands=dataset_i
react-intl——react国际化使用方案苹果酱0567 面试题汇总与解析 java 开发语言中间件 spring boot 后端
国际化介绍i18n：internationalization国家化简称，首字母+首尾字母间隔的字母个数+尾字母，类似的还有k8s(Kubernetes)React-intl是React中最受欢迎的库。使用步骤安装#usenpmnpminstallreact-intl-D#useyarn项目入口文件配置//index.tsximportReactfrom"react";importReactDOMf
基于STM32的汽车仪表显示系统：集成CAN、UART与I2C总线设计流程极客小张 stm32 汽车嵌入式硬件物联网单片机 c语言
一、项目概述项目目标与用途本项目旨在设计和实现一个基于STM32微控制器的汽车仪表显示系统。该系统能够实时显示汽车的速度、转速、油量等关键信息，并通过CAN总线与其他汽车控制单元进行通信。这种仪表显示系统不仅提高了驾驶的安全性和便捷性，还能为汽车提供更智能的用户体验。技术栈关键词微控制器：STM32显示技术：TFTLCD/OLED传感器：速度传感器、温度传感器、油量传感器通信协议：CAN总线、UA
3286、穿越网格图的安全路径 Lenyiin 题解 c++算法 leetcode
3286、[中等]穿越网格图的安全路径1、题目描述给你一个mxn的二进制矩形grid和一个整数health表示你的健康值。你开始于矩形的左上角(0,0)，你的目标是矩形的右下角(m-1,n-1)。你可以在矩形中往上下左右相邻格子移动，但前提是你的健康值始终是正数。对于格子(i,j)，如果grid[i][j]=1，那么这个格子视为不安全的，会使你的健康值减少1。如果你可以到达最终的格子，请你返回tr
【数据结构-一维差分】力扣2848. 与车相交的点 hlc@ 数据结构数据结构 leetcode 算法
给你一个下标从0开始的二维整数数组nums表示汽车停放在数轴上的坐标。对于任意下标i，nums[i]=[starti,endi]，其中starti是第i辆车的起点，endi是第i辆车的终点。返回数轴上被车任意部分覆盖的整数点的数目。示例1：输入：nums=[[3,6],[1,5],[4,7]]输出：7解释：从1到7的所有点都至少与一辆车相交，因此答案为7。示例2：输入：nums=[[1,3],[5
✔2848. 与车相交的点程序员小小聪力扣 leetcode
代码实现：方法一：哈希表#definefmax(a,b)((a)>(b)?(a):(b))intnumberOfPoints(int**nums,intnumsSize,int*numsColSize){inthash[101]={0};intmax=0;for(inti=0;i=x){j--;}if(i=nums[i][0]){r=r>nums[i][1]?r:nums[i][1];}else{
AI大模型的架构演进与最新发展季风泯灭的季节 AI大模型应用技术二人工智能架构
随着深度学习的发展，AI大模型（LargeLanguageModels,LLMs）在自然语言处理、计算机视觉等领域取得了革命性的进展。本文将详细探讨AI大模型的架构演进，包括从Transformer的提出到GPT、BERT、T5等模型的历史演变，并探讨这些模型的技术细节及其在现代人工智能中的核心作用。一、基础模型介绍：Transformer的核心原理Transformer架构的背景在Transfo
斟一小组鸡血视频和自己一起成长
http://m.v.qq.com/play/play.html?coverid=&vid=c0518henl2a&ptag=2_6.0.0.14297_copy有一种努力叫做靠自己http://m.v.qq.com/play/play.html?coverid=&vid=i0547o426g4&ptag=2_6.0.0.14297_copy世界最励志短片https://v.qq.com/x/pa
You have an error in your SQL syntax； check the manual that corresponds to your MySQL server version 努力的菜鸟~ sql 数据库
YouhaveanerrorinyourSQLsyntax;checkthemanualthatcorrespondstoyourMySQLserverversionfortherightsyntaxtousenear‘IDENTIFIEDBY‘123456’WITHGRANTOPTION’atline1在mysql5.7之前GRANTALLPRIVILEGESON*.*TO'root'@'%'I
高级UI<第二十四篇>：Android中用到的矩阵常识 NoBugException
（1）定义在数学中，矩阵（Matrix）是一个按照长方阵列排列的复数或实数集合。由m×n个数aij排成的m行n列的数表称为m行n列的矩阵，简称m×n矩阵。记作：图片.png这m×n个数称为矩阵A的元素，简称为元，数aij位于矩阵A的第i行第j列，称为矩阵A的(i,j)元，以数aij为(i,j)元的矩阵可记为(aij)或(aij)m×n，m×n矩阵A也记作Amn。元素是实数的矩阵称为实矩阵，元素是复
linux脚本sed替换变量,sed 命令中替换值为shell变量诺坎普之约 linux脚本sed替换变量
文章目录sed命令中替换值为shell变量替换基本语法sed中替换使用shell变量总结参考文档sed命令中替换值为shell变量替换基本语法大家都是sed有很多用法，最多就应该是替换一些值了。让我们先回忆sed的替换语法。在sed进行替换的时候sed-i's/old/new/g'1.txtecho"hellooldfrank"|sed's/old/new/g'结果如下：hellonewfrank
前端代码上传文件余生逆风飞翔前端 javascript 开发语言
点击上传文件import{ElNotification}from'element-plus'import{API_CONFIG}from'../config/index.js'import{UploadFilled}from'@element-plus/icons-vue'import{reactive}from'vue'import{BASE_URL}from'../config/index'i
98_es生产集群部署之针对集群重启时的shard恢复耗时过长问题定制的重要参数小山居
98_es生产集群部署之针对集群重启时的shard恢复耗时过长问题定制的重要参数shardrecovery配置以及集群重启时的无意义shard重分配问题在集群重启的时候，有一些配置会影响shard恢复的过程。首先，我们需要理解默认配置下，shard恢复过程会发生什么事情。如果我们有10个node，每个node都有一个shard，可能是primaryshard或者replicashard，你有一个i
3.1 损失函数和优化：损失函数做只小考拉
用一个函数把W当做输入，然后看一下得分，定量地估计W的好坏，这个函数被称为“损失函数”。损失函数用于度量W的好坏。有了损失函数的概念后，就可以定量的衡量W到底是好还是坏，要找到一种有效的方法来从W的可行域里，找到W取何值时情况最不坏，，这个过程将会是一个优化过程。损失函数L_i定义：通过函数f给出预测的分数和真实的目标（或者说是标签y），可以定量的描述训练样本预测的好不好，最终的损失函数是在整个数
C语言---程序设计练习题目及学习方法1 Wanyu677 C语言 c语言学习方法算法
学习方法要多练习在这些题目中的代码和题目自己动手去敲练习也是在熟悉语法，写代码第一步就是熟悉语法练习是在锻炼编程思维，把实际问题转换为代码的能力学会画图画图去理解内存，理解指针这些比较难懂的知识画图可以更好的理清思路辅助理解，强化理解学会调试借助调试，更好的理解代码和感知代码找出代码中的bug和程序逻辑（1）自增自减运算符inta=5,b,c,i=10;b=a++;c=++b;printf("a=
Vicky的ScalersTalk第六轮新概念朗读持续力训练Day73 20210411 Vicky_b9de
练习材料：ModerncavemenPart-3ˈmɒdənˈkeɪvmənpɑːt-3Theyplungedintothelake,andafterloadingtheirgearonaninflatablerubberdinghy,letthecurrentcarrythemtotheotherside.Toprotectthemselvesfromtheicywater,theyhadtow
leetcode刷题day19|二叉树Part07（235. 二叉搜索树的最近公共祖先、701.二叉搜索树中的插入操作、450.删除二叉搜索树中的节点）小冉在学习 leetcode 算法数据结构
235.二叉搜索树的最近公共祖先思路：二叉搜索树首先考虑中序遍历。根据二叉搜索树的特性，如果p,q分别在中间节点的左右两边，该中间节点一定是最近公共祖先，如果在同一侧，则递归这一侧即可。递归三部曲：1、传入参数：根节点，p，q，返回节点。2、终止条件：因为p,q一定存在，所以不会遍历到树的最底层，因此可以不写终止条件3、递归逻辑：如果p,q均小于root的值，递归调用左子树；如果p,q均大于roo
golang 实现文件上传下载 wangwei830 go
Gin框架上传下载上传（支持批量上传）httpRouter.POST("/upload",func(ctx*gin.Context){forms,err:=ctx.MultipartForm()iferr!=nil{fmt.Println("error",err)}files:=forms.File["fileName"]for_,v:=rangefiles{iferr:=ctx.SaveUplo
自定义分区我的K8409 Hadoop hdfs hadoop 大数据
通过简单例子了解partition分区类的重写方法分区是在MR的过程中进行的，属于Shuffle阶段但是在Job端不要忘记进行调用：job.setPartitionerClass(xxx.class)按照年龄分区：classAgePartitionerextendsPartitioner{@OverridepublicintgetPartition(MyComparablekey,NullWrit
linux系统服务器下jsp传参数乱码 3213213333332132 java jsp linux windows xml
在一次解决乱码问题中，发现jsp在windows下用js原生的方法进行编码没有问题，但是到了linux下就有问题， escape,encodeURI,encodeURIComponent等都解决不了问题但是我想了下既然原生的方法不行，我用el标签的方式对中文参数进行加密解密总该可以吧。于是用了java的java.net.URLDecoder,结果还是乱码，最后在绝望之际，用了下面的方法解决了
Spring 注解区别以及应用 BlueSkator spring
1. @Autowired @Autowired是根据类型进行自动装配的。如果当Spring上下文中存在不止一个UserDao类型的bean，或者不存在UserDao类型的bean，会抛出 BeanCreationException异常，这时可以通过在该属性上再加一个@Qualifier注解来声明唯一的id解决问题。 2. @Qualifier 当spring中存在至少一个匹
printf和sprintf的应用 dcj3sjt126com PHP sprintf printf
<?php printf('b: %b c: %c d: %d <bf>f: %f', 80,80, 80, 80); echo ' '; printf('%0.2f %+d %0.2f ', 8, 8, 1235.456); printf('th
config.getInitParameter 171815164 parameter
web.xml <servlet> <servlet-name>servlet1</servlet-name> <jsp-file>/index.jsp</jsp-file> <init-param> <param-name>str</param-name>
Ant标签详解--基础操作 g21121 ant
Ant的一些核心概念： build.xml：构建文件是以XML 文件来描述的，默认构建文件名为build.xml。 project：每个构建文
[简单]代码片段_数据合并 53873039oycg 代码
合并规则:删除家长phone为空的记录,若一个家长对应多个孩子,保留一条家长记录,家长id修改为phone,对应关系也要修改。代码如下:
java 通信技术云端月影 Java 远程通信技术
在分布式服务框架中，一个最基础的问题就是远程服务是怎么通讯的，在Java领域中有很多可实现远程通讯的技术，例如：RMI、MINA、ESB、Burlap、Hessian、SOAP、EJB和JMS等，这些名词之间到底是些什么关系呢，它们背后到底是基于什么原理实现的呢，了解这些是实现分布式服务框架的基础知识，而如果在性能上有高的要求的话，那深入了解这些技术背后的机制就是必须的了，在这篇blog中我们将来
string与StringBuilder 性能差距到底有多大 aijuans
之前也看过一些对string与StringBuilder的性能分析，总感觉这个应该对整体性能不会产生多大的影响，所以就一直没有关注这块！由于学程序初期最先接触的string拼接，所以就一直没改变过自己的习惯！
今天碰到 java.util.ConcurrentModificationException 异常 antonyup_2006 java 多线程工作 IBM
今天改bug，其中有个实现是要对map进行循环，然后有删除操作，代码如下： Iterator<ListItem> iter = ItemMap.keySet.iterator(); while(iter.hasNext()){ ListItem it = iter.next(); //...一些逻辑操作 ItemMap.remove(it); } 结果运行报Con
PL/SQL的类型和JDBC操作数据库百合不是茶 PL/SQL表标量类型游标 PL/SQL记录
PL/SQL的标量类型: 字符,数字,时间,布尔,%type五中类型的 --标量：数据库中预定义类型的变量 --定义一个变长字符串 v_ename varchar2(10); --定义一个小数,范围 -9999.99~9999.99 v_sal number(6,2); --定义一个小数并给一个初始值为5.4 :=是pl/sql的赋值号
Mockito：一个强大的用于 Java 开发的模拟测试框架实例 bijian1013 mockito 单元测试
Mockito框架： Mockito是一个基于MIT协议的开源java测试框架。 Mockito区别于其他模拟框架的地方主要是允许开发者在没有建立“预期”时验证被测系统的行为。对于mock对象的一个评价是测试系统的测
精通Oracle10编程SQL(10)处理例外 bijian1013 oracle 数据库 plsql
/* *处理例外 */ --例外简介 --处理例外-传递例外 declare v_ename emp.ename%TYPE; begin SELECT ename INTO v_ename FROM emp where empno=&no; dbms_output.put_line('雇员名：'||v_ename); exceptio
【Java】Java执行远程机器上Linux命令 bit1129 linux命令
Java使用ethz通过ssh2执行远程机器Linux上命令，封装定义Linux机器的环境信息 package com.tom; import java.io.File; public class Env { private String hostaddr; //Linux机器的IP地址 private Integer po
java通信之Socket通信基础白糖_ java socket 网络协议
正处于网络环境下的两个程序，它们之间通过一个交互的连接来实现数据通信。每一个连接的通信端叫做一个Socket。一个完整的Socket通信程序应该包含以下几个步骤： ①创建Socket； ②打开连接到Socket的输入输出流； ④按照一定的协议对Socket进行读写操作； ④关闭Socket。 Socket通信分两部分：服务器端和客户端。服务器端必须优先启动，然后等待soc
angular.bind boyitech AngularJS angular.bind AngularJS API bind
angular.bind 描述：上下文，函数以及参数动态绑定，返回值为绑定之后的函数. 其中args是可选的动态参数，self在fn中使用this调用。使用方法： angular.bind(se
java-13个坏人和13个好人站成一圈，数到7就从圈里面踢出一个来，要求把所有坏人都给踢出来，所有好人都留在圈里。请找出初始时坏人站的位置。 bylijinnan java
import java.util.ArrayList; import java.util.List; public class KickOutBadGuys { /** * 题目：13个坏人和13个好人站成一圈，数到7就从圈里面踢出一个来，要求把所有坏人都给踢出来，所有好人都留在圈里。请找出初始时坏人站的位置。 * Maybe you can find out
Redis.conf配置文件及相关项说明（自查备用） Kai_Ge redis
Redis.conf配置文件及相关项说明 # Redis configuration file example # Note on units: when memory size is needed, it is possible to specifiy # it in the usual form of 1k 5GB 4M and so forth: #
[强人工智能]实现大规模拓扑分析是实现强人工智能的前奏 comsci 人工智能
真不好意思,各位朋友...博客再次更新... 节点数量太少,网络的分析和处理能力肯定不足,在面对机器人控制的需求方面,显得力不从心.... 但是,节点数太多,对拓扑数据处理的要求又很高,设计目标也很高,实现起来难度颇大...
记录一些常用的函数 dai_lm java
public static String convertInputStreamToString(InputStream is) { StringBuilder result = new StringBuilder(); if (is != null) try { InputStreamReader inputReader = new InputStreamRead
Hadoop中小规模集群的并行计算缺陷 datamachine mapreduce hadoop 并行计算
注：写这篇文章的初衷是因为Hadoop炒得有点太热，很多用户现有数据规模并不适用于Hadoop，但迫于扩容压力和去IOE（Hadoop的廉价扩展的确非常有吸引力）而尝试。尝试永远是件正确的事儿，但有时候不用太突进，可以调优或调需求，发挥现有系统的最大效用为上策。 -----------------------------------------------------------------
小学4年级英语单词背诵第二课 dcj3sjt126com english word
egg 蛋 twenty 二十 any 任何 well 健康的，好 twelve 十二 farm 农场 every 每一个 back 向后，回 fast 快速的 whose 谁的 much 许多 flower 花 watch 手表 very 非常，很 sport 运动 Chinese 中国的
自己实践了github的webhooks, linux上面的权限需要注意 dcj3sjt126com github webhook
环境, 阿里云服务器 1. 本地创建项目, push到github服务器上面 2. 生成www用户的密钥 sudo -u www ssh-keygen -t rsa -C "[email protected]" 3. 将密钥添加到github帐号的SSH_KEYS里面 3. 用www用户执行克隆, 源使
Java冒泡排序蕃薯耀冒泡排序 Java冒泡排序 Java排序
冒泡排序 >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 蕃薯耀 2015年6月23日 10:40:14 星期二 http://fanshuyao.iteye.com/
Excle读取数据转换为实体List【基于apache-poi】 hanqunfeng apache
1.依赖apache-poi 2.支持xls和xlsx 3.支持按属性名称绑定数据值 4.支持从指定行、列开始读取 5.支持同时读取多个sheet 6.具体使用方式参见org.cpframework.utils.excelreader.CP_ExcelReaderUtilTest.java 比如： Str
3个处于草稿阶段的Javascript API介绍 jackyrong JavaScript
原文： http://www.sitepoint.com/3-new-javascript-apis-may-want-follow/?utm_source=html5weekly&utm_medium=email 本文中，介绍3个仍然处于草稿阶段，但应该值得关注的Javascript API. 1) Web Alarm API &
6个创建Web应用程序的高效PHP框架 lampcy Web 框架 PHP
以下是创建Web应用程序的PHP框架，有coder bay网站整理推荐： 1. CakePHP CakePHP是一个PHP快速开发框架，它提供了一个用于开发、维护和部署应用程序的可扩展体系。CakePHP使用了众所周知的设计模式，如MVC和ORM，降低了开发成本，并减少了开发人员写代码的工作量。 2. CodeIgniter CodeIgniter是一个非常小且功能强大的PHP框架，适合需
评"救市后中国股市新乱象泛起"谣言 nannan408
首先来看百度百家一位易姓作者的新闻：三个多星期来股市持续暴跌，跌得投资者及上市公司都处于极度的恐慌和焦虑中，都要寻找自保及规避风险的方式。面对股市之危机，政府突然进入市场救市，希望以此来重建市场信心，以此来扭转股市持续暴跌的预期。而政府进入市场后，由于市场运作方式发生了巨大变化，投资者及上市公司为了自保及为了应对这种变化，中国股市新的乱象也自然产生。首先，中国股市这两天
页面全屏遮罩的实现方式 Rainbow702 html css 遮罩 mask
之前做了一个页面，在点击了某个按钮之后，要求页面出现一个全屏遮罩，一开始使用了position:absolute来实现的。当时因为画面大小是固定的，不可以resize的，所以，没有发现问题。最近用了同样的做法做了一个遮罩，但是画面是可以进行resize的，所以就发现了一个问题，当画面被reisze到浏览器出现了滚动条的时候，就发现，用absolute 的做法是有问题的。后来改成fixed定位就
关于angularjs的点滴 tntxia AngularJS
angular是一个新兴的JS框架，和以往的框架不同的事，Angularjs更注重于js的建模，管理，同时也提供大量的组件帮助用户组建商业化程序，是一种值得研究的JS框架。 Angularjs使我们可以使用MVC的模式来写JS。Angularjs现在由谷歌来维护。这里我们来简单的探讨一下它的应用。首先使用Angularjs我
Nutz--->>反复新建ioc容器的后果 xiaoxiao1992428 DAO mvc IOC nutz
问题： public class DaoZ { public static Dao dao() { // 每当需要使用dao的时候就取一次 Ioc ioc = new NutIoc(new JsonLoader("dao.js")); return ioc.get(