
Google BERT Pre-training Source Code Walkthrough (Part 2): Model Construction

Preface

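BERT's model is based mainly on the Transformer architecture (paper: Attention is all you need). It drops fixed patterns such as RNNs and handles Seq2Seq problems directly with attention, which reflects the idea that the simplest approach is often the best. There are many write-ups of this model online, but most of them are much alike. I recommend one Zhihu article, an explanation of 《Attention is all you need》, which I think introduces the Transformer very well.
The most frustrating part of a model is its tensor dimensions; once those are clear, the model is easy to understand. So I annotate the source with the shape of the tensor after each operation.
Now let's see how BERT's model file, modeling.py, builds the model. I have always felt that reading code and comments is the fastest way to understand something, so if some of the official comments are hard to follow, please refer to the comments and shape annotations I added.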

Source code walkthrough

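Model configuration parameters

  "attention_probs_dropout_prob": 0.1,   # dropout probability applied after the softmax in dot-product attention
  "hidden_act": "gelu",                  # activation function
  "hidden_dropout_prob": 0.1,            # dropout probability for the hidden layers
  "hidden_size": 768,                    # number of hidden units
  "initializer_range": 0.02,             # initialization range (stddev of the truncated normal)
  "intermediate_size": 3072,             # width of the feed-forward up-projection
  "max_position_embeddings": 512,        # must be at least seq_length; sizes the position_embedding table
  "num_attention_heads": 12,             # number of attention heads in each hidden layer
  "num_hidden_layers": 12,               # number of hidden layers
  "type_vocab_size": 2,                  # number of segment_ids classes, [0, 1]
  "vocab_size": 30522                    # number of tokens in the vocabulary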

The input parameters here, input_ids, input_mask and token_type_ids, correspond to the input_ids, input_mask and segment_ids produced in the previous article.
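
As a quick sanity check on the configuration values listed above, the sketch below derives two quantities that the later code depends on: the per-head width (hidden_size divided by num_attention_heads) and the sizes of the three embedding tables. The variable names are illustrative and not part of modeling.py, and only a subset of the configuration keys is repeated.

```python
import json

# A subset of the configuration values, written as plain JSON (no comments).
config_text = """
{
  "hidden_size": 768,
  "intermediate_size": 3072,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "type_vocab_size": 2,
  "vocab_size": 30522
}
"""
config = json.loads(config_text)

# hidden_size has to split evenly across the attention heads.
assert config["hidden_size"] % config["num_attention_heads"] == 0
size_per_head = config["hidden_size"] // config["num_attention_heads"]

# Number of parameters in each embedding table (rows * width).
word_table = config["vocab_size"] * config["hidden_size"]
position_table = config["max_position_embeddings"] * config["hidden_size"]
segment_table = config["type_vocab_size"] * config["hidden_size"]

print("size_per_head:", size_per_head)
print("embedding table sizes:", word_table, position_table, segment_table)
```

With these values each head works on a 64-dimensional slice, and the feed-forward layer is four times as wide as the hidden layer.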

BertModel

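This part is the overall flow. The whole modeling script has more than 900 lines of code, so I lay out the flow and walk through it one step at a time. The overall flow is as follows: first, input_ids and token_type_ids are embedded; the embedding result is then fed into the Transformer, which finally produces the encoded output.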

def __init__(self,
               config,
               is_training,
               input_ids,
               input_mask=None,
               token_type_ids=None,
               use_one_hot_embeddings=True,
               scope=None):
    """Constructor for BertModel.
    Args:
      config: `BertConfig` instance.
      is_training: bool. true for training model, false for eval model. Controls
        whether dropout will be applied.
      input_ids: int32 Tensor of shape [batch_size, seq_length].
      input_mask: (optional) int32 Tensor of shape [batch_size, seq_length].
      token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].
      use_one_hot_embeddings: (optional) bool. Whether to use one-hot word
        embeddings or tf.embedding_lookup() for the word embeddings. On the TPU,
        it is much faster if this is True, on the CPU or GPU, it is faster if
        this is False.
      scope: (optional) variable scope. Defaults to "bert".
    Raises:
      ValueError: The config is invalid or one of the input tensor shapes
        is invalid.
    """
    config = copy.deepcopy(config)
    if not is_training:
      config.hidden_dropout_prob = 0.0
      config.attention_probs_dropout_prob = 0.0

    input_shape = get_shape_list(input_ids, expected_rank=2)
    batch_size = input_shape[0]
    seq_length = input_shape[1]

    if input_mask is None:
      input_mask = tf.ones(shape=[batch_size, seq_length], dtype=tf.int32)

    if token_type_ids is None:
      token_type_ids = tf.zeros(shape=[batch_size, seq_length], dtype=tf.int32)

    with tf.variable_scope(scope, default_name="bert"):
      with tf.variable_scope("embeddings"):
        # Perform embedding lookup on the word ids.
        
        # Returns embedding_output [batch_size, seq_length, embedding_size]
        # and embedding_table [vocab_size, embedding_size].
        (self.embedding_output, self.embedding_table) = embedding_lookup(    #word_embedding
            input_ids=input_ids,           #[batch_size,seq_length]
            vocab_size=config.vocab_size,
            embedding_size=config.hidden_size,
            initializer_range=config.initializer_range,
            word_embedding_name="word_embeddings",
            use_one_hot_embeddings=use_one_hot_embeddings)

        # Add positional embeddings and token type embeddings, then layer
        # normalize and perform dropout.
        # Adds token_type (segment) and position embeddings. Output: [batch_size, seq_length, embedding_size]
        self.embedding_output = embedding_postprocessor(
            input_tensor=self.embedding_output,
            use_token_type=True,
            token_type_ids=token_type_ids,
            token_type_vocab_size=config.type_vocab_size,
            token_type_embedding_name="token_type_embeddings",
            use_position_embeddings=True,
            position_embedding_name="position_embeddings",
            initializer_range=config.initializer_range,
            max_position_embeddings=config.max_position_embeddings,
            dropout_prob=config.hidden_dropout_prob)

      with tf.variable_scope("encoder"):
        # This converts a 2D mask of shape [batch_size, seq_length] to a 3D
        # mask of shape [batch_size, seq_length, seq_length] which is used
        # for the attention scores.
        attention_mask = create_attention_mask_from_input_mask(     
            input_ids, input_mask)

        # Run the stacked transformer.
        # `sequence_output` shape = [batch_size, seq_length, hidden_size].
        self.all_encoder_layers = transformer_model(        # list of [batch_size, seq_length, hidden_size] tensors, one per layer
            input_tensor=self.embedding_output,
            attention_mask=attention_mask,
            hidden_size=config.hidden_size,
            num_hidden_layers=config.num_hidden_layers,
            num_attention_heads=config.num_attention_heads,
            intermediate_size=config.intermediate_size,
            intermediate_act_fn=get_activation(config.hidden_act),
            hidden_dropout_prob=config.hidden_dropout_prob,
            attention_probs_dropout_prob=config.attention_probs_dropout_prob,
            initializer_range=config.initializer_range,
            do_return_all_layers=True)

      self.sequence_output = self.all_encoder_layers[-1]     # output of the last layer
      # The "pooler" converts the encoded sequence tensor of shape
      # [batch_size, seq_length, hidden_size] to a tensor of shape
      # [batch_size, hidden_size]. This is necessary for segment-level
      # (or segment-pair-level) classification tasks where we need a fixed
      # dimensional representation of the segment.
      with tf.variable_scope("pooler"):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token. We assume that this has been pre-trained
        # Take the encoding of the first token ([CLS]) of each example; it carries
        # the information of the whole input. [batch_size, hidden_size]
        first_token_tensor = tf.squeeze(self.sequence_output[:, 0:1, :], axis=1)
        self.pooled_output = tf.layers.dense(     # dense layer on top of it; output is [batch_size, hidden_size]
            first_token_tensor,
            config.hidden_size,
            activation=tf.tanh,
            kernel_initializer=create_initializer(config.initializer_range))
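
The pooler at the end of the constructor can be sketched in plain Python, with nested lists standing in for tensors. pool_first_token is a hypothetical helper, not a function in modeling.py; it takes the hidden state of the first token of each example and applies a dense layer with tanh, as the code above does.

```python
import math

def pool_first_token(sequence_output, weights, bias):
    """Hypothetical stand-in for the pooler.

    sequence_output: [batch_size, seq_length, hidden_size] as nested lists.
    weights: [hidden_size, hidden_size]; bias: [hidden_size].
    Returns [batch_size, hidden_size].
    """
    pooled = []
    for example in sequence_output:
        first = example[0]  # hidden state of the first token ([CLS])
        row = []
        for j in range(len(bias)):
            z = sum(first[i] * weights[i][j] for i in range(len(first))) + bias[j]
            row.append(math.tanh(z))
        pooled.append(row)
    return pooled

# Toy sizes: batch_size=2, seq_length=3, hidden_size=2.
sequence_output = [
    [[1.0, 0.0], [5.0, 5.0], [7.0, 7.0]],
    [[0.0, 2.0], [9.0, 9.0], [3.0, 3.0]],
]
identity = [[1.0, 0.0], [0.0, 1.0]]
pooled = pool_first_token(sequence_output, identity, [0.0, 0.0])
print(pooled)
```

Only position 0 of each example contributes to the pooled vector; the other positions are ignored at this stage.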

word embedding

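First, the word_embedding part. It takes input_ids and returns the embedding result, either through a one-hot matrix multiplication (when use_one_hot_embeddings is True) or through tf.nn.embedding_lookup.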

def embedding_lookup(input_ids,
                     vocab_size,
                     embedding_size=128,
                     initializer_range=0.02,
                     word_embedding_name="word_embeddings",
                     use_one_hot_embeddings=False):
  """Looks up words embeddings for id tensor.
  Args:
    input_ids: int32 Tensor of shape [batch_size, seq_length] containing word
      ids.
    vocab_size: int. Size of the embedding vocabulary.
    embedding_size: int. Width of the word embeddings.
    initializer_range: float. Embedding initialization range.
    word_embedding_name: string. Name of the embedding table.
    use_one_hot_embeddings: bool. If True, use one-hot method for word
      embeddings. If False, use `tf.nn.embedding_lookup()`. One hot is better
      for TPUs.
  Returns:
    float Tensor of shape [batch_size, seq_length, embedding_size].
  """
  # This function assumes that the input is of shape [batch_size, seq_length,
  # num_inputs].
  #
  # If the input is a 2D tensor of shape [batch_size, seq_length], we
  # reshape to [batch_size, seq_length, 1].
  if input_ids.shape.ndims == 2:
    input_ids = tf.expand_dims(input_ids, axis=[-1])                 # add a trailing axis: [batch_size, seq_length, 1]

  embedding_table = tf.get_variable(
      name=word_embedding_name,
      shape=[vocab_size, embedding_size],
      initializer=create_initializer(initializer_range))

  if use_one_hot_embeddings:
    flat_input_ids = tf.reshape(input_ids, [-1])      #[batch_size*seq_length]
    one_hot_input_ids = tf.one_hot(flat_input_ids, depth=vocab_size)    #[batch_size*seq_length,vocab_size]
    output = tf.matmul(one_hot_input_ids, embedding_table)              #[batch_size*seq_length,embedding_size]
  else:
    output = tf.nn.embedding_lookup(embedding_table, input_ids)

  input_shape = get_shape_list(input_ids)

  output = tf.reshape(output,
                      input_shape[0:-1] + [input_shape[-1] * embedding_size])   #[batch_size,seq_length,embedding_size]
  return (output, embedding_table)
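
To see why the one-hot branch and the tf.nn.embedding_lookup branch give the same result, here is a minimal pure-Python sketch. Nested lists stand in for tensors and the helper names are made up for illustration. Multiplying a one-hot row by the table simply selects one row of the table.

```python
def one_hot(index, depth):
    """Return a list of length `depth` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if i == index else 0.0 for i in range(depth)]

def lookup_by_matmul(ids, table):
    """One-hot route: [num_ids, vocab_size] x [vocab_size, width]."""
    vocab_size, width = len(table), len(table[0])
    out = []
    for token_id in ids:
        oh = one_hot(token_id, vocab_size)
        out.append([sum(oh[v] * table[v][d] for v in range(vocab_size))
                    for d in range(width)])
    return out

def lookup_by_index(ids, table):
    """Direct route: pick the rows by index."""
    return [list(table[token_id]) for token_id in ids]

# Toy table: vocab_size=4, embedding_size=2.
table = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
flat_ids = [2, 0, 3]  # plays the role of flat_input_ids

by_matmul = lookup_by_matmul(flat_ids, table)
by_index = lookup_by_index(flat_ids, table)
print(by_matmul == by_index)
```

The two routes differ only in cost: the matrix multiplication touches every row of the table, which the docstring above says is the better trade-off on TPUs.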

embedding_postprocessor

Next is embedding_postprocessor, which covers token_type_embedding and position_embedding, that is, the Segment Embeddings and Position Embeddings in the figure.

The Position Embeddings here differ from those in the original Transformer, however: in this code they are learned during training, whereas the original Transformer uses fixed sinusoidal values.
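
For comparison, the fixed encoding of the original Transformer paper can be sketched as follows: for position pos and dimension pair index i, even dimensions use sin(pos / 10000^(2i/d)) and odd dimensions use cos of the same angle. The function name is an assumption for illustration; BERT's code has no such function, since it learns a [max_position_embeddings, width] table instead.

```python
import math

def sinusoidal_position_encoding(seq_length, width):
    """Fixed (non-learned) position encoding from "Attention is all you need"."""
    table = []
    for pos in range(seq_length):
        row = []
        for d in range(width):
            # Dimensions 2i and 2i+1 share the same angle; d - d % 2 equals 2i.
            angle = pos / (10000 ** ((d - d % 2) / width))
            row.append(math.sin(angle) if d % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

fixed = sinusoidal_position_encoding(seq_length=4, width=6)
for row in fixed:
    print([round(v, 3) for v in row])
```

The table depends only on seq_length and width, so nothing is trained. The learned table in the code below has the same shape but is created with tf.get_variable.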

def embedding_postprocessor(input_tensor,      #[batch_size,seq_length,embedding_size]
                            use_token_type=False,
                            token_type_ids=None,             #[batch_size,seq_length]
                            token_type_vocab_size=16,
                            token_type_embedding_name="token_type_embeddings",
                            use_position_embeddings=True,
                            position_embedding_name="position_embeddings",
                            initializer_range=0.02,
                            max_position_embeddings=512,
                            dropout_prob=0.1):
  """Performs various post-processing on a word embedding tensor.
  Args:
    input_tensor: float Tensor of shape [batch_size, seq_length,
      embedding_size].
    use_token_type: bool. Whether to add embeddings for `token_type_ids`.
    token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].
      Must be specified if `use_token_type` is True.
    token_type_vocab_size: int. The vocabulary size of `token_type_ids`.
    token_type_embedding_name: string. The name of the embedding table variable
      for token type ids.
    use_position_embeddings: bool. Whether to add position embeddings for the
      position of each token in the sequence.
    position_embedding_name: string. The name of the embedding table variable
      for positional embeddings.
    initializer_range: float. Range of the weight initialization.
    max_position_embeddings: int. Maximum sequence length that might ever be
      used with this model. This can be longer than the sequence length of
      input_tensor, but cannot be shorter.
    dropout_prob: float. Dropout probability applied to the final output tensor.
  Returns:
    float tensor with same shape as `input_tensor`.
  Raises:
    ValueError: One of the tensor shapes or input values is invalid.
  """
  input_shape = get_shape_list(input_tensor, expected_rank=3)
  batch_size = input_shape[0]
  seq_length = input_shape[1]
  width = input_shape[2]

  output = input_tensor

  if use_token_type:     # segment embeddings
    if token_type_ids is None:
      raise ValueError("`token_type_ids` must be specified if "
                       "`use_token_type` is True.")
    token_type_table = tf.get_variable(
        name=token_type_embedding_name,
        shape=[token_type_vocab_size, width],
        initializer=create_initializer(initializer_range))
    # This vocab will be small so we always do one-hot here, since it is always
    # faster for a small vocabulary.
    flat_token_type_ids = tf.reshape(token_type_ids, [-1])     #[batch_size*seq_length]
    one_hot_ids = tf.one_hot(flat_token_type_ids, depth=token_type_vocab_size)   # [batch_size*seq_length, 2]; token_type is only 0 or 1
    token_type_embeddings = tf.matmul(one_hot_ids, token_type_table)     #[batch_size*seq_length,embedding_size]
    token_type_embeddings = tf.reshape(token_type_embeddings,
                                       [batch_size, seq_length, width])     #[batch_size, seq_length, width=embedding_size]
    output += token_type_embeddings         #[batch_size, seq_length, embedding_size]

  if use_position_embeddings:       # position embeddings
    assert_op = tf.assert_less_equal(seq_length, max_position_embeddings)  # ensure seq_length is at most max_position_embeddings
    with tf.control_dependencies([assert_op]):
      full_position_embeddings = tf.get_variable(
          name=position_embedding_name,
          shape=[max_position_embeddings, width],
          initializer=create_initializer(initializer_range))
      # Since the position embedding table is a learned variable, we create it
      # using a (long) sequence length `max_position_embeddings`. The actual
      # sequence length might be shorter than this, for faster training of
      # tasks that do not have long sequences.
      #
      # So `full_position_embeddings` is effectively an embedding table
      # for position [0, 1, 2, ..., max_position_embeddings-1], and the current
      # sequence has positions [0, 1, 2, ... seq_length-1], so we can just
      # perform a slice.
      position_embeddings = tf.slice(full_position_embeddings, [0, 0],     #[seq_length,embedding_size]
                                     [seq_length, -1])
      num_dims = len(output.shape.as_list())

      # Only the last two dimensions are relevant (`seq_length` and `width`), so
      # we broadcast among the first dimensions, which is typically just
      # the batch size.
      position_broadcast_shape = []
      for _ in range(num_dims - 2):
        position_broadcast_shape.append(1)
      position_broadcast_shape.extend([seq_length, width])      #[1,seq_length,embedding_size]
      position_embeddings = tf.reshape(position_embeddings,     #[1,seq_length,embedding_size]
                                       position_broadcast_shape)
      # [batch_size, seq_length, embedding_size] + [1, seq_length, embedding_size].
      # Every example in the batch uses the same position embedding at a given
      # position, so broadcasting adds the same table to each of the batch_size examples.
      output += position_embeddings

  output = layer_norm_and_dropout(output, dropout_prob)
  return output
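
The slice-and-broadcast step for the position embeddings can be sketched in plain Python. add_position_embeddings is a hypothetical helper with nested lists in place of tensors. It keeps the first seq_length rows of the full table, which is what tf.slice does above, and adds the same rows to every example in the batch.

```python
def add_position_embeddings(token_embeddings, full_position_table):
    """token_embeddings: [batch_size, seq_length, width].
    full_position_table: [max_position_embeddings, width].
    Returns a new [batch_size, seq_length, width] list."""
    seq_length = len(token_embeddings[0])
    if seq_length > len(full_position_table):
        raise ValueError("seq_length exceeds max_position_embeddings")
    position_embeddings = full_position_table[:seq_length]  # the slice step
    return [
        [
            [x + p for x, p in zip(token_vec, pos_vec)]
            for token_vec, pos_vec in zip(example, position_embeddings)
        ]
        for example in token_embeddings
    ]

# Toy sizes: batch_size=2, seq_length=2, width=2, max_position_embeddings=3.
token_embeddings = [[[1.0, 1.0], [2.0, 2.0]], [[3.0, 3.0], [4.0, 4.0]]]
full_table = [[0.5, 0.0], [0.0, 0.5], [9.0, 9.0]]
summed = add_position_embeddings(token_embeddings, full_table)
print(summed)
```

The third row of the table is never used here because the sequences are shorter than max_position_embeddings.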

Transformer

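After the embedding step, an attention_mask is built first. It expands the original input_mask from [batch_size, seq_length] to [batch_size, from_seq_length, to_seq_length], so that every from position has its own copy of the input_mask. Both are then passed to the transformer model.
[Figure: overall Transformer architecture]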
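
The mask expansion can be sketched in plain Python. This is a minimal illustration, assuming (as the description above states) that every from position receives the same copy of the example's input_mask. create_attention_mask is a made-up name standing in for create_attention_mask_from_input_mask, whose source is not shown in this article.

```python
def create_attention_mask(input_mask):
    """input_mask: [batch_size, seq_length] of 0/1 values.
    Returns [batch_size, from_seq_length, to_seq_length]."""
    return [
        [list(example_mask) for _ in example_mask]  # one copy per from position
        for example_mask in input_mask
    ]

# Two examples of length 3; zeros mark padding.
input_mask = [[1, 1, 0], [1, 0, 0]]
attention_mask = create_attention_mask(input_mask)
for example in attention_mask:
    print(example)
```

Each row says which to positions a given from position may attend to, so padded to positions are blocked for every query.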

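Now let's look at transformer_model. It first applies multi-head attention to the embedding, then adds the residual connection to the input and applies layer_norm. The result is passed to the feed-forward block, followed again by a residual connection and layer_norm.
One point that can look different from the diagram in the original paper: after multi-head attention the code applies a dense layer before the residual connection and layer_norm. This is the output projection W^O in the paper's definition MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O, which the architecture diagram does not draw as a separate box. The code follows, with the key parts commented.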
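
The feed-forward block uses the activation named by hidden_act, which is gelu in the configuration. As a sketch, here are the exact form based on the Gaussian CDF and the widely used tanh approximation, written with the standard library only. Which of the two a given version of modeling.py uses is not shown in this article, so treat both as illustrations.

```python
import math

def gelu_exact(x):
    """x * Phi(x), where Phi is the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    """Tanh approximation of the same function."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(gelu_exact(x), 4), round(gelu_tanh(x), 4))
```

Unlike ReLU, gelu lets small negative inputs through with a small negative output instead of clamping them to zero.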

def transformer_model(input_tensor,
                      attention_mask=None,   #[batch_size,from_seq_length,to_seq_length]
                      hidden_size=768,
                      num_hidden_layers=12,
                      num_attention_heads=12,
                      intermediate_size=3072,
                      intermediate_act_fn=gelu,
                      hidden_dropout_prob=0.1,
                      attention_probs_dropout_prob=0.1,
                      initializer_range=0.02,
                      do_return_all_layers=False):
  """Multi-headed, multi-layer Transformer from "Attention is All You Need".
  This is almost an exact implementation of the original Transformer encoder.
  See the original paper:
  https://arxiv.org/abs/1706.03762
  Also see:
  https://github.com/tensorflow/tensor2tensor/blob/master/tensor2tensor/models/transformer.py
  Args:
    input_tensor: float Tensor of shape [batch_size, seq_length, hidden_size].
    attention_mask: (optional) int32 Tensor of shape [batch_size, seq_length,
      seq_length], with 1 for positions that can be attended to and 0 in
      positions that should not be.
    hidden_size: int. Hidden size of the Transformer.
    num_hidden_layers: int. Number of layers (blocks) in the Transformer.
    num_attention_heads: int. Number of attention heads in the Transformer.
    intermediate_size: int. The size of the "intermediate" (a.k.a., feed
      forward) layer.
    intermediate_act_fn: function. The non-linear activation function to apply
      to the output of the intermediate/feed-forward layer.
    hidden_dropout_prob: float. Dropout probability for the hidden layers.
    attention_probs_dropout_prob: float. Dropout probability of the attention
      probabilities.
    initializer_range: float. Range of the initializer (stddev of truncated
      normal).
    do_return_all_layers: Whether to also return all layers or just the final
      layer.
  Returns:
    float Tensor of shape [batch_size, seq_length, hidden_size], the final
    hidden layer of the Transformer.
  Raises:
    ValueError: A Tensor shape or parameter is invalid.
  """
  if hidden_size % num_attention_heads != 0:
    raise ValueError(
        "The hidden size (%d) is not a multiple of the number of attention "
        "heads (%d)" % (hidden_size, num_attention_heads))

  attention_head_size = int(hidden_size / num_attention_heads)
  input_shape = get_shape_list(input_tensor, expected_rank=3)
  batch_size = input_shape[0]
  seq_length = input_shape[1]
  input_width = input_shape[2]

  # The Transformer performs sum residuals on all layers so the input needs
  # to be the same as the hidden size.
  if input_width != hidden_size:
    raise ValueError("The width of the input tensor (%d) != hidden size (%d)" %
                     (input_width, hidden_size))

  # We keep the representation as a 2D tensor to avoid re-shaping it back and
  # forth from a 3D tensor to a 2D tensor. Re-shapes are normally free on
  # the GPU/CPU but may not be free on the TPU, so we want to minimize them to
  # help the optimizer.
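  # As the official comment says, reshape to 2D once up front to avoid going back
  # and forth between 3D and 2D; 3D is restored at the end. [batch_size*seq_length, hidden_size]
  prev_output = reshape_to_matrix(input_tensor)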

  all_layer_outputs = []
  for layer_idx in range(num_hidden_layers):
    with tf.variable_scope("layer_%d" % layer_idx):
      layer_input = prev_output

      with tf.variable_scope("attention"):
        attention_heads = []
        with tf.variable_scope("self"):
          attention_head = attention_layer(            # self-attention, i.e. multi-head attention
              from_tensor=layer_input,      #[batch_size*seq_length,hidden_size]
              to_tensor=layer_input,        #[batch_size*seq_length,hidden_size]
              attention_mask=attention_mask,
              num_attention_heads=num_attention_heads,
              size_per_head=attention_head_size,
              attention_probs_dropout_prob=attention_probs_dropout_prob,
              initializer_range=initializer_range,
              do_return_2d_tensor=True,
              batch_size=batch_size,
              from_seq_length=seq_length,
              to_seq_length=seq_length)
          attention_heads.append(attention_head)

        attention_output = None
        if len(attention_heads) == 1:
          attention_output = attention_heads[0]
        else:
          # In the case where we have other sequences, we just concatenate
          # them to the self-attention head before the projection.
          attention_output = tf.concat(attention_heads, axis=-1)

        # Run a linear projection of `hidden_size` then add a residual
        # with `layer_input`.
        with tf.variable_scope("output"):
          attention_output = tf.layers.dense(                  # dense projection of the attention output
              attention_output,
              hidden_size,
              kernel_initializer=create_initializer(initializer_range))
          attention_output = dropout(attention_output, hidden_dropout_prob)   
          attention_output = layer_norm(attention_output + layer_input)         # residual connection and layer_norm
      # Feed-forward: project up to intermediate_size, then back down to hidden_size.
      # The activation is only applied to the "intermediate" hidden layer.
      with tf.variable_scope("intermediate"):
        intermediate_output = tf.layers.dense(             # project up to intermediate_size
            attention_output,
            intermediate_size,
            activation=intermediate_act_fn,
            kernel_initializer=create_initializer(initializer_range))

      # Down-project back to `hidden_size` then add the residual.
      with tf.variable_scope("output"):                       # project back down to hidden_size
        layer_output = tf.layers.dense(
            intermediate_output,
            hidden_size,
            kernel_initializer=create_initializer(initializer_range))
        layer_output = dropout(layer_output, hidden_dropout_prob)
        layer_output = layer_norm(layer_output + attention_output)    # residual connection and layer_norm
        prev_output = layer_output                 # this layer's output is the next layer's input
        all_layer_outputs.append(layer_output)       # collect the outputs of all layers

  if do_return_all_layers:
    final_outputs = []
    for layer_output in all_layer_outputs:
      final_output = reshape_from_matrix(layer_output, input_shape)
      final_outputs.append(final_output)
    return final_outputs
  else:
    final_output = reshape_from_matrix(prev_output, input_shape)
    return final_output
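
The "add the residual, then layer_norm" pattern that closes both sublayers can be sketched for a single token vector. This is a simplified illustration: the learned scale and shift of a real layer norm are omitted, the epsilon value is an assumption, and add_and_norm is a made-up name.

```python
import math

def layer_norm(vector, epsilon=1e-12):
    """Normalize one vector to zero mean and unit variance.
    A real layer norm also applies a learned scale and shift (omitted here)."""
    mean = sum(vector) / len(vector)
    variance = sum((x - mean) ** 2 for x in vector) / len(vector)
    return [(x - mean) / math.sqrt(variance + epsilon) for x in vector]

def add_and_norm(sublayer_output, sublayer_input):
    """Residual connection followed by layer normalization."""
    return layer_norm([o + i for o, i in zip(sublayer_output, sublayer_input)])

normalized = add_and_norm([0.5, -1.0, 2.0, 0.0], [1.0, 1.0, 1.0, 1.0])
print([round(v, 4) for v in normalized])
```

Normalization runs over the hidden dimension of each token separately, so it works the same on the 2D [batch_size*seq_length, hidden_size] layout used inside transformer_model.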

self_attention

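Next comes the self_attention mechanism. It uses multiplicative (dot-product) attention in which the sequence attends to itself, so that every token carries global semantic information. It also uses multi-head attention: hidden_size is split evenly into several parts (heads), each head performs self_attention, and different heads learn the semantics of different subspaces.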

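The code follows, with the key parts commented. The input is first projected into query, key and value, which are reshaped to [batch_size, num_head, seq_length, size_per_head]. Dot-product attention is then computed per head, and the result after softmax is multiplied by value. Finally a tensor of shape [batch_size*seq_length, hidden_size] is returned.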
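
The reshape into heads can be sketched in plain Python. The function mirrors the parameter order of the transpose_for_scores helper in the code below, but works on nested lists instead of tensors, so it is an illustration and not the library code.

```python
def transpose_for_scores(matrix, batch_size, num_attention_heads, seq_length, width):
    """[batch_size*seq_length, num_attention_heads*width]
    -> [batch_size, num_attention_heads, seq_length, width]."""
    out = []
    for b in range(batch_size):
        heads = []
        for n in range(num_attention_heads):
            tokens = []
            for s in range(seq_length):
                row = matrix[b * seq_length + s]
                # Head n owns the n-th contiguous slice of the hidden vector.
                tokens.append(row[n * width:(n + 1) * width])
            heads.append(tokens)
        out.append(heads)
    return out

# batch_size=1, seq_length=2, two heads of width 2 (hidden_size=4).
flat = [[1, 2, 3, 4],
        [5, 6, 7, 8]]
per_head = transpose_for_scores(flat, 1, 2, 2, 2)
print(per_head)
```

Each head ends up with its own [seq_length, width] matrix, which is what lets the later matmul compute all heads in one batched operation.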

def attention_layer(from_tensor,      # for self-attention, from_tensor and to_tensor are both the layer input [batch_size*seq_length,hidden_size]
                    to_tensor,
                    attention_mask=None,    #[batch_size,from_seq_length,to_seq_length]
                    num_attention_heads=1,
                    size_per_head=512,
                    query_act=None,
                    key_act=None,
                    value_act=None,
                    attention_probs_dropout_prob=0.0,
                    initializer_range=0.02,
                    do_return_2d_tensor=False,
                    batch_size=None,
                    from_seq_length=None,
                    to_seq_length=None):
  """Performs multi-headed attention from `from_tensor` to `to_tensor`.
  This is an implementation of multi-headed attention based on "Attention
  is all you Need". If `from_tensor` and `to_tensor` are the same, then
  this is self-attention. Each timestep in `from_tensor` attends to the
  corresponding sequence in `to_tensor`, and returns a fixed-width vector.
  This function first projects `from_tensor` into a "query" tensor and
  `to_tensor` into "key" and "value" tensors. These are (effectively) a list
  of tensors of length `num_attention_heads`, where each tensor is of shape
  [batch_size, seq_length, size_per_head].
  Then, the query and key tensors are dot-producted and scaled. These are
  softmaxed to obtain attention probabilities. The value tensors are then
  interpolated by these probabilities, then concatenated back to a single
  tensor and returned.
  In practice, the multi-headed attention are done with transposes and
  reshapes rather than actual separate tensors.
  Args:
    from_tensor: float Tensor of shape [batch_size, from_seq_length,
      from_width].
    to_tensor: float Tensor of shape [batch_size, to_seq_length, to_width].
    attention_mask: (optional) int32 Tensor of shape [batch_size,
      from_seq_length, to_seq_length]. The values should be 1 or 0. The
      attention scores will effectively be set to -infinity for any positions in
      the mask that are 0, and will be unchanged for positions that are 1.
    num_attention_heads: int. Number of attention heads.
    size_per_head: int. Size of each attention head.
    query_act: (optional) Activation function for the query transform.
    key_act: (optional) Activation function for the key transform.
    value_act: (optional) Activation function for the value transform.
    attention_probs_dropout_prob: (optional) float. Dropout probability of the
      attention probabilities.
    initializer_range: float. Range of the weight initializer.
    do_return_2d_tensor: bool. If True, the output will be of shape [batch_size
      * from_seq_length, num_attention_heads * size_per_head]. If False, the
      output will be of shape [batch_size, from_seq_length, num_attention_heads
      * size_per_head].
    batch_size: (Optional) int. If the input is 2D, this might be the batch size
      of the 3D version of the `from_tensor` and `to_tensor`.
    from_seq_length: (Optional) If the input is 2D, this might be the seq length
      of the 3D version of the `from_tensor`.
    to_seq_length: (Optional) If the input is 2D, this might be the seq length
      of the 3D version of the `to_tensor`.
  Returns:
    float Tensor of shape [batch_size, from_seq_length,
      num_attention_heads * size_per_head]. (If `do_return_2d_tensor` is
      true, this will be of shape [batch_size * from_seq_length,
      num_attention_heads * size_per_head]).
  Raises:
    ValueError: Any of the arguments or tensor shapes are invalid.
  """

  def transpose_for_scores(input_tensor, batch_size, num_attention_heads,
                           seq_length, width):
    output_tensor = tf.reshape(
        input_tensor, [batch_size, seq_length, num_attention_heads, width])

    output_tensor = tf.transpose(output_tensor, [0, 2, 1, 3])
    return output_tensor

  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])
  to_shape = get_shape_list(to_tensor, expected_rank=[2, 3])

  if len(from_shape) != len(to_shape):
    raise ValueError(
        "The rank of `from_tensor` must match the rank of `to_tensor`.")

  if len(from_shape) == 3:
    batch_size = from_shape[0]
    from_seq_length = from_shape[1]
    to_seq_length = to_shape[1]
  elif len(from_shape) == 2:
    if (batch_size is None or from_seq_length is None or to_seq_length is None):
      raise ValueError(
          "When passing in rank 2 tensors to attention_layer, the values "
          "for `batch_size`, `from_seq_length`, and `to_seq_length` "
          "must all be specified.")

  # Scalar dimensions referenced here:
  #   B = batch size (number of sequences)
  #   F = `from_tensor` sequence length
  #   T = `to_tensor` sequence length
  #   N = `num_attention_heads`
  #   H = `size_per_head`

  from_tensor_2d = reshape_to_matrix(from_tensor)   #[batch_size*seq_length,hidden_size]
  to_tensor_2d = reshape_to_matrix(to_tensor)          #[batch_size*seq_length,hidden_size]
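  # First project the inputs through dense layers to get query, key and value.
  # The activation is None because these are the plain linear projections
  # (W^Q, W^K, W^V) of the original paper, which have no non-linearity.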
  # `query_layer` = [B*F, N*H]
  query_layer = tf.layers.dense(
      from_tensor_2d,
      num_attention_heads * size_per_head,
      activation=query_act,              #None
      name="query",
      kernel_initializer=create_initializer(initializer_range)) # [batch_size*seq_length,hidden_size], where hidden_size is num_attention_heads*size_per_head

  # `key_layer` = [B*T, N*H]
  key_layer = tf.layers.dense(
      to_tensor_2d,
      num_attention_heads * size_per_head,
      activation=key_act,               #None
      name="key",
      kernel_initializer=create_initializer(initializer_range))   

  # `value_layer` = [B*T, N*H]
  value_layer = tf.layers.dense(
      to_tensor_2d,
      num_attention_heads * size_per_head,
      activation=value_act,            #None
      name="value",
      kernel_initializer=create_initializer(initializer_range))
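  # Reshape to 4D for the attention matrix multiplications.
  # `query_layer` = [B, N, F, H]
  # Move num_attention_heads to the second axis: each example has N heads, each
  # head has F tokens, and each token is an H-dimensional vector. Different heads
  # learn features of different subspaces.
  query_layer = transpose_for_scores(query_layer, batch_size,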
                                     num_attention_heads, from_seq_length,
                                     size_per_head)

  # `key_layer` = [B, N, T, H]
  key_layer = transpose_for_scores(key_layer, batch_size, num_attention_heads,
                                   to_seq_length, size_per_head)

  # Take the dot product between "query" and "key" to get the raw
  # attention scores (multiplicative, i.e. dot-product, attention).
  # `attention_scores` = [B, N, F, T]
  attention_scores = tf.matmul(query_layer, key_layer, transpose_b=True)
  attention_scores = tf.multiply(attention_scores,
                                 1.0 / math.sqrt(float(size_per_head)))

  if attention_mask is not None:
    # `attention_mask` = [B, 1, F, T]
    attention_mask = tf.expand_dims(attention_mask, axis=[1])
    
    # Padded positions at the end of each example get a large negative value;
    # positions that hold real data get 0.
    # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
    # masked positions, this operation will create a tensor which is 0.0 for
    # positions we want to attend and -10000.0 for masked positions.
    adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0

    # Since we are adding it to the raw scores before the softmax, this is
    # effectively the same as removing these entirely.
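    # After the addition, scores at real-data positions are unchanged (0 was
    # added) and scores at padded positions become very negative.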
    attention_scores += adder

  # Normalize the attention scores to probabilities.
  # `attention_probs` = [B, N, F, T]
  attention_probs = tf.nn.softmax(attention_scores)

  # This is actually dropping out entire tokens to attend to, which might
  # seem a bit unusual, but is taken from the original Transformer paper.
  attention_probs = dropout(attention_probs, attention_probs_dropout_prob)

  # `value_layer` = [B, T, N, H]
  value_layer = tf.reshape(
      value_layer,
      [batch_size, to_seq_length, num_attention_heads, size_per_head])

  # `value_layer` = [B, N, T, H]
  value_layer = tf.transpose(value_layer, [0, 2, 1, 3])

  # `context_layer` = [B, N, F, H]
  # Multiply the attention probabilities by value.
  context_layer = tf.matmul(attention_probs, value_layer)

  # `context_layer` = [B, F, N, H]
  context_layer = tf.transpose(context_layer, [0, 2, 1, 3])

  if do_return_2d_tensor:
    # Return the 2D result.
    # `context_layer` = [B*F, N*H]
    context_layer = tf.reshape(
        context_layer,
        [batch_size * from_seq_length, num_attention_heads * size_per_head])
  else:
    # `context_layer` = [B, F, N*H]
    context_layer = tf.reshape(
        context_layer,
        [batch_size, from_seq_length, num_attention_heads * size_per_head])

  return context_layer
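
To make the masking arithmetic concrete, here is a single-head, single-query sketch in plain Python. It follows the same steps as the code above: scaled dot product, the -10000.0 adder for masked positions, softmax, then a weighted sum of the values. The attend and softmax helpers are illustrative names; dropout and the multi-head batching are left out.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values, to_mask):
    """query: [H]; keys and values: [T, H]; to_mask: [T] of 0/1.
    Returns (context [H], attention probabilities [T])."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Same trick as `adder`: add 0 where mask is 1, -10000 where mask is 0.
    scores = [s + (1.0 - m) * -10000.0 for s, m in zip(scores, to_mask)]
    probs = softmax(scores)
    context = [sum(p * v[d] for p, v in zip(probs, values))
               for d in range(len(values[0]))]
    return context, probs

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
context, probs = attend(query, keys, values, to_mask=[1, 1, 0])
print([round(p, 4) for p in probs])
print([round(c, 4) for c in context])
```

The third position is masked, so its probability is vanishingly small and its value does not contribute to the context vector, even though its raw score ties with the first position.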

Using the model

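How is the model used? The BertModel class has two functions for this. get_pooled_output returns the representation of the first token, [CLS], of each example in the batch. BERT treats this token as carrying the information of the whole input, so it suits sentence-level classification. get_sequence_output returns BERT's final output, with shape [batch_size, seq_length, hidden_size]. It can be read as the final representation of every token of each example, and suits token-level tasks such as sequence labeling.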

def get_pooled_output(self):
  return self.pooled_output          #[batch_size, hidden_size]

def get_sequence_output(self):
  """Gets final hidden layer of encoder.
  Returns:
    float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
    to the final hidden of the transformer encoder.
  """
  return self.sequence_output
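
As a rough illustration of how the two outputs are consumed downstream, the sketch below puts a small linear layer on top of each. Everything here is hypothetical: the numbers, the weights and the dense and argmax helpers are invented for the example and do not come from BERT's fine-tuning scripts.

```python
def dense(vector, weights, bias):
    """vector: [in_dim]; weights: [in_dim, out_dim]; bias: [out_dim]."""
    return [
        sum(vector[i] * weights[i][j] for i in range(len(vector))) + bias[j]
        for j in range(len(bias))
    ]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Made-up model outputs: batch_size=1, seq_length=3, hidden_size=2.
pooled_output = [[0.9, -0.2]]                                # [batch_size, hidden_size]
sequence_output = [[[0.9, -0.2], [0.1, 0.8], [-0.5, 0.3]]]   # [batch_size, seq_length, hidden_size]

# Made-up two-label classifier weights.
weights = [[1.0, -1.0], [0.0, 1.0]]
bias = [0.0, 0.0]

# Sentence-level task: one label per example, from the pooled output.
sentence_labels = [argmax(dense(vec, weights, bias)) for vec in pooled_output]

# Token-level task: one label per token, from the sequence output.
token_labels = [[argmax(dense(vec, weights, bias)) for vec in example]
                for example in sequence_output]

print(sentence_labels, token_labels)
```

The only structural difference between the two uses is the extra seq_length axis: the pooled output yields one prediction per example, the sequence output one per token.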

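The next article covers the training process. Two things have suddenly come up that I need to deal with, so it may be delayed by a few days.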

这样的电影都骂烂，是我握不动刀还是有人太飘 Sir电影
年度最WTF电影来了！年度最争议电影来了！威尼斯电影节首映，有的观众起立鼓掌，有的观众恨不得朝屏幕丢鞋。观众这样，更别说影评人……迷之又迷的，比如《RogerEbert.com》：恐怖、勾人、迷惑……这是一部刷新你认知的电影。恨之入骨的，比如《纽约观察者报》——我不愿给它贴上“年度最差电影”标签，因为“世纪最差电影”更适合它。评分网站呢，一个比一个不给面子：IMDb7.0，烂番茄68%，豆瓣6.7
Transformer、BERT、GPT、T5、LLM（大语言模型），以及它们在实际行业中的运用 Funhpc_huachen transformer bert gpt 语言模型深度学习
作为AI智能大模型的专家训练师，我将从主流模型框架的角度来分析其核心技术特点及其在不同实际行业中的应用。我们重点讨论以下几个主流模型框架：Transformer、BERT、GPT、T5、LLM（大语言模型），以及它们在实际行业中的运用。1.Transformer框架Transformer是一种基础的深度学习模型架构，由Google于2017年提出。它引入了注意力机制（Self-Attention）
fpga图像处理实战-边缘检测（Roberts算子）梦梦梦梦子~ OV5640+图像处理图像处理计算机视觉人工智能
Roberts算子Roberts算子是一种用于边缘检测的算子，主要用于图像处理中检测图像的边缘。它是最早的边缘检测算法之一，以其计算简单、速度快而著称。Roberts算子通过计算图像像素在对角方向的梯度来检测边缘，从而突出图像中灰度变化最剧烈的部分。原理Roberts算子通过对图像应用两个2x2的卷积核（也称为掩模或滤波器）来计算图像在水平和垂直方向上的梯度。假设原始图像的像素值为I(x,y)，则
Rhinoceros 8 for Mac/Win：重塑三维建模边界的革新之作平安喜乐616 Rhinoceros 8 Rhino 8 三维建模软件犀牛8
Rhinoceros8（简称Rhino8），作为一款由RobertMcNeel&Assoc公司开发的顶尖三维建模软件，无论是对于Mac还是Windows用户而言，都是一款不可多得的高效工具。Rhino8以其强大的功能、广泛的应用领域以及卓越的性能，在建筑设计、工业设计、产品设计、三维动画制作、科学研究及机械设计等多个领域展现出了非凡的实力。强大的建模能力Rhino8支持多种建模技术，包括曲面建模、
预训练语言模型的前世今生 - 从Word Embedding到BERT 脚步的影子语言模型 embedding bert
目录一、预训练1.1图像领域的预训练1.2预训练的思想二、语言模型2.1统计语言模型2.2神经网络语言模型三、词向量3.1独热（Onehot）编码3.2WordEmbedding四、Word2Vec模型五、自然语言处理的预训练模型六、RNN和LSTM6.1RNN6.2RNN的梯度消失问题6.3LSTM6.4LSTM解决RNN的梯度消失问题七、ELMo模型7.1ELMo的预训练7.2ELMo的Fea
【大模型系列篇】预训练模型：BERT & GPT 木亦汐丫大模型 bert gpt 人工智能预训练模型大模型
2018年，Google首次推出BERT（BidirectionalEncoderRepresentationsfromTransformers）。该模型是在大量文本语料库上结合无监督和监督学习进行训练的。BERT的目标是创建一种语言模型，可以理解句子中单词的上下文和含义，同时考虑到它前后出现的单词。2018年，OpenAI首次推出GPT（GenerativePre-trainedTransfor
【人工智能】Transformers之Pipeline（十三）：填充蒙版（fill-mask） LDG_AGI Pipeline 人工智能机器学习计算机视觉 python 时序数据库大数据自然语言处理
目录一、引言二、填充蒙版（fill-mask）2.1概述2.2技术原理2.2.1BERT模型的基本概念2.2.2BERT模型的工作原理2.2.3BERT模型的结构2.2.4BERT模型的应用2.2.5BERT模型与Transformer的区别和联系2.3应用场景2.4pipeline参数2.4.1pipeline对象实例化参数2.4.2pipeline对象使用参数2.4.3pipeline返回参数
IT历史：互联网简史 weixin_34275734 网络操作系统 java
Hobbes的互联网大事记-权威的互联网发展史Hobbes’Internet大事记v4.2作者：RobertH’obbes’ZakonInternet福音传道者译者：郭力Internet大事记的版权归RobertHZakon所有(c)1993-9。只要保留版权说明，给出在一个在本文档最后的指向本大事记的连接地址，并且不是出于商业目的，均可以使用本文的部分或全部内容，但是使用者必须向作者提供一份使用
大模型--个人学习心得挚爱清&虚人工智能
大模型LLM定义大模型LLM，全称LargeLanguageModel，即大型语言模型LLM是一种基于Transformer架构模型，它通过驯良大量文本数据，学习语言的语法、语义和上下文信息，从而能够对自然语言文本进行建模这种模型在自然语言处理(NLP)领域具有广泛应用常见的13个大模型BERT、GPT系列、T5、Meta的Llama系列、华为盘古模型、阿里巴巴通义大模型、科大讯飞星火大模型、百度
基于Bert-base-chinese训练多分类文本模型(代码详解）一颗洋芋 bert 分类自然语言处理
目录一、简介二、模型训练三、模型推理一、简介BERT（BidirectionalEncoderRepresentationsfromTransformers）是基于深度学习在自然语言处理（NLP）领域近几年出现的、影响深远的创新模型之一。在BERT之前，已经有许多预训练语言模型，如ELMO和GPT，它们展示了预训练模型在NLP任务中的强大性能。然而，这些模型通常基于单向的上下文信息，即只考虑文本中
【深度学习 transformer】使用pytorch 训练transformer 模型,hugginface 来啦东华果汁哥深度学习-文本分类深度学习 transformer pytorch
HuggingFace是一个致力于开源自然语言处理（NLP）和机器学习项目的社区。它由几个关键组件组成：Transformers：这是一个基于PyTorch的库，提供了各种预训练的NLP模型，如BERT、GPT、RoBERTa、DistilBERT等。它还提供了一个简单易用的API来加载这些模型，并进行微调以适应特定的下游任务。Datasets：这是一个用于加载和预处理NLP数据集的库，与Tran
LLM大模型落地-从理论到实践 hhaiming_ 语言模型人工智能 ai 深度学习
简述按个人偏好和目标总结了学习目标和路径（可按需学习），后续将陆续整理出相应学习资料和资源。学习目标熟悉主流LLM（Llama,ChatGLM,Qwen）的技术架构和技术细节；有实际应用RAG、PEFT和SFT的项目经验较强的NLP基础，熟悉BERT、T5、Transformer和GPT的实现和差异，能快速掌握业界进展，有对话系统相关研发经验掌握TensorRT-LLM、vLLM等主流推理加速框架
AI 大模型在文本生成任务中的创新应用 AI_Guru人工智呢人工智能
概述随着人工智能技术的飞速发展，大模型在文本生成任务中的应用越来越广泛。这些模型通过深度学习技术，能够生成连贯、有意义的文本，甚至在某些情况下达到与人类写作难以区分的程度。本文将探讨AI大模型在文本生成任务中的创新应用，包括自动文摘、机器翻译、创意写作等领域。自动文摘自动文摘是指从给定文本中自动提取关键信息，生成简短摘要的过程。这对于处理大量文本数据、快速获取信息尤为重要。代码示例：基于BERT的
Bert系列：论文阅读Rethink Training of BERT Rerankers in Multi-Stage Retrieval Pipeline 凝眸伏笔 nlp 论文阅读 bert reranker retrieval
一句话总结：提出LocalizedContrastiveEstimation(LCE)，来优化检索排序。摘要预训练的深度语言模型(LM)在文本检索中表现出色。基于丰富的上下文匹配信息，深度LM微调重新排序器从候选集合中找出更为关联的内容。同时，深度lm也可以用来提高搜索索引，构建更好的召回。当前的reranker方法并不能完全探索到检索结果的效果。因此，本文提出了LocalizedContrast
大语言模型算力优化策略：基于并行化技术的算力共享平台研究 ZhangJiQun&MXP 2024算力共享 2021 论文语言模型人工智能自然语言处理
目录大语言模型算力优化策略：基于并行化技术的算力共享平台研究摘要引言算力共享平台的设计1.平台架构2.并行化计算技术模型并行化流水线并行化3.资源管理和调度实验与结果分析结论与展望首先，大语言模型（如GPT系列、BERT等）和算力共享的结合是近年来人工智能领域的研究热点。算力共享旨在通过分布式计算技术，将大规模计算任务分配给多个计算节点，以提高计算效率、降低资源成本并加速模型训练和推理过程。其次，
Enum 枚举 120153216 enum 枚举
原文地址：http://www.cnblogs.com/Kavlez/p/4268601.html Enumeration 于Java 1.5增加的enum type...enum type是由一组固定的常量组成的类型，比如四个季节、扑克花色。在出现enum type之前，通常用一组int常量表示枚举类型。比如这样： public static final int APPLE_FUJI = 0
Java8简明教程 bijian1013 java jdk1.8
Java 8已于2014年3月18日正式发布了，新版本带来了诸多改进，包括Lambda表达式、Streams、日期时间API等等。本文就带你领略Java 8的全新特性。一.允许在接口中有默认方法实现 Java 8 允许我们使用default关键字，为接口声明添
Oracle表维护快速备份删除数据 cuisuqiang oracle 索引快速备份删除
我知道oracle表分区，不过那是数据库设计阶段的事情，目前是远水解不了近渴。当前的数据库表，要求保留一个月数据，且表存在大量录入更新，不存在程序删除。为了解决频繁查询和更新的瓶颈，我在oracle内根据需要创建了索引。但是随着数据量的增加，一个半月数据就要超千万，此时就算有索引，对高并发的查询和更新来说，让然有所拖累。为了解决这个问题，我一般一个月会进行一次数据库维护，主要工作就是备
java多态内存分析麦田的设计者 java 内存分析多态原理接口和抽象类
“ 时针如果可以回头，熟悉那张脸，重温嬉戏这乐园，墙壁的松脱涂鸦已经褪色才明白存在的价值归于记忆。街角小店尚存在吗？这大时代会不会牵挂，过去现在花开怎么会等待。但有种意外不管痛不痛都有伤害，光阴远远离开，那笑声徘徊与脑海。但这一秒可笑不再可爱，当天心
Xshell实现Windows上传文件到Linux主机被触发 windows
经常有这样的需求，我们在Windows下载的软件包，如何上传到远程Linux主机上？还有如何从Linux主机下载软件包到Windows下；之前我的做法现在看来好笨好繁琐，不过也达到了目的，笨人有本方法嘛；我是怎么操作的： 1、打开一台本地Linux虚拟机，使用mount 挂载Windows的共享文件夹到Linux上，然后拷贝数据到Linux虚拟机里面；（经常第一步都不顺利，无法挂载Windo
类的加载ClassLoader 肆无忌惮_ ClassLoader
类加载器ClassLoader是用来将java的类加载到虚拟机中，类加载器负责读取class字节文件到内存中，并将它转为Class的对象（类对象），通过此实例的 newInstance()方法就可以创建出该类的一个对象。其中重要的方法为findClass(String name)。如何写一个自己的类加载器呢？首先写一个便于测试的类Student
html5写的玫瑰花知了ing html5
<html> <head> <title>I Love You!</title> <meta charset="utf-8" /> </head> <body> <canvas id="c"></canvas>
google的ConcurrentLinkedHashmap源代码解析矮蛋蛋 LRU
原文地址： http://janeky.iteye.com/blog/1534352 简述 ConcurrentLinkedHashMap 是google团队提供的一个容器。它有什么用呢？其实它本身是对 ConcurrentHashMap的封装，可以用来实现一个基于LRU策略的缓存。详细介绍可以参见 http://code.google.com/p/concurrentlinke
webservice获取访问服务的ip地址 alleni123 webservice
1. 首先注入javax.xml.ws.WebServiceContext, @Resource private WebServiceContext context; 2. 在方法中获取交换请求的对象。 javax.xml.ws.handler.MessageContext mc=context.getMessageContext(); com.sun.net.http
菜鸟的java基础提升之道——————>是否值得拥有百合不是茶
1，c++，java是面向对象编程的语言，将万事万物都看成是对象；java做一件事情关注的是人物，java是c++继承过来的，java没有直接更改地址的权限但是可以通过引用来传值操作地址，java也没有c++中繁琐的操作，java以其优越的可移植型，平台的安全型，高效性赢得了广泛的认同，全世界越来越多的人去学习java，我也是其中的一员 java组成：
通过修改Linux服务自动启动指定应用程序 bijian1013 linux
Linux中修改系统服务的命令是chkconfig (check config)，命令的详细解释如下: chkconfig 功能说明：检查，设置系统的各种服务。语　　法：chkconfig [ -- add][ -- del][ -- list][系统服务] 或 chkconfig [ -- level <</SPAN>
spring拦截器的一个简单实例 bijian1013 java spring 拦截器 Interceptor
Purview接口 package aop; public interface Purview { void checkLogin(); } Purview接口的实现类PurviesImpl.java package aop; public class PurviewImpl implements Purview { public void check
[Velocity二]自定义Velocity指令 bit1129 velocity
什么是Velocity指令在Velocity中，#set,#if, #foreach, #elseif, #parse等，以#开头的称之为指令，Velocity内置的这些指令可以用来做赋值，条件判断，循环控制等脚本语言必备的逻辑控制等语句，Velocity的指令是可扩展的，即用户可以根据实际的需要自定义Velocity指令自定义指令(Directive)的一般步骤 &nbs
【Hive十】Programming Hive学习笔记 bit1129 programming
第二章 Getting Started 1.Hive最大的局限性是什么？一是不支持行级别的增删改(insert, delete, update)二是查询性能非常差(基于Hadoop MapReduce）,不适合延迟小的交互式任务三是不支持事务2. Hive MetaStore是干什么的？Hive persists table schemas and other system metadata.
nginx有选择性进行限制 ronin47 nginx 动静　限制
http { limit_conn_zone $binary_remote_addr zone=addr:10m; limit_req_zone $binary_remote_addr zone=one:10m rate=5r/s;... server {... location ~.*\.(gif|png|css|js|icon)$ {
java-4.-在二元树中找出和为某一值的所有路径 . bylijinnan java
/* * 0.use a TwoWayLinkedList to store the path.when the node can't be path,you should/can delete it. * 1.curSum==exceptedSum:if the lastNode is TreeNode,printPath();delete the node otherwise
Netty学习笔记 bylijinnan java netty
本文是阅读以下两篇文章时： http://seeallhearall.blogspot.com/2012/05/netty-tutorial-part-1-introduction-to.html http://seeallhearall.blogspot.com/2012/06/netty-tutorial-part-15-on-channel.html 我的一些笔记 ===
js获取项目路径 cngolon js
//js获取项目根路径，如： http://localhost:8083/uimcardprj function getRootPath(){ //获取当前网址，如： http://localhost:8083/uimcardprj/share/meun.jsp var curWwwPath=window.document.locati
oracle 的性能优化 cuishikuan oracle SQL Server
在网上搜索了一些Oracle性能优化的文章，为了更加深层次的巩固[边写边记]，也为了可以随时查看，所以发表这篇文章。 1.ORACLE采用自下而上的顺序解析WHERE子句，根据这个原理，表之间的连接必须写在其他WHERE条件之前，那些可以过滤掉最大数量记录的条件必须写在WHERE子句的末尾。（这点本人曾经做过实例验证过，的确如此哦！
Shell变量和数组使用详解 daizj linux shell 变量数组
Shell 变量定义变量时，变量名不加美元符号（$，PHP语言中变量需要），如： your_name="w3cschool.cc" 注意，变量名和等号之间不能有空格，这可能和你熟悉的所有编程语言都不一样。同时，变量名的命名须遵循如下规则：首个字符必须为字母（a-z，A-Z）。中间不能有空格，可以使用下划线（_）。不能使用标点符号。不能使用ba
编程中的一些概念，KISS、DRY、MVC、OOP、REST dcj3sjt126com REST
KISS、DRY、MVC、OOP、REST （1）KISS是指Keep It Simple,Stupid（摘自wikipedia），指设计时要坚持简约原则，避免不必要的复杂化。（2）DRY是指Don't Repeat Yourself（摘自wikipedia），特指在程序设计以及计算中避免重复代码，因为这样会降低灵活性、简洁性，并且可能导致代码之间的矛盾。（3）OOP 即Object-Orie
[Android]设置Activity为全屏显示的两种方法 dcj3sjt126com Activity
1. 方法1：AndroidManifest.xml 里，Activity的 android:theme 指定为" @android:style/Theme.NoTitleBar.Fullscreen" 示例: <application
solrcloud 部署方式比较 eksliang solrCloud
solrcloud 的部署其实有两种方式可选，那么我们在实践开发中应该怎样选择呢？第一种：当启动solr服务器时，内嵌的启动一个Zookeeper服务器，然后将这些内嵌的Zookeeper服务器组成一个集群。第二种：将Zookeeper服务器独立的配置一个集群，然后将solr交给Zookeeper进行管理谈谈第一种：每启动一个solr服务器就内嵌的启动一个Zoo
Java synchronized关键字详解 gqdy365 synchronized
转载自：http://www.cnblogs.com/mengdd/archive/2013/02/16/2913806.html 多线程的同步机制对资源进行加锁，使得在同一个时间，只有一个线程可以进行操作，同步用以解决多个线程同时访问时可能出现的问题。同步机制可以使用synchronized关键字实现。当synchronized关键字修饰一个方法的时候，该方法叫做同步方法。当s
js实现登录时记住用户名 hw1287789687 记住我记住密码 cookie 记住用户名记住账号
在页面中如何获取cookie值呢? 如果是JSP的话,可以通过servlet的对象request 获取cookie,可以参考:http://hw1287789687.iteye.com/blog/2050040 如果要求登录页面是html呢?html页面中如何获取cookie呢? 直接上代码了页面:loginInput.html 代码: <!DOCTYPE html PUB
开发者必备的 Chrome 扩展 justjavac chrome
Firebug：不用多介绍了吧https://chrome.google.com/webstore/detail/bmagokdooijbeehmkpknfglimnifench ChromeSnifferPlus：Chrome 探测器，可以探测正在使用的开源软件或者 js 类库https://chrome.google.com/webstore/detail/chrome-sniffer-pl
算法机试题李亚飞 java 算法机试题
在面试机试时，遇到一个算法题，当时没能写出来，最后是同学帮忙解决的。这道题大致意思是：输入一个数，比如4,。这时会输出： &n
正确配置Linux系统ulimit值字符串 ulimit
在Linux下面部署应用的时候，有时候会遇上Socket/File: Can’t open so many files的问题；这个值也会影响服务器的最大并发数，其实Linux是有文件句柄限制的，而且Linux默认不是很高，一般都是1024，生产服务器用其实很容易就达到这个数量。下面说的是，如何通过正解配置来改正这个系统默认值。因为这个问题是我配置Nginx+php5时遇到了，所以我将这篇归纳进
hibernate调用返回游标的存储过程 Supanccy2013 java DAO oracle Hibernate jdbc
注：原创作品，转载请注明出处。上篇博文介绍的是hibernate调用返回单值的存储过程，本片博文说的是hibernate调用返回游标的存储过程。此此扁博文的存储过程的功能相当于是jdbc调用select 的作用。 1，创建oracle中的包，并在该包中创建的游标类型。 ---创建oracle的程
Spring 4.2新特性-更简单的Application Event wiselyman application
1.1 Application Event Spring 4.1的写法请参考10点睛Spring4.1-Application Event 请对比10点睛Spring4.1-Application Event 使用一个@EventListener取代了实现ApplicationListener接口,使耦合度降低; 1.2 示例包依赖 <p