赵文淮

Bert源代码（二）模型

模型训练、评估和预测流程
Bert模型

Transformer模型
Bert模型

Bert模型代码解析
参考文献

export BERT_BASE_DIR=/path/to/bert/uncased_L-12_H-768_A-12
export GLUE_DIR=/path/to/glue
python run_classifier.py \
  --task_name=MRPC \
  --do_train=true \
  --do_eval=true \
  --data_dir=$GLUE_DIR/MRPC \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --max_seq_length=128 \
  --train_batch_size=32 \
  --learning_rate=2e-5 \
  --num_train_epochs=3.0 \
  --output_dir=/tmp/mrpc_output/

模型训练、评估和预测流程

和上一篇文章的预训练类似。

使用ColaProcessor，MnliProcessor，MrpcProcessor，XnliProcessor四种processor处理输入相应输入格式数据，获得当前句text_a和下一句text_b，再通过FullTokenizer来token化。最后通过TFRecordWriter将数据保存为tfrecord文件。
定义input_fn函数file_based_input_fn_builder
定义RunConfig配置run_config
创建model_fn：从传入的init_checkpoint初始化bert模型、AdamWeightDecayOptimizer，返回TPUEstimatorSpec
结合model_fn、run_config和input_fn使用TPUEstimator进行训练、评估、预测

Bert模型

先上几张图：

Transformer模型

上面的图展示了Transformer结构：Transformer是一种Seq2Seq模型，由encoder和decoder组成。encoder端由Nx个基础单元叠加，每个基础单元为：Multi-head的Self-Attention和Feed Forward串联；decoder端由Nx个基础单元叠加，每个基础单元为：输出端的Masked Multi-head Self-Attention和encoder-decoder的Multi-head Attention和Feed Forward串联。当然encoder和decoder端都会加入短连接Resnet和Normalization。输入输出端都在原来Inputs和Outputs上接一层Embedding层，再加入Position的Embedding来补偿位置信息（相对于RNN而言，距离始终为1，避免了RNN长距离依赖的问题。用Position Embedding来补偿距离信息。）

Bert模型

Bert模型在Input Embedding基础上加入Token Embedding、Segment Embedding、Position Embedding。引入了Deep Bidirectional Masked Lm和Next Prediction Sentence（这两个创新点在上一篇文章中已经结合代码详细介绍了）。

Bert模型代码解析

我们来逐步解析代码：

 with tf.variable_scope(scope, default_name="bert"):
      with tf.variable_scope("embeddings"):
        # 输入Embedding
        # Perform embedding lookup on the word ids.
        (self.embedding_output, self.embedding_table) = embedding_lookup(
            input_ids=input_ids,
            vocab_size=config.vocab_size,
            embedding_size=config.hidden_size,
            initializer_range=config.initializer_range,
            word_embedding_name="word_embeddings",
            use_one_hot_embeddings=use_one_hot_embeddings)
        # Add positional embeddings and token type embeddings, then layer
        # normalize and perform dropout.
        # 1. 添加Position Embedding和Token Embedding
        self.embedding_output = embedding_postprocessor(
            input_tensor=self.embedding_output,
            use_token_type=True,
            token_type_ids=token_type_ids,
            token_type_vocab_size=config.type_vocab_size,
            token_type_embedding_name="token_type_embeddings",
            use_position_embeddings=True,
            position_embedding_name="position_embeddings",
            initializer_range=config.initializer_range,
            max_position_embeddings=config.max_position_embeddings,
            dropout_prob=config.hidden_dropout_prob)
      with tf.variable_scope("encoder"):
        # This converts a 2D mask of shape [batch_size, seq_length] to a 3D
        # mask of shape [batch_size, seq_length, seq_length] which is used
        # for the attention scores.
        attention_mask = create_attention_mask_from_input_mask(
            input_ids, input_mask)

        # Run the stacked transformer.
        # `sequence_output` shape = [batch_size, seq_length, hidden_size].
        # 2. 创建transformer模型
        self.all_encoder_layers = transformer_model(
            input_tensor=self.embedding_output,
            attention_mask=attention_mask,
            hidden_size=config.hidden_size,
            num_hidden_layers=config.num_hidden_layers,
            num_attention_heads=config.num_attention_heads,
            intermediate_size=config.intermediate_size,
            intermediate_act_fn=get_activation(config.hidden_act),
            hidden_dropout_prob=config.hidden_dropout_prob,
            attention_probs_dropout_prob=config.attention_probs_dropout_prob,
            initializer_range=config.initializer_range,
            do_return_all_layers=True)
	  # 用于masked lm的sequence_output，即final hidden layer
      self.sequence_output = self.all_encoder_layers[-1]
      # The "pooler" converts the encoded sequence tensor of shape
      # [batch_size, seq_length, hidden_size] to a tensor of shape
      # [batch_size, hidden_size]. This is necessary for segment-level
      # (or segment-pair-level) classification tasks where we need a fixed
      # dimensional representation of the segment.
      # 用于句子分类的pooled_output，使用final hidden layer的第一个token[CLS]的embeding向量
      with tf.variable_scope("pooler"):
        # We "pool" the model by simply taking the hidden state corresponding
        # to the first token. We assume that this has been pre-trained
        first_token_tensor = tf.squeeze(self.sequence_output[:, 0:1, :], axis=1)
        self.pooled_output = tf.layers.dense(
            first_token_tensor,
            config.hidden_size,
            activation=tf.tanh,
            kernel_initializer=create_initializer(config.initializer_range))

输入加入其他embedding解析：

def embedding_postprocessor(input_tensor,
                            use_token_type=False,
                            token_type_ids=None,
                            token_type_vocab_size=16,
                            token_type_embedding_name="token_type_embeddings",
                            use_position_embeddings=True,
                            position_embedding_name="position_embeddings",
                            initializer_range=0.02,
                            max_position_embeddings=512,
                            dropout_prob=0.1):
  """Performs various post-processing on a word embedding tensor.

  Args:
    input_tensor: float Tensor of shape [batch_size, seq_length,
      embedding_size].
    use_token_type: bool. Whether to add embeddings for `token_type_ids`.
    token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length].
      Must be specified if `use_token_type` is True.
    token_type_vocab_size: int. The vocabulary size of `token_type_ids`.
    token_type_embedding_name: string. The name of the embedding table variable
      for token type ids.
    use_position_embeddings: bool. Whether to add position embeddings for the
      position of each token in the sequence.
    position_embedding_name: string. The name of the embedding table variable
      for positional embeddings.
    initializer_range: float. Range of the weight initialization.
    max_position_embeddings: int. Maximum sequence length that might ever be
      used with this model. This can be longer than the sequence length of
      input_tensor, but cannot be shorter.
    dropout_prob: float. Dropout probability applied to the final output tensor.

  Returns:
    float tensor with same shape as `input_tensor`.

  Raises:
    ValueError: One of the tensor shapes or input values is invalid.
  """
  input_shape = get_shape_list(input_tensor, expected_rank=3)
  batch_size = input_shape[0]
  seq_length = input_shape[1]
  width = input_shape[2]

  output = input_tensor

  # 添加token embedding
  if use_token_type:
    if token_type_ids is None:
      raise ValueError("`token_type_ids` must be specified if"
                       "`use_token_type` is True.")
    token_type_table = tf.get_variable(
        name=token_type_embedding_name,
        shape=[token_type_vocab_size, width],
        initializer=create_initializer(initializer_range))
    # This vocab will be small so we always do one-hot here, since it is always
    # faster for a small vocabulary.
    flat_token_type_ids = tf.reshape(token_type_ids, [-1])
    one_hot_ids = tf.one_hot(flat_token_type_ids, depth=token_type_vocab_size)
    token_type_embeddings = tf.matmul(one_hot_ids, token_type_table)
    token_type_embeddings = tf.reshape(token_type_embeddings,
                                       [batch_size, seq_length, width])
    output += token_type_embeddings
  # 添加position embedding
  if use_position_embeddings:
    assert_op = tf.assert_less_equal(seq_length, max_position_embeddings)
    with tf.control_dependencies([assert_op]):
      full_position_embeddings = tf.get_variable(
          name=position_embedding_name,
          shape=[max_position_embeddings, width],
          initializer=create_initializer(initializer_range))
      # Since the position embedding table is a learned variable, we create it
      # using a (long) sequence length `max_position_embeddings`. The actual
      # sequence length might be shorter than this, for faster training of
      # tasks that do not have long sequences.
      #
      # So `full_position_embeddings` is effectively an embedding table
      # for position [0, 1, 2, ..., max_position_embeddings-1], and the current
      # sequence has positions [0, 1, 2, ... seq_length-1], so we can just
      # perform a slice.
      position_embeddings = tf.slice(full_position_embeddings, [0, 0],
                                     [seq_length, -1])
      num_dims = len(output.shape.as_list())

      # Only the last two dimensions are relevant (`seq_length` and `width`), so
      # we broadcast among the first dimensions, which is typically just
      # the batch size.
      position_broadcast_shape = []
      for _ in range(num_dims - 2):
        position_broadcast_shape.append(1)
      position_broadcast_shape.extend([seq_length, width])
      position_embeddings = tf.reshape(position_embeddings,
                                       position_broadcast_shape)
      output += position_embeddings
  # 加入LN和dropout
  output = layer_norm_and_dropout(output, dropout_prob)
  return output

transformer模型解析：

def transformer_model(input_tensor,
                      attention_mask=None,
                      hidden_size=768,
                      num_hidden_layers=12,
                      num_attention_heads=12,
                      intermediate_size=3072,
                      intermediate_act_fn=gelu,
                      hidden_dropout_prob=0.1,
                      attention_probs_dropout_prob=0.1,
                      initializer_range=0.02,
                      do_return_all_layers=False):
  if hidden_size % num_attention_heads != 0:
    raise ValueError(
        "The hidden size (%d) is not a multiple of the number of attention "
        "heads (%d)" % (hidden_size, num_attention_heads))

  attention_head_size = int(hidden_size / num_attention_heads)
  input_shape = get_shape_list(input_tensor, expected_rank=3)
  batch_size = input_shape[0]
  seq_length = input_shape[1]
  input_width = input_shape[2]

  # The Transformer performs sum residuals on all layers so the input needs
  # to be the same as the hidden size.
  if input_width != hidden_size:
    raise ValueError("The width of the input tensor (%d) != hidden size (%d)" %
                     (input_width, hidden_size))

  # We keep the representation as a 2D tensor to avoid re-shaping it back and
  # forth from a 3D tensor to a 2D tensor. Re-shapes are normally free on
  # the GPU/CPU but may not be free on the TPU, so we want to minimize them to
  # help the optimizer.
  prev_output = reshape_to_matrix(input_tensor)

  all_layer_outputs = []
  # 定义基础单元叠加数量Nx：num_hidden_layers
  for layer_idx in range(num_hidden_layers):
    with tf.variable_scope("layer_%d" % layer_idx):
      layer_input = prev_output

	  # Attention层
      with tf.variable_scope("attention"):
        attention_heads = []
        with tf.variable_scope("self"):
          attention_head = attention_layer(
              from_tensor=layer_input,
              to_tensor=layer_input,
              attention_mask=attention_mask,
              num_attention_heads=num_attention_heads,
              size_per_head=attention_head_size,
              attention_probs_dropout_prob=attention_probs_dropout_prob,
              initializer_range=initializer_range,
              do_return_2d_tensor=True,
              batch_size=batch_size,
              from_seq_length=seq_length,
              to_seq_length=seq_length)
          attention_heads.append(attention_head)

        attention_output = None
        if len(attention_heads) == 1:
          attention_output = attention_heads[0]
        else:
          # In the case where we have other sequences, we just concatenate
          # them to the self-attention head before the projection.
          attention_output = tf.concat(attention_heads, axis=-1)

        # Run a linear projection of `hidden_size` then add a residual
        # with `layer_input`.
        # dropout并加入短链接ResNet和LN
        with tf.variable_scope("output"):
          attention_output = tf.layers.dense(
              attention_output,
              hidden_size,
              kernel_initializer=create_initializer(initializer_range))
          attention_output = dropout(attention_output, hidden_dropout_prob)
          attention_output = layer_norm(attention_output + layer_input)

      # The activation is only applied to the "intermediate" hidden layer.
      # 映射到intermediate_size大小
      with tf.variable_scope("intermediate"):
        intermediate_output = tf.layers.dense(
            attention_output,
            intermediate_size,
            activation=intermediate_act_fn,
            kernel_initializer=create_initializer(initializer_range))

      # Down-project back to `hidden_size` then add the residual.
      # 映射回hidden_size大小，并加入短连接ResNet和LN
      with tf.variable_scope("output"):
        layer_output = tf.layers.dense(
            intermediate_output,
            hidden_size,
            kernel_initializer=create_initializer(initializer_range))
        layer_output = dropout(layer_output, hidden_dropout_prob)
        layer_output = layer_norm(layer_output + attention_output)
        prev_output = layer_output
        all_layer_outputs.append(layer_output)

  if do_return_all_layers:
    final_outputs = []
    for layer_output in all_layer_outputs:
      # reshap回input_shape尺寸
      final_output = reshape_from_matrix(layer_output, input_shape)
      final_outputs.append(final_output)
    return final_outputs
  else:
    final_output = reshape_from_matrix(prev_output, input_shape)
    return final_output

Attention层：
$softmax(\frac{QK^T}{\sqrt{d_k}})V$
其中Q为query, K为key, V为value, $Q∈R^{n\text{ × }d_k}$ $d_k, K∈R^{m\text{ × }d_k}, V∈R^{m\text{ × }d_v}$ ， $d_k$ 为query和key的维度。其中，除以 $d_k$ 是为了防止点积过大， $QK^T$ 是计算序列中各个位置的相互关系。

def attention_layer(from_tensor,
                    to_tensor,
                    attention_mask=None,
                    num_attention_heads=1,
                    size_per_head=512,
                    query_act=None,
                    key_act=None,
                    value_act=None,
                    attention_probs_dropout_prob=0.0,
                    initializer_range=0.02,
                    do_return_2d_tensor=False,
                    batch_size=None,
                    from_seq_length=None,
                    to_seq_length=None):
  def transpose_for_scores(input_tensor, batch_size, num_attention_heads,
                           seq_length, width):
    output_tensor = tf.reshape(
        input_tensor, [batch_size, seq_length, num_attention_heads, width])

    output_tensor = tf.transpose(output_tensor, [0, 2, 1, 3])
    return output_tensor

  from_shape = get_shape_list(from_tensor, expected_rank=[2, 3])
  to_shape = get_shape_list(to_tensor, expected_rank=[2, 3])

  if len(from_shape) != len(to_shape):
    raise ValueError(
        "The rank of `from_tensor` must match the rank of `to_tensor`.")

  if len(from_shape) == 3:
    batch_size = from_shape[0]
    from_seq_length = from_shape[1]
    to_seq_length = to_shape[1]
  elif len(from_shape) == 2:
    if (batch_size is None or from_seq_length is None or to_seq_length is None):
      raise ValueError(
          "When passing in rank 2 tensors to attention_layer, the values "
          "for `batch_size`, `from_seq_length`, and `to_seq_length` "
          "must all be specified.")

  # Scalar dimensions referenced here:
  #   B = batch size (number of sequences)
  #   F = `from_tensor` sequence length, 输入tensor
  #   T = `to_tensor` sequence length, 输出tensor
  #   N = `num_attention_heads`, Attention头数目
  #   H = `size_per_head`, 每个Attention头的大小

  from_tensor_2d = reshape_to_matrix(from_tensor)
  to_tensor_2d = reshape_to_matrix(to_tensor)

  # `query_layer` = [B*F, N*H]
  # Q, [B*F, N*H], 将from_tensor映射为num_attention_heads * size_per_head大小
  query_layer = tf.layers.dense(
      from_tensor_2d,
      num_attention_heads * size_per_head,
      activation=query_act,
      name="query",
      kernel_initializer=create_initializer(initializer_range))

  # `key_layer` = [B*T, N*H]
  # K, [B*T, N*H], 将to_tensor映射为num_attention_heads * size_per_head大小
  key_layer = tf.layers.dense(
      to_tensor_2d,
      num_attention_heads * size_per_head,
      activation=key_act,
      name="key",
      kernel_initializer=create_initializer(initializer_range))

  # `value_layer` = [B*T, N*H]
  # V, [B*T, N*H], 将to_tensor映射为num_attention_heads * size_per_head大小
  value_layer = tf.layers.dense(
      to_tensor_2d,
      num_attention_heads * size_per_head,
      activation=value_act,
      name="value",
      kernel_initializer=create_initializer(initializer_range))

  # `query_layer` = [B, N, F, H]
  # 重排列维度
  query_layer = transpose_for_scores(query_layer, batch_size,
                                     num_attention_heads, from_seq_length,
                                     size_per_head)

  # `key_layer` = [B, N, T, H]
  key_layer = transpose_for_scores(key_layer, batch_size, num_attention_heads,
                                   to_seq_length, size_per_head)

  # Take the dot product between "query" and "key" to get the raw
  # attention scores.
  # `attention_scores` = [B, N, F, T]
  # 计算QK/sqrt(dk)
  attention_scores = tf.matmul(query_layer, key_layer, transpose_b=True)
  attention_scores = tf.multiply(attention_scores,
                                 1.0 / math.sqrt(float(size_per_head)))

  if attention_mask is not None:
    # `attention_mask` = [B, 1, F, T]
    attention_mask = tf.expand_dims(attention_mask, axis=[1])

    # Since attention_mask is 1.0 for positions we want to attend and 0.0 for
    # masked positions, this operation will create a tensor which is 0.0 for
    # positions we want to attend and -10000.0 for masked positions.
    # 对于attention_mask为1的加零维持原样；
    # 对于attention_mask为0的加入-10000令最终该位置的softmax结果为0。
    adder = (1.0 - tf.cast(attention_mask, tf.float32)) * -10000.0

    # Since we are adding it to the raw scores before the softmax, this is
    # effectively the same as removing these entirely.
    attention_scores += adder

  # Normalize the attention scores to probabilities.
  # `attention_probs` = [B, N, F, T]
  # 计算softmax
  attention_probs = tf.nn.softmax(attention_scores)

  # This is actually dropping out entire tokens to attend to, which might
  # seem a bit unusual, but is taken from the original Transformer paper.
  # 加入dropout
  attention_probs = dropout(attention_probs, attention_probs_dropout_prob)

  # `value_layer` = [B, T, N, H]
  value_layer = tf.reshape(
      value_layer,
      [batch_size, to_seq_length, num_attention_heads, size_per_head])

  # `value_layer` = [B, N, T, H]
  value_layer = tf.transpose(value_layer, [0, 2, 1, 3])

  # `context_layer` = [B, N, F, H]
  # softmax结果乘以V, [B, N, F, T] x [B, N, T, H] = [B, N, F, H]
  context_layer = tf.matmul(attention_probs, value_layer)

  # `context_layer` = [B, F, N, H]
  # 重排列维度
  context_layer = tf.transpose(context_layer, [0, 2, 1, 3])

  if do_return_2d_tensor:
    # `context_layer` = [B*F, N*H]
    context_layer = tf.reshape(
        context_layer,
        [batch_size * from_seq_length, num_attention_heads * size_per_head])
  else:
    # `context_layer` = [B, F, N*H]
    # 将num_attention_heads个head连接在一起
    context_layer = tf.reshape(
        context_layer,
        [batch_size, from_seq_length, num_attention_heads * size_per_head])

  return context_layer

参考文献

Attention Is All You Need
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

小狐狸AI数字人源码独立SAAS部署全开源+搭建环境教程 kaui52066 kaui52066精品源码人工智能 uni-app 前端小程序 php 小狐狸AI数字人数字人源码
一.系统介绍小狐狸AI数字人分身系统源码独立部署支持PC端、小程序端、H5端，一键克隆真人形象+声音核心功能亮点：1:1真人级克隆技术声音克隆：上传3分钟音频，AI深度学习声纹特征，复刻语气、情感、方言形象克隆：通过照片/视频建模，生成动态3D数字人，表情自然，动作流畅智能口型同步引擎AI算法精准匹配唇形与语音，实现口型同步0门槛SAAS化操作无需专业设备，网页端一键生成数字人视频海量模板库：电商
【PyTorch】PyTorch 中改变张量形状的几种方法 shengchao0920 pytorch 人工智能 python
PyTorch中改变张量形状的几种方法在深度学习领域，PyTorch是一个广泛使用的框架，它提供了丰富的API来处理张量（tensor）。在模型开发过程中，我们经常需要改变张量的形状以满足特定的需求。本文将介绍在PyTorch中改变张量形状的几种方法，并给出推荐的使用场景。比如：我们想合并一个张量的最后两个维度。一、方法1.使用reshape方法reshape方法可以改变张量的形状而不改变其数据。
OpenAI 团队组织架构和研发技术栈 AI天才研究院 ChatGPT 人工智能
OpenAI是一家致力于推动人工智能技术发展的公司，成立于2015年。其目标是确保人工智能技术造福全人类。为了实现这一目标，OpenAI采用了多种先进的技术和组织架构来推动其研发工作。目录OpenAI组织架构和研发技术栈概述1OpenAI团队的世界顶尖科学家IlyaSutskever：Ilya是OpenAI的联合创始人之一，也是深度学习领域的先驱。他在神经网络和深度学习方面的研究具有重要影响，曾与
深度学习-服务器训练SparseDrive过程记录 weixin_40826634 深度学习服务器人工智能
1、cuda安装1.1卸载安装失败的cuda参考：https://blog.csdn.net/weixin_40826634/article/details/127493809注意：因为/usr/local/cuda-xx.x/bin/下没有卸载脚本，很可能是apt安装的，所以通过执行下面的命令删除：apt-get--purgeremove"cuda*"apt-getautoremove然后执行f
模型量化 (Model Quantization) 算法 (Model Quantization Algorithms) （initial）大模型科普算法人工智能量化
1模型量化的必要性：降低模型大小、加速推理、减少资源消耗随着深度学习模型的日益复杂和庞大，其在资源受限的设备（如移动端、嵌入式设备）上的部署面临着巨大的挑战。即使在服务器端，部署大型模型也会带来高昂的计算成本和能源消耗。模型量化(ModelQuantization)作为一种关键的模型压缩和加速技术应运而生。其核心思想是将模型中的浮点数（通常是FP32或FP16）表示的权重和激活值转换为低精度整数（
《探秘人工智能与鸿蒙系统集成开发的硬件基石》程序猿阿伟人工智能 harmonyos 华为
在科技飞速发展的当下，人工智能与鸿蒙系统的集成开发开辟了创新的前沿领域。这一融合不仅代表着技术的演进，更预示着智能设备生态的全新变革。而在这场技术盛宴的背后，坚实的硬件配置是确保开发顺利进行的关键，它就像一座大厦的基石，决定了上层建筑的高度和稳定性。处理器：运算核心的澎湃动力处理器作为硬件系统的核心，在人工智能与鸿蒙系统集成开发中扮演着至关重要的角色。对于模型训练任务，尤其是深度学习模型，其复杂的
嵌入式AI必备技能2-模型的压缩与加速奥德彪123 嵌入式AI 人工智能嵌入式
嵌入式AI必备技能2-模型的压缩与加速引言随着嵌入式AI设备的广泛应用，模型的计算效率和存储需求成为核心挑战。由于嵌入式系统通常资源受限，传统的深度学习模型往往难以直接部署。因此，模型压缩和加速技术应运而生，旨在减少计算量、降低存储需求，同时尽可能保持模型的准确性。本文介绍几种常见的模型压缩与加速方法，包括剪枝、低秩分解、量化、权值共享、知识蒸馏等，并探讨如何综合应用这些技术来优化AI模型。1.常
NVIDIA显卡型号有哪些？怎么知道自己电脑的型号？可靠的豆包蟹同志杂烩积累经验分享
NVIDIA显卡型号显卡分N卡和A卡，这个N卡指的是英伟达（NVIDIA），A卡之前是ATI（后来被AMD收购），现在的A卡指的就是AMD显卡。如果是为了玩游戏或者是学深度学习，选显卡肯定是要选N卡，因为A卡对于游戏优化的没有N卡好。（1）图中的GTX表示是英伟达的一个系列名称，全称叫GeForceGTX，GTX定位高端显卡系列，从低到高排名：GS/GT/GTS/GTX/RTX/Ultra，从20
英伟达系列显卡大解析B100、H200、L40S、A100 2301_78234743 java
家里有了变故。。。快手数分秋招一面面经我发现算法岗也不很难进啊(深度学习)算法想转数开…Java零基础校招学习路线突击版（吐血整理）等的花都谢了的华子最后给开了22k，武汉，应该是14a。不过在这几个月里我坚定了搞几年快钱回家和np朋友因骂了hr，boos被封了哈哈哈在央企想被开除需要做什么？2024小米分布式存储研发急招华为2012被毁意向我发现算法岗也不很难进啊(深度学习)在央企想被开除需要做
eBest AI Hub全场景接入Deepseek eBest数字化转型方案人工智能
一、技术赋能，智创未来Deepseek的强大基因将为eBest产品注入新的活力即时智能响应：融合海量行业智慧与互联网搜索精华，提供秒级智能建议；多模态理解能力：突破界限，无缝融合文本、代码与图像理解，精准解析用户的需求；进化式深度学习：不断学习，持续进化，为用户提供日益完善、超越期待的服务体验。二、全场景赋能，体验再次跃升1.智能报表-数据洞察，指尖掌控升级后的智能报表功能，能够根据查询和检
Prompt工程：大模型沟通指南（人工智能到大模型） Harry技术 AI prompt 人工智能
文章目录人工智能到大模型机器学习深度学习大模型Prompt工程：大模型沟通的桥梁在人工智能的广袤领域中，大模型无疑是最为璀璨的明珠之一。它仿佛是一座连接人类与人工智能的桥梁，让我们能够更加深入地探索和利用人工智能的强大能力。而要实现与大模型的高效沟通，Prompt工程扮演着至关重要的角色。让我们一起走进Prompt工程的奇妙世界，探寻大模型沟通的奥秘。人工智能到大模型“人工智能是一种模拟人类智能的
大模型生成人物关系思维导图的实战教程 herosunly 大模型生成人物关系生成思维导图实战教程
大家好，我是herosunly。985院校硕士毕业，现担任算法研究员一职，热衷于机器学习算法研究与应用。曾获得阿里云天池比赛第一名，CCF比赛第二名，科大讯飞比赛第三名。拥有多项发明专利。对机器学习和深度学习拥有自己独到的见解。曾经辅导过若干个非计算机专业的学生进入到算法行业就业。希望和大家一起成长进步。本文主要介绍了大模型生成人物关系思维导图的实战教程，希望对使用大语言模型的同学们有所帮
pytorch实现cifar10多分类总结 L_pyu 人工智能 pytorch 分类
cifar-10简介：CIFAR-10是一个常用的图像分类数据集，每张图片都是3×32×32，3通道彩色图片，分辨率32×32。它包含了10个不同类别，每个类别有6000张图像，其中5000张用于训练，1000张用于测试。这10个类别分别为：飞机、汽车、鸟类、猫、鹿、狗、青蛙、马、船和卡车。CIFAR-10分类任务是将这些图像正确地分类到它们所属的类别中。对于这个任务，可以使用深度学习模型，如卷积
数据挖掘技术介绍柒柒钏数据挖掘数据挖掘人工智能
数据挖掘技术介绍分类聚类关联规则挖掘预测异常检测特征选择与降维文本挖掘序列模式挖掘深度学习集成学习数据挖掘（DataMining）是一种从大量数据中提取有用信息和模式的技术，旨在从数据中发现隐藏的规律、趋势或关系，从而为决策提供支持。分类定义：是一种监督学习方法，用于将数据分为不同的类别。功能：根据已标记的训练数据，学习一个模型，用于预测新数据的类别。方法：决策树、支持向量机、神经网络、逻辑回归、
深度学习在医疗影像诊断中的应用与实现 Evaporator Core #DeepSeek快速入门人工智能 #深度学习深度学习人工智能
引言随着人工智能技术的快速发展，深度学习在医疗领域的应用日益广泛，尤其是在医疗影像诊断方面。医疗影像数据量大、复杂度高，传统的诊断方法往往依赖于医生的经验，容易受到主观因素的影响。而深度学习通过自动学习特征，能够从海量数据中提取出有用的信息，辅助医生进行更精准的诊断。本文将探讨深度学习在医疗影像诊断中的应用，并通过代码示例展示如何实现一个简单的医疗影像分类模型。深度学习在医疗影像诊断中的应用1.图
图神经网络学习笔记—高级小批量处理（专题十四） AI专题精讲图神经网络入门到精通人工智能
小批量（mini-batch）的创建对于让深度学习模型的训练扩展到海量数据至关重要。与逐条处理样本不同，小批量将一组样本组合成一个统一的表示形式，从而可以高效地并行处理。在图像或语言领域，这一过程通常通过将每个样本缩放或填充为相同大小的形状来实现，然后将样本在一个额外的维度中分组。该维度的长度等于小批量中分组的样本数量，通常称为batch_size。由于图是能够容纳任意数量节点或边的最通用的数据结
每天五分钟玩转深度学习PyTorch：基于GoogLeNet完成CAFIR10分类每天五分钟玩转人工智能深度学习框架pytorch 深度学习 pytorch 分类 GoogLeNet 人工智能 CAFIR10
本文重点前面我们终于使用pytorch搭建了GoogLeNet，本文我们使用该网络模型解决一个实际问题，也就是使用它完成CAFIR10分类，其实就这些任务而言，我们只要搭建好模型，然后把数据喂进去就行了，其它的地方都是一样的，就是网络模型不一样。代码
Deepseek:物理神经网络PINN入门教程天一生水water 神经网络人工智能深度学习
一、物理信息网络（PINN）的概念与原理1.定义与来源物理信息网络（Physics-InformedNeuralNetworks,PINN）是一种将物理定律（如偏微分方程、守恒定律等）嵌入神经网络训练过程的深度学习方法。其核心思想是通过神经网络同时拟合观测数据并满足物理约束，从而解决传统数值方法难以处理的高维、噪声数据或复杂边界条件问题。来源：PINN起源于对传统数值方法局限性的改进需求（如网格生
深度学习项目--基于DenseNet网络的“乳腺癌图像识别”，准确率90%+，pytorch复现羊小猪~~ 深度学习网络 pytorch 人工智能 python 机器学习分类
本文为365天深度学习训练营中的学习记录博客原作者：K同学啊前言如果说最经典的神经网络，ResNet肯定是一个，从ResNet发布后，很多人做了修改，denseNet网络无疑是最成功的一个，它采用密集型连接，将通道数连接在一起；本文是基于上一篇复现DenseNet121模型，做一个乳腺癌图像识别，效果还行，准确率0.9+;CNN经典网络之“DenseNet”简介，源码研究与复现(pytorch)：
谈为什么KLA和Camtech公司为什么可以做到，半导体那边，晶圆，键合可以做到不管哪款新产品进来。编程2小时，上线后准确率可以直接做到99.9%、 *Major* 机器视觉
谈为什么KLA和Camtech公司为什么可以做到，半导体那边，晶圆，键合可以做到不管哪款新产品进来。编程2小时，上线后准确率可以直接做到99.9%、这么里面的AI原理没什么，还是这些公司把AI技术层面用出花了，一是他们有公司可能比较成立时间长，数据丰富。二是像AI深度学习网络冻结，或者自适应调参，都是一些AI技巧，他们用的比较好。三什么跨层特征解耦，实现的基础是他们对半导体理解比较深刻KLA和Ca
AI 之路——数据分析（1）Pandas小结与框架整理 Robin_Pi 机器学习之路数据分析数据分析 python 人工智能可视化
目录1.写在前面1.1AI之路：1.2工具/技能：2.数据分析2.1数据分析的流程2.2数据的基本操作方法2.2.1Pandas概览2.2.2使用Pandas操作数据的核心(1)选择数据(2)操作数据2.2.2数据详解3.写在最后1.写在前面主要是阶段性框架总结1.1AI之路：数据分析——机器学习——深度学习——CV/NLP1.2工具/技能：Python、NumPy、Pandas、Matplotl
PyTorch 深度学习实战（13）：Proximal Policy Optimization (PPO) 算法进取星辰 PyTorch 深度学习实战深度学习 pytorch 算法
在上一篇文章中，我们介绍了Actor-Critic算法，并使用它解决了CartPole问题。本文将深入探讨ProximalPolicyOptimization(PPO)算法，这是一种更稳定、更高效的策略优化方法。我们将使用PyTorch实现PPO算法，并应用于经典的CartPole问题。一、PPO算法基础PPO是OpenAI提出的一种强化学习算法，旨在解决策略梯度方法中的训练不稳定问题。PPO通过
人工智能概念 zhangpeng455547940 计算机人工智能
机器学习、深度学习、大模型机器学习提供框架，使得系统可以从数据中学习算法：线性回归、逻辑回归、支持向量机、决策树、随机森林、K近邻算法深度学习是实现这一目标的工具，模仿人脑，使用多层神经网络进行学习算法：多层感知器、卷积神经网络、循环神经网络、长短期记忆网络大模型指参数量巨大的深度学习模型人工智能应用：自然语言处理、图像识别与生成、语音识别、政务与企业服务...
机器学习(二) 本文(2.5万字) | KNN算法原理及Python复现 | 小酒馆燃着灯机器学习算法 k近邻算法
文章目录一KNN算法原理二KNN三要素三机器学习中标准化四KNN分类预测规则五KNN回归预测规则六KNN算法实现方式七KDTree7.1构造KDtree7.2KDtree查找最近邻八KNN特点九KNN算法实现案例一案例二1.机器学习2.深度学习与目标检测3.YOLOv54.YOLOv5改进5.YOLOv8及其改进6.Python与PyTorch7.工具8.小知识点9.杂记一KNN算法原理K近邻分类
再添殊荣！移远通信工业智能品牌宝维塔™斩获AI创新应用奖移远通信算力人工智能工业智能
12月24日，2024中国物联网产业大会暨第21届慧聪品牌盛会在深圳圆满落幕。会上，移远通信凭借其工业智能品牌宝维塔™在推动AI技术落地与应用创新方面的卓越贡献，获颁“AI创新应用奖”。作为科技发展的前沿力量，AI技术正深刻改变着各行各业的生产模式和效率，尤其在工业领域，展现出了巨大潜力。宝维塔™是移远通信精心打造的工业智能品牌，专注于将人工智能、边缘计算、机器视觉、深度学习、软件算法平台等前沿技
验证码识别：使用OCR技术识别图形验证码详解数据知道 2025年爬虫和逆向教程 ocr python 爬虫 OCR识别验证码识别图片验证码
文章目录一、基本原理二、所需工具2.1Python环境2.2图像处理库2.3OCR引擎2.4Python接口三、实现步骤3.1获取验证码图像3.2图像预处理3.3使用OCR进行字符识别3.4基本OCR识别样例四、提高识别准确率的方法4.1字符分割4.2使用深度学习模型4.3数据增强4.4集成多个OCR引擎五、实际应用中的注意事项六、总结验证码（CAPTCHA）是一种用于区分人类用户和自动化程序的安
从LayerNorm到RMSNorm：深度学习归一化技术的进化！qwen2.5的技术。 KangkangLoveNLP qwen2.5 深度学习人工智能 transformer pytorch 自然语言处理 python 神经网络
RMSNorm（RootMeanSquareNormalization，均方根归一化）是一种用于深度学习的归一化技术，是LayerNorm（层归一化）的一种改进。它通过计算输入数据的均方根（RootMeanSquare,RMS）来进行归一化，避免了传统归一化方法中均值和方差的计算1.LayerNorm（层归一化）LayerNorm（层归一化）是一种用于深度学习的归一化技术，主要用于稳定训练过程、加
【漫话机器学习系列】137.随机搜索（Randomized Search） IT古董漫话机器学习系列专辑机器学习人工智能
随机搜索（RandomizedSearch）详解在机器学习和深度学习的模型训练过程中，超参数调优（HyperparameterTuning）是至关重要的一环。随机搜索（RandomizedSearch）是一种高效的超参数优化方法，它通过在候选超参数的数值分布（如正态分布、均匀分布等）中随机选择超参数组合，从而找到最优的超参数配置。1.超参数调优的必要性超参数是模型在训练之前需要人为设定的参数，例如
医学人工智能影像诊断数据收集与整理 V搜xhliang0246 人工智能健康医疗算法
在医学领域中，人工智能（AI）尤其是深度学习技术，已经被广泛应用于医学影像的分析和诊断。为了训练这些模型，需要大量的高质量标注数据。下面我会给出一个简单的示例流程，介绍如何收集、整理和准备医学影像数据集，并提供一些基础的Python代码示例。数据收集首先，你需要收集包含医学影像的数据集。这些数据通常来自医院或研究机构，并且需要经过伦理审查和患者同意。示例数据集假设我们有一个包含肺部X光片的数据集，
深度学习模块缝合教程：从理论到实践 RockLiu@805 深度学习模块机器视觉深度学习人工智能
深度学习模块缝合教程：从理论到实践引言随着深度学习的不断发展，模型的设计与优化成为研究者关注的核心问题之一。如何有效地“缝合”不同模块，以实现更高效的计算和更强大的功能，是当前深度学习研究中的一个重要课题。在本文中，我们将从基础概念出发，详细探讨深度学习模块缝合的方法、技巧及其应用场景。无论是理论深厚的研究者还是实验导向的实践者，都可以从中获得启发。一、深度学习基础知识详解深度学习是人工智能领域的
SAX解析xml文件小猪猪08 xml
1.创建SAXParserFactory实例 2.通过SAXParserFactory对象获取SAXParser实例 3.创建一个类SAXParserHander继续DefaultHandler，并且实例化这个类 4.SAXParser实例的parse来获取文件 public static void main(String[] args) { //
为什么mysql里的ibdata1文件不断的增长？ brotherlamp linux linux运维 linux资料 linux视频 linux运维自学
我们在 Percona 支持栏目经常收到关于 MySQL 的 ibdata1 文件的这个问题。当监控服务器发送一个关于 MySQL 服务器存储的报警时，恐慌就开始了 —— 就是说磁盘快要满了。一番调查后你意识到大多数地盘空间被 InnoDB 的共享表空间 ibdata1 使用。而你已经启用了 innodbfileper_table，所以问题是： ibdata1存了什么？当你启用了 i
Quartz-quartz.properties配置 eksliang quartz
其实Quartz JAR文件的org.quartz包下就包含了一个quartz.properties属性配置文件并提供了默认设置。如果需要调整默认配置，可以在类路径下建立一个新的quartz.properties，它将自动被Quartz加载并覆盖默认的设置。下面是这些默认值的解释 #-----集群的配置 org.quartz.scheduler.instanceName =
informatica session的使用 18289753290 workflow session log Informatica
如果希望workflow存储最近20次的log，在session里的Config Object设置，log options做配置，save session log :sessions run ;savesessio log for these runs:20 session下面的source 里面有个tracing
Scrapy抓取网页时出现CRC check failed 0x471e6e9a != 0x7c07b839L的错误酷的飞上天空 scrapy
Scrapy版本0.14.4 出现问题现象： ERROR: Error downloading <GET http://xxxxx CRC check failed 解决方法 1.设置网络请求时的header中的属性'Accept-Encoding': '*;q=0' 明确表示不支持任何形式的压缩格式，避免程序的解压
java Swing小集锦永夜-极光 java swing
1.关闭窗体弹出确认对话框 1.1 this.setDefaultCloseOperation (JFrame.DO_NOTHING_ON_CLOSE); 1.2 this.addWindowListener ( new WindowAdapter () { public void windo
强制删除.svn文件夹随便小屋 java
在windows上，从别处复制的项目中可能带有.svn文件夹，手动删除太麻烦，并且每个文件夹下都有。所以写了个程序进行删除。因为.svn文件夹在windows上是只读的，所以用File中的delete()和deleteOnExist()方法都不能将其删除，所以只能采用windows命令方式进行删除
GET和POST有什么区别？及为什么网上的多数答案都是错的。 aijuans get post
如果有人问你，GET和POST，有什么区别？你会如何回答？我的经历前几天有人问我这个问题。我说GET是用于获取数据的，POST，一般用于将数据发给服务器之用。这个答案好像并不是他想要的。于是他继续追问有没有别的区别？我说这就是个名字而已，如果服务器支持，他完全可以把G
谈谈新浪微博背后的那些算法 aoyouzi 谈谈新浪微博背后的那些算法
本文对微博中常见的问题的对应算法进行了简单的介绍，在实际应用中的算法比介绍的要复杂的多。当然，本文覆盖的主题并不全，比如好友推荐、热点跟踪等就没有涉及到。但古人云“窥一斑而见全豹”，希望本文的介绍能帮助大家更好的理解微博这样的社交网络应用。微博是一个很多人都在用的社交应用。天天刷微博的人每天都会进行着这样几个操作：原创、转发、回复、阅读、关注、@等。其中，前四个是针对短博文，最后的关注和@则针
Connection reset 连接被重置的解决方法百合不是茶 java 字符流连接被重置
流是java的核心部分,,昨天在做android服务器连接服务器的时候出了问题,就将代码放到java中执行,结果还是一样连接被重置被重置的代码如下; 客户端代码; package 通信软件服务器; import java.io.BufferedWriter; import java.io.OutputStream; import java.io.O
web.xml配置详解之filter bijian1013 java web.xml filter
一.定义 <filter> <filter-name>encodingfilter</filter-name> <filter-class>com.my.app.EncodingFilter</filter-class> <init-param> <param-name>encoding<
Heritrix Bill_chen 多线程 xml 算法制造配置管理
作为纯Java语言开发的、功能强大的网络爬虫Heritrix，其功能极其强大，且扩展性良好，深受热爱搜索技术的盆友们的喜爱，但它配置较为复杂，且源码不好理解，最近又使劲看了下，结合自己的学习和理解，跟大家分享Heritrix的点点滴滴。 Heritrix的下载（http://sourceforge.net/projects/archive-crawler/）安装、配置，就不罗嗦了，可以自己找找资
【Zookeeper】FAQ bit1129 zookeeper
1.脱离IDE，运行简单的Java客户端程序 #ZkClient是简单的Zookeeper~$ java -cp "./:zookeeper-3.4.6.jar:./lib/*" ZKClient 1. Zookeeper是的Watcher回调是同步操作，需要添加异步处理的代码 2. 如果Zookeeper集群跨越多个机房，那么Leader/
The user specified as a definer ('aaa'@'localhost') does not exist 白糖_ localhost
今天遇到一个客户BUG，当前的jdbc连接用户是root，然后部分删除操作都会报下面这个错误：The user specified as a definer ('aaa'@'localhost') does not exist 最后找原因发现删除操作做了触发器，而触发器里面有这样一句 /*!50017 DEFINER = ''aaa@'localhost' */ 原来最初
javascript中showModelDialog刷新父页面 bozch JavaScript 刷新父页面 showModalDialog
在页面中使用showModalDialog打开模式子页面窗口的时候，如果想在子页面中操作父页面中的某个节点，可以通过如下的进行： window.showModalDialog('url',self,‘status...’); // 首先中间参数使用self 在子页面使用w
编程之美-买书折扣 bylijinnan 编程之美
import java.util.Arrays; public class BookDiscount { /**编程之美买书折扣书上的贪心算法的分析很有意思，我看了半天看不懂，结果作者说，贪心算法在这个问题上是不适用的。。下面用动态规划实现。哈利波特这本书一共有五卷，每卷都是8欧元，如果读者一次购买不同的两卷可扣除5%的折扣，三卷10%，四卷20%，五卷
关于struts2.3.4项目跨站执行脚本以及远程执行漏洞修复概要 chenbowen00 struts WEB安全
因为近期负责的几个银行系统软件，需要交付客户，因此客户专门请了安全公司对系统进行了安全评测，结果发现了诸如跨站执行脚本，远程执行漏洞以及弱口令等问题。下面记录下本次解决的过程以便后续 1、首先从最简单的开始处理，服务器的弱口令问题，首先根据安全工具提供的测试描述中发现应用服务器中存在一个匿名用户，默认是不需要密码的，经过分析发现服务器使用了FTP协议，而使用ftp协议默认会产生一个匿名用
[电力与暖气]煤炭燃烧与电力加温 comsci
在宇宙中,用贝塔射线观测地球某个部分,看上去,好像一个个马蜂窝,又像珊瑚礁一样,原来是某个国家的采煤区..... 不过,这个采煤区的煤炭看来是要用完了.....那么依赖将起燃烧并取暖的城市,在极度严寒的季节中...该怎么办呢? &nbs
oracle O7_DICTIONARY_ACCESSIBILITY参数 daizj oracle
O7_DICTIONARY_ACCESSIBILITY参数控制对数据字典的访问.设置为true,如果用户被授予了如select any table等any table权限,用户即使不是dba或sysdba用户也可以访问数据字典.在9i及以上版本默认为false,8i及以前版本默认为true.如果设置为true就可能会带来安全上的一些问题.这也就为什么O7_DICTIONARY_ACCESSIBIL
比较全面的MySQL优化参考 dengkane mysql
本文整理了一些MySQL的通用优化方法，做个简单的总结分享，旨在帮助那些没有专职MySQL DBA的企业做好基本的优化工作，至于具体的SQL优化，大部分通过加适当的索引即可达到效果，更复杂的就需要具体分析了，可以参考本站的一些优化案例或者联系我，下方有我的联系方式。这是上篇。 1、硬件层相关优化 1.1、CPU相关在服务器的BIOS设置中，可
C语言homework2，有一个逆序打印数字的小算法 dcj3sjt126com c
#h1# 0、完成课堂例子 1、将一个四位数逆序打印 1234 ==> 4321 实现方法一： # include <stdio.h> int main(void) { int i = 1234; int one = i%10; int two = i / 10 % 10; int three = i / 100 % 10;
apacheBench对网站进行压力测试 dcj3sjt126com apachebench
ab 的全称是 ApacheBench ，是 Apache 附带的一个小工具，专门用于 HTTP Server 的 benchmark testing ，可以同时模拟多个并发请求。前段时间看到公司的开发人员也在用它作一些测试，看起来也不错，很简单，也很容易使用，所以今天花一点时间看了一下。通过下面的一个简单的例子和注释，相信大家可以更容易理解这个工具的使用。
2种办法让HashMap线程安全 flyfoxs java jdk jni
多线程之--2种办法让HashMap线程安全多线程之--synchronized 和reentrantlock的优缺点多线程之--2种JAVA乐观锁的比较( NonfairSync VS. FairSync) HashMap不是线程安全的,往往在写程序时需要通过一些方法来回避.其实JDK原生的提供了2种方法让HashMap支持线程安全.
Spring Security（04）——认证简介 234390216 Spring Security 认证过程
认证简介目录 1.1 认证过程 1.2 Web应用的认证过程 1.2.1 ExceptionTranslationFilter 1.2.2 在request之间共享SecurityContext 1
Java 位运算 Javahuhui java 位运算
// 左移( << ) 低位补0 // 0000 0000 0000 0000 0000 0000 0000 0110 然后左移2位后，低位补0： // 0000 0000 0000 0000 0000 0000 0001 1000 System.out.println(6 << 2);// 运行结果是24 // 右移( >> ) 高位补"
mysql免安装版配置 ldzyz007 mysql
1、my-small.ini是为了小型数据库而设计的。不应该把这个模型用于含有一些常用项目的数据库。 2、my-medium.ini是为中等规模的数据库而设计的。如果你正在企业中使用RHEL,可能会比这个操作系统的最小RAM需求(256MB)明显多得多的物理内存。由此可见，如果有那么多RAM内存可以使用，自然可以在同一台机器上运行其它服务。 3、my-large.ini是为专用于一个SQL数据
MFC和ado数据库使用时遇到的问题你不认识的休道人 sql C++mfc
=================================================================== 第一个 =================================================================== try{ CString sql; sql.Format("select * from p
表单重复提交Double Submits rensanning double
可能发生的场景： *多次点击提交按钮 *刷新页面 *点击浏览器回退按钮 *直接访问收藏夹中的地址 *重复发送HTTP请求（Ajax）（1）点击按钮后disable该按钮一会儿，这样能避免急躁的用户频繁点击按钮。这种方法确实有些粗暴，友好一点的可以把按钮的文字变一下做个提示，比如Bootstrap的做法： http://getbootstrap.co
Java String 十大常见问题 tomcat_oracle java 正则表达式
　1.字符串比较，使用“==”还是equals()? 　　"=="判断两个引用的是不是同一个内存地址(同一个物理对象)。　　equals()判断两个字符串的值是否相等。　　除非你想判断两个string引用是否同一个对象，否则应该总是使用equals()方法。　　如果你了解字符串的驻留(String Interning)则会更好地理解这个问题。　　
SpringMVC 登陆拦截器实现登陆控制 xp9802 springMVC
思路，先登陆后，将登陆信息存储在session中，然后通过拦截器，对系统中的页面和资源进行访问拦截，同时对于登陆本身相关的页面和资源不拦截。实现方法： 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Bert源代码（二）模型