The previous post walked through the source code and data flow of Caffe's LSTM layer in detail; this post adds a new layer to Caffe, an Attention LSTM (ALSTM) layer.
Before getting to the code, let us briefly look at the attention-model formulations and diagrams from a few papers.
(1) Recurrent Models of Visual Attention
A. Glimpse Sensor: at time t, patches of several scales are extracted around a location and combined into the glimpse data ρ.
B. Glimpse Network: fuses the local image information with the location information.
C. Model Architecture: the previous hidden state h_{t-1} is combined with the glimpse feature g_t to produce the new hidden state h_t, from which the attention, i.e. the next location of interest, is generated.
See the paper and its code for the detailed derivation; a rough sketch of the recurrence is given below.
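In simplified form (my own notation, loosely following the paper; f_g, f_h, f_l, f_a denote the glimpse, core, location, and action sub-networks):

g_t = f_g(x_t, l_{t-1}; θ_g)    (glimpse network)
h_t = f_h(h_{t-1}, g_t; θ_h)    (core recurrent network)
l_t = f_l(h_t; θ_l)             (location / attention network)
a_t = f_a(h_t; θ_a)             (action / classification network)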
(2) Action Recognition Using Visual Attention
The basic idea is to split the extracted feature map of each frame into 49 regions (7×7) and let the model learn which regions of each image to attend to. Figure (b) in that paper is inconsistent, and the authors' code reflects this as well. The main goal of this post is to implement such an Attention LSTM layer in Caffe.
(3) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
The idea is the same; see the paper for the detailed derivation. The soft-attention formulation shared by (2) and (3), which is what the layer below implements, is sketched next.
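A rough sketch of this soft attention (my own notation; the two papers differ in the details of the score function, and note that the unrolled net below does not insert an explicit tanh between the Bias sum and the scoring InnerProduct):

e_{t,i} = w^T · tanh(W_h · h_{t-1} + W_x · X_{t,i} + b),   i = 1, ..., 49
α_{t,i} = exp(e_{t,i}) / Σ_j exp(e_{t,j})
x_t = Σ_i α_{t,i} · X_{t,i}

Here X_{t,i} is the feature vector of region i at time t, α_{t,i} is the attention mask (the second top, "mask", of the layer), and x_t is the attended feature that feeds the LSTM gate input. In the code, W_h · h_{t-1} corresponds to the att_m_ InnerProduct, W_x · X_{t,i} to att_x_, their broadcast sum to the Bias layer, the 256→1 InnerProduct plus Softmax produce α, and the Scale plus SUM Pooling compute the weighted sum.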
Part 2: The ALSTM Layer Code
Alstm.cpp
#include <string>
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/filler.hpp"
#include "caffe/layer.hpp"
#include "caffe/sequence_layers.hpp"
#include "caffe/util/math_functions.hpp"
namespace caffe {
template <typename Dtype>
void ALSTMLayer<Dtype>::RecurrentInputBlobNames(vector<string>* names) const {
names->resize(2);
(*names)[0] = "h_0";
(*names)[1] = "c_0";
}
template <typename Dtype>
void ALSTMLayer<Dtype>::RecurrentOutputBlobNames(vector<string>* names) const {
names->resize(2);
(*names)[0] = "h_" + this->int_to_str(this->T_);
(*names)[1] = "c_T";
}
template <typename Dtype>
void ALSTMLayer<Dtype>::OutputBlobNames(vector<string>* names) const {
names->resize(2);
(*names)[0] = "h";
(*names)[1] = "mask";
}
template <typename Dtype>
void ALSTMLayer<Dtype>::FillUnrolledNet(NetParameter* net_param) const {
const int num_output = this->layer_param_.recurrent_param().num_output();
CHECK_GT(num_output, 0) << "num_output must be positive";
const FillerParameter& weight_filler =
this->layer_param_.recurrent_param().weight_filler();
const FillerParameter& bias_filler =
this->layer_param_.recurrent_param().bias_filler();
// Add generic LayerParameter's (without bottoms/tops) of layer types we'll
// use to save redundant code.
LayerParameter hidden_param;
hidden_param.set_type("InnerProduct");
hidden_param.mutable_inner_product_param()->set_num_output(num_output * 4);
hidden_param.mutable_inner_product_param()->set_bias_term(false);
hidden_param.mutable_inner_product_param()->set_axis(1);
hidden_param.mutable_inner_product_param()->
mutable_weight_filler()->CopyFrom(weight_filler);
LayerParameter biased_hidden_param(hidden_param);
biased_hidden_param.mutable_inner_product_param()->set_bias_term(true);
biased_hidden_param.mutable_inner_product_param()->
mutable_bias_filler()->CopyFrom(bias_filler);
LayerParameter attention_param;
attention_param.set_type("InnerProduct");
attention_param.mutable_inner_product_param()->set_num_output(256);
attention_param.mutable_inner_product_param()->set_bias_term(false);
attention_param.mutable_inner_product_param()->set_axis(2);
attention_param.mutable_inner_product_param()->
mutable_weight_filler()->CopyFrom(weight_filler);
LayerParameter biased_attention_param(attention_param);
biased_attention_param.mutable_inner_product_param()->set_bias_term(true);
biased_attention_param.mutable_inner_product_param()->
mutable_bias_filler()->CopyFrom(bias_filler); // weight + bias
LayerParameter sum_param;
sum_param.set_type("Eltwise");
sum_param.mutable_eltwise_param()->set_operation(
EltwiseParameter_EltwiseOp_SUM);
LayerParameter slice_param;
slice_param.set_type("Slice");
slice_param.mutable_slice_param()->set_axis(0);
LayerParameter softmax_param;
softmax_param.set_type("Softmax");
softmax_param.mutable_softmax_param()->set_axis(-1);
LayerParameter split_param;
split_param.set_type("Split");
LayerParameter scale_param;
scale_param.set_type("Scale");
LayerParameter permute_param;
permute_param.set_type("Permute");
LayerParameter reshape_param;
reshape_param.set_type("Reshape");
LayerParameter bias_layer_param;
bias_layer_param.set_type("Bias");
LayerParameter pool_param;
pool_param.set_type("Pooling");
LayerParameter reshape_layer_param;
reshape_layer_param.set_type("Reshape");
BlobShape input_shape;
input_shape.add_dim(1); // c_0 and h_0 are a single timestep
input_shape.add_dim(this->N_);
input_shape.add_dim(num_output);
net_param->add_input("c_0");
net_param->add_input_shape()->CopyFrom(input_shape);
net_param->add_input("h_0");
net_param->add_input_shape()->CopyFrom(input_shape);
LayerParameter* cont_slice_param = net_param->add_layer();
cont_slice_param->CopyFrom(slice_param);
cont_slice_param->set_name("cont_slice");
cont_slice_param->add_bottom("cont");
cont_slice_param->mutable_slice_param()->set_axis(1);
LayerParameter* x_slice_param = net_param->add_layer();
x_slice_param->CopyFrom(slice_param);
x_slice_param->set_name("x_slice");
x_slice_param->add_bottom("x");
// Add layer to transform all timesteps of x to the hidden state dimension.
// W_xc_x = W_xc * x + b_c
/*
{
LayerParameter* x_transform_param = net_param->add_layer();
x_transform_param->CopyFrom(biased_hidden_param);
x_transform_param->set_name("x_transform");
x_transform_param->add_param()->set_name("W_xc");
x_transform_param->add_param()->set_name("b_c");
x_transform_param->add_bottom("x");
x_transform_param->add_top("W_xc_x");
}
if (this->static_input_) {
// Add layer to transform x_static to the gate dimension.
// W_xc_x_static = W_xc_static * x_static
LayerParameter* x_static_transform_param = net_param->add_layer();
x_static_transform_param->CopyFrom(hidden_param);
x_static_transform_param->mutable_inner_product_param()->set_axis(1);
x_static_transform_param->set_name("W_xc_x_static");
x_static_transform_param->add_param()->set_name("W_xc_static");
x_static_transform_param->add_bottom("x_static");
x_static_transform_param->add_top("W_xc_x_static");
LayerParameter* reshape_param = net_param->add_layer();
reshape_param->set_type("Reshape");
BlobShape* new_shape =
reshape_param->mutable_reshape_param()->mutable_shape();
new_shape->add_dim(1); // One timestep.
new_shape->add_dim(this->N_);
new_shape->add_dim(
x_static_transform_param->inner_product_param().num_output());
reshape_param->add_bottom("W_xc_x_static");
reshape_param->add_top("W_xc_x_static");
}
LayerParameter* x_slice_param = net_param->add_layer();
x_slice_param->CopyFrom(slice_param);
x_slice_param->add_bottom("W_xc_x");
x_slice_param->set_name("W_xc_x_slice");
*/
LayerParameter output_concat_layer;
output_concat_layer.set_name("h_concat");
output_concat_layer.set_type("Concat");
output_concat_layer.add_top("h");
output_concat_layer.mutable_concat_param()->set_axis(0);
LayerParameter output_m_layer;
output_m_layer.set_name("m_concat");
output_m_layer.set_type("Concat");
output_m_layer.add_top("mask");
output_m_layer.mutable_concat_param()->set_axis(0); // second output: the attention masks
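// Unroll one timestep per loop iteration. For each t, the attention pipeline
// built below is roughly (names refer to the layers added inside the loop):
//   permute_x_       : rearrange the axes of x_t
//   att_m_           : InnerProduct on h_{t-1} -> attention query m_{t-1}
//   att_x_           : InnerProduct on (permuted) x_t -> per-region 256-d projection
//   mask_input_      : Bias layer broadcast-adds m_{t-1} to every region
//   att_x_ap_        : InnerProduct 256 -> 1, one score per region
//   softmax_m_       : Softmax over the regions -> attention mask mask_{t-1}
//   scale_x_/pool_x_ : weight the region features by the mask and SUM-pool them
//   x_transform_     : attended feature -> LSTM gate input W_xc_x_t
// The LSTMUnit at the end of each iteration then updates c_t and h_t as usual.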
for (int t = 1; t <= this->T_; ++t) {
string tm1s = this->int_to_str(t - 1);
string ts = this->int_to_str(t);
cont_slice_param->add_top("cont_" + ts);
x_slice_param->add_top("x_" + ts);
// Add a layer to permute x
{
LayerParameter* permute_x_param = net_param->add_layer();
permute_x_param->CopyFrom(permute_param);
permute_x_param->set_name("permute_x_" + ts);
permute_x_param->mutable_permute_param()->add_order(2);
permute_x_param->mutable_permute_param()->add_order(0);
permute_x_param->mutable_permute_param()->add_order(1);
permute_x_param->mutable_permute_param()->add_order(3);
permute_x_param->add_bottom("x_" + ts);
permute_x_param->add_top("x_p_" + ts);
}
//
// Add a layer to generate attention weights
{
LayerParameter* att_m_param = net_param->add_layer();
att_m_param->CopyFrom(biased_attention_param);
att_m_param->set_name("att_m_" + tm1s);
att_m_param->add_bottom("h_" + tm1s);
att_m_param->add_top("m_" + tm1s);
}
{
LayerParameter* permute_x_a_param = net_param->add_layer();
permute_x_a_param->CopyFrom(permute_param);
permute_x_a_param->set_name("permute_x_a_" + ts);
permute_x_a_param->mutable_permute_param()->add_order(0);
permute_x_a_param->mutable_permute_param()->add_order(1);
permute_x_a_param->mutable_permute_param()->add_order(3);
permute_x_a_param->mutable_permute_param()->add_order(2);
permute_x_a_param->add_bottom("x_" + ts);
permute_x_a_param->add_top("x_p_a_" + ts);
} // note: this is the part to change
{
LayerParameter* att_x_param = net_param->add_layer();
att_x_param->CopyFrom(biased_attention_param);
att_x_param->set_name("att_x_" + tm1s);
att_x_param->mutable_inner_product_param()->set_axis(3);
att_x_param->add_bottom("x_p_a_" + ts);
att_x_param->add_top("m_x_" + tm1s);
} // FC layer: projects each region's features to 256 dimensions
{
LayerParameter* permute_x_a_p_param = net_param->add_layer();
permute_x_a_p_param->CopyFrom(permute_param);
permute_x_a_p_param->set_name("permute_x_a_p_" + ts);
permute_x_a_p_param->mutable_permute_param()->add_order(2);
permute_x_a_p_param->mutable_permute_param()->add_order(0);
permute_x_a_p_param->mutable_permute_param()->add_order(1);
permute_x_a_p_param->mutable_permute_param()->add_order(3);
permute_x_a_p_param->add_bottom("m_x_" + tm1s);
permute_x_a_p_param->add_top("m_x_a_" + tm1s);
}
{
LayerParameter* m_sum_layer = net_param->add_layer();
m_sum_layer->CopyFrom(bias_layer_param);
m_sum_layer->set_name("mask_input_" + ts);
m_sum_layer->add_bottom("m_x_a_" + tm1s);
m_sum_layer->add_bottom("m_" + tm1s);
m_sum_layer->add_top("m_input_" + tm1s);
}
{
LayerParameter* att_x_ap_param = net_param->add_layer();
att_x_ap_param->CopyFrom(biased_attention_param);
att_x_ap_param->set_name("att_x_ap_" + tm1s);
att_x_ap_param->mutable_inner_product_param()->set_axis(3);
att_x_ap_param->mutable_inner_product_param()->set_num_output(1);
att_x_ap_param->add_bottom("m_input_" + tm1s);
att_x_ap_param->add_top("m_x_ap_" + tm1s); //256---->1
}
{
LayerParameter* permute_m_param = net_param->add_layer();
permute_m_param->CopyFrom(permute_param);
permute_m_param->set_name("permute_m_" + ts);
permute_m_param->mutable_permute_param()->add_order(1);
permute_m_param->mutable_permute_param()->add_order(2);
permute_m_param->mutable_permute_param()->add_order(0);
permute_m_param->mutable_permute_param()->add_order(3);
permute_m_param->add_bottom("m_x_ap_" + tm1s);
permute_m_param->add_top("m_f_" + tm1s); //10*8*30*1
}
// Add a Softmax layer to generate the attention masks
{
LayerParameter* softmax_m_param = net_param->add_layer();
softmax_m_param->CopyFrom(softmax_param);
softmax_m_param->mutable_softmax_param()->set_axis(2);
softmax_m_param->set_name("softmax_m_" + tm1s);
softmax_m_param->add_bottom("m_f_" + tm1s);
softmax_m_param->add_top("mask_" + tm1s);
}
{
LayerParameter* reshape_m_param = net_param->add_layer();
reshape_m_param->CopyFrom(reshape_layer_param);
BlobShape* shape = reshape_m_param->mutable_reshape_param()->mutable_shape();
shape->Clear();
shape->add_dim(0);
shape->add_dim(0);
shape->add_dim(0);
reshape_m_param->set_name("reshape_m_" + tm1s);
reshape_m_param->add_bottom("mask_" + tm1s);
reshape_m_param->add_top("mask_reshape_" + tm1s);
}
//Reshape mask from 1*6*36 to 1*6*6*6
/*
{
LayerParameter* reshape_param = net_param->add_layer();
reshape_param->set_type("Reshape");
BlobShape* new_shape =
reshape_param->mutable_reshape_param()->mutable_shape();
new_shape->add_dim(1); // One timestep.
new_shape->add_dim(6);
new_shape->add_dim(6);
new_shape->add_dim(6);
reshape_param->add_bottom("mask_" +tm1s);
reshape_param->add_top("mask_reshape_" +tm1s);
}*/
// Combine the mask with the input features
{
LayerParameter* scale_x_param = net_param->add_layer();
scale_x_param->CopyFrom(scale_param);
scale_x_param->set_name("scale_x_" + tm1s);
scale_x_param->add_bottom("x_p_" + ts);
scale_x_param->add_bottom("mask_reshape_" + tm1s);
scale_x_param->add_top("x_mask_" + ts);
}
{
LayerParameter* permute_x_mask_param = net_param->add_layer();
permute_x_mask_param->CopyFrom(permute_param);
permute_x_mask_param->set_name("permute_x_mask_" + ts);
permute_x_mask_param->mutable_permute_param()->add_order(1);
permute_x_mask_param->mutable_permute_param()->add_order(2);
permute_x_mask_param->mutable_permute_param()->add_order(0);
permute_x_mask_param->mutable_permute_param()->add_order(3);
permute_x_mask_param->add_bottom("x_mask_" + ts);
permute_x_mask_param->add_top("x_mask_p_" + ts);
}
{
LayerParameter* reshape_x_param = net_param->add_layer();
reshape_x_param->CopyFrom(reshape_param);
reshape_x_param->set_name("reshape_x_" +ts);
BlobShape* new_shape =
reshape_x_param->mutable_reshape_param()->mutable_shape();
new_shape->add_dim(this->N_);
new_shape->add_dim(512); // 512 (or 384, depending on the feature map)
new_shape->add_dim(7); // 7 (or 6)
new_shape->add_dim(7); // 7 (or 6)
reshape_x_param->add_bottom("x_mask_p_" + ts);
reshape_x_param->add_top("x_mask_reshape_"+ts);
}
{
LayerParameter* pool_x_param = net_param->add_layer();
pool_x_param->CopyFrom(pool_param);
pool_x_param->set_name("pool_x_"+ts);
pool_x_param->mutable_pooling_param()->set_pool(PoolingParameter_PoolMethod_SUM);
pool_x_param->mutable_pooling_param()->set_kernel_size(7); // 7 (or 6, matching the spatial size above)
pool_x_param->add_bottom("x_mask_reshape_"+ts);
pool_x_param->add_top("x_pool_"+ts);
}
{
LayerParameter* x_transform_param = net_param->add_layer();
x_transform_param->CopyFrom(biased_hidden_param);
x_transform_param->set_name("x_transform_" + ts);
x_transform_param->add_param()->set_name("W_xc_" + ts);
x_transform_param->add_param()->set_name("b_c" + ts);
x_transform_param->add_bottom("x_pool_" +ts );
x_transform_param->add_top("W_xc_x_"+ts);
}
{
LayerParameter* x_transform_reshape_param = net_param->add_layer();
x_transform_reshape_param->CopyFrom(reshape_param);
x_transform_reshape_param->set_name("x_transform_reshape_" +ts);
BlobShape* new_shape_r =
x_transform_reshape_param->mutable_reshape_param()->mutable_shape();
new_shape_r->add_dim(1);
new_shape_r->add_dim(this->N_);
new_shape_r->add_dim(num_output * 4);
x_transform_reshape_param->add_bottom("W_xc_x_" + ts);
x_transform_reshape_param->add_top("W_xc_x_r_"+ts);
}
// Add layers to flush the hidden state when beginning a new
// sequence, as indicated by cont_t.
// h_conted_{t-1} := cont_t * h_{t-1}
//
// Normally, cont_t is binary (i.e., 0 or 1), so:
// h_conted_{t-1} := h_{t-1} if cont_t == 1
// 0 otherwise
{
LayerParameter* cont_h_param = net_param->add_layer();
cont_h_param->CopyFrom(sum_param);
cont_h_param->mutable_eltwise_param()->set_coeff_blob(true);
cont_h_param->set_name("h_conted_" + tm1s);
cont_h_param->add_bottom("h_" + tm1s);
cont_h_param->add_bottom("cont_" + ts);
cont_h_param->add_top("h_conted_" + tm1s);
}
// Add layer to compute
// W_hc_h_{t-1} := W_hc * h_conted_{t-1}
{
LayerParameter* w_param = net_param->add_layer();
w_param->CopyFrom(hidden_param);
w_param->set_name("transform_" + ts);
w_param->add_param()->set_name("W_hc");
w_param->add_bottom("h_conted_" + tm1s);
w_param->add_top("W_hc_h_" + tm1s);
w_param->mutable_inner_product_param()->set_axis(2);
}
// Add the outputs of the linear transformations to compute the gate input.
// gate_input_t := W_hc * h_conted_{t-1} + W_xc * x_t + b_c
// = W_hc_h_{t-1} + W_xc_x_t + b_c
{
LayerParameter* input_sum_layer = net_param->add_layer();
input_sum_layer->CopyFrom(sum_param);
input_sum_layer->set_name("gate_input_" + ts);
input_sum_layer->add_bottom("W_hc_h_" + tm1s);
input_sum_layer->add_bottom("W_xc_x_r_" + ts);
if (this->static_input_) {
input_sum_layer->add_bottom("W_xc_x_static");
}
input_sum_layer->add_top("gate_input_" + ts);
}
// Add LSTMUnit layer to compute the cell & hidden vectors c_t and h_t.
// Inputs: c_{t-1}, gate_input_t = (i_t, f_t, o_t, g_t), cont_t
// Outputs: c_t, h_t
// [ i_t' ]
// [ f_t' ] := gate_input_t
// [ o_t' ]
// [ g_t' ]
// i_t := \sigmoid[i_t']
// f_t := \sigmoid[f_t']
// o_t := \sigmoid[o_t']
// g_t := \tanh[g_t']
// c_t := cont_t * (f_t .* c_{t-1}) + (i_t .* g_t)
// h_t := o_t .* \tanh[c_t]
{
LayerParameter* lstm_unit_param = net_param->add_layer();
lstm_unit_param->set_type("LSTMUnit");
lstm_unit_param->add_bottom("c_" + tm1s);
lstm_unit_param->add_bottom("gate_input_" + ts);
lstm_unit_param->add_bottom("cont_" + ts);
lstm_unit_param->add_top("c_" + ts);
lstm_unit_param->add_top("h_" + ts);
lstm_unit_param->set_name("unit_" + ts);
}
output_concat_layer.add_bottom("h_" + ts);
output_m_layer.add_bottom("mask_" + tm1s);
} // for (int t = 1; t <= this->T_; ++t)
{
LayerParameter* c_T_copy_param = net_param->add_layer();
c_T_copy_param->CopyFrom(split_param);
c_T_copy_param->add_bottom("c_" + this->int_to_str(this->T_));
c_T_copy_param->add_top("c_T");
}
net_param->add_layer()->CopyFrom(output_concat_layer);
net_param->add_layer()->CopyFrom(output_m_layer);
}
INSTANTIATE_CLASS(ALSTMLayer);
REGISTER_LAYER_CLASS(ALSTM);
} // namespace caffe
#ifndef CAFFE_SEQUENCE_LAYERS_HPP_
#define CAFFE_SEQUENCE_LAYERS_HPP_
#include <string>
#include <utility>
#include <vector>
#include "caffe/blob.hpp"
#include "caffe/common.hpp"
#include "caffe/layer.hpp"
#include "caffe/net.hpp"
#include "caffe/proto/caffe.pb.h"
namespace caffe {
template <typename Dtype> class RecurrentLayer;
/**
* @brief An abstract class for implementing recurrent behavior inside of an
* unrolled network. This Layer type cannot be instantiated -- instead,
* you should use one of its implementations which defines the recurrent
* architecture, such as RNNLayer or LSTMLayer.
*/
template <typename Dtype>
class RecurrentLayer : public Layer<Dtype> {
public:
explicit RecurrentLayer(const LayerParameter& param)
: Layer<Dtype>(param) {}
virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Reset();
virtual inline const char* type() const { return "Recurrent"; }
virtual inline int MinBottomBlobs() const { return 2; }
virtual inline int MaxBottomBlobs() const { return 3; }
//virtual inline int ExactNumTopBlobs() const { return 2; }
virtual inline int MinTopBlobs() const { return 1; }
virtual inline int MaxTopBlobs() const { return 2; }
virtual inline bool AllowForceBackward(const int bottom_index) const {
// Can't propagate to sequence continuation indicators.
return bottom_index != 1;
}
protected:
/**
* @brief Fills net_param with the recurrent network architecture. Subclasses
* should define this -- see RNNLayer and LSTMLayer for examples.
*/
virtual void FillUnrolledNet(NetParameter* net_param) const = 0;
/**
* @brief Fills names with the names of the 0th timestep recurrent input
* Blob&s. Subclasses should define this -- see RNNLayer and LSTMLayer
* for examples.
*/
virtual void RecurrentInputBlobNames(vector<string>* names) const = 0;
/**
* @brief Fills names with the names of the Tth timestep recurrent output
* Blob&s. Subclasses should define this -- see RNNLayer and LSTMLayer
* for examples.
*/
virtual void RecurrentOutputBlobNames(vector<string>* names) const = 0;
/**
* @brief Fills names with the names of the output blobs, concatenated across
* all timesteps. Should return a name for each top Blob.
* Subclasses should define this -- see RNNLayer and LSTMLayer for
* examples.
*/
virtual void OutputBlobNames(vector<string>* names) const = 0;
/**
* @param bottom input Blob vector (length 2-3)
*
* -# @f$ (T \times N \times ...) @f$
* the time-varying input @f$ x @f$. After the first two axes, whose
* dimensions must correspond to the number of timesteps @f$ T @f$ and
* the number of independent streams @f$ N @f$, respectively, its
* dimensions may be arbitrary. Note that the ordering of dimensions --
* @f$ (T \times N \times ...) @f$, rather than
* @f$ (N \times T \times ...) @f$ -- means that the @f$ N @f$
* independent input streams must be "interleaved".
*
* -# @f$ (T \times N) @f$
* the sequence continuation indicators @f$ \delta @f$.
* These inputs should be binary (0 or 1) indicators, where
* @f$ \delta_{t,n} = 0 @f$ means that timestep @f$ t @f$ of stream
* @f$ n @f$ is the beginning of a new sequence, and hence the previous
* hidden state @f$ h_{t-1} @f$ is multiplied by @f$ \delta_t = 0 @f$
* and has no effect on the cell's output at timestep @f$ t @f$, and
* a value of @f$ \delta_{t,n} = 1 @f$ means that timestep @f$ t @f$ of
* stream @f$ n @f$ is a continuation from the previous timestep
* @f$ t-1 @f$, and the previous hidden state @f$ h_{t-1} @f$ affects the
* updated hidden state and output.
*
* -# @f$ (N \times ...) @f$ (optional)
* the static (non-time-varying) input @f$ x_{static} @f$.
* After the first axis, whose dimension must be the number of
* independent streams, its dimensions may be arbitrary.
* This is mathematically equivalent to using a time-varying input of
* @f$ x'_t = [x_t; x_{static}] @f$ -- i.e., tiling the static input
* across the @f$ T @f$ timesteps and concatenating with the time-varying
* input. Note that if this input is used, all timesteps in a single
* batch within a particular one of the @f$ N @f$ streams must share the
* same static input, even if the sequence continuation indicators
* suggest that different sequences are ending and beginning within a
* single batch. This may require padding and/or truncation for uniform
* length.
*
* @param top output Blob vector (length 1)
* -# @f$ (T \times N \times D) @f$
* the time-varying output @f$ y @f$, where @f$ D @f$ is
* recurrent_param.num_output().
* Refer to documentation for particular RecurrentLayer implementations
* (such as RNNLayer and LSTMLayer) for the definition of @f$ y @f$.
*/
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
/// @brief A helper function, useful for stringifying timestep indices.
virtual string int_to_str(const int t) const;
/// @brief A Net to implement the Recurrent functionality.
shared_ptr<Net<Dtype> > unrolled_net_;
/// @brief The number of independent streams to process simultaneously.
int N_;
/**
* @brief The number of timesteps in the layer's input, and the number of
* timesteps over which to backpropagate through time.
*/
int T_;
/// @brief Whether the layer has a "static" input copied across all timesteps.
bool static_input_;
vector<Blob<Dtype>* > recur_input_blobs_;
vector<Blob<Dtype>* > recur_output_blobs_;
vector<Blob<Dtype>* > output_blobs_;
Blob<Dtype>* x_input_blob_;
Blob<Dtype>* x_static_input_blob_;
Blob<Dtype>* cont_input_blob_;
};
/**
* @brief Processes sequential inputs using a "Long Short-Term Memory" (LSTM)
* [1] style recurrent neural network (RNN). Implemented as a network
* unrolling the LSTM computation in time.
*
*
* The specific architecture used in this implementation is as described in
* "Learning to Execute" [2], reproduced below:
* i_t := \sigmoid[ W_{hi} * h_{t-1} + W_{xi} * x_t + b_i ]
* f_t := \sigmoid[ W_{hf} * h_{t-1} + W_{xf} * x_t + b_f ]
* o_t := \sigmoid[ W_{ho} * h_{t-1} + W_{xo} * x_t + b_o ]
* g_t := \tanh[ W_{hg} * h_{t-1} + W_{xg} * x_t + b_g ]
* c_t := (f_t .* c_{t-1}) + (i_t .* g_t)
* h_t := o_t .* \tanh[c_t]
* In the implementation, the i, f, o, and g computations are performed as a
* single inner product.
*
* Notably, this implementation lacks the "diagonal" gates, as used in the
* LSTM architectures described by Alex Graves [3] and others.
*
* [1] Hochreiter, Sepp, and Schmidhuber, Jürgen. "Long short-term memory."
* Neural Computation 9, no. 8 (1997): 1735-1780.
*
* [2] Zaremba, Wojciech, and Sutskever, Ilya. "Learning to execute."
* arXiv preprint arXiv:1410.4615 (2014).
*
* [3] Graves, Alex. "Generating sequences with recurrent neural networks."
* arXiv preprint arXiv:1308.0850 (2013).
*/
template <typename Dtype>
class LSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit LSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "LSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class LSTMStaticLayer : public RecurrentLayer<Dtype> {
public:
explicit LSTMStaticLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "LSTMStatic"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class LSTMStaticNewLayer : public RecurrentLayer<Dtype> {
public:
explicit LSTMStaticNewLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "LSTMStaticNew"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class ASLSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit ASLSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "ASLSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class ADLSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit ADLSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "ADLSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class ALSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit ALSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "ALSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
// coupled LSTM layer
template <typename Dtype>
class CLSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit CLSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "CLSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
// coupled LSTM layer
template <typename Dtype>
class ACLSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit ACLSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "ACLSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class ACTLSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit ACTLSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "ACTLSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class ACSLSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit ACSLSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "ACSLSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class ACSSLSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit ACSSLSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "ACSSLSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class ACSSLSTMStaticLayer : public RecurrentLayer<Dtype> {
public:
explicit ACSSLSTMStaticLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "ACSSLSTMStatic"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
template <typename Dtype>
class ATLSTMLayer : public RecurrentLayer<Dtype> {
public:
explicit ATLSTMLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "ATLSTM"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
/**
* @brief A helper for LSTMLayer: computes a single timestep of the
* non-linearity of the LSTM, producing the updated cell and hidden
* states.
*/
template <typename Dtype>
class LSTMUnitLayer : public Layer<Dtype> {
public:
explicit LSTMUnitLayer(const LayerParameter& param)
: Layer<Dtype>(param) {}
virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual inline const char* type() const { return "LSTMUnit"; }
virtual inline int ExactNumBottomBlobs() const { return 3; }
virtual inline int ExactNumTopBlobs() const { return 2; }
virtual inline bool AllowForceBackward(const int bottom_index) const {
// Can't propagate to sequence continuation indicators.
return bottom_index != 2;
}
protected:
/**
* @param bottom input Blob vector (length 3)
* -# @f$ (1 \times N \times D) @f$
* the previous timestep cell state @f$ c_{t-1} @f$
* -# @f$ (1 \times N \times 4D) @f$
* the "gate inputs" @f$ [i_t', f_t', o_t', g_t'] @f$
* -# @f$ (1 \times 1 \times N) @f$
* the sequence continuation indicators @f$ \delta_t @f$
* @param top output Blob vector (length 2)
* -# @f$ (1 \times N \times D) @f$
* the updated cell state @f$ c_t @f$, computed as:
* i_t := \sigmoid[i_t']
* f_t := \sigmoid[f_t']
* o_t := \sigmoid[o_t']
* g_t := \tanh[g_t']
* c_t := cont_t * (f_t .* c_{t-1}) + (i_t .* g_t)
* -# @f$ (1 \times N \times D) @f$
* the updated hidden state @f$ h_t @f$, computed as:
* h_t := o_t .* \tanh[c_t]
*/
virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top);
/**
* @brief Computes the error gradient w.r.t. the LSTMUnit inputs.
*
* @param top output Blob vector (length 2), providing the error gradient with
* respect to the outputs
* -# @f$ (1 \times N \times D) @f$:
* containing error gradients @f$ \frac{\partial E}{\partial c_t} @f$
* with respect to the updated cell state @f$ c_t @f$
* -# @f$ (1 \times N \times D) @f$:
* containing error gradients @f$ \frac{\partial E}{\partial h_t} @f$
* with respect to the updated hidden state @f$ h_t @f$
* @param propagate_down see Layer::Backward.
* @param bottom input Blob vector (length 3), into which the error gradients
* with respect to the LSTMUnit inputs @f$ c_{t-1} @f$ and the gate
* inputs are computed. Computation of the error gradients w.r.t.
* the sequence indicators is not implemented.
* -# @f$ (1 \times N \times D) @f$
* the error gradient w.r.t. the previous timestep cell state
* @f$ c_{t-1} @f$
* -# @f$ (1 \times N \times 4D) @f$
* the error gradient w.r.t. the "gate inputs"
* @f$ [
* \frac{\partial E}{\partial i_t}
* \frac{\partial E}{\partial f_t}
* \frac{\partial E}{\partial o_t}
* \frac{\partial E}{\partial g_t}
* ] @f$
* -# @f$ (1 \times 1 \times N) @f$
* the gradient w.r.t. the sequence continuation indicators
* @f$ \delta_t @f$ is currently not computed.
*/
virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
virtual void Backward_gpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
/// @brief The hidden and output dimension.
int hidden_dim_;
Blob<Dtype> X_acts_;
};
/**
* @brief Processes time-varying inputs using a simple recurrent neural network
* (RNN). Implemented as a network unrolling the RNN computation in time.
*
* Given time-varying inputs @f$ x_t @f$, computes hidden state @f$
* h_t := \tanh[ W_{hh} h_{t-1} + W_{xh} x_t + b_h ]
* @f$, and outputs @f$
* o_t := \tanh[ W_{ho} h_t + b_o ]
* @f$.
*/
template <typename Dtype>
class RNNLayer : public RecurrentLayer<Dtype> {
public:
explicit RNNLayer(const LayerParameter& param)
: RecurrentLayer<Dtype>(param) {}
virtual inline const char* type() const { return "RNN"; }
protected:
virtual void FillUnrolledNet(NetParameter* net_param) const;
virtual void RecurrentInputBlobNames(vector<string>* names) const;
virtual void RecurrentOutputBlobNames(vector<string>* names) const;
virtual void OutputBlobNames(vector<string>* names) const;
};
} // namespace caffe
#endif // CAFFE_SEQUENCE_LAYERS_HPP_
The scores S are then passed through a Softmax, which squashes them into [0, 1] (and normalizes them to sum to 1) to form the attention mask. The tanh in the score function can be replaced with other nonlinearities; a sketch of how one could be inserted into the unrolled net follows.
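For example, a minimal sketch (assuming the same FillUnrolledNet structure and blob names as above; "TanH" is a standard Caffe layer type and could be swapped for "ReLU" or others):

// Hypothetical addition inside ALSTMLayer<Dtype>::FillUnrolledNet, placed after
// the "mask_input_" Bias layer. The att_x_ap_ InnerProduct would then take
// "m_input_tanh_" + tm1s as its bottom instead of "m_input_" + tm1s.
{
LayerParameter* att_nonlin_param = net_param->add_layer();
att_nonlin_param->set_type("TanH"); // or "ReLU", etc.
att_nonlin_param->set_name("att_tanh_" + tm1s);
att_nonlin_param->add_bottom("m_input_" + tm1s);
att_nonlin_param->add_top("m_input_tanh_" + tm1s);
}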
Addendum: after working through the tensor dimensions (with the original LSTM layer as a reference), this layer tested successfully. However, when I followed an AAAI paper and modified the code to attend over joint coordinates instead, the tests failed; emails to the authors went unanswered, which makes me strongly suspect the reported results. Why do coordinates not work here? Because the code above attends over image regions, whereas the coordinates are just three values per joint, so the dimensionality is far too low.