呵呵哈哈哈白

cs231n assignment3

前一段时间把公开课cs231n看完，然后这里分享下assignment3的代码，水平有限，如有疏漏之处请见谅。assignment3主要内容包括Image Captioning和深度网络可视化。对于Image Captioning，已经预先提取了图像特征不需要自己做，只需要自己实现RNN，LSTM就行，下面是各个作业的代码（写的有点烂，各位凑合着看吧）。

Image Captioning with Vanilla RNNs and Image Captioning with LSTMs

实现Vanilla RNN和lstm

rnn_layers.py

import numpy as np


"""
This file defines layer types that are commonly used for recurrent neural
networks.
"""


def rnn_step_forward(x, prev_h, Wx, Wh, b):
  """
  Run the forward pass for a single timestep of a vanilla RNN that uses a tanh
  activation function.

  The input data has dimension D, the hidden state has dimension H, and we use
  a minibatch size of N.

  Inputs:
  - x: Input data for this timestep, of shape (N, D).
  - prev_h: Hidden state from previous timestep, of shape (N, H)
  - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
  - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
  - b: Biases of shape (H,)

  Returns a tuple of:
  - next_h: Next hidden state, of shape (N, H)
  - cache: Tuple of values needed for the backward pass.
  """
  next_h, cache = None, None
  ##############################################################################
  # TODO: Implement a single forward step for the vanilla RNN. Store the next  #
  # hidden state and any values you need for the backward pass in the next_h   #
  # and cache variables respectively.                                          #
  ##############################################################################
  N, D = x.shape
  _, H = Wx.shape
  next_h = np.dot(x, Wx) + np.dot(prev_h, Wh) + b
  next_h = np.tanh(next_h)
  cache = (x, prev_h, Wx, Wh, next_h)
  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################
  return next_h, cache


def rnn_step_backward(dnext_h, cache):
  """
  Backward pass for a single timestep of a vanilla RNN.

  Inputs:
  - dnext_h: Gradient of loss with respect to next hidden state
  - cache: Cache object from the forward pass

  Returns a tuple of:
  - dx: Gradients of input data, of shape (N, D)
  - dprev_h: Gradients of previous hidden state, of shape (N, H)
  - dWx: Gradients of input-to-hidden weights, of shape (D, H)
  - dWh: Gradients of hidden-to-hidden weights, of shape (H, H)
  - db: Gradients of bias vector, of shape (H,)
  """
  dx, dprev_h, dWx, dWh, db = None, None, None, None, None
  ##############################################################################
  # TODO: Implement the backward pass for a single step of a vanilla RNN.      #
  #                                                                            #
  # HINT: For the tanh function, you can compute the local derivative in terms #
  # of the output value from tanh.                                             #
  ##############################################################################
  (x, prev_h, Wx, Wh, next_h) = cache
  dtanh = (1 - next_h * next_h) * dnext_h
  db = np.sum(dtanh, axis = 0)
  dWx = np.dot(x.T, dtanh)
  dWh = np.dot(prev_h.T, dtanh)
  dprev_h = np.dot(dtanh, Wh.T)
  dx = np.dot(dtanh, Wx.T)
  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################
  return dx, dprev_h, dWx, dWh, db


def rnn_forward(x, h0, Wx, Wh, b):
  """
  Run a vanilla RNN forward on an entire sequence of data. We assume an input
  sequence composed of T vectors, each of dimension D. The RNN uses a hidden
  size of H, and we work over a minibatch containing N sequences. After running
  the RNN forward, we return the hidden states for all timesteps.

  Inputs:
  - x: Input data for the entire timeseries, of shape (N, T, D).
  - h0: Initial hidden state, of shape (N, H)
  - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
  - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
  - b: Biases of shape (H,)

  Returns a tuple of:
  - h: Hidden states for the entire timeseries, of shape (N, T, H).
  - cache: Values needed in the backward pass
  """
  h, cache = None, None
  ##############################################################################
  # TODO: Implement forward pass for a vanilla RNN running on a sequence of    #
  # input data. You should use the rnn_step_forward function that you defined  #
  # above.                                                                     #
  ##############################################################################
  N, T, D = x.shape
  _, H = Wh.shape
  cache = []
  h = np.zeros((N, T, H))
  for i in np.arange(T):
      h0, tmp = rnn_step_forward(x[:,i,:], h0, Wx, Wh, b)
      cache.append(tmp)
      h[:, i, :] = h0      

  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################
  return h, cache


def rnn_backward(dh, cache):
  """
  Compute the backward pass for a vanilla RNN over an entire sequence of data.

  Inputs:
  - dh: Upstream gradients of all hidden states, of shape (N, T, H)

  Returns a tuple of:
  - dx: Gradient of inputs, of shape (N, T, D)
  - dh0: Gradient of initial hidden state, of shape (N, H)
  - dWx: Gradient of input-to-hidden weights, of shape (D, H)
  - dWh: Gradient of hidden-to-hidden weights, of shape (H, H)
  - db: Gradient of biases, of shape (H,)
  """
  dx, dh0, dWx, dWh, db = None, None, None, None, None
  ##############################################################################
  # TODO: Implement the backward pass for a vanilla RNN running an entire      #
  # sequence of data. You should use the rnn_step_backward function that you   #
  # defined above.                                                             #
  ##############################################################################
  N, T, H = dh.shape
  _, _, W, _, _ = cache[0]
  D = W.shape[0] 
  dx = np.zeros((N, T, D))
  dh0 = np.zeros((N, H))
  dWx = np.zeros((D, H))
  dWh = np.zeros((H, H))
  db = np.zeros(H)

  dh_prev = np.zeros((N, H))
  for i in reversed(np.arange(T)):
      dh_cur = dh_prev + dh[:, i, :] #copy!!!
      dx[:, i, :], dh_prev, tdWx, tdWh, tdb = rnn_step_backward(dh_cur, cache[i])      
      dWx += tdWx
      dWh += tdWh
      db += tdb

  dh0 = dh_prev
  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################
  return dx, dh0, dWx, dWh, db


def word_embedding_forward(x, W):
  """
  Forward pass for word embeddings. We operate on minibatches of size N where
  each sequence has length T. We assume a vocabulary of V words, assigning each
  to a vector of dimension D.

  Inputs:
  - x: Integer array of shape (N, T) giving indices of words. Each element idx
    of x muxt be in the range 0 <= idx < V.
  - W: Weight matrix of shape (V, D) giving word vectors for all words.

  Returns a tuple of:
  - out: Array of shape (N, T, D) giving word vectors for all input words.
  - cache: Values needed for the backward pass
  """
  out, cache = None, None
  ##############################################################################
  # TODO: Implement the forward pass for word embeddings.                      #
  #                                                                            #
  # HINT: This should be very simple.                                          #
  ##############################################################################
  #print x.shape
  N, T = x.shape
  V, D = W.shape
  out = np.zeros((N, T, D))
  for i in np.arange(N):
      for j in np.arange(T):
          out[i, j, :] = W[int(x[i, j]), :]

  cache = (x, W)
  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################
  return out, cache


def word_embedding_backward(dout, cache):
  """
  Backward pass for word embeddings. We cannot back-propagate into the words
  since they are integers, so we only return gradient for the word embedding
  matrix.

  HINT: Look up the function np.add.at

  Inputs:
  - dout: Upstream gradients of shape (N, T, D)
  - cache: Values from the forward pass

  Returns:
  - dW: Gradient of word embedding matrix, of shape (V, D).
  """
  dW = None
  ##############################################################################
  # TODO: Implement the backward pass for word embeddings.                     #
  #                                                                            #
  # HINT: Look up the function np.add.at                                       #
  ##############################################################################
  N, T, D = dout.shape
  (x, W) = cache
  V, _ = W.shape
  dW = np.zeros((V, D))

  for i in np.arange(N):
      for j in np.arange(T):
          dW[x[i, j], :] += dout[i, j, :]


  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################
  return dW


def sigmoid(x):
  """
  A numerically stable version of the logistic sigmoid function.
  """
  pos_mask = (x >= 0)
  neg_mask = (x < 0)
  z = np.zeros_like(x)
  z[pos_mask] = np.exp(-x[pos_mask])
  z[neg_mask] = np.exp(x[neg_mask])
  top = np.ones_like(x)
  top[neg_mask] = z[neg_mask]
  return top / (1 + z)


def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
  """
  Forward pass for a single timestep of an LSTM.

  The input data has dimension D, the hidden state has dimension H, and we use
  a minibatch size of N.

  Inputs:
  - x: Input data, of shape (N, D)
  - prev_h: Previous hidden state, of shape (N, H)
  - prev_c: previous cell state, of shape (N, H)
  - Wx: Input-to-hidden weights, of shape (D, 4H)
  - Wh: Hidden-to-hidden weights, of shape (H, 4H)
  - b: Biases, of shape (4H,)

  Returns a tuple of:
  - next_h: Next hidden state, of shape (N, H)
  - next_c: Next cell state, of shape (N, H)
  - cache: Tuple of values needed for backward pass.
  """
  next_h, next_c, cache = None, None, None
  #############################################################################
  # TODO: Implement the forward pass for a single timestep of an LSTM.        #
  # You may want to use the numerically stable sigmoid implementation above.  #
  #############################################################################
  N, D = x.shape
  _, H = prev_h.shape

  a = np.dot(x, Wx) + np.dot(prev_h, Wh) + b

  i = sigmoid(a[:, :H])
  f = sigmoid(a[:, H:2*H])
  o = sigmoid(a[:, 2*H:3*H])
  g = np.tanh(a[:, 3*H:])

  next_c = f * prev_c + i * g
  tanh_c = np.tanh(next_c)
  next_h = o * tanh_c

  cache = (a, i, f, o, g, tanh_c, Wx, Wh, b, x, prev_c, prev_h)
  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################

  return next_h, next_c, cache


def lstm_step_backward(dnext_h, dnext_c, cache):
  """
  Backward pass for a single timestep of an LSTM.

  Inputs:
  - dnext_h: Gradients of next hidden state, of shape (N, H)
  - dnext_c: Gradients of next cell state, of shape (N, H)
  - cache: Values from the forward pass

  Returns a tuple of:
  - dx: Gradient of input data, of shape (N, D)
  - dprev_h: Gradient of previous hidden state, of shape (N, H)
  - dprev_c: Gradient of previous cell state, of shape (N, H)
  - dWx: Gradient of input-to-hidden weights, of shape (D, 4H)
  - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H)
  - db: Gradient of biases, of shape (4H,)
  """
  dx, dprev_h, dprev_c, dWx, dWh, db = None, None, None, None, None, None
  #############################################################################
  # TODO: Implement the backward pass for a single timestep of an LSTM.       #
  #                                                                           #
  # HINT: For sigmoid and tanh you can compute local derivatives in terms of  #
  # the output value from the nonlinearity.                                   #
  #############################################################################

  (a, i, f, o, g, tanh_c, Wx, Wh, b, x, prev_c, prev_h) = cache
  N, D = x.shape 
  _, H = prev_h.shape

  dx = np.zeros_like(a)

  do = dnext_h * tanh_c # N x H
  dtanh_c = dnext_h * o # N x H
  dc = dtanh_c * (1 - tanh_c * tanh_c) # N x H
  #print dc.shape
  #print dnext_c.shape
  dc = dc + dnext_c # N x H ??????

  df = dc * prev_c # N x H
  dprev_c = dc * f # N x H
  di = dc * g # N x H
  dg = dc * i # N x H

  da = np.zeros_like(a)
  da[:, :H] = di * (1 - i) * i
  da[:, H:2*H] = df * (1 - f) * f
  da[:, 2*H:3*H] = do * (1 - o) * o
  da[:, 3*H:] = dg * (1 - g * g )

  #da = np.array([da_1,da_2:da_3:da_4]) # N x 4H

  #print da.shape
  db = np.sum(da, axis = 0)
  dWx = np.dot(x.T, da )
  dWh = np.dot(prev_h.T, da)
  dx = np.dot(da, Wx.T)
  dprev_h = np.dot(da, Wh.T)
  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################

  return dx, dprev_h, dprev_c, dWx, dWh, db


def lstm_forward(x, h0, Wx, Wh, b):
  """
  Forward pass for an LSTM over an entire sequence of data. We assume an input
  sequence composed of T vectors, each of dimension D. The LSTM uses a hidden
  size of H, and we work over a minibatch containing N sequences. After running
  the LSTM forward, we return the hidden states for all timesteps.

  Note that the initial cell state is passed as input, but the initial cell
  state is set to zero. Also note that the cell state is not returned; it is
  an internal variable to the LSTM and is not accessed from outside.

  Inputs:
  - x: Input data of shape (N, T, D)
  - h0: Initial hidden state of shape (N, H)
  - Wx: Weights for input-to-hidden connections, of shape (D, 4H)
  - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
  - b: Biases of shape (4H,)

  Returns a tuple of:
  - h: Hidden states for all timesteps of all sequences, of shape (N, T, H)
  - cache: Values needed for the backward pass.
  """
  h, cache = None, None
  #############################################################################
  # TODO: Implement the forward pass for an LSTM over an entire timeseries.   #
  # You should use the lstm_step_forward function that you just defined.      #
  #############################################################################
  N, T, D = x.shape
  _, H = h0.shape

  cache = []
  h = np.zeros((N, T, H))
  c0 = np.zeros((N, H))
  for i in np.arange(T):
      h0, c0, tmp = lstm_step_forward(x[:, i, :], h0, c0, Wx, Wh, b)
      h[:, i, :] = h0
      cache.append(tmp)
  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################

  return h, cache


def lstm_backward(dh, cache):
  """
  Backward pass for an LSTM over an entire sequence of data.]

  Inputs:
  - dh: Upstream gradients of hidden states, of shape (N, T, H)
  - cache: Values from the forward pass

  Returns a tuple of:
  - dx: Gradient of input data of shape (N, T, D)
  - dh0: Gradient of initial hidden state of shape (N, H)
  - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H)
  - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H)
  - db: Gradient of biases, of shape (4H,)
  """
  dx, dh0, dWx, dWh, db = None, None, None, None, None
  #############################################################################
  # TODO: Implement the backward pass for an LSTM over an entire timeseries.  #
  # You should use the lstm_step_backward function that you just defined.     #
  #############################################################################
  N, T, H = dh.shape
  D = cache[0][6].shape[0]

  dc = np.zeros((N, H))
  dx = np.zeros((N, T, D))
  dWx = np.zeros((D, 4*H))
  dWh = np.zeros((H, 4*H))
  db = np.zeros(4*H)
  dh0 = np.zeros((N, H))

  for i in reversed(np.arange(T)):
      dh0 += dh[:, i, :]
      dx[:, i, :], dh0, dc, dw, dwh, tdb = lstm_step_backward(dh0, dc, cache[i])

      dWx += dw
      dWh += dwh
      #print tdb.shape
      #print db.shape
      db += tdb

  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################

  return dx, dh0, dWx, dWh, db


def temporal_affine_forward(x, w, b):
  """
  Forward pass for a temporal affine layer. The input is a set of D-dimensional
  vectors arranged into a minibatch of N timeseries, each of length T. We use
  an affine function to transform each of those vectors into a new vector of
  dimension M.

  Inputs:
  - x: Input data of shape (N, T, D)
  - w: Weights of shape (D, M)
  - b: Biases of shape (M,)

  Returns a tuple of:
  - out: Output data of shape (N, T, M)
  - cache: Values needed for the backward pass
  """
  N, T, D = x.shape
  M = b.shape[0]
  out = x.reshape(N * T, D).dot(w).reshape(N, T, M) + b
  cache = x, w, b, out
  return out, cache


def temporal_affine_backward(dout, cache):
  """
  Backward pass for temporal affine layer.

  Input:
  - dout: Upstream gradients of shape (N, T, M)
  - cache: Values from forward pass

  Returns a tuple of:
  - dx: Gradient of input, of shape (N, T, D)
  - dw: Gradient of weights, of shape (D, M)
  - db: Gradient of biases, of shape (M,)
  """
  x, w, b, out = cache
  N, T, D = x.shape
  M = b.shape[0]

  dx = dout.reshape(N * T, M).dot(w.T).reshape(N, T, D)
  dw = dout.reshape(N * T, M).T.dot(x.reshape(N * T, D)).T
  db = dout.sum(axis=(0, 1))

  return dx, dw, db


def temporal_softmax_loss(x, y, mask, verbose=False):
  """
  A temporal version of softmax loss for use in RNNs. We assume that we are
  making predictions over a vocabulary of size V for each timestep of a
  timeseries of length T, over a minibatch of size N. The input x gives scores
  for all vocabulary elements at all timesteps, and y gives the indices of the
  ground-truth element at each timestep. We use a cross-entropy loss at each
  timestep, summing the loss over all timesteps and averaging across the
  minibatch.

  As an additional complication, we may want to ignore the model output at some
  timesteps, since sequences of different length may have been combined into a
  minibatch and padded with NULL tokens. The optional mask argument tells us
  which elements should contribute to the loss.

  Inputs:
  - x: Input scores, of shape (N, T, V)
  - y: Ground-truth indices, of shape (N, T) where each element is in the range
       0 <= y[i, t] < V
  - mask: Boolean array of shape (N, T) where mask[i, t] tells whether or not
    the scores at x[i, t] should contribute to the loss.

  Returns a tuple of:
  - loss: Scalar giving loss
  - dx: Gradient of loss with respect to scores x.
  """

  N, T, V = x.shape

  x_flat = x.reshape(N * T, V)
  y_flat = y.reshape(N * T)
  mask_flat = mask.reshape(N * T)

  probs = np.exp(x_flat - np.max(x_flat, axis=1, keepdims=True))
  probs /= np.sum(probs, axis=1, keepdims=True)
  loss = -np.sum(mask_flat * np.log(probs[np.arange(N * T), y_flat])) / N
  dx_flat = probs.copy()
  dx_flat[np.arange(N * T), y_flat] -= 1
  dx_flat /= N
  dx_flat *= mask_flat[:, None]

  if verbose: print 'dx_flat: ', dx_flat.shape

  dx = dx_flat.reshape(N, T, V)

  return loss, dx

rnn.py

import numpy as np

from cs231n.layers import *
from cs231n.rnn_layers import *


class CaptioningRNN(object):
  """
  A CaptioningRNN produces captions from image features using a recurrent
  neural network.

  The RNN receives input vectors of size D, has a vocab size of V, works on
  sequences of length T, has an RNN hidden dimension of H, uses word vectors
  of dimension W, and operates on minibatches of size N.

  Note that we don't use any regularization for the CaptioningRNN.
  """

  def __init__(self, word_to_idx, input_dim=512, wordvec_dim=128,
               hidden_dim=128, cell_type='rnn', dtype=np.float32):
    """
    Construct a new CaptioningRNN instance.

    Inputs:
    - word_to_idx: A dictionary giving the vocabulary. It contains V entries,
      and maps each string to a unique integer in the range [0, V).
    - input_dim: Dimension D of input image feature vectors.
    - wordvec_dim: Dimension W of word vectors.
    - hidden_dim: Dimension H for the hidden state of the RNN.
    - cell_type: What type of RNN to use; either 'rnn' or 'lstm'.
    - dtype: numpy datatype to use; use float32 for training and float64 for
      numeric gradient checking.
    """
    if cell_type not in {'rnn', 'lstm'}:
      raise ValueError('Invalid cell_type "%s"' % cell_type)

    self.cell_type = cell_type
    self.dtype = dtype
    self.word_to_idx = word_to_idx
    self.idx_to_word = {i: w for w, i in word_to_idx.iteritems()}
    self.params = {}

    vocab_size = len(word_to_idx)

    self._null = word_to_idx['']
    self._start = word_to_idx.get('', None)
    self._end = word_to_idx.get('', None)

    # Initialize word vectors
    self.params['W_embed'] = np.random.randn(vocab_size, wordvec_dim)
    self.params['W_embed'] /= 100

    # Initialize CNN -> hidden state projection parameters
    self.params['W_proj'] = np.random.randn(input_dim, hidden_dim)
    self.params['W_proj'] /= np.sqrt(input_dim)
    self.params['b_proj'] = np.zeros(hidden_dim)

    # Initialize parameters for the RNN
    dim_mul = {'lstm': 4, 'rnn': 1}[cell_type]
    self.params['Wx'] = np.random.randn(wordvec_dim, dim_mul * hidden_dim)
    self.params['Wx'] /= np.sqrt(wordvec_dim)
    self.params['Wh'] = np.random.randn(hidden_dim, dim_mul * hidden_dim)
    self.params['Wh'] /= np.sqrt(hidden_dim)
    self.params['b'] = np.zeros(dim_mul * hidden_dim)

    # Initialize output to vocab weights
    self.params['W_vocab'] = np.random.randn(hidden_dim, vocab_size)
    self.params['W_vocab'] /= np.sqrt(hidden_dim)
    self.params['b_vocab'] = np.zeros(vocab_size)

    # Cast parameters to correct dtype
    for k, v in self.params.iteritems():
      self.params[k] = v.astype(self.dtype)


  def loss(self, features, captions):
    """
    Compute training-time loss for the RNN. We input image features and
    ground-truth captions for those images, and use an RNN (or LSTM) to compute
    loss and gradients on all parameters.

    Inputs:
    - features: Input image features, of shape (N, D)
    - captions: Ground-truth captions; an integer array of shape (N, T) where
      each element is in the range 0 <= y[i, t] < V

    Returns a tuple of:
    - loss: Scalar loss
    - grads: Dictionary of gradients parallel to self.params
    """
    # Cut captions into two pieces: captions_in has everything but the last word
    # and will be input to the RNN; captions_out has everything but the first
    # word and this is what we will expect the RNN to generate. These are offset
    # by one relative to each other because the RNN should produce word (t+1)
    # after receiving word t. The first element of captions_in will be the START
    # token, and the first element of captions_out will be the first word.
    captions_in = captions[:, :-1]
    captions_out = captions[:, 1:]

    # You'll need this 
    mask = (captions_out != self._null)

    # Weight and bias for the affine transform from image features to initial
    # hidden state
    W_proj, b_proj = self.params['W_proj'], self.params['b_proj']

    # Word embedding matrix
    W_embed = self.params['W_embed']

    # Input-to-hidden, hidden-to-hidden, and biases for the RNN
    Wx, Wh, b = self.params['Wx'], self.params['Wh'], self.params['b']

    # Weight and bias for the hidden-to-vocab transformation.
    W_vocab, b_vocab = self.params['W_vocab'], self.params['b_vocab']

    loss, grads = 0.0, {}
    ############################################################################
    # TODO: Implement the forward and backward passes for the CaptioningRNN.   #
    # In the forward pass you will need to do the following:                   #
    # (1) Use an affine transformation to compute the initial hidden state     #
    #     from the image features. This should produce an array of shape (N, H)#
    # (2) Use a word embedding layer to transform the words in captions_in     #
    #     from indices to vectors, giving an array of shape (N, T, W).         #
    # (3) Use either a vanilla RNN or LSTM (depending on self.cell_type) to    #
    #     process the sequence of input word vectors and produce hidden state  #
    #     vectors for all timesteps, producing an array of shape (N, T, H).    #
    # (4) Use a (temporal) affine transformation to compute scores over the    #
    #     vocabulary at every timestep using the hidden states, giving an      #
    #     array of shape (N, T, V).                                            #
    # (5) Use (temporal) softmax to compute loss using captions_out, ignoring  #
    #     the points where the output word is  using the mask above.     #
    #                                                                          #
    # In the backward pass you will need to compute the gradient of the loss   #
    # with respect to all model parameters. Use the loss and grads variables   #
    # defined above to store loss and gradients; grads[k] should give the      #
    # gradients for self.params[k].                                            #
    ############################################################################
    """
    forward pass
    """
    #first layer affine transformation to calculate h0
    h0, flcache = affine_forward(features, W_proj, b_proj)
    # Using embedding layer to get word vectors
    fx, emcache = word_embedding_forward(captions_in, W_embed)
    # RNN or LSTM
    if(self.cell_type == 'rnn'):
        fx, hicache = rnn_forward(fx, h0, Wx, Wh, b)
    else: #if(self.cell_type == 'lstm'):
        fx, hicache = lstm_forward(fx, h0, Wx, Wh, b)
    # affine transformation
    fx, affcache = temporal_affine_forward(fx, W_vocab, b_vocab)
    # softmax 
    loss, dx = temporal_softmax_loss(fx, captions_out, mask)

    """
    backward pass
    """

    # affine trans backward
    dx, grads['W_vocab'], grads['b_vocab'] = temporal_affine_backward(dx, affcache)
    # RNN or LSTM backward
    if(self.cell_type == 'rnn'):
        dx, dh0, grads['Wx'], grads['Wh'], grads['b'] = rnn_backward(dx, hicache)
    else:
        dx, dh0, grads['Wx'], grads['Wh'], grads['b'] = lstm_backward(dx, hicache)
    # embedding backward
    grads['W_embed'] = word_embedding_backward(dx, emcache)
    # first layer(affine transform) backward
    _, grads['W_proj'], grads['b_proj'] = affine_backward(dh0, flcache)


    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    return loss, grads


  def sample(self, features, max_length=30):
    """
    Run a test-time forward pass for the model, sampling captions for input
    feature vectors.

    At each timestep, we embed the current word, pass it and the previous hidden
    state to the RNN to get the next hidden state, use the hidden state to get
    scores for all vocab words, and choose the word with the highest score as
    the next word. The initial hidden state is computed by applying an affine
    transform to the input image features, and the initial word is the 
    token.

    For LSTMs you will also have to keep track of the cell state; in that case
    the initial cell state should be zero.

    Inputs:
    - features: Array of input image features of shape (N, D).
    - max_length: Maximum length T of generated captions.

    Returns:
    - captions: Array of shape (N, max_length) giving sampled captions,
      where each element is an integer in the range [0, V). The first element
      of captions should be the first sampled word, not the  token.
    """
    N = features.shape[0]
    captions = self._null * np.ones((N, max_length + 1), dtype=np.int32)

    # Unpack parameters
    W_proj, b_proj = self.params['W_proj'], self.params['b_proj']
    W_embed = self.params['W_embed']
    Wx, Wh, b = self.params['Wx'], self.params['Wh'], self.params['b']
    W_vocab, b_vocab = self.params['W_vocab'], self.params['b_vocab']

    ###########################################################################
    # TODO: Implement test-time sampling for the model. You will need to      #
    # initialize the hidden state of the RNN by applying the learned affine   #
    # transform to the input image features. The first word that you feed to  #
    # the RNN should be the  token; its value is stored in the         #
    # variable self._start. At each timestep you will need to do to:          #
    # (1) Embed the previous word using the learned word embeddings           #
    # (2) Make an RNN step using the previous hidden state and the embedded   #
    #     current word to get the next hidden state.                          #
    # (3) Apply the learned affine transformation to the next hidden state to #
    #     get scores for all words in the vocabulary                          #
    # (4) Select the word with the highest score as the next word, writing it #
    #     to the appropriate slot in the captions variable                    #
    #                                                                         #
    # For simplicity, you do not need to stop generating after an  token #
    # is sampled, but you can if you want to.                                 #
    #                                                                         #
    # HINT: You will not be able to use the rnn_forward or lstm_forward       #
    # functions; you'll need to call rnn_step_forward or lstm_step_forward in #
    # a loop.                                                                 #
    ###########################################################################

    """
    sampling process
    """
    # get h0
    h0, _ = affine_forward(features, W_proj, b_proj)
    # generate captions
    nc = np.zeros_like(h0)
    word = np.ones((N, 1)) * self._start
    captions[:, 0] = self._start
    for i in np.arange(max_length):
        fx, _ = word_embedding_forward(word, W_embed)
        #print fx.shape
        if( self.cell_type == 'rnn' ):
            h0, _ = rnn_step_forward(np.squeeze(fx), h0, Wx, Wh, b)
        else:
            h0, nc, _ = lstm_step_forward( np.squeeze(fx), h0, nc, Wx, Wh, b )
        #print fx.shape
        #h0 = np.squeeze(fx)
        #h0 = copy(fx)
        fx, _ = temporal_affine_forward(h0[:, np.newaxis,:], W_vocab, b_vocab) # copy
        _, T, D = fx.shape
        #print fx.shape
        word = np.argmax(fx, axis = 2)
        #word = word.reshape(-1, 1)
        #word = self.idx_to_word[word_idx]
        #print word.shape
        captions[:, i + 1] = np.squeeze(word)



    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################
    return captions

Image Gradients: Saliency maps and Fooling Images

Saliency Maps

from cs231n.layers import svm_loss
def compute_saliency_maps(X, y, model):
  """
  Compute a class saliency map using the model for images X and labels y.

  Input:
  - X: Input images, of shape (N, 3, H, W)
  - y: Labels for X, of shape (N,)
  - model: A PretrainedCNN that will be used to compute the saliency map.

  Returns:
  - saliency: An array of shape (N, H, W) giving the saliency maps for the input
    images.
  """
  print X.shape
  saliency = None
  ##############################################################################
  # TODO: Implement this function. You should use the forward and backward     #
  # methods of the PretrainedCNN class, and compute gradients with respect to  #
  # the unnormalized class score of the ground-truth classes in y.             #
  ##############################################################################
  out, cache = model.forward(X, None, None, 'test')
  _, dsc = svm_loss(out, y)
  print dsc.shape
  dx = np.zeros_like(dsc)
  dx[np.arange(X.shape[0]), y] = 1 #better than svm_loss dx
  dx, _ = model.backward(dx, cache)
  saliency = np.amax(np.absolute(dx), axis = 1)
  #print saliency.shape
  #print X.shape 
  ##############################################################################
  #                             END OF YOUR CODE                               #
  ##############################################################################
  return saliency

Fooling Images

from cs231n.layers import softmax_loss
def make_fooling_image(X, target_y, model):
  """
  Generate a fooling image that is close to X, but that the model classifies
  as target_y.

  Inputs:
  - X: Input image, of shape (1, 3, 64, 64)
  - target_y: An integer in the range [0, 100)
  - model: A PretrainedCNN

  Returns:
  - X_fooling: An image that is close to X, but that is classifed as target_y
    by the model.
  """
  X_fooling = X.copy()
  ##############################################################################
  # TODO: Generate a fooling image X_fooling that the model will classify as   #
  # the class target_y. Use gradient ascent on the target class score, using   #
  # the model.forward method to compute scores and the model.backward method   #
  # to compute image gradients.                                                #
  #                                                                            #
  # HINT: For most examples, you should be able to generate a fooling image    #
  # in fewer than 100 iterations of gradient ascent.                           #
  ##############################################################################
  i = 1
  while True:
    out, cache = model.forward(X_fooling, None, None, 'test')
    if( np.argmax(out) == target_y):
        print i
        break

    _, dx = svm_loss(out, target_y)
    #dx= np.zeros_like(out)
    #dx[np.arange(X.shape[0]),target_y] = 1.0
    dx, _ = model.backward(dx, cache)
    lr = 100
    X_fooling -= 100 * dx #using svm_loss should be - (gradient descent)
    if( i > 100):
        print np.argmax(out)
        print "Eoor!"
        break
    i += 1

  ##############################################################################
  #                             END OF YOUR CODE                               #
  ##############################################################################
  return X_fooling

Image Generation: Classes, Inversion, DeepDream

Class visualization

def create_class_visualization(target_y, model, **kwargs):
  """
  Perform optimization over the image to generate class visualizations.

  Inputs:
  - target_y: Integer in the range [0, 100) giving the target class
  - model: A PretrainedCNN that will be used for generation

  Keyword arguments:
  - learning_rate: Floating point number giving the learning rate
  - blur_every: An integer; how often to blur the image as a regularizer
  - l2_reg: Floating point number giving L2 regularization strength on the image;
    this is lambda in the equation above.
  - max_jitter: How much random jitter to add to the image as regularization
  - num_iterations: How many iterations to run for
  - show_every: How often to show the image
  """

  learning_rate = kwargs.pop('learning_rate', 10000)
  blur_every = kwargs.pop('blur_every', 1)
  l2_reg = kwargs.pop('l2_reg', 1e-6)
  max_jitter = kwargs.pop('max_jitter', 4)
  num_iterations = kwargs.pop('num_iterations', 100)
  show_every = kwargs.pop('show_every', 25)

  X = np.random.randn(1, 3, 64, 64)
  for t in xrange(num_iterations):
    # As a regularizer, add random jitter to the image
    ox, oy = np.random.randint(-max_jitter, max_jitter+1, 2)
    X = np.roll(np.roll(X, ox, -1), oy, -2)

    dX = None
    ############################################################################
    # TODO: Compute the image gradient dX of the image with respect to the     #
    # target_y class score. This should be similar to the fooling images. Also #
    # add L2 regularization to dX and update the image X using the image       #
    # gradient and the learning rate.                                          #
    ############################################################################
    out, cache = model.forward(X, mode = 'test')
    dout = np.zeros_like(out)
    dout[np.arange(X.shape[0]), target_y] = 1.0
    dx, _ = model.backward(dout, cache)
    dx += 2 * l2_reg * X
    X += learning_rate * dx
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    # Undo the jitter
    X = np.roll(np.roll(X, -ox, -1), -oy, -2)

    # As a regularizer, clip the image
    X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])

    # As a regularizer, periodically blur the image
    if t % blur_every == 0:
      X = blur_image(X)

    # Periodically show the image
    if t % show_every == 0:
      plt.imshow(deprocess_image(X, data['mean_image']))
      plt.gcf().set_size_inches(3, 3)
      plt.axis('off')
      plt.show()
  return X

Feature Inversion

def invert_features(target_feats, layer, model, **kwargs):
  """
  Perform feature inversion in the style of Mahendran and Vedaldi 2015, using
  L2 regularization and periodic blurring.

  Inputs:
  - target_feats: Image features of the target image, of shape (1, C, H, W);
    we will try to generate an image that matches these features
  - layer: The index of the layer from which the features were extracted
  - model: A PretrainedCNN that was used to extract features

  Keyword arguments:
  - learning_rate: The learning rate to use for gradient descent
  - num_iterations: The number of iterations to use for gradient descent
  - l2_reg: The strength of L2 regularization to use; this is lambda in the
    equation above.
  - blur_every: How often to blur the image as implicit regularization; set
    to 0 to disable blurring.
  - show_every: How often to show the generated image; set to 0 to disable
    showing intermediate reuslts.

  Returns:
  - X: Generated image of shape (1, 3, 64, 64) that matches the target features.
  """
  learning_rate = kwargs.pop('learning_rate', 10000)
  num_iterations = kwargs.pop('num_iterations', 500)
  l2_reg = kwargs.pop('l2_reg', 1e-7)
  blur_every = kwargs.pop('blur_every', 1)
  show_every = kwargs.pop('show_every', 50)

  X = np.random.randn(1, 3, 64, 64)
  for t in xrange(num_iterations):
    ############################################################################
    # TODO: Compute the image gradient dX of the reconstruction loss with      #
    # respect to the image. You should include L2 regularization penalizing    #
    # large pixel values in the generated image using the l2_reg parameter;    #
    # then update the generated image using the learning_rate from above.      #
    ############################################################################
    out, cache = model.forward(X, end  = layer, mode = 'test')
    loss = np.sum((out - target_feats)**2) + l2_reg * np.sum(X**2) 
    dout = 2 * (out - target_feats)
    dx, _ = model.backward(dout, cache)
    dx += 2 * l2_reg * X
    X -= learning_rate * dx
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    # As a regularizer, clip the image
    X = np.clip(X, -data['mean_image'], 255.0 - data['mean_image'])

    # As a regularizer, periodically blur the image
    if (blur_every > 0) and t % blur_every == 0:
      X = blur_image(X)

    if (show_every > 0) and (t % show_every == 0 or t + 1 == num_iterations):
      print 'loss: %f' %loss 
      plt.imshow(deprocess_image(X, data['mean_image']))
      plt.gcf().set_size_inches(3, 3)
      plt.axis('off')
      plt.title('t = %d' % t)
      plt.show()

DeepDream

def deepdream(X, layer, model, **kwargs):
  """
  Generate a DeepDream image.

  Inputs:
  - X: Starting image, of shape (1, 3, H, W)
  - layer: Index of layer at which to dream
  - model: A PretrainedCNN object

  Keyword arguments:
  - learning_rate: How much to update the image at each iteration
  - max_jitter: Maximum number of pixels for jitter regularization
  - num_iterations: How many iterations to run for
  - show_every: How often to show the generated image
  """

  X = X.copy()

  learning_rate = kwargs.pop('learning_rate', 5.0)
  max_jitter = kwargs.pop('max_jitter', 16)
  num_iterations = kwargs.pop('num_iterations', 100)
  show_every = kwargs.pop('show_every', 25)

  for t in xrange(num_iterations):
    # As a regularizer, add random jitter to the image
    ox, oy = np.random.randint(-max_jitter, max_jitter+1, 2)
    X = np.roll(np.roll(X, ox, -1), oy, -2)

    dX = None
    ############################################################################
    # TODO: Compute the image gradient dX using the DeepDream method. You'll   #
    # need to use the forward and backward methods of the model object to      #
    # extract activations and set gradients for the chosen layer. After        #
    # computing the image gradient dX, you should use the learning rate to     #
    # update the image X.                                                      #
    ############################################################################
    out, cache =  model.forward(X, end = layer, mode = 'test')
    dx, _ = model.backward(out, cache)

    X += learning_rate * dx
    ############################################################################
    #                             END OF YOUR CODE                             #
    ############################################################################

    # Undo the jitter
    X = np.roll(np.roll(X, -ox, -1), -oy, -2)

    # As a regularizer, clip the image
    mean_pixel = data['mean_image'].mean(axis=(1, 2), keepdims=True)
    X = np.clip(X, -mean_pixel, 255.0 - mean_pixel)

    # Periodically show the image
    if t == 0 or (t + 1) % show_every == 0:
      img = deprocess_image(X, data['mean_image'], mean='pixel')
      plt.imshow(img)
      plt.title('t = %d' % (t + 1))
      plt.gcf().set_size_inches(8, 8)
      plt.axis('off')
      plt.show()
  return X

你可能感兴趣的:(深度学习,cs231n,深度学习)

10 个免费的 AI 图片生成工具分享程序员
原文：https://openaigptguide.com/ai-picture-generator/在人工智能（AI）图像生成技术的推动下，各类AI图片生成网站如雨后春笋般涌现，为我们的日常生活提供了丰富多彩的视觉体验。AI图片生成技术原理人工智能（AI）图片生成技术原理是通过计算机程序使用深度学习算法从大量的数据中学习特征，并根据特征创建新的图片。该技术可以模拟人类的绘画过程，学习输入图像的潜
假新闻检测论文（24）A comprehensive survey of multimodal fake news detection techniques... weixin_41964296 假新闻检测自然语言处理
本文综述了利用深度学习架构和注意力机制进行假新闻检测的最新和全面的研究一介绍假新闻定义：虚假或误导性新闻，或“假新闻”，是任何捏造或故意欺骗的媒体内容。假新闻危害：它可以被利用来操纵公众情绪，传播错误信息，甚至干预政治选举。它的主要目的是扭曲、欺骗或操纵个人的信仰和观点。假新闻的形式（类型）：虚假信息在媒体上传播的形式多种多样，包括讽刺、谣言、点击诱饵、错误信息等。讽刺作品通常充满幽默，用来强调特
YOLOv8重磅升级：引入DenseOne密集网络革新主干设计，重塑YOLO目标检测性能新高度程序员杨弋 YOLO 目标检测人工智能
随着深度学习技术的不断进步，目标检测作为计算机视觉领域的重要任务之一，其性能和应用范围也在不断扩大。作为目标检测领域的佼佼者，YOLO（YouOnlyLookOnce）系列算法以其出色的性能和实时性受到了广泛关注。而最近提出的YOLOv8更是在前代版本的基础上进行了多项优化，进一步提升了检测精度和速度。然而，尽管YOLOv8已经取得了显著的进步，但在处理复杂场景和遮挡问题时，仍然存在一定的挑战。为
深度学习驱动的极端天气预测：时空数据异常检测与应用全解析（基于Python + TensorFlow） AI_DL_CODE 深度学习 python tensorflow 人工智能天气预测
摘要：时空数据异常检测在气象领域识别偏离正常模式的数据点，对极端天气预测至关重要。深度学习，尤其是LSTM网络，因其强大的特征学习能力在该领域显示出巨大潜力。通过整合多源气象数据，深度学习模型能够自动挖掘复杂模式和非线性关系，提高预测准确性。然而，挑战依然存在，包括数据质量问题、模型可解释性不足以及极端天气的内在复杂性和不确定性。未来，通过模型架构创新、训练算法优化以及探索深度学习在气候预测、气象
基于深度学习的人脸表情识别系统：YOLOv5 + YOLOv8 + YOLOv10 + UI界面 + 数据集 2025年数学建模美赛深度学习 YOLO ui 分类人工智能
引言随着人工智能的飞速发展，深度学习技术已广泛应用于各个领域，尤其是在计算机视觉领域。人脸识别和表情识别是其中的一个重要应用，能够在多种场景下提供重要的信息，例如安全监控、情感分析、智能客服、健康监测等。在人脸表情识别任务中，准确识别人脸的情感状态（如高兴、愤怒、悲伤等）是一个极具挑战性的任务。随着YOLO系列算法的不断进步，YOLOv5、YOLOv8和YOLOv10的推出大大提高了目标检测的精度
基于YOLOv8深度学习的人脸年龄检测识别系统 2025年数学建模美赛 YOLO 深度学习人工智能 ui 数据挖掘分类
引言随着人工智能和计算机视觉的飞速发展，人脸分析技术在年龄检测领域取得了显著进展。人脸年龄检测系统在安全监控、广告推荐、健康监测等领域有广泛应用。本文将基于YOLOv8目标检测模型和UI界面，开发一个完整的人脸年龄检测识别系统。我们将详细介绍项目的技术实现、数据集构建、模型训练以及UI设计，并附上完整代码。目录引言系统架构设计数据准备公开人脸年龄数据集数据标注格式数据目录结构模型训练YOLOv8环
基于深度学习的人脸表情识别系统（YOLOv10+UI界面+数据集） 2025年数学建模美赛深度学习 YOLO ui 计算机视觉人工智能目标跟踪
在本篇博客中，我们将详细介绍如何构建一个基于深度学习的人脸表情识别系统。该系统主要由三部分组成：YOLOv10（深度学习模型）进行表情识别、UI界面展示识别结果以及数据集的准备和训练过程。我们将从系统架构、数据准备、模型训练、UI设计等多个方面进行全面讲解，最终实现一个能够实时识别并展示人脸表情的系统。目录1.系统架构2.数据集准备2.1FER2013数据集2.2数据预处理3.YOLOv10模型概
基于深度学习的人脸表情识别系统：YOLOv8 + UI界面 + 数据集完整实现 2025年数学建模美赛深度学习 YOLO ui 人工智能代码
1.引言近年来，人脸表情识别在情感计算、智能人机交互、心理学研究等领域有着广泛的应用。深度学习的快速发展，使得高效、准确的人脸表情识别成为可能。通过利用卷积神经网络（CNN）和目标检测技术，可以实现实时、精准的人脸表情识别。本文将基于YOLOv8构建一个完整的人脸表情识别系统。系统集成了数据集准备、YOLOv8模型训练、实时推理以及基于PyQt5的图形用户界面（UI）。通过本文，你将学习如何实现一
AI大模型应用架构（ALLMA）白皮书解读百度_开发者中心人工智能大模型数据库自然语言处理
随着人工智能技术的不断发展，AI大模型成为推动生产、生活方式变革，助推产业智能化转型升级，驱动数字经济高质量发展等社会经济发展方面的新引擎。为了全面展示AI大模型的发展全貌，为各界提供新思路，本文将对AI大模型应用架构（ALLMA）白皮书进行解读。一、AI大模型应用架构（ALLMA）的内涵AI大模型应用架构（ALLMA）是一种基于深度学习的人工智能应用架构，旨在通过大规模无标注数据预训练、指令微调
Web APP 阶段性综述预测模型的开发与应用研究 APP construction web app
WebAPP阶段性综述当前，WebAPP主要应用于电脑端，常被用于部署数据分析、机器学习及深度学习等高算力需求的任务。在医学与生物信息学领域，WebAPP扮演着重要角色。在生物信息学领域，诸多工具以WebAPP的形式呈现，相较之下，医学领域的此类应用数量相对较少。在医学和生物信息学的学术论文中，WebAPP是展示研究成果的有效工具，并且还能部署到网络上，服务于实际应用场景。ShinyAPP平台特性
气象海洋水文领域Python机器学习及深度学习实践应用能力提升 AAIshangyanxiu 农林生态遥感编程算法统计语言大气科学 python 机器学习深度学习
Python是功能强大、免费、开源，实现面向对象的编程语言，能够在不同操作系统和平台使用，简洁的语法和解释性语言使其成为理想的脚本语言。除了标准库，还有丰富的第三方库，Python在数据处理、科学计算、数学建模、数据挖掘和数据可视化方面具备优异的性能。上述优势使得Python在气象、海洋、地理、气候、水文和生态等地学领域的科研和工程项目中得到广泛应用。可以预见未来Python将成为气象、海洋和水文
【昇思25天学习打卡营打卡指南-第一天】基本介绍与快速入门 JeffDingAI MindSpore 学习
昇思MindSpore介绍昇思MindSpore是一个全场景深度学习框架，旨在实现易开发、高效执行、全场景统一部署三大目标。其中，易开发表现为API友好、调试难度低；高效执行包括计算效率、数据预处理效率和分布式训练效率；全场景则指框架同时支持云、边缘以及端侧场景。昇思MindSpore总体架构如下图所示：ModelZoo（模型库）：ModelZoo提供可用的深度学习算法网络，也欢迎更多开发者贡献新
NLP-语义解析(Text2SQL)：技术路线【Seq2Seq、模板槽位填充、中间表达、强化学习、图网络】 u013250861 #自然语言处理人工智能
目前关于NL2SQL技术路线的发展主要包含以下几种:Seq2Seq方法：在深度学习的研究背景下,很多研究人员将Text-to-SQL看作一个类似神经机器翻译的任务,主要采取Seq2Seq的模型框架。基线模型Seq2Seq在加入Attention、Copying等机制后,能够在ATIS、GeoQuery数据集上达到84%的精确匹配,但是在WikiSQL数据集上只能达到23.3%的精确匹配,37.0%
PyTorch 中的 expand 操作详解：用法、原理与技巧专业发呆业余科研深度模型底层原理 pytorch 人工智能 python 深度学习机器学习
在使用PyTorch进行深度学习时，张量形状与广播机制常常是让初学者感到困惑的地方。我们需要时常面对多维张量，并在批量、通道、空间位置等多个维度之间做运算。如果能熟练掌握各种维度变换操作——包括unsqueeze、expand、view/reshape、transpose/permute等，可以帮助我们灵活地操纵张量，写出高效而简洁的矩阵化（vectorized）代码。本文将重点聚焦于expand
注意力池化层：从概念到实现及应用专业发呆业余科研深度模型底层原理 python 人工智能 transformer 深度学习自然语言处理图像处理
引言在现代深度学习模型中，注意力机制已经成为一个不可或缺的组件，特别是在处理自然语言和视觉数据时。多头注意力机制（MultiheadAttention）是Transformer模型的核心，它通过多个注意力头来捕捉序列中不同部分之间的关系。然而，在多模态模型中，如何有效地将图像特征和文本特征结合起来一直是一个挑战。注意力池化层（AttentionPoolingLayer）提供了一种有效的解决方案，通
深入解析昇腾AI CPU算子开发：基于AI CPU引擎的自定义算子实现与优化快撑死的鱼华为昇腾 Ascend C的算子开发系统学习人工智能
深入解析昇腾AICPU算子开发：基于AICPU引擎的自定义算子实现与优化随着深度学习模型复杂性的不断提升，AI处理器需要更强大的算力和更高效的计算架构来支撑模型的训练和推理。在华为昇腾AI处理器的架构中，AICPU承担着重要的计算任务，特别是针对标量和向量等通用计算的支持。AICPU算子开发成为开发者优化模型性能的重要步骤，而TBE（TensorBoostEngine）工具也为开发者提供了便捷的算
【AI系统】混合并行 ZOMI酱人工智能
混合并行混合并行（HybridParallel）是一种用于分布式计算的高级策略，它结合了数据并行和模型并行的优势，以更高效地利用计算资源，解决深度学习中的大模型训练问题。混合并行不仅能提高计算效率，还能在有限的硬件资源下处理更大的模型和数据集。在深度学习中，数据并行和模型并行各自有其适用的场景和局限性。数据并行适用于训练样本较多而模型较小的情况，通过将数据集分割成多个子集并在不同的设备上同时训练来
BladeDISC++：Dynamic Shape AI 编译器下的显存优化技术人工智能机器学习分布式阿里云
近年来，随着深度学习技术的迅猛发展，越来越多的模型展现出动态特性，这引发了对动态形状深度学习编译器(DynamicShapeAICompiler)的广泛关注。本文将介绍阿里云PAI团队近期发布的BladeDISC++项目，探讨在动态场景下如何优化深度学习训练任务的显存峰值，主要内容包括以下三个部分：DynamicShape场景下显存优化的背景与挑战BladeDISC++的创新解决方案Llama2模
【TVM 教程】为 x86 CPU 自动调优卷积网络
ApacheTVM是一个深度的深度学习编译框架，适用于CPU、GPU和各种机器学习加速芯片。更多TVM中文文档可访问→https://tvm.hyper.ai/作者：YaoWang,EddieYan本文介绍如何为x86CPU调优卷积神经网络。注意，本教程不会在Windows或最新版本的macOS上运行。如需运行，请将本教程的主体放在ifname=="__main__":代码块中。importosi
交叉熵损失与二元交叉熵损失：区别、联系及实现细节专业发呆业余科研深度模型底层原理人工智能深度学习 python
在机器学习和深度学习中，交叉熵损失（Cross-EntropyLoss）和二元交叉熵损失（BinaryCross-EntropyLoss）是两种常用的损失函数，它们在分类任务中发挥着重要作用。本文将详细介绍这两种损失函数的区别和联系，并通过具体的代码示例来说明它们的实现细节。交叉熵损失（Cross-EntropyLoss）常用于多类分类问题，即每个样本只能属于一个类别，但总类别数量较多。例如，在手
深度学习YOLOv3压双黄线期末项目 yzx991013 giit YOLO
一、引言实现功能目录一、引言实现功能打开视频连续检测车辆能检测到道路中间的双黄线能检测出车辆是否压双黄线当车辆压到双黄线时给出提示要求使用多线程实现功能二、技术栈概览三、代码功能深度剖析视频文件选择功能（choosevideo函数）四、项目亮点提炼五、总结与展望1.打开视频2.连续检测车辆3.能检测到道路中间的双黄线4.能检测出车辆是否压双黄线5.当车辆压到双黄线时给出提示6.要求使用多线程实现功
深度定制：Embedding与Reranker模型的微调艺术从零开始学习人工智能 embedding 人工智能
微调是深度学习中的一种常见做法，它允许模型在预训练的基础上进一步学习特定任务的特定特征。对于Embedding模型，微调的目的是让模型更适配特定的数据集，从而取得更好的召回效果。这通常涉及到使用特定的数据集对模型进行额外的训练，以便模型能够学习到数据集中的特定语义关系。微调过程可以使用不同的库和框架来实现，例如sentence-transformers库，它提供了便捷的API来调整Embeddin
【机器学习】—时序数据分析：机器学习与深度学习在预测、金融、气象等领域的应用云边有个稻草人热门文章机器学习数据分析深度学习笔记
云边有个稻草人-CSDN博客目录引言1.时序数据分析基础1.1时序数据的特点1.2时序数据分析的常见方法2.深度学习与时序数据分析2.1深度学习在时序数据分析中的应用2.1.1LSTM（长短期记忆网络）2.2深度学习在金融市场预测中的应用2.2.1股票市场预测2.3深度学习在设备故障检测中的应用3.强化学习与时序数据分析3.1强化学习的基本概念3.2强化学习在金融市场中的应用3.3强化学习在设备故
使用 AI 在医疗影像分析中的应用探索
摘要医疗影像分析是AI在医疗领域的重要应用方向，能够提高诊断效率，减少误诊率。本文将深入探讨AI技术在医疗影像数据分析中的应用，包括核心算法、关键实现步骤和实际案例，并提供一个基于卷积神经网络（CNN）的图像分类Demo。引言随着医疗影像数据的爆炸式增长，传统的人工分析已无法满足高效、精准诊断的需求。AI技术通过深度学习算法，在医疗影像的识别、分类和标注中发挥了重要作用。本文章将结合技术实现与案例
【机器学习】---神经架构搜索（NAS） Undoom 机器学习 Python 机器学习架构人工智能 python
这里写目录标题引言1.什么是神经架构搜索（NAS）1.1为什么需要NAS？2.NAS的三大组件2.1搜索空间搜索空间设计的考虑因素：2.2搜索策略2.3性能估计3.NAS的主要方法3.1基于强化学习的NAS3.2基于进化算法的NAS3.3基于梯度的NAS4.NAS的应用5.实现一个简单的NAS框架6.总结引言随着深度学习的成功应用，神经网络架构的设计变得越来越复杂。模型的性能不仅依赖于数据和训练方
【C#深度学习之路】如何使用C#读取pickle类型的大模型文件来瓶霸王防脱发 C#深度学习之路 c#机器学习
【C#深度学习之路】如何使用C#读取pickle类型的大模型文件背景Pickle文件的结构及读取思路读取方法以压缩文件的方式加载Pickle类型文件读取Header的内容读取tensor的权重值该方法的不足总结本文为原创文章，若需要转载，请注明出处。原文地址：https://blog.csdn.net/qq_30270773/article/details/141367057项目对应的Github
【C#深度学习之路】如何使用C#实现Yolov8模型的训练和推理来瓶霸王防脱发 C#深度学习之路 c#机器学习图像处理视觉检测 YOLO
【C#深度学习之路】如何使用C#实现Yolov8模型的训练和推理项目背景算法实现模型结构项目展望写在最后项目下载链接本文为原创文章，若需要转载，请注明出处。原文地址：https://blog.csdn.net/qq_30270773/article/details/143529308项目对应的Github地址：https://github.com/IntptrMax/YoloSharpC#深度学习
【C#深度学习之路】如何使用C#实现Yolov11模型的训练和推理来瓶霸王防脱发 C#深度学习之路 c#深度学习 YOLO
【C#深度学习之路】如何使用C#实现Yolov11模型的训练和推理项目背景算法实现模型结构项目展望写在最后项目下载链接本文为原创文章，若需要转载，请注明出处。原文地址：https://blog.csdn.net/qq_30270773/article/details/143722404项目对应的Github地址：https://github.com/IntptrMax/YoloSharpC#深度学
AlexNet：开启深度学习图像识别新纪元池央深度学习人工智能
一、引言在深度学习的璀璨星空中，AlexNet无疑是一颗极为耀眼的明星。它于2012年横空出世，并在ImageNet竞赛中一举夺冠，这一历史性的突破彻底改变了计算机视觉领域的发展轨迹，让全世界深刻认识到深度卷积神经网络在图像识别任务中的巨大潜力，从而掀起了深度学习研究与应用的热潮。二、AlexNet网络架构详解（一）输入层AlexNet的输入图像通常为224x224x3的彩色图像。这一尺寸的确定是
拯救者电脑安装Windows和Ubuntu双系统遇到黑屏或者花屏问题的解决方法，亲测有效我爱猪肉炖粉条 ubuntu 深度学习
最近想在电脑上跑深度学习，有一定基础的都知道，ubuntu更适合gpu、apex以及其他加速的使用，如果在Windows上总是遇到各种各样的问题，所以我给电脑安装了双系统。装系统的过程此处忽略，随便找个教程都可以。总结一下就是在C盘压缩一定的空间（比如80G），然后通过U盘工具制作一个Ubuntu启动盘，把系统安装到压缩的那个盘里。我使用的电脑是拯救者R7000P，英伟达RTX2060，AMD处理
多线程编程之存钱与取钱周凡杨 java thread 多线程存钱取钱
生活费问题是这样的：学生每月都需要生活费，家长一次预存一段时间的生活费，家长和学生使用统一的一个帐号，在学生每次取帐号中一部分钱，直到帐号中没钱时通知家长存钱，而家长看到帐户还有钱则不存钱，直到帐户没钱时才存钱。问题分析：首先问题中有三个实体，学生、家长、银行账户，所以设计程序时就要设计三个类。其中银行账户只有一个，学生和家长操作的是同一个银行账户，学生的行为是
java中数组与List相互转换的方法征客丶 JavaScript java jsonp
1.List转换成为数组。（这里的List是实体是ArrayList) 　　调用ArrayList的toArray方法。　　toArray 　　public T[] toArray(T[] a)返回一个按照正确的顺序包含此列表中所有元素的数组；返回数组的运行时类型就是指定数组的运行时类型。如果列表能放入指定的数组，则返回放入此列表元素的数组。否则，将根据指定数组的运行时类型和此列表的大小分
Shell 流程控制 daizj 流程控制 if else while case shell
Shell 流程控制和Java、PHP等语言不一样，sh的流程控制不可为空，如(以下为PHP流程控制写法)： <?php if(isset($_GET["q"])){ search(q);}else{// 不做任何事情} 在sh/bash里可不能这么写，如果else分支没有语句执行，就不要写这个else，就像这样 if else if if 语句语
Linux服务器新手操作之二周凡杨 Linux 简单操作
1.利用关键字搜寻Man Pages man -k keyword 其中-k 是选项，keyword是要搜寻的关键字如果现在想使用whoami命令，但是只记住了前3个字符who，就可以使用 man -k who来搜寻关键字who的man命令 [haself@HA5-DZ26 ~]$ man -k
socket聊天室之服务器搭建朱辉辉33 socket
因为我们做的是聊天室，所以会有多个客户端，每个客户端我们用一个线程去实现，通过搭建一个服务器来实现从每个客户端来读取信息和发送信息。我们先写客户端的线程。 public class ChatSocket extends Thread{ Socket socket; public ChatSocket(Socket socket){ this.sock
利用finereport建设保险公司决策分析系统的思路和方法老A不折腾 finereport 金融保险分析系统报表系统项目开发
决策分析系统呈现的是数据页面，也就是俗称的报表，报表与报表间、数据与数据间都按照一定的逻辑设定，是业务人员查看、分析数据的平台，更是辅助领导们运营决策的平台。底层数据决定上层分析，所以建设决策分析系统一般包括数据层处理（数据仓库建设）。项目背景介绍通常，保险公司信息化程度很高，基本上都有业务处理系统（像集团业务处理系统、老业务处理系统、个人代理人系统等）、数据服务系统（通过
始终要页面在ifream的最顶层林鹤霄
index.jsp中有ifream，但是session消失后要让login.jsp始终显示到ifream的最顶层。。。始终没搞定，后来反复琢磨之后，得到了解决办法，在这儿给大家分享下。。 index.jsp--->主要是加了颜色的那一句 <html> <iframe name="top" ></iframe> <ifram
MySQL binlog恢复数据 aigo mysql
1，先确保my.ini已经配置了binlog： # binlog log_bin = D:/mysql-5.6.21-winx64/log/binlog/mysql-bin.log log_bin_index = D:/mysql-5.6.21-winx64/log/binlog/mysql-bin.index log_error = D:/mysql-5.6.21-win
OCX打成CBA包并实现自动安装与自动升级 alxw4616 ocx cab
近来手上有个项目,需要使用ocx控件 (ocx是什么? http://baike.baidu.com/view/393671.htm) 在生产过程中我遇到了如下问题. 1. 如何让 ocx 自动安装? a) 如何签名? b) 如何打包? c) 如何安装到指定目录? 2.
Hashmap队列和PriorityQueue队列的应用百合不是茶 Hashmap队列 PriorityQueue队列
HashMap队列已经是学过了的,但是最近在用的时候不是很熟悉,刚刚重新看以一次, HashMap是K,v键 ,值 put()添加元素 //下面试HashMap去掉重复的 package com.hashMapandPriorityQueue; import java.util.H
JDK1.5 returnvalue实例 bijian1013 java thread java多线程 returnvalue
Callable接口：返回结果并且可能抛出异常的任务。实现者定义了一个不带任何参数的叫做 call 的方法。 Callable 接口类似于 Runnable，两者都是为那些其实例可能被另一个线程执行的类设计的。但是 Runnable 不会返回结果，并且无法抛出经过检查的异常。 ExecutorService接口方
angularjs指令中动态编译的方法(适用于有异步请求的情况) 内嵌指令无效 bijian1013 JavaScript AngularJS
在directive的link中有一个$http请求，当请求完成后根据返回的值动态做element.append('......');这个操作，能显示没问题，可问题是我动态组的HTML里面有ng-click，发现显示出来的内容根本不执行ng-click绑定的方法！
【Java范型二】Java范型详解之extend限定范型参数的类型 bit1129 extend
在第一篇中，定义范型类时，使用如下的方式： public class Generics<M, S, N> { //M,S,N是范型参数 } 这种方式定义的范型类有两个基本的问题： 1. 范型参数定义的实例字段，如private M m = null;由于M的类型在运行时才能确定，那么我们在类的方法中，无法使用m，这跟定义pri
【HBase十三】HBase知识点总结 bit1129 hbase
1. 数据从MemStore flush到磁盘的触发条件有哪些？ a.显式调用flush，比如flush 'mytable' b.MemStore中的数据容量超过flush的指定容量，hbase.hregion.memstore.flush.size,默认值是64M 2. Region的构成是怎么样？ 1个Region由若干个Store组成
服务器被DDOS攻击防御的SHELL脚本 ronin47
mkdir /root/bin vi /root/bin/dropip.sh #!/bin/bash/bin/netstat -na|grep ESTABLISHED|awk ‘{print $5}’|awk -F:‘{print $1}’|sort|uniq -c|sort -rn|head -10|grep -v -E ’192.168|127.0′|awk ‘{if($2!=null&a
java程序员生存手册-craps 游戏-一个简单的游戏 bylijinnan java
import java.util.Random; public class CrapsGame { /** * *一个简单的赌*博游戏，游戏规则如下： *玩家掷两个骰子，点数为1到6，如果第一次点数和为7或11，则玩家胜， *如果点数和为2、3或12，则玩家输， *如果和为其它点数，则记录第一次的点数和，然后继续掷骰，直至点数和等于第一次掷出的点
TOMCAT启动提示NB: JAVA_HOME should point to a JDK not a JRE解决开窍的石头 JAVA_HOME
当tomcat是解压的时候，用eclipse启动正常，点击startup.bat的时候启动报错; 报错如下： The JAVA_HOME environment variable is not defined correctly This environment variable is needed to run this program NB: JAVA_HOME shou
[操作系统内核]操作系统与互联网 comsci 操作系统
我首先申明：我这里所说的问题并不是针对哪个厂商的，仅仅是描述我对操作系统技术的一些看法操作系统是一种与硬件层关系非常密切的系统软件，按理说，这种系统软件应该是由设计CPU和硬件板卡的厂商开发的，和软件公司没有直接的关系，也就是说，操作系统应该由做硬件的厂商来设计和开发
富文本框ckeditor_4.4.7 文本框的简单使用支持IE11 cuityang 富文本框
<html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> <title>知识库内容编辑</tit
Property null not found darrenzhu datagrid Flex Advanced propery null
When you got error message like "Property null not found ***", try to fix it by the following way: 1)if you are using AdvancedDatagrid, make sure you only update the data in the data prov
MySQl数据库字符串替换函数使用 dcj3sjt126com mysql 函数替换
需求：需要将数据表中一个字段的值里面的所有的 . 替换成 _ 原来的数据是 site.title site.keywords .... 替换后要为 site_title site_keywords 使用的SQL语句如下： updat
mac上终端起动MySQL的方法 dcj3sjt126com mysql mac
首先去官网下载: http://www.mysql.com/downloads/ 我下载了5.6.11的dmg然后安装,安装完成之后..如果要用终端去玩SQL.那么一开始要输入很长的:/usr/local/mysql/bin/mysql 这不方便啊,好想像windows下的cmd里面一样输入mysql -uroot -p1这样...上网查了下..可以实现滴. 打开终端,输入: 1
Gson使用一（Gson） eksliang json gson
转载请出自出处：http://eksliang.iteye.com/blog/2175401 一.概述从结构上看Json，所有的数据（data）最终都可以分解成三种类型：第一种类型是标量（scalar），也就是一个单独的字符串（string）或数字（numbers），比如"ickes"这个字符串。第二种类型是序列（sequence），又叫做数组（array）
android点滴4 gundumw100 android
Android 47个小知识 http://www.open-open.com/lib/view/open1422676091314.html Android实用代码七段（一） http://www.cnblogs.com/over140/archive/2012/09/26/2611999.html http://www.cnblogs.com/over140/arch
JavaWeb之JSP基本语法 ihuning javaweb
目录 JSP模版元素 JSP表达式 JSP脚本片断 EL表达式 JSP注释特殊字符序列的转义处理如何查找JSP页面中的错误 JSP模版元素 JSP页面中的静态HTML内容称之为JSP模版元素，在静态的HTML内容之中可以嵌套JSP
App Extension编程指南（iOS8/OS X v10.10）中文版啸笑天 ext
当iOS 8.0和OS X v10.10发布后，一个全新的概念出现在我们眼前，那就是应用扩展。顾名思义，应用扩展允许开发者扩展应用的自定义功能和内容，能够让用户在使用其他app时使用该项功能。你可以开发一个应用扩展来执行某些特定的任务，用户使用该扩展后就可以在多个上下文环境中执行该任务。比如说，你提供了一个能让用户把内容分
SQLServer实现无限级树结构 macroli oracle sql SQL Server
表结构如下：数据库id path titlesort 排序 1 0 首页 0 2 0,1 新闻 1 3 0,2 JAVA 2 4 0,3 JSP 3 5 0,2,3 业界动态 2 6 0,2,3 国内新闻 1 创建一个存储过程来实现，如果要在页面上使用可以设置一个返回变量将至传过去 create procedure test as begin decla
Css居中div，Css居中img，Css居中文本，Css垂直居中div qiaolevip 众观千象学习永无止境每天进步一点点 css
/**********Css居中Div**********/ div.center { width: 100px; margin: 0 auto; } /**********Css居中img**********/ img.center { display: block; margin-left: auto; margin-right: auto; }
Oracle 常用操作(实用) 吃猫的鱼 oracle
SQL>select text from all_source where owner=user and name=upper('&plsql_name'); SQL>select * from user_ind_columns where index_name=upper('&index_name'); 将表记录恢复到指定时间段以前
iOS中使用RSA对数据进行加密解密 witcheryne ios rsa iPhone objective c
RSA算法是一种非对称加密算法,常被用于加密数据传输.如果配合上数字摘要算法, 也可以用于文件签名. 本文将讨论如何在iOS中使用RSA传输加密数据. 本文环境 mac os openssl-1.0.1j, openssl需要使用1.x版本, 推荐使用[homebrew](http://brew.sh/)安装. Java 8 RSA基本原理 RS