角度洗

cs231n assignment3 walkthrough

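Preface: the assignments are nearly the same from year to year, with small changes in places, such as moving the body of a loop into its own function. The core content is the same.

The assignment requirements are at: https://cs231n.github.io/assignments2017/assignment3/

All three cs231n assignments are in my GitHub repository, where you can browse them.

I did the assignments after finishing the course, so I usually complete every function in a file in one go. This post follows the same order rather than going through the functions question by question (Q1, Q2, ...). Apologies for that.

Data preparation

This part uses the Microsoft COCO dataset. Download the data directly from http://cs231n.stanford.edu/coco_captioning.zip, unzip it, and place it in the working directory.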

rnn_layers.py

rnn_step_forward()

The forward pass of a vanilla RNN is
$h_{t+1} = \tanh(W_x x_{t+1} + W_h h_t + b)$ so it can be computed directly with matrix multiplications.

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    """
    Run the forward pass for a single timestep of a vanilla RNN that uses a tanh
    activation function.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Inputs:
    - x: Input data for this timestep, of shape (N, D).
    - prev_h: Hidden state from previous timestep, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - cache: Tuple of values needed for the backward pass.
    """
    next_h, cache = None, None
    ##############################################################################
    # TODO: Implement a single forward step for the vanilla RNN. Store the next  #
    # hidden state and any values you need for the backward pass in the next_h   #
    # and cache variables respectively.                                          #
    ##############################################################################
    next_h = np.tanh(np.dot(x, Wx) + np.dot(prev_h, Wh) + b)
    cache = (next_h, Wx, Wh, x, prev_h, b)
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################
    return next_h, cache

rnn_step_backward()

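Five gradients have to be computed here: dx, dprev_h, dWx, dWh and db.

The tanh activation function

tanh is defined as $\tanh(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}$ and its derivative with respect to $x$ is $\tanh'(x) = 1-\tanh^2(x)$ (differentiating the definition directly gives the same result, but this form is shorter to write and cheaper to compute, because it reuses the forward output).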
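As a quick sanity check of the identity above, the closed form can be compared against a central-difference estimate. This is only a small sketch; the sample points and the step size `eps` are arbitrary choices, not part of the assignment.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 13)
eps = 1e-6

# Central-difference estimate of d tanh / dx
numeric = (np.tanh(x + eps) - np.tanh(x - eps)) / (2 * eps)
# Closed form, which only needs the forward output tanh(x)
analytic = 1.0 - np.tanh(x) ** 2

max_abs_diff = np.max(np.abs(numeric - analytic))
print(max_abs_diff)
```

The difference should be tiny, which is why the backward pass can work from the cached next_h alone.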

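Computing the gradients

Note that the step is a composite function with a tanh activation on the outside, so the chain rule applies: the upstream gradient is first multiplied by the local derivative of tanh, and the result is then passed through the linear part. Taking dx as an example, in the row-vector convention used by the code: $dx = \left(dnext\_h \odot (1 - next\_h^2)\right) W_x^T$ The code is therefore: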

def rnn_step_backward(dnext_h, cache):
    """
    Backward pass for a single timestep of a vanilla RNN.

    Inputs:
    - dnext_h: Gradient of loss with respect to next hidden state
    - cache: Cache object from the forward pass

    Returns a tuple of:
    - dx: Gradients of input data, of shape (N, D)
    - dprev_h: Gradients of previous hidden state, of shape (N, H)
    - dWx: Gradients of input-to-hidden weights, of shape (D, H)
    - dWh: Gradients of hidden-to-hidden weights, of shape (H, H)
    - db: Gradients of bias vector, of shape (H,)
    """
    dx, dprev_h, dWx, dWh, db = None, None, None, None, None
    ##############################################################################
    # TODO: Implement the backward pass for a single step of a vanilla RNN.      #
    #                                                                            #
    # HINT: For the tanh function, you can compute the local derivative in terms #
    # of the output value from tanh.                                             #
    ##############################################################################
    next_h, Wx, Wh, x, prev_h, b = cache
    # temp_x = np.dot(x, Wx) + np.dot(prev_h, Wh) + b
    dtanh = dnext_h * (1 - next_h**2)
    dx = np.dot(dtanh, Wx.T)
    dprev_h = np.dot(dtanh, Wh.T)
    dWx = np.dot(x.T, dtanh)
    dWh = np.dot(prev_h.T, dtanh)
    db = np.sum(dtanh, 0)
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################
    return dx, dprev_h, dWx, dWh, db
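One way to gain confidence in a backward pass like the one above is a numerical gradient check. The sketch below restates the two step functions compactly so that it runs on its own; `numerical_grad` is a hypothetical helper written for this example (the assignment ships its own gradient-check utilities), and the shapes and seed are arbitrary.

```python
import numpy as np

def rnn_step_forward(x, prev_h, Wx, Wh, b):
    next_h = np.tanh(x @ Wx + prev_h @ Wh + b)
    return next_h, (next_h, Wx, Wh, x, prev_h)

def rnn_step_backward(dnext_h, cache):
    next_h, Wx, Wh, x, prev_h = cache
    dtanh = dnext_h * (1 - next_h ** 2)          # back through tanh
    return dtanh @ Wx.T, dtanh @ Wh.T, x.T @ dtanh, prev_h.T @ dtanh, dtanh.sum(axis=0)

def numerical_grad(f, arr, dout, eps=1e-5):
    """Central-difference gradient of sum(f() * dout) with respect to arr."""
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        pos = f()
        arr[idx] = old - eps
        neg = f()
        arr[idx] = old                            # restore the original value
        grad[idx] = np.sum((pos - neg) * dout) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
N, D, H = 3, 4, 5
x = rng.standard_normal((N, D))
prev_h = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, H))
Wh = rng.standard_normal((H, H))
b = rng.standard_normal(H)
dnext_h = rng.standard_normal((N, H))

_, cache = rnn_step_forward(x, prev_h, Wx, Wh, b)
grads = rnn_step_backward(dnext_h, cache)        # dx, dprev_h, dWx, dWh, db

f = lambda: rnn_step_forward(x, prev_h, Wx, Wh, b)[0]
max_err = max(
    np.max(np.abs(numerical_grad(f, arr, dnext_h) - g))
    for arr, g in zip((x, prev_h, Wx, Wh, b), grads)
)
print(max_err)
```

If any of the five gradients were wrong (a missing transpose, say), the reported error would be far from zero.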

rnn_forward()

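This function is the forward pass of the RNN over a whole sequence. The input x now has a time dimension, and rnn_step_forward() is called once per timestep. The output h is an array holding the hidden state of the RNN at every timestep. A for loop plays the role of time, so the states collected in the list h stack up with shape [T, N, H]. That is not the axis order we want, so transpose() is used to reorder the axes to [N, T, H].

Here $h_0$ is the initial hidden state of the RNN.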

def rnn_forward(x, h0, Wx, Wh, b):
    """
    Run a vanilla RNN forward on an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The RNN uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the RNN forward, we return the hidden states for all timesteps.

    Inputs:
    - x: Input data for the entire timeseries, of shape (N, T, D).
    - h0: Initial hidden state, of shape (N, H)
    - Wx: Weight matrix for input-to-hidden connections, of shape (D, H)
    - Wh: Weight matrix for hidden-to-hidden connections, of shape (H, H)
    - b: Biases of shape (H,)

    Returns a tuple of:
    - h: Hidden states for the entire timeseries, of shape (N, T, H).
    - cache: Values needed in the backward pass
    """
    h, cache = None, None
    ##############################################################################
    # TODO: Implement forward pass for a vanilla RNN running on a sequence of    #
    # input data. You should use the rnn_step_forward function that you defined  #
    # above. You can use a for loop to help compute the forward pass.            #
    ##############################################################################
    T = x.shape[1]
    prev_h = h0
    h, cache = [], []
    for t in range(T):
        next_h, next_cache = rnn_step_forward(x[:, t, :], prev_h, Wx, Wh, b)
        h.append(next_h)
        cache.append(next_cache)
        prev_h = next_h
    #! Note: h stacks up as [T, N, H] here because the loop runs over time, so the first axis is T
    h = np.array(h).transpose(1, 0, 2)
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################
    return h, cache
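The stack-then-transpose step can also be written as a single np.stack call along a new time axis. The sketch below, with made-up per-timestep arrays, shows that the two forms agree; which one to use is a matter of taste.

```python
import numpy as np

T, N, H = 4, 2, 3
# One (N, H) array per timestep, as collected in a loop over time
steps = [np.full((N, H), float(t)) for t in range(T)]

a = np.array(steps).transpose(1, 0, 2)  # stack to (T, N, H), then reorder to (N, T, H)
b = np.stack(steps, axis=1)             # stack directly along a new time axis

print(a.shape, np.array_equal(a, b))
```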

rnn_backward()

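This function is the backward pass of the RNN over a whole sequence. The gradient dh has a time dimension, and rnn_step_backward() is called once per timestep, from the last timestep back to the first.
The point to watch is the dnext_h argument passed to rnn_step_backward(). It must be the sum of the gradient arriving from above (dh[:, t, :], the upstream gradient on that timestep's hidden state) and the gradient arriving from the right (from the next timestep, through next_h), not the latter alone. Looking at the many-to-many, many-to-one and one-to-one RNN diagrams below, the hidden state at time t feeds the loss directly through its own output, which is where dh[:, t, :] comes from, and it is also passed on to the next state and so affects the loss at later timesteps. The two gradients therefore have to be added.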

def rnn_backward(dh, cache):
    """
    Compute the backward pass for a vanilla RNN over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of all hidden states, of shape (N, T, H)

    Returns a tuple of:
    - dx: Gradient of inputs, of shape (N, T, D)
    - dh0: Gradient of initial hidden state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, H)
    - db: Gradient of biases, of shape (H,)
    """
    dx, dh0, dWx, dWh, db = None, None, None, None, None
    ##############################################################################
    # TODO: Implement the backward pass for a vanilla RNN running an entire      #
    # sequence of data. You should use the rnn_step_backward function that you   #
    # defined above. You can use a for loop to help compute the backward pass.   #
    ##############################################################################
    N, T, H = dh.shape
    D = cache[0][1].shape[0]
    ddh = np.zeros((N, H))
    dx, dWx, dWh, db = [], np.zeros((D, H)), np.zeros((H, H)), np.zeros(H)
    
    for t in range(T - 1, -1, -1):
        #! The gradient passed in is the sum of the gradient from above (dh[:, t, :])
        #! and the gradient from the right (ddh, coming from timestep t + 1)
        ddx, ddh, ddWx, ddWh, ddb = rnn_step_backward(dh[:, t, :] + ddh, cache[t])
        dx.append(ddx)
        dWx = dWx + ddWx #! the same Wx, Wh and b are used at every timestep, so their gradients are summed
        dWh = dWh + ddWh
        db = db + ddb
    dh0 = ddh
    #! Time runs backward in the loop, so dx holds the later timesteps first: [::-1] restores
    #! the order, and transpose turns the stacked [T, N, D] array into [N, T, D]
    dx = np.array(dx[::-1]).transpose(1, 0, 2)
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################
    return dx, dh0, dWx, dWh, db
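The "from above plus from the right" rule is easiest to see on a scalar toy recurrence. The sketch below is not the assignment's network: it assumes a two-step scalar RNN $h_t = \tanh(w h_{t-1})$ with a made-up loss $h_1 + h_2$, so that each hidden state feeds the loss directly and $h_1$ also feeds $h_2$.

```python
import numpy as np

def loss(h0, w):
    h1 = np.tanh(w * h0)
    h2 = np.tanh(w * h1)
    return h1 + h2          # every hidden state feeds the loss directly

h0, w = 0.5, 0.8
h1 = np.tanh(w * h0)
h2 = np.tanh(w * h1)

# Backward in time. The gradient "from above" is 1 at both steps.
d2 = (1.0 + 0.0) * (1 - h2 ** 2)        # t = 2: nothing arrives from the right
dh1_right = d2 * w                       # gradient handed to step 1 from the right
d1 = (1.0 + dh1_right) * (1 - h1 ** 2)   # t = 1: above + right
dh0 = d1 * w
dw = d2 * h1 + d1 * h0                   # shared weight: sum over timesteps

eps = 1e-6
dh0_num = (loss(h0 + eps, w) - loss(h0 - eps, w)) / (2 * eps)
dw_num = (loss(h0, w + eps) - loss(h0, w - eps)) / (2 * eps)
print(dh0, dh0_num)
print(dw, dw_num)
```

Dropping `dh1_right` from `d1` would make both analytic values disagree with the numerical ones.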

word_embedding_forward()

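For word embeddings I drew on and summarized this blog post: https://blog.csdn.net/weixin_42214778/article/details/105182423

What word embedding does

Before the input is fed to the RNN, one extra step is needed: word embedding. Put simply, a word embedding represents each word as a vector. A neural network cannot train on raw words, so each word is first converted into a numeric vector and then passed to the network. Two functions have to be implemented here: word_embedding_forward and word_embedding_backward.

Parameters

The inputs are: 1. x, of shape (N, T): as I understand it, there are N samples with T timesteps each, and every element is the index of a word in the vocabulary; 2. W, of shape (V, D): the embedding table, holding one vector per word, for V words with D features each.
The output is the vector form of the input words, of shape (N, T, D): N samples, T timesteps, and D features per word.

Since the elements of x are the indices of words in the vocabulary, W[x, :] directly picks out the vector of each word.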

def word_embedding_forward(x, W):
    """
    Forward pass for word embeddings. We operate on minibatches of size N where
    each sequence has length T. We assume a vocabulary of V words, assigning each
    to a vector of dimension D.

    Inputs:
    - x: Integer array of shape (N, T) giving indices of words. Each element idx
      of x must be in the range 0 <= idx < V.
    - W: Weight matrix of shape (V, D) giving word vectors for all words.

    Returns a tuple of:
    - out: Array of shape (N, T, D) giving word vectors for all input words.
    - cache: Values needed for the backward pass
    """
    out, cache = None, None
    ##############################################################################
    # TODO: Implement the forward pass for word embeddings.                      #
    #                                                                            #
    # HINT: This can be done in one line using NumPy's array indexing.           #
    ##############################################################################
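    #* My understanding: x holds N samples with T timesteps each, and every element is a word index; W is a table of V words with D features per word
    out = W[x, :] #! picks the length-D feature vector at each index in x, giving shape (N, T, D)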
    cache = (W, x)
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################
    return out, cache
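A tiny example makes the indexing concrete. The table and indices below are made up for illustration (V = 3 words, D = 2 features, N = 2 captions of T = 3 words).

```python
import numpy as np

W = np.array([[0.0, 0.1],    # vector of word 0
              [1.0, 1.1],    # vector of word 1
              [2.0, 2.1]])   # vector of word 2
x = np.array([[0, 2, 2],
              [1, 0, 1]])    # word indices, shape (N, T)

out = W[x]                   # same result as W[x, :]
print(out.shape)             # (2, 3, 2)
print(out[0, 1])             # [2.  2.1], the vector of word 2
```

Integer-array indexing replaces every index in x by the matching row of W, which is why the result gains a trailing axis of length D.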

word_embedding_backward()

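In the forward pass the output is out = W[x, :], so out depends on W only at the rows indexed by the elements of x, and the local derivative there is 1. The gradient with respect to W is therefore obtained by adding each dout[n, t, :] to the row of dW indexed by x[n, t].

This uses np.add.at(A, B, C), which adds the values C into A at the indices B and accumulates correctly when an index appears more than once. C may itself be a matrix: B does not have to index the last axis, so each index in B can select a whole row of A.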

def word_embedding_backward(dout, cache):
    """
    Backward pass for word embeddings. We cannot back-propagate into the words
    since they are integers, so we only return gradient for the word embedding
    matrix.

    HINT: Look up the function np.add.at

    Inputs:
    - dout: Upstream gradients of shape (N, T, D)
    - cache: Values from the forward pass

    Returns:
    - dW: Gradient of word embedding matrix, of shape (V, D).
    """
    dW = None
    ##############################################################################
    # TODO: Implement the backward pass for word embeddings.                     #
    #                                                                            #
    # Note that Words can appear more than once in a sequence.                   #
    # HINT: Look up the function np.add.at                                       #
    ##############################################################################
    W, x = cache
    dW = np.zeros_like(W)
    np.add.at(dW, x, dout) #! add dout into dW at the rows indexed by x: out depends on W only at those rows, with local derivative 1, and repeated indices accumulate
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################
    return dW
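The reason for np.add.at rather than a plain `dW[x] += dout` shows up as soon as a word repeats. The sketch below uses a made-up caption in which word 2 appears twice.

```python
import numpy as np

x = np.array([[0, 2, 2]])            # word 2 appears twice
dout = np.ones((1, 3, 2))            # upstream gradient, shape (N, T, D)

dW_buffered = np.zeros((3, 2))
dW_buffered[x] += dout               # buffered: the repeated index is counted once

dW = np.zeros((3, 2))
np.add.at(dW, x, dout)               # unbuffered: every occurrence is added

print(dW_buffered[2])                # [1. 1.]
print(dW[2])                         # [2. 2.]
```

With the buffered form, the gradient of a word that occurs several times in a batch would be silently underestimated.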

lstm_step_forward()

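LSTM forward equations and flow diagram

⊙ denotes elementwise multiplication, c is the cell state and h is the hidden state at each timestep. Take care to keep c and h apart.

In the diagram, $W$ can multiply $\begin{pmatrix}h_{t - 1}\\x_t\end{pmatrix}$ directly because it stacks $W_h$ and $W_x$ into one matrix. The implementation below keeps them separate and multiplies each on its own, which is what the code actually does. Just follow the equations when writing the code.

The temp computed below has 4H features along axis 1 (axes are counted from 0: in a 2-D matrix, rows are axis 0 and columns are axis 1). It is split into four parts, which go through a sigmoid or tanh activation to give i, f, o and g.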

def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    """
    Forward pass for a single timestep of an LSTM.

    The input data has dimension D, the hidden state has dimension H, and we use
    a minibatch size of N.

    Inputs:
    - x: Input data, of shape (N, D)
    - prev_h: Previous hidden state, of shape (N, H)
    - prev_c: previous cell state, of shape (N, H)
    - Wx: Input-to-hidden weights, of shape (D, 4H)
    - Wh: Hidden-to-hidden weights, of shape (H, 4H)
    - b: Biases, of shape (4H,)

    Returns a tuple of:
    - next_h: Next hidden state, of shape (N, H)
    - next_c: Next cell state, of shape (N, H)
    - cache: Tuple of values needed for backward pass.
    """
    next_h, next_c, cache = None, None, None
    #############################################################################
    # TODO: Implement the forward pass for a single timestep of an LSTM.        #
    # You may want to use the numerically stable sigmoid implementation above.  #
    #############################################################################
    N, H = prev_h.shape
    temp = np.dot(x, Wx) + np.dot(prev_h, Wh) + b
    #* split the 4H features of temp among the four gates
    i = sigmoid(temp[:, 0:H]) 
    f = sigmoid(temp[:, H:2*H])
    o = sigmoid(temp[:, 2*H:3*H])
    g = np.tanh(temp[:, 3*H:4*H])
    next_c = f * prev_c + i * g
    next_h = o * np.tanh(next_c)
    
    cache = (i, f, o, g, x, Wx, Wh, prev_c, prev_h, next_c)
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################

    return next_h, next_c, cache
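The function above relies on a `sigmoid` helper defined elsewhere in rnn_layers.py. As a rough sketch of what a numerically stable version can look like, and of how the 4H pre-activations split into gates, consider the following; `stable_sigmoid` is a hypothetical name and not necessarily how the assignment implements it.

```python
import numpy as np

def stable_sigmoid(x):
    """Sigmoid that never exponentiates a large positive number."""
    z = np.exp(-np.abs(x))               # always in [0, 1]
    return np.where(x >= 0, 1.0 / (1.0 + z), z / (1.0 + z))

N, H = 2, 3
temp = np.arange(N * 4 * H, dtype=float).reshape(N, 4 * H) - 12.0
a_i, a_f, a_o, a_g = np.split(temp, 4, axis=1)   # four (N, H) blocks, in the order i, f, o, g

i, f, o = stable_sigmoid(a_i), stable_sigmoid(a_f), stable_sigmoid(a_o)
g = np.tanh(a_g)
extremes = stable_sigmoid(np.array([-1000.0, 0.0, 1000.0]))

print(i.shape, np.array_equal(a_f, temp[:, H:2 * H]))
print(extremes)
```

np.split along axis 1 yields the same blocks as the explicit slices `temp[:, 0:H]`, `temp[:, H:2*H]` and so on.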

lstm_step_backward()

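The tanh activation was covered earlier in this post, so its form and derivative are not repeated here. The sigmoid activation is introduced below.

The sigmoid activation function

Its form is
$\sigma(x) = \frac{1}{1+e^{-x}}$ and its derivative with respect to $x$ is
$\sigma'(x) = \sigma(x) (1 - \sigma(x))$
Differentiating the definition directly gives the same result, but this form is simpler to write and to compute.

Computing the gradients

Note: in the forward pass, $temp = W_x x + W_h h + b$ is split into four parts that produce i, f, o and g. So the gradients of i, f, o and g have to be computed first. di, df, do and dg then have to be passed back through the sigmoid or tanh (a composite function again) to obtain dtemp, from which dx, dprev_h, dWx, dWh and db follow.

Below is the computation graph of the LSTM forward and backward pass; the red arrows show the direction of backpropagation.

np.hstack() joins matrices along the column direction.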
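As a small illustration of np.hstack in this setting, four (N, H) blocks with made-up constant values are joined into one (N, 4H) array, the mirror image of the split done in the forward pass.

```python
import numpy as np

N, H = 2, 3
# Stand-ins for the four gate gradients, filled with 0, 1, 2 and 3
di, df, do, dg = (np.full((N, H), float(k)) for k in range(4))

dtemp = np.hstack([di, df, do, dg])
print(dtemp.shape)                              # (2, 12)
print(np.array_equal(dtemp[:, H:2 * H], df))    # True
```

The block order has to match the order used when slicing temp in the forward pass (i, f, o, g).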

def lstm_step_backward(dnext_h, dnext_c, cache):
    """
    Backward pass for a single timestep of an LSTM.

    Inputs:
    - dnext_h: Gradients of next hidden state, of shape (N, H)
    - dnext_c: Gradients of next cell state, of shape (N, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data, of shape (N, D)
    - dprev_h: Gradient of previous hidden state, of shape (N, H)
    - dprev_c: Gradient of previous cell state, of shape (N, H)
    - dWx: Gradient of input-to-hidden weights, of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weights, of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dh, dc, dWx, dWh, db = None, None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for a single timestep of an LSTM.       #
    #                                                                           #
    # HINT: For sigmoid and tanh you can compute local derivatives in terms of  #
    # the output value from the nonlinearity.                                   #
    #############################################################################
    i, f, o, g, x, Wx, Wh, prev_c, prev_h, next_c = cache
    do = dnext_h * np.tanh(next_c)
    dnext_c = dnext_c + dnext_h * o * (1 - np.tanh(next_c)**2) #! not in place, so the caller's array is left untouched
    di = dnext_c * g
    dg = dnext_c * i
    dprev_c = dnext_c * f
    df = dnext_c * prev_c
    #! d(sigmoid) = sigmoid * (1 - sigmoid)
    #! d(tanh) = 1 - tanh^2
    dtemp = np.hstack([di * i * (1 - i), df * f * (1 - f), do * o * (1 - o), dg * (1 - g**2)])
    dx = np.dot(dtemp, Wx.T)
    dprev_h = np.dot(dtemp, Wh.T)
    dWx = np.dot(x.T, dtemp)
    dWh = np.dot(prev_h.T, dtemp)
    db = np.sum(dtemp, 0)
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################

    return dx, dprev_h, dprev_c, dWx, dWh, db
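The LSTM step can be checked numerically in the same spirit as the vanilla RNN step. The sketch below restates both functions compactly so it runs on its own. It uses the plain sigmoid formula (adequate for the small random values here), and `scalar_loss` and `numerical_grad` are hypothetical helpers written for this example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b):
    H = prev_h.shape[1]
    a = x @ Wx + prev_h @ Wh + b
    i, f, o = sigmoid(a[:, :H]), sigmoid(a[:, H:2 * H]), sigmoid(a[:, 2 * H:3 * H])
    g = np.tanh(a[:, 3 * H:])
    next_c = f * prev_c + i * g
    next_h = o * np.tanh(next_c)
    return next_h, next_c, (i, f, o, g, x, Wx, Wh, prev_c, prev_h, next_c)

def lstm_step_backward(dnext_h, dnext_c, cache):
    i, f, o, g, x, Wx, Wh, prev_c, prev_h, next_c = cache
    tanh_c = np.tanh(next_c)
    do = dnext_h * tanh_c
    dc = dnext_c + dnext_h * o * (1 - tanh_c ** 2)   # both paths into next_c
    di, dg, df, dprev_c = dc * g, dc * i, dc * prev_c, dc * f
    da = np.hstack([di * i * (1 - i), df * f * (1 - f), do * o * (1 - o), dg * (1 - g ** 2)])
    return da @ Wx.T, da @ Wh.T, dprev_c, x.T @ da, prev_h.T @ da, da.sum(axis=0)

rng = np.random.default_rng(1)
N, D, H = 3, 4, 5
x = rng.standard_normal((N, D))
prev_h = rng.standard_normal((N, H))
prev_c = rng.standard_normal((N, H))
Wx = rng.standard_normal((D, 4 * H))
Wh = rng.standard_normal((H, 4 * H))
b = rng.standard_normal(4 * H)
dnext_h = rng.standard_normal((N, H))
dnext_c = rng.standard_normal((N, H))

def scalar_loss():
    # A scalar whose gradients equal the backward pass with upstream dnext_h, dnext_c
    h, c, _ = lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b)
    return np.sum(h * dnext_h) + np.sum(c * dnext_c)

def numerical_grad(arr, eps=1e-5):
    grad = np.zeros_like(arr)
    for idx in np.ndindex(arr.shape):
        old = arr[idx]
        arr[idx] = old + eps
        pos = scalar_loss()
        arr[idx] = old - eps
        neg = scalar_loss()
        arr[idx] = old                     # restore the original value
        grad[idx] = (pos - neg) / (2 * eps)
    return grad

_, _, cache = lstm_step_forward(x, prev_h, prev_c, Wx, Wh, b)
analytic = lstm_step_backward(dnext_h, dnext_c, cache)
params = (x, prev_h, prev_c, Wx, Wh, b)    # same order as the returned gradients
max_err = max(np.max(np.abs(numerical_grad(p) - g)) for p, g in zip(params, analytic))
print(max_err)
```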

lstm_forward()

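This function is the forward pass of the LSTM over a whole sequence. The input x now has a time dimension, and lstm_step_forward() is called once per timestep. The output h is an array holding the hidden state of the LSTM at every timestep. A for loop plays the role of time, so the states collected in the list h stack up with shape [T, N, H]. That is not the axis order we want, so transpose() is used to reorder the axes to [N, T, H].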

def lstm_forward(x, h0, Wx, Wh, b):
    """
    Forward pass for an LSTM over an entire sequence of data. We assume an input
    sequence composed of T vectors, each of dimension D. The LSTM uses a hidden
    size of H, and we work over a minibatch containing N sequences. After running
    the LSTM forward, we return the hidden states for all timesteps.

    Note that the initial hidden state is passed as input, but the initial cell
    state is set to zero. Also note that the cell state is not returned; it is
    an internal variable to the LSTM and is not accessed from outside.

    Inputs:
    - x: Input data of shape (N, T, D)
    - h0: Initial hidden state of shape (N, H)
    - Wx: Weights for input-to-hidden connections, of shape (D, 4H)
    - Wh: Weights for hidden-to-hidden connections, of shape (H, 4H)
    - b: Biases of shape (4H,)

    Returns a tuple of:
    - h: Hidden states for all timesteps of all sequences, of shape (N, T, H)
    - cache: Values needed for the backward pass.
    """
    h, cache = None, None
    #############################################################################
    # TODO: Implement the forward pass for an LSTM over an entire timeseries.   #
    # You should use the lstm_step_forward function that you just defined.      #
    #############################################################################
    T = x.shape[1]
    prev_h = h0
    prev_c = np.zeros_like(h0) #* c has shape (N, H)
    h, cache = [], []
    for t in range(T):
        next_h, next_c, temp_cache = lstm_step_forward(x[:, t, :], prev_h, prev_c, Wx, Wh, b)
        cache.append(temp_cache)
        h.append(next_h)
        prev_h = next_h
        prev_c = next_c
    h = np.array(h).transpose(1, 0, 2)
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################

    return h, cache

lstm_backward()

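The steps are much the same as in rnn_backward(), and the point to watch is again the dnext_h argument passed to lstm_step_backward(). It must be the sum of the gradient arriving from above (dh[:, t, :], the upstream gradient on that timestep's hidden state) and the gradient arriving from the right (from the next timestep, through next_h), not the latter alone. The hidden state at time t feeds the loss directly through its own output, which is where dh[:, t, :] comes from, and it is also passed on to the next state and so affects the loss at later timesteps. The two gradients therefore have to be added.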

def lstm_backward(dh, cache):
    """
    Backward pass for an LSTM over an entire sequence of data.

    Inputs:
    - dh: Upstream gradients of hidden states, of shape (N, T, H)
    - cache: Values from the forward pass

    Returns a tuple of:
    - dx: Gradient of input data of shape (N, T, D)
    - dh0: Gradient of initial hidden state of shape (N, H)
    - dWx: Gradient of input-to-hidden weight matrix of shape (D, 4H)
    - dWh: Gradient of hidden-to-hidden weight matrix of shape (H, 4H)
    - db: Gradient of biases, of shape (4H,)
    """
    dx, dh0, dWx, dWh, db = None, None, None, None, None
    #############################################################################
    # TODO: Implement the backward pass for an LSTM over an entire timeseries.  #
    # You should use the lstm_step_backward function that you just defined.     #
    #############################################################################
    N, T, H = dh.shape
    D = cache[0][4].shape[1]
    ddh = np.zeros((N, H))
    ddc = np.zeros((N, H))
    dx = []
    dWx = np.zeros((D, 4*H))
    dWh = np.zeros((H, 4*H))
    db = np.zeros(4*H)
    for t in range(T - 1, -1, -1):
        ddx, ddh, ddc, ddWx, ddWh, ddb = lstm_step_backward(dh[:, t, :] + ddh, ddc, cache[t])
        dx.append(ddx)
        dWx = dWx + ddWx
        dWh = dWh + ddWh
        db = db + ddb
    dh0 = ddh
    dx = np.array(dx[::-1]).transpose(1, 0, 2)
    ##############################################################################
    #                               END OF YOUR CODE                             #
    ##############################################################################

    return dx, dh0, dWx, dWh, db

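RNN vs. LSTM network diagrams: a summary

RNN_Captioning.ipynb

This notebook only needs to be run to look at the results; there is no code to write in it. The results are in my GitHub repository.

LSTM_Captioning.ipynb

The LSTM functions this part needs were all covered in the rnn_layers.py section above. The results of running this notebook are in my GitHub repository.

net_visualization_pytorch.py

For this part I chose the PyTorch version. TensorFlow has been around longer, but PyTorch usage has grown rapidly in recent years and it is used more in research.

I used the 2020 version of the assignment for this part. Since I am not familiar with PyTorch, my implementation follows this blog post: 2020 cs231n 作业3 笔记 NetworkVisualization-PyTorch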

compute_saliency_maps()

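Introduction to saliency maps

A saliency map can be thought of as a kind of feature map: it measures how strongly each pixel of an image influences the classification result. Take a look at the output of this part first: each image in the bottom row is the saliency map of the image above it.

Implementing the saliency map

Run the model on the input X to get the class scores y_pred
Compute the loss
Take the gradient with respect to the input X, then its absolute value, then the maximum over the three RGB channels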

def compute_saliency_maps(X, y, model):
    """
    Compute a class saliency map using the model for images X and labels y.

    Input:
    - X: Input images; Tensor of shape (N, 3, H, W)
    - y: Labels for X; LongTensor of shape (N,)
    - model: A pretrained CNN that will be used to compute the saliency map.

    Returns:
    - saliency: A Tensor of shape (N, H, W) giving the saliency maps for the input
    images.
    """
    # Make sure the model is in "test" mode
    model.eval()

    # Make input tensor require gradient
    X.requires_grad_()

    saliency = None
    ##############################################################################
    # TODO: Implement this function. Perform a forward and backward pass through #
    # the model to compute the gradient of the correct class score with respect  #
    # to each input image. You first want to compute the loss over the correct   #
    # scores (we'll combine losses across a batch by summing), and then compute  #
    # the gradients with a backward pass.                                        #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    model.zero_grad()
    #! Note: this uses the cross-entropy loss rather than the raw correct-class score
    loss_fun = torch.nn.CrossEntropyLoss()
    y_pred = model(X)
    loss = loss_fun(y_pred, y)
    loss.backward()
    saliency, _ = X.grad.abs().max(dim=1)

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################
    return saliency
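The order of the two reductions in the last step matters. The NumPy sketch below uses a made-up gradient for a single 1x2 image (it stands in for X.grad and involves no model) to show that taking the absolute value first keeps large negative gradients, while taking the maximum first loses them.

```python
import numpy as np

# Hypothetical input gradient for one 1x2 image with 3 channels: shape (N, 3, H, W)
grad = np.array([[[[ 0.2, -0.9]],
                  [[-0.5,  0.1]],
                  [[ 0.3,  0.4]]]])

saliency = np.abs(grad).max(axis=1)   # abs first, then max over channels -> (N, H, W)
wrong = np.abs(grad.max(axis=1))      # max first misses large negative gradients

print(saliency)   # [[[0.5 0.9]]]
print(wrong)      # [[[0.3 0.4]]]
```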

make_fooling_image()

函数目标

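Given an input image X and a target class target_y, this part produces an image X_fooling that looks very similar to X but gets its highest score on the target class (equivalently, the lowest loss against target_y).

Approach

In short: initialize X_fooling = X.clone(), then update X_fooling by gradient descent on the loss until target_y has the highest score.

Note: according to the official function description, the gradient is normalized first when computing the update step:
$update\_step = \frac{learning\_rate \cdot dx}{\|dx\|_2}$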

def make_fooling_image(X, target_y, model):
    """
    Generate a fooling image that is close to X, but that the model classifies
    as target_y.

    Inputs:
    - X: Input image; Tensor of shape (1, 3, 224, 224)
    - target_y: An integer in the range [0, 1000)
    - model: A pretrained CNN

    Returns:
    - X_fooling: An image that is close to X, but that is classifed as target_y
    by the model.
    """
    # Initialize our fooling image to the input image, and make it require gradient
    X_fooling = X.clone()
    X_fooling = X_fooling.requires_grad_()

    learning_rate = 1
    ##############################################################################
    # TODO: Generate a fooling image X_fooling that the model will classify as   #
    # the class target_y. You should perform gradient ascent on the score of the #
    # target class, stopping when the model is fooled.                           #
    # When computing an update step, first normalize the gradient:               #
    #   dX = learning_rate * g / ||g||_2                                         #
    #                                                                            #
    # You should write a training loop.                                          #
    #                                                                            #
    # HINT: For most examples, you should be able to generate a fooling image    #
    # in fewer than 100 iterations of gradient ascent.                           #
    # You can print your progress over iterations to check your algorithm.       #
    ##############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    model.eval()
    iters = 100
    y = torch.LongTensor([target_y])
    loss_fun = torch.nn.CrossEntropyLoss()
    for i in range(iters):
        print("Iteration " + str(i))
        score = model(X_fooling)
        print("score", score.argmax(axis = 1)) #* class with the highest score
        print('y', y) #* target class
        if score.argmax(axis = 1) == y:
            break
        loss = loss_fun(score, y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            g = X_fooling.grad
            dX_fooling = learning_rate * g / torch.norm(g)
            X_fooling -= dX_fooling #! descent on the loss against target_y
            X_fooling.grad.zero_() #! otherwise gradients accumulate across iterations

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ##############################################################################
    #                             END OF YOUR CODE                               #
    ##############################################################################
    return X_fooling
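The normalized update can be illustrated without a CNN. The NumPy sketch below assumes a toy linear "model" (scores = W @ x with an identity W, purely hypothetical) and ascends the target-class score directly, as the official TODO describes, rather than descending a cross-entropy loss as the function above does. Every step has length learning_rate regardless of the gradient's magnitude.

```python
import numpy as np

W = np.eye(3)                      # toy linear model with 3 classes: scores = W @ x
x = np.array([1.0, 0.0, 0.0])      # currently classified as class 0
target_y = 2
learning_rate = 1.0

x_fooling = x.copy()
steps = 0
for steps in range(100):
    scores = W @ x_fooling
    if scores.argmax() == target_y:
        break                                   # the model is fooled, stop
    g = W[target_y]                             # gradient of the target score w.r.t. x
    x_fooling += learning_rate * g / np.linalg.norm(g)   # normalized ascent step

print(steps, x_fooling, (W @ x_fooling).argmax())
```

Here two update steps are enough: after the first the target score only ties with class 0, and after the second it is the largest.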

class_visualization_update_step()

函数作用

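Given an image img and a target class target_y, perform one step of gradient ascent on img to raise its score on the target class.

Implementation details

The goal is to make the score of the target class as high as possible. So instead of taking the gradient of a loss and using gradient descent to lower it, as before, we take the gradient of the target-class score sy with respect to the image and use gradient ascent.

In other words, the difference is in the quantity and the direction: here a score is computed and raised by gradient ascent, where previously a loss was computed and lowered by gradient descent.

One more point: the official function description states that the ascent step combines the gradient of the score with respect to img and the L2 regularization term on img.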

def class_visualization_update_step(img, model, target_y, l2_reg, learning_rate):
    ########################################################################
    # TODO: Use the model to compute the gradient of the score for the     #
    # class target_y with respect to the pixels of the image, and make a   #
    # gradient step on the image using the learning rate. Don't forget the #
    # L2 regularization term!                                              #
    # Be very careful about the signs of elements in your code.            #
    ########################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    model.eval()
    img.requires_grad_()
    score = model(img)
    sy = score[:, target_y] #! score of the target class
    model.zero_grad()
    sy.backward()
    with torch.no_grad():
        #! the objective is sy - l2_reg * ||img||^2, so the regularizer enters with a minus sign
        dimg = img.grad - 2 * l2_reg * img
        dimg /= torch.norm(dimg)
        img += learning_rate * dimg #! gradient ascent on the objective, hence +=
        img.grad.zero_() #! otherwise gradients accumulate across calls

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    ########################################################################
    #                             END OF YOUR CODE                         #
    ########################################################################
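The sign of the regularization term is easy to get wrong, so here is a small NumPy sketch of the idea. It assumes a toy linear score s(x) = w @ x (not the assignment's CNN), leaves out the gradient normalization, and compares ascent on s(x) - l2_reg * ||x||^2 with the same loop run with the regularizer's sign flipped.

```python
import numpy as np

w = np.array([1.0, -2.0])       # gradient of the toy linear score s(x) = w @ x
l2_reg, learning_rate = 0.5, 0.1

def ascend(sign, steps=200):
    x = np.zeros(2)
    for _ in range(steps):
        # sign = -1 gives the gradient of s(x) - l2_reg * ||x||^2
        grad = w + sign * 2 * l2_reg * x
        x = x + learning_rate * grad
    return x

x_minus = ascend(-1.0)   # regularizer subtracted: settles at w / (2 * l2_reg)
x_plus = ascend(+1.0)    # regularizer added: the norm keeps growing
print(x_minus, np.linalg.norm(x_plus))
```

With the minus sign the iterate converges to a bounded optimum; with the plus sign the "regularizer" pushes the pixel values ever further from zero.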

NetworkVisualization.ipynb

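This notebook only needs to be run to look at the results; there is no code to write in it. See my GitHub repository.

StyleTransfer-PyTorch.ipynb

This notebook implements style transfer. I chose the PyTorch version, mainly following 2020 cs231n 作业3 笔记 StyleTransfer-PyTorch. The results are in my GitHub repository.

The loss in this part is made up of three terms: content loss + style loss + total variation loss

content_loss()

This function measures how far the generated image is from the original content image. The difference is not taken on raw pixels but on the feature maps that the previously defined extract_features() function extracts from a convolutional layer. See the explanation of feature maps in the blog post mentioned above.

The content loss is: $L_c = w_c \times \sum_{i,j} (F_{ij}^{\ell} - P_{ij}^{\ell})^2$ where $w_c$ is the loss weight, $\ell$ is the layer the features are taken from (all channels of that layer take part in the content loss), $F$ is the feature map of the generated image and $P$ is the feature map of the original image.

The input parameters are: content_weight, the loss weight $w_c$; content_current, the features of the generated image, i.e. $F$; and content_original, the features of the original image, i.e. $P$.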

def content_loss(content_weight, content_current, content_original):
  """
  Compute the content loss for style transfer.
  
  Inputs:
  - content_weight: Scalar giving the weighting for the content loss.
  - content_current: features of the current image; this is a PyTorch Tensor of shape
    (1, C_l, H_l, W_l).
  - content_original: features of the content image, Tensor with shape (1, C_l, H_l, W_l).
  
  Returns:
  - scalar content loss
  """
  N, C, H, W = content_current.shape
  Lc = content_weight * (content_current - content_original).pow(2).sum()
  return Lc
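A worked number helps to check the formula. The NumPy sketch below uses two made-up 2x2 single-channel feature maps in place of real network features.

```python
import numpy as np

content_weight = 0.5
current = np.array([[[[1.0, 2.0], [3.0, 4.0]]]])     # shape (1, C=1, H=2, W=2)
original = np.array([[[[1.0, 0.0], [0.0, 4.0]]]])

loss = content_weight * np.sum((current - original) ** 2)
print(loss)   # 0.5 * (0 + 4 + 9 + 0) = 6.5
```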

gram_matrix()

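This function prepares for style_loss by computing the Gram matrix. The Gram matrix can be seen as an approximation of the covariance matrix: it captures the correlations between the channels of a feature map, which is what we call style. We want the activation statistics of the generated image to match those of the style image, and matching the (approximate) covariance is one way to do that. There are several ways to do this, but the Gram matrix is a good choice because it is easy to compute and works well in practice.

Given features (the feature map) of shape $(N, C, H, W)$, reshape it to $(N, C, M)$ with $M = H * W$; the output G then has shape $(N, C, C)$. The official formula is:
Given a feature map $F^\ell$ of shape $(C_\ell, M_\ell)$, the Gram matrix has shape $(C_\ell, C_\ell)$ and its elements are given by:
$G_{ij}^\ell = \sum_k F^{\ell}_{ik} F^{\ell}_{jk}$
Here $\ell$ is the index of the layer the feature map comes from and can be ignored in the code: after reshaping to $(N, C, M)$, multiply the matrix by its own transpose.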

def gram_matrix(features, normalize=True):
  """
  Compute the Gram matrix from features.
  
  Inputs:
  - features: PyTorch Variable of shape (N, C, H, W) giving features for
    a batch of N images.
  - normalize: optional, whether to normalize the Gram matrix
      If True, divide the Gram matrix by the number of neurons (H * W * C)
  
  Returns:
  - gram: PyTorch Variable of shape (N, C, C) giving the
    (optionally normalized) Gram matrices for the N input images.
  """
  #! Compute the Gram matrix; see the G_{ij}^l formula in the text above
  N, C, H, W = features.shape
  F = features.view(N, C, H * W) #! flatten the last two axes into M = H * W
  F_t = F.permute(0, 2, 1) #! transpose of F for each sample
  gram = torch.matmul(F, F_t)
  if normalize:
    gram /= (H * W * C)
  return gram
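To see that the batched matrix product really implements $G_{ij} = \sum_k F_{ik} F_{jk}$, the result can be compared with the defining sum written out entry by entry. This is a NumPy sketch with random made-up features rather than PyTorch tensors, and it skips the optional normalization.

```python
import numpy as np

rng = np.random.default_rng(0)
N, C, H, W = 2, 3, 4, 5
features = rng.standard_normal((N, C, H, W))

F = features.reshape(N, C, H * W)
gram = F @ F.transpose(0, 2, 1)            # batched F F^T, shape (N, C, C)

# Reference: the defining sum, one entry at a time
ref = np.zeros((N, C, C))
for n in range(N):
    for i in range(C):
        for j in range(C):
            ref[n, i, j] = np.sum(F[n, i] * F[n, j])

is_symmetric = np.allclose(gram, gram.transpose(0, 2, 1))
print(gram.shape, np.allclose(gram, ref), is_symmetric)
```

Each Gram matrix is symmetric, since entry (i, j) and entry (j, i) are the same inner product of channels i and j.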

style_loss()

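The style loss is:
$L_s^\ell = w_\ell \sum_{i, j} \left(G^\ell_{ij} - A^\ell_{ij}\right)^2$
where $\ell$ means the same as before, the layer the feature map comes from; $G$ is the Gram matrix of the layer-$\ell$ feature map of the generated image, and $A$ is the Gram matrix of the layer-$\ell$ feature map of the source style image.

Note: in practice the style loss is usually computed over a set of layers $\mathcal{L}$ rather than a single layer $\ell$, so the total style loss $L_s$ is the sum of the style losses of the individual layers:
$L_s = \sum_{\ell \in \mathcal{L}} L_s^\ell$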

# Now put it together in the style_loss function...
def style_loss(feats, style_layers, style_targets, style_weights):
  """
  Computes the style loss at a set of layers.
  
  Inputs:
  - feats: list of the features at every layer of the current image, as produced by
    the extract_features function.
  - style_layers: List of layer indices into feats giving the layers to include in the
    style loss.
  - style_targets: List of the same length as style_layers, where style_targets[i] is
    a PyTorch Variable giving the Gram matrix the source style image computed at
    layer style_layers[i].
  - style_weights: List of the same length as style_layers, where style_weights[i]
    is a scalar giving the weight for the style loss at layer style_layers[i].
    
  Returns:
  - style_loss: A PyTorch Variable holding a scalar giving the style loss.
  """
  # Hint: you can do this with one for loop over the style layers, and should
  # not be very much code (~5 lines). You will need to use your gram_matrix function.
  style_current = []
  style_loss = 0
  for i, idx in enumerate(style_layers):
    #! The Gram matrix captures the correlations between the channels of a feature map, i.e. the style, so it can be treated as a style matrix; feats[idx] holds the current image's features at layer idx
    style_current.append(gram_matrix(feats[idx].clone()))
    style_loss += style_weights[i] * torch.sum((style_current[i] - style_targets[i])**2)
  return style_loss
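
To make the weighted sum over layers concrete, here is a small plain-Python sketch with hand-picked Gram matrices. `style_loss_plain` and the numbers are illustrative assumptions only; the real function receives feature tensors and calls gram_matrix itself.

```python
def style_loss_plain(current_grams, target_grams, weights):
    """Sum over layers of w * sum_ij (G_ij - A_ij)^2, with Gram matrices as nested lists."""
    total = 0.0
    for G, A, w in zip(current_grams, target_grams, weights):
        layer_loss = sum((g - a) ** 2
                         for g_row, a_row in zip(G, A)
                         for g, a in zip(g_row, a_row))
        total += w * layer_loss
    return total


# Layer 1: squared differences are 0 + 4 + 4 + 1 = 9, weighted by 2.0.
G1, A1 = [[1.0, 2.0], [2.0, 5.0]], [[1.0, 0.0], [0.0, 4.0]]
# Layer 2: squared difference is (3 - 1)^2 = 4, weighted by 0.5.
G2, A2 = [[3.0]], [[1.0]]

loss = style_loss_plain([G1, G2], [A1, A2], [2.0, 0.5])
print(loss)  # 20.0
```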

tv_loss()

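To make the image smoother, tv_loss() adds a term to the loss that penalizes wiggles in the pixel values, which acts as denoising.

Implementation idea: sum the squared differences of all pairs of pixels that are adjacent horizontally or vertically. The total variation is summed over the 3 input channels (RGB), and the sum is weighted by the total-variation weight $w_t$:

$L_{tv} = w_t \times \sum_{c=1}^3\sum_{i=1}^{H-1} \sum_{j=1}^{W-1} \left( (x_{i,j+1, c} - x_{i,j,c})^2 + (x_{i+1, j,c} - x_{i,j,c})^2 \right)$

The indices here start at 1, while in code they start at 0. Note that the vectorized code below slices over complete rows and columns, so it also includes the differences along the last row and the last column.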

def tv_loss(img, tv_weight):
  """
  Compute total variation loss.
  
  Inputs:
  - img: PyTorch Variable of shape (1, 3, H, W) holding an input image.
  - tv_weight: Scalar giving the weight w_t to use for the TV loss.
  
  Returns:
  - loss: PyTorch Variable holding a scalar giving the total variation loss
    for img weighted by tv_weight.
  """
  # Your implementation should be vectorized and not require any loops!
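  #* Total variation loss smooths the image. In signal processing, total variation denoising (also called total variation regularization) is widely used in digital image processing for noise removal.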
  N, C, H, W = img.shape
  x1 = img[:, :, 0:H-1, :]
  x2 = img[:, :, 1:H, :]
  x3 = img[:, :, :, 0:W-1]
  x4 = img[:, :, :, 1:W]
  loss = tv_weight * ((x4 - x3).pow(2).sum() + (x2 - x1).pow(2).sum())
  return loss
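
One way to convince yourself that the slicing is right is to compare it against an explicit loop. The sketch below is such a loop version in plain Python; `tv_loss_loops` and the nested-list image are assumptions made for illustration. It mirrors the sliced code above, so it counts every horizontally or vertically adjacent pair, including pairs on the last row and column.

```python
def tv_loss_loops(img, tv_weight):
    """Total variation of an image given as nested lists of shape (C, H, W)."""
    total = 0.0
    for channel in img:
        H, W = len(channel), len(channel[0])
        for i in range(H):
            for j in range(W):
                if i + 1 < H:  # vertical neighbour, corresponds to x2 - x1
                    total += (channel[i + 1][j] - channel[i][j]) ** 2
                if j + 1 < W:  # horizontal neighbour, corresponds to x4 - x3
                    total += (channel[i][j + 1] - channel[i][j]) ** 2
    return tv_weight * total


# One channel, 2x2 pixels.
# Vertical differences: (3-0)^2 + (6-1)^2 = 34; horizontal: (1-0)^2 + (6-3)^2 = 10.
img = [[[0.0, 1.0],
        [3.0, 6.0]]]
loss = tv_loss_loops(img, tv_weight=0.5)
print(loss)  # 22.0
```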

GANs-PyTorch.ipynb / Generative_Adversarial_Networks_PyTorch.ipynb

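This part mainly follows the 2020 cs231n assignment 3 notes on Generative_Adversarial_Networks_PyTorch; the run results can be found in my GitHub repository.

A brief summary of GANs

A GAN consists of two parts, a generator and a discriminator. The generator tries to produce images good enough that the discriminator takes them for real, i.e. gives them a score close to 1. The discriminator tries to score the real images we feed it close to 1 and the generator's fake images close to 0. Once training is done, the generator can be used to produce the images we want. The training objective can be summarized as follows (quoting the official explanation):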
We can think of this back and forth process of the generator ( $G$ ) trying to fool the discriminator ( $D$ ), and the discriminator trying to correctly classify real vs. fake as a minimax game:
$\underset{G}{\text{minimize}}\; \underset{D}{\text{maximize}}\; \mathbb{E}_{x \sim p_\text{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log \left(1-D(G(z))\right)\right]$

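The generator and the discriminator are updated alternately (so that one lagging network does not hold the other back and make the current state look like the best possible one). The steps and their objective functions are:

Update the generator (G) to maximize the probability that the discriminator makes the wrong choice on generated data:
$\underset{G}{\text{maximize}}\; \mathbb{E}_{z \sim p(z)}\left[\log D(G(z))\right]$
Update the discriminator (D) to maximize the probability that the discriminator makes the correct choice on real and generated data:
$\underset{D}{\text{maximize}}\; \mathbb{E}_{x \sim p_\text{data}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p(z)}\left[\log \left(1-D(G(z))\right)\right]$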

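Summary of the GAN training algorithm

In every training iteration the discriminator is trained first and the generator second. For each of the discriminator's k training steps, we sample a minibatch of noise from the prior $p(z)$ and a minibatch of real samples $x$ from the training data. The noise is passed through the generator to obtain fake images, so we end up with one minibatch of fake images and one minibatch of real images. We compute the gradient of the discriminator loss on these two minibatches and use it to update the discriminator's parameters; repeating this k times trains the discriminator for a while. The second step trains the generator: we sample a minibatch of noise, pass it through the generator, and compute the gradient of the generator's objective to update the generator. These two steps alternate, and the two networks have to be kept in balance.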
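
The schedule described above can be sketched as a tiny loop. Everything here is hypothetical scaffolding (`train_gan`, `d_step`, `g_step` are names I made up): the two callbacks stand in for "sample batches and update D" and "sample noise and update G", and the sketch only shows the order in which they run.

```python
def train_gan(num_iters, k, d_step, g_step):
    """Alternate k discriminator updates with one generator update per iteration."""
    for it in range(num_iters):
        for _ in range(k):
            d_step(it)  # sample noise and a real minibatch, then update D
        g_step(it)      # sample noise, then update G


# Record the order of updates instead of training real networks.
log = []
train_gan(num_iters=2, k=2,
          d_step=lambda it: log.append(("D", it)),
          g_step=lambda it: log.append(("G", it)))
print(log)
# [('D', 0), ('D', 0), ('G', 0), ('D', 1), ('D', 1), ('G', 1)]
```

With k = 1 this reduces to one discriminator update followed by one generator update in every iteration.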

sample_noise()

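The generator G produces images starting from noise, so this function generates the random noise that serves as G's input. It has to return data of shape [batch_size, dim] with values uniformly distributed between -1 and 1.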

def sample_noise(batch_size, dim):
  """
  Generate a PyTorch Tensor of uniform random noise.

  Input:
  - batch_size: Integer giving the batch size of noise to generate.
  - dim: Integer giving the dimension of noise to generate.
  
  Output:
  - A PyTorch Tensor of shape (batch_size, dim) containing uniform
    random noise in the range (-1, 1).
  """
  ##############################################################################
  # TODO: Sample uniform noise in the range (-1, 1)                            #
  ##############################################################################
  # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
  
  #* Note that the generator G starts from noise to produce images
  out = torch.rand(batch_size, dim) #! uniform samples in [0, 1)
  out = 2 * out - 1 #! rescale the samples to [-1, 1)
  return out

  # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
  ##############################################################################
  #                               END OF YOUR CODE                             #
  ##############################################################################
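
The rescaling trick does not depend on PyTorch. As a minimal sketch, assuming a hypothetical helper `sample_noise_plain` built on the standard library's `random` module, the same mapping u -> 2u - 1 takes uniform samples from [0, 1) to [-1, 1):

```python
import random


def sample_noise_plain(batch_size, dim, rng):
    """Uniform noise in [-1, 1) as a nested list of shape (batch_size, dim)."""
    # rng.random() is uniform on [0, 1); 2 * u - 1 stretches it to [-1, 1).
    return [[2.0 * rng.random() - 1.0 for _ in range(dim)]
            for _ in range(batch_size)]


rng = random.Random(0)  # fixed seed so the run is repeatable
noise = sample_noise_plain(batch_size=4, dim=3, rng=rng)
print(len(noise), len(noise[0]))  # 4 3
```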

discriminator()

The architecture of the discriminator D is:

Fully connected layer from size 784 to 256
LeakyReLU with alpha 0.01
Fully connected layer from 256 to 256
LeakyReLU with alpha 0.01
Fully connected layer from 256 to 1

def discriminator():
    """
    Build and return a PyTorch model implementing the architecture above.
    """
    model = nn.Sequential(
        ##############################################################################
        # TODO: Implement architecture                                               #
        #                                                                            #
        # HINT: nn.Sequential might be helpful.                                      #
        ##############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        
        Flatten(),
        nn.Linear(784, 256),
        nn.LeakyReLU(0.01),
        nn.Linear(256, 256),
        nn.LeakyReLU(0.01),
        nn.Linear(256, 1),

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ##############################################################################
        #                               END OF YOUR CODE                             #
        ##############################################################################
    )
    return model

generator()

The architecture of the generator G is:

Fully connected layer from noise_dim to 1024
ReLU
Fully connected layer with size 1024
ReLU
Fully connected layer with size 784
TanH (To clip the image to be [-1,1])

def generator(noise_dim=NOISE_DIM):
    """
    Build and return a PyTorch model implementing the architecture above.
    """
    model = nn.Sequential(
        ##############################################################################
        # TODO: Implement architecture                                               #
        #                                                                            #
        # HINT: nn.Sequential might be helpful.                                      #
        ##############################################################################
        # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        
        nn.Linear(noise_dim, 1024),
        nn.ReLU(),
        nn.Linear(1024, 1024),
        nn.ReLU(),
        nn.Linear(1024, 784),
        nn.Tanh(),

        # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
        ##############################################################################
        #                               END OF YOUR CODE                             #
        ##############################################################################
    )
    return model

bce_loss()

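bce_loss computes the binary cross-entropy loss, i.e. the loss of a prediction given its target label; both the discriminator D and the generator G use it. Given a raw score $s\in\mathbb{R}$ and a label $y\in\{0, 1\}$, the binary cross-entropy loss is: $bce(s, y) = -y \log(\sigma(s)) - (1 - y) \log(1 - \sigma(s))$, where $\sigma$ is the sigmoid function that turns the score into a probability. The code below evaluates this expression directly from the raw scores in a numerically stable form.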

def bce_loss(input, target):
    """
    Numerically stable version of the binary cross-entropy loss function.

    As per https://github.com/pytorch/pytorch/issues/751
    See the TensorFlow docs for a derivation of this formula:
    https://www.tensorflow.org/api_docs/python/tf/nn/sigmoid_cross_entropy_with_logits

    Inputs:
    - input: PyTorch Variable of shape (N, ) giving scores.
    - target: PyTorch Variable of shape (N,) containing 0 and 1 giving targets.

    Returns:
    - A PyTorch Variable containing the mean BCE loss over the minibatch of input data.
    """
    neg_abs = - input.abs()
    loss = input.clamp(min=0) - input * target + (1 + neg_abs.exp()).log()
    return loss.mean()
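
To see why the rearranged expression matters, here is a scalar sketch in plain Python that compares it with the textbook formula. The function names `bce_stable` and `bce_naive` are my own; the stable form is the same max(x, 0) - x * y + log(1 + exp(-|x|)) expression used in bce_loss above.

```python
import math


def bce_stable(x, y):
    """Binary cross-entropy on a raw score x with label y, stable for large |x|."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def bce_naive(x, y):
    """Textbook formula: apply the sigmoid first, then take logs."""
    s = 1.0 / (1.0 + math.exp(-x))
    return -y * math.log(s) - (1 - y) * math.log(1 - s)


# For moderate scores both versions agree.
for x in (-3.0, -0.5, 0.0, 2.0):
    for y in (0.0, 1.0):
        assert abs(bce_stable(x, y) - bce_naive(x, y)) < 1e-9

# For a large score the sigmoid rounds to exactly 1.0 and log(1 - s) fails,
# while the stable form still returns the correct loss.
print(bce_stable(1000.0, 0.0))  # 1000.0
try:
    bce_naive(1000.0, 0.0)
except ValueError:
    print("naive version fails: log(0)")
```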

discriminator_loss()

The discriminator loss is defined as:
$\ell_D = -\mathbb{E}_{x \sim p_\text{data}}\left[\log D(x)\right] - \mathbb{E}_{z \sim p(z)}\left[\log \left(1-D(G(z))\right)\right]$
where $x$ is a real input image, $z$ is the input noise, $G(z)$ is the fake image the generator produces from it, and $D(\cdot)$ is the discriminator's score for an image.

def discriminator_loss(logits_real, logits_fake):
    """
    Computes the discriminator loss described above.
    
    Inputs:
    - logits_real: PyTorch Variable of shape (N,) giving scores for the real data.
    - logits_fake: PyTorch Variable of shape (N,) giving scores for the fake data.
    
    Returns:
    - loss: PyTorch Variable containing (scalar) the loss for the discriminator.
    """       
    
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    real = torch.ones_like(logits_real).type(dtype) #! the ideal label for real data is 1
    fake = torch.zeros_like(logits_fake).type(dtype) #! the ideal label for fake data is 0
    real_loss = bce_loss(logits_real, real) #! the discriminator's loss on real data
    fake_loss = bce_loss(logits_fake, fake) #! the discriminator's loss on fake data
    loss = real_loss + fake_loss
    
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss
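
Two reference values help when debugging this loss. The sketch below recomputes it in plain Python on hand-picked logits; `discriminator_loss_plain` and the helper names are assumptions for illustration. An undecided discriminator (all logits 0) gives exactly 2 log 2, and a confident, correct discriminator gives a loss near 0.

```python
import math


def bce(x, y):
    """Stable binary cross-entropy on a raw score x with label y."""
    return max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))


def mean(values):
    return sum(values) / len(values)


def discriminator_loss_plain(logits_real, logits_fake):
    real_loss = mean([bce(x, 1.0) for x in logits_real])  # real images, label 1
    fake_loss = mean([bce(x, 0.0) for x in logits_fake])  # fake images, label 0
    return real_loss + fake_loss


undecided = discriminator_loss_plain([0.0, 0.0], [0.0, 0.0])
confident = discriminator_loss_plain([6.0, 6.0], [-6.0, -6.0])
print(undecided, 2 * math.log(2))  # both are about 1.386
print(confident)                   # close to 0
```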

generator_loss()

The generator loss is: $\ell_G = -\mathbb{E}_{z \sim p(z)}\left[\log D(G(z))\right]$ where $z$ is the input noise, $G(z)$ is the fake image the generator produces from it, and $D(\cdot)$ is the discriminator's score for an image.

def generator_loss(logits_fake):
    """
    Computes the generator loss described above.

    Inputs:
    - logits_fake: PyTorch Variable of shape (N,) giving scores for the fake data.
    
    Returns:
    - loss: PyTorch Variable containing the (scalar) loss for the generator.
    """
    
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

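    fake = torch.ones_like(logits_fake).type(dtype) #! all-ones target labels used to compute the loss
    #! The generator's loss measures how far the discriminator's scores for the generated images are from label 1, the label that means the discriminator is fully certain an image is real
    loss = bce_loss(logits_fake, fake)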
    
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
    return loss
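
A short calculation suggests why the generator maximizes log D(G(z)) rather than minimizing log(1 - D(G(z))). Writing D = sigmoid(x) for a fake image's logit x, the derivatives with respect to x are sigmoid(x) - 1 for -log(sigmoid(x)) and -sigmoid(x) for log(1 - sigmoid(x)). The sketch below, with function names of my own choosing, evaluates both at a logit where the discriminator confidently rejects the fake.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def grad_generator_loss(x):
    """d/dx of -log(sigmoid(x)), the loss computed by generator_loss."""
    return sigmoid(x) - 1.0


def grad_minimax_term(x):
    """d/dx of log(1 - sigmoid(x)), the generator term of the minimax objective."""
    return -sigmoid(x)


x = -5.0  # the discriminator is confident that the image is fake
print(grad_generator_loss(x))  # about -0.993: a strong learning signal
print(grad_minimax_term(x))    # about -0.007: almost no signal
```

Early in training the discriminator rejects most generated images, which is exactly the regime where the second gradient nearly vanishes.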

get_optimizer()

This function defines the optimizer, which uses the Adam algorithm.

def get_optimizer(model):
    """
    Construct and return an Adam optimizer for the model with learning rate 1e-3,
    beta1=0.5, and beta2=0.999.
    
    Input:
    - model: A PyTorch model that we want to optimize.
    
    Returns:
    - An Adam optimizer for the model with the desired hyperparameters.
    """
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    optimizer = optim.Adam(model.parameters(), lr=1e-3, betas=(0.5, 0.999))
    
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return optimizer
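
For readers who want to see what these hyperparameters control, here is a minimal scalar sketch of one Adam update in plain Python. `adam_step` is a hypothetical helper following the standard Adam update rule; it is not how `optim.Adam` is implemented internally, and only the default values mirror the ones passed above (eps is assumed to be 1e-8).

```python
import math


def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)               # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# On the first step the update has magnitude close to lr, whatever the gradient scale.
param, m, v = adam_step(param=1.0, grad=0.3, m=0.0, v=0.0, t=1)
print(param)  # about 0.999
```

A beta1 of 0.5 makes the running mean of gradients forget old gradients faster than the common default of 0.9.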

ls_discriminator_loss()

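Least Squares GAN computes the GAN loss in a different way. It is a newer, more stable alternative to the original GAN loss function; for this part all we have to do is change the loss function and retrain the model.
The discriminator loss is defined as: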
$\ell_D = \frac{1}{2}\mathbb{E}_{x \sim p_\text{data}}\left[\left(D(x)-1\right)^2\right] + \frac{1}{2}\mathbb{E}_{z \sim p(z)}\left[ \left(D(G(z))\right)^2\right]$

def ls_discriminator_loss(scores_real, scores_fake):
    """
    Compute the Least-Squares GAN loss for the discriminator.
    
    Inputs:
    - scores_real: PyTorch Variable of shape (N,) giving scores for the real data.
    - scores_fake: PyTorch Variable of shape (N,) giving scores for the fake data.
    
    Outputs:
    - loss: A PyTorch Variable containing the loss.
    """
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    real_loss = 0.5 * (scores_real - 1).pow(2)
    fake_loss = 0.5 * scores_fake.pow(2)
    real_loss = real_loss.mean()
    fake_loss = fake_loss.mean()
    loss = real_loss + fake_loss
 
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    
    return loss

ls_generator_loss()

The generator loss is defined as:
$\ell_G = \frac{1}{2}\mathbb{E}_{z \sim p(z)}\left[\left(D(G(z))-1\right)^2\right]$

def ls_generator_loss(scores_fake):
    """
    Computes the Least-Squares GAN loss for the generator.
    
    Inputs:
    - scores_fake: PyTorch Variable of shape (N,) giving scores for the fake data.
    
    Outputs:
    - loss: A PyTorch Variable containing the loss.
    """
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
 
    loss = 0.5 * (scores_fake - 1).pow(2)
    loss = loss.mean()
 
    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****
    return loss
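
Both least-squares losses are easy to check by hand. The following plain-Python sketch uses made-up scores, and the `*_plain` function names are my own; it reproduces the two formulas on lists of numbers.

```python
def mean(values):
    return sum(values) / len(values)


def ls_discriminator_loss_plain(scores_real, scores_fake):
    real_loss = 0.5 * mean([(s - 1.0) ** 2 for s in scores_real])  # push real scores to 1
    fake_loss = 0.5 * mean([s ** 2 for s in scores_fake])          # push fake scores to 0
    return real_loss + fake_loss


def ls_generator_loss_plain(scores_fake):
    return 0.5 * mean([(s - 1.0) ** 2 for s in scores_fake])       # push fake scores to 1


scores_real = [1.0, 0.0]
scores_fake = [0.0, 2.0]
d_loss = ls_discriminator_loss_plain(scores_real, scores_fake)
g_loss = ls_generator_loss_plain(scores_fake)
print(d_loss)  # 0.5 * 0.5 + 0.5 * 2.0 = 1.25
print(g_loss)  # 0.5 * 1.0 = 0.5
```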

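build_dc_classifier()

This part implements Deeply Convolutional GANs. In the previous part we implemented Ian Goodfellow's original GAN. That architecture, however, allows no real spatial reasoning: because it has no convolutional layers, it generally cannot reason about things such as "sharp edges". In this section we therefore implement some of the ideas from DCGAN, i.e. a deep convolutional GAN.

We use a discriminator inspired by the TensorFlow MNIST classification tutorial, which reaches above 99% accuracy on the MNIST dataset fairly quickly. The architecture of the discriminator D is: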

Reshape into image tensor (Use Unflatten!)
32 Filters, 5x5, Stride 1, Leaky ReLU(alpha=0.01)
Max Pool 2x2, Stride 2
64 Filters, 5x5, Stride 1, Leaky ReLU(alpha=0.01)
Max Pool 2x2, Stride 2
Flatten
Fully Connected size 4 x 4 x 64, Leaky ReLU(alpha=0.01)
Fully Connected size 1

def build_dc_classifier(batch_size):
    """
    Build and return a PyTorch model for the DCGAN discriminator implementing
    the architecture above.
    """
    return nn.Sequential(
        
        Unflatten(batch_size, 1, 28, 28),
        ###########################
        ######### TO DO ###########
        nn.Conv2d(1, 32, 5),
        nn.LeakyReLU(0.01),
        nn.MaxPool2d(2,2),
        nn.Conv2d(32, 64, 5),
        nn.LeakyReLU(0.01),
        nn.MaxPool2d(2,2),
        Flatten(),
        nn.Linear(64*4*4, 4*4*64),
        nn.LeakyReLU(0.01),
        nn.Linear(64*4*4, 1),
        ###########################
        
    )
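
The input size 4 x 4 x 64 of the first fully connected layer follows from the layer shapes. As a small sketch, assuming unpadded convolutions and pooling with dilation 1 (which is what the calls above use), the spatial size can be traced with the usual output-size formula; `conv_out` is a helper name I made up.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 28                                  # MNIST images are 28 x 28
size = conv_out(size, kernel=5)            # Conv2d(1, 32, 5)   -> 24
size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(2, 2)    -> 12
size = conv_out(size, kernel=5)            # Conv2d(32, 64, 5)  -> 8
size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(2, 2)    -> 4
flat = 64 * size * size                    # features entering the first Linear layer
print(size, flat)  # 4 1024
```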

build_dc_generator()

The architecture of the generator G is:

Fully connected with output size 1024
ReLU
BatchNorm
Fully connected with output size 7 x 7 x 128
ReLU
BatchNorm
Reshape into Image Tensor of shape 7, 7, 128
Conv2D^T (Transpose): 64 filters of 4x4, stride 2, ‘same’ padding (use padding=1)
ReLU
BatchNorm
Conv2D^T (Transpose): 1 filter of 4x4, stride 2, ‘same’ padding (use padding=1)
TanH
Should have a 28x28x1 image, reshape back into 784 vector

def build_dc_generator(noise_dim=NOISE_DIM):
    """
    Build and return a PyTorch model implementing the DCGAN generator using
    the architecture described above.
    """
    return nn.Sequential(
        ###########################
        ######### TO DO ###########
        nn.Linear(noise_dim, 1024),
        nn.ReLU(),
        nn.BatchNorm1d(1024), #* the argument is the number of features
        nn.Linear(1024, 7*7*128),
        nn.ReLU(),
        nn.BatchNorm1d(7*7*128),
        Unflatten(-1, 128, 7, 7),
        nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), #! https://zhuanlan.zhihu.com/p/48501100 introduces transposed convolution ("deconvolution"), one of the upsampling methods
        nn.ReLU(),
        nn.BatchNorm2d(64),
        nn.ConvTranspose2d(64, 1, kernel_size=4, stride=2, padding=1),
        nn.Tanh(),
        Flatten(),
        
        ###########################
    )
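
The two transposed convolutions are what bring the 7 x 7 feature map up to 28 x 28. A minimal sketch of the size arithmetic, assuming dilation 1 and no output padding as in the calls above (`conv_transpose_out` is a hypothetical helper name):

```python
def conv_transpose_out(size, kernel, stride, padding):
    """Output size of a transposed convolution along one spatial dimension."""
    return (size - 1) * stride - 2 * padding + kernel


size = 7
size = conv_transpose_out(size, kernel=4, stride=2, padding=1)  # 7  -> 14
size = conv_transpose_out(size, kernel=4, stride=2, padding=1)  # 14 -> 28
print(size, size * size)  # 28 784
```

With kernel 4, stride 2 and padding 1 every transposed convolution exactly doubles the spatial size, so the final Flatten() yields the 784-dimensional vector the rest of the notebook expects.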
