ConvolutionalNetworks.ipynb
Implement conv_forward_naive in cs231n/layers.py. This is a naive implementation, so you don't have to worry much about efficiency.
def conv_forward_naive(x, w, b, conv_param):
"""
A naive implementation of the forward pass for a convolutional layer.
The input consists of N data points, each with C channels, height H and
width W. We convolve each input with F different filters, where each filter
spans all C channels and has height HH and width WW.
Input:
- x: Input data of shape (N, C, H, W)
- w: Filter weights of shape (F, C, HH, WW)
- b: Biases, of shape (F,)
- conv_param: A dictionary with the following keys:
- 'stride': The number of pixels between adjacent receptive fields in the
horizontal and vertical directions.
- 'pad': The number of pixels that will be used to zero-pad the input.
      During padding, 'pad' zeros should be placed symmetrically (i.e., equally on both
      sides) along the height and width axes of the input. Be careful not to modify
      the original input x directly.
Returns a tuple of:
- out: Output data, of shape (N, F, H', W') where H' and W' are given by
H' = 1 + (H + 2 * pad - HH) / stride
W' = 1 + (W + 2 * pad - WW) / stride
- cache: (x, w, b, conv_param)
"""
out = None
# Hint: you can use the function np.pad for padding. #
stride,pad = conv_param["stride"],conv_param["pad"]
N, C, H, W = x.shape
F, C, HH, WW = w.shape
x_pad = np.pad(x,((0,0),(0,0),(pad,pad),(pad,pad)),
mode = 'constant',constant_values = 0)
# out of shape (N,F,out_H,out_W)
out_H = 1 + (H + 2 * pad - HH) // stride
out_W = 1 + (W + 2 * pad - WW) // stride
out = np.zeros((N,F,out_H,out_W))
for i in range(out_H):
for j in range(out_W):
            # the receptive-field window that produces output position (i, j)
            x_pad_masked = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            # compute each output channel separately
            for f in range(F):
                # broadcasting applies this to all N inputs at once
                # https://cs231n.github.io/python-numpy-tutorial/
                out[:, f, i, j] = np.sum(x_pad_masked * w[f, :, :, :], axis=(1, 2, 3))
    # add the bias to each output channel; b is broadcast as (1, F, 1, 1)
    # out = out + (b)[None, :, None, None]
    # broadcasting rule: when ranks differ, 1s are prepended to the smaller
    # shape until both shapes have the same length
    out = out + b.reshape((1, F, 1, 1))
cache = (x, w, b, conv_param)
return out, cache
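Before moving on, a quick shape sanity check in the spirit of the notebook's test cell (a sketch; the notebook also compares against hard-coded correct values, omitted here):
x_shape = (2, 3, 4, 4)
w_shape = (3, 3, 4, 4)
x = np.linspace(-0.1, 0.5, num=np.prod(x_shape)).reshape(x_shape)
w = np.linspace(-0.2, 0.3, num=np.prod(w_shape)).reshape(w_shape)
b = np.linspace(-0.1, 0.2, num=3)
conv_param = {'stride': 2, 'pad': 1}
out, _ = conv_forward_naive(x, w, b, conv_param)
print(out.shape)  # (2, 3, 2, 2): H' = 1 + (4 + 2*1 - 4) // 2 = 2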
To check the implementation and get a better sense of the kinds of operations a convolutional layer can perform, we set up an input containing two images and hand-pick filters that implement common image-processing operations (grayscale conversion and edge detection). The convolution forward pass applies each of these filters to each input image; we then visualize the results as a sanity check.
# Note: scipy.misc.imread and imresize were removed in newer SciPy releases;
# recent versions of the assignment use imageio / PIL for the same purpose.
from scipy.misc import imread, imresize
kitten, puppy = imread('kitten.jpg'), imread('puppy.jpg')
# kitten is wide, and puppy is already square
d = kitten.shape[1] - kitten.shape[0]
kitten_cropped = kitten[:, d//2:-d//2, :]
img_size = 200 # Make this smaller if it runs too slow
x = np.zeros((2, 3, img_size, img_size))
x[0, :, :, :] = imresize(puppy, (img_size, img_size)).transpose((2, 0, 1))
x[1, :, :, :] = imresize(kitten_cropped, (img_size, img_size)).transpose((2, 0, 1))
# Set up convolutional weights holding 2 filters, each 3x3
w = np.zeros((2, 3, 3, 3))
# The first filter converts the image to grayscale.
# Set up the red, green, and blue channels of the filter.
w[0, 0, :, :] = [[0, 0, 0], [0, 0.3, 0], [0, 0, 0]]
w[0, 1, :, :] = [[0, 0, 0], [0, 0.6, 0], [0, 0, 0]]
w[0, 2, :, :] = [[0, 0, 0], [0, 0.1, 0], [0, 0, 0]]
# Second filter detects horizontal edges in the blue channel.
w[1, 2, :, :] = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]
# Vector of biases. We don't need any bias for the grayscale
# filter, but for the edge detection filter we want to add 128
# to each output so that nothing is negative.
b = np.array([0, 128])
# Compute the result of convolving each input in x with each filter in w,
# offsetting by b, and storing the results in out.
out, _ = conv_forward_naive(x, w, b, {'stride': 1, 'pad': 1})
def imshow_noax(img, normalize=True):
""" Tiny helper to show images as uint8 and remove axis labels """
if normalize:
img_max, img_min = np.max(img), np.min(img)
img = 255.0 * (img - img_min) / (img_max - img_min)
plt.imshow(img.astype('uint8'))
plt.gca().axis('off')
# Show the original images and the results of the conv operation
plt.subplot(2, 3, 1)
imshow_noax(puppy, normalize=False)
plt.title('Original image')
plt.subplot(2, 3, 2)
imshow_noax(out[0, 0])
plt.title('Grayscale')
plt.subplot(2, 3, 3)
imshow_noax(out[0, 1])
plt.title('Edges')
plt.subplot(2, 3, 4)
imshow_noax(kitten_cropped, normalize=False)
plt.subplot(2, 3, 5)
imshow_noax(out[1, 0])
plt.subplot(2, 3, 6)
imshow_noax(out[1, 1])
plt.show()
As the visualization shows, the first filter performs a grayscale-like conversion: each output pixel is 0.3*R + 0.6*G + 0.1*B. The second filter detects horizontal edges in the blue channel.
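Since the grayscale filter has a single nonzero tap at its center, and stride 1 with pad 1 preserves the spatial size, the first output channel should equal the per-pixel weighted channel sum exactly. A quick check of that claim (not part of the original notebook):
# filter 0 only looks at the center pixel, so convolving with it is the
# same as a per-pixel weighted sum of the three channels
r, g, bl = x[0, 0], x[0, 1], x[0, 2]
gray_direct = 0.3 * r + 0.6 * g + 0.1 * bl
print(np.max(np.abs(out[0, 0] - gray_direct)))  # should be ~0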
Next, the backward pass. Again the goal is just correctness, not efficiency.
def conv_backward_naive(dout, cache):
"""
A naive implementation of the backward pass for a convolutional layer.
Inputs:
- dout: Upstream derivatives.
- cache: A tuple of (x, w, b, conv_param) as in conv_forward_naive
Returns a tuple of:
- dx: Gradient with respect to x
- dw: Gradient with respect to w
- db: Gradient with respect to b
"""
dx, dw, db = None, None, None
# TODO: Implement the convolutional backward pass.
(x, w, b, conv_param) = cache
stride, pad = conv_param["stride"], conv_param["pad"]
N, C, H, W = x.shape
F, C, HH, WW = w.shape
out_H = 1 + (H + 2 * pad - HH) // stride
out_W = 1 + (W + 2 * pad - WW) // stride
x_pad = np.pad(x, ((0, 0), (0, 0), (pad, pad), (pad, pad)),
mode='constant', constant_values=0)
    dx = np.zeros_like(x)
    dw = np.zeros_like(w)
    dx_pad = np.zeros_like(x_pad)
    # each bias touches every output position of its channel
    db = np.sum(dout, axis=(0, 2, 3))
for i in range(out_H):
for j in range(out_W):
            # input window of shape (N, C, HH, WW)
            x_pad_masked = x_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW]
            # compute the gradient of each output channel separately
            for f in range(F):
                # upstream gradient for one output channel, shape (N, 1, 1, 1)
                dout_C = dout[:, f:f+1, i:i+1, j:j+1]
                # of shape (C, HH, WW)
                dw[f, :, :, :] += np.sum(dout_C * x_pad_masked, axis=0)
                # of shape (N, C, HH, WW) = (N, 1, 1, 1) * (C, HH, WW)
                dx_pad[:, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += dout_C * w[f, :, :, :]
            # for n in range(N):  # equivalent per-sample computation of dx_pad
            #     # [F, C, HH, WW] * [F, 1, 1, 1] -> [C, HH, WW]
            #     dx_pad[n, :, i*stride:i*stride+HH, j*stride:j*stride+WW] += \
            #         np.sum(w * (dout[n, :, i, j])[:, None, None, None], axis=0)
    # strip the padding to map dx_pad back onto dx, of shape (N, C, H, W);
    # slicing pad:pad+H (rather than pad:-pad) also handles pad == 0 correctly
    dx = dx_pad[:, :, pad:pad+H, pad:pad+W]
return dx, dw, db
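The notebook verifies this backward pass with a numeric gradient check. A sketch of that cell, assuming the course helpers eval_numerical_gradient_array and rel_error are in scope:
from cs231n.gradient_check import eval_numerical_gradient_array
np.random.seed(231)
x = np.random.randn(4, 3, 5, 5)
w = np.random.randn(2, 3, 3, 3)
b = np.random.randn(2,)
dout = np.random.randn(4, 2, 5, 5)
conv_param = {'stride': 1, 'pad': 1}
out, cache = conv_forward_naive(x, w, b, conv_param)
dx, dw, db = conv_backward_naive(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: conv_forward_naive(x, w, b, conv_param)[0], x, dout)
dw_num = eval_numerical_gradient_array(lambda w: conv_forward_naive(x, w, b, conv_param)[0], w, dout)
db_num = eval_numerical_gradient_array(lambda b: conv_forward_naive(x, w, b, conv_param)[0], b, dout)
print('dx error: ', rel_error(dx, dx_num))  # errors should be around e-8 or less
print('dw error: ', rel_error(dw, dw_num))
print('db error: ', rel_error(db, db_num))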
Likewise, the max-pooling layers below aim for correctness, not efficiency.
def max_pool_forward_naive(x, pool_param):
"""
A naive implementation of the forward pass for a max-pooling layer.
Inputs:
- x: Input data, of shape (N, C, H, W)
- pool_param: dictionary with the following keys:
- 'pool_height': The height of each pooling region
- 'pool_width': The width of each pooling region
- 'stride': The distance between adjacent pooling regions
    No padding is necessary here; the output size is given below.
Returns a tuple of:
- out: Output data, of shape (N, C, H', W') where H' and W' are given by
H' = 1 + (H - pool_height) / stride
W' = 1 + (W - pool_width) / stride
- cache: (x, pool_param)
"""
out = None
# TODO: Implement the max-pooling forward pass
N,C,H,W = x.shape
pool_height = pool_param["pool_height"]
pool_width = pool_param["pool_width"]
stride = pool_param["stride"]
outH = 1 + (H - pool_height) // stride
outW = 1 + (W - pool_width) // stride
out = np.zeros((N,C,outH,outW))
for i in range(outH):
for j in range(outW):
            # pooling window, of shape (N, C, pool_height, pool_width)
            x_pool_mask = x[:, :, i*stride:i*stride+pool_height, j*stride:j*stride+pool_width]
            # max over the window; the result slice has shape (N, C)
            out[:, :, i, j] = np.max(x_pool_mask, axis=(2, 3))
cache = (x, pool_param)
return out, cache
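A tiny hand-checkable example (not from the notebook): with 2x2 windows and stride 2, each output entry is the max of one quadrant of the input.
x = np.arange(16.0).reshape(1, 1, 4, 4)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
out, _ = max_pool_forward_naive(x, pool_param)
print(out[0, 0])  # [[ 5.  7.]
                  #  [13. 15.]]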
def max_pool_backward_naive(dout, cache):
"""
A naive implementation of the backward pass for a max-pooling layer.
Inputs:
- dout: Upstream derivatives
- cache: A tuple of (x, pool_param) as in the forward pass.
Returns:
- dx: Gradient with respect to x
"""
dx = None
# TODO: Implement the max-pooling backward pass
(x, pool_param) = cache
N,C,H,W = x.shape
HH = pool_param["pool_height"]
WW = pool_param["pool_width"]
stride = pool_param["stride"]
dx = np.zeros_like(x)
outH = 1 + (H - HH) // stride
outW = 1 + (W - WW) // stride
for i in range(outH):
for j in range(outW):
            # pooling window, of shape (N, C, HH, WW)
            x_pool_mask = x[:, :, i * stride:i * stride + HH, j * stride:j * stride + WW]
            # find the max, then build a boolean mask of where it occurs;
            # note: when the max is not unique, every tied entry receives gradient
# (N,C,1,1)
max_x_masked = np.max(x_pool_mask, axis=(2, 3),keepdims=True)
# of shape (N,C,HH,WW)
temp_binary_mask = (x_pool_mask == max_x_masked)
dx[:, :, i * stride: i * stride + HH, j * stride: j * stride + WW] += temp_binary_mask * dout[:, :, i:i+1, j:j+1]
return dx
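To see the tie caveat from the comment above in action (a hypothetical example, not in the notebook): with two equal maxima in one window, both receive the full upstream gradient.
x = np.array([[[[1., 1.],
                [0., 0.]]]])  # (1, 1, 2, 2), tie in the top row
dout = np.ones((1, 1, 1, 1))
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
_, cache = max_pool_forward_naive(x, pool_param)
print(max_pool_backward_naive(dout, cache))  # both tied entries get gradient 1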
Implementing fast convolution and pooling layers is a challenge. To spare you the trouble, the starter code provides fast implementations of the forward and backward passes for convolution and pooling layers in cs231n/fast_layers.py; read through them if you have time.
The fast convolution implementation depends on a Cython extension. To compile it, run the following from the cs231n directory:
python setup.py build_ext --inplace
NOTE: The fast implementation for pooling will only perform optimally if the pooling regions are non-overlapping and tile the input. If these conditions are not met then the fast pooling implementation will not be much faster than the naive implementation.
Here is a performance comparison between the naive and fast implementations:
# Rel errors should be around e-9 or less
from cs231n.fast_layers import conv_forward_fast, conv_backward_fast
from time import time
np.random.seed(231)
x = np.random.randn(100, 3, 31, 31)
w = np.random.randn(25, 3, 3, 3)
b = np.random.randn(25,)
dout = np.random.randn(100, 25, 16, 16)
conv_param = {'stride': 2, 'pad': 1}
t0 = time()
out_naive, cache_naive = conv_forward_naive(x, w, b, conv_param)
t1 = time()
out_fast, cache_fast = conv_forward_fast(x, w, b, conv_param)
t2 = time()
print('Testing conv_forward_fast:')
print('Naive: %fs' % (t1 - t0))
print('Fast: %fs' % (t2 - t1))
print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))
print('Difference: ', rel_error(out_naive, out_fast))
t0 = time()
dx_naive, dw_naive, db_naive = conv_backward_naive(dout, cache_naive)
t1 = time()
dx_fast, dw_fast, db_fast = conv_backward_fast(dout, cache_fast)
t2 = time()
print('\nTesting conv_backward_fast:')
print('Naive: %fs' % (t1 - t0))
print('Fast: %fs' % (t2 - t1))
print('Speedup: %fx' % ((t1 - t0) / (t2 - t1)))
print('dx difference: ', rel_error(dx_naive, dx_fast))
print('dw difference: ', rel_error(dw_naive, dw_fast))
print('db difference: ', rel_error(db_naive, db_fast))
Testing conv_forward_fast:
Naive: 0.125854s
Fast: 0.005614s
Speedup: 22.417803x
Difference: 4.926407851494105e-11
Testing conv_backward_fast:
Naive: 0.269846s
Fast: 0.004650s
Speedup: 58.027019x
dx difference: 1.383704034070129e-11
dw difference: 3.75105216164263e-13
db difference: 0.0
# Relative errors should be close to 0.0
from cs231n.fast_layers import max_pool_forward_fast, max_pool_backward_fast
np.random.seed(231)
x = np.random.randn(100, 3, 32, 32)
dout = np.random.randn(100, 3, 16, 16)
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
t0 = time()
out_naive, cache_naive = max_pool_forward_naive(x, pool_param)
t1 = time()
out_fast, cache_fast = max_pool_forward_fast(x, pool_param)
t2 = time()
print('Testing pool_forward_fast:')
print('Naive: %fs' % (t1 - t0))
print('fast: %fs' % (t2 - t1))
print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))
print('difference: ', rel_error(out_naive, out_fast))
t0 = time()
dx_naive = max_pool_backward_naive(dout, cache_naive)
t1 = time()
dx_fast = max_pool_backward_fast(dout, cache_fast)
t2 = time()
print('\nTesting pool_backward_fast:')
print('Naive: %fs' % (t1 - t0))
print('fast: %fs' % (t2 - t1))
print('speedup: %fx' % ((t1 - t0) / (t2 - t1)))
print('dx difference: ', rel_error(dx_naive, dx_fast))
Testing pool_forward_fast:
Naive: 0.007368s
fast: 0.001482s
speedup: 4.969926x
difference: 0.0
Testing pool_backward_fast:
Naive: 0.015857s
fast: 0.007847s
speedup: 2.020660x
dx difference: 0.0
Now we combine the layers implemented above into the common composite ("sandwich") modules.
conv + ReLU
def conv_relu_forward(x, w, b, conv_param):
"""
A convenience layer that performs a convolution followed by a ReLU.
Inputs:
- x: Input to the convolutional layer
- w, b, conv_param: Weights and parameters for the convolutional layer
Returns a tuple of:
- out: Output from the ReLU
- cache: Object to give to the backward pass
"""
a, conv_cache = conv_forward_fast(x, w, b, conv_param)
out, relu_cache = relu_forward(a)
cache = (conv_cache, relu_cache)
return out, cache
def conv_relu_backward(dout, cache):
"""
Backward pass for the conv-relu convenience layer.
"""
conv_cache, relu_cache = cache
da = relu_backward(dout, relu_cache)
dx, dw, db = conv_backward_fast(da, conv_cache)
return dx, dw, db
conv + ReLU + max pooling
def conv_relu_pool_forward(x, w, b, conv_param, pool_param):
"""
Convenience layer that performs a convolution, a ReLU, and a pool.
Inputs:
- x: Input to the convolutional layer
- w, b, conv_param: Weights and parameters for the convolutional layer
- pool_param: Parameters for the pooling layer
Returns a tuple of:
- out: Output from the pooling layer
- cache: Object to give to the backward pass
"""
a, conv_cache = conv_forward_fast(x, w, b, conv_param)
s, relu_cache = relu_forward(a)
out, pool_cache = max_pool_forward_fast(s, pool_param)
cache = (conv_cache, relu_cache, pool_cache)
return out, cache
def conv_relu_pool_backward(dout, cache):
"""
Backward pass for the conv-relu-pool convenience layer
"""
conv_cache, relu_cache, pool_cache = cache
ds = max_pool_backward_fast(dout, pool_cache)
da = relu_backward(ds, relu_cache)
dx, dw, db = conv_backward_fast(da, conv_cache)
return dx, dw, db
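The notebook gradient-checks these sandwich layers as well. A sketch for conv_relu_pool, again assuming eval_numerical_gradient_array and rel_error are in scope:
from cs231n.gradient_check import eval_numerical_gradient_array
np.random.seed(231)
x = np.random.randn(2, 3, 16, 16)
w = np.random.randn(3, 3, 3, 3)
b = np.random.randn(3,)
dout = np.random.randn(2, 3, 8, 8)  # conv preserves 16x16, 2x2 pool halves it
conv_param = {'stride': 1, 'pad': 1}
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
out, cache = conv_relu_pool_forward(x, w, b, conv_param, pool_param)
dx, dw, db = conv_relu_pool_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: conv_relu_pool_forward(x, w, b, conv_param, pool_param)[0], x, dout)
print('dx error: ', rel_error(dx_num, dx))  # should be around e-8 or less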
Now that you have implemented all the necessary layers, we can put them together into a simple convolutional network.
Open the file cs231n/classifiers/cnn.py and complete the implementation of the ThreeLayerConvNet class. Remember you can use the fast/sandwich layers (already imported for you) in your implementation.
class ThreeLayerConvNet(object):
"""
A three-layer convolutional network with the following architecture:
conv - relu - 2x2 max pool - affine - relu - affine - softmax
The network operates on minibatches of data that have shape (N, C, H, W)
consisting of N images, each with height H and width W and with C input
channels.
"""
def __init__(self, input_dim=(3, 32, 32), num_filters=32, filter_size=7,
hidden_dim=100, num_classes=10, weight_scale=1e-3, reg=0.0,
dtype=np.float32):
"""Initialize a new network.
Inputs:
- input_dim: Tuple (C, H, W) giving size of input data
- num_filters: Number of filters to use in the convolutional layer
- filter_size: Width/height of filters to use in the convolutional layer
- hidden_dim: Number of units to use in the fully-connected hidden layer
- num_classes: Number of scores to produce from the final affine layer.
- weight_scale: Scalar giving standard deviation for random initialization
of weights.
- reg: Scalar giving L2 regularization strength
- dtype: numpy datatype to use for computation.
"""
self.params = {}
self.reg = reg
self.dtype = dtype
############################################################################
# TODO: Initialize weights and biases for the three-layer convolutional #
# network. Weights should be initialized from a Gaussian centered at 0.0 #
# with standard deviation equal to weight_scale; biases should be #
# initialized to zero. All weights and biases should be stored in the #
# dictionary self.params. Store weights and biases for the convolutional #
# layer using the keys 'W1' and 'b1'; use keys 'W2' and 'b2' for the #
# weights and biases of the hidden affine layer, and keys 'W3' and 'b3' #
# for the weights and biases of the output affine layer. #
# #
# IMPORTANT: For this assignment, you can assume that the padding #
# and stride of the first convolutional layer are chosen so that #
# **the width and height of the input are preserved**. Take a look at #
# the start of the loss() function to see how that happens. #
############################################################################
""""""
        # conv layer
        # of shape (C_out, C_in, HH, WW)
        self.params['W1'] = weight_scale * np.random.randn(num_filters, input_dim[0], filter_size, filter_size)
        # of shape (C_out,)
        self.params['b1'] = np.zeros(num_filters)
        # hidden affine layer: the conv layer preserves the spatial size (because
        # of the padding), but the 2x2 max pool halves the height and width
        # input of shape (N, C_out, H/2, W/2), output of shape (N, hidden_dim)
        hidden_input_dim = num_filters * (input_dim[1] // 2) * (input_dim[2] // 2)
        self.params['W2'] = weight_scale * np.random.randn(hidden_input_dim, hidden_dim)
        self.params['b2'] = np.zeros(hidden_dim)
        # output affine (classification) layer
        self.params['W3'] = weight_scale * np.random.randn(hidden_dim, num_classes)
        self.params['b3'] = np.zeros(num_classes)
############################################################################
# END OF YOUR CODE #
############################################################################
for k, v in self.params.items():
self.params[k] = v.astype(dtype)
def loss(self, X, y=None):
"""
Evaluate loss and gradient for the three-layer convolutional network.
Input / output: Same API as TwoLayerNet in fc_net.py.
"""
W1, b1 = self.params['W1'], self.params['b1']
W2, b2 = self.params['W2'], self.params['b2']
W3, b3 = self.params['W3'], self.params['b3']
# pass conv_param to the forward pass for the convolutional layer
# Padding and stride chosen to preserve the input spatial size
filter_size = W1.shape[2]
conv_param = {'stride': 1, 'pad': (filter_size - 1) // 2}
# pass pool_param to the forward pass for the max-pooling layer
pool_param = {'pool_height': 2, 'pool_width': 2, 'stride': 2}
scores = None
############################################################################
# TODO: Implement the forward pass for the three-layer convolutional net, #
# computing the class scores for X and storing them in the scores #
# variable. #
# #
# Remember you can use the functions defined in cs231n/fast_layers.py and #
# cs231n/layer_utils.py in your implementation (already imported). #
############################################################################
        # conv - relu - 2x2 max pool
        conv_out, conv_cache = conv_relu_pool_forward(X, W1, b1, conv_param, pool_param)
        # hidden affine - relu
        fc_out, fc_cache = affine_relu_forward(conv_out, W2, b2)
        # output affine layer: class scores of shape (N, num_classes),
        # before any softmax normalization
        scores, scores_cache = affine_forward(fc_out, W3, b3)
if y is None:
return scores
loss, grads = 0, {}
############################################################################
# TODO: Implement the backward pass for the three-layer convolutional net, #
# storing the loss and gradients in the loss and grads variables. Compute #
# data loss using softmax, and make sure that grads[k] holds the gradients #
# for self.params[k]. Don't forget to add L2 regularization! #
# #
# NOTE: To ensure that your implementation matches ours and you pass the #
# automated tests, make sure that your L2 regularization includes a factor #
# of 0.5 to simplify the expression for the gradient. #
############################################################################
        loss, dout = softmax_loss(scores, y)
        # gradients of the output affine layer
        ds, grads["W3"], grads["b3"] = affine_backward(dout, scores_cache)
        loss += self.reg * 0.5 * np.sum(W3 * W3)
        grads["W3"] += self.reg * W3
        # gradients of the hidden affine layer
        da, grads["W2"], grads["b2"] = affine_relu_backward(ds, fc_cache)
        loss += self.reg * 0.5 * np.sum(W2 * W2)
        grads["W2"] += self.reg * W2
        # gradients of the conv layer
        dx, grads["W1"], grads["b1"] = conv_relu_pool_backward(da, conv_cache)
        loss += self.reg * 0.5 * np.sum(W1 * W1)
        grads["W1"] += self.reg * W1
return loss, grads
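After building the model, the notebook first sanity-checks the initial loss: with softmax over 10 classes and freshly initialized weights it should be close to log(10) ≈ 2.303, and turning on regularization should make it slightly larger. A sketch of that check:
model = ThreeLayerConvNet()
N = 50
X = np.random.randn(N, 3, 32, 32)
y = np.random.randint(10, size=N)
loss, grads = model.loss(X, y)
print('Initial loss (no regularization): ', loss)    # ~2.303
model.reg = 0.5
loss, grads = model.loss(X, y)
print('Initial loss (with regularization): ', loss)  # slightly larger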
A nice trick is to train your model with just a few training samples. You should be able to overfit small datasets, which will result in very high training accuracy and comparatively low validation accuracy.
np.random.seed(231)
num_train = 100
small_data = {
'X_train': data['X_train'][:num_train],
'y_train': data['y_train'][:num_train],
'X_val': data['X_val'],
'y_val': data['y_val'],
}
model = ThreeLayerConvNet(weight_scale=1e-2)
solver = Solver(model, small_data,
num_epochs=15, batch_size=50,
update_rule='adam',
optim_config={
'learning_rate': 1e-3,
},
verbose=True, print_every=1)
solver.train()
Plotting the loss, training accuracy, and validation accuracy should show clear overfitting:
plt.subplot(2, 1, 1)
plt.plot(solver.loss_history, 'o')
plt.xlabel('iteration')
plt.ylabel('loss')
plt.subplot(2, 1, 2)
plt.plot(solver.train_acc_history, '-o')
plt.plot(solver.val_acc_history, '-o')
plt.legend(['train', 'val'], loc='upper left')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.show()
By training the three-layer convolutional network for one epoch, you should achieve greater than 40% accuracy on the training set:
model = ThreeLayerConvNet(weight_scale=0.001, hidden_dim=500, reg=0.001)
solver = Solver(model, data,
num_epochs=1, batch_size=50,
update_rule='adam',
optim_config={
'learning_rate': 1e-3,
},
verbose=True, print_every=20)
solver.train()
Visualize the first-layer convolution filters, which have shape (C_out, C_in, HH, WW).
First, a helper that tiles N images into a grid:
from math import ceil, sqrt

def visualize_grid(Xs, ubound=255.0, padding=1):
"""
Reshape a 4D tensor of image data to a grid for easy visualization.
Inputs:
- Xs: Data of shape (N, H, W, C)
- ubound: Output grid will have values scaled to the range [0, ubound]
- padding: The number of blank pixels between elements of the grid
"""
(N, H, W, C) = Xs.shape
grid_size = int(ceil(sqrt(N)))
grid_height = H * grid_size + padding * (grid_size - 1)
grid_width = W * grid_size + padding * (grid_size - 1)
grid = np.zeros((grid_height, grid_width, C))
next_idx = 0
y0, y1 = 0, H
for y in range(grid_size):
x0, x1 = 0, W
for x in range(grid_size):
if next_idx < N:
img = Xs[next_idx]
low, high = np.min(img), np.max(img)
grid[y0:y1, x0:x1] = ubound * (img - low) / (high - low)
# grid[y0:y1, x0:x1] = Xs[next_idx]
next_idx += 1
x0 += W + padding
x1 += W + padding
y0 += H + padding
y1 += H + padding
# grid_max = np.max(grid)
# grid_min = np.min(grid)
# grid = ubound * (grid - grid_min) / (grid_max - grid_min)
return grid
Now display the C_out first-layer filters:
from cs231n.vis_utils import visualize_grid
grid = visualize_grid(model.params['W1'].transpose(0, 2, 3, 1))
plt.imshow(grid.astype('uint8'))
plt.axis('off')
plt.gcf().set_size_inches(5, 5)
plt.show()
We already saw that batch normalization is a very useful technique for training deep fully-connected networks. As proposed in the original paper [3], batch normalization can also be used for convolutional networks, but we need to tweak it a bit; the modification will be called “spatial batch normalization.”
Normally batch normalization accepts inputs of shape (N, D) and produces outputs of shape (N, D), where we normalize across the minibatch dimension N. For data coming from convolutional layers, batch normalization needs to accept inputs of shape (N, C, H, W) and produce outputs of shape (N, C, H, W), where the N dimension gives the minibatch size and the (H, W) dimensions give the spatial size of the feature map.
If the feature map was produced using convolutions, then we expect the statistics of each feature channel to be relatively consistent both between different images and between different locations within the same image. Therefore spatial batch normalization computes a mean and variance for each of the C feature channels by computing statistics over both the minibatch dimension N and the spatial dimensions H and W.
[3] Sergey Ioffe and Christian Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift”, ICML 2015.
In the file cs231n/layers.py, implement the forward pass for spatial batch normalization in the function spatial_batchnorm_forward.
def spatial_batchnorm_forward(x, gamma, beta, bn_param):
"""
Computes the forward pass for spatial batch normalization.
Inputs:
- x: Input data of shape (N, C, H, W)
- gamma: Scale parameter, of shape (C,)
- beta: Shift parameter, of shape (C,)
- bn_param: Dictionary with the following keys:
- mode: 'train' or 'test'; required
- eps: Constant for numeric stability
- momentum: Constant for running mean / variance. momentum=0 means that
old information is discarded completely at every time step, while
momentum=1 means that new information is never incorporated. The
default of momentum=0.9 should work well in most situations.
      - running_mean: Array of shape (C,) giving running mean of features
      - running_var: Array of shape (C,) giving running variance of features
Returns a tuple of:
- out: Output data, of shape (N, C, H, W)
- cache: Values needed for the backward pass
"""
out, cache = None, None
###########################################################################
# TODO: Implement the forward pass for spatial batch normalization. #
# #
# HINT: You can implement spatial batch normalization by calling the #
# vanilla version of batch normalization you implemented above. #
# Your implementation should be very short; ours is less than five lines. #
###########################################################################
    N, C, H, W = x.shape
    # move channels last, then flatten: (N, C, H, W) -> (N*H*W, C)
    x_flat = x.transpose(0, 2, 3, 1).reshape(-1, C)
    temp_out, cache = batchnorm_forward(x_flat, gamma, beta, bn_param)
    out = temp_out.reshape(N, H, W, C).transpose(0, 3, 1, 2)
    # equivalent variant using (0, 3, 2, 1) transposes:
    # temp_out, cache = batchnorm_forward(x.transpose(0, 3, 2, 1).reshape((N * H * W, C)), gamma, beta, bn_param)
    # out = temp_out.reshape(N, W, H, C).transpose(0, 3, 2, 1)
return out, cache
Note that for an input tensor of shape (N, C, H, W), spatial batch normalization treats the values at every spatial position of every image as samples of a single per-channel distribution; it is exactly ordinary batch normalization applied to the input reshaped to (N*H*W, C).
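A small check of that equivalence (not part of the notebook): the per-channel moments of the flattened view match the moments taken directly over the (N, H, W) axes.
N, C, H, W = 2, 3, 4, 5
x = np.random.randn(N, C, H, W)
flat = x.transpose(0, 2, 3, 1).reshape(-1, C)  # (N*H*W, C)
print(np.allclose(flat.mean(axis=0), x.mean(axis=(0, 2, 3))))  # True
print(np.allclose(flat.var(axis=0), x.var(axis=(0, 2, 3))))    # True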
Now the notebook's test:
np.random.seed(231)
# Check the training-time forward pass by checking means and variances
# of features both before and after spatial batch normalization
N, C, H, W = 2, 3, 4, 5
x = 4 * np.random.randn(N, C, H, W) + 10
print('Before spatial batch normalization:')
print(' Shape: ', x.shape)
print(' Means: ', x.mean(axis=(0, 2, 3)))
print(' Stds: ', x.std(axis=(0, 2, 3)))
# Means should be close to zero and stds close to one
gamma, beta = np.ones(C), np.zeros(C)
bn_param = {'mode': 'train'}
out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
print('After spatial batch normalization:')
print(' Shape: ', out.shape)
print(' Means: ', out.mean(axis=(0, 2, 3)))
print(' Stds: ', out.std(axis=(0, 2, 3)))
# Means should be close to beta and stds close to gamma
gamma, beta = np.asarray([3, 4, 5]), np.asarray([6, 7, 8])
out, _ = spatial_batchnorm_forward(x, gamma, beta, bn_param)
print('After spatial batch normalization (nontrivial gamma, beta):')
print(' Shape: ', out.shape)
print(' Means: ', out.mean(axis=(0, 2, 3)))
print(' Stds: ', out.std(axis=(0, 2, 3)))
Before spatial batch normalization:
Shape: (2, 3, 4, 5)
Means: [9.33463814 8.90909116 9.11056338]
Stds: [3.61447857 3.19347686 3.5168142 ]
After spatial batch normalization:
Shape: (2, 3, 4, 5)
Means: [ 6.18949336e-16 5.99520433e-16 -1.22124533e-16]
Stds: [0.99999962 0.99999951 0.9999996 ]
After spatial batch normalization (nontrivial gamma, beta):
Shape: (2, 3, 4, 5)
Means: [6. 7. 8.]
Stds: [2.99999885 3.99999804 4.99999798]
In the file cs231n/layers.py, implement the backward pass for spatial batch normalization in the function spatial_batchnorm_backward.
def spatial_batchnorm_backward(dout, cache):
"""
Computes the backward pass for spatial batch normalization.
Inputs:
- dout: Upstream derivatives, of shape (N, C, H, W)
- cache: Values from the forward pass
Returns a tuple of:
- dx: Gradient with respect to inputs, of shape (N, C, H, W)
- dgamma: Gradient with respect to scale parameter, of shape (C,)
- dbeta: Gradient with respect to shift parameter, of shape (C,)
"""
dx, dgamma, dbeta = None, None, None
###########################################################################
# TODO: Implement the backward pass for spatial batch normalization. #
# #
# HINT: You can implement spatial batch normalization by calling the #
# vanilla version of batch normalization you implemented above. #
# Your implementation should be very short; ours is less than five lines. #
###########################################################################
N,C,H,W = dout.shape
dout = dout.transpose(0,2,3,1).reshape(-1,C)
dx,dgamma,dbeta = batchnorm_backward(dout,cache)
dx = dx.reshape(N,H,W,C).transpose(0,3,1,2)
return dx, dgamma, dbeta
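As with the other layers, the notebook gradient-checks this backward pass. A sketch, assuming eval_numerical_gradient_array and rel_error are in scope:
np.random.seed(231)
N, C, H, W = 2, 3, 4, 5
x = 5 * np.random.randn(N, C, H, W) + 12
gamma = np.random.randn(C)
beta = np.random.randn(C)
dout = np.random.randn(N, C, H, W)
bn_param = {'mode': 'train'}
_, cache = spatial_batchnorm_forward(x, gamma, beta, bn_param)
dx, dgamma, dbeta = spatial_batchnorm_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: spatial_batchnorm_forward(x, gamma, beta, bn_param)[0], x, dout)
dg_num = eval_numerical_gradient_array(lambda g: spatial_batchnorm_forward(x, g, beta, bn_param)[0], gamma, dout)
db_num = eval_numerical_gradient_array(lambda b: spatial_batchnorm_forward(x, gamma, b, bn_param)[0], beta, dout)
print('dx error: ', rel_error(dx_num, dx))
print('dgamma error: ', rel_error(dg_num, dgamma))
print('dbeta error: ', rel_error(db_num, dbeta))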
In the previous notebook, we mentioned that Layer Normalization is an alternative normalization technique that mitigates the batch size limitations of Batch Normalization. However, as the authors of [4] observed, Layer Normalization does not perform as well as Batch Normalization when used with Convolutional Layers:
With fully connected layers, all the hidden units in a layer tend to make similar contributions to the final prediction, and re-centering and rescaling the summed inputs to a layer works well. However, the assumption of similar contributions is no longer true for convolutional neural networks. The large number of the hidden units whose receptive fields lie near the boundary of the image are rarely turned on and thus have very different statistics from the rest of the hidden units within the same layer.
The authors of [5] propose an intermediary technique. In contrast to Layer Normalization, where you normalize over the entire feature per-datapoint, they suggest a consistent splitting of each per-datapoint feature into G groups, and a per-group per-datapoint normalization instead.
Even though an assumption of equal contribution is still being made within each group, the authors hypothesize that this is not as problematic, as innate grouping arises within features for visual recognition. One example they use to illustrate this is that many high-performance handcrafted features in traditional Computer Vision have terms that are explicitly grouped together. Take for example Histogram of Oriented Gradients [6]-- after computing histograms per spatially local block, each per-block histogram is normalized before being concatenated together to form the final feature vector.
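Concretely, for an input of shape (N, C, H, W), group normalization boils down to a reshape plus per-group moments. A minimal numpy sketch of the idea (shapes chosen arbitrarily):
N, C, H, W, G = 2, 6, 4, 4, 3
x = np.random.randn(N, C, H, W)
xg = x.reshape(N, G, C // G, H, W)              # split channels into G groups
mu = xg.mean(axis=(2, 3, 4), keepdims=True)     # one mean per (sample, group)
var = xg.var(axis=(2, 3, 4), keepdims=True)     # one variance per (sample, group)
x_hat = ((xg - mu) / np.sqrt(var + 1e-5)).reshape(N, C, H, W)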
You will now implement Group Normalization. Note that this normalization technique that you are to implement in the following cells was introduced and published to arXiv less than a month ago – this truly is still an ongoing and excitingly active field of research!
[4] Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. “Layer Normalization.” stat 1050 (2016): 21.
[5] Wu, Yuxin, and Kaiming He. “Group Normalization.” arXiv preprint arXiv:1803.08494 (2018).
[6] N. Dalal and B. Triggs. Histograms of oriented gradients for human detection. In Computer Vision and Pattern Recognition (CVPR), 2005.
In the file cs231n/layers.py, implement the forward pass for group normalization in the function spatial_groupnorm_forward.
def spatial_groupnorm_forward(x, gamma, beta, G, gn_param):
"""
Computes the forward pass for spatial group normalization.
In contrast to layer normalization, group normalization splits each entry
in the data into G contiguous pieces, which it then normalizes independently.
Per feature shifting and scaling are then applied to the data,
in a manner identical to that of batch normalization and layer normalization.
Inputs:
- x: Input data of shape (N, C, H, W)
- gamma: Scale parameter, of shape (C,)
- beta: Shift parameter, of shape (C,)
    - G: Integer number of groups to split into, should be a divisor of C
- gn_param: Dictionary with the following keys:
- eps: Constant for numeric stability
Returns a tuple of:
- out: Output data, of shape (N, C, H, W)
- cache: Values needed for the backward pass
"""
out, cache = None, None
eps = gn_param.get('eps',1e-5)
###########################################################################
# TODO: Implement the forward pass for spatial group normalization. #
# This will be extremely similar to the layer norm implementation. #
# In particular, think about how you could transform the matrix so that #
# the bulk of the code is similar to both train-time batch normalization #
# and layer normalization! #
###########################################################################
    N, C, H, W = x.shape
    # split channels into G groups: (N, C, H, W) -> (N*G, C/G * H * W)
    x_grouped = x.reshape(N * G, -1)
    # the rest mirrors layer norm: transpose to (D', N') with
    # D' = C/G * H * W and N' = N*G, so the math looks like batch norm
    x_T = x_grouped.T                          # (D', N')
    mean = np.mean(x_T, axis=0)                # (N',)
    var = np.var(x_T, axis=0)                  # (N',)
    x_hat = (x_T - mean) / np.sqrt(var + eps)  # (D', N')
    x_hat = x_hat.T.reshape(N, C, H, W)
    # gamma of shape (C,), beta of shape (C,): lift to (1, C, 1, 1) for broadcasting
    if len(gamma.shape) == 1:
        gamma = gamma[None, :, None, None]
    if len(beta.shape) == 1:
        beta = beta[None, :, None, None]
    out = x_hat * gamma + beta
    # cache everything in the 5D layout the backward pass expects:
    # x as (N, G, C/G, H, W), mean and var as (N, G, 1, 1, 1)
    cache = (gamma, x_hat, x.reshape(N, G, C // G, H, W),
             mean.reshape(N, G, 1, 1, 1), var.reshape(N, G, 1, 1, 1), eps)
return out, cache
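A quick check in the same spirit as the batch-norm test above (a sketch; the shapes are illustrative): per (sample, group) means should be near 0 and stds near 1.
np.random.seed(231)
N, C, H, W, G = 2, 6, 4, 5, 2
x = 4 * np.random.randn(N, C, H, W) + 10
gamma, beta = np.ones(C), np.zeros(C)
out, _ = spatial_groupnorm_forward(x, gamma, beta, G, {})
out_g = out.reshape(N * G, -1)  # one row per (sample, group)
print('Means: ', out_g.mean(axis=1))  # close to 0
print('Stds:  ', out_g.std(axis=1))   # close to 1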
In the file cs231n/layers.py, implement the backward pass for spatial group normalization in the function spatial_groupnorm_backward.
def spatial_groupnorm_backward(dout, cache):
"""
Computes the backward pass for spatial group normalization.
Inputs:
- dout: Upstream derivatives, of shape (N, C, H, W)
- cache: Values from the forward pass
Returns a tuple of:
- dx: Gradient with respect to inputs, of shape (N, C, H, W)
    - dgamma: Gradient with respect to scale parameter, of shape (1, C, 1, 1)
    - dbeta: Gradient with respect to shift parameter, of shape (1, C, 1, 1)
"""
###########################################################################
# TODO: Implement the backward pass for spatial group normalization. #
# This will be extremely similar to the layer norm implementation. #
###########################################################################
# x of shape (N,G,C/G,H,W)
# x_hat of shape (N,C,H,W)
gamma, x_hat, x, mean, var, eps = cache
dgamma = np.sum(dout * x_hat, axis=(0,2,3),keepdims=True) #(N,C,H,W) -> (1,C,1,1)
dbeta = np.sum(dout, axis=(0,2,3),keepdims=True) #(N,C,H,W) -> (1,C,1,1)
    # compute dx with the chain rule, one branch at a time
N, G, C_G, H, W = x.shape
dx_hat = dout * gamma
# (N,G,C/G,H,W)
dx_hat = dx_hat.reshape(*x.shape)
    # (C/G, H, W, N, G) -- (D', N'); from here on it mirrors the batch norm backward
dx_hat = dx_hat.transpose(2,3,4,0,1)
# (C/G,H,W, N,G) -- (D',N')
x = x.transpose(2, 3, 4, 0, 1)
mean = mean.transpose(2, 3, 4, 0, 1) # (1,1,1, N,G)
var = var.transpose(2, 3, 4, 0, 1) # (1,1,1, N,G)
div = 1 / (np.sqrt(var + eps))
    # branch 1: the direct path, of shape (D', N')
dx_1 = div * dx_hat
    # branch 2: through the mean, of shape (1, 1, 1, N, G)
dmean = np.sum(- div * dx_hat,axis=(0,1,2),keepdims=True)
    # branch 3: through the variance, of shape (1, 1, 1, N, G)
dvar = np.sum(- 0.5 * (x - mean) * (div**3) * dx_hat,axis=(0,1,2),keepdims=True)
    # the mean's contribution inside the variance
newN = C_G * H * W
dmean += np.sum(-2 * (x - mean) * dvar,axis=(0,1,2),keepdims = True) / newN
    # x_i's contribution inside the variance
dx_2 = 2 * (x - mean) * dvar / newN
dx = dx_1 + dx_2 + dmean / newN
# (C/G,H,W, N,G) -> (N,G, C/G,H,W)
dx = dx.transpose(3,4,0,1,2)
dx = dx.reshape(*dout.shape)
return dx, dgamma, dbeta
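And a final gradient-check sketch for group normalization, assuming eval_numerical_gradient_array and rel_error are in scope (gamma is passed as (1, C, 1, 1) to match the shape of dgamma returned above):
np.random.seed(231)
N, C, H, W, G = 2, 6, 4, 5, 2
x = 5 * np.random.randn(N, C, H, W) + 12
gamma = np.random.randn(1, C, 1, 1)
beta = np.random.randn(1, C, 1, 1)
dout = np.random.randn(N, C, H, W)
gn_param = {}
_, cache = spatial_groupnorm_forward(x, gamma, beta, G, gn_param)
dx, dgamma, dbeta = spatial_groupnorm_backward(dout, cache)
dx_num = eval_numerical_gradient_array(lambda x: spatial_groupnorm_forward(x, gamma, beta, G, gn_param)[0], x, dout)
print('dx error: ', rel_error(dx_num, dx))  # should be close to 0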
Finally, you would use the modules implemented above to train a suitable CNN on the CIFAR-10 dataset.
The latest version of the assignment moves this part into the next notebook, where the CNN is built in PyTorch or TensorFlow instead; that is more convenient and also allows training on a GPU. See the next assignment for the details.