CNN中的卷积操作

目录:

  • 1.CNN中的卷积操作
    • 直接卷积法
    • 通用矩阵乘法GEMM
  • 2.手动实现Conv2d

一、卷积神经网络中的卷积操作

CNN中的卷积操作_第1张图片
直接卷积法

代码实现:

# 根据公式计算卷积的尺寸
def cal_convoluation_size(input, kernel, padding=0, stride=1, dilation=1):
    new_kernel = dilation * (kernel - 1) + 1  # 空洞卷积,空洞数为0时dilation=1
    # 根据公式计算输出,并返回
    return math.floor((input + 2 * padding - new_kernel) / stride + 1)

# 简单版本的直接卷积法:不考虑padding,dilation=1,padding=0
def convoluation(image, kernel):
    image_height, image_width, channels = image.shape
    kernel_height, kernel_width = kernel.shape
    # 计算输出的形状大小
    out_height = cal_convoluation_size(image_height, kernel_height)
    out_width = cal_convoluation_size(image_width, kernel_width)
    output = np.zeros((out_height, out_width, channels))

    # 计算output的每个像素值
    # 先找到目标图(dx, dy)对应原图中的中心点位置(cx, cy),然后计算
    for dy in range(out_height):
        for dx in range(out_width):
            # 遍历kernel计算输出(output[dy, dx])的像素值
            for ky in range(kernel_height):
                for kx in range(kernel_width):
                    kernel_value = kernel[ky, kx]
                    pixel_value = image[dy + ky, dx + kx]
                    output[dy, dx] += kernel_value * pixel_value   
      
    return output
通用矩阵乘法GEMM

针对卷积速度慢的问题,使用GEMM进行优化。
(还可以对GEMM进一步优化,感兴趣的同学可以自行去了解下Winograd算法。)

GEMM的核心思想是img2col。img2col的流程如下:

CNN中的卷积操作_第2张图片

代码实现:

# 根据公式计算卷积的尺寸
def cal_convoluation_size(input, kernel, padding=0, stride=1, dilation=1):
    new_kernel = dilation * (kernel - 1) + 1  # 空洞卷积,空洞数为0时dilation=1
    # 根据公式计算输出,并返回
    return math.floor((input + 2 * padding - new_kernel) / stride + 1)

# 定义gemm卷积函数:先定义一个简单版本的,不考虑padding、stride、dilation
# images-->(N, C, H, W), kernels-->(out_channels, in_channels, kh, kw), 且 C = in_channels
# 输出结果output-->(N, out_channels, output_height, output_width)
def gemm(images, kernels, padding=0, stride=1, dilation=1):
    N, C, H, W = images.shape
    out_channels, in_channels, kh, kw = kernels.shape
    
    # 1.kernels转换为col: (out_channel, in_channel * kh * kw)
    kernel_col = kernels.reshape(out_channels, -1)
    
    # 2.img转换为col
    # 计算输出的形状大小
    out_height = cal_convoluation_size(H, kh, padding, stride, dilation)
    out_width = cal_convoluation_size(W, kw, padding, stride, dilation)
    # img_col的行数、列数
    kernel_count = kh * kw
    rows, cols = in_channels * kernel_count, out_height * out_width
    
    # 将图片的数量N放在高维,这样GEMM得到的结果不用再通过切片去拿
    img_col = np.zeros((N, rows, cols))
    for i in range(N):  # 第几张图片
        for idy in range(out_height):
            for idx in range(out_width):
                col_index = idy * out_width + idx
                for ic in range(C):  # C=in_channels
                    for iky in range(kh):
                        for ikx in range(kw):
                            row_index = ic * kernel_count + iky * kw + ikx                            # 赋值
                            img_col[i, row_index, col_index] = images[i, ic, idy + iky, idx + ikx]
    
    # 3.卷积计算之GEMM方法
    # (out_channels, in_channels * kh * kw) @ (N, in_channels * kh * kw, out_height * out_width)
    # = (N, out_channels, out_height * out_width)
    output = kernel_col @ img_col
    return output.reshape(N, out_channels, out_height, out_width)

二、手动实现Conv2d

反向传播时,需要将对columns的梯度转换为对输入image的梯度,即还要实现一个col2img。

代码实现:

# 2D卷积
class Conv2d(Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = (kernel_size, kernel_size)
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.groups = groups
        self.bias = bias

        # 权重初始化:Xavier初始化、Kaiming初始化
        # fan_in = in_channels * kh * kw, fan_out = out_channels * kh * kw
        fan_in = in_channels * kernel_size * kernel_size
        bound = 1 / math.sqrt(fan_in)
        gain = math.sqrt(2)  # ReLU
        self.weight = Parameter(
            np.random.normal(0, gain * bound, size=(out_channels, in_channels, kernel_size, kernel_size)))
        self.bias = Parameter(np.random.uniform(-bound, bound, size=(out_channels,)))

    def forward(self, input):
        # 已添加padding和stride的逻辑,暂时不考虑dilation
        # 仔细思考了一下,加dilation不难,逻辑稍微修改一下即可:
        #  1)加dilation,只需将kernel变换一下即可,中间补0即可。——尚未优化
        self.input = input  # save for backward
        N, _, H, W = input.shape
        kh, kw = self.kernel_size

        # 计算输出的形状大小
        self.out_height = self.cal_convoluation_size(H, kh, self.padding, self.stride, self.dilation)  # save for backward
        self.out_width = self.cal_convoluation_size(W, kw, self.padding, self.stride, self.dilation)  # save for backward

        # kernel转换为col
        self.kernel_col = self.weight.data.reshape(self.out_channels, -1)  # save for backward

        # img转换为col
        self.columns = self.img2col(input, (self.out_channels, self.in_channels, kh, kw),
            (self.out_height, self.out_width), self.padding, self.stride, self.dilation)  # save for backward

        # 卷积计算之GEMM方法
        # (out_channels, in_channels * kh * kw) @ (N, in_channels * kh * kw, out_height * out_width)
        # = (N, out_channels, out_height * out_width)
        output = self.kernel_col @ self.columns + self.bias.data[..., None]

        # (N, out_channels, out_height * out_width) --> (N, out_channels, out_height, out_width)
        return output.reshape(N, self.out_channels, self.out_height, self.out_width)

    def backward(self, delta):
        '''
        反向计算weight和bias的梯度,同时计算并返回"误差对输入的"误差项
        delta:反向传递过来的"误差对输出的"误差项
        '''
        # (N, out_channels, out_height, out_width) --> (N, out_channels, out_height * out_width)
        delta = delta.reshape(len(delta), self.out_channels, -1)

        # 计算对weight的梯度
        # (N, out_channels, out_height * out_width) @ (N, out_height * out_width, in_channels * kh * kw)
        # = (N, out_channels, in_channels * kh * kw) --> (out_channels, in_channels * kh * kw)
        kernel_col_grad = np.sum(delta @ np.transpose(self.columns, axes=(0, 2, 1)), axis=0)  # 所有样本对weight的梯度相加
        # (out_channels, in_channels * kh * kw) --> (out_channels, in_channels, kh, kw)
        self.weight.grad += kernel_col_grad.reshape(self.out_channels, self.in_channels, *self.kernel_size)

        # 计算对bias的梯度
        # (N, out_channels, out_height * out_width) --> (out_channels,)
        self.bias.grad += np.sum(delta, axis=(0, 2))  # 所有样本对bias的梯度相加

        # 计算并返回"误差对输入的"误差项
        # (in_channels * kh * kw, out_channels) @ (N, out_channels, out_height * out_width)
        # = (N, in_channels * kh * kw, out_height * out_width)
        columns_delta = self.kernel_col.T @ delta
        return self.delta_col2img(columns_delta, self.input.shape,
                                  (self.out_channels, self.in_channels, *self.kernel_size),
                                  (self.out_height, self.out_width), self.padding, self.stride, self.dilation)

    # 根据公式计算卷积的尺寸
    def cal_convoluation_size(self, input, kernel, padding=0, stride=1, dilation=1):
        new_kernel = dilation * (kernel - 1) + 1  # 空洞卷积,空洞数为0时dilation=1
        # 根据公式计算输出,并返回
        return math.floor((input + 2 * padding - new_kernel) / stride + 1)

    # 将img2col从gemm中抽离出来,方便forward和backward
    def img2col(self, images, kernel_shape, out_shape, padding=0, stride=1, dilation=1):
        # 考虑padding
        N, C, H, W = images.shape
        new_images = np.zeros((N, C, H + 2 * padding, W + 2 * padding))  # 周围padding用0填充
        new_images[:, :, padding:H + padding, padding:W + padding] = images
        
        out_channels, in_channels, kh, kw = kernel_shape
        out_height, out_width = out_shape

        # img_col的行数、列数
        kernel_count = kh * kw
        rows, cols = in_channels * kernel_count, out_height * out_width

        # 将图片的数量N放在高维,这样GEMM得到的结果不用再通过切片去拿  
        columns = np.zeros((N, cols, rows))
        for idy in range(out_height):
            for idx in range(out_width):
                col_index = idy * out_width + idx
                start_y = self.stride * idy
                start_x = self.stride * idx
                columns[:, col_index] = new_images[:, :, start_y:start_y + kh, start_x:start_x + kw].reshape(N, -1)
                    
        return columns.transpose(0, 2, 1)

    def delta_col2img(self, columns_delta, input_shape, kernel_shape, out_shape, padding=0, stride=1, dilation=1):
        '''
        columns_delta: (N, in_channels * kh * kw, out_height * out_width)
        input_shape: (N, C, H, W)
        kernel_shape: (out_channels, in_channels, kh, kw)
        out_shape: (out_height, out_width)
        '''
        N, C, H, W = input_shape
        out_channels, in_channels, kh, kw = kernel_shape
        out_height, out_width = out_shape

        # 考虑padding
        images_delta = np.zeros((N, C, H + 2 * padding, W + 2 * padding))
        for i in range(N):  # 第几张图片
            for idy in range(out_height):
                for idx in range(out_width):
                    col_index = idy * out_width + idx
                    column_delta = columns_delta[i, :, col_index]  # (in_channels * kh * kw,)
                    # (in_channels * kh * kw,) --> (in_channels, kh, kw)
                    column_delta = column_delta.reshape(in_channels, kh, kw)
                    
                    # 将每一列的delta叠加到原图对应位置中
                    for ic, kernel_delta in enumerate(column_delta):
                        for iky, kh_delta in enumerate(kernel_delta):
                            for ikx, kw_delta in enumerate(kh_delta):
                                # 考虑stride
                                images_delta[i, ic, stride * idy + iky, stride * idx + ikx] += column_delta[ic, iky, ikx]

        # 考虑padding,去除外围的padding
        return images_delta[:, :, padding:H + padding, padding:W + padding]

你可能感兴趣的:(CNN中的卷积操作)