

PyTorch to Keras

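First, we need to be clear about one thing: most of the so-called PyTorch-to-Keras or Keras-to-PyTorch conversion tools found online and on GitHub either barely run or have limited applicability (for example, they can only convert certain models). Still, reading their code gives useful clues, such as whether the two frameworks agree on parameter shapes and on channel ordering (first or last). Once you understand these differences you can write the conversion code yourself, and yes, writing it yourself is the most reliable approach. I will use NVIDIA's open-source FlowNet as the example and convert its open-source PyTorch code into a Keras model. The whole process has two parts:

  • Following the structure of the PyTorch model, write the corresponding Keras code. The Keras functional API makes this very convenient.
  • Assign the PyTorch model's parameters to the Keras model one layer at a time, matching them by layer name.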




The summary of the resulting Keras model:

Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            (None, 6, 512, 512)  0                                            
conv0 (Conv2D)                  (None, 64, 512, 512) 3520        input_1[0][0]                    
leaky_re_lu_1 (LeakyReLU)       (None, 64, 512, 512) 0           conv0[0][0]                      
zero_padding2d_1 (ZeroPadding2D (None, 64, 514, 514) 0           leaky_re_lu_1[0][0]              
conv1 (Conv2D)                  (None, 64, 256, 256) 36928       zero_padding2d_1[0][0]           
leaky_re_lu_2 (LeakyReLU)       (None, 64, 256, 256) 0           conv1[0][0]                      
conv1_1 (Conv2D)                (None, 128, 256, 256 73856       leaky_re_lu_2[0][0]              
leaky_re_lu_3 (LeakyReLU)       (None, 128, 256, 256 0           conv1_1[0][0]                    
zero_padding2d_2 (ZeroPadding2D (None, 128, 258, 258 0           leaky_re_lu_3[0][0]              
conv2 (Conv2D)                  (None, 128, 128, 128 147584      zero_padding2d_2[0][0]           
leaky_re_lu_4 (LeakyReLU)       (None, 128, 128, 128 0           conv2[0][0]                      
conv2_1 (Conv2D)                (None, 128, 128, 128 147584      leaky_re_lu_4[0][0]              
leaky_re_lu_5 (LeakyReLU)       (None, 128, 128, 128 0           conv2_1[0][0]                    
zero_padding2d_3 (ZeroPadding2D (None, 128, 130, 130 0           leaky_re_lu_5[0][0]              
conv3 (Conv2D)                  (None, 256, 64, 64)  295168      zero_padding2d_3[0][0]           
leaky_re_lu_6 (LeakyReLU)       (None, 256, 64, 64)  0           conv3[0][0]                      
conv3_1 (Conv2D)                (None, 256, 64, 64)  590080      leaky_re_lu_6[0][0]              
leaky_re_lu_7 (LeakyReLU)       (None, 256, 64, 64)  0           conv3_1[0][0]                    
zero_padding2d_4 (ZeroPadding2D (None, 256, 66, 66)  0           leaky_re_lu_7[0][0]              
conv4 (Conv2D)                  (None, 512, 32, 32)  1180160     zero_padding2d_4[0][0]           
leaky_re_lu_8 (LeakyReLU)       (None, 512, 32, 32)  0           conv4[0][0]                      
conv4_1 (Conv2D)                (None, 512, 32, 32)  2359808     leaky_re_lu_8[0][0]              
leaky_re_lu_9 (LeakyReLU)       (None, 512, 32, 32)  0           conv4_1[0][0]                    
zero_padding2d_5 (ZeroPadding2D (None, 512, 34, 34)  0           leaky_re_lu_9[0][0]              
conv5 (Conv2D)                  (None, 512, 16, 16)  2359808     zero_padding2d_5[0][0]           
leaky_re_lu_10 (LeakyReLU)      (None, 512, 16, 16)  0           conv5[0][0]                      
conv5_1 (Conv2D)                (None, 512, 16, 16)  2359808     leaky_re_lu_10[0][0]             
leaky_re_lu_11 (LeakyReLU)      (None, 512, 16, 16)  0           conv5_1[0][0]                    
zero_padding2d_6 (ZeroPadding2D (None, 512, 18, 18)  0           leaky_re_lu_11[0][0]             
conv6 (Conv2D)                  (None, 1024, 8, 8)   4719616     zero_padding2d_6[0][0]           
leaky_re_lu_12 (LeakyReLU)      (None, 1024, 8, 8)   0           conv6[0][0]                      
conv6_1 (Conv2D)                (None, 1024, 8, 8)   9438208     leaky_re_lu_12[0][0]             
leaky_re_lu_13 (LeakyReLU)      (None, 1024, 8, 8)   0           conv6_1[0][0]                    
deconv5 (Conv2DTranspose)       (None, 512, 16, 16)  8389120     leaky_re_lu_13[0][0]             
predict_flow6 (Conv2D)          (None, 2, 8, 8)      18434       leaky_re_lu_13[0][0]             
leaky_re_lu_14 (LeakyReLU)      (None, 512, 16, 16)  0           deconv5[0][0]                    
upsampled_flow6_to_5 (Conv2DTra (None, 2, 16, 16)    66          predict_flow6[0][0]              
concatenate_1 (Concatenate)     (None, 1026, 16, 16) 0           leaky_re_lu_11[0][0]             
inter_conv5 (Conv2D)            (None, 512, 16, 16)  4728320     concatenate_1[0][0]              
deconv4 (Conv2DTranspose)       (None, 256, 32, 32)  4202752     concatenate_1[0][0]              
predict_flow5 (Conv2D)          (None, 2, 16, 16)    9218        inter_conv5[0][0]                
leaky_re_lu_15 (LeakyReLU)      (None, 256, 32, 32)  0           deconv4[0][0]                    
upsampled_flow5_to4 (Conv2DTran (None, 2, 32, 32)    66          predict_flow5[0][0]              
concatenate_2 (Concatenate)     (None, 770, 32, 32)  0           leaky_re_lu_9[0][0]              
inter_conv4 (Conv2D)            (None, 256, 32, 32)  1774336     concatenate_2[0][0]              
deconv3 (Conv2DTranspose)       (None, 128, 64, 64)  1577088     concatenate_2[0][0]              
predict_flow4 (Conv2D)          (None, 2, 32, 32)    4610        inter_conv4[0][0]                
leaky_re_lu_16 (LeakyReLU)      (None, 128, 64, 64)  0           deconv3[0][0]                    
upsampled_flow4_to3 (Conv2DTran (None, 2, 64, 64)    66          predict_flow4[0][0]              
concatenate_3 (Concatenate)     (None, 386, 64, 64)  0           leaky_re_lu_7[0][0]              
inter_conv3 (Conv2D)            (None, 128, 64, 64)  444800      concatenate_3[0][0]              
deconv2 (Conv2DTranspose)       (None, 64, 128, 128) 395328      concatenate_3[0][0]              
predict_flow3 (Conv2D)          (None, 2, 64, 64)    2306        inter_conv3[0][0]                
leaky_re_lu_17 (LeakyReLU)      (None, 64, 128, 128) 0           deconv2[0][0]                    
upsampled_flow3_to2 (Conv2DTran (None, 2, 128, 128)  66          predict_flow3[0][0]              
concatenate_4 (Concatenate)     (None, 194, 128, 128 0           leaky_re_lu_5[0][0]              
inter_conv2 (Conv2D)            (None, 64, 128, 128) 111808      concatenate_4[0][0]              
predict_flow2 (Conv2D)          (None, 2, 128, 128)  1154        inter_conv2[0][0]                
up_sampling2d_1 (UpSampling2D)  (None, 2, 512, 512)  0           predict_flow2[0][0]  
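
As a quick sanity check on the summary above, the parameter counts can be reproduced by hand: a 2-D convolution or transposed convolution with a bias has kernel_h * kernel_w * in_channels * out_channels + out_channels parameters. The helper name conv_params below is made up for illustration; the kernel sizes and channel counts are read from the two model listings in this post.

```python
def conv_params(kernel, in_ch, out_ch, bias=True):
    """Parameter count of a square-kernel Conv2D / Conv2DTranspose layer."""
    return kernel * kernel * in_ch * out_ch + (out_ch if bias else 0)

# (layer name, kernel size, in channels, out channels, count reported in the summary)
layers = [
    ("conv0", 3, 6, 64, 3520),
    ("conv1", 3, 64, 64, 36928),
    ("deconv5", 4, 1024, 512, 8389120),
    ("predict_flow6", 3, 1024, 2, 18434),
    ("upsampled_flow6_to_5", 4, 2, 2, 66),
]

for name, k, c_in, c_out, reported in layers:
    # Fails loudly if the hand-computed count disagrees with the summary.
    assert conv_params(k, c_in, c_out) == reported, name
    print(name, conv_params(k, c_in, c_out))
```

If a count in your own Keras model does not match, the layer most likely has the wrong kernel size or channel count, and the weight assignment later on would fail with a shape error.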


The corresponding PyTorch model prints as follows:

  (conv0): Sequential(
    (0): Conv2d(6, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv1): Sequential(
    (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv1_1): Sequential(
    (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv2): Sequential(
    (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv2_1): Sequential(
    (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv3): Sequential(
    (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv3_1): Sequential(
    (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv4): Sequential(
    (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv4_1): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv5): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv5_1): Sequential(
    (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv6): Sequential(
    (0): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (conv6_1): Sequential(
    (0): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (deconv5): Sequential(
    (0): ConvTranspose2d(1024, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (deconv4): Sequential(
    (0): ConvTranspose2d(1026, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (deconv3): Sequential(
    (0): ConvTranspose2d(770, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (deconv2): Sequential(
    (0): ConvTranspose2d(386, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.1, inplace)
  (inter_conv5): Sequential(
    (0): Conv2d(1026, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inter_conv4): Sequential(
    (0): Conv2d(770, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inter_conv3): Sequential(
    (0): Conv2d(386, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (inter_conv2): Sequential(
    (0): Conv2d(194, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (predict_flow6): Conv2d(1024, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (predict_flow5): Conv2d(512, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (predict_flow4): Conv2d(256, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (predict_flow3): Conv2d(128, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (predict_flow2): Conv2d(64, 2, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (upsampled_flow6_to_5): ConvTranspose2d(2, 2, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (upsampled_flow5_to_4): ConvTranspose2d(2, 2, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (upsampled_flow4_to_3): ConvTranspose2d(2, 2, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (upsampled_flow3_to_2): ConvTranspose2d(2, 2, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (upsample1): Upsample(scale_factor=4.0, mode=bilinear)
Iterating over the named modules prints each module together with its name:

conv0 Sequential(
  (0): Conv2d(6, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv0.0 Conv2d(6, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
conv0.1 LeakyReLU(negative_slope=0.1, inplace)
conv1 Sequential(
  (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv1.0 Conv2d(64, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
conv1.1 LeakyReLU(negative_slope=0.1, inplace)
conv1_1 Sequential(
  (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv1_1.0 Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
conv1_1.1 LeakyReLU(negative_slope=0.1, inplace)
conv2 Sequential(
  (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv2.0 Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
conv2.1 LeakyReLU(negative_slope=0.1, inplace)
conv2_1 Sequential(
  (0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv2_1.0 Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
conv2_1.1 LeakyReLU(negative_slope=0.1, inplace)
conv3 Sequential(
  (0): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv3.0 Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
conv3.1 LeakyReLU(negative_slope=0.1, inplace)
conv3_1 Sequential(
  (0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv3_1.0 Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
conv3_1.1 LeakyReLU(negative_slope=0.1, inplace)
conv4 Sequential(
  (0): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv4.0 Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
conv4.1 LeakyReLU(negative_slope=0.1, inplace)
conv4_1 Sequential(
  (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv4_1.0 Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
conv4_1.1 LeakyReLU(negative_slope=0.1, inplace)
conv5 Sequential(
  (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv5.0 Conv2d(512, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
conv5.1 LeakyReLU(negative_slope=0.1, inplace)
conv5_1 Sequential(
  (0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv5_1.0 Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
conv5_1.1 LeakyReLU(negative_slope=0.1, inplace)
conv6 Sequential(
  (0): Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv6.0 Conv2d(512, 1024, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
conv6.1 LeakyReLU(negative_slope=0.1, inplace)
conv6_1 Sequential(
  (0): Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
conv6_1.0 Conv2d(1024, 1024, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
conv6_1.1 LeakyReLU(negative_slope=0.1, inplace)
deconv5 Sequential(
  (0): ConvTranspose2d(1024, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
deconv5.0 ConvTranspose2d(1024, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
deconv5.1 LeakyReLU(negative_slope=0.1, inplace)
deconv4 Sequential(
  (0): ConvTranspose2d(1026, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
deconv4.0 ConvTranspose2d(1026, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
deconv4.1 LeakyReLU(negative_slope=0.1, inplace)
deconv3 Sequential(
  (0): ConvTranspose2d(770, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
deconv3.0 ConvTranspose2d(770, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
deconv3.1 LeakyReLU(negative_slope=0.1, inplace)
deconv2 Sequential(
  (0): ConvTranspose2d(386, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
  (1): LeakyReLU(negative_slope=0.1, inplace)
deconv2.0 ConvTranspose2d(386, 64, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
deconv2.1 LeakyReLU(negative_slope=0.1, inplace)
inter_conv5 Sequential(
  (0): Conv2d(1026, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
inter_conv5.0 Conv2d(1026, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
inter_conv4 Sequential(
  (0): Conv2d(770, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
inter_conv4.0 Conv2d(770, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
inter_conv3 Sequential(
  (0): Conv2d(386, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
inter_conv3.0 Conv2d(386, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
inter_conv2 Sequential(
  (0): Conv2d(194, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
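
The spatial sizes in the Keras summary can be checked against the PyTorch layer definitions with the usual output-size formulas. The sketch below uses hypothetical helper names (conv_out, deconv_out) and assumes dilation 1 and no output padding, which is what the printed layers show. It also confirms that padding by 1 and then convolving without padding, as the ZeroPadding2D layers in the summary do (514 to 256), gives the same size as PyTorch's padding=(1, 1).

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a convolution along one spatial axis."""
    return (n + 2 * padding - kernel) // stride + 1

def deconv_out(n, kernel, stride, padding):
    """Output length of a transposed convolution (no output padding, dilation 1)."""
    return (n - 1) * stride - 2 * padding + kernel

# Encoder: conv0 has stride 1, conv1..conv6 have stride 2 (kernel 3, padding 1).
size = 512
for name, stride in [("conv0", 1), ("conv1", 2), ("conv2", 2), ("conv3", 2),
                     ("conv4", 2), ("conv5", 2), ("conv6", 2)]:
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    print(name, size)

# Explicit zero padding (512 -> 514) followed by an unpadded stride-2 convolution
# gives the same result as PyTorch's padding=1.
assert conv_out(514, kernel=3, stride=2, padding=0) == conv_out(512, kernel=3, stride=2, padding=1)

# Decoder: deconv5 doubles the 8x8 feature map.
print("deconv5", deconv_out(8, kernel=4, stride=2, padding=1))
```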





  • PyTorch is channels_first, whereas Keras defaults to channels_last, so add these two lines at the top of the code:
  • from keras import backend as K
    K.set_image_data_format('channels_first')

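  • A convolution layer's weight is a 4-D tensor. Is its layout the same in PyTorch and Keras? It is not (otherwise this point would not be worth writing down), so the PyTorch weights have to be rearranged.
  • Since the convolution weight layout differs between the two frameworks, the transposed-convolution weight layout differs as well.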
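
To make the data-format point concrete: channels_first and channels_last hold the same values and differ only in axis order. Here is a minimal NumPy sketch, with array names and small sizes made up for illustration, of moving a batch from one layout to the other:

```python
import numpy as np

# A dummy batch in channels_last layout: (batch, height, width, channels).
batch_nhwc = np.arange(2 * 4 * 5 * 6, dtype=np.float32).reshape(2, 4, 5, 6)

# Move the channel axis to position 1 to get channels_first:
# (batch, channels, height, width).
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))
print(batch_nchw.shape)  # (2, 6, 4, 5)

# The same value is reachable under both index orders.
n, c, h, w = 1, 3, 2, 4
assert batch_nchw[n, c, h, w] == batch_nhwc[n, h, w, c]
```

With the Keras data format set to channels_first, the input tensors can keep the PyTorch order, which is why the model summary shows an input shape of (None, 6, 512, 512).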




    # model: the Keras model built above
    for layer in model.layers:
        for i, w in enumerate(layer.get_weights()):
            print(i, w.shape)

For the first convolution layer the output is as follows; index 0 is the shape of the kernel and index 1 is the shape of the bias:
0 (3, 3, 6, 64)
1 (64,)

So a Keras convolution layer stores its weight as [height, width, input_channels, out_channels].


    net = FlowNet2SD()
    for n, m in net.named_parameters():
        print(m.shape)

torch.Size([64, 6, 3, 3])

Comparing the two, a PyTorch convolution layer stores its weight as [out_channels, input_channels, height, width].
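
Going from the PyTorch order to the Keras order is a pure axis permutation, so no values change, only the index order. A small NumPy sketch with a random stand-in for the first convolution's weight (the variable names are illustrative, not from the original code):

```python
import numpy as np

# Stand-in for the first PyTorch conv weight: (out_channels, in_channels, height, width).
w_torch = np.random.rand(64, 6, 3, 3).astype(np.float32)

# Reorder the axes to (height, width, in_channels, out_channels).
w_keras = np.transpose(w_torch, (2, 3, 1, 0))
print(w_keras.shape)  # (3, 3, 6, 64)

# Each element keeps its meaning; only its index order changes.
o, i, h, w = 10, 4, 2, 1
assert w_keras[h, w, i, o] == w_torch[o, i, h, w]
```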



0 (4, 4, 256, 1026)
1 (256,)

So in Keras a transposed-convolution weight has the layout [height, width, out_channels, input_channels].


torch.Size([1026, 256, 4, 4])

So in PyTorch a transposed-convolution weight has the layout [input_channels, out_channels, height, width].


  • For a convolution layer, the PyTorch weight (written w here, as a NumPy array) needs to be rearranged with

np.transpose(w, [2, 3, 1, 0])


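  • For a transposed convolution the same permutation works: [input_channels, out_channels, height, width] becomes [height, width, out_channels, input_channels], which is exactly the Keras layout. Try it yourself if you don't believe me.
  • Biases are 1-D vectors in both frameworks and need no processing.
  • In some cases the channel order may also need to be reversed, but this is rarely necessary.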
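
The rules in the list above fit into one small function. This is a sketch under the assumption that every 4-D parameter is a convolution or transposed-convolution kernel and everything else is a bias; the function name to_keras_layout and the zero-filled arrays are illustrative only, with shapes taken from the outputs shown earlier.

```python
import numpy as np

def to_keras_layout(array):
    """Convert one PyTorch parameter (as a NumPy array) to the Keras layout."""
    if array.ndim == 4:
        # Conv2d:          (out, in, h, w) -> (h, w, in, out)
        # ConvTranspose2d: (in, out, h, w) -> (h, w, out, in)
        return np.transpose(array, (2, 3, 1, 0))
    return array  # biases are 1-D and stay unchanged

params = {
    "conv0.0.weight": np.zeros((64, 6, 3, 3), dtype=np.float32),
    "conv0.0.bias": np.zeros((64,), dtype=np.float32),
    "deconv4.0.weight": np.zeros((1026, 256, 4, 4), dtype=np.float32),
}
converted = {k: to_keras_layout(v) for k, v in params.items()}
for k, v in converted.items():
    print(k, v.shape)
```

Dispatching on the number of dimensions rather than on the parameter name is a design choice of this sketch; checking for 'bias' in the name works equally well for this model.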




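# weights_from_torch: assumed to be the PyTorch state dict (name -> tensor)
for k, v in weights_from_torch.items():
    w = v.detach().cpu().numpy()
    if 'bias' not in k:
        w = w.transpose(2, 3, 1, 0)  # rearrange to the Keras layout
    weights_from_torch[k] = w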


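model = k_model()  # k_model: assumed to be the function that builds the Keras model
for layer in model.layers:
    current_layer_name = layer.name
    if current_layer_name == 'conv0':
        weights = [weights_from_torch['conv0.0.weight'], weights_from_torch['conv0.0.bias']]
    elif current_layer_name == 'conv1':
        weights = [weights_from_torch['conv1.0.weight'], weights_from_torch['conv1.0.bias']]
    elif current_layer_name == 'conv1_1':
        weights = [weights_from_torch['conv1_1.0.weight'], weights_from_torch['conv1_1.0.bias']]
    # ... the other layers with weights are handled the same way
    else:
        continue  # nothing to assign for this layer
    layer.set_weights(weights)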

The assignment is done with set_weights. Its argument must be a list in the same form that get_weights returns, that is, [conv_weights, bias_weights].
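
Because the PyTorch parameter names follow a regular pattern, the long if/elif chain can be replaced by a small lookup. The sketch below is one possible way to do it, not the original author's code: the names ALIASES and torch_keys are made up, and the naming rules are read off the two listings above (modules wrapped in Sequential use name.0.weight, bare modules use name.weight, and three Keras layer names in the summary lack the underscore that the PyTorch names have).

```python
# Keras layer names that differ from the PyTorch module names (from the two listings).
ALIASES = {
    "upsampled_flow5_to4": "upsampled_flow5_to_4",
    "upsampled_flow4_to3": "upsampled_flow4_to_3",
    "upsampled_flow3_to2": "upsampled_flow3_to_2",
}

def torch_keys(keras_name):
    """Return the (weight, bias) state-dict keys for a Keras layer name, or None."""
    name = ALIASES.get(keras_name, keras_name)
    if name.startswith(("conv", "deconv", "inter_conv")):
        prefix = name + ".0"  # wrapped in Sequential; the conv is sub-module 0
    elif name.startswith(("predict_flow", "upsampled_flow")):
        prefix = name         # bare Conv2d / ConvTranspose2d
    else:
        return None           # activation, padding, concatenate: no weights
    return prefix + ".weight", prefix + ".bias"

print(torch_keys("conv1_1"))              # ('conv1_1.0.weight', 'conv1_1.0.bias')
print(torch_keys("upsampled_flow5_to4"))  # ('upsampled_flow5_to_4.weight', 'upsampled_flow5_to_4.bias')
print(torch_keys("leaky_re_lu_3"))        # None
```

Inside the loop over the Keras layers, a non-None result (w_key, b_key) would then be used as layer.set_weights([weights_from_torch[w_key], weights_from_torch[b_key]]).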

