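The main point first:
Why does the convolution kernel have an extra dimension? Because of channels. Take MFCC, a feature commonly used in speech processing: the network input usually has shape [timestep, num_mfcc]. Applying a one-dimensional convolution to it gives: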
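import numpy as np
import tensorflow as tf
# TF 1.x only: TF 2.x runs eagerly by default, so remove this call there.
tf.enable_eager_execution()
# Fixes the random input only; the layer's kernel is still randomly initialized.
np.random.seed(0)
timestep = 2
num_mfcc = 3
# Input shape: [batch, timestep, num_mfcc]
mfcc = np.random.rand(1, timestep, num_mfcc)
print(mfcc.shape)
print(mfcc)
mfcc = tf.convert_to_tensor(mfcc, dtype=tf.float32, name='mfcc')
# Conv1D treats the last axis (num_mfcc) as the channel axis.
out = tf.keras.layers.Conv1D(filters=4, kernel_size=2, strides=1, padding="same")(mfcc)
print(out.numpy().shape)
print(out.numpy())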
# Output (the convolution values change from run to run because the kernel is randomly initialized)
(1, 2, 3)
[[[0.5488135 0.71518937 0.60276338]
[0.54488318 0.4236548 0.64589411]]]
(1, 2, 4)
[[[ 0.07261014 -0.5585205 0.29929006 0.0478835 ]
[ 0.12995268 -0.06531391 0.13798538 -0.10690583]]]
With a two-dimensional convolution instead:
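import numpy as np
import tensorflow as tf
# TF 1.x only: TF 2.x runs eagerly by default, so remove this call there.
tf.enable_eager_execution()
# Fixes the random input only; the layer's kernel is still randomly initialized.
np.random.seed(0)
timestep = 2
num_mfcc = 3
#mfcc = np.random.rand(1,timestep,num_mfcc)
# Input shape: [batch, timestep, num_mfcc, channels], with a single channel added at the end
mfcc = np.random.rand(1, timestep, num_mfcc, 1)
print(mfcc.shape)
print(mfcc)
mfcc = tf.convert_to_tensor(mfcc, dtype=tf.float32, name='mfcc')
# Conv2D slides over both timestep and num_mfcc; only the last axis is the channel axis.
out = tf.keras.layers.Conv2D(filters=4, kernel_size=2, strides=1, padding="same")(mfcc)
print(out.numpy().shape)
print(out.numpy())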
# Output
(1, 2, 3, 1)
[[[[0.5488135 ]
[0.71518937]
[0.60276338]]
[[0.54488318]
[0.4236548 ]
[0.64589411]]]]
(1, 2, 3, 4)
[[[[-0.03934591 -0.332482 -0.05778596 -0.24500753]
[-0.09447848 -0.49875036 -0.15079793 -0.08275565]
[ 0.11487824 -0.10501289 -0.1826699 -0.3210651 ]]
[[ 0.22380978 -0.15814964 -0.11571192 -0.04434876]
[ 0.20537378 -0.13075545 -0.03344373 -0.0848015 ]
[ 0.2155428 -0.17510419 -0.22685075 0.0272733 ]]]]
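Even though the input data is the same, the results are very different. A one-dimensional convolution treats the num_mfcc axis as the channel axis of a one-dimensional signal, so each kernel has size kernel_size*num_mfcc. A two-dimensional convolution treats the MFCC matrix as an image whose width is num_mfcc and which has only one channel, so each kernel has size kernel_size*kernel_size*1.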
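To make the two kernel shapes concrete, here is a minimal NumPy sketch that reimplements both convolutions by hand. It assumes stride 1, zero padding in the style of padding="same", and no bias. The helper names conv1d_same and conv2d_same are made up for illustration and are not TensorFlow functions, and the kernels are random, so only the shapes are meaningful.

```python
import numpy as np

def conv1d_same(x, kernel):
    """x: [batch, timestep, channels]; kernel: [kernel_size, channels, filters]."""
    k = kernel.shape[0]
    pad_left = (k - 1) // 2
    pad_right = (k - 1) - pad_left
    # Zero-pad the time axis only; the channel axis is never padded.
    xp = np.pad(x, ((0, 0), (pad_left, pad_right), (0, 0)))
    steps = x.shape[1]
    out = np.zeros((x.shape[0], steps, kernel.shape[2]))
    for t in range(steps):
        window = xp[:, t:t + k, :]  # [batch, k, channels]
        # Each filter sums over the time window AND over all channels.
        out[:, t, :] = np.tensordot(window, kernel, axes=([1, 2], [0, 1]))
    return out

def conv2d_same(x, kernel):
    """x: [batch, height, width, channels]; kernel: [kh, kw, channels, filters]."""
    kh, kw = kernel.shape[:2]
    top, left = (kh - 1) // 2, (kw - 1) // 2
    xp = np.pad(x, ((0, 0), (top, kh - 1 - top), (left, kw - 1 - left), (0, 0)))
    h, w = x.shape[1:3]
    out = np.zeros((x.shape[0], h, w, kernel.shape[3]))
    for i in range(h):
        for j in range(w):
            window = xp[:, i:i + kh, j:j + kw, :]  # [batch, kh, kw, channels]
            out[:, i, j, :] = np.tensordot(window, kernel, axes=([1, 2, 3], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
mfcc = rng.random((1, 2, 3))             # [batch, timestep, num_mfcc]
k1 = rng.standard_normal((2, 3, 4))      # kernel_size x num_mfcc x filters
k2 = rng.standard_normal((2, 2, 1, 4))   # kernel_size x kernel_size x 1 x filters

out1 = conv1d_same(mfcc, k1)
out2 = conv2d_same(mfcc[..., np.newaxis], k2)
print(k1.shape, out1.shape)  # (2, 3, 4) (1, 2, 4)
print(k2.shape, out2.shape)  # (2, 2, 1, 4) (1, 2, 3, 4)
```

In the one-dimensional case the num_mfcc axis is consumed by the sum over channels, so it disappears from the output. In the two-dimensional case it is a spatial axis and is kept.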
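With padding="same", as in the examples above, the output shape depends on only two parameters of the convolution: the stride (strides) and the number of kernels (filters). Note that this is not the kernel size (kernel_size). With padding="valid", kernel_size also shortens the output.
The stride determines the new timestep of the output, and the number of kernels determines the size of its last dimension (usually the feature dimension).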
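As a rough sketch of how the output length along the time axis follows from the layer arguments, the helper below applies the length rules that TensorFlow documents for "same" and "valid" padding, assuming no dilation. The name conv_output_length is hypothetical and is not a TensorFlow function.

```python
import math

def conv_output_length(input_length, kernel_size, stride, padding):
    """Length of one convolved axis, assuming dilation_rate == 1."""
    if padding == "same":
        # The input is zero-padded so that only the stride shortens the output.
        return math.ceil(input_length / stride)
    if padding == "valid":
        # No padding: the kernel must fit entirely inside the input.
        return math.ceil((input_length - kernel_size + 1) / stride)
    raise ValueError("padding must be 'same' or 'valid'")

print(conv_output_length(2, kernel_size=2, stride=1, padding="same"))     # 2
print(conv_output_length(2, kernel_size=2, stride=1, padding="valid"))    # 1
print(conv_output_length(100, kernel_size=3, stride=2, padding="same"))   # 50
print(conv_output_length(100, kernel_size=3, stride=2, padding="valid"))  # 49
```

The size of the last output dimension needs no formula: it equals filters in every case.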
Official documentation:
tf.keras.layers.Conv1D: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv1D
tf.keras.layers.MaxPool1D: https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool1D
tf.keras.layers.Conv2D: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D
tf.keras.layers.MaxPool2D: https://www.tensorflow.org/api_docs/python/tf/keras/layers/MaxPool2D