The core of the first course in Andrew Ng's Deep Learning specialization is understanding three steps: forward propagation, computing the cost, and backward propagation (honestly, if you calmly work through the formulas from the lectures on scratch paper once, it isn't that hard). Ng teaches these concepts mainly through logistic regression.
First, how the input samples are handled. A color image has three RGB channels, which the computer stores as three matrices of pixel values. For a 64x64-pixel color image (matching num_px = 64 in the code below), the total number of values is 64 x 64 x 3 = 12288. We represent the image with a feature vector x whose dimension equals that pixel count, i.e. n_x = 12288. That is one sample; for m samples we can define a two-dimensional matrix X whose rows are the features and whose columns are the samples. In code:
import numpy as np
import imageio
from PIL import Image

image = np.array(imageio.imread(r"cat.jpg"))  # read the image as an (H, W, 3) array
num_px = 64  # target image size
# Resize to num_px x num_px; ANTIALIAS is the high-quality filter (renamed LANCZOS in newer Pillow)
my_image = np.array(Image.fromarray(image).resize((num_px, num_px), Image.ANTIALIAS))
my_image = my_image.reshape((1, num_px * num_px * 3)).T  # flatten to a (num_px*num_px*3, 1) column
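The snippet above flattens one image into a single column; to build the (n_x, m) matrix X described earlier, the columns of all m samples are stacked side by side. A minimal sketch (the flatten_image helper and the filenames are hypothetical):

def flatten_image(fname, num_px=64):
    # Hypothetical helper wrapping the read-resize-reshape steps above
    img = np.array(imageio.imread(fname))
    img = np.array(Image.fromarray(img).resize((num_px, num_px), Image.ANTIALIAS))
    return img.reshape((1, num_px * num_px * 3)).T

# Stack m columns into X of shape (num_px * num_px * 3, m)
X = np.hstack([flatten_image(f) for f in ["cat1.jpg", "cat2.jpg", "cat3.jpg"]])
X = X / 255.0  # the course also normalizes pixel values to [0, 1]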
With the input ready, we can now build our neural network. Take a two-layer network as an example, i.e. one with a single hidden layer.
Suppose the input layer receives m samples, each of size num_px x num_px, so the input has shape (n^[0], m) = (num_px*num_px*3, m); the hidden layer has n^[1] neurons and the output layer has n^[2] neurons. First, the weights W and biases b need to be initialized:
def param_init(layer_num, layer_dims):
    # layer_dims[i] is the number of units in layer i (layer 0 is the input layer)
    W = []
    b = []
    for i in range(layer_num):
        sW = np.random.randn(layer_dims[i+1], layer_dims[i])  # random initialization
        sb = np.zeros((layer_dims[i+1], 1))                   # biases start at zero
        W.append(sW)
        b.append(sb)
    return W, b
Alternatively, the initialized values can be stored in a dictionary:
def init_parameters(layer_dims):
    parameters = {}
    for i in range(1, len(layer_dims)):
        parameters['W' + str(i)] = np.random.randn(layer_dims[i], layer_dims[i-1])
        parameters['b' + str(i)] = np.zeros((layer_dims[i], 1))
        assert parameters['W' + str(i)].shape == (layer_dims[i], layer_dims[i-1])
        assert parameters['b' + str(i)].shape == (layer_dims[i], 1)
    return parameters
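For instance, with an illustrative layer_dims of [12288, 4, 1] (a 64x64x3 input, a 4-unit hidden layer, one output unit), the returned shapes are:

parameters = init_parameters([12288, 4, 1])
print(parameters['W1'].shape)  # (4, 12288)
print(parameters['b1'].shape)  # (4, 1)
print(parameters['W2'].shape)  # (1, 4)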
W^[1] has shape (n^[1], n^[0]), b^[1] has shape (n^[1], 1), W^[2] has shape (n^[2], n^[1]), and b^[2] has shape (n^[2], 1). Now for the derivation of the three steps.
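In the course's notation, the key formulas for forward propagation, the cost, and backward propagation (which the code below implements directly) are:

$$Z^{[l]} = W^{[l]}A^{[l-1]} + b^{[l]}, \qquad A^{[l]} = g^{[l]}(Z^{[l]}), \qquad A^{[0]} = X$$

$$J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{[L](i)} + (1-y^{(i)})\log\left(1-a^{[L](i)}\right)\right]$$

$$dZ^{[L]} = A^{[L]} - Y, \qquad dW^{[l]} = \frac{1}{m}\,dZ^{[l]}A^{[l-1]T}, \qquad db^{[l]} = \frac{1}{m}\sum_{i=1}^{m}dZ^{[l](i)}$$

$$dA^{[l-1]} = W^{[l]T}dZ^{[l]}, \qquad dZ^{[l-1]} = dA^{[l-1]} \ast g^{[l-1]\prime}\left(Z^{[l-1]}\right)$$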
def forward_propagation(X, W, b, layer_num, layer_size, activate_fun):
    # Cache the linear outputs Z and activations A of every layer for backprop
    Z = []
    A = []
    for i in range(layer_num):
        if i == 0:
            sZ = np.dot(W[i], X) + b[i]       # the first layer reads the input X
        else:
            sZ = np.dot(W[i], A[i-1]) + b[i]  # later layers read the previous activation
        sA = activate_fun[i](sZ)
        Z.append(sZ)
        A.append(sA)
    return Z, A
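A quick way to sanity-check the function (my own toy example, not from the post) is to run it on random data and inspect the output shapes:

Xt = np.random.randn(3, 5)                   # 3 features, 5 samples
Wt, bt = param_init(2, [3, 4, 1])            # one hidden layer with 4 units
acts = [lambda z: np.maximum(0, z),          # ReLU
        lambda z: 1.0 / (1.0 + np.exp(-z))]  # sigmoid
Zt, At = forward_propagation(Xt, Wt, bt, 2, [3, 4, 1], acts)
print(At[0].shape, At[1].shape)              # (4, 5) (1, 5)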
def compute_cost(prediction, Y):
    # Cross-entropy cost; the 1e-5 offset guards against log(0)
    m = Y.shape[1]
    logprobs = np.multiply(np.log(prediction + 1e-5), Y) + np.multiply((1 - Y), np.log(1 - prediction + 1e-5))
    cost = (-1. / m) * np.nansum(logprobs)
    cost = np.squeeze(cost)  # turn a (1, 1) array into a scalar
    return cost
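A quick sanity check (my own example): an uninformed prediction of 0.5 for every sample should cost about ln 2 ≈ 0.693:

Yt = np.array([[1, 0, 1, 0]])
print(compute_cost(np.full((1, 4), 0.5), Yt))  # ~0.6931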
def backward_propagation(l, X, Y, W, Z, A, derivate_function):
    # l is the number of layers; Z and A are the caches from forward propagation
    dZ = [None] * l
    dA = [None] * l
    dW = [None] * l
    db = [None] * l
    m = Y.shape[1]
    dZ[l-1] = A[l-1] - Y  # sigmoid output + cross-entropy cost gives this simple form
    for i in range(l-1, -1, -1):
        if i > 0:
            dW[i] = (1/m) * np.dot(dZ[i], A[i-1].T)
        else:
            dW[i] = (1/m) * np.dot(dZ[i], X.T)  # the first layer's input is X
        db[i] = (1/m) * np.sum(dZ[i], axis=1, keepdims=True)
        if i > 0:
            # Propagate the gradient back through layer i-1's activation function
            dA[i-1] = np.dot(W[i].T, dZ[i])
            dZ[i-1] = np.multiply(dA[i-1], derivate_function[i-1](Z[i-1]))
    return dW, db
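To build confidence in hand-written backprop, the course recommends numerical gradient checking: compare an analytic gradient entry against a two-sided finite difference of the cost. A minimal sketch (my own illustration, with hypothetical argument defaults):

def numeric_grad_entry(X, Y, W, b, layer_num, layer_size, activate_fun,
                       layer=0, row=0, col=0, eps=1e-6):
    # Perturb one weight entry up and down and difference the resulting costs
    def cost_at(delta):
        W[layer][row, col] += delta
        _, A = forward_propagation(X, W, b, layer_num, layer_size, activate_fun)
        c = compute_cost(A[layer_num-1], Y)
        W[layer][row, col] -= delta  # undo the perturbation
        return c
    return (cost_at(eps) - cost_at(-eps)) / (2 * eps)

The result should be close to dW[layer][row, col] from backward_propagation (up to the small 1e-5 smoothing inside compute_cost).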
def update_param(W, b, dW, db, learning_rate=0.5):
    # One gradient-descent step on every layer's parameters
    for i in range(len(W)):
        W[i] = W[i] - learning_rate * dW[i]
        b[i] = b[i] - learning_rate * db[i]
    return W, b
def predict(X, W, b, layer_num, layer_size, activate_fun):
    # Compute probabilities by forward propagation, then classify to 0/1 with 0.5 as the threshold
    Z, A = forward_propagation(X, W, b, layer_num, layer_size, activate_fun)
    predictions = np.round(A[layer_num-1])
    return predictions
The idea behind the code above is to build a program that can construct a network with any number of layers and any number of neurons per layer. Next, I verify it with the week-3 assignment from the first course of Ng's Deep Learning specialization.
import sklearn.linear_model

# load_planar_dataset and plot_decision_boundary are helpers from the course's planar_utils
X, Y = load_planar_dataset()

# Baseline: classification result using logistic regression
clf = sklearn.linear_model.LogisticRegressionCV()
clf.fit(X.T, Y.T.ravel())
plot_decision_boundary(lambda x: clf.predict(x), X, Y.reshape(X[0, :].shape))
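The activation functions and derivatives referenced below (relu, sigmoid, derivate_relu, derivate_sigm) are not defined in the post; a minimal sketch of what they presumably look like:

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def derivate_relu(z):
    return np.int64(z > 0)  # ReLU'(z) = 1 where z > 0, else 0

def derivate_sigm(z):
    s = sigmoid(z)
    return s * (1 - s)  # sigmoid'(z) = sigmoid(z) * (1 - sigmoid(z))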
Continuing: here I build a four-layer (layer_num) network whose layers, from the input onward, have 6, 4, 4, and 1 neurons:
import matplotlib.pyplot as plt

layer_num = 4
layer_size = [X.shape[0], 6, 4, 4, 1]
activate_fun = [relu, relu, relu, sigmoid]
derivate_function = [derivate_relu, derivate_relu, derivate_relu, derivate_sigm]
W, b = param_init(layer_num, layer_size)

for i in range(20000):
    Z, A = forward_propagation(X, W, b, layer_num, layer_size, activate_fun)
    cost = compute_cost(A[layer_num-1], Y)
    dW, db = backward_propagation(layer_num, X, Y, W, Z, A, derivate_function)
    W, b = update_param(W, b, dW, db, learning_rate=0.07)
    if i % 100 == 0:
        print("Cost after iteration %i: %f" % (i, cost))

# Print accuracy: Y.pred^T counts true positives, (1-Y).(1-pred)^T counts true negatives
predictions = predict(X, W, b, layer_num, layer_size, activate_fun)
print('Accuracy: %d' % float((np.dot(Y, predictions.T) + np.dot(1 - Y, 1 - predictions.T)) / float(Y.size) * 100) + '%')

plot_decision_boundary(lambda x: predict(x.T, W, b, layer_num, layer_size, activate_fun), X, Y.reshape(X[0, :].shape))
plt.title("Decision boundary for a " + str(layer_num) + "-layer network")
Result (decision-boundary plot omitted): the accuracy is 90%. That is not high, but the model clearly works. I then used another binary-classification dataset for further validation, set up as follows:
# Datasets (load_extra_datasets is another helper from the course)
noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure = load_extra_datasets()

datasets = {"noisy_circles": noisy_circles,
            "noisy_moons": noisy_moons,
            "blobs": blobs,
            "gaussian_quantiles": gaussian_quantiles}

dataset = "noisy_moons"  # choose your dataset

X, Y = datasets[dataset]
X, Y = X.T, Y.reshape(1, Y.shape[0])  # transpose to (features, samples); make Y a row vector
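The rerun itself is not shown in the post; presumably the same initialization and training loop from above is repeated on the new data, along these lines:

layer_size = [X.shape[0], 6, 4, 4, 1]  # rebuild for the new input dimension
W, b = param_init(layer_num, layer_size)
for i in range(20000):
    Z, A = forward_propagation(X, W, b, layer_num, layer_size, activate_fun)
    dW, db = backward_propagation(layer_num, X, Y, W, Z, A, derivate_function)
    W, b = update_param(W, b, dW, db, learning_rate=0.07)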
Being a bit lazy, I did not optimize the code any further, but it counts as a solid learning outcome for the first course. The takeaway is the same as before: calm down and work through it slowly. You can always derive it.
Code link: https://download.csdn.net/download/weixin_42149550/11634095