The single neuron in the figure above computes the function shown in the figure below. That function is called the ReLU (Rectified Linear Unit) function.
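As a minimal sketch (assuming numpy is available), ReLU simply outputs the maximum of 0 and its input:

```python
import numpy as np

def relu(z):
    # ReLU: 0 for negative inputs, the input itself otherwise
    return np.maximum(0, z)

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0.  0.  3.5]
```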
Complex neural networks are built out of these simple single neurons. Once the network is implemented, you only need to feed it an input to get the result.
The x column is the input, y is the output, and the units in between are hidden units, which the neural network defines on its own.
| Domain | Neural network used |
| --- | --- |
| Real-estate price prediction and similar problems | Standard NN (standard architecture) |
| Computer vision | Convolutional Neural Network (CNN) |
| Audio, text translation (one-dimensional sequence problems) | Recurrent Neural Network (RNN) |
| Autonomous driving | More complex architectures, or a combination of several architectures |
Structured data: every feature has a well-defined meaning.
Unstructured data: harder to process.
Aim: to be able to handle both structured and unstructured data.
The lowercase letter m denotes the size of the dataset; the data is labeled (each example has an input x and an output y).
Algorithmic innovations often come down to making neural networks run faster.
For example, replacing the sigmoid function with the ReLU function led to a significant improvement in learning.
(x, y) is a single example, where x is the feature vector and y is the label; for binary classification, y is 0 or 1.
The dimension of x is $n_x$ (sometimes written simply as $n$).
Use $m$ for the number of examples in the dataset: $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$.
The matrix $X$ is shown below; it has $n_x$ rows and $m$ columns:

$$X = \left[ \begin{matrix} \vdots & \vdots & \cdots & \vdots\\ x^{(1)} & x^{(2)} & \cdots & x^{(m)}\\ \vdots & \vdots & \cdots & \vdots\\ \end{matrix} \right]$$
Here it is best to treat each example as a column vector rather than a row vector.
The matrix $Y$ is shown below; it has 1 row and $m$ columns:

$$Y = \left[ \begin{matrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{matrix} \right]$$
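A minimal numpy sketch of this layout (toy values, purely to illustrate the shapes): each example $x^{(i)}$ becomes one column of $X$, and $Y$ is a single row of labels.

```python
import numpy as np

# toy data: n_x = 3 features, m = 4 examples
examples = [np.array([1.0, 2.0, 3.0]),
            np.array([4.0, 5.0, 6.0]),
            np.array([7.0, 8.0, 9.0]),
            np.array([0.0, 1.0, 0.0])]
labels = [1, 0, 1, 0]

X = np.column_stack(examples)         # shape (n_x, m) = (3, 4): one example per column
Y = np.array(labels).reshape(1, -1)   # shape (1, m) = (1, 4)

print(X.shape, Y.shape)  # (3, 4) (1, 4)
```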
Given an input x from the training set, producing the output y really means computing $\hat{y}$: for example, given an image, compute the probability that it is a cat picture.
$x \in R^{n_x}$. The logistic regression parameters are $w \in R^{n_x}$ and $b \in R$.
A naive output would be $\hat{y} = w^T x + b$, which is a linear function. But we want a probability, and this linear expression can easily be greater than 1 or negative, which is not a meaningful probability. Hence the sigmoid function: $\sigma(z) = \frac{1}{1+e^{-z}}$. Sigmoid is a classic activation function; other activation functions are also used to get better results, and the next set of notes will cover them in more detail.
Here w and b are treated as two separate parameters, where b plays the role of the intercept (bias) term.
$\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \frac{1}{1+e^{-z}}$.
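A toy numerical check of this formula (the values below are made up purely for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

w = np.array([[0.5], [-0.3]])   # shape (n_x, 1)
b = 0.1
x = np.array([[2.0], [1.0]])    # one example, shape (n_x, 1)

z = np.dot(w.T, x) + b          # linear part: can be any real number
y_hat = sigmoid(z)              # squashed into (0, 1), so it can be read as a probability
print(z, y_hat)                 # [[0.8]] [[0.68997448]]
```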
The goal is to find w and b from the given training set of m examples $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$, so that the output $\hat{y}^{(i)}$ is as close as possible to $y^{(i)}$.
Loss function: ideally it should be convex.

$$L(\hat{y},y) = -\left(y\log(\hat{y}) + (1-y)\log(1-\hat{y})\right)$$

The loss function is defined on a single training example and measures performance on that example. To measure performance over the entire training set, we use the cost function:

$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$$
Ultimately, we want to make this cost function as small as possible.
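To make the formulas concrete, here is a small numpy sketch computing the loss and cost on made-up predictions:

```python
import numpy as np

def loss(y_hat, y):
    # cross-entropy loss for a single example
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y_hat = np.array([0.9, 0.2, 0.7])  # predicted probabilities for m = 3 examples
y     = np.array([1.0, 0.0, 1.0])  # true labels

J = np.mean(loss(y_hat, y))        # cost = average of the per-example losses
print(J)                           # ≈ 0.2284
```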
The cost function in the figure is convex; we want values of w and b that minimize J.
The basic method: initialize at some point, then repeatedly take a step of a certain size in the steepest downhill direction, hoping eventually to converge to the global optimum. Each step is one iteration.
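Written out, each gradient-descent step moves the parameters against the gradient, with the learning rate $\alpha$ controlling the step size:

$$w := w - \alpha\,\frac{\partial J(w,b)}{\partial w}, \qquad b := b - \alpha\,\frac{\partial J(w,b)}{\partial b}$$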
In the example, $J(a,b,c) = 3(a + bc)$.
Below is the forward pass, i.e., the computation of the cost J.
Next comes the backward pass, i.e., computing the partial derivative of the cost with respect to each variable.
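For this example, introduce the intermediate variables $u = bc$ and $v = a + u$, so that $J = 3v$. The chain rule then gives the partial derivatives computed in the backward pass:

$$\frac{\partial J}{\partial v} = 3, \qquad \frac{\partial J}{\partial a} = \frac{\partial J}{\partial v}\cdot\frac{\partial v}{\partial a} = 3, \qquad \frac{\partial J}{\partial b} = \frac{\partial J}{\partial v}\cdot\frac{\partial v}{\partial u}\cdot\frac{\partial u}{\partial b} = 3c, \qquad \frac{\partial J}{\partial c} = 3b$$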
Put a large number of examples into high-dimensional arrays and use numpy in Python for the linear algebra (parallelized computation), instead of writing nested for loops.
Here are two examples from the exercise.
import numpy as np  # needed below for np.zeros, np.random.rand and the vectorized versions
import time
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot += x1[i] * x2[i]
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i, j] = x1[i] * x2[j]
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i] * x2[i]
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i, j] * x1[j]
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
######################## Above: the classic (loop-based) implementations. Below: the vectorized versions. ########################
x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]
### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
dot = np.dot(W,x1)
toc = time.process_time()
print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")
import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset
#%matplotlib inline
# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
# Example of a picture
index = 20
plt.imshow(train_set_x_orig[index])
#plt.waitforbuttonpress()
print(index)
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") + "' picture.")
m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###
print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
# Reshape the training and test examples
### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
### END CODE HERE ###
print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))
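# Standardize the dataset: pixel values range from 0 to 255, so dividing by 255 scales every feature to [0, 1]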
train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.
# GRADED FUNCTION: sigmoid
def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-z))
    ### END CODE HERE ###

    return s
print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))
# GRADED FUNCTION: initialize_with_zeros
def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.

    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)

    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))

    return w, b
dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))
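For reference, with $A$ the row vector of activations over all $m$ examples, the forward and backward formulas implemented in `propagate` below are:

$$A = \sigma(w^T X + b), \qquad J = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log a^{(i)} + (1-y^{(i)})\log(1-a^{(i)})\right]$$

$$\frac{\partial J}{\partial w} = \frac{1}{m}\,X\,(A-Y)^T, \qquad \frac{\partial J}{\partial b} = \frac{1}{m}\sum_{i=1}^{m}\left(a^{(i)} - y^{(i)}\right)$$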
# GRADED FUNCTION: propagate
def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b

    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    m = X.shape[1]

    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X) + b)                                   # compute activation
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # compute cost
    ### END CODE HERE ###

    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())

    grads = {"dw": dw,
             "db": db}

    return grads, cost
w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))
# GRADED FUNCTION: optimize
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.

    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    costs = []

    for i in range(num_iterations):

        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ###
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###

        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]

        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###

        # Record the costs
        if i % 100 == 0:
            costs.append(cost)

        # Print the cost every 100 training iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)
print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print(costs)
# GRADED FUNCTION: predict
def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    m = X.shape[1]
    Y_prediction = np.zeros((1, m))
    w = w.reshape(X.shape[0], 1)

    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)
    ### END CODE HERE ###

    for i in range(A.shape[1]):

        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        if A[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
        ### END CODE HERE ###

    assert(Y_prediction.shape == (1, m))

    return Y_prediction
print ("predictions = " + str(predict(w, b, X)))
# GRADED FUNCTION: model
def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously

    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations

    Returns:
    d -- dictionary containing information about the model.
    """
    ### START CODE HERE ###

    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]

    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train" : Y_prediction_train,
         "w" : w,
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}

    return d
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)
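As the `optimize` docstring notes, the returned `costs` list can be used to plot the learning curve. A minimal sketch using the `d` returned above:

```python
# Plot the learning curve (one cost value was recorded every 100 iterations)
costs = np.squeeze(d["costs"])
plt.plot(costs)
plt.ylabel("cost")
plt.xlabel("iterations (per hundreds)")
plt.title("Learning rate = " + str(d["learning_rate"]))
plt.show()
```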
# Example of a picture that was wrongly classified.
index = 1
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
#print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0,index])].decode("utf-8") + "\" picture.")
import matplotlib
## START CODE HERE ## (PUT YOUR IMAGE NAME)
my_image = "my_image2.jpg" # change this to the name of your image file
#my_label_y = [1]
## END CODE HERE ##
# We preprocess the image to fit your algorithm.
fname = "/home/star/project/python_project/Logistic_regression/images/" + my_image
image = np.array(matplotlib.pyplot.imread(fname))
my_image = np.array(Image.fromarray(image).resize(size=(num_px,num_px))).reshape((1, num_px*num_px*3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)
plt.imshow(image)
plt.show()
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")
# When I ran this in Jupyter Notebook myself, the last step raised an error; after some troubleshooting it turned out to be a scipy version issue, and everything ran fine once I set up a local environment.
The code above copies everything from the exercise and runs successfully in VS Code. If you want to run it yourself, remember to update all the file paths in the code, including the ones in lr_utils.py.
The related resources can be downloaded from my Gitee repository:
https://gitee.com/Laurie_xzh/note-book.git
They are also available in my uploaded resources.