Andrew Ng Deep Learning Notes (1): Course 1, Week 2

Introduction to Deep Learning

What Is a Neural Network?

[Figure 1]
The single neuron in the figure above computes the function shown in the figure below, which is called the ReLU (Rectified Linear Unit) function.
[Figure 2]
Complex neural networks are built out of these simple single neurons. Once the network is implemented, you only need to feed it the input to get a result.
[Figure 3]

The x column is the input and y is the output; the units in between are hidden units, whose features the neural network works out on its own.
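As a concrete illustration (a minimal sketch, not part of the original notes, assuming only NumPy is available), a single neuron of this kind is just a linear combination followed by ReLU:

```python
import numpy as np

def relu_neuron(x, w, b):
    """One neuron: a linear combination of the inputs followed by ReLU, max(0, z)."""
    z = np.dot(w, x) + b
    return np.maximum(0.0, z)

# Hypothetical weights and bias, purely for illustration (e.g. price vs. house size).
print(relu_neuron(np.array([120.0]), np.array([0.5]), -10.0))  # 50.0
```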

Supervised Learning with Neural Networks

| Domain | Network architecture |
| --- | --- |
| Real-estate price prediction and similar problems | Standard NN |
| Computer vision | Convolutional neural network (CNN) |
| Audio, machine translation (one-dimensional sequence data) | Recurrent neural network (RNN) |
| Autonomous driving | More complex architectures, or combinations of several |

Structured data: every feature has a well-defined meaning.
Unstructured data: harder to process.

Goal: handle both structured and unstructured data.

The lowercase letter m denotes the size of the dataset; the data are labeled (each example has an input x and an output y).
Algorithmic innovation is often about making neural networks run faster.
For example, replacing the sigmoid activation with the ReLU activation led to a marked improvement in how quickly networks learn.

Basics of Neural Networks

Logistic Regression

Notation

A single example is a pair $(x, y)$, where $x$ is the feature vector and $y$ is the label; for binary classification, $y$ is either 0 or 1.
The dimension of $x$ is $n_x$ (sometimes just written $n$).
$m$ denotes the number of examples in the dataset: $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$.
$X$ denotes the matrix below, with $n_x$ rows and $m$ columns; each example is best stored as a column vector rather than a row vector:

$$
X = \left[ \begin{matrix} \vdots & \vdots & \cdots & \vdots \\ x^{(1)} & x^{(2)} & \cdots & x^{(m)} \\ \vdots & \vdots & \cdots & \vdots \end{matrix} \right]
$$

$Y$ denotes the matrix below, with 1 row and $m$ columns:

$$
Y = \left[ \begin{matrix} y^{(1)} & y^{(2)} & \cdots & y^{(m)} \end{matrix} \right]
$$
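A quick NumPy sketch of this layout (the data here are invented toy values): examples are stacked as columns, so X has shape (n_x, m) and Y has shape (1, m).

```python
import numpy as np

n_x, m = 4, 3                                        # feature dimension, number of examples
examples = [np.random.rand(n_x) for _ in range(m)]   # x^(1), ..., x^(m)
labels = [1, 0, 1]                                   # y^(1), ..., y^(m)

X = np.stack(examples, axis=1)        # each example becomes a column -> shape (n_x, m)
Y = np.array(labels).reshape(1, m)    # labels as a row vector        -> shape (1, m)

print(X.shape, Y.shape)               # (4, 3) (1, 3)
```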

Introduction to the Logistic Regression Model

Given an input $x$ from the training set, we want to output $\hat{y}$: given an image, compute the probability that the image is a picture of a cat.

$x \in R^{n_x}$; the logistic regression parameters are $w \in R^{n_x}$ and $b \in R$.
A first attempt at the output would be $\hat{y} = w^T x + b$, which is a linear function. We want a probability, though, and this linear expression can easily be greater than 1 or negative, which is not a meaningful probability. Hence the sigmoid function: $\sigma(z) = \frac{1}{1+e^{-z}}$. Sigmoid is a classic activation function; other activation functions are also used to improve performance and are covered in the next set of notes.
[Figure 4]

Here, $w$ and $b$ are kept as two separate parameters; $b$ corresponds to the intercept (bias) term.
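Putting the pieces together, here is a minimal sketch (with made-up $w$, $b$, and $x$) of the forward computation $\hat{y} = \sigma(w^T x + b)$:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

n_x = 3
x = np.random.rand(n_x, 1)   # one example, stored as a column vector
w = np.zeros((n_x, 1))       # parameters, zero-initialized just for illustration
b = 0.0

y_hat = sigmoid(np.dot(w.T, x) + b)   # shape (1, 1): probability that y = 1
print(y_hat.item())                   # 0.5 when w = 0 and b = 0
```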

The Cost Function and the Loss Function

$\hat{y} = \sigma(w^T x + b)$, where $\sigma(z) = \frac{1}{1+e^{-z}}$.
The goal is: given the training set of $m$ examples $\{(x^{(1)},y^{(1)}),(x^{(2)},y^{(2)}),\dots,(x^{(m)},y^{(m)})\}$, find $w$ and $b$ such that each output $\hat{y}^{(i)}$ is as close as possible to $y^{(i)}$.

Loss function (we would like it to be convex):
$$L(\hat{y}, y) = -\big(y\log(\hat{y}) + (1-y)\log(1-\hat{y})\big)$$
The loss function is defined on a single training example and measures how well the model does on that one example. To measure performance over the entire training set, we use the cost function.

Cost function:
$$J(w,b) = \frac{1}{m}\sum_{i=1}^{m} L(\hat{y}^{(i)}, y^{(i)})$$
The goal is to make this cost function as small as possible.
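A short sketch of computing the per-example loss and the averaged cost in NumPy (A and Y below are invented toy values; the `propagate` function later in these notes performs the same computation on real data):

```python
import numpy as np

A = np.array([[0.9, 0.2, 0.6]])   # predictions  y_hat^(i)
Y = np.array([[1,   0,   1  ]])   # true labels  y^(i)

losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # per-example loss L(y_hat, y)
cost = np.mean(losses)                                 # cost J(w, b): the average loss
print(losses, cost)
```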

Gradient Descent: training $w$ and $b$ for logistic regression
[Figure 5]

The cost function in the figure is convex; we want the values of $w$ and $b$ that minimize $J$.
Roughly: initialize a point, then repeatedly take a step of a certain size in the direction of steepest descent, hoping eventually to converge to the global optimum. Each step is one iteration.
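Written out, each iteration updates the parameters with a learning rate $\alpha$ (this matches the update rule used in the `optimize` function later in these notes):

$$
w := w - \alpha \frac{\partial J(w,b)}{\partial w}, \qquad b := b - \alpha \frac{\partial J(w,b)}{\partial b}
$$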

Dropping down to one dimension makes the gradient descent process easier to see.
[Figure 6]

Computation Graphs

In the running example, $J(a,b,c) = 3(a + b c)$.

Below is the forward pass: computing the cost function $J$.
[Figure 7]
Below is the backward pass: computing the partial derivatives of the cost function with respect to each variable.
[Figure 8]
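A small worked sketch of this graph in plain Python (the input values are chosen just for illustration; the backward steps follow the chain rule, mirroring the figure):

```python
# Forward pass for J(a, b, c) = 3 * (a + b * c)
a, b, c = 5.0, 3.0, 2.0
u = b * c          # u = 6
v = a + u          # v = 11
J = 3 * v          # J = 33

# Backward pass: propagate derivatives from J back to a, b, c
dJ_dv = 3.0                # dJ/dv
dJ_du = dJ_dv * 1.0        # dv/du = 1
dJ_da = dJ_dv * 1.0        # dv/da = 1
dJ_db = dJ_du * c          # du/db = c
dJ_dc = dJ_du * b          # du/dc = b
print(J, dJ_da, dJ_db, dJ_dc)   # 33.0 3.0 6.0 9.0
```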

Vectorization

Put the many samples into arrays and use NumPy in Python for the linear-algebra operations (which run in parallel), instead of writing nested for loops.
Below are two snippets from the programming exercise.

import time
import numpy as np

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### CLASSIC DOT PRODUCT OF VECTORS IMPLEMENTATION ###
tic = time.process_time()
dot = 0
for i in range(len(x1)):
    dot+= x1[i]*x2[i]
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC OUTER PRODUCT IMPLEMENTATION ###
tic = time.process_time()
outer = np.zeros((len(x1),len(x2))) # we create a len(x1)*len(x2) matrix with only zeros
for i in range(len(x1)):
    for j in range(len(x2)):
        outer[i,j] = x1[i]*x2[j]
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC ELEMENTWISE IMPLEMENTATION ###
tic = time.process_time()
mul = np.zeros(len(x1))
for i in range(len(x1)):
    mul[i] = x1[i]*x2[i]
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### CLASSIC GENERAL DOT PRODUCT IMPLEMENTATION ###
W = np.random.rand(3,len(x1)) # Random 3*len(x1) numpy array
tic = time.process_time()
gdot = np.zeros(W.shape[0])
for i in range(W.shape[0]):
    for j in range(len(x1)):
        gdot[i] += W[i,j]*x1[j]
toc = time.process_time()
print ("gdot = " + str(gdot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

######################## Above: the classic loop implementations. Below: the vectorized versions ########################

x1 = [9, 2, 5, 0, 0, 7, 5, 0, 0, 0, 9, 2, 5, 0, 0]
x2 = [9, 2, 2, 9, 0, 9, 2, 5, 0, 0, 9, 2, 5, 0, 0]

### VECTORIZED DOT PRODUCT OF VECTORS ###
tic = time.process_time()
dot = np.dot(x1,x2)
toc = time.process_time()
print ("dot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED OUTER PRODUCT ###
tic = time.process_time()
outer = np.outer(x1,x2)
toc = time.process_time()
print ("outer = " + str(outer) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED ELEMENTWISE MULTIPLICATION ###
tic = time.process_time()
mul = np.multiply(x1,x2)
toc = time.process_time()
print ("elementwise multiplication = " + str(mul) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

### VECTORIZED GENERAL DOT PRODUCT ###
tic = time.process_time()
dot = np.dot(W,x1)
toc = time.process_time()
print ("gdot = " + str(dot) + "\n ----- Computation time = " + str(1000*(toc - tic)) + "ms")

Code Implementation

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

#%matplotlib inline

# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

# Example of a picture
index = 20
plt.imshow(train_set_x_orig[index])

#plt.waitforbuttonpress()
print(index)
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")

m_train = train_set_x_orig.shape[0]
m_test = test_set_x_orig.shape[0]
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

# Reshape the training and test examples

### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
### END CODE HERE ###

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))

train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.

# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-z))
    ### END CODE HERE ###
    
    return s

print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))

# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros((dim, 1))
    b = 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b

dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))

# GRADED FUNCTION: propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    
    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    
    m = X.shape[1]
    
    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X) + b)            # compute activation
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))         # compute cost
    ### END CODE HERE ###
    
    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    ### END CODE HERE ###
    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost

w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

# GRADED FUNCTION: optimize

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    
    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    
    costs = []
    
    for i in range(num_iterations):
        
        
        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ### 
        grads, cost = propagate(w, b, X, Y)
        ### END CODE HERE ###
        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        
        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###
        
        # Record the costs
        if i % 100 == 0:
            costs.append(cost)
        
        # Print the cost every 100 iterations
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs


params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print(costs)

# GRADED FUNCTION: predict

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    m = X.shape[1]
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)
    
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X) + b)
    ### END CODE HERE ###

    for i in range(A.shape[1]):
        
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        if A[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1
        ### END CODE HERE ###
    
    assert(Y_prediction.shape == (1, m))
    
    return Y_prediction

print ("predictions = " + str(predict(w, b, X)))

# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously
    
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    
    ### START CODE HERE ###
    
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    
    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test, 
         "Y_prediction_train" : Y_prediction_train, 
         "w" : w, 
         "b" : b,
         "learning_rate" : learning_rate,
         "num_iterations": num_iterations}
    
    return d

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

# Example of a picture that was wrongly classified.
index = 1
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))

#print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[int(d["Y_prediction_test"][0,index])].decode("utf-8") +  "\" picture.")
 

import matplotlib 
## START CODE HERE ## (PUT YOUR IMAGE NAME) 
my_image = "my_image2.jpg"   # change this to the name of your image file 
#my_label_y = [1]
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
fname = "/home/star/project/python_project/Logistic_regression/images/" + my_image
image = np.array(matplotlib.pyplot.imread(fname))
my_image = np.array(Image.fromarray(image).resize(size=(num_px,num_px))).reshape((1, num_px*num_px*3)).T

my_predicted_image = predict(d["w"], d["b"], my_image)

plt.imshow(image)
plt.show() 
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") +  "\" picture.")
# Note: when I ran this in a Jupyter notebook, the final step raised an error; after some digging it turned out to be a scipy version issue. Setting up a local environment made it run.

[Figure 9]

The code above copies everything from the programming exercise and runs successfully in VS Code. If you want to run it yourself, remember to update all of the file paths in the code, including the ones in lr_utils.py.
The related resources can be downloaded from my Gitee repository:
https://gitee.com/Laurie_xzh/note-book.git
They are also available in my uploaded resources.
