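Video link: Machine Learning, Hung-yi Lee (2019)
Reference: Hung-yi Lee, Deep Learning 19 (full version, Mandarin), Machine Learning / Deep Learning
Hung-yi Lee Deep Learning Notes (table of contents)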
The three steps of Regression
Model → Goodness of Function → Gradient Descent
1.1 Model
Define the prediction function yourself, for example the linear model y = b + w * x.
1.2 Goodness of Function
A set of functions → Goodness of function f ← Training Data
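To make "goodness of function" concrete, here is a minimal sketch that scores one candidate function y = b + w * x by its mean squared error on the training data. The helper name `mse_loss` and the toy data are assumptions made for illustration; the quantity itself is the same one the script in these notes evaluates at every grid point.

```python
def mse_loss(b, w, xs, ys):
    """Mean squared error of the linear model y = b + w * x on (xs, ys)."""
    return sum((y - (b + w * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Hypothetical toy data generated by y = 1 + 2x (not from the lecture).
xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]

print(mse_loss(1.0, 2.0, xs, ys))  # 0.0, the candidate matches the data exactly
print(mse_loss(0.0, 2.0, xs, ys))  # 1.0, every prediction is off by 1
```

A lower loss means a better function, so picking the best function amounts to minimizing this value over b and w.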
1.3 Gradient Descent
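Gradient descent can be applied to any differentiable function to search for an optimum; in general it is only guaranteed to reach a local optimum.
Note: in linear regression there are no separate local optima, so whatever optimum gradient descent reaches is the global one,
because the loss function L is convex.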
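As a worked version of this step, the following sketch writes out the gradients for the linear model y = b + w x. It assumes L is the sum of squared errors (an averaged loss differs only by the constant factor 1/N), and the symbol η for the learning rate is notation introduced here.

```latex
L(w, b) = \sum_{n=1}^{N} \bigl( y^{n} - (b + w x^{n}) \bigr)^{2}

\frac{\partial L}{\partial w} = -2 \sum_{n=1}^{N} \bigl( y^{n} - b - w x^{n} \bigr)\, x^{n},
\qquad
\frac{\partial L}{\partial b} = -2 \sum_{n=1}^{N} \bigl( y^{n} - b - w x^{n} \bigr)

w \leftarrow w - \eta \frac{\partial L}{\partial w},
\qquad
b \leftarrow b - \eta \frac{\partial L}{\partial b}

% Convexity: for any direction v = (v_1, v_2),
v^{\top} \nabla^{2} L \, v = 2 \sum_{n=1}^{N} \bigl( v_1 x^{n} + v_2 \bigr)^{2} \ge 0
```

The last line shows the Hessian is positive semidefinite, which is why L is convex and has no separate local optima.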
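Q: Why does machine learning need separate training and testing phases?
Supervised learning is the mainstream setting.
To improve the model, the function can be made more complex (compare the Taylor expansion, which adds higher-order terms).
The more complex the model, the greater its expressive power, because it can represent more cases (a higher-degree model contains the lower-degree ones).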
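A small sketch of the "higher degree contains lower degree" point: fitting polynomials of increasing degree to the ten data points used by the script in these notes, the training error cannot go up, because each model can reproduce the previous one by setting its extra coefficient to zero. The use of `np.polyfit` and the choice of degrees 1 to 3 are assumptions made for illustration.

```python
import numpy as np

# Training data copied from the script in these notes.
x_data = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
y_data = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

train_mse = {}
for degree in (1, 2, 3):
    # Least-squares polynomial fit of the given degree.
    coeffs = np.polyfit(x_data, y_data, degree)
    residual = y_data - np.polyval(coeffs, x_data)
    train_mse[degree] = float(np.mean(residual ** 2))

for degree, mse in train_mse.items():
    print(f"degree {degree}: training MSE = {mse:.1f}")
```

A lower training error does not by itself mean a better model; how the model behaves on unseen data still has to be checked.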
import numpy as np
import matplotlib.pyplot as plt

x_data = [338., 333., 328., 207., 226., 25., 179., 60., 208., 606.]
y_data = [640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.]
# model: y_data = b + w * x_data

# grid of candidate bias values
x = np.arange(-200, -100, 1)
# grid of candidate weight values
y = np.arange(-5, 5, 0.1)
# Z[j][i] holds the mean squared error at b = x[i], w = y[j]
Z = np.zeros((len(y), len(x)))
X, Y = np.meshgrid(x, y)
for i in range(len(x)):
    for j in range(len(y)):
        b = x[i]
        w = y[j]
        Z[j][i] = 0
        for n in range(len(x_data)):
            Z[j][i] = Z[j][i] + (y_data[n] - b - w * x_data[n])**2
        Z[j][i] = Z[j][i]/len(x_data)
# y_data = b + w * x_data
# initial b
b = -120
# initial w
w = -4
# learning rate
lr = 0.0000001
# iteration = 100000
iteration = 100
# store initial values for plotting
b_history = [b]
w_history = [w]
# iterations
for i in range(iteration):
    b_grad = 0.0
    w_grad = 0.0
    # accumulate the gradient of the summed squared error over all samples
    for n in range(len(x_data)):
        b_grad = b_grad - 2.0*(y_data[n] - b - w*x_data[n])*1.0
        w_grad = w_grad - 2.0*(y_data[n] - b - w*x_data[n])*x_data[n]
    # update parameters
    b = b - lr*b_grad
    w = w - lr*w_grad
    # store parameters for plotting
    b_history.append(b)
    w_history.append(w)
# plot for figure
plt.contourf(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
# mark the optimum (b, w) = (-188.4, 2.67) with an orange 'x'
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()
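# Optimized version
# Assumption: with lr = 1 the plain update b = b - lr*b_grad diverges on this
# data, so this version scales each step Adagrad-style, dividing lr by the
# root of the accumulated squared gradients of each parameter.
# y_data = b + w * x_data
# initial b
b = -120
# initial w
w = -4
# base learning rate
lr = 1
# iteration = 100000
iteration = 100
# store initial values for plotting
b_history = [b]
w_history = [w]
# accumulated squared gradients, one per parameter
lr_b = 0.0
lr_w = 0.0
# iterations
for i in range(iteration):
    b_grad = 0.0
    w_grad = 0.0
    for n in range(len(x_data)):
        b_grad = b_grad - 2.0*(y_data[n] - b - w*x_data[n])*1.0
        w_grad = w_grad - 2.0*(y_data[n] - b - w*x_data[n])*x_data[n]
    lr_b = lr_b + b_grad**2
    lr_w = lr_w + w_grad**2
    # update parameters with per-parameter step sizes
    b = b - lr/np.sqrt(lr_b)*b_grad
    w = w - lr/np.sqrt(lr_w)*w_grad
    # store parameters for plotting
    b_history.append(b)
    w_history.append(w)
# plot for figure
plt.contourf(x, y, Z, 50, alpha=0.5, cmap=plt.get_cmap('jet'))
# mark the optimum (b, w) = (-188.4, 2.67) with an orange 'x'
plt.plot([-188.4], [2.67], 'x', ms=12, markeredgewidth=3, color='orange')
plt.plot(b_history, w_history, 'o-', ms=3, lw=1.5, color='black')
plt.xlim(-200, -100)
plt.ylim(-5, 5)
plt.xlabel(r'$b$', fontsize=16)
plt.ylabel(r'$w$', fontsize=16)
plt.show()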
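The plots above mark (-188.4, 2.67) as the target point. As a sanity check, the least-squares optimum can be computed directly instead of by iteration. This is a minimal sketch that assumes `np.polyfit` as the solver; the names `w_star` and `b_star` are introduced here for illustration.

```python
import numpy as np

# Training data copied from the script above.
x_data = np.array([338., 333., 328., 207., 226., 25., 179., 60., 208., 606.])
y_data = np.array([640., 633., 619., 393., 428., 27., 193., 66., 226., 1591.])

# Degree-1 least-squares fit; polyfit returns the highest-degree coefficient first.
w_star, b_star = np.polyfit(x_data, y_data, 1)
print(f"b* = {b_star:.1f}, w* = {w_star:.2f}")  # b* = -188.4, w* = 2.67
```

Because the loss is convex, this is the point that gradient descent approaches when it is given a suitable learning rate and enough iterations.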
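Error comes from two sources: bias (deviation from the true value) + variance (spread of the learned functions around their mean)
Simple model → large bias + small variance
Complex model → small bias + large variance
Cannot fit the training set (large bias) → Underfitting
Fits the training set but has a large error on the testing set (large variance) → Overfitting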
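The trade-off can be illustrated with a small simulation: refit a simple and a complex model on many noisy training sets, then measure how far the average prediction is from the truth (bias) and how much individual fits scatter (variance). Everything specific here is an assumption made for illustration: the ground-truth function `true_f`, the noise level, the polynomial degrees 1 and 5, and the helper name `bias_variance`.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical ground-truth function, chosen only for illustration.
    return np.sin(2 * np.pi * x)

x_train = np.linspace(0.0, 1.0, 15)   # fixed training inputs
x_test = np.linspace(0.05, 0.95, 50)  # points where predictions are compared

def bias_variance(degree, n_trials=300, noise=0.3):
    """Estimate squared bias and variance of polynomial fits of a given degree."""
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        # Each trial draws a fresh noisy training set from the same truth.
        y_train = true_f(x_train) + rng.normal(0.0, noise, x_train.size)
        coeffs = np.polyfit(x_train, y_train, degree)
        preds[t] = np.polyval(coeffs, x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_f(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    return bias_sq, variance

simple_bias, simple_var = bias_variance(degree=1)
complex_bias, complex_var = bias_variance(degree=5)
print(f"degree 1: bias^2 = {simple_bias:.4f}, variance = {simple_var:.4f}")
print(f"degree 5: bias^2 = {complex_bias:.4f}, variance = {complex_var:.4f}")
```

In this setup the straight line shows the larger bias and the degree-5 polynomial the larger variance, matching the two arrows above.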
large bias:
Add more features as input
A more complex model
large variance:
More data
Regularization
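As one possible reading of the regularization remedy, the sketch below adds an L2 penalty to a polynomial least-squares fit and compares the size of the learned weights with and without it. The helper `fit_ridge`, the penalty strength `lam`, the toy data, and the choice to leave the bias term unpenalized are all assumptions made for illustration.

```python
import numpy as np

def fit_ridge(x, y, degree, lam):
    """Polynomial least squares with penalty lam * sum(w_k^2) on non-bias weights."""
    X = np.vander(x, degree + 1, increasing=True)  # columns: 1, x, x^2, ...
    penalty = lam * np.eye(degree + 1)
    penalty[0, 0] = 0.0  # the bias term is left unpenalized
    # Solve the regularized normal equations (X^T X + penalty) w = X^T y.
    return np.linalg.solve(X.T @ X + penalty, X.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3, x.size)  # hypothetical noisy line

w_plain = fit_ridge(x, y, degree=6, lam=0.0)
w_ridge = fit_ridge(x, y, degree=6, lam=1.0)
print("weight norm without penalty:", np.linalg.norm(w_plain[1:]))
print("weight norm with penalty:   ", np.linalg.norm(w_ridge[1:]))
```

Smaller weights give a smoother function that reacts less to noise in any single training set, which is how the penalty reduces variance, at the cost of some added bias.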
Cross Validation:
Training set (split into a training set + a validation set) + testing set (public) + testing set (private)
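The split above can be sketched as k-fold cross validation: the training data is divided into k parts, each part serves once as the validation set, and the model with the lowest average validation error is chosen without touching either testing set. The helper `k_fold_mse`, the toy quadratic data, and the candidate degrees are assumptions made for illustration.

```python
import numpy as np

def k_fold_mse(x, y, degree, k=5):
    """Average validation MSE of a polynomial of the given degree over k folds."""
    folds = np.array_split(np.arange(len(x)), k)
    errors = []
    for val_idx in folds:
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[val_idx] = False  # hold this fold out for validation
        coeffs = np.polyfit(x[train_mask], y[train_mask], degree)
        pred = np.polyval(coeffs, x[val_idx])
        errors.append(np.mean((pred - y[val_idx]) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 40)
# Hypothetical training data from a quadratic with small noise.
y = 1.0 + 2.0 * x + 3.0 * x ** 2 + rng.normal(0.0, 0.1, x.size)

scores = {degree: k_fold_mse(x, y, degree) for degree in range(1, 6)}
best_degree = min(scores, key=scores.get)
for degree, score in scores.items():
    print(f"degree {degree}: validation MSE = {score:.4f}")
print("selected degree:", best_degree)
```

Once a model has been selected this way, it is usually retrained on the full training set, and the testing sets are kept only for the final evaluation.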