PyTorch Deep Learning in Practice: Lecture 3

Gradient Descent and Stochastic Gradient Descent

Gradient Descent
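For the linear model \hat{y} = x \cdot w, the code below minimizes the mean-squared cost by gradient descent on w:

cost(w) = \frac{1}{N} \sum_{n=1}^{N} (x_n w - y_n)^2, \qquad \frac{\partial\, cost}{\partial w} = \frac{1}{N} \sum_{n=1}^{N} 2 x_n (x_n w - y_n)

and each epoch applies the update w \leftarrow w - \alpha \, \frac{\partial\, cost}{\partial w} with learning rate \alpha = 0.01.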

# Gradient descent: each update uses the cost averaged over the whole training set
import matplotlib.pyplot as plt

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0
def forward(x):
    return x*w
def cost(x, y):
    # mean squared error over the whole training set
    cost = 0
    for x_train, y_train in zip(x, y):
        y_pred = forward(x_train)
        cost_per = (y_pred - y_train)**2
        cost += cost_per
    return cost / len(x)
def gradient(x, y):
    # d(cost)/dw for the linear model, averaged over the training set
    grad = 0
    for x_train, y_train in zip(x, y):
        grad_per = 2*x_train*(x_train*w - y_train)
        grad += grad_per
    return grad / len(x)
# Training loop (batch gradient descent)
alpha = 0.01
epoch_list = []
cost_list = []
for epoch in range(100):
    cost_val = cost(x_data, y_data)
    grad_val = gradient(x_data, y_data)
    w -= alpha * grad_val  # update: w := w - alpha * d(cost)/dw
    epoch_list.append(epoch)
    cost_list.append(cost_val)
    print('epoch:', epoch, 'w=', w, 'loss=', cost_val)

print("Predict: ", 4, forward(4))
plt.plot(epoch_list, cost_list)
plt.ylabel("Cost")
plt.xlabel("Epoch")
plt.grid()
plt.show()

[Figure 1: cost vs. epoch curve produced by the plotting code above]

SGD
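Here the update uses the gradient of a single sample's loss instead of the averaged cost: for each training pair (x_n, y_n),

loss_n(w) = (x_n w - y_n)^2, \qquad w \leftarrow w - \alpha \cdot 2 x_n (x_n w - y_n)

so the weight is updated once per sample rather than once per epoch.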

# SGD: update w using the gradient of one training sample at a time
x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0
def forward(x):
    return x*w
def loss(x, y):
    y_pred = forward(x)
    return (y_pred - y)**2
def gradient(x, y):
    # d(loss)/dw for a single training sample
    return 2*x*(x*w - y)
# Training loop (SGD)
alpha = 0.01

for epoch in range(100):
    
    for x, y in zip(x_data, y_data):
        loss_val = loss(x, y)
        grad_val = gradient(x, y)
        w -= alpha * grad_val  
    
    print('epoch:', epoch, 'w=', w, 'loss=', loss_val)  # loss_val: loss on the last sample of this epoch

print("Predict: ", 4, forward(4))

Gradient descent uses the cost over the whole training set for each weight update, whereas SGD updates the weight once per sample, so over 100 epochs GD performs 100 updates while SGD performs 300 (3 samples per epoch).
In GD the per-sample gradient terms within one step are independent, so they can be computed in parallel and each epoch is cheap in wall-clock time, but the smooth updates can get stuck at saddle points; SGD's noisy per-sample updates often reach a better solution, yet each update depends on the previous one, so they cannot be parallelized and training is slower. The usual compromise is mini-batch SGD, as sketched below: gradient descent within each batch, stochastic updates from batch to batch.
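As a rough illustration of that compromise, here is a minimal mini-batch SGD sketch for the same linear model; the batch_size value of 2 and the batch_gradient helper are illustrative assumptions, not part of the lecture code.

# Mini-batch SGD: average the gradient over a small batch, update w once per batch
import random

x_data = [1.0, 2.0, 3.0]
y_data = [2.0, 4.0, 6.0]
w = 1.0
alpha = 0.01
batch_size = 2  # illustrative choice, not from the lecture

def batch_gradient(xs, ys):
    # gradient of the mean squared loss averaged over one mini-batch
    grad = 0
    for x, y in zip(xs, ys):
        grad += 2 * x * (x * w - y)
    return grad / len(xs)

for epoch in range(100):
    # shuffle the sample order, then walk through the data batch by batch
    indices = list(range(len(x_data)))
    random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch = indices[start:start + batch_size]
        xs = [x_data[i] for i in batch]
        ys = [y_data[i] for i in batch]
        w -= alpha * batch_gradient(xs, ys)

print("Predict: ", 4, 4 * w)

With batch_size = 1 this reduces to the SGD loop above, and with batch_size = len(x_data) it reduces to batch gradient descent.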
