Homework 2

1. Generate n = 2,000 points uniformly at random in the two-dimensional unit square. Which point do you expect the centroid to be?

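Answer: The points are uniform on the unit square, which is symmetric about (0.5, 0.5), so the expected centroid is (0.5, 0.5). With n = 2,000 points the sample centroid will be close to (0.5, 0.5) but generally not exactly on it. Each of its coordinates has a standard deviation of about $\sqrt{1/12}/\sqrt{2000} \approx 0.0065$.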
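Here is a minimal sketch of a quick numerical check. The seed and the use of NumPy's default_rng are arbitrary choices made for this example and are not part of the original homework.

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed, chosen arbitrarily
n = 2000
points = rng.random((n, 2))           # n points uniform in [0, 1) x [0, 1)
centroid = points.mean(axis=0)        # sample centroid (coordinate-wise mean)
print(centroid)

# Standard deviation of each centroid coordinate: sqrt(1/12) / sqrt(n)
print(np.sqrt(1 / 12) / np.sqrt(n))
```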

2. What objective does the centroid of the points optimize?

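Answer: Write $p_i = (x_i, y_i)$. The centroid $\bar{p} = \frac{1}{n}\sum_{i=1}^{n} p_i$ minimizes the sum of squared Euclidean distances to the points:

$$\bar{p} = \arg\min_{c \in \mathbb{R}^2} \sum_{i=1}^{n} \lVert p_i - c \rVert^2 .$$

The code in this post instead minimizes the sum of plain (unsquared) Euclidean distances, $\sum_{i=1}^{n} \lVert p_i - c \rVert$. The minimizer of that objective is the geometric median, not the centroid. For points drawn uniformly from the square, both lie near (0.5, 0.5), so the experiments still land close to the centroid.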
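We can check numerically that the mean $\bar{p}$ minimizes the squared objective. The check rests on the identity $\sum_i \lVert p_i - \bar{p} - \delta \rVert^2 = \sum_i \lVert p_i - \bar{p} \rVert^2 + n\lVert\delta\rVert^2$, which holds because $\sum_i (p_i - \bar{p}) = 0$. So any move away from the mean increases the loss. The sketch below assumes NumPy and an arbitrary seed. The helper sq_loss is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng(1)
points = rng.random((2000, 2))
mean = points.mean(axis=0)

def sq_loss(c):
    # Sum of squared Euclidean distances from c to every point
    return np.sum((points - c) ** 2)

# Moving away from the mean by delta raises the loss by exactly n * ||delta||^2
for delta in rng.normal(scale=0.05, size=(5, 2)):
    increase = sq_loss(mean + delta) - sq_loss(mean)
    print(increase, len(points) * np.sum(delta ** 2))
```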

3. Apply gradient descent (GD) to find the centroid.

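Answer:

The loss function is the mean Euclidean distance from a candidate point $(x, y)$ to all data points $(x_i, y_i)$:

$$f(x, y) = \frac{1}{n}\sum_{i=1}^{n} d_i, \qquad d_i = \sqrt{(x_i - x)^2 + (y_i - y)^2}.$$

Taking the partial derivatives with respect to $x$ and $y$ gives:

$$\frac{\partial f}{\partial x} = -\frac{1}{n}\sum_{i=1}^{n}\frac{x_i - x}{d_i}, \qquad \frac{\partial f}{\partial y} = -\frac{1}{n}\sum_{i=1}^{n}\frac{y_i - y}{d_i}.$$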

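def cal_grad(point, all_point):
    # Partial derivatives of the mean distance with respect to x and y
    diff = all_point - point                     # p_i - c for every data point
    dist = np.sqrt(np.sum(diff ** 2, axis=1))    # d_i for every data point
    grad_x = np.sum(-diff[:, 0] / dist)
    grad_y = np.sum(-diff[:, 1] / dist)
    n = len(all_point)
    return np.array([grad_x / n, grad_y / n])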
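Before running the optimizer, it is worth checking the hand-derived partial derivatives against central finite differences. The sketch below assumes NumPy. The names mean_distance, analytic_grad and numeric_grad are hypothetical helpers, and the test point c = (0.3, 0.7) is arbitrary.

```python
import numpy as np

def mean_distance(c, pts):
    # f(x, y) = (1/n) * sum_i d_i
    return np.mean(np.linalg.norm(pts - c, axis=1))

def analytic_grad(c, pts):
    # (-(1/n) sum (x_i - x)/d_i, -(1/n) sum (y_i - y)/d_i)
    diff = pts - c
    dist = np.linalg.norm(diff, axis=1)
    return -np.mean(diff / dist[:, None], axis=0)

def numeric_grad(c, pts, h=1e-6):
    # Central finite differences along x and y
    g = np.zeros(2)
    for k in range(2):
        e = np.zeros(2)
        e[k] = h
        g[k] = (mean_distance(c + e, pts) - mean_distance(c - e, pts)) / (2 * h)
    return g

rng = np.random.default_rng(2)
pts = rng.random((2000, 2))
c = np.array([0.3, 0.7])
print(analytic_grad(c, pts), numeric_grad(c, pts))
```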

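Set the hyperparameters: the learning rate (lr) and the stopping threshold (threshold).
Update the parameters by stepping against the gradient, scaled by the learning rate: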

def update(point,grad):
    point = point - lr*grad
    return point
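The learning rate's effect on convergence is easiest to see on the squared-distance objective. There the gradient of $\frac{1}{n}\sum_i \lVert c - p_i\rVert^2$ is $2(c - \bar{p})$, so each update multiplies the error $c - \bar{p}$ by $1 - 2\,\mathrm{lr}$. This sketch swaps in that objective in place of the summed distance used in the code here, and uses the hypothetical helpers grad_sq and run_gd. The outcomes by learning rate:

* lr = 0.5 lands on the mean in one step.
* lr = 0.1 and lr = 0.9 converge geometrically; lr = 0.9 flips sides on every step.
* lr > 1 diverges.

```python
import numpy as np

rng = np.random.default_rng(3)
points = rng.random((2000, 2))
mean = points.mean(axis=0)

def grad_sq(c):
    # Gradient of (1/n) * sum_i ||c - p_i||^2, which equals 2 * (c - mean)
    return 2 * np.mean(c - points, axis=0)

def run_gd(lr, steps=50):
    c = np.array([1.0, 0.0])
    for _ in range(steps):
        c = c - lr * grad_sq(c)          # same form as update(): point - lr * grad
    return np.linalg.norm(c - mean)      # remaining distance to the centroid

for lr in (0.1, 0.5, 0.9, 1.1):
    print(lr, run_gd(lr))
```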

Iterate until the change in loss drops below the threshold (or max_iter is reached):

# Gradient descent (GD)
start_point = rand(2)
pre_loss = 0
point_hist=start_point
for i in range(max_iter):
    grad = cal_grad(start_point,data)
    start_point = update(start_point,grad)
    point_hist = np.vstack((point_hist,start_point))
    loss = cal_loss(start_point,data)
    if abs(pre_loss - loss) < threshold:
        # stop once the change in loss falls below the threshold
        break
    pre_loss = loss    
# print(point_hist)
pylab.plot(point_hist[:,0],point_hist[:,1],'r-')
pylab.plot(data[:,0],data[:,1],'g.')

Plotting the result gives the figure below. The red line is the convergence path and the green dots are the data:

[Figure: GD convergence path (red) over the 2,000 data points (green)]
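For reference, here is a self-contained version of the GD loop without plotting. It uses the unnormalized gradient of the mean distance. Several values are illustrative assumptions: the seed, lr = 0.2, threshold = 1e-10 and max_iter = 1000. pre_loss starts at infinity so the first iteration cannot trigger the stop. Because this objective is the summed (not squared) distance, the result approximates the sample's geometric median. That point is close to data.mean(axis=0) but generally not identical to it.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.random((2000, 2))

def mean_distance(c):
    # f(c) = (1/n) * sum_i ||p_i - c||
    return np.mean(np.linalg.norm(data - c, axis=1))

def grad_mean_distance(c):
    # Unnormalized gradient of f, matching the partial derivatives above
    diff = data - c
    dist = np.linalg.norm(diff, axis=1)
    return -np.mean(diff / dist[:, None], axis=0)

lr, threshold, max_iter = 0.2, 1e-10, 1000   # illustrative values
point = np.array([1.0, 0.0])
pre_loss = np.inf                            # first iteration never stops
for i in range(max_iter):
    point = point - lr * grad_mean_distance(point)
    loss = mean_distance(point)
    if abs(pre_loss - loss) < threshold:
        break
    pre_loss = loss

print(i, point, data.mean(axis=0))
```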

4. Apply stochastic gradient descent (SGD) to find the centroid. Can you say in simple words, what the algorithm is doing?

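Answer: In simple words, at each step SGD picks one random point and computes the gradient from that point alone instead of from all n points. It then updates the estimate with this noisy gradient. Computing the gradient therefore costs O(1) per step instead of O(n). The code below uses a normalized gradient, so each step moves the current estimate a distance lr straight toward the randomly picked point. The estimate is repeatedly tugged toward random data points and, on average, drifts toward the center of the cloud.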
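Below is a minimal sketch of this "pull toward a random point" view on the squared-distance objective. The squared objective is a swapped-in choice; the code in this post uses the summed distance. Its single-sample gradient at c is $2(c - p)$, so a step moves c a fraction eta toward the sampled point p. The step schedule is an assumption: eta = 1/(t + 1), with one pass over the points in random order. Under that schedule the estimate is exactly the running average of the points seen so far, so after the pass it equals the centroid.

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.random((2000, 2))

c = np.array([1.0, 0.0])                 # arbitrary starting point
for t, idx in enumerate(rng.permutation(len(data))):
    eta = 1.0 / (t + 1)                  # decaying step: 1, 1/2, 1/3, ...
    c = c + eta * (data[idx] - c)        # move a fraction eta toward the sample

print(c, data.mean(axis=0))
```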

The implementation is simple: draw one sample at random and pass it to cal_grad in place of the full data set.
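A single sample is enough on average: the full gradient is the mean of the per-sample gradients, so a uniformly drawn sample gives an unbiased estimate of it. The sketch below demonstrates this with the unnormalized mean-distance gradient. grad_one is a hypothetical helper, and the seed and test point are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(6)
data = rng.random((2000, 2))
c = np.array([0.2, 0.9])

def grad_one(c, p):
    # Gradient of ||p - c|| with respect to c: -(p - c) / ||p - c||
    diff = p - c
    return -diff / np.linalg.norm(diff)

full_grad = -np.mean((data - c) / np.linalg.norm(data - c, axis=1)[:, None], axis=0)
per_sample = np.array([grad_one(c, p) for p in data])

# Averaging every single-sample gradient recovers the full gradient exactly
print(per_sample.mean(axis=0), full_grad)
# One random sample is a noisy but unbiased estimate of it
idx = rng.integers(len(data))
print(per_sample[idx])
```

Each single-sample gradient already has unit length. Normalizing it, as the cal_grad in the full code below does, therefore leaves the SGD step unchanged.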

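The resulting path is shown below. It oscillates heavily, and the method is sensitive to the choice of learning rate and threshold. Poorly chosen values can keep it from converging: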

[Figure: SGD path (red) over the data points (green)]
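This oscillation is typical of SGD with a constant step size. The iterate never settles; it keeps jumping toward whichever point was sampled last. The sketch below compares a constant fraction of 0.1 with the decaying schedule 1/(t + 1), again on the swapped-in squared-distance objective. sgd_path, the seed and the step counts are assumptions made for this example. Decaying the step size is one common remedy; the code below instead shrinks lr by a factor of 0.8 whenever the loss jumps.

```python
import numpy as np

rng = np.random.default_rng(7)
data = rng.random((2000, 2))

def sgd_path(step, n_steps=10000):
    # SGD on the squared-distance objective; step(t) is the fraction moved toward the sample
    c = np.array([1.0, 0.0])
    path = np.empty((n_steps, 2))
    for t in range(n_steps):
        p = data[rng.integers(len(data))]    # one random sample, drawn with replacement
        c = c + step(t) * (p - c)
        path[t] = c
    return path

constant = sgd_path(lambda t: 0.1)
decaying = sgd_path(lambda t: 1.0 / (t + 1))

# Spread of the last 1,000 iterates: the constant step keeps bouncing around
print(constant[-1000:].std(axis=0), decaying[-1000:].std(axis=0))
```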

import numpy as np
import pylab
from numpy import array             # used by the SGD cell
from numpy.random import rand       # uniform samples on [0, 1)
%matplotlib inline
%config InlineBackend.figure_format = 'svg'

n = 2000
data = rand(n, 2)                   # n points uniform in the unit square
# pylab.plot(data[:,0],data[:,1],'g.')
start_point = rand(2)
lr = 0.01
threshold = 1e-6
max_iter = 20000

def cal_grad(point, all_point):
    # Partial derivatives of the summed distance with respect to x and y
    diff = all_point - point                    # p_i - c for every data point
    dist = np.sqrt(np.sum(diff ** 2, axis=1))   # d_i for every data point
    grad_x = np.sum(-diff[:, 0] / dist)
    grad_y = np.sum(-diff[:, 1] / dist)
    g = (grad_x ** 2 + grad_y ** 2) ** 0.5      # normalize to unit length, so every step moves exactly lr
    return np.array([grad_x / g, grad_y / g])

def update(point, grad):
    point = point - lr * grad
    return point

def cal_loss(point, all_point):
    # Sum of Euclidean distances from point to all data points
    return np.sum(np.sum((all_point - point) ** 2, axis=1) ** 0.5)

# Gradient descent (GD)
start_point = rand(2)
pre_loss = 0
point_hist=start_point
for i in range(max_iter):
    grad = cal_grad(start_point,data)
    start_point = update(start_point,grad)
    point_hist = np.vstack((point_hist,start_point))
    loss = cal_loss(start_point,data)
    if abs(pre_loss - loss) < threshold:
        # stop once the change in loss falls below the threshold
        break
    pre_loss = loss    
# print(point_hist)
pylab.plot(point_hist[:,0],point_hist[:,1],'r-')
pylab.plot(data[:,0],data[:,1],'g.')
# pylab.show()


[Figure: GD path (red) over the data points (green)]

# Stochastic gradient descent (SGD)
start_point = np.array([1.0, 0.0])
pre_loss = 0
point_hist = start_point
lr = 0.02
count = 0
max_iter = 100000
threshold = 1e-4
for i in range(max_iter):
    random_choice = np.random.randint(0, n)    # index of one random sample
    grad = cal_grad(start_point, array([data[random_choice]]))
    start_point = update(start_point, grad)
    point_hist = np.vstack((point_hist, start_point))
    # The full loss is evaluated only for the stopping test; this costs O(n) per step
    loss = cal_loss(start_point, data)
    if abs(pre_loss - loss) < threshold:
        # stop once the loss change stays below the threshold twice in a row
        count += 1
        if count > 1:
            break
    else:
        count = 0
    if loss - pre_loss > 0.1:
        # shrink the learning rate when the loss jumps up
        lr = lr * 0.8
    pre_loss = loss
# print(len(point_hist))
pylab.plot(point_hist[:,0],point_hist[:,1],'r-')
pylab.plot(data[:,0],data[:,1],'g.')
start_point

array([0.5021768 , 0.48251419])

[Figure: SGD path (red) over the data points (green)]
