This article is based on the book 《深度学习原理与PyTorch实战》 (Deep Learning Principles and PyTorch in Practice).
GitHub repository
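The demand for shared bikes depends on variables such as the weather, the date, and whether the day is a holiday. We build a neural network and train it with the backpropagation algorithm so that it can predict this demand.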
   instant  dteday      season  yr  mnth  hr  holiday  weekday  workingday  weathersit  temp  atemp   hum   windspeed  casual  registered  cnt
0  1        2011-01-01  1       0   1     0   0        6        0           1           0.24  0.2879  0.81  0.0        3       13          16
1  2        2011-01-01  1       0   1     1   0        6        0           1           0.22  0.2727  0.80  0.0        8       32          40
2  3        2011-01-01  1       0   1     2   0        6        0           1           0.22  0.2727  0.80  0.0        5       27          32
3  4        2011-01-01  1       0   1     3   0        6        0           1           0.24  0.2879  0.75  0.0        3       10          13
4  5        2011-01-01  1       0   1     4   0        6        0           1           0.24  0.2879  0.75  0.0        0       1           1
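When processing the data, we read several samples at once as a matrix. We do not feed all the samples in a single step, so instead we read one batch of samples at a time, and the matrix then has shape (batch, number of variables).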
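As a minimal sketch of the batching described above, the helper below yields the start and end row indices of each batch. The name `batch_ranges` is hypothetical and does not come from the book; the point is only that the last batch may be shorter than the others.

```python
def batch_ranges(n_samples, batch_size):
    """Yield (start, end) index pairs that split n_samples rows into batches."""
    for start in range(0, n_samples, batch_size):
        # The last batch is truncated so that end never exceeds n_samples.
        yield start, min(start + batch_size, n_samples)


# 300 samples in batches of 128: the last batch holds only 44 rows.
ranges = list(batch_ranges(300, 128))
print(ranges)  # [(0, 128), (128, 256), (256, 300)]
```

Slicing a sample matrix with each pair, as in `X[start:end]`, gives a block of shape (batch, number of variables).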
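Categorical variables cannot be fed in as raw numbers either. For the month, for example, the values 1, 2, 3, 4, 5, ... carry no meaning as magnitudes. Our solution is to encode each categorical variable as a one-hot vector, that is: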
$$
\begin{aligned}
season = 1 &\rightarrow (1, 0, 0, 0) \\
season = 2 &\rightarrow (0, 1, 0, 0) \\
season = 3 &\rightarrow (0, 0, 1, 0) \\
season = 4 &\rightarrow (0, 0, 0, 1)
\end{aligned}
$$
Therefore, if a categorical variable has n distinct values, its one-hot vector has length n.
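The mapping above can be sketched in plain Python as follows. The function `one_hot` is a hypothetical helper written for illustration; it is not part of pandas or of the book's code.

```python
def one_hot(value, categories):
    """Return a one-hot list for `value`, given the ordered list of categories."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    # Exactly one position (the one matching `value`) is set to 1.
    return [1 if value == c else 0 for c in categories]


seasons = [1, 2, 3, 4]
encoded = [one_hot(s, seasons) for s in seasons]
print(encoded[2])  # [0, 0, 1, 0]
```

The vector length equals the number of categories, so `mnth` would produce vectors of length 12 and `hr` vectors of length 24.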
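import pandas as pd

# Special handling for the categorical variables.
# season=1,2,3,4, weathersit=1,2,3, mnth=1,2,...,12, hr=0,1,...,23, weekday=0,1,...,6
# After this step each variable is expanded into several features; for example,
# season becomes the four features season_1, season_2, season_3 and season_4.
# `rides` is assumed to be the DataFrame shown above, already loaded with pandas.
dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
for each in dummy_fields:
    # pandas makes it easy to one-hot encode a categorical column into several columns.
    # dtype=float keeps the new columns numeric (recent pandas versions default to bool).
    dummies = pd.get_dummies(rides[each], prefix=each, drop_first=False, dtype=float)
    rides = pd.concat([rides, dummies], axis=1)

# Drop the original categorical columns, along with some features that are not used.
fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
                  'weekday', 'atemp', 'mnth', 'workingday', 'hr']
data = rides.drop(fields_to_drop, axis=1)
data.head()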
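Each numeric variable is measured on its own scale, so its absolute magnitude has no bearing on the problem itself. To remove these differences in magnitude, we standardize every numeric variable so that its values fluctuate around 0. Take the temperature variable temp as an example: over the whole dataset its mean is mean(temp) and its standard deviation is std(temp), so the standardized temperature is: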
$$
temp' = \frac{temp - mean(temp)}{std(temp)}
$$
The benefit is that variables with different value ranges are put on an equal footing.
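As a small worked example of the formula, the sketch below standardizes the `temp` values of the five rows shown earlier using only the standard library. The helper name `standardize` is hypothetical. It uses the sample standard deviation (dividing by n - 1), which is also the default of pandas `Series.std()`.

```python
import statistics


def standardize(values):
    """Return the standardized values together with the mean and std used."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / std for v in values], mean, std


temps = [0.24, 0.22, 0.22, 0.24, 0.24]  # temp column of the five sample rows
scaled, mean, std = standardize(temps)

# After standardization the values have mean 0 and standard deviation 1.
print(round(statistics.mean(scaled), 6), round(statistics.stdev(scaled), 6))
```

Values above the mean become positive and values below it become negative, regardless of the original unit.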
# Standardize all numeric features.
quant_features = ['cnt', 'temp', 'hum', 'windspeed']
#quant_features = ['temp', 'hum', 'windspeed']
# Store the mean and standard deviation of each variable in scaled_features.
scaled_features = {}
for each in quant_features:
    mean, std = data[each].mean(), data[each].std()
    scaled_features[each] = [mean, std]
    # Plain column assignment replaces the column, so the integer cnt column can become float.
    data[each] = (data[each] - mean) / std
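The training code that follows refers to `features`, `X` and `Y`, which are not defined in this article. The sketch below shows one way they might be prepared; the tiny `data` frame, the size of the hold-out set `n_test`, and the exact split are assumptions made for illustration, not the book's confirmed procedure.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the preprocessed `data` DataFrame (4 hourly rows).
data = pd.DataFrame({
    "temp": [0.1, -0.3, 0.5, 0.0],
    "hum": [1.2, 0.4, -0.8, 0.3],
    "casual": [3, 8, 5, 3],
    "registered": [13, 32, 27, 10],
    "cnt": [-0.9, 0.2, -0.1, -1.0],
})

n_test = 1  # assumption: hold out the last rows as a test set
train_data, test_data = data[:-n_test], data[-n_test:]

# casual + registered equals the raw cnt, so neither may be used as an input.
target_fields = ["cnt", "casual", "registered"]
features = train_data.drop(target_fields, axis=1)
targets = train_data[target_fields]

X = features.values.astype(np.float64)                        # shape (N, input_size)
Y = targets["cnt"].values.astype(np.float64).reshape(-1, 1)   # shape (N, 1)
print(X.shape, Y.shape)  # (3, 2) (3, 1)
```

Reshaping `Y` to a column matters: the network outputs a (batch, 1) matrix, and subtracting a flat vector of length batch from it would broadcast to a (batch, batch) matrix and give a wrong loss.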
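The book presents two approaches: writing the loss function and the parameter updates by hand, or using the functions built into PyTorch.
We start with the hand-written approach: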
import numpy as np
import torch
import matplotlib.pyplot as plt
# Network architecture: features.shape[1] input units, 10 hidden units, 1 output unit.
input_size = features.shape[1]  # number of input units
hidden_size = 10                # number of hidden units
output_size = 1                 # number of output units
batch_size = 128                # number of records per batch
weights1 = torch.randn([input_size, hidden_size], dtype=torch.double, requires_grad=True)   # input-to-hidden weights
biases1 = torch.randn([hidden_size], dtype=torch.double, requires_grad=True)                # hidden-layer biases
weights2 = torch.randn([hidden_size, output_size], dtype=torch.double, requires_grad=True)  # hidden-to-output weights
def neu(x):
    # Compute the hidden-layer output.
    # x is a batch_size * input_size matrix, weights1 is an input_size * hidden_size matrix,
    # biases1 is a vector of length hidden_size; the result is a batch_size * hidden_size matrix.
    hidden = x.mm(weights1) + biases1.expand(x.size()[0], hidden_size)
    hidden = torch.sigmoid(hidden)
    # Multiply the batch_size * hidden_size matrix by weights2 (hidden_size * output_size)
    # to obtain a batch_size * output_size matrix.
    output = hidden.mm(weights2)
    return output
def cost(x, y):
    # Loss function: mean squared error.
    error = torch.mean((x - y)**2)
    return error
def zero_grad():
    # Clear the gradient of every parameter (gradients are None before the first backward pass).
    if weights1.grad is not None and biases1.grad is not None and weights2.grad is not None:
        weights1.grad.zero_()
        weights2.grad.zero_()
        biases1.grad.zero_()
def optimizer_step(learning_rate):
    # Gradient descent: update the parameters in place without recording the update in autograd.
    with torch.no_grad():
        weights1.sub_(learning_rate * weights1.grad)
        weights2.sub_(learning_rate * weights2.grad)
        biases1.sub_(learning_rate * biases1.grad)
# Training loop.
# X and Y are assumed to be NumPy arrays of shape (N, input_size) and (N, 1).
losses = []
for i in range(1000):
    # The samples are split into batches of 128 and read one batch at a time.
    batch_loss = []
    # start and end are the first and last (exclusive) row indices of the current batch.
    for start in range(0, len(X), batch_size):
        end = min(start + batch_size, len(X))
        xx = torch.tensor(X[start:end], dtype=torch.double)
        yy = torch.tensor(Y[start:end], dtype=torch.double)
        predict = neu(xx)
        loss = cost(predict, yy)
        zero_grad()
        loss.backward()
        optimizer_step(0.01)
        batch_loss.append(loss.item())
    # Record and print the mean loss every 100 epochs.
    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss))
# Plot the recorded loss values.
fig = plt.figure(figsize=(10, 7))
plt.plot(np.arange(len(losses))*100, losses, 'o-')
plt.xlabel('epoch')
plt.ylabel('MSE')
plt.show()
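The update rule in `optimizer_step` above (subtract the learning rate times the gradient) can be checked by hand on a model small enough to differentiate on paper. The sketch below fits the single weight of y = w * x by gradient descent on the mean squared error; the data and the helper names `mse` and `mse_grad` are made up for illustration.

```python
def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_grad(w, xs, ys):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated by y = 2x, so the ideal weight is 2

w = 0.0
learning_rate = 0.01
history = []
for step in range(1000):
    history.append(mse(w, xs, ys))
    # Same rule as optimizer_step: move against the gradient.
    w -= learning_rate * mse_grad(w, xs, ys)

print(round(w, 4))  # 2.0
```

In the network above, `loss.backward()` plays the role of `mse_grad`: autograd computes the gradient for every parameter, so no derivative has to be written by hand.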
The same network written with PyTorch's built-in functions:
# Network architecture: features.shape[1] input units, 10 hidden units, 1 output unit.
input_size = features.shape[1]
hidden_size = 10
output_size = 1
batch_size = 128
neu = torch.nn.Sequential(
    torch.nn.Linear(input_size, hidden_size),
    torch.nn.Sigmoid(),
    torch.nn.Linear(hidden_size, output_size),
)
cost = torch.nn.MSELoss()
optimizer = torch.optim.SGD(neu.parameters(), lr=0.01)
# Training loop.
# X and Y are assumed to be NumPy arrays of shape (N, input_size) and (N, 1).
losses = []
for i in range(1000):
    # The samples are split into batches of 128 and read one batch at a time.
    batch_loss = []
    # start and end are the first and last (exclusive) row indices of the current batch.
    for start in range(0, len(X), batch_size):
        end = min(start + batch_size, len(X))
        xx = torch.tensor(X[start:end], dtype=torch.float)
        yy = torch.tensor(Y[start:end], dtype=torch.float)
        predict = neu(xx)
        loss = cost(predict, yy)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        batch_loss.append(loss.item())
    # Record and print the mean loss every 100 epochs.
    if i % 100 == 0:
        losses.append(np.mean(batch_loss))
        print(i, np.mean(batch_loss))
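Because `cnt` was standardized together with the input features, the trained network predicts values in standardized units. A minimal sketch of converting them back to ride counts is shown below, assuming the mean and standard deviation were stored as in `scaled_features`. The helper `unscale` and the numbers used here are hypothetical.

```python
def unscale(scaled_values, mean, std):
    """Invert the standardization x' = (x - mean) / std."""
    return [v * std + mean for v in scaled_values]


# Hypothetical statistics standing in for scaled_features['cnt'] = [mean, std].
scaled_features = {"cnt": [190.0, 180.0]}
mean, std = scaled_features["cnt"]

predictions = [-1.0, 0.0, 0.5]  # made-up network outputs in standardized units
counts = unscale(predictions, mean, std)
print(counts)  # [10.0, 190.0, 280.0]
```

A prediction of 0 corresponds to the average demand, and each unit above or below it corresponds to one standard deviation of the original counts.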