Preface: for sequential data such as video, speech, and text, time-series network models like RNNs and LSTMs are often a natural fit.
By analogy, when we watch a video, listen to speech, or read an article, we reason over a sequence of data: when the next frame or segment arrives, what we have already seen is still held in memory and influences how we interpret what follows. We therefore want neural networks of a similar design (RNNs, LSTMs, etc.) that analyze such sequential data properly, rather than forgetting earlier inputs as later ones arrive. Once these networks produce more expressive representations of the data, we can move on to downstream tasks such as sentiment classification of videos or text, and other classification and clustering tasks.
This post briefly summarizes the basics of both models and provides a simple implementation that uses no deep-learning libraries.
The principle is simple. A basic feed-forward network only has weighted connections between layers, whereas an RNN also has weighted connections among the hidden-layer neurons across time steps. In other words, as the sequence unrolls, earlier hidden states influence later ones, and the loss accumulates as the sequence extends.
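Concretely, a common textbook form of the recurrence is: at step $t$, given input $x_t$ and the previous hidden state $h_{t-1}$,
$h_t = \tanh(W_x x_t + W_h h_{t-1} + b), \quad y_t = W_y h_t$
(the toy implementation below drops the nonlinearity, the bias, and the hidden-to-hidden weight matrix for simplicity, computing $h_t = x_t W_1 + h_{t-1}$ and $y = h_t W_2$).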
Because of vanishing gradients, a plain RNN has only short-term memory. An LSTM combines short-term and long-term memory through carefully designed gating, which also mitigates the vanishing-gradient problem to some extent. It is implemented with three gates: an input gate, a forget gate, and an output gate.
Reference: https://blog.csdn.net/qq_16792139/article/details/115530197
Encode each word with one-hot encoding:
import numpy as np
sentence = 'white blood cells destroying an infection'
words = sentence.split(' ')
X = np.eye(len(words))  # row i is the one-hot vector of word i
for i in range(len(words)):
    print(words[i], 'one-hot encoding:', X[i])
----------------------------------------------------------------------------
Output:
white one-hot encoding: [1. 0. 0. 0. 0. 0.]
blood one-hot encoding: [0. 1. 0. 0. 0. 0.]
cells one-hot encoding: [0. 0. 1. 0. 0. 0.]
destroying one-hot encoding: [0. 0. 0. 1. 0. 0.]
an one-hot encoding: [0. 0. 0. 0. 1. 0.]
infection one-hot encoding: [0. 0. 0. 0. 0. 1.]
Randomly initialize a transformation matrix W that linearly maps each word to a low-dimensional vector (dimension 2 here), then write out each word's low-dimensional vector and the parameters of W. Note that since X is the identity matrix, each word's low-dimensional vector is exactly the corresponding row of W, so W acts as an embedding table.
W = np.random.uniform(0, 1, (6, 2))  # 6 words -> 2-dimensional embeddings
low_X = X.dot(W)
for i in range(len(words)):
    print(words[i], 'low-dimension encoding:', low_X[i])
----------------------------------------------------------------------------
Output:
white low-dimension encoding: [0.22358387 0.45554808]
blood low-dimension encoding: [0.26184663 0.84112237]
cells low-dimension encoding: [0.81084067 0.9949041 ]
destroying low-dimension encoding: [0.44082166 0.33534952]
an low-dimension encoding: [0.26281604 0.37646656]
infection low-dimension encoding: [0.09828981 0.4961023 ]
Feed the prefixes of the sentence, e.g. (white), (white, blood), (white, blood, cells), ..., through the RNN or LSTM model and obtain an output vector for each prefix.
Take the embedding of the target word (the word that follows each input prefix), write out its low-dimensional vector, and construct the loss function, for example:
$L = \log(\mathrm{sigmoid}(v(\text{white}) \cdot v(\text{blood})^T))$
$L = \log(\mathrm{sigmoid}(v(\text{white, blood}) \cdot v(\text{cells})^T))$
Use gradient descent to update the parameters of the transformation matrix W with the rule W = W - 0.01 * (1 - L), and write out the updated W. (This assignment-style rule subtracts the same scalar 0.01 * (1 - L) from every entry of W; it is a simplification rather than the true gradient of L with respect to W.)
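As a quick sanity check on the rule: at the first RNN step below, the loss evaluates to L ≈ -0.2018, so every entry of W shifts by 0.01 × (1 - (-0.2018)) ≈ 0.0120, which is exactly the gap between the first row of W (the embedding of 'white', [0.22358387 0.45554808]) and the first row of the first W_new in the output.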
【RNN version】
from math import log
sentence = 'white blood cells destroying an infection'
words = sentence.split(' ')
# Initialize parameters (W and low_X come from the embedding step above)
lr = 0.01
W1 = np.ones((2, 2))  # input-to-hidden weights
W2 = np.ones((2, 2))  # hidden-to-output weights
def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))
for i in range(len(words) - 1):
    store = [0, 0]  # hidden state, initialized to zero
    # Forward pass over the prefix words[0..i]
    for j in range(0, i + 1):
        mid = low_X[j] @ W1 + store  # hidden-state update
        store = mid
    y = mid @ W2  # output layer
    # Loss against the next word's embedding
    next_word = low_X[i + 1]
    loss = log(sigmoid(np.dot(y, next_word.T)))
    # Update the transformation matrix (W itself is not overwritten,
    # so each step starts again from the original W)
    W_new = W - lr * (1 - loss)
    sequence = ''
    for j in range(0, i + 1):
        sequence += words[j]
        sequence += ' '
    print("input sequence is '{}',next_word is {}".format(sequence, next_word))
    print("W_new is {}".format(W_new))
----------------------------------------------------------------------------
Output:
input sequence is 'white ',next_word is [0.26184663 0.84112237]
W_new is [[0.21156631 0.44353052]
[0.24982907 0.82910481]
[0.79882311 0.98288654]
[0.4288041 0.32333196]
[0.25079848 0.364449 ]
[0.08627225 0.48408474]]
input sequence is 'white blood ',next_word is [0.81084067 0.9949041 ]
W_new is [[0.21356786 0.44553206]
[0.25183061 0.83110635]
[0.80082466 0.98488808]
[0.43080565 0.32533351]
[0.25280002 0.36645054]
[0.08827379 0.48608629]]
input sequence is 'white blood cells ',next_word is [0.44082166 0.33534952]
W_new is [[0.21354583 0.44551003]
[0.25180858 0.83108432]
[0.80080262 0.98486605]
[0.43078361 0.32531148]
[0.25277799 0.36642851]
[0.08825176 0.48606426]]
input sequence is 'white blood cells destroying ',next_word is [0.26281604 0.37646656]
W_new is [[0.21354621 0.44551041]
[0.25180896 0.8310847 ]
[0.80080301 0.98486643]
[0.430784 0.32531186]
[0.25277837 0.36642889]
[0.08825214 0.48606464]]
input sequence is 'white blood cells destroying an ',next_word is [0.09828981 0.4961023 ]
W_new is [[0.21355779 0.445522 ]
[0.25182055 0.83109629]
[0.80081459 0.98487802]
[0.43079558 0.32532344]
[0.25278995 0.36644047]
[0.08826373 0.48607622]]
【LSTM version】
On top of the RNN, three gates are added: an input gate, a forget gate, and an output gate.
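For reference, the standard LSTM gate equations (a minimal sketch; the toy code below does not compute exactly this) are
$i_t = \sigma(W_i x_t + U_i h_{t-1} + b_i)$, $f_t = \sigma(W_f x_t + U_f h_{t-1} + b_f)$, $o_t = \sigma(W_o x_t + U_o h_{t-1} + b_o)$,
$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_c x_t + U_c h_{t-1} + b_c)$, $h_t = o_t \odot \tanh(c_t)$.
The implementation below simplifies this in several ways: the candidate input is linear (no tanh), the gates depend only on the current input (no $h_{t-1}$ term), their activations are hard-rounded to 0 or 1, and the output uses sigmoid(c) in place of tanh(c).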
import math
from math import log
def sigmoid_function(z):  # element-wise sigmoid for a vector
    ls = []
    for i in range(z.shape[0]):
        ls.append(1 / (1 + math.exp(-z[i])))
    return ls
def sigmoid_function_2(z):  # scalar sigmoid
    return 1 / (1 + math.exp(-z))
def round_ls(ls):  # hard-round gate activations to 0/1
    temp = []
    for i in range(len(ls)):
        temp.append(round(ls[i]))
    return np.array(temp)
sentence = 'white blood cells destroying an infection'
words = sentence.split(' ')
# Initialize parameters (W and low_X come from the embedding step above)
lr = 0.01
W1 = np.random.rand(2, 2)   # candidate-input weights
bias1 = np.random.rand(2)
W2 = np.random.rand(2, 2)   # input-gate weights
bias2 = np.random.rand(2)
W3 = np.random.rand(2, 2)   # forget-gate weights
bias3 = np.random.rand(2)
W4 = np.random.rand(2, 2)   # output-gate weights
bias4 = np.random.rand(2)
# Forward pass
for i in range(len(low_X) - 1):
    c = [0, 0]  # cell state, initialized to the zero vector
    for j in range(0, i + 1):
        input_v = low_X[j] @ W1 + bias1
        input_gate = round_ls(sigmoid_function(low_X[j] @ W2 + bias2))
        forget_gate = round_ls(sigmoid_function(low_X[j] @ W3 + bias3))
        output_gate = round_ls(sigmoid_function(low_X[j] @ W4 + bias4))
        c = input_v * input_gate + forget_gate * c  # cell-state update
    y = sigmoid_function(c) * output_gate  # output (sigmoid in place of tanh)
    # Compute the loss and update the transformation matrix W
    next_word = low_X[i + 1]
    loss = log(sigmoid_function_2(np.dot(y, next_word.T)))
    W_new = W - lr * (1 - loss)
    sequence = ''
    for j in range(0, i + 1):
        sequence += words[j]
        sequence += ' '
    print("input sequence is '{}',next_word is {}".format(sequence, next_word))
    print("W_new is {}".format(W_new))
----------------------------------------------------------------------------
Output:
input sequence is 'white ',next_word is [0.26184663 0.84112237]
W_new is [[0.20973448 0.44169869]
[0.24799723 0.82727298]
[0.79699128 0.9810547 ]
[0.42697227 0.32150013]
[0.24896664 0.36261716]
[0.08444042 0.48225291]]
input sequence is 'white blood ',next_word is [0.81084067 0.9949041 ]
W_new is [[0.21150985 0.44347405]
[0.2497726 0.82904834]
[0.79876665 0.98283007]
[0.42874764 0.3232755 ]
[0.25074201 0.36439253]
[0.08621578 0.48402828]]
input sequence is 'white blood cells ',next_word is [0.44082166 0.33534952]
W_new is [[0.20960578 0.44156999]
[0.24786854 0.82714428]
[0.79686258 0.98092601]
[0.42684357 0.32137143]
[0.24883794 0.36248846]
[0.08431172 0.48212421]]
input sequence is 'white blood cells destroying ',next_word is [0.26281604 0.37646656]
W_new is [[0.20926757 0.44123177]
[0.24753032 0.82680606]
[0.79652437 0.98058779]
[0.42650536 0.32103322]
[0.24849973 0.36215025]
[0.0839735 0.481786 ]]
input sequence is 'white blood cells destroying an ',next_word is [0.09828981 0.4961023 ]
W_new is [[0.20916201 0.44112621]
[0.24742476 0.8267005 ]
[0.79641881 0.98048223]
[0.4263998 0.32092766]
[0.24839417 0.36204469]
[0.08386794 0.48168044]]
Differences between RNN/LSTM and traditional methods