RNN: Recurrent Neural Network
A recurrent neural network (RNN) is a type of neural network designed specifically for sequential data. What makes an RNN so powerful is that it doesn't consider only the current input but also what came before it, which lets it remember what happened earlier in the sequence. To get a better intuition, let's take text classification as an example. For this task we could use a classic machine learning algorithm like naive Bayes, but the problem is that it treats a sentence as a set of independent words, looking only at the frequency of each word and ignoring how the words are combined or ordered, even though word order makes a huge difference to the meaning of a sentence. Unlike those classic algorithms, an RNN works well on sequence data because it takes word i as input and combines it with the output of the step for word i-1; the same thing is then done for word i+1. This is why it's called a recurrent neural network: the network applies the same operations to each word i of the sentence.
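To make the word-order point concrete, here is a minimal sketch of a frequency-only (bag-of-words) view of a sentence. The two sentences and the helper name `bag_of_words` are made up for illustration; this is not a full naive Bayes classifier, only the representation such a model would see.

```python
from collections import Counter

def bag_of_words(sentence):
    """Count how often each word occurs, discarding word order."""
    return Counter(sentence.lower().split())

# Two hypothetical sentences with opposite meanings but identical words.
a = bag_of_words("the dog bit the man")
b = bag_of_words("the man bit the dog")

print(a == b)  # True: a frequency-only model cannot tell them apart
```

An RNN reads the words one at a time in order, so the two sentences produce different sequences of internal states.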
You might be thinking: enough blah blah, show us how they work. That's exactly what I'll do in the next part.
How RNN works:
To understand how an RNN works under the hood, let's take an example NLP application, named entity recognition, which is used here to detect names in a sentence.
For each training example (a sentence), we map each word to an output: if the word is a name (John, Ellen, …) we map it to 1; otherwise, we map it to 0. To train an RNN to recognize the names in a sentence, we use an architecture that reads the sentence one word at a time and produces one output per word.
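As a small illustration of this labelling scheme, the sketch below builds the 0/1 targets for one sentence. The five-word sentence and the `known_names` set are hypothetical; in a real dataset the labels would come from annotation, not from a lookup like this.

```python
# Hypothetical training sentence and name list, for illustration only.
sentence = "john and ellen play tennis"
known_names = {"john", "ellen"}

words = sentence.split()
# One target per word: 1 if the word is a name, 0 otherwise.
labels = [1 if word in known_names else 0 for word in words]

for word, label in zip(words, labels):
    print(f"{word:>7} -> {label}")
# labels == [1, 0, 1, 0, 0]
```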
Forward propagation:
For this training example, we have 5 words, which means 5 time steps. At each step t we compute a and y using the same shared weights Wa, Wx, Wy, ba, by.
In general, the equations would be:
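One standard way to write the forward step with the weights named above is the following. The choice of tanh for the hidden activation and a sigmoid for the output is an assumption (the text does not name the activations), as is the zero initial state.

```latex
\begin{aligned}
a^{\langle t \rangle} &= \tanh\left(W_a\, a^{\langle t-1 \rangle} + W_x\, x^{\langle t \rangle} + b_a\right), \qquad a^{\langle 0 \rangle} = 0 \\
\hat{y}^{\langle t \rangle} &= \sigma\left(W_y\, a^{\langle t \rangle} + b_y\right)
\end{aligned}
```

Here x^<t> is the input for word t, a^<t> is the hidden activation passed on to the next step, and ŷ^<t> is the predicted probability that word t is a name.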
Then we compute the cost for each time step t, which measures the gap between the real output y and the predicted output ŷ:
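For a 0/1 target like the one used here, a common choice for this per-step cost is the binary cross-entropy. This specific form is an assumption, since the text does not say which cost is used.

```latex
\mathcal{L}^{\langle t \rangle}\left(\hat{y}^{\langle t \rangle}, y^{\langle t \rangle}\right)
  = -\,y^{\langle t \rangle} \log \hat{y}^{\langle t \rangle}
    - \left(1 - y^{\langle t \rangle}\right) \log\left(1 - \hat{y}^{\langle t \rangle}\right)
```

The cost is close to 0 when ŷ^<t> agrees with y^<t> and grows as the prediction moves toward the wrong label.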
Now we sum the costs of all the words to get the loss function for the whole sentence:
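The forward pass and the summed loss can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: tanh and sigmoid activations, a binary cross-entropy cost per word, one-hot word vectors, a hidden size of 4, and small random initial weights. None of these choices come from the text, and `forward_loss` is a hypothetical helper name.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_loss(xs, ys, Wa, Wx, Wy, ba, by):
    """Run the RNN over one sentence; return per-word predictions and the summed loss."""
    a = np.zeros(Wa.shape[0])              # a<0>: initial hidden state (assumed zero)
    y_hats, total = [], 0.0
    for x, y in zip(xs, ys):               # one step per word, same weights at every step
        a = np.tanh(Wa @ a + Wx @ x + ba)  # combine previous state with current word
        y_hat = sigmoid(Wy @ a + by)       # probability that this word is a name
        total += -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
        y_hats.append(y_hat)
    return np.array(y_hats), total

rng = np.random.default_rng(0)
n_a, n_x = 4, 5                            # hidden size and vocabulary size (assumed)
xs = np.eye(n_x)                           # 5 one-hot vectors, one per word
ys = np.array([1, 0, 1, 0, 0])             # hypothetical labels: 1 = name

Wa = rng.normal(scale=0.1, size=(n_a, n_a))
Wx = rng.normal(scale=0.1, size=(n_a, n_x))
Wy = rng.normal(scale=0.1, size=n_a)
ba, by = np.zeros(n_a), 0.0

y_hats, loss = forward_loss(xs, ys, Wa, Wx, Wy, ba, by)
print(y_hats.shape, loss)
```

With untrained weights this small, every prediction sits near 0.5, so the loss starts close to 5 × ln 2.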
Back propagation:
Back propagation is like going back in time: we compute the derivatives of the loss function with respect to the parameters Wa, Wx, Wy, ba, by, using the chain rule to simplify the calculus. Once we have the derivatives, we update the parameters using gradient descent:
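Written out, the gradient descent update applies the same rule to each of the five parameters. The symbol α for the learning rate is an assumption here; the text does not name it.

```latex
\theta \leftarrow \theta - \alpha \, \frac{\partial \mathcal{L}}{\partial \theta},
\qquad \theta \in \{W_a, W_x, W_y, b_a, b_y\},
\qquad \mathcal{L} = \sum_{t} \mathcal{L}^{\langle t \rangle}
```

Because the weights are shared across time steps, each derivative is itself a sum of contributions from every step of the sentence.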
After many iterations over several training examples, we should be able to minimize the loss function, and the predicted output should converge toward the real output. We can then use the optimized weights to detect names in new sentences.
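The whole loop (forward pass, back propagation through time, gradient descent update) can be sketched in NumPy on a single sentence. Everything specific here is an assumption for illustration: tanh and sigmoid activations, a binary cross-entropy cost, one-hot inputs, a hidden size of 4, a learning rate of 0.1, 200 iterations, and the helper name `loss_and_grads`. A real setup would train on many sentences.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_and_grads(xs, ys, p):
    """Forward pass, then back propagation through time, for one sentence."""
    T, n_a = len(xs), p["Wa"].shape[0]
    a = np.zeros((T + 1, n_a))             # a[0] is the initial state a<0>
    y_hat = np.zeros(T)
    for t in range(T):                     # forward: keep every hidden state
        a[t + 1] = np.tanh(p["Wa"] @ a[t] + p["Wx"] @ xs[t] + p["ba"])
        y_hat[t] = sigmoid(p["Wy"] @ a[t + 1] + p["by"])
    loss = -np.sum(ys * np.log(y_hat) + (1 - ys) * np.log(1 - y_hat))

    g = {k: np.zeros_like(v) for k, v in p.items()}
    da_next = np.zeros(n_a)                # gradient arriving from later steps
    for t in reversed(range(T)):           # backward: go back in time
        dz = y_hat[t] - ys[t]              # sigmoid + cross-entropy derivative
        g["Wy"] += dz * a[t + 1]
        g["by"] += dz
        da = p["Wy"] * dz + da_next        # from this step's output and from the future
        du = da * (1 - a[t + 1] ** 2)      # through tanh
        g["Wa"] += np.outer(du, a[t])
        g["Wx"] += np.outer(du, xs[t])
        g["ba"] += du
        da_next = p["Wa"].T @ du           # pass gradient to the previous step
    return loss, g

rng = np.random.default_rng(0)
n_a, n_x = 4, 5                            # hidden size and vocabulary size (assumed)
xs = np.eye(n_x)                           # 5 one-hot word vectors
ys = np.array([1.0, 0.0, 1.0, 0.0, 0.0])   # hypothetical labels: 1 = name
params = {
    "Wa": rng.normal(scale=0.1, size=(n_a, n_a)),
    "Wx": rng.normal(scale=0.1, size=(n_a, n_x)),
    "Wy": rng.normal(scale=0.1, size=n_a),
    "ba": np.zeros(n_a),
    "by": np.array(0.0),
}

lr, losses = 0.1, []                       # learning rate (assumed)
for _ in range(200):
    loss, grads = loss_and_grads(xs, ys, params)
    losses.append(loss)
    for k in params:                       # gradient descent update
        params[k] = params[k] - lr * grads[k]

print(losses[0], losses[-1])               # the loss should shrink over the iterations
```

Storing every hidden state during the forward pass is what makes the backward loop possible: each step needs its own activation and the one before it.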
Translated from: https://medium.com/swlh/simple-explanation-of-recurrent-neural-network-rnn-1285749cc363