The authors present gated self-matching networks for reading-comprehension-style question answering, which aims to answer questions about a given passage.
First, match the question and passage with gated attention-based recurrent networks to obtain the question-aware passage representation.
Then, utilize a self-matching attention mechanism to refine the representation by matching the passage against itself.
Finally, employ pointer networks to locate the positions of answers in the passage.
This model (R-Net) consists of four parts:
1. the recurrent network encoder (to build representations for the question and passage separately)
2. the gated matching layer (to match the question and passage)
3. the self-matching layer (to aggregate information from the whole passage)
4. the pointer network layer (to predict the answer boundary)
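Put together, a forward pass chains these four parts. Below is an illustrative composition in Python; the module names and signatures are assumptions matching the per-layer sketches later in these notes (including the assumption of a single shared encoder), not the paper's code:

```python
def r_net_forward(encoder, gated_match, self_match, pointer,
                  q_words, q_chars, p_words, p_chars):
    """Chain the four parts of R-Net (illustrative; each module is
    sketched in the corresponding section below)."""
    u_q = encoder(q_words, q_chars)              # 1. encode question
    u_p = encoder(p_words, p_chars)              # 1. encode passage
    v_p = gated_match(u_q, u_p)                  # 2. question-aware passage
    h_p = self_match(v_p)                        # 3. aggregate whole passage
    return pointer(u_q, h_p)                     # 4. start / end logits
```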
The key contributions are three-fold:
1. propose a gated attention-based recurrent network, assigning different levels of importance to passage parts depending on their relevance to the question
2. introduce a self-matching mechanism, effectively aggregating evidence from the whole passage to infer the answer and dynamically refining passage representation with information from the whole passage
3. yield state-of-the-art results against strong baselines
Given a passage $P$ and a question $Q$, predict an answer $A$ to question $Q$ based on information in $P$.
Consider a question $Q = \{w_t^Q\}_{t=1}^m$ and a passage $P = \{w_t^P\}_{t=1}^n$. First, convert the words to word-level embeddings ($\{e_t^Q\}_{t=1}^m$ and $\{e_t^P\}_{t=1}^n$) and character-level embeddings ($\{c_t^Q\}_{t=1}^m$ and $\{c_t^P\}_{t=1}^n$), the latter generated by taking the final hidden states of a bi-directional recurrent neural network applied to the embeddings of the characters in each token. Such character-level embeddings have been shown to help deal with out-of-vocabulary tokens.
Then use a bi-directional RNN to produce new representations $\{u_t^Q\}_{t=1}^m$ and $\{u_t^P\}_{t=1}^n$.
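As a concrete reference, here is a minimal PyTorch sketch of this encoding layer; the dimensions, module names, and batching details are assumptions, not the paper's released code:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, word_vocab, char_vocab, word_dim=300, char_dim=50, hidden=75):
        super().__init__()
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        # BiGRU over characters; its final hidden states form the
        # character-level token embedding.
        self.char_rnn = nn.GRU(char_dim, hidden, bidirectional=True, batch_first=True)
        # BiGRU over the token sequence, producing u^Q / u^P.
        self.seq_rnn = nn.GRU(word_dim + 2 * hidden, hidden,
                              bidirectional=True, batch_first=True)

    def forward(self, words, chars):
        # words: (batch, seq_len); chars: (batch, seq_len, word_len)
        b, s, w = chars.shape
        c = self.char_emb(chars).view(b * s, w, -1)
        _, h_n = self.char_rnn(c)                          # h_n: (2, b*s, hidden)
        char_feat = h_n.transpose(0, 1).reshape(b, s, -1)  # final fwd+bwd states
        x = torch.cat([self.word_emb(words), char_feat], dim=-1)
        u, _ = self.seq_rnn(x)                             # (batch, seq_len, 2*hidden)
        return u
```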
Here, use a Gated Recurrent Unit (GRU) because it is computationally cheaper than an LSTM.
Utilize a gated attention-based recurrent network (a variant of attention-based recurrent networks) to incorporate question information into the passage representation.
Given $\{u_t^Q\}_{t=1}^m$ and $\{u_t^P\}_{t=1}^n$, generate the question-aware passage representation $\{v_t^P\}_{t=1}^n$ via soft alignment of the words in the question and passage.
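A hedged PyTorch sketch of one gated attention-based recurrent step: attention over the question conditioned on the current passage word and the previous state, an input gate over $[u_t^P, c_t]$, then a GRU update. The parameter names echo the paper's equations; everything else is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedAttnRNN(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W_u_q = nn.Linear(dim, dim, bias=False)
        self.W_u_p = nn.Linear(dim, dim, bias=False)
        self.W_v_p = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)
        self.gate = nn.Linear(2 * dim, 2 * dim, bias=False)  # g_t = sigmoid(W_g [u_t^P, c_t])
        self.cell = nn.GRUCell(2 * dim, dim)

    def forward(self, u_q, u_p):
        # u_q: (batch, m, dim) question; u_p: (batch, n, dim) passage
        batch, n, dim = u_p.shape
        v_t = u_p.new_zeros(batch, dim)
        outputs = []
        for t in range(n):
            # attention over the question, conditioned on u_t^P and v_{t-1}^P
            s = self.v(torch.tanh(
                self.W_u_q(u_q) + (self.W_u_p(u_p[:, t]) + self.W_v_p(v_t)).unsqueeze(1)
            )).squeeze(-1)                                   # (batch, m)
            a = F.softmax(s, dim=-1)
            c = torch.bmm(a.unsqueeze(1), u_q).squeeze(1)    # question context c_t
            x = torch.cat([u_p[:, t], c], dim=-1)
            x = torch.sigmoid(self.gate(x)) * x              # input gate
            v_t = self.cell(x, v_t)
            outputs.append(v_t)
        return torch.stack(outputs, dim=1)                   # {v_t^P}
```

The gate modulates the RNN input rather than the state, letting the model down-weight passage parts irrelevant to the question, which is the "different levels of importance" from contribution 1.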
The self-matching attention aims to fix the question-aware passage representation's limited knowledge of context. It dynamically
1. collects evidence from the whole passage
2. encodes the evidence relevant to the current passage word and its matching question information into the passage representation $h_t^P$ (see the sketch after this list)
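A minimal sketch of the self-matching layer. It assumes the attention scores depend only on the passage representations (not on the recurrent state), so all position pairs can be scored in parallel; the gated match is then integrated by a bi-directional GRU:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfMatching(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W1 = nn.Linear(dim, dim, bias=False)
        self.W2 = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)
        self.gate = nn.Linear(2 * dim, 2 * dim, bias=False)
        self.rnn = nn.GRU(2 * dim, dim, bidirectional=True, batch_first=True)

    def forward(self, v_p):
        # v_p: (batch, n, dim) question-aware passage representation
        # score every pair of passage positions: s[b, t, j]
        s = self.v(torch.tanh(
            self.W1(v_p).unsqueeze(1) + self.W2(v_p).unsqueeze(2)
        )).squeeze(-1)
        a = F.softmax(s, dim=-1)                 # (batch, n, n)
        c = torch.bmm(a, v_p)                    # evidence from the whole passage
        x = torch.cat([v_p, c], dim=-1)
        x = torch.sigmoid(self.gate(x)) * x      # gate, as in the matching layer
        h, _ = self.rnn(x)                       # h_t^P: (batch, n, 2*dim)
        return h
```

The $n \times n$ attention lets every passage word attend to the entire passage, which is how evidence far from the answer span gets folded into $h_t^P$.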
After the self-matching attention over the passage, the authors utilize a bi-directional GRU (the final `self.rnn` step in the sketch above) to deeply integrate the matching results before feeding them into the answer pointer layer. This helps to further propagate the information aggregated by self-matching across the passage.
Use attention-pooling over the question representation to generate the initial hidden vector for the pointer network, which predicts the start and end positions of the answer.
Given the passage representation $\{h_t^P\}_{t=1}^n$, the attention mechanism is utilized as a pointer to select the start position $p^1$ and end position $p^2$.
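A hedged sketch of the pointer layer: attention-pooling over the question (with its own parameters) initializes the decoder state, and two decoding steps emit logits over passage positions for the start and end. It assumes the question and passage encodings share the feature size `dim`; all names are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PointerNet(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.W_u = nn.Linear(dim, dim, bias=False)   # question pooling
        self.v_q = nn.Linear(dim, 1, bias=False)
        self.W_h = nn.Linear(dim, dim, bias=False)   # pointer attention
        self.W_a = nn.Linear(dim, dim, bias=False)
        self.v = nn.Linear(dim, 1, bias=False)
        self.cell = nn.GRUCell(dim, dim)

    def forward(self, u_q, h_p):
        # u_q: (batch, m, dim) question encoding; h_p: (batch, n, dim)
        # attention-pooling over the question -> initial decoder state
        a_q = F.softmax(self.v_q(torch.tanh(self.W_u(u_q))).squeeze(-1), dim=-1)
        state = torch.bmm(a_q.unsqueeze(1), u_q).squeeze(1)
        logits = []
        for _ in range(2):                       # start, then end position
            s = self.v(torch.tanh(
                self.W_h(h_p) + self.W_a(state).unsqueeze(1)
            )).squeeze(-1)                       # (batch, n) scores over passage
            logits.append(s)
            a = F.softmax(s, dim=-1)
            c = torch.bmm(a.unsqueeze(1), h_p).squeeze(1)
            state = self.cell(c, state)          # advance decoder state
        return logits[0], logits[1]              # start / end logits
```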
To train the network, minimize the sum of the negative log probabilities of the ground-truth start and end positions under the predicted distributions.
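With $y_i^1$ and $y_i^2$ denoting the ground-truth start and end positions of example $i$ (notation introduced here for clarity), the objective can be written as

$$\mathcal{L} = -\sum_{i} \left[ \log p^1(y_i^1) + \log p^2(y_i^2) \right]$$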