2.5.3 Sentence-Rank
The previous section introduced a graph-based ranking algorithm over words. In this section, the vertices of the graph are sentences instead. Compared with word rank, sentence rank preserves much more linguistic information, and its results are easier to understand because the LS is now composed of whole sentences rather than terms and phrases.
As with the word-rank graph, and leaving unweighted graphs aside, there are two kinds of graph construction for the sentences of a document: the undirected weighted graph and the directed weighted graph. Moreover, Rada and Paul further divided the directed weighted graph into the directed forward weighted graph and the directed backward weighted graph [9]. In the undirected weighted graph, the assumption is that every sentence in the document is connected to all other sentences. In the directed forward weighted graph, every sentence points to all the sentences that follow it in the text and receives edges from all the sentences that precede it. In the directed backward weighted graph, every sentence points to all the sentences that precede it in the text and receives edges from all the sentences that follow it.
Figure 2.17: a five-sentence example passage.
Figure 2.17 shows a five-sentence passage, and Figure 2.18 shows the three graphs in detail. In Figure 2.18 (a), the undirected graph, every sentence is connected to the other four sentences by a pair of incoming and outgoing edges. In Figure 2.18 (b), the directed forward graph, sentence 1 points to sentences 2, 3, 4 and 5; sentence 2 points to sentences 3, 4 and 5; sentence 3 points to sentences 4 and 5; sentence 4 points to sentence 5; and sentence 5 has no outgoing edge. Figure 2.18 (c), the directed backward graph, is (b) with the order reversed: sentence 5 points to sentences 1, 2, 3 and 4; sentence 4 points to sentences 1, 2 and 3; sentence 3 points to sentences 1 and 2; sentence 2 points to sentence 1; and sentence 1 has no outgoing edge.
Figure 2.18: (a) undirected; (b) directed forward; (c) directed backward.
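To make the three constructions concrete, the sketch below builds each variant as an edge list over sentence indices; it is an illustration only, and the function name and representation are not taken from [9]:

```python
def build_edges(n, kind):
    """Edge list for an n-sentence text, sentences indexed 0..n-1.

    "undirected": every pair of sentences is connected (pairs are unordered);
    "forward":    each sentence points to all the sentences after it;
    "backward":   each sentence points to all the sentences before it.
    """
    if kind == "undirected":
        return [(i, j) for i in range(n) for j in range(i + 1, n)]
    if kind == "forward":
        return [(i, j) for i in range(n) for j in range(i + 1, n)]  # i -> j, i < j
    if kind == "backward":
        return [(i, j) for i in range(n) for j in range(i)]         # i -> j, i > j
    raise ValueError(kind)

# The 5-sentence passage of Figure 2.18: in the forward graph sentence 0
# points ahead to 1..4, while in the backward graph sentence 4 points back to 0..3.
print(build_edges(5, "forward"))
print(build_edges(5, "backward"))
```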
Once the direction is decided, the weight of each connection is computed as the similarity of the two sentences: as in equation 2-10, the number of terms shared by both sentences is divided by the lengths of the two sentences in log form, where the length of a sentence can simply be taken as its number of terms.
\mathrm{Similarity}(S_i, S_j) = \frac{\left|\{\, w_k \mid w_k \in S_i \wedge w_k \in S_j \,\}\right|}{\log |S_i| + \log |S_j|} \quad [10] \qquad (2\text{-}10)
In general, w_k is a word appearing in both S_i and S_j. w_k may also cover words with the same meaning but different forms, such as "interested" and "interesting", which in turn improves the ability to find similar or related documents. If there is no common word w_k, the edge between S_i and S_j can simply be removed.
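A minimal sketch of equation 2-10 follows; the lower-casing and whitespace tokenization are assumptions made for illustration, and [10] does not prescribe them:

```python
import math

def similarity(s_i, s_j):
    """Equation 2-10: shared terms of S_i and S_j, divided by the
    log of both sentence lengths (length = number of terms)."""
    words_i = s_i.lower().split()   # naive tokenization, an assumption
    words_j = s_j.lower().split()
    # Stemming could map forms like "interested"/"interesting" together
    # before this intersection; it is omitted here for brevity.
    common = set(words_i) & set(words_j)      # the shared terms w_k
    if not common:                            # no common w_k: drop the edge
        return 0.0
    return len(common) / (math.log(len(words_i)) + math.log(len(words_j)))

print(similarity("graph based ranking of sentences",
                 "ranking sentences with a graph"))
```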
After iterating equations 2-8 and 2-9, the sentences with the highest rank are selected from the text and taken as the abstract. In [9], the 4 highest ranked sentences are extracted after the iteration converges, resulting in a summary of about 100 words. Rada and Paul also evaluated the abstracts produced by this graph ranking algorithm against other methods, such as HITS and the Positional Power Function (POS), using the ROUGE evaluation [15]. It turned out that the PageRank-based sentence rank performed well. The details are not included in this report.
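The ranking step can be sketched as the weighted PageRank recursion over the similarity graph; the damping factor d = 0.85 and the convergence threshold below are common defaults assumed here, not values quoted from [9]:

```python
def rank(w, d=0.85, tol=1e-6):
    """Weighted PageRank over a sentence graph.

    w[i][j] is the similarity of S_i and S_j from equation 2-10
    (0.0 where the edge was removed); returns one score per sentence.
    """
    n = len(w)
    out_sum = [sum(row) or 1.0 for row in w]   # total weight leaving each node
    score = [1.0] * n
    while True:
        new = [(1 - d) + d * sum(w[j][i] * score[j] / out_sum[j]
                                 for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, score)) < tol:
            return new
        score = new

# As in [9]: keep the 4 highest ranked sentences, restored to document order.
# scores = rank(w)
# summary = sorted(sorted(range(len(w)), key=lambda i: -scores[i])[:4])
```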
In conclusion, Rada and Paul found that the abstracts built from the sentences extracted by this sentence rank algorithm are highly informative for the given text, and that the algorithm requires neither deep linguistic analysis nor domain- or language-specific annotated corpora [10].