Brief Introduction to Suffix Array

Brief Introduction to Suffix Array

Last Modified: 2000-11-14 (Since: 2000-11-14)

Suffix array is a data structure designed for efficient searching of a large text. The data structure is simply an array containing all the pointers to the text suffixes sorted in lexicographical (alphabetical) order. Each suffix is a string starting at a certain poinsition in the text and ending at the end of the text. Searching a text can be performed by binary search using the suffix array.

Let's get started with the suffix array construction. Suppose that we have the sample text ``abracadabra'' and wish to construct the suffix array for the sample text.

First, we should assign index points to the sample text. Index points specify positions where search can be performed. In our example, index points are assigned character by character. Thus, we can search the sample text with the suffix array at any positions later.

Second, we should sort the index points according to thier corresponding suffixes. The correspondance between the index points and the suffixes looks like

After sorting:

Brief Introduction to Suffix Array_第1张图片

Finally, The resulting index points become the suffix array for the sample text.

Search of the sample text can be performed by binary search using the created suffix array. The following figure shows the process of searching the sample text for `ra'. Numbered arrows shows the order of processing.

Brief Introduction to Suffix Array_第2张图片

References

  • Bentley: Programming Pearls 2nd edition
    Chapter 15 is a good reference for learning suffix array.
  • Baeza-Yates et al.: Modern Information Retrieval
    Chapter 8 explains suffix array as well as other data structures.
  • Manber et al.: Suffix Arrays - A New Method for On-Line String Searches
    The original paper.
  • Yamashita: Terminology "Suffix Array" (in Japanese), Journal of Japanese Society for Artificial Intelligence Vol. 15 No. 6 p. 1142

你可能感兴趣的:(Brief Introduction to Suffix Array)