The R*-tree: An Efficient Robust Access Method for Points and Rectangles

 

4/8/10
This paper elaborates further on Guttman's R-tree and discusses several of its drawbacks. It then proposes a new variant, the R*-tree, which substantially improves the performance of spatial access.
Summary of the paper:
  1. Summary of the main properties of the R-tree
    Its spatial access method (SAM) is based on overlapping regions described by minimum bounding rectangles (directory rectangles).
    The R-tree relies on heuristic optimization; the criterion it pursues is to minimize the area of each enclosing rectangle in the inner nodes.
  2. Critique of the old method
    The paper points out that the old method considers only one optimization criterion, minimizing area. It suggests instead combining several optimization criteria and adopting an engineering approach that works out the best combination through experiments.
  3. A standard performance test is introduced.
  4. Re-evaluate the R-tree
    Many factors contribute to good retrieval performance, but they affect each other in complex ways.
    It lists four optimization principles, which may work against each other:
    • Minimize area
    • Minimize overlapping
    • Minimize margin
    • Storage utilization should be optimized
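To make the four criteria concrete, here is a minimal Python sketch. It assumes two-dimensional, axis-aligned rectangles stored as (xmin, ymin, xmax, ymax) tuples; that layout and the helper names are my own choices for illustration, not the paper's.

```python
from typing import Tuple

# Assumed layout: (xmin, ymin, xmax, ymax); not taken from the paper.
Rect = Tuple[float, float, float, float]

def area(r: Rect) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r: Rect) -> float:
    # Sum of the edge lengths, i.e. the perimeter in two dimensions.
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap(a: Rect, b: Rect) -> float:
    # Area of the intersection, or 0 if the rectangles do not intersect.
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def utilization(entries: int, capacity: int) -> float:
    # Fraction of a node's slots that are actually used.
    return entries / capacity

square, strip = (0, 0, 4, 4), (0, 0, 16, 1)
print(area(square), area(strip))      # both rectangles have the same area
print(margin(square), margin(strip))  # the long strip has the larger margin
```

For a fixed area, a square has the smallest margin, so minimizing margin pushes rectangles toward square shapes even when their area stays the same.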
  5. The core of optimization: Insertion
    The paper examines several insertion variants of the R-tree, explains the idea behind each, and states some of their problems, such as small seeds, bias toward one group, and ignorance of geometric properties.
    It introduces Greene's split, which in my view sets another standard for evaluating the distance between two rectangles. Of course, it is still not perfect.
  6. The R* tree's solution
    • Choose Subtree
      Instead of focusing on only one aspect, it takes three aspects into consideration, and it works out an optimized combination through a great number of experiments. I think this is an instructive way of doing research: consider several factors and apply them differently depending on the case, here leaf versus non-leaf nodes.
      Since the quadratic cost of choosing a subtree may be undesirable in practice, an approximation algorithm is given.
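As I read the paper, the choice works roughly like this: when the children are leaves, pick the entry needing the least overlap enlargement; otherwise pick the one needing the least area enlargement, with ties broken by smaller area. The sketch below is my own reconstruction under that reading. The tuple layout (xmin, ymin, xmax, ymax), the function names, and the optional candidate limit `p` (standing in for the approximation) are assumptions.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def union(a, b):
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def enlargement(r, new):
    # Extra area needed for r to cover the new rectangle.
    return area(union(r, new)) - area(r)

def overlap_enlargement(i, rects, new):
    # Growth in total overlap with the sibling entries if entry i absorbs `new`.
    grown = union(rects[i], new)
    others = [r for j, r in enumerate(rects) if j != i]
    return sum(overlap(grown, r) - overlap(rects[i], r) for r in others)

def choose_entry(rects, new, children_are_leaves, p=None):
    indices = list(range(len(rects)))
    area_key = lambda i: (enlargement(rects[i], new), area(rects[i]))
    if not children_are_leaves:
        return min(indices, key=area_key)
    if p is not None:
        # Approximation: only examine the p best candidates by area enlargement.
        indices = sorted(indices, key=area_key)[:p]
    return min(indices,
               key=lambda i: (overlap_enlargement(i, rects, new),) + area_key(i))

rects = [(0, 0, 4, 4), (5, 2, 9, 6)]
new = (4.5, 0, 5.5, 1)
print(choose_entry(rects, new, children_are_leaves=False))  # least area enlargement
print(choose_entry(rects, new, children_are_leaves=True))   # least overlap enlargement
```

In this example the two criteria disagree: the first entry needs less extra area, but growing it would make it overlap its sibling, so the leaf-level rule prefers the second entry.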
    • Split of R* tree
      Again, the splitting process takes the area-value, margin-value and overlap-value into account.
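A minimal sketch of how the three values could fit together, following my understanding of the paper: the split axis is the one with the smallest sum of margin-values over all candidate distributions, and along that axis the distribution with the smallest overlap-value wins, with ties broken by area-value. The tuple layout (xmin, ymin, xmax, ymax), the function names, and the minimum fill parameter `m` as used here are assumptions for illustration.

```python
def area(r):
    return (r[2] - r[0]) * (r[3] - r[1])

def margin(r):
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))

def overlap(a, b):
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return w * h if w > 0 and h > 0 else 0.0

def bbox(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def distributions(rects, axis, m):
    # Sort by the lower bound, then by the upper bound, of the given axis and
    # cut each ordering so that both groups keep at least m entries.
    for key in (axis, axis + 2):
        ordered = sorted(rects, key=lambda r: r[key])
        for k in range(m, len(ordered) - m + 1):
            yield ordered[:k], ordered[k:]

def rstar_split(rects, m):
    def margin_sum(axis):
        return sum(margin(bbox(g1)) + margin(bbox(g2))
                   for g1, g2 in distributions(rects, axis, m))

    def cost(groups):
        b1, b2 = bbox(groups[0]), bbox(groups[1])
        return (overlap(b1, b2), area(b1) + area(b2))

    axis = min((0, 1), key=margin_sum)                      # margin-value picks the axis
    return min(distributions(rects, axis, m), key=cost)     # overlap, then area

rects = [(0, 0, 1, 1), (0, 2, 1, 3), (10, 0, 11, 1), (10, 2, 11, 3), (0.5, 1, 1.5, 2)]
left, right = rstar_split(rects, m=2)
print(left)
print(right)
```

Here the entries form a left and a right cluster, so the split runs along the x axis and the two resulting bounding rectangles do not overlap.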
    • Forced Reinsert
      This optimization is somewhat different from the earlier approaches. Because the R-tree is comparatively static once entries have been inserted, it cannot guarantee good overall performance after thousands of insertions. Instead, some reorganization of the tree is necessary.
      Forced reinsertion improves the overall structure and requires fewer splits.
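The selection step of forced reinsertion can be sketched as follows. This is my reading of the idea, not the paper's exact routine: on overflow, the entries whose centers lie farthest from the center of the node's bounding rectangle are taken out and inserted again from the top. The tuple layout (xmin, ymin, xmax, ymax), the function names, and passing the number of removed entries `p` explicitly are assumptions.

```python
def bbox(rects):
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))

def center(r):
    return ((r[0] + r[2]) / 2, (r[1] + r[3]) / 2)

def select_for_reinsert(rects, p):
    # Split an overflowing node's entries into (kept, removed).
    cx, cy = center(bbox(rects))

    def dist2(r):
        x, y = center(r)
        return (x - cx) ** 2 + (y - cy) ** 2

    ordered = sorted(rects, key=dist2, reverse=True)  # farthest entries first
    removed, kept = ordered[:p], ordered[p:]
    removed.reverse()  # reinsert the nearest of the removed entries first
    return kept, removed

node = [(0, 0, 2, 2), (2, 0, 4, 2), (0, 2, 2, 4), (2, 2, 4, 4), (7, 1, 8, 2)]
kept, removed = select_for_reinsert(node, p=1)
print(removed)               # the outlying entry
print(bbox(node), bbox(kept))  # the node's rectangle shrinks once it is removed
```

Removing the outlier shrinks the node's bounding rectangle, and the removed entry gets a chance to land in a better-fitting node when it is inserted again.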
  7. Performance of the R* tree
    This part compares the R-tree variants on different test cases. Since it is mainly about testing, I only skimmed it.
  8. My Questions
    • What does "quadratic" mean for query rectangles? The paper says that minimizing the margin makes the directory rectangles more quadratic, which I can't understand.
    • Why can the R*-tree's implementation of the ChooseSubtree routine reduce disk accesses? The number of disk accesses equals log(N)-1, and I don't think the R*-tree implementation can reduce the height of the tree.
    • How is a point inserted into an R-tree? A point has no size and will never cause any enlargement of area, so the algorithm does not seem to work...
      So I can hardly understand the point access method (PAM) part of the paper.
