Creating the Node and NodeList Classes

A skip list, like a binary tree, is made up of a collection of elements. Each element in a skip list has some data associated with it, a height, and a collection of element references. For example, in Figure 12 the Bob element has the data Bob, a height of 2, and two element references, one to Dave and one to Cal. Before creating a skip list class, we first need to create a class that represents an element in the skip list. I named this class Node, and its germane code is shown below. (The complete skip list code is available as a code download accompanying this article.)

using System;

public class Node
{
   #region Private Member Variables
   private NodeList nodes;        // one reference per level; index 0 is the bottom level
   private IComparable myValue;   // the element's data
   #endregion

   #region Constructors
   public Node(IComparable value, int height)
   {
      this.myValue = value;
      this.nodes = new NodeList(height);
   }
   #endregion

   #region Public Properties
   public int Height
   {
      get { return nodes.Capacity; }
   }

   public IComparable Value
   {
      get { return myValue; }
   }

   // Gets or sets this node's reference at the given level
   public Node this[int index]
   {
      get { return nodes[index]; }
      set { nodes[index] = value; }
   }
   #endregion
}

Notice that the Node class only accepts objects that implement the IComparable interface as data. This is because a skip list is maintained as a sorted list, meaning that its elements are ordered by their data. To order the elements, their data must be comparable. (Recall from Part 3 that our binary search tree Node class also required its data to implement IComparable, for the same reason.)
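To make that requirement concrete, the following C# sketch shows the contract the skip list relies on. CompareDemo and Order are hypothetical names used only for illustration; CompareTo promises only the sign of its result, not its magnitude, so the sketch normalizes it with Math.Sign.

```csharp
using System;

// Hypothetical helper illustrating the IComparable contract the skip list depends on.
public static class CompareDemo
{
    // Returns -1 if a sorts before b, 0 if they are equal, and 1 if a sorts after b.
    // CompareTo only promises the sign of its result, so normalize it.
    public static int Order(IComparable a, IComparable b)
    {
        return Math.Sign(a.CompareTo(b));
    }
}
```

For instance, CompareDemo.Order(3, 7) yields -1. Be aware that Int32.CompareTo(object) throws an ArgumentException when handed a non-integer, so mixing value types in a single skip list would fail at the first cross-type comparison.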

The Node class uses a NodeList class to store its collection of Node references. The NodeList class, shown below, is a strongly-typed collection of Nodes and is derived from System.Collections.CollectionBase.

using System.Collections;

public class NodeList : CollectionBase
{
   public NodeList(int height)
   {
      // set the capacity based on the height
      base.InnerList.Capacity = height;

      // create dummy values up to the Capacity
      for (int i = 0; i < height; i++)
         base.InnerList.Add(null);
   }

   // Adds a new Node to the end of the node list
   public void Add(Node n)
   {
      base.InnerList.Add(n);
   }

   // Accesses a particular Node reference in the list
   public Node this[int index]
   {
      get { return (Node) base.InnerList[index]; }
      set { base.InnerList[index] = value; }
   }

   // Returns the capacity of the list ("new" because CollectionBase itself
   // exposes a Capacity property starting with .NET Framework 2.0)
   public new int Capacity
   {
      get { return base.InnerList.Capacity; }
   }
}

The NodeList constructor accepts a height input parameter that indicates the number of references that the node needs. It sets the Capacity of the InnerList to this height and adds a null reference for each of the height levels.
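As an aside, on .NET 2.0 and later a plain array can stand in for NodeList, since a newly allocated array of a reference type already holds null in every slot. The sketch below is an alternative design, not the article's code: GenericNode<T> is a hypothetical name, and it swaps the non-generic IComparable for the generic IComparable<T> constraint.

```csharp
using System;

// Hypothetical generic alternative to the Node + NodeList pair.
public sealed class GenericNode<T> where T : IComparable<T>
{
    private readonly GenericNode<T>[] next;   // next[i] = successor at level i (0 = bottom)

    public GenericNode(T value, int height)
    {
        if (height < 1)
            throw new ArgumentOutOfRangeException(nameof(height), "height must be at least 1");
        Value = value;
        next = new GenericNode<T>[height];    // every slot starts out null, so no dummy loop
    }

    public T Value { get; }

    // The height is simply the number of reference slots.
    public int Height => next.Length;

    public GenericNode<T> this[int index]
    {
        get => next[index];
        set => next[index] = value;
    }
}
```

The trade-off is that an array has a fixed length, so growing a node (as the head must do) would need something like Array.Resize, whereas NodeList hides that detail inside its ArrayList.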

With the Node and NodeList classes created, we're ready to move on to creating the SkipList class. The SkipList class, as we'll see, contains a single reference to the head element. It also provides methods for searching the list, enumerating through the list's elements, adding elements to the list, and removing elements from the list.

    Note   For a graphical view of skip lists in action, be sure to check out the skip list applet at http://iamwww.unibe.ch/~wenger/DA/SkipList/. You can add and remove items from a skip list and visually see how the structure and height of the skip list are altered with each operation.

Creating the SkipList Class

The SkipList class provides an abstraction of a skip list. It contains public methods like:

    * Add(IComparable): adds a new item to the skip list.
    * Remove(IComparable): removes an existing item from the skip list.
    * Contains(IComparable): returns true if the item exists in the skip list, false otherwise.

And public properties, such as:

    * Height: the height of the tallest element in the skip list.
    * Count: the total number of elements in the skip list.

The skeletal structure of the class is shown below. Over the next several sections, we'll examine the skip list's operations and fill in the code for its methods.

public class SkipList
{
   #region Private Member Variables
   Node head;
   int count;
   Random rndNum;

   protected const double PROB = 0.5;
   #endregion

   #region Public Properties
   public virtual int Height
   {
      get { return head.Height; }
   }

   public virtual int Count
   {
      get { return count; }
   }
   #endregion

   #region Constructors
   public SkipList() : this(-1) {}
   public SkipList(int randomSeed)
   {
      head = new Node(null, 1);   // the head stores no value, only references
      count = 0;
      if (randomSeed < 0)
         rndNum = new Random();
      else
         rndNum = new Random(randomSeed);
   }
   #endregion

   protected virtual int chooseRandomHeight(int maxLevel)
   {
      ...
   }

   public virtual bool Contains(IComparable value)
   {
      ...
   }

   public virtual void Add(IComparable value)
   {
      ...
   }

   public virtual void Remove(IComparable value)
   {
      ...
   }
}

We'll fill in the code for the methods in a bit, but for now pay close attention to the class's private member variables, public properties, and constructors. There are three relevant private member variables:

    * head, which is the list's head element. Remember that a skip list has a dummy head element (refer back to Figures 11 and 12 for a graphical depiction of the head element).
    * count, an integer value keeping track of how many elements are in the skip list.
    * rndNum, an instance of the Random class. Since we need to randomly determine the height when adding a new element to the list, we'll use this Random instance to generate the random numbers.

The SkipList class has two read-only public properties, Height and Count. Height returns the height of the tallest skip list element. Since the head is always as tall as the tallest skip list element, we can simply return the head element's Height property. The Count property returns the current value of the private member variable count. (count, as we'll see, is incremented in the Add() method and decremented in the Remove() method.)

Notice there are two forms of the SkipList constructor. The default constructor merely calls the second, passing in a value of -1. The second form assigns to head a new Node instance with height 1, and sets count equal to 0. It then checks whether the passed-in randomSeed value is negative. If it is, it creates an instance of the Random class using the default constructor, which chooses a seed automatically. Otherwise, it uses the random seed value passed into the constructor.

    Note   Computer random-number generators, such as the Random class in the .NET Framework, are referred to as pseudo-random number generators because they don't pick truly random numbers; instead, they use a deterministic function to generate them. The function starts from some value, called the seed, and computes a sequence of numbers from it. Slight changes in the seed value lead to seemingly random changes in the sequence of numbers returned.

If you use the Random class's default constructor, a seed is chosen for you (in the .NET Framework, it is derived from the system clock). You can optionally specify a seed yourself. The benefit of specifying a seed is that the same seed value always yields the same sequence of random numbers. Being able to reproduce results is beneficial when testing the correctness and efficiency of a randomized algorithm like the skip list.
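A quick way to see the reproducibility benefit is to draw the same number of values twice from Random instances created with the same seed. SeedDemo and Draw are hypothetical names for this sketch.

```csharp
using System;

// Hypothetical helper showing that a fixed seed reproduces the same sequence.
public static class SeedDemo
{
    // Returns the first `count` values of Random.Next(maxValue) for the given seed.
    public static int[] Draw(int seed, int count, int maxValue)
    {
        var rnd = new Random(seed);
        var values = new int[count];
        for (int i = 0; i < count; i++)
            values[i] = rnd.Next(maxValue);   // each value is in [0, maxValue)
        return values;
    }
}
```

Because two runs with the same seed produce identical arrays, a test can compare runs against each other rather than hard-coding particular numbers, which may differ between .NET implementations.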
Searching a skip list

The algorithm for searching a skip list for a particular value is straightforward. Informally, the search process can be described as follows. We start with the head element's top-most reference. Let e be the element referenced by the head's top-most reference. We check whether e's value is less than, greater than, or equal to the value for which we are searching. If it equals the value, then we have found the item for which we're looking. If it's greater than the value we're looking for, then the value, if it exists in the list, must lie to the left of e, between the head and e. Every element in that stretch is shorter than the current level, because otherwise the head's reference at this level would have pointed to it. Therefore, we move down to the head's next-lower reference and repeat this process.

If, on the other hand, the value of e is less than the value we're looking for, then the value, if it exists in the list, must be to the right of e. Therefore, we move to e and repeat these steps using e's reference at the current level. This process continues until we find the value we're searching for, or exhaust all the "levels" without finding it.

More formally, the algorithm can be spelled out with the following pseudo-code:

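Node current = head
for i = skipList.Height downto 1
    while current[i] != null and current[i].Value < valueSearchingFor
        current = current[i]   // move to the next node at level i

// current is now the last node whose value is less than valueSearchingFor
if current[1] != null and current[1].Value == valueSearchingFor then
    return true
else
    return false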


Take a moment to trace the algorithm over the skip list shown in Figure 13. The red arrows show the path of checks made when searching the skip list. Skip list (a) shows the path when searching for Ed, skip list (b) when searching for Cal, and skip list (c) when searching for Gus, which does not exist in the skip list. Notice that throughout the algorithm we only ever move right or down: the algorithm never moves to a node to the left of the current node, and never moves up to a higher reference level.

Figure 13. Searching over a skip list.

The code for the Contains(IComparable) method is quite simple, involving a for loop with a nested while loop. The for loop iterates down through the reference levels, while the while loop iterates across the skip list's elements at the current level.

public virtual bool Contains(IComparable value)
{
   Node current = head;
   int i = 0;

   for (i = head.Height - 1; i >= 0; i--)
   {
      while (current[i] != null)
      {
         int results = current[i].Value.CompareTo(value);
         if (results == 0)
            return true;
         else if (results < 0)
            current = current[i];
         else // results > 0
            break; // exit while loop
      }
   }

   // if we reach here, we searched to the end of the list without finding the element
   return false;
}
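To experiment with the search outside the full SkipList class, the following self-contained sketch links a handful of names into a skip list with hand-picked heights and runs the same top-down search, counting comparisons along the way. SNode, SearchDemo, the names, and the heights are all hypothetical (they are not the lists in Figure 13), and strings are compared with string.CompareOrdinal so the order does not depend on the current culture.

```csharp
using System;

// Hypothetical minimal node: a value plus one forward reference per level (index 0 = bottom).
public sealed class SNode
{
    public readonly string Value;   // null for the head
    public readonly SNode[] Next;

    public SNode(string value, int height)
    {
        Value = value;
        Next = new SNode[height];
    }
}

public static class SearchDemo
{
    // Links already-sorted values into a skip list using the given (hand-picked) heights.
    public static SNode Build(string[] sortedValues, int[] heights)
    {
        int maxHeight = 1;
        foreach (int h in heights)
            maxHeight = Math.Max(maxHeight, h);

        var head = new SNode(null, maxHeight);
        var last = new SNode[maxHeight];          // last node linked at each level so far
        for (int lvl = 0; lvl < maxHeight; lvl++)
            last[lvl] = head;

        for (int k = 0; k < sortedValues.Length; k++)
        {
            var node = new SNode(sortedValues[k], heights[k]);
            for (int lvl = 0; lvl < heights[k]; lvl++)
            {
                last[lvl].Next[lvl] = node;       // append node at this level
                last[lvl] = node;
            }
        }
        return head;
    }

    // Same top-down search as Contains(), but also counts the comparisons it makes.
    public static bool Contains(SNode head, string value, out int comparisons)
    {
        comparisons = 0;
        SNode current = head;
        for (int i = head.Next.Length - 1; i >= 0; i--)
        {
            while (current.Next[i] != null)
            {
                comparisons++;
                int result = string.CompareOrdinal(current.Next[i].Value, value);
                if (result == 0)
                    return true;
                if (result > 0)
                    break;                        // overshot: drop down a level
                current = current.Next[i];        // still too small: move right
            }
        }
        return false;
    }
}
```

With heights Adam 1, Bob 2, Cal 1, Dave 3, Ed 1, and Frank 2, finding Ed takes three comparisons: Dave at the top level (too small, so move right), Frank at level 2 (too large, so drop down), then Ed at level 1.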

Inserting into a skip list

Inserting a new element into a skip list is akin to adding a new element to a sorted linked list, and involves two steps:

   1. Locate where in the skip list the new element belongs. This is done with the search algorithm, which finds the element that comes immediately before the spot where the new element will be added.
   2. Thread the new element into the list by updating the necessary references.

Since skip list elements can have many levels and, therefore, many references, threading a new element into a skip list is not as simple as threading a new element into a simple linked list. Figure 14 shows a diagram of a skip list and the threading process that needs to be done to add the element Gus. For this example, imagine that the randomly determined height for the Gus element was 3. To successfully thread in the Gus element, we'd need to update Frank's level 3 and 2 references, as well as Gil's level 1 reference. Gus's level 1 reference would point to Hank. If there were additional nodes to the right of Hank, Gus's level 2 reference would point to the first element to the right of Hank with height 2 or greater, while Gus's level 3 reference would point to the first element right of Hank with height 3 or greater.

Figure 14. Inserting elements into a skip list

In order to properly rethread the skip list after inserting the new element, we need to keep track of the last element encountered for each height. In Figure 14, Frank was the last element encountered for references at levels 4, 3, and 2, while Gil was the last element encountered for reference level 1. In the insert algorithm below, this record of last elements for each level is maintained by the updates array, which is populated as the search for the location for the new element is performed.
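The bookkeeping behind the updates array can be isolated in a short sketch. It builds a skip list with hypothetical heights chosen so that, when searching for Gus, Frank is the last element before Gus at levels 4, 3, and 2 and Gil is the last at level 1, mirroring the description of Figure 14. SNode, UpdatesDemo, and FindUpdates are illustrative names, not part of the article's download; index 0 of the returned array corresponds to level 1.

```csharp
using System;

// Hypothetical minimal node: a value plus one forward reference per level (index 0 = bottom).
public sealed class SNode
{
    public readonly string Value;   // null for the head
    public readonly SNode[] Next;

    public SNode(string value, int height)
    {
        Value = value;
        Next = new SNode[height];
    }
}

public static class UpdatesDemo
{
    // Links already-sorted values into a skip list using hand-picked heights.
    public static SNode Build(string[] sortedValues, int[] heights)
    {
        int maxHeight = 1;
        foreach (int h in heights)
            maxHeight = Math.Max(maxHeight, h);

        var head = new SNode(null, maxHeight);
        var last = new SNode[maxHeight];
        for (int lvl = 0; lvl < maxHeight; lvl++)
            last[lvl] = head;

        for (int k = 0; k < sortedValues.Length; k++)
        {
            var node = new SNode(sortedValues[k], heights[k]);
            for (int lvl = 0; lvl < heights[k]; lvl++)
            {
                last[lvl].Next[lvl] = node;
                last[lvl] = node;
            }
        }
        return head;
    }

    // For each level, returns the last node whose value is less than `value`:
    // the same array that Add() and Remove() call updates.
    public static SNode[] FindUpdates(SNode head, string value)
    {
        var updates = new SNode[head.Next.Length];
        SNode current = head;
        for (int i = head.Next.Length - 1; i >= 0; i--)
        {
            while (current.Next[i] != null && string.CompareOrdinal(current.Next[i].Value, value) < 0)
                current = current.Next[i];   // move right while still before `value`
            updates[i] = current;            // last node before `value` at level i
        }
        return updates;
    }
}
```

Each entry is a node whose reference at that level would have to change if Gus were spliced in at that level.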

public virtual void Add(IComparable value)
{
   Node [] updates = new Node[head.Height];
   Node current = head;
   int i = 0;

   // first, determine the nodes that need to be updated at each level
   for (i = head.Height - 1; i >= 0; i--)
   {
      while (current[i] != null && current[i].Value.CompareTo(value) < 0)
         current = current[i];

      updates[i] = current;
   }

   // see if a duplicate is being inserted
   if (current[0] != null && current[0].Value.CompareTo(value) == 0)
      // cannot enter a duplicate, handle this case by either just returning or by throwing an exception
      return;

   // create a new node
   Node n = new Node(value, chooseRandomHeight(head.Height + 1));
   count++;   // increment the count of elements in the skip list

   // if the node's level is greater than the head's level, increase the head's level
   if (n.Height > head.Height)
   {
      head.IncrementHeight();
      head[head.Height - 1] = n;
   }

   // splice the new node into the list
   for (i = 0; i < n.Height; i++)
   {
      if (i < updates.Length)
      {
         n[i] = updates[i][i];
         updates[i][i] = n;
      }
   }
}

Several portions of the Add(IComparable) method deserve a closer look. First, examine the first for loop carefully. In this loop, not only is the correct location for the new element found, but the updates array is also fully populated. After this loop, a check is done to make sure that the data being entered is not a duplicate. I chose to implement my skip list such that duplicates are not allowed. However, skip lists can handle duplicate values just fine; if you want to allow duplicates, simply remove this check.

Next, a new Node instance, n, is created. This represents the element to be added to the skip list. Note that the height of the newly created Node is determined by a call to the chooseRandomHeight() method, passing in the current skip list height plus one. We'll examine this method shortly. Also note that after creating the Node, a check is made to see whether the new Node's height is greater than the head element's height. If it is, then the head element's height needs to be incremented, because the head element should be as tall as the tallest element in the skip list.

The final for loop rethreads the references. For each level i that both the new Node and the updates array have, it points the new Node's level-i reference at the Node that updates[i] previously referenced at that level, and then points updates[i]'s level-i reference at the new Node. To help clarify things, try tracing the Add(IComparable) method using the skip list in Figure 14, where the added Node's height is 3.
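Putting the pieces together, here is a compact generic sketch that follows the same Add() steps: fill updates, reject duplicates, pick a height with fixed dice, grow the head by at most one level, and splice. It is an illustration under stated assumptions rather than the article's download: MiniSkipList<T> is a hypothetical name, node references live in a plain array, and the head is grown with Array.Resize in place of IncrementHeight().

```csharp
using System;

// Hypothetical compact skip list that follows the same steps as Add(IComparable).
public sealed class MiniSkipList<T> where T : IComparable<T>
{
    private sealed class Node
    {
        public readonly T Value;
        public Node[] Next;                  // Next[i] = successor at level i (0 = bottom)

        public Node(T value, int height)
        {
            Value = value;
            Next = new Node[height];
        }
    }

    private const double Prob = 0.5;
    private readonly Random rnd;
    private readonly Node head = new Node(default(T), 1);   // the head's value is never compared

    public MiniSkipList(int seed)
    {
        rnd = new Random(seed);
    }

    public int Count { get; private set; }

    public int Height => head.Next.Length;

    // "Fixed dice": flip coins, but never return more than maxLevel.
    private int ChooseRandomHeight(int maxLevel)
    {
        int level = 1;
        while (rnd.NextDouble() < Prob && level < maxLevel)
            level++;
        return level;
    }

    // Walks down from the top level, then checks the bottom-level successor.
    public bool Contains(T value)
    {
        Node current = head;
        for (int i = Height - 1; i >= 0; i--)
            while (current.Next[i] != null && current.Next[i].Value.CompareTo(value) < 0)
                current = current.Next[i];

        Node candidate = current.Next[0];
        return candidate != null && candidate.Value.CompareTo(value) == 0;
    }

    public void Add(T value)
    {
        // Step 1: record the last node before `value` at every level.
        var updates = new Node[Height];
        Node current = head;
        for (int i = Height - 1; i >= 0; i--)
        {
            while (current.Next[i] != null && current.Next[i].Value.CompareTo(value) < 0)
                current = current.Next[i];
            updates[i] = current;
        }

        // Duplicates are ignored, as in the article's Add().
        if (current.Next[0] != null && current.Next[0].Value.CompareTo(value) == 0)
            return;

        // Step 2: create the node; its height is at most one more than the head's.
        var n = new Node(value, ChooseRandomHeight(Height + 1));
        Count++;

        if (n.Next.Length > Height)
        {
            Array.Resize(ref head.Next, head.Next.Length + 1);   // grow the head by one level
            head.Next[head.Next.Length - 1] = n;
        }

        // Step 3: splice n in at every level that existed before this insert.
        for (int i = 0; i < n.Next.Length && i < updates.Length; i++)
        {
            n.Next[i] = updates[i].Next[i];
            updates[i].Next[i] = n;
        }
    }
}
```

For example, after var list = new MiniSkipList<int>(42) and a series of list.Add(...) calls, list.Contains(x) reports membership, and list.Height never exceeds list.Count + 1, since each insertion grows the head by at most one level.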

Randomly Determining the Newly Inserted Node's Height

When inserting a new element into the skip list, we need to randomly select a height for the newly added Node. Recall from our earlier discussion of skip lists that when Pugh first envisioned multi-level, linked-list elements, he imagined a linked list where every $2^i$-th element had a reference to the element $2^i$ elements away. In such a list, precisely 50 percent of the nodes would have height 1, 25 percent height 2, and so on.

The chooseRandomHeight() method uses a simple technique to compute heights so that the distribution of values matches Pugh's initial vision. This distribution can be achieved by flipping a coin and setting the height to one greater than the number of heads in a row that were flipped. That is, if the first flip comes up tails, then the height of the new element will be 1. If you get one heads and then a tails, the height will be 2. Two heads followed by a tails indicates a height of 3, and so on. There is a 50 percent probability that you will get a tails first, a 25 percent probability that you will get a heads and then a tails, a 12.5 percent probability that you will get two heads and then a tails, and so on, so the resulting distribution matches the desired one.

The code to compute the random height is given by the following simple code snippet:

const double PROB = 0.5;
protected virtual int chooseRandomHeight()
{
   int level = 1;
   while (rndNum.NextDouble() < PROB)
      level++;

   return level;
}
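Running that rule many times shows the geometric distribution emerging. The following sketch (the HeightHistogram class and its methods are hypothetical) reuses the same unbounded coin-flip loop and tallies the heights it produces.

```csharp
using System;

// Hypothetical experiment: tally the heights produced by the unbounded coin-flip rule.
public static class HeightHistogram
{
    // Same loop as chooseRandomHeight(): start at 1 and add a level for each "heads".
    public static int ChooseRandomHeight(Random rnd, double prob)
    {
        int level = 1;
        while (rnd.NextDouble() < prob)
            level++;
        return level;
    }

    // Returns counts, where counts[h] is how many of the draws had height h.
    // Index 0 is unused; the last bucket also absorbs any (astronomically rare) taller heights.
    public static int[] Tally(int samples, int seed, double prob = 0.5)
    {
        var rnd = new Random(seed);
        var counts = new int[64];
        for (int s = 0; s < samples; s++)
        {
            int h = ChooseRandomHeight(rnd, prob);
            counts[Math.Min(h, counts.Length - 1)]++;
        }
        return counts;
    }
}
```

With 100,000 draws, roughly half of the heights come out as 1, a quarter as 2, and an eighth as 3; the exact counts vary with the seed.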


One concern with the chooseRandomHeight() method is that the value returned might be extraordinarily large. That is, imagine that we have a skip list with, say, two elements, both with height 1. When adding our third element, we randomly choose the height to be 10. This is an unlikely event, since there is only roughly a 0.1 percent chance of selecting such a height, but it could conceivably happen. The downside is that our skip list now has an element with height 10, meaning there are a number of superfluous levels in our skip list. To put it more bluntly, the references at levels 2 up to 10 would not be utilized. Even as additional elements were added to the list, there's still only about a 3 percent chance of getting a node taller than height 5, so we'd likely have many wasted levels.
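The two percentages follow directly from the coin-flip rule with PROB = 0.5: a height of exactly k requires k - 1 heads followed by a tails, and a height of at least k requires k - 1 heads in a row.

```latex
\begin{align*}
P(h = k) &= \left(\tfrac{1}{2}\right)^{k-1}\cdot\tfrac{1}{2} = 2^{-k}, &
P(h \ge k) &= \left(\tfrac{1}{2}\right)^{k-1} = 2^{-(k-1)} \\
P(h = 10) &= 2^{-10} = \tfrac{1}{1024} \approx 0.098\%, &
P(h > 5) &= P(h \ge 6) = 2^{-5} = \tfrac{1}{32} \approx 3.1\%
\end{align*}
```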

Pugh suggests a couple of solutions to this problem. One is to simply ignore it. Having superfluous levels doesn't require any change in the code of the data structure, nor does it affect the asymptotic running time. The approach I chose is to use "fixed dice" when choosing the random level. That is, you restrict the height of the new element to at most one greater than the height of the tallest element currently in the skip list. The actual implementation of the chooseRandomHeight() method, shown below, implements this "fixed dice" approach. Notice that a maxLevel input parameter is passed in, and the while loop exits prematurely if level reaches this maximum. In the Add(IComparable) method, note that the maxLevel value passed in is the height of the head element plus one. (Recall that the head element's height is the same as the height of the tallest element in the skip list.)

protected virtual int chooseRandomHeight(int maxLevel)
{
   int level = 1;

   while (rndNum.NextDouble() < PROB && level < maxLevel)
      level++;

   return level;
}
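The effect of the cap can be checked with a small simulation that tracks only heights. It passes the current head height plus one as maxLevel, just as Add(IComparable) does; FixedDiceDemo and its methods are hypothetical names, and no actual list nodes are built.

```csharp
using System;

// Hypothetical experiment: apply "fixed dice" the way Add() does, passing head height + 1.
public static class FixedDiceDemo
{
    // Same rule as chooseRandomHeight(int maxLevel).
    public static int ChooseRandomHeight(Random rnd, int maxLevel, double prob = 0.5)
    {
        int level = 1;
        while (rnd.NextDouble() < prob && level < maxLevel)
            level++;
        return level;
    }

    // Simulates `inserts` additions and returns the final head height.
    // Only heights are tracked; no list nodes are built.
    public static int SimulateHeadHeight(int inserts, int seed)
    {
        var rnd = new Random(seed);
        int headHeight = 1;
        for (int k = 0; k < inserts; k++)
        {
            int h = ChooseRandomHeight(rnd, headHeight + 1);
            if (h > headHeight)
                headHeight = h;   // can only ever grow by one level per insert
        }
        return headHeight;
    }
}
```

Because each insertion can raise the head by at most one level, the scenario described earlier (two height-1 elements followed by a height-10 element) cannot occur: the third element is capped at height 2.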

The head element should be the same height as the tallest element in the skip list. So, in the Add(IComparable) method, if the newly added Node's height is greater than the head element's height, I call the IncrementHeight() method:

/* - snippet from the Add() method… */
if (n.Height > head.Height)
{
   head.IncrementHeight();
   head[head.Height - 1] = n;
}
/************************************/

IncrementHeight() is a method of the Node class that I left out for brevity. It simply increases the Capacity of the Node's NodeList and adds a null reference for the newly added level. For the method's source code, refer to the article's code sample.
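Since the download's version is not reproduced here, the following is only a guess at what IncrementHeight() and a matching DecrementHeight() might look like, following the description above. LevelRefs is a hypothetical stand-in for NodeList that, like the Node class, reads the height from the underlying ArrayList's Capacity.

```csharp
using System;
using System.Collections;

// Hypothetical stand-in for a Node's NodeList, sketching IncrementHeight()
// and DecrementHeight() when the height is read from ArrayList.Capacity.
public class LevelRefs
{
    private readonly ArrayList refs;

    public LevelRefs(int height)
    {
        refs = new ArrayList(height);       // capacity is exactly `height`
        for (int i = 0; i < height; i++)
            refs.Add(null);                 // one null reference per level
    }

    public int Height { get { return refs.Capacity; } }

    public object this[int index]
    {
        get { return refs[index]; }
        set { refs[index] = value; }
    }

    public void IncrementHeight()
    {
        // Raise Capacity first: adding to a full ArrayList would instead grow the
        // capacity automatically (typically by doubling), and Height would be wrong.
        refs.Capacity = refs.Capacity + 1;
        refs.Add(null);
    }

    public void DecrementHeight()
    {
        if (refs.Count <= 1)
            throw new InvalidOperationException("a node must keep at least one level");
        refs.RemoveAt(refs.Count - 1);
        refs.Capacity = refs.Count;         // shrink so Height matches the number of levels
    }
}
```

The order of operations in IncrementHeight() matters only because the height is read from Capacity; reading it from Count instead would sidestep the issue.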

    Note   In his paper, "Skip Lists: A Probabilistic Alternative to Balanced Trees," Pugh examines the effects of changing the value of PROB from 0.5 to other values, such as 0.25, 0.125, and others. Lower values of PROB decrease the average number of references per element, but increase the likelihood of the search taking substantially longer than expected. For more details, be sure to read Pugh's paper, which is mentioned in the References section at the end of this article.

Deleting an element from a skip list

Like adding an element to a skip list, removing an element involves a two-step process:

   1. The element to be deleted must be found.
   2. That element needs to be snipped from the list and the references need to be rethreaded.

Figure 15 shows the rethreading that must occur when Dave is removed from the skip list.

Figure 15. Deleting an element from a skip list

As with the Add(IComparable) method, Remove(IComparable) maintains an updates array that keeps track of the elements at each level that appear immediately before the element to be deleted. Once this updates array has been populated, the array is iterated through from the bottom up, and the elements in the array are rethreaded to point to the deleted element's references at the corresponding levels. The Remove(IComparable) method code follows.

public virtual void Remove(IComparable value)
{
   Node [] updates = new Node[head.Height];
   Node current = head;
   int i = 0;

   // first, determine the nodes that need to be updated at each level
   for (i = head.Height - 1; i >= 0; i--)
   {
      while (current[i] != null && current[i].Value.CompareTo(value) < 0)
         current = current[i];

      updates[i] = current;
   }

   current = current[0];
   if (current != null && current.Value.CompareTo(value) == 0)
   {
      count--;

      // We found the data to delete
      for (i = 0; i < head.Height; i++)
      {
         if (updates[i][i] != current)
            break;
         else
            updates[i][i] = current[i];
      }

      // finally, trim any top levels left empty, but always keep at least one level
      while (head.Height > 1 && head[head.Height - 1] == null)
      {
         // the tallest level no longer references any element... reduce the list height
         head.DecrementHeight();
      }
   }
   else
   {
      // the data to delete wasn't found. Either return or throw an exception
      return;
   }
}

The first for loop should look familiar: it's the same code found in Add(IComparable), used to populate the updates array. Once the updates array has been populated, we check that the element we reached does indeed contain the value to be deleted. If not, the Remove() method simply returns, although you might opt to have it throw an exception instead. Assuming the element reached is the element to be deleted, the count member variable is decremented and the references are rethreaded. Lastly, if the deletion left the head's top level empty (as happens when we delete the only element with the greatest height), then we should decrement the height of the head element. This is accomplished through a call to the DecrementHeight() method of the Node class.
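The following self-contained sketch mirrors Remove(IComparable) on a hand-built list and shows why the height trimming is best written as a loop with a floor of one level: deleting one element can empty several top levels at once, and an empty list should still keep a single level so later insertions have a bottom level to thread into. SNode, RemoveDemo, and the sample data are hypothetical, and the head is shrunk with Array.Resize in place of DecrementHeight().

```csharp
using System;

// Hypothetical minimal node; Next is not readonly so the head can be resized.
public sealed class SNode
{
    public readonly string Value;   // null for the head
    public SNode[] Next;            // Next[i] = successor at level i (0 = bottom)

    public SNode(string value, int height)
    {
        Value = value;
        Next = new SNode[height];
    }
}

public static class RemoveDemo
{
    // Links already-sorted values into a skip list using hand-picked heights.
    public static SNode Build(string[] sortedValues, int[] heights)
    {
        int maxHeight = 1;
        foreach (int h in heights)
            maxHeight = Math.Max(maxHeight, h);

        var head = new SNode(null, maxHeight);
        var last = new SNode[maxHeight];
        for (int lvl = 0; lvl < maxHeight; lvl++)
            last[lvl] = head;

        for (int k = 0; k < sortedValues.Length; k++)
        {
            var node = new SNode(sortedValues[k], heights[k]);
            for (int lvl = 0; lvl < heights[k]; lvl++)
            {
                last[lvl].Next[lvl] = node;
                last[lvl] = node;
            }
        }
        return head;
    }

    // Unlinks `value` if present and returns true; returns false if it is not found.
    public static bool Remove(SNode head, string value)
    {
        var updates = new SNode[head.Next.Length];
        SNode current = head;
        for (int i = head.Next.Length - 1; i >= 0; i--)
        {
            while (current.Next[i] != null && string.CompareOrdinal(current.Next[i].Value, value) < 0)
                current = current.Next[i];
            updates[i] = current;
        }

        SNode target = current.Next[0];
        if (target == null || string.CompareOrdinal(target.Value, value) != 0)
            return false;

        // Rethread bottom-up, stopping at the first level that does not point at the target.
        for (int i = 0; i < updates.Length && updates[i].Next[i] == target; i++)
            updates[i].Next[i] = target.Next[i];

        // Trim every empty top level, but always keep at least one level.
        int height = head.Next.Length;
        while (height > 1 && head.Next[height - 1] == null)
            height--;
        if (height != head.Next.Length)
            Array.Resize(ref head.Next, height);

        return true;
    }
}
```

With sample heights Adam 1, Bob 2, and Cal 3, removing Bob leaves the height at 3 because Cal still occupies the top level; removing Cal then empties levels 2 and 3 together, and the loop trims both.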
Analyzing the running time

In "Skip Lists: A Probabilistic Alternative to Balanced Trees," Pugh provides a quick proof showing that the skip list's search, insertion, and deletion operations run in $O(\log n)$ time in the average case. A skip list can exhibit linear time in the worst case, but the worst-case scenario is extremely unlikely.

Since the heights of the elements of a skip list are randomly chosen, there is a chance that all, or virtually all, elements in the skip list will end up with the same height. For example, imagine that we had a skip list with 100 elements, all of which happened to be assigned height 1. Such a skip list would be, essentially, a normal linked list, not unlike the one shown in Figure 8. As we discussed earlier, the running time for operations on a normal linked list is linear.

While such worst-case scenarios are possible, realize that they are highly improbable. To put things in perspective, the likelihood of having a skip list with 100 height-1 elements is the same as the likelihood of flipping a coin 100 times and having it come up tails all 100 times. The chance of this happening is precisely 1 in 1,267,650,600,228,229,401,496,703,205,376 (that is, 1 in $2^{100}$). Of course, with more elements the probability goes down even further. For more information, be sure to read about Pugh's probabilistic analysis of skip lists in his paper.
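The 31-digit figure is simply 2 raised to the 100th power, which is easy to confirm with System.Numerics.BigInteger; WorstCaseOdds and OneIn are hypothetical helper names.

```csharp
using System.Numerics;

// Hypothetical helper: the number of equally likely outcomes of `flips` fair coin flips.
public static class WorstCaseOdds
{
    // "All tails" is exactly one of the 2^flips outcomes, so its odds are 1 in 2^flips.
    public static BigInteger OneIn(int flips)
    {
        return BigInteger.Pow(2, flips);
    }
}
```

BigInteger is used because 2^100 far exceeds the range of long.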

Examining Some Empirical Results

Included in the article's download is the SkipList class, along with a testing Windows Forms application. With this testing application, you can manually add, remove, and inspect items, and see the nodes of the list displayed. The testing application also includes a "stress tester," where you can indicate how many operations to perform and an optional random seed value. The stress tester then creates a skip list, adds at least half as many elements as operations requested, and then, with the remaining operations, does a mix of inserts, deletes, and queries. At the end you can review a log of the operations performed and their results, along with the skip list height, the number of comparisons needed for each operation, and the number of elements in the list.

The graph in Figure 16 shows the average number of comparisons per operation for increasing skip list sizes. Note that as the skip list doubles in size, the average number of comparisons needed per operation increases by only a small amount (one or two more comparisons). To fully appreciate the utility of logarithmic growth, consider how a linear search through an array would fare on this graph. For a 256-element array, on average 128 comparisons would be needed to find an element. For a 512-element array, on average 256 comparisons would be needed. Compare that to the skip list, which requires only 9 and 10 comparisons on average for 256 and 512 elements, respectively!

Figure 16. Viewing the logarithmic growth of comparisons required for an increasing number of skip list elements.
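The contrast in that paragraph comes down to the following arithmetic: linear search doubles its expected work when the input doubles, while the logarithm grows by only one.

```latex
\begin{align*}
\text{linear search: } & \tfrac{256}{2} = 128, \quad \tfrac{512}{2} = 256 \\
\text{logarithm: } & \log_2 256 = 8, \quad \log_2 512 = 9
\end{align*}
```

The measured skip list averages of 9 and 10 likewise rise by one as the list doubles.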
Conclusion

In Part 3 of this article series, we looked at binary trees and binary search trees. BSTs provide an efficient $\log_2 n$ running time in the average case. However, the running time is sensitive to the topology of the tree, and a tree with a suboptimal ratio of breadth to height can degrade the running time of a BST's operations to linear time.

To remedy this worst-case running time of BSTs, which could happen quite easily since the topology of a BST depends directly on the order in which items are added, computer scientists have invented a myriad of self-balancing BSTs, starting with the AVL tree created in the 1960s. While data structures such as the AVL tree, the red-black tree, and numerous other specialized BSTs offer $\log_2 n$ running time in both the average and worst case, they require especially complex code that can be difficult to implement correctly.

An alternative data structure that offers the same asymptotic running time as a self-balancing BST in the average case is William Pugh's skip list. The skip list is a specialized, sorted linked list whose elements have a height associated with them. In this article we constructed a SkipList class and saw how straightforward the skip list's operations are, and how easy it is to implement them in code.

This fourth part of the article series is the last planned part on trees. In the fifth installment, we'll look at graphs, which are collections of vertices with an arbitrary number of edges connecting the vertices to one another. As we'll see in Part 5, trees are a special form of graph. Graphs have an extraordinary number of applications in real-world problems.

As always, if you have questions, comments, or suggestions for future material to discuss, I invite your comments! I can be reached at [email protected].

Happy Programming!
References

    * Cormen, Thomas H., Charles E. Leiserson, and Ronald L. Rivest. "Introduction to Algorithms." MIT Press. 1990.
    * Pugh, William. "Skip Lists: A Probabilistic Alternative to Balanced Trees." Available online at ftp://ftp.cs.umd.edu/pub/skipLists/skiplists.pdf.

Scott Mitchell, author of five books and founder of 4GuysFromRolla.com, has been working with Microsoft Web technologies for the past five years. Scott works as an independent consultant, trainer, and writer, and recently completed his master's degree in Computer Science at the University of California, San Diego. He can be reached at [email protected] or through his blog at http://ScottOnWriting.NET.
