# Skip Lists

This article was published originally in the April 2001
issue of *Java Pro* with the unenlightening title “The Sort-ed
Details.” It was written when JDK 1.3 was the latest
available Java release, years before `ConcurrentSkipListMap`
appeared in JDK 1.6. The example code is instructional and does not implement
all of the Java Collections interfaces required for a
production-use implementation. Even though you will want to
use `ConcurrentSkipListMap` and `ConcurrentSkipListSet` instead of
implementing your own skip list, the article remains useful for
those who would like an overview of how a skip list works.

Changes. The code examples have been updated to use generics, an
exercise of little value that makes the code harder to read. I
have also added timings for `ConcurrentSkipListMap` to the driver
program so you can see that the performance properties described
in the article are implementation-independent. Depending on the
hardware and JVM used—and the order in which the tests are
run—`ConcurrentSkipListMap` executes `put`, `get`,
and `remove` slower or faster than the
article's code. In all cases, the execution times are the same
base-2 order of magnitude (i.e., less than a factor of two
difference) as the example code, which is two to three times
slower than `TreeMap` for the tests performed.

## Sorting and Searching

Sorting
continues to be one of the most common operations performed by
computer programs. Java programs are no exception. The Collections
Framework recognizes the fundamental need for ordering computer
data by providing the `Comparable` and `Comparator`
interfaces. If you
cannot determine the natural ordering between two objects, you
cannot sort a collection of
objects. The `Comparable` interface
allows an object to control its ordering by implementing the
`compareTo(Object)` method. The `Comparator` interface
allows you to implement
a `compare(Object,Object)` method that
returns the ordering of an arbitrary pair of objects that may not
have been designed with ordering in mind.
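To make the division of labor concrete, here is a small example (the `Word` class and its data are invented for illustration): `Comparable` fixes a type's natural ordering from within, while a `Comparator` imposes an alternative ordering from outside.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Word fixes its natural ordering (alphabetical) via Comparable.
class Word implements Comparable<Word> {
    final String text;
    Word(String text) { this.text = text; }
    @Override public int compareTo(Word other) { return text.compareTo(other.text); }
    @Override public String toString() { return text; }
}

public class OrderingDemo {
    public static void main(String[] args) {
        List<Word> words = new ArrayList<>();
        Collections.addAll(words, new Word("pear"), new Word("fig"), new Word("banana"));

        // Natural (Comparable) ordering: alphabetical.
        Collections.sort(words);
        System.out.println(words);   // [banana, fig, pear]

        // An external Comparator imposes a different ordering: by length.
        words.sort(Comparator.comparingInt(w -> w.text.length()));
        System.out.println(words);   // [fig, pear, banana]
    }
}
```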

A common approach to ordering data is to store the data in a
container, be it an array or a list of some kind, and sort it as
needed. Arrays are often
difficult to work with when you don't know how many array elements
you will need to store all of your data.
The `Vector` and `ArrayList` classes—each of which
implements the `java.util.List` interface—resolve this difficulty by providing dynamically
growing array-based storage. Array data can be sorted with
`java.util.Arrays.sort()` and `List` data can be sorted
with `java.util.Collections.sort()`. Once
the data is sorted, you can search it relatively efficiently in
O(log_{2}(n)) time using a binary search algorithm, implemented
by `Arrays.binarySearch` and `Collections.binarySearch`.

Searching for data in linear storage containers, such as arrays, is inefficient. In the worst case it requires the examination of every item of data. If the data is sorted, you can take advantage of a binary search, but you still have to pay the cost of sorting the data. For dynamically changing collections of data, that cost is not acceptable. An alternative is to use a data structure that maintains the ordering between its elements as those elements are inserted and removed.

Mappings, such as those represented by
the `java.util.Map` interface,
associate a set of keys with a set of values. Sometimes all we
care about is the ability to quickly retrieve a value associated
with a given key. Should we want to access the mapped values
based on the natural ordering of their keys, or define a new
submapping over a range of key values, we have to sort the keys.
In these cases, an ordered mapping, represented by
the `java.util.SortedMap` interface,
is more appropriate. The `java.util.TreeMap`
class implements the `SortedMap`
interface using a red-black tree, a type of balanced binary search
tree that allows both efficient searching and ordered traversal.
There are many alternative data structures that could have been
used for a default `SortedMap`
implementation in the Collections Framework. Implementing one of
them may shed some light on why the red-black tree was chosen over
the available alternatives.
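Before looking at the skip list, it helps to see what a `SortedMap` buys you over a plain `Map`; this sketch uses invented keys and values:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class SortedMapDemo {
    public static void main(String[] args) {
        SortedMap<String, Integer> ages = new TreeMap<>();
        ages.put("carol", 31);
        ages.put("alice", 29);
        ages.put("bob", 25);

        // Iteration and the first/last views follow the keys' natural ordering,
        // regardless of insertion order.
        System.out.println(ages.firstKey());            // alice
        System.out.println(ages.lastKey());             // carol

        // A submapping over a half-open range of keys: ["alice", "carol").
        SortedMap<String, Integer> head = ages.subMap("alice", "carol");
        System.out.println(head.keySet());              // [alice, bob]
    }
}
```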

## Enter the Skip List

Figure 1 contains the source for an implementation of a
probabilistically balanced container called a skip list. Skip
lists were invented by Bill Pugh, a professor at the University of
Maryland, in 1987 (see “Skip lists: a probabilistic
alternative to balanced trees,” *Communications of
the ACM*, Vol. 33, No. 6, June 1990), who observed that
a hierarchy of linked lists is equivalent to a binary tree. You
can think of a skip list as a set of linked lists stacked one on
top of the other.

### List Arrays

It's easier to get a handle on skip lists by studying a
special case called a list array. In a list array, the linked
list at the lowest level (level 0) contains all the elements of
the list, like a normal linear linked list. The list at the
second level contains pointers to every other element. The third
level contains pointers to every fourth element, and so on. Each
level contains pointers to n/2^{l}
elements, where *n* is the number of elements in
the list and *l* is the level number starting
from 0.
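With *n* = 16 (a value chosen only for illustration), the per-level pointer counts look like this:

```java
public class ListArrayLevels {
    // Level l links every 2^l-th element, so it holds n / 2^l pointers.
    static int pointersAtLevel(int n, int l) {
        return n >> l;   // n / 2^l for non-negative n and l
    }

    public static void main(String[] args) {
        int n = 16;      // 16 elements, chosen only for illustration
        for (int l = 0; pointersAtLevel(n, l) >= 1; l++) {
            System.out.println("level " + l + ": " + pointersAtLevel(n, l) + " pointers");
        }
        // level 0: 16, level 1: 8, level 2: 4, level 3: 2, level 4: 1
    }
}
```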

Searching for a key in a list array can be done in O(log_{2}(n))
time. Starting at the topmost level, traverse the list until
you encounter an element greater than or equal to the search
key. If the element is equal to the search key, you're done.
If not, go down a level from the previous element and repeat the
process. If you reach level 0 without finding the key, it's not
in the list and you're done. Assuming a list
of *n* elements, the search requires no more
than 2log_{2}(n)
comparisons because there
are log_{2}(n)
lists and you compare no more than two elements before moving to
a lower level. Given the structure of the list array, the
search is analogous to a binary search, although it incurs twice
as many comparisons.

The list array search algorithm is implemented in the
`get()` method in Figure 1. Notice that
a dummy list header and tail are used to avoid treating boundary
conditions as special cases. The use of the
`MinimumKey` and `MaximumKey` classes ensures that the
head of the list will always appear to be the lowest-valued
element and the tail of the list will appear to be the
highest-valued element.
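As a rough sketch of that search, under assumed names (`SkipNode`, `search`) rather than the Figure 1 code: for simplicity, the dummy header's key here is simply never compared, standing in for a `MinimumKey` sentinel, and a `null` link stands in for the tail.

```java
// A minimal skip-list search sketch; not the article's implementation.
public class SkipSearchSketch {
    static class SkipNode<K, V> {
        final K key;                     // ignored for the dummy header
        V value;
        final SkipNode<K, V>[] forward;  // forward[l] = next node at level l

        @SuppressWarnings("unchecked")
        SkipNode(K key, V value, int levels) {
            this.key = key;
            this.value = value;
            this.forward = new SkipNode[levels];
        }
    }

    // Start at the top level; move right while the next key is less than
    // the search key, then drop a level. After level 0, the node just past
    // the stopping point is the only possible match.
    static <K extends Comparable<K>, V> V search(SkipNode<K, V> head, K key) {
        SkipNode<K, V> node = head;
        for (int level = head.forward.length - 1; level >= 0; level--) {
            while (node.forward[level] != null
                    && node.forward[level].key.compareTo(key) < 0) {
                node = node.forward[level];
            }
        }
        SkipNode<K, V> candidate = node.forward[0];
        return (candidate != null && candidate.key.compareTo(key) == 0)
                ? candidate.value : null;
    }
}
```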

Although searching exhibits a running time competitive with tree-based techniques, insertions and deletions do not perform as well. If you were to insert a key at the beginning of the list, you would need to adjust the level of every other element in the list because a list array restricts pointers at a given level to skip exactly one element in the list at the next lowest level.

### Probabilistic Balancing

It turns out that you can achieve similar search
performance and much better insertion and deletion performance
if you can guarantee that the *average*
number of elements skipped is one, instead of the exact number.
A skip list is a list array that provides such a guarantee by
randomly generating the skip increment for each level when you
insert a key. To insert a key, use the search algorithm to
locate where to perform the insertion. If the search succeeds
and duplicates are disallowed, simply replace the value
associated with the key. If the search fails, you should insert
the key immediately after the last node visited whose value was
less than the value to be inserted.

While you are searching, keep track of the last node
visited at each level so that you can update their links after
the insertion. Before you insert the key, you have to decide at
which levels to insert it (note that every key must be inserted
at level 0). You do this by generating a random number between
0 and 1. If the value is less than 0.5, stop. If the value is
equal to or greater, go up an additional level, insert the key
there, generate another random number, and repeat the process.
The random number generation is implemented
in `__randomLevel()` (but uses a
probability of 0.25 instead of 0.5) in Figure 1, and the
insertion process is implemented
in `put()`.

Skip lists can become relatively deep, but the
probabilistic nature of the balancing makes it unlikely. You can
set a cap on the depth to keep small lists shallow. The maximum
level should be set so that the expected number of elements at
the maximum level is 1. If the probability for level skipping
is *p*, then the maximum level is
log_{1/p}(n). The example driver in Figure 2
sets *n* to 2^{20}
and *p* to ^{1}/_{4}, making
the maximum level equal to 10. Deletions work in much the same
way as insertions, as shown in
the `remove()` method from
Figure 1.
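The maximum-level arithmetic can be checked directly; this helper is illustrative and not part of the article's driver:

```java
public class MaxLevelDemo {
    // The expected number of elements at level l is n * p^l; setting that
    // to 1 and solving for l gives the maximum useful level, log_{1/p}(n).
    static int maxLevel(int n, double p) {
        return (int) Math.round(Math.log(n) / Math.log(1.0 / p));
    }

    public static void main(String[] args) {
        // The driver's parameters: n = 2^20 elements, p = 1/4.
        System.out.println(maxLevel(1 << 20, 0.25));   // 10
    }
}
```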

## Performance

Figure 2 contains a driver program that tests
the `SkipList` class from Figure 1,
comparing its performance to `TreeMap`. The
test is unscientific and should really explore a range of values
for both `NUM_LEVELS` and `NUM_ELEMENTS`. Nonetheless, it is good
enough to get a rough picture of skip list behavior.

If you run the driver, you will find that for large numbers
of elements, the `SkipList` class takes
roughly twice as long as `TreeMap` to perform
insertions, deletions, and searches. Insertions take longer
because of the time spent generating random numbers. Both
insertions and deletions suffer because they have to maintain
the `__update[]` array and index into an array on
every call to `getNext()`. Array indexing
incurs a higher cost than accessing a left or right child in a
tree because of run-time array bounds checking. All of the
operations suffer from the extra comparisons required when
traversing a skip list. Even though both skip list and tree
searches are O(log_{2}(n)), the constant factors associated with skip
list searches in Java are greater.

Straight-up in-order list traversal—the primary reason
for using an ordered mapping—is comparable in both
situations. You would expect it to be more efficient in a skip
list because all you have to do is follow all of the links at the
lowest level, whereas a tree requires you to perform potentially
costly comparisons. Array indexing again appears to be the
culprit. For smaller numbers of
elements, `SkipList` is more
competitive.

One of the reasons advocated for using skip lists is that they are easier to implement. I find that they are no easier to implement than red-black trees. They also require more storage, which is generally undesirable. Even so, their performance is independent of the order of the data they store. If you insert already-sorted data into a plain binary search tree (not a red-black tree), it degenerates into a linear linked list. With a skip list, your performance is protected by the probabilistic nature of the link construction. Some of the alternatives to skip lists and red-black trees are AVL trees, splay trees, and 2-3 trees. The red-black tree was perhaps the best choice for implementing a general-purpose ordered mapping for the core APIs, but Java makes a capable playground for exploring the alternatives.