SXEmacs Internals Manual: Mathematics of Extent Ordering

23.5 Mathematics of Extent Ordering

The extents in a buffer are ordered by “display order” because that is that order that the redisplay mechanism needs to process them in. The e-order is an auxiliary ordering used to facilitate operations over extents. The operations that can be performed on the ordered list of extents in a buffer are

Locate where an extent would go if inserted into the list.
Insert an extent into the list.
Remove an extent from the list.
Map over all the extents that overlap a range.

(4) requires being able to determine the first and last extents that overlap a range.

NOTE: overlap is used as follows:

two ranges overlap if they have at least one point in common. Whether the endpoints are open or closed makes a difference here.
a point overlaps a range if the point is contained within the range; this is equivalent to treating a point P as the range [P, P].
In the case of an extent overlapping a point or range, the extent is normally treated as having closed endpoints. This applies consistently in the discussion of stacks of extents and such below. Note that this definition of overlap is not necessarily consistent with the extents that map-extents maps over, since map-extents sometimes pays attention to whether the endpoints of an extents are open or closed. But for our purposes, it greatly simplifies things to treat all extents as having closed endpoints.

First, define >, <, <=, etc. as applied to extents to mean comparison according to the display order. Comparison between an extent E and an index I means comparison between E and the range [I, I].

Also define e>, e<, e<=, etc. to mean comparison according to the e-order.

For any range R, define R(0) to be the starting index of the range and R(1) to be the ending index of the range.

For any extent E, define E(next) to be the extent directly following E, and E(prev) to be the extent directly preceding E. Assume E(next) and E(prev) can be determined from E in constant time. (This is because we store the extent list as a doubly linked list.)

Similarly, define E(e-next) and E(e-prev) to be the extents directly following and preceding E in the e-order.

Now:

Let R be a range. Let F be the first extent overlapping R. Let L be the last extent overlapping R.

Theorem 1: R(1) lies between L and L(next), i.e. L <= R(1) < L(next).

This follows easily from the definition of display order. The basic reason that this theorem applies is that the display order sorts by increasing starting index.

Therefore, we can determine L just by looking at where we would insert R(1) into the list, and if we know F and are moving forward over extents, we can easily determine when we’ve hit L by comparing the extent we’re at to R(1).

Theorem 2: F(e-prev) e< [1, R(0)] e<= F.

This is the analog of Theorem 1, and applies because the e-order sorts by increasing ending index.

Therefore, F can be found in the same amount of time as operation (1), i.e. the time that it takes to locate where an extent would go if inserted into the e-order list.

If the lists were stored as balanced binary trees, then operation (1) would take logarithmic time, which is usually quite fast. However, currently they’re stored as simple doubly-linked lists, and instead we do some caching to try to speed things up.

Define a stack of extents (or SOE) as the set of extents (ordered in the display order) that overlap an index I, together with the SOE’s previous extent, which is an extent that precedes I in the e-order. (Hopefully there will not be very many extents between I and the previous extent.)

Now:

Let I be an index, let S be the stack of extents on I, let F be the first extent in S, and let P be S’s previous extent.

Theorem 3: The first extent in S is the first extent that overlaps any range [I, J].

Proof: Any extent that overlaps [I, J] but does not include I must have a start index > I, and thus be greater than any extent in S.

Therefore, finding the first extent that overlaps a range R is the same as finding the first extent that overlaps R(0).

Theorem 4: Let I2 be an index such that I2 > I, and let F2 be the first extent that overlaps I2. Then, either F2 is in S or F2 is greater than any extent in S.

Proof: If F2 does not include I then its start index is greater than I and thus it is greater than any extent in S, including F. Otherwise, F2 includes I and thus is in S, and thus F2 >= F.