Delta encoding for `Span` and `Id`

This issue is a follow-up proposal to #233, which should be attempted first. Read that issue first for the background.

Whereas "start/length" encoding as described in #233 is probably a clean win with few downsides, there are even more aggressive approaches that do have significant costs. Namely: consider a "start-only" or even "length-only" representation. (The latter would essentially be [delta encoding][delta].) This would mean that `Span` would turn into a single 32- or 16-bit number, with these important trade-offs:

* These require that all the ranges in a pool are contiguous, i.e., range R's end is always right next to range R+1's start. There are no gaps, and the ranges appear in order.
* The span information is no longer self-contained; it requires looking at neighboring values to be useful. Namely: the `Span<T>` by itself would not suffice to find the items you want. You'd need to have something like an `Id<Span<T>>`, i.e., an *index of* a span, so that you can look up the neighboring span too, which is required to find the actual range of data you need.
* In the case of a "length-only" representation, it's even worse: you can't do random access to spans; you always need to iterate from the beginning of the `Pool<Span<T>>`, summing up the lengths along the way, to find the indices you need. Or, rather, doing random access to look up a specific span would require O(n) time instead of O(1). _This might be OK_ for some types, however, for which it is typical for client code to iterate from the beginning anyway. But it may not be appropriate for all pools, and we'd probably need to maintain multiple representation for different data types.

Aside from `Span<T>`, the "length-only" representation generalizes to `Id<T>` as well: this would be an "offset-only" representation where you only encode the offset from the previous `Id<T>` in a pool. The trade-offs are similar.

This could get to be a pretty involved project, but the idea would be to think carefully about where one of these tricky representations *could* be profitable. Then, we'll prototype it for that specific type of span, come up with some nice high-level interfaces that make the ranges ergonomic to access, and measure the impact on file size & performance.

[delta]: https://en.wikipedia.org/wiki/Delta_encoding

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Delta encoding for `Span` and `Id` #234

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Delta encoding for Span and Id #234

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Delta encoding for `Span` and `Id` #234