Skip to content

Mappings v2: More Efficient Encoding #155

Open
@jridgewell

Description

@jridgewell

In the last Scopes meeting, I mentioned that I had been working on a project that reduced reduced Google's module graph encoding by ~30% by using a packed VLQ encoding, essentially removing any separators like , or ;. Applying this to our mappings encoding, I think we can remove ~30% (or ~50% if we switch to an 8-bit VLQ and binary encoding).

Removing Separators

In order to remove separators, we first need to know exactly how many lines are present in the map, and how many mappings are present on each line. That should allow us to do a simple loop (ignore the relative deltas, this is just psuedocode):

const lines = readInt();
for (let i = 0; i < lines; i++) {
  const mappings = readInt();
  for (let j = 0; j < mappings; j++) {
    readMapping()
  }
}

The problem is that each mapping has a variable number of fields (either 1, 4, or 5). Without a , separator, we don't know when to stop reading the fields for the current mapping. So we also need to encode the length of each mapping. It's easy to do this with a field before each mapping:

function readMapping() {
  const length = readInt();
  const genCol = readInt();
  if (length === 1) return [genCol];

  const index = readInt();
  const line = readInt();
  const col = readInt();
  if (length === 4) return [genCol, index, line, col];

  return [genCol, index, line, col, readInt()];
}

This alone is pretty good, but we can still do better. genColumn is frequently very small, just a few bits of data. Instead of wasting 8 bits to encode the length of the mapping, we can use the low bits of genColumn:

function readMapping() {
  const data = readInt();
  const length = data & 0b11 === 0b11 ? 5 : data & 0b11 === 0b01 ? 4 : 1;
  const genCol = data >>> 2;
  //...
}

We can still do better. genColumn is never negative in practice. Instead of using zigzag encoding, we could just encode it as a positive int.

function readMapping() {
  const data = readPosInt();
  //...
}

Just eliminating separators can save us ~10-15%.

Omitting sourcesIndex and sourceLine

The next thing that I've noticed is that sourcesIndex rarely changes between mappings, and the same with sourceLine. This makes a lot of sense, if we're transpiling we'll be outputting a lot of mappings that are on the same line as the previous one.

If the sourcesIndex delta or the sourceLine delta are 0, we could omit them from the encoding. This just requires 2 more bits, bringing our total to 4 bits of data packing. We can still encode this pretty easily in genColumn:

function readMapping() {
  const data = readPosInt();
  const length = data & 0b0101; // 1, 4, or 5
  const sourcesIndexPresent = data & 0b0010;
  const sourceLinePresent = data & 0b1000;
  
  const genCol = data >>> 4;
  if (length === 1) return [genCol];

  const index = sourcesIndexPresent ? readInt() : lastIdx;
  const line = sourceLinePresent ? readInt() : lastLine;
  const col = readInt();
  if (length === 4) return [genCol, index, line, col];

  return [genCol, index, line, col, readInt()];
}

This saves us ~25-35%


Analysis sheet, code

This is a highlight from Google Search's internal source map:

Bytes Spec 8-bit VLQ No Sep (6-bit VLQ) No Sep (8-bit VLQ) Flags (6-bit VLQ) Flags (8-bit VLQ)
raw 2,790,581 2,556,308 2,376,350 2,122,263 1,815,680 1,447,719
gzip 6 896,546 885,869 883,010 873,562 842,970 822,952
brotli 6 841,365 815,145 826,303 804,323 794,121 774,462
Percent Spec 8-bit VLQ No Sep (6-bit VLQ) No Sep (8-bit VLQ) Flags (6-bit VLQ) Flags (8-bit VLQ)
raw 0.00% -8.40% -14.84% -23.95% -34.94% -48.12%
gzip 6 -62.29% -1.19% -1.51% -2.56% -5.98% -8.21%
brotli 6 -64.53% -3.12% -1.79% -4.40% -5.62% -7.95%

Activity

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions