Mappings v2: More Efficient Encoding #155

In the last Scopes meeting, I mentioned that I had been working on a project that reduced Google's module graph encoding by ~30% by using a packed VLQ encoding, essentially removing any separators like `,` or `;`. Applying this to our mappings encoding, I think we can shrink it by ~30% (or ~50% if we switch to an 8-bit VLQ and [binary encoding](https://github.com/tc39/source-map/issues/18)).
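For readers less familiar with the two VLQ flavors compared below, here is a minimal sketch of the spec's base64 (6-bit) VLQ next to one possible 8-bit byte variant. The 8-bit layout (sign in the lowest bit, then 7 data bits plus a continuation bit per byte) is an assumption for illustration and may not match what the binary encoding proposal settles on. Both functions assume `|value| < 2**30`.

```javascript
// Base64 alphabet used by the source map spec's VLQ encoding.
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Move the sign into the lowest bit: 1 => 2, -1 => 3 (assumes |value| < 2**30).
function toSignedVlq(value) {
  return value < 0 ? (-value << 1) | 1 : value << 1;
}

// Spec VLQ: 5 data bits per base64 digit, 0b100000 means "more digits follow".
function encodeVlq6(value) {
  let vlq = toSignedVlq(value);
  let out = "";
  do {
    let digit = vlq & 0b11111;
    vlq >>>= 5;
    if (vlq > 0) digit |= 0b100000;
    out += BASE64[digit];
  } while (vlq > 0);
  return out;
}

// Hypothetical 8-bit VLQ: 7 data bits per raw byte, 0x80 means "more bytes follow".
function encodeVlq8(value) {
  let vlq = toSignedVlq(value);
  const out = [];
  do {
    let byte = vlq & 0x7f;
    vlq >>>= 7;
    if (vlq > 0) byte |= 0x80;
    out.push(byte);
  } while (vlq > 0);
  return Uint8Array.from(out);
}

console.log(encodeVlq6(16), encodeVlq8(16).length); // gB 1
```

A delta of 16 already needs two base64 characters but still fits in one byte of the 8-bit variant: each base64 character carries only 5 data bits while occupying a full byte in the file.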

# Removing Separators

In order to remove separators, we first need to know exactly how many lines are present in the map, and how many mappings are present on each line. With those counts, decoding becomes a simple nested loop (ignore the relative deltas, this is just pseudocode):

```js
const lines = readInt(); // total number of generated lines
for (let i = 0; i < lines; i++) {
  const mappings = readInt(); // number of mappings on line i
  for (let j = 0; j < mappings; j++) {
    readMapping();
  }
}
```
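Here is that loop made runnable, with a hypothetical `readInt` backed by an array of already-decoded integers so the framing can be seen on its own. To keep it short, every mapping is assumed to have the same fixed number of fields (`fieldsPerMapping`); variable-length mappings are the problem the next paragraph deals with.

```javascript
// Write [lineCount, then for each line: mappingCount, ...fields].
function encodeCounted(lines) {
  const out = [lines.length];
  for (const line of lines) {
    out.push(line.length);
    for (const mapping of line) out.push(...mapping);
  }
  return out;
}

// Read the same stream back using only the counts, no `,` or `;`.
function decodeCounted(ints, fieldsPerMapping) {
  let pos = 0;
  const readInt = () => ints[pos++]; // hypothetical stand-in for a VLQ reader
  const lines = [];
  const lineCount = readInt();
  for (let i = 0; i < lineCount; i++) {
    const mappingCount = readInt();
    const line = [];
    for (let j = 0; j < mappingCount; j++) {
      const mapping = [];
      for (let k = 0; k < fieldsPerMapping; k++) mapping.push(readInt());
      line.push(mapping);
    }
    lines.push(line);
  }
  return lines;
}

const lines = [[[0, 0, 0, 0], [4, 0, 0, 5]], [], [[2, 0, 1, 0]]];
const stream = encodeCounted(lines);
// stream: [3, 2, 0, 0, 0, 0, 4, 0, 0, 5, 0, 1, 2, 0, 1, 0]
console.log(JSON.stringify(decodeCounted(stream, 4)) === JSON.stringify(lines)); // true
```

The empty second line costs a single `0` count, which in base64 VLQ is one `A`, the same size as the `;` it replaces.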

The problem is that each mapping has a variable number of fields (either 1, 4, or 5). Without a `,` separator, we don't know when to stop reading the fields for the current mapping. So we also need to encode the length of each mapping. It's easy to do this with a field before each mapping:

```js
function readMapping() {
  const length = readInt();
  const genCol = readInt();
  if (length === 1) return [genCol];

  const index = readInt();
  const line = readInt();
  const col = readInt();
  if (length === 4) return [genCol, index, line, col];

  return [genCol, index, line, col, readInt()];
}
```
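A small round-trip sketch of this length-prefixed layout, again over a plain integer array (the helper names are hypothetical). Only lengths 1, 4, and 5 are accepted, matching the three mapping shapes.

```javascript
const VALID_LENGTHS = new Set([1, 4, 5]);

// Prefix every mapping with its field count.
function encodeWithLength(mappings) {
  const out = [];
  for (const m of mappings) {
    if (!VALID_LENGTHS.has(m.length)) throw new RangeError(`bad length ${m.length}`);
    out.push(m.length, ...m);
  }
  return out;
}

// Read `count` mappings back; the prefix says how many fields to consume.
function decodeWithLength(ints, count) {
  let pos = 0;
  const readInt = () => ints[pos++];
  const mappings = [];
  for (let i = 0; i < count; i++) {
    const length = readInt();
    mappings.push(ints.slice(pos, pos + length));
    pos += length;
  }
  return mappings;
}

const sample = [[0], [4, 0, 0, 5], [2, 0, 1, 0, 3]];
const ints = encodeWithLength(sample);
console.log(ints.join(",")); // 1,0,4,4,0,0,5,5,2,0,1,0,3
```

Every mapping now carries one extra integer, which is the overhead that packing the length into `genColumn` removes.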

This alone is pretty good, but we can still do better. `genColumn` is frequently very small, just a few bits of data. Instead of spending a whole VLQ digit (a full byte in the file) on the mapping's length, we can shift `genColumn` left by 2 and store the length in its 2 low bits:

```js
function readMapping() {
  const data = readInt();
  const tag = data & 0b11; // 0b00 => 1 field, 0b01 => 4, 0b11 => 5
  const length = tag === 0b11 ? 5 : tag === 0b01 ? 4 : 1;
  const genCol = data >>> 2;
  //...
}
```
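The encoder side of this packing could look like the sketch below, using the same tags as the decoder above (`0b00` for 1 field, `0b01` for 4, `0b11` for 5; `0b10` unused). The function names are hypothetical. One JavaScript detail worth calling out: `===` binds tighter than `&`, so the masked tag has to be computed (or parenthesized) before it is compared.

```javascript
// 2-bit length tags stored in the low bits of the packed value.
const TAG_FOR_LENGTH = new Map([[1, 0b00], [4, 0b01], [5, 0b11]]);

// Shift genColumn left by 2 and OR in the tag (assumes genCol < 2**29).
function packGenCol(genCol, length) {
  const tag = TAG_FOR_LENGTH.get(length);
  if (tag === undefined) throw new RangeError(`bad mapping length ${length}`);
  return (genCol << 2) | tag;
}

// Inverse of packGenCol.
function unpackGenCol(data) {
  // Mask first: `data & 0b11 === 0b11` would parse as `data & (0b11 === 0b11)`.
  const tag = data & 0b11;
  const length = tag === 0b11 ? 5 : tag === 0b01 ? 4 : 1;
  return { genCol: data >>> 2, length };
}

console.log(packGenCol(7, 4), unpackGenCol(29)); // 29 { genCol: 7, length: 4 }
```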

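We can still do better. In practice the `genColumn` delta is never negative, since mappings on a line are normally emitted in increasing column order, so the packed value is never negative either. Instead of the spec's signed VLQ, which spends the lowest bit on the sign, we could just encode it as an unsigned int:
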
```js
function readMapping() {
  const data = readPosInt();
  //...
}
```
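To see what dropping the sign buys, here is a sketch of an unsigned base64 VLQ writer and a matching `readPosInt`-style reader, next to the spec's signed form. The `readPosInt(str, pos)` signature here is hypothetical, and the sketch assumes values below `2**31`.

```javascript
const BASE64 =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Unsigned VLQ: all 5 data bits of every digit carry the value.
function encodeUnsignedVlq(value) {
  let out = "";
  do {
    let digit = value & 0b11111;
    value >>>= 5;
    if (value > 0) digit |= 0b100000; // continuation bit
    out += BASE64[digit];
  } while (value > 0);
  return out;
}

// Spec (signed) VLQ: the sign takes the lowest bit before splitting into digits.
function encodeSignedVlq(value) {
  return encodeUnsignedVlq(value < 0 ? (-value << 1) | 1 : value << 1);
}

// Hypothetical readPosInt: decode one unsigned VLQ starting at `pos`.
function readPosInt(str, pos = 0) {
  let value = 0;
  let shift = 0;
  let digit;
  do {
    digit = BASE64.indexOf(str[pos++]);
    if (digit < 0) throw new SyntaxError("invalid base64 VLQ digit");
    value |= (digit & 0b11111) << shift;
    shift += 5;
  } while (digit & 0b100000);
  return { value: value >>> 0, pos };
}

// 21 is genCol 5 packed with the 4-field tag: (5 << 2) | 0b01.
console.log(encodeSignedVlq(21), encodeUnsignedVlq(21)); // qB V
```

With the sign bit gone, packed values from 16 to 31 fit in one base64 digit instead of two.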

**Just eliminating separators can save us ~10-15%.**

# Omitting `sourcesIndex` and `sourceLine`

The next thing I've noticed is that `sourcesIndex` rarely changes between mappings, and neither does `sourceLine`. This makes sense: when transpiling, we output many mappings that point at the same source line as the previous one.

If the `sourcesIndex` delta or the `sourceLine` delta is 0, we could omit it from the encoding. This requires just 2 more bits (one presence flag per field), bringing our total to 4 flag bits. We can still pack these into `genColumn` pretty easily:

```js
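// Decoder state carried between mappings (deltas ignored, as above).
let lastIdx = 0;
let lastLine = 0;

function readMapping() {
  const data = readPosInt();
  const length = data & 0b0101; // 0b0001 => 1, 0b0100 => 4, 0b0101 => 5
  const sourcesIndexPresent = data & 0b0010;
  const sourceLinePresent = data & 0b1000;

  const genCol = data >>> 4;
  if (length === 1) return [genCol];

  // An omitted field means "same as the previous mapping" (a delta of 0).
  const index = sourcesIndexPresent ? readInt() : lastIdx;
  const line = sourceLinePresent ? readInt() : lastLine;
  const col = readInt();
  lastIdx = index;
  lastLine = line;
  if (length === 4) return [genCol, index, line, col];

  return [genCol, index, line, col, readInt()];
}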
```
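Putting the flags together, a round-trip sketch might look like this. It follows the bit layout of the decoder above (bits 0 and 2 hold the length, bit 1 flags `sourcesIndex`, bit 3 flags `sourceLine`, `genColumn` starts at bit 4), works on absolute values over a plain integer array (deltas ignored, as in the pseudocode), and starts both "previous" values at 0. The function names are hypothetical.

```javascript
// Encode mappings ([genCol], [genCol, idx, line, col] or [..., name]) into a
// flat integer stream, omitting sourcesIndex/sourceLine when unchanged.
function encodeMappings(mappings) {
  const out = [];
  let lastIdx = 0;
  let lastLine = 0;
  for (const m of mappings) {
    const length = m.length; // 1, 4, or 5 is also its own 0b0101 bit pattern
    if (length !== 1 && length !== 4 && length !== 5) throw new RangeError("bad length");
    if (length === 1) {
      out.push((m[0] << 4) | length);
      continue;
    }
    const [genCol, index, line, col] = m;
    const hasIdx = index !== lastIdx;
    const hasLine = line !== lastLine;
    out.push((genCol << 4) | (hasLine ? 0b1000 : 0) | (hasIdx ? 0b0010 : 0) | length);
    if (hasIdx) out.push(index);
    if (hasLine) out.push(line);
    out.push(col);
    if (length === 5) out.push(m[4]);
    lastIdx = index;
    lastLine = line;
  }
  return out;
}

// Decode `count` mappings; a missing field reuses the previous value.
function decodeMappings(ints, count) {
  let pos = 0;
  const read = () => ints[pos++];
  let lastIdx = 0;
  let lastLine = 0;
  const mappings = [];
  for (let i = 0; i < count; i++) {
    const data = read();
    const length = data & 0b0101;
    if (length === 0) throw new RangeError("bad length bits");
    const genCol = data >>> 4;
    if (length === 1) {
      mappings.push([genCol]);
      continue;
    }
    const index = data & 0b0010 ? read() : lastIdx;
    const line = data & 0b1000 ? read() : lastLine;
    const col = read();
    mappings.push(length === 4 ? [genCol, index, line, col] : [genCol, index, line, col, read()]);
    lastIdx = index;
    lastLine = line;
  }
  return mappings;
}

const mappings = [[0, 0, 0, 0], [4, 0, 0, 5], [9], [12, 1, 3, 0, 2], [15, 1, 3, 4]];
const encoded = encodeMappings(mappings);
console.log(encoded.join(",")); // 4,0,68,5,145,207,1,3,0,2,244,4
```

Here 18 fields shrink to 12 integers, because only the fourth mapping changes `sourcesIndex` or `sourceLine`.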

**This saves us ~25-35%**

- - -

[Analysis sheet](https://docs.google.com/spreadsheets/d/1lAPxQkIk1Kmm9E7NSY4Vfu_aAZl72JJgCCEyVWygpYA/edit?usp=sharing), [code](https://gist.github.com/jridgewell/f083199c289acf74f4a5ddc4cbecf7a4)

This is a highlight from Google Search's internal source map:

| Bytes    | Spec      | 8-bit VLQ | No Sep (6-bit VLQ) | No Sep (8-bit VLQ) | Flags (6-bit VLQ) | Flags (8-bit VLQ) |
| -------- | ---------:| ---------:| ------------------:| ------------------:| -----------------:| -----------------:|
| raw      | 2,790,581 | 2,556,308 |          2,376,350 |          2,122,263 |         1,815,680 |         1,447,719 |
| gzip 6   |   896,546 |   885,869 |            883,010 |            873,562 |           842,970 |           822,952 |
| brotli 6 |   841,365 |   815,145 |            826,303 |            804,323 |           794,121 |           774,462 |

| Percent  | Spec      | 8-bit VLQ | No Sep (6-bit VLQ) | No Sep (8-bit VLQ) | Flags (6-bit VLQ) | Flags (8-bit VLQ) |
| -------- | ---------:| ---------:| ------------------:| ------------------:| -----------------:| -----------------:|
| raw      |     0.00% |   \-8.40% |           \-14.84% |           \-23.95% |          \-34.94% |          \-48.12% |
| gzip 6   | <sub>\-62.29%</sub> |   \-1.19% |            \-1.51% |            \-2.56% |           \-5.98% |           \-8.21% |
| brotli 6 | <sub>\-64.53%</sub> |   \-3.12% |            \-1.79% |            \-4.40% |           \-5.62% |           \-7.95% |
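Each Percent cell appears to be that column's size relative to the Spec column of the same row, so the gzip and brotli rows compare compressed sizes with compressed Spec. A quick check of that reading against the raw and gzip rows, with the byte counts copied from the Bytes table:

```javascript
// Byte counts from the Bytes table, in column order:
// Spec, 8-bit VLQ, No Sep (6-bit), No Sep (8-bit), Flags (6-bit), Flags (8-bit).
const RAW = [2790581, 2556308, 2376350, 2122263, 1815680, 1447719];
const GZIP = [896546, 885869, 883010, 873562, 842970, 822952];

// Percent change of each entry relative to the row's first (Spec) entry.
function percentRow(row) {
  const spec = row[0];
  return row.map((bytes) => `${(((bytes - spec) / spec) * 100).toFixed(2)}%`);
}

console.log(percentRow(RAW).join(" "));
// 0.00% -8.40% -14.84% -23.95% -34.94% -48.12%
console.log(percentRow(GZIP).join(" "));
// 0.00% -1.19% -1.51% -2.56% -5.98% -8.21%
```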


