Description
In the last Scopes meeting, I mentioned that I had been working on a project that reduced reduced Google's module graph encoding by ~30% by using a packed VLQ encoding, essentially removing any separators like ,
or ;
. Applying this to our mappings encoding, I think we can remove ~30% (or ~50% if we switch to an 8-bit VLQ and binary encoding).
Removing Separators
In order to remove separators, we first need to know exactly how many lines are present in the map, and how many mappings are present on each line. That should allow us to do a simple loop (ignore the relative deltas, this is just psuedocode):
const lines = readInt();
for (let i = 0; i < lines; i++) {
const mappings = readInt();
for (let j = 0; j < mappings; j++) {
readMapping()
}
}
The problem is that each mapping has a variable number of fields (either 1, 4, or 5). Without a ,
separator, we don't know when to stop reading the fields for the current mapping. So we also need to encode the length of each mapping. It's easy to do this with a field before each mapping:
function readMapping() {
const length = readInt();
const genCol = readInt();
if (length === 1) return [genCol];
const index = readInt();
const line = readInt();
const col = readInt();
if (length === 4) return [genCol, index, line, col];
return [genCol, index, line, col, readInt()];
}
This alone is pretty good, but we can still do better. genColumn
is frequently very small, just a few bits of data. Instead of wasting 8 bits to encode the length of the mapping, we can use the low bits of genColumn
:
function readMapping() {
const data = readInt();
const length = data & 0b11 === 0b11 ? 5 : data & 0b11 === 0b01 ? 4 : 1;
const genCol = data >>> 2;
//...
}
We can still do better. genColumn
is never negative in practice. Instead of using zigzag encoding, we could just encode it as a positive int.
function readMapping() {
const data = readPosInt();
//...
}
Just eliminating separators can save us ~10-15%.
Omitting sourcesIndex
and sourceLine
The next thing that I've noticed is that sourcesIndex
rarely changes between mappings, and the same with sourceLine
. This makes a lot of sense, if we're transpiling we'll be outputting a lot of mappings that are on the same line as the previous one.
If the sourcesIndex
delta or the sourceLine
delta are 0, we could omit them from the encoding. This just requires 2 more bits, bringing our total to 4 bits of data packing. We can still encode this pretty easily in genColumn
:
function readMapping() {
const data = readPosInt();
const length = data & 0b0101; // 1, 4, or 5
const sourcesIndexPresent = data & 0b0010;
const sourceLinePresent = data & 0b1000;
const genCol = data >>> 4;
if (length === 1) return [genCol];
const index = sourcesIndexPresent ? readInt() : lastIdx;
const line = sourceLinePresent ? readInt() : lastLine;
const col = readInt();
if (length === 4) return [genCol, index, line, col];
return [genCol, index, line, col, readInt()];
}
This saves us ~25-35%
This is a highlight from Google Search's internal source map:
Bytes | Spec | 8-bit VLQ | No Sep (6-bit VLQ) | No Sep (8-bit VLQ) | Flags (6-bit VLQ) | Flags (8-bit VLQ) |
---|---|---|---|---|---|---|
raw | 2,790,581 | 2,556,308 | 2,376,350 | 2,122,263 | 1,815,680 | 1,447,719 |
gzip 6 | 896,546 | 885,869 | 883,010 | 873,562 | 842,970 | 822,952 |
brotli 6 | 841,365 | 815,145 | 826,303 | 804,323 | 794,121 | 774,462 |
Percent | Spec | 8-bit VLQ | No Sep (6-bit VLQ) | No Sep (8-bit VLQ) | Flags (6-bit VLQ) | Flags (8-bit VLQ) |
---|---|---|---|---|---|---|
raw | 0.00% | -8.40% | -14.84% | -23.95% | -34.94% | -48.12% |
gzip 6 | -62.29% | -1.19% | -1.51% | -2.56% | -5.98% | -8.21% |
brotli 6 | -64.53% | -3.12% | -1.79% | -4.40% | -5.62% | -7.95% |
Activity