-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Fix decoding of varints with extra padding bytes #1380
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
According to the protobuf conformance tests, it's valid for a varint to have unnecessary trailing zero bytes. This case wasn't being handled properly. Fixes protobufjs#1067.
So we've spent a good 20 minutes here in a review meeting trying to fully understand what's going on here :-D The fact we 4 people got confused reviewing this probably warrants at least a few comments along the lines that a varint can only be 10 bytes long because its maximum storage space is 64 bits, and that if we get more than 5 bytes in the first block, this overflows a 32 bits value and needs to be truncated anyway so we're no longer needing to process these bytes. BUT there's an issue with this new code. The old one simply skips ahead 5 bytes, and THEN checks if there was enough bytes and if not, throws an out of range exception. Here, this code may read out of range bytes THEN check if it was getting out of range. This needs to read so that the range check is done as you try reading each extra 5 bytes. Maybe this whole function needs a more thorough rewrite. |
Yeah, I puzzled over this code a great deal as well. I'll add some comments. That's a good point.
To be honest, I didn't understand why the old code unconditionally skipped ahead 5 bytes? That seemed like a bug to me. Shouldn't it only be skipping the bytes that are >=128 (i.e. the byte indicates that it is not the final byte of the varint)? That's what my new code does. It only increments
Doesn't the old code also exhibit this problem (reading out of range bytes) when reading the first 5 bytes of the varint? There are no range checks for those first 5 bytes. So I assumed that was intentional behavior. If it isn't, I can modify the function to do a range check prior to reading each byte. Or is there a range check for the first 5 bytes being done somewhere higher up the stack prior to this function being called? |
I hit this trying to pass the protobuf conformance tests using protobuf.js. I think Test case: Required.Proto3.ProtobufInput.RepeatedScalarSelectsLast.BOOL.ProtobufOutput The relevant part of the schema is (full schema): message TestAllTypesProto3 {
bool optional_bool = 13;
} The raw decoding is: Field 13: 68 Varint Value = 0, Hex = 00 The conformance tests verifies that an implementation handles any varint for a boolean field. I can't use
I agree, it looks like the old code is buggy. In my example, the bug occurs when protobuf.js reads the 5th varint |
I think uint32 only needs to check the first byte for out-of-bounds. If we check the first byte, then on each successive varint byte, we'll advance
So, after uint32 runs we can guarantee that I think the following function is safe: const uint32 = (() => {
let value = 4294967295; // optimizer type-hint, tends to deopt otherwise (?!)
return (r: pbjs.Reader): number => {
if (r.pos >= r.buf.length) {
throw RangeError("index out of range: " + r.pos + " >= " + r.len);
}
value = ( r.buf[r.pos] & 127 ) >>> 0; if (r.buf[r.pos++] < 128) return value;
value = (value | (r.buf[r.pos] & 127) << 7) >>> 0; if (r.buf[r.pos++] < 128) return value;
value = (value | (r.buf[r.pos] & 127) << 14) >>> 0; if (r.buf[r.pos++] < 128) return value;
value = (value | (r.buf[r.pos] & 127) << 21) >>> 0; if (r.buf[r.pos++] < 128) return value;
value = (value | (r.buf[r.pos] & 15) << 28) >>> 0; if (r.buf[r.pos++] < 128) return value;
if (r.buf[r.pos++] >= 128
&& r.buf[r.pos++] >= 128
&& r.buf[r.pos++] >= 128
&& r.buf[r.pos++] >= 128
&& r.buf[r.pos++] >= 128) {
throw Error("varint too long");
}
return value;
};
})() |
I've been thinking about this more. There's several errors uint32 should throw on:
Here's another (tested) attempt cribbing from Go's implementation: const readUint32Discard = (r: pbjs.Reader) => {
// Fast path, only initial bounds checks. Varint has max size of 10 bytes.
if (r.pos + 10 <= r.len) {
let value = (r.buf[r.pos] & 127) >>> 0; if (r.buf[r.pos++] < 128) return value;
value = (value | (r.buf[r.pos] & 127) << 7) >>> 0; if (r.buf[r.pos++] < 128) return value;
value = (value | (r.buf[r.pos] & 127) << 14) >>> 0; if (r.buf[r.pos++] < 128) return value;
value = (value | (r.buf[r.pos] & 127) << 21) >>> 0; if (r.buf[r.pos++] < 128) return value;
value = (value | (r.buf[r.pos] & 15) << 28) >>> 0; if (r.buf[r.pos++] < 128) return value;
// Process extra varint bytes but discard results.
if (r.buf[r.pos++] >= 128
&& r.buf[r.pos++] >= 128
&& r.buf[r.pos++] >= 128
&& r.buf[r.pos++] >= 128
&& r.buf[r.pos++] > 1) {
throw Error("varint too long");
}
return value;
}
// Slow path.
let value = 0;
const len = Math.min(10, r.len - r.pos);
for (let i = 0; i < len; i++) {
const b = r.buf[r.pos++];
if (i < 4) value = (value | (b & 127) << (7 * i)) >>> 0;
else if (i === 4) value = (value | (b & 15) << 28 ) >>> 0;
else if (i == 9 && b > 1) {
// Varint has a max length of 10 bytes. The 10th byte (i == 9) of a
// varint only uses the first bit; (9 * 7 bits == 63 bits).
throw Error("varint too long");
}
if (b < 128) {
return value;
}
}
// We stopped iteration because we ran out of bytes before getting a byte
// without a continuation bit set.
throw RangeError("index out of range: " + r.pos + " >= " + r.len);
}; Testing codeconst makeVarint = (n: number): Uint8Array => {
const buf = Array(10).fill(0);
let i = 0;
while (n >= 128) {
buf[i] = (n & 127) | 128;
n = n >>> 7;
i++;
}
buf[i++] = n & 127;
return Uint8Array.from(buf.slice(0, i));
};
describe('readUint32Discard', () => {
test('less than 10_000', () => {
for (let n = 0; n < 10_000; n++) {
const varint = makeVarint(n);
const r = pbjs.Reader.create(varint);
const got = readUint32Discard(r);
expect(got).toEqual(n);
const varintPad = new Uint8Array([...varint, ...Array(20).fill(0)]);
const r2 = pbjs.Reader.create(varintPad);
const got2 = readUint32Discard(r2);
expect(got2).toEqual(n);
}
});
const cases: [number, number][] = [
[2 ** 32 - 1, 2 ** 32 - 1],
[2 ** 32, 0],
[2 ** 32 + 1, 1],
];
test.each(cases)('varint %s', (n, want) => {
const varint = makeVarint(n);
const r = pbjs.Reader.create(varint);
const got = readUint32Discard(r);
expect(got).toEqual(want);
const rPad = pbjs.Reader.create(new Uint8Array([...varint, ...Array(20).fill(0)]));
const gotPad = readUint32Discard(rPad);
expect(gotPad).toEqual(want);
});
const errCases: [string, number[], string][] = [
['not enough bytes', Array(2).fill(128), 'index out of range'],
['last byte too big', [128, 128, 128, 128, 128, 128, 128, 128, 128, 2], 'varint too long'],
['premature eof 10', Array(10).fill(128), 'varint too long'],
['premature eof 30', Array(30).fill(128), 'varint too long'],
];
test.each(errCases)('varint error %s', (desc, buf, want) => {
const r = pbjs.Reader.create(Uint8Array.from(buf));
expect(() => readUint32Discard(r)).toThrow(want);
});
test('stops after n bytes', () => {
for (let i = 1; i <= 10; i++) {
const arr = [...Array(i - 1).fill(128), 1];
const r = pbjs.Reader.create(Uint8Array.from(arr));
readUint32Discard(r);
expect(r.pos).toEqual(i);
}
});
test('r.pos >= len', () => {
const r = pbjs.Reader.create(Uint8Array.from([]));
expect(() => readUint32Discard(r)).toThrow('index out of range');
r.pos = 100;
expect(() => readUint32Discard(r)).toThrow('index out of range');
});
}) |
According to the protobuf conformance tests, it's valid for a varint to
have unnecessary trailing zero bytes. This case wasn't being handled
properly. Fixes #1067.