After recently perusing the AwkwardForth docs and seeing mentions of protobuf support as a possibility, I started messing around with it.

Decoding with Protobuf
edition = "2024";
message Test {
int32 a = 1;
}
# Generate the test_pb2.py file
protoc --python_out=. test.proto
>>> from test_pb2 import Test
>>> message = Test(a=-1)
>>> print(message)
a: -1
>>> encoded_message = message.SerializeToString()
>>> print(encoded_message)
b'\x08\xff\xff\xff\xff\xff\xff\xff\xff\xff\x01'
>>> decoded_message = Test.FromString(encoded_message)
>>> print(decoded_message)
a: -1

Nothing unexpected here, just establishing a baseline for myself: I can create a Test message with its field "a" set to -1, serialize that message, and then call a static method with the serialized data to generate a new Test message instance that recovers the value of -1. The encoded message looks as I expect too: a tag followed by the two's complement of -1 encoded as a varint that takes up the last ten bytes.

Decoding with AwkwardForth

The first two steps have already been completed, so we can skip straight to the Python console and reproduce my problem.

>>> from awkward.forth import ForthMachine32
>>> vm = ForthMachine32("""
... input source
... \\ Read the tag to get to the value
... source varint-> stack
... \\ Decode the value back into two's complement
... source varint-> stack
... \\ I don't get to take the two's complement and recover
... \\ the original value because the read above raises an error
... """)
>>> from test_pb2 import Test
>>> message = Test(a=-1)
>>> print(message)
a: -1
>>> encoded_message = message.SerializeToString()
>>> print(encoded_message)
b'\x08\xff\xff\xff\xff\xff\xff\xff\xff\xff\x01'
>>> vm.run({'source': encoded_message})
Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    vm.run({'source': encoded_message})
    ~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: 'varint too big' in AwkwardForth runtime: variable-length integer is too big to represent as a fixed-width integer

That's the ValueError I keep running into. The first part of the exception states that the varint is too big for the AwkwardForth runtime. I don't know enough about its implementation to comment on that. The second part of the exception says that the varint is too big to represent as a fixed-width integer, but I can see the unsigned 64-bit representation of the two's complement of -1 peeking out of the last ten bytes.

Points of Discussion
awkward/awkward-cpp/src/libawkward/forth/ForthInputBuffer.cpp Lines 104 to 106 in c4a2e57
I'd love to hear your thoughts on this. I wish you a merry Christmas, and a huge thank you to the awkward development team for making one of my favorite libraries on the planet ;]
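For reference, the wire bytes shown in the post can be reproduced without protobuf installed. The following is a minimal sketch in plain Python; the helper names `encode_varint` and `encode_int32_field` are hypothetical and are not part of protobuf or Awkward. It assumes the standard Protobuf rule that a negative int32 is widened to 64-bit two's complement before being written as a varint.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint, low 7-bit group first."""
    if value < 0:
        raise ValueError("varints encode unsigned integers")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more bytes follow
        else:
            out.append(byte)  # final byte: continuation bit clear
            return bytes(out)


def encode_int32_field(field_number: int, value: int) -> bytes:
    """Encode one int32 field: a tag varint, then the sign-extended value varint."""
    tag = (field_number << 3) | 0  # wire type 0 means VARINT
    unsigned = value & 0xFFFFFFFFFFFFFFFF  # two's complement widened to 64 bits
    return encode_varint(tag) + encode_varint(unsigned)


encoded = encode_int32_field(1, -1)
print(encoded)
```

The result is one tag byte followed by a ten-byte varint, which is the shape of the serialized message in the console session above.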
Replies: 1 comment
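@swang373, thank you, that's a great direction to explore.

Firstly, Protobuf encodes negative int32 values as 10-byte varints. This is a quirk of Protobuf's varint design: int32 fields are not zigzag-encoded.

For -1, the varint is:

ff ff ff ff ff ff ff ff ff 01

This is exactly what you saw. Protobuf's decoder is perfectly happy with this because it always reads varints into a uint64 and then casts down to the declared type.

Secondly, AwkwardForth's varint-> instruction is intentionally limited. Its reader contains this check:

if (shift == 7 * 9) {
  err = util::ForthError::varint_too_big;
  return 0;
}

This means the reader accepts at most 9 bytes (63 payload bits) and reports varint_too_big as soon as a tenth byte is needed.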
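To make the effect of that check concrete, here is a rough Python paraphrase of a varint reader with the same cap. Only the three-line check is quoted in this thread, so the surrounding loop is an assumption about how such a reader is typically written, and `read_varint_capped` is a hypothetical name, not an Awkward API.

```python
MAX_VARINT_BYTES = 9  # mirrors the `shift == 7 * 9` check quoted in the reply


def read_varint_capped(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_pos); raise as soon as a tenth byte is consumed."""
    shift = 0
    result = 0
    while True:
        if pos >= len(data):
            raise ValueError("read beyond end of input")
        byte = data[pos]
        pos += 1
        if shift == 7 * MAX_VARINT_BYTES:
            # Nine bytes already contributed 63 bits; a tenth is rejected.
            raise ValueError("varint too big")
        result |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:  # continuation bit clear: this was the last byte
            return result, pos


source = b"\x08" + b"\xff" * 9 + b"\x01"  # the serialized Test(a=-1) message
tag, pos = read_varint_capped(source)  # the one-byte tag reads fine
try:
    read_varint_capped(source, pos)  # the ten-byte value does not
    outcome = "ok"
except ValueError as error:
    outcome = str(error)
print(tag, outcome)
```

Under these assumptions the tag decodes normally and the second read fails on the tenth byte, which matches the behaviour reported in the original post.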
This is not a bug; it's a design choice based on the fact that Awkward's own varints never exceed 9 bytes.

Why does AwkwardForth stop at 9 bytes?

AwkwardForth's ForthMachine32 uses 32-bit stack integers. A 10-byte varint can carry a full 64-bit value, but a 32-bit stack cannot hold that. So the runtime enforces the limit in the check above.

This is why the error message says:

variable-length integer is too big to represent as a fixed-width integer

It's literally true: the Forth stack cannot hold a 64-bit varint. So your intuition is correct: AwkwardForth cannot decode Protobuf's negative int32 varints because they exceed the supported varint width.

Could zigzag encoding help?

Example: zigzag maps -1 to 1, which encodes as the single byte 01.

This fits comfortably within AwkwardForth's limits. But as you noted, that's not how Protobuf encodes int32 fields.

What would need to change for full Protobuf compatibility?

To support Protobuf's full varint range, AwkwardForth would need:
Right now ForthMachine32 uses 32‑bit stack integers.
The
Protobuf’s rules are:
AwkwardForth currently has no instruction that performs this cast.
Because Protobuf's wire format has additional complexities (tags, wire types, zigzag, length-delimited fields, etc.)

To summarize:

Using the

That said, this is exactly the kind of exploration that helps the project grow. Awkward is an open-source ecosystem, and we absolutely welcome contributors. If you're interested in experimenting with a 64-bit variant or sketching out what full Protobuf-compatible varint decoding would require, we'd be very happy to review and discuss your ideas or patches.

And merry Christmas to you too. Your message radiates exactly the kind of thoughtful curiosity that makes open-source communities thrive.
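As an illustration of the "read into a uint64, then cast down" behaviour described in this reply, and of why zigzag would sidestep the problem, here is a small sketch in plain Python. It is not AwkwardForth code and does not reflect any existing Awkward API; `read_varint64`, `to_int32` and `zigzag_encode32` are hypothetical names chosen for the example.

```python
def read_varint64(data: bytes, pos: int = 0) -> tuple[int, int]:
    """Read a varint of up to 10 bytes into an unsigned 64-bit value."""
    result = 0
    for i in range(10):
        if pos >= len(data):
            raise ValueError("truncated varint")
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:  # continuation bit clear: last byte
            return result & 0xFFFFFFFFFFFFFFFF, pos
    raise ValueError("varint longer than 10 bytes")


def to_int32(unsigned: int) -> int:
    """Keep the low 32 bits and reinterpret them as signed two's complement."""
    low = unsigned & 0xFFFFFFFF
    return low - (1 << 32) if low & 0x80000000 else low


def zigzag_encode32(value: int) -> int:
    """Map a signed 32-bit integer to an unsigned one, as sint32 fields do."""
    return ((value << 1) ^ (value >> 31)) & 0xFFFFFFFF


source = b"\x08" + b"\xff" * 9 + b"\x01"  # the serialized Test(a=-1) message
tag, pos = read_varint64(source)
raw, pos = read_varint64(source, pos)
value = to_int32(raw)
print(tag, hex(raw), value, zigzag_encode32(-1))
```

The ten-byte payload decodes to the all-ones 64-bit pattern and the cast recovers -1, whereas the zigzag mapping of -1 is small enough to fit in a single varint byte.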