Description
tl;dr: msgspec is currently unable to decode and validate incoming MSGPACK messages in which a field may contain either binary or string data.
JSON has only a string type. Msgspec implements a great solution for serializing binary data: when you add a bytes field to a Struct, it is automatically encoded and decoded in base64 format. This is probably why msgspec does not allow a str | bytes annotation, which would make it ambiguous whether the data is base64 or plain text.
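The ambiguity described above can be illustrated with the standard library alone. This is a minimal sketch, not msgspec's internal implementation; the field name `content` and the payload are illustrative assumptions.

```python
import base64
import json

# Binary payload that a JSON-based protocol has to carry as text.
payload = b"foo"

# Assumption for illustration: bytes are transported as a base64 string.
encoded = base64.b64encode(payload).decode("ascii")
doc = json.dumps({"content": encoded})

# On the receiving side the value is just a JSON string.
received = json.loads(doc)["content"]

# Both readings are valid, and nothing in the document says which one was meant.
as_text = received                     # interpreted as a plain string
as_bytes = base64.b64decode(received)  # interpreted as base64-encoded bytes

print(doc)
print(repr(as_text), repr(as_bytes))
```

With a `str | bytes` field, a JSON decoder would have to guess between these two interpretations, which is why rejecting the union makes sense for JSON.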
MSGPACK, however, has both a string type and a binary type. The binary type decodes to bytes, while the string type decodes to str, so no mangling is needed for binary data. If a Struct is defined with a bytes field but a string is found in its place in a MSGPACK message, we get a validation error. Likewise, if a str field is defined but binary data is found, we get a validation error.
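To make the distinction concrete, here is a small standard-library sketch that classifies a MessagePack value by its leading type byte. The helper name `msgpack_value_kind` is hypothetical and only covers the str and bin format families from the MessagePack specification; it is not part of msgspec.

```python
def msgpack_value_kind(first_byte: int) -> str:
    """Classify a MessagePack value as str or bin from its leading type byte."""
    # fixstr (0xa0-0xbf), str 8 (0xd9), str 16 (0xda), str 32 (0xdb)
    if 0xA0 <= first_byte <= 0xBF or first_byte in (0xD9, 0xDA, 0xDB):
        return "str"
    # bin 8 (0xc4), bin 16 (0xc5), bin 32 (0xc6)
    if first_byte in (0xC4, 0xC5, 0xC6):
        return "bin"
    return "other"


# The same three payload bytes, once as a string and once as binary data.
as_str = b"\xa3foo"      # fixstr, length 3
as_bin = b"\xc4\x03foo"  # bin 8, length 3

print(msgpack_value_kind(as_str[0]))
print(msgpack_value_kind(as_bin[0]))
```

Because the type is carried on the wire, a decoder can always tell which member of `str | bytes` it is looking at without inspecting the payload.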
Here is the problem: some communication protocols make use of this property of MSGPACK. In some messages string data is sent, in others binary data, with no further means of telling these messages apart.
- It would be perfectly legal to define a field of type str | bytes for MSGPACK, as these types are unambiguous in MSGPACK.
- It is actually necessary to allow str | bytes to be able to decode some perfectly legal message protocols based on MSGPACK.
Expected behavior
This code:
    from msgspec import Struct
    from msgspec.msgpack import Decoder, Encoder

    class TestData(Struct):
        content: str | bytes

    decoder = Decoder(TestData)
    print(decoder.decode(b"\x81\xa7content\xc4\x03foo"))

produces output:

    TestData(content=b'foo')
Current behavior
An exception is raised:
TypeError: Type unions may not contain more than one str-like type (`str`, `Enum`, `Literal[str values]`, `datetime`, `date`, `time`, `timedelta`, `uuid`, `decimal`, `bytes`, `bytearray`) - type `str | bytes` is not supported