Cannot decode MSGPACK into Struct with bytes | str #973

@ypnos

Description

tl;dr: msgspec currently cannot decode and validate incoming MSGPACK messages in which a field may contain either binary or string data.

JSON has only a string type. msgspec implements a great solution for serializing binary data: when you add a bytes field to a Struct, the value is automatically encoded to and decoded from base64. This is probably why msgspec does not allow a str | bytes annotation, which would make it ambiguous whether the data is base64 or plain text.
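To illustrate the ambiguity, here is a minimal sketch using only the standard library. It assumes standard base64 for the binary field and does not call msgspec itself; the variable names are purely illustrative.

```python
import base64
import json

# A bytes payload has to travel through JSON as a base64 string.
payload = b"foo"
encoded = json.dumps({"content": base64.b64encode(payload).decode("ascii")})

# On the wire this is indistinguishable from a plain string reading "Zm9v".
plain = json.dumps({"content": "Zm9v"})
same_wire_format = encoded == plain

# A decoder therefore needs the annotation to choose an interpretation.
value = json.loads(encoded)["content"]
as_bytes = base64.b64decode(value)  # interpretation for a `bytes` field
as_str = value                      # interpretation for a `str` field
print(same_wire_format, as_bytes, as_str)
```

With a str | bytes annotation, a JSON decoder would have no way to pick between the last two interpretations.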

MSGPACK, however, has both a string type and a binary type. The binary type decodes to bytes, while the string type decodes to str, so no mangling is needed for binary data. If a Struct defines a bytes field but a string is found in its place in a MSGPACK message, we get a validation error. Likewise, if a str field is defined but binary data is found, we get a validation error.
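The distinction is visible in the first byte of each encoded value. The sketch below classifies a value by that marker byte, following the format families in the MessagePack specification. The helper `msgpack_kind` and the two sample messages are hypothetical names for illustration and are not part of msgspec.

```python
def msgpack_kind(first_byte: int) -> str:
    """Classify a MessagePack value as 'str', 'bin', or 'other' by its marker byte."""
    if 0xA0 <= first_byte <= 0xBF or first_byte in (0xD9, 0xDA, 0xDB):
        return "str"  # fixstr, str 8, str 16, str 32
    if first_byte in (0xC4, 0xC5, 0xC6):
        return "bin"  # bin 8, bin 16, bin 32
    return "other"


str_message = b"\x81\xa7content\xa3foo"      # {"content": "foo"}
bin_message = b"\x81\xa7content\xc4\x03foo"  # {"content": b"foo"}

# Skip the 1-byte fixmap header and the key (1 marker byte + "content").
VALUE_OFFSET = 1 + 1 + len("content")
print(msgpack_kind(str_message[VALUE_OFFSET]))
print(msgpack_kind(bin_message[VALUE_OFFSET]))
```

Because the marker alone identifies the type, a decoder never has to guess whether a value is text or binary.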

Here is the crux: some communication protocols make use of this property of MSGPACK. They send string data in some messages and binary data in others, with no other means of telling these messages apart.

  1. It would be perfectly legal to define a field of type str | bytes for MSGPACK, as these types are unambiguous in MSGPACK.
  2. It is actually necessary to allow str | bytes in order to decode some perfectly legal message protocols based on MSGPACK.

Expected behavior

This code:

from msgspec import Struct
from msgspec.msgpack import Decoder, Encoder


class TestData(Struct):
    content: str | bytes


decoder = Decoder(TestData)
print(decoder.decode(b"\x81\xa7content\xc4\x03foo"))

should produce this output:

TestData(content=b'foo')

Current behavior

Instead, an exception is thrown:

TypeError: Type unions may not contain more than one str-like type (`str`, `Enum`, `Literal[str values]`, `datetime`, `date`, `time`, `timedelta`, `uuid`, `decimal`, `bytes`, `bytearray`) - type `str | bytes` is not supported
