feature request: working draft for cloudevents spec #8768

allen-munsch · 2025-11-10T15:29:49Z

The following is a draft PR to discuss adding a CNCF CloudEvents specification for flatbuffers.

References:

feature request: support for flatbuffers cloudevents/spec#1372

My original use case was to use flatbuffers in a zero-copy pass-through proxy in a low-latency environment.

docs/source/cloudevents_spec.md

Co-authored-by: Doug Davis <[email protected]>


duglin · 2025-11-12T20:49:54Z

docs/source/cloudevents_spec.md

+  datacontenttype: string;
+  dataschema: string;
+  subject: string;
+  time: string;


How are these optional fields serialized? For example, if "subject" is missing, is there still some "subject" type of entry there? Does the word "subject" actually appear in the serialization?

======================================================================
FLATBUFFERS OPTIONAL FIELD SERIALIZATION TEST
======================================================================

Question: How are optional fields serialized?
Specifically: Does 'subject' appear in the binary when omitted?

======================================================================
CloudEvent WITH 'subject' field
======================================================================
Total size: 144 bytes
Hex dump:
0000: 18 00 00 00 00 00 12 00 18 00 14 00 10 00 0c 00  ................
0010: 08 00 00 00 00 00 04 00 12 00 00 00 14 00 00 00  ................
0020: 20 00 00 00 34 00 00 00 38 00 00 00 54 00 00 00   ...4...8...T...
0030: 0a 00 00 00 6d 79 2d 73 75 62 6a 65 63 74 00 00  ....my-subject..
0040: 11 00 00 00 63 6f 6d 2e 65 78 61 6d 70 6c 65 2e  ....com.example.
0050: 65 76 65 6e 74 00 00 00 03 00 00 00 31 2e 30 00  event.......1.0.
0060: 1a 00 00 00 68 74 74 70 73 3a 2f 2f 65 78 61 6d  ....https://exam
0070: 70 6c 65 2e 63 6f 6d 2f 73 6f 75 72 63 65 00 00  ple.com/source..
0080: 09 00 00 00 65 76 65 6e 74 2d 31 32 33 00 00 00  ....event-123...

The word 'subject' APPEARS in the binary data
Position: 55

----------------------------------------------------------------------
Structure Analysis: WITH subject
----------------------------------------------------------------------
Root table offset (absolute from start): 24
Table starts at: 24
VTable relative offset: 18
VTable starts at: 6
VTable size: 18 bytes
Object inline size: 24 bytes
Number of field entries: 7

Field            Required?  Offset (rel)  Absolute Offset  Present?
--------------------------------------------------------------------------------
id               True       20            44               True
source           True       16            40               True
specversion      True       12            36               True
type             True       8             32               True
datacontenttype  False      0             -                False
dataschema       False      0             -                False
subject          False      4             28               True
time             False      -             -                NO
extensions       False      -             -                NO
data             False      -             -                NO

======================================================================
CloudEvent WITHOUT 'subject' field
======================================================================
Total size: 116 bytes
Hex dump:
0000: 10 00 00 00 0c 00 14 00 10 00 0c 00 08 00 04 00  ................
0010: 0c 00 00 00 10 00 00 00 24 00 00 00 28 00 00 00  ........$...(...
0020: 44 00 00 00 11 00 00 00 63 6f 6d 2e 65 78 61 6d  D.......com.exam
0030: 70 6c 65 2e 65 76 65 6e 74 00 00 00 03 00 00 00  ple.event.......
0040: 31 2e 30 00 1a 00 00 00 68 74 74 70 73 3a 2f 2f  1.0.....https://
0050: 65 78 61 6d 70 6c 65 2e 63 6f 6d 2f 73 6f 75 72  example.com/sour
0060: 63 65 00 00 09 00 00 00 65 76 65 6e 74 2d 34 35  ce......event-45
0070: 36 00 00 00                                      6...

✓ The word 'subject' does NOT appear in the binary data, HOWEVER because of the structure of the vtable readers are able to determine that is the case.

----------------------------------------------------------------------
Structure Analysis: WITHOUT subject
----------------------------------------------------------------------
Root table offset (absolute from start): 16
Table starts at: 16
VTable relative offset: 12
VTable starts at: 4
VTable size: 12 bytes
Object inline size: 20 bytes
Number of field entries: 4

Field            Required?  Offset (rel)  Absolute Offset  Present?
--------------------------------------------------------------------------------
id               True       16            32               True
source           True       12            28               True
specversion      True       8             24               True
type             True       4             20               True
datacontenttype  False      -             -                NO
dataschema       False      -             -                NO
subject          False      -             -                NO
time             False      -             -                NO
extensions       False      -             -                NO
data             False      -             -                NO

======================================================================
COMPARISON & CONCLUSIONS
======================================================================
Size with subject: 144 bytes
Size without subject: 116 bytes
Difference: 28 bytes

======================================================================
VERIFICATION: Reading back the data
======================================================================
Event WITH subject:
  ID: event-123
  Subject: my-subject
Event WITHOUT subject:
  ID: event-456
  Subject: None

The test used to reproduce this can be found in this gist:

https://gist.github.com/allen-munsch/295ab0b944ae0c2816896945e3f168a2
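For anyone who wants to check the vtable reasoning without flatc, here is a minimal sketch that decodes the "WITHOUT 'subject'" hex dump above by hand, using only Python's `struct` module. It is not the gist's code. The helper names (`read_vtable`, `read_string`, `decode`) are made up for illustration, and the slot order in `FIELDS` is an assumption taken from the Structure Analysis table above.

```python
import struct

# Bytes copied from the "WITHOUT 'subject'" hex dump above (116 bytes).
HEX = """
10 00 00 00 0c 00 14 00 10 00 0c 00 08 00 04 00
0c 00 00 00 10 00 00 00 24 00 00 00 28 00 00 00
44 00 00 00 11 00 00 00 63 6f 6d 2e 65 78 61 6d
70 6c 65 2e 65 76 65 6e 74 00 00 00 03 00 00 00
31 2e 30 00 1a 00 00 00 68 74 74 70 73 3a 2f 2f
65 78 61 6d 70 6c 65 2e 63 6f 6d 2f 73 6f 75 72
63 65 00 00 09 00 00 00 65 76 65 6e 74 2d 34 35
36 00 00 00
"""
buf = bytes.fromhex("".join(HEX.split()))

# Assumed vtable slot order, taken from the Structure Analysis table.
FIELDS = ["id", "source", "specversion", "type",
          "datacontenttype", "dataschema", "subject", "time"]


def read_vtable(buf):
    """Return (table position, tuple of per-field offsets from the vtable)."""
    table = struct.unpack_from("<I", buf, 0)[0]            # root offset
    vtable = table - struct.unpack_from("<i", buf, table)[0]
    vt_size, _obj_size = struct.unpack_from("<HH", buf, vtable)
    count = (vt_size - 4) // 2                             # 2 bytes per slot
    return table, struct.unpack_from(f"<{count}H", buf, vtable + 4)


def read_string(buf, table, slot):
    """Follow a string field; a slot of 0 means the field is absent."""
    if slot == 0:
        return None
    pos = table + slot
    pos += struct.unpack_from("<I", buf, pos)[0]           # offset to string
    length = struct.unpack_from("<I", buf, pos)[0]
    return buf[pos + 4:pos + 4 + length].decode("utf-8")


def decode(buf):
    table, slots = read_vtable(buf)
    out = {}
    for i, name in enumerate(FIELDS):
        # Fields past the end of the vtable were never written.
        slot = slots[i] if i < len(slots) else 0
        out[name] = read_string(buf, table, slot)
    return out


table, slots = read_vtable(buf)
event = decode(buf)
print(slots)
print(event)
```

The field names never appear in the buffer. A reader only learns that `subject` is missing because the vtable has four slots and `subject` would be slot 6.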

Thanks. You may have answered this during the previous call, but how do receivers get the fbs/schema files? Are they just known in advance or are they shared via some other mechanism?

I think cloudevents_spec.fbs would need to be known in advance.

dataschema is kinda confusing, because it's actually a URI, not just a string.

I see 2 approaches for a datacontenttype of application/cloudevents+flatbuffers:

- pre-sharing the data schema and verifying it via the dataschema URI
- dynamically compiling the schema fetched from the dataschema URI so that the data field can be read
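The first approach (pre-sharing the schema and keying on the dataschema URI) could look roughly like this on the receiving side. This is only a sketch: `KNOWN_SCHEMAS`, `register`, `decode_data`, and the example URI are hypothetical names, not part of the CloudEvents spec or the FlatBuffers API, and the lambda stands in for whatever flatc-generated accessor a receiver would really use.

```python
from typing import Callable, Dict, Optional

# Hypothetical registry: dataschema URI -> decoder for the `data` bytes.
Decoder = Callable[[bytes], object]
KNOWN_SCHEMAS: Dict[str, Decoder] = {}


def register(uri: str, decoder: Decoder) -> None:
    """Record a schema that was shared with this receiver ahead of time."""
    KNOWN_SCHEMAS[uri] = decoder


def decode_data(dataschema: Optional[str], data: bytes) -> object:
    """Decode `data` only if its dataschema URI was pre-shared."""
    if dataschema is None:
        return data  # no schema declared: hand back the raw bytes
    try:
        decoder = KNOWN_SCHEMAS[dataschema]
    except KeyError:
        raise LookupError(f"no pre-shared schema for {dataschema!r}") from None
    return decoder(data)


# Placeholder decoder; a real one would wrap generated FlatBuffers code.
EXAMPLE_URI = "https://example.com/schemas/order.fbs"  # hypothetical URI
register(EXAMPLE_URI, lambda raw: raw.decode("utf-8"))
print(decode_data(EXAMPLE_URI, b"hello"))
```

The second approach would replace the `LookupError` branch with a fetch-and-compile step, which is where most of the extra complexity (and trust questions about the URI) would come in.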

duglin · 2025-11-12T20:54:39Z

docs/source/cloudevents_spec.md

+FlatBuffers achieves forward and backward compatibility through an offset-based vtable (virtual table) system:
+
+- **Forward compatibility**: Readers using older schemas can read data written with newer schemas by ignoring unknown fields
+- **Backward compatibility**: Readers using newer schemas can read data written with older schemas by treating missing fields as unset
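To make the two rules concrete, here is a small sketch of the vtable lookup a reader performs. It models only the vtable (2-byte vtable size, 2-byte object size, then one 2-byte offset per field), not a full buffer. `make_vtable` and `slot` are illustrative helpers, not FlatBuffers library calls, and the offsets are the ones from the two dumps earlier in this thread.

```python
import struct


def make_vtable(field_offsets, object_size):
    """Pack a vtable: total size, object size, then one offset per field."""
    size = 4 + 2 * len(field_offsets)
    return struct.pack(f"<HH{len(field_offsets)}H",
                       size, object_size, *field_offsets)


def slot(vtable, field_id):
    """Return the field's offset inside the table, or None if it is unset."""
    vt_size = struct.unpack_from("<H", vtable, 0)[0]
    pos = 4 + 2 * field_id
    if pos >= vt_size:
        return None      # writer's schema did not know this field
    offset = struct.unpack_from("<H", vtable, pos)[0]
    return offset or None  # an offset of 0 also means "not present"


# Written with an older schema: id, source, specversion, type only.
older = make_vtable([16, 12, 8, 4], 20)
# Written with a newer schema that also sets field 6 (subject).
newer = make_vtable([20, 16, 12, 8, 0, 0, 4], 24)

# Forward compatibility: an old reader only asks for ids 0..3
# and never looks at the extra slots.
old_reader_view = [slot(newer, i) for i in range(4)]

# Backward compatibility: a new reader asks for id 6 in old data
# and gets "unset" instead of an error.
new_reader_subject = slot(older, 6)

print(old_reader_view, new_reader_subject)
```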


I can't help but wonder if extensions could be handled in a similar way to how these compatibility rules work. Meaning, what if we treat the presence (and serialization) of an extension as if it were a "newer schema" and then Readers can choose to process the extension if they know about it, or ignore it if they don't. Then we don't need a special "extensions" table.

Would this mean the CloudEvent envelope would be tied to each extension type, so that adding an extension might require recompiling the envelope?

Is "extensions" meant as an arbitrary bucket?

I'm confused a bit, here's how I interpreted what you wrote.

cloudevents_spec.fbs envelope would stay the same across events:

{ extensions: {blah: asdf} }

Where cloudevents.asdf.fbs has data in the envelope, like this?

{ blah: asdf }

See the example here: https://github.com/cloudevents/spec/blob/main/cloudevents/spec.md#example
Notice that the extension attributes appear just like all other attributes. That's a key design point of CE.
While not all protocols support that (e.g. proto and apparently fb), we try when we can.

In this case, a "spec defined" fbs file could do one of two things:
1 - define just the schema for the spec defined attributes - so no extensions allowed
2 - define a bucket for extensions, like you've done - but then violate our design pattern.

In a previous comment I asked about how receivers get the fbs files. If they are expected to "just have it", then (if I'm understanding things correctly), they'll skip over any unknown fields (extensions) automatically. Which may actually be ok if they don't care about them. However, if they know about certain extensions, then they could use a newer fbs file that defines those - assuming there's only one set of extensions they might receive. Right?

In a strongly typed/defined environment where both senders and receivers MUST have prior knowledge of each other's schema, I think it would be better to just create the fbs file with the extensions as top-level fields. But if you're in a more dynamic world, then I think the "bucket" you've defined might be the only choice.

If I'm correct, the question would then be: is it too complicated to mention this choice, or is it better to just have one proposed solution?

We talked a bit about this on today's call and we're leaning towards what you have here: define a bucket for extensions. While the other approach is possible, it'll probably lead to more confusion.

If you get a chance, can you fix the PR in our repo? The text still isn't quite right.

Yes, I will take a look.

I should have gotten this to you sooner, but I think the flat-on-root approach would be possible like this:

// cloudevents.fbs
namespace io.cloudevents;

table CloudEvent {
  // ---- REQUIRED ATTRIBUTES ----
  id: string (id: 0, required);
  source: string (id: 1, required);
  specversion: string (id: 2, required);
  type: string (id: 3, required);

  // ---- OPTIONAL ATTRIBUTES ----
  datacontenttype: string (id: 4);
  dataschema: string (id: 5);
  subject: string (id: 6);
  time: string (id: 7);
  data: [ubyte] (id: 8);

  // ---- Example APPEND-ONLY FIELDS ----
  // ---- MUST be appended, field order matters here
  // acme: string (id: 9);
  // foo: string (id: 10);
  // bar: ulong (id: 11, deprecated);
  // baz: string (id: 12);
  // foobar: string (id: 13);
  // foobaz: string (id: 14);
}

root_type CloudEvent;

However, if the consensus is the extensions bucket, then that is okay too. Personally, I liked your suggestion to align with the CE spec, because flatbuffers would certainly support it, I think.

Co-authored-by: Doug Davis <[email protected]>

Signed-off-by: allen-munsch <[email protected]>

add working draft for cloudevents spec

9e68931

github-actions bot added the documentation label Nov 10, 2025

duglin mentioned this pull request Nov 12, 2025

add proprietary spec for flatbuffers cloudevents/spec#1373

Draft