@allen-munsch

The following is a draft PR to discuss adding a FlatBuffers format specification to CNCF CloudEvents.

References:

My original use case was to use FlatBuffers in a zero-copy pass-through proxy in a low-latency environment.

datacontenttype: string;
dataschema: string;
subject: string;
time: string;

How are these optional fields serialized? For example, if "subject" is missing, is there still some "subject" type of entry there? Does the word "subject" actually appear in the serialization?

Author

@allen-munsch allen-munsch Nov 13, 2025

======================================================================
FLATBUFFERS OPTIONAL FIELD SERIALIZATION TEST
======================================================================

Question: How are optional fields serialized?
Specifically: Does 'subject' appear in the binary when omitted?

======================================================================
CloudEvent WITH 'subject' field
======================================================================
Total size: 144 bytes

Hex dump:
0000:  18 00 00 00 00 00 12 00 18 00 14 00 10 00 0c 00   ................
0010:  08 00 00 00 00 00 04 00 12 00 00 00 14 00 00 00   ................
0020:  20 00 00 00 34 00 00 00 38 00 00 00 54 00 00 00    ...4...8...T...
0030:  0a 00 00 00 6d 79 2d 73 75 62 6a 65 63 74 00 00   ....my-subject..
0040:  11 00 00 00 63 6f 6d 2e 65 78 61 6d 70 6c 65 2e   ....com.example.
0050:  65 76 65 6e 74 00 00 00 03 00 00 00 31 2e 30 00   event.......1.0.
0060:  1a 00 00 00 68 74 74 70 73 3a 2f 2f 65 78 61 6d   ....https://exam
0070:  70 6c 65 2e 63 6f 6d 2f 73 6f 75 72 63 65 00 00   ple.com/source..
0080:  09 00 00 00 65 76 65 6e 74 2d 31 32 33 00 00 00   ....event-123...

   The word 'subject' APPEARS in the binary data
   Position: 55

----------------------------------------------------------------------
Structure Analysis: WITH subject
----------------------------------------------------------------------
Root table offset (absolute from start): 24
Table starts at: 24
VTable relative offset: 18
VTable starts at: 6
VTable size: 18 bytes
Object inline size: 24 bytes
Number of field entries: 7

Field                 Required?    Offset (rel)    Absolute Offset   Present?
--------------------------------------------------------------------------------
id                         True              20                 44       True
source                     True              16                 40       True
specversion                True              12                 36       True
type                       True               8                 32       True
datacontenttype           False               0                  -      False
dataschema                False               0                  -      False
subject                   False               4                 28       True
time                      False               -                  -         NO
extensions                False               -                  -         NO
data                      False               -                  -         NO

======================================================================
CloudEvent WITHOUT 'subject' field
======================================================================
Total size: 116 bytes

Hex dump:
0000:  10 00 00 00 0c 00 14 00 10 00 0c 00 08 00 04 00   ................
0010:  0c 00 00 00 10 00 00 00 24 00 00 00 28 00 00 00   ........$...(...
0020:  44 00 00 00 11 00 00 00 63 6f 6d 2e 65 78 61 6d   D.......com.exam
0030:  70 6c 65 2e 65 76 65 6e 74 00 00 00 03 00 00 00   ple.event.......
0040:  31 2e 30 00 1a 00 00 00 68 74 74 70 73 3a 2f 2f   1.0.....https://
0050:  65 78 61 6d 70 6c 65 2e 63 6f 6d 2f 73 6f 75 72   example.com/sour
0060:  63 65 00 00 09 00 00 00 65 76 65 6e 74 2d 34 35   ce......event-45
0070:  36 00 00 00                                       6...

✓ The word 'subject' does NOT appear in the binary data.
However, the structure of the vtable still lets readers
determine that the field is absent.

----------------------------------------------------------------------
Structure Analysis: WITHOUT subject
----------------------------------------------------------------------
Root table offset (absolute from start): 16
Table starts at: 16
VTable relative offset: 12
VTable starts at: 4
VTable size: 12 bytes
Object inline size: 20 bytes
Number of field entries: 4

Field                 Required?    Offset (rel)    Absolute Offset   Present?
--------------------------------------------------------------------------------
id                         True              16                 32       True
source                     True              12                 28       True
specversion                True               8                 24       True
type                       True               4                 20       True
datacontenttype           False               -                  -         NO
dataschema                False               -                  -         NO
subject                   False               -                  -         NO
time                      False               -                  -         NO
extensions                False               -                  -         NO
data                      False               -                  -         NO

======================================================================
COMPARISON & CONCLUSIONS
======================================================================
Size with subject:    144 bytes
Size without subject: 116 bytes
Difference:           28 bytes

======================================================================
VERIFICATION: Reading back the data
======================================================================

Event WITH subject:
  ID: event-123
  Subject: my-subject

Event WITHOUT subject:
  ID: event-456
  Subject: None
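
The vtable lookup behind the analysis above can be sketched with only the Python standard library. This is a rough illustration, not the code from the test script: the bytes are copied from the "WITHOUT subject" hex dump, the slot order is assumed to follow the row order of the analysis table, and the helper names (`slot_offset`, `read_string`) are made up for this example.

```python
import struct

# Bytes copied from the "WITHOUT subject" hex dump above (116 bytes).
BUF = bytes.fromhex("""
10 00 00 00 0c 00 14 00 10 00 0c 00 08 00 04 00
0c 00 00 00 10 00 00 00 24 00 00 00 28 00 00 00
44 00 00 00 11 00 00 00 63 6f 6d 2e 65 78 61 6d
70 6c 65 2e 65 76 65 6e 74 00 00 00 03 00 00 00
31 2e 30 00 1a 00 00 00 68 74 74 70 73 3a 2f 2f
65 78 61 6d 70 6c 65 2e 63 6f 6d 2f 73 6f 75 72
63 65 00 00 09 00 00 00 65 76 65 6e 74 2d 34 35
36 00 00 00
""")

# Assumption: vtable slots follow the row order of the analysis table.
# Only the string-typed attributes are listed here.
STRING_FIELDS = ["id", "source", "specversion", "type",
                 "datacontenttype", "dataschema", "subject", "time"]


def slot_offset(buf, table_pos, slot):
    """Return the field offset relative to the table, or 0 if the field is absent."""
    (soffset,) = struct.unpack_from("<i", buf, table_pos)   # signed offset back to the vtable
    vtable_pos = table_pos - soffset
    (vtable_size,) = struct.unpack_from("<H", buf, vtable_pos)
    entry = 4 + 2 * slot                                     # skip the two uint16 size fields
    if entry >= vtable_size:
        return 0                                             # slot lies beyond this vtable
    (offset,) = struct.unpack_from("<H", buf, vtable_pos + entry)
    return offset


def read_string(buf, table_pos, slot):
    """Follow the field's uoffset to a length-prefixed UTF-8 string; None if absent."""
    offset = slot_offset(buf, table_pos, slot)
    if offset == 0:
        return None
    field_pos = table_pos + offset
    (rel,) = struct.unpack_from("<I", buf, field_pos)
    str_pos = field_pos + rel
    (length,) = struct.unpack_from("<I", buf, str_pos)
    return buf[str_pos + 4:str_pos + 4 + length].decode("utf-8")


(root,) = struct.unpack_from("<I", BUF, 0)                   # root table offset (16 here)
event = {name: read_string(BUF, root, i) for i, name in enumerate(STRING_FIELDS)}
print(event["id"], event["subject"])                         # event-456 None
```

No field name is ever consulted during the lookup; absence is signalled purely by a zero or missing vtable slot.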

Author

@allen-munsch allen-munsch Nov 13, 2025

The test used to produce this output can be found in this gist:

https://gist.github.com/allen-munsch/295ab0b944ae0c2816896945e3f168a2

Thanks. You may have answered this during the previous call, but how do receivers get the fbs/schema files? Are they just known in advance or are they shared via some other mechanism?

Author

@allen-munsch allen-munsch Nov 13, 2025

I think cloudevents_spec.fbs would need to be known in advance.

dataschema is kind of confusing, because it's actually a URI, not just a string.

I see 2 approaches for a datacontenttype of application/cloudevents+flatbuffers:

  1. pre-sharing the data schema and verifying it via the dataschema URI
  2. dynamically fetching and compiling the schema at the dataschema URI so that the data field can be read
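
A rough sketch of approach 1, assuming a receiver that keeps a registry keyed by dataschema URI. Everything here (`PRESHARED`, `register`, `decode_data`, the example URI) is hypothetical and only illustrates the dispatch; approach 2 would replace the registry miss with a fetch-and-compile step, which is not shown.

```python
from typing import Callable, Dict, Optional

Decoder = Callable[[bytes], object]

# Hypothetical registry: dataschema URI -> decoder generated from a pre-shared .fbs file.
PRESHARED: Dict[str, Decoder] = {}


def register(uri: str, decoder: Decoder) -> None:
    """Record a decoder for a schema both sides agreed on ahead of time."""
    PRESHARED[uri] = decoder


def decode_data(dataschema: Optional[str], data: bytes) -> object:
    """Decode `data` if its schema was pre-shared; otherwise pass the bytes through."""
    decoder = PRESHARED.get(dataschema) if dataschema else None
    if decoder is None:
        return data          # unknown or missing schema: keep the payload opaque
    return decoder(data)


# Usage with a stand-in decoder (a real one would wrap flatc-generated code).
register("https://example.com/schemas/order.fbs", lambda raw: {"size": len(raw)})
known = decode_data("https://example.com/schemas/order.fbs", b"abc")
unknown = decode_data("https://example.com/schemas/other.fbs", b"abc")
```

Passing unknown payloads through untouched matches the pass-through proxy use case, where the envelope is read but the data field may never need decoding.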

FlatBuffers achieves forward and backward compatibility through an offset-based vtable (virtual table) system:

- **Forward compatibility**: Readers using older schemas can read data written with newer schemas by ignoring unknown fields
- **Backward compatibility**: Readers using newer schemas can read data written with older schemas by treating missing fields as unset
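
Both rules can be seen on the first 48 bytes of the "WITH subject" hex dump earlier in this thread. The sketch below is only an illustration under assumptions: `present_slots` is a made-up helper, and the "older" and "newer" readers are modelled simply as knowing 4 or 10 field slots.

```python
import struct

# First 48 bytes of the "WITH subject" hex dump: root offset, vtable, inline table.
HEAD = bytes.fromhex("""
18 00 00 00 00 00 12 00 18 00 14 00 10 00 0c 00
08 00 00 00 00 00 04 00 12 00 00 00 14 00 00 00
20 00 00 00 34 00 00 00 38 00 00 00 54 00 00 00
""")


def present_slots(buf, known_slots):
    """For each slot the reader's schema knows about, report whether it is set."""
    (table_pos,) = struct.unpack_from("<I", buf, 0)
    (soffset,) = struct.unpack_from("<i", buf, table_pos)
    vtable_pos = table_pos - soffset
    (vtable_size,) = struct.unpack_from("<H", buf, vtable_pos)
    written = (vtable_size - 4) // 2          # slots the writer actually emitted
    flags = []
    for slot in range(known_slots):
        if slot >= written:
            flags.append(False)               # reader is newer than the data: treat as unset
            continue
        (offset,) = struct.unpack_from("<H", buf, vtable_pos + 4 + 2 * slot)
        flags.append(offset != 0)
    return flags


# A reader that only knows the 4 required fields never looks at slot 6 (subject).
old_reader = present_slots(HEAD, 4)
# A reader that knows 10 fields sees slots 7..9 as unset because the vtable ends at slot 6.
new_reader = present_slots(HEAD, 10)
print(old_reader)   # [True, True, True, True]
print(new_reader)   # [True, True, True, True, False, False, True, False, False, False]
```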

I can't help but wonder if extensions could be handled in a similar way to how these compatibility rules work. Meaning, what if we treat the presence (and serialization) of an extension as if it were a "newer schema" and then Readers can choose to process the extension if they know about it, or ignore it if they don't. Then we don't need a special "extensions" table.

Author

Would this mean the CloudEvents envelope would be tied to each extension type, so that adding an extension might require recompiling the envelope?

Is extensions meant as an arbitrary bucket?

I'm a bit confused, so here's how I interpreted what you wrote.

The cloudevents_spec.fbs envelope would stay the same across events:

{
extensions: {blah: asdf}
}

Whereas cloudevents.asdf.fbs would put the extension directly in the envelope, like this?

{
blah: asdf
}

See the example here: https://github.com/cloudevents/spec/blob/main/cloudevents/spec.md#example
Notice that the extension attributes appear just like all other attributes. That's a key design point of CE.
While not all protocols support that (e.g. proto and apparently fb), we try when we can.

In this case, a "spec defined" fbs file could do one of two things:
1 - define just the schema for the spec defined attributes - so no extensions allowed
2 - define a bucket for extensions, like you've done - but then violate our design pattern.

In a previous comment I asked about how receivers get the fbs files. If they are expected to "just have it", then (if I'm understanding things correctly), they'll skip over any unknown fields (extensions) automatically. Which may actually be ok if they don't care about them. However, if they know about certain extensions, then they could use a newer fbs file that defines those - assuming there's only one set of extensions they might receive. Right?

In a strongly typed/defined environment where both senders and receivers MUST have prior knowledge of each other's schema, I think it would be better to just create the fbs file with the extensions as top-level fields. But if you're in a more dynamic world, then I think the "bucket" you've defined might be the only choice.

If I'm correct, the question would then be... is it too complicated to mention this choice? Or is it better to just have one proposed solution?
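
To make the two shapes concrete, here is a small sketch that converts between the "top-level" form and the "bucket" form using plain dictionaries. The function names and the sample extension name are illustrative assumptions; the set of core attribute names is taken from the fields discussed in this thread.

```python
# Attribute names defined by the envelope discussed in this thread.
CORE_ATTRIBUTES = {
    "id", "source", "specversion", "type",
    "datacontenttype", "dataschema", "subject", "time", "data",
}


def to_bucket(flat: dict) -> dict:
    """Move every non-core attribute into a single 'extensions' bucket."""
    event = {k: v for k, v in flat.items() if k in CORE_ATTRIBUTES}
    extensions = {k: v for k, v in flat.items() if k not in CORE_ATTRIBUTES}
    if extensions:
        event["extensions"] = extensions
    return event


def to_flat(bucketed: dict) -> dict:
    """Lift bucketed extensions back to the top level, next to the core attributes."""
    event = {k: v for k, v in bucketed.items() if k != "extensions"}
    event.update(bucketed.get("extensions", {}))
    return event


flat_event = {
    "id": "event-123",
    "source": "https://example.com/source",
    "specversion": "1.0",
    "type": "com.example.event",
    "comexampleextension1": "value",   # illustrative extension attribute
}
bucket_event = to_bucket(flat_event)
round_trip = to_flat(bucket_event)
```

The bucket form keeps one fixed envelope schema for every sender, while the top-level form needs a schema that names each extension in advance.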

We talked a bit about this on today's call and we're leaning towards what you have here... define a bucket for extensions. While the other approach is possible, it'll probably lead to more confusion.

If you get a chance, can you fix the PR in our repo? The text still isn't quite right.

Author

Yes, I will take a look.

I should have gotten this to you sooner, but I think the flat-on-root approach would be possible like this:

// cloudevents.fbs
namespace io.cloudevents;

table CloudEvent {
  // ---- REQUIRED ATTRIBUTES ----
  id: string (id: 0, required);
  source: string (id: 1, required);
  specversion: string (id: 2, required);
  type: string (id: 3, required);

  // ---- OPTIONAL ATTRIBUTES ----
  datacontenttype: string (id: 4);
  dataschema: string (id: 5);
  subject: string (id: 6);
  time: string (id: 7);
  data: [ubyte] (id: 8);

  // ---- Example APPEND-ONLY FIELDS ----
  // ---- New fields MUST take the next unused id; existing ids must never change
  // acme: string (id: 9);
  // foo: string (id: 10);
  // bar: ulong (id: 11, deprecated);
  // baz: string (id: 12);
  // foobar: string (id: 13);
  // foobaz: string (id: 14);
}

root_type CloudEvent;

However, if the consensus is extensions, then that is okay too. Personally, I liked your suggestion to align with the CE spec, because FlatBuffers would certainly support it, I think.
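
The append-only rule noted in the schema comments could be checked mechanically before a revised .fbs file is published. The sketch below is a hypothetical helper, not part of FlatBuffers or flatc; it assumes each schema revision is summarised as a mapping from field name to id, and that the old revision already uses ids 0..n-1.

```python
def check_append_only(old_ids: dict, new_ids: dict) -> list:
    """Return a list of problems that would break readers of the old schema."""
    problems = []
    for name, field_id in old_ids.items():
        if name not in new_ids:
            problems.append(f"{name}: removed (mark it deprecated instead)")
        elif new_ids[name] != field_id:
            problems.append(f"{name}: id changed from {field_id} to {new_ids[name]}")
    # New fields must continue the id sequence without gaps.
    added = sorted(fid for name, fid in new_ids.items() if name not in old_ids)
    expected = list(range(len(old_ids), len(old_ids) + len(added)))
    if added != expected:
        problems.append(f"new ids {added} do not continue from {len(old_ids)}")
    return problems


# Field ids taken from the CloudEvent table above.
BASE = {"id": 0, "source": 1, "specversion": 2, "type": 3, "datacontenttype": 4,
        "dataschema": 5, "subject": 6, "time": 7, "data": 8}

extended = dict(BASE, acme=9, foo=10)       # appends two extension fields
reordered = dict(BASE, subject=9)           # moves an existing field: not allowed

ok_report = check_append_only(BASE, extended)
bad_report = check_append_only(BASE, reordered)
```

Here `ok_report` is empty, while `bad_report` flags the changed id for `subject`.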

