Local json compression + optimization for empty encoders #1021

Open
QnJ1c2kNCg wants to merge 2 commits into ArroyoSystems:master from QnJ1c2kNCg:local-json-compression

@QnJ1c2kNCg
Collaborator

This PR does two things:

  1. Add support for Gzip compression for the JsonLocalWriter

Very similar to #1007, but for the Local file backend.

  2. Avoid finishing the encoder if no data was written

I noticed that the Gzip encoder would produce bytes (header + footer) on every checkpoint, even when there was no user traffic. This patch tracks whether we wrote user data to the underlying buffer, marking the buffer as dirty, and only finishes the encoder if this flag is set.

The dirty flag starts as false and is set to true when write_all() is called. The flag resets to false when a new encoder is created.
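
For readers skimming the description, the dirty-flag behavior can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `FramedEncoder` is a hypothetical stand-in for the Gzip encoder (it just wraps data in a fixed header and footer), and `DirtyTrackingWriter`, `checkpoint()`, and the `file` buffer are names assumed for the example.

```rust
use std::io::{self, Write};

// Hypothetical stand-in for a gzip encoder: it emits a fixed header when
// created and a fixed footer when finished, so finishing an encoder that
// received no data still produces bytes.
const HEADER: &[u8] = b"HDR";
const FOOTER: &[u8] = b"FTR";

struct FramedEncoder {
    out: Vec<u8>,
}

impl FramedEncoder {
    fn new() -> Self {
        Self { out: HEADER.to_vec() }
    }

    // Consume the encoder and return header + payload + footer.
    fn finish(self) -> Vec<u8> {
        let mut out = self.out;
        out.extend_from_slice(FOOTER);
        out
    }
}

impl Write for FramedEncoder {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.out.extend_from_slice(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

// Hypothetical writer that only finishes its encoder when data was written.
struct DirtyTrackingWriter {
    encoder: FramedEncoder,
    dirty: bool,   // starts false, set by write_all(), reset on new encoder
    file: Vec<u8>, // stands in for the local file being written
}

impl DirtyTrackingWriter {
    fn new() -> Self {
        Self { encoder: FramedEncoder::new(), dirty: false, file: Vec::new() }
    }

    // Forward user data to the encoder and mark the buffer as dirty.
    fn write_all(&mut self, data: &[u8]) -> io::Result<()> {
        self.encoder.write_all(data)?;
        self.dirty = true;
        Ok(())
    }

    // Called on every checkpoint; returns the number of bytes flushed.
    fn checkpoint(&mut self) -> usize {
        if !self.dirty {
            // Nothing was written since the last checkpoint: keep the
            // current encoder and emit no header/footer bytes.
            return 0;
        }
        // Swap in a fresh encoder and finish the old one.
        let old = std::mem::replace(&mut self.encoder, FramedEncoder::new());
        let finished = old.finish();
        self.dirty = false;
        self.file.extend_from_slice(&finished);
        finished.len()
    }
}
```

In this sketch a checkpoint with no intervening `write_all()` call flushes nothing, while a checkpoint after a write flushes one complete header + payload + footer frame and resets the flag.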


Tested a variety of scenarios locally:

  • sink v1, v2 with minio
  • sink v2 with local FS
  • with and without compression
  • with and without data (to test that we don't finish the encoder)
  • crashes and recovery

Very similar to ArroyoSystems#1007, but for the Local file backend.
We want to only finish/re-create the Gzip encoder if data was written.

The previous behavior was that for every checkpoint we would finish/create
the encoder regardless of whether any user data was written. This effectively
added ~20 bytes of nothing at every checkpoint.

This patch tracks if we wrote user data to the underlying buffer, marking
the buffer as `dirty`, and only finishing the encoder if this flag is set.
@QnJ1c2kNCg QnJ1c2kNCg requested a review from mwylde March 4, 2026 23:34