Description
Use case
As a pipeline author, I need my code to reliably write to a ClickHouse table without having to provide values for all the columns AND without having to know about all the columns in the table. My schema is regularly updated with new columns independently of deploying a new pipeline version (schema updates often must be applied before the code that actually populates the data). As long as the new columns have reasonable defaults, the pipeline should keep functioning and writing data without errors, without needing an update.
Describe the solution you'd like
Support writing a subset of the columns in a table using a RowBinary-based format.
For performance reasons, I'd like to use a RowBinary-based format instead of converting data to JSON. Unfortunately, I don't have complete control over when new columns are added to my schema, so it's possible that new columns will be added while my pipeline is running. The default RowBinary format requires that ALL columns are included in the data; otherwise misalignment and other errors occur. By allowing the use of the RowBinaryWithNames format, ClickHouse should be able to keep mapping the data generated by the pipeline to the correct columns even after new columns are added to a table's schema.
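To illustrate why RowBinaryWithNames solves this, here is a minimal sketch of how such a payload could be built, assuming the layout described in the ClickHouse format docs: a LEB128-encoded column count, then each column name as a RowBinary String, then the rows in plain RowBinary. Because the header names the columns being sent, the server can match them by name (with `input_format_with_names_use_header` enabled) and fill in defaults for columns that are absent. The function names (`encode_leb128`, `encode_string`, `encode_row_binary_with_names`) and the two-type `ENCODERS` table are hypothetical and for illustration only; a real connector would need encoders for every ClickHouse type.

```python
import struct


def encode_leb128(n: int) -> bytes:
    """Encode a non-negative int as an unsigned LEB128 varint (used for counts and string lengths)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_string(s: str) -> bytes:
    """RowBinary String: varint byte length followed by the raw UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_leb128(len(data)) + data


# Hypothetical, deliberately incomplete type table: only two ClickHouse types are covered.
ENCODERS = {
    "UInt32": lambda v: struct.pack("<I", v),  # fixed-width little-endian
    "String": encode_string,
}


def encode_row_binary_with_names(columns, rows) -> bytes:
    """Build a RowBinaryWithNames payload for a SUBSET of a table's columns.

    columns: list of (name, clickhouse_type) pairs the pipeline knows about.
    rows: iterables of values in the same order as `columns`.
    """
    buf = bytearray()
    # Header: number of columns, then each column name as a String.
    buf += encode_leb128(len(columns))
    for name, _ in columns:
        buf += encode_string(name)
    # Body: rows encoded exactly as in plain RowBinary.
    for row in rows:
        for (_, ctype), value in zip(columns, row):
            buf += ENCODERS[ctype](value)
    return bytes(buf)


payload = encode_row_binary_with_names([("id", "UInt32"), ("name", "String")], [(1, "a")])
print(payload)
```

Such a payload would be sent with a statement like `INSERT INTO my_table FORMAT RowBinaryWithNames` (table name hypothetical). A column added to `my_table` later simply does not appear in the header, so the existing rows stay aligned; how unknown or omitted columns are handled is governed by server settings such as `input_format_skip_unknown_fields` and `input_format_defaults_for_omitted_fields`.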
Describe the alternatives you've considered
It's possible to make this work with other formats, including JSON, which implicitly declare the column mapping, but I don't want to pay the performance penalty of converting to and from JSON. This could probably also be handled with the Avro or Parquet formats, but RowBinary and its derivatives are formats ClickHouse supports natively and should offer the best performance.