Description
Is your feature request related to a problem?
The current SigNoz metrics storage relies on a normalized two-table schema (samples_v4 and timeseries_v4). This design was originally chosen to avoid the performance penalties associated with Map or String types in a single table, with the trade-off being a read-time join.
In older ClickHouse versions, filtering on a single key within a Map required the engine to read and deserialize the entire Map object for every row. The two-table normalization was designed to avoid that I/O bottleneck, but at the cost of forcing a fingerprint join at read time.
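To make the trade-off concrete, here is a simplified sketch of the current layout and the join it forces. All column names, types, and the label-serialization format are illustrative, not the exact SigNoz DDL:

```sql
-- Simplified sketch of the normalized two-table layout (illustrative, not the
-- actual SigNoz schema).
CREATE TABLE samples_v4 (
    metric_name LowCardinality(String),
    fingerprint UInt64,        -- hash of the label set
    unix_milli  Int64,
    value       Float64
) ENGINE = MergeTree
ORDER BY (metric_name, fingerprint, unix_milli);

CREATE TABLE timeseries_v4 (
    metric_name LowCardinality(String),
    fingerprint UInt64,
    labels      String         -- serialized label set for the series
) ENGINE = ReplacingMergeTree
ORDER BY (metric_name, fingerprint);

-- Every label-filtered read pays for this fingerprint join:
SELECT s.unix_milli, s.value
FROM samples_v4 AS s
INNER JOIN timeseries_v4 AS t USING (metric_name, fingerprint)
WHERE s.metric_name = 'http_server_duration'
  AND t.labels LIKE '%"service.name":"checkout"%';
```

The join side must be resolved (typically hashed into memory) before samples can be filtered, which is where the CPU and memory cost discussed below comes from.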
Describe the solution you'd like
I would like to propose migrating the SigNoz metrics schema from two-table storage to a Single-Table Metrics Store utilizing the native ClickHouse JSON data type (v25.3+).
Why this solution?
- Subcolumnar Access: Unlike the Map type, the native JSON type physically flattens keys into independent subcolumns on disk. This allows ClickHouse to read only the specific attributes requested, effectively solving the "entire object fetch" problem.
- Elimination of Joins: By moving labels directly into the primary table as subcolumns, we eliminate the need for the timeseries_v4 join. This will significantly reduce CPU and memory consumption during query execution.
- High-Cardinality Scaling: This approach leverages ClickHouse's modern ability to handle dynamic subcolumns, providing the performance of a static schema with the flexibility of a dynamic one.
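A single-table version of the sketch above might look like the following. This is a hypothetical design, not a concrete migration proposal; column names mirror the illustrative schema rather than SigNoz's real one:

```sql
-- Hypothetical single-table metrics store using the native JSON type
-- (available from ClickHouse 25.3+).
CREATE TABLE samples_json (
    metric_name LowCardinality(String),
    unix_milli  Int64,
    value       Float64,
    labels      JSON           -- keys are stored as dynamic subcolumns on disk
) ENGINE = MergeTree
ORDER BY (metric_name, unix_milli);

-- Subcolumn access: only the requested label path is read from disk, and no
-- join is needed. A key containing dots is quoted with backticks.
SELECT unix_milli, value
FROM samples_json
WHERE metric_name = 'http_server_duration'
  AND labels.`service.name` = 'checkout';
```

Because each JSON path is materialized as its own subcolumn, the label filter above touches one narrow column rather than the whole label object, which is the core of the performance argument.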
Discussion Question:
Do we still require the two-table approach for any other reasons, such as overall data storage efficiency or avoiding the duplication of labels? Specifically, how does the compression of native JSON subcolumns in ClickHouse compare to the storage savings of a normalized metadata table when dealing with billions of rows?
Additional context
- ClickHouse Schema for Observability: 200x Faster Queries & 3x Storage Savings
- SigNoz Engineering Blog: Building a High-Performance JSON Log Store