This PR series extends **HWORKS-2802 / HWORKS-2807** from UI-only partitioning visibility into the full backend/client implementation of `partitioned_by` across Delta, Hudi, and Iceberg.
It adds `partitioned_by` and `online_partition_columns` to the Python client API, with validation for allowed grains, duplicates, conflicts with `partition_key`, required `event_time`, and name collisions. The fields round-trip through REST, are exposed as read-only properties, and synthetic grain features are marked `offline_only`.
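The validation rules listed above can be illustrated with a minimal sketch. The function name `validate_partitioned_by`, its signature, and the error messages are hypothetical and are not the client's actual API. The allowed grain names are taken from this description. Treating "conflicts with `partition_key`" as "the two options cannot be combined" is an assumption.

```python
# Hypothetical sketch of the checks described above; not the client's real code.
ALLOWED_GRAINS = ("year", "month", "week", "day", "hour")


def validate_partitioned_by(partitioned_by, partition_key=None,
                            event_time=None, feature_names=()):
    """Return the normalized grain list or raise ValueError."""
    grains = [g.lower() for g in partitioned_by]
    if not grains:
        return grains

    unknown = [g for g in grains if g not in ALLOWED_GRAINS]
    if unknown:
        raise ValueError(f"unsupported grains: {unknown}")

    if len(set(grains)) != len(grains):
        raise ValueError("duplicate grains in partitioned_by")

    # Assumption: partitioned_by and partition_key are mutually exclusive.
    if partition_key:
        raise ValueError("partitioned_by cannot be combined with partition_key")

    # Grains are derived from event_time, so it must be set.
    if not event_time:
        raise ValueError("partitioned_by requires event_time")

    # A synthetic grain column must not shadow a user-defined feature.
    collisions = sorted(set(grains) & set(feature_names))
    if collisions:
        raise ValueError(f"grain names collide with features: {collisions}")

    return grains
```

For example, `validate_partitioned_by(["year", "month"], event_time="ts", feature_names=["id", "ts"])` passes, while adding a feature named `month` would raise.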
The implementation evolved from an initial **Delta GENERATED ALWAYS AS** design to a simpler cross-engine model where grain columns are real materialized partition columns. The client now derives `year`, `month`, `week`, `day`, and `hour` from `event_time` before writes, using Arrow for delta-rs/PyIceberg paths and Spark functions for Spark paths. This applies to Delta, Hudi, and Iceberg, including deletes.
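As a rough illustration of what materializing grain columns means, the sketch below derives the grain values for one timestamp with the standard library only. The series itself uses Arrow and Spark functions. `derive_grains` is a hypothetical helper, and using the ISO week number for `week` is an assumption, since the week definition is not stated here.

```python
from datetime import datetime, timezone


def derive_grains(event_time, grains):
    """Derive partition grain values from one event_time value.

    Hypothetical helper: naive datetimes are assumed to be UTC, and
    'week' is assumed to mean the ISO week number.
    """
    if event_time.tzinfo is None:
        ts = event_time.replace(tzinfo=timezone.utc)
    else:
        ts = event_time.astimezone(timezone.utc)

    values = {
        "year": ts.year,
        "month": ts.month,
        "week": ts.isocalendar()[1],
        "day": ts.day,
        "hour": ts.hour,
    }
    return {grain: values[grain] for grain in grains}
```

Because the grains are ordinary columns computed before the write, the same values can be written to Delta, Hudi, or Iceberg without relying on engine-specific generated-column support.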
Hudi support adds a `PartitionedByTransformer` for materialization jobs and later moves direct writes to the same real-column model, using ordinary SIMPLE partition columns with Hive-style partitioning. Iceberg gets the same grain materialization on both Spark and PyIceberg write paths.
Read pruning is handled by a partition predicate translator. It adds grain-column predicates from `event_time` ranges so partition pruning works consistently across engines and formats. The translator was simplified to one direction only: **event_time range → derived grain predicates**. It also handles UTC normalization, avoids unsafe OR translations, and improves pruning by descending into finer grains when the range shares coarser prefixes.
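The one-directional translation can be sketched as follows, assuming an inclusive `event_time` range and grains that nest cleanly (`year`, `month`, `day`, `hour`). `week` is left out of this sketch because it does not nest inside `month`. The function name and the tuple-based predicate form are hypothetical and are not the translator's real interface.

```python
from datetime import datetime, timezone

# Coarse-to-fine order; each grain is only meaningful once all coarser
# grains are known to be identical across the range.
GRAIN_ORDER = ("year", "month", "day", "hour")


def _to_utc(ts):
    # Assumption: naive datetimes are already in UTC.
    if ts.tzinfo is None:
        return ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc)


def grain_predicates(start, end, grains):
    """Translate an inclusive event_time range into grain predicates.

    Returns (column, operator, value) tuples that are implied by the
    range, so AND-ing them with the original filter never drops rows.
    """
    start, end = _to_utc(start), _to_utc(end)
    if start > end:
        raise ValueError("start must not be after end")

    predicates = []
    for grain in GRAIN_ORDER:
        lo, hi = getattr(start, grain), getattr(end, grain)
        if lo == hi:
            # Shared prefix: pin this grain and descend to the next one.
            if grain in grains:
                predicates.append((grain, "=", lo))
            continue
        # First differing grain: a bounded range is still safe here,
        # but anything finer would wrongly exclude rows, so stop.
        if grain in grains:
            predicates.append((grain, ">=", lo))
            predicates.append((grain, "<=", hi))
        break
    return predicates
```

A range from 2024-03-05 10:00 to 2024-03-07 02:00 pins `year` and `month` and bounds `day`, but emits nothing for `hour`, because an hour constraint would exclude valid rows on the intermediate day.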
Several correctness fixes are included: per-row seconds-vs-milliseconds handling for integer `event_time`, DATE + hour materialization, partitioned deletes, repeated `Query.read()` mutation, post-merge private API renames, and CI compatibility when `deltalake` is unavailable.
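The per-row seconds-vs-milliseconds handling can be pictured with a magnitude heuristic like the one below. The cutoff value and the helper name `epoch_to_utc` are assumptions made for illustration. The rule actually used by the client is not described here.

```python
from datetime import datetime, timezone

# Assumed cutoff: 1e11 seconds is far in the future (past year 5000),
# while 1e11 milliseconds falls in 1973, so larger magnitudes are
# treated as milliseconds.
MILLIS_THRESHOLD = 100_000_000_000


def epoch_to_utc(value):
    """Convert one integer event_time to a UTC datetime.

    The unit is decided per value rather than per column, so a column
    that mixes seconds and milliseconds still lands in the right partition.
    """
    seconds = value / 1000 if abs(value) >= MILLIS_THRESHOLD else value
    return datetime.fromtimestamp(seconds, tz=timezone.utc)
```

With this heuristic, `1_700_000_000` and `1_700_000_000_000` resolve to the same instant, so both rows would receive identical grain values.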
The series also exposes `FeatureGroup.online_config` so users can inspect RonDB secondary indexes, documents `time_travel_format` and `partitioned_by` in the hops-fg skill, and guards online serving by warning or failing when offline-only partition grain columns are selected in online feature views.
Separately, it hardens the Hopsworks MCP server and CLI after a security audit: shell tools are now opt-in, HTTP defaults to localhost, bearer-token auth is supported, unauthenticated network binds are blocked unless explicitly overridden, terminal sessions use unguessable tokens, credential exposure is reduced, hostname verification defaults to true, and the CLI supports reading API keys from stdin.
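Two of the hardening rules, blocking unauthenticated non-local binds and checking a bearer token, can be sketched as below. All names here (`check_bind`, `token_matches`, `allow_unauthenticated`) are hypothetical and do not reflect the MCP server's real options. The sketch only shows the shape of the checks.

```python
import hmac
import ipaddress


def is_loopback(host):
    """Return True for 'localhost' or a loopback IP literal."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as a network bind.
        return False


def check_bind(host="127.0.0.1", token=None, allow_unauthenticated=False):
    """Refuse a network bind without auth unless explicitly overridden."""
    if is_loopback(host) or token or allow_unauthenticated:
        return
    raise PermissionError(
        f"refusing to bind {host} without a bearer token"
    )


def token_matches(expected, authorization_header):
    """Check an 'Authorization: Bearer <token>' header value."""
    prefix = "Bearer "
    if not authorization_header.startswith(prefix):
        return False
    presented = authorization_header[len(prefix):]
    # Constant-time comparison avoids leaking the token through timing.
    return hmac.compare_digest(expected.encode(), presented.encode())
```

Defaulting `host` to a loopback address means a plain start-up never exposes the server to the network, and the override has to be an explicit choice.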
---------
Signed-off-by: Jim Dowling <jim@logicalclocks.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>