Skip to content

[FSTORE-2030] Add support for specifying lookback windows for PIT queries#959

Draft
manu-sj wants to merge 9 commits into
logicalclocks:mainfrom
manu-sj:FSTORE-2030
Draft

[FSTORE-2030] Add support for specifying lookback windows for PIT queries#959
manu-sj wants to merge 9 commits into
logicalclocks:mainfrom
manu-sj:FSTORE-2030

Conversation

@manu-sj
Copy link
Copy Markdown
Contributor

@manu-sj manu-sj commented May 21, 2026

Summary

  • Adds a Lookback dataclass under hsfs.constructor.lookback and a lookback parameter on FeatureView.get_batch_data, create_training_data, and the split variants. The window applies uniformly to every joined feature group in the PIT join.
  • Two modes: partition_key (constant-bound predicate on each FG's partition column — flyingduck and Spark prune partitions before reading files), event_time (predicate on the FG's event_time column — row-level correctness; partition pruning is engine-dependent).
  • Wire format is camelCase (lookbackKey, startWindow, endWindow) with bounds as epoch ms; the SDK fans the uniform lookback out across the query tree's joins and also emits a joinsLookback map keyed by prefix + fg.name on the training-dataset POST.

JIRA

FSTORE-2030

Test plan

  • 27 unit tests on Lookback (validation matrix, dict/dataclass round-trip, date+datetime coercion, nested-map serialization).
  • feature_view and training_dataset tests pass.
  • ruff + docsig clean on the new module and the modified public methods.
  • End-to-end on cluster manu-test via the loadtest integration tests (companion PR).

Companion PRs

  • Backend: logicalclocks/hopsworks-ee → branch FSTORE-2030
  • Integration tests: logicalclocks/loadtest → branch FSTORE-2030
  • Docs: logicalclocks/logicalclocks.github.io → branch FSTORE-2030

manu-sj and others added 2 commits May 20, 2026 17:21
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

PIT (point-in-time) joins in the Feature Store currently emit
`feature_fg.event_time <= root.event_time` as a row-correlated
inequality, so engines must scan every historical partition of every
joined feature group on every query. As feature groups grow with
daily ingestion the cost grows unboundedly: a 31 GB / 64-week
transactional feature group ends up scanning all 64 partitions per
PIT batch read.

The ticket adds an optional `lookback` parameter on
`FeatureView.get_batch_data`, `create_training_data`, and the
split variants. The user supplies a `key` (`partition_key` or
`event_time`) and a `start_window` (required) plus optional
`end_window`. The backend turns the window into a constant-bound
predicate on each joined FG so flyingduck's directory walker and
Spark catalyst can prune partitions before opening any files.

The SDK exposes a `Lookback` dataclass in `hsfs.constructor.lookback`
that accepts either a dict (the customer's documented shape) or the
dataclass itself. Both `date` and `datetime` are accepted for the
bounds; `date` values are normalised to UTC midnight before
conversion to epoch milliseconds. The `Lookback` constructor
validates locally: `key` must be in `{event_time, partition_key}`,
`start_window` is required, and `start_window < end_window` strictly
when `end_window` is provided.

Two wire paths fan the uniform lookback across the feature view's
join tree before the request is submitted. The stateless
`get_batch_data` path calls `attach_to_joins(query.joins, lookback)`
so the per-Join `LookbackDTO` rides on the standard
`PUT /featurestores/{id}/query` SQL-construction call. The
materialised `create_training_data*` path builds a
`Dict[prefix + right_fg.name, LookbackDTO]` via
`build_joins_lookback_map(...)` and sets it on
`TrainingDatasetDTO.joinsLookback`, since the FV-backed training-data
POST has no `QueryDTO` body. When `key="event_time"` is selected the
engine logs an INFO that partition pruning becomes engine-dependent
(Hudi/Delta column-stats only) while row-level correctness is
always guaranteed.

`Join.from_response_json` and `TrainingDataset.from_response_json`
both deserialise the new fields with `None` defaults; older
training datasets written before V72 read back unchanged.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

Tighten the SDK's public-API docstrings around the new `lookback`
parameter.

Document `lookback` in the parameter blocks of `get_batch_data`,
`create_training_data`, `create_train_test_split`, and
`create_train_validation_test_split`. The public-API contract requires
every parameter to be documented; the original change added the
signature on each method but left the docstrings without an entry.

Round out the docstrings on `Lookback.to_dict`, `Lookback.from_dict`,
`attach_to_joins`, and `build_joins_lookback_map` so each public surface
in the new `lookback` module clears `docsig` and explains what each
return value means. Also reorder imports in `lookback.py` to satisfy
ruff's isort rule.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 21, 2026

Coverage report

Click to see where and how coverage changed

FileStatementsMissingCoverageCoverage
(new stmts)
Lines missing
  python/hopsworks_common
  usage.py
  python/hsfs
  feature_view.py 1763, 2064, 2360
  training_dataset.py
  python/hsfs/constructor
  join.py 59-69, 99, 103
  lookback.py 100, 350, 352, 365, 370-375, 418, 451, 474, 486, 510, 536, 542-543, 547, 549, 586-590, 661-665, 675, 693, 698, 738-742
  query.py 717, 720, 733-735
  python/hsfs/core
  feature_view_engine.py 73-80, 390, 416-417, 461, 484, 487-488, 514-515
Project Total  

This report was generated by python-coverage-comment-action

manu-sj and others added 7 commits May 21, 2026 09:20
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

Switch the lookback helper imports in `feature_view_engine` to a module
import (`from hsfs.constructor import lookback as _lookback`) and
qualify the call sites accordingly. Importing the three names with
per-name aliases tripped ruff's isort I001 rule under the project's
ruff configuration; the module-import form is idiomatic for this style
and keeps the call sites short.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

Re-run `ruff format` on the lookback module and its tests so they
conform to the project formatter. The CI's `ruff format --check`
flagged the two files as unformatted after the original commits; the
content is unchanged behaviorally.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

Extend the per-call lookback API so different joined feature groups can
carry different windows on the same query. The v1 implementation fanned
a single Lookback across every join; v3 introduces a LookbackPlan
container with an optional uniform default plus a per-feature-group
override map, while keeping the leaf Lookback class for the uniform 90%
case. Field names move from `start_window` / `end_window` to `start` /
`end` on both the SDK and the wire; the persistence columns keep their
historical names via JPA @column mappings so no new migration ships.

The SDK adds LookbackPlan as a second @public class alongside the leaf
Lookback. The four FeatureView methods that accept `lookback` route the
value through a `_coerce_lookback` helper that disambiguates the dict
form on top-level keys: `{key, start, end}` constructs a Lookback,
`{default, feature_groups}` constructs a LookbackPlan. Empty dicts,
mixed-keys, legacy `start_window` / `end_window` aliases, and unknown
keys raise ValueError with verbatim messages defined in the API spec so
the docstring and the SDK stay in lockstep. Per-feature-group keys
accept six forms (FG instance, (FG, prefix), 3-tuple, 3-tuple with None
prefix, 2-tuple, bare string) and normalize to a (name, version, prefix)
triple at attach time; ambiguous, unknown, and colliding keys raise with
the canonical triples listed so the call can be corrected. The wire
shape produced by `build_joins_lookback_list` is the List<JoinLookbackEntry>
the backend now expects, deduplicated by triple. Tests assert the exact
spec error strings so the API contract stays the single source of truth.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

Codex pre-review on the v3 diff flagged that the ambiguity ValueError
in _resolve_feature_groups listed only the first canonical triple as
the disambiguator hint. The API contract calls for every matching
triple to appear so a user joining the same feature group under
several prefixes can pick the right one without re-reading the feature
view definition.

Rewrite the message to repeat the full list of matching triples in the
"Disambiguate by passing one of the full triples ..." clause so the
hint covers every possible resolution. The existing SDK test that
asserts the ambiguity branch matches on the substring "ambiguous", so
no test surface changes.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

End-to-end cluster verification of LookbackPlan surfaced a wire-ordering
gap: an ambiguous bare-name `feature_groups` key (an FV that joins the
same feature group twice under different prefixes) bypassed the SDK's
`_resolve_feature_groups` check and reached the engine, which then
raised a cryptic DuckDB "Ambiguous reference to table" error instead of
the actionable SDK ValueError that names the canonical triples.

`attach_to_joins` runs against the backend-reconstructed query returned
from `/featureviews/.../batchquery`, whose joins no longer carry the
original feature-group identity that `_resolve_feature_groups` matches
against. Pre-validate the lookback against `feature_view_obj.query.joins`
(the local query the user built, with feature-group instances intact)
BEFORE the HTTP roundtrip via a new `validate_against_joins` helper
that runs the resolver without mutating joins. After the roundtrip the
existing `attach_to_joins` call continues to fan the resolved entries
onto the reconstructed joins for the wire payload.

`create_training_dataset` is unaffected — it already builds the wire
list from `feature_view_obj.query.joins` directly, so the validation
already runs against the local query.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

Cluster verification verified the PIT join returns correct rows under
the v3 lookback contract but did not prove partition pruning was
happening: for the seeded data, the PIT join returns the same scores
whether the lookback's lower bound filters out historical partitions
or not, because no out-of-window row would ever be the latest match
for an in-window root row. Without inspecting the generated SQL or the
engine's executed plan, the integration tests cannot distinguish a
lookback that fires from one that is silently dropped.

Extend `FeatureView.get_batch_query` (the SQL-string-returning method)
to accept an optional `lookback` parameter so users (and the loadtest
suite) can inspect the predicate the backend emits. The engine path
runs the same pre-validate against the FV's local join tree as
`get_batch_query` does, then attaches the resolved entries to the
backend-reconstructed Query before `construct_query` runs the SQL
emitter. The downstream loadtest commit adds a
`test_partition_key_emits_pruning_predicate` test that asserts the
returned SQL contains the lookback lower-bound literal for both
`partition_key` and `event_time` modes.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ries

https://hopsworks.atlassian.net/browse/FSTORE-2030

Customer workload exposed that v3 lookback applied only to joined feature
groups, never the root. For a feature view that joins three dimension
FGs onto a single root, the SQL emitted partition predicates on each
joined FG but left the root unbounded — so the customer's largest
table, the one driving query volume, got no I/O reduction. Every
pre-existing loadtest and SDK test passed because correctness was
unchanged (PIT semantics make out-of-window dim rows irrelevant whether
pruned or row-filtered) and the SQL/plan regex assertions matched the
first occurrence of the bound anywhere in the SQL.

Extend the SDK's lookback resolution to cover the root: `_resolve_feature_groups`,
`validate_against_joins`, `attach_to_joins`, and `build_joins_lookback_list`
all take an optional root_fg / root_query parameter. The root's
`(name, version, prefix=None)` triple participates in key matching like
any joined FG: bare-name keys, full triples, and the LookbackPlan
default all resolve onto it, raising the same ambiguity / unknown-key
errors with the canonical triples listed. `attach_to_joins` stores the
resolved root lookback on `Query._lookback`; `build_joins_lookback_list`
prepends a wire entry for the root so the training-dataset write path
ships it alongside the joined entries. `feature_view.py`'s read and
training-data call sites pass `feature_view_obj.query._left_feature_group`
into the helpers so every path participates.

Wire-format extension on Query: a top-level `lookback: LookbackDTO`
field for the construct_query read path, serialized when
`Query._lookback` is set. The joinsLookback list shape on the training-
dataset DTO is unchanged — the root just gets an entry there.

A regression test (`test_uniform_lookback_must_include_root_fg`)
asserts the wire payload now contains four entries for the customer's
4-FG view shape; this test was written to fail on the previous
implementation and now passes.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant