[onboarding] general fixes for user onboarding via wizard #945
Open
MagicLex wants to merge 13 commits into
The CLI's hops fg list and hops fv list called fs.get_feature_groups()
on the active project's own feature store only, so groups shared from
other projects were invisible. Onboarding users landing in a fresh
project with only shared data saw an empty list and assumed the CLI
was broken.
Add Project.get_feature_stores() (and Connection.get_feature_stores())
backed by a new FeatureStoreApi.get_all() that hits
/project/{id}/featurestores, the same endpoint the UI's feature-store
picker uses. The CLI list commands now iterate every visible store
and add a PROJECT column so the source of each row is obvious.
Pass --current-only on either command to restrict the listing to the
active project's store.
Also add a caveat doc explaining that hopsworks-apigen shim modules
are generated at build time and gitignored, so pytest run against an
unbuilt source tree fails with confusing import errors.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
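The cross-store listing described above can be sketched roughly as follows. This is a minimal illustration, not the CLI's code: `FakeFeatureStore`, `FakeFeatureGroup`, the `project_name` attribute, the `list_feature_groups` helper, and the `transactions` group are all hypothetical stand-ins; only the `PROJECT` column, `get_feature_groups()`, and the `--current-only` behaviour come from the commit message.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFeatureGroup:
    # Hypothetical stand-in for an SDK feature group object.
    name: str
    version: int


@dataclass
class FakeFeatureStore:
    # Hypothetical stand-in for a feature store handle; project_name is assumed.
    project_name: str
    groups: list = field(default_factory=list)

    def get_feature_groups(self):
        return self.groups


def list_feature_groups(stores, active_project, current_only=False):
    """Collect one row per feature group, tagged with the owning project."""
    rows = []
    for store in stores:
        # current_only mirrors the --current-only flag: skip every shared store.
        if current_only and store.project_name != active_project:
            continue
        for fg in store.get_feature_groups():
            rows.append(
                {"PROJECT": store.project_name, "NAME": fg.name, "VERSION": fg.version}
            )
    return rows


stores = [
    FakeFeatureStore("demo", []),
    FakeFeatureStore("hopsworks_default", [FakeFeatureGroup("transactions", 1)]),
]
all_rows = list_feature_groups(stores, active_project="demo")
own_rows = list_feature_groups(stores, active_project="demo", current_only=True)
```

With a fresh project that owns nothing, `all_rows` still carries the shared group, while `own_rows` is empty, which is the situation onboarding users previously mistook for a broken CLI.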
hops fg info / features / preview / insert / derive / stats / search / keywords and hops fv info / read / delete / get-feature-vector all called fs.get_feature_group / fs.get_feature_view on the active project's own feature store. The SDK returns None when the entity is missing from that store, so requests for shared groups silently rendered "?" everywhere. Joins in hops fg derive and hops fv create could not source shared base or joined feature groups either.
Centralise the lookup: session.get_feature_stores(ctx) caches the visible-stores list per invocation, and shared _get_fg / _get_fv helpers walk those stores, return the first match, and raise a clear "not found in any visible feature store. Run hops fg list / hops fv list" when nothing matches. The fix flows through every call site that resolves an entity by name without retouching the individual commands.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
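A rough sketch of the "walk the stores, first match wins, fail loudly" lookup, assuming stores return None for a missing entity as the commit message describes. `FakeStore`, `EntityNotFoundError`, and `resolve_feature_group` are hypothetical names for illustration; the real helpers are `_get_fg` / `_get_fv`, whose signatures are not shown in this PR text.

```python
class EntityNotFoundError(LookupError):
    """Raised when no visible feature store holds the requested entity."""


class FakeStore:
    # Hypothetical stand-in: returns None for a missing group instead of raising.
    def __init__(self, name, groups):
        self.name = name
        self._groups = groups  # maps (name, version) -> feature group object

    def get_feature_group(self, name, version=1):
        return self._groups.get((name, version))


def resolve_feature_group(stores, name, version=1):
    """Walk the stores in order and return the first non-None match."""
    for store in stores:
        fg = store.get_feature_group(name, version)
        if fg is not None:
            return fg
    # Nothing matched anywhere: raise instead of handing None to the caller.
    raise EntityNotFoundError(
        f"Feature group '{name}' v{version} not found in any visible feature store. "
        "Run hops fg list."
    )


own = FakeStore("demo", {})
shared = FakeStore("hopsworks_default", {("prices", 1): "shared-fg"})
found = resolve_feature_group([own, shared], "prices")
```

Raising at the lookup boundary is what removes the silent "?" rendering: callers never see a None they forgot to check.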
A new feature pipeline today needs four CLI moves to leave the IDE: hops files upload, hops job create, hops job schedule, hops job run. The first three are pure plumbing the SDK can do itself; the onboarding wizard should be able to ship a script with one command.
Add JobApi.deploy(local_path, name, type=None, environment_name=None, args=None, remote_dir=None, overwrite=True) -> Job on the SDK side. It composes dataset_api.upload (script lands in /Resources/jobs/<name>/ by default) and the existing PUT-backed create_job, and infers the job type from the extension (.py = PYTHON, .jar = SPARK). Re-deploying the same name overwrites the script and updates the job definition in place, so the call is idempotent.
Add hops job deploy LOCAL_FILE --name NAME on the CLI side. It calls JobApi.deploy and then chains job.schedule when --cron is set and job.run when --run is set, so the full upload + register + schedule + launch chain fits on a single line.
Live-verified on the onboarding test project: deploy creates the job, re-deploy is idempotent, --run --wait runs to completion and prints the heartbeat in the captured stdout, and --cron attaches a schedule visible via hops job schedule-info.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
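The extension-based type inference and the default remote directory could look something like the sketch below. The helper names `infer_job_type` and `default_remote_dir` are hypothetical, and the case-insensitive suffix match and the error raised for unknown extensions are assumptions; only the `.py = PYTHON`, `.jar = SPARK` mapping and the `/Resources/jobs/<name>/` default come from the commit message.

```python
from pathlib import PurePosixPath

# Mapping stated in the commit message: .py = PYTHON, .jar = SPARK.
_EXTENSION_TO_TYPE = {".py": "PYTHON", ".jar": "SPARK"}


def infer_job_type(local_path, explicit_type=None):
    """Return the explicit type if given, otherwise infer it from the extension."""
    if explicit_type is not None:
        return explicit_type
    # Lower-casing the suffix is an assumption of this sketch.
    suffix = PurePosixPath(local_path).suffix.lower()
    if suffix not in _EXTENSION_TO_TYPE:
        raise ValueError(
            f"Cannot infer job type from {local_path!r}; pass the type explicitly"
        )
    return _EXTENSION_TO_TYPE[suffix]


def default_remote_dir(name):
    """Default upload location named in the commit message."""
    return f"/Resources/jobs/{name}/"
```

For example, `infer_job_type("pipelines/heartbeat.py")` yields `"PYTHON"` under these assumptions, and a file with an unrecognised extension fails early rather than creating a job of the wrong type.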
…ice disable
Two bugs found while smoke-testing the wizard's pipeline step end to end:
1. hops <command> --json mixed SDK chatter with the JSON payload. The SDK calls logging.basicConfig(stream=sys.stdout) at import time and the login banner is a bare print, so a fresh hops fg list --json emitted seven log lines before the actual JSON array, which broke every downstream json.loads. set_json_mode now snapshots sys.stdout at call time, swaps it for sys.stderr so subsequent prints and root-handler logs land on stderr, raises the hopsworks/hopsworks_common/hsfs/hsml loggers to WARNING, and routes print_json to the captured stdout. The snapshot is re-taken on each call so Click's per-invocation CliRunner buffer in tests is honoured.
2. arrow_flight_client._disable_feature_query_service_client() crashed with AttributeError on a brand-new session because the guard inverted the None check: when _arrow_flight_instance was None, the code tried to call .ArrowFlightClient(...) on None instead of constructing one. The fix is a one-line change: assign the new ArrowFlightClient(disabled_for_session=True) to the module global.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
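The stdout snapshot-and-swap from the first fix can be illustrated with a small standalone sketch. The function names match the commit message, but the bodies are a guess at the shape rather than the actual implementation, and this sketch only covers bare print calls and logger levels, not handlers that already hold a reference to the old stream. The banner text and payload are made up.

```python
import io
import json
import logging
import sys

_json_out = None  # stream reserved for the JSON payload


def set_json_mode():
    """Snapshot the current stdout, then point sys.stdout at stderr."""
    global _json_out
    _json_out = sys.stdout   # re-taken on every call, so a test buffer is honoured
    sys.stdout = sys.stderr  # later bare print() calls now land on stderr
    for name in ("hopsworks", "hopsworks_common", "hsfs", "hsml"):
        logging.getLogger(name).setLevel(logging.WARNING)


def print_json(payload):
    """Write the payload to the stream captured by set_json_mode."""
    _json_out.write(json.dumps(payload) + "\n")


# Demo with in-memory buffers standing in for the real streams.
real_out, real_err = sys.stdout, sys.stderr
fake_out, fake_err = io.StringIO(), io.StringIO()
sys.stdout, sys.stderr = fake_out, fake_err
try:
    set_json_mode()
    print("Logged in to project")         # SDK-style chatter (made-up text)
    print_json([{"name": "example_fg"}])  # the only thing allowed on stdout
finally:
    sys.stdout, sys.stderr = real_out, real_err
```

After the demo, `fake_out` holds nothing but the JSON array, so a downstream `json.loads` on the captured stdout succeeds, while the banner sits in `fake_err`.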
… live pipeline test
A pipeline that reads from a HUDI feature group via the Python engine fails with FlightServerError: Catalog Error: Table ... does not exist on this cluster, even though fg.commit_details() reports the right commit and the offline materialization job ran to SUCCEEDED. The Hive fallback is gone in 4.x so there is no second path.
The caveat walks through the SDK payload (which is well-formed), the per-query DuckDB registration in FlyingDuck's query_engine.py, and the three likely root causes on the cluster side (HopsFS read permission for the FlyingDuck pod, stale warehouse mount, race between insert ack and materialization visibility). Includes the probes that confirmed the SDK side is clean so the next dev loop goes straight to the FlyingDuck pod logs.
Captures the two SDK / CLI fixes that came out of the same investigation so the caveat is self-contained.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The bug is fixed end-to-end now that
``logicalclocks/hopsworks-ee#2996`` (Java emits the two-part FROM
identifier) and ``logicalclocks/flyingduck#196`` (registration uses a
real DuckDB schema) are both in. Verified live on lexterm2:
``hops fg preview`` resolves shared HUDI feature groups and the
project's own materialised FG without raising the prior catalog
error.
Per the project's caveat convention ("known gotchas; add new ones to
this folder"), once the gotcha disappears the file should follow.
Keeping a stale caveat around teaches the wrong shape — the PRs and
commit history carry the diagnostic walkthrough for anyone who needs
it later.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Move the working doc and the live-tested sample scripts into the repo so a future agent can pick the loop back up from PR state alone (the pod the loop ran in is ephemeral; anything in /tmp or the user homedir disappears when it cycles).
Doc lands at .claude/docs/onboarding-flow.md and now opens with a "Handoff state" section pointing at the three cross-repo PRs, the lexterm2 kubeconfig at /hopsfs/Resources/kubeconfig-lexterm2 (which is project-scoped and survives pod restarts), and the cluster-side state of the live demo project. Samples under .claude/docs/samples/ carry the heartbeat job, the BTC feature pipeline, and the vectorised retrieval-time transformations.
CLI fixes from the Step 4 / Step 5 loop, committed together so the sample scripts and the doc references all match HEAD:
- hops fg stats no longer dumps the raw statistics JSON to stdout in human mode. Renders FEATURE / TYPE / COUNT / MIN / MAX / MEAN / STDDEV / COMPLETENESS as a table, formats wide numerics with thousands separators and four significant digits.
- hops transformation create iterates every @udf-decorated function in the source file instead of refusing files with more than one, and emits a per-function "Created transformation NAME vV" line. Mirrors the SDK shape (one create_transformation_function call per UDF) without forcing the caller to split the file.
- hops transformation create now defaults to --version 1, working around the backend HTTP 500 / NPE when version is null. The SDK side still passes null straight through and the backend should default to 1 (or the next free version); tracked in the gaps list.
- hops fv create --transform now accepts the fn[version]:col shape so an FV can pin a specific transform version. The SDK's fs.get_transformation_function(name=) defaults to v1, so if a fix re-registers as v2, FVs created without an explicit version stay bound to the broken v1. Pinning closes the gap.
- Tests updated: validates "register every UDF in source" and the "reject source with no @udf" path; old "reject multiple" parametrize case removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
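To make the fn[version]:col shape concrete, here is one way such a spec might be parsed. This is a sketch under assumptions: the `TransformSpec` type, the `parse_transform_spec` helper, and the exact grammar (word characters only, version optional so that an unpinned fn:col form is also accepted) are hypothetical and not taken from the CLI source.

```python
import re
from typing import NamedTuple, Optional


class TransformSpec(NamedTuple):
    function: str
    version: Optional[int]  # None means no explicit pin was given
    column: str


# Assumed grammar: fn:col or fn[version]:col, identifiers made of word characters.
_SPEC = re.compile(r"^(?P<fn>\w+)(?:\[(?P<version>\d+)\])?:(?P<col>\w+)$")


def parse_transform_spec(spec):
    """Split a --transform value into function name, optional version, and column."""
    match = _SPEC.match(spec)
    if match is None:
        raise ValueError(f"Expected fn:col or fn[version]:col, got {spec!r}")
    version = match.group("version")
    return TransformSpec(
        match.group("fn"),
        int(version) if version is not None else None,
        match.group("col"),
    )
```

Keeping the version as `None` when absent leaves the defaulting decision to the caller, which matters here because the commit message notes that an implicit default silently binds to v1.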
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…opsfs path is not a dataset
Two gaps hit while running the autoresearch flow on lexterm2:
cli: HOPSWORKS_ENGINE was required in interactive Hopsworks pods. hopsworks_common.connection picks the spark engine whenever pyspark imports and we're in-pod, but the typical terminal/notebook pod has pyspark installed transitively without a usable Spark master (Spark Connect, no SparkSession.builder.getOrCreate() fallback), so login crashes on getOrCreate. The CLI is single-process and never needs an in-process spark engine: heavy operations dispatch jobs server-side. hopsworks.cli.auth.login now defaults engine=python when neither the caller nor HOPSWORKS_ENGINE picks one. Explicit args and the env var both still win.
hsml: model.save() of a path under /hopsfs/ that is not actually a project dataset path errored with "Path not found". Per-pod mount layouts vary, FUSE writes may not have synced, and user-home folders under /hopsfs/Users/<user>/ are not project datasets. _normalize_hopsfs_mount_path is a pure string strip, so we now also verify the normalized path with dataset_api.path_exists; only then do we move it, otherwise we fall back to the regular local upload path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
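Both fixes boil down to small decision rules, sketched here in isolation. `resolve_engine` and `plan_model_upload` are hypothetical helper names, and the precedence of the explicit argument over the environment variable is an assumption (the commit message only says both win over the new default). The `path_exists` parameter stands in for `dataset_api.path_exists`.

```python
import os


def resolve_engine(explicit=None, environ=None):
    """Pick the engine: explicit argument, then HOPSWORKS_ENGINE, then "python"."""
    environ = os.environ if environ is None else environ
    if explicit:
        return explicit
    # Fall back to "python" when the variable is unset or empty.
    return environ.get("HOPSWORKS_ENGINE") or "python"


def plan_model_upload(normalized_path, path_exists):
    """Move in place only if the dataset API confirms the path; else upload locally."""
    return "move" if path_exists(normalized_path) else "local_upload"
```

Passing `environ` and `path_exists` in as arguments keeps both rules testable without a cluster; in the real code they would presumably read the process environment and call the dataset API directly.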
…schema is timestamp/date
pd.read_json and json.loads both parse ISO 8601 timestamps as object columns, and fg.insert(df) then rejects them with a "wrong type" error even though the value is well-formed. Inspect the FG schema (preferring the new fg.columns, falling back to the deprecated fg.features) and coerce any timestamp/date column in place with pd.to_datetime, so hops fg insert --file row.json on a typical event-time FG just works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
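The schema-driven coercion can be illustrated without pandas. The sketch below is a standard-library analogue, not the commit's code: it uses `datetime.fromisoformat` on plain dict rows where the actual fix uses `pd.to_datetime` on DataFrame columns, and the `coerce_temporal_columns` helper, the schema dict, and the column names are all hypothetical.

```python
from datetime import date, datetime


def coerce_temporal_columns(rows, schema):
    """Convert ISO 8601 strings in place for columns typed timestamp or date."""
    for row in rows:
        for column, type_name in schema.items():
            value = row.get(column)
            if not isinstance(value, str):
                continue  # already typed, missing, or not a string: leave alone
            if type_name == "timestamp":
                row[column] = datetime.fromisoformat(value)
            elif type_name == "date":
                row[column] = date.fromisoformat(value)
    return rows


# Rows as json.loads would deliver them: temporal values arrive as strings.
rows = [{"event_time": "2024-01-02T03:04:05", "day": "2024-01-02", "price": 1.5}]
schema = {"event_time": "timestamp", "day": "date", "price": "double"}
coerce_temporal_columns(rows, schema)
```

The key point carried over from the commit is that the target schema, not the input file, decides which columns get converted, so well-formed strings in non-temporal columns are left untouched.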
…loop)
Two orchestration briefs the in-app Wizard pastes into Claude Code in the Terminal. Each is a short conversational program: a few inputs from the user, then CLI-first execution. They wrap the canonical 10-step onboarding flow into focused, agent-loadable specs.
- docs/wizard/time-series.md: raw FGs through a feature pipeline, feature view, training pipeline, registered model, deployment, and Streamlit app. Built segment by segment with hops context after each so the user sees what just appeared. Hard rule against heredoc-python orchestration; everything goes through the hops CLI.
- docs/wizard/research.md: autonomous research loop on a feature view, logged to autoresearch_experiments_<tag> and the model registry. Mirrors hopsworks-autoresearch/program.md's contract, with CLI commands inlined.
The briefs explicitly call out conversation rules (one question per turn, summarise after each segment, integrate user pushback) and anti-patterns (heredoc python, pip-install inline, skip-the-schedule) that came out of testing the flow end-to-end on lexterm2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
15 numbered issues captured live while running the time-series wizard brief end-to-end on lexterm2 (raw FGs to deployed Streamlit). Each entry is rated by severity (blocker / time-waster / papercut), reports the symptom verbatim, names the root cause where known, and proposes a fix.
The two highest-leverage ones are logicalclocks#10 (model files not copied to Deployments/<name>/<v>/, which crashloops the predictor) and logicalclocks#13 (batch get_feature_vectors silently returns 0 rows on ISO-date strings while the singular form accepts them).
Lives next to the briefs that produced it so the next agent loading the wizard context sees the known traps before walking into them.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four more items hit while wiring the Streamlit consumer onto the deployed model:
- logicalclocks#16 hops app create --path rejects absolute /hopsfs/ paths and produces a doubled // HDFS URL; opposite semantics from hops job deploy.
- logicalclocks#17 hops app start has --no-wait only (start blocks by default); hops deployment start has --wait. Naming-symmetry break.
- logicalclocks#19 deployment.predict() does bare json.dumps() on inputs, so any datetime.date payload TypeError-s in hsml internals. Combined with logicalclocks#13, the date type round-trip is a cliff: SDK forces string, get_feature_vectors silently drops it.
- logicalclocks#18 python-app-pipeline env lacks plotly. Crash surfaces only after upload + start.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
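The json.dumps failure behind the deployment.predict() item is easy to reproduce with the standard library alone. The sketch below shows the bare call raising TypeError on a datetime.date and a caller-side `default=` hook that renders dates as ISO strings; the `to_jsonable` helper and the `instances` payload shape are assumptions for illustration, not hsml's API or a proposed fix for it.

```python
import datetime
import json


def to_jsonable(value):
    """json.dumps fallback: render date and datetime values as ISO 8601 strings."""
    if isinstance(value, (datetime.date, datetime.datetime)):
        return value.isoformat()
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


# Hypothetical payload containing a date, as a feature vector might.
payload = {"instances": [[datetime.date(2024, 1, 2), 42.0]]}

try:
    json.dumps(payload)  # bare call, no default= hook
    bare_dumps_failed = False
except TypeError:
    bare_dumps_failed = True

encoded = json.dumps(payload, default=to_jsonable)
```

Note that this only addresses serialization: per the batch get_feature_vectors item, an ISO-date string may still be dropped silently further along, so stringifying dates is not a complete round-trip fix.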
Summary
- hops fg list and hops fv list now span every feature store visible to the project (own + shared), with a new PROJECT column to show provenance.
- Project.get_feature_stores() / Connection.get_feature_stores() backed by a new FeatureStoreApi.get_all(), hitting the same /project/{id}/featurestores endpoint the UI uses.
- --current-only on either command to restrict the listing to the active project.
Why
Onboarding users who land in a fresh project with only shared FGs (e.g. hopsworks_default) saw hops fg list return empty and assumed the CLI was broken. The old code only looked at fs.get_feature_groups() for the project's own feature store.
Also in this PR
A caveats/ doc explaining that hopsworks-apigen shim modules are generated at build time and gitignored, so running pytest against an unbuilt source tree fails with confusing import errors. Saved a few hours of head-scratching in this branch.
Test plan
- uv tool install from the patched source, then hops fg list shows the 8 shared FGs from hopsworks_default with the project column populated.
- hops fg list --current-only returns only the active project's FGs (empty in our test project).
- hops fg list --json includes the PROJECT key.
- hops fv list follows the same shape (verified, no shared FVs in our test backend so list is empty, but the table headers and iteration are correct).
- test_fg_list_spans_shared_stores, test_fg_list_current_only_skips_shared, test_fv_list_spans_shared_stores pass; existing list tests updated for the new column.
- ruff check and ruff format clean on touched files.
- docsig clean on the new methods (only pre-existing failures remain).
🤖 Generated with Claude Code