Skip to content

Add multiproject design docs and phase 1 plumbing#4276

Open
pcnudde wants to merge 11 commits intoNVIDIA:mainfrom
pcnudde:feature/multitenant
Open

Add multiproject design docs and phase 1 plumbing#4276
pcnudde wants to merge 11 commits intoNVIDIA:mainfrom
pcnudde:feature/multitenant

Conversation

@pcnudde
Copy link
Collaborator

@pcnudde pcnudde commented Mar 6, 2026

Summary

  • add the multiproject design docs and rollout guidance
  • add phase 1 project propagation through the admin API, recipes, and server job metadata
  • add project name validation and tests, while keeping the Kubernetes launcher change at TODO-only scope

pcnudde added 6 commits March 5, 2026 13:13
- Rename design docs to multiproject.md / multiproject_implementation.md
- Deprecate scope concept in favor of project
- Add PocEnv project parameter alongside ProdEnv
- New project.yml v4 schema: sites/admins/projects separation
- admins section explicitly optional for SSO migration path
- Add Phase 1 implementation section
@pcnudde
Copy link
Collaborator Author

pcnudde commented Mar 6, 2026

/build

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 6, 2026

Greptile Summary

This PR delivers Phase 1 of multi-project support for NVFLARE: a design document plus the minimal plumbing to thread a project name from user-facing APIs (PocEnv, ProdEnv, Session) through the admin command layer into job metadata, so that the Docker launcher can mount a project-specific workspace directory. No access control, job-store partitioning, or scheduler changes are included — those are deferred to later phases.

Key changes:

  • JobMetaKey.PROJECT added as a first-class job metadata field.
  • DNS-label-style project name validation (name_check) added to format_check.py, applied at the PocEnv/ProdEnv constructor, Session.__init__, and again server-side in _add_project_to_meta (defense in depth).
  • DockerJobLauncher._resolve_docker_workspace now selects a project-specific subdirectory and validates its existence before attempting Docker launch.
  • K8sJobLauncher receives only a TODO comment for now (Phase 1 scope).
  • PROJECT added to CLONED_META_KEYS so cloned jobs inherit their source job's project.
  • Comprehensive unit tests cover validation edge cases, Docker workspace resolution, session parameter propagation, and the full server-side submit_job flow.

The main issue found is a type annotation mismatch in DockerJobLauncher.launch_job: the method can now explicitly return None when workspace resolution fails, but it is declared -> JobHandleSpec (matching the abstract spec), which will confuse type checkers and may silently produce AttributeError in callers that don't guard for None. Additionally, the sentinel string "default" used to identify the backward-compatible project is an inline magic literal that would benefit from being a named constant.

Confidence Score: 4/5

  • PR is safe to merge; the type annotation mismatch is low-risk in practice but should be addressed before a wider rollout.
  • The Phase 1 scope is narrow and well-tested. Validation is applied at multiple layers (client and server). The only notable issue is that DockerJobLauncher.launch_job can return None while its declared return type is JobHandleSpec, which is a type annotation inconsistency that may confuse static analysis tools and callers. All other changes are additive with backward-compatible defaults.
  • nvflare/app_opt/job_launcher/docker_launcher.py — return type annotation mismatch and magic "default" string.

Important Files Changed

Filename Overview
nvflare/app_opt/job_launcher/docker_launcher.py Adds _resolve_docker_workspace to select a project-scoped subdirectory; introduces an explicit return None whose type is inconsistent with the declared -> JobHandleSpec annotation, and uses a magic "default" string.
nvflare/private/fed/server/job_cmds.py Adds server-side _add_project_to_meta with double validation (type + regex) for both submit_job and clone_job; also adds PROJECT to CLONED_META_KEYS so clones inherit the source job's project.
nvflare/fuel/flare_api/flare_api.py Threads project through Session, new_session, new_secure_session, and new_insecure_session; validates at construction time and forwards as CMD_PROPS to submit/clone commands. Clean and well-tested.
nvflare/apis/utils/format_check.py Adds "project" regex pattern (^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$) enforcing DNS-label-style names; correct and well-tested with 10 parametrized cases.
nvflare/recipe/poc_env.py Adds optional project parameter with pydantic model validation and propagation to SessionManager session params; straightforward and covered by tests.
nvflare/recipe/prod_env.py Mirrors poc_env.py changes: adds project with validation in pydantic model validator and propagation to SessionManager.
nvflare/apis/job_def.py Minimal, correct addition of PROJECT = "project" enum member to JobMetaKey.
nvflare/app_opt/job_launcher/k8s_launcher.py Phase 1 scope: adds only a TODO comment describing the intended project-aware settings resolution; no behavioral change.
tests/unit_test/private/fed/server/job_cmds_test.py Thorough tests for _add_project_to_meta (valid, edge-case, and invalid inputs) plus an end-to-end submit_job integration test confirming project propagates to both fire_event meta and job_def_manager.create.
tests/unit_test/app_opt/docker_launcher_test.py New test file covering: missing env var returns None, missing project subdirectory returns None, and valid project workspace results in a correctly-mounted Docker container.
docs/design/multiproject.md Comprehensive multi-project design document covering project model, RBAC matrix, provisioning schema, runtime isolation strategies, migration path, and future SSO direction.

Sequence Diagram

sequenceDiagram
    participant DS as Data Scientist
    participant Env as PocEnv / ProdEnv
    participant SM as SessionManager
    participant Sess as Session (FlareAPI)
    participant Srv as JobCommandModule (Server)
    participant L as DockerJobLauncher

    DS->>Env: ProdEnv(project="cancer-research")
    Env->>Env: name_check(project, "project")
    DS->>Env: recipe.execute(env)
    Env->>SM: SessionManager({project: "cancer-research", ...})
    SM->>Sess: new_secure_session(project="cancer-research")
    Sess->>Sess: name_check(project) → store self._project

    DS->>Sess: submit_job(job_path)
    Sess->>Sess: props = {project: "cancer-research"}
    Sess->>Srv: do_command(SUBMIT_JOB, props=props)

    Srv->>Srv: _add_project_to_meta(meta, conn)
    Note over Srv: name_check(project) server-side re-validation
    Srv->>Srv: meta[project] = "cancer-research"
    Srv->>Srv: fire_event(SUBMIT_JOB, fl_ctx{JOB_META=meta})
    Srv->>Srv: job_def_manager.create(meta, ...)

    Note over Srv,L: At job launch time
    Srv->>L: launch_job(job_meta)
    L->>L: _resolve_docker_workspace(job_id, project)
    Note over L: os.path.isdir check
    L->>L: volumes = {workspace/cancer-research: /workspace}
    L->>L: docker.containers.run(image, volumes=...)
Loading

Comments Outside Diff (3)

  1. nvflare/app_opt/job_launcher/docker_launcher.py, line 125-136 (link)

    Return type annotation doesn't include None

    launch_job is declared as returning JobHandleSpec (matching the parent JobLauncherSpec abstract spec), but the new code can now explicitly return None when _resolve_docker_workspace fails. This violates the declared type contract and will be flagged by static analysis / mypy. Callers that do not guard against a None handle will get an AttributeError at runtime rather than a meaningful exception.

    The return type should be updated to Optional[JobHandleSpec] here (and ideally also in JobLauncherSpec.launch_job):

    You'll also need from typing import Optional if it isn't already imported.

  2. nvflare/app_opt/job_launcher/docker_launcher.py, line 108 (link)

    Missing type annotation for project parameter

    The project parameter has no type annotation, while job_id does. For consistency with the rest of the codebase and to make the method signature self-documenting:

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  3. nvflare/app_opt/job_launcher/docker_launcher.py, line 147 (link)

    "default" is a magic string

    The string "default" appears here as the sentinel for "no project / backward-compatible path". The same sentinel is referenced in the design doc and in multiple places in the Phase 1 plumbing. Defining it as a module-level constant (e.g., DEFAULT_PROJECT = "default") would make future changes safer and the code more readable. The same constant could be imported by job_cmds.py and test code rather than each duplicating the literal.

Last reviewed commit: f4c17a8

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant