diff --git a/docs/design/multiproject.md b/docs/design/multiproject.md new file mode 100644 index 0000000000..9f26bb3c85 --- /dev/null +++ b/docs/design/multiproject.md @@ -0,0 +1,482 @@ +# Multi-Project Support in Flare + +## Revision History + +| Version | Notes | +|---------|-------| +| 1 | Initial version | +| 2 | Incorporate feedback and Mayo discussion | + +## Introduction + +Flare currently operates as a single-tenant system. All server and client processes run under the same Linux user, all jobs share a flat store (`jobs//`), and every authorized admin can see and act on every job. There is no data segregation between different collaborations running on the same infrastructure. + +To achieve genuine multi-tenancy, we introduce a **project** concept as the primary tenant boundary. A project encapsulates a private dataset, a set of participants (users and sites), an authorization policy, and runtime isolation. This document specifies the required changes across the full Flare stack. + +### Design Principles + +1. **Least privilege by default** — users see nothing outside their project(s) +2. **Defense in depth** — logical access control (authz) + physical isolation (containers/PVs) +3. **Backward compatible** — a `default` project preserves current single-tenant behavior +4. **`scope` deprecated** — the existing `scope` data-governance concept is superseded by `project`; `scope` will be removed in a future release +5. 
**Phased rollout** — Phase 1 project plumbing is available without `api_version: 4`; full multitenancy enforcement is gated on `api_version: 4` in `project.yml` + + +--- + +## Project Model + +A project is a named, immutable tenant boundary with these properties: + +| Property | Description | +|----------|-------------| +| `name` | Unique identifier (e.g., `cancer-research`) | +| `sites` | Set of FL sites enrolled in this project (must reference client-type site entries) | +| `users` | Set of admin users with per-project roles | +| `authorization` | Per-project authorization policy | + +- Users are associated with one or more projects, each with an independent role. +- **Clients participate in all projects they are enrolled in simultaneously.** Data isolation on shared clients is achieved through the runtime environment: K8s jobs mount project-specific PVs, Docker jobs mount project-specific host directories. The Flare parent process on the client does not access project data directly. +- Jobs belong to exactly one project (immutable after submission). +- A `default` project exists for backward compatibility. + +--- + +## User Experience + +### Data Scientist (Recipe API) + +The recipe is unchanged. The project is specified via `ProdEnv`: + +```python +recipe = FedAvgRecipe( + name="hello-pt", + min_clients=n_clients, + num_rounds=num_rounds, + initial_model=SimpleNetwork(), + train_script=args.train_script, +) + +env = ProdEnv( + startup_kit_location=args.startup_kit_location, + project="cancer-research", +) +run = recipe.execute(env) +``` + +If `project` is omitted in `ProdEnv`, it remains `None` (no API default change). 
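Behind this API, the `project` value is syntactically validated before it reaches job metadata. A minimal sketch of that check and of the partitioned store layout described under Data Model Changes — the pattern is the `project` rule this change adds to `format_check.py`, while `resolve_job_dir` is a hypothetical illustration, not an existing Flare helper:

```python
import posixpath
import re

# DNS-label-style pattern for project names; matches the "project" rule
# this change adds to nvflare/apis/utils/format_check.py.
PROJECT_NAME = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")


def resolve_job_dir(store_root: str, job_id: str, project: str = "default") -> str:
    """Hypothetical helper illustrating validation plus the partitioned layout."""
    if not PROJECT_NAME.match(project):
        raise ValueError(f"invalid project name: {project!r}")
    if project == "default":
        # Legacy layout: default-project jobs stay at jobs/<job_id>/.
        return posixpath.join(store_root, "jobs", job_id)
    # New multitenant layout: jobs/<project>/<job_id>/.
    return posixpath.join(store_root, "jobs", project, job_id)
```

Rejecting path-like or mixed-case names at this layer is what keeps invalid values out of the runtime launchers.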
+ +### Admin (FLARE API / Admin Console) + +The `Session` gains a project context: + +```python +sess = new_secure_session( + username="admin@org_a.com", + startup_kit_location="./startup", + project="cancer-research", # new +) +# All subsequent operations scoped to this project +jobs = sess.list_jobs() # only caller-visible jobs in cancer-research +sess.submit_job("./my_job") # tagged to cancer-research +``` + +Admin console equivalent: + +``` +> set_project cancer-research +Project set to: cancer-research + +> list_jobs +... only shows caller-visible jobs in cancer-research ... +``` + +A user with roles in multiple projects can switch context: + +``` +> set_project multiple-sclerosis +Project set to: multiple-sclerosis +``` + +### Platform Administrator + +A new **platform admin** role (distinct from per-project `project_admin`) manages cross-project concerns: + +- Assign clients to projects +- Assign project admins +- View system-wide health (without seeing job data) + +Project create/archive is deferred for v1 (projects are provisioning-time config in `project.yml`). + +--- + +## Data Model Changes + +### Job Metadata + +`project` becomes a first-class, immutable field on every job. Set at submission time from the user's active project context. Cannot be changed after creation. + +The project value is syntactically validated at the user-facing API layer and again on the server before it is persisted into job metadata. This prevents invalid or path-like values from reaching runtime launchers. + +### Job Store Partitioning + +New multitenant jobs are stored at `jobs///` (vs. current `jobs//`). No migration of existing jobs — they remain at `jobs//` and implicitly belong to the `default` project. + +Legacy `default` jobs continue to be served by the main server process for compatibility. New server job pods mount only the project-partitioned slice needed for the active job. 
+ +Physical partitioning enables: +- Filesystem-level isolation (different mount points per project in K8s) +- Simpler backup/restore per project +- Prevents cross-project data access via path traversal + +### Project Registry + +The server loads `project.yml` directly at startup for project/role lookup. No separate registry format or database needed. + +--- + +## Access Control Changes + +### Role Model + +Roles are **per-project**, not global. A user can be `lead` in one project and `member` in another. + +Today, the role is baked into the X.509 certificate (`UNSTRUCTURED_NAME` field). A single cert cannot encode multiple per-project roles. + +**Layered resolution (no breaking change):** +1. If `ProjectRegistry` exists AND user has a mapping for the active project → use registry role +2. Else if active project is `default` → fall back to cert-embedded role (legacy compatibility) +3. Otherwise → deny (`user not assigned to active project`) + +The cert format is unchanged. Existing deployments with `api_version: 3` certs keep working. The cert role field is not removed or made vestigial in this version — it remains the primary source for single-tenant deployments and fallback for the `default` project. + +### Admin Role Hierarchy + +| Role | Scope | Capabilities | +|------|-------|-------------| +| `platform_admin` | Global | Assign clients/admins to provisioned projects, system shutdown, view all sessions | +| `project_admin` | Per-project | All job ops within project, view project's clients (no client lifecycle control) | +| `org_admin` | Per-project | Manage own-org jobs, view own-org clients within project | +| `lead` | Per-project | Submit/manage own jobs, view own-org clients within project | +| `member` | Per-project | View-only within project | + +### Command Authorization Matrix + +Every command is scoped to the user's active project. Operations on resources outside the active project are denied. 
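The layered resolution described above (registry mapping first, cert role as `default`-project fallback, deny otherwise) can be sketched as follows; `registry` here is a hypothetical `{(user, project): role}` mapping standing in for the real `ProjectRegistry`:

```python
from typing import Optional


def resolve_role(
    registry: Optional[dict],  # hypothetical {(user, project): role} map; None if no registry
    cert_role: str,            # role embedded in the user's X.509 cert (UNSTRUCTURED_NAME)
    user: str,
    active_project: str,
) -> Optional[str]:
    """Sketch of layered role resolution: registry overrides cert; the cert role
    applies only as a fallback for the 'default' project; otherwise deny."""
    if registry is not None and (user, active_project) in registry:
        return registry[(user, active_project)]  # 1. registry mapping wins
    if active_project == "default":
        return cert_role                         # 2. legacy cert fallback
    return None                                  # 3. deny: not assigned to project
```

A `None` result corresponds to the `user not assigned to active project` denial.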
+ +If the same human has multiple roles (for example `platform_admin` globally and `project_admin` in some projects), no explicit role-switch is required: +- Project-scoped job commands are authorized by the user's role in the active project +- Platform/global commands are authorized by `platform_admin` +- `platform_admin` alone does not imply project job-data permissions + +#### Job Operations + +| Command | project_admin | org_admin | lead | member | +|---------|:---:|:---:|:---:|:---:| +| `submit_job` | yes | no | yes | no | +| `list_jobs` | all in project | own-org jobs | own jobs | all in project | +| `get_job_meta` | all in project | own-org jobs | own jobs | all in project | +| `download_job` | all in project | own-org jobs | own jobs | no | +| `download_job_components` | all in project | own-org jobs | own jobs | no | +| `clone_job` | all in project | no | own jobs | no | +| `abort_job` | all in project | own-org jobs | own jobs | no | +| `delete_job` | all in project | own-org jobs | own jobs | no | +| `show_stats` | all in project | all in project | all in project | all in project | +| `show_errors` | all in project | all in project | all in project | all in project | +| `app_command` | all in project | own-org jobs | own jobs | no | +| `configure_job_log` | all in project | own-org jobs | own jobs | no | + +**"all in project"** = any job within the active project. +**"own-org jobs"** = jobs submitted by a user in the same org, within the active project. +**"own jobs"** = jobs submitted by this user, within the active project. + +#### Infrastructure Operations + +Since clients are shared across projects, **only `platform_admin` can perform client lifecycle operations** (restart, shutdown, remove). Disrupting a client affects all projects running on it. 
+ +| Command | platform_admin | project_admin | org_admin | lead | member | +|---------|:---:|:---:|:---:|:---:|:---:| +| `check_status` | all clients | project's clients (view) | own-org + project (view) | own-org + project (view) | project's clients (view) | +| `restart` | all | no | no | no | no | +| `shutdown` | all | no | no | no | no | +| `shutdown_system` | yes | no | no | no | no | +| `remove_client` | all | no | no | no | no | +| `sys_info` | all | project's clients | own-org + project | own-org + project | no | +| `report_resources` | all | project's clients | own-org + project | own-org + project | no | +| `report_env` | all | project's clients | own-org + project | own-org + project | no | + +#### Shell Commands + +| Command | platform_admin | project_admin | org_admin | lead | member | +|---------|:---:|:---:|:---:|:---:|:---:| +| `pwd`, `ls`, `cat`, `head`, `tail`, `grep` | all | project's clients | own-org + project | own-org + project | no | + +Shell command behavior needs deeper design discussion because parent-process and job-pod filesystems can diverge (including standard K8s setups). See Unresolved Questions. + +#### Session / Platform Commands + +| Command | platform_admin | project_admin | org_admin | lead | member | +|---------|:---:|:---:|:---:|:---:|:---:| +| `list_sessions` | all | project's sessions | no | no | no | +| `set_project` | any project | assigned projects | assigned projects | assigned projects | assigned projects | +| `list_projects` | all | assigned only | assigned only | assigned only | assigned only | +| `dead` | yes | no | no | no | no | + +--- + +## Authorization Enforcement + +Two layers, evaluated in order: + +1. **Project filter** (new): Is this resource in the user's active project? If no, invisible. +2. **RBAC policy** (existing): Does the user's project-role permit this operation on this resource? + +The existing `authorization.json` policy format is largely unchanged — project scoping happens above it. 
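The two-layer evaluation order above can be sketched as a single check; `policy` is a hypothetical `{role: allowed_actions}` stand-in for the existing `authorization.json` engine, not its real format:

```python
def authorize(user_project: str, user_role: str, job_project: str, action: str, policy: dict) -> bool:
    """Layer 1: project filter — resources outside the active project are invisible.
    Layer 2: existing RBAC policy, evaluated only for in-project resources."""
    if job_project != user_project:
        # Invisible, not "permission denied": out-of-project resources do not exist
        # from the caller's point of view.
        return False
    # Hypothetical stand-in for the authorization.json policy evaluation.
    return action in policy.get(user_role, set())
```

Because the project filter runs first, the RBAC layer never even sees out-of-project resources — which is why `authorization.json` needs no structural change.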
+ +--- + +## Provisioning Changes + +### project.yml + +The v4 schema uses three top-level sections with a deliberate separation of concerns: + +- **`sites`** — infrastructure participants (server, clients). Always present. Identity and trust are cert-based; these entries never go away. +- **`admins`** — human participants with per-platform and per-project roles. **Optional.** Omit entirely when using SSO (see [Future: SSO](#future-sso-for-human-users)); roles are then provided by IdP claims. +- **`projects`** — tenant definitions: which sites are enrolled (client-type entries), and (optionally) which admins have which roles. The `admins:` block inside each project is also omitted under SSO. + +This separation is intentional: `sites` and `projects.sites` form the **permanent skeleton** of the file. The `admins` sections are an **optional overlay** that exists today but disappears when SSO is introduced — with no restructuring of the rest of the file. + +```yaml +api_version: 4 + +# Infrastructure — always present, cert-based mTLS +sites: + server1.example.com: { type: server, org: nvidia } + hospital-a: { type: client, org: org_a } + hospital-b: { type: client, org: org_a } + hospital-c: { type: client, org: org_b } + +# Human admins — omit entirely when using SSO +admins: + platform-admin@nvidia.com: { org: nvidia, role: platform_admin } + trainer@org_a.com: { org: org_a } + viewer@org_b.com: { org: org_b } + +projects: + cancer-research: + sites: [hospital-a, hospital-b] + # Omit when using SSO (roles come from IdP claims) + admins: + trainer@org_a.com: lead + + multiple-sclerosis: + sites: [hospital-a, hospital-c] + admins: + trainer@org_a.com: member + viewer@org_b.com: lead +``` + +**SSO migration**: drop the top-level `admins:` block and the `admins:` entries inside each project. 
The rest of the file is unchanged: + +```yaml +api_version: 4 + +sites: + server1.example.com: { type: server, org: nvidia } + hospital-a: { type: client, org: org_a } + hospital-b: { type: client, org: org_a } + hospital-c: { type: client, org: org_b } + +projects: + cancer-research: + sites: [hospital-a, hospital-b] + multiple-sclerosis: + sites: [hospital-a, hospital-c] +``` + +### Certificate Changes + +Certs continue to encode identity (name, org) and role. **No change to cert format.** The `UNSTRUCTURED_NAME` role field remains populated and serves as the fallback for single-tenant mode. + +In multitenant mode (`api_version: 4`), per-project roles are resolved from the `ProjectRegistry` loaded from `project.yml` at server startup. The cert role is only used when no registry mapping exists (backward compat). + +### Startup Kit Changes + +- **Server startup kit** includes `project.yml` — the authoritative source for project definitions, client enrollment, and user roles +- **Admin startup kits** are unchanged (cert for identity; project membership is server-side knowledge) + +--- + +## Job Scheduler Changes + +The scheduler becomes project-aware: + +1. **Candidate filtering**: Only schedule jobs to sites enrolled in the job's project (client-type sites) +2. **Validation**: `deploy_map` sites must be a subset of the project's enrolled sites +3. **Quota/priority**: Deferred. K8s-level resource quotas per namespace may suffice initially. Future option: route different projects to different K8s scheduling queues via pod labels/nodeSelectors. + +--- + +## Runtime Isolation (ProdEnv) + +The project becomes a property of the job, and ProdEnv prepares the corresponding isolated environment. 
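Concretely, each launcher resolves its isolation target from the job's `project` metadata. A simplified, standalone sketch of the Docker workspace resolution added in this change (the real version, `DockerJobLauncher._resolve_docker_workspace`, also reads `NVFL_DOCKER_WORKSPACE` from the environment and verifies that the directory exists):

```python
import posixpath


def resolve_docker_workspace(workspace_root: str, project: str) -> str:
    """Map (workspace root, project) to the host directory mounted into the job
    container. Legacy/default jobs keep the existing root; only non-default
    projects get a project-specific subdirectory."""
    if project and project != "default":
        return posixpath.join(workspace_root, project)
    return workspace_root
```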
+ +### Subprocess (Default — Single-Tenant Only) + +- Job workspace isolated to `///` (logical separation only) +- **No physical isolation**: same Linux user, shared `/tmp`, shared filesystem, shared GPU memory +- **Not suitable for multi-tenant deployments** — use K8s, Docker, or Slurm for cross-project isolation +- Retained for single-tenant and trusted environments (e.g., single org, development, POC) + +### Docker + +- Per-project volume mounts: each project's jobs mount a **different host directory** (e.g., `/data//`) as the workspace +- Per-container `/tmp`: each container gets its own tmpfs or bind mount — no shared host `/tmp` +- Per-project Docker network (no cross-project container communication) +- Container name includes project: `--` + +### Kubernetes (Primary Target) + +Clients participate in all their enrolled projects. **Data isolation is achieved by mounting project-scoped workspace volumes in each job pod.** The Flare client parent process runs in its own pod (or on the node) and does not mount project data volumes — it only orchestrates job pod creation. + +| Concern | Mechanism | +|---------|-----------| +| Namespace isolation | Deployment-defined strategy (recommended: one namespace per project; supported: shared namespace or per-job namespace) | +| Storage isolation | Workspace volume resolved by `(project, client, job pod namespace)` (not hostPath) | +| Temp directory isolation | Each pod gets its own `/tmp` via `emptyDir` — no shared host `/tmp` | +| Network isolation | NetworkPolicy scoped by project name | +| Resource limits | ResourceQuota policy per deployment strategy (deferred, see Scheduler) | +| Pod security | PodSecurityPolicy/Standards per namespace | + +Workspace volume naming/provisioning must remain project-aware and work with either shared namespaces or per-job namespaces. 
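A sketch of how a launcher might resolve the workspace volume from `(project, client, namespace)` and wire in the per-pod `/tmp`; the `flare-ws-` claim-naming scheme is purely illustrative, not a fixed Flare convention:

```python
def resolve_workspace_volumes(project: str, client: str, namespace: str) -> dict:
    """Hypothetical resolution of a project-scoped workspace volume for a job pod.
    Returns a pod-spec fragment: a project-scoped PVC plus a per-pod emptyDir /tmp."""
    claim = f"flare-ws-{project}-{client}"  # illustrative naming scheme (an assumption)
    return {
        "namespace": namespace,
        "volumes": [
            {"name": "workspace", "persistentVolumeClaim": {"claimName": claim}},
            {"name": "tmp", "emptyDir": {}},  # no shared host /tmp across projects
        ],
        "volumeMounts": [
            {"name": "workspace", "mountPath": "/workspace"},
            {"name": "tmp", "mountPath": "/tmp"},
        ],
    }
```

Keying the claim on both project and client is what lets a shared client serve multiple projects without any volume overlap.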
+ +### Slurm + +- Per-project Slurm accounts/partitions +- Per-project storage paths +- Job submission includes `--account=` + +--- + +## FLARE API Changes + +- `Session` gains an optional `project` parameter (defaults to `None`) and `set_project()`/`list_projects()` methods +- `list_jobs` is filtered by active project and caller role (`project_admin`: all in project, `org_admin`: own-org, `lead`: own jobs, `member`: all in project) +- `get_system_info` returns only clients enrolled in the active project +- All job operations validate that the target job belongs to the active project + +--- + +## Audit Trail + +Every audit log entry gains a `project` field: + +``` +[2026-02-18 10:30:00] user=trainer@org_a.com project=cancer-research action=submit_job job_id=abc123 +[2026-02-18 10:31:00] user=trainer@org_a.com project=cancer-research action=list_jobs +``` + +Audit logs should be queryable per project for compliance. + +--- + +## Migration / Backward Compatibility + +1. **Phase 1 is ungated**: project plumbing (`project` argument + metadata propagation to launchers) is available independent of `api_version`. +2. **Feature gate for full multitenancy**: project registry, project-scoped RBAC, scheduler constraints, and job-store partitioning are enabled only when `project.yml` has `api_version: 4` with a `projects:` section. +3. **Default project**: all existing jobs, clients, and users are in the `default` project +4. **Cert role fallback**: if no project registry exists, fall back to cert-embedded role; if registry exists but user has no mapping, fallback applies only when active project is `default` +5. **API compatibility**: omitted `project` remains `None` (no default change) across phases +6. **Config version**: `api_version: 4` in `project.yml` signals full multi-project enforcement; version 3 continues to work as single-tenant + +### Release Transition Strategy (2.8 -> 2.9) + +1. 
**Upgrade to 2.8 (Phase 1 only)**: optional project tagging/plumbing is available, but no multitenant access-control or scheduler behavior changes are enabled. +2. **Upgrade to 2.9 with existing v3 deployments**: keep current `project.yml` (`api_version: 3`) and startup kits; system remains single-tenant/compatibility mode. +3. **Existing jobs continue to work**: legacy jobs remain at `jobs//` as `default`; no data migration is required. +4. **Activate full multi-project mode when ready**: deploy a v4 `project.yml` (`api_version: 4` + `projects:`) to server startup artifacts and restart server to load registry-backed project scoping. +5. **Provisioning impact**: no full reprovision is required; keep dynamic provisioning behavior by updating server-side artifacts and generating startup kits only for newly added or changed participants. + +--- + +## Design Decisions + +| # | Question | Decision | +|---|----------|----------| +| D1 | Can clients participate in multiple projects? | **Yes.** Clients participate in all enrolled projects simultaneously. Data isolation is physical: K8s mounts different PVs per project; Docker mounts different host directories. The Flare parent process does not access project data. | +| D2 | Project lifecycle management? | **Deferred.** Projects are defined at provisioning time in `project.yml`. Runtime project CRUD is not in scope for v1. | +| D3 | Per-project quota management? | **Deferred.** Rely on K8s ResourceQuota per namespace for now. Future: route projects to different K8s scheduling queues via pod labels. | +| D4 | `check_status` information leakage? | **Server has global knowledge, filtering the response is sufficient.** The server parent process knows about all clients and jobs; it filters responses to only include resources in the user's active project. No architectural change needed. | +| D5 | Server-side job store isolation? 
| **Server job pods must only access their project's data.** The server job process (running in K8s/Docker) must not mount the entire job store — only the project-partitioned slice for new-layout jobs (`jobs//...`). Legacy jobs remain at `jobs//` under `default`; they are served by the main server process for compatibility and are not mounted into new server job pods. Current `FilesystemStorage` will be replaced by a database or object store in the future, which will enforce project-scoped access natively. | +| D6 | Role storage: certs vs. server-side registry? | **Layered: registry overrides cert.** `project.yml` defines per-project roles; the server loads it at startup via `ProjectRegistry`. Certs continue to authenticate identity (name, org) and carry a role as fallback. No cert format change required. | +| D7 | How do shared clients know which project PV to mount? | **The launcher passes the project name to the client.** Job metadata carries the project; the server includes it when dispatching to clients. The client-side `K8sJobLauncher`/`DockerJobLauncher` uses the project name to select the correct PV/volume mount. | +| D8 | Cross-project isolation in subprocess mode? | **Subprocess mode is single-tenant/trusted only.** Only K8s, Docker, and Slurm launchers provide secure multi-tenant isolation (separate namespaces, volumes, `/tmp`). The default subprocess launcher offers no physical isolation and is only suitable for single-tenant or trusted environments. | +| D9 | Cross-project visibility for `platform_admin` job data? | **No.** `platform_admin` does not get cross-project job metadata/data visibility and there is no `list_jobs --all-projects` behavior in v1. If the same human also has a project-scoped role in the active project, only that project-scoped role grants job access. | +| D10 | Provisioning model at scale? 
| **Keep dynamic provisioning behavior.** Adding sites/users should not require reprovisioning all existing sites; update server-side config/startup artifacts and generate kits only for newly added or changed participants. | + +--- + +## Unresolved Questions + +1. **Shell-command replacement UX**: Parent-process shell commands are backend-dependent and cannot be relied on for job workspace access (notably in K8s, but this can happen in single-project setups too). The right policy and UX replacement need further study (for example log/artifact APIs vs pod-targeted debug workflows). + +--- + +## Future: SSO for Human Users + +The current design separates two kinds of participants that today are both managed via X.509 certs: + +- **Sites** (server, clients, relays) — infrastructure with stable identity, long-lived +- **Humans** (admins) — users who change roles, join/leave projects, need MFA + +In a future version, human authentication moves to a standard SSO system (OIDC/SAML) with short-lived tokens, while sites continue using mutual TLS with provisioned certs. + +| | Sites (v1 and future) | Humans (v1) | Humans (future) | +|---|---|---|---| +| **Authentication** | mTLS certs | mTLS certs | SSO (OIDC/SAML) tokens | +| **Identity source** | Cert CN + org | Cert CN + org | IdP claims | +| **Role source** | N/A | `project.yml` registry (cert fallback) | IdP claims or `project.yml` | +| **Lifecycle** | Provisioned, long-lived | Provisioned, long-lived | IdP-managed, dynamic | +| **Startup kit** | Yes (certs, config) | Yes (certs, config) | No — just a login URL | + +**Why this matters for v1 design decisions:** + +The server-side `ProjectRegistry` (loaded from `project.yml`) is the right abstraction because it decouples role resolution from the cert. Today the registry overrides the cert role; in the future, the registry (or IdP) replaces the cert entirely for humans. The same `ProjectRegistry` interface can be backed by `project.yml` now and by an IdP adapter later. 
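The decoupling argument above suggests an interface along these lines — a sketch under the assumption of a minimal role-lookup contract, not the actual `ProjectRegistry` API (method and class names here are assumptions):

```python
from abc import ABC, abstractmethod
from typing import Optional


class ProjectRegistry(ABC):
    """Sketch of the role-resolution interface (names are assumptions)."""

    @abstractmethod
    def get_role(self, user: str, project: str) -> Optional[str]:
        ...


class YamlProjectRegistry(ProjectRegistry):
    """v1 backend: per-project roles from the parsed project.yml 'projects:' section."""

    def __init__(self, projects: dict):
        # e.g. {"cancer-research": {"admins": {"trainer@org_a.com": "lead"}}}
        self._projects = projects

    def get_role(self, user: str, project: str) -> Optional[str]:
        return self._projects.get(project, {}).get("admins", {}).get(user)


# A future IdP-backed registry would implement the same interface,
# resolving roles from token claims instead of project.yml.
```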
+ +This also means per-project startup kits for humans (alternative approach considered) would be a dead end — SSO eliminates admin certs entirely, so building around per-project certs for humans would be throwaway work. + +The v4 `project.yml` schema is designed with this migration in mind: the `admins:` section (top-level and per-project) is explicitly optional. A deployment using SSO simply omits it; the `sites:` and `projects:` skeleton is identical in both modes. No schema version bump or file restructuring is needed when migrating to SSO. + +--- + +## Implementation + +See [multiproject_implementation.md](multiproject_implementation.md) for the full implementation plan. + +--- + +## Phase 1: Minimal Project Plumbing + +Phase 1 delivers no access control, no job store partitioning, and no cert/registry changes. The sole goal is to thread the `project` name from user-facing APIs into the runtime launchers so K8s and Docker can mount the correct volume/directory. + +### Scope + +1. Add `project: Optional[str] = None` parameter to `ProdEnv`. +2. Pass `project` through to the job metadata at submission/clone time, with syntax validation before persistence. +3. `K8sJobLauncher` reads `project` from job metadata and selects the corresponding project workspace volume. +4. `DockerJobLauncher` reads `project` from job metadata and mounts `/data//` as the workspace volume. +5. No changes to authorization, job store paths, `project.yml`, scheduler, or any other component. + +### What this enables + +- Data scientists can tag jobs with a project and get physical data isolation on K8s/Docker immediately. +- Lays the plumbing for all subsequent phases without requiring a full multitenancy deployment. + +### What this does NOT do + +- No access control — any user can submit to any valid project name. +- No job store partitioning (`jobs//` path unchanged). +- No `project.yml` parsing or `ProjectRegistry`. +- No `set_project` / `list_projects` admin commands. 
+- Subprocess launcher unchanged (single-tenant/trusted only). diff --git a/nvflare/apis/job_def.py b/nvflare/apis/job_def.py index 56faae4fcb..be9f9b3062 100644 --- a/nvflare/apis/job_def.py +++ b/nvflare/apis/job_def.py @@ -76,6 +76,7 @@ class JobMetaKey(str, Enum): CUSTOM_PROPS = "custom_props" EDGE_METHOD = "edge_method" JOB_CLIENTS = "job_clients" # clients that participated the job + PROJECT = "project" def __repr__(self): return self.value diff --git a/nvflare/apis/utils/format_check.py b/nvflare/apis/utils/format_check.py index 41c11ea600..5f91584ac1 100644 --- a/nvflare/apis/utils/format_check.py +++ b/nvflare/apis/utils/format_check.py @@ -27,6 +27,7 @@ "email": r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}$", "org": r"^[A-Za-z0-9_]+$", "simple_name": r"^[A-Za-z0-9_]+$", + "project": r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$", } diff --git a/nvflare/app_opt/job_launcher/docker_launcher.py b/nvflare/app_opt/job_launcher/docker_launcher.py index 8779e4711d..649431f9ea 100644 --- a/nvflare/app_opt/job_launcher/docker_launcher.py +++ b/nvflare/app_opt/job_launcher/docker_launcher.py @@ -20,6 +20,7 @@ from nvflare.apis.event_type import EventType from nvflare.apis.fl_constant import FLContextKey, JobConstants from nvflare.apis.fl_context import FLContext +from nvflare.apis.job_def import JobMetaKey from nvflare.apis.job_launcher_spec import JobHandleSpec, JobLauncherSpec, JobReturnCode, add_launcher from nvflare.apis.workspace import Workspace from nvflare.utils.job_launcher_utils import extract_job_image, generate_client_command, generate_server_command @@ -104,6 +105,23 @@ def __init__(self, mount_path: str = "/workspace", network: str = "nvflare-netwo self.network = network self.timeout = timeout + def _resolve_docker_workspace(self, job_id: str, project) -> str: + docker_workspace = os.environ.get("NVFL_DOCKER_WORKSPACE") + if not docker_workspace: + self.logger.error(f"Failed to launch job {job_id}: NVFL_DOCKER_WORKSPACE is not set.") + return "" + + 
# Keep legacy jobs on the existing workspace root; only non-default projects + # get a project-specific subdirectory under the configured Docker workspace. + if isinstance(project, str) and project and project != "default": + docker_workspace = os.path.join(docker_workspace, project) + + if not os.path.isdir(docker_workspace): + self.logger.error(f"Failed to launch job {job_id}: Docker workspace does not exist: {docker_workspace}") + return "" + + return docker_workspace + def launch_job(self, job_meta: dict, fl_ctx: FLContext) -> JobHandleSpec: self.logger.debug("DockerJobLauncher start to launch job") job_image = extract_job_image(job_meta, fl_ctx.get_identity_name()) @@ -117,8 +135,13 @@ def launch_job(self, job_meta: dict, fl_ctx: FLContext) -> JobHandleSpec: command = f' /bin/bash -c "export PYTHONPATH={python_path};{cmd}"' self.logger.info(f"Launch image:{job_image}, run command: {command}") - docker_workspace = os.environ.get("NVFL_DOCKER_WORKSPACE") - self.logger.info(f"launch_job {job_id} in docker_workspace: {docker_workspace}") + project = job_meta.get(JobMetaKey.PROJECT.value, "") + docker_workspace = self._resolve_docker_workspace(job_id, project) + if not docker_workspace: + return None + + self.logger.info(f"launch_job {job_id} in docker_workspace: {docker_workspace} (project={project})") + docker_client = docker.from_env() try: container = docker_client.containers.run( diff --git a/nvflare/app_opt/job_launcher/k8s_launcher.py b/nvflare/app_opt/job_launcher/k8s_launcher.py index 19e6716bc2..9410cab4ec 100644 --- a/nvflare/app_opt/job_launcher/k8s_launcher.py +++ b/nvflare/app_opt/job_launcher/k8s_launcher.py @@ -228,6 +228,13 @@ def launch_job(self, job_meta: dict, fl_ctx: FLContext) -> JobHandleSpec: raise RuntimeError(f"missing {FLContextKey.JOB_PROCESS_ARGS} in FLContext") _, job_cmd = job_args[JobProcessArgs.EXE_MODULE] + # TODO: Make the K8s launcher project-aware with minimal code churn. 
+ # The intended change is only to read the optional job_meta["project"] + # and use it to resolve project-specific Kubernetes settings before pod + # launch. That settings lookup may include workspace volume/path plus + # any other K8s deployment settings required for the selected project. + # Keep the existing launch flow unchanged; only the settings resolution + # should become project-aware. job_config = { "name": job_id, "image": job_image, diff --git a/nvflare/fuel/flare_api/flare_api.py b/nvflare/fuel/flare_api/flare_api.py index 14b5624228..5d9ddb6cc6 100644 --- a/nvflare/fuel/flare_api/flare_api.py +++ b/nvflare/fuel/flare_api/flare_api.py @@ -19,6 +19,7 @@ from nvflare.apis.fl_constant import AdminCommandNames from nvflare.apis.job_def import JobMetaKey +from nvflare.apis.utils.format_check import name_check from nvflare.apis.workspace import Workspace from nvflare.fuel.common.excepts import ConfigError from nvflare.fuel.hci.client.api import AdminAPI, APIStatus, ResultKey @@ -69,6 +70,7 @@ def __init__( startup_path: str, secure_mode: bool = True, debug: bool = False, + project: Optional[str] = None, ): """Initializes a session with the NVFLARE system. 
@@ -77,11 +79,11 @@ def __init__(
             startup_path (str): path to the provisioned startup kit, which contains endpoint of the system
             secure_mode (bool): whether to log in with secure mode
             debug (bool): turn on debug or not
+            project (Optional[str]): project name to tag submitted/cloned jobs; None keeps existing behavior
         """
         assert isinstance(username, str), "username must be str"
         assert isinstance(startup_path, str), "startup_path must be str"
         assert os.path.isdir(startup_path), f"startup kit does not exist at {startup_path}"
-
         workspace = Workspace(root_dir=startup_path)
         conf = secure_load_admin_config(workspace)
         admin_config = conf.get_admin_config()
@@ -105,6 +107,11 @@ def __init__(
         )
         self.upload_dir = upload_dir
         self.download_dir = download_dir
+        if project is not None:
+            err, reason = name_check(project, "project")
+            if err:
+                raise ValueError(reason)
+        self._project = project

     def close(self):
         """Close the session."""
@@ -209,7 +216,8 @@ def clone_job(self, job_id: str) -> str:
         """
         self._validate_job_id(job_id)

-        result = self._do_command(AdminCommandNames.CLONE_JOB + " " + job_id)
+        props = {JobMetaKey.PROJECT.value: self._project} if self._project else None
+        result = self._do_command(AdminCommandNames.CLONE_JOB + " " + job_id, props=props)
         meta = result[ResultKey.META]
         job_id = meta.get(MetaKey.JOB_ID, None)
         info = meta.get(MetaKey.INFO, "")
@@ -241,7 +249,8 @@ def submit_job(self, job_definition_path: str) -> str:
         else:
             raise InvalidJobDefinition(f"job_definition_path '{job_definition_path}' is not a valid folder")

-        result = self._do_command(AdminCommandNames.SUBMIT_JOB + " " + job_definition_path)
+        props = {JobMetaKey.PROJECT.value: self._project} if self._project else None
+        result = self._do_command(AdminCommandNames.SUBMIT_JOB + " " + job_definition_path, props=props)
         meta = result[ResultKey.META]
         job_id = meta.get(MetaKey.JOB_ID, None)
         if not job_id:
@@ -935,13 +944,26 @@ def new_session(
     secure_mode: bool = True,
     debug: bool = False,
     timeout: float = 10.0,
+    project: Optional[str] = None,
 ) -> Session:
-    session = Session(username=username, startup_path=startup_kit_location, debug=debug, secure_mode=secure_mode)
+    session = Session(
+        username=username,
+        startup_path=startup_kit_location,
+        debug=debug,
+        secure_mode=secure_mode,
+        project=project,
+    )
     session.try_connect(timeout)
     return session


-def new_secure_session(username: str, startup_kit_location: str, debug: bool = False, timeout: float = 10.0) -> Session:
+def new_secure_session(
+    username: str,
+    startup_kit_location: str,
+    debug: bool = False,
+    timeout: float = 10.0,
+    project: Optional[str] = None,
+) -> Session:
     """Create a new secure FLARE API session with the NVFLARE system.

     Args:
@@ -949,20 +971,27 @@ def new_secure_session(username: str, startup_kit_location: str, debug: bool = F
         startup_kit_location (str): path to the provisioned startup folder, the root admin dir containing the
             startup folder
         debug (bool): enable debug mode
         timeout (float): how long to try to establish the session, in seconds
+        project (Optional[str]): project name to tag submitted/cloned jobs

     Returns:
         a Session object

     """
-    return new_session(username, startup_kit_location, True, debug, timeout)
+    return new_session(username, startup_kit_location, True, debug, timeout, project=project)


-def new_insecure_session(startup_kit_location: str, debug: bool = False, timeout: float = 10.0) -> Session:
+def new_insecure_session(
+    startup_kit_location: str,
+    debug: bool = False,
+    timeout: float = 10.0,
+    project: Optional[str] = None,
+) -> Session:
     """Create a new insecure FLARE API session with the NVFLARE system.

     Args:
         startup_kit_location (str): path to the provisioned startup folder
         debug (bool): enable debug mode
         timeout (float): how long to try to establish the session, in seconds
+        project (Optional[str]): project name to tag submitted/cloned jobs

     Returns:
         a Session object
@@ -970,5 +999,10 @@ def new_insecure_session(startup_kit_location: str, debug: bool = False, timeout

     """
     return new_session(
-        username="", startup_kit_location=startup_kit_location, secure_mode=False, debug=debug, timeout=timeout
+        username="",
+        startup_kit_location=startup_kit_location,
+        secure_mode=False,
+        debug=debug,
+        timeout=timeout,
+        project=project,
     )
diff --git a/nvflare/private/fed/server/job_cmds.py b/nvflare/private/fed/server/job_cmds.py
index 36058c5818..ab799f84d7 100644
--- a/nvflare/private/fed/server/job_cmds.py
+++ b/nvflare/private/fed/server/job_cmds.py
@@ -25,6 +25,7 @@
 from nvflare.apis.job_def_manager_spec import JobDefManagerSpec, RunStatus
 from nvflare.apis.shareable import Shareable
 from nvflare.apis.storage import DATA, JOB_ZIP, META, META_JSON, WORKSPACE, WORKSPACE_ZIP, StorageSpec
+from nvflare.apis.utils.format_check import name_check
 from nvflare.fuel.hci.conn import Connection
 from nvflare.fuel.hci.proto import ConfirmMethod, MetaKey, MetaStatusValue, make_meta
 from nvflare.fuel.hci.reg import CommandModule, CommandModuleSpec, CommandSpec
@@ -53,8 +54,11 @@
     JobMetaKey.MIN_CLIENTS.value,
     JobMetaKey.MANDATORY_CLIENTS.value,
     JobMetaKey.DATA_STORAGE_FORMAT.value,
+    JobMetaKey.PROJECT.value,
 }

+PROJECT_CMD_PROP_KEY = JobMetaKey.PROJECT.value
+

 def _create_list_job_cmd_parser():
     parser = SafeArgumentParser(prog=AdminCommandNames.LIST_JOBS)
@@ -78,6 +82,34 @@
     def __init__(self):
         super().__init__()
         self.logger = get_obj_logger(self)

+    @staticmethod
+    def _add_project_to_meta(meta: dict, conn: Connection) -> bool:
+        """Validate optional project from command props and persist it into job metadata."""
+
+        cmd_props = conn.get_prop(ConnProps.CMD_PROPS)
+        project = ""
+        error = ""
+
+        if isinstance(cmd_props, dict):
+            candidate = cmd_props.get(PROJECT_CMD_PROP_KEY)
+            if candidate:
+                if not isinstance(candidate, str):
+                    error = f"project must be str but got {type(candidate)}"
+                else:
+                    invalid, reason = name_check(candidate, "project")
+                    if invalid:
+                        error = reason
+                    else:
+                        project = candidate
+
+        if error:
+            conn.append_error(error, meta=make_meta(MetaStatusValue.INVALID_JOB_DEFINITION, error))
+            return False
+
+        if project:
+            meta[JobMetaKey.PROJECT.value] = project
+        return True
+
     def get_spec(self):
         return CommandModuleSpec(
             name="job_mgmt",
@@ -502,6 +534,8 @@ def clone_job(self, conn: Connection, args: List[str]):
                 job_meta[JobMetaKey.SUBMITTER_ORG.value] = conn.get_prop(ConnProps.USER_ORG)
                 job_meta[JobMetaKey.SUBMITTER_ROLE.value] = conn.get_prop(ConnProps.USER_ROLE)
                 job_meta[JobMetaKey.CLONED_FROM.value] = job_id
+                if not self._add_project_to_meta(job_meta, conn):
+                    return

                 meta = job_def_manager.clone(from_jid=job_id, meta=job_meta, fl_ctx=fl_ctx)
                 new_job_id = meta.get(JobMetaKey.JOB_ID)
@@ -583,6 +617,9 @@ def submit_job(self, conn: Connection, args: List[str]):
                     f"job_def_manager in engine is not of type JobDefManagerSpec, but got {type(job_def_manager)}"
                 )

+            if not self._add_project_to_meta(meta, conn):
+                return
+
             fl_ctx.set_prop(FLContextKey.JOB_META, meta, private=True, sticky=False)
             engine.fire_event(EventType.SUBMIT_JOB, fl_ctx)
             block_reason = fl_ctx.get_prop(FLContextKey.JOB_BLOCK_REASON)
diff --git a/nvflare/recipe/prod_env.py b/nvflare/recipe/prod_env.py
index de7dc6f9f9..030a7f6c8a 100644
--- a/nvflare/recipe/prod_env.py
+++ b/nvflare/recipe/prod_env.py
@@ -18,6 +18,7 @@
 from pydantic import BaseModel, PositiveFloat, model_validator

+from nvflare.apis.utils.format_check import name_check
 from nvflare.job_config.api import FedJob
 from nvflare.recipe.spec import ExecEnv
 from nvflare.recipe.utils import _collect_non_local_scripts

@@ -34,11 +35,16 @@ class _ProdEnvValidator(BaseModel):
     startup_kit_location: str
     login_timeout: PositiveFloat = 5.0
     username: str = DEFAULT_ADMIN_USER
+    project: Optional[str] = None

     @model_validator(mode="after")
     def check_startup_kit_location_exists(self) -> "_ProdEnvValidator":
         if not os.path.exists(self.startup_kit_location):
             raise ValueError(f"startup_kit_location path does not exist: {self.startup_kit_location}")
+        if self.project is not None:
+            err, reason = name_check(self.project, "project")
+            if err:
+                raise ValueError(reason)
         return self

@@ -48,6 +54,7 @@ def __init__(
         startup_kit_location: str,
         login_timeout: float = 5.0,
         username: str = DEFAULT_ADMIN_USER,
+        project: Optional[str] = None,
         extra: Optional[dict] = None,
     ):
         """Production execution environment for submitting and monitoring NVFlare jobs.
@@ -58,6 +65,7 @@ def __init__(
             startup_kit_location (str): Path to the admin's startup kit directory.
             login_timeout (float): Timeout (in seconds) for logging into the Flare API session. Must be > 0.
             username (str): Username to log in with.
+            project (Optional[str]): Project name to tag submitted/cloned jobs.
             extra: extra env info.
         """
         super().__init__(extra)
@@ -66,11 +74,13 @@ def __init__(
             startup_kit_location=startup_kit_location,
             login_timeout=login_timeout,
             username=username,
+            project=project,
         )

         self.startup_kit_location = v.startup_kit_location
         self.login_timeout = v.login_timeout
         self.username = v.username
+        self.project = v.project
         self._session_manager = None  # Lazy initialization

     def get_job_status(self, job_id: str) -> Optional[str]:
@@ -103,6 +113,7 @@ def _get_session_manager(self):
                 "username": self.username,
                 "startup_kit_location": self.startup_kit_location,
                 "timeout": self.login_timeout,
+                "project": self.project,
             }
             self._session_manager = SessionManager(session_params)
         return self._session_manager
diff --git a/tests/unit_test/apis/utils/format_check_test.py b/tests/unit_test/apis/utils/format_check_test.py
index e90b565503..824d37c010 100644
--- a/tests/unit_test/apis/utils/format_check_test.py
+++ b/tests/unit_test/apis/utils/format_check_test.py
@@ -55,3 +55,23 @@ def test_admin(self, name, err_value):
         assert err == err_value
         err, reason = name_check(name, "email")
         assert err == err_value
+
+    @pytest.mark.parametrize(
+        "name, err_value",
+        [
+            ["default", False],
+            ["cancer-research", False],
+            ["a", False],
+            ["a" * 63, False],
+            ["", True],
+            ["A", True],
+            ["abc_", True],
+            ["-abc", True],
+            ["abc-", True],
+            ["a" * 64, True],
+            ["with space", True],
+        ],
+    )
+    def test_project(self, name, err_value):
+        err, reason = name_check(name, "project")
+        assert err == err_value
diff --git a/tests/unit_test/app_opt/docker_launcher_test.py b/tests/unit_test/app_opt/docker_launcher_test.py
new file mode 100644
index 0000000000..cd63793d24
--- /dev/null
+++ b/tests/unit_test/app_opt/docker_launcher_test.py
@@ -0,0 +1,116 @@
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from unittest.mock import Mock
+
+from nvflare.apis.fl_constant import FLContextKey, JobConstants, ReservedKey
+from nvflare.apis.fl_context import FLContext
+from nvflare.apis.job_def import JobMetaKey
+from nvflare.app_opt.job_launcher.docker_launcher import DockerJobHandle, DockerJobLauncher
+
+
+class _DummyWorkspace:
+    def get_app_custom_dir(self, job_id):
+        return ""
+
+
+class _DummyDockerLauncher(DockerJobLauncher):
+    def get_command(self, job_meta, fl_ctx) -> (str, str):
+        return "test-container", "python worker.py"
+
+
+def _make_fl_ctx():
+    fl_ctx = FLContext()
+    fl_ctx.set_prop(FLContextKey.WORKSPACE_OBJECT, _DummyWorkspace(), private=True, sticky=False)
+    fl_ctx.set_prop(ReservedKey.IDENTITY_NAME, "server", private=True, sticky=False)
+    return fl_ctx
+
+
+def _make_job_meta(project=""):
+    job_meta = {
+        JobConstants.JOB_ID: "job-1",
+        JobMetaKey.DEPLOY_MAP.value: {
+            "app": [
+                {
+                    JobConstants.SITES: ["server"],
+                    JobConstants.JOB_IMAGE: "nvflare:test",
+                }
+            ]
+        },
+    }
+    if project:
+        job_meta[JobMetaKey.PROJECT.value] = project
+    return job_meta
+
+
+def test_launch_job_returns_none_when_workspace_env_missing(monkeypatch):
+    launcher = _DummyDockerLauncher()
+    fl_ctx = _make_fl_ctx()
+    job_meta = _make_job_meta(project="cancer-research")
+    docker_from_env = Mock()
+
+    monkeypatch.delenv("NVFL_DOCKER_WORKSPACE", raising=False)
+    monkeypatch.setattr("nvflare.app_opt.job_launcher.docker_launcher.docker.from_env", docker_from_env)
+
+    handle = launcher.launch_job(job_meta, fl_ctx)
+
+    assert handle is None
+    docker_from_env.assert_not_called()
+
+
+def test_launch_job_returns_none_when_project_workspace_missing(monkeypatch, tmp_path):
+    launcher = _DummyDockerLauncher()
+    fl_ctx = _make_fl_ctx()
+    job_meta = _make_job_meta(project="cancer-research")
+    docker_from_env = Mock()
+
+    workspace_root = tmp_path / "workspace"
+    workspace_root.mkdir()
+    monkeypatch.setenv("NVFL_DOCKER_WORKSPACE", str(workspace_root))
+    monkeypatch.setattr("nvflare.app_opt.job_launcher.docker_launcher.docker.from_env", docker_from_env)
+
+    handle = launcher.launch_job(job_meta, fl_ctx)
+
+    assert handle is None
+    docker_from_env.assert_not_called()
+
+
+def test_launch_job_uses_project_workspace_when_present(monkeypatch, tmp_path):
+    launcher = _DummyDockerLauncher()
+    fl_ctx = _make_fl_ctx()
+    job_meta = _make_job_meta(project="cancer-research")
+
+    workspace_root = tmp_path / "workspace"
+    project_workspace = workspace_root / "cancer-research"
+    project_workspace.mkdir(parents=True)
+
+    fake_container = Mock()
+    fake_container.id = "container-id"
+    fake_client = Mock()
+    fake_client.containers.run.return_value = fake_container
+
+    monkeypatch.setenv("NVFL_DOCKER_WORKSPACE", str(workspace_root))
+    monkeypatch.setattr("nvflare.app_opt.job_launcher.docker_launcher.docker.from_env", Mock(return_value=fake_client))
+    monkeypatch.setattr(DockerJobHandle, "enter_states", Mock(return_value=True))
+
+    handle = launcher.launch_job(job_meta, fl_ctx)
+
+    assert isinstance(handle, DockerJobHandle)
+    fake_client.containers.run.assert_called_once()
+    assert fake_client.containers.run.call_args.kwargs["volumes"] == {
+        str(project_workspace): {
+            "bind": launcher.mount_path,
+            "mode": "rw",
+        }
+    }
diff --git a/tests/unit_test/fuel/flare_api/flare_api_project_test.py b/tests/unit_test/fuel/flare_api/flare_api_project_test.py
new file mode 100644
index 0000000000..f2c9aa4714
--- /dev/null
+++ b/tests/unit_test/fuel/flare_api/flare_api_project_test.py
@@ -0,0 +1,77 @@
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+from unittest.mock import patch
+
+from nvflare.fuel.flare_api.flare_api import Session, new_secure_session
+from nvflare.fuel.hci.client.api import ResultKey
+from nvflare.fuel.hci.proto import MetaKey
+
+
+def _make_session_for_project(project):
+    session = Session.__new__(Session)
+    session.upload_dir = "/tmp"
+    session._project = project
+    return session
+
+
+def test_submit_job_sends_project_cmd_props():
+    session = _make_session_for_project("cancer-research")
+    captured = {}
+
+    def _fake_do_command(command, enforce_meta=True, props=None):
+        captured["props"] = props
+        return {ResultKey.META: {MetaKey.JOB_ID: "job-1"}}
+
+    session._do_command = _fake_do_command
+    with patch("os.path.isdir", return_value=True):
+        session.submit_job("/tmp/job")
+
+    assert captured["props"] == {"project": "cancer-research"}
+
+
+def test_clone_job_sends_project_cmd_props():
+    session = _make_session_for_project("multiple-sclerosis")
+    captured = {}
+
+    def _fake_do_command(command, enforce_meta=True, props=None):
+        captured["props"] = props
+        return {ResultKey.META: {MetaKey.JOB_ID: "job-2"}}
+
+    session._do_command = _fake_do_command
+    session.clone_job("source-job")
+
+    assert captured["props"] == {"project": "multiple-sclerosis"}
+
+
+def test_submit_job_without_project_keeps_cmd_props_empty():
+    session = _make_session_for_project(None)
+    captured = {}
+
+    def _fake_do_command(command, enforce_meta=True, props=None):
+        captured["props"] = props
+        return {ResultKey.META: {MetaKey.JOB_ID: "job-3"}}
+
+    session._do_command = _fake_do_command
+    with patch("os.path.isdir", return_value=True):
+        session.submit_job("/tmp/job")
+
+    assert captured["props"] is None
+
+
+def test_new_secure_session_forwards_project():
+    with patch("nvflare.fuel.flare_api.flare_api.new_session") as mock_new_session:
+        new_secure_session("admin@nvidia.com", "/tmp/kit", project="cancer-research")
+        _, kwargs = mock_new_session.call_args
+        assert kwargs["project"] == "cancer-research"
diff --git a/tests/unit_test/private/fed/server/job_cmds_test.py b/tests/unit_test/private/fed/server/job_cmds_test.py
index a5b4290495..46ab57fce7 100644
--- a/tests/unit_test/private/fed/server/job_cmds_test.py
+++ b/tests/unit_test/private/fed/server/job_cmds_test.py
@@ -16,7 +16,12 @@

 import pytest

-from nvflare.private.fed.server.job_cmds import _create_list_job_cmd_parser
+from nvflare.apis.event_type import EventType
+from nvflare.apis.fl_constant import FLContextKey
+from nvflare.apis.job_def import JobMetaKey
+from nvflare.fuel.hci.server.constants import ConnProps
+from nvflare.private.fed.server import job_cmds as job_cmds_module
+from nvflare.private.fed.server.job_cmds import JobCommandModule, _create_list_job_cmd_parser

 TEST_CASES = [
     (
@@ -41,3 +46,112 @@ def test_parse_args(self, args: list[str], expected_args):
         parser = _create_list_job_cmd_parser()
         parsed_args = parser.parse_args(args)
         assert parsed_args == expected_args
+
+
+class _MockConnection:
+    def __init__(self, cmd_props=None, app_ctx=None, props=None):
+        self._props = dict(props or {})
+        self._props.setdefault(ConnProps.CMD_PROPS, cmd_props)
+        self.app_ctx = app_ctx
+        self.errors = []
+        self.strings = []
+        self.successes = []
+
+    def get_prop(self, key, default=None):
+        return self._props.get(key, default)
+
+    def append_error(self, msg, meta=None):
+        self.errors.append((msg, meta))
+
+    def append_string(self, msg, meta=None):
+        self.strings.append((msg, meta))
+
+    def append_success(self, msg, meta=None):
+        self.successes.append((msg, meta))
+
+
+class TestProjectCmdProps:
+    @pytest.mark.parametrize(
+        "cmd_props, expected_meta",
+        [
+            (None, {}),
+            ("not-a-dict", {}),
+            ({}, {}),
+            ({"project": ""}, {}),
+            ({"project": "cancer-research"}, {"project": "cancer-research"}),
+            ({"project": "default"}, {"project": "default"}),
+        ],
+    )
+    def test_add_project_to_meta(self, cmd_props, expected_meta):
+        conn = _MockConnection(cmd_props=cmd_props)
+        meta = {}
+
+        assert JobCommandModule._add_project_to_meta(meta, conn) is True
+        assert meta == expected_meta
+        assert conn.errors == []
+
+    @pytest.mark.parametrize("project", [123, "Bad Project", " cancer-research ", "../escape"])
+    def test_add_project_to_meta_rejects_invalid_values(self, project):
+        conn = _MockConnection(cmd_props={"project": project})
+        meta = {}
+
+        assert JobCommandModule._add_project_to_meta(meta, conn) is False
+        assert meta == {}
+        assert len(conn.errors) == 1
+
+
+class _FakeJobMetaValidator:
+    def validate(self, folder_name, zip_file_name):
+        assert folder_name == "job_folder"
+        assert zip_file_name == "job.zip"
+        return True, "", {}
+
+
+class _FakeJobDefManager:
+    def __init__(self):
+        self.created_meta = None
+
+    def create(self, meta, uploaded_content, fl_ctx):
+        self.created_meta = dict(meta)
+        result = dict(meta)
+        result[JobMetaKey.JOB_ID.value] = "new-job-id"
+        return result
+
+
+class _FakeEngine:
+    def __init__(self):
+        self.job_def_manager = _FakeJobDefManager()
+        self.submit_event_meta = None
+
+    def new_context(self):
+        from nvflare.apis.fl_context import FLContext
+
+        return FLContext()
+
+    def fire_event(self, event_type, fl_ctx):
+        assert event_type == EventType.SUBMIT_JOB
+        self.submit_event_meta = dict(fl_ctx.get_prop(FLContextKey.JOB_META, {}))
+
+
+def test_submit_job_exposes_project_in_submit_event(monkeypatch):
+    monkeypatch.setattr(job_cmds_module, "JobMetaValidator", _FakeJobMetaValidator)
+    monkeypatch.setattr(job_cmds_module, "JobDefManagerSpec", object)
+
+    engine = _FakeEngine()
+    conn = _MockConnection(
+        app_ctx=engine,
+        props={
+            ConnProps.FILE_LOCATION: "job.zip",
+            ConnProps.CMD_PROPS: {"project": "cancer-research"},
+            ConnProps.USER_NAME: "submitter",
+            ConnProps.USER_ORG: "org",
+            ConnProps.USER_ROLE: "role",
+        },
+    )
+
+    JobCommandModule().submit_job(conn, ["submit_job", "job_folder"])
+
+    assert conn.errors == []
+    assert len(conn.successes) == 1
+    assert engine.submit_event_meta == {JobMetaKey.PROJECT.value: "cancer-research"}
+    assert engine.job_def_manager.created_meta[JobMetaKey.PROJECT.value] == "cancer-research"
diff --git a/tests/unit_test/recipe/prod_env_test.py b/tests/unit_test/recipe/prod_env_test.py
new file mode 100644
index 0000000000..64ac03397e
--- /dev/null
+++ b/tests/unit_test/recipe/prod_env_test.py
@@ -0,0 +1,35 @@
+# Copyright (c) 2026, NVIDIA CORPORATION. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import tempfile
+from unittest.mock import patch
+
+import pytest
+
+from nvflare.recipe.prod_env import ProdEnv
+
+
+def test_prod_env_session_manager_passes_project():
+    with tempfile.TemporaryDirectory() as startup_kit_location:
+        env = ProdEnv(startup_kit_location=startup_kit_location, project="cancer-research")
+        with patch("nvflare.recipe.prod_env.SessionManager") as mock_session_manager:
+            env._get_session_manager()
+            session_params = mock_session_manager.call_args[0][0]
+            assert session_params["project"] == "cancer-research"
+
+
+def test_prod_env_rejects_invalid_project_name():
+    with tempfile.TemporaryDirectory() as startup_kit_location:
+        with pytest.raises(ValueError):
+            ProdEnv(startup_kit_location=startup_kit_location, project="Bad Project")