This file provides guidance to AI agents when working with the OSMO codebase.
Overview
OSMO is a workflow orchestration platform for Physical AI, managing heterogeneous Kubernetes clusters for training, simulation, and edge compute workloads.
Workflow Requirements
Before making any code changes in this repo, you MUST:
Explore first: Use the Codebase Structure section below to orient yourself, then read relevant source files before proposing changes. Read existing implementations, tests, and related modules. Never modify code you haven't read.
Plan before implementing: For any non-trivial change (more than a simple one-line fix), create an explicit plan that identifies:
Which files need to change and why
How the change fits with existing patterns in the codebase
What tests exist and what new tests are needed
Any cross-cutting concerns (e.g., auth, storage backends, IPC protocols)
A verification plan: how to confirm the change works (e.g., specific tests to run, build commands, manual checks)
Check for downstream impact: This is a multi-service platform — changes in shared libraries (lib/, utils/) can affect multiple services. Grep for usages before modifying shared code.
Verify after implementation: After completing changes, execute the verification plan — run the relevant tests/builds and confirm they pass before claiming the work is done. Never assert success without evidence.
Simplify before committing: Review your changes for unnecessary complexity, redundancy, and over-engineering before committing. Prefer the simplest solution that meets the requirements.
Update documentation: If adding, removing, or renaming a service, module, or major component, update the "Codebase Structure" section in this file as part of the same change.
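The downstream-impact check in the list above can be approximated with a small script when plain grep output gets noisy. The following is a minimal sketch and not part of the OSMO repo: the function names imports_module and find_usages are hypothetical, and the regular expression only catches ordinary import lines in Python files.

```python
import pathlib
import re


def imports_module(source: str, module_name: str) -> bool:
    """Return True if an import line in source mentions module_name as a whole word."""
    pattern = re.compile(
        rf"^[ \t]*(?:from|import)[ \t]+.*\b{re.escape(module_name)}\b",
        re.MULTILINE,
    )
    return pattern.search(source) is not None


def find_usages(root: pathlib.Path, module_name: str) -> list[pathlib.Path]:
    """List Python files under root whose import lines mention module_name."""
    return sorted(
        path
        for path in root.rglob("*.py")
        if imports_module(path.read_text(encoding="utf-8", errors="ignore"), module_name)
    )
```

For example, find_usages(pathlib.Path("."), "utils") would list candidate files to read before changing anything under utils/. Treat the result as a starting point only, since dynamic imports and non-Python callers are not detected.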
Team Guidelines
Follow existing code patterns and conventions in the codebase
Use Bazel for builds and testing
Go code follows standard Go conventions
Write self-describing code; avoid redundant comments that simply restate what the code does
Copyright headers must keep "All rights reserved." on the same line as "NVIDIA CORPORATION & AFFILIATES"
If a copyright line exceeds 100 characters, add a # pylint: disable=line-too-long comment instead of breaking it across multiple lines
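The two copyright rules above can be checked mechanically. This is a minimal sketch under stated assumptions: check_copyright_line is a hypothetical helper, not an existing OSMO tool, and it assumes the pylint waiver is written as a trailing comment on the same line as the copyright text.

```python
OWNER = "NVIDIA CORPORATION & AFFILIATES"
RIGHTS = "All rights reserved."
PYLINT_WAIVER = "# pylint: disable=line-too-long"
MAX_LINE_LENGTH = 100  # limit stated in the guideline


def check_copyright_line(line: str) -> list[str]:
    """Return the problems found on one header line (empty list if none)."""
    problems: list[str] = []
    if OWNER not in line:
        return problems  # not a copyright owner line, nothing to check
    if RIGHTS not in line:
        problems.append("rights notice is not on the owner line")
    # Assumption: the waiver appears as a trailing comment on the same line.
    if len(line) > MAX_LINE_LENGTH and PYLINT_WAIVER not in line:
        problems.append("long line has no pylint waiver")
    return problems
```

A header line that carries both the owner and the rights notice and stays within 100 characters returns an empty list. A longer line passes only when the waiver comment is present.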
Python Coding Standards
Import Statements
All imports must be at the top level of the module
Place all imports at the top of the file after the module docstring
No exceptions: Imports inside functions are not allowed
If circular dependencies exist, the code must be refactored to remove them
Common refactoring strategies:
Extract shared code into a separate module
Use dependency inversion (import abstractions, not concrete implementations)
Restructure module hierarchy to break the cycle
Use late binding or forward references for type hints (PEP 563)
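The refactoring strategies above can be combined. As a minimal sketch, assume a hypothetical cycle in which a task module and a storage module import each other. Having both depend on a shared abstraction removes the cycle without any function-level import. All class names here (StorageBackend, Task, InMemoryBackend) are invented for illustration and shown in one file for brevity. In a real refactor each would live in its own module. The quoted return annotation is a string forward reference in the PEP 484 style.

```python
import abc
import dataclasses


class StorageBackend(abc.ABC):
    """Hypothetical abstraction that both sides of the former cycle import."""

    @abc.abstractmethod
    def upload(self, path: str) -> str:
        """Upload a file and return its remote location."""


@dataclasses.dataclass
class Task:
    """Hypothetical consumer that depends only on the abstraction."""

    name: str
    backend: StorageBackend

    def publish(self, path: str) -> str:
        return self.backend.upload(path)

    def clone(self, name: str) -> "Task":  # forward reference as a string
        return Task(name=name, backend=self.backend)


class InMemoryBackend(StorageBackend):
    """Hypothetical concrete implementation, normally in a separate module."""

    def __init__(self) -> None:
        self.uploaded: list[str] = []

    def upload(self, path: str) -> str:
        self.uploaded.append(path)
        return f"memory://{path}"
```

The concrete backend is passed in by the caller, for example Task(name="train", backend=InMemoryBackend()), so the module defining Task never needs to import the module defining the backend.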
Variable Naming
Do not use abbreviations in variable names unless they are well-understood abbreviations or common conventions
Good: topology_key, config, i (iterator), x, y, z (coordinates)
osmo_ctrl — Orchestrates workflow execution. WebSocket to workflow service. Unix socket to osmo_user. Manages data download/upload, barriers for multi-task sync, port forwarding.
runtime/cmd/user/
osmo_user — Executes user commands with PTY. Streams stdout/stderr to ctrl. Handles checkpointing (periodic uploads).
runtime/cmd/rsync/
osmo_rsync — Rsync daemon with bandwidth limiting.
Go Runtime Packages (runtime/pkg/)
| Package | Purpose |
| --- | --- |
| args/ | CLI flag parsing for ctrl and user containers. |
| messages/ | IPC message protocol between containers (exec lifecycle, log streaming, barriers). |
API layer: OpenAPI-generated types (lib/api/generated.ts — DO NOT EDIT) + adapter layer (lib/api/adapter/) that bridges backend quirks to UI expectations