kubeflow-sdk/AGENTS.md at main · opendatahub-io/kubeflow-sdk

Who This Is For

AI agents: Automate repository tasks with minimal context
Contributors: Humans using AI assistants or working directly
Maintainers: Ensure assistants follow project conventions and CI rules

Agent Behavior Policy

AI agents should:

Make atomic, minimal, and reversible changes.
Prefer local analysis (uv run, make verify, pytest) before proposing commits.
NEVER modify configuration, CI/CD, or release automation unless explicitly requested.
Avoid non-deterministic code; if randomness is required, seed it through fixtures.
Use AGENTS.md and Makefile as the source of truth for development commands.
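
As a rough illustration of the determinism rule above, the sketch below passes a seeded generator into the code that needs randomness instead of relying on global state. The names `sample_job_names` and `make_seeded_rng` are hypothetical and do not exist in the SDK; in a test suite the seeded generator would typically be provided by a fixture.

```python
import random


def sample_job_names(candidates: list[str], count: int, *, rng: random.Random) -> list[str]:
    """Pick `count` distinct names using the caller-supplied generator."""
    return rng.sample(candidates, count)


def make_seeded_rng(seed: int = 1234) -> random.Random:
    """Return an isolated generator (hypothetical helper; a fixture could wrap this)."""
    return random.Random(seed)


candidates = ["job-a", "job-b", "job-c", "job-d"]
# Two generators built from the same seed produce the same selection.
first = sample_job_names(candidates, 2, rng=make_seeded_rng())
second = sample_job_names(candidates, 2, rng=make_seeded_rng())
```

Because the generator is an explicit argument, tests can pin the seed without touching the module-level `random` state.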

Agents must NOT:

Bypass tests or linters
Introduce dependencies without updating pyproject.toml
Generate or commit large autogenerated files

Context Awareness

Before writing code, agents should:

Read docstrings and existing test cases for pattern alignment
Match import patterns from neighboring files
Preserve existing logging and error-handling conventions

Repository Map

.github/                         # GitHub actions for CI/CD
docs/                            # Kubeflow SDK documentation
examples/                        # Kubeflow SDK examples
kubeflow/                        # Main Python package
├── common/                        # Shared utilities and types across all projects
│
├── trainer/                       # Kubeflow Trainer
│   ├── api/                         # TrainerClient - main user interface
│   ├── backends/                    # Execution backend implementations
│   │   ├── kubernetes/                # Kubernetes backend
│   │   │   └── backend.py
│   │   ├── container/                 # Container backend for local development
│   │   │   ├── backend.py
│   │   │   └── adapters/                # Docker & Podman adapter implementations
│   │   └── localprocess/              # Subprocess backend for quick prototyping
│   ├── constants/                   # Common trainer constants and defaults
│   ├── options/                     # Backend configuration options (KubernetesOptions, etc.)
│   └── types/                       # Common trainer types (e.g. TrainJob, CustomTrainer, BuiltinTrainer)
│
├── optimizer/                     # Kubeflow Optimizer
│   ├── api/                         # OptimizerClient - main user interface
│   ├── backends/                    # Execution backend implementations
│   │   └── kubernetes/                # Kubernetes backend
│   ├── types/                       # Common optimizer types (e.g. OptimizationJob, Search)
│   └── constants/                   # Common optimizer constants and defaults
│
└── hub/                           # Kubeflow Hub
    └── api/                         # ModelRegistryClient - main user interface

Environment & Tooling

Package manager: uv (creates .venv automatically via targets)
Lint/format: ruff (isort integrated)
Tests: pytest with coverage
Build: Hatchling (optional uv build)
Pre-commit: Config provided and enforced in CI

Commands

Setup:

make install-dev              # Install uv, create .venv, sync deps

Verify (CI parity):

make verify                   # Runs ruff check --show-fixes and ruff format --check

Testing:

make test-python              # All unit tests + coverage (HTML by default)
make test-python report=xml   # XML coverage report
uv run pytest -q kubeflow/trainer/utils/utils_test.py                    # One file
uv run pytest -q kubeflow/trainer/utils/utils_test.py::test_name -k "pattern"  # One test
uv run coverage run -m pytest <path> && uv run coverage report          # Ad-hoc coverage

Local lint/format:

uv run ruff check --fix .     # Fix lint issues
uv run ruff format kubeflow   # Format code

Type checking:

uv run mypy kubeflow          # Run type checker

Pre-commit:

uv run pre-commit install                    # Install hooks
uv run pre-commit run --all-files           # Run all hooks

Development Workflow for AI Agents

Preferred commands: use uv run ... to ensure tool consistency and .venv usage

Before making changes:

Read existing code patterns and docstrings for alignment
Follow the Core Development Principles below
Run validation commands before proposing changes

Validation before proposing changes:

Lint/format: make verify
Tests: make test-python or targeted pytest invocations
Type checking: uv run mypy kubeflow (if available)

Commit/PR hygiene:

Follow Conventional Commits in titles and messages
Include rationale ("why") in commit messages/PR descriptions
Do not push secrets or change git config
Scope discipline: only modify files relevant to the task; keep diffs minimal

Core Development Principles

1. Maintain Stable Public Interfaces ⚠️ CRITICAL

Always attempt to preserve function signatures, argument positions, and names for exported/public methods.

❌ Bad - Breaking Change:

def train_model(id, verbose=False):  # Changed from `model_id`
    pass

✅ Good - Stable Interface:

def train_model(model_id: str, verbose: bool = False) -> TrainingResult:
    """Train model with optional verbose output."""
    pass

Before making ANY changes to public APIs:

Check if the function/class is exported in __init__.py
Look for existing usage patterns in tests and examples
Use keyword-only arguments for new parameters: *, new_param: str = "default"
Mark experimental features clearly with docstring warnings
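
The checklist above can be sketched as follows: a new parameter is added as keyword-only, so existing positional call sites keep working, and the experimental option is flagged in the docstring. `TrainingResult` and `timeout_seconds` are assumptions made up for this example, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class TrainingResult:
    """Hypothetical result type, defined here only so the sketch runs."""

    model_id: str
    verbose: bool
    timeout_seconds: int


def train_model(
    model_id: str,
    verbose: bool = False,
    *,
    timeout_seconds: int = 600,
) -> TrainingResult:
    """Train model with optional verbose output.

    Warning:
        `timeout_seconds` is experimental and may change without notice.

    Args:
        model_id: Identifier of the model to train.
        verbose: Whether to emit detailed progress output.
        timeout_seconds: Illustrative keyword-only option added after the first release.

    Returns:
        A result object echoing the effective settings.
    """
    return TrainingResult(model_id=model_id, verbose=verbose, timeout_seconds=timeout_seconds)


# Existing call sites keep working unchanged.
legacy = train_model("model-1", True)
# New callers opt in explicitly by keyword.
tuned = train_model("model-1", timeout_seconds=60)
```

Passing the new option positionally, as in `train_model("model-1", True, 60)`, raises `TypeError`, which keeps the positional part of the signature frozen.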

2. Code Quality Standards

All Python code MUST include type hints and return types.

❌ Bad:

def p(u, d):
    return [x for x in u if x not in d]

✅ Good:

def filter_completed_jobs(jobs: list[str], completed: set[str]) -> list[str]:
    """Filter out jobs that are already completed.

    Args:
        jobs: List of job identifiers to filter.
        completed: Set of completed job identifiers.

    Returns:
        List of jobs that are not yet completed.
    """
    return [job for job in jobs if job not in completed]

Style Requirements:

Line length 100, Python 3.10 target, double quotes, space indentation
Imports: isort via ruff; first-party is kubeflow; prefer absolute imports
Naming: pep8-naming; functions/vars snake_case, classes PascalCase, constants UPPER_SNAKE_CASE; prefix private with _
Use descriptive, self-explanatory variable names. Avoid overly short or cryptic identifiers
Break up complex functions (>20 lines) into smaller, focused functions where it makes sense
Follow existing patterns in the codebase you're modifying

3. Testing Requirements

Every new feature or bugfix MUST be covered by unit tests.

Test Organization:

Unit tests: kubeflow/trainer/**/*_test.py (no network calls allowed)
Use pytest as the testing framework
See kubeflow/trainer/test/common.py for fixtures and patterns
Keep unit test structure consistent across test files (see kubeflow/trainer/backends/kubernetes/backend_test.py for reference)

Test Structure Pattern (following backend_test.py):

Use TestCase dataclass for parametrized tests
Include name, expected_status, config, expected_output/error fields
Print test execution status for debugging
Handle both success and exception cases in the same test function
Use pytest.mark.parametrize with TestCase dataclass for multiple test scenarios:

@pytest.mark.parametrize(
    "test_case",
    [
        TestCase(
            name="valid flow with all defaults",
            expected_status=SUCCESS,
            config={"name": "job-1"},
            expected_output=["job-1"],
        ),
        TestCase(
            name="empty jobs list",
            expected_status=SUCCESS,
            config={"name": "empty"},
            expected_output=[],
        ),
    ],
)
def test_filter_jobs_parametrized(test_case):
    """Test job filtering with multiple scenarios."""
    result = filter_jobs(**test_case.config)
    assert result == test_case.expected_output

4. Security and Risk Assessment

Security Checklist:

No eval(), exec(), or pickle on user-controlled input
Handle exceptions properly (no bare except:) and use descriptive error messages
Remove unreachable/commented code before committing
Ensure proper resource cleanup (file handles, connections)
No secrets in code, logs, or examples

❌ Bad:

def load_config(path):
    with open(path) as f:
        return eval(f.read())  # ⚠️ Never eval user input

✅ Good:

import yaml

def load_config(path: str) -> dict:
    """Load configuration from YAML file."""
    with open(path, "r") as f:
        return yaml.safe_load(f)
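
The exception-handling and resource-cleanup items in the checklist can be illustrated with a standard-library variant of the loader. This is a sketch under stated assumptions: `ConfigLoadError` and `load_json_config` are hypothetical names, and JSON is used instead of YAML only to keep the example free of third-party dependencies.

```python
import json
import tempfile
from pathlib import Path


class ConfigLoadError(Exception):
    """Hypothetical error type used only in this sketch."""


def load_json_config(path: str) -> dict:
    """Load a JSON object from a file, reporting failures with descriptive messages."""
    try:
        # The context manager closes the file handle even if parsing fails.
        with open(path, encoding="utf-8") as config_file:
            loaded = json.load(config_file)
    except FileNotFoundError as error:
        raise ConfigLoadError(f"Config file not found: {path}") from error
    except json.JSONDecodeError as error:
        raise ConfigLoadError(f"Config file {path} is not valid JSON: {error.msg}") from error
    if not isinstance(loaded, dict):
        raise ConfigLoadError(f"Config file {path} must contain a JSON object")
    return loaded


# Usage: write a small config to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as temp_dir:
    config_path = Path(temp_dir) / "config.json"
    config_path.write_text('{"name": "job-1"}', encoding="utf-8")
    config = load_json_config(str(config_path))
```

Catching specific exception types and re-raising with `from error` preserves the original traceback while giving callers a single, descriptive error to handle.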

5. Documentation Standards

Use Google-style docstrings with Args section for all public functions.

❌ Insufficient Documentation:

def submit_job(name, config):
    """Submit a job."""

✅ Complete Documentation:

def submit_job(name: str, config: dict, *, priority: str = "normal") -> str:
    """Submit a training job with specified configuration.

    Args:
        name: The job name identifier.
        config: Job configuration dictionary.
        priority: Job priority level ('low', 'normal', 'high').

    Returns:
        Job ID string for tracking the submitted job.

    Raises:
        InvalidConfigError: If the configuration is invalid.
        ResourceUnavailableError: If required resources are not available.
    """

Documentation Guidelines:

Types go in function signatures, NOT in docstrings
Focus on "why" rather than "what" in descriptions
Document all parameters, return values, and exceptions
Keep descriptions concise but clear
Use Pydantic v2 models in kubeflow.trainer.types for schemas
