
Who This Is For

  • AI agents: Automate repository tasks with minimal context
  • Contributors: Humans using AI assistants or working directly
  • Maintainers: Ensure assistants follow project conventions and CI rules

Agent Behavior Policy

AI agents should:

  • Make atomic, minimal, and reversible changes.
  • Prefer local analysis (uv run, make verify, pytest) before proposing commits.
  • NEVER modify configuration, CI/CD, or release automation unless explicitly requested.
  • Avoid non-deterministic code; when randomness is unavoidable, control seeds through fixtures.
  • Use AGENTS.md and Makefile as the source of truth for development commands.
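One way to satisfy the determinism rule is to inject the random generator instead of calling the global `random` functions. The sketch below is illustrative only: `sample_trial_ids` and the trial IDs are hypothetical names, and in a pytest suite the seeded `random.Random` instance would typically come from a fixture.

```python
import random


def sample_trial_ids(trial_ids: list[str], count: int, rng: random.Random) -> list[str]:
    """Pick `count` distinct trial IDs using a caller-supplied generator (hypothetical helper)."""
    # Accepting the generator as a parameter (rather than calling the global
    # `random` functions) lets a test pass a seeded instance for repeatable results.
    return rng.sample(trial_ids, count)


TRIAL_IDS = ["trial-1", "trial-2", "trial-3", "trial-4"]

# Same seed, same result: the output is reproducible across runs.
first_draw = sample_trial_ids(TRIAL_IDS, 2, random.Random(42))
second_draw = sample_trial_ids(TRIAL_IDS, 2, random.Random(42))
```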

Agents must NOT:

  • Bypass tests or linters
  • Introduce dependencies without updating pyproject.toml
  • Generate or commit large autogenerated files

Context Awareness

Before writing code, agents should:

  • Read docstrings and existing test cases for pattern alignment
  • Match import patterns from neighboring files
  • Preserve existing logging and error-handling conventions

Repository Map

.github/                         # GitHub Actions workflows for CI/CD
docs/                            # Kubeflow SDK documentation
examples/                        # Kubeflow SDK examples
kubeflow/                        # Main Python package
├── common/                        # Shared utilities and types across all projects
│
├── trainer/                       # Kubeflow Trainer
│   ├── api/                         # TrainerClient - main user interface
│   ├── backends/                    # Execution backend implementations
│   │   ├── kubernetes/                # Kubernetes backend
│   │   │   └── backend.py
│   │   ├── container/                 # Container backend for local development
│   │   │   ├── backend.py
│   │   │   └── adapters/                # Docker & Podman adapter implementations
│   │   └── localprocess/              # Subprocess backend for quick prototyping
│   ├── constants/                   # Common trainer constants and defaults
│   ├── options/                     # Backend configuration options (KubernetesOptions, etc.)
│   └── types/                       # Common trainer types (e.g. TrainJob, CustomTrainer, BuiltinTrainer)
│
├── optimizer/                     # Kubeflow Optimizer
│   ├── api/                         # OptimizerClient - main user interface
│   ├── backends/                    # Execution backend implementations
│   │   └── kubernetes/                # Kubernetes backend
│   ├── types/                       # Common optimizer types (e.g. OptimizationJob, Search)
│   └── constants/                   # Common optimizer constants and defaults
│
└── hub/                           # Kubeflow Hub
    └── api/                         # ModelRegistryClient - main user interface

Environment & Tooling

  • Package manager: uv (the Makefile targets create .venv automatically)
  • Lint/format: ruff (isort integrated)
  • Tests: pytest with coverage
  • Build: Hatchling (optional uv build)
  • Pre-commit: Config provided and enforced in CI

Commands

Setup:

make install-dev              # Install uv, create .venv, sync deps

Verify (CI parity):

make verify                   # Runs ruff check --show-fixes and ruff format --check

Testing:

make test-python              # All unit tests + coverage (HTML by default)
make test-python report=xml   # XML coverage report
uv run pytest -q kubeflow/trainer/utils/utils_test.py                    # One file
uv run pytest -q kubeflow/trainer/utils/utils_test.py::test_name          # One test
uv run pytest -q kubeflow/trainer/utils/utils_test.py -k "pattern"        # Tests matching a pattern
uv run coverage run -m pytest <path> && uv run coverage report          # Ad-hoc coverage

Local lint/format:

uv run ruff check --fix .     # Fix lint issues
uv run ruff format kubeflow   # Format code

Type checking:

uv run mypy kubeflow          # Run type checker

Pre-commit:

uv run pre-commit install                    # Install hooks
uv run pre-commit run --all-files           # Run all hooks

Development Workflow for AI Agents

Preferred commands: prefix tools with uv run so they execute inside the project's .venv with consistent versions

Before making changes:

  1. Read existing code patterns and docstrings for alignment
  2. Follow the Core Development Principles below
  3. Run validation commands before proposing changes

Validation before proposing changes:

  • Lint/format: make verify
  • Tests: make test-python or targeted pytest invocations
  • Type checking: uv run mypy kubeflow (if available)

Commit/PR hygiene:

  • Follow Conventional Commits in titles and messages
  • Include rationale ("why") in commit messages/PR descriptions
  • Do not push secrets or change git config
  • Scope discipline: only modify files relevant to the task; keep diffs minimal
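As a rough self-check for the Conventional Commits title shape `type(scope)!: description`, a sketch like the following could be used before opening a PR. It is not a project tool: `is_conventional_title` is a hypothetical helper, and the allowed type list and lowercase-scope rule are assumptions (the specification itself only defines `feat` and `fix` and permits other types).

```python
import re

# Assumed type list (Angular-style); the Conventional Commits spec only mandates feat and fix.
ALLOWED_TYPES = (
    "feat", "fix", "docs", "test", "refactor", "perf",
    "build", "ci", "chore", "style", "revert",
)

CONVENTIONAL_TITLE = re.compile(
    rf"^(?P<type>{'|'.join(ALLOWED_TYPES)})"  # commit type
    r"(?:\((?P<scope>[a-z0-9-]+)\))?"  # optional scope, e.g. (trainer)
    r"(?P<breaking>!)?"  # optional breaking-change marker
    r": (?P<description>\S.*)$"  # colon, one space, non-empty description
)


def is_conventional_title(title: str) -> bool:
    """Return True if `title` matches the assumed Conventional Commits pattern."""
    return CONVENTIONAL_TITLE.match(title) is not None
```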

Core Development Principles

1. Maintain Stable Public Interfaces ⚠️ CRITICAL

Preserve the signatures, argument order, and parameter names of exported/public functions and methods.

Bad - Breaking Change:

def train_model(id, verbose=False):  # Changed from `model_id`
    pass

Good - Stable Interface:

def train_model(model_id: str, verbose: bool = False) -> TrainingResult:
    """Train model with optional verbose output."""
    pass

Before making ANY changes to public APIs:

  • Check if the function/class is exported in __init__.py
  • Look for existing usage patterns in tests and examples
  • Use keyword-only arguments for new parameters: *, new_param: str = "default"
  • Mark experimental features clearly with docstring warnings
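The keyword-only rule can be sketched as follows: a new parameter is added after `*`, so every existing positional call site keeps working, and the docstring flags it as experimental. `TrainingResult` and `timeout_seconds` are hypothetical names used only for illustration, not part of the Kubeflow SDK API.

```python
from dataclasses import dataclass


@dataclass
class TrainingResult:
    """Hypothetical result type used only for this illustration."""

    model_id: str
    verbose: bool
    timeout_seconds: int


def train_model(
    model_id: str,
    verbose: bool = False,
    *,
    timeout_seconds: int = 3600,
) -> TrainingResult:
    """Train model with optional verbose output.

    Args:
        model_id: Identifier of the model to train.
        verbose: Whether to emit verbose output.
        timeout_seconds: Maximum training time. Experimental: this parameter
            may change in a future release.

    Returns:
        A result describing the requested training run.
    """
    # Existing positional parameters keep their order and names; the new
    # parameter is keyword-only, so existing call sites are unaffected.
    return TrainingResult(model_id=model_id, verbose=verbose, timeout_seconds=timeout_seconds)


# Existing call styles keep working; the new option must be passed by keyword.
old_style = train_model("resnet-50", True)
new_style = train_model("resnet-50", verbose=True, timeout_seconds=600)
```

Because `timeout_seconds` cannot be passed positionally, a later reordering of keyword-only parameters also stays non-breaking.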

2. Code Quality Standards

All Python code MUST include type hints for parameters and return values.

Bad:

def p(u, d):
    return [x for x in u if x not in d]

Good:

def filter_completed_jobs(jobs: list[str], completed: set[str]) -> list[str]:
    """Filter out jobs that are already completed.

    Args:
        jobs: List of job identifiers to filter.
        completed: Set of completed job identifiers.

    Returns:
        List of jobs that are not yet completed.
    """
    return [job for job in jobs if job not in completed]

Style Requirements:

  • Line length 100, Python 3.10 target, double quotes, indentation with spaces (no tabs)
  • Imports: isort via ruff; first-party is kubeflow; prefer absolute imports
  • Naming: pep8-naming; functions/vars snake_case, classes PascalCase, constants UPPER_SNAKE_CASE; prefix private with _
  • Use descriptive, self-explanatory variable names. Avoid overly short or cryptic identifiers
  • Break up complex functions (>20 lines) into smaller, focused functions where it makes sense
  • Follow existing patterns in the codebase you're modifying

3. Testing Requirements

Every new feature or bugfix MUST be covered by unit tests.

Test Organization:

  • Unit tests: kubeflow/trainer/**/*_test.py (no network calls allowed)
  • Use pytest as the testing framework
  • See kubeflow/trainer/test/common.py for fixtures and patterns
  • Keep unit test structure consistent across test files (see kubeflow/trainer/backends/kubernetes/backend_test.py for reference)
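To keep unit tests free of network calls, the network boundary can be replaced with a mock. This sketch uses the standard library's `unittest.mock`; `fetch_job_status` and the URL are hypothetical, and the project's real tests may mock a different client (for example, the Kubernetes API client) instead of `urllib`.

```python
import json
import urllib.request
from unittest import mock


def fetch_job_status(base_url: str, job_name: str) -> str:
    """Hypothetical helper that would hit the network outside of tests."""
    with urllib.request.urlopen(f"{base_url}/jobs/{job_name}") as response:
        return json.load(response)["status"]


def test_fetch_job_status_without_network() -> None:
    fake_response = mock.MagicMock()
    fake_response.read.return_value = b'{"status": "Running"}'
    # MagicMock supports the context-manager protocol; make `with` yield the fake itself.
    fake_response.__enter__.return_value = fake_response
    # Patch the attribute looked up at call time, so no real connection is opened.
    with mock.patch("urllib.request.urlopen", return_value=fake_response) as fake_urlopen:
        assert fetch_job_status("http://example.invalid", "job-1") == "Running"
    fake_urlopen.assert_called_once_with("http://example.invalid/jobs/job-1")
```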

Test Structure Pattern (following backend_test.py):

  • Use TestCase dataclass for parametrized tests
  • Include name, expected_status, config, expected_output/error fields
  • Print test execution status for debugging
  • Handle both success and exception cases in the same test function
  • Use pytest.mark.parametrize with TestCase dataclass for multiple test scenarios:
@pytest.mark.parametrize(
    "test_case",
    [
        TestCase(
            name="valid flow with all defaults",
            expected_status=SUCCESS,
            config={"name": "job-1"},
            expected_output=["job-1"],
        ),
        TestCase(
            name="empty jobs list",
            expected_status=SUCCESS,
            config={"name": "empty"},
            expected_output=[],
        ),
    ],
)
def test_filter_jobs_parametrized(test_case):
    """Test job filtering with multiple scenarios."""
    result = filter_jobs(**test_case.config)
    assert result == test_case.expected_output

4. Security and Risk Assessment

Security Checklist:

  • No eval(), exec(), or pickle on user-controlled input
  • Catch specific exceptions (no bare except:) and raise descriptive error messages
  • Remove unreachable/commented code before committing
  • Ensure proper resource cleanup (file handles, connections)
  • No secrets in code, logs, or examples

Bad:

def load_config(path):
    with open(path) as f:
        return eval(f.read())  # ⚠️ Never eval user input

Good:

import yaml

def load_config(path: str) -> dict:
    """Load configuration from YAML file."""
    with open(path, encoding="utf-8") as f:
        return yaml.safe_load(f)
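The checklist items on exception handling and resource cleanup can be combined in a loader like the one below. This sketch swaps YAML for the standard library's `json` module so it runs without PyYAML; `ConfigLoadError` and `load_json_config` are hypothetical names, not existing Kubeflow SDK APIs.

```python
import json
from typing import Any


class ConfigLoadError(Exception):
    """Hypothetical error type raised when a config file cannot be loaded."""


def load_json_config(path: str) -> dict[str, Any]:
    """Load a JSON config file, failing with a descriptive error."""
    try:
        # The context manager closes the file even if parsing fails.
        with open(path, encoding="utf-8") as config_file:
            data = json.load(config_file)
    except FileNotFoundError as error:
        raise ConfigLoadError(f"Config file not found: {path}") from error
    except json.JSONDecodeError as error:
        raise ConfigLoadError(
            f"Invalid JSON in {path} at line {error.lineno}, column {error.colno}: {error.msg}"
        ) from error
    # Reject valid JSON that is not an object, since callers expect a mapping.
    if not isinstance(data, dict):
        raise ConfigLoadError(f"Expected a JSON object in {path}, got {type(data).__name__}")
    return data
```

Chaining with `from error` keeps the original traceback available while the message tells the user which file and position failed.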

5. Documentation Standards

Use Google-style docstrings with Args section for all public functions.

Insufficient Documentation:

def submit_job(name, config):
    """Submit a job."""

Complete Documentation:

def submit_job(name: str, config: dict, *, priority: str = "normal") -> str:
    """Submit a training job with specified configuration.

    Args:
        name: The job name identifier.
        config: Job configuration dictionary.
        priority: Job priority level ('low', 'normal', 'high').

    Returns:
        Job ID string for tracking the submitted job.

    Raises:
        InvalidConfigError: If the configuration is invalid.
        ResourceUnavailableError: If required resources are not available.
    """

Documentation Guidelines:

  • Types go in function signatures, NOT in docstrings
  • Focus on "why" rather than "what" in descriptions
  • Document all parameters, return values, and exceptions
  • Keep descriptions concise but clear
  • Use Pydantic v2 models in kubeflow.trainer.types for schemas
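Documenting exceptions is only useful if the `Raises` section matches what the code actually raises. The sketch below gives `submit_job` a hypothetical body: the `num_nodes` key, `InvalidConfigError`, and the UUID-based job ID are illustrative assumptions. Because this body never raises a resource error, `ResourceUnavailableError` is deliberately left out of its `Raises` section.

```python
import uuid
from typing import Any

VALID_PRIORITIES = frozenset({"low", "normal", "high"})


class InvalidConfigError(ValueError):
    """Hypothetical error raised when a job configuration is rejected."""


def submit_job(name: str, config: dict[str, Any], *, priority: str = "normal") -> str:
    """Submit a training job with specified configuration.

    Args:
        name: The job name identifier.
        config: Job configuration; must include a positive integer "num_nodes".
        priority: Job priority level ("low", "normal", "high").

    Returns:
        Job ID string for tracking the submitted job.

    Raises:
        InvalidConfigError: If `priority` is unknown or `config` lacks a valid "num_nodes".
    """
    if priority not in VALID_PRIORITIES:
        raise InvalidConfigError(
            f"Unknown priority {priority!r}; expected one of {sorted(VALID_PRIORITIES)}"
        )
    num_nodes = config.get("num_nodes")
    if not isinstance(num_nodes, int) or num_nodes < 1:
        raise InvalidConfigError(
            f"config['num_nodes'] must be a positive integer, got {num_nodes!r}"
        )
    # Stand-in for a real submission: return a unique, traceable job ID.
    return f"{name}-{uuid.uuid4().hex[:8]}"
```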