feat: multi-project YAML config file with single-process orchestration #121

@qinqon

Summary

Replace the single-repo-per-process model with a YAML config file (--config <path>) that defines multiple projects in one oompa process. Each project/role runs as an independent goroutine with its own Agent and structured logger. Existing CLI flags are preserved for single-repo use.

Problem

Oompa currently requires one process per repo, configured entirely via CLI flags and env vars. Running 5+ projects means managing 5+ systemd units with overlapping config, making it hard to add/remove projects, audit the full configuration, or understand what oompa is doing at a glance.

Config File Shape

agent: opencode
agent-model: google-vertex-anthropic/claude-opus-4-6@default
poll-interval: 2m
log-level: debug
exit-on-new-version: qinqon/oompa

projects:
  - repo: ovn-kubernetes/ovn-kubernetes
    create-flaky-issues: true
    flaky-label: kind/ci-flake
    prs:
      watch: [6252, 6229, 6118, 6306]
      reactions: [ci, conflicts, rebase]

  - repo: openperouter/openperouter
    prs:
      watch: [313, 304]
      reactions: [ci, conflicts, rebase]
      skip-comment: [ci-unrelated, ci-infrastructure]

  - repo: nmstate/kubernetes-nmstate
    fork: qinqon/kubernetes-nmstate
    issues:
      only-assigned: true
    triage:
      jobs:
        - https://prow.example.com/...
      schedule: "09:00 Europe/Madrid"
      create-flaky-issues: true
      flaky-label: ci-flake

  - repo: openshift/hypershift
    prs:
      watch: [8365]
      reactions: [ci, conflicts, rebase]

  - repo: qinqon/oompa
    issues:
      label: good-for-ai

Architecture

Based on research of similar tools (Atlantis, Renovate, Prow), the architecture follows the Atlantis model of a single Go binary serving all repos, adapted here to one goroutine per project/role rather than per event.

Goroutine Layout

main goroutine (orchestrator, signal handling, version check)
├── goroutine: ovn-kubernetes/ovn-kubernetes [prs]
├── goroutine: openperouter/openperouter [prs]
├── goroutine: openshift/hypershift [prs]
├── goroutine: nmstate/kubernetes-nmstate [issues]
├── goroutine: nmstate/kubernetes-nmstate [triage]
└── goroutine: qinqon/oompa [issues]
  • 1 goroutine per project/role (mirrors today's systemd units)
  • PRs within a project processed sequentially (no worker pool -- YAGNI)
  • Each goroutine has its own Agent, WorktreeManager, scoped logger, and poll loop
  • Shared GitHub client and auth across all goroutines

Structured Logging

Every log line includes structured slog fields for filtering:

  • Always present: project (owner/repo), role (prs/issues/triage)
  • Per-role context: watch_prs, triage_jobs, label
  • Per-operation: pr, issue, check, sha

Example: time=... level=INFO msg="CI failing" project=ovn-kubernetes/ovn-kubernetes role=prs pr=6118 check="e2etests (operator)"

Filterable with journalctl --user -u oompa | grep 'project=ovn'.

Graceful Shutdown (two-signal pattern)

  • First SIGINT/SIGTERM: cancels context → all goroutines finish their current poll cycle → WaitGroup completes → clean exit
  • Second SIGINT/SIGTERM: force exit immediately (escape hatch for stuck goroutines)
  • --exit-on-new-version: detects new version → calls cancel() → same graceful flow → systemd restarts with new binary
  • Per-goroutine panic recovery: log + continue others, don't crash all projects

CLI Behavior

  • --config <path> enables multi-project mode; per-repo CLI flags are ignored
  • Without --config, existing CLI flag behavior is preserved (single-repo mode, backward compatible)
  • --dry-run, --one-shot still work as global overrides

Implementation Units

U1. Define YAML config types and parser

Add ProjectConfig, RoleConfig, and FileConfig structs with YAML unmarshaling, plus a LoadConfig(path) function. Add the gopkg.in/yaml.v3 dependency. Validate required fields and allowed values.
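
A rough sketch of what these types and the validation step might look like is below. The struct names come from this issue, but the field layout, the single shared RoleConfig, and the specific checks are assumptions. The yaml tags are the ones gopkg.in/yaml.v3 would read. The decode call itself is left out so the sketch runs with the standard library only.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// FileConfig mirrors the top-level keys of the config file (assumed layout).
type FileConfig struct {
	Agent            string          `yaml:"agent"`
	AgentModel       string          `yaml:"agent-model"`
	PollInterval     string          `yaml:"poll-interval"`
	LogLevel         string          `yaml:"log-level"`
	ExitOnNewVersion string          `yaml:"exit-on-new-version"`
	Projects         []ProjectConfig `yaml:"projects"`
}

// ProjectConfig is one entry under projects; a nil role means "not enabled".
type ProjectConfig struct {
	Repo   string      `yaml:"repo"`
	Fork   string      `yaml:"fork"`
	PRs    *RoleConfig `yaml:"prs"`
	Issues *RoleConfig `yaml:"issues"`
	Triage *RoleConfig `yaml:"triage"`
}

// RoleConfig holds a subset of the per-role keys shown in the example config.
type RoleConfig struct {
	Watch     []int    `yaml:"watch"`
	Reactions []string `yaml:"reactions"`
	Label     string   `yaml:"label"`
	Schedule  string   `yaml:"schedule"`
}

// Validate collects every problem so one startup run reports them all.
func (c FileConfig) Validate() error {
	var errs []error
	if c.PollInterval != "" {
		if _, err := time.ParseDuration(c.PollInterval); err != nil {
			errs = append(errs, fmt.Errorf("poll-interval: %w", err))
		}
	}
	if len(c.Projects) == 0 {
		errs = append(errs, errors.New("projects: at least one project is required"))
	}
	for i, p := range c.Projects {
		owner, name, ok := strings.Cut(p.Repo, "/")
		if !ok || owner == "" || name == "" || strings.Contains(name, "/") {
			errs = append(errs, fmt.Errorf("projects[%d].repo: want owner/repo, got %q", i, p.Repo))
		}
		if p.PRs == nil && p.Issues == nil && p.Triage == nil {
			errs = append(errs, fmt.Errorf("projects[%d] (%s): no role configured", i, p.Repo))
		}
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	cfg := FileConfig{
		Agent:        "opencode",
		PollInterval: "2m",
		Projects: []ProjectConfig{
			{Repo: "qinqon/oompa", Issues: &RoleConfig{Label: "good-for-ai"}},
			{Repo: "missing-slash", PRs: &RoleConfig{Watch: []int{313}}},
		},
	}
	fmt.Println(cfg.Validate())
}
```

Returning all errors at once fits the fail-fast mitigation listed under Risks: a broken config stops startup with a complete list rather than one error per attempt.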

U2. Composite state key for cross-repo safety

Change State.ActiveIssues key from int to string (owner/repo#number). State is rebuilt from GitHub on startup so no migration needed.
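
The key change could look like the sketch below. issueKey is a hypothetical helper, and the map's value type is a placeholder, since the real State fields are not shown in this issue.

```go
package main

import "fmt"

// issueKey builds the composite key "owner/repo#number".
func issueKey(repo string, number int) string {
	return fmt.Sprintf("%s#%d", repo, number)
}

// State mirrors only the ActiveIssues field; the string value is a placeholder.
type State struct {
	ActiveIssues map[string]string
}

func main() {
	s := State{ActiveIssues: map[string]string{}}
	// The same issue number in two repos no longer collides (numbers are illustrative).
	s.ActiveIssues[issueKey("qinqon/oompa", 121)] = "in-progress"
	s.ActiveIssues[issueKey("openshift/hypershift", 121)] = "queued"
	fmt.Println(len(s.ActiveIssues)) // prints 2
}
```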

U3. Remove max-workers

Strip MaxWorkers from Config, CLI flag, and worker pool in the loop. Each project runs sequentially within its goroutine. If concurrency limits are ever needed, a simple semaphore (~10 lines) can be added later.

U4. Per-role structured logging

Each Agent gets a scoped logger with slog.With("project", ..., "role", ...). Ensure pr, issue, check, sha are consistent slog fields across all log sites.

U5. Multi-project orchestrator with graceful shutdown

Main orchestrator: load YAML, create Agent per project/role, run in goroutines with WaitGroup. Two-signal graceful shutdown. --exit-on-new-version triggers context cancellation instead of immediate exit. Per-goroutine panic recovery.

U6. Internal triage scheduler

Parse schedule: "09:00 Europe/Madrid" from config. Triage goroutine calculates next run time and sleeps until then. Replaces the systemd timer.

U7. Example config, docs, systemd migration

Create config.example.yaml matching current 5-service setup. Update README, OOMPA_SETUP, specs. Single systemd unit replaces 5 units + 1 timer.

Dependency Order

U1 (config types) ─┬─► U5 (orchestrator) ─► U7 (docs/migration)
U2 (state keys)  ──┤
U3 (remove workers)┤
U4 (logging)  ─────┘
U6 (triage scheduler) ─► U5

U1-U4 and U6 are independent and can be done in any order. U5 depends on all of them. U7 is last.

Risks

Risk | Mitigation
--- | ---
One crashing goroutine takes down all projects | Per-goroutine panic recovery: log, continue others
State key collision across repos | U2 changes key to owner/repo#number
Config syntax error prevents all projects from starting | Validate entire config at startup, fail fast with clear errors
Interleaved logs hard to read | U4 adds project and role fields to every log line
Stuck goroutine blocks graceful shutdown | Two-signal pattern: second signal force-exits

Key Decisions

  • YAML format -- best readability for nested config, familiar from k8s/GitHub Actions
  • Goroutine-per-role, no worker pool -- mirrors today's systemd model, YAGNI on concurrency limits
  • Config file replaces per-repo flags when --config is used; single-repo CLI mode preserved
  • Two-signal graceful shutdown -- standard Unix daemon pattern, works with systemd TimeoutStopSec
  • Structured logging over per-role log files -- simpler, works with journalctl, and a common practice (Renovate and Prow both do this)

Prior Art

  • Atlantis (Go, single binary, all repos, goroutine-per-event) -- closest architectural match
  • Renovate (single process, sequential per-repo, structured JSON logging with logContext)
  • Prow (microservices on k8s, org/repo keyed YAML config) -- config structure inspiration
