
feature: TO GROOM orchestration #29

@pevab

Orchestration spec

we want to run 1000 simulations

  • e.g., centralized server + n workers + f Byzantine workers
  • while varying the values
    • of f
    • or of the aggregator
    • or of the aggregator's parameters
  • collect the metrics of each simulation
    • loss at each step/epoch
    • gradient norm, curvature
    • custom research-specific metrics
    • etc.
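
A rough sketch of how such a sweep could be enumerated, assuming a plain Cartesian product over the varied values. All names and values here (`N_WORKERS`, `BYZANTINE_COUNTS`, `AGGREGATORS`, `trim_ratio`) are hypothetical placeholders, not part of the spec:

```python
from itertools import product

# Hypothetical sweep values, for illustration only.
N_WORKERS = 10
BYZANTINE_COUNTS = [0, 1, 2, 3]
# Each aggregator name maps to the parameter sets to try for it.
AGGREGATORS = {
    "mean": [{}],
    "trimmed_mean": [{"trim_ratio": 0.1}, {"trim_ratio": 0.2}],
}


def build_grid():
    """Return one parameter dict per requested simulation."""
    grid = []
    for f, (name, param_sets) in product(BYZANTINE_COUNTS, AGGREGATORS.items()):
        for params in param_sets:
            grid.append(
                {
                    "n_workers": N_WORKERS,
                    "f": f,
                    "aggregator": name,
                    "aggregator_params": params,
                }
            )
    return grid


grid = build_grid()
print(len(grid), "simulations requested")
```

Reaching 1000 simulations would only be a matter of widening the value lists or adding seeds as another axis.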

une simulation

  • Python module, fully custom (completely written by the end user)
  • run as an OS process
    • so that the OOM killer cannot take down all simulations and the orchestrator at once
  • simulation output = collection of metrics
    • metrics are instances of a base metric-tracking class
    • serialized in some folder
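
One possible shape for the metric base class, as a minimal sketch. The class name `Metric`, the `record`/`dump`/`load` methods, and the one-JSON-file-per-metric layout are all assumptions made for illustration; the spec does not fix any of them:

```python
import json
import tempfile
from pathlib import Path


class Metric:
    """Hypothetical base class: records (step, value) events for one metric."""

    def __init__(self, name):
        self.name = name
        self.events = []

    def record(self, step, value):
        self.events.append({"step": step, "value": value})

    def dump(self, folder):
        """Serialize this metric as <folder>/<name>.json and return the path."""
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{self.name}.json"
        path.write_text(json.dumps({"name": self.name, "events": self.events}))
        return path

    @classmethod
    def load(cls, path):
        data = json.loads(Path(path).read_text())
        metric = cls(data["name"])
        metric.events = data["events"]
        return metric


# Usage: a simulation records its loss, then writes it to its output folder.
with tempfile.TemporaryDirectory() as out_dir:
    loss = Metric("loss")
    loss.record(step=0, value=2.5)
    loss.record(step=1, value=1.75)
    restored = Metric.load(loss.dump(out_dir))
```

A research-specific metric would subclass `Metric` and override `record` or add its own fields.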

orchestration plan

  • in python
  • requested simulations with parameters
  • requested metrics
  • each metric kind has an associated visualization
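
A plan written in Python could look roughly like the following. This is only a sketch: `SimulationRequest`, `MetricRequest`, `Plan`, the metric kinds, and the visualization names are hypothetical and stand in for whatever the final design settles on:

```python
from dataclasses import dataclass, field

# Hypothetical registry: each metric kind has one associated visualization.
VISUALIZATION_BY_KIND = {
    "scalar_series": "line_plot",
    "distribution": "histogram",
}


@dataclass
class SimulationRequest:
    module: str   # user-written Python module to run
    params: dict  # parameters passed to that module


@dataclass
class MetricRequest:
    name: str
    kind: str  # must be a key of VISUALIZATION_BY_KIND


@dataclass
class Plan:
    simulations: list = field(default_factory=list)
    metrics: list = field(default_factory=list)

    def visualizations(self):
        """Map each requested metric name to the visualization of its kind."""
        return {m.name: VISUALIZATION_BY_KIND[m.kind] for m in self.metrics}


plan = Plan(
    simulations=[
        SimulationRequest("my_sim", {"f": 1, "aggregator": "mean"}),
        SimulationRequest("my_sim", {"f": 2, "aggregator": "mean"}),
    ],
    metrics=[
        MetricRequest("loss", "scalar_series"),
        MetricRequest("gradient_norm", "distribution"),
    ],
)
print(plan.visualizations())
```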

orchestration execution

  • launches all simulation subprocesses
  • happy path (all simulations exit 0)
    • collect all metric events
    • generate associated visualizations
  • sick path (some simulations exit $\ne$ 0)
    • best effort for metric collections and visualizations
    • on new plan run, only the failed simulations are re-run
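
The execution flow above can be sketched with the standard `subprocess` module. The helper names `run_all` and `failed` are hypothetical, and the two toy commands only stand in for real simulation modules:

```python
import subprocess
import sys


def run_all(commands):
    """Launch every command as its own OS process; return name -> exit code."""
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    return {name: proc.wait() for name, proc in procs.items()}


def failed(exit_codes):
    """Names of the simulations that did not exit 0."""
    return sorted(name for name, code in exit_codes.items() if code != 0)


# Toy stand-ins for simulations: one succeeds, one exits with code 3.
commands = {
    "sim_ok": [sys.executable, "-c", "raise SystemExit(0)"],
    "sim_bad": [sys.executable, "-c", "raise SystemExit(3)"],
}

first_run = run_all(commands)
to_rerun = failed(first_run)

# On a new plan run, only the failed simulations are launched again.
second_run = run_all({name: commands[name] for name in to_rerun})
```

This sketch starts every process at once; with 1000 simulations a real orchestrator would presumably cap the number running concurrently.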

orchestration state

  • content: execution graph, etc.
  • lock to prevent two orchestrators from running simultaneously
  • state inferred from simulation subfolders
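
A minimal sketch of the lock and of state inference, under two assumptions that are not in the spec: the lock is a file named `orchestrator.lock` created atomically in the run folder, and each simulation subfolder gets an `exit_code` file once its process has finished:

```python
import os
import tempfile
from pathlib import Path


class OrchestratorLock:
    """Hypothetical lock file; creating it fails if another orchestrator holds it."""

    def __init__(self, root):
        self.path = Path(root) / "orchestrator.lock"
        self.fd = None

    def __enter__(self):
        # O_CREAT | O_EXCL raises FileExistsError if the file already exists.
        self.fd = os.open(self.path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(self.fd, str(os.getpid()).encode())
        return self

    def __exit__(self, *exc):
        os.close(self.fd)
        os.unlink(self.path)
        return False


def infer_state(root):
    """Derive each simulation's status from its subfolder (assumed layout)."""
    state = {}
    for sub in sorted(Path(root).iterdir()):
        if not sub.is_dir():
            continue
        marker = sub / "exit_code"
        if not marker.exists():
            state[sub.name] = "pending"
        elif marker.read_text().strip() == "0":
            state[sub.name] = "done"
        else:
            state[sub.name] = "failed"
    return state


with tempfile.TemporaryDirectory() as root:
    for name, code in [("sim_a", "0"), ("sim_b", "1"), ("sim_c", None)]:
        (Path(root) / name).mkdir()
        if code is not None:
            (Path(root) / name / "exit_code").write_text(code)
    with OrchestratorLock(root):
        state = infer_state(root)
```

A lock file left behind by a crashed orchestrator would need manual cleanup or a stale-PID check, which this sketch does not handle.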

Labels

to refine (Ticket should be refined)
