Problem
Current training progress events are emitted as loosely structured payloads (message + numeric test index), and the frontend stores them by array index.
This makes the contract fragile and hard to extend:
- UI behavior depends on parsing free-form strings like "Starting" / "Finish"
- No stable run_id in the event payload to scope updates
- No explicit epoch/batch/metric fields for charting
- Hard to support multi-run comparison and robust progress visualizations
Why this matters
A stable telemetry contract is required for:
- real-time training charts
- run comparison dashboard
- reliable retry/reconnect handling
- future tuning/interpretability workflows
String-based progress messages are hard to validate, and a reworded message can silently break the UI.
Current behavior (observed)
- Backend emits generic message events from CustomProgressBar
- Frontend in Training.jsx maps updates using resultValues[parseInt(resp.test)]
- Progress state is not strongly typed and not run-scoped
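The run-scoping gap can be illustrated with a small sketch. The frontend itself is JavaScript (Training.jsx); Python is used here only to show the data shape. RunStore, apply and latest are hypothetical names, not existing code, and the sketch assumes each payload carries a run_id string.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


class RunStore:
    """Groups progress updates by run_id instead of by array position."""

    def __init__(self) -> None:
        self._updates: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def apply(self, payload: Dict[str, Any]) -> None:
        # Reject payloads that cannot be attributed to a run, rather than
        # guessing a slot from a numeric index.
        run_id = payload.get("run_id")
        if not isinstance(run_id, str) or not run_id:
            raise ValueError("payload has no run_id")
        self._updates[run_id].append(payload)

    def latest(self, run_id: str) -> Optional[Dict[str, Any]]:
        # .get() does not create an entry for unknown runs.
        updates = self._updates.get(run_id)
        return updates[-1] if updates else None
```

Keying by run_id means interleaved updates from two runs cannot overwrite each other, which an index-based array cannot guarantee.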
Proposed solution
Introduce versioned structured events for training:
- training_started
- training_update
- training_complete
- training_error
Each event should include a typed payload, e.g.:
{
  "run_id": "<uuid-or-id>",
  "phase": "train|eval",
  "epoch": 3,
  "batch": 12,
  "steps": 100,
  "metrics": {
    "loss": 0.42,
    "accuracy": 0.88,
    "val_loss": 0.50,
    "val_accuracy": 0.84
  },
  "message": "optional human-readable message",
  "timestamp": "ISO-8601"
}
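A minimal sketch of how the backend might build and validate such a payload before emitting it. The TrainingEvent class, its event field, and the schema_version field are assumptions for illustration: the proposal asks for versioned events but does not name a version field or say where the event type is carried.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Dict, Optional

EVENT_TYPES = {"training_started", "training_update",
               "training_complete", "training_error"}
PHASES = {"train", "eval"}
SCHEMA_VERSION = 1  # assumption: not specified in the proposal


def _utc_now() -> str:
    # ISO-8601 timestamp in UTC
    return datetime.now(timezone.utc).isoformat()


@dataclass
class TrainingEvent:
    event: str
    run_id: str
    phase: str
    epoch: int
    batch: int
    steps: int
    metrics: Dict[str, float] = field(default_factory=dict)
    message: Optional[str] = None
    timestamp: str = field(default_factory=_utc_now)
    schema_version: int = SCHEMA_VERSION

    def __post_init__(self) -> None:
        # Fail at emit time instead of letting the UI break silently.
        if self.event not in EVENT_TYPES:
            raise ValueError(f"unknown event type: {self.event!r}")
        if self.phase not in PHASES:
            raise ValueError(f"unknown phase: {self.phase!r}")
        if not self.run_id:
            raise ValueError("run_id must be non-empty")
        for name in ("epoch", "batch", "steps"):
            value = getattr(self, name)
            if isinstance(value, bool) or not isinstance(value, int) or value < 0:
                raise ValueError(f"{name} must be a non-negative integer")
        for key, value in self.metrics.items():
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                raise ValueError(f"metric {key!r} must be numeric")

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```

Usage, under the same assumptions:

```python
event = TrainingEvent(event="training_update", run_id="run-1", phase="train",
                      epoch=3, batch=12, steps=100, metrics={"loss": 0.42})
print(event.to_json())
```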