Region Builder: add Build Summary card with live phase/LM progress/ETA/log tail from ORS service logs #40

@sfc-gh-obielov

Description

Triage note (2026-04-23): This ticket absorbs the progress-bar fix from #36 (Layer 3 / LM N/4 parsing). #36 itself already acknowledged this path: "If #36's Layer 3 lands first, this ticket reduces to tasks 3–6. If this ticket lands first, #36 Layer 3 is absorbed." — we are taking the latter option. The VERSION_INFO missing-table bug from #36 is split out into its own small issue. Close #36 as absorbed-by this ticket + VERSION_INFO split-out (#57).

Consolidated scope:

  1. Log-summariser helper parseOrsBuildLogs(). The regex set covers these markers: "1/4 … 4/4 calling LM prepare.doWork"; "LM <variant> finished"; "edge, nodes: N … usedMB:X"; "Started Application".
  2. Enriched /api/regions/:region/build-progress endpoint — backward-compatible superset.
  3. New /api/regions/:region/logs tail endpoint.
  4. <BuildSummaryCard /> React component wired into Region Builder.
  5. Long-run duration banner for US/EU extracts (previously Layer 5 of #36, "Region Builder: US stuck at 95% during LM preparation + VERSION_INFO table missing causes SQL errors").
  6. Unit tests against captured log fixture.

The new card replaces today's opaque 95% pin with a live phase / LM step / ETA / log-tail display.


Summary

Surface the same log-derived build diagnostics (phase, LM N/4, elapsed, ETA, memory, recent logs) natively inside the Region Builder tab of the ORS Control App — so users can self-diagnose region provisioning without needing Snowsight or chat support.

Complements #36 (which covers the underlying progress-bar fix). This ticket is the UI/observability feature on top of the same plumbing.

Motivation

Today, when a user provisions a large region (e.g., United States) and the bar sits at 95%, the only way to confirm progress is to run in Snowsight:

SELECT SYSTEM$GET_SERVICE_LOGS('OPENROUTESERVICE_APP.CORE.ORS_SERVICE_UNITEDSTATESOFAMERICA', 0, 'ors', 1000);
SELECT OPENROUTESERVICE_APP.CORE.ORS_STATUS('unitedstatesofamerica');

This requires Snowsight access, SQL comfort, and knowledge of ORS log formats. The control app already calls SYSTEM$GET_SERVICE_LOGS every 5 s at /api/regions/:region/build-progress — we just extract more structured data and render it.

Approach

Regex parsing in Node (no LLM, no Cortex calls).

Rationale: live progress polling needs to be fast (<5 ms), free, and deterministic. The ORS log markers ("N/4 calling LM prepare.doWork"; "LM <variant> finished"; "edge, nodes: N ... usedMB:X"; "Started Application") are stable in ORS 9.0.x. A probabilistic LLM summary on every 5 s poll would burn credits and risk hallucinated ETAs. An on-demand "Explain this build" LLM button is a viable follow-up but out of scope here.

Architecture

SYSTEM$GET_SERVICE_LOGS(ORS_SERVICE_REGION)
    -> parseOrsBuildLogs() helper
    -> /api/regions/:r/build-progress  (enriched, polled every 5 s)
    -> /api/regions/:r/logs            (tail endpoint, on-demand)
    -> <BuildSummaryCard /> in Region Builder

One helper feeds two endpoints. The React card polls the enriched endpoint (reusing today's buildProgress poller) and fetches the log tail only when the user expands it.

Work items

1. Log summariser helper

Add parseOrsBuildLogs(logs: string): BuildSummary in services/ors_control_app/server/index.ts (or new server/logParser.ts).

Output shape:

type BuildSummary = {
  phase: 'waiting' | 'initializing' | 'importing' | 'ch_preparing' | 'ch_contracting' | 'lm_preparing' | 'ready' | 'unknown';
  progress: number;                          // 0..100
  currentProfile: string | null;
  profiles: { name: string; state: 'pending'|'importing'|'ch'|'lm'|'done'; startedAt?: string; finishedAt?: string }[];
  lm?: {
    stepIndex: number;      // 1..4
    stepTotal: number;      // 4
    currentVariant: string; // e.g. car_ors_fastest_with_turn_costs
    finishedVariants: string[];
    lastEventAt: string;    // ISO
    avgStepMs: number | null;
    etaMs: number | null;
    elapsedMs: number;
  };
  ch?: { nodesStart: number; nodesRemaining: number; fractionDone: number };
  memory?: { usedMB: number; totalMB: number };
  healthReady: boolean;
  serviceReady: boolean;    // from ORS_STATUS(region)
  startedApplication: boolean;
  warnings: string[];       // last 5 WARN lines
};

Regex set (all visible in current logs):

  • Creating LM preparations
  • (\d)\/\d calling LM prepare\.doWork for (\S+) — step index + variant
  • LM (\S+) finished — completed variants
  • edge, nodes:\s*([\d\s]+\d).*totalMB:(\d+),\s*usedMB:(\d+) — memory + CH progress
  • Started Application in [\d.]+ seconds
  • start creating graph | Creating CH preparations | Creating LM preparations
  • Leading timestamp ^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) for elapsed/ETA
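
A minimal sketch of how a few of the patterns above might be applied line by line. `parseLine` and `LineEvent` are illustrative names, not part of the existing code, and the sample lines in the usage note are hypothetical strings shaped to match the regexes rather than captured ORS output.

```typescript
// Hypothetical per-line matcher built from the regex set in this ticket.
type LineEvent =
  | { kind: 'lm_step'; stepIndex: number; stepTotal: number; variant: string }
  | { kind: 'lm_finished'; variant: string }
  | { kind: 'memory'; nodes: number; totalMB: number; usedMB: number }
  | { kind: 'started' }
  | { kind: 'other' };

// Same patterns as listed above; LM_STEP additionally captures the step total.
const LM_STEP = /(\d)\/(\d) calling LM prepare\.doWork for (\S+)/;
const LM_FINISHED = /LM (\S+) finished/;
const MEMORY = /edge, nodes:\s*([\d\s]+\d).*totalMB:(\d+),\s*usedMB:(\d+)/;
const STARTED = /Started Application in [\d.]+ seconds/;

function parseLine(line: string): LineEvent {
  const step = LM_STEP.exec(line);
  if (step) {
    return { kind: 'lm_step', stepIndex: Number(step[1]), stepTotal: Number(step[2]), variant: step[3] };
  }
  const finished = LM_FINISHED.exec(line);
  if (finished) return { kind: 'lm_finished', variant: finished[1] };
  const mem = MEMORY.exec(line);
  if (mem) {
    return {
      kind: 'memory',
      nodes: Number(mem[1].replace(/\s/g, '')), // the pattern allows space-grouped digits
      totalMB: Number(mem[2]),
      usedMB: Number(mem[3]),
    };
  }
  if (STARTED.test(line)) return { kind: 'started' };
  return { kind: 'other' };
}
```

For example, `parseLine('3/4 calling LM prepare.doWork for car_ors_shortest_with_turn_costs')` yields an `lm_step` event with `stepIndex` 3. Checking the LM step pattern before the "finished" pattern keeps the two from being confused on lines that mention LM.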

ETA = avg(gap between consecutive "LM … finished" timestamps) * (expectedLMs - finishedLMs) with expectedLMs = profileCount * 4.
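
The ETA formula above can be sketched as follows. This is an illustration under stated assumptions: `estimateLmEta` and `parseLogTimestamp` are hypothetical names, log timestamps are treated as UTC, and the result is null until at least two "LM … finished" events exist, because a single event has no gap to average.

```typescript
// Sketch of the ETA formula: mean gap between consecutive "LM ... finished"
// timestamps, multiplied by the number of LM preparations still outstanding.
const LM_STEPS_PER_PROFILE = 4; // from the ticket: expectedLMs = profileCount * 4

// "YYYY-MM-DD HH:MM:SS" -> epoch milliseconds. Treating the value as UTC is an assumption.
function parseLogTimestamp(ts: string): number {
  return Date.parse(ts.replace(' ', 'T') + 'Z');
}

function estimateLmEta(
  finishedAtMs: number[],
  profileCount: number,
): { avgStepMs: number | null; etaMs: number | null } {
  if (finishedAtMs.length < 2) return { avgStepMs: null, etaMs: null };
  const expected = profileCount * LM_STEPS_PER_PROFILE;
  const remaining = Math.max(expected - finishedAtMs.length, 0);
  const sorted = [...finishedAtMs].sort((a, b) => a - b);
  // The mean of consecutive gaps equals (last - first) / (count - 1).
  const avgStepMs = (sorted[sorted.length - 1] - sorted[0]) / (sorted.length - 1);
  return { avgStepMs, etaMs: Math.round(avgStepMs * remaining) };
}
```

With three variants finished ten minutes apart and one profile, one LM step remains, so the estimate is one average gap (600000 ms).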

2. Enrich /api/regions/:region/build-progress

Replace the body at server/index.ts:768-859:

// Pull the last 1000 lines from the region's 'ors' container and summarise them.
const logs = (await runSql(`SELECT SYSTEM$GET_SERVICE_LOGS('${svcName}', 0, 'ors', 1000) AS LOGS`))?.[0]?.LOGS || '';
const summary = parseOrsBuildLogs(logs);
try {
  // When ORS_STATUS reports both flags, it overrides the log-derived phase and progress.
  const statusRows = await runSql(`SELECT TO_VARCHAR(${SF_DATABASE}.CORE.ORS_STATUS('${safeRegion}')) AS S`);
  const status = JSON.parse(statusRows?.[0]?.S || '{}');
  summary.serviceReady = !!status.service_ready;
  summary.healthReady  = !!status.health_ready;
  if (summary.serviceReady && summary.healthReady) { summary.phase = 'ready'; summary.progress = 100; }
} catch {
  // Best effort: if ORS_STATUS fails, return the log-derived summary unchanged.
}
res.json(summary);

Backward-compatible: today's progress, phase, currentProfile, completedProfiles, totalProfiles fields are preserved (populated from the structured summary). New fields are additive.
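
One way the legacy fields might be populated from the structured summary is sketched below. The ticket does not state the shape of completedProfiles and totalProfiles; this sketch assumes both are counts, and `withLegacyFields` and `SummaryCore` are hypothetical names.

```typescript
// Assumption: completedProfiles and totalProfiles are numeric counts in today's payload.
type ProfileState = 'pending' | 'importing' | 'ch' | 'lm' | 'done';

type SummaryCore = {
  phase: string;
  progress: number;
  currentProfile: string | null;
  profiles: { name: string; state: ProfileState }[];
};

type LegacyCompatible = SummaryCore & { completedProfiles: number; totalProfiles: number };

function withLegacyFields(summary: SummaryCore): LegacyCompatible {
  return {
    ...summary, // new fields stay additive
    completedProfiles: summary.profiles.filter((p) => p.state === 'done').length,
    totalProfiles: summary.profiles.length,
  };
}
```

Deriving the counts from `profiles` keeps a single source of truth, so the old and new fields cannot drift apart.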

3. New /api/regions/:region/logs tail endpoint

app.get('/api/regions/:region/logs', async (req, res) => {
  const safeRegion = sanitizeIdentifier(req.params.region);
  // Clamp tail to 1..500; a missing or non-numeric value falls back to 100,
  // so NaN or a negative number never reaches the SQL string.
  const requested = parseInt(String(req.query.tail ?? '100'), 10);
  const tail = Number.isFinite(requested) ? Math.min(Math.max(requested, 1), 500) : 100;
  const svcName = `${SF_DATABASE}.CORE.ORS_SERVICE_${safeRegion.toUpperCase()}`;
  try {
    const rows = await runSql(`SELECT SYSTEM$GET_SERVICE_LOGS('${svcName}', 0, 'ors', ${tail}) AS LOGS`);
    const raw = rows?.[0]?.LOGS || '';
    // Strip ANSI colour codes before splitting into lines.
    const lines = raw.replace(/\x1b\[[0-9;]*m/g, '').split('\n').slice(-tail);
    res.json({ lines });
  } catch (err: any) { res.status(500).json({ error: err.message }); }
});

Called only when the user expands "Recent logs" in the card — keeps the 5-second poller cheap.
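
The tail handling could also be factored into pure helpers so it can be unit-tested without Express or Snowflake. This is a sketch; `clampTail` and `tailLines` are hypothetical names, and the 1..500 range with a default of 100 mirrors the handler's limits.

```typescript
// Hypothetical helpers mirroring the handler's tail logic.
const ANSI_COLOR = /\x1b\[[0-9;]*m/g;

// Turn an untrusted query value into an integer in 1..max, or the fallback.
function clampTail(raw: unknown, fallback = 100, max = 500): number {
  const n = parseInt(String(raw ?? ''), 10);
  if (!Number.isFinite(n)) return fallback; // missing or non-numeric value
  return Math.min(Math.max(n, 1), max);
}

// Strip ANSI colour codes, drop trailing newlines, and keep the last `tail` lines.
function tailLines(rawLogs: string, tail: number): string[] {
  return rawLogs
    .replace(ANSI_COLOR, '')
    .replace(/\n+$/, '')
    .split('\n')
    .slice(-tail);
}
```

For instance, `clampTail('9999')` returns 500 and `clampTail('abc')` returns 100, so the value interpolated into the SQL string is always a bounded integer.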

4. BuildSummaryCard React component

New file services/ors_control_app/src/components/BuildSummaryCard.tsx.

Target layout:

+------------------------------------------------------+
| United States     [badge: LM 3/4]   [badge: 83% mem] |
| Phase: Landmark preparation                          |
| Progress [||||||||||||||         ] 88%               |
| LM 3/4 · car_ors_shortest_with_turn_costs            |
| elapsed 00:44 · ~01:10 remaining                     |
| Profiles:  driving-car [LM]                          |
| [Show recent logs >]                                 |
+------------------------------------------------------+

Props: { region: string; summary: BuildSummary }. Uses useState<boolean> for logs-expanded. When expanded, fetches /api/regions/${region}/logs?tail=100 and renders in a monospace <pre> with scroll. Warnings render as amber chips when non-empty.
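
The badge and duration strings in the layout could come from small pure helpers like the ones sketched here. The names are hypothetical, and reading "00:44" as hours:minutes is an assumption, since the mock-up does not say whether it means hh:mm or mm:ss.

```typescript
// Hypothetical display helpers for the card.
type Memory = { usedMB: number; totalMB: number };
type LmProgress = { stepIndex: number; stepTotal: number };

// Format a duration as hh:mm (assumed reading of the mock-up's "00:44").
function formatHhMm(ms: number): string {
  const totalMinutes = Math.max(0, Math.floor(ms / 60000));
  const hh = String(Math.floor(totalMinutes / 60)).padStart(2, '0');
  const mm = String(totalMinutes % 60).padStart(2, '0');
  return `${hh}:${mm}`;
}

// "83% mem" style badge; null when memory data is absent or unusable.
function memoryBadge(memory?: Memory): string | null {
  if (!memory || memory.totalMB <= 0) return null;
  return `${Math.round((memory.usedMB / memory.totalMB) * 100)}% mem`;
}

// "LM 3/4" style badge; null outside the LM phase.
function lmBadge(lm?: LmProgress): string | null {
  return lm ? `LM ${lm.stepIndex}/${lm.stepTotal}` : null;
}
```

Returning null for missing data lets the component skip a badge entirely instead of rendering a placeholder.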

5. Wire into RegionBuilder.tsx

In services/ors_control_app/src/components/RegionBuilder.tsx:

  1. Extend the buildProgress state type (lines 106-109) to include the new summary fields.
  2. Replace the inline progress block in the Active Jobs section (lines 333-354) with <BuildSummaryCard region={job.region} summary={buildProgress[job.region]} />.
  3. Render it in the Provisioned Regions row when isBuilding is true (lines 387-408).
  4. Existing 5 s poller at lines 180-196 is unchanged — it just consumes a richer payload.

6. Styles

Add to the existing control-app CSS:

  • .build-summary-card (reuse .job-card spacing)
  • .build-summary-row (flex row for badges)
  • .build-summary-logs pre (monospace, 12 px, max-height: 220px; overflow: auto)

Reuse existing .progress-bar-track/.progress-bar-fill. No new design system.

7. Deploy

Per AGENTS.md "Control App Image Deployment":

  1. docker build --platform linux/amd64 -f Dockerfile.runtime -t <repo>/ors_control_app:v<next>
  2. docker push
  3. Bump image tag in ors_control_app_service.yaml
  4. snow stage copy spec to stage
  5. ALTER SERVICE ... SUSPEND; ALTER SERVICE ... FROM @stage SPECIFICATION_FILE=...; ALTER SERVICE ... RESUME;
  6. Print the new endpoint URL.

No schema changes, no new Snowflake objects — read-only additions. Session query_tag unchanged.

Risk & test plan

  • Log format drift: ORS core-LM handler strings are stable in ORS 9.0.x. Add a unit test for parseOrsBuildLogs against a captured US log sample (tests/fixtures/ors-lm-sample.log).
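  • Cost: SYSTEM$GET_SERVICE_LOGS is already called every 5 s, so the enriched poller adds no incremental load. The tail endpoint adds one extra call only when a user expands "Recent logs".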
  • Failure mode: if parseOrsBuildLogs throws, the endpoint falls back to {phase:'unknown',progress:0} (same as today's catch).
  • Manual test: trigger a fresh small-region provision (e.g., Berlin) and confirm the card transitions importing -> ch_preparing -> ch_contracting -> lm_preparing -> ready with correct timestamps.
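
The failure-mode fallback from the list above might look like the wrapper below. It is a sketch: `summariseOrFallback` and `MinimalSummary` are hypothetical names, and the parser is passed in as a parameter so the snippet stays self-contained.

```typescript
// Hypothetical wrapper: any parser exception degrades to the documented fallback.
type MinimalSummary = { phase: string; progress: number };

function summariseOrFallback(
  logs: string,
  parse: (logs: string) => MinimalSummary,
): MinimalSummary {
  try {
    return parse(logs);
  } catch {
    // Same shape the ticket names for the failure case.
    return { phase: 'unknown', progress: 0 };
  }
}
```

A unit test can then pass a deliberately throwing parser and assert that the endpoint payload is still well-formed.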

Out of scope (follow-ups)

  • On-demand "Explain this build" LLM button (see Approach).

Relationship to #36

If #36's Layer 3 lands first, this ticket reduces to tasks 3–6 (card + tail endpoint + wiring + deploy). If this ticket lands first, #36 Layer 3 is absorbed. Per the triage note (2026-04-23), the second case applies.

Priority

Medium — user-visible quality-of-life improvement, unblocks self-diagnosis for long-running region builds.
