Triage note (2026-04-23): This ticket absorbs the progress-bar fix from #36 (Layer 3 / LM N/4 parsing). #36 itself already acknowledged this path: "If #36's Layer 3 lands first, this ticket reduces to tasks 3–6. If this ticket lands first, #36 Layer 3 is absorbed." — we are taking the latter option. The VERSION_INFO missing-table bug from #36 is split out into its own small issue. Close #36 as absorbed-by this ticket + VERSION_INFO split-out (#57).
Consolidated scope:
Log-summariser helper parseOrsBuildLogs() (regex set covers 1/4 … 4/4 calling LM prepare.doWork, LM <variant> finished, edge, nodes: N, usedMB:X, Started Application).
The new card replaces today's opaque 95% pin with a live phase / LM step / ETA / log-tail display.
Summary
Surface the same log-derived build diagnostics (phase, LM N/4, elapsed, ETA, memory, recent logs) natively inside the Region Builder tab of the ORS Control App — so users can self-diagnose region provisioning without needing Snowsight or chat support.
Complements #36 (which covers the underlying progress-bar fix). This ticket is the UI/observability feature on top of the same plumbing.
Motivation
Today, when a user provisions a large region (e.g., United States) and the bar sits at 95%, the only way to confirm progress is to query the service logs by hand in Snowsight.
This requires Snowsight access, SQL comfort, and knowledge of ORS log formats. The control app already calls SYSTEM$GET_SERVICE_LOGS every 5 s at /api/regions/:region/build-progress — we just extract more structured data and render it.
Approach
Regex parsing in Node (no LLM, no Cortex calls).
Rationale: live progress polling needs to be fast (<5 ms), free, and deterministic. ORS log markers (N/4 calling LM prepare.doWork, LM <variant> finished, edge, nodes: N ... usedMB:X, Started Application) are stable in ORS 9.0.x. A probabilistic LLM summary on every 5 s poll would burn credits and risk hallucinated ETAs. An on-demand "Explain this build" LLM button is a viable follow-up but out of scope here.
Architecture
SYSTEM$GET_SERVICE_LOGS(ORS_SERVICE_REGION)
-> parseOrsBuildLogs() helper
-> /api/regions/:r/build-progress (enriched, polled every 5 s)
-> /api/regions/:r/logs (tail endpoint, on-demand)
-> <BuildSummaryCard /> in Region Builder
One helper feeds two endpoints. The React card polls the enriched endpoint (reusing today's buildProgress poller) and fetches the log tail only when the user expands it.
Work items
1. Log summariser helper
Add parseOrsBuildLogs(logs: string): BuildSummary in services/ors_control_app/server/index.ts (or new server/logParser.ts).
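Regex set (all visible in current logs):
Creating LM preparations
(\d)\/\d calling LM prepare\.doWork for (\S+) - step index + variant
LM (\S+) finished - completed variants
edge, nodes:\s*([\d\s]+\d).*totalMB:(\d+),\s*usedMB:(\d+) - memory + CH progress
Started Application in [\d.]+ seconds
start creating graph | Creating CH preparations | Creating LM preparations
Leading timestamp ^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d) for elapsed/ETA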
ETA = avg(gap between consecutive "LM … finished" timestamps) * (expectedLMs - finishedLMs) with expectedLMs = profileCount * 4.
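A minimal sketch of how the helper and the ETA formula could fit together. The BuildSummary field names, the marker-to-phase mapping, and the profileCount parameter are assumptions made for illustration; the ticket's own output shape is not reproduced in this text.

```typescript
// Hypothetical output shape; the real BuildSummary fields may differ.
interface BuildSummary {
  phase: string;
  lmStep: number | null;          // N from "N/4 calling LM prepare.doWork"
  currentVariant: string | null;
  finishedLMs: number;
  totalMB: number | null;
  usedMB: number | null;
  elapsedSec: number | null;
  etaSec: number | null;
}

// Patterns taken from the regex set listed in this ticket.
const TS_RE = /^(\d{4}-\d\d-\d\d \d\d:\d\d:\d\d)/;
const LM_STEP_RE = /(\d)\/\d calling LM prepare\.doWork for (\S+)/;
const LM_DONE_RE = /LM (\S+) finished/;
const MEM_RE = /edge, nodes:\s*([\d\s]+\d).*totalMB:(\d+),\s*usedMB:(\d+)/;

// Assumed mapping from log markers to phases; the last marker seen wins.
const PHASE_MARKERS: Array<[RegExp, string]> = [
  [/start creating graph/, 'importing'],
  [/Creating CH preparations/, 'ch_preparing'],
  [/Creating LM preparations/, 'lm_preparing'],
  [/Started Application in [\d.]+ seconds/, 'ready'],
];

function parseOrsBuildLogs(logs: string, profileCount = 1): BuildSummary {
  const s: BuildSummary = {
    phase: 'unknown', lmStep: null, currentVariant: null, finishedLMs: 0,
    totalMB: null, usedMB: null, elapsedSec: null, etaSec: null,
  };
  const finishedAt: number[] = [];   // timestamps (ms) of "LM ... finished" lines
  let firstMs: number | null = null;
  let lastMs: number | null = null;

  for (const line of logs.split('\n')) {
    // Treat the leading timestamp as UTC; the real log timezone is an assumption.
    const ts = TS_RE.exec(line);
    const ms = ts ? Date.parse(ts[1].replace(' ', 'T') + 'Z') : NaN;
    if (!isNaN(ms)) {
      if (firstMs === null) firstMs = ms;
      lastMs = ms;
    }
    for (const [re, phase] of PHASE_MARKERS) {
      if (re.test(line)) s.phase = phase;
    }
    const step = LM_STEP_RE.exec(line);
    if (step) {
      s.lmStep = Number(step[1]);
      s.currentVariant = step[2];
    }
    if (LM_DONE_RE.test(line)) {
      s.finishedLMs += 1;
      if (!isNaN(ms)) finishedAt.push(ms);
    }
    const mem = MEM_RE.exec(line);
    if (mem) {
      s.totalMB = Number(mem[2]);
      s.usedMB = Number(mem[3]);
    }
  }

  if (firstMs !== null && lastMs !== null) {
    s.elapsedSec = (lastMs - firstMs) / 1000;
  }

  // ETA = avg gap between consecutive finishes * remaining LMs.
  // The average of consecutive gaps equals (last - first) / (count - 1).
  const remaining = profileCount * 4 - s.finishedLMs;
  if (finishedAt.length >= 2 && remaining > 0) {
    const avgGapMs =
      (finishedAt[finishedAt.length - 1] - finishedAt[0]) / (finishedAt.length - 1);
    s.etaSec = Math.round((avgGapMs / 1000) * remaining);
  }
  return s;
}
```

With fewer than two finished variants there is no gap to average, so this sketch leaves etaSec as null rather than guessing.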
2. Enrich /api/regions/:region/build-progress
Replace the body at server/index.ts:768-859:
// Fetch the last 1000 log lines from the 'ors' container and summarise them.
const logs = (await runSql(`SELECT SYSTEM$GET_SERVICE_LOGS('${svcName}', 0, 'ors', 1000) AS LOGS`))?.[0]?.LOGS || '';
const summary = parseOrsBuildLogs(logs);
try {
  const statusRows = await runSql(`SELECT TO_VARCHAR(${SF_DATABASE}.CORE.ORS_STATUS('${safeRegion}')) AS S`);
  const status = JSON.parse(statusRows?.[0]?.S || '{}');
  summary.serviceReady = !!status.service_ready;
  summary.healthReady = !!status.health_ready;
  if (summary.serviceReady && summary.healthReady) {
    summary.phase = 'ready';
    summary.progress = 100;
  }
} catch {
  // Status lookup is best-effort: on failure, return the log-derived summary as is.
}
res.json(summary);
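Backward-compatible: today's progress, phase, currentProfile, completedProfiles, totalProfiles fields are preserved (populated from the structured summary). New fields are additive.
3. New /api/regions/:region/logs tail endpoint
Called only when the user expands "Recent logs" in the card, which keeps the 5-second poller cheap.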
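For the on-demand /api/regions/:region/logs tail endpoint, the request handling could lean on two small pure helpers like the ones below. The names parseTailParam and tailLogLines, the default of 100, and the cap of 1000 (mirroring the 1000 passed to SYSTEM$GET_SERVICE_LOGS in the progress query) are assumptions for this sketch, not part of the existing server.

```typescript
// Hypothetical helpers for the tail endpoint; names and limits are assumptions.
const DEFAULT_TAIL = 100;
const MAX_TAIL = 1000;

// Coerce the raw ?tail= query value into a bounded positive integer.
function parseTailParam(raw: unknown): number {
  const n = typeof raw === 'string' ? parseInt(raw, 10) : NaN;
  if (isNaN(n) || n < 1) return DEFAULT_TAIL;
  return Math.min(n, MAX_TAIL);
}

// Return the last `tail` non-empty lines of a log blob.
function tailLogLines(logs: string, tail: number): string[] {
  const lines = logs.split(/\r?\n/).filter((l) => l.length > 0);
  return tail > 0 ? lines.slice(-tail) : [];
}
```

Bounding the parameter on the server keeps a malformed or oversized ?tail= value from turning the expand action into a large response.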
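4. BuildSummaryCard React component
New file services/ors_control_app/src/components/BuildSummaryCard.tsx.
Props: { region: string; summary: BuildSummary }. Uses useState<boolean> for the logs-expanded flag. When expanded, the card fetches /api/regions/${region}/logs?tail=100 and renders the result in a scrollable monospace <pre>. Warnings render as amber chips when non-empty.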
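One way to keep the BuildSummaryCard component thin is to derive its badge text in a pure helper that can be unit-tested without React. The sketch below is an illustration under assumptions: CardSummary, formatDuration, and cardBadges are hypothetical names, and the field names are a guessed subset of BuildSummary.

```typescript
// Hypothetical subset of BuildSummary that the badge row would need.
interface CardSummary {
  phase: string;
  lmStep: number | null;   // N of the "LM N/4" step, if known
  etaSec: number | null;   // estimated seconds remaining, if known
}

// Render a duration in seconds as a short human-readable string.
function formatDuration(totalSec: number): string {
  const s = Math.max(0, Math.round(totalSec));
  const h = Math.floor(s / 3600);
  const m = Math.floor((s % 3600) / 60);
  const sec = s % 60;
  if (h > 0) return `${h}h ${m}m`;
  if (m > 0) return `${m}m ${sec}s`;
  return `${sec}s`;
}

// Build the badge labels shown in the card's summary row.
function cardBadges(summary: CardSummary): string[] {
  const badges = [`Phase: ${summary.phase}`];
  if (summary.lmStep !== null) badges.push(`LM ${summary.lmStep}/4`);
  if (summary.etaSec !== null) badges.push(`ETA ~${formatDuration(summary.etaSec)}`);
  return badges;
}
```

The component would then map the returned strings onto elements in .build-summary-row, omitting any badge whose underlying value is unknown.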
5. Wire into RegionBuilder.tsx
In services/ors_control_app/src/components/RegionBuilder.tsx:
Extend the buildProgress state type (lines 106-109) to include the new summary fields.
Replace the inline progress block in the Active Jobs section (lines 333-354) with <BuildSummaryCard region={job.region} summary={buildProgress[job.region]} />.
Render it in the Provisioned Regions row when isBuilding is true (lines 387-408).
Existing 5 s poller at lines 180-196 is unchanged — it just consumes a richer payload.
6. Styles
Add to the existing control-app CSS:
.build-summary-card (reuse .job-card spacing)
.build-summary-row (flex row for badges)
.build-summary-logs pre (monospace, 12 px, max-height: 220px; overflow: auto)
Reuse existing .progress-bar-track/.progress-bar-fill. No new design system.
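7. Deploy
Per AGENTS.md "Control App Image Deployment":
docker build --platform linux/amd64 -f Dockerfile.runtime -t <repo>/ors_control_app:v<next>
docker push
… ors_control_app_service.yaml
snow stage copy spec to stage
ALTER SERVICE ... SUSPEND; ALTER SERVICE ... FROM @stage SPECIFICATION_FILE=...; ALTER SERVICE ... RESUME;
Print the new endpoint URL.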
No schema changes, no new Snowflake objects — read-only additions. Session query_tag unchanged.
Risk & test plan
Log format drift: ORS core-LM handler strings are stable in ORS 9.0.x. Add a unit test for parseOrsBuildLogs against a captured US log sample (tests/fixtures/ors-lm-sample.log).
Cost: SYSTEM$GET_SERVICE_LOGS is already called every 5 s, so the enriched poller adds no incremental load.
Failure mode: if parseOrsBuildLogs throws, the endpoint falls back to {phase:'unknown',progress:0} (same as today's catch).
Manual test: trigger a fresh small-region provision (e.g., Berlin) and confirm the card transitions importing -> ch_preparing -> ch_contracting -> lm_preparing -> ready with correct timestamps.
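The failure mode above can be sketched as a small wrapper that turns any parser exception into the fallback payload. The ProgressPayload type and the withFallback name are hypothetical; only the {phase:'unknown',progress:0} fallback comes from this ticket.

```typescript
// Hypothetical minimal payload type; the real summary carries more fields.
interface ProgressPayload {
  phase: string;
  progress: number;
  [key: string]: unknown;
}

// Wrap a parser so that a throw degrades to the documented fallback payload.
function withFallback(
  parse: (logs: string) => ProgressPayload,
): (logs: string) => ProgressPayload {
  return (logs) => {
    try {
      return parse(logs);
    } catch {
      return { phase: 'unknown', progress: 0 };
    }
  };
}
```

A unit test for the parser can pass a deliberately throwing function through the wrapper to confirm the endpoint never surfaces a parse error to the poller.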
Out of scope (follow-ups)
Persisting summaries to a history table for post-mortem analytics.
On-demand "Explain this build" LLM button (AI_COMPLETE over the last 200 log lines) — valuable but needs its own ticket so cost/UX is reviewed separately.
Relationship to #36
#36 = … (build-progress) + kill VERSION_INFO error noise.
This ticket = the UI card, log-tail endpoint, and full diagnostic surface on top of that plumbing.
If #36's Layer 3 lands first, this ticket reduces to tasks 3–6 (card + tail endpoint + wiring + deploy). If this ticket lands first, #36 Layer 3 is absorbed.
Priority
Medium — user-visible quality-of-life improvement, unblocks self-diagnosis for long-running region builds.