
fix(validation-harness): dynamic RTSP live-proxy port + stale sensor residue gotchas #1079

Open
ary1111 wants to merge 3 commits into feat/build-vision-agent-skill from feat/build-vision-agent-skill-3-2

ary1111 (Collaborator) commented Jun 18, 2026

Description

  • Section 4 step 4: replace hardcoded rtsp://HOST:30554/live/SID with a retry loop that reads the actual port from vss-vios-streamprocessing logs (the RtspLoadBalancer pool is dynamic; it landed on 30556 in the last run). VIOS GET /sensor/<id>/streams returns an empty .url for Rtsp sensors, so the log line is the only discovery source (Finding B, 2026-06-17).

  • Gotchas: add bullet documenting the dynamic live-proxy port and the correct grep command to extract it (Finding B).

  • Gotchas: add live registration residue on re-run bullet (Finding C, 2026-06-17) -- vst_data is a host bind-mount that survives down/-v, so a prior NvStreamer sensor stays in Postgres and causes HTTP 400 on re-run. Documents DELETE preferred for iterative runs vs full rm -rf vst_data for clean-slate runs.
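
The log-based discovery described in the first bullet can be sketched roughly as below. This is a minimal illustration, not the exact code in validation-harness.md: the function names `extract_proxy_url` and `proxy_port` are hypothetical, and the log line layout is assumed only to contain the text "Live proxy url" followed by an rtsp:// URL that includes the sensor ID.

```shell
# Sketch only: function names and sample log layout are assumptions.

# Read log text on stdin and print the most recent live-proxy URL
# that mentions the given sensor ID (prints nothing if none is found).
extract_proxy_url() {
  local sid="$1"
  grep -F "Live proxy url" | grep -F -- "$sid" | tail -n 1 \
    | sed -nE 's#.*(rtsp://[^[:space:]]+).*#\1#p'
}

# Print the port component of an rtsp://HOST:PORT/... URL.
proxy_port() {
  printf '%s\n' "$1" | sed -nE 's#^rtsp://[^/:]+:([0-9]+)(/.*)?$#\1#p'
}
```

With Docker available, the input would come from something like `docker logs vss-vios-streamprocessing 2>&1 | extract_proxy_url "$SID"`. Using `sed -n ... p` means a non-matching line yields an empty result instead of being echoed back unchanged.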

Checklist

  • I am familiar with the Contributing Guidelines.
  • I have installed and run pre-commit hooks locally (uv run pre-commit install once, then hooks run on every git commit).
  • Every commit on this PR is DCO sign-off'd (git commit -s adds a Signed-off-by trailer that certifies you have the right to submit the change under Apache-2.0).
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

…residue gotchas

- Section 4 step 4: replace hardcoded rtsp://HOST:30554/live/SID with a
  retry loop that reads the actual port from vss-vios-streamprocessing
  logs (RtspLoadBalancer pool is dynamic; landed on 30556 in last run).
  VIOS GET /sensor/<id>/streams returns empty .url for Rtsp sensors so
  the log line is the only discovery source (Finding B, 2026-06-17).

- Gotchas: add bullet documenting the dynamic live-proxy port and the
  correct grep command to extract it (Finding B).

- Gotchas: add live registration residue on re-run bullet (Finding C,
  2026-06-17) -- vst_data is a host bind-mount that survives down/-v,
  so a prior NvStreamer sensor stays in Postgres and causes HTTP 400
  on re-run. Documents DELETE preferred for iterative runs vs full
  rm -rf vst_data for clean-slate runs.

Signed-off-by: Adam Ryason <aryason@nvidia.com>

copy-pr-bot Bot commented Jun 18, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

ary1111 (Collaborator, Author) commented Jun 18, 2026

/ok to test a57dfab

@github-actions (Contributor)

vss-playbook-compliance

Step 1 - playbook compliance

PASS: no errors - errors=0; warnings=1; skills_checked=1

| Severity | Rule | Skill | File | Message |
|---|---|---|---|---|
| WARNING | NAM-003 | vss-build-vision-agent | | Token at position 2 ('build') after team prefix 'vss-' is not an approved verb. Expected '--' or '-'. Approved verbs: analyze, ask, audit, bootstrap, bump, call, create, deploy, fix, format, generate, ingest, inspect, install, list, manage, migrate, profile, query, review, run, scaffold, search, setup, summarize, tune. |

vss-playbook-compliance - commit a57dfab; view full log.

greptile-apps Bot (Contributor) commented Jun 18, 2026

Greptile Summary

This PR fixes the hardcoded RTSP proxy port in the NvStreamer smoke-test sequence. VIOS's RtspLoadBalancer assigns the live-proxy port dynamically (it landed on 30556, not 30554, in the most recent run), so the script now retries docker logs vss-vios-streamprocessing, looking for the "Live proxy url" log line, to discover the actual port. It also adds MODEL_ID runtime resolution via GET /v1/models and documents two new operational gotchas.

  • validation-harness.md: hardcoded PROXY replaced with a 6-iteration retry loop; new MODEL_ID step before caption drive; new gotcha bullets for dynamic VIOS port, sensor residue on re-run, and VOD clip_storage residue.
  • New patch/deploy/integrate reference files for Alert Bridge and Video Summarization, plus SKILL.md additions (VS LLM-source disambiguation, .env overwrite guard, deploy-skill YAML frontmatter) and standalone-compose-patches.md updates (sudo mkdir/chmod, Redis log cleanup, named-volume pre-chmod, Patch 5 wait-for-redis).
  • NvStreamer image registry fix: nvcr.io/nvstaging/ → nvcr.io/nvidia/ for the nvstreamer-validation service block.

Confidence Score: 5/5

Safe to merge — all changes are documentation and runbook content; no production code paths are modified.

The core RTSP port fix is correct and well-evidenced. The new patch/deploy/integrate reference files are thorough and internally consistent. The only new finding is a workflow gap in the Alert Bridge .env template where the runtime model ID is hardcoded, but the hf-1208 tag appears stable for the 3.2.0 release and the fix path is documented.

skills/vss-build-vision-agent/references/patch-alerts.md — the generated .env template hardcodes the RT-VLM model tag; the deploy workflow should include a step to resolve it dynamically after RT-VLM starts.
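
A dynamic resolution step of the kind suggested here might look like the sketch below. It assumes that RT-VLM is reachable at http://localhost:8018 and that GET /v1/models returns an OpenAI-style body of the form `{"data":[{"id": ...}]}`; neither the response shape nor the function names `first_model_id` and `resolve_model_id` are confirmed by this PR. It also assumes `jq` and `curl` are installed.

```shell
# Sketch only: the response shape and function names are assumptions.

# Read a /v1/models JSON body on stdin and print the first model ID,
# or nothing if the list is empty or missing.
first_model_id() {
  jq -r '.data[0].id // empty'
}

# Query the (assumed) RT-VLM base URL and fail if no model ID comes back.
resolve_model_id() {
  local base="${1:-http://localhost:8018}" id
  id=$(curl -fsS "$base/v1/models" | first_model_id) || return 1
  if [ -z "$id" ]; then
    echo "no model ID returned by $base/v1/models" >&2
    return 1
  fi
  printf '%s\n' "$id"
}
```

A deploy step could then write `VLM_NAME=$(resolve_model_id)` into the generated .env once RT-VLM is up, instead of baking in a tag that may change between releases.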

Important Files Changed

| Filename | Overview |
|---|---|
| skills/vss-build-vision-agent/references/validation-harness.md | Replaces hardcoded PROXY string with a 6-iteration retry loop discovering the dynamic VIOS live-proxy RTSP port from streamprocessing logs; adds MODEL_ID runtime resolution; adds new gotcha bullets. Previously-flagged thread issues around missing guards and empty-SID wildcard remain open. |
| skills/vss-build-vision-agent/references/patch-alerts.md | New file - Alert Bridge patch machinery. Patch 1/2/3 mechanics are correct, but VLM_NAME hardcodes nim_nvidia_cosmos-reason2-8b_hf-1208 while the same PR establishes this ID is runtime-generated and should be resolved dynamically. |
| skills/vss-build-vision-agent/references/patch-lvs.md | New file - VS patch machinery. Component-services block, Patch 1/2 strip logic, .env overrides, and validated IN-1-1 wiring notes are all consistent and well-sourced. |
| skills/vss-build-vision-agent/references/patch-rt-vlm.md | Adds RTVI_VLM_FILE_URL_ALLOWED_DIRS documentation, VOD path design section, and runtime model-ID resolution principle. Changes are well-reasoned and self-consistent. |
| skills/vss-build-vision-agent/references/standalone-compose-patches.md | Adds sudo to mkdir/chmod pre-flight, stale Redis log cleanup via busybox, named-volume pre-chmod, and Patch 5 (wait-for-redis network_mode: host). All well-documented. |
| skills/vss-build-vision-agent/SKILL.md | Adds VS/summarization LLM source disambiguation, .env overwrite guard rule, and YAML frontmatter requirement for generated deploy skills. |
| skills/vss-build-vision-agent/references/env-file-enumeration.md | Adds IN-1 required variable table for RTVI_VLM_FILE_URL_ALLOWED_DIRS and SAMPLE_VIDEO_DATASET with clear purpose annotations. |
| skills/vss-manage-alerts/references/deploy-alerts-service.md | New file - standalone deployment contract for Alert Bridge. Known-issues table is thorough; all sections are well-sourced from compose ground truth. |
| skills/vss-manage-alerts/references/integrate-alerts.md | New file - integration contract for Alert Bridge. API schema, inputs/outputs, env vars, network, and known constraints are well-documented. |
| skills/vss-summarize-video/references/deploy-lvs-service.md | New file - deployment reference for VS. Known-issues table is comprehensive and consistent with integrate-lvs.md. |
| skills/vss-summarize-video/references/integrate-lvs.md | New file - integration contract for VS. Important correction: structured summaries go to default_<file_id>, not lvs-events, when a stream id is present. |
| .github/skill-eval/adapters/vss-build-vision-agent/__init__.py | Adds SPDX license header to empty __init__.py. Trivial and correct. |

Sequence Diagram

%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
    participant Script as Smoke Script
    participant NV as NvStreamer :31000
    participant VIOS as VIOS :30888
    participant SPLogs as vss-vios-streamprocessing logs
    participant RTVLM as RT-VLM :8018

    Script->>NV: GET /sensor/list (retry until NVSID found)
    NV-->>Script: sensorId
    Script->>NV: GET /sensor/NVSID/streams
    NV-->>Script: URL rtsp HOST 315xx nvstream

    Script->>VIOS: POST /sensor/add sensorUrl URL
    VIOS-->>Script: SID

    loop up to 6x every 5s
        Script->>SPLogs: docker logs grep Live proxy url grep SID
        SPLogs-->>Script: rtsp HOST DYNAMIC_PORT live SID
        Note over Script: PROXY found break
    end
    Note over Script: Port is dynamic e.g. 30556 not 30554

    Script->>RTVLM: GET /v1/models
    RTVLM-->>Script: MODEL_ID nim_nvidia_cosmos-reason2-8b_hf-1208

    Script->>RTVLM: POST /v1/streams/add liveStreamUrl PROXY
    RTVLM-->>Script: STREAM_ID

    Script->>RTVLM: POST /v1/generate_captions id STREAM_ID model MODEL_ID
    RTVLM-->>Script: SSE caption stream

Reviews (3): Last reviewed commit: "feat(skill): VS + Alerting additive prof..."

Comment thread skills/vss-build-vision-agent/references/validation-harness.md
Comment on lines +143 to +145
for i in 1 2 3 4 5 6; do
PROXY=$(docker logs vss-vios-streamprocessing 2>&1 | grep "Live proxy url" | grep "$SID" | tail -1 \
| sed -E 's#.*(rtsp://[^[:space:]]+).*#\1#')

P1 Empty $SID turns grep into a wildcard

If POST /sensor/add fails and SID evaluates to empty or the literal string null (from jq -r), then grep "$SID" matches every "Live proxy url" line in the container logs, and tail -1 will return the proxy URL of whichever sensor was registered last — potentially one from a previous run. The extracted PROXY will then point to the wrong sensor. Adding a check for SID being non-empty and not equal to null before entering the loop would prevent silently feeding the wrong stream to RT-VLM.
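
One way to implement the suggested guard, and to make the loop fail loudly on timeout, is sketched below. It is an illustration under assumptions rather than the harness code: `require_sid`, `discover_proxy`, and the `LOG_CMD` override are hypothetical names, and `LOG_CMD` exists only so the loop can be exercised without Docker.

```shell
# Sketch only: function names and the LOG_CMD override are assumptions.

# Reject an empty ID or the literal "null" that jq -r prints for a missing field.
require_sid() {
  case "$1" in
    ""|null)
      echo "sensor ID is empty or null; did POST /sensor/add fail?" >&2
      return 1
      ;;
  esac
}

# Poll the streamprocessing logs for this sensor's live-proxy URL.
# Args: sensor ID, attempts (default 6), delay in seconds (default 5).
discover_proxy() {
  local sid="$1" tries="${2:-6}" delay="${3:-5}" i=0 proxy=""
  require_sid "$sid" || return 1
  while [ "$i" -lt "$tries" ]; do
    proxy=$(${LOG_CMD:-docker logs vss-vios-streamprocessing} 2>&1 \
      | grep -F "Live proxy url" | grep -F -- "$sid" | tail -n 1 \
      | sed -nE 's#.*(rtsp://[^[:space:]]+).*#\1#p') || true
    if [ -n "$proxy" ]; then
      printf '%s\n' "$proxy"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no Live proxy url line for $sid after $tries attempts" >&2
  return 1
}
```

A caller would use `PROXY=$(discover_proxy "$SID") || exit 1`, so that neither a failed registration nor a timeout can pass an empty or mismatched URL on to RT-VLM.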

Comment thread skills/vss-build-vision-agent/references/validation-harness.md
@github-actions (Contributor)

🔬 Skills Review — 6-paradigm consolidation

19 findings across 1 skill(s) · 🔴 critical=1 · 🟠 high=5 · 🟡 medium=5 · 🔵 low=8
5 need verification (critical or single-lens). Full report in the run artifact.

🔴 Critical / high (act now)

| Sev | Agree | Skill | File | Finding |
|---|---|---|---|---|
| 🔴 critical | 1/6 | vss-build-vision-agent | skills/vss-build-vision-agent/references/validation-harness.md:176 | Wrong vst_data path: VIOS streamprocessing, not NvStreamer |
| 🟠 high | 2/6 | vss-build-vision-agent | skills/vss-build-vision-agent/references/validation-harness.md:96 | Section header still lists 30554 as fixed VIOS proxy port after fix |
| 🟠 high | 1/6 | vss-build-vision-agent | skills/vss-build-vision-agent/SKILL.md:394 | scripts/validate-references.py listed but scripts/ dir empty |
| 🟠 high | 1/6 | vss-build-vision-agent | skills/vss-build-vision-agent/references/validation-harness.md:143 | PROXY loop silently passes with empty string on timeout |
| 🟠 high | 1/6 | vss-build-vision-agent | skills/vss-build-vision-agent/references/validation-harness.md:178 | sudo rm -rf on unvalidated env var without guard; destructive if var unset |
| 🟠 high | 1/6 | vss-build-vision-agent | skills/vss-build-vision-agent/references/validation-harness.md:38 | nvstaging org in NvStreamer image ref (pre-existing) |

⚠️ Verify before acting (critical or single-lens)

  • 🔴 vss-build-vision-agent skills/vss-build-vision-agent/references/validation-harness.md:176 — Wrong vst_data path: VIOS streamprocessing, not NvStreamer (agreement 1)
  • 🟠 vss-build-vision-agent skills/vss-build-vision-agent/SKILL.md:394 — scripts/validate-references.py listed but scripts/ dir empty (agreement 1)
  • 🟠 vss-build-vision-agent skills/vss-build-vision-agent/references/validation-harness.md:143 — PROXY loop silently passes with empty string on timeout (agreement 1)
  • 🟠 vss-build-vision-agent skills/vss-build-vision-agent/references/validation-harness.md:178 — sudo rm -rf on unvalidated env var without guard — destructive if var unset (agreement 1)
  • 🟠 vss-build-vision-agent skills/vss-build-vision-agent/references/validation-harness.md:38 — nvstaging org in NvStreamer image ref (pre-existing) (agreement 1)
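
For the unguarded `rm -rf` finding, a defensive wrapper along these lines would address the unset-variable case. This is a sketch only: `safe_rm_dir` is a hypothetical helper, the harness's actual variable name is not shown here, and the `sudo` used in the original command is left out so the example can run unprivileged.

```shell
# Sketch only: safe_rm_dir is a hypothetical helper, not part of the harness.

# Remove a directory tree only if the path is non-empty, absolute,
# not the filesystem root, and actually an existing directory.
safe_rm_dir() {
  local target="${1:-}"
  case "$target" in
    ""|/)
      echo "refusing to remove '$target'" >&2
      return 1
      ;;
    /*)
      ;;  # absolute path: continue to the directory check
    *)
      echo "refusing relative path '$target'" >&2
      return 1
      ;;
  esac
  if [ ! -d "$target" ]; then
    echo "not a directory: $target" >&2
    return 1
  fi
  rm -rf -- "$target"
}
```

A shorter alternative for scripts is the POSIX `${VAR:?message}` expansion, which aborts the command when the variable is unset or empty before `rm` ever runs.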

Skills Review (advisory) · run

…, .env guard

standalone-compose-patches.md:
- Patch 0: use sudo mkdir/chmod for root-owned bind-mount dirs
- Patch 0: busybox named-volume pre-chmod (mdx-kafka, mdx-elastic-data/logs)
  before first `up`; PROJ=mdx not basename of BUILD_DIR
- Patch 0: busybox rm -f stale redis *.log before up (dnsmasq:bootstrap
  0600 ownership blocks Redis even after chmod 777 on the directory)
- Patch 5 (new): set network_mode: host on wait-for-redis in patched SDRC
  compose to fix 120s bridge->host timeout on iptables FORWARD DROP hosts

validation-harness.md:
- Section 2: nvstaging -> nvidia image registry for nvstreamer-validation
- Section 4: runtime model ID resolution via GET /v1/models before
  generate_captions (hardcoded cosmos-reason2-8b returns BadParameters);
  renumber duplicate step 5 -> steps 5/6/7
- Gotchas: VOD clip_storage split-brain residue (file on disk, absent from
  Postgres) -- busybox rm -f through bind-mount to unblock re-upload

patch-rt-vlm.md:
- Add FILE_URL_ALLOWED_DIRS env var injection (RTVI_VLM_FILE_URL_ALLOWED_DIRS
  in .env + FILE_URL_ALLOWED_DIRS in rtvi-vlm environment block)
- Add VOD path design section: VIOS vodUrl -> RT-VLM shared clip_storage
  mount flow (upload -> resolve -> file:// register -> caption)
- Document runtime model ID resolution requirement and error it prevents

env-file-enumeration.md:
- Add "Required additions for IN-1" table: RTVI_VLM_FILE_URL_ALLOWED_DIRS
  and SAMPLE_VIDEO_DATASET (both absent from all upstream .env files)

SKILL.md:
- Step 6: .env overwrite guard -- write to .env.new when existing .env
  contains production values (NGC_CLI_API_KEY/HOST_IP sentinel check)

All surfaced live 2026-06-18, IN-1 expanded eval.

Signed-off-by: Adam Ryason <aryason@nvidia.com>
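
The .env overwrite guard from the SKILL.md item above could be approximated as follows. The exact sentinel logic in SKILL.md is not reproduced in this commit message, so this sketch assumes that a non-empty `NGC_CLI_API_KEY` or `HOST_IP` assignment marks the file as holding real values; `env_write_target` is a hypothetical name.

```shell
# Sketch only: the sentinel rule and function name are assumptions.

# Print the path a generated env file should be written to:
# "<file>.new" if the existing file already holds non-empty sentinel values,
# otherwise the file itself.
env_write_target() {
  local env_file="${1:-.env}"
  if [ -f "$env_file" ] && grep -Eq '^(NGC_CLI_API_KEY|HOST_IP)=.+' "$env_file"; then
    printf '%s.new\n' "$env_file"
  else
    printf '%s\n' "$env_file"
  fi
}
```

Writing to a sibling file leaves the decision to merge or replace with the operator, which seems to be the intent of the guard.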
@ary1111 ary1111 requested a review from a team as a code owner June 18, 2026 15:23
Comment thread skills/vss-build-vision-agent/references/validation-harness.md
…improvements

New pair files for two IN-1 extension profiles validated E2E on 2026-06-18:

Video Summarization (VS) — IN-1-1 (flag bp_developer_in_1_lvs, 23 services):
- skills/vss-summarize-video/references/integrate-lvs.md: VS integration
  contract; corrected ES output to default_<file_id> (not lvs-events);
  local NIM documented as upstream default (dev-profile-lvs local_shared)
- skills/vss-summarize-video/references/deploy-lvs-service.md: VS deployment
  contract (CPU-only, port 38111)
- skills/vss-build-vision-agent/references/patch-lvs.md: component_services
  block; GPU-slot check (host_gpu_count - profile_gpu_count >= 1 -> local NIM
  default); cloud inference (integrate.api.nvidia.com) documented as fallback
  only with cloud-entitlement-vs-NGC-key warning; Patch 2 keep/strip rule for
  nvidia-nemotron-nano-9b-v2 when defined vs undefined

VLM Real-time Alerting — IN-1-2 (flag bp_developer_an_1, 22 services):
- skills/vss-manage-alerts/references/integrate-alerts.md: alert-bridge
  contract; POST /api/v1/realtime corrected to live_stream_url/alert_type/
  prompt (not name/model/schedule); always_on:false default; ES index
  mdx-vlm-incidents-YYYY-MM-DD (date-suffixed); CPU-only, no 2nd GPU
- skills/vss-manage-alerts/references/deploy-alerts-service.md: alert
  deployment contract
- skills/vss-build-vision-agent/references/patch-alerts.md: component_services
  block; Patch 3 materializes realtime-config.yml; Patch 2 strips 9 undefined
  depends_on peers

SKILL.md Step 4 improvements:
- GPU-slot check for VS profiles: propose local NIM when spare GPU available
- Terminology: Long Video Summarization (LVS) -> Video Summarization (VS) in
  prose; protected names unchanged (lvs-server, LVS_* env vars, mdx-lvs, etc.)

All surfaced live 2026-06-18, IN-1-1 + IN-1-2 E2E validation.
deploy/docker byte-identical.

Signed-off-by: Adam Ryason <aryason@nvidia.com>
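
The GPU-slot rule from the patch-lvs.md item above (host_gpu_count - profile_gpu_count >= 1 selects local NIM) reduces to a small comparison. The sketch below is illustrative: `vs_llm_source` and the output labels `local-nim` and `cloud` are hypothetical, not names used by the skill.

```shell
# Sketch only: function name and output labels are assumptions.

# Args: GPUs present on the host, GPUs the selected profile already consumes.
# Prints "local-nim" when at least one GPU is spare, else "cloud" (fallback).
vs_llm_source() {
  local host_gpus="$1" profile_gpus="$2"
  if [ $((host_gpus - profile_gpus)) -ge 1 ]; then
    echo "local-nim"
  else
    echo "cloud"
  fi
}
```

For example, `vs_llm_source 2 1` selects the local NIM, while `vs_llm_source 1 1` falls back to cloud inference. On a real host the first argument might come from `nvidia-smi -L | wc -l`.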