fix(validation-harness): dynamic RTSP live-proxy port + stale sensor residue gotchas #1079
ary1111 wants to merge 3 commits into
…residue gotchas

- Section 4 step 4: replace hardcoded rtsp://HOST:30554/live/SID with a retry loop that reads the actual port from vss-vios-streamprocessing logs (RtspLoadBalancer pool is dynamic; landed on 30556 in last run). VIOS GET /sensor/<id>/streams returns empty .url for Rtsp sensors so the log line is the only discovery source (Finding B, 2026-06-17).
- Gotchas: add bullet documenting the dynamic live-proxy port and the correct grep command to extract it (Finding B).
- Gotchas: add live registration residue on re-run bullet (Finding C, 2026-06-17) -- vst_data is a host bind-mount that survives down/-v, so a prior NvStreamer sensor stays in Postgres and causes HTTP 400 on re-run. Documents DELETE preferred for iterative runs vs full rm -rf vst_data for clean-slate runs.

Signed-off-by: Adam Ryason <aryason@nvidia.com>
/ok to test a57dfab
vss-playbook-compliance
Step 1 - playbook compliance
PASS: no errors - errors=0; warnings=1; skills_checked=1
vss-playbook-compliance - commit
Greptile Summary
This PR fixes the hardcoded RTSP proxy port in the NvStreamer smoke-test sequence -- VIOS's
Confidence Score: 5/5
Safe to merge: all changes are documentation and runbook content; no production code paths are modified.
The core RTSP port fix is correct and well-evidenced. The new patch/deploy/integrate reference files are thorough and internally consistent. The only new finding is a workflow gap in the Alert Bridge .env template where the runtime model ID is hardcoded, but the hf-1208 tag appears stable for the 3.2.0 release and the fix path is documented.
skills/vss-build-vision-agent/references/patch-alerts.md: the generated .env template hardcodes the RT-VLM model tag; the deploy workflow should include a step to resolve it dynamically after RT-VLM starts.
Important Files Changed
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant Script as Smoke Script
participant NV as NvStreamer :31000
participant VIOS as VIOS :30888
participant SPLogs as vss-vios-streamprocessing logs
participant RTVLM as RT-VLM :8018
Script->>NV: GET /sensor/list (retry until NVSID found)
NV-->>Script: sensorId
Script->>NV: GET /sensor/NVSID/streams
NV-->>Script: URL rtsp HOST 315xx nvstream
Script->>VIOS: POST /sensor/add sensorUrl URL
VIOS-->>Script: SID
loop up to 6x every 5s
Script->>SPLogs: docker logs grep Live proxy url grep SID
SPLogs-->>Script: rtsp HOST DYNAMIC_PORT live SID
Note over Script: PROXY found break
end
Note over Script: Port is dynamic e.g. 30556 not 30554
Script->>RTVLM: GET /v1/models
RTVLM-->>Script: MODEL_ID nim_nvidia_cosmos-reason2-8b_hf-1208
Script->>RTVLM: POST /v1/streams/add liveStreamUrl PROXY
RTVLM-->>Script: STREAM_ID
Script->>RTVLM: POST /v1/generate_captions id STREAM_ID model MODEL_ID
RTVLM-->>Script: SSE caption stream
Reviews (3): Last reviewed commit: "feat(skill): VS + Alerting additive prof..."
for i in 1 2 3 4 5 6; do
  PROXY=$(docker logs vss-vios-streamprocessing 2>&1 | grep "Live proxy url" | grep "$SID" | tail -1 \
    | sed -E 's#.*(rtsp://[^[:space:]]+).*#\1#')
Empty $SID turns grep into a wildcard
If POST /sensor/add fails, SID can end up empty or as the literal string null (from jq -r). With an empty SID, grep "$SID" matches every "Live proxy url" line in the container logs, so tail -1 returns the proxy URL of whichever sensor was registered last, potentially one from a previous run, and the extracted PROXY points to the wrong sensor. With SID set to null, the grep matches only lines that happen to contain "null", so the loop most likely exhausts its retries without a clear error. Checking that SID is non-empty and not equal to null before entering the loop would prevent silently feeding the wrong stream to RT-VLM.
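One way to express the suggested guard is sketched below. The helper names `sid_is_valid` and `extract_proxy` are hypothetical and not part of the PR, and the log line format is assumed from the grep and sed pattern in the diff above. The sketch also switches the SID match to `grep -F` so the ID is treated as a fixed string instead of a regular expression.

```shell
# Hypothetical helper: succeeds only when the sensor ID looks usable,
# i.e. it is non-empty and is not the literal "null" that jq -r prints
# for a missing field.
sid_is_valid() {
  [ -n "$1" ] && [ "$1" != "null" ]
}

# Hypothetical helper: read log text on stdin and print the most recent
# rtsp:// proxy URL logged for the given sensor ID.
# Assumes each relevant line contains "Live proxy url" and the SID.
extract_proxy() {
  grep "Live proxy url" | grep -F -- "$1" | tail -n 1 \
    | sed -E 's#.*(rtsp://[^[:space:]]+).*#\1#'
}
```

In the smoke script this could be used as `sid_is_valid "$SID" || { echo "sensor add failed" >&2; exit 1; }` before the loop, and `PROXY=$(docker logs vss-vios-streamprocessing 2>&1 | extract_proxy "$SID")` inside it.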
🔬 Skills Review -- 6-paradigm consolidation
19 findings across 1 skill(s) · 🔴 critical=1 · 🟠 high=5 · 🟡 medium=5 · 🔵 low=8
🔴 Critical / high (act now)
…, .env guard

standalone-compose-patches.md:
- Patch 0: use sudo mkdir/chmod for root-owned bind-mount dirs
- Patch 0: busybox named-volume pre-chmod (mdx-kafka, mdx-elastic-data/logs) before first `up`; PROJ=mdx not basename of BUILD_DIR
- Patch 0: busybox rm -f stale redis *.log before up (dnsmasq:bootstrap 0600 ownership blocks Redis even after chmod 777 on the directory)
- Patch 5 (new): set network_mode: host on wait-for-redis in patched SDRC compose to fix 120s bridge->host timeout on iptables FORWARD DROP hosts

validation-harness.md:
- Section 2: nvstaging -> nvidia image registry for nvstreamer-validation
- Section 4: runtime model ID resolution via GET /v1/models before generate_captions (hardcoded cosmos-reason2-8b returns BadParameters); renumber duplicate step 5 -> steps 5/6/7
- Gotchas: VOD clip_storage split-brain residue (file on disk, absent from Postgres) -- busybox rm -f through bind-mount to unblock re-upload

patch-rt-vlm.md:
- Add FILE_URL_ALLOWED_DIRS env var injection (RTVI_VLM_FILE_URL_ALLOWED_DIRS in .env + FILE_URL_ALLOWED_DIRS in rtvi-vlm environment block)
- Add VOD path design section: VIOS vodUrl -> RT-VLM shared clip_storage mount flow (upload -> resolve -> file:// register -> caption)
- Document runtime model ID resolution requirement and error it prevents

env-file-enumeration.md:
- Add "Required additions for IN-1" table: RTVI_VLM_FILE_URL_ALLOWED_DIRS and SAMPLE_VIDEO_DATASET (both absent from all upstream .env files)

SKILL.md:
- Step 6: .env overwrite guard -- write to .env.new when existing .env contains production values (NGC_CLI_API_KEY/HOST_IP sentinel check)

All surfaced live 2026-06-18, IN-1 expanded eval.

Signed-off-by: Adam Ryason <aryason@nvidia.com>
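The .env overwrite guard from SKILL.md Step 6 could look roughly like the sketch below. The function name `env_write_target` is hypothetical, and the sentinel rule is an assumption: an existing file is treated as holding production values when NGC_CLI_API_KEY or HOST_IP is assigned a non-empty value. The actual check in SKILL.md may differ.

```shell
# Hypothetical helper: print the path the generated env file should be
# written to. If FILE already exists and assigns a non-empty value to
# NGC_CLI_API_KEY or HOST_IP (assumed sentinel rule), print FILE.new so
# the existing file is left untouched; otherwise print FILE itself.
env_write_target() {
  if [ -f "$1" ] && grep -Eq '^(NGC_CLI_API_KEY|HOST_IP)=.+' "$1"; then
    printf '%s\n' "$1.new"
  else
    printf '%s\n' "$1"
  fi
}
```

A caller would write the generated template to `"$(env_write_target .env)"` and leave merging .env.new into .env to the operator.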
…improvements

New pair files for two IN-1 extension profiles validated E2E on 2026-06-18:

Video Summarization (VS) -- IN-1-1 (flag bp_developer_in_1_lvs, 23 services):
- skills/vss-summarize-video/references/integrate-lvs.md: VS integration contract; corrected ES output to default_<file_id> (not lvs-events); local NIM documented as upstream default (dev-profile-lvs local_shared)
- skills/vss-summarize-video/references/deploy-lvs-service.md: VS deployment contract (CPU-only, port 38111)
- skills/vss-build-vision-agent/references/patch-lvs.md: component_services block; GPU-slot check (host_gpu_count - profile_gpu_count >= 1 -> local NIM default); cloud inference (integrate.api.nvidia.com) documented as fallback only with cloud-entitlement-vs-NGC-key warning; Patch 2 keep/strip rule for nvidia-nemotron-nano-9b-v2 when defined vs undefined

VLM Real-time Alerting -- IN-1-2 (flag bp_developer_an_1, 22 services):
- skills/vss-manage-alerts/references/integrate-alerts.md: alert-bridge contract; POST /api/v1/realtime corrected to live_stream_url/alert_type/prompt (not name/model/schedule); always_on:false default; ES index mdx-vlm-incidents-YYYY-MM-DD (date-suffixed); CPU-only, no 2nd GPU
- skills/vss-manage-alerts/references/deploy-alerts-service.md: alert deployment contract
- skills/vss-build-vision-agent/references/patch-alerts.md: component_services block; Patch 3 materializes realtime-config.yml; Patch 2 strips 9 undefined depends_on peers

SKILL.md Step 4 improvements:
- GPU-slot check for VS profiles: propose local NIM when spare GPU available
- Terminology: Long Video Summarization (LVS) -> Video Summarization (VS) in prose; protected names unchanged (lvs-server, LVS_* env vars, mdx-lvs, etc.)

All surfaced live 2026-06-18, IN-1-1 + IN-1-2 E2E validation. deploy/docker byte-identical.

Signed-off-by: Adam Ryason <aryason@nvidia.com>
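The GPU-slot rule quoted in this commit (host_gpu_count - profile_gpu_count >= 1 -> local NIM default) reduces to a single comparison. The sketch below is illustrative only: the function name `vs_nim_mode` and the "local"/"cloud" labels are hypothetical, and how the two counts are obtained is left to the caller.

```shell
# Hypothetical helper: decide where VS inference should run.
#   $1 = number of GPUs on the host
#   $2 = number of GPUs the selected profile already claims
# Prints "local" when at least one GPU is spare, else "cloud"
# (the commit documents cloud inference as a fallback only).
vs_nim_mode() {
  if [ $(( $1 - $2 )) -ge 1 ]; then
    echo local
  else
    echo cloud
  fi
}
```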
Description
Section 4 step 4: replace hardcoded rtsp://HOST:30554/live/SID with a retry loop that reads the actual port from vss-vios-streamprocessing logs (RtspLoadBalancer pool is dynamic; landed on 30556 in last run). VIOS GET /sensor/<id>/streams returns empty .url for Rtsp sensors so the log line is the only discovery source (Finding B, 2026-06-17).
Gotchas: add bullet documenting the dynamic live-proxy port and the correct grep command to extract it (Finding B).
Gotchas: add live registration residue on re-run bullet (Finding C, 2026-06-17) -- vst_data is a host bind-mount that survives down/-v, so a prior NvStreamer sensor stays in Postgres and causes HTTP 400 on re-run. Documents DELETE preferred for iterative runs vs full rm -rf vst_data for clean-slate runs.
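The bounded retry described for Section 4 step 4 can be factored into a small generic helper, sketched here under assumptions. `retry_until_output` is a hypothetical name; the attempt count and delay are parameters, so the 6 attempts at 5-second intervals shown in the sequence diagram would be passed in by the caller.

```shell
# Hypothetical helper: retry_until_output MAX DELAY CMD [ARGS...]
# Runs CMD up to MAX times, sleeping DELAY seconds between attempts.
# Prints the first non-empty stdout it sees and returns 0; returns 1
# if every attempt produced empty output.
retry_until_output() {
  _max=$1
  _delay=$2
  shift 2
  _i=1
  while [ "$_i" -le "$_max" ]; do
    # Ignore the command's exit status; only its output matters here.
    _out=$("$@" 2>/dev/null) || true
    if [ -n "$_out" ]; then
      printf '%s\n' "$_out"
      return 0
    fi
    if [ "$_i" -lt "$_max" ]; then
      sleep "$_delay"
    fi
    _i=$((_i + 1))
  done
  return 1
}
```

With a function `find_proxy` that wraps the docker logs pipeline (also hypothetical), the call would be `PROXY=$(retry_until_output 6 5 find_proxy) || { echo "no live proxy URL found" >&2; exit 1; }`, which makes the timeout case an explicit failure.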
Checklist
(`uv run pre-commit install` once, then hooks run on every `git commit`).
(`git commit -s` adds a `Signed-off-by` trailer that certifies you have the right to submit the change under Apache-2.0).