Commit e432ade

merge: audit-fix-f-ci into chore/cleanup-session-plus-8 (Phase F round 1+2+3+4 CONVERGED)
2 parents 9587be0 + c88f890 commit e432ade

7 files changed

Lines changed: 344 additions & 14 deletions

.github/workflows/release.yml

Lines changed: 167 additions & 9 deletions
@@ -70,12 +70,170 @@ jobs:
       - name: Verify installer/*.ps1 files start with UTF-8 BOM if non-ASCII
         run: python3 installer/check_ps1_bom.py

+  # ---------------------------------------------------------------------------
+  # Preflight: version consistency guard (Phase F audit-fix round 1, B-03).
+  #
+  # Asserts at tag-push time that the three version sources agree:
+  #   1. git tag (refs/tags/vX.Y.Z, stripped 'v')
+  #   2. sparrow-engine-cli/Cargo.toml ([package].version — the source `spe --version` reads via CARGO_PKG_VERSION at sparrow-engine-cli/src/main.rs:43)
+  #   3. sparrow-engine-python/pyproject.toml ([project].version — the source the PyPI wheel METADATA carries)
+  #
+  # All three MUST equal each other before any wheel / CLI tarball build starts.
+  # A mismatch means the tag was cut without bumping one of the manifests, which
+  # would either (a) ship a wheel whose METADATA disagrees with PyPI's stored
+  # version (publish-pypi-cpu's existing tag-vs-wheel check would catch THAT but
+  # too late — the GPU build also runs unnecessarily) or (b) ship a CLI tarball
+  # whose `spe --version` output disagrees with the wheel users see in `pip show`.
+  #
+  # Surfaced by Phase 4.5 lane 1 finding L1-F5 (MT-4.5-97/-98/-102): `spe --version`
+  # reported `0.1.0` while PyPI shipped 0.1.12 and brew shipped 0.1.10. Once Phase D
+  # bumps sparrow-engine-cli/Cargo.toml in lockstep with pyproject.toml, this guard
+  # prevents future drift.
+  #
+  # Trigger handling: tag-push enforces the three-way check; workflow_dispatch
+  # enforces the cli <-> py pair only (there is no tag yet); every other trigger
+  # exits 0 early inside the step. The job has no `if:`, so it always completes
+  # and downstream `needs:` lists can include it without slowing non-release runs.
+  # ---------------------------------------------------------------------------
+  check-version-consistency:
+    name: Preflight — version consistency (tag ↔ Cargo.toml ↔ pyproject.toml)
+    # Runs on every trigger. The internal step short-circuits with exit 0 on
+    # non-release triggers, so downstream `needs:` are unambiguously satisfied
+    # across workflow_dispatch / branch-push / tag-push. (Gating this job with
+    # `if:` instead would be fragile: by default GitHub also skips every job
+    # that `needs:` a skipped job unless the dependent overrides that in its own `if:`.)
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Compare git tag, sparrow-engine-cli Cargo.toml, sparrow-engine-python pyproject.toml
+        shell: bash
+        run: |
+          set -euo pipefail
+          # Trigger taxonomy (Phase F R2 F-R2-4):
+          #   tag-push         : enforce tag ↔ cli ↔ py three-way agreement (release-critical).
+          #   workflow_dispatch: enforce cli ↔ py two-way agreement (manual release rehearsal;
+          #                      no tag yet, but Cargo/Python must already agree so a follow-up
+          #                      tag-push doesn't blow up).
+          #   branch-push / PR : skip (most common dev case; pre-tag drift is intentional and
+          #                      gets caught at workflow_dispatch / tag-push time).
+          mode=""
+          if [ "${GITHUB_EVENT_NAME}" = "push" ] && [[ "${GITHUB_REF}" == refs/tags/v* ]]; then
+            mode="tag-push"
+          elif [ "${GITHUB_EVENT_NAME}" = "workflow_dispatch" ]; then
+            mode="workflow-dispatch"
+          else
+            echo "Non-release trigger (event=${GITHUB_EVENT_NAME}, ref=${GITHUB_REF}). Skipping check."
+            exit 0
+          fi
+          echo "Enforcement mode: $mode"
+          # Strip optional leading 'v' from the tag name. Empty string on
+          # workflow_dispatch (no tag context); tag-version comparisons below
+          # are gated on `mode == tag-push` so the empty value is never read
+          # for enforcement in that path.
+          tag_version=""
+          if [ "$mode" = "tag-push" ]; then
+            tag_version="${GITHUB_REF_NAME#v}"
+          fi
+
+          # sparrow-engine-cli Cargo.toml [package].version, awk-extracted (no python heredoc,
+          # no cargo / jq install). Looks for the `version = "..."` line under the `[package]`
+          # section header and stops at the next `[…]` section.
+          cli_version="$(awk '
+            /^\[package\][[:space:]]*$/ { in_pkg = 1; next }
+            in_pkg && /^\[/ { in_pkg = 0 }
+            in_pkg && /^version[[:space:]]*=/{ match($0, /"[^"]+"/); print substr($0, RSTART+1, RLENGTH-2); exit }
+          ' sparrow-engine/sparrow-engine-cli/Cargo.toml)"
+          if [ -z "$cli_version" ]; then
+            echo "::error::could not extract [package].version from sparrow-engine/sparrow-engine-cli/Cargo.toml"
+            exit 2
+          fi
+
+          # sparrow-engine-python pyproject.toml [project].version: same awk pattern against [project].
+          py_version="$(awk '
+            /^\[project\][[:space:]]*$/ { in_proj = 1; next }
+            in_proj && /^\[/ { in_proj = 0 }
+            in_proj && /^version[[:space:]]*=/{ match($0, /"[^"]+"/); print substr($0, RSTART+1, RLENGTH-2); exit }
+          ' sparrow-engine/sparrow-engine-python/pyproject.toml)"
+          if [ -z "$py_version" ]; then
+            echo "::error::could not extract [project].version from sparrow-engine/sparrow-engine-python/pyproject.toml"
+            exit 2
+          fi
+
+          echo "Tag version (stripped 'v'): ${tag_version:-<n/a — workflow_dispatch>}"
+          echo "sparrow-engine-cli Cargo.toml: $cli_version"
+          echo "sparrow-engine-python pyproject.toml: $py_version"
+
+          fail=0
+          if [ "$mode" = "tag-push" ]; then
+            if [ "$tag_version" != "$cli_version" ]; then
+              echo "::error::tag ($tag_version) ≠ sparrow-engine-cli Cargo.toml ($cli_version)"
+              echo "  -> bump sparrow-engine/sparrow-engine-cli/Cargo.toml [package].version to '$tag_version' before re-tagging."
+              fail=1
+            fi
+            if [ "$tag_version" != "$py_version" ]; then
+              echo "::error::tag ($tag_version) ≠ sparrow-engine-python pyproject.toml ($py_version)"
+              echo "  -> bump sparrow-engine/sparrow-engine-python/pyproject.toml [project].version to '$tag_version' before re-tagging."
+              fail=1
+            fi
+          fi
+          # cli ↔ py agreement is enforced on BOTH tag-push and workflow_dispatch
+          # (F-R2-4 round-2 fix): a workflow_dispatch release rehearsal must surface
+          # version drift before tag-push time; otherwise the manual dispatch path
+          # gives a false PASS while the eventual tag still fails.
+          if [ "$cli_version" != "$py_version" ]; then
+            echo "::error::sparrow-engine-cli Cargo.toml ($cli_version) ≠ sparrow-engine-python pyproject.toml ($py_version)"
+            fail=1
+          fi
+          if [ "$fail" -ne 0 ]; then
+            echo ""
+            echo "FAIL: version consistency guard (Phase F B-03)."
+            echo "Refs: docs/review/phase4.5-cleanup-audit-fix-f/round_01/reviewer_review.md § B-03"
+            echo "      docs/review/phase4.5-cleanup-audit-fix-f/round_02/fixer_report.md § F-R2-4"
+            exit 1
+          fi
+          if [ "$mode" = "tag-push" ]; then
+            echo "PASS: all three version sources agree on '$tag_version'."
+          else
+            echo "PASS (workflow_dispatch): cli ↔ py agree on '$cli_version' (tag check skipped — no tag context)."
+          fi
+
+      - name: Compare ORT_VERSION across Dockerfile.cpu and Dockerfile.gpu
+        shell: bash
+        run: |
+          set -euo pipefail
+          # F-R2-6 (round 2): ARG ORT_VERSION is duplicated across the two
+          # Dockerfiles. A future ORT bump must touch both atomically or the
+          # CPU and GPU images drift onto different ORT runtimes, which is
+          # the exact root cause B-06/B-07 fixed in round 1. A cheap awk guard
+          # in CI is simpler than refactoring to a shared build-arg source.
+          cpu_ort="$(awk '/^ARG[[:space:]]+ORT_VERSION=/{
+            sub(/^ARG[[:space:]]+ORT_VERSION=/, ""); print; exit
+          }' sparrow-engine/docker/Dockerfile.cpu)"
+          gpu_ort="$(awk '/^ARG[[:space:]]+ORT_VERSION=/{
+            sub(/^ARG[[:space:]]+ORT_VERSION=/, ""); print; exit
+          }' sparrow-engine/docker/Dockerfile.gpu)"
+          if [ -z "$cpu_ort" ] || [ -z "$gpu_ort" ]; then
+            echo "::error::could not extract ARG ORT_VERSION from one or both Dockerfiles"
+            echo "  Dockerfile.cpu: '${cpu_ort:-<missing>}'"
+            echo "  Dockerfile.gpu: '${gpu_ort:-<missing>}'"
+            exit 2
+          fi
+          echo "Dockerfile.cpu ARG ORT_VERSION: $cpu_ort"
+          echo "Dockerfile.gpu ARG ORT_VERSION: $gpu_ort"
+          if [ "$cpu_ort" != "$gpu_ort" ]; then
+            echo "::error::ARG ORT_VERSION drift: Dockerfile.cpu=$cpu_ort, Dockerfile.gpu=$gpu_ort"
+            echo "  -> bump both Dockerfiles atomically; ORT-side ABI must match across CPU and GPU images."
+            echo "  Refs: docs/review/phase4.5-cleanup-audit-fix-f/round_02/fixer_report.md § F-R2-6"
+            exit 1
+          fi
+          echo "PASS: Dockerfile.cpu and Dockerfile.gpu agree on ORT_VERSION=$cpu_ort."
+
   # -------- CPU build matrix --------

   build-cpu-linux:
     name: Build CPU wheel (Linux manylinux_2_28 x86_64)
     runs-on: ubuntu-latest
-    needs: check-installer-ps1-bom
+    needs: [check-installer-ps1-bom, check-version-consistency]
     steps:
       - uses: actions/checkout@v4

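For a local pre-tag check, the decision logic of the `check-version-consistency` step can be mirrored outside CI. The following is a minimal Python sketch under stated assumptions: the names `read_version` and `check_versions` are hypothetical (nothing like them exists in the repo), the caller passes the event name and ref that the workflow reads from `GITHUB_EVENT_NAME` / `GITHUB_REF`, and the regex scan is meant as a stand-in for the awk programs (first quoted string on a `version =` line under the given table header, stopping at the next `[...]` header).

```python
import re

# Hypothetical local mirror of the release.yml version guard (not part of the repo).
VERSION_LINE = re.compile(r'^version\s*=.*?"([^"]+)"')


def read_version(toml_text: str, table: str) -> str:
    """Return `version` under [table], scanning line by line like the awk program."""
    header = re.compile(rf"^\[{re.escape(table)}\]\s*$")
    in_table = False
    for line in toml_text.splitlines():
        if header.match(line):
            in_table = True
            continue
        if in_table and line.startswith("["):
            in_table = False  # the next section header ends the table
        if in_table:
            match = VERSION_LINE.match(line)
            if match:
                return match.group(1)
    raise ValueError(f"could not extract [{table}].version")


def check_versions(event: str, ref: str, cargo_toml: str, pyproject_toml: str) -> list[str]:
    """Return error messages; an empty list means PASS (or a skipped non-release trigger)."""
    if event == "push" and ref.startswith("refs/tags/v"):
        mode = "tag-push"
    elif event == "workflow_dispatch":
        mode = "workflow-dispatch"
    else:
        return []  # the workflow step exits 0 here

    cli = read_version(cargo_toml, "package")
    py = read_version(pyproject_toml, "project")
    errors = []
    if mode == "tag-push":
        tag = ref.removeprefix("refs/tags/").removeprefix("v")
        if tag != cli:
            errors.append(f"tag ({tag}) != Cargo.toml ({cli})")
        if tag != py:
            errors.append(f"tag ({tag}) != pyproject.toml ({py})")
    # cli <-> py is enforced on both tag-push and workflow_dispatch (F-R2-4).
    if cli != py:
        errors.append(f"Cargo.toml ({cli}) != pyproject.toml ({py})")
    return errors
```

Fed the two manifest files' contents and the intended tag ref (for example `check_versions("push", "refs/tags/v0.1.12", cargo_text, pyproject_text)`), a non-empty result means the real tag-push would fail the preflight.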
@@ -156,7 +314,7 @@ jobs:
   build-cpu-macos-arm64:
     name: Build CPU wheel (macOS arm64)
     runs-on: macos-14
-    needs: check-installer-ps1-bom
+    needs: [check-installer-ps1-bom, check-version-consistency]
     env:
       MACOSX_DEPLOYMENT_TARGET: '11.0'
     steps:
@@ -199,7 +357,7 @@ jobs:
   build-cpu-windows:
     name: Build CPU wheel (Windows x86_64)
     runs-on: windows-latest
-    needs: check-installer-ps1-bom
+    needs: [check-installer-ps1-bom, check-version-consistency]
     steps:
       - uses: actions/checkout@v4

@@ -243,7 +401,7 @@ jobs:
   build-gpu-linux:
     name: Build GPU wheel (Linux x86_64, CUDA 12.6 + cuDNN, Rocky 8 / glibc 2.28)
     runs-on: ubuntu-latest
-    needs: check-installer-ps1-bom
+    needs: [check-installer-ps1-bom, check-version-consistency]
     container:
       # Rocky 8 base = RHEL 8 clone = glibc 2.28 (the manylinux_2_28 floor).
       # Phase F (2026-05-25): swapped from ubuntu24.04 (glibc 2.39) so the
@@ -338,7 +496,7 @@ jobs:
   build-gpu-windows:
     name: Build GPU wheel (Windows x86_64)
     runs-on: windows-latest
-    needs: check-installer-ps1-bom
+    needs: [check-installer-ps1-bom, check-version-consistency]
     # No CUDA Toolkit on the runner. cudarc's `fallback-dynamic-loading`
     # feature (vendor/cudarc/build.rs:70-78) activates the `dynamic-loading`
     # cfg from the feature flag alone, with no nvcc / driver probing.
@@ -607,7 +765,7 @@ jobs:
   build-cli-linux-cpu:
     name: Build CLI tarball (sparrow-engine-cpu, Linux x86_64)
     runs-on: ubuntu-latest
-    needs: check-installer-ps1-bom
+    needs: [check-installer-ps1-bom, check-version-consistency]
     container:
       # manylinux_2_28 = glibc 2.28 floor, matches build-cpu-linux's wheel target.
       image: quay.io/pypa/manylinux_2_28_x86_64
@@ -671,7 +829,7 @@ jobs:
   build-cli-linux-gpu:
     name: Build CLI tarball (sparrow-engine-gpu, Linux x86_64)
     runs-on: ubuntu-latest
-    needs: check-installer-ps1-bom
+    needs: [check-installer-ps1-bom, check-version-consistency]
     container:
       # Same Rocky 8 / glibc 2.28 image as build-gpu-linux.
       image: nvidia/cuda:12.6.3-cudnn-devel-rockylinux8
@@ -761,7 +919,7 @@ jobs:
   build-cli-macos-arm64:
     name: Build CLI tarball (sparrow-engine-cpu, macOS arm64)
     runs-on: macos-14
-    needs: check-installer-ps1-bom
+    needs: [check-installer-ps1-bom, check-version-consistency]
     env:
       MACOSX_DEPLOYMENT_TARGET: '11.0'
     steps:
@@ -825,7 +983,7 @@ jobs:
   build-cli-windows:
     name: Build CLI tarball (sparrow-engine-cpu, Windows x86_64)
     runs-on: windows-latest
-    needs: check-installer-ps1-bom
+    needs: [check-installer-ps1-bom, check-version-consistency]
     steps:
       - uses: actions/checkout@v4

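The `Compare ORT_VERSION across Dockerfile.cpu and Dockerfile.gpu` step in this workflow reduces to one rule: the first `ARG ORT_VERSION=` value in each Dockerfile must be present and equal. Below is a hedged Python equivalent that could be run locally while bumping ORT. The names `ort_version` and `compare_ort` are hypothetical, and the integer return values are chosen to mirror the step's exit codes (0 pass, 1 drift, 2 missing ARG).

```python
import re

# Same pattern as the awk program: the first `ARG ORT_VERSION=` line wins.
ARG_ORT = re.compile(r"^ARG\s+ORT_VERSION=(.*)$")


def ort_version(dockerfile_text: str) -> str:
    for line in dockerfile_text.splitlines():
        match = ARG_ORT.match(line)
        if match:
            return match.group(1)
    return ""  # awk prints nothing when the ARG is absent


def compare_ort(cpu_text: str, gpu_text: str) -> int:
    """Return the exit code the CI step would produce for these two Dockerfiles."""
    cpu, gpu = ort_version(cpu_text), ort_version(gpu_text)
    if not cpu or not gpu:
        return 2  # could not extract from one or both files
    if cpu != gpu:
        return 1  # drift: bump both Dockerfiles atomically
    return 0
```

As in the workflow, an `ARG ORT_VERSION=` line with an empty value counts as missing, and anything after the `=` (including a trailing comment) is compared verbatim.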
sparrow-engine/CHANGELOG.md

Lines changed: 46 additions & 0 deletions
@@ -8,6 +8,52 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+### Fixed
+
+- **Phase 4.5 audit-fix Phase F (CI + Docker + release plumbing) — Round 1**
+  2026-05-28 (HEAD `3052a70`, `6c6bbaf`); Round 2 hardening 2026-05-28.
+  - **B-03**: new `check-version-consistency` preflight job in `.github/workflows/release.yml`
+    enforces `git tag ↔ sparrow-engine-cli/Cargo.toml [package].version ↔
+    sparrow-engine-python/pyproject.toml [project].version` agreement on tag-push.
+    Round 2 (F-R2-4) extended enforcement to `workflow_dispatch` for the cli ↔ py pair
+    so manual release rehearsals catch drift before tag-push time. All 9 release build
+    jobs gain `needs: check-version-consistency`. `scripts/package_cli_tarball.sh`
+    defaults VERSION from `cargo metadata` on `sparrow-engine-cli/Cargo.toml` when the
+    caller doesn't set it, anchoring the tarball name on the same SSOT.
+  - **B-04**: PyPI wheels (`sparrow-engine`, `sparrow-engine-gpu`) are Python-API-only
+    by decision: the CLI binaries (`spe`, `spe-gpu`) and `sparrow-engine-server` do NOT
+    ship via `pip`. The rationale lives in the `[tool.maturin]` comment in
+    `sparrow-engine-python/pyproject.toml` (3 alternatives investigated, rejected on
+    wheel-size and per-platform-binary grounds). Round 2 (F-R2-2) extended
+    `[project].description` with a warning so the routing is visible on the PyPI project
+    page: install the CLIs via brew, the system installer (`sparrow-engine-install.{sh,ps1}`),
+    or a GitHub Release tarball.
+  - **B-06**: bumped `ARG ORT_VERSION` in the CPU Docker image (`docker/Dockerfile.cpu`)
+    from `1.24.2` to `1.25.1`, aligning it with the `onnxruntime>=1.25.1,<1.26` pin in
+    `pyproject.toml` and with the `api-24` ORT API that the `sparrow-engine-cpu` Rust binding
+    requires since commit `5c86dbf`. Added
+    `RUN ln -sf libonnxruntime.so.1 /usr/local/lib/libonnxruntime.so && ldconfig`
+    after the existing `RUN ldconfig` so that `dlopen("libonnxruntime.so")` (the bare
+    unversioned name `ort/load-dynamic` requests when `ORT_DYLIB_PATH` is unset) resolves
+    via `/usr/local/lib` filesystem search. Anchoring the symlink on the ldconfig-managed
+    SONAME `libonnxruntime.so.1` keeps it valid across future ORT version bumps.
+  - **B-07**: same ORT bump, `1.24.2` to `1.25.1`, in the GPU Docker image
+    (`docker/Dockerfile.gpu`). The root cause is identical to B-06: the `api-24` Rust
+    binding running against the ORT 1.24.2 runtime let the GPU server boot and list models,
+    then silently spin on the first CUDA EP / Session creation call. Added the same defensive
+    `ln -sf libonnxruntime.so.1 /usr/local/lib/libonnxruntime.so && ldconfig` for parity
+    with B-06. Round 2 verified both images end-to-end via CPU + GPU image rebuilds,
+    `/healthz` and `/v1/health` smoke tests, and a GPU `POST /v1/detect` against MDv6;
+    see `docs/review/phase4.5-cleanup-audit-fix-f/round_02/docker_smoke_results.txt`.
+  - **F-R2-6** (round 2): a new step in the `check-version-consistency` job asserts that
+    `ARG ORT_VERSION` matches across `Dockerfile.cpu` and `Dockerfile.gpu`, a cheap
+    guard against future ORT-bump drift (the root-cause family behind B-06/B-07).
+  - **build.sh / package_cli_tarball.sh observability** (round 1 `6c6bbaf`):
+    `build.sh` post-build prints the `[project.scripts]` entry-point names extracted from
+    the built wheel via `wheel unpack` (returns 0 when there are none; purely diagnostic).
+    `package_cli_tarball.sh` defaults `VERSION` from `cargo metadata` when unset
+    (CI callers pass an explicit VERSION; the default keeps local invocations consistent).
+
 ### Changed

 - **Phase 4 (Sparrow Engine-side data primitives for sibling integration) substantively complete**
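The CHANGELOG says `package_cli_tarball.sh` falls back to `cargo metadata` for `VERSION` when the caller leaves it unset. That script is not part of this diff, so the following is only a hypothetical Python rendering of such a fallback, not the script's actual code. It relies on real `cargo metadata` flags (`--format-version 1`, `--no-deps`, `--manifest-path`); with `--no-deps` in a workspace, cargo lists every workspace member, which is why the sketch filters by manifest path. The function names and the default manifest path are assumptions.

```python
import json
import os
import subprocess


def version_from_metadata(metadata_json: str, manifest_path: str) -> str:
    """Pick the version of the package whose manifest_path ends with `manifest_path`."""
    wanted = os.path.normpath(manifest_path).replace("\\", "/")
    for package in json.loads(metadata_json)["packages"]:
        # cargo reports absolute manifest paths; match on the trailing path instead.
        if package["manifest_path"].replace("\\", "/").endswith(wanted):
            return package["version"]
    raise LookupError(f"no package with manifest {manifest_path} in cargo metadata")


def default_version(manifest_path: str = "sparrow-engine/sparrow-engine-cli/Cargo.toml") -> str:
    """Honor an explicit VERSION (what CI passes); otherwise ask cargo."""
    explicit = os.environ.get("VERSION")
    if explicit:
        return explicit
    result = subprocess.run(
        ["cargo", "metadata", "--format-version", "1", "--no-deps",
         "--manifest-path", manifest_path],
        check=True, capture_output=True, text=True,
    )
    return version_from_metadata(result.stdout, manifest_path)
```

Reading the version through cargo rather than a text scan means the tarball name follows whatever Cargo itself resolves for `[package].version`, the same value `spe --version` embeds via `CARGO_PKG_VERSION`.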

sparrow-engine/docker/Dockerfile.cpu

Lines changed: 20 additions & 2 deletions
@@ -23,8 +23,15 @@ COPY sparrow-engine-cli/ sparrow-engine-cli/
 COPY sparrow-engine-python/ sparrow-engine-python/
 COPY vendor/ vendor/

-# Download ORT shared library (dynamic linking avoids glibc 2.38+ requirement of static builds)
-ARG ORT_VERSION=1.24.2
+# Download ORT shared library (dynamic linking avoids glibc 2.38+ requirement of static builds).
+#
+# Phase F audit-fix round 1 (2026-05-28, B-06): bumped 1.24.2 -> 1.25.1.
+# Phase 3.8 Phase C commit 5c86dbf added `ort/load-dynamic` + `api-24` features to
+# `sparrow-engine-cpu/Cargo.toml`. `api-24` targets ORT runtime 1.25.x; the older
+# 1.24.2 tarball is API-incompatible and either silently spins or aborts at first
+# ORT call. Aligning the Docker ORT to the same `>=1.25.1,<1.26` pin the Python wheel
+# carries (sparrow-engine-python/pyproject.toml) keeps every consumer on one ORT major.minor.
+ARG ORT_VERSION=1.25.1
 RUN mkdir -p /build/ort-lib && \
     curl -sL https://github.com/microsoft/onnxruntime/releases/download/v${ORT_VERSION}/onnxruntime-linux-x64-${ORT_VERSION}.tgz | \
     tar xz --strip-components=1 -C /build/ort-lib
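The comment above ties the Docker `ARG ORT_VERSION` to the `>=1.25.1,<1.26` pin the Python wheel carries. A small stdlib-only sketch of that compatibility check follows; `release_tuple` and `satisfies` are hypothetical helpers, and they deliberately handle only plain numeric releases and the two operators (`>=`, `<`) the pin actually uses, not the full PEP 440 grammar.

```python
def release_tuple(version: str, width: int = 3) -> tuple[int, ...]:
    """'1.26' -> (1, 26, 0); plain numeric releases only (no rc/post/dev suffixes)."""
    parts = [int(p) for p in version.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(version: str, spec: str) -> bool:
    """Evaluate a comma-separated spec made only of `>=` and `<` clauses."""
    v = release_tuple(version)
    for clause in (c.strip() for c in spec.split(",")):
        if clause.startswith(">="):
            ok = v >= release_tuple(clause[2:])
        elif clause.startswith("<") and not clause.startswith("<="):
            ok = v < release_tuple(clause[1:])
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
        if not ok:
            return False
    return True
```

With this, `satisfies("1.25.1", ">=1.25.1,<1.26")` holds while the old `1.24.2` fails the lower bound. For anything beyond these two operators, `packaging.specifiers.SpecifierSet` implements full PEP 440 semantics (`"1.25.1" in SpecifierSet(">=1.25.1,<1.26")`).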
@@ -45,6 +52,17 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
 COPY --from=builder /build/target/release/sparrow-engine-server /usr/local/bin/
 COPY --from=builder /build/ort-lib/lib/libonnxruntime.so* /usr/local/lib/
 RUN ldconfig
+# Phase F audit-fix round 1 (2026-05-28, B-06): `sparrow-engine-server` is built with
+# `ort/load-dynamic` (sparrow-engine-cpu/Cargo.toml:14). The Rust binding dlopens the
+# bare unversioned name `libonnxruntime.so` when `ORT_DYLIB_PATH` is unset. `ldconfig`
+# above only indexes the SONAME `libonnxruntime.so.1` -> `libonnxruntime.so.1.25.1`;
+# the unversioned name has no `/etc/ld.so.cache` entry and the regular file copied by
+# the wildcard COPY above is invisible to dlopen-by-name lookup. Adding an explicit
+# symlink anchored on the ldconfig-managed SONAME makes the bare-name dlopen succeed
+# without ENV LD_LIBRARY_PATH or ENV ORT_DYLIB_PATH workarounds, and survives future
+# `ARG ORT_VERSION` bumps unchanged.
+# Refs: Phase C Lane 3 finding F2 (LD_DEBUG=libs trace) + Phase F reviewer_review.md B-06.
+RUN ln -sf libonnxruntime.so.1 /usr/local/lib/libonnxruntime.so && ldconfig

 RUN groupadd -g 1000 sparrow-engine 2>/dev/null || true && useradd -u 1000 -g 1000 -s /sbin/nologin sparrow-engine 2>/dev/null || true
 USER 1000:1000
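The symlink fix relies on a two-hop chain: ldconfig maintains `libonnxruntime.so.1 -> libonnxruntime.so.1.25.1`, and the new `RUN` adds a relative `libonnxruntime.so -> libonnxruntime.so.1` on top, so an ORT bump only changes the far end of the chain. The Python sketch below reproduces that layout in a scratch directory to show why the link survives version bumps. `ensure_unversioned_symlink` is a hypothetical helper that mirrors `ln -sf <soname> <libdir>/<link_name>`; it does not model `dlopen` or the ld.so cache.

```python
import os


def ensure_unversioned_symlink(
    libdir: str,
    soname: str = "libonnxruntime.so.1",
    link_name: str = "libonnxruntime.so",
) -> str:
    """Mirror `ln -sf <soname> <libdir>/<link_name>` with a *relative* target."""
    link_path = os.path.join(libdir, link_name)
    if os.path.lexists(link_path):
        os.remove(link_path)  # -f: replace a stale file or link at that name
    os.symlink(soname, link_path)  # relative target, resolved inside libdir
    return link_path
```

Because the target is the SONAME rather than `libonnxruntime.so.1.25.1`, moving to ORT 1.25.2 (say) only requires ldconfig to repoint `libonnxruntime.so.1`; the unversioned link resolves through it unchanged.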
