Feature Description
Add first-class support for models that produce multiple output tensors (e.g., YOLO/SSD/EfficientDet with separate heads or multi-scale outputs).
Problem Statement
Zant currently assumes a single output tensor at inference time, which limits us to models whose post-processing expects one tensor. Many object-detection architectures emit several tensors (boxes, scores, classes, or multi-scale feature maps). Without multi-output support, these models can't be deployed cleanly: they require ad-hoc graph surgery or lossy concatenation that breaks memory/layout assumptions.
Suggested Solution
- Runtime API: extend inference to return N outputs with metadata.
  - Zig: return a `[]Tensor` slice or a `struct { tensors: []Tensor, names: [][]const u8 }`.
  - C FFI: expose an array of buffers + shapes + dtypes + names; keep the single-output API for backward compatibility.
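A minimal sketch of what the multi-output C FFI could look like. All names here (`ZantOutput`, `zant_predict_multi`) are hypothetical, not the current Zant API, and the stub body merely stands in for generated inference code:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-output descriptor: buffer + shape + dtype + name. */
typedef struct {
    const char   *name;   /* ONNX graph output name */
    const float  *data;   /* buffer owned by the runtime */
    const size_t *shape;  /* dims, length = rank */
    size_t        rank;
    int32_t       dtype;  /* element-type tag (0 = f32 here) */
} ZantOutput;

/* Fills `outs` (capacity `max_outs`); returns the number of outputs
 * or a negative error code. The existing single-output entry point
 * would remain unchanged alongside this for backward compatibility. */
int zant_predict_multi(const float *input, size_t input_len,
                       ZantOutput *outs, size_t max_outs) {
    static const float  boxes[8];             /* placeholder buffers */
    static const float  scores[2];
    static const size_t boxes_shape[]  = {2, 4};
    static const size_t scores_shape[] = {2};
    (void)input; (void)input_len;
    if (max_outs < 2) return -1;
    outs[0] = (ZantOutput){ "boxes",  boxes,  boxes_shape,  2, 0 };
    outs[1] = (ZantOutput){ "scores", scores, scores_shape, 1, 0 };
    return 2;
}
```

A caller that only ever passes `max_outs == 1` keeps working for single-output models, which is one way to preserve the old contract.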
- Codegen: parse ONNX graph outputs in order, preserve names and shapes, and emit a static descriptor for each output so we can pre-plan memory.
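One possible shape for the codegen-emitted descriptors: a static table per model, so every output's size is known at compile time. The struct name, output names, and shapes below model a hypothetical two-output toy graph, not actual Zant output:

```c
#include <stddef.h>

/* Static descriptor for one graph output, emitted by codegen. */
typedef struct {
    const char   *name;      /* preserved ONNX output name */
    const size_t *shape;     /* static dims */
    size_t        rank;
    size_t        byte_size; /* product of dims * sizeof(element) */
} OutputDesc;

static const size_t logits_shape[] = {1, 1000};
static const size_t aux_shape[]    = {1, 64};

/* Emitted in graph-output order, exactly as the ONNX file declares them. */
static const OutputDesc OUTPUTS[] = {
    { "logits", logits_shape, 2, 1000 * sizeof(float) },
    { "aux",    aux_shape,    2,   64 * sizeof(float) },
};
enum { OUTPUT_COUNT = sizeof(OUTPUTS) / sizeof(OUTPUTS[0]) };
```

Because every `byte_size` is a compile-time constant, the memory planner can size buffers (or the shared arena below) with no runtime allocation.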
- Memory model: allow either (a) separate buffers per output, or (b) one contiguous arena with per-output offsets for embedded targets; select via a codegen flag.
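A sketch of option (b): one contiguous arena with a fixed byte offset per output. The sizes and offsets are illustrative values a codegen pass could compute; real offsets must also respect element alignment:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative sizes for a hypothetical two-output model. */
enum {
    LOGITS_BYTES = 1000 * sizeof(float),
    AUX_BYTES    =   64 * sizeof(float),
    ARENA_BYTES  = LOGITS_BYTES + AUX_BYTES,
};

/* One statically sized arena; each output is a view at a fixed offset. */
static uint8_t arena[ARENA_BYTES];
static const size_t OUT_OFFSET[] = { 0, LOGITS_BYTES };

static float *output_ptr(size_t idx) {
    return (float *)(void *)(arena + OUT_OFFSET[idx]);
}
```

On embedded targets this keeps the whole output region in one `.bss` block, while option (a) would simply emit one static buffer per descriptor instead.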
- Post-processing hooks: let users map outputs by name or index to their post-processing (e.g., NMS) without manual tensor reshaping.
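A hypothetical lookup helper showing the name-based hook: post-processing (e.g., NMS reading "boxes"/"scores") binds to outputs by their ONNX names and never depends on positional order. The `NamedOutput` struct and `find_output` name are illustrative:

```c
#include <stddef.h>
#include <string.h>

/* Minimal named-output view; a real hook would carry shape/dtype too. */
typedef struct {
    const char  *name;
    const float *data;
    size_t       len;  /* element count */
} NamedOutput;

/* Linear scan over the (small) output list; returns NULL on a miss so
 * the caller may fall back to index-based access. */
static const float *find_output(const NamedOutput *outs, size_t n,
                                const char *name, size_t *len_out) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(outs[i].name, name) == 0) {
            if (len_out) *len_out = outs[i].len;
            return outs[i].data;
        }
    }
    return NULL;
}
```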
- Validation: add tests with a toy model that emits two outputs (e.g., logits + aux) and a small YOLO-like graph with multi-scale heads.
Additional Context
Common cases: YOLOv5/8/11, SSD, and BlazeFace/MediaPipe models all produce multiple heads. Multi-output support will unblock a broad class of detection models and reduce custom forks/workarounds on embedded targets. Backward compatibility: the current single-tensor path is kept, so a model with one output sees no change.