Karen/needle v2 by kar-m · Pull Request #690 · cactus-compute/cactus

kar-m · 2026-06-04T16:43:09Z

No description provided.


Copilot

Pull request overview

This PR adds first-class support for the Needle model family across the Python transpilation/conversion stack and the C++ runtime. It does so through a new metadata-driven encoder → cross-KV → decoder-step execution route, with optional NPU/CoreML source-encoder export and runtime loading.

Changes:

Add Needle family detection, naming rules, tokenizer conversion handling, and a new Needle ModelProfile with component-pipeline defaults.
Add component-spec generation for a new `encoder_cross_kv_decoder_step` runtime route, used by Needle; Whisper is refactored to use the same route metadata.
Extend the C++ engine/runtime to load per-component metadata from the bundle manifest, run the new route (including external cross-KV cache inputs), and add Needle-specific prompt/tooling support.
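
To make the encoder → cross-KV → decoder-step route described above concrete, here is a minimal Python sketch of the control flow only. It is not the code from this PR: `run_route`, the three callables, and the toy components are hypothetical names chosen for illustration, and the real runtime is implemented in C++ and driven by manifest metadata. The point it shows is that the encoder and the cross-attention key/value projection run once per input, while the decoder step runs once per generated token and reuses the cached cross-KV.

```python
from typing import Callable, Dict, List

EOS = -1  # hypothetical end-of-sequence id for the toy example

def run_route(
    encode: Callable[[List[int]], List[int]],
    project_cross_kv: Callable[[List[int]], Dict[str, List[int]]],
    decoder_step: Callable[[List[int], Dict[str, List[int]]], int],
    source_ids: List[int],
    start_id: int,
    eos_id: int,
    max_new_tokens: int,
) -> List[int]:
    hidden = encode(source_ids)          # 1) encoder runs once
    cross_kv = project_cross_kv(hidden)  # 2) cross-KV is computed once and cached
    tokens = [start_id]
    for _ in range(max_new_tokens):      # 3) one decoder step per new token
        next_id = decoder_step(tokens, cross_kv)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]                    # drop the decoder start token

# Toy components that only count how often each stage is invoked.
calls = {"encode": 0, "cross_kv": 0, "step": 0}

def toy_encode(source_ids: List[int]) -> List[int]:
    calls["encode"] += 1
    return [i + 100 for i in source_ids]

def toy_cross_kv(hidden: List[int]) -> Dict[str, List[int]]:
    calls["cross_kv"] += 1
    return {"keys": list(hidden), "values": list(hidden)}

def toy_step(tokens: List[int], cross_kv: Dict[str, List[int]]) -> int:
    calls["step"] += 1
    pos = len(tokens) - 1
    values = cross_kv["values"]
    return values[pos] if pos < len(values) else EOS

output = run_route(toy_encode, toy_cross_kv, toy_step,
                   [1, 2, 3], start_id=0, eos_id=EOS, max_new_tokens=8)
```

In this toy run the encoder and cross-KV stages are each called once, while the step function is called four times (three tokens plus the step that emits the end-of-sequence id).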

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
python/tests/test_encoder_cross_kv_route.py	Adds a unit test validating Needle component spec routing and metadata.
python/cactus/transpile/npu/source.py	Adds CoreML/ANE export for a generic “source encoder” component.
python/cactus/transpile/npu/pipeline.py	Wires `source_encoder` into the NPU export pipeline.
python/cactus/transpile/model_profiles.py	Adds Needle model profile and `default_max_new_tokens` to profiles.
python/cactus/transpile/model_adapters.py	Adds Needle component adapters/spec builder and introduces shared encoder→crossKV→step spec helper; refactors Whisper spec generation.
python/cactus/transpile/lower.py	Adds lowering support for external cross-KV cache inputs and cached-attention lowering path.
python/cactus/transpile/hf_model.py	Writes per-component metadata into the manifest and updates component ordering.
python/cactus/transpile/component_plan.py	Carries profile `default_max_new_tokens` through planning.
python/cactus/convert/model_adapters/naming.py	Adds Needle-specific weight naming rewrites (gates + encoder layer prefixing).
python/cactus/convert/model_adapters/detection.py	Adds `needle` to supported family detection.
python/cactus/convert/model_adapters/adapters.py	Registers a `needle` family adapter.
python/cactus/convert/cactus_adapters/tokenizer.py	Adds Needle tokenizer/model handling (SentencePiece path + extra special tokens handling).
python/cactus/convert/cactus_adapters/config_utils.py	Adds Needle model-type detection and exports decoder start/prompt token IDs (incl. Whisper prompt IDs).
python/cactus/cli/model.py	Uses profile-provided `default_max_new_tokens` when available.
cactus/CMakeLists.txt	Adds a macOS curl fallback link strategy when the vendored curl archive is missing.
cactus-engine/src/utils.h	Adds parsing for Needle `<tool_call>` output blocks before falling back to Gemma parsing.
cactus-engine/src/transcribe.cpp	Switches Whisper default decoder prompt tokens from hard-coded text to config-driven IDs.
cactus-engine/src/tokenizer.cpp	Adds Needle model-type detection and Needle-specific chat prompt formatting.
cactus-engine/src/npu_ane.mm	Adds INT32 multi-input support for CoreML multi-array inputs.
cactus-engine/src/model.cpp	Adds encoder→crossKV→step route selection via manifest metadata; adds external cross-KV cache packing/quantization helpers; adds Needle decode path and NPU source-encoder loading; extends config parsing.
cactus-engine/src/model_npu.cpp	Adds NPU source-encoder loading and INT32 input plumbing for source encoding.
cactus-engine/src/engine.h	Extends config with decoder start/prompt token IDs; adds Needle tokenizer type; extends NPU named input typing; adds new decode-route fields.
cactus-engine/src/constraints.cpp	Adds Needle-specific tool-call constraining and trie-based name/arg-key token filtering.
cactus-engine/src/complete.cpp	Adds Needle tool schema serialization and Needle-specific prompt/tool constraint behavior tweaks.
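
The `utils.h` entry above mentions parsing Needle `<tool_call>` output blocks before falling back to Gemma parsing. The sketch below illustrates that "try one format, then fall back" shape in Python. It rests on assumptions that the PR page does not confirm: that a block is closed by `</tool_call>`, that its body is a JSON object, and that the fallback is passed in as a callable. `parse_tool_calls` and `fallback` are hypothetical names, and the actual implementation is C++.

```python
import json
import re
from typing import Callable, List, Optional

# Assumption: blocks look like <tool_call>{...json...}</tool_call>.
_TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

def parse_tool_calls(
    text: str,
    fallback: Optional[Callable[[str], List[dict]]] = None,
) -> List[dict]:
    calls = []
    for body in _TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(body.strip()))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than failing the whole parse
    if calls:
        return calls
    # No usable <tool_call> block was found: defer to the other parser, if any.
    return fallback(text) if fallback is not None else []

needle_output = (
    'ok <tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>'
)
parsed = parse_tool_calls(needle_output)
deferred = parse_tool_calls("no tags here", fallback=lambda t: [{"name": "from_fallback"}])
```

Trying the tagged format first keeps the fallback parser untouched for models that never emit `<tool_call>` blocks.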


+            case State::NEEDLE_START:
+                if (generated_text_.find("<tool_call>") != std::string::npos) {
+                    state_ = State::DONE;
+                    generated_text_.clear();
+                }
+                break;


+    if (!decoder_start_token_seen) {
+        decoder_start_token_id = bos_token_id;
+    }


+    components: tuple[str, ...] | None = None,
+) -> list[ComponentModuleSpec] | None:
+    input_ids = named_tensors.get("input_ids")
+    if input_ids is None:
+        return None


Signed-off-by: jakmro <kubamroz124@gmail.com>

kar-m added 5 commits June 1, 2026 18:49

added needle and made a shared abstraction for encoder-cross_kv-decoder between needle and whisper

17afa52

npu works and constrained generation added

6abe05a

cleanup

43fd7f9

merge origin v2 into working branch

1caa8a5

made needle max tokens 1024 by default

1636e0b

Copilot AI review requested due to automatic review settings June 4, 2026 16:43

Copilot started reviewing on behalf of kar-m June 4, 2026 16:43

Copilot AI reviewed Jun 4, 2026


jakmro added 4 commits June 8, 2026 19:05

Merge branch 'main' into karen/needle_v2

5a88d04

drop dead kwargs and drive-by edits

573609c

Signed-off-by: jakmro <kubamroz124@gmail.com>

fixes

252bf6e

Merge branch 'main' into karen/needle_v2

4924e70

jakmro merged commit 7a4ad3a into main Jun 10, 2026
3 of 5 checks passed

jakmro merged 9 commits into main from karen/needle_v2


3 participants
