Karen/needle v2 #690

Merged
jakmro merged 9 commits into main from karen/needle_v2 on Jun 10, 2026

Conversation

kar-m (Collaborator) commented Jun 4, 2026

No description provided.

Copilot AI review requested due to automatic review settings June 4, 2026 16:43

Copilot AI (Contributor) left a comment

Pull request overview

This PR adds first-class support for the Needle model family across the Python transpilation/conversion stack and the C++ runtime. It does so through a new metadata-driven encoder → cross-KV → decoder-step execution route, with optional NPU/CoreML source-encoder export and runtime loading.

Changes:

  • Add Needle family detection, naming rules, tokenizer conversion handling, and a new Needle ModelProfile with component-pipeline defaults.
  • Add component-spec generation for a new encoder_cross_kv_decoder_step runtime route (used by Needle; Whisper is refactored to use the same route metadata).
  • Extend the C++ engine/runtime to load per-component metadata from the bundle manifest, run the new route (including external cross-KV cache inputs), and add Needle-specific prompt/tooling support.
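
The control flow implied by the encoder → cross-KV → decoder-step route can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the PR's implementation: `ToyRoute`, `generate`, and the arithmetic inside each stage are hypothetical stand-ins. The only point it shows is that the encoder and the cross-KV projection run once, while each decoder step reads the resulting cache as an external input.

```python
from dataclasses import dataclass


@dataclass
class ToyRoute:
    """Hypothetical stand-in for an encoder -> cross-KV -> decoder-step route."""
    encoder_calls: int = 0
    cross_kv_calls: int = 0

    def encode(self, source_ids):
        # Toy "encoder": one hidden value per source token.
        self.encoder_calls += 1
        return [float(t) for t in source_ids]

    def cross_kv(self, hidden):
        # Toy "projection": keys and values derived once from encoder output.
        self.cross_kv_calls += 1
        return {"k": [h + 1.0 for h in hidden], "v": [h * 2.0 for h in hidden]}

    def decoder_step(self, token, cache):
        # Toy "step": reads the external cross-KV cache and never rebuilds it.
        score = sum(cache["k"]) + sum(cache["v"]) + token
        return int(score) % 7


def generate(route, source_ids, start_token, eos_token, max_new_tokens):
    """Encode once, build cross-KV once, then run decoder steps in a loop."""
    hidden = route.encode(source_ids)
    cache = route.cross_kv(hidden)
    out = [start_token]
    for _ in range(max_new_tokens):
        nxt = route.decoder_step(out[-1], cache)
        out.append(nxt)
        if nxt == eos_token:
            break
    return out
```

Keeping the cache outside the step function is what lets a single step component be reused for every generated token.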

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| python/tests/test_encoder_cross_kv_route.py | Adds a unit test validating Needle component spec routing and metadata. |
| python/cactus/transpile/npu/source.py | Adds CoreML/ANE export for a generic “source encoder” component. |
| python/cactus/transpile/npu/pipeline.py | Wires source_encoder into the NPU export pipeline. |
| python/cactus/transpile/model_profiles.py | Adds Needle model profile and default_max_new_tokens to profiles. |
| python/cactus/transpile/model_adapters.py | Adds Needle component adapters/spec builder and introduces shared encoder→crossKV→step spec helper; refactors Whisper spec generation. |
| python/cactus/transpile/lower.py | Adds lowering support for external cross-KV cache inputs and cached-attention lowering path. |
| python/cactus/transpile/hf_model.py | Writes per-component metadata into the manifest and updates component ordering. |
| python/cactus/transpile/component_plan.py | Carries profile default_max_new_tokens through planning. |
| python/cactus/convert/model_adapters/naming.py | Adds Needle-specific weight naming rewrites (gates + encoder layer prefixing). |
| python/cactus/convert/model_adapters/detection.py | Adds needle to supported family detection. |
| python/cactus/convert/model_adapters/adapters.py | Registers a needle family adapter. |
| python/cactus/convert/cactus_adapters/tokenizer.py | Adds Needle tokenizer/model handling (SentencePiece path + extra special tokens handling). |
| python/cactus/convert/cactus_adapters/config_utils.py | Adds Needle model-type detection and exports decoder start/prompt token IDs (incl. Whisper prompt IDs). |
| python/cactus/cli/model.py | Uses profile-provided default_max_new_tokens when available. |
| cactus/CMakeLists.txt | Adds a macOS curl fallback link strategy when the vendored curl archive is missing. |
| cactus-engine/src/utils.h | Adds parsing for Needle `<tool_call>` output blocks before falling back to Gemma parsing. |
| cactus-engine/src/transcribe.cpp | Switches Whisper default decoder prompt tokens from hard-coded text to config-driven IDs. |
| cactus-engine/src/tokenizer.cpp | Adds Needle model-type detection and Needle-specific chat prompt formatting. |
| cactus-engine/src/npu_ane.mm | Adds INT32 multi-input support for CoreML multi-array inputs. |
| cactus-engine/src/model.cpp | Adds encoder→crossKV→step route selection via manifest metadata; adds external cross-KV cache packing/quantization helpers; adds Needle decode path and NPU source-encoder loading; extends config parsing. |
| cactus-engine/src/model_npu.cpp | Adds NPU source-encoder loading and INT32 input plumbing for source encoding. |
| cactus-engine/src/engine.h | Extends config with decoder start/prompt token IDs; adds Needle tokenizer type; extends NPU named input typing; adds new decode-route fields. |
| cactus-engine/src/constraints.cpp | Adds Needle-specific tool-call constraining and trie-based name/arg-key token filtering. |
| cactus-engine/src/complete.cpp | Adds Needle tool schema serialization and Needle-specific prompt/tool constraint behavior tweaks. |
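
The "trie-based name/arg-key token filtering" listed for constraints.cpp is not shown in this review, so the following is only a generic sketch of the technique, not the PR's code. It assumes each allowed tool name has already been tokenized into a sequence of token IDs; `build_trie`, `allowed_next`, and the example token IDs are all hypothetical.

```python
def build_trie(sequences):
    """Build a nested-dict trie from token-id sequences.

    The key None marks a node where a complete name ends.
    """
    root = {}
    for seq in sequences:
        node = root
        for tok in seq:
            node = node.setdefault(tok, {})
        node[None] = True  # terminal marker
    return root


def allowed_next(trie, prefix):
    """Return (set of allowed next token ids, whether prefix is a full name)."""
    node = trie
    for tok in prefix:
        node = node.get(tok)
        if node is None:
            # The prefix left the trie: nothing is allowed.
            return set(), False
    allowed = {t for t in node if t is not None}
    return allowed, None in node


# Hypothetical tokenizations of three tool names.
TOOL_NAME_TOKENS = {
    "get_weather": [10, 11],
    "get_time": [10, 12],
    "search": [20],
}
NAME_TRIE = build_trie(TOOL_NAME_TOKENS.values())
```

During constrained decoding, a sampler would mask every logit whose token ID is outside the returned set, so the model can only spell out one of the registered names.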


Comment thread cactus-engine/src/constraints.cpp Outdated
Comment on lines +318 to +323
    case State::NEEDLE_START:
        if (generated_text_.find("<tool_call>") != std::string::npos) {
            state_ = State::DONE;
            generated_text_.clear();
        }
        break;
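
The transition in the excerpt above can be mirrored in a few lines of Python. This is a minimal sketch that assumes generated text arrives in pieces and is accumulated in a buffer; the `ToolCallGate` class and its `feed` method are hypothetical, and only the `NEEDLE_START` → `DONE` transition and the buffer clear come from the excerpt (which the thread marks as outdated).

```python
from enum import Enum, auto


class State(Enum):
    NEEDLE_START = auto()
    DONE = auto()


class ToolCallGate:
    """Hypothetical wrapper around the NEEDLE_START -> DONE transition."""

    def __init__(self):
        self.state = State.NEEDLE_START
        self.generated_text = ""

    def feed(self, piece):
        # Accumulate decoded text; the marker may span several pieces.
        self.generated_text += piece
        if self.state is State.NEEDLE_START and "<tool_call>" in self.generated_text:
            # Same effect as the C++ case: mark done and clear the buffer.
            self.state = State.DONE
            self.generated_text = ""
        return self.state
```

Searching the accumulated buffer rather than the latest piece matters because a tokenizer may split `<tool_call>` across several tokens.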
Comment on lines +3551 to +3553
    if (!decoder_start_token_seen) {
        decoder_start_token_id = bos_token_id;
    }
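
The fallback in this excerpt can be illustrated with a small parser. This sketch assumes a `key=value` text config, which is a guess for illustration only; `parse_config` and the default values are hypothetical. What it takes from the excerpt is the pattern of a "seen" flag that triggers a fallback to `bos_token_id`.

```python
def parse_config(lines):
    """Parse key=value lines; fall back to bos_token_id if no start token is given."""
    bos_token_id = 1            # hypothetical default
    decoder_start_token_id = 0  # placeholder until parsed or defaulted
    decoder_start_token_seen = False

    for line in lines:
        key, sep, value = line.partition("=")
        if not sep:
            continue  # skip lines without a key=value shape
        key, value = key.strip(), value.strip()
        if key == "bos_token_id":
            bos_token_id = int(value)
        elif key == "decoder_start_token_id":
            decoder_start_token_id = int(value)
            decoder_start_token_seen = True

    # Mirrors the excerpt: default only when the key never appeared.
    if not decoder_start_token_seen:
        decoder_start_token_id = bos_token_id

    return {
        "bos_token_id": bos_token_id,
        "decoder_start_token_id": decoder_start_token_id,
    }
```

Tracking a separate flag, instead of testing the ID against a sentinel value, means an explicit `decoder_start_token_id=0` is honored rather than overwritten.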
Comment on lines +5051 to +5055
        components: tuple[str, ...] | None = None,
    ) -> list[ComponentModuleSpec] | None:
        input_ids = named_tensors.get("input_ids")
        if input_ids is None:
            return None
jakmro merged commit 7a4ad3a into main on Jun 10, 2026
3 of 5 checks passed
3 participants