Commit d73cd07

arnu515 and ngxson authored
graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (#24357)
* llama-graph : apply embedding scale when deepstack is not used
* nits: remove non-existent hunyuan-vl from the tests
* apply suggestion from @gabe-l-hart

---------

Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
1 parent e25a32e commit d73cd07

2 files changed

Lines changed: 3 additions & 4 deletions

src/llama-graph.cpp

Lines changed: 3 additions & 3 deletions
@@ -1873,9 +1873,9 @@ ggml_tensor * llm_graph_context::build_inp_embd(ggml_tensor * tok_embd) const {
     res->t_inp_embd = cur;
 
     // For Granite architecture
-    // NOTE: Only apply scale to token inputs. Raw embeddings are assumed to be
-    // multimodal inputs that should not be scaled.
-    if (ubatch.token && hparams.f_embedding_scale != 0.0f) {
+    // NOTE: For deepstack models, only apply scale to token inputs (ie text-only input).
+    // Raw embeddings are assumed to be multimodal inputs that should not be scaled.
+    if (hparams.f_embedding_scale != 0.0f && (ubatch.token || hparams.n_deepstack_layers == 0)) {
         if (!ggml_is_contiguous(cur)) {
             cur = ggml_cont(ctx0, cur);
         }
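
The effect of the condition change can be seen by comparing the old and new predicates side by side. The following is a minimal standalone sketch, not code from the repository: the `scale_inputs` struct, its field `has_tokens` (standing in for `ubatch.token` being non-null), the two function names, and the scale value `12.0f` are all hypothetical and chosen only for illustration. Going by the commit title, the case that changes is a non-deepstack model receiving raw embeddings, which is presumably how the Granite speech model is fed.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the values read by the condition in the diff.
struct scale_inputs {
    bool     has_tokens;         // stands in for `ubatch.token` being non-null
    float    f_embedding_scale;  // 0.0f means no embedding scale is configured
    uint32_t n_deepstack_layers; // 0 means the model does not use deepstack
};

// Condition before this commit: scale only when the batch carries token ids.
constexpr bool scale_applied_before(const scale_inputs & in) {
    return in.has_tokens && in.f_embedding_scale != 0.0f;
}

// Condition after this commit: also scale raw embeddings when deepstack is unused.
constexpr bool scale_applied_after(const scale_inputs & in) {
    return in.f_embedding_scale != 0.0f &&
           (in.has_tokens || in.n_deepstack_layers == 0);
}

// Raw embeddings, no deepstack: the only case whose result changes.
static_assert(!scale_applied_before({false, 12.0f, 0}), "old: not scaled");
static_assert( scale_applied_after ({false, 12.0f, 0}), "new: scaled");

// Raw embeddings on a deepstack model: still left unscaled.
static_assert(!scale_applied_before({false, 12.0f, 3}), "old: not scaled");
static_assert(!scale_applied_after ({false, 12.0f, 3}), "new: not scaled");

// Token input: scaled both before and after, with or without deepstack.
static_assert(scale_applied_before({true, 12.0f, 0}), "old: scaled");
static_assert(scale_applied_after ({true, 12.0f, 3}), "new: scaled");

// No scale configured: never scaled.
static_assert(!scale_applied_after({true, 0.0f, 0}), "new: not scaled");
```

The checks are written as `static_assert` so the truth table is verified at compile time; the behavior for token input and for deepstack models is unchanged by the commit.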

tools/mtmd/tests.sh

Lines changed: 0 additions & 1 deletion
@@ -91,7 +91,6 @@ add_test_vision "ggml-org/LightOnOCR-1B-1025-GGUF:Q8_0"
 add_test_vision "ggml-org/DeepSeek-OCR-GGUF:Q8_0" -p "Free OCR." --chat-template deepseek-ocr
 add_test_vision "ggml-org/dots.ocr-GGUF:Q8_0" -p "OCR"
 add_test_vision "ggml-org/HunyuanOCR-GGUF:Q8_0" -p "OCR"
-add_test_vision "ggml-org/HunyuanVL-4B-GGUF:Q8_0"
 add_test_vision "ggml-org/gemma-4-E2B-it-GGUF:Q8_0" --jinja
 
 add_test_audio "ggml-org/ultravox-v0_5-llama-3_2-1b-GGUF:Q8_0"
