
Commit 1c92b00

localai-bot and mudler authored
fix(turboquant): guard upstream-only grpc-server fields for fork (#10043)
fix(turboquant): guard upstream-only grpc-server fields for fork build

backend/cpp/llama-cpp/grpc-server.cpp is reused by the turboquant build, which compiles against an older llama.cpp fork (TheTom/llama-cpp-turboquant). Two recent changes added references to upstream-only struct fields outside the existing LOCALAI_LEGACY_LLAMA_CPP_SPEC guards:

- common_params::checkpoint_min_step (default + option handler), added with the ggml-org/llama.cpp 35c9b1f3 bump (#9998)
- the common_params_speculative::draft tensor_buft_overrides sentinel termination (#9919), which sat after the guard's #endif

The fork has neither field, so grpc-server.cpp failed to compile for every turboquant flavor.

Wrap the three references in #ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC, matching the existing fork-compat guards, so the stock llama-cpp build is unchanged and the fork build skips them. Update patch-grpc-server.sh's doc comment to record what the macro now gates out.

Verified by a local fallback-flavor turboquant build: grpc-server.cpp compiles against the fork and the backend image builds.

Assisted-by: Claude:claude-opus-4-7 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
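A minimal sketch of the guard pattern described above, assuming one translation unit that is compiled once as-is (upstream layout) and once with -DLOCALAI_LEGACY_LLAMA_CPP_SPEC (fork layout). The names common_params_sketch and make_default_params are hypothetical stand-ins, not llama.cpp's real common_params. In the real build the field comes from llama.cpp's headers and is simply absent in the fork; here the macro also toggles the declaration so both layouts can be simulated in one file.

```cpp
#include <cassert>  // for assert() in usage checks

// Hypothetical stand-in for common_params. The macro toggles the field
// declaration to simulate the fork's headers, which simply lack it.
struct common_params_sketch {
    int n_threads = 4;  // illustrative field shared by both layouts
#ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC
    int checkpoint_min_step = 0;  // upstream-only field
#endif
};

// Applies defaults. The only reference to the upstream-only field sits behind
// the same guard, so the fork build never names a field it does not have.
inline common_params_sketch make_default_params() {
    common_params_sketch params;
#ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC
    params.checkpoint_min_step = 256;  // match upstream's default
#endif
    return params;
}
```

Defining the macro removes both the declaration and the assignment, so neither build mentions a field that its headers do not provide.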
1 parent b81a6d0 commit 1c92b00

2 files changed

Lines changed: 24 additions & 3 deletions


backend/cpp/llama-cpp/grpc-server.cpp

Lines changed: 19 additions & 1 deletion
@@ -573,8 +573,12 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
 // checkpoint_min_step: minimum spacing between context checkpoints in
 // tokens (0 disables the minimum). Match upstream's default (256). This
 // field was renamed from `checkpoint_every_nt` in llama.cpp; the semantics
-// also shifted from a fixed cadence to a minimum spacing.
+// also shifted from a fixed cadence to a minimum spacing. The turboquant
+// fork branched before the field existed, so skip it on the legacy path
+// (LOCALAI_LEGACY_LLAMA_CPP_SPEC is injected by patch-grpc-server.sh).
+#ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC
 params.checkpoint_min_step = 256;
+#endif
 
 // decode options. Options are in form optname:optvale, or if booleans only optname.
 for (int i = 0; i < request->options_size(); i++) {
@@ -748,11 +752,18 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
 params.cache_idle_slots = false;
 }
 
+#ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC
 // --- minimum context-checkpoint spacing (upstream -cms / --checkpoint-min-step) ---
 // 0 disables the minimum-spacing gate. Old option names (`checkpoint_every_nt`,
 // `checkpoint_every_n_tokens`) are kept as aliases for backward compatibility
 // with existing user configs: upstream renamed the field and shifted its
 // semantics from a fixed cadence to a minimum spacing.
+//
+// Gated out for the turboquant fork, which lacks common_params::
+// checkpoint_min_step. The leading `}` closing the cache_idle_slots
+// branch is removed with this block; the next `} else if` (n_ubatch)
+// then closes cache_idle_slots, so braces stay balanced under both
+// preprocessor branches.
 } else if (!strcmp(optname, "checkpoint_min_step") || !strcmp(optname, "checkpoint_min_spacing") ||
 !strcmp(optname, "checkpoint_every_nt") || !strcmp(optname, "checkpoint_every_n_tokens")) {
 if (optval != NULL) {
@@ -762,6 +773,7 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
 // If conversion fails, keep default value (256)
 }
 }
+#endif
 
 // --- physical batch size (upstream -ub / --ubatch-size) ---
 // Note: line ~482 already aliases n_ubatch to n_batch as a default; this
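The brace-balancing note in the checkpoint_min_step hunks is easiest to see in a cut-down parser. This is a hypothetical sketch, not the real params_parse: it handles only three options, uses a made-up struct, and assumes std::stoi with a catch as the conversion step (the diff shows only the "If conversion fails" comment). The guarded branch opens with `} else if`, so when it is compiled out, the n_ubatch branch's leading `}` closes the cache_idle_slots branch instead.

```cpp
#include <cassert>  // for assert() in usage checks
#include <cstdlib>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical, cut-down option set; not LocalAI's real params struct.
struct parsed_opts_sketch {
    bool cache_idle_slots = true;
#ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC
    int checkpoint_min_step = 256;  // upstream default
#endif
    int n_ubatch = 512;
};

// Handles one option: optname plus optval (nullptr for bare boolean flags).
inline void parse_option(parsed_opts_sketch& p, const char* optname, const char* optval) {
    if (!std::strcmp(optname, "cache_idle_slots")) {
        if (optval != nullptr && !std::strcmp(optval, "false")) {
            p.cache_idle_slots = false;
        }
#ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC
    // Upstream-only branch. Its leading `}` closes cache_idle_slots; when it
    // is compiled out, the `}` opening the n_ubatch branch does that instead.
    } else if (!std::strcmp(optname, "checkpoint_min_step") ||
               !std::strcmp(optname, "checkpoint_every_nt")) {  // old name kept as alias
        if (optval != nullptr) {
            try {
                p.checkpoint_min_step = std::stoi(optval);
            } catch (const std::logic_error&) {
                // std::stoi threw invalid_argument/out_of_range: keep current value.
            }
        }
#endif
    } else if (!std::strcmp(optname, "n_ubatch")) {
        if (optval != nullptr) {
            p.n_ubatch = std::atoi(optval);
        }
    }
}
```

Compiling this once with and once without -DLOCALAI_LEGACY_LLAMA_CPP_SPEC is a quick way to confirm that both preprocessor branches leave the if/else-if chain balanced.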
@@ -1165,9 +1177,15 @@ static void params_parse(server_context& /*ctx_server*/, const backend::ModelOpt
 params.tensor_buft_overrides.push_back({nullptr, nullptr});
 }
 }
+// The draft tensor_buft_overrides are only populated under the modern
+// (post-#22838) layout, whose population code is itself gated by
+// LOCALAI_LEGACY_LLAMA_CPP_SPEC above. The turboquant fork lacks
+// common_params_speculative::draft entirely, so skip the sentinel there too.
+#ifndef LOCALAI_LEGACY_LLAMA_CPP_SPEC
 if (!params.speculative.draft.tensor_buft_overrides.empty()) {
 params.speculative.draft.tensor_buft_overrides.push_back({nullptr, nullptr});
 }
+#endif
 
 // TODO: Add yarn
 
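The `{nullptr, nullptr}` sentinel matters when the override list is later handed on as a raw pointer: a consumer that sees only a pointer cannot call size(), so it scans until it finds an entry with a null pattern. The sketch below assumes that convention. tensor_buft_override_sketch uses void* in place of llama.cpp's real buffer-type handle, and terminate_overrides and count_overrides are hypothetical helpers, not llama.cpp functions.

```cpp
#include <cassert>  // for assert() in usage checks
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an override entry: a tensor-name pattern plus an
// opaque buffer-type handle (void* here, not llama.cpp's real type).
struct tensor_buft_override_sketch {
    const char* pattern;
    void* buft;
};

// Appends the {nullptr, nullptr} terminator, but only when at least one real
// override exists, mirroring the `if (!...empty())` checks in the diff.
inline void terminate_overrides(std::vector<tensor_buft_override_sketch>& v) {
    if (!v.empty()) {
        v.push_back({nullptr, nullptr});
    }
}

// A C-style consumer that receives only a pointer: it walks entries until the
// sentinel's null pattern. A null pointer is treated as "no overrides".
inline std::size_t count_overrides(const tensor_buft_override_sketch* p) {
    std::size_t n = 0;
    if (p == nullptr) {
        return 0;
    }
    for (; p->pattern != nullptr; ++p) {
        ++n;
    }
    return n;
}
```

Without the terminator, a pointer-only consumer would read past the end of the vector. This is also why the fork build must skip the draft sentinel entirely instead of leaving it dangling after the guard's #endif.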
backend/cpp/turboquant/patch-grpc-server.sh

Lines changed: 5 additions & 2 deletions
@@ -124,8 +124,11 @@ fi
 # 5. Define LOCALAI_LEGACY_LLAMA_CPP_SPEC at the top of the file so the
 # grpc-server option parser skips the new option-handler blocks (ngram_mod,
 # ngram_map_k, ngram_map_k4v, ngram_cache, draft.cache_type_*, draft.cpuparams*,
-# draft.tensor_buft_overrides) introduced for the post-#22838 layout. Those
-# blocks reference struct fields that simply do not exist in the fork.
+# draft.tensor_buft_overrides) introduced for the post-#22838 layout, the
+# draft.tensor_buft_overrides sentinel termination, and the
+# common_params::checkpoint_min_step default/option (added with the
+# 35c9b1f3 bump). Those blocks reference struct fields that simply do not
+# exist in the fork.
 if grep -q '^#define LOCALAI_LEGACY_LLAMA_CPP_SPEC' "$SRC"; then
 echo "==> $SRC already defines LOCALAI_LEGACY_LLAMA_CPP_SPEC, skipping"
 else
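Step 5 is written to be idempotent: the grep check means that rerunning the patch script on an already-patched file changes nothing. Below is the same logic sketched in C++ for consistency with the rest of this change. The real script's else-branch is not shown in the diff, so the exact line written at the top of the file is an assumption here, and has_define_line and ensure_legacy_define are hypothetical names.

```cpp
#include <cassert>  // for assert() in usage checks
#include <sstream>
#include <string>

// True if any line starts with `prefix`, like `grep -q '^<prefix>'`.
// As with grep, a longer macro name sharing the prefix would also match.
inline bool has_define_line(const std::string& src, const std::string& prefix) {
    std::istringstream in(src);
    std::string line;
    while (std::getline(in, line)) {
        if (line.rfind(prefix, 0) == 0) {  // line begins with prefix
            return true;
        }
    }
    return false;
}

// Prepends the define unless it is already present, so repeated runs are no-ops.
// ASSUMPTION: a bare `#define` line at the very top; the real script may differ.
inline std::string ensure_legacy_define(const std::string& src) {
    const std::string define = "#define LOCALAI_LEGACY_LLAMA_CPP_SPEC";
    if (has_define_line(src, define)) {
        return src;  // already patched: skip
    }
    return define + "\n" + src;
}
```

Anchoring the match at the start of a line, as the `^` in the grep pattern does, keeps a commented-out `// #define ...` from counting as already patched.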
