You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
* feat(config): add chat_template_kwargs model field + resolver
Adds the ChatTemplateKwargs model-config map and RequestMetadata carrier,
plus ResolveChatTemplateKwargs which layers the config map under coerced
request metadata. Foundation for generic jinja chat-template kwargs (issue #10329).
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(backend): forward resolved chat_template_kwargs blob to backends
gRPCPredictOpts now merges per-request client metadata over the server-derived
enable_thinking/reasoning_effort (reaching all backends via the standalone keys)
and serialises the resolved chat_template_kwargs map into a JSON blob for
llama.cpp, written last so a client cannot clobber it. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(http): wire request metadata to config.RequestMetadata
The OpenAI request metadata field was parsed but unused; stamp it onto the
per-request ModelConfig so gRPCPredictOpts forwards it as chat_template_kwargs
overrides. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* feat(llama-cpp): generic chat_template_kwargs merge (drop per-key blocks)
Replace the per-key enable_thinking/reasoning_effort handling in both the
streaming and non-streaming chat paths with a single block that parses the
chat_template_kwargs JSON blob resolved by the Go layer and merges every key
into body_json. New jinja template levers (e.g. preserve_thinking) now need
no C++ change. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* docs: document custom chat_template_kwargs (model + per-request)
Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(backend): pin reasoning_effort as a string in the chat_template_kwargs blob
Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(http): e2e guard pinning chat_template_kwargs forwarded to gRPC
Adds an ECHO_PREDICT_METADATA marker to the mock-backend that echoes the
received PredictOptions.Metadata, and an app_test.go spec that drives a real
/v1/chat/completions request (model chat_template_kwargs + per-request metadata
override) and asserts the exact metadata + chat_template_kwargs blob the REST
layer forwards to gRPC. Locks the REST->gRPC contract against regressions. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
* test(config): grandfather chat_template_kwargs in registry coverage
chat_template_kwargs is a free-form map[string]any (like engine_args, already
on the list), not a scalar the config UI registry can surface, so it is exempt
from the registry-entry requirement. Fixes the TestAllFieldsHaveRegistryEntries
failure introduced by the new field. Issue #10329.
Assisted-by: Claude:claude-opus-4-8
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
---------
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Co-authored-by: Ettore Di Giacinto <mudler@localai.io>
FeatureFlagFeatureFlag`yaml:"feature_flags,omitempty" json:"feature_flags,omitempty"`// Feature Flag registry. We move fast, and features may break on a per model/backend basis. Registry for (usually temporary) flags that indicate aborting something early.
0 commit comments