
Add llama.acceleration.fit #560

Open
will-lms wants to merge 22 commits into main from will/llama-fit

Conversation

@will-lms (Contributor)

Add llama.acceleration.fit to the KV Config schema. This is a new boolean that will enable / disable llama.cpp's -fit model loading logic.

When enabled, the other GPU parameters will be ignored.

To maintain compatibility with existing SDK usage, I set the schema default to false. This allows a call such as load({gpu: {ratio: 0.5}}) to keep working as originally intended. I plan, though, to default the standard GUI and CLI paths to true via explicit overrides in those locations.
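A minimal sketch of this default/override interplay (hypothetical types and a stand-in resolveFit helper, loosely modeled on the description above, not the real lmstudio SDK schema):

```typescript
// Sketch only: hypothetical shapes, not the real KV Config schema.
interface GpuConfig {
  ratio?: number;
}

interface LoadConfig {
  gpu?: GpuConfig;
  // llama.acceleration.fit; schema default is false
  fit?: boolean;
}

// GUI/CLI paths would pass an explicit override; plain SDK calls fall back to
// the schema default (false), so their GPU settings keep their original meaning.
function resolveFit(config: LoadConfig, explicitOverride?: boolean): boolean {
  return explicitOverride ?? config.fit ?? false;
}

// A call like load({ gpu: { ratio: 0.5 } }) still behaves as before:
const sdkFit = resolveFit({ gpu: { ratio: 0.5 } }); // false: gpu.ratio is honored

// A GUI/CLI path opting in explicitly:
const guiFit = resolveFit({ gpu: { ratio: 0.5 } }, true); // true: fit takes over
```

The key point is that absence of the key and an explicit false are treated the same for SDK callers, which is what keeps existing calls backwards compatible.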

@github-actions bot added the "CLA signed" label (indicates all contributors have signed the CLA) Mar 12, 2026
@will-lms (Contributor, Author)

Review

Architecture / Unintended Consequences

At a high level this PR is well-scoped. The public type, schema, conversion logic, and regression tests all move together.

The only subtle area is the inferred fit=false behavior, because it is intentionally encoding precedence rules from the config stack into the converter. In this case that is justified: the converter is exactly where callers expect “preserve my intent when serializing config” logic to live.
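The inferred fit=false rule described above can be sketched roughly as follows (hypothetical function and field names; the real converter lives in the SDK's KV Config layer and may differ):

```typescript
// Hypothetical sketch of the precedence rule the converter encodes: an explicit
// GPU field means "preserve my intent", so serialization infers fit = false.
interface GpuFields {
  ratio?: number;
  mainGpu?: number;
  splitStrategy?: string;
}

function inferFit(gpu: GpuFields | undefined, defaultFit: boolean): boolean {
  const hasExplicitGpuField =
    gpu !== undefined &&
    (gpu.ratio !== undefined ||
      gpu.mainGpu !== undefined ||
      gpu.splitStrategy !== undefined);
  // Explicit GPU tuning wins over whatever default the surrounding stack uses.
  return hasExplicitGpuField ? false : defaultFit;
}
```

This is exactly the "preserve my intent when serializing config" behavior: the caller's GPU fields only keep effect if fit is forced off during serialization.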

I do not see broader architectural problems or unintended cross-package consequences in the actual origin/main...HEAD diff.

Findings

No blocking findings in the corrected PR diff.

Residual Risk

The only minor gap I see is that the tests are centered on partial-config behavior. There is no targeted test for the useDefaultsForMissingKeys: true path in kvConfigToLLMLoadModelConfig, but that is a testing gap rather than a clear defect in this PR.
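A targeted test for that path might look roughly like this, using a stand-in converter (the real kvConfigToLLMLoadModelConfig signature and defaults may differ):

```typescript
// Stand-in converter to illustrate the untested path only; not the real
// kvConfigToLLMLoadModelConfig implementation.
type KvConfig = Record<string, unknown>;

interface ConvertOptions {
  useDefaultsForMissingKeys?: boolean;
}

function kvToLoadConfigSketch(kv: KvConfig, opts: ConvertOptions = {}) {
  const raw = kv["llama.acceleration.fit"];
  if (raw === undefined && opts.useDefaultsForMissingKeys) {
    // With defaults enabled, a missing key resolves to the schema default.
    return { fit: false };
  }
  // Otherwise the key stays absent so partial configs round-trip faithfully.
  return { fit: raw as boolean | undefined };
}
```

The test to add would pin down both branches: defaults enabled yields the schema default, defaults disabled leaves the field undefined.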

@will-lms will-lms marked this pull request as ready for review March 12, 2026 04:52

@chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e639b1c19

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

* where 0 means no work is distributed and 1 means all work is distributed. Can also specify the
* string "off" to mean 0 and the string "max" to mean 1.
*
* When `fit` is true, this field is ignored.
Member


Debate: should this be ignored, or should we throw?

Member


I do share the concern about this kind of implicit ignoring of parameters (and of the relationship between parameters) here, and wonder if there's a better way to represent the relationship (maybe not)

Member


Potential language-enforced mechanism for this general type of thing: a discriminated union where you can't set the other params if you set fit. I haven't thought through the backwards-compat impact of doing this here, but the relationship itself does seem like a discriminated one?
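The discriminated-union idea sketched concretely (illustrative field names, not the actual schema):

```typescript
// Once fit is chosen, the manual GPU knobs become unrepresentable in the type
// instead of silently ignored at runtime.
type GpuAcceleration =
  | { fit: true } // fit mode: no manual tuning allowed
  | {
      fit?: false; // manual mode (today's shape, for backwards compat)
      ratio?: number;
      mainGpu?: number;
      splitStrategy?: string;
    };

const manual: GpuAcceleration = { ratio: 0.5 }; // OK: manual tuning, no fit
const fitted: GpuAcceleration = { fit: true }; // OK: fit with nothing else
// const invalid: GpuAcceleration = { fit: true, ratio: 0.5 }; // rejected by tsc
```

The trade-off, as noted below, is backwards compatibility: existing wire configs can carry both fit and GPU fields, so the union would only help at the type level for new callers.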

Collaborator


In order to be backwards compatible, I think we have to ignore.


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Reviewed commit: 94c807417c


@chatgpt-codex-connector bot left a comment


💡 Codex Review

Reviewed commit: 1222ccd6ff

}
`;

exports[`LLM with default model fixture Can tokenize correctly 1`] = `


P2: Remove stale snapshot keys after renaming heavy tests

The snapshot file now contains both the old LLM ... keys and the new LLM with default model fixture ... keys, but LLM.heavy.test.ts only defines the renamed tests (see the describe("with default model fixture") block), so running this heavy test file will report obsolete snapshots and fail until they are cleaned up. Keeping both sets of keys makes the snapshot suite unstable for anyone running heavy tests.


"identifier": Any<String>,
"indexedModelIdentifier": "lmstudio-community/Qwen2.5-0.5B-Instruct-GGUF/Qwen2.5-0.5B-Instruct-Q4_K_M.gguf",
"instanceReference": Any<String>,
"lastUsedTime": 1773374807159,


P2: Stop snapshotting volatile last-used timestamps

This snapshot pins lastUsedTime to a concrete Unix timestamp, but lastUsedTime is inherently time-dependent model metadata, so future heavy-test runs will drift and fail even when behavior is correct. Add a property matcher (e.g. expect.any(Number)/null) for this field instead of hardcoding a specific value.
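Outside of Jest's property matchers, the same idea can be sketched as a sanitizer that neutralizes the volatile field before comparison (hypothetical helper, not the repo's actual test utility):

```typescript
// Replace volatile, time-dependent fields with stable placeholders before
// snapshotting, so repeated heavy-test runs compare equal. Inside Jest, a
// property matcher like expect.any(Number) achieves the same effect.
function sanitizeModelInfo<T extends { lastUsedTime?: number | null }>(info: T) {
  return {
    ...info,
    lastUsedTime: info.lastUsedTime == null ? null : "<any number>",
  };
}

const first = sanitizeModelInfo({ identifier: "x", lastUsedTime: 1773374807159 });
const second = sanitizeModelInfo({ identifier: "x", lastUsedTime: Date.now() });
// first and second now serialize identically despite different timestamps
```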



@chatgpt-codex-connector bot left a comment


💡 Codex Review

Reviewed commit: 45725f5624

config.gpu?.ratio !== undefined
|| config.gpu?.numCpuExpertLayersRatio !== undefined
|| config.gpu?.mainGpu !== undefined
|| config.gpu?.splitStrategy !== undefined;


P2: Exclude default splitStrategy from fit inference

Inferring fit=false from any defined gpu.splitStrategy causes a regression when the strategy was auto-populated as the default ("evenly") rather than explicitly chosen by the caller. Fresh evidence: starting from { gpu: { disabledGpus: [1] } }, llmLoadModelConfigToKVConfig writes a split config, kvConfigToLLMLoadModelConfig reconstructs gpu.splitStrategy: "evenly", and this condition then forces llama.fit=false on the next serialization, which unexpectedly overrides higher-priority defaults where fit should remain enabled (disabled GPUs are documented as fit-compatible).
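One possible shape of the fix, sketched (field names follow the quoted condition; the "evenly" default comes from this finding, and the helper name is hypothetical):

```typescript
// Treat a splitStrategy equal to the auto-populated default as "not explicitly
// chosen", so a round-tripped config does not force llama.fit = false.
interface GpuLike {
  ratio?: number;
  numCpuExpertLayersRatio?: number;
  mainGpu?: number;
  splitStrategy?: string;
}

const DEFAULT_SPLIT_STRATEGY = "evenly";

function hasExplicitGpuChoice(gpu: GpuLike | undefined): boolean {
  if (gpu === undefined) return false;
  const splitIsExplicit =
    gpu.splitStrategy !== undefined &&
    gpu.splitStrategy !== DEFAULT_SPLIT_STRATEGY;
  return (
    gpu.ratio !== undefined ||
    gpu.numCpuExpertLayersRatio !== undefined ||
    gpu.mainGpu !== undefined ||
    splitIsExplicit
  );
}
```

Caveat: this cannot distinguish a caller who deliberately chose "evenly" from the auto-populated default; tracking explicitness separately (e.g. only inferring from fields present in the caller's original input) would be more robust.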


