Description
Which version of LM Studio?
Which operating system?
• User Hardware:
◦ OS: Win11
◦ CPU: Intel Core i7-12700
◦ RAM: 32 GB
◦ GPU: AMD Radeon RX 7900 XT
• Testing Software & Versions:
◦ LM Studio (using the ROCm backend of llamacpp 2.7.0)
◦ Ollama 0.18.0
◦ Standalone llamacpp executable (llama-b8327-bin-win-hip-radeon-x64)
What is the bug?
Bug Report
Basic Information
• Model: DeepSeek R1 Distill Qwen 14B (Known to have 48 layers)
Problem Description
There is a significant, unexplained discrepancy in generation speed (Generation t/s) when the same model is run on different inference backends. The core observation: when the GPU offload layer count (`ngl`) is set exactly to the model's total layer count (48), generation is slowest. Raising `ngl` beyond the model's layer count (e.g., 99), or even by just one (49), yields a substantial performance improvement. This strongly suggests a problem in how LM Studio sets or passes the GPU offload layer count when it invokes llamacpp.
Detailed Test Data Comparison
| Environment / Parameter Setting | Prompt Processing Speed (t/s) | Generation Speed (t/s) | Notes |
| --- | --- | --- | --- |
| Ollama 0.18.0 | Not separately listed | ~80 (total speed) | Used as a performance baseline. |
| LM Studio (llamacpp 2.7.0) | Not separately listed | ~50 (total speed) | Significantly slower than Ollama; GPU Offload set to 48. |
| Standalone llamacpp (`ngl 48`) | 106.8 | 37.2 | Key finding: setting `ngl` equal to the model's total layer count (48) yields the slowest generation speed. |
| Standalone llamacpp (`ngl 99`) | 134.0 | 57.1 | Setting `ngl` far beyond the model's layer count (99) improves generation speed by over 50%. |
| Standalone llamacpp (`ngl 49`) | 140.5 | 56.6 | Setting `ngl` just one above the layer count (49) gives a boost similar to `ngl 99`. |
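For reference, the standalone comparison above can be reproduced with llama.cpp's bundled `llama-bench` tool from the same HIP build; the model filename below is a placeholder, not the exact file used in these tests:

```shell
# Placeholder path to the DeepSeek R1 Distill Qwen 14B GGUF (48 layers).
MODEL="DeepSeek-R1-Distill-Qwen-14B.gguf"

# ngl equal to the model's layer count (the slow case reported above)
llama-bench -m "$MODEL" -ngl 48

# ngl above the layer count (the fast cases reported above)
llama-bench -m "$MODEL" -ngl 49
llama-bench -m "$MODEL" -ngl 99
```

`llama-bench` reports prompt-processing (pp) and token-generation (tg) throughput separately for each run, which is how the per-`ngl` numbers in the table can be collected.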