
Performance Anomaly: Setting ngl to the Model's Layer Count Causes Unexpected Slowdown in LM Studio #1649

@zzh799

Description


Which version of LM Studio?
LM Studio 0.4.6

Which operating system?

• User Hardware:
  ◦ OS: Windows 11
  ◦ CPU: Intel Core i7-12700
  ◦ RAM: 32 GB
  ◦ GPU: AMD Radeon RX 7900 XT

• Testing Software & Versions:
  ◦ LM Studio (using the ROCm backend of llama.cpp 2.7.0)
  ◦ Ollama 0.18.0
  ◦ Standalone llama.cpp executable (llama-b8327-bin-win-hip-radeon-x64)

What is the bug?

Basic Information
• Model: DeepSeek R1 Distill Qwen 14B (Known to have 48 layers)

Problem Description
There is a significant, unexplained discrepancy in generation speed (Generation t/s) when the same model is run on different inference backends. The core observation: when the GPU offload layer count parameter (ngl) is set to the model's total number of layers (48), generation is slowest. Increasing ngl beyond the model's total layer count (e.g., 99), or even by just one (49), yields a significant performance improvement. This strongly suggests an issue in how LM Studio sets or passes the GPU offload layer count when invoking llama.cpp.
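For reference, the standalone runs measured below can presumably be reproduced with llama.cpp invocations along these lines (the GGUF filename and prompt are placeholders, not from this report). Note that in llama.cpp, the -ngl count covers the final non-repeating output layer in addition to the 48 transformer blocks, which is one plausible reason -ngl 49 behaves differently from -ngl 48:

```shell
# Hypothetical reproduction commands; the model filename is a placeholder.
# -ngl 48: offload the 48 transformer blocks; the output layer stays on CPU.
llama-cli -m DeepSeek-R1-Distill-Qwen-14B.gguf -ngl 48 -p "Hello" -n 128

# -ngl 49 (or anything higher, e.g. 99): also offload the output layer.
llama-cli -m DeepSeek-R1-Distill-Qwen-14B.gguf -ngl 49 -p "Hello" -n 128
```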

Detailed Test Data Comparison

| Environment / Parameter Setting | Prompt Processing Speed (t/s) | Generation Speed (t/s) | Notes |
|---|---|---|---|
| Ollama 0.18.0 | Not separately listed | ~80 (total speed) | Used as a performance baseline. |
| LM Studio (llama.cpp 2.7.0) | Not separately listed | ~50 (total speed) | Significantly slower than Ollama; GPU Offload set to 48. |
| Standalone llama.cpp (ngl 48) | 106.8 | 37.2 | Key finding: setting ngl equal to the model's total layer count (48) yields the slowest generation speed. |
| Standalone llama.cpp (ngl 99) | 134.0 | 57.1 | Setting ngl far above the model's layer count (99) improves generation speed by over 50%. |
| Standalone llama.cpp (ngl 49) | 140.5 | 56.6 | Setting ngl just one above the layer count (49) gives a similar boost to ngl 99. |
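The generation-speed deltas claimed in the notes can be checked with a few lines of arithmetic (numbers copied from the measurements above):

```python
# Generation speeds (t/s) from the standalone llama.cpp runs in the table.
gen = {"ngl 48": 37.2, "ngl 99": 57.1, "ngl 49": 56.6}

baseline = gen["ngl 48"]
for setting, tps in gen.items():
    if setting == "ngl 48":
        continue
    pct = (tps / baseline - 1) * 100
    print(f"{setting}: +{pct:.1f}% generation speed vs ngl 48")
```

Both the ngl 99 and ngl 49 runs come out a little over 50% faster than the ngl 48 run, matching the "over 50%" claim in the table.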
