GTX 1650: disambiguate GDDR5/GDDR6 variants by memory clock (measured 192 GB/s) by cms-pm · Pull Request #115 · Andyyyy64/whichllm

cms-pm · 2026-06-15T04:04:22Z

Summary

GPU_BANDWIDTH["GTX 1650"] = 128.0 is correct for the original GDDR5 card,
but the GTX 1650 also shipped in a later GDDR6 revision at 192 GB/s
(12 Gbps × 128-bit). Both use the same TU117 die and PCI device id 0x1F82, and
the driver reports the same name, so the single curated value under-states the
GDDR6 variant's bandwidth by a third (192 GB/s is 1.5× the curated 128). That in
turn under-predicts tok/s and under-recommends models for a card that is still
widely sold.
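Both figures follow from peak bandwidth = per-pin data rate × bus width ÷ 8 bits per byte. A quick check of that arithmetic; the helper name peak_bandwidth_gbs is hypothetical and not part of whichllm:

```python
def peak_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Peak memory bandwidth in GB/s: per-pin rate (Gbit/s) x pin count / 8."""
    return data_rate_gbps * bus_width_bits / 8


gddr5 = peak_bandwidth_gbs(8, 128)   # original GTX 1650, 8 Gbps GDDR5
gddr6 = peak_bandwidth_gbs(12, 128)  # later revision, 12 Gbps GDDR6
print(gddr5, gddr6, gddr6 / gddr5)   # 128.0 192.0 1.5
```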

This PR disambiguates the two by max memory clock, the only reliable
discriminator, and leaves 128 as the conservative default when the clock is
unknown.

Change

- data/gpu.py: GPU_MEMORY_CLOCK_VARIANTS maps GTX 1650 → 192 GB/s when max
  memory clock ≥ 5500 MHz (GDDR6 ≈ 6001), else 128 (GDDR5 ≈ 4001). The 5500 split
  sits well clear of both regimes.
- gpu_db.py: resolve_detected_bandwidth(name, vram_bytes, mem_clock_mhz=None)
  consults the variant table first, then the existing curated/dbgpu path.
  Behaviour is byte-identical when mem_clock_mhz is unknown.
- nvidia.py: captures max memory clock via NVML (NVML_CLOCK_MEM) and the
  nvidia-smi fallback (clocks.max.memory). The smi query retries without the
  clock field if a driver rejects it, so an optional field can never reduce
  detection to zero GPUs.
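A minimal sketch of the lookup order described above. It assumes a threshold-list layout for GPU_MEMORY_CLOCK_VARIANTS, which the PR does not show; CURATED_BANDWIDTH, _short_name, and _curated_or_dbgpu are hypothetical stand-ins for the existing curated/dbgpu path, not whichllm code.

```python
from typing import Optional

# Assumed layout: short name -> [(min max-mem-clock MHz, bandwidth GB/s), ...]
GPU_MEMORY_CLOCK_VARIANTS: dict[str, list[tuple[int, float]]] = {
    "GTX 1650": [
        (5500, 192.0),  # GDDR6 boards report ~6001 MHz
        (0, 128.0),     # GDDR5 boards report ~4001 MHz
    ],
}

# Hypothetical stand-in for the existing curated table.
CURATED_BANDWIDTH: dict[str, float] = {"GTX 1650": 128.0}


def _short_name(name: str) -> str:
    # "NVIDIA GeForce GTX 1650" -> "GTX 1650". Exact matching on the short
    # name keeps e.g. "GTX 1650 SUPER" from inheriting these tiers.
    for prefix in ("NVIDIA ", "GeForce "):
        if name.startswith(prefix):
            name = name[len(prefix):]
    return name.strip()


def _curated_or_dbgpu(name: str, vram_bytes: int) -> Optional[float]:
    """Placeholder for the pre-existing resolution path (vram_bytes unused here)."""
    return CURATED_BANDWIDTH.get(_short_name(name))


def resolve_detected_bandwidth(
    name: str, vram_bytes: int, mem_clock_mhz: Optional[int] = None
) -> Optional[float]:
    # The variant table is only consulted when a clock was detected, so
    # callers that pass no clock get exactly the old behaviour.
    tiers = GPU_MEMORY_CLOCK_VARIANTS.get(_short_name(name))
    if mem_clock_mhz is not None and tiers:
        for min_clock, bandwidth in sorted(tiers, reverse=True):
            if mem_clock_mhz >= min_clock:  # inclusive, matching ">= 5500"
                return bandwidth
    return _curated_or_dbgpu(name, vram_bytes)


print(resolve_detected_bandwidth("NVIDIA GeForce GTX 1650", 4 * 2**30, 6001))  # 192.0
```

Sorting the tiers from the highest threshold down lets further cards (for example the GT 1030 mentioned under limitations) be added as data rows without touching the resolver.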

Validation (measured, not inferred)

Clock-locked, sole-tenant, llama.cpp build 2cbfdc6, on a confirmed GDDR6
board (VBIOS 90.17.4D.00.1E, nvidia-smi clocks.max.memory = 6001 MHz):

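model (Q4_K_M)   measured tok/s   est. tok/s @128 GB/s   est. tok/s @192 GB/s
Qwen3-1.7B       75.4 ± 0.12      52 (m/p 1.45)          78 (m/p 0.97)
Llama-3.2-3B     49.6 ± 0.03      35 (m/p 1.42)          52 (m/p 0.94)

(m/p = measured ÷ predicted tok/s; a value above 1 means whichllm under-predicts.)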

At 128 GB/s, measured throughput is ~1.42 to 1.45× the estimate (whichllm
under-predicts by roughly 30%); at 192 GB/s the estimate lands within ~6% of
measured. The patched detector returns ('NVIDIA GeForce GTX 1650', 192.0) on the card.
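The m/p ratios can be re-derived from the rounded table values. This is plain arithmetic on the published numbers, not whichllm's estimator. Recomputing from the rounded estimates gives 0.95 rather than 0.94 for Llama-3.2-3B @192, presumably because the table was computed from unrounded estimates.

```python
# Rounded values from the validation table:
# model -> (measured tok/s, estimate @128 GB/s, estimate @192 GB/s)
rows = {
    "Qwen3-1.7B": (75.4, 52, 78),
    "Llama-3.2-3B": (49.6, 35, 52),
}

ratios = {}
for model, (measured, est128, est192) in rows.items():
    # m/p above 1 means the estimate was too low (under-prediction).
    ratios[model] = (measured / est128, measured / est192)
    print(f"{model}: m/p @128 = {ratios[model][0]:.2f}, m/p @192 = {ratios[model][1]:.2f}")
```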

Data & reproduction. Methodology and the full four-model calibration are in
the accompanying paper (SSRN 6941538, The kv4 Trade-off Is Workload-Dependent);
the measured evidence and clock-locked harness are public and reproducible at
https://github.com/cms-pm/kv4-edge-inference (analysis/analyze.py reprints the
tok/s used here from the shipped data). This mirrors how PR #75 cited a paper for
a legacy GPU, but with measured, independently re-runnable numbers.

Tests

tests/test_gtx1650_variants.py — 15 tests: resolver disambiguation incl.
threshold boundary and back-compat (unknown clock → 128); nvidia-smi parse incl.
[N/A] clock and the 3-field-fails→2-field-retry regression guard; and a
measured-calibration check (192 estimate scales 1.5× over 128 and lands near the
measured 75.4). Full suite green.

Scope and honest limitations

- Cross-platform via NVML/nvidia-smi; only the bare-WMI fallback lacks a clock.
  detect_hardware() calls detect_nvidia_gpus() on every OS, and that path uses
  pynvml (nvml.dll) with an nvidia-smi fallback, both of which ship with the
  NVIDIA driver on Windows. So a GDDR6 GTX 1650 on a normal Windows box is
  disambiguated by the same memory-clock code as Linux. The 128 default is only
  reached in the narrow case where NVML and nvidia-smi both fail and detection
  falls through to the pure-WMI path (hardware/windows.py), which has no memory
  clock (Win32_VideoController does not expose VRAM clock). Teaching that
  fallback to shell nvidia-smi is a small, hardware-free follow-up; flagged here
  so the one remaining gap is explicit, not accidental.
- Generality. The mechanism is intentionally generic: the same name+clock
  ambiguity affects other cards, most notably the GT 1030 (DDR4 ~16 GB/s vs
  GDDR5 ~48 GB/s), a 3× error. This PR populates only the GTX 1650 (the variant
  measured here); GT 1030 and similar can be added to the same table as data
  without further code.
- The GDDR5 clock (~4001 MHz) is from NVIDIA's 8 Gbps spec, not independently
  measured here; only the GDDR6 board was measured. The 5500 threshold is safe
  given the 8-vs-12 Gbps gap regardless.
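The fallback order above (NVML, then nvidia-smi, then the clock-less WMI path) can be sketched as a chain of probes that returns the first clock found. detect_max_mem_clock, nvml_max_mem_clock, and wmi_max_mem_clock are hypothetical names. The NVML probe uses pynvml's real nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetMaxClockInfo, and NVML_CLOCK_MEM, and returns None when pynvml or a driver is unavailable.

```python
from typing import Callable, Optional


def nvml_max_mem_clock(index: int = 0) -> Optional[int]:
    """Max memory clock (MHz) via NVML, or None if pynvml or the driver is missing."""
    try:
        import pynvml  # module provided by the nvidia-ml-py package
    except ImportError:
        return None
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return None
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(index)
        return int(pynvml.nvmlDeviceGetMaxClockInfo(handle, pynvml.NVML_CLOCK_MEM))
    except pynvml.NVMLError:
        return None
    finally:
        pynvml.nvmlShutdown()  # only reached after a successful nvmlInit


def wmi_max_mem_clock() -> Optional[int]:
    """Pure-WMI tier: Win32_VideoController exposes no VRAM clock, so always None."""
    return None


def detect_max_mem_clock(probes: list[Callable[[], Optional[int]]]) -> Optional[int]:
    """Return the first clock any probe reports; None lets the resolver keep 128 GB/s."""
    for probe in probes:
        clock = probe()
        if clock is not None:
            return clock
    return None


# The lambda is a placeholder for an nvidia-smi probe.
print(detect_max_mem_clock([nvml_max_mem_clock, lambda: None, wmi_max_mem_clock]))
```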

Backward compatibility

Additive only. All pre-existing tests pass; the new mem_clock_mhz parameters
default to None, preserving exact prior behaviour for every code path that does
not supply a clock.

The GTX 1650 ships in two memory configurations that the driver name and PCI device id (0x1F82) cannot tell apart: the original GDDR5 (8 Gbps × 128-bit = 128 GB/s) and a later GDDR6 revision (12 Gbps × 128-bit = 192 GB/s). whichllm's single "GTX 1650": 128.0 is right for GDDR5 but under-states GDDR6 by a third (192 is 1.5× 128), which then under-predicts tok/s and under-recommends models.

Resolve by max memory clock at detection time (the only reliable discriminator):

- data/gpu.py: GPU_MEMORY_CLOCK_VARIANTS maps GTX 1650 -> 192 when max mem clock >= 5500 MHz (GDDR6 ~6001) else 128 (GDDR5 ~4001). 128 stays the curated default.
- gpu_db.py: resolve_detected_bandwidth(..., mem_clock_mhz) tries the variant first, then the existing curated/dbgpu path. Identical behaviour when the clock is unknown.
- nvidia.py: capture max mem clock via NVML (NVML_CLOCK_MEM) and nvidia-smi (clocks.max.memory). The smi query retries without the clock field on error, so a missing optional field can never wipe out detection.

Validation (clock-locked, sole-tenant, llama.cpp build 2cbfdc6 on a GDDR6 board, VBIOS 90.17.4D.00.1E): nvidia-smi clocks.max.memory = 6001 MHz; Qwen3-1.7B Q4_K_M decodes 75.4 tok/s, matching the 192 estimate (~78) rather than 128's (~52). The patched detector returns ('NVIDIA GeForce GTX 1650', 192.0) on the card.

Back-compatible: all existing tests pass; mem-clock params default to None.

cms-pm wants to merge 1 commit into Andyyyy64:main from cms-pm:gtx1650-gddr6-variant
