gemma-4-31B-it: hot-patch transformers CUDA→numpy crash on image requests by lloydmak99 · Pull Request #52 · nearai/cvm-compose-files

lloydmak99 · 2026-05-30T06:06:28Z

Problem

google/gemma-4-31B-it returns HTTP 500 on a subset of multimodal (image) requests:

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Live on both backends (gpu11 + gpu02), ~1000 errors / 15 min. Valid images succeed; the crash fires for inputs SGLang decodes to a GPU tensor (video data-URLs, broken image URLs). Tracking: nearai/infra#156.

Root cause

A bare image.numpy() on a CUDA tensor at transformers/image_processing_backends.py:458, reached via the gemma4 image processor (processing_gemma4.py → image_processing_pil_gemma4.py). The earlier --disable-fast-image-processor (v0.0.196) only closed the generic fast-processor path; this path is unaffected because the tensor is already on GPU upstream of that flag.

Fix

Wrap sglang serve in a shell that sed-patches the offending line to image.cpu().numpy() before launch.

.cpu() is a no-op on CPU tensors → idempotent and safe across restarts (the pattern no longer matches once patched).
Avoids rebuilding the pinned lmsysorg/sglang:gemma4@sha256:87cecd… image.
All serve flags unchanged (incl. --disable-fast-image-processor); exec keeps sglang as PID 1 under init: true for correct signal handling.
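
The idempotency claim can be checked in isolation. The following is a minimal sketch, not the wrapper from this PR: the temp file is a hypothetical stand-in for transformers/image_processing_backends.py, the function name patch_numpy_call is made up for illustration, and the sed expression is an assumed form of the substitution described above.

```shell
# Hypothetical stand-in for the real target file inside the container.
target="$(mktemp)"
printf '%s\n' '        image = image.numpy()' > "$target"

# Rewrite a bare image.numpy() call to image.cpu().numpy().
# Writes to a temp copy and moves it back, so it does not rely on `sed -i`.
patch_numpy_call() {
    sed 's/image\.numpy()/image.cpu().numpy()/' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}

# First run: the bare call is rewritten.
patch_numpy_call "$target"
first_pass="$(cat "$target")"

# Second run (e.g. a container restart): the pattern no longer matches,
# so the file content is unchanged.
patch_numpy_call "$target"
second_pass="$(cat "$target")"

printf '%s\n' "$second_pass"
rm -f "$target"
```

After the first pass the line contains image.cpu().numpy(), which does not contain the substring image.numpy(), so a repeat run has nothing to rewrite.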

Verification

Pre-patch: valid PNG (any size) → 200; broken image URL and video data-URL → 500 on both gpu11 + gpu02 (reproduced).
YAML validated; wrapper shape + sed + all serve flags asserted intact.
Post-merge deploy must force-recreate the gemma container on gpu11 + gpu02 (compose-manager compose/up with force_recreate:true) — a plain up won't recreate it.

Follow-up (not in this PR)

Upstream fix belongs in transformers (gemma4 image processor should .cpu() before .numpy()); this is a deploy-side hotfix until the image is rebuilt/repinned on a fixed transformers.

…ests

Multimodal requests to gemma-4-31B-it return HTTP 500 with `TypeError: can't convert cuda:0 device type tensor to numpy` for inputs SGLang decodes to a GPU tensor (video data-URLs, broken image URLs, etc.).

The crash is a bare `image.numpy()` on a CUDA tensor at transformers/image_processing_backends.py:458, reached via the gemma4 image processor. `--disable-fast-image-processor` (added in v0.0.196) only closed the generic fast-processor path; this second path is unaffected because the tensor is already on GPU upstream of that flag.

Wrap `sglang serve` in a shell that sed-patches the line to `image.cpu().numpy()` before launch. `.cpu()` is a no-op on CPU tensors, so the patch is idempotent and safe across restarts. Avoids rebuilding the pinned SGLang image; all serve flags (incl. --disable-fast-image-processor) are unchanged.

Verified: valid images already return 200; video/broken-URL inputs reproduce the 500 on both backends pre-patch. See nearai/infra#156.

The sed patch could silently no-op (path moved on a python/transformers bump, image repin, or pattern change), and `sed` returns 0 on no-match, so sglang would start unpatched and resume returning 500 on image requests with no signal: text traffic stays green and the error rate barely moves. Add a post-sed `grep` guard that aborts startup (exit 1) unless the fixed `image.cpu().numpy()` form is present. Checking for the fixed form (rather than checking that sed changed something) also tolerates a future image that already carries the upstream fix. Use `$$BACKENDS` so docker compose doesn't interpolate it.
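
A minimal sketch of this kind of guard, under stated assumptions: the two temp files are hypothetical stand-ins for an unpatched and a patched copy of the transformers source, and guard_patched is an illustrative name, not the code merged in this PR.

```shell
# Succeed only if the fixed form is present in the given file.
# -F matches the string literally, -q suppresses output.
guard_patched() {
    grep -qF 'image.cpu().numpy()' "$1"
}

# Hypothetical stand-ins for the file before and after patching.
unpatched="$(mktemp)"
patched="$(mktemp)"
printf '%s\n' 'image = image.numpy()' > "$unpatched"
printf '%s\n' 'image = image.cpu().numpy()' > "$patched"

# sed exits 0 even when its pattern matches nothing, so its status
# alone cannot tell us whether the patch was applied.
sed 's/no_such_pattern/x/' "$unpatched" > /dev/null
sed_status=$?

# The guard distinguishes the two cases by checking the end state.
if guard_patched "$unpatched"; then unpatched_ok=yes; else unpatched_ok=no; fi
if guard_patched "$patched"; then patched_ok=yes; else patched_ok=no; fi

printf 'sed=%s unpatched=%s patched=%s\n' "$sed_status" "$unpatched_ok" "$patched_ok"
rm -f "$unpatched" "$patched"
```

In a wrapper, a check like this would sit between the sed call and the final exec, in the form `guard_patched "$file" || exit 1`. When the script is embedded in a Compose `command:` string, each `$` meant for the shell has to be written as `$$`, otherwise Compose tries to interpolate the variable itself.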

PierreLeGuen

Reviewed the PR diff and surrounding Gemma service config in small-models.yaml. Verified the rendered Compose command, the pinned image entrypoint/config, and the baked transformers commit/source path containing the targeted image.numpy() line. Ran git diff --check and docker compose config --quiet for all YAML files; all passed. CI validate checks are green. I did not run a live GPU/model request locally.

lloydmak99 requested review from Evrard-Nil and PierreLeGuen May 30, 2026 06:10

PierreLeGuen approved these changes May 30, 2026

lloydmak99 merged commit 36e1a46 into main May 30, 2026
2 checks passed

lloydmak99 merged 2 commits into main from fix/gemma4-cuda-numpy-cpu
