Skip to content

gemma-4-31B-it: hot-patch transformers CUDA→numpy crash on image requests#52

Merged
lloydmak99 merged 2 commits into
mainfrom
fix/gemma4-cuda-numpy-cpu
May 30, 2026
Merged

gemma-4-31B-it: hot-patch transformers CUDA→numpy crash on image requests#52
lloydmak99 merged 2 commits into
mainfrom
fix/gemma4-cuda-numpy-cpu

Conversation

@lloydmak99
Copy link
Copy Markdown
Contributor

Problem

google/gemma-4-31B-it returns HTTP 500 on a subset of multimodal (image) requests:

TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Live on both backends (gpu11 + gpu02), ~1000 errors / 15 min. Valid images succeed; the crash fires for inputs SGLang decodes to a GPU tensor (video data-URLs, broken image URLs). Tracking: nearai/infra#156.

Root cause

A bare image.numpy() on a CUDA tensor at transformers/image_processing_backends.py:458, reached via the gemma4 image processor (processing_gemma4.pyimage_processing_pil_gemma4.py). The earlier --disable-fast-image-processor (v0.0.196) only closed the generic fast-processor path; this path is unaffected because the tensor is already on GPU upstream of that flag.

Fix

Wrap sglang serve in a shell that sed-patches the offending line to image.cpu().numpy() before launch:

  • .cpu() is a no-op on CPU tensors → idempotent and safe across restarts (the pattern no longer matches once patched).
  • Avoids rebuilding the pinned lmsysorg/sglang:gemma4@sha256:87cecd… image.
  • All serve flags unchanged (incl. --disable-fast-image-processor); exec keeps sglang as PID 1 under init: true for correct signal handling.

Verification

  • Pre-patch: valid PNG (any size) → 200; broken image URL and video data-URL → 500 on both gpu11 + gpu02 (reproduced).
  • YAML validated; wrapper shape + sed + all serve flags asserted intact.
  • Post-merge deploy must force-recreate the gemma container on gpu11 + gpu02 (compose-manager compose/up with force_recreate:true) — a plain up won't recreate it.

Follow-up (not in this PR)

Upstream fix belongs in transformers (gemma4 image processor should .cpu() before .numpy()); this is a deploy-side hotfix until the image is rebuilt/repinned on a fixed transformers.

…ests

Multimodal requests to gemma-4-31B-it return HTTP 500 with
`TypeError: can't convert cuda:0 device type tensor to numpy` for inputs
SGLang decodes to a GPU tensor (video data-URLs, broken image URLs, etc).
The crash is a bare `image.numpy()` on a CUDA tensor at
transformers/image_processing_backends.py:458, reached via the gemma4 image
processor. `--disable-fast-image-processor` (added in v0.0.196) only closed
the generic fast-processor path; this second path is unaffected because the
tensor is already on GPU upstream of that flag.

Wrap `sglang serve` in a shell that sed-patches the line to
`image.cpu().numpy()` before launch. `.cpu()` is a no-op on CPU tensors, so
the patch is idempotent and safe across restarts. Avoids rebuilding the
pinned SGLang image; all serve flags (incl. --disable-fast-image-processor)
are unchanged.

Verified: valid images already return 200; video/broken-URL inputs reproduce
the 500 on both backends pre-patch. See nearai/infra#156.
The sed patch could silently no-op (path moved on a python/transformers
bump, image repin, or pattern change) and `sed` returns 0 on no-match, so
sglang would start unpatched and resume 500ing on image requests with no
signal — text traffic stays green and the error rate barely moves.

Add a post-sed `grep` guard that aborts startup (exit 1) unless the fixed
`image.cpu().numpy()` form is present. Checking the fixed form (not that sed
changed something) also tolerates a future image that already carries the
upstream fix. Use `$$BACKENDS` so docker compose doesn't interpolate it.
Copy link
Copy Markdown
Contributor

@PierreLeGuen PierreLeGuen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the PR diff and surrounding Gemma service config in small-models.yaml. Verified the rendered Compose command, the pinned image entrypoint/config, and the baked transformers commit/source path containing the targeted image.numpy() line. Ran git diff --check and docker compose config --quiet for all YAML files; all passed. CI validate checks are green. I did not run a live GPU/model request locally.

@lloydmak99 lloydmak99 merged commit 36e1a46 into main May 30, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants