Issues · huggingface/text-generation-inference · GitHub

This repository was archived by the owner on Mar 21, 2026. It is now read-only.


Expose optional intermediate hidden states / layer logits during generation (for reasoning metrics and model introspection)

#3358 · BromigoTools opened on Mar 14, 2026

Performance: Replace inefficient polling in server loop with event-driven shutdown and optimize Health check

#3355 · amadhan882 opened on Feb 9, 2026

When will glm-4.7-flash and QWEN3-Coder-Next be supported?

#3354 · 470335075 opened on Feb 8, 2026

KV-cache / long-context: smallest canonical repro boundary + metric (7-day receipts eval)

#3351 · StanByriukov02 opened on Jan 9, 2026

FlashAttention CUDA "no kernel image" crash on RTX 5060 Ti

#3342 · pauli31 opened on Dec 9, 2025

'Qwen2Model' object has no attribute 'model'

#3335 · Sunhill666 opened on Oct 10, 2025

How to use prefix caching

#3333 · Noha-Magdy opened on Sep 27, 2025

Feature request: Apple MPS flash attention for GGUF

#3331 · qdrddr opened on Sep 20, 2025

Please use the latest transformers to support gpt-oss

#3328 · wang824892540 opened on Sep 12, 2025

Gemma3: CUDA error: an illegal memory access was encountered.

#3321 · Behnamhb opened on Sep 4, 2025

ghcr.io/huggingface/text-generation-inference:3.3.5 doesn't exist

#3320 · chuijh opened on Sep 3, 2025

Infinite tool call loop: `HuggingFaceModel` and `text-generation-inference`

#3318 · baughmann opened on Aug 31, 2025