Forge Neo - SageAttention broken #1518

### Package

Stable Diffusion WebUI Forge - Neo

### When did the issue occur?

Running the Package

### What GPU / hardware type are you using?

NVIDIA GeForce RTX 5070 Ti

### What happened?

Forge Neo is installed with SageAttention but doesn't actually use it. When generating an image, Forge Neo throws an `AssertionError` (`SM89 kernel is not available`) at the first sampling step, and the generation then completes without SageAttention. A Forge Neo installation outside of SM works fine, and ComfyUI inside SM with SageAttention also works fine.
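
To help narrow this down, here is a minimal diagnostic sketch that can be run with the Python interpreter from the `forge-neo\venv` environment. It prints the GPU's compute capability and the kernel flags that `sageattention.core` exposes. `SM89_ENABLED` is the flag named in the traceback; `SM80_ENABLED` and `SM90_ENABLED` are assumed sibling flags and show up as `None` if they don't exist. The helper names `arch_name` and `sage_kernel_flags` are made up for this example. The RTX 5070 Ti should report compute capability 12.0, so if `SM89_ENABLED` is `False` here, the installed SageAttention build probably lacks the FP8 kernel that `sageattn` dispatches to, though this is a guess rather than a confirmed cause.

```python
import importlib


def arch_name(major: int, minor: int) -> str:
    """Format a CUDA compute capability such as (8, 9) as 'sm_89'."""
    return f"sm_{major}{minor}"


def sage_kernel_flags() -> dict:
    """Report which compiled-kernel flags sageattention.core exposes.

    SM89_ENABLED appears in the traceback; SM80_ENABLED and SM90_ENABLED
    are assumed sibling flags and are reported as None if absent.
    """
    try:
        core = importlib.import_module("sageattention.core")
    except Exception as exc:  # package missing or its extensions failed to load
        return {"import_error": repr(exc)}
    names = ("SM80_ENABLED", "SM89_ENABLED", "SM90_ENABLED")
    return {name: getattr(core, name, None) for name in names}


if __name__ == "__main__":
    try:
        import torch

        if torch.cuda.is_available():
            print("GPU arch:", arch_name(*torch.cuda.get_device_capability(0)))
        else:
            print("CUDA is not available to this PyTorch build")
    except ImportError:
        print("PyTorch is not installed in this environment")
    print("sageattention flags:", sage_kernel_flags())
```

Running the same script in the working standalone Forge Neo venv and in the ComfyUI venv inside SM would show whether the three environments have different SageAttention builds.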

### Console output

```
Python 3.11.13 (main, Jul 23 2025, 00:29:09) [MSC v.1944 64 bit (AMD64)]
Version: neo
Installing triton
Installing sageattention
Installing gradio
Installing requirements
Installing Legacy Preprocessor Requirement: handrefinerportable
Installing Legacy Preprocessor Requirement: depth_anything
Installing Legacy Preprocessor Requirement: depth_anything_v2
Launching Web UI with arguments: --sage --pin-shared-memory --cuda-malloc --cuda-stream --skip-python-version-check --gradio-allowed-path 'D:\AI\Data\Images'
Using cudaMallocAsync backend.
Total VRAM 16303 MB, total RAM 97849 MB
pytorch version: 2.8.0+cu128
Set vram state to: NORMAL_VRAM
Always pin shared GPU memory
Device: cuda:0 NVIDIA GeForce RTX 5070 Ti : cudaMallocAsync
VAE dtype preferences: [torch.bfloat16, torch.float32] -> torch.bfloat16
CUDA Using Stream: True
Using SageAttention 2
Using PyTorch Attention for VAE
=======================================================================================
You are running torch 2.8.0+cu128, which is really outdated.
            To install the latest version, run with commandline flag --reinstall-torch.
=======================================================================================

Use --skip-version-check commandline argument to disable the version check(s).

ControlNet preprocessor location: D:\AI\Data\Packages\forge-neo\models\ControlNetPreprocessor
[ControlNet] - INFO - ControlNet UI callback registered.
You do not have any model!
Model selected: {'checkpoint_info': None, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 58.3s (prepare environment: 37.6s, launcher: 0.6s, forge init: 11.6s, shared init: 0.4s, misc. imports: 4.7s, load scripts: 1.8s, create ui: 1.1s, gradio launch: 0.5s).
Environment vars changed: {'stream': False, 'inference_memory': 1619.2, 'pin_shared_memory': False}

Model selected: {'checkpoint_info': {'filename': 'D:\\AI\\Data\\Packages\\forge-neo\\models\\Stable-diffusion\\sd\\bananaSplitzXL_vee9PointOh.safetensors', 'hash': 'bab4cf56'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'D:\\AI\\Data\\Packages\\forge-neo\\models\\Stable-diffusion\\sd\\bananaSplitzXL_vee9PointOh.safetensors', 'hash': 'bab4cf56'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Loading Model: {'checkpoint_info': {'filename': 'D:\\AI\\Data\\Packages\\forge-neo\\models\\Stable-diffusion\\sd\\bananaSplitzXL_vee9PointOh.safetensors', 'hash': 'bab4cf56'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
[Unload] Trying to free all memory for cpu with 0 models keep loaded ... Done.
StateDict Keys: {'unet': 1680, 'vae': 248, 'text_encoder': 197, 'text_encoder_2': 518, 'ignore': 0}
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
K-Model Created: {'storage_dtype': torch.float16, 'computation_dtype': torch.float16}
Calculating sha256 for D:\AI\Data\Packages\forge-neo\models\Stable-diffusion\sd\bananaSplitzXL_vee9PointOh.safetensors: ad4793e2ef23f9d50ff64b5b18c79bb0e003026301715d1cdc1f2524b78bea6a
Model loaded in 4.9s (unload existing model: 0.2s, forge model load: 0.9s, calculate hash: 3.9s).
[Unload] Trying to free 4541.61 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 14997.00 MB, Model Require: 1752.68 MB, Previously Loaded: 0.00 MB, Inference Require: 2438.40 MB, Remaining: 10805.92 MB, Moving model(s) has taken 0.41 seconds
[Unload] Trying to free 2438.40 MB for cuda:0 with 1 models keep loaded ... Done.
[Unload] Trying to free 10470.39 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: KModel, Free GPU: 13177.42 MB, Model Require: 4897.06 MB, Previously Loaded: 0.00 MB, Inference Require: 2438.40 MB, Remaining: 5841.96 MB, Moving model(s) has taken 1.68 seconds
  0%|          | 0/24 [00:00<?, ?it/s]attention_sage: AssertionError
Traceback (most recent call last):
  File "D:\AI\Data\Packages\forge-neo\backend\attention.py", line 377, in attention_sage
    out = sageattn(q, k, v, attn_mask=mask, is_causal=False, tensor_layout=tensor_layout)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\Data\Packages\forge-neo\venv\Lib\site-packages\sageattention\core.py", line 159, in sageattn
    return sageattn_qk_int8_pv_fp8_cuda(q, k, v, tensor_layout=tensor_layout, is_causal=is_causal, qk_quant_gran="per_warp", sm_scale=sm_scale, return_lse=return_lse, pv_accum_dtype=pv_accum_dtype)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\AI\Data\Packages\forge-neo\venv\Lib\site-packages\sageattention\core.py", line 705, in sageattn_qk_int8_pv_fp8_cuda
    assert SM89_ENABLED, "SM89 kernel is not available. Make sure you GPUs with compute capability 8.9."
           ^^^^^^^^^^^^
AssertionError: SM89 kernel is not available. Make sure you GPUs with compute capability 8.9.

100%|██████████| 24/24 [00:05<00:00,  4.37it/s]
[Unload] Trying to free 5194.33 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 8263.99 MB, Model Require: 159.56 MB, Previously Loaded: 0.00 MB, Inference Require: 2438.40 MB, Remaining: 5666.03 MB, Moving model(s) has taken 0.02 seconds
Total progress: 100%|██████████| 24/24 [00:05<00:00,  4.35it/s]

```

### Version

2.15.5

### What Operating System are you using?

Windows
