
Conversation

@Panchovix
Collaborator

Just updated some requirements to match what was done in reForge. Quickly tested some XL models and they seem to work so far.

Testers are welcome.

@Panchovix Panchovix requested a review from lllyasviel as a code owner July 27, 2025 22:27
@Panchovix Panchovix requested review from catboxanon and removed request for lllyasviel July 27, 2025 22:34
@MisterChief95

reForge already has the fix, so it can probably be copied into this PR. However, bumping the Pillow version will cause the XYZ Plot script to fail, since the `multiline_textsize` function was deprecated (and removed in Pillow 10.0).
https://github.com/Panchovix/stable-diffusion-webui-reForge/blob/20ddc5f80a7bb2c336f55f4b0ddcb2125495f7d7/modules/images.py#L169
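For reference, a minimal sketch of the migration. Pillow deprecated `multiline_textsize` in 9.2.0 and removed it in 10.0.0; the replacement, `multiline_textbbox`, returns a bounding box rather than a size, so callers that only need the size can convert with a small helper (the helper name below is illustrative, not from the actual fix):

```python
def bbox_to_size(bbox):
    """Convert a Pillow (left, top, right, bottom) bounding box to (width, height)."""
    left, top, right, bottom = bbox
    return (right - left, bottom - top)

# With Pillow >= 9.2, the old call
#     w, h = draw.multiline_textsize(text, font=font)
# becomes
#     w, h = bbox_to_size(draw.multiline_textbbox((0, 0), text, font=font))
```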

@Panchovix
Collaborator Author

Panchovix commented Jul 31, 2025

@MisterChief95 nice catch, I didn't remember making that commit 1+ year ago lol.
Made the change now.

@lcretan

lcretan commented Nov 17, 2025

In addition to supporting Python 3.12:

**To squeeze the most processing power out of a single PC, there are routes at several layers, at least the following:**

  1. Beyond Python 3.12, the free-threaded (no-GIL) Python builds: 3.13t, 3.14t, the pre-GA 3.15t, ...?

  2. Python's distributed data-parallel libraries, at different granularities, such as Ray and Dask, on a single machine with heterogeneous xPUs (dGPU, iGPU, NPU, ...)?

  3. PyTorch's DDP (Distributed Data Parallel) and FSDP2 (Fully Sharded Data Parallel), via torch.multiprocessing, torch.distributed, Monarch, etc.?

  4. GPU driver stacks such as CUDA 12.9 and 13.0 do not fully exploit recent NVIDIA chips; utilization can be as low as 30% in the worst case.

  • For NVIDIA chips, Triton, the Python-like kernel language developed by OpenAI, could drive them much faster for the specialized processing inside Forge and other SD-based projects. Triton also has an autonomous optimizer, which means less effort and less code, similar to the Python libraries and the PyTorch DDP/FSDP2 functionality above.

  • Similarly, Intel maintains a specially tuned fork of PyTorch for its XPUs, which upstream PyTorch is merging.

  • AMD provides ROCm among others, and third parties are developing more. Currently, Forge ships a generic library, not one optimized for AMD chips.

**As Intel's answer, the oneAPI libraries (oneMKL, oneDNN, OpenVINO, etc.) offer APIs matching PyTorch's but are written in C++ for higher performance. They can combine xPUs not only from Intel but also from AMD and NVIDIA, working at the same time.**
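To make the heterogeneous-xPU idea concrete, a toy sketch of priority-based backend selection. The backend names and the priority order here are hypothetical illustrations, not Forge's actual dispatch logic; in practice `available` would come from runtime probes such as `torch.cuda.is_available()`:

```python
def pick_device(available, priority=("cuda", "xpu", "rocm", "cpu")):
    """Return the first preferred backend present in `available`.

    Illustrative only: a real dispatcher would also weigh memory,
    compute capability, and per-op support on each device.
    """
    for name in priority:
        if name in available:
            return name
    return "cpu"  # safe fallback when no accelerator is detected
```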

Are there any plans to deploy additional optimizers?
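On point 1 above: on the free-threaded builds (3.13t and later), plain threads can execute Python bytecode in parallel, and `sys._is_gil_enabled()` (added in CPython 3.13) reports whether the GIL is active. A minimal, version-tolerant sketch:

```python
import sys
from concurrent.futures import ThreadPoolExecutor

def gil_enabled():
    """True if the GIL is active; assumes it is on builds predating 3.13's probe."""
    probe = getattr(sys, "_is_gil_enabled", None)
    return True if probe is None else probe()

def parallel_sum(chunks):
    """Sum each chunk in a worker thread.

    CPU-bound work like this only scales across cores on -t builds;
    on GIL builds the threads serialize but the result is identical.
    """
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(sum, chunks))
```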
