
Releases: leomaxwell973/Triton-3.3.0-UPDATE_FROM_3.2.0_and_FIXED-Windows-Nvidia-Prebuilt

SageAttention v2.1.1

18 May 02:07
0b819a7


This SageAttention build was compiled using this repo's Windows port of Triton.


Two separate packages: one for Python 3.10 and one for Python 3.12.

Both are built for sm_86 (RTX 3060).
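To see which of the two packages fits a given machine, it's enough to compare the interpreter version and the GPU's compute capability against what these builds target. This is a minimal sketch, assuming PyTorch is installed for the GPU query; `check_env` is a hypothetical helper, not part of either package. A GPU mismatch only means the sm_86 build may not suit that card, not that it definitely won't run.

```python
import sys

# Targets stated in these release notes: two packages (Py 3.10, Py 3.12), both sm_86.
SUPPORTED_PY = {(3, 10), (3, 12)}
TARGET_SM = (8, 6)  # sm_86, e.g. RTX 3060


def check_env(py=None, capability=None):
    """Return a list of problems; an empty list means the environment matches.

    `py` and `capability` can be passed explicitly; otherwise they are read
    from the running interpreter and (if PyTorch is available) the first GPU.
    """
    problems = []
    py = tuple(py or sys.version_info[:2])
    if py not in SUPPORTED_PY:
        problems.append(f"Python {py[0]}.{py[1]} has no matching package")

    if capability is None:
        try:
            import torch  # optional: only used to query the GPU
            if torch.cuda.is_available():
                capability = torch.cuda.get_device_capability()
        except ImportError:
            pass

    if capability is None:
        problems.append("could not determine the GPU compute capability")
    elif tuple(capability) != TARGET_SM:
        major, minor = capability
        problems.append(f"GPU is sm_{major}{minor}, but these packages target sm_86")
    return problems


if __name__ == "__main__":
    print(check_env() or "environment matches one of the packages")
```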

3.3.0 cu128 Py312

14 May 09:26
0b91497


I've been using ComfyUI more lately, and I usually roll with whatever Python version a platform gives me, which was usually 3.10.6, until ComfyUI.
So I dusted off the build, fired up Python 3.12.10, and resolved the weird little intricate differences.

Build - Updated to Py312!
Compile - Migrated to my Py312 environment, fully isolated and segregated from my Python 3.10 environment.
Triton - After much debugging, only one line of code was changed in the post-Python code (the static/Python lib, as Triton calls it) to achieve 3.12 compatibility. Some issue with 3.12 (security-related?) was blindsiding Python's own libraries; anyway, I just changed runtime.build and added a LIBPATH so the location is checked explicitly each time.
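The notes don't show the actual one-line change to runtime.build. As a hypothetical illustration only, assuming the problem was the MSVC linker not finding the interpreter's import library (e.g. `python312.lib`), this sketch locates candidate library directories and turns them into `/LIBPATH:` linker arguments. It uses `sys.base_prefix` rather than `sys.prefix` because a venv does not carry its own `libs` folder. The helper names are invented for this example.

```python
import sys
import sysconfig
from pathlib import Path


def import_lib_name(version=None):
    """Name of CPython's MSVC import library, e.g. python312.lib."""
    major, minor = version or sys.version_info[:2]
    return f"python{major}{minor}.lib"


def python_lib_dirs():
    """Existing directories that may hold the interpreter's libraries."""
    candidates = [
        Path(sys.base_prefix) / "libs",      # where python3XX.lib normally lives on Windows
        sysconfig.get_config_var("LIBDIR"),  # populated on POSIX builds, usually None on Windows
    ]
    found = []
    for c in candidates:
        if c and Path(c).is_dir() and str(c) not in found:
            found.append(str(c))
    return found


def libpath_flags(dirs):
    """Turn directories into MSVC linker /LIBPATH: arguments."""
    return [f"/LIBPATH:{d}" for d in dirs]


if __name__ == "__main__":
    print(import_lib_name(), libpath_flags(python_lib_dirs()))
```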

Conclusion - It should be just as good as the 3.10 build, but I'm not testing it hot off the compiler like before.
In theory it's a one-line code change,
and the same tests then passed, just afterward rather than straight out of the box.

Repackaged, stripped out the debugging files again, and posted.

Note: if you use both 3.12 and 3.10 (for whatever reason), or switch from one to the other, and returning to 3.10 causes any kind of
DLL/lib/cannot compile/failed to make driver/pyd/kernel etc. issues,
then navigate to your Triton cache folder (C:\Users\<username>\.triton\cache),
delete all the randomly named files in there, and try again. Triton has no mechanism for checking whether cached files were produced by a different Triton install.
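Clearing the cache can be scripted. This sketch assumes Triton's default cache location under your home folder (`.triton\cache`) unless the `TRITON_CACHE_DIR` environment variable overrides it. `clear_triton_cache` is a hypothetical helper; close any Python processes that are using Triton before running it.

```python
import os
import shutil
from pathlib import Path


def triton_cache_dir():
    """Default Triton cache location, honoring TRITON_CACHE_DIR if it is set."""
    override = os.environ.get("TRITON_CACHE_DIR")
    if override:
        return Path(override)
    return Path.home() / ".triton" / "cache"


def clear_triton_cache(cache_dir=None):
    """Delete every entry inside the cache directory; return how many were removed."""
    cache = Path(cache_dir) if cache_dir else triton_cache_dir()
    if not cache.is_dir():
        return 0
    removed = 0
    for entry in cache.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # randomly named per-kernel folders
        else:
            entry.unlink()
        removed += 1
    return removed
```

Calling `clear_triton_cache()` with no argument empties the default location; Triton rebuilds what it needs on the next run.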


3.3.0 cu128 Py310 build all fixed

11 May 06:40
6cf26de


Fixed everything and updated to 3.3.0. I'm a bit busy, but it was tested to the standard that all tests must launch successfully upon build, no exceptions, and it passed. I'm not sure how I could be more strict on testing.
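The "every test must launch successfully" rule is easy to automate. This is a sketch of such a runner, not the author's own script: it runs each test file with the current interpreter and collects the exit codes. The script names in the usage line are the ones mentioned for the 3.2.0 build further down.

```python
import subprocess
import sys
from pathlib import Path


def run_sanity_tests(scripts, timeout=600):
    """Run each script with the current interpreter; return {file name: exit code}."""
    results = {}
    for script in scripts:
        proc = subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        results[Path(script).name] = proc.returncode
        if proc.returncode == 0:
            print(f"[ OK ] {script}")
        else:
            # Show the tail of stderr so the failing step is easy to spot.
            print(f"[FAIL] {script}\n{proc.stderr[-2000:]}")
    return results
```

Usage: `run_sanity_tests(["test_triton.py", "triton_test.py", "runtest.py"])`; the build counts as passing only if every value is 0.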

Triton works and fully imports, with fully converted code and custom Windows workarounds.
Proton works.
The wheel should behave exactly as in my tests. My method was: install in dev mode (-e), uninstall, build the wheel from the build files, then install the wheel. So the dev/raw install is not what was tested or distributed; the wheel distribution itself was.
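The dev install → uninstall → build wheel → install wheel sequence can be written down as explicit commands. This sketch assumes `pip wheel` as the build front end and a local source checkout path you supply; the author may have used different tooling or flags, and the helper names are hypothetical. It defaults to a dry run that only prints each command.

```python
import subprocess
import sys
from pathlib import Path

PIP = [sys.executable, "-m", "pip"]


def build_steps(src_dir, dist_dir="dist", package="triton"):
    """Editable install -> uninstall -> build a wheel from the same sources."""
    return [
        PIP + ["install", "-e", str(src_dir)],                            # dev install, used for testing
        PIP + ["uninstall", "-y", package],                               # remove the dev install
        PIP + ["wheel", str(src_dir), "--no-deps", "-w", str(dist_dir)],  # build the wheel
    ]


def install_step(dist_dir="dist", package="triton"):
    """Command that installs the newest wheel for `package` found in dist_dir."""
    prefix = package.lower() + "-"
    wheels = [p for p in Path(dist_dir).glob("*.whl") if p.name.lower().startswith(prefix)]
    if not wheels:
        raise FileNotFoundError(f"no {package} wheel in {dist_dir}")
    newest = max(wheels, key=lambda p: p.stat().st_mtime)
    return PIP + ["install", str(newest)]


def run(cmd, dry_run=True):
    """Print the command; execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
```

The point of the final step is that the tests then run against the installed wheel, which is the artifact users actually receive.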

Per complaints, this build is also ultra-optimized.
See the video and pause it if you want more details; I'm fairly sure I put a flag tracker somewhere in the scripts.

For details on the install parameters' optimizations, proof of tests passing, etc.:
https://youtu.be/8rwJ1vzczEQ

I didn't do the same cleanup of the debug files as last time; if they bug you, delete them.
If you don't want the utility files or extras, delete them.
If you don't want my custom tests, delete them.
If you recognize anything as an unnecessary component and it bothers you, delete it.

The only thing that would concern me is the inclusion or exclusion of things that actually affect operational capabilities, aside from the loss of the AMD side, which isn't really a loss within the scope of this package.

All the custom tests you may find in the package are quick sanity checks for your use (or not). They are straightforward: run them with python. The only slightly tricky one is proton_smoke: run it with python, then run proton-viewer on proton.hatchet or proton_test.hatchet, whichever hatchet file shows up. Viewing requires an additional install, and the error message will tell you which one (it's NOT "hatchet", it has some prefix; just read the error if you're interested in testing Proton). This is only for reading the test results, not for Proton's functionality itself.
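Since the output file name varies, a small helper can pick up whichever `.hatchet` file the smoke test left behind and build the viewer command. This is a sketch: the two file names come from the notes above, passing only the file to `proton-viewer` follows the notes' own usage, and the helper names are hypothetical.

```python
import shutil
from pathlib import Path

KNOWN_NAMES = ("proton.hatchet", "proton_test.hatchet")  # names mentioned in the notes


def find_hatchet(directory="."):
    """Newest known .hatchet file in `directory`, else any *.hatchet, else None."""
    d = Path(directory)
    candidates = [d / n for n in KNOWN_NAMES if (d / n).is_file()]
    if not candidates:
        candidates = list(d.glob("*.hatchet"))
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.stat().st_mtime)


def viewer_command(hatchet_path):
    """Command line for proton-viewer; falls back to the bare name if not on PATH."""
    exe = shutil.which("proton-viewer") or "proton-viewer"
    return [exe, str(hatchet_path)]
```

After `python proton_smoke.py`, something like `subprocess.run(viewer_command(find_hatchet()))` opens the result, provided the viewer's extra dependency is installed.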

There are also fallback measures installed in case some .so shenanigans happen: native dlopen/dl_load, dlsym and dlclose functionality is built in, just in case. If you encounter any issues, delete the compat .py file in the package root. I don't see why it should cause problems, except maybe conflicts with your own Windows dl solutions; none with native ones that I can foresee.
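For readers wondering what dlopen/dlsym/dlclose look like on Windows: Python's `ctypes` already maps them onto the native loader (LoadLibrary/GetProcAddress/FreeLibrary on Windows, dlopen/dlsym/dlclose elsewhere). This is a hypothetical illustration of that mapping, not the compat module shipped in the wheel. Note that `_ctypes.FreeLibrary` and `_ctypes.dlclose` are private CPython helpers.

```python
import ctypes
import sys


def dl_open(path):
    """dlopen()/LoadLibrary analogue: ctypes picks the right OS call."""
    return ctypes.CDLL(path)


def dl_sym(lib, name):
    """dlsym()/GetProcAddress analogue; returns None instead of raising."""
    try:
        return getattr(lib, name)
    except AttributeError:
        return None


def dl_close(lib):
    """dlclose()/FreeLibrary analogue via CPython's private _ctypes helpers."""
    import _ctypes
    if sys.platform == "win32":
        _ctypes.FreeLibrary(lib._handle)
    else:
        _ctypes.dlclose(lib._handle)
```

After `dl_close`, the returned function pointers must not be called again, exactly as with the C APIs.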

It also has backwards compatibility with the matmul ops of bygone Triton versions, so dated programs that relied on triton.ops have a dirty fix. This matches the lighter, inferior Windows fork of Triton (which, last time I checked, has no LLVM capabilities or GPU optimization hooks), since that fork still uses ops in 3.2.0 and possibly later for some reason... so there's less reason to use theirs now.
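On the calling side, a program that may run against Triton builds with or without `triton.ops` can resolve its matmul once at startup. This is a sketch under the assumption that the shim exposes `matmul` under `triton.ops` as older Triton releases did; `resolve_matmul` and `naive_matmul` are hypothetical names.

```python
def resolve_matmul(fallback):
    """Return triton.ops.matmul when importable, otherwise `fallback`."""
    try:
        # Legacy location used by older Triton releases (and, per these notes,
        # re-provided by this build's compatibility shim).
        from triton.ops import matmul
    except ImportError:
        return fallback
    return matmul


def naive_matmul(a, b):
    """Pure-Python fallback for lists of lists, for illustration only."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]
```

In real code the fallback would normally be something like `torch.matmul`, e.g. `matmul = resolve_matmul(torch.matmul)`.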

I've been done with this for a few weeks now; I was just waiting for some things to install and found a minute to drop this and spew the details. I'll try to keep an eye out, but I'm behind on Atlas, SD, LTX, and other AI-related projects this took time from, and I'm having to catch up with the current toolsets and, sadly, tool deprecations. So sorry for any missing or delayed responses to questions, support requests, etc.

P.S.
Actually, I lied: I did get around to removing the debug files at some point. So it's as tiny as it can be (still closer to 200 MB in wheel form; looking at you, Windows branch; 100 MB unpacked, IIRC).

Triton-3.2.0-cp310-cp310-win_amd64.whl (likely broken)

04 Apr 09:11


UNLESS YOU NEEEEED 3.2.0 for SOME reason, don't use this; there are probably still broken bits. The only build I can say with confidence is working 100% is 3.3.0, and the original build is definitely broken. I'm just leaving it here in case it saves someone half the work of getting 3.2 running on Windows (and not the Windows-branch version, which is LLVM-less), like I said.

  • There were issues with how the post-compile code ran, as well as some overlooked hard-coded variables and paths, that needed to be patched.

  • As of this version and my testing, there is no longer a need to modify torch for the AttrsDescriptor issue(s).

    • This was tested with a fresh, unmodified install of Torch alongside the new version.
  • Previous issues such as libcuda.so.1 not found or failed to open should be resolved for the most part

    • Exception: Proton. proton/libproton/proton.dll has an overlooked hard-coded path that looks for libcuda.so.1. This is fixed by the following
      (Administrator CMD prompt):
      MKLINK C:\Windows\System32\libcuda.so.1 C:\Windows\System32\nvcuda.dll
      This seems to be necessary only when the Proton/profiling routines are used. I'm not 100% sure how necessary it is, but test_triton.py, triton_test.py and runtest.py all run successfully with "python <script.py>", while test_triton.py and triton_test.py will fail if run via proton, as described above (runtest.py probably isn't applicable). Point being: Triton will run without this symlink, Proton will not, so just make the symlink to restore full functionality. This may be fixed if I recompile in the future and find where this hard-coded path ended up.
  • New Tests:
    Included are the test files I used to work out these bugs, in _C and the root folder: triton_test.py, test_triton.py and runtest.py. You can use them as a quick check that Triton is operational on Windows. The output should be straightforward, with no errors (runtest.py just outputs a time score in ms).
    Run these tests with either "python <test.py name/path>" or, if you have applied the symlink fix above AND have the proton files in your Python Scripts folder (or elsewhere on your PATH), "proton <test.py name/path>".

  • Included proton.exe and proton-viewer.exe scripts:
    I realized that without running the compile-from-source routine, these would be missing from python/Scripts. If you want Proton / full functionality, add them to the Scripts folder of your Python installation.
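Both Proton prerequisites above (the libcuda.so.1 link and the two launchers in Scripts) can be checked from Python. This is a sketch with hypothetical helper names: `os.symlink` creates the same kind of file symlink as the MKLINK command and, like it, needs an elevated prompt (or Windows Developer Mode); `sysconfig.get_path("scripts")` is assumed to be where the launchers belong.

```python
import os
import sysconfig
from pathlib import Path


def check_libcuda_link(system_dir=r"C:\Windows\System32", create=False):
    """Check (and optionally create) the libcuda.so.1 -> nvcuda.dll symlink."""
    system_dir = Path(system_dir)
    target = system_dir / "nvcuda.dll"
    link = system_dir / "libcuda.so.1"
    if link.exists():  # follows the link, so a dangling link is not "present"
        return "present"
    if link.is_symlink():
        return "dangling link; delete it and recreate"
    if not target.exists():
        return "nvcuda.dll not found (is the NVIDIA driver installed?)"
    if not create:
        return "missing"
    # Same effect as: MKLINK <link> <target>
    os.symlink(target, link)
    return "created"


def missing_proton_scripts(scripts_dir=None, names=("proton.exe", "proton-viewer.exe")):
    """List the Proton launchers absent from the interpreter's Scripts folder."""
    scripts_dir = Path(scripts_dir or sysconfig.get_path("scripts"))
    return [n for n in names if not (scripts_dir / n).is_file()]
```

Plain Triton use needs neither; only the `proton`/`proton-viewer` workflow does.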

🔗 Install the downloaded wheel with:

pip install .\Triton-3.2.0-cp310-cp310-win_amd64.whl

Or install directly from the release URL with pip:

pip install https://github.com/leomaxwell973/Triton-3.2.0-Windows-Nvidia-Prebuilt/releases/download/3.2.0_Build_2/Triton-3.2.0-cp310-cp310-win_amd64.whl
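The `cp310-cp310-win_amd64` part of the file name means the wheel only installs on 64-bit Windows CPython 3.10; pip rejects it elsewhere as not supported on this platform. This sketch performs the same comparison up front. It is simplified (it ignores compressed tag sets like `py3-none-any`), so pip's own check remains authoritative; the helper names are hypothetical.

```python
import sys
import sysconfig


def wheel_tags(filename):
    """(python tag, abi tag, platform tag) from a wheel filename."""
    stem = filename[:-len(".whl")] if filename.endswith(".whl") else filename
    # Layout: name-version[-build]-pythontag-abitag-platformtag
    return tuple(stem.split("-")[-3:])


def matches_interpreter(filename, py=None, platform=None):
    """True if the wheel's CPython and platform tags match this interpreter."""
    py_tag, _abi, plat_tag = wheel_tags(filename)
    major, minor = py or sys.version_info[:2]
    here_plat = (platform or sysconfig.get_platform()).replace("-", "_").replace(".", "_")
    return py_tag == f"cp{major}{minor}" and plat_tag == here_plat
```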

Triton-3.2.0-cp310-cp310-win_amd64.whl (broken; use 3.3.0 unless you need 3.2.0 and can fix it yourself)

18 Mar 14:02


UNLESS YOU NEEEEED 3.2.0 for SOME reason, don't use this; there are probably still broken bits. The only build I can say with confidence is working 100% is 3.3.0, and the original build is definitely broken. I'm just leaving it here in case it saves someone half the work of getting 3.2 running on Windows (and not the Windows-branch version, which is LLVM-less), like I said.

🔥 Triton-3.2.0 Windows-NVIDIA Release Package

📦 Triton-3.2.0-cp310-cp310-win_amd64.whl

This is the prebuilt Windows-NVIDIA exclusive release of Triton 3.2.0, optimized for native Windows MSVC with no AMD dependencies.

Fully compatible with PIP for easy installation.
NVIDIA-only (No AMD HIP, No POSIX overhead, No Linux workarounds).
Prebuilt and optimized for Windows 11, MSVC, CUDA 12.1, and Python 3.10.6.

📜 For full details, see the README on GitHub.

🔗 Install the downloaded wheel with:

pip install .\Triton-3.2.0-cp310-cp310-win_amd64.whl

Or install directly from the release URL with pip:

pip install https://github.com/leomaxwell973/Triton-3.2.0-Windows-Nvidia-Prebuilt/releases/latest/download/Triton-3.2.0-cp310-cp310-win_amd64.whl

🔥 The cleanest, fastest Triton for Windows + NVIDIA! 🚀