0.5.0: add support for most of Triton's features; large rewrite/refactoring #383
Open
Arech8 wants to merge 1 commit into jax-ml:main from
0.5.0
Breaking changes
- `float` now follows the upstream convention and is represented as `fp32` instead of the old `fp64`.
- The `zeroed_outputs=` parameter of `triton_call()` no longer supports zeroing of aliased input-output arguments.
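A Python `float` is an IEEE-754 double, so a scalar such as `0.1` passed as a `float` kernel argument now reaches the kernel rounded to single precision. Below is a minimal host-side illustration. It uses only the standard library's `struct` module to emulate that rounding and does not call jax-triton. `as_fp32` is a hypothetical helper.

```python
import struct


def as_fp32(x: float) -> float:
    """Round a Python float (fp64) to the nearest IEEE-754 single-precision value."""
    # "<f" packs to a 4-byte little-endian fp32; unpacking widens it back to a Python float.
    return struct.unpack("<f", struct.pack("<f", x))[0]


x = 0.1
x32 = as_fp32(x)
print(x == x32)       # False: fp32 keeps fewer significand bits than fp64
print(abs(x - x32))   # the rounding error now introduced for a `float` argument

# 2**24 + 1 is exact in fp64, but fp32 has a 24-bit significand, so it rounds to 2**24.
print(as_fp32(16_777_217.0))  # 16777216.0
```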
New features / bugfixing
- … similarly to the upstream (via a single `kwargs` dictionary).
- … `@jt.kernel` decorator and a concise Triton-native form of launching a kernel with `kernel[grid](*args, **kwargs)` syntax.
- … launcher when `out_names=` is set or if a new dictionary form of `out_shape=` is used.
- … to the upstream Triton code, which enables full support for default values, kernel parameter annotations, related `@triton.jit()` arguments such as `do_not_specialize`, and also using tuples (including deeply nested ones), callables or strings as kernel arguments.
- `out_shape`, `input_output_aliases` and `zeroed_outputs` handling is fully reworked to support nested tuples and is now based on a kernel signature coordinate system instead of flat array indices, leading to a much clearer launcher syntax.
- `input_output_aliases=` is deprecated, but is still fully supported.
- The `CAN_USE_TRITON` guard is dropped as obsolete.

Performance
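The kernel coordinate space used by the reworked `out_shape`/`input_output_aliases` handling can be pictured with a small pure-Python sketch. It assumes launch arguments are bound to the kernel's Python signature with `inspect.Signature.bind()` plus `apply_defaults()`, which is one straightforward way to get keyword arguments and default values. Each leaf is then addressed by its parameter name followed by its path into nested tuples, e.g. `("pair", 1)`, rather than by a flat position. `Kernel`, `toy_kernel` and `leaf_coordinates` are hypothetical names. This is not jax-triton's implementation or API.

```python
from __future__ import annotations

import inspect
from typing import Any, Callable


def leaf_coordinates(value: Any, path: tuple = ()) -> list[tuple[tuple, Any]]:
    """Return (path, leaf) pairs, recursing into (possibly nested) tuples."""
    if isinstance(value, tuple):
        pairs = []
        for i, item in enumerate(value):
            pairs.extend(leaf_coordinates(item, path + (i,)))
        return pairs
    return [(path, value)]


class Kernel:
    """Hypothetical stand-in for a decorated kernel (NOT the jax-triton implementation)."""

    def __init__(self, fn: Callable):
        self.fn = fn
        self.signature = inspect.signature(fn)

    def __getitem__(self, grid):
        # `kernel[grid]` returns a launcher, so `kernel[grid](*args, **kwargs)` works.
        def launch(*args, **kwargs):
            bound = self.signature.bind(*args, **kwargs)  # raises TypeError on bad args
            bound.apply_defaults()  # fill in parameters the caller omitted
            coords = []
            for name, value in bound.arguments.items():
                # A coordinate is the parameter name followed by the path into nested tuples.
                for path, leaf in leaf_coordinates(value):
                    coords.append(((name,) + path, leaf))
            return grid, coords

        return launch


def toy_kernel(x_ref, pair, out_ref, block_size=128):  # hypothetical kernel signature
    ...


grid, coords = Kernel(toy_kernel)[(4,)]("x", ("y", "z"), "out")
for coord, leaf in coords:
    print(coord, leaf)
# ('x_ref',) x
# ('pair', 0) y
# ('pair', 1) z
# ('out_ref',) out
# ('block_size',) 128

# The leaf "z" is addressed as ("pair", 1); a flat, leaf-order index would be 2.
flat = [coord for coord, _ in coords]
print(flat.index(("pair", 1)))  # 2
```

In this sketch the bookkeeping is ordinary Python that runs on every call. That is plausibly where a non-jitted launcher pays extra, whereas under `jax.jit` such Python code runs only while tracing.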
Despite the obviously large amount of additional code needed to support the kernel's coordinate space for `triton_call()` parameters, the performance of jitted launchers remains indistinguishable from the previous version. Non-jitted launchers are roughly 25% slower for comparable features. I have a changeset that lowers this to ~10% at most, but it makes the code less simple and straightforward, so I'll publish it after this one is merged. Typical current numbers for launching the following kernel on a scalar or a 4G array are:
To reproduce, put the following preparation script in a sibling directory of the checkout and run it:
The `comparative_bm.py` script is: