Hidet v0.6.0

@yaoyaoding released this 26 May 04:49 · 34 commits to main since this release · commit 20070aa

What's Changed

  • [Dependency] Move pygraphviz to a dev dependency
  • [Fix] Wrap up the release
  • [CI] Use default runner instead of large-runners
  • [Docs] Move release guide
  • [Apps] Phase out the app-level abstraction
  • [Security] Fix the security issue of using tempfile.mktemp() (see notes after this list)
  • [BUG] Simplify symbol variables
  • [CI] Add CI requirements installation
  • [Bugfix] Add signal handler to clean up NCCL sync file
  • [CI] Add permissions.contents: read to all Workflows
  • [Fix] Fix CI failure due to the interface change of mbarrier_try_wait
  • [PERF] Flatten batch dimensions for batched matmul
  • [CI] Fix Nightly Workflow
  • [CI] Use base Docker image for all functional tests
  • [Feature][CuTe] Add mbarrier operators in Hexcute
  • [FEATURE] Add flow graph visualization
  • [Release][Wrapup] Prepare for the release
  • [BUG][CI] Fix the CI failure caused by PyTorch version 2.7.0
  • [Fix] Fix matmul
  • [Bug] Release the reserved memory in hidet for kv cache
  • [Bugfix] Add rank information to flowgraph cache hash key
  • [BUG] Fix memory error triggered while compiling model with cuBLAS
  • [Tests] Add more tests for torch.compile and split op
  • [Bugfix] Change grid dimension to support large batch size
  • [PERF] Enable interval dispatch table by default
  • [FEATURE] fp8_scaled_mm
  • [Bugfix] Fixes and refactors to support Deepseek R1 compilation
  • [Fix] Fix the mma config name for int8 tensor core
  • [Graph Cache] Dump graph visual to cache when needed
  • [HOTFIX] Hot fix for current CI fails
  • [PERF] Speedup broadcast
  • [PERF] Improve Expr simplification
  • [Package] Refactor dependency configuration
  • [Dependencies] Upgrade black to 25.1.0
  • [Refactor] Refactor the property methods of data types
  • [FEATURE] fp8_mm
  • [Dependencies] Remove the restriction on jinja2 version in docs building
  • [Dependencies] Remove gpt2 example with tensorflow dependency
  • [PERF] Graph dispatch table optimization and support for nested shapes
  • [Hopper] Add wgmma instruction in Hexcute
  • [Feature] Support FlowGraph to CompiledGraph cache
  • [Fixbug] Fix a bug in the instantiate_symbols pass
  • [Options] Add options to control two nvcc compilation flags
  • [Enhancement] Add function to gather unsupported ops
  • [Pass] Optimization for addition chain
  • [Perf] Support fused_moe_awq_gptq
  • [Feature] Finalize warp specialization
  • [CI][Fix] Trap heartbeat logging to ensure it exits if build-docs fails
  • [Bugfix] Cast tensor shape to int64 when computing tensor nbytes
  • [Codegen][Runtime] Add try-catch to protect the public function
  • [PERF] Implement identity op
  • [FIX][BUG] Remove assign statement in CuTe code generation
  • [CI][Fix] Trap heartbeat logging to ensure it exits for any failed test
  • [IR][Runtime] Support pointer type symbolic variable
  • [BUG] Fix complex expressions showing up in shapes
  • [Hopper] Add a cost model
  • [CI][Fix] Use the same build-docs fix in all test workflows
  • [BUG] Fix cuBLAS error that occurred when serving the Llama-3.1-8B model
  • [BUG] Fix torch.nn.functional.group_norm implementation
  • [Utils] Add utility function to launch compute-sanitizer
  • [FEATURE] Various Operator Support + Bug Fixes
  • [Feature] Add wgmma fence operand
  • [Hidet Script] Add support for lambda and fix assignment issue
  • [BUG] Fix the flatten tensor index pass
  • [Wheel] Fix a bug when installing the package with pip install .
  • [Fix][Primitives] Fix cp_async_bulk_tensor_s2g
  • [BUG] Fix broken mbarrier CI test
  • [PERF] Add graph rewrite rule: Transpose(B) + Matmul -> MatmulNT (see notes after this list)
  • [FEATURE] Add support for warp specialization context managers
  • [Docs] Update the copyright year
  • [Fix] Fix reduce test failure on H100
  • [CI][PR Title] Allow multiple categories in PR title
  • [CI] Fix test workflow GPU params for push-event GitHub Actions
  • [CI] Use hidet api in benchmarking for Llama MLP layer
  • [CI] Run Tests workflow on L4 and H100 by default; only require L4 success
  • [CI] Add extra logging to build-docs; set lower priority for the make step
  • [PERF] Minimized version of dispatch table with options
  • [Operators] Add support for operator.floordiv (see notes after this list)
  • [Fixbug] Fix a bug in runtime that does not update workspace size
  • [FIX] Fix deploy doc workflows
  • [BUG] Avoid computing a view when calling Tensor.torch for fp8 tensors
  • [BUG] Fix choosing stream in hidet.Event.record()
  • [PROJECT] Create pyproject.toml
  • [FEATURE] Automatically deploy docs to website
  • [FEATURE] Add fp8 wgmma, mma support
  • [HIP] Add HipGraph class and related HIP graph functionalities
  • [BUGFIX] Fix vLLM backend parallel build failure
  • [FEATURE] Add fp8 (e4m3,e5m2) support
  • [AMD] mma for float16 and float32 on AMD GPU (gfx90a/MI200s)
  • [FEATURE] Increase the accuracy of benchmarking of small kernels
  • [HIP] Switch to using HIP Python for HIP runtime wrappers
  • [Bug] Fix the way to lint the Python source code
  • [AMD] Support batch matmul with matrix core
  • [FEATURE] Tensor view operator
  • [CI] Use torch 2.6.0 for Perf Tests
  • [AMD] mfma instructions for gfx90a
  • [FEATURE] In compilation server clean memory after every compilation
  • [FEATURE] Use permanent processes to handle fixed commits in the compilation server
  • [HIP] Support f32 matmul and llama end to end example
  • [Format] Show a progress bar for the formatting process
  • [HIP] Resnet end to end example
  • [COMPTIME] Parallel task build + parallel tuning
  • [CI] Make Linear without bias for Regression
  • [Workflow] Fix bug in PR title checking workflow to allow xxx
  • [CI] Fix attention mask shapes in regression
  • [CI] Update github action version
  • [CI] Split the tests of operators into two folders to speed up CI
  • [PERF] Move parallel_k to the search space of the Hexcute matmul kernel
  • [BUG] Fix for torch==2.6.0. Attempt 2
  • Add parallel_k to the tuning space of the matmul kernel
  • [Enhancement] Support cuBLAS for matmul_nt
  • [BUG] Fix for torch==2.6.0
  • [FEATURE] Postpone import torch
  • [COMPTIME] Prepare for nested parallelization
  • [HIP] Autoscheduler
  • [HIP] Support HIP for Hidet script
  • [FIX] Fix typo: 'compilaion' -> 'compilation'
  • [Enhancement] Support important dynamic patterns in LLMs
  • [HIP] Event and stream support for HIP runtime
  • [Test] Check hidet import time
  • [BUG] Change the broken test cases for scatter_ operators
  • [HIP] Hip runtime - memory and device
  • [CI] Add Linear to Regression
  • [Enhance] Extend the search space of Hexcute matmul and turn it on by default
  • [Workflow] Add a workflow for PR title checking
  • [CI] Modify generative Regression tests
  • [FEATURE] Disable garbage collector during benchmarking (see notes after this list)
  • [OPS] Several changes resulting from debugging torch.compile(Sampler.forward())
  • [Fix] Add scale argument to the sdpa function
  • [CI] Skip test_matmul_bf16_sm90 on non-Hopper GPUs
  • [PERF] Change GPU clock frequency for benchmarking inside hidet
  • Kaihang/matmul bf16 wgmma swizzle
  • [DLPack] Remove workaround for bool type in dlpack
  • [Utils] Enable faulthandler in hidet to print a traceback on segmentation fault
  • [Tests/CI] Update tests and add a temporary AMD CI
  • [RUNTIME] Add missing try-catch guards
  • [vLLM] Use example_inputs to determine shapes
  • Add support for torch.empty_like
  • [Bug] Fix an error in cudnn runtime calls
  • [BUG] Fix a bug caused by parallel compilation with on-demand WGMMA instruction registration
  • [DataType] Add float8_e4m3 data type
  • [Stream] Change the implementation of get_current_stream
  • [REFACT] Refactor parallel compilation/tuning
  • [PERF] Reduce the execution time of import hidet
  • Kaihang/matmul f16 wgmma with swizzle layout
  • Change fast div transform log level to debug
  • [COMPSERVER] Use the same compilation server port for both server and client
  • [CI] Remove batch_matmul tests from Regression
  • [COMPSERVER] Speed up compilation server
  • [CI] Update Docker image for Regression to torch v2.5.1
  • [OPs] Add hardtanh in-place variant
  • [Enhancement] Add option and functionality to set torch stream as the current stream
  • [BUG] Several changes prompted by the release
  • Add a flag for Hexcute kernels
  • Update requirements.txt
  • [CI] Normalize CI runner naming
  • [BUG] Hotfix for compilation server requirements.txt
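
Notes on selected changes

On the tempfile.mktemp() security fix: mktemp() returns a path without creating the file, leaving a window in which another process can create or symlink that path before it is opened. The sketch below shows the standard-library replacement pattern; it illustrates the class of fix, not necessarily the exact change in the PR.

```python
import os
import tempfile

# Unsafe: mktemp() only picks a name; the file is not created, so another
# process can race to create (or symlink) the same path first.
# path = tempfile.mktemp()

# Safe: mkstemp() atomically creates and opens the file (mode 0o600).
fd, path = tempfile.mkstemp(suffix=".bin")
try:
    with os.fdopen(fd, "wb") as f:
        f.write(b"scratch data")
finally:
    os.remove(path)
```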
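
On the Transpose(B) + Matmul -> MatmulNT rewrite: the rule relies on the identity C = A @ B.T being computable by an "NT" matmul that reads B in its original row-major layout, so the transpose never has to be materialized. A minimal NumPy check of that identity (illustrative only; the actual rewrite operates on hidet's FlowGraph):

```python
import numpy as np

A = np.random.rand(64, 128).astype(np.float32)
B = np.random.rand(32, 128).astype(np.float32)

# Pattern before the rewrite: materialize B.T, then a plain matmul.
C_ref = A @ B.T

# What an NT matmul computes while reading B untransposed:
# C[i, j] = sum_k A[i, k] * B[j, k]
C_nt = np.einsum("ik,jk->ij", A, B)

assert np.allclose(C_ref, C_nt, atol=1e-5)
```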
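
On operator.floordiv support: `//` applied to traced tensors shows up as operator.floordiv in the FX graph, which the backend can now lower. A hedged usage sketch, assuming a recent hidet that registers the "hidet" torch.compile backend (as in its documentation) and an available CUDA device:

```python
import torch
import hidet  # assumption: registers the "hidet" dynamo backend on import

def f(x: torch.Tensor) -> torch.Tensor:
    # Tensor `//` is traced as operator.floordiv in the FX graph.
    return (x * 10.0).long() // 3

compiled = torch.compile(f, backend="hidet")
y = compiled(torch.randn(8, device="cuda"))
```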
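
On disabling the garbage collector during benchmarking: a GC pass landing inside the timed region adds noise that can dwarf the runtime of small kernels. A minimal sketch of the general pattern; `bench` here is a hypothetical helper, not hidet's actual harness:

```python
import gc
import time

def bench(fn, warmup=10, repeat=100):
    """Average the runtime of fn() with the cyclic GC paused."""
    for _ in range(warmup):
        fn()
    was_enabled = gc.isenabled()
    gc.disable()  # keep collector pauses out of the timed region
    try:
        start = time.perf_counter()
        for _ in range(repeat):
            fn()
        return (time.perf_counter() - start) / repeat
    finally:
        if was_enabled:
            gc.enable()
```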

Full Changelog: v0.5.0...v0.6.0