Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
19 commits
Select commit Hold shift + click to select a range
9f35295
[CK DSL] Add hipDNN provider implementation plan.
DarylHawkinsAMD May 21, 2026
8e67c87
[CK DSL] Add provider skeleton (M1 step I-1).
DarylHawkinsAMD May 21, 2026
1719e2b
[CK DSL] Embed CPython interpreter in provider plugin (M1 step I-2).
DarylHawkinsAMD May 21, 2026
0aaab25
[CK DSL] Add Python compile-service bridge (M1 step I-3).
DarylHawkinsAMD May 21, 2026
de37e42
[CK DSL] Hold GIL while releasing CompileServiceBridge module ref.
DarylHawkinsAMD May 21, 2026
290cfab
[CK DSL] Commit transient WIP scratch for session handoff.
DarylHawkinsAMD May 21, 2026
8121d36
[CK DSL] Add KernelArtifact/HipModule round-trip (M1 step I-4).
DarylHawkinsAMD May 21, 2026
e856424
[CK DSL] Add in-memory JitCache (M1 step I-5).
DarylHawkinsAMD May 21, 2026
9cc2202
[CK DSL] Add ConvImplicitGemm spec + adapter + payload (M1 step I-6).
DarylHawkinsAMD May 21, 2026
881c5e3
[CK DSL] Add ConvImplicitGemm plan builder + JIT path (M1 step I-7).
DarylHawkinsAMD May 21, 2026
88824fe
[CK DSL] Wire ConvImplicitGemmPlan::execute() (M1 step I-8).
DarylHawkinsAMD May 21, 2026
55b381b
[CK DSL] Add PerfMeasurement hipEvent helper (M1 step I-9).
DarylHawkinsAMD May 21, 2026
ada82f0
[CK DSL] Add IntegrationGpuCkDslConvFp16 (M1 step I-10).
DarylHawkinsAMD May 21, 2026
0b71746
[CK DSL] Wire ck-dsl-provider check targets (M1 step I-11).
DarylHawkinsAMD May 22, 2026
7092194
Fixed PR feedback
DarylHawkinsAMD May 29, 2026
0f21fcb
[CK DSL] Add architecture diagrams to ck-dsl-provider README
DarylHawkinsAMD May 29, 2026
afd4f82
[CK DSL] Parameterize conv integration test across shapes.
DarylHawkinsAMD May 29, 2026
93e28ac
[CK DSL] Provider cleanup: dedup graph signature, drop dead code
DarylHawkinsAMD May 30, 2026
353c954
[CK DSL] Link provider README to design plan
DarylHawkinsAMD May 30, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ set(AVAILABLE_COMPONENTS
miopen-provider
hipblaslt-provider
hip-kernel-provider
ck-dsl-provider
hipdnn-samples
)

Expand Down Expand Up @@ -219,6 +220,14 @@ if("hip-kernel-provider" IN_LIST ROCM_LIBS_ENABLE_COMPONENTS)
)
endif()

if("ck-dsl-provider" IN_LIST ROCM_LIBS_ENABLE_COMPONENTS)
add_subdirectory_with_message(
COMPONENT ck-dsl-provider
PREFIX_PATH dnn-providers
EXPECT_TARGET ck_dsl_provider_plugin
)
endif()

if("hipdnn-samples" IN_LIST ROCM_LIBS_ENABLE_COMPONENTS)
add_subdirectory_with_message(
COMPONENT samples
Expand Down
8 changes: 8 additions & 0 deletions CMakePresets.json
Original file line number Diff line number Diff line change
Expand Up @@ -114,6 +114,14 @@
"ROCM_LIBS_ENABLE_COMPONENTS": "hipdnn;hipdnn-integration-tests;hip-kernel-provider"
}
},
{
"name": "ck-dsl-provider",
"description": "Build hipdnn and ck-dsl-provider",
"inherits": ["default:release"],
"cacheVariables": {
"ROCM_LIBS_ENABLE_COMPONENTS": "hipdnn;ck-dsl-provider"
}
},
{
"name": "hipdnn-samples",
"description": "Build hipdnn, all supported providers, integration tests, and samples",
Expand Down
3 changes: 3 additions & 0 deletions WIP/.gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
# WIP is a scratch area; logs and build trees regenerate on every run.
*.log
*/build/
106 changes: 106 additions & 0 deletions WIP/STATUS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,106 @@
# CK DSL hipDNN Provider — M1 Complete

Branch: `users/dahawkin/ck-dsl-provider`.
Cut from `users/vanantha/ck-dsl-prototype` at `bf7546ed99e`.

## Plan progress (per plan v0.9 §6.2)

| Step | Status |
|---|---|
| Prep P-1 … P-7 | done; synthesis in `WIP/prep_findings/PREP_FINDINGS.md` |
| I-1 skeleton | ✅ `73116d68404` |
| I-2 embedded interpreter | ✅ `e980348fcab` |
| I-3 compile-service bridge | ✅ `2fa22bc9c8e` (+ GIL-dtor fix `0fcec098917`) |
| I-4 KernelArtifact / HipModule | ✅ `a0f8288e395` |
| I-5 JitCache | ✅ `c6c8b480f37` |
| I-6 ConvImplicitGemmAdapter + Spec | ✅ `9f1a92afe58` |
| I-7 PlanBuilder + JIT path | ✅ `a82e0f72c57` |
| I-8 Plan::execute | ✅ `87f45907093` |
| I-9 PerfMeasurement | ✅ `0cbf119bcc7` |
| I-10 Integration test | ✅ `de7bcf4873b` |
| I-11 CI / pre-commit clean | ✅ (this commit) |

## M1 result

End-to-end JIT path works on gfx950 (MI350-series):

- `ninja ck-dsl-provider-unit-check` — green (42 tests, ~10 s wall)
- `ninja ck-dsl-provider-integration-check` — green (1 test, ~9 s wall)
- `IntegrationGpuCkDslConvFp16.BakeOffConv` reports **131 TFLOPS median**
on the bake-off shape (N=8, 56×56×64→64, 3×3, s=1, p=1, FP16, NHWC).
Numerical agreement against `CpuFpReferenceConvolution::fprop`
passes at 5e-2 absolute tolerance over all ~1.6M output elements.
- `pre-commit run` over the full provider tree — clean.

The bake-off example documents 248 TFLOPS on MI300X for the same
configuration (`bake_off_implicit_gemm.py:69-72`). Closing that gap
on MI350 is M2 autotuning work.

## What M1 ships

`dnn-providers/ck-dsl-provider/`:

```
CMakeLists.txt finalize_test_targets("ck-dsl-provider")
cmake/ version + Python-path helpers
python/ck_dsl_provider/
compile_service.py dispatch on op_kind, conv-igemm builder
src/
CkDslContainer.{hpp,cpp} one engine per CK DSL op (M5+ adds siblings)
CkDslHandle.{hpp,cpp} stream + container reference + detached buffers
CkDslContext.hpp plan + settings storage
CkDslPluginPublic.cpp the only C-ABI source (5 macros + .inl include)
adapters/conv_implicit_gemm/
ConvImplicitGemmSpec.hpp pure-C++ mirror of the dataclass (P-5 defaults)
ConvImplicitGemmAdapter.cpp FB ConvolutionFwdAttributes -> Spec
ConvImplicitGemmPayload.cpp Spec -> py::dict (Python boundary)
engines/conv_implicit_gemm/
CkDslConvImplicitGemmEngine IEngine -> plan builder
ConvImplicitGemmPlanBuilder isApplicable + buildPlan via JitCache
ConvImplicitGemmPlan IPlan::execute (uid -> DevPtr -> launch)
graph/
GraphSignature FNV-1a over op_kind + dtypes + shape + DSL SHA
perf/
PerfMeasurement hipEvent warmup/timed; [CkDslPerf] log line
python/
EmbeddedInterpreter singleton libpython init
CompileServiceBridge noopSmoke + compileSmoke + compile(opKind, payload)
PythonError py::error_already_set -> HipdnnPluginException
runtime/
KernelArtifact + ArgSchema P-1's schema-driven HSACO + launch ABI
LaunchAbi contiguous-buffer arg packing
HipModule RAII hipModule_t + hipFunction_t + launch
JitCache mutex-guarded SignatureHash -> shared_ptr<HipModule>
tests/ unit tests (host-only + GPU-gated)
integration_tests/ ninja ck-dsl-provider-integration-check
```

Build artifact: `build/lib/hipdnn_plugins/engines/libck_dsl_provider_plugin.so`.

## Deferred to M1.5 / M2

- **Frontend Graph API + .so plugin loader.** The integration test
drives `ConvImplicitGemmPlanBuilder` directly rather than going
through the hipDNN backend's plugin-loader path. The plan-builder
surface is the same code the backend would call after `dlopen`;
wiring `hipdnnSetEnginePluginPaths_ext` + `hipdnn_frontend::graph::
Graph` is additive on top of what M1 already proves.
- **Autotuning.** The constexpr defaults in `ConvImplicitGemmSpec`
ship the bake-off values verbatim (P-5). Adapter knob surfacing is
M2.
- **Second op.** M2 adds `CkDslGemmEngine` (or similar) as a sibling
engine -- the M1 file layout (`engines/<op>/`, `adapters/<op>/`) was
designed so this is additive.
- **On-disk HSACO cache.** M3 (plan §3.4): `$XDG_CACHE_HOME/
ck-dsl-provider/<hash>.hsaco`, invalidated on
`CK_DSL_PROVIDER_VERSION_STRING` change (the same key the in-memory
cache already uses).

## Hardware constraint to carry forward

`ck_dsl` is gfx950-only (`runtime/comgr.py:210`,
`helpers/compile.py:68,82,129`, `examples/bake_off_implicit_gemm.py:44`
all hardcode `amdgcn-amd-amdhsa--gfx950`; the DSL also emits
MFMA-32×32×16-fp16, `ds_swizzle_b32`, `v_permlane32_swap_b32`,
`ds_read_b64_tr_b{8,16}`, and scaled FP8/BF8 converts unconditionally).
M1 hardware target is **MI350-series**. No fallback to MI300/MI250.
Loading
Loading