@@ -23,6 +23,137 @@ adapter accepts the hipDNN `ConvolutionFwdAttributes` graph shape and
23 23 rejects asymmetric padding, true-convolution mode, non-FP16 dtypes,
24 24 and 3D conv.
25 25
26+ ## Architecture
27+
28+ ### Component view
29+
30+ The plugin is organised into layers. The container is created once per
31+ handle generation and owns the long-lived collaborators (the embedded
32+ interpreter, the Python compile bridge, and the process-wide JIT
33+ cache); the engine, adapter, and plan are per-request.
34+
35+ ``` mermaid
36+ flowchart TB
37+ subgraph host["hipDNN host"]
38+ SDK["hipDNN SDK<br/>(EnginePluginImpl)"]
39+ end
40+
41+ subgraph plugin["ck_dsl_provider_plugin.so"]
42+ direction TB
43+
44+ subgraph entry["Entry / lifetime"]
45+ Container["CkDslContainer<br/>(EngineManager + bridge + cache)"]
46+ Handle["CkDslHandle<br/>(HIP stream)"]
47+ Context["CkDslContext<br/>(holds Plan)"]
48+ end
49+
50+ subgraph engine["Engine layer (per request)"]
51+ Engine["CkDslConvImplicitGemmEngine"]
52+ Builder["ConvImplicitGemmPlanBuilder"]
53+ Plan["ConvImplicitGemmPlan<br/>execute() → launch"]
54+ end
55+
56+ subgraph adapt["Adapter layer"]
57+ Adapter["ConvImplicitGemmAdapter<br/>validate + buildSpec()"]
58+ Spec["ConvImplicitGemmSpec"]
59+ Payload["convImplicitGemmSpecToPayload()"]
60+ Sig["GraphSignature<br/>→ SignatureHash key"]
61+ end
62+
63+ subgraph runtime["Runtime layer"]
64+ Cache["JitCache<br/>key → shared_ptr of HipModule"]
65+ Module["HipModule<br/>(hipModule_t + hipFunction_t)"]
66+ Artifact["KernelArtifact<br/>(HSACO + launch metadata)"]
67+ Abi["LaunchAbi<br/>pack() arg buffer"]
68+ end
69+
70+ subgraph pybound["Python boundary"]
71+ Interp["EmbeddedInterpreter<br/>(isolated CPython)"]
72+ Bridge["CompileServiceBridge<br/>compile(opKind, payload)"]
73+ end
74+ end
75+
76+ subgraph pysrc["Trusted Python source (sys.path)"]
77+ Service["ck_dsl_provider.compile_service"]
78+ DSL["ck_dsl<br/>(build + compile_kernel)"]
79+ end
80+
81+ SDK -->|create| Container
82+ SDK -->|isApplicable / init| Engine
83+ SDK -->|execute| Plan
84+
85+ Container --> Engine
86+ Container --> Bridge
87+ Container --> Cache
88+ Engine --> Builder
89+ Builder --> Adapter
90+ Adapter --> Spec --> Payload
91+ Builder --> Sig
92+ Builder -->|getOrLoad key, loader| Cache
93+ Cache -->|miss| Bridge
94+ Payload -.payload dict.-> Bridge
95+ Bridge --> Interp
96+ Bridge -->|GIL| Service
97+ Service --> DSL
98+ DSL -.HSACO + metadata.-> Bridge
99+ Bridge --> Artifact --> Module
100+ Cache --> Module
101+ Builder --> Plan
102+ Plan --> Module
103+ Plan -.uses.-> Abi
104+ Plan --> Context
105+ ```
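
The ownership split in the component view (long-lived collaborators on the container, per-request objects that only borrow them) can be sketched roughly as follows. This is an illustrative sketch, not the plugin's code: `SketchContainer`, `SketchEngine`, `SketchBridge`, and `SketchCache` are hypothetical stand-ins, and the real `CkDslContainer` may hand out its collaborators differently.

```cpp
#include <memory>

// Hypothetical stand-ins for the long-lived collaborators.
struct SketchBridge {};  // would wrap the Python compile bridge
struct SketchCache {};   // would wrap the JIT cache

// Per-request object: it borrows the collaborators and does not create them.
struct SketchEngine {
    std::shared_ptr<SketchBridge> bridge;
    std::shared_ptr<SketchCache> cache;
};

// Long-lived owner: constructs the collaborators once and shares them
// with every engine it hands out.
class SketchContainer {
public:
    SketchContainer()
        : bridge_(std::make_shared<SketchBridge>()),
          cache_(std::make_shared<SketchCache>()) {}

    // Each request gets a fresh engine backed by the same bridge and cache.
    SketchEngine makeEngine() const { return SketchEngine{bridge_, cache_}; }

private:
    std::shared_ptr<SketchBridge> bridge_;
    std::shared_ptr<SketchCache> cache_;
};
```

Under this assumption, two engines created from the same container see the same cache, so a kernel compiled for one request can be reused by the next.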
106+
107+ ### End-to-end sequence
108+
109+ The compile step (heavy, Python/DSL) runs once per logical shape; the
110+ `JitCache` short-circuits every subsequent request with the same
111+ `GraphSignature` to a cached `HipModule`.
112+
113+ ``` mermaid
114+ sequenceDiagram
115+ autonumber
116+ participant SDK as hipDNN SDK
117+ participant Eng as Engine
118+ participant Bld as PlanBuilder
119+ participant Adp as Adapter
120+ participant Cache as JitCache
121+ participant Br as CompileServiceBridge
122+ participant Py as compile_service.py / ck_dsl
123+ participant Mod as HipModule
124+ participant Plan as ConvImplicitGemmPlan
125+
126+ SDK->>Eng: isApplicable(graph)
127+ Eng->>Bld: isApplicable(graph)
128+ Bld->>Adp: buildSpec(convAttr, tensors)
129+ Adp-->>Bld: Spec (or reject)
130+
131+ SDK->>Eng: initializeExecutionContext(graph)
132+ Eng->>Bld: buildPlan(graph, context)
133+ Bld->>Adp: buildSpec(...)
134+ Bld->>Bld: GraphSignature::computeForSpec → key
135+ Bld->>Cache: getOrLoad(key, loader)
136+
137+ alt cache miss
138+ Cache->>Br: compile(opKind, payload) [GIL]
139+ Br->>Py: compile(op_kind, payload)
140+ Py-->>Br: dict{hsaco, kernel_name, grid, block, arg_schema, ...}
141+ Br-->>Cache: KernelArtifact
142+ Cache->>Mod: HipModule(artifact)<br/>hipModuleLoadData + GetFunction
143+ else cache hit
144+ Cache->>Cache: reuse cached HipModule
145+ end
146+
147+ Cache-->>Bld: shared_ptr of HipModule
148+ Bld->>Plan: new Plan(module, uids, byte sizes)
149+ Bld-->>SDK: plan stored in context
150+
151+ SDK->>Plan: execute(handle, deviceBuffers)
152+ Plan->>Plan: resolve x/w/y pointers, pack 36-byte args
153+ Plan->>Mod: launch(args, grid, block, stream)
154+ Mod-->>SDK: hipModuleLaunchKernel
155+ ```
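
The `getOrLoad(key, loader)` step in the sequence above can be sketched as a map guarded by a mutex, where the loader runs only on a miss. This is a minimal sketch under stated assumptions, not the plugin's implementation: `SketchJitCache` and `FakeModule` are hypothetical names, the key is assumed to be a 64-bit hash, and holding the lock while the loader runs is a simplification that serialises compiles.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for HipModule (the real one wraps hipModule_t).
struct FakeModule {
    std::uint64_t key;
};

// Sketch of a getOrLoad cache: one module per signature key.
class SketchJitCache {
public:
    using Loader = std::function<std::shared_ptr<FakeModule>()>;

    std::shared_ptr<FakeModule> getOrLoad(std::uint64_t key, const Loader& loader) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = modules_.find(key);
        if (it != modules_.end()) {
            return it->second;  // hit: no compile, no module load
        }
        // Miss: in the plugin this is where the compile bridge would be
        // invoked and the resulting artifact loaded into a module.
        std::shared_ptr<FakeModule> module = loader();
        modules_.emplace(key, module);
        return module;
    }

private:
    std::mutex mutex_;
    std::unordered_map<std::uint64_t, std::shared_ptr<FakeModule>> modules_;
};
```

With this shape, a second request carrying the same key receives the same `shared_ptr` and never reaches the loader, which matches the "cache hit" branch of the diagram.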
156+
26 157 ## Trust boundary
27 158
28 159 The Python source tree that this plugin loads from is part of the