Commit 55d37c7

cudnn_frontend v1.0 prerelease introduces a new API aimed at simplifying graph construction.

- [New API] In the FE v1.0 API, users can describe multiple operations that form a subgraph through the cudnn_frontend::graph::Graph object. Unlike the FE v0.x API, users don't need to worry about specifying shapes and sizes of the intermediate virtual tensors. See README.FE.1.0.md for more details.
- [New Feature] Python bindings for the FE 1.0 API. See the Python API section in README.md for building the python bindings. Details of the python API and its kw arguments are in README.FE.1.0.md. Python API samples are in samples/python/*.py.
- [Deprecation] The v0.x API is now labelled deprecated and may be removed in v2.0. Consider moving to the v1.0 API. If there are issues or missing features, please create a github issue.
1 parent 12f35fa commit 55d37c7

File tree: 105 files changed (+29490, −18496 lines)


CMakeLists.txt

Lines changed: 14 additions & 4 deletions

```diff
@@ -1,9 +1,15 @@
 cmake_minimum_required(VERSION 3.17)
 
-project(cudnn_frontend VERSION 0.9)
+project(cudnn_frontend VERSION 1.0.0)
 
 option(CUDNN_FRONTEND_BUILD_SAMPLES "Defines if samples are built or not." ON)
 
+if(MSVC OR MSYS OR MINGW)
+    option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." OFF)
+else()
+    option(CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS "Defines if python bindings are built or not." ON)
+endif()
+
 add_library(cudnn_frontend INTERFACE)
 
 target_include_directories(
@@ -12,8 +18,12 @@ target_include_directories(
     $<INSTALL_INTERFACE:${CMAKE_INSTALL_INCLUDEDIR}>
 )
 
-target_compile_features(cudnn_frontend INTERFACE cxx_std_11)
+target_compile_features(cudnn_frontend INTERFACE cxx_std_17)
+
+if (CUDNN_FRONTEND_BUILD_SAMPLES)
+    add_subdirectory(samples)
+endif()
 
-if (${CUDNN_FRONTEND_BUILD_SAMPLES})
-    add_subdirectory(samples)
+if (CUDNN_FRONTEND_BUILD_PYTHON_BINDINGS)
+    add_subdirectory(python_bindings)
 endif()
```

Doxyfile

Lines changed: 1 addition & 1 deletion

```diff
@@ -38,7 +38,7 @@ PROJECT_NAME = "CUDNN Frontend API"
 # could be handy for archiving the generated documentation or if some version
 # control system is used.
 
-PROJECT_NUMBER = 0.9.2
+PROJECT_NUMBER = 1.0.0
 
 # Using the PROJECT_BRIEF tag one can provide an optional one line description
 # for a project that appears at the top of each page and should give viewer a
```

README.FE.0.x.md

Lines changed: 97 additions & 0 deletions (new file)
# cuDNN FE 0.x API

## Introduction
The FE v0.x API wraps the [cuDNN C backend API](https://docs.nvidia.com/deeplearning/cudnn/api/index.html#cudnn-backend-api) in C++ APIs.
For a general introduction to FE, please first refer to README.md.
## Organization
Each `cudnnBackendDescriptorType_t` documented in the enum is organized into its own header file:
- cudnn_frontend_Tensor.h -> CUDNN_BACKEND_TENSOR_DESCRIPTOR
- cudnn_frontend_ConvDesc.h -> CUDNN_BACKEND_CONVOLUTION_DESCRIPTOR
- cudnn_frontend_PointWiseDesc.h -> CUDNN_BACKEND_POINTWISE_DESCRIPTOR
- cudnn_frontend_MatMulDesc.h -> CUDNN_BACKEND_MATMUL_DESCRIPTOR
- cudnn_frontend_ReductionDesc.h -> CUDNN_BACKEND_REDUCTION_DESCRIPTOR
- cudnn_frontend_Operation.h -> CUDNN_BACKEND_OPERATION_*_DESCRIPTOR
- cudnn_frontend_OperationGraph.h -> CUDNN_BACKEND_OPERATIONGRAPH_DESCRIPTOR
- cudnn_frontend_Heuristics.h -> CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR
- cudnn_frontend_Engine.h -> CUDNN_BACKEND_ENGINE_DESCRIPTOR
- cudnn_frontend_EngineConfig.h -> CUDNN_BACKEND_ENGINECFG_DESCRIPTOR
- cudnn_frontend_ExecutionPlan.h -> CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR
- cudnn_frontend_VariantPack.h -> CUDNN_BACKEND_VARIANT_PACK_DESCRIPTOR
### Utility Functions
- cudnn_frontend_find_plan.h -> Implements the `cudnnFindPlan` function
- cudnn_frontend_get_plan.h -> Implements the `cudnnGetPlan` function
- cudnn_frontend_Filters.h -> List of helpful utility functions to filter out execution plans
- cudnn_frontend_ExecutionPlanCache.h -> Describes and implements the execution plan cache

### Logging
- cudnn_frontend_Logging.h -> Implements a basic logging framework for cudnn_frontend

### Error Handling
- cudnn_frontend_utils.h
## Samples

Samples are meant to illustrate FE v0.x API usage to users:
- `samples/conv_samples.cpp` contains conv/dgrad/wgrad fusion samples.
- `samples/norm_samples.cpp` contains batch-normalization fusion samples.
- `samples/fusion_samples.cpp` contains fusion samples that use cuDNN's runtime fusion engine.
- `samples/fused_mha_samples.cpp` contains a flash attention sample.

Sample tests are written using the [Catch2](https://github.com/catchorg/Catch2) test framework and are controlled by `samples/test_list.cpp`.
## cudnnFindPlan and cudnnGetPlan
Prior to cuDNN v8, cuDNN provided `cudnnFindConvolution*` and `cudnnGetConvolution*` functions, which offered a way to sample all the algorithms for a given problem and study their run times; the results could then be used to cache the best algorithms for that problem. In cuDNN v8, this has been replaced with `cudnnFindPlan` and `cudnnGetPlan`.

To use `cudnnFindPlan`, a user needs to provide:
- A source for a pruned list of `engineConfig`s for the given problem statement
- A filter function to filter out execution plans based on prerequisite conditions

`cudnnFindPlan` in turn:
- Creates the set of execution plans that are supported
- Executes each filtered plan and ranks them in order of execution plan runtime

The most common `engineConfig` source is the built-in heuristics of cuDNN v8. Generally, this is appended with the fallback list. An example of usage can be seen in the `run_from_cudnn_find(...)` function in `conv_sample.cpp`.
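The find flow described above (filter candidates, then rank survivors by measured runtime) can be sketched independently of cuDNN. The snippet below is a self-contained illustration of that control flow only; `PlanCandidate` and `find_plans` are hypothetical names, not part of the FE API, and the runtimes are given rather than measured:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an engineConfig / execution plan candidate.
struct PlanCandidate {
    std::string tag;   // engine identifier
    bool supported;    // would plan finalization succeed?
    float time_ms;     // runtime (supplied here; cudnnFindPlan would measure it)
};

// Mirrors the cudnnFindPlan flow: drop unsupported or filtered-out
// candidates, then sort the rest by runtime, fastest first.
std::vector<PlanCandidate>
find_plans(std::vector<PlanCandidate> candidates,
           bool (*filtered_out)(const PlanCandidate&)) {
    std::vector<PlanCandidate> plans;
    for (const auto& c : candidates)
        if (c.supported && !filtered_out(c)) plans.push_back(c);
    std::sort(plans.begin(), plans.end(),
              [](const PlanCandidate& a, const PlanCandidate& b) {
                  return a.time_ms < b.time_ms;
              });
    return plans;
}
```

In the real API the filter predicate plays the role of the prerequisite-condition filter from `cudnn_frontend_Filters.h`, and the sorted list corresponds to the ranked plans returned by `cudnnFindPlan`.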
## Errata Filter
The errata filter gives the cuDNN team an opportunity to block certain faulty kernels from being executed. The cuDNN team can provide a json file which blocks certain engine configs from being executed. Users can augment this list if they find certain characteristics to be undesirable (e.g. bad memory access, execution plan failure). Users can either declare the json file statically or load it from a file at runtime using the environment variable `CUDNN_ERRATA_JSON_FILE`.

#### Json format
- version : 1 - Mandatory. Tells the format version of the json.
- rules : [] - Mandatory. Array of rule objects which identify the engine config.
- rule_id : "" - Optional. Used to uniquely identify a rule. Has no purpose other than being easy to debug.
- operation : "" - Mandatory. Stringified version of the operation graph.
- engine : "" - Mandatory. Stringified version of the engine ID.
- knob : "" - Optional. Stringified version of the knob. If specified, only the engineConfig for the engine matching the knobs will be blocked. Otherwise, all possible combinations of knobs for the engine will be blocked.
- input_shape : [] - Optional. Array of input shape for the kernel (e.g. [64, 32, 128, 128]) to be filtered out. Use -1 if you don't want to filter that dimension (e.g. [-1, -1, 128, 128] to only filter HxW for NCHW format).
- filter_shape : [] - Optional. Array of kernel/filter shape for the kernel (e.g. [32, 32, 5, 5]) to be filtered out. Use -1 if you don't want to filter that dimension (e.g. [-1, -1, 5, 5] to only filter 5x5 filter sizes).
- shape_format : "" - Mandatory if input_shape and/or filter_shape is present, optional otherwise. Shape format of tensors as a string (e.g. "NCHW", "NHWC").
- cudnn_version_start : 0 - Optional. Denotes the cudnn version after which the engine started having issues.
- cudnn_version_end : -1 - Optional. Denotes the cudnn version in which the issue was fixed. "-1" denotes an ongoing issue.
- arch : "" - Optional. Architectures where this kernel might be faulty.

PS: The errata filter is still in beta. We may add or modify certain features as necessary.
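Putting the schema above together, a hypothetical errata file might look like the following. All values here (rule id, operation string, engine ID, versions, architecture) are illustrative, not taken from a real errata list:

```json
{
    "version" : 1,
    "rules" : [
        {
            "rule_id"             : "conv_fwd_eng21_bad_access",
            "operation"           : "ConvFwd",
            "engine"              : "eng21",
            "input_shape"         : [-1, -1, 128, 128],
            "shape_format"        : "NCHW",
            "cudnn_version_start" : 8500,
            "cudnn_version_end"   : -1,
            "arch"                : "90"
        }
    ]
}
```

This rule would block engine `eng21` for forward convolutions whose HxW is 128x128 (any N and C), on the named architecture, for all cuDNN versions after the one where the issue appeared.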
## Execution Plan Caching
cuDNN, through heuristics, provides a way to query a list of good engine configs. Based on this query, the cudnn_frontend_find_plan function runs all the engineConfig(s) on the given user system and returns a sorted list of plans. This process of running multiple plans through several iterations is time consuming. The ExecutionPlanCache allows the user to build a cache, with the operation graph as the key, to query an execution plan. It is the responsibility of the user to maintain different caches for different types of operation_graphs (e.g. a different cache for convolutionForward compared to Dgrad or Wgrad). The `is_fastest_plan_stable` builds on top of this by making sure the same plan is chosen by cudnnFind multiple times.

### API
- `void add_plan_to_cache(const cudnn_frontend::OperationGraph &op_graph, const cudnn_frontend::ExecutionPlan &plan)` : Creates a mapping between the operation graph and the execution plan.
- `bool get_plan_from_cache(const cudnn_frontend::OperationGraph &op_graph, const cudnn_frontend::ExecutionPlan *&plan)` : Sets the execution plan in the plan pointer and returns true if found.
- `cudnnFindPlanAndCache(cudnnHandle_t handle, cudnn_frontend::OperationGraph &opGraph, cudnn_frontend::VariantPack const &variantPack, cudnn_frontend::ExecutionPlanCache &cache, Predicate pred) -> cudnn_frontend::ExecutionPlan` : Chains the output of cudnn_frontend_find_plan and caches the result for future use.

PS: ExecutionPlanCaching today supports only single-operation operation_graphs.
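Conceptually the cache is a map keyed by the operation graph, with `get_plan_from_cache` reporting a hit through an out-pointer. The minimal sketch below illustrates that shape using a string tag as the key; the types here are stand-ins, not the actual `cudnn_frontend` classes:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for cudnn_frontend::ExecutionPlan.
struct ExecutionPlan { std::string engine; };

// Sketch of the ExecutionPlanCache idea: operation graph (here, its tag)
// maps to the execution plan chosen for it.
class ExecutionPlanCache {
    std::map<std::string, ExecutionPlan> cache_;
public:
    void add_plan_to_cache(const std::string& op_graph_tag, ExecutionPlan plan) {
        cache_[op_graph_tag] = std::move(plan);
    }
    // Sets the caller's pointer and returns true on a hit, mirroring
    // the out-pointer convention of the FE API.
    bool get_plan_from_cache(const std::string& op_graph_tag,
                             const ExecutionPlan*& plan) const {
        auto it = cache_.find(op_graph_tag);
        if (it == cache_.end()) return false;
        plan = &it->second;
        return true;
    }
};
```

As the note above says, separate caches should be kept per operation-graph type (e.g. one for convolutionForward, another for Dgrad).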
## Execution Plan Serialization and Deserialization (Experimental)
cuDNN v8.4 and above provides execution plan serialization and deserialization to save the execution plan as a string in JSON format. The execution plan can then be restored from that string at a later point, which also saves compilation time compared to rebuilding the plan from scratch. Currently, this is an experimental feature that only supports the runtime fusion engine. No forward/backward or cross-device compatibility guarantee is offered at this time.

### API
- `std::string cudnn_frontend::ExecutionPlan_v8::getJsonRepresentation()` : Serializes the execution plan into a string in JSON format.
- `cudnn_frontend::ExecutionPlan_v8&& cudnn_frontend::ExecutionPlanBuilder_v8::loadFromJson(const std::string &json_plan)` : Deserializes from a string containing the JSON representation of the execution plan.
## Deprecation
The v0.x API may be removed in version 2.0 of the API. Please consider adopting the v1.0 API. If there are any issues or missing functionalities in the v1.0 API, please create a github issue.
