This repository was archived by the owner on Sep 15, 2025. It is now read-only.

Commit 7b226d4

Update gpurt from commit dee2554b
* Export option for persistent launch
* Fix validation when using disk mapping
* CPS Hit Object refactor
* StackCommon: add the ability to discard data
* Add intrinsic for atomic conditional sub on LDS
* Add HPLOC support
* Move EncodeTopLevelCommon to shadersClean
* Add missing include to EncodeTopLevelCommon.hlsli
* Use an InlineBuffer for the DispatchRaysConstantData to remove one level of indirection
* Fix validation with symlinks/mounts
* Enable the no-parameter-usage warning in shader library
* Fix performance drop in persistent workgroup
* StackCommon: fix DECLARE_VALUE_SET_I32 signature
* Update the readme
* [NFC] Refactor GpuRt related files
* Move BuildCommon.hlsl to shadersClean/
* FixLaneGroup.hlsli Validation / Add BitOr64 LaneGroup Intrinsic
* [NFC] Move DispatchRaysConstBuf and ray pipeline flags codes into shadersClean
* New FVM Morton Code Generator
* Change a few ternaries over to select()
* Refactor EncodeTriangleNode function body
* StackCommon: manually declare intrinsics.
* [NFC] traversal: Move ray state related structs into shadersClean
* Remove PASS_HIT_OBJECT_ARG define
* BuildCommonScratch(/Global) and TaskCounter
* Refactor RayQuery related files and functions
* DGF: Preliminary support for DGF compressed geometry in BVH builds
* Add an Unbiased Origin workaround that can be set from driver
* Add a shader to support Vulkan CaptureReplay feature
1 parent f734985 commit 7b226d4

122 files changed: +9408 -5639 lines changed

README.md

Lines changed: 29 additions & 10 deletions
@@ -1,25 +1,44 @@
-### GPU Ray Tracing Library
+# GPU Ray Tracing Library

 The GPU Ray Tracing (GPURT) library is a static library (source deliverable) that provides ray tracing related functionalities for AMD drivers supporting DXR (DirectX 12®) and the Vulkan® RT API. The GPURT library is built on top of AMD's Platform Abstraction Library (PAL). Refer to PAL documentation for information regarding interfaces used by GPURT for various operations including command generation, memory allocation etc.

-GPURT uses a C++ interface. The public interface is defined in .../gpurt/gpurt, and clients must only include headers from that directory. The interface is divided into multiple header files based on their dependencies and usage.
-
-* gpurt/gpurt.h
-  * Provides definiton of the public GPURT device interface used to perform various ray tracing operations and querying traversal shader code.
-* gpurt/gpurtAccelStruct.h
-  * Provides definition of the GPURT acceleration structure produced by the GPURT acceleration structure builder.
-* gpurt/gpurtCounter.h
-  * Shared header file between C++ and HLSL code that provides the definitions of GPURT counters and related data structures.
+GPURT provides the majority of non-compiler functionality required to support DX12 and Vulkan raytracing APIs.

 The primary functionality provided by GPURT includes the following:

 * Building acceleration structures.
 * Traversal loop and associated intrinsic functions written in HLSL.
+* Ray tracing traversal counter and ray history data capture.
 * Acceleration structure capture and decode.
-* Ray tracing traversal counter and ray history data capture.
+
+## Code Style

 GPURT follows [PAL Coding Standards](https://github.com/GPUOpen-Drivers/pal/blob/dev/doc/process/palCodingStandards.md) whenever possible.

+## Code Organization
+
+We are moving code from the unorganized src/shaders/ directory to the src/shadersClean/ directory. Within src/shadersClean the code is organized as follows:
+
+| Directory | Description |
+| --------- | ----------- |
+| _build_ | Code related to BuildRaytracingAccelerationStructure and vkCmdBuildAccelerationStructuresKHR |
+| _traversal_ | Code related to DispatchRays & vkCmdTraceRaysKHR. |
+| _common_ | Code common to build and traversal. |
+| _debug_ | Code related to implementing GPU_ASSERT. |
+
+Code in the src/shadersClean directory undergoes validation. Before compiling shader entrypoints, each HLSL file in src/shadersClean is compiled separately (results are discarded) to check if it has no implicit dependencies. HLSL files can only include HLSLI files (or C++ Interfaces) and HLSLI files can only include other HLSLI files (or C++ Interfaces).
+
+### C++ Interfaces
+
+GPURT uses a C++ interface. The public interface is defined in .../gpurt/gpurt, and clients must only include headers from that directory. The interface is divided into multiple header files based on their dependencies and usage.
+
+* gpurt/gpurt.h
+  * Provides definition of the public GPURT device interface used to perform various ray tracing operations and querying traversal shader code.
+* gpurt/gpurtAccelStruct.h
+  * Shared header file between C++ and HLSL that provides the definition of the GPURT acceleration structure produced by the GPURT acceleration structure builder.
+* gpurt/gpurtCounter.h
+  * Shared header file between C++ and HLSL code that provides the definitions of GPURT counters and related data structures.
+
 ## Acceleration Structure

 GPURT interfaces support various acceleration structure operations including:
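The include rule stated in the new Code Organization section (HLSL and HLSLI files may include only HLSLI files or C++ interface headers) can be pictured with a small C++ sketch. This is not GPURT's validation script: `HasSuffix` and `IsIncludeAllowed` are hypothetical names, and treating any file ending in `.h` as a C++ interface header is an assumption made for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns true if `name` ends with `suffix`.
inline bool HasSuffix(const std::string& name, const std::string& suffix)
{
    return (name.size() >= suffix.size()) &&
           (name.compare(name.size() - suffix.size(), suffix.size(), suffix) == 0);
}

// Sketch of the README rule. Assumption: a C++ interface header is any file ending in ".h".
inline bool IsIncludeAllowed(const std::string& includer, const std::string& included)
{
    const bool includerIsShader = HasSuffix(includer, ".hlsl") || HasSuffix(includer, ".hlsli");
    if (includerIsShader == false)
    {
        return true; // The rule only constrains shader sources.
    }

    // Shader sources may pull in HLSLI files or C++ interface headers, never another .hlsl file.
    return HasSuffix(included, ".hlsli") || HasSuffix(included, ".h");
}
```

With these helpers, an `.hlsl` file including an `.hlsli` file passes, while any shader source including an `.hlsl` file is rejected.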

cmake/GpuRtGenerateShaders.cmake

Lines changed: 9 additions & 1 deletion
@@ -1,7 +1,7 @@
 ##
 #######################################################################################################################
 #
-# Copyright (c) 2022-2024 Advanced Micro Devices, Inc. All Rights Reserved.
+# Copyright (c) 2022-2025 Advanced Micro Devices, Inc. All Rights Reserved.
 #
 # Permission is hereby granted, free of charge, to any person obtaining a copy
 # of this software and associated documentation files (the "Software"), to deal
@@ -67,6 +67,13 @@ set(gpurtTraceShadersSpirv "${gpurtOutputDir}/g_GpuRtLibrary_spv.h")

 set(gpurtDebugInfoFile "${CMAKE_CURRENT_BINARY_DIR}/g_gpurtDebugInfo.h")

+# Find binaries in PATH
+find_program(gpurtDxcCompiler dxc REQUIRED)
+find_program(gpurtSpirvRemap spirv-remap REQUIRED)
+# Find dxcompiler library.
+get_filename_component(gpurtDxcCompilerDirectory "${gpurtDxcCompiler}" DIRECTORY)
+find_library(gpurtDxcompilerLib dxcompiler HINTS ${gpurtDxcCompilerDirectory} /usr/lib/dxc REQUIRED)
+
 set(originalShaderSourceDir "${GPU_RAY_TRACING_SOURCE_DIR}/src/shaders/")
 set(originalShaderSource ${GPURT_SHADER_SOURCE_FILES})
 list(TRANSFORM originalShaderSource PREPEND "${originalShaderSourceDir}")
@@ -136,6 +143,7 @@ if(GPURT_CLIENT_API STREQUAL "VULKAN")
 ${gpurtStripWhitelist}
 ${gpurtDxcCompiler}
 ${gpurtSpirvRemap}
+
 COMMAND ${RT_SHADER_VALIDATION_COMMAND}

 COMMAND Python3::Interpreter "${gpurtCompileScript}"

gpurt/gpurt.h

Lines changed: 45 additions & 8 deletions
@@ -84,7 +84,7 @@ enum class StaticPipelineFlag : uint32
 {
     SkipTriangles = 0x100, // Always skip triangle node intersections
     SkipProceduralPrims = 0x200, // Always skip procedural node intersections
-    Unused = (1u << 31), // Available for use
+    UseUnbiasedOrigin = (1u << 31), // Avoid biasing the origin by TMin
     UseTreeRebraid = (1u << 30), // Use Tree Rebraid for TraceRays
     EnableAccelStructTracking = (1u << 29), // Enable logging of TLAS addresses using AccelStructTracker
     EnableTraversalCounter = (1u << 28), // Enable Traversal counters
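The hunk above reuses the previously unused bit 31 of `StaticPipelineFlag` for `UseUnbiasedOrigin`. A minimal sketch of setting and testing such a flag follows; the enum is a local mirror of only a few values from the hunk, and `WithFlag` and `HasFlag` are hypothetical helpers, not GPURT functions.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of a few StaticPipelineFlag values from the hunk (not the full GPURT enum).
enum class StaticPipelineFlag : uint32_t
{
    SkipTriangles       = 0x100,
    SkipProceduralPrims = 0x200,
    UseUnbiasedOrigin   = (1u << 31),
    UseTreeRebraid      = (1u << 30),
};

// Hypothetical helper: returns `flags` with `flag` set.
constexpr uint32_t WithFlag(uint32_t flags, StaticPipelineFlag flag)
{
    return flags | static_cast<uint32_t>(flag);
}

// Hypothetical helper: returns true if `flag` is set in `flags`.
constexpr bool HasFlag(uint32_t flags, StaticPipelineFlag flag)
{
    return (flags & static_cast<uint32_t>(flag)) != 0;
}
```

Because the new flag occupies the bit that was marked "Available for use", the values of the other flags do not change.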
@@ -277,9 +277,10 @@ struct NodeMapping
 // Structure containing shader code in various intermediate forms.
 struct PipelineShaderCode
 {
+#if GPURT_CLIENT_INTERFACE_MAJOR_VERSION < 55
     const void* pAmdilCode; // Code in AMD intermediate language form
     size_t amdilSize; // Size in bytes of AMDIL code
-
+#endif
     const void* pDxilCode; // Code in DXIL form
     size_t dxilSize; // Size in bytes of DXIL code

@@ -354,6 +355,7 @@ enum class InternalRayTracingCsType : uint32
 #endif
     BuildFastAgglomerativeLbvh,
     EncodeQuadNodes,
+    BuildHPLOC,
 #if GPURT_BUILD_RTIP3_1
     BuildTrivialBvh,
     BuildSingleThreadGroup32,
@@ -366,6 +368,8 @@ enum class InternalRayTracingCsType : uint32
     Update3_1,
     RefitInstanceBounds,
 #endif
+    EncodeDGF,
+    PrepareShadowSbtForReplay,
     Count
 };

@@ -509,8 +513,10 @@ typedef uint32 GeometryFlags;
 // Type of geometry node
 enum class GeometryType : uint32
 {
-    Triangles = 0, // Triangle geometry. Geometry::triangles is valid.
-    Aabbs // Procedural bounding box geometry. Geometry::aabbs is valid.
+    Triangles = 0, // Triangle geometry. Geometry::triangles is valid.
+    Aabbs = 1, // Procedural bounding box geometry. Geometry::aabbs is valid.
+    CompressedTriangles = 2, // Compressed triangle geometry. Geometry::compressedTriangles is valid.
+    CompressedTrianglesOmm = 3, // Compressed triangle geometry with OMM. Geometry::compressedTriangles is valid.
 };

 // Index format for triangle geometry.
@@ -540,6 +546,11 @@ enum class VertexFormat : uint32
     R8G8_Unorm // 8-bit fixed-point unsigned normalized R8G8 X,Y,0 format
 };

+enum class CompressedTriangleFormat : uint32
+{
+    Dgf1, // Dense Geometry Format version 1
+};
+
 // Geometry node triangle data
 struct GeometryTriangles
 {
@@ -561,6 +572,18 @@ struct GeometryAabbs
     gpusize aabbByteStride; // Stride in bytes between consecutive bounding boxes
 };

+// Geometry node compressed triangle data
+struct GeometryCompressedTriangles
+{
+    gpusize compressedDataAddr;
+    gpusize compressedDataSize;
+    uint32 numTriangles;
+    uint32 numVertices;
+    uint32 maxPrimitiveIndex;
+    uint32 maxGeometryIndex;
+    CompressedTriangleFormat format;
+};
+
 // Bottom-level geometry node information (analogous to D3D12DDI_RAYTRACING_GEOMETRY_DESC).
 // This API-independent structure does not match D3D12 or Vulkan. The client must convert the API version to the
 // GPURT version through ClientConvertAccelStructBuildGeometry().
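As an illustration of how a client might describe compressed geometry with the types added in this commit, here is a minimal sketch. It mirrors the enums and `GeometryCompressedTriangles` locally; the `Geometry` struct is a simplified stand-in that keeps only the compressed union member, `gpusize` is assumed to be a 64-bit integer, and `MakeDgfGeometry` and `UsesCompressedTriangles` are hypothetical helpers rather than GPURT API.

```cpp
#include <cassert>
#include <cstdint>

using gpusize = uint64_t; // Assumption: GPU addresses and sizes are 64-bit.

enum class GeometryType : uint32_t
{
    Triangles = 0, Aabbs = 1, CompressedTriangles = 2, CompressedTrianglesOmm = 3
};

enum class CompressedTriangleFormat : uint32_t { Dgf1 };

struct GeometryCompressedTriangles
{
    gpusize  compressedDataAddr;
    gpusize  compressedDataSize;
    uint32_t numTriangles;
    uint32_t numVertices;
    uint32_t maxPrimitiveIndex;
    uint32_t maxGeometryIndex;
    CompressedTriangleFormat format;
};

// Simplified stand-in for GpuRt::Geometry: only the compressed member of the union is mirrored.
struct Geometry
{
    GeometryType type;
    uint32_t     flags;
    union
    {
        GeometryCompressedTriangles compressedTriangles;
    };
};

// Per the enum comments, both compressed types make Geometry::compressedTriangles the valid member.
inline bool UsesCompressedTriangles(const Geometry& geometry)
{
    return (geometry.type == GeometryType::CompressedTriangles) ||
           (geometry.type == GeometryType::CompressedTrianglesOmm);
}

// Hypothetical helper: fills in a DGF geometry description.
inline Geometry MakeDgfGeometry(gpusize addr, gpusize size, uint32_t numTriangles, uint32_t numVertices)
{
    Geometry geometry = {};
    geometry.type = GeometryType::CompressedTriangles;
    geometry.compressedTriangles.compressedDataAddr = addr;
    geometry.compressedTriangles.compressedDataSize = size;
    geometry.compressedTriangles.numTriangles       = numTriangles;
    geometry.compressedTriangles.numVertices        = numVertices;
    // maxPrimitiveIndex and maxGeometryIndex stay zero here; the diff does not describe their meaning.
    geometry.compressedTriangles.format = CompressedTriangleFormat::Dgf1;
    return geometry;
}
```

Readers of a `Geometry` value should check `type` before touching the union, since only one member is valid at a time.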
@@ -570,8 +593,10 @@ struct Geometry
     GeometryFlags flags; // Geometry flags
     union
     {
-        GeometryTriangles triangles; // Triangle geometry. Valid if type is Triangles.
-        GeometryAabbs aabbs; // Procedural AABB geometry. Valid if type is Aabbs.
+        GeometryTriangles triangles; // Triangle geometry. Valid if type is Triangles.
+        GeometryAabbs aabbs; // Procedural AABB geometry. Valid if type is Aabbs.
+        GeometryCompressedTriangles compressedTriangles; // Compressed triangle geometry.
+        // Valid if type is CompressedTriangles.
     };
 };

@@ -600,14 +625,14 @@ enum class InputElementLayout : uint32
 enum class BvhBuildMode : uint32
 {
     Linear = 0, // Linear BVH builder
-    Reserved = 1, // Formerly agglomerative clustering BVH builder
+    HPLOC = 1, // Hierarchical PLOC
     PLOC = 2, // Parallel locally-ordered clustering BVH builder
     Auto = 4, // Used in override build to fall back to regular build options
     Count
 };

 static_assert(uint32(BvhBuildMode::Linear) == 0, "Enums encoded in the acceleration structure must not change.");
-static_assert(uint32(BvhBuildMode::Reserved) == 1, "Enums encoded in the acceleration structure must not change.");
+static_assert(uint32(BvhBuildMode::HPLOC) == 1, "Enums encoded in the acceleration structure must not change.");
 static_assert(uint32(BvhBuildMode::PLOC) == 2, "Enums encoded in the acceleration structure must not change.");

 // BVH CPU builder modes
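The `static_assert` lines in this hunk guard the numeric values of `BvhBuildMode` because those values are encoded in built acceleration structures; `HPLOC` takes over the slot previously named `Reserved`, so nothing already encoded is renumbered. A minimal sketch of decoding such a stored value follows. The enum is a local mirror of the one in the hunk, and `BuildModeName` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Local mirror of BvhBuildMode as it reads after this commit.
enum class BvhBuildMode : uint32_t
{
    Linear = 0,
    HPLOC  = 1,
    PLOC   = 2,
    Auto   = 4,
    Count
};

static_assert(uint32_t(BvhBuildMode::HPLOC) == 1, "Enums encoded in the acceleration structure must not change.");

// Hypothetical helper: maps an encoded build mode back to a readable name.
inline const char* BuildModeName(uint32_t encodedMode)
{
    switch (static_cast<BvhBuildMode>(encodedMode))
    {
    case BvhBuildMode::Linear: return "Linear";
    case BvhBuildMode::HPLOC:  return "HPLOC";
    case BvhBuildMode::PLOC:   return "PLOC";
    case BvhBuildMode::Auto:   return "Auto";
    default:                   return "Unknown";
    }
}
```

Note that `Count` follows `Auto = 4`, so it evaluates to 5 and is not the number of usable modes.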
@@ -789,6 +814,7 @@ struct DeviceSettings
 #endif

     uint32 plocRadius; // PLOC nearest neighbor search radius
+    uint32 hplocRadius; // HPLOC nearest neighbor search radius
 #if GPURT_CLIENT_INTERFACE_MAJOR_VERSION < 54
     uint32 maxTopDownBuildInstances; // Max instances allowed for top down build
 #endif
@@ -863,6 +889,7 @@ struct DeviceSettings
         uint32 disableRdfCompression : 1; // Disable compression in RDF chunks
         uint32 enableRebraid : 1; // Enable tree rebraid in TLAS
         uint32 cullIllegalInstances : 1;
+        uint32 useUnbiasedOrigin : 1; // Avoids offsetting by tMin in traversal
     };

     uint64 accelerationStructureUUID; // Acceleration Structure UUID
@@ -1742,6 +1769,16 @@ class IDevice

     virtual uint32 CalculateBvhPrimitiveCount(const AccelStructBuildInputs& inputs) const = 0;

+    // Prepares shadow shader binding table for replay (Vulkan capture replay feature)
+    //
+    // @param cmdBuffer [in/out] Opaque handle to command buffer where commands will be written
+    // @param userData [in] Addresses of input/output buffers
+    // @param totalSbtEntryCount Total number of SBT entries
+    virtual void PrepareShadowSbtForReplay(
+        ClientCmdBufferHandle cmdBuffer,
+        const PrepareShadowSbtForReplayUserData& userData,
+        uint32 totalSbtEntryCount) = 0;
+
 protected:

     /// Client must create objects by explicitly calling CreateDevice method

gpurt/gpurtCounter.h

Lines changed: 17 additions & 0 deletions
@@ -432,6 +432,23 @@ GPURT_STATIC_ASSERT(INSTANCE_INPUTS_STRIDE_OFFSET == offsetof(InstanceInputHeade
 GPURT_STATIC_ASSERT(INSTANCE_INPUTS_INSTANCES_OFFSET == offsetof(InstanceInputHeader, instances), "");
 GPURT_STATIC_ASSERT(sizeof(InstanceInputHeader) == INSTANCE_INPUTS_HEADER_SIZE, "Instance inputs header mismatch.");

+// ====================================================================================================================
+struct BuildInputDescHeader
+{
+    uint32_t type; // Triangles, AABBs or Instances
+    uint32_t flags; // Geometry flags
+    union
+    {
+        TriangleInputHeader triangles;
+        AABBInputHeader aabbs;
+        InstanceInputHeader instances;
+    };
+};
+
+#define BUILD_INPUT_DESC_TYPE_TRIANGLES 0
+#define BUILD_INPUT_DESC_TYPE_AABBS 1
+#define BUILD_INPUT_DESC_TYPE_INSTANCES 2
+
 // ====================================================================================================================
 // 32-bit unique ray identifier calculated as below.
 // uint32_t id = threadID.x + (threadID.y * dim.x) + (threadID.z * dim.x * dim.y);
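`BuildInputDescHeader` is a tagged union: the `type` field, compared against the `BUILD_INPUT_DESC_TYPE_*` defines, tells a reader which member is valid. A minimal sketch of that dispatch is shown below. The three input header structs are placeholders because their real layouts are not part of this hunk, and `ActiveInput` and `SelectActiveInput` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

#define BUILD_INPUT_DESC_TYPE_TRIANGLES 0
#define BUILD_INPUT_DESC_TYPE_AABBS     1
#define BUILD_INPUT_DESC_TYPE_INSTANCES 2

// Placeholder layouts: the real headers are defined elsewhere in gpurtCounter.h.
struct TriangleInputHeader { uint32_t placeholder; };
struct AABBInputHeader     { uint32_t placeholder; };
struct InstanceInputHeader { uint32_t placeholder; };

struct BuildInputDescHeader
{
    uint32_t type;  // Triangles, AABBs or Instances
    uint32_t flags; // Geometry flags
    union
    {
        TriangleInputHeader triangles;
        AABBInputHeader     aabbs;
        InstanceInputHeader instances;
    };
};

// Hypothetical result type naming the valid union member.
enum class ActiveInput { Triangles, Aabbs, Instances, Invalid };

// Hypothetical helper: picks the valid union member from the type tag.
inline ActiveInput SelectActiveInput(const BuildInputDescHeader& header)
{
    switch (header.type)
    {
    case BUILD_INPUT_DESC_TYPE_TRIANGLES: return ActiveInput::Triangles;
    case BUILD_INPUT_DESC_TYPE_AABBS:     return ActiveInput::Aabbs;
    case BUILD_INPUT_DESC_TYPE_INSTANCES: return ActiveInput::Instances;
    default:                              return ActiveInput::Invalid;
    }
}
```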

gpurt/gpurtDispatch.h

Lines changed: 44 additions & 8 deletions
@@ -47,14 +47,6 @@ constexpr uint32 MaxBufferSrdSize = 8;
 constexpr uint32 MaxBufferSrdSize = 4;
 #endif

-// Dispatch rays arguments top-level descriptor table (GPU structure)
-struct DispatchRaysTopLevelData
-{
-    uint64 dispatchRaysConstGpuVa; // DispatchRays info constant buffer GPU VA
-    uint32 internalUavBufferSrd[MaxBufferSrdSize]; // Internal UAV shader resource descriptor
-    uint32 accelStructTrackerSrd[MaxBufferSrdSize]; // Structured buffer SRD pointing to the accel struct tracker
-};
-
 #define DISPATCHRAYSCONSTANTDATA_STRUCT_OFFSET_DISPATCHID 48

 // Dispatch rays constant buffer data (GPU structure). Note, using unaligned uint64_t in HLSL constant buffers requires
@@ -108,11 +100,25 @@ struct DispatchRaysConstantData
 #pragma pack(pop)
 #endif

+// Dispatch rays arguments top-level descriptor table (GPU structure)
+struct DispatchRaysTopLevelData
+{
+#if GPURT_CLIENT_INTERFACE_MAJOR_VERSION < 56
+    uint64 dispatchRaysConstGpuVa; // DispatchRays info constant buffer GPU VA
+#else
+    DispatchRaysConstantData constData; // Dispatch rays constant buffer data
+#endif
+    uint32 internalUavBufferSrd[MaxBufferSrdSize]; // Internal UAV shader resource descriptor
+    uint32 accelStructTrackerSrd[MaxBufferSrdSize]; // Structured buffer SRD pointing to the accel struct tracker
+};
+
 // GPU structure containing all data for DXR/VK ray dispatch command
 struct DispatchRaysConstants
 {
     DispatchRaysTopLevelData descriptorTable; // Top-level internal dispatch bindings (includes pointer to infoData)
+#if GPURT_CLIENT_INTERFACE_MAJOR_VERSION < 56
     DispatchRaysConstantData constData; // Dispatch rays constant buffer data
+#endif
 };

 #if __cplusplus
@@ -188,7 +194,37 @@ struct InitExecuteIndirectConstants
     uint32 cpsGlobalMemoryAddressHi; // Separate CPS stack memory base address high 32-bits
 };

+// Resource bindings required for PrepareShadowSbtForReplay
+struct PrepareShadowSbtForReplayUserData
+{
+    uint64 constantsVa; // PrepareShadowSbtForReplayConstants struct
+    uint64 shadowRayGenerationTableVa; // Shadow ray generation table
+    uint64 shadowHitGroupTableVa; // Shadow hit group table
+    uint64 shadowMissTableVa; // Shadow miss shader table
+    uint64 shadowCallableTableVa; // Shadow callable shader table
+    uint64 captureReplayMappingBufferVa; // Capture replay mapping buffer
+};
+
+// Constants for PrepareShadowSbtForReplay shader
+struct PrepareShadowSbtForReplayConstants
+{
+    uint32 hitGroupTableStrideInBytes; // Hit group table record byte stride
+    uint32 hitGroupTableEntryCount; // Hit group table entry count
+    uint32 missTableStrideInBytes; // Miss shader table record byte stride
+    uint32 missTableEntryCount; // Miss shader table entry count
+    uint32 callableTableStrideInBytes; // Callable shader table record byte stride
+    uint32 callableTableEntryCount; // Callable shader table entry count
+    uint32 captureReplayMappingBufferEntryCount; // Capture replay mapping buffer entry count
+};
+
+struct CaptureReplayMappingBufferEntry
+{
+    uint32 capturedVa;
+    uint32 replayVa;
+};
+
 constexpr uint32 InitExecuteIndirectConstantsDw = sizeof(InitExecuteIndirectConstants) / sizeof(uint32);
+constexpr uint32 PrepareShadowSbtForReplayConstantsDw = sizeof(PrepareShadowSbtForReplayConstants) / sizeof(uint32);

 #if __cplusplus
 #if GPURT_CLIENT_INTERFACE_MAJOR_VERSION >= 47
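The sketch below mirrors two of the structures added in this hunk to show how the dword count is derived and how a capture-to-replay mapping could be consulted. The diff does not show how the shader actually uses `CaptureReplayMappingBufferEntry`, so the linear search in `TranslateCapturedVa` (a hypothetical function) is purely an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the constants struct from the hunk.
struct PrepareShadowSbtForReplayConstants
{
    uint32_t hitGroupTableStrideInBytes;
    uint32_t hitGroupTableEntryCount;
    uint32_t missTableStrideInBytes;
    uint32_t missTableEntryCount;
    uint32_t callableTableStrideInBytes;
    uint32_t callableTableEntryCount;
    uint32_t captureReplayMappingBufferEntryCount;
};

// Same computation as in the hunk: the size of the struct expressed in 32-bit dwords.
constexpr uint32_t PrepareShadowSbtForReplayConstantsDw =
    sizeof(PrepareShadowSbtForReplayConstants) / sizeof(uint32_t);
static_assert(PrepareShadowSbtForReplayConstantsDw == 7, "Seven 32-bit fields are expected.");

struct CaptureReplayMappingBufferEntry
{
    uint32_t capturedVa;
    uint32_t replayVa;
};

// Assumption: a captured value is translated by finding the entry with a matching capturedVa.
// Returns true and writes *pReplayVa on success; leaves *pReplayVa untouched otherwise.
inline bool TranslateCapturedVa(
    const CaptureReplayMappingBufferEntry* pEntries,
    uint32_t                               entryCount,
    uint32_t                               capturedVa,
    uint32_t*                              pReplayVa)
{
    for (uint32_t i = 0; i < entryCount; ++i)
    {
        if (pEntries[i].capturedVa == capturedVa)
        {
            *pReplayVa = pEntries[i].replayVa;
            return true;
        }
    }
    return false;
}
```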

gpurt/gpurtLib.h

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ namespace GpuRt
 // update their definition of GPURT_CLIENT_INTERFACE_MAJOR_VERSION to indicate that they have made the required changes
 // to support a new version. When the client version is updated, the old interface will be compiled out and only the
 // new one will remain.
-#define GPURT_INTERFACE_MAJOR_VERSION 54
+#define GPURT_INTERFACE_MAJOR_VERSION 56

 #if GPURT_CLIENT_INTERFACE_MAJOR_VERSION < 44
 // Minor interface version. This number is incremented when a compatible interface change is made. Compatible changes
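The comment in this hunk describes how a client opts in to a new interface by raising `GPURT_CLIENT_INTERFACE_MAJOR_VERSION`, at which point the old declarations are compiled out. A minimal sketch of that pattern follows, loosely modeled on the `DispatchRaysTopLevelData` change in gpurtDispatch.h. `TopLevelDataSketch` and `ConstDataIsInline` are hypothetical, the `DispatchRaysConstantData` layout is a placeholder, and defaulting the client version to 56 is an assumption made so the block compiles on its own.

```cpp
#include <cassert>
#include <cstdint>

#ifndef GPURT_CLIENT_INTERFACE_MAJOR_VERSION
#define GPURT_CLIENT_INTERFACE_MAJOR_VERSION 56 // Assumption: the client has adopted version 56.
#endif

// Placeholder layout; the real DispatchRaysConstantData is much larger.
struct DispatchRaysConstantData
{
    uint32_t placeholder[4];
};

// Hypothetical, reduced version of a struct whose layout depends on the client interface version.
struct TopLevelDataSketch
{
#if GPURT_CLIENT_INTERFACE_MAJOR_VERSION < 56
    uint64_t dispatchRaysConstGpuVa;    // Older clients: pointer to the constant data
#else
    DispatchRaysConstantData constData; // Newer clients: constant data stored inline
#endif
};

// Hypothetical helper: reports which layout was compiled in.
constexpr bool ConstDataIsInline()
{
    return GPURT_CLIENT_INTERFACE_MAJOR_VERSION >= 56;
}
```

Storing the data inline is what the commit message calls removing one level of indirection: a reader no longer has to follow a GPU virtual address to reach the constants.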

src/gpurtBvhBatcher.cpp

Lines changed: 5 additions & 0 deletions
@@ -461,6 +461,11 @@ void BvhBatcher::BuildMultiDispatch(Util::Span<BvhBuilder> builders)
         Barrier();
         BuildPhase(BuildPhaseFlags::BuildFastAgglomerativeLbvh, builders, &BvhBuilder::BuildFastAgglomerativeLbvh);
     }
+    if (PhaseEnabled(BuildPhaseFlags::BuildHPLOC))
+    {
+        Barrier(BarrierFlagSyncIndirectArg | BarrierFlagSyncDispatch);
+        BuildPhase(BuildPhaseFlags::BuildHPLOC, builders, &BvhBuilder::BuildHPLOC);
+    }
     if (PhaseEnabled(BuildPhaseFlags::BuildPLOC))
     {
         Barrier();
