Add capture and replay support for VK_ARM_tensors and VK_ARM_data_graph#3009
Add capture and replay support for VK_ARM_tensors and VK_ARM_data_graph#3009fabian-lunarg wants to merge 22 commits into
Conversation
MarkY-LunarG
left a comment
There was a problem hiding this comment.
Additional comments from Claude:
Code Review Findings
[
{
"file": "framework/decode/vulkan_rebind_allocator.cpp",
"line": 3875,
"summary": "GetTensorMemoryRequirementsARM stores capture requirements from uninitialized struct before calling the Vulkan function that populates it.",
"failure_scenario": "resource_alloc_info->capture_mem_reqs is assigned from memory_requirements->memoryRequirements at line 3875, but the Vulkan function
get_tensor_memory_requirements (which fills memory_requirements) is not called until line 3878. The stored capture requirements are whatever garbage was in the caller's
stack-allocated VkMemoryRequirements2. During replay, VMA uses these wrong requirements to allocate backing memory, causing allocation failures or size mismatches."
},
{
"file": "framework/encode/vulkan_capture_manager.h",
"line": 1779,
"summary": "PostProcess_vkBindDataGraphPipelineSessionMemoryARM dereferences GetWrapper() result without a null check.",
"failure_scenario": "GetWrapper() at line 1775 returns the DataGraphPipelineSessionARMWrapper for pBindInfos[i].session. If the session handle is invalid or not yet wrapped (e.g.
trimmed capture resumes mid-frame), wrapper is null and wrapper->memory_bindings.begin() at line 1779 crashes. The analogous PostProcess_vkBindBufferMemory includes an equivalent
null guard."
},
{
"file": "framework/decode/vulkan_rebind_allocator.cpp",
"line": 3812,
"summary": "BindTensorMemory silently masks per-tensor failures when binding multiple tensors: the result variable is overwritten by each iteration, so an early failure is erased
if a later bind succeeds.",
"failure_scenario": "With two tensors: AllocateMemoryForTensor fails for tensor[0] (result = VK_ERROR_*). The loop does not return early; iteration i=1 calls
AllocateMemoryForTensor and succeeds (result = VK_SUCCESS). The function returns VK_SUCCESS, and the caller assumes all binds succeeded. Contrast with
BindDataGraphPipelineSessionMemory, which returns immediately on error."
},
{
"file": "framework/decode/vulkan_replay_consumer_base.cpp",
"line": 13680,
"summary": "Hint format compatibility check uses create_info.flowVectorFormat against info.hint_formats instead of a hint-specific format field.",
"failure_scenario": "VkDataGraphPipelineOpticalFlowCreateInfoARM has no hintFormat field. hint_formats is populated by querying
VK_DATA_GRAPH_OPTICAL_FLOW_IMAGE_USAGE_HINT_BIT_ARM formats, which can differ from the output flow vector formats. The check at line 13680 validates whether the flow vector output
format is in the hint format list — semantically wrong. A pipeline where the hint image format differs from flowVectorFormat will either incorrectly fail compatibility or incorrectly
pass, leading to a creation error or a wrong-device fallback at replay time."
},
{
"file": "framework/encode/vulkan_state_tracker_initializers.h",
"line": 743,
"summary": "pDimensions is dereferenced inside a dimensionCount loop without a null guard.",
"failure_scenario": "When TrackStruct processes a VkTensorDescriptionARM where dimensionCount > 0 but pDimensions == nullptr (malformed capture or fuzzing), the access
desc->pDimensions[i] at line 743 is a null pointer dereference. Per spec conventions dimensionCount > 0 implies non-null pDimensions, but this is a decode/state-restore path that
processes potentially untrusted .gfxr file data."
},
{
"file": "framework/decode/vulkan_replay_consumer_base.cpp",
"line": 13734,
"summary": "data_graph_optical_flow_initialized flag is read and set without synchronization, creating a latent TOCTOU race.",
"failure_scenario": "Two threads both hit OverrideCreateDataGraphPipelinesARM simultaneously on the same physical device for the first time. Both see
data_graph_optical_flow_initialized == false, both enter InitializeReplayDataGraphOpticalFlowInfo, both call clear() and then populate data_graph_optical_flow_infos concurrently,
resulting in a data race on the vector. Currently single-threaded replay prevents this, but it is a latent bug with no protection against future multi-threaded replay dispatch."
}
]
Severity:
- Critical
2 - 3) High
4-5) Medium - Low
Bugs 1–3 are the most actionable: #1 will silently corrupt tensor memory allocation sizes on every rebind-allocator replay, #2 is an unconditional crash if a handle is invalid, and #3 means multi-tensor bind errors go unreported.
| GFXRECON_UNREFERENCED_PARAMETER(desc); | ||
| GFXRECON_UNREFERENCED_PARAMETER(queue_family_index); | ||
| GFXRECON_UNREFERENCED_PARAMETER(data); | ||
| GFXRECON_LOG_WARNING_ONCE("Tensor trim snapshot via staging copy is not yet implemented; " |
There was a problem hiding this comment.
We should create an issue to track this and mention it in a comment above the warning.
There was a problem hiding this comment.
I added the missing pieces (required additions to ResourceUtil, along with some consolidation-refactor).
so instead of an follow-up-issue there is now an additional commit which requires review
went through the list, applied corrections where required (some false positives) |
73311af to
a8a1fb0
Compare
- Updated DataGraphPipelineSessionARMWrapper (pipeline_dependency, pipeline_shader_module_dependencies, pipeline_layout_dependency, pipeline_layout_dependencies)
…kDataGraphPipelineSessionMemoryBinding - vulkan_state_tracker.cpp — implemented both (direct port of the TrackBufferMemoryBinding pattern: wrapper bind fields + bound_assets emplace) - vulkan_capture_manager.h — added PostProcess_vkBindDataGraphPipelineSessionMemoryARM, PostProcess_vkBindTensorMemoryARM, PostProcess_vkCreateTensorViewARM - custom_vulkan_encoder_commands.h — added CustomEncoderPostCall specialisations for all three
- Added empty VulkanTensorViewARMInfo - Added memory_property_flags to the existing VulkanDataGraphPipelineSessionARMInfo
… and data_graph_optical_flow_initialized/data_graph_optical_flow_infos fields to VulkanReplayDeviceInfo - vulkan_replay_consumer_base.h — added OverrideBindDataGraphPipelineSessionMemoryARM near the bind memory overrides, all 7 remaining overrides + InitializeReplayDataGraphOpticalFlowInfo before OverrideGetPastPresentationTimingGOOGLE, and SupportsDataGraphOpticalFlowPipeline in private: - vulkan_replay_consumer_base.cpp — added all 10 implementations (~500 lines) at end of file, with ContainsFormat inlined as a local lambda
Add VulkanRebindAllocator implementations for VK_ARM_tensors and VK_ARM_data_graph extensions: - Add GetTensorMemoryUsage helper (mirrors GetImageMemoryUsage pattern, selects VMA memory type based on tensor usage flags and capture memory properties) - Add CreateTensor/DestroyTensor (allocate ResourceAllocInfo, call through to functions_.create_tensor/destroy_tensor) - Add CreateDataGraphPipelineSession/DestroyDataGraphPipelineSession (same pattern as VideoSession) - Add AllocateMemoryForTensor (queries replay memory requirements via get_tensor_memory_requirements, calls VMA, populates VmaMemoryInfo) - Add BindTensorMemory (uses GetRebindOffsetFromOriginalDeviceMemory, not GetRebindOffsetFromVMA — required for vk-bound resources) - Add BindDataGraphPipelineSessionMemory (queries bind-point requirements at replay, allocates per-bind-point via VMA, binds) - Add GetTensorMemoryRequirementsARM (saves capture requirements into ResourceAllocInfo for later rebind use) - Add kDataGraphSession to MemoryInfoType enum (parallel to kVideoSession — multiple memory bindings per resource) - Add private helper declarations (GetTensorMemoryUsage, AllocateMemoryForTensor) to vulkan_rebind_allocator.h - Wire up 9 tensor/datagraph function pointers in vulkan_replay_consumer_base.cpp after the existing destroy_fence assignment Note: aliasing-group path from ARM fork omitted (that infrastructure does not exist in the main repo); AllocateMemoryForTensor handles the common non-aliased case.
Resolve all compilation errors surfaced after the code generator run:
- vulkan_capture_manager.h: qualify GetWrapper calls with
vulkan_wrappers:: namespace; move PostProcess_vkBind*ARM and
PostProcess_vkCreateTensorViewARM from protected: to public: so
CustomEncoderPostCall dispatch can reach them
- custom_vulkan_struct_decoders.h/cpp + forward header: add hand-written
Decoded_VkDataGraphPipelineConstantARM and DecodeStruct implementation
(variable-length pConstantData read via VkTensorDescriptionARM pNext)
- custom_vulkan_struct_encoders.h/cpp: add EncodeStruct for
VkDataGraphPipelineConstantARM using vkuGetFormatInfo to compute the
tensor constant data size from the pNext VkTensorDescriptionARM
- blacklists.json: blacklist
vkGetPhysicalDeviceQueueFamilyDataGraphEngineOperationPropertiesARM
(uses VkBaseOutStructure* output parameter requiring BASE_OUT_STRUCTURE
generator infrastructure not present in main repo)
- custom_vulkan_api_call_encoders.h/cpp: add encoder wrappers for all
four blacklisted ARM functions: vkCreateDataGraphPipelinesARM (full
capture implementation), vkGetTensorOpaque/ViewCaptureDescriptorDataARM
(no-op stubs), vkGetPhysicalDeviceQueueFamilyDataGraphEngineOperation-
PropertiesARM (pass-through to instance table, not captured)
- vulkan_cpp_structs.h/cpp: add stub GenerateStruct for
VkDataGraphPipelineConstantARM (cpp consumer path not supported)
- custom_vulkan_struct_to_json.h/cpp: add FieldToJson for
Decoded_VkDataGraphPipelineConstantARM
- vulkan_rebind_allocator.cpp: remove capture_id assignment in
CreateTensor (field lives in MemoryAllocInfo, not ResourceAllocInfo,
in the main repo — aliasing group support was not ported)
- BindDataGraphPipelineSessionMemory: guard vma_mem_infos.emplace_back
and UpdateAllocInfo behind null checks on resource_alloc_info /
memory_alloc_info; free the VMA allocation and log a warning when
allocator data is missing (trimmed captures)
- DefaultAllocator::CreateTensor / CreateDataGraphPipelineSession:
call Vulkan before allocating ResourceAllocInfo so the heap object
is not leaked when the driver rejects the creation
- VulkanTensorARMInfo::pDimensions: change vector<uint64_t> to
vector<int64_t>, matching VkTensorDescriptionARM::pDimensions in
the spec and TensorARMWrapper on the encode side
- RebindAllocator::BindTensorMemory: initialise result to VK_SUCCESS
inside the valid-args block so zero-count and all-skipped batches
return success instead of VK_ERROR_INITIALIZATION_FAILED
vkGetPhysicalDeviceQueueFamilyDataGraphEngineOperationPropertiesARM
uses VkBaseOutStructure* as an output parameter whose concrete type
varies by sType. Without proper generator support this was left as a
pass-through stub; the generator changes now replace it.
Generator changes (3 Python files):
- vulkan_struct_decoders_forward_generator.py: append
Decoded_VkBaseOutStructure forward declaration and its
DecodeStruct prototype after the existing base-class call
- vulkan_struct_decoders_body_generator.py: add
write_base_out_struct_decoder() to emit the DecodeStruct
implementation for Decoded_VkBaseOutStructure
- vulkan_struct_to_json_header_generator.py: add
write_base_out_to_json_func() to emit the FieldToJson declaration
Hand-written changes:
- struct_pointer_encoder.h: add VkBaseOutStructure* specialization
of EncodeStructPtr that calls EncodePNextStruct instead of the
generic EncodeStruct (which has no overload for that type)
- custom_vulkan_api_call_encoders.{h,cpp}: remove the now-superseded
stub that called through to the instance table without capturing;
the generated encoder handles this function correctly
…cking
Handle VK_DESCRIPTOR_TYPE_TENSOR_ARM in all VkWriteDescriptorSet switch
statements — handles live in the VkWriteDescriptorSetTensorARM pNext struct,
matching the pattern of VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR:
- mark as pNext-chain type in EncodeStruct, UnwrapStructHandles, and
MapStructHandles so no spurious "unrecognized descriptor type" warnings
are emitted during capture or replay
- allocate tensor_views array in DescriptorInfo for TENSOR_ARM bindings
- populate handle IDs and descriptor_sets_bound_to on write and copy
- unlink tensor views from descriptor sets on DestroyState
- emit VkWriteDescriptorSetTensorARM pNext during trim state write
- validate tensor view handle liveness in CheckDescriptorStatus
Also move VK_DESCRIPTOR_TYPE_TENSOR_ARM to the silent pNext-chain group
in custom_vulkan_struct_to_json.cpp — the data is already serialized via
FieldToJson(pNext), so the warning was a false alarm.
- Add kBindPoint_data_graph to PipelineBindPoints enum and the
corresponding VK_PIPELINE_BIND_POINT_DATA_GRAPH_ARM switch case in
VkPipelinePointToPipelinePoint(); previously hit the error/assert path
- Add explicit DestroyState overloads for TensorARMWrapper and
DataGraphPipelineSessionARMWrapper to preserve create_parameters for
trimmed capture (template fallback would null them)
- Add VK_OBJECT_TYPE_TENSOR_ARM / _TENSOR_VIEW_ARM /
_DATA_GRAPH_PIPELINE_SESSION_ARM cases to the debug name/tag switch
- Initializer specializations capture creation-time dependencies (tensor properties/size, session→pipeline chain, pipeline→shader/layout deps). - State writer wires in tensor/session create, memory bind, and pipeline state reconstruction, following the existing trim patterns. - Adds format::MetaDataType::kInitTensorCommand and InitTensorCommandHeader. - Device-local tensor staging copy stubbed with a warning. # Conflicts: # framework/format/format.h
…hain - Add format::MetaDataType::kInitTensorCommand block parsing, InitTensorArgs payload, and dispatch through ApiDecoder/VulkanDecoderBase to consumers. - ProcessInitTensorCommand replays tensor data via the resource initializer, matching the existing InitBuffer/InitImage pattern.
…pport DataGraphPipelineSessionARMWrapper stored only the last (bindPoint, objectIndex) pair from vkBindDataGraphPipelineSessionMemoryARM, causing trimmed-capture state reconstruction to emit just one bind call per session. Replace the single bind_point/object_index fields with a vector of MemoryBinding entries; PostProcess upserts on (bindPoint, objectIndex) and the state writer iterates all entries. OverrideCreateTensorARM modified the decoded VkTensorDescriptionARM in place via const_cast, which is UB. Copy the description into a local struct and redirect pDescription to it instead.
- add nullptr checks, simplify couple places - codegen: warn for default-case fallthrough in AllocateAppropriate - mark one cmd-injection - add comment about why TRANSFER bits are added - log_warning_once in vkGetTensorOpaqueCaptureDescriptorDataARM
- Implement ReadFromTensorResource via CmdCopyTensorARM - Add OverrideCreateTensorARM: inject VK_TENSOR_USAGE_TRANSFER_SRC_BIT_ARM into tensor creation for trimming. - VulkanResourcesUtil: add StagingTensorContext + Create/Destroy StagingTensor helpers. Consolidate with existing code to allow reuse. - ReadFromTensorResource: creates a staging tensor matching the source description, records a full-tensor CmdCopyTensorARM, submits, maps, and memcopies the result — replacing the FEATURE_NOT_PRESENT stub.
a8a1fb0 to
a2f3877
Compare
MarkY-LunarG
left a comment
There was a problem hiding this comment.
Your new commit looks good too.
Adds initial gfxreconstruct support for two ARM Vulkan extensions:
VK_ARM_tensors: VkTensorARM / VkTensorViewARM objects and their memorybinding, descriptor writes (VK_DESCRIPTOR_TYPE_TENSOR_ARM), and trimmed
capture state reconstruction.
VK_ARM_data_graph: VkDataGraphPipelineARM creation, VkDataGraphPipelineSessionARMcreation/memory binding, and vkCmdDispatchDataGraphARM recording.
Both extensions are removed from
kVulkanUnsupportedDeviceExtensionso thelayer no longer strips them from device feature negotiation.
Capture (encode)
FAIL_ON_COMPILE_REQUIRED early-out and the group-create pattern.
vkBindDataGraphPipelineSessionMemoryARM, and vkCreateTensorViewARM in
VulkanCaptureManager.
TrackDataGraphPipelineSessionMemoryBinding,
VK_DESCRIPTOR_TYPE_TENSOR_ARM descriptor tracking, DestroyState
overloads, and VkObjectType debug name/tag handling for all three new
types.
VkTensorARM, VkTensorViewARM, and VkDataGraphPipelineSessionARM.
Trimmed capture (state writer)
vkBindTensorMemoryARM for all live tensors at trim start.
via a new kInitTensorCommand metadata block
bindings, including recreation of pipelines (and their shader module /
pipeline layout dependencies) that were destroyed before trim start.
descriptor set state writer.
Replay (decode)
initializer on trimmed-capture replay.
OverrideCreateDataGraphPipelineSessionARM,
OverrideBindDataGraphPipelineSessionMemoryARM,
OverrideDestroyTensorARM, OverrideDestroyDataGraphPipelineSessionARM,
OverrideCreateDataGraphPipelinesARM, and
OverrideGetTensorMemoryRequirementsARM added to the replay consumer.
compatibility against the replay device's reported formats.
create infos so the resource initializer can upload snapshot data.
Allocator support
resource and memory-binding entry points.
data graph session memory, including GetTensorMemoryUsage heuristic,
AllocateMemoryForTensor (with VMA), and BindDataGraphPipelineSessionMemory
(queries per-binding requirements from the driver at replay time).
Code generator
decoder header template.
JSON generators extended for all new VK_ARM_tensors / VK_ARM_data_graph
struct types.
Limitations
fixes #2291