Skip to content

Add capture and replay support for VK_ARM_tensors and VK_ARM_data_graph#3009

Open
fabian-lunarg wants to merge 22 commits into
LunarG:devfrom
fabian-lunarg:fabian_vk_arm_tensors_and_data_graph
Open

Add capture and replay support for VK_ARM_tensors and VK_ARM_data_graph#3009
fabian-lunarg wants to merge 22 commits into
LunarG:devfrom
fabian-lunarg:fabian_vk_arm_tensors_and_data_graph

Conversation

@fabian-lunarg

@fabian-lunarg fabian-lunarg commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Adds initial gfxreconstruct support for two ARM Vulkan extensions:

  • VK_ARM_tensors: VkTensorARM / VkTensorViewARM objects and their memory
    binding, descriptor writes (VK_DESCRIPTOR_TYPE_TENSOR_ARM), and trimmed
    capture state reconstruction.
  • VK_ARM_data_graph: VkDataGraphPipelineARM creation, VkDataGraphPipelineSessionARM
    creation/memory binding, and vkCmdDispatchDataGraphARM recording.

Both extensions are removed from kVulkanUnsupportedDeviceExtension so the
layer no longer strips them from device feature negotiation.

Capture (encode)

  • Hand-written encoder for vkCreateDataGraphPipelinesARM to handle the
    FAIL_ON_COMPILE_REQUIRED early-out and the group-create pattern.
  • PostProcess hooks for vkBindTensorMemoryARM,
    vkBindDataGraphPipelineSessionMemoryARM, and vkCreateTensorViewARM in
    VulkanCaptureManager.
  • State tracker additions: TrackTensorMemoryBinding,
    TrackDataGraphPipelineSessionMemoryBinding,
    VK_DESCRIPTOR_TYPE_TENSOR_ARM descriptor tracking, DestroyState
    overloads, and VkObjectType debug name/tag handling for all three new
    types.
  • kBindPoint_data_graph added to PipelineBindPoints.
  • InitializeState / InitializeGroupObjectState specialisations for
    VkTensorARM, VkTensorViewARM, and VkDataGraphPipelineSessionARM.

Trimmed capture (state writer)

  • WriteTensorState / WriteTensorMemoryState emit vkCreateTensorARM and
    vkBindTensorMemoryARM for all live tensors at trim start.
  • WriteTensorSnapshotState / ProcessTensorMemory upload tensor contents
    via a new kInitTensorCommand metadata block
  • WriteDataGraphPipelineSessionMemoryState reconstructs all session memory
    bindings, including recreation of pipelines (and their shader module /
    pipeline layout dependencies) that were destroyed before trim start.
  • VK_DESCRIPTOR_TYPE_TENSOR_ARM descriptor write reconstruction in the
    descriptor set state writer.

Replay (decode)

  • ProcessInitTensorCommand handler uploads tensor data via the resource
    initializer on trimmed-capture replay.
  • OverrideCreateTensorARM, OverrideBindTensorMemoryARM,
    OverrideCreateDataGraphPipelineSessionARM,
    OverrideBindDataGraphPipelineSessionMemoryARM,
    OverrideDestroyTensorARM, OverrideDestroyDataGraphPipelineSessionARM,
    OverrideCreateDataGraphPipelinesARM, and
    OverrideGetTensorMemoryRequirementsARM added to the replay consumer.
  • OverrideCreateDataGraphPipelinesARM checks optical-flow pipeline
    compatibility against the replay device's reported formats.
  • Trimmed-capture replay adds TRANSFER_SRC/DST usage bits to tensor
    create infos so the resource initializer can upload snapshot data.

Allocator support

  • VulkanDefaultAllocator: pass-through implementations for all new
    resource and memory-binding entry points.
  • VulkanRebindAllocator: full rebind implementations for tensor and
    data graph session memory, including GetTensorMemoryUsage heuristic,
    AllocateMemoryForTensor (with VMA), and BindDataGraphPipelineSessionMemory
    (queries per-binding requirements from the driver at replay time).

Code generator

  • Blacklists / replay_overrides updated for all hand-written entry points.
  • Generator fixes for VkBaseOutStructure encoder/decoder wiring and
    decoder header template.
  • Struct decoder, encoder, handle mapper, handle wrapper, deep-copy, and
    JSON generators extended for all new VK_ARM_tensors / VK_ARM_data_graph
    struct types.

Limitations

  • Memory aliasing: Deferred to a follow-up PR.

fixes #2291

@fabian-lunarg fabian-lunarg self-assigned this Jun 8, 2026
@fabian-lunarg fabian-lunarg changed the title Fabian vk arm tensors and data graph Add capture and replay support for VK_ARM_tensors and VK_ARM_data_graph Jun 8, 2026
@fabian-lunarg fabian-lunarg added the approved-to-run-ci Can run CI check on internal LunarG machines label Jun 8, 2026
@fabian-lunarg fabian-lunarg marked this pull request as ready for review June 8, 2026 13:46
@fabian-lunarg fabian-lunarg requested a review from a team as a code owner June 8, 2026 13:46

@MarkY-LunarG MarkY-LunarG left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional comments from Claude:

  Code Review Findings

  [
    {
      "file": "framework/decode/vulkan_rebind_allocator.cpp",
      "line": 3875,
      "summary": "GetTensorMemoryRequirementsARM stores capture requirements from uninitialized struct before calling the Vulkan function that populates it.",
      "failure_scenario": "resource_alloc_info->capture_mem_reqs is assigned from memory_requirements->memoryRequirements at line 3875, but the Vulkan function 
  get_tensor_memory_requirements (which fills memory_requirements) is not called until line 3878. The stored capture requirements are whatever garbage was in the caller's 
  stack-allocated VkMemoryRequirements2. During replay, VMA uses these wrong requirements to allocate backing memory, causing allocation failures or size mismatches."
    },
    {
      "file": "framework/encode/vulkan_capture_manager.h",
      "line": 1779,
      "summary": "PostProcess_vkBindDataGraphPipelineSessionMemoryARM dereferences GetWrapper() result without a null check.",
      "failure_scenario": "GetWrapper() at line 1775 returns the DataGraphPipelineSessionARMWrapper for pBindInfos[i].session. If the session handle is invalid or not yet wrapped (e.g.
  trimmed capture resumes mid-frame), wrapper is null and wrapper->memory_bindings.begin() at line 1779 crashes. The analogous PostProcess_vkBindBufferMemory includes an equivalent 
  null guard."
    },
    {
      "file": "framework/decode/vulkan_rebind_allocator.cpp",
      "line": 3812,
      "summary": "BindTensorMemory silently masks per-tensor failures when binding multiple tensors: the result variable is overwritten by each iteration, so an early failure is erased
  if a later bind succeeds.",
      "failure_scenario": "With two tensors: AllocateMemoryForTensor fails for tensor[0] (result = VK_ERROR_*). The loop does not return early; iteration i=1 calls 
  AllocateMemoryForTensor and succeeds (result = VK_SUCCESS). The function returns VK_SUCCESS, and the caller assumes all binds succeeded. Contrast with 
  BindDataGraphPipelineSessionMemory, which returns immediately on error."
    },
    {
      "file": "framework/decode/vulkan_replay_consumer_base.cpp",
      "line": 13680,
      "summary": "Hint format compatibility check uses create_info.flowVectorFormat against info.hint_formats instead of a hint-specific format field.",
      "failure_scenario": "VkDataGraphPipelineOpticalFlowCreateInfoARM has no hintFormat field. hint_formats is populated by querying 
  VK_DATA_GRAPH_OPTICAL_FLOW_IMAGE_USAGE_HINT_BIT_ARM formats, which can differ from the output flow vector formats. The check at line 13680 validates whether the flow vector output 
  format is in the hint format list — semantically wrong. A pipeline where the hint image format differs from flowVectorFormat will either incorrectly fail compatibility or incorrectly
  pass, leading to a creation error or a wrong-device fallback at replay time."
    },
    {
      "file": "framework/encode/vulkan_state_tracker_initializers.h",
      "line": 743,
      "summary": "pDimensions is dereferenced inside a dimensionCount loop without a null guard.",
      "failure_scenario": "When TrackStruct processes a VkTensorDescriptionARM where dimensionCount > 0 but pDimensions == nullptr (malformed capture or fuzzing), the access 
  desc->pDimensions[i] at line 743 is a null pointer dereference. Per spec conventions dimensionCount > 0 implies non-null pDimensions, but this is a decode/state-restore path that 
  processes potentially untrusted .gfxr file data."
    },
    {
      "file": "framework/decode/vulkan_replay_consumer_base.cpp",
      "line": 13734,
      "summary": "data_graph_optical_flow_initialized flag is read and set without synchronization, creating a latent TOCTOU race.",
      "failure_scenario": "Two threads both hit OverrideCreateDataGraphPipelinesARM simultaneously on the same physical device for the first time. Both see 
  data_graph_optical_flow_initialized == false, both enter InitializeReplayDataGraphOpticalFlowInfo, both call clear() and then populate data_graph_optical_flow_infos concurrently, 
  resulting in a data race on the vector. Currently single-threaded replay prevents this, but it is a latent bug with no protection against future multi-threaded replay dispatch."
    }
  ]

Severity:

  1. Critical
    2 - 3) High
    4-5) Medium
  2. Low

Bugs 1–3 are the most actionable: #1 will silently corrupt tensor memory allocation sizes on every rebind-allocator replay, #2 is an unconditional crash if a handle is invalid, and #3 means multi-tensor bind errors go unreported.

Comment thread framework/decode/custom_vulkan_struct_decoders.cpp Outdated
Comment thread framework/decode/vulkan_default_allocator.cpp
Comment thread framework/decode/vulkan_rebind_allocator.cpp
GFXRECON_UNREFERENCED_PARAMETER(desc);
GFXRECON_UNREFERENCED_PARAMETER(queue_family_index);
GFXRECON_UNREFERENCED_PARAMETER(data);
GFXRECON_LOG_WARNING_ONCE("Tensor trim snapshot via staging copy is not yet implemented; "

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should create an issue to track this and mention it in a comment above the warning.

@fabian-lunarg fabian-lunarg Jun 10, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the missing pieces (required additions to ResourceUtil, along with some consolidation-refactor).
so instead of an follow-up-issue there is now an additional commit which requires review

Comment thread framework/generated/generated_vulkan_struct_decoders.h
Comment thread framework/decode/vulkan_replay_consumer_base.cpp
Comment thread framework/decode/vulkan_replay_consumer_base.cpp
Comment thread framework/decode/vulkan_replay_consumer_base.cpp Outdated
Comment thread framework/encode/custom_vulkan_api_call_encoders.cpp
Comment thread framework/encode/vulkan_state_tracker_initializers.h
@fabian-lunarg

Copy link
Copy Markdown
Contributor Author

Additional comments from Claude:

went through the list, applied corrections where required (some false positives)

@fabian-lunarg fabian-lunarg force-pushed the fabian_vk_arm_tensors_and_data_graph branch 2 times, most recently from 73311af to a8a1fb0 Compare June 9, 2026 11:36
- Updated DataGraphPipelineSessionARMWrapper (pipeline_dependency, pipeline_shader_module_dependencies, pipeline_layout_dependency, pipeline_layout_dependencies)
…kDataGraphPipelineSessionMemoryBinding

- vulkan_state_tracker.cpp — implemented both (direct port of the TrackBufferMemoryBinding pattern: wrapper bind fields + bound_assets emplace)
- vulkan_capture_manager.h — added PostProcess_vkBindDataGraphPipelineSessionMemoryARM, PostProcess_vkBindTensorMemoryARM, PostProcess_vkCreateTensorViewARM
- custom_vulkan_encoder_commands.h — added CustomEncoderPostCall specialisations for all three
- Added empty VulkanTensorViewARMInfo
- Added memory_property_flags to the existing VulkanDataGraphPipelineSessionARMInfo
… and data_graph_optical_flow_initialized/data_graph_optical_flow_infos fields to VulkanReplayDeviceInfo

- vulkan_replay_consumer_base.h — added OverrideBindDataGraphPipelineSessionMemoryARM near the bind memory overrides, all 7 remaining overrides + InitializeReplayDataGraphOpticalFlowInfo before OverrideGetPastPresentationTimingGOOGLE, and SupportsDataGraphOpticalFlowPipeline in private:
- vulkan_replay_consumer_base.cpp — added all 10 implementations (~500 lines) at end of file, with ContainsFormat inlined as a local lambda
Add VulkanRebindAllocator implementations for VK_ARM_tensors and
VK_ARM_data_graph extensions:

- Add GetTensorMemoryUsage helper (mirrors GetImageMemoryUsage pattern,
  selects VMA memory type based on tensor usage flags and capture
  memory properties)
- Add CreateTensor/DestroyTensor (allocate ResourceAllocInfo, call
  through to functions_.create_tensor/destroy_tensor)
- Add CreateDataGraphPipelineSession/DestroyDataGraphPipelineSession
  (same pattern as VideoSession)
- Add AllocateMemoryForTensor (queries replay memory requirements via
  get_tensor_memory_requirements, calls VMA, populates VmaMemoryInfo)
- Add BindTensorMemory (uses GetRebindOffsetFromOriginalDeviceMemory,
  not GetRebindOffsetFromVMA — required for vk-bound resources)
- Add BindDataGraphPipelineSessionMemory (queries bind-point
  requirements at replay, allocates per-bind-point via VMA, binds)
- Add GetTensorMemoryRequirementsARM (saves capture requirements into
  ResourceAllocInfo for later rebind use)
- Add kDataGraphSession to MemoryInfoType enum (parallel to
  kVideoSession — multiple memory bindings per resource)
- Add private helper declarations (GetTensorMemoryUsage,
  AllocateMemoryForTensor) to vulkan_rebind_allocator.h
- Wire up 9 tensor/datagraph function pointers in
  vulkan_replay_consumer_base.cpp after the existing destroy_fence
  assignment

Note: aliasing-group path from ARM fork omitted (that infrastructure
does not exist in the main repo); AllocateMemoryForTensor handles the
common non-aliased case.
  Resolve all compilation errors surfaced after the code generator run:

  - vulkan_capture_manager.h: qualify GetWrapper calls with
    vulkan_wrappers:: namespace; move PostProcess_vkBind*ARM and
    PostProcess_vkCreateTensorViewARM from protected: to public: so
    CustomEncoderPostCall dispatch can reach them

  - custom_vulkan_struct_decoders.h/cpp + forward header: add hand-written
    Decoded_VkDataGraphPipelineConstantARM and DecodeStruct implementation
    (variable-length pConstantData read via VkTensorDescriptionARM pNext)

  - custom_vulkan_struct_encoders.h/cpp: add EncodeStruct for
    VkDataGraphPipelineConstantARM using vkuGetFormatInfo to compute the
    tensor constant data size from the pNext VkTensorDescriptionARM

  - blacklists.json: blacklist
    vkGetPhysicalDeviceQueueFamilyDataGraphEngineOperationPropertiesARM
    (uses VkBaseOutStructure* output parameter requiring BASE_OUT_STRUCTURE
    generator infrastructure not present in main repo)

  - custom_vulkan_api_call_encoders.h/cpp: add encoder wrappers for all
    four blacklisted ARM functions: vkCreateDataGraphPipelinesARM (full
    capture implementation), vkGetTensorOpaque/ViewCaptureDescriptorDataARM
    (no-op stubs), vkGetPhysicalDeviceQueueFamilyDataGraphEngineOperation-
    PropertiesARM (pass-through to instance table, not captured)

  - vulkan_cpp_structs.h/cpp: add stub GenerateStruct for
    VkDataGraphPipelineConstantARM (cpp consumer path not supported)

  - custom_vulkan_struct_to_json.h/cpp: add FieldToJson for
    Decoded_VkDataGraphPipelineConstantARM

  - vulkan_rebind_allocator.cpp: remove capture_id assignment in
    CreateTensor (field lives in MemoryAllocInfo, not ResourceAllocInfo,
    in the main repo — aliasing group support was not ported)
  - BindDataGraphPipelineSessionMemory: guard vma_mem_infos.emplace_back
    and UpdateAllocInfo behind null checks on resource_alloc_info /
    memory_alloc_info; free the VMA allocation and log a warning when
    allocator data is missing (trimmed captures)

  - DefaultAllocator::CreateTensor / CreateDataGraphPipelineSession:
    call Vulkan before allocating ResourceAllocInfo so the heap object
    is not leaked when the driver rejects the creation

  - VulkanTensorARMInfo::pDimensions: change vector<uint64_t> to
    vector<int64_t>, matching VkTensorDescriptionARM::pDimensions in
    the spec and TensorARMWrapper on the encode side

  - RebindAllocator::BindTensorMemory: initialise result to VK_SUCCESS
    inside the valid-args block so zero-count and all-skipped batches
    return success instead of VK_ERROR_INITIALIZATION_FAILED
  vkGetPhysicalDeviceQueueFamilyDataGraphEngineOperationPropertiesARM
  uses VkBaseOutStructure* as an output parameter whose concrete type
  varies by sType. Without proper generator support this was left as a
  pass-through stub; the generator changes now replace it.

  Generator changes (3 Python files):
  - vulkan_struct_decoders_forward_generator.py: append
    Decoded_VkBaseOutStructure forward declaration and its
    DecodeStruct prototype after the existing base-class call
  - vulkan_struct_decoders_body_generator.py: add
    write_base_out_struct_decoder() to emit the DecodeStruct
    implementation for Decoded_VkBaseOutStructure
  - vulkan_struct_to_json_header_generator.py: add
    write_base_out_to_json_func() to emit the FieldToJson declaration

  Hand-written changes:
  - struct_pointer_encoder.h: add VkBaseOutStructure* specialization
    of EncodeStructPtr that calls EncodePNextStruct instead of the
    generic EncodeStruct (which has no overload for that type)
  - custom_vulkan_api_call_encoders.{h,cpp}: remove the now-superseded
    stub that called through to the instance table without capturing;
    the generated encoder handles this function correctly
…cking

  Handle VK_DESCRIPTOR_TYPE_TENSOR_ARM in all VkWriteDescriptorSet switch
  statements — handles live in the VkWriteDescriptorSetTensorARM pNext struct,
  matching the pattern of VK_DESCRIPTOR_TYPE_ACCELERATION_STRUCTURE_KHR:

  - mark as pNext-chain type in EncodeStruct, UnwrapStructHandles, and
    MapStructHandles so no spurious "unrecognized descriptor type" warnings
    are emitted during capture or replay
  - allocate tensor_views array in DescriptorInfo for TENSOR_ARM bindings
  - populate handle IDs and descriptor_sets_bound_to on write and copy
  - unlink tensor views from descriptor sets on DestroyState
  - emit VkWriteDescriptorSetTensorARM pNext during trim state write
  - validate tensor view handle liveness in CheckDescriptorStatus

  Also move VK_DESCRIPTOR_TYPE_TENSOR_ARM to the silent pNext-chain group
  in custom_vulkan_struct_to_json.cpp — the data is already serialized via
  FieldToJson(pNext), so the warning was a false alarm.
  - Add kBindPoint_data_graph to PipelineBindPoints enum and the
    corresponding VK_PIPELINE_BIND_POINT_DATA_GRAPH_ARM switch case in
    VkPipelinePointToPipelinePoint(); previously hit the error/assert path
  - Add explicit DestroyState overloads for TensorARMWrapper and
    DataGraphPipelineSessionARMWrapper to preserve create_parameters for
    trimmed capture (template fallback would null them)
  - Add VK_OBJECT_TYPE_TENSOR_ARM / _TENSOR_VIEW_ARM /
    _DATA_GRAPH_PIPELINE_SESSION_ARM cases to the debug name/tag switch
- Initializer specializations capture creation-time dependencies (tensor
  properties/size, session→pipeline chain, pipeline→shader/layout deps).
- State writer wires in tensor/session create, memory bind, and pipeline
  state reconstruction, following the existing trim patterns.
- Adds format::MetaDataType::kInitTensorCommand and InitTensorCommandHeader.
- Device-local tensor staging copy stubbed with a warning.

# Conflicts:
#	framework/format/format.h
…hain

- Add format::MetaDataType::kInitTensorCommand block parsing, InitTensorArgs
  payload, and dispatch through ApiDecoder/VulkanDecoderBase to consumers.
- ProcessInitTensorCommand replays tensor data via the resource initializer,
  matching the existing InitBuffer/InitImage pattern.
…pport

  DataGraphPipelineSessionARMWrapper stored only the last
  (bindPoint, objectIndex) pair from vkBindDataGraphPipelineSessionMemoryARM,
  causing trimmed-capture state reconstruction to emit just one bind call
  per session. Replace the single bind_point/object_index fields with a
  vector of MemoryBinding entries; PostProcess upserts on (bindPoint,
  objectIndex) and the state writer iterates all entries.

  OverrideCreateTensorARM modified the decoded VkTensorDescriptionARM in
  place via const_cast, which is UB. Copy the description into a local
  struct and redirect pDescription to it instead.
- add nullptr checks, simplify couple places
- codegen: warn for default-case fallthrough in AllocateAppropriate
- mark one cmd-injection
- add comment about why TRANSFER bits are added
- log_warning_once in vkGetTensorOpaqueCaptureDescriptorDataARM
- Implement ReadFromTensorResource via CmdCopyTensorARM
- Add OverrideCreateTensorARM:
  inject VK_TENSOR_USAGE_TRANSFER_SRC_BIT_ARM into tensor creation for trimming.
- VulkanResourcesUtil: add StagingTensorContext + Create/Destroy StagingTensor helpers.
  Consolidate with existing code to allow reuse.
- ReadFromTensorResource: creates a staging tensor matching the source
  description, records a full-tensor CmdCopyTensorARM, submits, maps, and
  memcopies the result — replacing the FEATURE_NOT_PRESENT stub.
@fabian-lunarg fabian-lunarg force-pushed the fabian_vk_arm_tensors_and_data_graph branch from a8a1fb0 to a2f3877 Compare June 10, 2026 07:26

@MarkY-LunarG MarkY-LunarG left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your new commit looks good too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved-to-run-ci Can run CI check on internal LunarG machines

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add support for VK_ARM_tensors and VK_ARM_data_graph

2 participants