Skip to content

Vulkan validation layer and decoder errors and assert fixes#199

Merged
zlatinski merged 7 commits intomainfrom
vulkan-validation-layer-errors-fixes
Feb 10, 2026
Merged

Vulkan validation layer and decoder errors and assert fixes#199
zlatinski merged 7 commits intomainfrom
vulkan-validation-layer-errors-fixes

Conversation

@zlatinski
Copy link
Contributor

No description provided.

@zlatinski
Copy link
Contributor Author

zlatinski commented Feb 4, 2026

dabrain34, @srinathkr-nv and @krajunv, please review these changes.


// For show existing frame, skip stale reference invalidation and return early.
// Show existing frames don't have a valid frameEncodeEncodeOrderNum since they
// don't represent a new encoded frame - they just display a previously encoded one.
Copy link
Contributor

@krajunv krajunv Feb 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a new problem. I did not see any issue when encoding. Either it was holding correct state or initialized to some non-negative value.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just noticed that the assert() was added with commit#1a883255 and I think this check uncovered the issue in debug mode.

Your change looks good to me.

// Show existing frames don't have a valid frameEncodeEncodeOrderNum since they
// don't represent a new encoded frame - they just display a previously encoded one.
if (pFrameInfo->bShowExistingFrame) {
pFrameInfo->stdPictureInfo.refresh_frame_flags = (uint8_t)m_dpbAV1->GetRefreshFrameFlags(pFrameInfo->bShownKeyFrameOrSwitch, pFrameInfo->bShowExistingFrame);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding this line is fine, but updating refresh_frame_flags isn’t required because it isn’t encoded when show_existing_frame is 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking at the changes, @krajunv! Can you please be more specific on the change you want me to make?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line doesn’t affect the show_existing_frame=1 frame header encoding and can be safely removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the refresh_frame_flags is not present in the bitstream when show_existing_frame=1. But we would still want to keep this line because:

Vulkan API validation - The driver may still validate the struct fields even if it doesn't encode them. Setting it to 0 ensures no validation warnings about unexpected non-zero values.

Internal DPB tracking - While the driver won't write it to the bitstream, it may still use the value internally to verify no reference buffer updates are being requested.

Defensive coding - If a future Vulkan implementation or validation layer starts checking this field, having it correctly set to 0 prevents issues.

Bottom line: It's technically unnecessary for bitstream generation, but harmless and follows the principle of keeping the struct consistent with what the encode operation semantically means (no reference updates).

Do you have a strong objection on this, @krajunv?

When creating combined image views for multi-planar YCbCr formats that
have STORAGE_BIT in their image usage, the view must restrict its usage
via VkImageViewUsageCreateInfo since multi-planar formats don't support
storage operations (only per-plane R8/RG8 views do).

This fixes VUID-VkImageViewCreateInfo-usage-02275 validation errors.

Reference: https://gitlab.khronos.org/vulkan/vulkan/-/issues/4624
Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
…YCbCr

Multi-planar formats require VkSamplerYcbcrConversion for sampling.
Combined image views created without YCbCr conversion cannot have
SAMPLED_BIT in their usage. Use VkImageViewUsageCreateInfo to exclude
both STORAGE_BIT and SAMPLED_BIT from combined views.

This fixes VUID-VkImageViewCreateInfo-format-06415 validation errors.

Reference: https://gitlab.khronos.org/vulkan/vulkan/-/issues/4624
Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
…formats

The samplerYcbcrConversion feature must be enabled during device creation
to use VkSamplerYcbcrConversion objects with multi-planar YCbCr formats.

This fixes VUID-vkCreateSamplerYcbcrConversion-None-01648 validation error.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
1. Set layerCount=0 when rateControlMode is DEFAULT or DISABLED
   (fixes VUID-VkVideoEncodeRateControlInfoKHR-rateControlMode-08248)

2. Ensure idrPeriod >= gopFrameCount by setting idrPeriod=0 when
   gopFrameCount is larger (e.g., UINT32_MAX for infinite GOP)
   (fixes VUID-VkVideoEncodeH264RateControlInfoKHR-idrPeriod-08284)

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
When a new sequence starts (not a reconfigure), the DPB is reset via
Deinit() which sets m_dpbMaxSize=0. However, m_pictureToDpbSlotMap and
m_dpbSlotsMask still contained old values from the previous sequence.

When ResetPicDpbSlots() tried to free old slots, it called FreeSlot()
with slot indices that were valid in the old DPB but exceed the new
m_dpbMaxSize, causing an assertion failure:
  'assert((uint8_t)slot < m_dpbMaxSize)' failed

The fix clears m_pictureToDpbSlotMap and m_dpbSlotsMask before
initializing the new DPB when starting a fresh sequence.

This fixes the H264 clip-c test which has multiple resolution changes
(352x288 -> 176x144) within the same stream.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
For SVC (scalable video coding) content with multiple spatial layers,
the tile count across all layers in a single frame can exceed the
64-element limit of the tileOffsets[] and tileSizes[] arrays.

Without bounds checking, writes beyond element 64 would overflow into
the pStdSps pointer field, corrupting it with tile offset/size data.
This caused segfaults when decoding SVC content like L2T1 (2 spatial
layers, 1 temporal layer) streams.

GDB analysis showed:
- tileCount reached 70-170 across multi-layer frames
- pStdSps was corrupted to values like 0x31d2000031d0 (tile offsets)
- Crash occurred when dereferencing the corrupted pointer

Fix: Add bounds check in ParseObuTileGroup() to reject frames with
more than 64 tiles, preventing the buffer overflow.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
…dling)

When encoding AV1 with B-frames (consecutiveBFrameCount > 0), the
encoder creates "show existing frame" P-frames that display previously
encoded frames. These frames have bShowExistingFrame=true.

The issue was in ProcessDpb():
- frameEncodeEncodeOrderNum is only assigned in StartOfVideoCodingEncodeOrder
  when bShowExistingFrame=false
- ProcessDpb called InvalidateStaleReferenceFrames() with the uninitialized
  frameEncodeEncodeOrderNum (UINT64_MAX) before checking bShowExistingFrame
- This triggered the assertion: frameEncodeEncodeOrderNum <= UINT32_MAX

Fix: Move the bShowExistingFrame early return check before the assertion
and InvalidateStaleReferenceFrames() call. Show existing frames don't need
stale reference invalidation since they don't encode a new frame.

Command line that triggered this crash:
  vk-video-enc-test -i input.yuv -c av1 --inputWidth 352 --inputHeight 288 \
    --inputChromaSubsampling 420 --numFrames 30 -o output.ivf \
    --gopFrameCount 16 --consecutiveBFrameCount 3

This fixes the AV1_gop_16_b3 encoder test.

Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
@zlatinski zlatinski force-pushed the vulkan-validation-layer-errors-fixes branch from b68bce2 to 3ef4fd1 Compare February 10, 2026 20:25
@zlatinski zlatinski merged commit 842ba4a into main Feb 10, 2026
1 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants