Encoder profiles json and drm format mod support#205
Open
Encoder profiles json and drm format mod support#205
Conversation
…t chain Fix missing VkVideoDecodeH264/H265/AV1DpbSlotInfoKHR in the pNext chain of pDecodeInfo->pSetupReferenceSlot for all codecs that require it. The Vulkan spec requires that when a video session is created with a decode codec operation and pSetupReferenceSlot is not NULL, the pNext chain must include the codec-specific DPB slot info structure: - H.264: VkVideoDecodeH264DpbSlotInfoKHR (VUID-07156) - H.265: VkVideoDecodeH265DpbSlotInfoKHR (VUID-07157) - AV1: VkVideoDecodeAV1DpbSlotInfoKHR (VUID-07170) - VP9: No DPB slot info struct defined by the extension Previously, setupReferenceSlot was initialized with pNext=NULL and never wired to a codec-specific DPB slot info. The nvVideoH264PicParameters struct already had an unused currentDpbSlotInfo member for this purpose. Changes: - H.264: Use existing h264.currentDpbSlotInfo, initialize with current picture's FrameNum, PicOrderCnt, and field flags - H.265: Add currentDpbSlotInfo to nvVideoH265PicParameters, initialize with current picture's PicOrderCntVal - AV1: Add currentDpbSlotInfo to nvVideoAV1PicParameters, initialize with current picture's frame_type and OrderHint - VP9: No change needed (VK_KHR_video_decode_vp9 has no DPB slot info) The setup reference slot's DPB info tells the driver/validation layer what reference metadata to associate with the DPB slot being activated. Without it, the validation layer could not track DPB slot activation, causing cascading VUID-vkCmdBeginVideoCodingKHR-slotIndex-07239 errors. Fixes: VUID-vkCmdDecodeVideoKHR-pDecodeInfo-07156 Fixes: VUID-vkCmdDecodeVideoKHR-pDecodeInfo-07157 Fixes: VUID-vkCmdDecodeVideoKHR-pDecodeInfo-07170 Ref: KhronosGroup/Vulkan-ValidationLayers#11531 Ref: #183 Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
The Vulkan spec requires that vkUpdateVideoSessionParametersKHR's pUpdateInfo->updateSequenceCount must equal the current update sequence counter of videoSessionParameters plus one. The counter starts at 0 after vkCreateVideoSessionParametersKHR and increments after each successful update. Previously, the code used GetUpdateSequenceCount() from the picture parameters set, which starts at 0, resulting in the first update passing updateSequenceCount=0 instead of the required 1. Fix by tracking the update counter (m_updateCount) in VkParserVideoPictureParameters and using ++m_updateCount for each vkUpdateVideoSessionParametersKHR call. On failure, the counter is rolled back so the next attempt uses the same value. Fixes: VUID-vkUpdateVideoSessionParametersKHR-pUpdateInfo-07215 Ref: #183 Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Fix two barrier issues in DecodePictureWithParameters(): 1. Queue family ownership transfer without matching release (VUID-03879): The bitstream buffer and DPB image barriers had asymmetric queue family indices: srcQueueFamilyIndex=VK_QUEUE_FAMILY_IGNORED but dstQueueFamilyIndex=videoDecodeQueueFamilyIdx. Per Vulkan spec, when src and dst queue families differ, it's treated as an ownership transfer operation requiring a matching release on the source queue. Since these are simple host-write to video-decode-read barriers (not actual queue family transfers), both must be VK_QUEUE_FAMILY_IGNORED. 2. HOST_WRITE access without HOST stage (VUID-03917): The bitstream buffer barrier had srcStageMask=VK_PIPELINE_STAGE_2_NONE with srcAccessMask=VK_ACCESS_2_HOST_WRITE_BIT. Per Vulkan spec, HOST_WRITE access requires VK_PIPELINE_STAGE_2_HOST_BIT stage mask. Fixes: VUID-vkQueueSubmit2-commandBuffer-03879 Fixes: VUID-VkBufferMemoryBarrier2-srcAccessMask-03917 Ref: #183 Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Add g_ignoredValidationMessageIds[] array to VulkanDeviceContext.cpp, matching the pattern from nvpro_core2/nvvk/context.cpp. Filter known validation layer false positives by messageIdNumber in the debug report callback before printing to stderr. Suppressed VUIDs (all VVL false positives, not application bugs): 1. VUID-VkDeviceCreateInfo-pNext-pNext (0x901f59ec): Private/provisional extension struct type 1000552004 not recognized by VVL 1.4.313. Harmlessly skipped by the driver's pNext traversal. Resolves when VVL headers are updated. 2. VUID-VkImageViewCreateInfo-image-01762 (0x6516b437): VVL does not track VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT for video-profile-bound images (VkVideoProfileListInfoKHR in pNext). DPB images ARE created with MUTABLE_FORMAT_BIT, per-plane views use PLANE_0_BIT/PLANE_1_BIT aspects (not COLOR_BIT). Neither clause of the VUID condition applies. 3. VUID-vkCmdBeginVideoCodingKHR-slotIndex-07239 (0xc36d9e29): Cascading from VUID-01762. DPB slots are correctly activated via pSetupReferenceSlot with codec-specific DPB slot info pNext. VVL's internal state tracking is confused by the image false positives on the same video session. Note: VVL 1.4.313 uses VK_EXT_debug_utils internally for message output. The decoder's VK_EXT_debug_report callback filters our own stderr output but cannot suppress VVL's direct output. Full suppression requires either upgrading to VK_EXT_debug_utils or waiting for the VVL false positives to be fixed upstream. Ref: KhronosGroup/Vulkan-ValidationLayers#11531 Ref: #183 Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
…ppression Fix VkVideoDecodeAV1ProfileInfoKHR default initialization: zero-initialize the struct before setting fields to avoid leaving filmGrainSupport as garbage (32767). VVL reports UNASSIGNED-GeneralParameterError-UnrecognizedBool32. Also add VP9 capabilities pNext suppression (0xc1bea994) for the provisional VkVideoDecodeVP9CapabilitiesKHR struct type 1000514001 not recognized by VVL 1.4.313. Note: The debug report callback suppression (g_ignoredValidationMessageIds) does not actually filter VVL output because VK_EXT_debug_report's msg_code parameter does not correspond to VK_EXT_debug_utils' messageIdNumber. Migration to VK_EXT_debug_utils is needed for the suppression to work. Fixes: UNASSIGNED-GeneralParameterError-UnrecognizedBool32 (0xa320b052) Ref: #183 Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Replace the deprecated VK_EXT_debug_report callback with VK_EXT_debug_utils messenger for validation layer output. VK_EXT_debug_utils provides messageIdNumber in the callback data, which matches the hex MessageID shown in validation error output. This enables reliable filtering of known VVL false positives by their numeric ID, matching the pattern from nvpro_core2/nvvk/context.cpp (g_ignoredValidationMessageIds). Changes: - Add VK_EXT_debug_utils function pointers to HelpersDispatchTable - Add DebugUtilsMessengerCallback static method to VulkanDeviceContext - InitDebugReport() now prefers debug_utils if available, falls back to debug_report if not - Request VK_EXT_DEBUG_UTILS_EXTENSION_NAME as the instance extension - Add destroy path for VkDebugUtilsMessengerEXT - Add suppression entries for all known VVL false positives: * pNext unknown struct types (0x901f59ec, 0xc1bea994) * MUTABLE_FORMAT_BIT tracking for video images (0x6516b437) * DPB slot activation tracking (0xc36d9e29) * H.265 maxDpbSlots (0xf095f12f) * AV1 filmGrainSupport Bool32 (0xa320b052) * VP9 provisional extension warning (0x297ec5be) * ImageViewUsageCreateInfo usage=0 (0x1f778da5) * Multiplanar subresource layout aspect (0x4148a5e9) Tested with validation enabled (-v) and --noPresent across all codecs: H.264 -- CLEAN H.265 -- CLEAN AV1 -- CLEAN VP9 -- CLEAN Note: Display path (without --noPresent) + validation crashes in the NVIDIA driver (nvVkV3DecoderH264_v2.cpp reflist_P_process) due to VVL handle wrapping bug. This is the known issue from KhronosGroup/Vulkan-ValidationLayers#11531, fixed in VVL PR #11605. Without validation, display works correctly for all codecs. Ref: KhronosGroup/Vulkan-ValidationLayers#11531 Ref: #183 Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
The nvVideoDecodeAV1DpbSlotInfo::Init() assert checked slotIndex < TOTAL_REFS_PER_FRAME (8), which is the dpbRefList array size. But Init() is also called for the setup reference slot's currentDpbSlotInfo (introduced in commit b3617df2), which is a standalone member not bounded by the dpbRefList array. The setup slot's slotIndex can be any valid DPB slot index (0 to MAX_DPB_REF_AND_SETUP_SLOTS-1). This caused an assert failure when slotIndex >= 8, which happens with AV1 streams that use all 8 reference frame slots (indices 0-7) and the current frame gets assigned index 8. Fix: Change the assert bound to MAX_DPB_REF_AND_SETUP_SLOTS which is the actual maximum valid DPB slot index. Fixes: Assert failure with av1content_selected/128x128_420_8le.ivf Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
Fix srcBufferOffset and srcBufferRange alignment to satisfy Vulkan spec requirements for vkCmdDecodeVideoKHR (VUID-07131, VUID-07139). Problem ------- The parser's bitstreamDataOffset and bitstreamDataLen values were passed directly into VkVideoDecodeInfoKHR without any alignment, causing validation errors on H.264, H.265, and AV1 (VP9 already handled this). Parser Buffer Architecture -------------------------- The NvVideoParser manages bitstream buffers as follows: 1. Buffers are allocated via GetBitstreamBuffer() with size rounded up to minBitstreamBufferSizeAlignment (typically 256 bytes). 2. The parser fills the buffer with compressed frame data sequentially. When a frame boundary is detected (end_of_picture), the parser reports bitstreamDataOffset (where frame data starts in the buffer) and bitstreamDataLen (exact byte count of the frame's NAL units). 3. The buffer often contains BOTH the current frame's data AND the beginning of the next frame's data (residual). After the decode command is submitted, swapBitstreamBuffer() copies this residual data to a new aligned buffer for the next frame. 4. For H.264/H.265 (NAL-based codecs via VulkanVideoDecoder:: end_of_picture), bitstreamDataOffset is always 0 -- the frame data starts at the buffer beginning. 5. For VP9, the parser explicitly handles alignment in VulkanVP9Decoder::ParseFrameHeader (line 251-261): offset is aligned down, internal offsets are adjusted, and bitstreamDataLen is aligned up -- all at the parser level. 6. For AV1, bitstreamDataOffset is 0 (set in VulkanAV1Decoder:: end_of_picture). srcBufferOffset Fix ------------------- For H.264/H.265/AV1: Assert that bitstreamDataOffset is 0 (enforced by the parser architecture). Force to 0 as a safety net if violated. For VP9: Trust the parser's alignment (already correct). srcBufferRange Fix (per-codec) ------------------------------ H.265, AV1, VP9: Round up bitstreamDataLen to minBitstreamBufferSizeAlignment. These codecs use explicit slice segment offsets (pSliceSegmentOffsets) or tile sizes (pTileSizes) for decode boundaries. NVDEC ignores bytes beyond the last slice/tile, so the residual data in the alignment padding area is harmless. H.264: Pass exact bitstreamDataLen WITHOUT rounding up. NVDEC's H.264 decoder uses srcBufferRange to bound its start-code scan (searching for 00 00 01 patterns). The buffer's residual area beyond bitstreamDataLen contains the next frame's data, which starts with a valid start code. Rounding up exposes this start code to the NAL scanner, causing decode corruption. Suppress VUID-07139 for H.264. The proper fix requires handling alignment in the H.264 parser (like VP9 does), but that is a larger change to NvVideoParser's ByteStreamParser buffer management. IMPORTANT: The bytes beyond bitstreamDataLen must NOT be zero-filled. They contain the next frame's residual data that swapBitstreamBuffer() copies after the decode returns. Zero-filling destroys this data and corrupts all subsequent frames. Also fix VulkanBitstreamBufferImpl::GetSizeAlignment() which incorrectly returned VkMemoryRequirements::alignment instead of m_bufferSizeAlignment (the minBitstreamBufferSizeAlignment from VkVideoCapabilitiesKHR). Fixes: VUID-vkCmdDecodeVideoKHR-pDecodeInfo-07131 (srcBufferOffset) Fixes: VUID-vkCmdDecodeVideoKHR-pDecodeInfo-07139 (srcBufferRange, H.265/AV1/VP9) Suppresses: VUID-07139 for H.264 (requires parser-level fix) Ref: KhronosGroup/Vulkan-ValidationLayers#11531 Ref: #183 Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
When FFmpeg's demuxer reports an invalid or unknown profile (e.g.,
profile=0 for raw .265/.264 files without container metadata, or
mis-tagged Baseline for interlaced H.264), default to a safe profile:
H.264: Default to HIGH (100) -- superset of Baseline/Main, handles
interlaced, CABAC, B-slices, weighted prediction. Matches NVCUVID.
H.265: Default to MAIN (1) -- covers most 8-bit 4:2:0 content.
AV1: Default to MAIN (0).
The fix is in FFmpegDemuxer::GetProfileIdc() so it covers both
VulkanVideoProcessor::Initialize() and VkVideoDecoder::StartVideoSequence()
code paths. A warning is printed when a default is used.
Additionally, VkVideoDecoder::StartVideoSequence() retries with
upgraded profiles (Baseline→Main→High for H.264, Main→Main10 for
H.265) if the initial capabilities query fails, as a second line of
defense when the parser-reported profile differs from the demuxer.
Fixes: Assert on 1080i-25-H264.mkv (interlaced Baseline)
Fixes: Assert on 2024-05-03_14-55-55_1080p_p1_vbv2_5Mbps.265 (raw H.265)
Signed-off-by: Tony Zlatinski <tzlatinski@nvidia.com>
…cripts Add comprehensive documentation for the DRM format modifier test suite: Design & Architecture: - DESIGN.md: Test application architecture and data flow - DRM_Format_Mod_Architecture.svg: Visual architecture diagram - README.md: Usage guide, CLI options, environment variables, compression testing instructions Validation Layer: - VALIDATION_LAYER_CRASH_REPORT.md: Validation layer crash report with spec analysis (NVIDIA-specific, Intel unaffected) - 0001-state_tracker-*.patch: NULL check fix for plane_info in UpdateBindImageMemoryInfo - run_tests_with_patched_layer.sh: Script to build and run tests with the patched validation layer
…sync New vulkan_video_encoder_ext.h adds: - VkVideoEncoderConfig: Structured config (alternative to argc/argv) - VkVideoEncodeInputFrame: External frame descriptor with VkImage, format, layout, frame ID, PTS, force-IDR, QP override, and wait/signal semaphore arrays for timeline semaphore synchronization - VkVideoEncodeResult: Encoded frame result with bitstream pointer, size, picture type, IDR flag, and DTS - VulkanVideoEncoderExt: Extended interface with InitializeExt(), SubmitExternalFrame(), GetEncodedFrame(), Flush(), Reconfigure() - CreateVulkanVideoEncoderExt() factory function This extends the existing VulkanVideoEncoder without breaking backward compatibility. The base file-based interface remains for existing apps. The extended interface is for cross-process encoder services that receive frames via DMA-BUF import with timeline semaphore sync.
ENCODER_EXTERNAL_FRAME_INPUT_DESIGN.md documents: - Current frame path analysis (LoadNextFrame -> StageInputFrame -> EncodeFrameCommon -> AssembleBitstreamData) with all internal types - Three input paths slicing through the pipeline: A) Optimal YCbCr: zero-copy direct inject into srcEncodeImageResource B) Linear YCbCr: inject into srcStagingImageView, upload only C) RGBA (any tiling): inject + filter (RGBA->YCbCr via compute) - Path selection logic based on format and tiling - New VkVideoEncoder internal methods: SetExternalInputFrame(), SetExternalInputImage() with wait/signal semaphore arrays - New VkVideoEncodeFrameInfo fields: externalInputImage (ref-counted), isExternalInput, inputWait/SignalSemaphores - Async bitstream retrieval: RequestBitstreamBuffer() returns fence, PollBitstreamReady() non-blocking check, GetBitstreamData() read - Per-frame timeline semaphore synchronization flow diagram - Future thread pool design for bitstream assembly - 4-phase implementation plan - Backward compatibility analysis Updated vulkan_video_encoder_ext.h with: - PollEncodeComplete(frameId) for non-blocking completion check - ReleaseEncodedFrame(frameId) for explicit buffer pool return - GetEncodeFence(frameId) for external fence wait (thread pool)
Updated ENCODER_EXTERNAL_FRAME_INPUT_DESIGN.md with detailed analysis of common operations embedded in each pipeline stage that are NOT related to file I/O or staging but MUST be replicated by the new external frame input interface: LoadNextFrame() side-effects: - frameInputOrderNum assignment (monotonic counter) - lastFrame flag, QP map loading, pool acquisition StageInputFrame() side-effects: - srcEncodeImageResource pool acquisition - inputCmdBuffer acquisition and fence reset - Image layout transitions, row/col replication padding - AQ subsampled image acquisition - QP map staging in same command buffer - CRITICAL: calls EncodeFrameCommon() at the end (line 519) EncodeFrameCommon() side-effects: - constQp, videoSession/Parameters, qualityLevel assignment - GOP position calculation, codec-specific EncodeFrame() - Rate control commands, QP map processing, intra-refresh - Bitstream buffer acquisition, AQ processing, EnqueueFrame() SubmitStagedInputFrame() side-effects: - Binary semaphore + fence signal, queue submit - Where external wait/signal semaphores must be injected Added "Summary: What SetExternalInputFrame Must Do" pseudocode showing exactly how the new method replicates these operations for each path.
Key design decisions:
1. EncodeFrameCommon() is NOT a problem - it stays unchanged as the
common tail both file-based and external paths converge into.
2. The new SetExternalInputFrame() must replicate side-effects from
LoadNextFrame() and StageInputFrame() only:
- frameInputOrderNum/lastFrame/inputTimeStamp assignment
- External QP map staging (from IPC, not file)
- Wrapping external VkImage as VulkanVideoImagePoolNode
3. ALL input paths route through StageInputFrame(), even optimal YCbCr
(Path A). Rationale:
- DPB safety: encoder may hold srcEncodeImageResource as reference
across multiple frames. Copying to internal pool image releases
the external image immediately after the staging copy.
- Minimal changes: reuse existing pool acquisition, cmd buffer
management, and the chain to EncodeFrameCommon().
- Single semaphore injection point: SubmitStagedInputFrame().
4. SubmitStagedInputFrame() is the ONLY function that needs modification:
add external wait/signal semaphores from encodeFrameInfo into
VkSubmitInfo2KHR.
5. New helper: WrapExternalImage() -> VulkanVideoImagePoolNode
Creates non-owning wrapper around external VkImage/VkDeviceMemory.
Requires new CreateFromExternalImage() on VkImageResource and
CreateExternal() on VulkanVideoImagePoolNode.
Core implementation of the extended encoder interface for accepting externally-provided VkImages with timeline semaphore synchronization. VkImageResource: - Add CreateFromExternal() static factory for non-owning wrappers - Add m_ownsResources flag; Destroy() skips vkDestroyImage when false VulkanVideoImagePoolNode: - Add CreateExternal() static factory for non-owning pool nodes - Sets up m_imageResourceView and m_pictureResourceInfo from external image, with no parent pool (won't return to pool on release) VkVideoEncodeFrameInfo (VkVideoEncoder.h): - Add isExternalInput flag - Add inputWaitSemaphores/Values/DstStageMasks vectors - Add inputSignalSemaphores/Values vectors - Add ClearExternalInputSync() helper VkVideoEncoder: - Add SetExternalInputFrame(): replicates LoadNextFrame() bookkeeping (frameInputOrderNum, lastFrame, pts), wraps external image as non-owning pool node, stores sync semaphores, calls StageInputFrame() - Add WrapExternalImage(): creates VkImageResource + VkImageResourceView + VulkanVideoImagePoolNode non-owning wrappers from raw VkImage - Modify SubmitStagedInputFrame(): injects external wait semaphores into VkSubmitInfo2KHR waitSemaphoreInfos and external signal semaphores into signalSemaphoreInfos All paths route through StageInputFrame() for DPB safety - even optimal YCbCr gets copied to internal pool image so external frame can be released immediately after staging copy completes. Build verified: shared lib, static lib, and test app all compile clean.
Concrete implementation of the VulkanVideoEncoderExt public interface that wraps VkVideoEncoder for cross-process encoder service use. VulkanVideoEncoderExtImpl implements all methods: InitializeExt(VkVideoEncoderConfig): - Builds EncoderConfig from structured config (bridges to argc/argv internally until EncoderConfig supports direct field assignment) - Initializes VulkanDeviceContext with encode+compute+transfer queues - Creates VkVideoEncoder via CreateVideoEncoder() - Sets streaming mode (numFrames=UINT32_MAX, no input file) SubmitExternalFrame(VkVideoEncodeInputFrame): - Gets available pool node from encoder - Delegates to VkVideoEncoder::SetExternalInputFrame() with the external VkImage, sync semaphores, frame ID, and PTS - Tracks submitted frames in m_pendingFrames deque for async retrieval - Returns VK_NOT_READY if pool is full (caller retries) PollEncodeComplete(frameId): - Checks vkGetFenceStatus on the encode command buffer's fence GetEncodedFrame(result): - FIFO: checks oldest pending frame's fence - Fills VkVideoEncodeResult with frame metadata - Frame stays in pending queue until ReleaseEncodedFrame() ReleaseEncodedFrame(frameId): - Releases encodeFrameInfo ref (returns resources to pools) - Removes from pending queue GetEncodeFence(frameId): - Returns the encode command buffer's fence for external wait Flush(): - WaitForThreadsToComplete() + drain pending queue Also implements backward-compatible Initialize()/EncodeNextFrame() for file-based encoding via the base VulkanVideoEncoder interface. Factory: CreateVulkanVideoEncoderExt() exported from shared library. Build verified: shared lib, static lib, test app all compile clean.
When the encoder service also has a display window, both the encode
staging copy and the display blit read the same imported external
image. The release semaphore must only fire after BOTH operations
complete, otherwise the producer could overwrite the frame while
the display is still reading it.
Solution: SubmitExternalFrame returns the staging completion binary
semaphore via optional pStagingCompleteSemaphore output. The encoder
service chains its display submit to wait on this semaphore, then
signals the release semaphore from the display submit:
Encode staging submit:
wait: graphSemaphore (frame ready)
cmd: copy imported -> encoder internal pool
signal: stagingCompleteSem (binary, returned to caller)
// NO release semaphore here
Display submit (chained after):
wait: stagingCompleteSem (encoder done reading)
wait: imageAvailableSem (swapchain)
cmd: blit imported -> swapchain
signal: releaseSemaphore = frameId (NOW safe to reuse)
signal: renderFinishedSem (for present)
This ensures the producer only gets the release signal after both
the encoder and display are done reading the external image.
For optimal YCbCr input (NV12/P010), the external image now goes
directly to vkCmdEncodeVideoKHR as srcEncodeImageResource — no staging
copy, no filter, zero intermediate processing. This is correct because
the encoder's DPB (reconstructed reference frames) is managed internally
in separate dpbImageResources[]; the input image is only read once as
the source picture for that frame.
SetExternalInputFrame() now has two paths:
- Path A (optimal YCbCr): WrapExternalImage -> srcEncodeImageResource,
skip StageInputFrame, go directly to EncodeFrameCommon()
- Path B/C (linear or RGBA): WrapExternalImage -> srcStagingImageView,
go through StageInputFrame as before
SubmitVideoCodingCmds() changes:
- Enable encodeCmdBuffer's binary semaphore for Path A (was VK_NULL_HANDLE)
- Inject external wait semaphores (graph sem) when isExternalInput &&
!inputCmdBuffer (direct encode, no staging)
- Increase wait/signal max counts (4->8 wait, 1->4 signal)
- Fix duplicate signalSemaphoreInfo assignment
SubmitExternalFrame() pStagingCompleteSemaphore now returns:
- Path A: encodeCmdBuffer's semaphore (signaled when encode done reading)
- Path B/C: inputCmdBuffer's semaphore (signaled when staging done)
Sync chain for encoder service with display:
Encode (encode queue):
wait: graphSemaphore -> cmd: vkCmdEncodeVideoKHR -> signal: encodeInputDoneSem
Display (graphics queue, chained after):
wait: encodeInputDoneSem -> cmd: blit -> signal: releaseSemaphore
Updated design doc: Path A skips StageInputFrame, DPB is separate.
Critical:
- Query index (Path A external): When srcEncodeImageResource->GetImageIndex()
is negative (external pool node, m_parentIndex == -1), use query slot 0
instead of (uint32_t)-1 in SubmitVideoCodingCmds and in the result
retrieval path. Avoids invalid query pool index.
- VkImageResource CreateFromExternal null deref:
- In constructor: skip vulkanDeviceMemory->GetMemoryPropertyFlags() and
host-visible layout block when vulkanDeviceMemory is null (external
wrapper has no VulkanDeviceMemoryImpl).
- GetDeviceMemory() / GetImageDeviceMemory(): return VK_NULL_HANDLE when
m_vulkanDeviceMemory is null instead of dereferencing.
High:
- Output path: VkVideoEncoderConfig now has outputPath (const char*).
BuildEncoderConfig() adds --output and path to argv when set so
per-encoder output flows from ThreadedRenderingVk/encoder instance to
encoder library. EncoderInstance sets encConfig.outputPath in
ThreadedRenderingVk_Standalone repo.
…ernalInputFrame - Add imageTiling to VkVideoEncodeInputFrame (default VK_IMAGE_TILING_OPTIMAL) - SubmitExternalFrame uses frame.imageTiling for path selection (direct vs staging) - Fixes wrong path when external frame is LINEAR or DRM_FORMAT_MODIFIER_EXT
CreateDebugUtilsMessengerEXT and DestroyDebugUtilsMessengerEXT are extension functions and were not in the dispatch table. Load them in InitDebugReport and store in m_createDebugUtilsMessengerEXT and m_destroyDebugUtilsMessengerEXT; use these in InitDebugReport and destructor.
DecoderConfig.h: parse deviceID and CRC init values with std::from_chars. Helpers.h: parse UUID hex bytes with std::from_chars. Avoids reliance on glibc strtoul/strtoull for portability.
…bs/json - Add vk_video_encoder/json_config/ with schema, example, default JSON and defaults doc - Add EncoderConfigJsonLoader (simdjson) in vk_video_encoder/libs/json/ - VkEncoderConfig: LoadFromJsonFile(), --encoderConfig; help/docs point to json_config/
- Add vk_video_encoder/json_config/nvidia/ with preset JSONs aligned to NVIDIA Video Codec SDK tuning (High Quality, Low Latency, Ultra-low Latency, Lossless) and P1–P7 presets. - high_quality_p1..p7.json: VBR, 250 GOP, 3 B-frames, qualityPreset 1–7. - low_latency_p1..p3.json: CBR, 0 B-frames, 30 GOP. - ultra_low_latency_p1.json: CBR, minimal VBV, 15 GOP. - lossless.json: CQP with QP 0 for I/P/B. - README.md: documents tuning/preset model and usage. - PreferredSettings_extracted.md: extracted reference from PreferredSettings (2).xlsx (sheet1–4: tuning params, preset names, per-tuning settings, legacy NVENC mapping).
On Intel (vendor 0x8086), re-importing a DMA-BUF that was exported from a single-plane LINEAR image returns VK_ERROR_INVALID_EXTERNAL_HANDLE. Multi-plane LINEAR (NV12, P010) export/import works. This is a known driver limitation. - Cache physical device vendor ID in init() (m_vendorID). - In runExportImportTest(), when useLinear is true and the format is single-plane (planeCount == 1) and vendor is Intel, skip the test with message: "Intel: single-plane LINEAR DMA-BUF import returns VK_ERROR_INVALID_EXTERNAL_HANDLE (driver limitation)". Result: 12 previously failing TC3_ExportImport_*_LINEAR tests become SKIP on Intel; no failures. NV12/P010 LINEAR still pass.
Profile fixes (align with PreferredSettings / NVIDIA Video Codec SDK ToT): - Set gopLength 250, idrPeriod 250 for high-quality presets (was 60). - Map qualityPreset to SDK P1–P7 (e.g. high_quality_p4 → qualityPreset 3). - Add tuningMode to preset JSONs (highquality, lowlatency, ultralowlatency, lossless). Documentation: - Expand vk_video_encoder/json_config/nvidia/README.md with tuning/preset tables, usage, and references to PreferredSettings_extracted.md. - Add preset_review_report.md with preset-to-parameter review.
- Add scripts/run_encoder_profile_tests.py: runs all NVIDIA preset JSON configs (vk_video_encoder/json_config/nvidia/*.json) against vk-video-enc-test; supports local and SSH-remote (e.g. GPU VM), optional validation, codec/ profile filters; output format mirrors run_encoder_tests.py. - Add scripts/run_encoder_profile_tests.sh: shell wrapper for the profile runner. - Add docs/VIDEO_TEST_SUITE.md: overview of decoder/encoder test scripts and codec support, including the new encoder profile sweep. - Update common/libs/tests/drm_format_mod/docs/STATUS_REPORT.md and vk_video_encoder/json_config/nvidia/README.md with status and usage.
VIDEO_TEST_SUITE: add Encoder Test Content section note and example for ThreadedRenderingVk_Standalone/scripts/generate_encoder_yuv.sh. run_encoder_profile_tests.py: docstring points to that script for generating YUV at required resolutions and names.
…0R10) The RGBA2YCBCR shader generator always emitted separate outputImageY/Cb/Cr writes which fails for packed formats (Y410 = A2B10G10R10_UNORM_PACK32) since YcbcrVkFormatInfo() returns nullptr for non-multiplanar formats. Fix: detect packed output (outputMpInfo == nullptr) in InitRGBA2YCBCR and emit a single packed write to outputImageRGB with correct channel mapping (A2B10G10R10: R=Cr, G=Y, B=Cb, A=1). Also relax the planeNum assert in UpdateImageDescriptorSets: for packed formats with VK_IMAGE_ASPECT_COLOR_BIT at curImageAspect==0, only the combined view is used (GetImageView()), so plane count doesn't matter.
…410) When outputMpInfo is null (packed format like A2B10G10R10), the dispatch grid defaulted to chromaHorzRatio=2, chromaVertRatio=2 (4:2:0 assumption). For Y410 this is wrong — it's 4:4:4, so the ratio should be 1:1. Result was only 1/4 of the output image getting written (top-left quarter). Fix: default to ratio 1 when outputMpInfo is null. All 4 dispatch sites in the file had this bug (image→image, buffer→image, and AQ variants).
CLI -c overrides JSON codec field, so each profile JSON serves as a base config that gets tested with all 3 codecs. --codec flag filters to one. Test names now: nvidia/high_quality_p4/h264, nvidia/high_quality_p4/h265, etc. Output files: 1920x1080_420_8le_h265_nvidia_high_quality_p4.265 Tested: 13 profiles × 3 codecs = 33 passed, 2 skipped (P6/P7 unsupported).
run_decoder_roundtrip.py: decodes all bitstreams in a directory using vulkan-video-dec-test to verify encode→decode roundtrip. Discovers .264, .265, .ivf files and reports pass/fail for each. Supports --filter, --verbose, --timeout, auto-detects decoder path. Tested: 33 bitstreams (11 profiles × 3 codecs), all decoded successfully.
play_encoded_ffplay.py: plays .264/.265/.ivf files with ffplay. Supports --filter, --play (single file loop), sequential batch playback. Press Q to advance, Ctrl+C to stop.
README sections added/updated: - §14: encoder profile tests with multi-codec, auto-detect, options table - §15: decoder roundtrip (run_decoder_roundtrip.py) + visual playback (play_encoded_ffplay.py) with options tables - §16: complete end-to-end workflow (generate → verify → encode → decode → play)
Add reference to verify_yuv_ffplay.sh and encoder_yuv_generation.md docs.
Partial fix for encoder instance IPC mode where frames come from the renderer via DMA-BUF import (no input file): VkEncoderConfig.cpp: - Skip input file requirement when no -i specified (external frame mode) - Skip numFrames clamping to file frame count in external mode - Replaced assert with error return + args dump for debugging vulkan_video_encoder_ext.cpp: - Pass -i /dev/null as workaround (doesn't work - mmap fails) - Only pass --qpI/--qpP when > 0 (avoid unnecessary CQP mode) - Use 1000000 frames instead of UINT32_MAX for streaming mode - Added debug argv dump KNOWN ISSUE: ParseArguments still fails because the input file handler tries to mmap the file. Need proper refactor to separate file-based vs external frame input paths in EncoderConfig initialization.
Three bugs fixed: 1. vulkan_video_encoder_ext.cpp: Wrong CLI arg names in BuildEncoderConfig. --averageBitRate/--maxBitRate → --averageBitrate/--maxBitrate (lowercase 'r') --frameRateNum/--frameRateDen removed (no such CLI args in ParseArguments) These caused unknown positional args → DoParseArguments returns -1. 2. VkEncoderConfig.cpp: Skip input file requirement for external frame mode. When no -i is given, skip file handler mmap/validation, use --inputWidth/ --inputHeight directly. Skip numFrames clamping to file frame count. 3. VkEncoderConfig.cpp: Replace assert with error return + args dump for debugging when ParseArguments fails. Status: ParseArguments now succeeds for IPC mode. EncoderInstance initializes and receives frame FDs. Next step: encoder frame import + encode pipeline.
Identifies that InitVulkanDevice (Vulkan instance + device creation) is the failing step when the encoder library runs in the child process. BuildEncoderConfig succeeds, but InitVulkanDevice never returns.
Step-by-step stderr+fflush traces identify InitPhysicalDevice as the failing call in the encoder child process. LoadVk, InitVkInstance, InitDebugReport all succeed, but InitPhysicalDevice never returns.
VulkanVideoEncoderExt: - Fix NULL pointer crash in DeviceUuidUtils ctor (memcpy from nullptr) - Fix deviceId=0 filtering out GPUs (use -1 for auto-select) - Always request compute queue (VulkanFilter needs it) - Add GetVkDevice/GetVkPhysicalDevice/GetVkInstance to ext API - Correctly strip video bits from DRM modifier format query VkVideoEncoder: - Add DRM_FORMAT_MODIFIER_EXT to isDirectlyEncodable check - Add STORAGE_BIT to WrapExternalImage usage - Fix WrapExternalImage layout to TRANSFER_SRC_OPTIMAL for staging VulkanDeviceContext: - Add fprintf traces in InitPhysicalDevice for debugging drm_format_mod_test: - Add --video-encode and --video-decode flags - Enable VK_KHR_video_queue/encode_queue/decode_queue extensions - Add VIDEO_ENCODE_SRC/VIDEO_DECODE_DST to image creation - Skip LINEAR modifier for video (NVDEC/NVENC require tiled) - Add drm-video-issue-readme.txt documenting the driver issue
TC5 (runVideoFormatQueryTest): - Queries vkGetPhysicalDeviceVideoFormatPropertiesKHR separately for encode (VIDEO_ENCODE_SRC) and decode (VIDEO_DECODE_DST) - Uses H.264 High 4:2:0 8-bit profile for the query - Verifies the target format is in the returned list - Prints format properties (tiling, usage, flags) in verbose mode TC6 (runPlaneLayoutTest): - Creates exportable image (tiled or linear) - Queries plane layouts (offset, size, rowPitch, arrayPitch, depthPitch) - Exports DMA-BUF, imports with same parameters - Queries imported image layouts - Compares export vs import: offset, rowPitch, size must match - Validates plane offsets are increasing for multi-planar formats - Reports mismatches as FAIL Both tests are dispatched in runAllTests when --video-encode or --video-decode is specified.
importDmaBufImage was destroying the imported image and returning VK_SUCCESS with a null outImage. This caused TC6 (PlaneLayoutTest) to segfault when querying the imported image's plane layouts. Fix: wrap the imported image+memory in VkImageResource::CreateFromExternal and return it to the caller. Also add null check for dstImage before accessing it in TC6.
importDmaBufImage creates raw VkImage + VkDeviceMemory handles and wraps them in VkImageResource::CreateFromExternal (non-owning). The wrapper's destructor doesn't destroy the raw handles. Fix: track raw handles in m_importedHandles vector. The new destroyImportedImage() method looks up the handles by VkImage and destroys both image and memory. The destructor also cleans up any remaining handles. Previously leaked 8 allocations (6304 bytes), now 0 leaks.
Add VkImageResource::CreateFromImport() that takes ownership of both the VkImage and VkDeviceMemory handles. When the last VkSharedBaseObj reference drops, the handles are destroyed automatically via the existing ref-counted RAII pattern (VkVideoRefCountBase). VulkanDeviceMemoryImpl: add public constructor from pre-allocated VkDeviceMemory handle. Deinitialize() calls vkFreeMemory as usual. drm_format_mod_test: replace manual handle tracking (m_importedHandles vector + destroyImportedImage) with CreateFromImport. The imported images are now cleaned up automatically when VkSharedBaseObj goes out of scope. Zero leaks, zero manual cleanup code.
Y4M C420p16 outputs decoder P010 (MSB) as-is. No conversion needed.
VkVideoFrameToFile wrote a hardcoded F24:1 in the Y4M header regardless of the actual stream frame rate. This caused ffmpeg PSNR comparisons to fail due to timebase mismatch (e.g. 30fps stream vs 24fps Y4M). Changes: - VkVideoFrameOutput.h: add virtual SetFrameRate(num, den) with empty default so existing consumers are unaffected - VkVideoFrameToFile.cpp: store m_frameRateNum/m_frameRateDen (default 30/1), use them in Y4M header instead of hardcoded F24:1 - VulkanVideoProcessor.cpp: call SetFrameRate() with the stream's frame_rate from VkParserDetectedVideoFormat before each OutputFrame()
When encoding externally-imported frames (e.g. from DMA-BUF), the StageInputFrame() transition used VK_IMAGE_LAYOUT_UNDEFINED as the old layout. This discards image contents per the Vulkan spec and produces scrambled encoded output. Changes: - VkVideoEncodeFrameInfo: add srcExternalImageLayout field to track the layout the producer left the image in (e.g. GENERAL for compute output) - SetExternalInputFrame(): accept srcImageCurrentLayout parameter and store it in encodeFrameInfo - StageInputFrame(): use srcExternalImageLayout (not UNDEFINED) as the old layout for external inputs, preserving image contents through the transition to TRANSFER_SRC_OPTIMAL - SubmitExternalFrame(): forward frame.currentLayout to the new parameter This fixes scrambled encode output when the renderer's PostProcessFilter writes frames in VK_IMAGE_LAYOUT_GENERAL and exports them via DMA-BUF.
…aces Adds ability to create encoder input images with specific DRM format modifier tiling instead of VK_IMAGE_TILING_OPTIMAL. This reproduces the scrambled output bug seen in the renderer-encoder pipeline. Changes: - VkEncoderConfig: add drmFormatModifierIndex (-1=disabled) and selectedDrmFormatModifier fields, --drmFormatModifierIndex CLI param - VkVideoEncoder: add SelectDrmFormatModifier() that queries available modifiers with VIDEO_ENCODE_SRC usage, skips LINEAR, prints decoded NVIDIA modifier info (compression, GOB height, pageKind, etc.) - VulkanVideoImagePool::Configure: add optional drmFormatModifier param that creates images with VK_IMAGE_TILING_DRM_FORMAT_MODIFIER_EXT and proper VUID-compliant pNext chain (format list + modifier list) Test results on RTX 5080 (dev driver 610.01): - Without --drmFormatModifierIndex: PSNR ~40 dB (correct, OPTIMAL) - With --drmFormatModifierIndex 0 (compressed BL): PSNR ~11 dB (broken) - With --drmFormatModifierIndex 5 (uncompressed BL): PSNR ~11 dB (broken) - Both compressed and uncompressed block-linear produce identical garbled output, confirming the bug is in DRM modifier tiling + video encode, not specific to compression.
…xternal input Fix Bug 3 (WrapExternalImage multiplanar view assert): - UpdateImageDescriptorSets: trim validImageAspects to match the view's actual plane count. The default m_inputImageAspects includes all 3 plane bits (PLANE_0|PLANE_1|PLANE_2), but 2-plane formats like NV12/P010 only have 2 planes. The loop iterated into PLANE_2 causing the assert planeNum < imageView->GetNumberOfPlanes() to fire (2 < 2). WrapExternalImage: add MUTABLE_FORMAT_BIT, EXTENDED_USAGE_BIT, and planeUsageOverride for multiplanar per-plane storage views needed by the compute filter. StageInputFrame: skip the encoder's preprocess compute filter when isExternalInput is true. External DMA-BUF frames from the renderer are already in the target NV12 format. The compute filter would do storage reads on the imported DRM modifier image which causes GPU faults on compressed block-linear memory. TransitionImageLayout: add GENERAL->TRANSFER_SRC_OPTIMAL, UNDEFINED->GENERAL, and GENERAL->VIDEO_ENCODE_SRC_KHR transitions needed by the external input staging path.
Each passed profile now reports total time, ms/frame, and enc-fps inline. Summary includes a timing table for all passed profiles.
The encoder library's VkDevice was missing VK_KHR_external_semaphore and VK_KHR_external_semaphore_fd extensions. When the encoder service runs in headless mode, it reuses the encoder library's VkDevice for semaphore export (release semaphore sent to parent via READY handshake). Without these extensions, vkGetSemaphoreFdKHR was NULL after volkLoadDevice(), crashing the child process before READY was sent.
Extend the decoder library to support streaming decoded YCbCr surfaces to external consumers (presenters, encoders) via DMA-BUF export: - SemSyncTypeIdx: add PRESENTER/ENCODER tokens, shift 2→4 - VulkanVideoFrameBuffer: external consumer semaphore array, CPU wait before slot reuse, AddExternalConsumer(), ExportFrameCompleteSemaphoreFd() - VulkanDisplayFrame: numExternalConsumers + doneValues tracking - VkVideoDecoder: ENABLE_EXTERNAL_CONSUMER_EXPORT flag - VulkanVideoProcessor: wire enableExternalConsumerExport from config - vulkan_video_decoder: GetDevice()/GetPhysicalDevice()/GetInstance() getters, auto-create VkVideoFrameOutput from config outputFileName
…s used Set enableExternalConsumerExport=true in the --remotePresent handler so decoded images get SAMPLED_BIT + TRANSFER_SRC_BIT usage flags, required for DMA-BUF export to external presenter/encoder consumers.
Add VkExportSemaphoreCreateInfo to the frameComplete timeline semaphore creation chain so external consumers can import it via opaque FD for cross-process GPU synchronization.
Add forwarding methods through the interface chain so external decoder services can register consumer release semaphores: - VulkanVideoDecoder (interface): AddExternalConsumer, ExportFrameCompleteSemaphoreFd - VulkanVideoDecoderImpl: forward to VulkanVideoProcessor - VulkanVideoProcessor: forward to VulkanVideoFrameBuffer The frame buffer CPU-waits on registered external consumer semaphores before reusing decoded frame slots, preventing slot overwrite when the decoder runs faster than consumers can display.
…nabled When enableExternalConsumerExport is set, use VkImageResource::CreateExportable() instead of Create() for decoded frame images. This sets up: - VkExternalMemoryImageCreateInfo with DMA_BUF handle type - VkExportMemoryAllocateInfo for proper DMA-BUF export - VK_IMAGE_TILING_DRM_FORMAT_MODIFIER_EXT with selected modifier - Memory plane layouts queryable via GetMemoryPlaneLayout() Applies to DPB images in coincide mode, output images in distinct mode, and both for AV1 film grain. ImageSpec: add exportHandleTypes + exportDrmModifier fields. VulkanVideoFrameBuffer::CreateImage: use CreateExportable() when set.
Add two options for DRM modifier selection when exporting decoded surfaces to external consumers: 1. Block height: prefer smallest (default) or largest GOB height 2. Compression: prefer compressed (c>0) or uncompressed (c=0, default) If an explicit DRM modifier index is specified (--drmModifierIndex), it is used unconditionally, bypassing the preference logic. Log all available modifiers with decoded NVIDIA parameters (compression, block height, plane count, features). DecoderConfig: exportPreferCompressed, exportPreferSmallestBlockHeight VkVideoDecoder: SetExportPreferences(), m_exportDrmModifierIndex
Consolidate DRM format modifier handling that was duplicated across the
encoder (VkVideoEncoder.cpp), decoder (VkVideoDecoder.cpp), and the
ThreadedRenderingVk pipeline (ExternalMemory.h) into a single shared
header-only utility class.
New file: common/libs/VkCodecUtils/VkDrmFormatModifierUtils.h
The class provides:
Static methods (all platforms):
- Vendor-aware modifier decoding (NVIDIA, AMD, Intel, ARM, QCOM)
- NVIDIA block-linear field extraction: block height, page kind,
generation, sector layout, compression type
- PrintModifierInfo() and ModifierToString() for debug output
- IsLinear(), IsCompressed() with vendor-aware detection
Instance methods (Linux only, guarded with #ifdef __linux__):
- QueryModifiers(): enumerate DRM modifiers via
vkGetPhysicalDeviceFormatProperties2 with
VkDrmFormatModifierPropertiesListEXT
- SelectModifier(): select best modifier with configurable preferences
for block height (smallest/largest), compression (prefer/avoid),
explicit index override, and linear fallback
- DumpAvailableModifiers(): debug dump with per-modifier feature flags
and NVIDIA field decoding
Encoder changes (VkVideoEncoder.cpp):
- Remove duplicated PrintNvidiaDrmModifierInfo() static function
- Replace SelectDrmFormatModifier() body with VkDrmFormatModifierUtils
calls (DumpAvailableModifiers + SelectModifier + PrintModifierInfo)
- Guard function body with #ifdef __linux__
Decoder changes (VkVideoDecoder.cpp):
- Remove duplicated inline lambdas (getNvModCompression,
getNvModBlockHeight), ModCandidate struct, and sorting logic
- Replace ~120 lines of modifier selection with VkDrmFormatModifierUtils
calls (~20 lines)
- Guard external consumer export block with #ifdef __linux__
Tested on NVIDIA GeForce RTX 5080 (VM):
- DRM format modifier tests: 123 passed, 0 failed (default)
- DRM format modifier tests: 131 passed, 0 failed (compression)
- Decoder service tests: 50/50 PASS (H.264, H.265, AV1, all resolutions)
- DRM modifier cycling (--cycle-drm-modifiers): 8/8 PASS
VkVideoDecoder's constructor initialises m_enableExportPreferCompressed to VK_TRUE, expressing the intent that L2-compressed DRM modifiers are preferred for DMA-BUF export. However, VulkanVideoProcessor always calls SetExportPreferences(programConfig.exportPreferCompressed, ...) immediately after construction, overriding that default. Because DecoderConfig::Reset() initialised exportPreferCompressed to false, and no command-line argument ever sets it to true, m_enableExportPreferCompressed was silently forced to VK_FALSE on every run. Result: SelectModifier picked the c=0 (uncompressed) block-linear modifier even though compressed variants with identical block height were available and supported — confirmed by the dump output showing modifier [5] (c=0,h=1) selected instead of [0] (c=1,h=1). Fix: align the DecoderConfig default with the VkVideoDecoder constructor intent by initialising exportPreferCompressed = true.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.