@JanuszL
Contributor

@JanuszL JanuszL commented Oct 24, 2025

Category:

Other (e.g. Documentation, Tests, Configuration)

Description:

  • Removes TextureObject class and related texture management code
  • Replaces texture-based convert_frame with direct VideoColorSpaceConversion
  • Removes textures_ member and get_textures method
  • Unifies the video reader with the experimental one by removing texture usage,
    which doesn't provide much gain in this use case

Additional information:

Affected modules and functionalities:

  • video reader

Key points relevant for the review:

  • argument passed to the VideoColorSpaceConversion function

Tests:

  • Existing tests apply
    • VideoReaderTest_readers__Video
    • test_video_reader.py
    • test_video_pipeline.py
    • test_video_reader_resize.py
  • New tests added
    • Python tests
    • GTests
    • Benchmark
    • Other
  • N/A

Checklist

Documentation

  • Existing documentation applies
  • Documentation updated
    • Docstring
    • Doxygen
    • RST
    • Jupyter
    • Other
  • N/A

DALI team only

Requirements

  • Implements new requirements
  • Affects existing requirements
  • N/A

REQ IDs: N/A

JIRA TASK: DALI-4474

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR refactors the legacy NvDecoder video reader by removing texture-based frame processing and replacing it with direct VideoColorSpaceConversion calls. The TextureObject class and all texture caching infrastructure have been eliminated, simplifying the codebase and aligning the legacy reader with the experimental video reader implementation. The convert_frame method has been inlined into receive_frames, eliminating the indirection through process_frame and the textures_ map that cached CUDA texture objects for luma and chroma planes. This change maintains functionality while reducing code complexity by approximately 100 lines across the header and implementation files.

Important Files Changed

Filename | Score | Overview
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 2/5 | Removed TextureObject class, texture management structs, and get_textures method; orphaned TexID and tex_hash remain unused
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 2/5 | Inlined color space conversion into receive_frames with hardcoded stride calculation that breaks non-RGB modes

Confidence score: 2/5

  • This PR contains a critical bug in the stride calculation and incomplete cleanup that will cause failures in production
  • Score reflects hardcoded stride sequence.width * 3 that assumes 3 channels (RGB only) instead of using sequence.channels, breaking YUV and other non-RGB output modes; unused TexID/tex_hash definitions suggest incomplete refactoring; missing full-range handling for non-RGB conversions in lines 389-392
  • Pay close attention to dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc lines 398 and 389-392, and verify whether TexID/tex_hash in the header file can be safely removed

Additional Comments (1)

  1. dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 179-200 (link)

    style: TexID and tex_hash are now unused after texture removal but remain in the header. Are TexID and tex_hash still needed for any other purpose, or should they be removed along with the texture-related code?

2 files reviewed, 4 comments

TYPE_SWITCH(dtype_, type2id, OutType, NVDECODER_SUPPORTED_TYPES, (
auto* tensor_out = sequence.sequence.mutable_data<OutType>() + frame_stride;
VideoColorSpaceConversion(
reinterpret_cast<OutType *>(tensor_out), sequence.width * 3,

logic: Stride calculation uses sequence.width * 3 but should use sequence.width * sequence.channels to match the frame_stride calculation pattern

Contributor Author

Fixed
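
For readers following the stride discussion, here is a minimal sketch of the arithmetic involved. It assumes the stride is counted in elements of a densely packed, interleaved (HWC) output frame. `SequenceShape`, `RowStride`, and `FrameOffset` are hypothetical names introduced only for illustration; only `width` and `channels` are taken from the thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the shape fields of the sequence wrapper.
// Only `width` and `channels` are named in the thread; `height` is assumed.
struct SequenceShape {
  int height;
  int width;
  int channels;
};

// Row stride, in elements, of a densely packed interleaved (HWC) frame.
// Using `channels` instead of a literal 3 keeps this valid for any channel count.
inline std::size_t RowStride(const SequenceShape &s) {
  assert(s.width > 0 && s.channels > 0);
  return static_cast<std::size_t>(s.width) * static_cast<std::size_t>(s.channels);
}

// Offset, in elements, of frame `index` within a contiguous sequence buffer.
inline std::size_t FrameOffset(const SequenceShape &s, int index) {
  assert(index >= 0 && s.height > 0);
  return static_cast<std::size_t>(index) * static_cast<std::size_t>(s.height) * RowStride(s);
}
```

With a 1280x720 three-channel sequence, `RowStride` gives 3840 elements, which matches `width * 3` only because `channels` happens to be 3.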

auto conversion_type = rgb_ ?
is_full_range ? VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_TO_RGB_FULL_RANGE :
VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_TO_RGB :
VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_UPSAMPLE;

logic: The conversion_type ternary doesn't cover the YUV case when !rgb_ with full_range=true. Should full_range affect YUV_UPSAMPLE? Should VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_UPSAMPLE have a full-range variant?

Contributor Author

That is intended. is_full_range is applicable only to the RGB conversion.
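
The selection logic discussed above can be restated as a small standalone function. This is only a sketch: the enum below is a local stand-in that reuses the enumerator names quoted in the diff, and `SelectConversionType` is a hypothetical helper, not a DALI function.

```cpp
#include <cassert>

// Local stand-in for illustration; the enumerator names mirror the ones quoted in the diff.
enum VideoColorSpaceConversionType {
  VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_TO_RGB,
  VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_TO_RGB_FULL_RANGE,
  VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_UPSAMPLE
};

// Hypothetical helper equivalent to the nested ternary in the diff.
// The full-range flag is consulted only when RGB output is requested;
// YUV output always maps to the upsample path.
inline VideoColorSpaceConversionType SelectConversionType(bool rgb, bool is_full_range) {
  if (!rgb) {
    return VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_UPSAMPLE;
  }
  return is_full_range ? VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_TO_RGB_FULL_RANGE
                       : VIDEO_COLOR_SPACE_CONVERSION_TYPE_YUV_TO_RGB;
}
```

Written this way, all four combinations of the two flags are visibly covered, and the `!rgb` branch makes it explicit that ignoring `is_full_range` there is a deliberate choice.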

stream_);
), DALI_FAIL(make_string("Unsupported type: ", dtype_)));
// synchronize before MappedFrame is destroyed and cuvidUnmapVideoFrame is called
CUDA_CALL(cudaStreamSynchronize(stream_));

style: Stream synchronization happens inside the loop for each frame. Consider moving to line 421 after all frames are processed to improve performance

Contributor Author

This is intended, as frames are mapped separately for each iteration and the synchronization should happen before we call cuvidUnmapVideoFrame at the end of it.
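
The ordering constraint described in this reply can be illustrated without CUDA. In the sketch below, `FakeMappedFrame` is a hypothetical stand-in for an RAII wrapper whose constructor maps a frame and whose destructor unmaps it; the strings "convert" and "sync" stand in for the asynchronous conversion launch and `cudaStreamSynchronize`. None of these names come from DALI.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical RAII stand-in: records "map" on construction and "unmap" on destruction.
class FakeMappedFrame {
 public:
  explicit FakeMappedFrame(std::vector<std::string> &log) : log_(log) { log_.push_back("map"); }
  ~FakeMappedFrame() { log_.push_back("unmap"); }
  FakeMappedFrame(const FakeMappedFrame &) = delete;
  FakeMappedFrame &operator=(const FakeMappedFrame &) = delete;

 private:
  std::vector<std::string> &log_;
};

// Records the order of operations for `frame_count` frames.
inline std::vector<std::string> ReceiveFramesOrder(int frame_count) {
  std::vector<std::string> log;
  for (int i = 0; i < frame_count; ++i) {
    FakeMappedFrame frame(log);  // mapped for this iteration only
    log.push_back("convert");    // stands in for the asynchronous conversion launch
    log.push_back("sync");       // must complete before the frame goes out of scope
  }                              // destructor runs here and records "unmap"
  return log;
}
```

Because each frame is unmapped at the end of its own iteration, a single synchronization after the loop would let the unmap happen while the conversion might still be reading the mapped memory.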

@dali-automaton
Collaborator

CI MESSAGE: [37191286]: BUILD STARTED

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR refactors the legacy video decoder by eliminating the texture-based frame processing pipeline in favor of direct color space conversion. The change removes the TextureObject class (which wrapped CUDA texture management), the convert_frame method that performed texture-based conversions, and the textures_ map that cached texture objects keyed by input pointer and dimensions. Instead, receive_frames() now calls VideoColorSpaceConversion directly on the raw decoded frame data. This simplifies the codebase by removing an intermediate caching layer and aligns the legacy reader with the experimental video reader, where texture usage provided minimal performance benefit given the single-use nature of video frames.

Important Files Changed

Filename | Score | Overview
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 4/5 | Removed TextureObject class, convert_frame/get_textures methods, and textures_ member; left behind unused TexID typedef, tex_hash struct, and std::hash specialization that should be cleaned up
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 3/5 | Replaced texture-based conversion with direct VideoColorSpaceConversion calls; stride calculation uses hardcoded * 3 instead of * sequence.channels, and stream synchronization moved inside frame loop instead of after all frames

Confidence score: 3/5

  • This PR simplifies the video decoder but introduces potential memory corruption and performance issues that may surface in production
  • Score reflects three critical issues: (1) hardcoded stride multiplier causing buffer mismatch for non-RGB formats, (2) missing full-range handling for YUV upsample conversion type, (3) redundant per-frame stream synchronization
  • Pay close attention to dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc lines 398 and 407-408 where stride calculation and synchronization logic need correction

Sequence Diagram

sequenceDiagram
    participant User
    participant NvDecoder
    participant Parser as CUVideoParser
    participant Decoder as CUVideoDecoder
    participant ColorSpace as VideoColorSpaceConversion
    participant FrameQueue as ThreadSafeQueue

    User->>NvDecoder: push_req(FrameReq)
    activate NvDecoder
    NvDecoder->>FrameQueue: push(req)
    deactivate NvDecoder

    User->>NvDecoder: decode_packet(AVPacket)
    activate NvDecoder
    NvDecoder->>NvDecoder: decode_av_packet()
    NvDecoder->>Parser: cuvidParseVideoData()
    activate Parser
    Parser-->>NvDecoder: handle_sequence(CUVIDEOFORMAT)
    NvDecoder->>Decoder: initialize(format)
    Parser-->>NvDecoder: handle_decode(CUVIDPICPARAMS)
    NvDecoder->>Decoder: cuvidDecodePicture()
    Parser-->>NvDecoder: handle_display(CUVIDPARSERDISPINFO)
    NvDecoder->>FrameQueue: push(disp_info)
    deactivate Parser
    deactivate NvDecoder

    User->>NvDecoder: receive_frames(SequenceWrapper)
    activate NvDecoder
    loop for each frame
        NvDecoder->>FrameQueue: pop()
        FrameQueue-->>NvDecoder: disp_info
        NvDecoder->>Decoder: cuvidMapVideoFrame()
        Decoder-->>NvDecoder: MappedFrame
        NvDecoder->>ColorSpace: VideoColorSpaceConversion()
        activate ColorSpace
        ColorSpace->>ColorSpace: YUV to RGB / Upsample
        ColorSpace-->>NvDecoder: converted frame
        deactivate ColorSpace
        NvDecoder->>NvDecoder: cudaStreamSynchronize()
        NvDecoder->>Decoder: cuvidUnmapVideoFrame()
    end
    NvDecoder->>NvDecoder: record_sequence_event_()
    deactivate NvDecoder
    NvDecoder-->>User: SequenceWrapper with frames

Additional Comments (1)

  1. dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 179-200 (link)

    style: TexID, tex_hash, and the hash template specialization (lines 179-220) are now unused after removing textures. Consider removing this dead code.

2 files reviewed, 1 comment

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR removes texture-based video frame processing from the legacy NvDecoder implementation, replacing it with direct VideoColorSpaceConversion calls. The change eliminates the TextureObject class, texture caching (textures_ member), and associated methods (get_textures, convert_frame), simplifying the video decode pipeline to align with the experimental video reader implementation. The refactor preserves existing color space conversion functionality (YUV to RGB/upsampling with full-range support) while removing an intermediate texture layer that the developers determined provided insufficient performance benefit. This unification reduces code complexity in the legacy video reader path within the DALI data loading library.

Important Files Changed

Filename | Score | Overview
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 1/5 | Removed TextureObject class, texture management methods, and textures_ member, but left dangling tex_hash struct that references undefined TexID type causing compilation errors
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 2/5 | Replaced texture-based conversion with direct VideoColorSpaceConversion call in receive_frames, but stride parameter may be incorrect (elements vs bytes), and synchronization happens inefficiently inside the frame loop

Confidence score: 2/5

  • This PR contains critical compilation errors and potential runtime issues that make it unsafe to merge in its current state
  • Score reduced because (1) the header leaves orphaned tex_hash struct referencing removed TexID type breaking compilation, (2) stride calculation may pass wrong units to VideoColorSpaceConversion, (3) stream synchronization inefficiently placed inside loop, and (4) full-range YUV upsample conversion path may be incomplete
  • Pay close attention to dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h lines 179-198 (dangling tex_hash struct), nvdecoder.cc lines 394-395 (stride parameter calculation), and lines 387-392 (conversion_type logic for YUV full-range case)

Additional Comments (2)

  1. dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 179-198 (link)

    style: The entire tex_hash struct and its hash_combine implementation are now unused. They were only needed for the deleted std::unordered_map<TexID, TextureObjects, tex_hash> textures_ member. Remove this dead code.

  2. dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 188 (link)

    logic: TexID type is referenced here but is no longer defined anywhere in the file. This will cause a compilation error.

2 files reviewed, 2 comments

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR refactors the NvDecoder video reader by eliminating texture-based frame conversion in favor of direct VideoColorSpaceConversion calls. The TextureObject class, texture caching map (textures_), and the convert_frame method have been removed. The new implementation directly calls VideoColorSpaceConversion with raw device pointers in the receive_frames loop, passing full-range metadata stored in the frame_full_range_ vector. This change unifies the legacy video reader with the experimental reader by removing a layer of indirection that provided minimal performance benefit.

Important Files Changed

Filename | Score | Overview
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 4/5 | Removed TextureObject class, texture map member, and get_textures method; cleaned up texture-related includes and hash specialization
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 2/5 | Replaced texture-based convert_frame with inline VideoColorSpaceConversion calls; contains stride calculation bug and incomplete full-range logic

Confidence score: 2/5

  • This PR has critical bugs that will likely cause runtime errors and incorrect output for certain configurations
  • The stride calculation error (line 398) uses hardcoded 3 instead of sequence.channels, which will break non-RGB output modes. The incomplete full-range logic (lines 389-392) means YUV output with full-range color won't be handled correctly. Stream synchronization inside the frame loop (line 413) is inefficient compared to synchronizing once after all frames
  • Pay close attention to dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc lines 386-421, especially the stride calculation, conversion_type ternary, and stream synchronization placement

Additional Comments (1)

  1. dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h, line 190-199 (link)

    style: ScaleMethod enum and its hash specialization are now unused after texture removal and can be deleted

2 files reviewed, 1 comment

@JanuszL JanuszL force-pushed the remove_textures branch 2 times, most recently from ce64217 to f02e194 Compare October 24, 2025 15:14

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR removes the texture-based video frame processing layer from the NvDecoder implementation in DALI's legacy video reader. The change eliminates approximately 90 lines of texture management infrastructure—including the TextureObject class, texture caching (textures_ map), and the convert_frame method—replacing it with direct calls to VideoColorSpaceConversion. This unification aligns the legacy reader with the experimental one by removing an abstraction layer that profiling showed provided minimal performance benefit. The core color space conversion logic is preserved but moved inline into the receive_frames method, where conversion type (full-range YUV-to-RGB, standard YUV-to-RGB, or YUV upsampling) is now determined directly based on the rgb_ flag and per-frame full_range metadata.

Important Files Changed

Filename | Score | Overview
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 3/5 | Removes TextureObject/TextureObjects classes, texture caching infrastructure (TexID, tex_hash), textures_ map, get_textures and convert_frame methods; leaves unused includes and ScaleMethod enum
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 2/5 | Replaces texture-based conversion with direct VideoColorSpaceConversion calls; moves conversion type logic inline; retains per-frame cudaStreamSynchronize inside loop

Confidence score: 2/5

  • This PR has significant correctness and performance concerns that should be addressed before merging
  • Score reduced due to: (1) stride parameter mismatch between comment and code (line 398 uses sequence.width * sequence.channels but historical comment suggests sequence.width * 3), (2) incomplete conversion_type logic that doesn't handle full-range YUV_UPSAMPLE case when !rgb_ and is_full_range=true, (3) per-frame synchronization in loop (line 407) causing performance overhead, (4) unused artifacts (ScaleMethod enum, tuple/unordered_map includes) left in header suggesting incomplete cleanup
  • Critical attention needed: stride calculation at line 398 in nvdecoder.cc, conversion_type ternary logic at lines 392-395, and synchronization placement at line 407

2 files reviewed, no comments

@JanuszL
Contributor Author

JanuszL commented Oct 24, 2025

!build

@dali-automaton
Collaborator

CI MESSAGE: [37212149]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37212149]: BUILD FAILED

@JanuszL JanuszL marked this pull request as draft October 25, 2025 11:19
@JanuszL JanuszL force-pushed the remove_textures branch 2 times, most recently from 3960235 to 68b46b6 Compare October 25, 2025 17:08
@JanuszL
Contributor Author

JanuszL commented Oct 25, 2025

!build

@dali-automaton
Collaborator

CI MESSAGE: [37259971]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37259971]: BUILD FAILED

@JanuszL
Contributor Author

JanuszL commented Oct 26, 2025

!build

@dali-automaton
Collaborator

CI MESSAGE: [37290525]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37290525]: BUILD FAILED

@JanuszL
Contributor Author

JanuszL commented Oct 26, 2025

!build

@dali-automaton
Collaborator

CI MESSAGE: [37301989]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37301989]: BUILD FAILED

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR removes texture-based video processing from the NvDecoder implementation, replacing it with direct VideoColorSpaceConversion calls. The refactoring deletes the TextureObject class and all associated texture management infrastructure (caching, hash maps, get_textures method), simplifying the video reader by eliminating ~300 lines of texture-specific code. The legacy reader now directly passes raw frame pointers to VideoColorSpaceConversion, matching the experimental reader's approach. The chroma sampling offset in color_space.cu was adjusted from 0.25 to 0.5, and the CPU decoder path received additional SwsContext quality flags (SWS_FULL_CHR_H_INT, SWS_ACCURATE_RND) to compensate for visual differences after texture removal. Test utilities were also updated with clearer error messages.

Important Files Changed

Filename | Score | Overview
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 4/5 | Removed texture caching and convert_frame method; replaced with inline VideoColorSpaceConversion call using raw frame pointers
dali/operators/video/legacy/reader/nvdecoder/nvdecoder.h | 5/5 | Deleted TextureObject class, hash specializations, and texture-related members/methods
dali/operators/video/color_space.cu | 3/5 | Changed chroma sampling offset from 0.25 to 0.5 and refactored to use float intermediates with normalized input/output paths
dali/operators/video/legacy/reader/nvdecoder/imgproc.cu | 5/5 | Entire file deleted (245 lines of texture-based YCbCr-to-RGB conversion kernels)
dali/operators/video/legacy/reader/nvdecoder/imgproc.h | 5/5 | Entire file deleted (header for texture-based process_frame function)
dali/operators/video/frames_decoder_cpu.cc | 4/5 | Added SWS_FULL_CHR_H_INT and SWS_ACCURATE_RND flags to improve CPU color conversion quality
dali/test/python/test_video_reader.py | 5/5 | Updated comments and error messages for better readability; no functional changes

Confidence score: 3/5

  • This PR requires careful review due to semantic changes in the color space conversion path and potential visual output differences
  • Score reflects concerns about the chroma sampling offset change (0.25→0.5) which could affect video quality if it doesn't match codec chroma siting conventions, plus the incomplete handling of full-range YUV conversion mentioned in previous reviews, and the performance impact of per-frame stream synchronization inside the loop
  • Pay close attention to dali/operators/video/color_space.cu (verify chroma sampling correctness and visual output), dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc (confirm conversion_type logic handles all cases and stream sync placement is optimal), and validate that existing tests thoroughly cover all color space conversion paths with both normalized/non-normalized and full-range/limited-range inputs

Sequence Diagram

sequenceDiagram
    participant User
    participant NvDecoder
    participant CUVideoParser
    participant CUVideoDecoder
    participant MappedFrame
    participant VideoColorSpaceConversion
    participant CUDA

    User->>NvDecoder: "decode_av_packet(AVPacket)"
    NvDecoder->>CUVideoParser: "cuvidParseVideoData(CUVIDSOURCEDATAPACKET)"
    CUVideoParser->>NvDecoder: "handle_sequence_(CUVIDEOFORMAT)"
    NvDecoder->>CUVideoDecoder: "initialize(format)"
    CUVideoParser->>NvDecoder: "handle_decode_(CUVIDPICPARAMS)"
    NvDecoder->>CUVideoDecoder: "cuvidDecodePicture(pic_params)"
    CUVideoParser->>NvDecoder: "handle_display_(CUVIDPARSERDISPINFO)"
    NvDecoder->>NvDecoder: "frame_queue_.push(disp_info)"
    
    User->>NvDecoder: "receive_frames(SequenceWrapper)"
    NvDecoder->>NvDecoder: "frame_queue_.pop()"
    NvDecoder->>MappedFrame: "MappedFrame(disp_info, decoder, stream)"
    MappedFrame->>CUVideoDecoder: "cuvidMapVideoFrame(decoder, picture_index)"
    NvDecoder->>VideoColorSpaceConversion: "VideoColorSpaceConversion(output, yuv_data, params)"
    VideoColorSpaceConversion->>CUDA: "VideoColorSpaceConversionKernel<<<...>>>()"
    CUDA->>CUDA: "YUV to RGB conversion"
    NvDecoder->>CUDA: "cudaStreamSynchronize(stream)"
    MappedFrame->>CUVideoDecoder: "cuvidUnmapVideoFrame(decoder, ptr)"
    NvDecoder->>SequenceWrapper: "return decoded frames"

7 files reviewed, no comments

Signed-off-by: Michal Zientkiewicz <[email protected]>
  Height(),
  sws_output_format,
- SWS_BILINEAR,
+ SWS_BILINEAR|SWS_FULL_CHR_H_INT|SWS_ACCURATE_RND,
Contributor

This caused inaccurate results.

Collaborator

Are there any performance implications of this change? How inaccurate were the results?

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This review covers only the changes made since the last review, not the entire PR. The most recent change is a minor formatting adjustment to test_video_reader.py. The developer collapsed a multi-line compare_frames function call into a single line, bringing it into compliance with the repository's 100-character line limit. This is purely a stylistic change with no functional impact, likely triggered by an automated formatter like Black (as configured in pyproject.toml). The change aligns with DALI's coding standards and supports better readability of the test file.

Important Files Changed

Filename | Score | Overview
dali/test/python/test_video_reader.py | 5/5 | Collapsed multi-line function call to single line for style compliance

Confidence score: 5/5

  • This change is safe to merge with no risk of issues
  • Score reflects a trivial formatting change with zero functional impact and full alignment with project style guidelines
  • No files require special attention; this is a simple whitespace adjustment

1 file reviewed, no comments

@dali-automaton
Collaborator

CI MESSAGE: [37437527]: BUILD STARTED

Signed-off-by: Michal Zientkiewicz <[email protected]>

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This summary covers only the changes made since the last review. The PR modifies a single line in the CUDA color space conversion kernel (dali/operators/video/color_space.cu), adjusting the chroma sampling offset from 0.5f to 0.25f for both the X and Y axes. This change corrects chroma siting alignment when upsampling YUV 4:2:0 to 4:4:4 during color space conversion. The offset adjustment repositions where the bilinear sampler reads chroma values, likely switching from one chroma siting convention (e.g., MPEG-1 center-positioned) to another (e.g., MPEG-2 co-sited or JPEG left-positioned). This is part of the broader effort to remove texture-based video processing and switch to direct VideoColorSpaceConversion—the texture path may have handled chroma positioning differently, necessitating this correction to maintain visual quality and avoid color fringing artifacts in the new direct-conversion path.

Important Files Changed

Filename | Score | Overview
dali/operators/video/color_space.cu | 3/5 | Modified chroma sampling offset from 0.5f to 0.25f (lines 50, 53) to correct chroma siting alignment during YUV to RGB conversion

Confidence score: 3/5

  • This PR contains a targeted chroma siting adjustment that is likely safe but requires validation against video codec standards and visual quality tests
  • Score reflects uncertainty around whether 0.25f is the correct offset for all YUV input formats and chroma siting conventions, and whether this change inadvertently introduces artifacts or requires corresponding updates elsewhere in the color conversion pipeline
  • Pay close attention to the chroma offset calculation logic (lines 50-53) and verify this change against test videos with various chroma siting standards (MPEG-1, MPEG-2, JPEG); also confirm the change doesn't conflict with the previously identified stride bug and missing full-range conversion case
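
To make the chroma siting question concrete, the sketch below shows how an offset of 0.25 or 0.5 can arise when mapping a luma sample to a half-resolution chroma plane along one axis. It assumes texture-style coordinates, where sample i is centered at i + 0.5. Whether color_space.cu uses this convention is not stated in the thread, so the correspondence to the 0.25f and 0.5f values in the PR is an assumption. `ChromaSiting`, `ChromaCoord`, and `SampleLinear` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class ChromaSiting {
  kCenter,   // chroma sample sits midway between two luma samples
  kCoSited   // chroma sample sits on the even luma sample
};

// Coordinate in the half-resolution chroma row for the center of luma sample `x`,
// in texture-style coordinates (sample i is centered at i + 0.5). Illustrative only.
inline float ChromaCoord(int x, ChromaSiting siting) {
  const float offset = (siting == ChromaSiting::kCenter) ? 0.25f : 0.5f;
  return 0.5f * static_cast<float>(x) + offset;
}

// Linear interpolation of a 1-D chroma row at texture-style coordinate `coord`,
// clamping to the first and last samples at the edges.
inline float SampleLinear(const std::vector<float> &row, float coord) {
  assert(!row.empty());
  const float last = static_cast<float>(row.size() - 1);
  const float pos = std::min(std::max(coord - 0.5f, 0.0f), last);  // back to index space
  const std::size_t i0 = static_cast<std::size_t>(pos);  // pos >= 0, so truncation is floor
  const std::size_t i1 = std::min(i0 + 1, row.size() - 1);
  const float t = pos - static_cast<float>(i0);
  return row[i0] * (1.0f - t) + row[i1] * t;
}
```

Under these assumptions, the two sitings sample the same chroma row a quarter of a chroma sample apart, which is enough to shift colors visibly at sharp edges and to break pixel-exact comparisons against a reference decoder.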

1 file reviewed, no comments

@dali-automaton
Collaborator

CI MESSAGE: [37442117]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37442117]: BUILD FAILED

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR updates the test infrastructure to use PyAV instead of OpenCV for video frame extraction, ensuring consistency with the main codebase's use of libswscale.

Key Changes:

  • Migrated extract_frames_from_video() from OpenCV's VideoCapture to PyAV's container decoder
  • Added PyAV (av) as a dependency in both TL0 and TL1 test scripts
  • Applied matching libswscale interpolation flags (SWS_BILINEAR|SWS_FULL_CHR_H_INT|SWS_ACCURATE_RND) to ensure pixel-perfect comparison with DALI's decoder

Critical Issue:

  • DALI_EXTRA_VERSION contains placeholder "FixMe" instead of valid git commit hash

Confidence Score: 0/5

  • This PR cannot be merged - it will break the build
  • The DALI_EXTRA_VERSION file contains "FixMe" placeholder instead of a valid git commit hash. CMake reads this file to fetch the correct test data version, so this will cause build failures. The test code changes are well-implemented, but this blocking issue prevents merge.
  • DALI_EXTRA_VERSION requires immediate fix - must contain valid git commit hash

Important Files Changed

File Analysis

Filename | Score | Overview
DALI_EXTRA_VERSION | 0/5 | Version file changed to placeholder "FixMe" - critical blocker that will break build
dali/test/python/decoder/test_video.py | 4/5 | Refactored video frame extraction from OpenCV to PyAV with matching interpolation flags
qa/TL0_python-self-test-readers-decoders/test_nofw.sh | 5/5 | Added av package to pip dependencies
qa/TL1_python-self-test_conda/test_nofw.sh | 5/5 | Added av package to pip dependencies

Sequence Diagram

sequenceDiagram
    participant Test as test_video.py
    participant PyAV as PyAV Library
    participant FFmpeg as libswscale (FFmpeg)
    participant Video as Video File
    
    Test->>PyAV: av.open(video_path)
    PyAV->>Video: Open video container
    Video-->>PyAV: Container with streams
    
    Test->>PyAV: container.decode(video_stream)
    loop For each frame in sequence
        PyAV->>Video: Read encoded frame
        Video-->>PyAV: Encoded frame data
        PyAV->>PyAV: Decode frame
        
        alt Frame matches start/stride/end criteria
            Test->>PyAV: frame.to_ndarray(format="rgb24", interpolation=flags)
            PyAV->>FFmpeg: Convert with SWS_BILINEAR|SWS_FULL_CHR_H_INT|SWS_ACCURATE_RND
            FFmpeg-->>PyAV: RGB24 array
            PyAV-->>Test: numpy ndarray
            Test->>Test: Append to frames list
        else Skip frame
            Test->>Test: Continue to next frame
        end
    end
    
    Test->>PyAV: container.close()
    PyAV->>Video: Close container

4 files reviewed, 1 comment

@@ -1 +1 @@
-1ffbeaf1d085bb00f124038503508b3cb68e1a05
+FixMe
\ No newline at end of file

logic: Replace "FixMe" with actual git commit hash from dali-extra repository. This file is used by CMake to determine the correct version of test data.
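
A placeholder like this could be caught by a simple format check before merge. The sketch below is a hypothetical helper, not part of DALI or its CMake scripts. It assumes the file holds a full 40-character hexadecimal commit hash, as the previous value did, and that the caller has already stripped any trailing newline.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical check: true if `s` has the shape of a full 40-character
// hexadecimal commit hash. It does not verify that the commit exists.
inline bool LooksLikeCommitHash(const std::string &s) {
  if (s.size() != 40) {
    return false;
  }
  for (unsigned char c : s) {
    if (!std::isxdigit(c)) {
      return false;
    }
  }
  return true;
}
```

The previous value from the diff passes this check, while "FixMe" fails on length alone.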

@dali-automaton
Collaborator

CI MESSAGE: [37585700]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37585700]: BUILD FAILED

Signed-off-by: Janusz Lisiecki <[email protected]>

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR removes texture-based video processing from the NvDecoder, simplifying the codebase by unifying the legacy video reader with the experimental one. The TextureObject class and related texture management code (~519 lines) have been removed in favor of direct VideoColorSpaceConversion calls.

Key changes:

  • Removed TextureObject class, get_textures() method, and imgproc.cu/imgproc.h files
  • Replaced texture-based convert_frame() with direct VideoColorSpaceConversion() calls in nvdecoder.cc
  • Updated color_space.cu to perform conversions in floating-point space before quantization, improving precision
  • Refactored test utilities to use PyAV instead of OpenCV for more accurate frame extraction with correct libswscale interpolation flags
  • Added PyAV (av) as a test dependency across QA scripts

Issues found:

  • DALI_EXTRA_VERSION contains placeholder "FixMe" instead of a valid commit hash (flagged in previous comment)

Confidence Score: 0/5

  • Cannot merge - DALI_EXTRA_VERSION contains invalid placeholder value
  • The DALI_EXTRA_VERSION file contains "FixMe" instead of a valid git commit hash from the dali-extra repository. This file is used by CMake to determine the correct version of test data, and the placeholder will cause build failures. This critical issue was already flagged in previous comments but remains unresolved.
  • DALI_EXTRA_VERSION must be updated with a valid commit hash before merge

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| DALI_EXTRA_VERSION | 0/5 | Changed from valid commit hash to "FixMe" placeholder - breaks CMake build |
| dali/test/python/decoder/test_video.py | 5/5 | Refactored to use PyAV instead of OpenCV for more accurate video decoding with proper interpolation flags |
| qa/TL0_multigpu/test_nofw.sh | 5/5 | Added av package dependency for PyAV support in tests |
| qa/TL0_python-self-test-readers-decoders/test_nofw.sh | 5/5 | Added av package dependency for PyAV support in tests |
| qa/TL1_python-self-test_conda/test_nofw.sh | 5/5 | Added av package dependency for PyAV support in tests |

Sequence Diagram

sequenceDiagram
    participant VR as VideoReader
    participant ND as NvDecoder
    participant CU as CUDA Decoder
    participant CS as VideoColorSpaceConversion
    participant OUT as Output Buffer

    Note over VR,OUT: Before (Texture-based)
    VR->>ND: receive_frames(sequence)
    ND->>CU: cuvidMapVideoFrame()
    CU-->>ND: NV12 frame pointer
    ND->>ND: get_textures(frame, pitch, width, height)
    Note over ND: Creates cudaTextureObject_t<br/>for luma and chroma
    ND->>ND: process_frame(textures, sequence, idx)
    Note over ND: Texture-based conversion<br/>in imgproc.cu
    ND->>OUT: Write RGB/YUV output
    ND->>CU: cuvidUnmapVideoFrame()

    Note over VR,OUT: After (Direct conversion)
    VR->>ND: receive_frames(sequence)
    ND->>CU: cuvidMapVideoFrame()
    CU-->>ND: NV12 frame pointer
    ND->>CS: VideoColorSpaceConversion(out_ptr, in_ptr, pitch, dims, type)
    Note over CS: Direct YUV->RGB conversion<br/>in float space, then quantize
    CS->>OUT: Write RGB/YUV output
    ND->>CU: cuvidUnmapVideoFrame()

5 files reviewed, no comments

@dali-automaton
Collaborator

CI MESSAGE: [37602674]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37617418]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37602674]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [37617418]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [37644055]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37666336]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37644055]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [37666336]: BUILD FAILED

@dali-automaton
Collaborator

CI MESSAGE: [37678030]: BUILD STARTED

@dali-automaton
Collaborator

CI MESSAGE: [37678030]: BUILD PASSED

Signed-off-by: Janusz Lisiecki <[email protected]>
@dali-automaton
Collaborator

CI MESSAGE: [37810427]: BUILD STARTED

@greptile-apps greptile-apps bot left a comment

Greptile Overview

Greptile Summary

This PR successfully removes texture-based video processing in NvDecoder, replacing it with direct VideoColorSpaceConversion calls. The refactoring unifies the legacy video reader with the experimental implementation.

Key changes:

  • Removed TextureObject class and all texture management infrastructure (caching, texture creation)
  • Replaced convert_frame() method with inline VideoColorSpaceConversion call in receive_frames()
  • Fixed quantization issues in color_space.cu by converting to float before color space conversion
  • Updated libswscale to use SWS_FULL_CHR_H_INT|SWS_ACCURATE_RND flags for better quality
  • Switched test reference extraction from OpenCV to PyAV to match decoder behavior
  • Fixed typo in test_video_reader.py (additonal_args -> additional_args)

Previous comments addressed:
All previously identified issues have been resolved in this commit. The DALI_EXTRA_VERSION now contains a valid commit hash, and the conversion logic properly handles all color space cases.

Confidence Score: 5/5

  • This PR is safe to merge - well-tested refactoring with quality improvements
  • The refactoring removes ~500 lines of complex texture management code while improving video quality by avoiding intermediate quantization. All existing tests pass (VideoReaderTest, test_video_reader.py, test_video_pipeline.py, test_video_reader_resize.py). The changes fix the quantization issue identified in previous reviews and properly align test infrastructure with decoder behavior.
  • No files require special attention - all changes are straightforward refactoring with improved implementation

Important Files Changed

File Analysis

| Filename | Score | Overview |
|---|---|---|
| dali/operators/video/color_space.cu | 5/5 | Fixed quantization by converting to float before color conversion and using ConvertSatNorm - improves quality |
| dali/operators/video/legacy/reader/nvdecoder/imgproc.cu | 5/5 | Deleted texture-based processing code - replaced with direct VideoColorSpaceConversion |
| dali/operators/video/legacy/reader/nvdecoder/nvdecoder.cc | 4/5 | Replaced texture-based convert_frame with direct VideoColorSpaceConversion call - removed TextureObject class and texture caching |
| dali/test/python/decoder/test_video.py | 5/5 | Replaced OpenCV with PyAV for reference frame extraction to match libswscale flags used in decoder |

Sequence Diagram

sequenceDiagram
    participant Client as Video Reader
    participant NvDecoder
    participant MappedFrame
    participant VideoColorSpace as VideoColorSpaceConversion
    participant Output as SequenceWrapper
    
    Client->>NvDecoder: receive_frames(sequence)
    loop For each frame
        NvDecoder->>MappedFrame: Create MappedFrame from decoder
        NvDecoder->>NvDecoder: Determine conversion_type<br/>(RGB full/limited range or YUV)
        NvDecoder->>VideoColorSpace: VideoColorSpaceConversion()<br/>(direct YUV to RGB)
        Note over VideoColorSpace: Converts uint8 to float<br/>Applies color matrix<br/>Handles normalization
        VideoColorSpace->>Output: Write converted pixels
        NvDecoder->>NvDecoder: cudaStreamSynchronize()
        NvDecoder->>MappedFrame: Destroy (unmaps frame)
    end
    NvDecoder->>Client: Return completed sequence
    
    Note over NvDecoder,VideoColorSpace: OLD: Used texture objects for sampling<br/>NEW: Direct memory access with VideoColorSpaceConversion
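
The "determine conversion_type" step in the diagram above could look roughly like the sketch below. All names here (`ConversionType`, `pick_conversion`, `normalize_luma`) are hypothetical and chosen for illustration; they are not DALI's identifiers, and the 16..235 limited-range luma interval is the usual 8-bit video convention rather than something stated in this PR.

```python
from enum import Enum


class ConversionType(Enum):
    """Hypothetical conversion modes mirroring the three cases in the diagram."""
    YUV_TO_RGB = "limited-range YUV to RGB"
    YUV_TO_RGB_FULL_RANGE = "full-range YUV to RGB"
    YUV_PASSTHROUGH = "keep YUV output"


def pick_conversion(want_rgb, full_range):
    """Choose the conversion mode from the requested output and the stream's range."""
    if not want_rgb:
        return ConversionType.YUV_PASSTHROUGH
    if full_range:
        return ConversionType.YUV_TO_RGB_FULL_RANGE
    return ConversionType.YUV_TO_RGB


def normalize_luma(y, full_range):
    """Map an 8-bit luma sample to a float in [0, 1] before applying the color matrix."""
    if full_range:
        return y / 255.0
    # Limited range: 16 is black and 235 is white; clamp out-of-range samples.
    return min(1.0, max(0.0, (y - 16) / 219.0))
```

Keeping the range decision separate from the per-pixel math means the same float conversion path can serve both full-range and limited-range streams.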

1 file reviewed, no comments
