Reduce the diff with the pytorch:main branch #3

Open

gouzil wants to merge 5 commits into PFCCLab:paddle from gouzil:feat/reduce_diff

Conversation


Copilot AI left a comment

Pull request overview

This PR aims to reduce divergence from the upstream PyTorch torchcodec main branch by aligning several C++ core behaviors and operator registrations.

Changes:

  • Implemented previously stubbed/unimplemented paths for audio resampler flush (maybeFlushSwrBuffers) and tensor-backed AVIO output (AVIOToTensorContext::getOutputTensor).
  • Adjusted custom op registration blocks (including adding a BackendSelect registration block) and minor include/formatting alignment.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

Changed files:

  • src/torchcodec/_core/SingleStreamDecoder.cpp: Implements returning flushed resampled audio samples instead of throwing.
  • src/torchcodec/_core/custom_ops.cpp: Reorders includes and changes TORCH_LIBRARY_IMPL registrations (adds BackendSelect).
  • src/torchcodec/_core/AVIOTensorContext.cpp: Implements returning the encoded output tensor slice instead of throwing.


@@ -1419,11 +1419,8 @@ std::optional<torch::Tensor> SingleStreamDecoder::maybeFlushSwrBuffers() {
auto actualNumRemainingSamples = swr_convert(
swrContext_.get(), outputBuffers.data(), numRemainingSamples, nullptr, 0);

Copilot AI Apr 4, 2026

swr_convert() can return a negative value on error. Passing that directly as the length to lastSamples.narrow(...) will throw an unrelated/cryptic error (or potentially create an invalid slice). Add an explicit check for actualNumRemainingSamples < 0 and surface a clear failure (e.g., TORCH_CHECK(actualNumRemainingSamples >= 0, ...)) before narrowing; optionally also handle the == 0 case explicitly.

Suggested change:

    swrContext_.get(), outputBuffers.data(), numRemainingSamples, nullptr, 0);
TORCH_CHECK(
    actualNumRemainingSamples >= 0,
    "swr_convert() failed while flushing buffered audio samples: ",
    actualNumRemainingSamples);
if (actualNumRemainingSamples == 0) {
  return std::nullopt;
}
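The guard being suggested is small enough to model end-to-end. Below is a minimal sketch in plain Python (hypothetical `take_flushed_samples` name, list slicing standing in for `narrow()`; no FFmpeg or torch involved) showing why the check must come before the slice:

```python
# Sketch of the suggested guard: validate the (possibly negative) count
# returned by the converter BEFORE using it as a slice length, so the
# converter's real error surfaces instead of a cryptic slicing failure.
# All names here are stand-ins for the C++ code under review.

def take_flushed_samples(buffered, actual_num_remaining):
    # swr_convert() returns a negative value on error: check it first.
    if actual_num_remaining < 0:
        raise RuntimeError(
            "swr_convert() failed while flushing buffered audio samples: "
            f"{actual_num_remaining}"
        )
    # Optionally make the "nothing buffered" case explicit, mirroring the
    # suggested `return std::nullopt`.
    if actual_num_remaining == 0:
        return None
    # Stand-in for lastSamples.narrow(dim, 0, actualNumRemainingSamples).
    return buffered[:actual_num_remaining]
```

Without the first check, a negative count silently slices from the end of the buffer in Python (and throws an unrelated error in `narrow()`), hiding the FFmpeg failure.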

Comment on lines +923 to 926
TORCH_LIBRARY_IMPL(torchcodec_ns, BackendSelect, m) {
m.impl("create_from_file", &create_from_file);
m.impl("create_from_tensor", &create_from_tensor);
m.impl("_create_from_file_like", &_create_from_file_like);

Copilot AI Apr 4, 2026

Registering create_from_tensor under BackendSelect is risky because it has a Tensor argument and the implementation assumes host-accessible memory (no video_tensor.is_cpu() check; it is passed into AVIOFromTensorContext which reads via data_ptr()/memcpy). With this dispatch key change, CUDA tensors may no longer fail dispatch and could reach this CPU-only code path. Consider registering create_from_tensor under CPU (and only keeping no-Tensor factory ops like create_from_file / _get_json_ffmpeg_library_versions under BackendSelect), or add a strict CPU device check for video_tensor if BackendSelect is intentional.
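The trade-off in this comment can be illustrated with a runnable sketch. Everything below is a stand-in: `FakeTensor` only carries a device tag, and the real fix would be a `video_tensor.is_cpu()` check via `TORCH_CHECK` in the C++ op (or keeping the op under the CPU registration). The point is that a catch-all registration must itself reject tensors whose memory is not host-accessible:

```python
# Minimal model of the dispatch concern: create_from_tensor reads the input
# buffer via data_ptr()/memcpy, so it is CPU-only and must guard its input
# if non-CPU tensors can reach it through dispatch. FakeTensor and the
# return value are placeholders, not the real torchcodec API.

class FakeTensor:
    """Stand-in for torch.Tensor: carries only a device tag."""

    def __init__(self, device):
        self.device = device

    def is_cpu(self):
        return self.device == "cpu"


def create_from_tensor(video_tensor):
    # Explicit device guard: the read path does a host memcpy, so anything
    # not resident in CPU memory must be rejected up front.
    if not video_tensor.is_cpu():
        raise ValueError(
            "create_from_tensor expects a CPU tensor, got device "
            f"'{video_tensor.device}'"
        )
    return "avio-context-over-host-buffer"  # placeholder return value
```

With the guard in place, a CUDA tensor fails fast with a clear message instead of reaching the `data_ptr()`/`memcpy` path.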

@SigureMo SigureMo left a comment

The core problem that schema support on the Paddle side is meant to solve should be the diff shown here:

Image

Have you tried reverting this part?


@ShigureNyako ShigureNyako left a comment

I went through this round's goals and CI: the AVIOToTensorContext::getOutputTensor() / audio flush directions are both right, and all checks are already green. However, as the code currently stands I still have two blockers, so I don't recommend merging yet:

  1. src/torchcodec/_core/custom_ops.cpp:923-926
    This registers create_from_tensor under BackendSelect. But the current create_from_tensor() only validates contiguity and dtype (custom_ops.cpp:309-323) and then hands the input tensor to AVIOFromTensorContext, whose read path in AVIOTensorContext.cpp does a host memcpy directly on data_ptr<uint8_t>(). If a non-CPU video_tensor can now pass dispatch and reach this code, a CPU-only implementation gets exposed. If BackendSelect really is required here, at minimum add an explicit video_tensor.is_cpu() check; otherwise keeping create_from_tensor under the CPU registration is the safer option.

  2. src/torchcodec/_core/SingleStreamDecoder.cpp:1419-1423
    I also left this inline: swr_convert() returns a negative value on failure, and the current implementation uses that value directly as the length for narrow(), turning the real FFmpeg error into an unrelated slicing exception. Do TORCH_CHECK(actualNumRemainingSamples >= 0, ...) first, then return the narrow() result.

Once these two points are addressed, I'll continue with the next round.

@@ -1419,11 +1419,8 @@ std::optional<torch::Tensor> SingleStreamDecoder::maybeFlushSwrBuffers() {
auto actualNumRemainingSamples = swr_convert(
swrContext_.get(), outputBuffers.data(), numRemainingSamples, nullptr, 0);

Check the return value of swr_convert() first. It returns a negative value on failure, and the current implementation uses that value directly as the length for lastSamples.narrow(..., actualNumRemainingSamples), turning the real FFmpeg error into an unrelated narrow exception.

As with the existing calls in FFMPEGCommon.cpp, do TORCH_CHECK(actualNumRemainingSamples >= 0, ...) first, then return the narrow() result.

Author

Set those aside for now. What I want to know is: while the diff shrinks, does test_paddle still cover this?


I looked at test_paddle; it currently does not cover the issues in this thread.

Specifically, test_paddle/test_video_decode.py:120 does go through decoder.get_frames_at(...), but the indices it uses come from np.arange(...) at test_paddle/test_video_decode.py:62-64, and the whole case is the CPU video decoding path. It covers neither the audio flush here nor the branch where swr_convert() fails with a negative return value.

So this is a separate question from "did the diff shrink / did test_paddle pass". The point itself is still a correctness issue: once swr_convert() returns a negative value, the current implementation first raises an unrelated exception at narrow() and swallows the real FFmpeg error. I still recommend adding this check.

"""

if isinstance(indices, list):
    indices = torch.tensor(indices, dtype=torch.int64).cpu()

Can this be deleted? It may happen to work on a CPU machine, but on a GPU machine this could end up running on the GPU. This needs confirming; CI won't catch it.


I traced this call chain. For this PR as it stands, deleting these two local conversions in _video_decoder.py does not by itself send list inputs to the GPU.

The reason is that get_frames_at() / get_frames_played_at() now both go directly through src/torchcodec/_core/ops.py, and the wrapper layer on the current paddle branch still normalizes inputs there:

  • src/torchcodec/_core/ops.py:250-259: if frame_indices is a list, it is still converted to a tensor with an explicit .cpu()
  • src/torchcodec/_core/ops.py:262-274: if timestamps is a list, it is likewise converted to a tensor with an explicit .cpu()

So for this diff, the list case only loses a redundant conversion; behavior is unchanged.

If the concern is tensor inputs, though, that risk does still exist, and it is not newly introduced by this deletion: ops.py only applies a dtype cast to tensors and does not move them back to CPU, while the C++ side (src/torchcodec/_core/SingleStreamDecoder.cpp:623-627, 787-790) reads the indices / timestamps directly via accessor<...>(). I also don't see any test that specifically covers GPU tensor inputs on this path.
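The normalization asymmetry described above can be modeled in a few lines. This is a hedged sketch, not the real ops.py code: tensors are modeled as (dtype, device, values) tuples so the snippet runs without torch, and normalize_frame_indices is a hypothetical name. It captures the key point: lists are forced to CPU, while tensor inputs keep their device:

```python
# Model of the ops.py wrapper behavior traced in this comment. A "tensor"
# is represented here as a (dtype, device, values) tuple; the real code
# calls torch.tensor(frame_indices, dtype=torch.int64).cpu() for lists.

def normalize_frame_indices(indices):
    if isinstance(indices, list):
        # List input: converted to int64 and explicitly moved to CPU.
        return ("int64", "cpu", list(indices))
    # Tensor input: dtype cast only. The device is left untouched, which
    # is the residual GPU-tensor risk noted above.
    _dtype, device, values = indices
    return ("int64", device, values)
```

A list input always ends up tagged ("int64", "cpu", ...), whereas a GPU-tagged tensor passes through with its device unchanged, which is exactly the path no current test exercises.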


@ShigureNyako ShigureNyako left a comment

I also looked at the incremental commit 1cb1047. It mainly cleans up the redundant conversions in _video_decoder.py and adjusts one sampler call; the related CI is all green.

For the current head, however, my conclusion is unchanged; changes are still needed before the next look:

  1. The two C++ blockers from my previous round are still unaddressed:
    • src/torchcodec/_core/custom_ops.cpp: after create_from_tensor was moved to BackendSelect, the CPU-only path still lacks an explicit guard;
    • src/torchcodec/_core/SingleStreamDecoder.cpp: the negative return value of swr_convert() is still unchecked.
  2. The list -> tensor.cpu() conversions deleted from _video_decoder.py this round were only redundant; the normalization now actually lives in src/torchcodec/_core/ops.py. Meanwhile, test_paddle/test_video_decode.py currently only covers get_frames_at(np.arange(...)) on the CPU video decoding path; it covers neither of the two C++ blockers above nor the GPU / tensor input path the reviewer is concerned about.

So I still cannot approve for now; I'll re-review once these two blockers are addressed.

@SigureMo SigureMo changed the title Reduce the diff with the pytrch:main branch Reduce the diff with the pytorch:main branch Apr 10, 2026
