Reduce the diff with the pytorch:main branch #3
gouzil wants to merge 5 commits into PFCCLab:paddle from
Conversation
Pull request overview
This PR aims to reduce divergence from the upstream PyTorch torchcodec main branch by aligning several C++ core behaviors and operator registrations.
Changes:
- Implemented previously stubbed/unimplemented paths for audio resampler flush (`maybeFlushSwrBuffers`) and tensor-backed AVIO output (`AVIOToTensorContext::getOutputTensor`).
- Adjusted custom op registration blocks (including adding a `BackendSelect` registration block) and minor include/formatting alignment.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/torchcodec/_core/SingleStreamDecoder.cpp | Implements returning flushed resampled audio samples instead of throwing. |
| src/torchcodec/_core/custom_ops.cpp | Reorders includes and changes TORCH_LIBRARY_IMPL registrations (adds BackendSelect). |
| src/torchcodec/_core/AVIOTensorContext.cpp | Implements returning the encoded output tensor slice instead of throwing. |
```cpp
@@ -1419,11 +1419,8 @@ std::optional<torch::Tensor> SingleStreamDecoder::maybeFlushSwrBuffers() {
  auto actualNumRemainingSamples = swr_convert(
      swrContext_.get(), outputBuffers.data(), numRemainingSamples, nullptr, 0);
```
swr_convert() can return a negative value on error. Passing that directly as the length to lastSamples.narrow(...) will throw an unrelated/cryptic error (or potentially create an invalid slice). Add an explicit check for actualNumRemainingSamples < 0 and surface a clear failure (e.g., TORCH_CHECK(actualNumRemainingSamples >= 0, ...)) before narrowing; optionally also handle the == 0 case explicitly.
Suggested change:

```cpp
      swrContext_.get(), outputBuffers.data(), numRemainingSamples, nullptr, 0);
  TORCH_CHECK(
      actualNumRemainingSamples >= 0,
      "swr_convert() failed while flushing buffered audio samples: ",
      actualNumRemainingSamples);
  if (actualNumRemainingSamples == 0) {
    return std::nullopt;
  }
```
```cpp
TORCH_LIBRARY_IMPL(torchcodec_ns, BackendSelect, m) {
  m.impl("create_from_file", &create_from_file);
  m.impl("create_from_tensor", &create_from_tensor);
  m.impl("_create_from_file_like", &_create_from_file_like);
```
Registering create_from_tensor under BackendSelect is risky because it has a Tensor argument and the implementation assumes host-accessible memory (no video_tensor.is_cpu() check; it is passed into AVIOFromTensorContext which reads via data_ptr()/memcpy). With this dispatch key change, CUDA tensors may no longer fail dispatch and could reach this CPU-only code path. Consider registering create_from_tensor under CPU (and only keeping no-Tensor factory ops like create_from_file / _get_json_ffmpeg_library_versions under BackendSelect), or add a strict CPU device check for video_tensor if BackendSelect is intentional.
ShigureNyako left a comment
I have looked at the goals of this round's diff and the CI. Both directions, `AVIOToTensorContext::getOutputTensor()` and the audio flush, are right in themselves, and all checks are currently green. However, as the code stands I still see two blocking items, so I don't recommend merging yet:

- src/torchcodec/_core/custom_ops.cpp:923-926
  This registers `create_from_tensor` under `BackendSelect`. But the current `create_from_tensor()` only validates contiguity and dtype (custom_ops.cpp:309-323) before handing the input tensor to `AVIOFromTensorContext`, and the read path in `AVIOTensorContext.cpp` does a host `memcpy` directly on `data_ptr<uint8_t>()`. As a result, if a non-CPU `video_tensor` can reach this code through dispatch, a CPU-only implementation gets exposed. If switching to `BackendSelect` is truly necessary, at minimum add an explicit `video_tensor.is_cpu()` check; otherwise keeping `create_from_tensor` registered under `CPU` is the safer choice.
- src/torchcodec/_core/SingleStreamDecoder.cpp:1419-1423
  I also left this inline: `swr_convert()` returns a negative value on failure, and the current implementation uses that value directly as the length for `narrow()`, which turns the real FFmpeg error into an unrelated slicing exception. Do `TORCH_CHECK(actualNumRemainingSamples >= 0, ...)` first, then return the `narrow()` result.

Once these two points are addressed, I'll continue with the next round.
```cpp
@@ -1419,11 +1419,8 @@ std::optional<torch::Tensor> SingleStreamDecoder::maybeFlushSwrBuffers() {
  auto actualNumRemainingSamples = swr_convert(
      swrContext_.get(), outputBuffers.data(), numRemainingSamples, nullptr, 0);
```
Check the return value of `swr_convert()` first here. It returns a negative value on failure, and the current implementation uses that value directly as the length in `lastSamples.narrow(..., actualNumRemainingSamples)`, turning the real FFmpeg error into an unrelated `narrow` exception.
As with the existing calls in FFMPEGCommon.cpp, do `TORCH_CHECK(actualNumRemainingSamples >= 0, ...)` first, then return the `narrow()` result.
Set these aside for now. While reducing the diff, check whether test_paddle actually covers this.
I checked test_paddle; the current tests do not cover the issues in this thread.
Specifically, test_paddle/test_video_decode.py:120 does call `decoder.get_frames_at(...)`, but its indices come from the `np.arange(...)` at test_paddle/test_video_decode.py:62-64, and the whole test case exercises the CPU video decoding path. It covers neither the audio flush here nor the branch where `swr_convert()` fails with a negative return.
So "has the diff shrunk / does test_paddle pass" is a separate question from this. The point is still a correctness issue: once `swr_convert()` returns a negative value, the current implementation first raises an unrelated exception at `narrow()`, swallowing the real FFmpeg error. I still recommend adding this check.
| """ | ||
|
|
||
| if isinstance(indices, list): | ||
| indices = torch.tensor(indices, dtype=torch.int64).cpu() |
Can this be deleted? It may be correct on a CPU machine, but on a GPU machine this might end up running on the GPU. That needs to be confirmed; CI can't catch it.
I traced this call chain. For this PR specifically, deleting these two local conversions in _video_decoder.py does not by itself make list inputs run on the GPU.
The reason is that `get_frames_at()` / `get_frames_played_at()` now go directly through src/torchcodec/_core/ops.py, and the wrapper layer on the current paddle branch still normalizes there:

- src/torchcodec/_core/ops.py:250-259: if `frame_indices` is a list, it is still converted to a tensor with an explicit `.cpu()`.
- src/torchcodec/_core/ops.py:262-274: if `timestamps` is a list, it is likewise converted to a tensor with an explicit `.cpu()`.

So as far as this diff goes, the list case only loses a redundant conversion; behavior is unchanged.
However, if the concern is tensor inputs, that risk does still exist, and it is not newly introduced by this deletion: ops.py only does a dtype cast on tensors and does not move them back to CPU, while the C++ side (src/torchcodec/_core/SingleStreamDecoder.cpp:623-627 and :787-790) reads the index / timestamp directly via `accessor<...>()`. I also see no test that specifically covers GPU tensor inputs on this path.
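The normalization gap described above can be sketched without torch. In this hedged mock, `FakeTensor` and `normalize_frame_indices` are stand-ins for `torch.Tensor` and the wrapper logic in ops.py, illustrating that lists are pinned to CPU while tensors keep their device:

```python
from dataclasses import dataclass

@dataclass
class FakeTensor:
    """Minimal stand-in for a tensor carrying a device tag."""
    values: list
    dtype: str = "int64"
    device: str = "cpu"

def normalize_frame_indices(frame_indices):
    """Mirror the normalization described for src/torchcodec/_core/ops.py:
    list inputs become an int64 CPU tensor; tensor inputs are only
    dtype-cast and are NOT moved to CPU, which is the gap noted above."""
    if isinstance(frame_indices, list):
        return FakeTensor(list(frame_indices), "int64", "cpu")
    # Tensor path: dtype cast only, device left untouched.
    frame_indices.dtype = "int64"
    return frame_indices

# List input: always lands on CPU.
assert normalize_frame_indices([0, 5, 9]).device == "cpu"
# Tensor input on another device: device is preserved, so the
# CPU-only C++ accessor<...>() path could still receive it.
gpu_like = FakeTensor([0, 5, 9], "int32", "cuda")
assert normalize_frame_indices(gpu_like).device == "cuda"
```

This is only an illustration of why deleting the _video_decoder.py conversions is safe for lists but leaves the tensor-input question open.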
ShigureNyako left a comment
I also reviewed the incremental commit 1cb1047. It mainly cleans up the redundant conversions in _video_decoder.py and adjusts one sampler call; the related CI is all green.
Still, as of the current head my conclusion is unchanged, and changes are needed before another look:

- The two C++ blocking items from my previous round are still unaddressed: in src/torchcodec/_core/custom_ops.cpp, `create_from_tensor` registered under `BackendSelect` still lacks an explicit guard for its CPU-only path; in src/torchcodec/_core/SingleStreamDecoder.cpp, the negative return value of `swr_convert()` is still unchecked.
- The `list -> tensor.cpu()` conversions deleted from _video_decoder.py this round only remove a redundancy; the layer that actually performs this normalization now is src/torchcodec/_core/ops.py. Meanwhile test_paddle/test_video_decode.py still only covers `get_frames_at(np.arange(...))` under CPU video decoding; it covers neither of the two C++ blocking points nor the GPU / tensor input path the reviewer asked about.

So I still cannot approve for now; I'll re-review once the two blocking points are addressed.

Reduce the diff with the PyTorch main branch
cc: @SigureMo, @ShigureNyako
Related links

- `c10::DispatchKey` and replace dispatch_key_to_string with `c10::toString` PaddlePaddle/Paddle#78525
- `std::optional` in error messages PaddlePaddle/Paddle#78521
- `BackendSelect` dispatch lookup in torch compat PaddlePaddle/Paddle#78591