-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use FrameOutput for audio as well #574
base: main
Are you sure you want to change the base?
Conversation
struct FrameOutput { | ||
torch::Tensor data; // 3D: of shape CHW or HWC. | ||
int streamIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Driveby: removing unused streamIndex field
// TODO: it's not great that we are getting a FrameOutput, which is | ||
// intended for videos. We should consider bypassing | ||
// convertAVFrameToFrameOutput and directly call | ||
// convertAudioAVFrameToFrameOutputOnCPU. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this PR is about not addressing this TODO. I tried to address it, but then realized that we do call convertAVFrameToFrameOutput()
in getNextFrameInternal()
:
torchcodec/src/torchcodec/decoders/_core/VideoDecoder.cpp
Lines 583 to 589 in b830480
VideoDecoder::FrameOutput VideoDecoder::getNextFrameInternal( | |
std::optional<torch::Tensor> preAllocatedOutputTensor) { | |
validateActiveStream(); | |
UniqueAVFrame avFrame = decodeAVFrame( | |
[this](const UniqueAVFrame& avFrame) { return avFrame->pts >= cursor_; }); | |
return convertAVFrameToFrameOutput(avFrame, preAllocatedOutputTensor); | |
} |
If we were to address the TODO, we would need separated audio/video logics in getNextFrameInternal()
: one calling convertAVFrameToFrameOutput()
(for video), and the other calling the audio-conversion stuff.
And we can't get around the fact that getNextFrameInternal()
returns a FrameOutput
, can we?
So, I guess this PR is about accepting FrameOutput
as a "universal frame output", for both audio and videos.
See comment below