Add core support for decoding from Python file-like objects #564
base: main
Conversation
```python
)

global _pybind_ops
_pybind_ops = importlib.util.module_from_spec(spec)
```
This does not yet work on Mac. See: https://github.com/pytorch/torchcodec/actions/runs/13924707065/job/38966320796?pr=564

We're unable to load the spec from the file on line 45. When digging around, it seems like we should be using `ctypes.CDLL` to load a shared library, and that's what `torch.ops.load_library()` does under the hood. That's why I kept that in for the pybind ops.

TorchAudio does both: https://github.com/pytorch/audio/blob/c670ad81fda266b6598aeeef434583eb98197ae8/src/torio/_extension/utils.py#L105
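For reference, here is a minimal sketch of combining both approaches the way TorchAudio does: preload the shared library with `ctypes.CDLL` (what `torch.ops.load_library()` does under the hood), then import the same file as a Python module via `importlib`. The function name and flag are hypothetical, not the actual torchcodec loader:

```python
import ctypes
import importlib.util


def load_extension_module(module_name, lib_path, preload_with_ctypes=True):
    # Hypothetical loader sketch. First force the shared library into the
    # process with ctypes.CDLL (this is what torch.ops.load_library() does
    # under the hood), then import the pybind11 module from the same file.
    if preload_with_ctypes:
        ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    spec = importlib.util.spec_from_file_location(module_name, lib_path)
    if spec is None:
        raise ImportError(f"cannot build a module spec for {lib_path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module
```

Whether the `CDLL` preload is what's missing on Mac (versus, say, the `.so` suffix the spec loader expects there) is exactly the open question.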
The common case for pybind11 is that you have:

- A shared library named `foo.so`.
- A module declared as `foo` in the `PYBIND11_MODULE` macro in C++.
- `foo.so` lives in a place Python knows to look.

Then `import foo` should just work.
```diff
 at::Tensor wrapDecoderPointerToTensor(
     std::unique_ptr<VideoDecoder> uniqueDecoder) {
   VideoDecoder* decoder = uniqueDecoder.release();

   auto deleter = [decoder](void*) { delete decoder; };
   at::Tensor tensor =
-      at::from_blob(decoder, {sizeof(VideoDecoder)}, deleter, {at::kLong});
+      at::from_blob(decoder, {sizeof(VideoDecoder*)}, deleter, {at::kLong});
```
Drive-by: we have been allocating enough space for an entire `VideoDecoder` object. But we're not storing the entire object, just a pointer.
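The size difference is easy to see from Python with `ctypes`; the struct below is a made-up stand-in, not the real `VideoDecoder`:

```python
import ctypes


class FakeDecoder(ctypes.Structure):
    # Made-up stand-in for a large decoder object; not the real VideoDecoder.
    _fields_ = [("state", ctypes.c_char * 128)]


# The whole object is 128 bytes here, but a pointer to it is only
# pointer-sized (8 bytes on a typical 64-bit platform). Since only the
# pointer is stored in the tensor, sizeof(VideoDecoder*) is the right size.
object_size = ctypes.sizeof(FakeDecoder)
pointer_size = ctypes.sizeof(ctypes.POINTER(FakeDecoder))
assert object_size == 128
assert pointer_size == ctypes.sizeof(ctypes.c_void_p)
```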
```python
decoder = create_from_file_like(
    open(path, mode="rb", buffering=-4096), "exact"
)
```
Can you describe the intent behind `buffering=-4096`? I'm looking at the `open()` docs but they don't seem to describe what a negative value means.
Whoops, I misread the docs. It should be positive.
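For the record, a quick sketch of `open()`'s `buffering` semantics for binary files; per the CPython implementation, any negative value (including `-4096`) silently falls back to the default buffering policy, which is why the mistake went unnoticed:

```python
import io
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 16)
os.close(fd)

# buffering=0: unbuffered raw I/O (only allowed in binary mode)
raw = open(path, "rb", buffering=0)
assert isinstance(raw, io.FileIO)
raw.close()

# buffering=4096 (positive): a fixed 4096-byte buffer, which is what
# was intended here
buffered = open(path, "rb", buffering=4096)
assert isinstance(buffered, io.BufferedReader)
buffered.close()

os.remove(path)
```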
```python
@@ -15,6 +18,8 @@
    _get_extension_path,
)

_pybind_ops: Optional[ModuleType] = None


def load_torchcodec_extension():
```
super nit: the name and description of this function could be slightly updated
I'd love to. :)
```python
torch.ops.load_library(_get_extension_path(custom_ops_library_name))

pybind_ops_library_path = _get_extension_path(pybind_ops_library_name)
torch.ops.load_library(pybind_ops_library_path)
```
I need to better understand `spec_from_file_location` and `module_from_spec`, but I'm wondering: in the main comment you wrote:

> We just use torch.ops.load_library() for the custom ops

yet it looks like we're using `torch.ops.load_library()` for all 3 extensions. Is that redundant with the use of `spec_from_file_location` / `module_from_spec`?
Good question! I don't know. It's what TorchAudio does: that is, it uses both `torch.ops.load_library()` and functions from `importlib`. I've gotten Linux working without the `torch.ops.load_library()` on the pybind library. What we have here was my last attempt at seeing what might work for Mac.
```python
@register_fake("torchcodec_ns::_convert_to_tensor")
def _convert_to_tensor_abstract(decoder_ptr: int) -> torch.Tensor:
    return torch.empty([], dtype=torch.long)
```
Curious if `_convert_to_tensor` needs to be exposed as a custom op in Python? It seems to only be used within `create_from_file_like()`, so maybe we don't need to expose it in Python and could just handle that logic internally within the C++ implementation of `_pybind_ops.create_from_file_like`?
It's needed because we can't handle that logic on the C++ side. That is, pybind11 should be able to just return a tensor. But when we do that, we run into a PyTorch core bug: pytorch/pytorch#136664. To avoid that bug, we can't return a tensor from pybind11.

Our work-around, then, is to launder the pointer through an `int` on the pybind11 side, and then use a PyTorch custom op to launder that `int` into a tensor. We need to expose `_convert_to_tensor()` to Python because we put those calls together on the Python side. Notably, we never expose `_convert_to_tensor()` outside of `ops.py`; it's only used to implement `create_from_file_like()`.
```cpp
// https://github.com/pytorch/pytorch/issues/136664
//
// So we instead launder the pointer through an int, and then use a conversion
// function on the custom ops side to launder that int into a tensor.
```
Do we even need to return a tensor (or an int, as a fallback)? IIRC the reason we have to return tensors with our custom ops was mostly because of a limitation of the PyTorch custom ops plumbing, but since we're just using pybind here, could we simply return a nice `VideoDecoder` object?

OK, this still needs to be a tensor (or int fallback) because we'll be calling core APIs from Python, e.g. `core.xyz(decoder, ...)` where `decoder` is a tensor.
Yes, exactly. I want this functionality to slot into our current APIs, not to have a mirror set of APIs.
```cpp
AVIOFileLikeContext::AVIOFileLikeContext(py::object fileLike)
    : fileLike_{UniquePyObject(new py::object(fileLike))} {
```
I'm struggling to parse

`new py::object(fileLike)`

We're allocating a new `py::object` from an existing `py::object`, which is `fileLike`? What kind of copy is involved here (if any)?
Correct. We're invoking `py::object`'s copy constructor on `fileLike`. The resulting object is dynamically allocated on the heap, and its pointer is returned by `new`. The pointer to that object is then owned by `AVIOFileLikeContext::fileLike_`.
It may be instructive to contrast this with a stack-allocated object:

```cpp
py::object copyAllocatedOnStack(fileLike);
py::object copyAlsoAllocatedOnStack{fileLike}; // new-style initializer list
py::object* copyAllocatedOnHeap = new py::object(fileLike);
```
```cpp
// TODO: It is maybe more efficient to grab the lock once in the
// surrounding scope?
```
That's quite likely. IIRC from when I was working with Cython stuff, acquiring/releasing the GIL wasn't trivially cheap.
```cpp
auto chunk = static_cast<std::string>(
    static_cast<py::bytes>((*fileLike)->attr("read")(request)));
```
Can you help me understand why we're casting to `string`? Is it for convenience of keeping both the data and the size together?
```cpp
    num_read += chunk_len;
  }
  return num_read == 0 ? AVERROR_EOF : num_read;
}
```
Above: this can come later, but we should definitely add a test on an object that can only return a small number of bytes at once, so as to stress-test the `while` logic above.
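Such a test helper could look like the sketch below: a file-like object that never returns more than `max_chunk` bytes per `read()` call, paired with an accumulation loop mirroring the C++ `while` logic. The names are hypothetical, not from the PR:

```python
import io


class TrickleReader(io.RawIOBase):
    """Test helper: a file-like object that returns at most max_chunk
    bytes per read() call, to stress chunked-read loops."""

    def __init__(self, data, max_chunk=3):
        self._buf = io.BytesIO(data)
        self._max_chunk = max_chunk

    def read(self, size=-1):
        # Clamp every request to max_chunk bytes, forcing callers to loop.
        if size < 0 or size > self._max_chunk:
            size = self._max_chunk
        return self._buf.read(size)

    def seek(self, pos, whence=0):
        return self._buf.seek(pos, whence)


def read_exactly(f, n):
    # Mirrors the C++ while loop: keep calling read() until we have
    # n bytes or hit EOF (an empty read).
    out = b""
    while len(out) < n:
        chunk = f.read(n - len(out))
        if not chunk:
            break
        out += chunk
    return out


r = TrickleReader(b"hello world", max_chunk=3)
assert read_exactly(r, 11) == b"hello world"
```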
Co-authored-by: Nicolas Hug <[email protected]>
The purpose of this PR is to provide functionality in the core API that allows users to provide a Python file-like object as the video for us to decode. Specifically, we're exposing:
Everything else in this PR is in direct support of what we need to do to support this new function.
On the kind of file-like object, note that we accept and test both `io.RawIOBase` and `io.BytesIO`. I'm confident we should support `RawIOBase`, as it's unbuffered, byte-oriented reading. I'm less sure about `BytesIO`, because it is buffered byte-oriented reading. The current tests pass, but we're not stressing it much.

On what we had to do to expose this new function:
- `libtorchcodec_decoderN.so`
- `libtorchcodec_custom_opsN.so`
- `libtorchcodec_pybind_opsN.so`

Going over each in turn:
Why directly use pybind11?
We're already using PyTorch C++ custom ops for our interface between Python and C++, and they already have some dependence on pybind11, so why do we need to create non-custom op pybind11 functions?
From what I can tell, the custom ops were not really intended for what we want to do, which is call C++ code which will itself make callbacks back up to user-provided Python functions. That is, when the user passes us a file-like object, we want FFmpeg to call the read and seek methods on the Python file-like object for all reading and seeking. I don't think custom ops are designed for that kind of dynamic callback to Python code. (Custom ops definitely can call Python custom ops, but they need to be registered as such ahead of time. We want to call arbitrary Python functions.) I may be wrong here, and we can explore that in the future.
What I'm more confident of is that we need to store an actual reference to the Python file-like object on the C++ side. Using pybind11 directly, that's easy: we keep a `py::object*`. PyTorch custom ops only accept tensors as arguments. We're already smuggling pointers through tensors in the rest of the core API, but there we're going from C++ to Python and back to C++. When we store a pointer from C++ in a tensor to go back up to Python, we know it's the right thing. In this instance, we would want to smuggle a pointer from Python through a tensor to C++, but Python doesn't have pointers. We may be able to get something that works most of the time by just asking for `id(file_like)`, but even then, I'm not sure how to reliably turn that into a `py::object*` on the C++ side.

If we just use a pybind11 function, all of this difficulty goes away. The cost is that using a file-like object is definitely not going to be compatible with torch.export and torch.compile, and I'm not sure how to make that known to the PyTorch custom ops. This warrants further investigation.
Why do we need multiple shared libraries?
I think things are simpler if we have a shared library just for the custom ops and a separate shared library just for the pybind11 ops. That then means we need a third library which holds the actual decoder logic. I was not able to get anything working until I made this division, as I am currently using `importlib.util.spec_from_file_location()` and `importlib.util.module_from_spec()` to load the pybind11 module. We just use `torch.ops.load_library()` for the custom ops; that function has machinery that then exposes available functions as fields of the module.

What I'm currently doing works on Linux but is failing on Mac, so something is wrong. It may be possible that we don't need to do the split.
Generalization of handling AVIOContext
Custom reading and seeking is done in FFmpeg by setting up an `AVIOContext` object. You call `avio_alloc_context()`, providing a pointer to some external state along with functions for read, seek and write. Then, during decoding, when FFmpeg needs to get more bytes from the file, it calls those callbacks, passing the pointer to the external state. You're responsible for managing that external state in your callbacks.

We were already using this for when users provided us the entire file as plain bytes. I generalized this handling into three classes:
- `AVIOContextHolder`, a base class that knows how to allocate an `AVIOContext`. It cannot be instantiated directly. The `VideoDecoder` can be instantiated with an `AVIOContextHolder`, and it uses it appropriately.
- `AVIOBytesContext`, which is the existing functionality we already had. It derives from `AVIOContextHolder`.
- `AVIOFileLikeContext`, which is the new functionality; it also derives from `AVIOContextHolder`.