
Add core support for decoding from Python file-like objects #564

Draft
wants to merge 36 commits into base: main
Conversation


@scotts (Contributor) commented Mar 14, 2025

The purpose of this PR is to provide functionality in the core API that allows users to provide a Python file-like object as the video for us to decode. Specifically, we're exposing:

```python
def create_from_file_like(
    file_like: Union[io.RawIOBase, io.BytesIO], seek_mode: Optional[str] = None
) -> torch.Tensor:
```

Everything else in this PR is in direct support of what we need to do to support this new function.

On the kind of file-like object, note that we accept and test both io.RawIOBase and io.BytesIO. I'm confident we should support RawIOBase, as it's unbuffered, byte-oriented reading. I'm less sure about BytesIO, because it is buffered, byte-oriented reading. The current tests pass, but we're not stressing it much.
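As a sketch of what such a file-like object can look like, here is a minimal io.RawIOBase subclass (the class name and data are illustrative, not from the PR); anything with working read and seek methods should satisfy the protocol:

```python
import io

# A minimal unbuffered file-like object of the kind the new core API accepts.
# FFmpeg's callbacks only need read(size) and seek(offset, whence), so this
# sketch wraps an in-memory bytes blob behind that protocol.
class BytesReader(io.RawIOBase):
    def __init__(self, data: bytes):
        self._data = data
        self._pos = 0

    def readinto(self, b) -> int:
        # RawIOBase.read() is implemented in terms of readinto().
        chunk = self._data[self._pos : self._pos + len(b)]
        b[: len(chunk)] = chunk
        self._pos += len(chunk)
        return len(chunk)

    def seekable(self) -> bool:
        return True

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = len(self._data) + offset
        return self._pos

reader = BytesReader(b"header" + b"\x00" * 10)
assert reader.read(6) == b"header"
assert reader.seek(0) == 0
```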

On what we had to do to expose this new function:

  1. Implement pybind11 functions that don't go through PyTorch C++ custom ops.
  2. Split the shared libraries into three: libtorchcodec_decoderN.so, libtorchcodec_custom_opsN.so, libtorchcodec_pybind_opsN.so.
  3. Generalize how we handle AVIOContext objects in the C++ VideoDecoder.

Going over each in turn:

Why directly use pybind11?

We're already using PyTorch C++ custom ops for our interface between Python and C++, and they already have some dependence on pybind11, so why do we need to create non-custom op pybind11 functions?

From what I can tell, the custom ops were not really intended for what we want to do, which is to call C++ code that will itself make callbacks back up to user-provided Python functions. That is, when the user passes us a file-like object, we want FFmpeg to call the read and seek methods on the Python file-like object for all reading and seeking. I don't think custom ops are designed for that kind of dynamic callback into Python code. (Custom ops definitely can call Python custom ops, but they need to be registered as such ahead of time. We want to call arbitrary Python functions.) I may be wrong here, and we can explore that in the future.

What I'm more confident of is that we need to store an actual reference to the Python file-like object on the C++ side. Using pybind11 directly, that's easy: we keep a py::object*. PyTorch custom ops only accept tensors as arguments. We're already smuggling pointers through tensors in the rest of the core API, but there we're going from C++ to Python and back to C++. When we store a pointer from C++ in a tensor to go back up to Python, we know it's the right thing. In this instance, we would want to smuggle a pointer from Python through a tensor to C++ - but Python doesn't have pointers. We may be able to get something that works most of the time by just asking for id(file_like), but even then, I'm not sure how to reliably turn that into a py::object* on the C++ side.
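To illustrate why the id() route is fragile: in CPython, id(obj) happens to be the object's address, so the int can be laundered back into an object reference with ctypes, but that's an implementation detail and nothing manages the object's lifetime. A hedged sketch of that fragile approach (class name is made up):

```python
import ctypes

class FakeFileLike:
    pass

f = FakeFileLike()
ptr = id(f)  # CPython detail: id() is the object's memory address

# Laundering the int back into an object reference. This "works" only because
# of the CPython id/address coincidence, and nothing here keeps f alive:
recovered = ctypes.cast(ptr, ctypes.py_object).value
assert recovered is f
```

Holding a real py::object on the C++ side avoids both problems: it's a genuine reference, and it participates in reference counting.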

If we just use a pybind11 function, all of this difficulty goes away. The cost is that using a file-like object is definitely not going to be compatible with torch.export and torch.compile, but I'm not sure how to make that known to the PyTorch custom ops. This warrants further investigation.

Why do we need multiple shared libraries?

I think things are simpler if we have a shared library just for the custom ops and a separate shared library just for the pybind11 ops. That then means we need a third library which holds the actual decoder logic. I was not able to get anything working until I made this division, as I am currently using importlib.util.spec_from_file_location() and importlib.util.module_from_spec() to load the pybind11 module. We just use torch.ops.load_library() for the custom ops; that function has machinery that then exposes available functions as fields of the module.

What I'm currently doing works on Linux but fails on Mac, so something is wrong. It's possible we don't need the split at all.

Generalization of handling AVIOContext

Custom reading and seeking is done in FFmpeg by setting up an AVIOContext object. You call avio_alloc_context() where you provide a pointer to some external state, and then functions for read, seek and write. Then, during decoding, when FFmpeg needs to get more bytes from the file, it calls the callbacks, providing a pointer to the external state. You're responsible for managing that external state in your callbacks.
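The avio_alloc_context() contract described above can be sketched in Python: FFmpeg hands your opaque state back to the callbacks, and each read callback fills at most a buffer's worth of bytes (function names here are illustrative, not FFmpeg's C signatures):

```python
import io

# Emulate FFmpeg driving AVIOContext callbacks: "opaque" is the external
# state (here, a file-like object). read_callback returns at most buf_size
# bytes per call; seek_callback repositions the stream.
def read_callback(opaque, buf_size: int) -> bytes:
    return opaque.read(buf_size)

def seek_callback(opaque, offset: int, whence: int) -> int:
    return opaque.seek(offset, whence)

source = io.BytesIO(b"0123456789")
assert read_callback(source, 4) == b"0123"
assert seek_callback(source, 0, io.SEEK_SET) == 0
assert read_callback(source, 4) == b"0123"
```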

We were already using this for the case where users provide the entire file as plain bytes. I generalized this handling into three classes:

  1. AVIOContextHolder, which is a base class that knows how to allocate an AVIOContext. It cannot be instantiated directly. The VideoDecoder can be instantiated with an AVIOContextHolder, and it uses it appropriately.
  2. AVIOBytesContext, which is the existing functionality we already had. It derives from AVIOContextHolder.
  3. AVIOFileLikeContext, which is the new functionality. It also derives from AVIOContextHolder.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 14, 2025
```python
)

global _pybind_ops
_pybind_ops = importlib.util.module_from_spec(spec)
```
scotts (Contributor, Author) commented:
This does not yet work on Mac. See: https://github.com/pytorch/torchcodec/actions/runs/13924707065/job/38966320796?pr=564

We're unable to load the spec from the file on line 45. When digging around, it seems like we should be using ctypes.CDLL to load a shared library, and that's what torch.ops.load_library() does under the hood. That's why I kept it in for the pybind ops.

TorchAudio does both: https://github.com/pytorch/audio/blob/c670ad81fda266b6598aeeef434583eb98197ae8/src/torio/_extension/utils.py#L105

The common case for pybind11 is that you have:

  1. A shared library named foo.so.
  2. A module declared as foo in the PYBIND11_MODULE macro in C++.
  3. foo.so lives in a place Python knows to look.

Then import foo should just work.
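The importlib machinery the PR uses for the pybind11 module can be demonstrated with a plain Python file (the module name, path, and stub function below are made up for illustration):

```python
import importlib.util
import os
import tempfile

# Write a tiny module to disk, then load it by file path, mirroring how the
# PR loads the pybind11 extension from its .so path with
# spec_from_file_location() / module_from_spec().
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "toy_pybind_ops.py")
    with open(path, "w") as f:
        f.write("def create_from_file_like_stub():\n    return 42\n")

    spec = importlib.util.spec_from_file_location("toy_pybind_ops", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

assert module.create_from_file_like_stub() == 42
```

For a real pybind11 extension the same calls apply, with the .so path in place of the .py path and the PYBIND11_MODULE name in place of the module name.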

```diff
 at::Tensor wrapDecoderPointerToTensor(
     std::unique_ptr<VideoDecoder> uniqueDecoder) {
   VideoDecoder* decoder = uniqueDecoder.release();

   auto deleter = [decoder](void*) { delete decoder; };
   at::Tensor tensor =
-      at::from_blob(decoder, {sizeof(VideoDecoder)}, deleter, {at::kLong});
+      at::from_blob(decoder, {sizeof(VideoDecoder*)}, deleter, {at::kLong});
```
scotts (Contributor, Author) commented:

Drive-by: we have been allocating enough space for an entire VideoDecoder object. But we're not storing the entire object, just a pointer.

Comment on lines +365 to +367:

```python
decoder = create_from_file_like(
    open(path, mode="rb", buffering=-4096), "exact"
)
```
Member commented:

Can you describe intent behind buffering=-4096? I'm looking at the open() docs but they don't seem to describe what a negative value means?

scotts (Contributor, Author) commented:

Whoops, I misread the docs. It should be positive.

```diff
@@ -15,6 +18,8 @@
    _get_extension_path,
)

_pybind_ops: Optional[ModuleType] = None


def load_torchcodec_extension():
```
Member commented:

super nit: the name and description of this function could be slightly updated

scotts (Contributor, Author) commented:

I'd love to. :)

```python
torch.ops.load_library(_get_extension_path(custom_ops_library_name))

pybind_ops_library_path = _get_extension_path(pybind_ops_library_name)
torch.ops.load_library(pybind_ops_library_path)
```
Member commented:

I need to better understand spec_from_file_location and module_from_spec, but I'm wondering:

in the main comment you wrote:

> We just use torch.ops.load_library() for the custom ops

yet it looks like we're using torch.ops.load_library() for all 3 extensions. Is that redundant with the use of spec_from_file_location / module_from_spec ?

scotts (Contributor, Author) commented:

Good question! I don't know. It's what TorchAudio does:

https://github.com/pytorch/audio/blob/c670ad81fda266b6598aeeef434583eb98197ae8/src/torio/_extension/utils.py#L101-L109

That is, it uses both torch.ops.load_library() and functions from importlib. I've gotten Linux working without the torch.ops.load_library() on the pybind library. What we have here was my last attempt at seeing what might work for Mac.

Comment on lines +161 to +163:

```python
@register_fake("torchcodec_ns::_convert_to_tensor")
def _convert_to_tensor_abstract(decoder_ptr: int) -> torch.Tensor:
    return torch.empty([], dtype=torch.long)
```
Member commented:

Curious if _convert_to_tensor needs to be exposed as a custom op in Python? It seems to only be used within create_from_file_like(), so maybe we don't need to expose it in Python and just handle that logic internally within the C++ implementation of _pybind_ops.create_from_file_like?

scotts (Contributor, Author) commented:

It's needed because we can't handle that logic on the C++ side. That is, pybind11 should be able to just return a tensor. But when we do that, we run into a PyTorch core bug: pytorch/pytorch#136664. In order to avoid that bug, we can't return a tensor from pybind11.

Our work-around, then, is to launder the pointer through an int on the pybind11 side, and then use a PyTorch custom op to launder that int into a tensor. We need to expose _convert_to_tensor() to Python because we put those calls together on the Python side. Notably, we never expose _convert_to_tensor() outside of ops.py. It's only used to implement create_from_file_like().

```cpp
// https://github.com/pytorch/pytorch/issues/136664
//
// So we instead launder the pointer through an int, and then use a conversion
// function on the custom ops side to launder that int into a tensor.
```
Member commented:

Do we even need to return a tensor (or an int, as a fallback)? IIRC the reason we have to return tensors with our custom ops was mostly because of a limitation of the PyTorch custom ops plumbing, but since we're just using pybind here, could we simply return a nice VideoDecoder object?

OK, this still needs to be a tensor (or int fallback) because we'll be calling core APIs from Python e.g. core.xyz(decoder, ...) where decoder is a tensor.

scotts (Contributor, Author) commented:

Yes, exactly. I want this functionality to slot into our current APIs, not to have a mirror set of APIs.

Comment on lines +12 to +13:

```cpp
AVIOFileLikeContext::AVIOFileLikeContext(py::object fileLike)
    : fileLike_{UniquePyObject(new py::object(fileLike))} {
```
Member commented:

I'm struggling to parse

```cpp
new py::object(fileLike))
```

We're allocating a new py::object from an existing py::object, which is fileLike? What kind of copy is involved here (if any)?

scotts (Contributor, Author) commented Mar 18, 2025:

Correct. We're invoking py::object's copy constructor on fileLike. That resulting object is dynamically allocated on the heap, and its pointer is returned by new. The pointer to that object is then owned by AVIOFileLikeContext::fileLike_.

It may be instructive to contrast this with a stack-allocated object, in statements that would appear in normal code:

```cpp
py::object copyAllocatedOnStack(fileLike);
py::object copyAlsoAllocatedOnStack{fileLike}; // new-style initializer list
py::object* copyAllocatedOnHeap = new py::object(fileLike);
```
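In Python terms, copying a py::object is a refcount bump on the same underlying object, not a copy of the object's data. A sketch of the analogous behavior from the Python side (CPython-specific, since it relies on sys.getrefcount):

```python
import sys

file_like = object()
before = sys.getrefcount(file_like)

# Analogous to py::object's copy constructor: binding another reference bumps
# the refcount and keeps the underlying object alive; no data is copied.
held = file_like
after = sys.getrefcount(file_like)

assert held is file_like
assert after == before + 1
```

This is why keeping a py::object copy on the C++ side is enough to guarantee the user's file-like object outlives the decoder's use of it.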

Comment on lines +35 to +36:

```cpp
// TODO: It is maybe more efficient to grab the lock once in the
// surrounding scope?
```
Member commented:

That's quite likely; IIRC, when I was working with Cython stuff, acquiring/releasing the GIL wasn't trivially cheap.

Comment on lines +38 to +39:

```cpp
auto chunk = static_cast<std::string>(
    static_cast<py::bytes>((*fileLike)->attr("read")(request)));
```
Member commented:

Can you help me understand why we're casting to a string? Is it for the convenience of keeping both the data and the size together?

```cpp
    num_read += chunk_len;
  }
  return num_read == 0 ? AVERROR_EOF : num_read;
}
```
Member commented:

Above: this can come later, but we should definitely add a test with an object that can only return a small number of bytes at a time, to stress-test the while loop above.
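A sketch of such a stress test, assuming a wrapper that caps each read at a few bytes (the class and function names are made up; the loop mirrors the C++ read callback's while logic):

```python
import io

# File-like wrapper that returns at most `max_chunk` bytes per read(),
# forcing the read callback's while loop to iterate many times.
class TricklingReader:
    def __init__(self, data: bytes, max_chunk: int = 3):
        self._inner = io.BytesIO(data)
        self._max_chunk = max_chunk

    def read(self, size: int) -> bytes:
        return self._inner.read(min(size, self._max_chunk))

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        return self._inner.seek(offset, whence)

# Emulate the callback loop from the PR: keep calling read() until the
# requested number of bytes is gathered or the source is exhausted.
def read_fully(reader, want: int) -> bytes:
    out = b""
    while len(out) < want:
        chunk = reader.read(want - len(out))
        if not chunk:
            break
        out += chunk
    return out

reader = TricklingReader(b"0123456789")
assert read_fully(reader, 10) == b"0123456789"
```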
