Developer Notes

Patrick Kidger edited this page Oct 31, 2019 · 5 revisions

General

The general principle for the code layout is to keep as much as possible in C++, and simply bind it into Python. That said, the only supported interface for the code is the Python one, as there's just too much that only really works from the Python side.

The typical code layout is to have the core functionality written in C++. The forward and backward passes are then bound into Python, and wrapped into the torch.autograd framework on the Python side. Ideally we'd do this wrapping in the C++ as well - it must be possible somehow, since PyTorch itself does it - but it's not at all clear how.
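The wrapping pattern described above can be sketched as follows. `_impl_forward` and `_impl_backward` are hypothetical stand-ins for the C++ operations bound into Python - here mocked with pure-PyTorch cumulative sums, purely for illustration.

```python
import torch


# Hypothetical stand-ins for the forward and backward passes that would
# really be C++ functions bound into Python. Here: a cumulative sum down
# the leading dimension, and its corresponding gradient.
def _impl_forward(x):
    return x.cumsum(dim=0)


def _impl_backward(grad):
    # The gradient of a cumulative sum is a reversed cumulative sum.
    return grad.flip(0).cumsum(dim=0).flip(0)


class _ExampleFunction(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return _impl_forward(x)

    @staticmethod
    def backward(ctx, grad_output):
        return _impl_backward(grad_output)


x = torch.randn(3, 4, requires_grad=True)
y = _ExampleFunction.apply(x)
y.sum().backward()
```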

This then has the implication that anything that depends upon the autograd framework is written in Python. In principle it should be possible to re-bind the Python back into C++, giving a clean C++ interface that is then bound into Python just once; that would be nice, and we might yet consider it.

  • At the moment the axis ordering in the C++ is (stream, batch, channel). This seems to be fastest, but is essentially a compromise. At time of writing we haven't written any custom GPU code (instead relying on PyTorch to do it for us), and as the 'highest level' thing we do is iterate down the stream dimension, this ordering is fastest on the GPU: each per-step operation then uses tensors of shape (batch, channel), for maximum parallelism. If we ever write custom GPU code then we should consider switching to (batch, stream, channel), and parallelising over the batch dimension at the highest level. We have written custom CPU code, which is going to be slightly slower than it needs to be because of this ordering.
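The iteration pattern that motivates this ordering can be sketched as follows (an illustrative toy, not code from the library): with a (stream, batch, channel) tensor, stepping down the leading dimension yields (batch, channel) slices, so each per-step operation parallelises over the whole batch and channel dimensions at once.

```python
import torch

stream, batch, channel = 5, 8, 3
x = torch.randn(stream, batch, channel)

# Iterating down the leading stream dimension yields slices of shape
# (batch, channel), so each per-step operation parallelises over
# batch * channel elements.
acc = torch.zeros(batch, channel)
for step in x:  # step has shape (batch, channel)
    acc = acc + step
```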

autograd

Not everything can or should be handled inside an autograd.Function, so typically each such autograd.Function is wrapped in a private function that handles the remaining details. In particular (non-exhaustive list):

  • There's some annoying transposing (at least in Signatory v1.0). Basically, the public interface uses the (batch, stream, channel) convention, as that's what's most common, whilst the C++ code uses (stream, batch, channel), for speed. Due to PyTorch bug 24413, the transposing has to occur outside of the autograd framework. Thus the convention in the code is to do all transposing in the wrapper function: everything in the C++ and in the autograd.Function uses (stream, batch, channel), and everything else uses (batch, stream, channel). Make sure to think carefully about the axis ordering in any given part of the code you're using!
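This wrapper convention can be sketched as follows. `_Flavour` and `example_op` are hypothetical names, and the doubling operation is a placeholder for the real C++-backed computation:

```python
import torch


class _Flavour(torch.autograd.Function):
    # Everything inside the autograd.Function uses the (stream, batch,
    # channel) convention, matching the C++ code. The actual operation
    # here (doubling) is just a placeholder.
    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 2


def example_op(path):
    # Public wrapper: takes (batch, stream, channel). All transposing is
    # done out here, outside the autograd framework.
    path = path.transpose(0, 1)    # -> (stream, batch, channel)
    result = _Flavour.apply(path)
    return result.transpose(0, 1)  # -> (batch, stream, channel)


x = torch.randn(8, 5, 3, requires_grad=True)
y = example_op(x)
```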

  • PyTorch bug 25340, which is a wontfix, means that one has to be quite careful about what is saved on ctx in an autograd.Function, and how. To be completely safe (and this is also best for consistency): any torch.Tensor to keep should be saved using ctx.save_for_backward; anything else should be assigned as an attribute on ctx. In particular it is not safe to save any structure of torch.Tensors (e.g. a list, PyCapsule, etc.) as an attribute. Instead the structure should be serialised into a flat list of tensors and ctx.save_for_backward used. (Which is admittedly a bit awkward.)
