Developer Notes
- The general principle for the code layout is to have as much as possible in C++, and to just bind it into Python. That said, the only supported interface for the code is the Python one, as there's just too much that only really works from the Python side.
- The typical code layout is to have the core functionality written in C++. The forward and backward passes are bound into Python, and wrapped into the `torch.autograd` framework inside Python. Ideally we'd do this in C++ - it must be possible somehow, as PyTorch itself does it - but it's not at all clear how.
- This implies that anything that depends upon the autograd framework is written in Python. In principle it should be possible to re-bind the Python back into C++, and then have a clean C++ interface that is just bound into Python. That would be nice; we might yet consider it.
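The pattern described above can be sketched as follows. This is a minimal toy, not the project's actual code: `_cpp_forward` and `_cpp_backward` are hypothetical stand-ins for the compiled C++ bindings, here implemented in pure Python so the sketch is self-contained.

```python
import torch

# Hypothetical stand-ins for the C++ forward/backward bindings. In the real
# layout these would be imported from the compiled extension module.
def _cpp_forward(x):
    return x * x

def _cpp_backward(grad_out, x):
    return 2 * x * grad_out

class _SquareFunction(torch.autograd.Function):
    """Wraps the (hypothetical) C++ forward/backward into torch.autograd."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return _cpp_forward(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return _cpp_backward(grad_out, x)

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
_SquareFunction.apply(x).sum().backward()
# x.grad now holds the gradient computed by the "C++" backward pass.
```

Python only sees `_SquareFunction.apply`; the differentiation logic itself stays on the C++ side.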
- At the moment the axis ordering in the C++ is `(stream, batch, channel)`. This seems to be fastest, but is essentially a compromise. At the time of writing we haven't written any custom GPU code (instead relying on PyTorch to do it for us), and as the 'highest level' thing we do is iterate down the stream dimension, this ordering is fastest on the GPU: each operation then uses tensors of shape `(batch, channel)`, for maximum parallelism. If we ever write GPU code then we should consider switching to `(batch, stream, channel)`, and parallelising over the batch dimension at the highest level. We have written custom CPU code, which is therefore going to be slightly slower than it really has to be because of this ordering.
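To illustrate why this layout suits a stream-level loop, here is a small sketch (sizes are arbitrary, and the per-step operation is just a running sum for illustration):

```python
import torch

# Hypothetical sizes, purely for illustration.
stream, batch, channel = 4, 8, 3
x = torch.randn(stream, batch, channel)  # (stream, batch, channel) layout

# The 'highest level' loop runs down the stream dimension. Each step then
# operates on a whole (batch, channel) tensor, so the per-step work
# parallelises over batch * channel elements.
acc = torch.zeros(batch, channel)
for step in x.unbind(dim=0):  # each step has shape (batch, channel)
    acc = acc + step
```

With a `(batch, stream, channel)` layout the same loop would instead index the middle dimension, producing non-contiguous slices at each step.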
- Not everything can or should be handled inside an `autograd.Function`, so typically each such `autograd.Function` is wrapped with a private function handling the rest of these details. In particular (non-exhaustive list):
- There's some annoying transposing (at least in Signatory v1.0). Basically, the interface uses the convention of `(batch, stream, channel)`, as that's what's most common. Meanwhile the C++ code uses the convention of `(stream, batch, channel)`, for speed. Now it turns out that, due to PyTorch bug 24413, the transposing has to occur outside of the autograd framework. Thus the convention in the code is to do all transposing in the wrapper function: everything in C++/`autograd.Function` will use `(stream, batch, channel)`, and everything else will use `(batch, stream, channel)`. Make sure to think carefully about what the axis ordering is in any given part of the code you're using!
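A minimal sketch of this convention, using a toy stream-wise cumulative sum in place of the real C++ operation (the names here are hypothetical, not Signatory's actual API):

```python
import torch

class _StreamFirstCumsum(torch.autograd.Function):
    """Toy stand-in for the C++-backed operation: like the C++ code, it
    expects and returns tensors of shape (stream, batch, channel)."""

    @staticmethod
    def forward(ctx, x):
        return torch.cumsum(x, dim=0)

    @staticmethod
    def backward(ctx, grad_out):
        # Vector-Jacobian product of cumsum: a reversed cumulative sum.
        return torch.flip(torch.cumsum(torch.flip(grad_out, [0]), dim=0), [0])

def cumsum_over_stream(x):
    """Public wrapper: takes (batch, stream, channel) input, and does all
    transposing here, outside the autograd.Function (cf. PyTorch bug 24413)."""
    x = x.transpose(0, 1)            # -> (stream, batch, channel)
    out = _StreamFirstCumsum.apply(x)
    return out.transpose(0, 1)       # -> (batch, stream, channel)
```

The `autograd.Function` never sees a `(batch, stream, channel)` tensor; the wrapper owns both transposes.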
- PyTorch bug 25340, which is a wontfix, implies that one has to be quite careful about what is saved on `ctx` in an `autograd.Function`, and how. To be completely safe (and this is also best for consistency reasons): any `torch.Tensor` to keep should be saved using `ctx.save_for_backward`. Anything else should be assigned as an attribute on `ctx`. In particular it is not safe to save any structure of `torch.Tensor`s (e.g. a list, PyCapsule, etc.) as an attribute. Instead the structure should be serialised into a list of tensors and `ctx.save_for_backward` used. (Which is definitely a bit awkward.)
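A toy example of the safe pattern: a variadic sum that flattens its structure of tensors into `ctx.save_for_backward`, keeping only non-tensor state (here, the count) as a `ctx` attribute. The class name is made up for illustration.

```python
import torch

class _VariadicSum(torch.autograd.Function):
    """Sums an arbitrary number of tensors, saving them safely on ctx."""

    @staticmethod
    def forward(ctx, *tensors):
        # Non-tensor state goes on ctx as a plain attribute...
        ctx.num_tensors = len(tensors)
        # ...while the tensors themselves are flattened into save_for_backward,
        # NOT stashed as a list attribute (unsafe per PyTorch bug 25340).
        ctx.save_for_backward(*tensors)
        return torch.stack(tensors).sum(dim=0)

    @staticmethod
    def backward(ctx, grad_out):
        # The flat tuple comes back via ctx.saved_tensors; the structure is
        # reconstructed from the attribute saved in forward.
        assert len(ctx.saved_tensors) == ctx.num_tensors
        return (grad_out,) * ctx.num_tensors

a = torch.randn(3, requires_grad=True)
b = torch.randn(3, requires_grad=True)
_VariadicSum.apply(a, b).sum().backward()
```

For richer structures the same idea applies: serialise to a flat tuple of tensors for `save_for_backward`, and record whatever non-tensor bookkeeping is needed to rebuild the structure as `ctx` attributes.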