Description
To achieve high performance where vectorization with NumPy is not possible, Biotite currently uses Cython code. However, there are some limitations in Cython:
- Apart from fused types, Cython does not support generics
- Separation into multiple modules can be a bit cumbersome
- Fast string operations are basically impossible unless one wants to use rather unsafe ASCII-only C-strings
- To get very high performance, safeguards need to be disabled, which can lead to memory issues (out-of-bounds access, leaks, etc.)
- In my opinion, Rust is much cleaner than Cython when low-level, typed operations are involved.
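To make the points about generics and safeguards more concrete, here is a minimal sketch in plain Rust (standard library only). The function names `argmin` and `element_at` are hypothetical and not part of Biotite, and exposing them to Python via PyO3 is not shown here.

```rust
/// Return the index of the first smallest element, or `None` for an empty slice.
/// One generic definition covers `i32`, `f64`, `u8`, ... without fused types.
pub fn argmin<T: PartialOrd + Copy>(values: &[T]) -> Option<usize> {
    let mut best: Option<(usize, T)> = None;
    for (i, &v) in values.iter().enumerate() {
        match best {
            // Keep the current best unless `v` is strictly smaller
            Some((_, current)) if !(v < current) => {}
            _ => best = Some((i, v)),
        }
    }
    best.map(|(i, _)| i)
}

/// Bounds-checked lookup: an out-of-range index yields `None`
/// instead of reading past the end of the buffer.
pub fn element_at<T: Copy>(values: &[T], index: usize) -> Option<T> {
    values.get(index).copied()
}
```

Both functions stay in safe Rust, so the bounds check does not have to be switched off by hand to get compiled-code performance; whether the compiler can remove a given check depends on the surrounding code.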
Hence, this issue is meant to start a discussion on whether Rust code using PyO3 should be allowed in Biotite, as PyO3 has become quite mature in recent years. Using Rust would address all the issues mentioned above. More specifically, these are the places in the code base where the limitations become quite clear:
- PDBFile: There already is fastpdb written in Rust, but it needs to be maintained separately at the moment.
- connect_via_residue_names(): The Python string operations in this Cython function make it quite slow, and it is actually the bottleneck in pdbx.get_structure() when the input is a BinaryCIFFile.
- Line parsing in CIFFile: Making it faster has been addressed multiple times now (Handle embedded quote in mmcif #619, Update _split_one_line and remove whitespace parameter #686), but this short function is still the bottleneck when reading CIF files.
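As a rough idea of what such line parsing could look like in Rust, here is a simplified sketch. It is not Biotite's actual `_split_one_line` implementation: it assumes a single-line input, ignores comments and multi-line text fields, and treats a quote as closing only when it is followed by whitespace or the end of the line. The function name is borrowed for illustration only.

```rust
/// Split one CIF line into its whitespace-separated fields.
/// A field starting with `'` or `"` ends at the next matching quote that is
/// followed by whitespace or the end of the line, so embedded quotes survive.
/// The returned fields borrow from `line`, so no strings are copied.
pub fn split_one_line(line: &str) -> Vec<&str> {
    let bytes = line.as_bytes();
    let mut fields = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        // Skip whitespace between fields
        if bytes[i].is_ascii_whitespace() {
            i += 1;
            continue;
        }
        let quote = bytes[i];
        if quote == b'\'' || quote == b'"' {
            let start = i + 1;
            let mut end = start;
            // Search for the closing quote; an unterminated quote runs to the line end
            while end < bytes.len() {
                if bytes[end] == quote
                    && (end + 1 == bytes.len() || bytes[end + 1].is_ascii_whitespace())
                {
                    break;
                }
                end += 1;
            }
            fields.push(&line[start..end]);
            i = end + 1;
        } else {
            // Unquoted field: read up to the next whitespace
            let start = i;
            while i < bytes.len() && !bytes[i].is_ascii_whitespace() {
                i += 1;
            }
            fields.push(&line[start..i]);
        }
    }
    fields
}
```

All slice boundaries fall on ASCII bytes (quotes or whitespace) or on the ends of the line, so the function also works on non-ASCII UTF-8 input without resorting to C strings.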
Probably even more places in Biotite would benefit from routines written in Rust.
However, there would also be a few disadvantages:
- Cython is easier to learn than Rust, as it is close to Python.
- Development would become more complex, as there would be three programming languages (Python, Cython, Rust) for as long as Cython code still exists in the code base.
- The build process would become more complex, as the Rust compiler needs to be involved, but this should not change the user experience.
- People cannot install Biotite from a source distribution if they do not have a Rust compiler installed (will hopefully be solved by Implement new sdist feature to download rust toolchain PyO3/maturin#2177).
I lean towards accepting Rust in Biotite (otherwise I would not have opened this issue 😉), but I would really like to hear your opinions on this, @t0mdavid-m @JHKru @MaxGreil and other contributors/users with an opinion on this topic.