Skip to content

Introducing Rust code into Biotite #688

Open
@padix-key

Description

@padix-key

To achieve high performance where vectorization with NumPy is not possible, Biotite currently uses Cython code. However, there are some limitations in Cython:

  • Apart from fused types Cython does not support generics
  • Separation into multiple modules can be a bit cumbersome
  • Fast string operations are basically impossible unless one wants to use rather unsafe ASCII-only C-strings
  • To get a very high performance, safe guards need to be disabled leading to potential memory issues (out-of-bounds, leaks, etc.)
  • In my opinion Rust is much cleaner than Cython, when low-level, typed operations are involved.

Hence, this issue should initiate the discussion if Rust code using PyO3 should be allowed in Biotite, as it has become quite mature in recent years. This would address all the issues mentioned above. More specifically, these are the places in the code base where the limitations become quite clear:

  • PDBFile: There already is fastpdb written in Rust, but it needs to be maintained separately in the moment.
  • connect_via_residue_names(): The Python string operations in this Cython function makes it quite slow and it is actually the bottleneck in pdbx.get_structure(), when the input is a BinaryCIFFile.
  • Line parsing in CIFFile: It has been addressed multiple times now to make it faster (Handle embedded quote in mmcif #619, Update _split_one_line and remove whitespace parameter #686), but this short function is still the bottleneck when reading CIF files.

Probably even more places in Biotite would benefit from routines written in Rust.
However there would also be a few disadvantages:

  • Cython is easier to learn than Rust, as it is close to Python.
  • Development would become more complex, as there would be 3 programming languages (Python, Cython, Rust), as long as Cython code still exists in the code base.
  • The build process would become more complex as the Rust compiler needs to be involved, but this should not change the user experience.
  • People cannot install Biotite from a source distribution, if they do not have a Rust compiler installed (will hopefully be solved by Implement new sdist feature to download rust toolchain PyO3/maturin#2177).

I lean towards accepting Rust in Biotite (otherwise I would not have opened this issue 😉), but I really like to hear your opinion about this @t0mdavid-m @JHKru @MaxGreil and other contributors/users with an opinion on this topic.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions