Support native Python collections #2227

@tbennun

Using Structures and the nanobind Python/C++ binding library, we can outline a path to supporting tuples, dictionaries, and lists natively in SDFGs. As a side benefit, using nanobind for functions called from Python may also reduce the overhead of CompiledSDFG calls.

A potential plan can be:

First, refactor dace/data.py into a dace/data/... folder that contains pydata.py (as well as tensor.py and other files).

In pydata, implement the following data container types:

  • class PythonList(Array): Represented as a 1D array but implemented/code-generated as nb::list.
  • class PythonTuple(Array): Same as list.
  • class PythonClass(Structure): Represents a general Python class. Accessing fields in the class works the same way as accessing Structure fields: to access a field, simply add a connector with the field's name. This also solves an issue where scalar fields in objects cannot be updated when using DaCe.
  • class PythonDict(Structure): Similarly to classes, dictionary keys can be encoded as connectors. Control flow structures could iterate over items/keys/values (a PythonDictIterator data container, a subclass of PythonGenerator, might be introduced for this purpose).
    • Alternatively, and more generally (preferred!), PythonDict could extend a general KeyValueStore data container type that we will generically introduce to DaCe.
  • class PythonGenerator(Stream): General stateful memory that, upon accessing with a read memlet, will generate a different value every time. The semantics are similar to DaCe streams (FIFO queues). Edges that write into a generator (without a set connector) are disallowed.
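
The hierarchy in the list above could be sketched roughly as follows. This is only an illustration: Data, Array, Structure, and Stream are simplified stand-ins defined locally (not the real dace.data classes), and the read/write methods on PythonGenerator are hypothetical names used to mimic read-memlet and write-edge semantics. The set-connector case is not modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Tuple


# Stand-in base classes: hypothetical placeholders, NOT the real dace.data API.
@dataclass
class Data:
    """Minimal placeholder for a data descriptor."""


@dataclass
class Array(Data):
    dtype: type = float
    shape: Tuple[int, ...] = (1,)


@dataclass
class Structure(Data):
    # Field name -> descriptor; each field would be exposed as a connector.
    members: Dict[str, Data] = field(default_factory=dict)


@dataclass
class Stream(Data):
    dtype: type = float


# Proposed containers from the list above.
class PythonList(Array):
    """1D array descriptor; code generation would emit nb::list."""


class PythonTuple(Array):
    """Same representation as PythonList."""


class PythonClass(Structure):
    """General Python class; fields are accessed through named connectors."""


class PythonDict(Structure):
    """Dictionary whose keys are encoded as connectors."""


class PythonGenerator(Stream):
    """Stateful, read-only source: every read yields the next value."""

    def __init__(self, dtype: type, source: Iterator):
        super().__init__(dtype=dtype)
        self._source = source

    def read(self):
        # Each read consumes one element, like popping from a FIFO queue.
        return next(self._source)

    def write(self, value):
        # Plain writes into a generator are disallowed in the proposal.
        raise TypeError("PythonGenerator does not accept writes")


# Usage: a list descriptor and a generator that yields a new value per read.
lst = PythonList(dtype=int, shape=(3,))
gen = PythonGenerator(int, iter([10, 20, 30]))
first, second = gen.read(), gen.read()
```

Subclassing the existing container kinds keeps the new types visible to anything that already handles arrays, structures, or streams, which seems to be the intent of the proposal.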

The data container classes are strongly typed (i.e., PythonDict has a specific Data entry for its key and for its value). To deal with Python's dynamic typing, a PythonUnion data container class might be introduced, but its use would be discouraged. nanobind should be good at throwing exceptions if a value turns out to have the wrong type at runtime.
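
To make the strong-typing idea concrete, here is a minimal pure-Python sketch of the kind of check that would reject a mismatched value at runtime. TypedDictSpec and UnionSpec are hypothetical names invented for this example; they are not part of DaCe or nanobind, and the real check would happen in the generated binding code rather than in Python.

```python
from dataclasses import dataclass
from typing import Any, Tuple


@dataclass(frozen=True)
class TypedDictSpec:
    """Hypothetical descriptor: one fixed key type and one fixed value type."""
    key_type: type
    value_type: type

    def validate(self, obj: Any) -> None:
        if not isinstance(obj, dict):
            raise TypeError(f"expected dict, got {type(obj).__name__}")
        for key, value in obj.items():
            if not isinstance(key, self.key_type):
                raise TypeError(f"key {key!r} is not {self.key_type.__name__}")
            if not isinstance(value, self.value_type):
                raise TypeError(f"value {value!r} is not {self.value_type.__name__}")


@dataclass(frozen=True)
class UnionSpec:
    """Hypothetical stand-in for PythonUnion: any of several types is accepted."""
    options: Tuple[type, ...]

    def validate(self, obj: Any) -> None:
        if not isinstance(obj, self.options):
            names = ", ".join(t.__name__ for t in self.options)
            raise TypeError(f"expected one of ({names}), got {type(obj).__name__}")


# Usage: a str -> float dictionary passes, a mismatched value is rejected.
spec = TypedDictSpec(key_type=str, value_type=float)
spec.validate({"alpha": 0.5, "beta": 1.5})

try:
    spec.validate({"alpha": "not a float"})
    rejected = False
except TypeError:
    rejected = True
```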

Code generation will be adapted to emit nb::dict, nb::ndarray, nb::list, nb::object, etc. Classes will contain fields that are captured at marshalling time.
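
One way to picture "captured at marshalling time" is the following pure-Python sketch: the fields of an object are snapshotted when the call is made, the compiled code works on the snapshot, and the values are copied back afterwards so that scalar fields can be updated. All names here (capture_fields, write_back, compiled_increment, Counter) are hypothetical and exist only for this illustration; the actual mechanism would live in the generated nanobind code.

```python
from typing import Any, Dict, Iterable


def capture_fields(obj: Any, names: Iterable[str]) -> Dict[str, Any]:
    """Snapshot the named attributes of obj at call (marshalling) time."""
    return {name: getattr(obj, name) for name in names}


def write_back(obj: Any, values: Dict[str, Any]) -> None:
    """Copy (possibly updated) field values back onto the Python object."""
    for name, value in values.items():
        setattr(obj, name, value)


class Counter:
    def __init__(self) -> None:
        self.count = 0
        self.step = 2


def compiled_increment(fields: Dict[str, Any]) -> None:
    # Stand-in for generated code operating on the captured fields.
    fields["count"] = fields["count"] + fields["step"]


counter = Counter()
captured = capture_fields(counter, ["count", "step"])
compiled_increment(captured)
write_back(counter, captured)
```

After the write-back step, the scalar field on the original object reflects the update made by the stand-in compiled function.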

Note that this solution is not intended to generate the highest-performing code, but rather to provide useful shims to and from existing Python code.
