Re-tooling for orchestration, debug, cache, runtime interruption: the need for a swift brush of genius. #263

@FlorianDeconinck

A design to rule them all, a design to find them, a design to bring them all and in the darkness bind them.

NDSL is a transpiler and, as such, needs to keep a tight grip on the code funneled through its system. When working with stencils only, the code is held by the FrozenStencil class, itself a wrapper around gt4py.stencil.

When running orchestration, the plot thickens. Orchestration is a collection of subsystems built around the whole-program optimizer, which relies on DaCe. It has several parts:

  • A GT4Py-DaCe bridge, carried by gt4py
  • A monkey-patched function/method interruption (the orchestrate function) that does the work of feeding code and closure to DaCe
  • A distributed build system to make sure we build only the required ranks on the cubed-sphere, plus a two-step build and run
  • An "optimization" pipeline, scheduled to grow, with bespoke NDSL passes
  • A collection of "callbacks" and other dace_inhibitor-flagged calls to escape the DaCe parser into pure Python code at runtime
  • Caches for various SDFGs
  • Overrides for bespoke operators and symbols to feed the DaCe parser when the meaning is not obvious (NDSL concepts)

All of those subsystems are loosely connected. The orchestrate function triggers a series of events that pull on each of them when needed.

On top of this, new needs are emerging for performance timing, benchmarking, and runtime debugging, which would need to be able to:

  • inject code
  • save data out (in memory or in VRAM)
  • save code or memory info out
  • produce reusable executables on the fly

The above laundry list of hell should convince the reader that we are well down the road of spaghetti design. It is time to clean up the pile of tech debt.

We need a clean way to capture runtime code, independent of orchestration, that will serve as a platform to:

  • do code & system injection for code-less operations: debug, timing
  • direct parsers, including the DaCe parser, and command their swift execution (caches)
  • and in general allow advanced capabilities around the transpiler

Like gt4py.stencil, we need an ndsl.runtime or equivalent to refactor all of the above.
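As a starting point for discussion, here is a rough sketch of what a single capture point could look like. Everything in it is an assumption: `Runtime`, `capture`, `pre_hooks`, `post_hooks` and `timings` are hypothetical names, not an existing or proposed NDSL API. It only shows how timing and debug hooks could be injected without touching the captured code.

```python
import functools
import time
from dataclasses import dataclass, field


@dataclass
class Runtime:
    """Hypothetical registry of hooks applied around every captured callable."""

    pre_hooks: list = field(default_factory=list)   # called as hook(name, args, kwargs)
    post_hooks: list = field(default_factory=list)  # called as hook(name, result)
    timings: dict = field(default_factory=dict)     # name -> list of elapsed seconds

    def capture(self, func):
        """Wrap func so hooks and timing run on each call."""
        name = func.__name__

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for hook in self.pre_hooks:
                hook(name, args, kwargs)
            start = time.perf_counter()
            result = func(*args, **kwargs)
            self.timings.setdefault(name, []).append(time.perf_counter() - start)
            for hook in self.post_hooks:
                hook(name, result)
            return result

        return wrapper


runtime = Runtime()
seen = []
# Debug injection: record every result without editing the captured function.
runtime.post_hooks.append(lambda name, result: seen.append((name, result)))


@runtime.capture
def add_one(x):
    return x + 1


value = add_one(1)
```

A real design would also need to decide where parsing, caching and the distributed build plug in; the sketch leaves those out on purpose.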

Labels: Enhancement (New feature or request), Refactor (Technical internal work to improve systems)