Arm-Examples/executorch-grid-sample-export

Grid Sampler Export Bundle

This bundle contains the code needed to export an ExecuTorch edge program and build a scenario for models that use grid_sample.

Requirements

  • Tested with Python 3.10 and 3.12 on Linux and Windows.
  • Your own model file. The shipped sample_model.py works out of the box with the included sample_inputs/, and you can adapt it to your own model or copy the relevant snippets into an existing model script.
  • A Python environment with the package versions from requirements.txt.
    • Note that a development version of executorch is needed and several monkey patches are applied (see arm_backend_monkey_patch.py).
  • model-converter and flatc on PATH. In a typical Python environment they are installed under the environment's site-packages/model_converter/binaries/bin and site-packages/bin directories.
  • scenario-runner on PATH
    • Installing requirements.txt also installs the tested pip package that provides this CLI; that package was validated against the bundled scenarios in this release.
    • More advanced scenarios, for example those relying on additional tensor/image formats beyond the bundled examples, may still require a newer custom build with broader format support.
    • For custom builds, the CMake flag SCENARIO_RUNNER_EXPERIMENTAL_IMAGE_FORMAT_SUPPORT=ON may still be required, depending on the image formats used by your scenario.
  • If the target device does not have native support for the Arm Data Graph extension, the runtime also needs the appropriate emulation layers enabled. See https://github.com/arm/ai-ml-emulation-layer-for-vulkan.

Typical setup

Bash example:

python3 -m venv .venv
. .venv/bin/activate
python -m pip install -r requirements.txt

SITE_PACKAGES="$(python -c 'import site; print(site.getsitepackages()[0])')"
export PATH="$SITE_PACKAGES/model_converter/binaries/bin:$SITE_PACKAGES/bin:$PATH"

python sample_model.py
scenario-runner --scenario artifacts/scenario/scenario.json --output artifacts/scenario --log-level debug

The script resolves its sample inputs and artifact output relative to its own location, so it can be run from any current working directory. It exports and builds artifacts/scenario/scenario.json, then prints a sample scenario-runner command line you can run manually. The bundled example applies a 3x3 blur, rotates with grid_sample, then applies a 3x3 sharpen filter. It uses INT8 image IO and INT16 symmetric grid coordinates.
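The rotation step works by building a grid_sample-style sampling grid. As a rough illustration of the coordinate convention involved (pure Python, not the bundle's actual code; the function name and layout here are hypothetical), a rotation grid in the normalized [-1, 1] space can be built like this:

```python
import math

def rotation_grid(h, w, theta_rad):
    """Build an (h, w) grid of (x, y) pairs that rotates the image by theta.

    Coordinates follow the grid_sample convention: x and y are normalized
    to [-1, 1], and each output pixel stores the source location to sample.
    """
    grid = []
    cos_t, sin_t = math.cos(theta_rad), math.sin(theta_rad)
    for row in range(h):
        y = -1.0 + 2.0 * row / (h - 1)       # normalized output y
        grid_row = []
        for col in range(w):
            x = -1.0 + 2.0 * col / (w - 1)   # normalized output x
            # inverse-rotate the output coordinate to find the source point
            src_x = cos_t * x + sin_t * y
            src_y = -sin_t * x + cos_t * y
            grid_row.append((src_x, src_y))
        grid.append(grid_row)
    return grid

# Identity rotation leaves every coordinate unchanged.
g = rotation_grid(3, 3, 0.0)
assert g[0][0] == (-1.0, -1.0) and g[2][2] == (1.0, 1.0)
```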

Profiling output

scenario-runner can dump per-dispatch timing data to JSON:

scenario-runner \
  --scenario artifacts/scenario/scenario.json \
  --output artifacts/scenario \
  --profiling-dump-path artifacts/scenario/profiling.json \
  --log-level debug

The resulting profiling.json contains a Timestamps array and, when Data Graph pipelines are present, a Memory Usage array.

  • Timestamps contains one entry per profiled dispatch with Command type, cycle counts before and after the dispatch, Cycle count for command, Timestamp Period, Time for command [ms], and Iteration.
  • Time for command [ms] is already converted from the cycle delta using the reported Vulkan timestamp period, so that is usually the easiest field to read directly.
  • Memory Usage contains one entry per Data Graph pipeline session with Session memory [bytes].
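As a sanity check, the per-command ms value can be recomputed from the raw fields. In Vulkan, the timestamp period is the number of nanoseconds per timestamp tick, so (with hypothetical values):

```python
def cycles_to_ms(cycle_delta, timestamp_period_ns):
    # Vulkan's timestampPeriod is nanoseconds per tick;
    # multiply, then convert ns -> ms.
    return cycle_delta * timestamp_period_ns / 1e6

# e.g. 2,000,000 ticks at 1 ns per tick is 2.0 ms
assert cycles_to_ms(2_000_000, 1.0) == 2.0
```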

The most important point is that the Timestamps entries are appended in the same execution order in which scenario-runner registers dispatches from scenario.json. Read the profiling dump as an execution-ordered dispatch trace, with a few important details:

  • DispatchCompute and DispatchSpirvGraph each produce one timestamp entry.
  • DispatchDataGraph currently produces one timestamp entry in the scenarios generated by this bundle. Internally the runner can expand a data-graph command into one entry per VGF segment, so future multi-segment VGF files may produce multiple consecutive entries instead.
  • DispatchBarrier and MarkBoundary commands do not produce timestamp entries.

So the profiling dump follows dispatch order, not raw top-level command count. If you want to line entries up with scenario.json, count only dispatch commands. For the scenarios generated by this bundle, each DispatchDataGraph maps to one profiling row.

One current quirk of the runner implementation is that Timestamps[].Iteration is written as one-based, while Memory Usage[].Iteration is written as zero-based.
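Putting this together, a short script can walk the dump as an execution-ordered trace. The field names follow the description above; the sample data here is made up for illustration and would normally be loaded from artifacts/scenario/profiling.json:

```python
import json

# Hypothetical stand-in for json.load(open("artifacts/scenario/profiling.json"))
profiling = {
    "Timestamps": [
        {"Command type": "DispatchDataGraph", "Time for command [ms]": 1.25, "Iteration": 1},
        {"Command type": "DispatchCompute", "Time for command [ms]": 0.40, "Iteration": 1},
    ]
}

def dispatch_trace(dump):
    """Yield (index, command type, ms) for each profiled dispatch, in
    execution order; barriers and boundary marks never appear here."""
    for i, entry in enumerate(dump["Timestamps"]):
        yield i, entry["Command type"], entry["Time for command [ms]"]

total_ms = sum(ms for _, _, ms in dispatch_trace(profiling))
print(f"total GPU time: {total_ms:.2f} ms")
```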

Practical-RIFE example

The bundle also includes a Practical-RIFE integration under practical_rife_model.py. It vendors the small patched runtime subset of Practical-RIFE needed by the export flow, but it does not include the upstream trained weights.

To run it:

  1. Download the upstream Practical-RIFE 4.25 weights as either flownet.pkl or rife-4.25.zip.
  2. Install them into the bundle:
python install_practical_rife_weights.py --weights-file /path/to/flownet.pkl

or

python install_practical_rife_weights.py --weights-archive /path/to/rife-4.25.zip
  3. Export the bundled Practical-RIFE example:
python practical_rife_model.py --mode quantized

By default, practical_rife_model.py uses the bundled Practical-RIFE/demo/I0_0.png and Practical-RIFE/demo/I0_1.png inputs and writes artifacts under artifacts/model_practical_rife/. You can override the inputs or installed weights with --input0, --input1, --timestep-path, and --weights-path.

The Practical-RIFE path relies on the advanced exporter controls documented below: INT16 grid-position quantization, grid_sampler_grid_input_qspec, and named module_name_quant_configs overrides for the internal grid-building submodules.

What changed relative to upstream Practical-RIFE

The bundled Practical-RIFE code is not a verbatim copy of upstream. The export flow needs a few targeted changes so the grid-building path can be exported and quantized cleanly:

  • In model/warplayer.py, the grid-building path is split into a small set of named modules so the exporter can apply per-module INT16 grid-domain overrides while keeping the grid packing close to the upstream cat(...) structure.
  • In train_log/IFNet_HDv3.py, the grid-position math is kept behind one named child module and the mixed-range cat(...) inputs are fed through concat-local copies before the concat.

These changes are needed because the export path applies the special INT16 grid-position qparams through named per-module overrides. Upstream Practical-RIFE does not expose the grid-building stages in a way that lets the exporter place those qparams precisely. The concat-local copies are also important because PT2E cat(...) tends to force all inputs into one shared activation domain, which would otherwise pollute the image and feature branches with poorer qparams and hurt the exported result.

Advanced Model Pattern

Use the simple bundled example as the starting point for models where the sampling grid already appears as an explicit model input. For more complex models, export_model(...) exposes a few controls that matter:

  • quantized_input_qspecs and quantized_output_qspecs are positional. Provide one entry per graph input or output when PT2E quantization is enabled, and use None for positions that should stay unquantized at the graph boundary.
  • Grid tensors should usually stay on INT16 symmetric SNORM (scale=1.0 / 32767.0, zero point 0) rather than the normal INT8 image domain.
  • Prefer INT16 for grid_sample position tensors. INT8 grid positions lose too much coordinate precision in practice and usually cause visibly worse output quality.
  • Use grid_sampler_grid_input_qspec=... when the sampling grid reaches grid_sample as a graph input or other explicit boundary tensor.
  • Use module_name_quant_configs={...} when the model builds the sampling grid internally and you need specific submodules to produce or preserve the INT16 grid domain before grid_sample.

In practice, many real models need both patterns: positional IO qspecs for graph inputs and outputs, plus module_name_quant_configs to keep internal grid-building math in the intended quantized domain.
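To see why INT16 matters for grid positions, compare the coordinate resolution of the two symmetric schemes. A back-of-the-envelope sketch, using the scales from the list above:

```python
def quantize_symmetric(x, scale, qmin, qmax):
    """Round-to-nearest symmetric quantization with zero point 0."""
    q = round(x / scale)
    return max(qmin, min(qmax, q))

def dequantize(q, scale):
    return q * scale

coord = 0.123456  # a normalized grid coordinate in [-1, 1]

# INT16 symmetric SNORM: scale = 1/32767
s16 = 1.0 / 32767.0
err16 = abs(dequantize(quantize_symmetric(coord, s16, -32767, 32767), s16) - coord)

# INT8 symmetric: scale = 1/127
s8 = 1.0 / 127.0
err8 = abs(dequantize(quantize_symmetric(coord, s8, -127, 127), s8) - coord)

print(f"INT16 error: {err16:.2e}, INT8 error: {err8:.2e}")
# The INT8 step is roughly 256x coarser than the INT16 step; on a
# 1024-wide image that is a pixel or two of worst-case coordinate
# error, which is why grid tensors stay on INT16.
```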

Modeling note

When building coordinate tensors for grid_sample, prefer torch.linspace(...) over torch.arange(...). In practice, arange often introduces int64 ops into the exported graph, and those do not lower cleanly through TOSA.
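The difference is easy to see in plain Python: generating the coordinates linspace-style stays in float arithmetic end to end, while the arange-style route builds an integer sequence first (which in torch defaults to int64) and normalizes afterwards. A minimal sketch of the two equivalent computations:

```python
def linspace_coords(n):
    # linspace-style: float arithmetic throughout, no integer tensor.
    return [-1.0 + 2.0 * i / (n - 1) for i in range(n)]

def arange_coords(n):
    # arange-style: an integer sequence first (int64 in torch),
    # then a cast and normalization; same values, extra int64 ops
    # that do not lower cleanly through TOSA.
    idx = list(range(n))  # integer intermediate
    return [2.0 * float(i) / (n - 1) - 1.0 for i in idx]

assert linspace_coords(5) == arange_coords(5) == [-1.0, -0.5, 0.0, 0.5, 1.0]
```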

How It Works

  • export_executorch.py captures the model, applies the Arm/ExecuTorch export flow, and partitions the graph into regular NN segments and grid_sample segments.
  • Regular NN segments are lowered through the Arm VGF path and emitted as section_*.vgf binaries.
  • grid_sample segments are lowered by the custom backend into compute-shader work rather than VGF sections.
  • export_scenario.py turns those lowered pieces into one scenario.json that references all VGF sections, shader work, tensors, and intermediate resources together.
  • sample_model.py supplies the sample inputs, runs export, and leaves everything under artifacts/ so the scenario can be executed later with scenario-runner.
