All resources (slides, code, etc.) for the All Things Open 2025 session *TinyML Meets PyTorch: Deploying AI at the Edge with Python Using ExecuTorch and TorchScript*.
Please see the session presentation and recording for details.
This session discusses and demonstrates how you can take existing, pre-built AI/ML/LLM models that were meant to run on GPU-backed servers and run them for inference on an Edge or IoT device, without building a snowflake MLOps or training pipeline. This is achieved by quantizing the pre-built models and exporting them with either ExecuTorch or TorchScript.
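The quantization half of that pipeline can be as small as a single post-training step on a pre-built model. Here is a minimal sketch using post-training dynamic quantization, with a torchvision model standing in for whatever pre-built model you start from (the session demos may use different models and recipes):

```python
import torch
import torchvision

# Load a pre-built, pre-trained model; no training pipeline required.
model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()

# Post-training dynamic quantization: weights are converted to int8 up front,
# activations are quantized on the fly at inference time. It rewrites Linear
# (and LSTM) modules and needs no calibration data.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# CPU-only inference with the smaller, quantized model.
with torch.inference_mode():
    logits = quantized(torch.randn(1, 3, 224, 224))
```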
There are two methods discussed in this session:
- **ExecuTorch** - exports to the ExecuTorch Runtime, which is best suited for Edge devices running a "modern" operating system (Ubuntu, Android, iOS, etc.).
- **TorchScript** - exports to a native C++ runtime, which is best suited for Edge and IoT devices (Raspberry Pi, ESP32, Arduino, etc.).
Each of the two demos shows, by example, how to quantize a model so that inference runs on CPU-only resources on its respective device.
To quantize and export using ExecuTorch, see [`./demos/executorch/README.md`](./demos/executorch/README.md).
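For orientation, the core of that export path looks roughly like the sketch below; the demo README has the full recipe, including the quantization step. This is a sketch assuming current ExecuTorch APIs (`torch.export.export`, `executorch.exir.to_edge`), with a toy model standing in for a quantized, pre-built one:

```python
import torch
from torch.export import export
from executorch.exir import to_edge

# Toy stand-in for a quantized, pre-built model.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2)
).eval()
example_inputs = (torch.randn(1, 16),)

exported_program = export(model, example_inputs)  # capture a whole-graph ExportedProgram
edge_program = to_edge(exported_program)          # lower to the Edge dialect
et_program = edge_program.to_executorch()         # lower to the ExecuTorch runtime format

# Write the .pte file that the on-device ExecuTorch runtime loads.
with open("model.pte", "wb") as f:
    f.write(et_program.buffer)
```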
To quantize and export using TorchScript, see [`./demos/torchscript/README.md`](./demos/torchscript/README.md).
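Similarly, the TorchScript path boils down to quantizing, capturing, and saving a serialized module that the native C++ runtime (LibTorch) can load. A minimal sketch, again with a toy model in place of the demo's actual model:

```python
import torch

# Toy stand-in for a pre-built model.
model = torch.nn.Sequential(
    torch.nn.Linear(16, 8), torch.nn.ReLU(), torch.nn.Linear(8, 2)
).eval()

# Post-training dynamic quantization for CPU-only inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

# Capture to TorchScript. trace() records one execution with a representative
# input; use torch.jit.script() instead if the model has data-dependent control flow.
scripted = torch.jit.trace(quantized, torch.randn(1, 16))

# Save a self-contained artifact; C++ loads it with torch::jit::load("model.pt").
scripted.save("model.pt")
```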