Welcome to the Hexagon-MLIR tutorials! These hands-on examples will guide you through the process of writing, compiling, and executing Triton kernels and PyTorch models on Qualcomm Hexagon NPUs.
📖 Start with PyTorch Tutorials
These tutorials demonstrate how to leverage Qualcomm's Hexagon NPU targets for AI workloads. You'll discover how to:
- Write Triton Kernels: Create kernels that run efficiently on Qualcomm Hexagon NPUs
- Understand the Compilation Pipeline: Follow your code from Python through multiple IR transformations to optimized machine code
- Optimize Performance: Leverage specific features like multi-threading, vector processing, and memory hierarchy optimization
- Debug and Profile: Use built-in tools to analyze and improve your kernel performance
- Use the PyTorch Flow: Compile and execute existing PyTorch models through our flow
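To give a taste of what the kernel tutorials cover, here is a minimal vector-add kernel in standard Triton style. The kernel itself is backend-agnostic; actually running it on a Hexagon NPU assumes the Hexagon-MLIR backend and a device or simulator are set up per the Installation Guide, and the block size shown is an illustrative choice, not a tuned value.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Mask guards the tail when n_elements is not a multiple of BLOCK_SIZE.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    # One program per block of 1024 elements (illustrative block size).
    grid = (triton.cdiv(n_elements, 1024),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```

The tutorials walk through how a kernel like this is lowered from Python through Triton IR and MLIR dialects down to Hexagon machine code.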
Before diving into the tutorials, make sure you have:
- ✅ Hexagon-MLIR framework installed (Installation Guide)
- ✅ Python environment with required dependencies
- ✅ Access to Hexagon hardware or simulator
- ✅ Basic understanding of Python and tensor operations