Setup Streaming version of model for Hardware deployment #9

@bglid

Description

Summary

Based on previous deployment attempts on the ESP32, I'm going to try mimicking the streaming conversion from the original GTCRN repo to see how it affects latency and Tensor Arena allocation on hardware.
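The original repo's conversion code isn't reproduced here. As a minimal sketch of the general idea, assuming a causal convolution over time and a single value per frame (real layers carry channels and frequency bins), the time-axis zero padding of an offline conv can be replaced by a cache holding the last `(kernel_size - 1) * dilation` input frames. `causal_conv1d_offline` and `StreamingCausalConv1d` are hypothetical names used only for illustration.

```python
from typing import List, Sequence


def causal_conv1d_offline(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """Offline causal conv over time: left-pad the whole sequence with (k-1)*d zeros."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = [0.0] * pad + list(x)
    # Output frame t only uses input frames t-pad .. t (no future frames).
    return [sum(w[j] * xp[t + j * dilation] for j in range(k)) for t in range(len(x))]


class StreamingCausalConv1d:
    """Hypothetical frame-by-frame version: the zero padding becomes a cache of past inputs."""

    def __init__(self, w: Sequence[float], dilation: int = 1):
        self.w = list(w)
        self.d = dilation
        # Cache starts as the same zeros the offline version pads with.
        self.cache = [0.0] * ((len(self.w) - 1) * dilation)

    def step(self, frame: float) -> float:
        buf = self.cache + [frame]  # (k-1)*d past frames followed by the current frame
        y = sum(self.w[j] * buf[j * self.d] for j in range(len(self.w)))
        self.cache = buf[1:]  # drop the oldest frame so the cache length stays (k-1)*d
        return y


x = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0]
w = [0.5, -1.0, 2.0]
offline = causal_conv1d_offline(x, w, dilation=2)
stream = StreamingCausalConv1d(w, dilation=2)
online = [stream.step(v) for v in x]
# For a causal conv, feeding one frame at a time reproduces the offline output.
assert all(abs(a - b) < 1e-9 for a, b in zip(offline, online))
```

The cache is the only state that has to persist between frames, which is why a streaming export typically trades one large full-sequence activation buffer for several small per-layer buffers.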


UPDATE: I'll need to revisit this at a later date. Because of its padding setup, the current model performs very poorly when converted to streaming. I know what the issue is: the padding set in the GTConv decoder block. The best path forward looks like retraining with a modified model architecture. To keep testing deployment and benchmarking the current offline quantized model, I'm tabling this for now. As a result, the deployed version of GTCRN-Micro is expected to have fairly poor latency.
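The issue doesn't say exactly how the GTConv decoder padding is configured, so the sketch below only illustrates one common way padding breaks streaming, assuming the time axis is padded on both sides ("same" padding). `padded_conv1d` and `streaming_conv1d` are hypothetical helpers on a 1D signal with one value per frame.

```python
from typing import List, Sequence


def padded_conv1d(x: Sequence[float], w: Sequence[float],
                  pad_left: int, pad_right: int, dilation: int = 1) -> List[float]:
    """Offline conv over time with explicit left/right zero padding."""
    k = len(w)
    # Keep output length equal to input length.
    assert pad_left + pad_right == (k - 1) * dilation
    xp = [0.0] * pad_left + list(x) + [0.0] * pad_right
    return [sum(w[j] * xp[t + j * dilation] for j in range(k)) for t in range(len(x))]


def streaming_conv1d(x: Sequence[float], w: Sequence[float], dilation: int = 1) -> List[float]:
    """Frame-by-frame conv that only sees a cache of past frames (no lookahead)."""
    cache = [0.0] * ((len(w) - 1) * dilation)
    out = []
    for frame in x:
        buf = cache + [frame]
        out.append(sum(w[j] * buf[j * dilation] for j in range(len(w))))
        cache = buf[1:]
    return out


x = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0, -2.0]
w = [0.5, -1.0, 2.0]

causal = padded_conv1d(x, w, pad_left=2, pad_right=0)
symmetric = padded_conv1d(x, w, pad_left=1, pad_right=1)  # 1 frame of lookahead
online = streaming_conv1d(x, w)

# Causal padding: streaming output matches the offline output frame for frame.
assert online == causal
# Symmetric padding: streaming output lags the offline output by pad_right = 1 frame.
assert online[1:] == symmetric[:-1]
assert online != symmetric
```

With `pad_right` frames of right padding, a streaming layer can only reproduce the offline output by waiting `pad_right` frames, and stacking such layers adds up the delay. Treating the padding as if it were causal instead shifts that layer's features by `pad_right` frames relative to what the trained weights expect, which would degrade quality. Putting all time padding on the left avoids both problems but changes what the model sees during training, hence the need to retrain.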


To-dos

  • Create a converter script for the conv ops (most likely including the TCN ones)
  • Run inference with streaming version
  • Quantize the streaming variant of the model
  • Do a sample deployment to hardware (either ESP32-S3 or STM32H7) and report streaming latency results
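As a rough first step for the converter script and the Tensor Arena question, each conv layer's time-axis context can be listed to estimate how much state the streaming version would carry between frames. This sketch assumes each cache is shaped `(channels, freq_bins, (kernel_t - 1) * dilation_t)` and stored as int8. `ConvSpec`, `plan_stream_caches`, and the layer shapes below are hypothetical placeholders, not GTCRN-Micro's actual configuration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ConvSpec:
    """Hypothetical description of one conv layer's input and time-axis kernel."""
    name: str
    in_channels: int
    freq_bins: int       # input width along the frequency axis
    kernel_t: int        # kernel size along time
    dilation_t: int = 1  # dilation along time


def cache_frames(spec: ConvSpec) -> int:
    # Past frames the streaming layer must keep in place of time padding.
    return (spec.kernel_t - 1) * spec.dilation_t


def plan_stream_caches(specs: List[ConvSpec], bytes_per_elem: int = 1) -> Dict[str, int]:
    """Cache size in bytes per layer (bytes_per_elem=1 assumes int8 caches)."""
    plan = {}
    for s in specs:
        frames = cache_frames(s)
        if frames == 0:
            continue  # kernel_t == 1: no temporal context, nothing to cache
        plan[s.name] = s.in_channels * s.freq_bins * frames * bytes_per_elem
    return plan


# Placeholder layers, NOT the actual GTCRN-Micro configuration.
layers = [
    ConvSpec("pointwise_conv", in_channels=16, freq_bins=32, kernel_t=1),
    ConvSpec("tcn_d1", in_channels=16, freq_bins=32, kernel_t=3, dilation_t=1),
    ConvSpec("tcn_d2", in_channels=16, freq_bins=32, kernel_t=3, dilation_t=2),
    ConvSpec("tcn_d5", in_channels=16, freq_bins=32, kernel_t=3, dilation_t=5),
]
plan = plan_stream_caches(layers)
total = sum(plan.values())  # 1024 + 2048 + 5120 = 8192 bytes
```

If the caches are exposed as extra model inputs and outputs, they would be allocated in the Tensor Arena alongside the other tensors, so this total is only the conv-cache part of the extra memory; recurrent hidden states, if the model keeps any, would come on top.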
