Description
The SFPU perf C++ test in the LLK infra uses the full init procedure for the Unpacker, Math and Packer threads, which is not the most common use case in tt-metal OPs. A more relevant use case is a fusion of OPs where an FPU-based OP leaves its results in the Dest register and the SFPU later reuses them. In this case, usually only a minimal initialization/reinitialization is done to support the SFPU OP. By default, the SFPU execution phase runs on the Math thread, and the basic compute API use case has only Math-thread code for SFPU-related OPs. We should check whether the existing isolation run configs provide timings that are relevant to this default use case.
Example of how we use SFPU in the compute APIs by default
It would also be interesting to analyze whether there is any perf difference when SFPU instructions are issued from the Packer thread instead of the Math thread; this is a perf improvement that has started to become relevant for SDPA and DeepSeek. Isolation cases that initialize and execute SFPU from the Packer instead of the Math thread would therefore also be worth supporting.
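To make the requested coverage concrete, the sketch below enumerates the isolation configurations discussed above as a small test matrix: init mode (the current full init vs. a minimal reinit where Dest already holds FPU results) crossed with the thread that issues SFPU instructions (Math vs. Packer). All names here (`InitMode`, `SfpuIssueThread`, `SfpuPerfConfig`, `sfpu_perf_matrix`, `describe`) are hypothetical and do not exist in the LLK test infra; treating the full cross product as the target set is also an assumption, not something the issue prescribes.

```cpp
#include <initializer_list>
#include <string>
#include <vector>

// Hypothetical labels only; not part of the LLK test infra.
enum class InitMode {
    FullInit,       // current test: full Unpacker, Math and Packer init
    MinimalReinit,  // fused-OP case: Dest already holds FPU results, minimal reinit
};

enum class SfpuIssueThread {
    Math,  // default in the compute API
    Pack,  // alternative that became relevant for SDPA and DeepSeek
};

struct SfpuPerfConfig {
    InitMode init;
    SfpuIssueThread thread;
};

// Builds every combination of init mode and SFPU issuing thread
// (assumed cross product; a real test list may prune some entries).
std::vector<SfpuPerfConfig> sfpu_perf_matrix() {
    std::vector<SfpuPerfConfig> configs;
    for (InitMode init : {InitMode::FullInit, InitMode::MinimalReinit}) {
        for (SfpuIssueThread thread : {SfpuIssueThread::Math, SfpuIssueThread::Pack}) {
            configs.push_back({init, thread});
        }
    }
    return configs;
}

// Short human-readable name for a config, e.g. "minimal-reinit/math".
std::string describe(const SfpuPerfConfig& c) {
    std::string s = (c.init == InitMode::FullInit) ? "full-init" : "minimal-reinit";
    s += (c.thread == SfpuIssueThread::Math) ? "/math" : "/pack";
    return s;
}
```

Under these assumptions, the existing test corresponds to `full-init/math`, the default compute API use case to `minimal-reinit/math`, and the Packer-issued variants to the two `/pack` entries.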