[Bug] Dataflow programming passes the simulator test but fails the hw_emu test #572
Description
Describe the bug
I am implementing a systolic-array-based FlashAttention using dataflow programming in Allo. I have already implemented the following computations, all in a single systolic array:
1. Score computation (Matrix Multiplication):
$$S = A B^T$$
2. Row-wise maximum for numerical stability:
$$m_i = \max_j (S_{ij})$$
3. Scaled exponential (with max-shifting):
$$P_{ij} = \exp\left((S_{ij} - m_i) \cdot D\right)$$
4. Row-wise sum (Denominator):
These computations pass both the simulator and hw_emu tests.
However, after I added the following logic, the systolic array passes the simulator test but fails the hw_emu test: the output matrix O is all zeros.
5. Final Output computation:
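As a software reference for the five steps, a minimal NumPy sketch is given here. It is not the Allo kernel. The formulas for steps 4 and 5 are assumptions (the standard softmax normalization), the value matrix `V` and the function name `reference_attention` are hypothetical, and `D` is treated as a scalar scale factor.

```python
import numpy as np

def reference_attention(A, B, V, D):
    """Hypothetical NumPy reference for steps 1-5 (not the Allo kernel)."""
    S = A @ B.T                         # 1. score computation
    m = S.max(axis=1, keepdims=True)    # 2. row-wise maximum
    P = np.exp((S - m) * D)             # 3. scaled exponential with max-shifting
    l = P.sum(axis=1, keepdims=True)    # 4. row-wise sum (assumed: sum over j of P_ij)
    O = (P @ V) / l                     # 5. final output (assumed: normalized P times V)
    return O
```

Comparing the hw_emu output against such a reference with `np.allclose`, and checking `np.any(O != 0)`, separates an all-zero result from a numerical mismatch.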
Reproduction
My test code is https://github.com/RuizeYu05/FlashAttention-allo/blob/main/systolic/test_systolic_flash_debug.py
(To run the software simulator test, I commented out allo.exp in the code above. The full implementation is at https://github.com/RuizeYu05/FlashAttention-allo/blob/main/systolic/test_weight_sta_exp.py)
Buggy output
The following page shows that the software simulator test passes:

The following page shows the output of the hw_emu test:

Expected behavior
Since this systolic array passes the software simulator test, it should also pass the hw_emu test.
Additional context
No response