llama_functions.py backend tasks #41

Tasks for `llama_functions.py`, targeting the `wip/llm_intgrate` branch:

- [ ] Make `llama_functions.py` testable via `pytest ./llama_functions.py`. We need independent verification of the backend features.
- [x] Create an "inlined" version of each `gen_*` function; see [patch_inlined_matmul.patch](https://github.com/user-attachments/files/22395630/patch_inlined_matmul.patch) for reference. Convert the existing functions to call the inlined versions to populate their bodies.
- [ ] Check the matmul implementation, specifically the number of `parallel` iterator types:
    ```
    StringAttr.get("parallel") for _ in range(dims - 1)  # batch + M + N
                                              ^^^^^^^^ --- should this be `dims` (no minus 1)?
    ```
- [x] Set up a benchmark for `attention()`, then experiment with pass fuzzing. Which passes improve performance?
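
For the matmul check above, a minimal sketch of the expected iterator-type count may help frame the question. It assumes that `dims` is the rank of the operands (leading batch dimensions followed by the two matrix dimensions), which this issue does not confirm. The helper `matmul_iterator_types` is hypothetical and not part of `llama_functions.py`. Under that assumption, the iteration space has `dims - 2` batch loops plus M and N as parallel loops, and one K reduction loop, so the parallel count would be `dims` rather than `dims - 1`.

```python
def matmul_iterator_types(rank: int) -> list[str]:
    """Iterator types for a batched matmul over operands of the given rank.

    ASSUMPTION: `rank` counts all operand dimensions, i.e. (rank - 2) batch
    dimensions followed by the two matrix dimensions. This helper is purely
    illustrative and is not taken from llama_functions.py.
    """
    if rank < 2:
        raise ValueError("a matmul operand needs at least 2 dimensions")
    batch_dims = rank - 2
    # Batch, M, and N loops are independent of each other: "parallel".
    parallel = ["parallel"] * (batch_dims + 2)
    # The K loop accumulates into the output: "reduction".
    return parallel + ["reduction"]


if __name__ == "__main__":
    for rank in (2, 3, 4):
        types = matmul_iterator_types(rank)
        print(rank, types.count("parallel"), types.count("reduction"))
```

If `dims` instead already excludes one dimension (for example, if it counts only output loops minus one), the existing `dims - 1` could still be correct, so the definition of `dims` at the call site is the thing to verify.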

*Future-looking tasks*:
- [ ] Can we use `linalg.matmul` or even `linalg.batch_matmul`? The batch version can do broadcast and transpose!
- [ ] The `linalg` dialect supports `tensor`. Can we use `tensor` instead of `memref`? We need to figure out bufferization. How does `memref`-based performance compare with `tensor`-based performance?

