Tasks for `llama_functions.py`, targeting the `wip/llm_intgrate` branch:
Make `llama_functions.py` testable with `pytest ./llama_functions.py`. We need independent verification of the backend features.
Create an "inlined" version of each `gen_*` function; see patch_inlined_matmul.patch for reference. Then convert the existing functions to call the inlined versions to populate the function bodies.
Check the matmul implementation:
StringAttr.get("parallel") for _ in range(dims - 1) # batch + M + N
^^^^^^^^ --- should this be `dims` (no minus 1)?
Set up a benchmark for `attention()`, then experiment with pass fuzzing. Which passes yield more performance?
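For the matmul check above, one way to reason about the count: a batched matmul `C[b, m, n] += A[b, m, k] * B[b, k, n]` iterates over four loops, of which the batch, M, and N loops are parallel and the K loop is a reduction. Whether `dims - 1` is correct therefore depends on what `dims` counts. The sketch below assumes `dims` is the total number of loops; `expected_iterator_types` is a hypothetical helper that returns plain strings rather than `StringAttr` values, and it is written as pytest-style tests so something like it could live in the same file.

```python
def expected_iterator_types(num_loops: int) -> list[str]:
    """Iterator types for a (batched) matmul with `num_loops` loops.

    Assumption: the loops are ordered (batch..., M, N, K), so every loop
    except the last one (K) is parallel and K is the single reduction.
    """
    if num_loops < 3:
        raise ValueError("a matmul needs at least the M, N and K loops")
    return ["parallel"] * (num_loops - 1) + ["reduction"]


def test_plain_matmul():
    # M and N are parallel; K is the reduction.
    assert expected_iterator_types(3) == ["parallel", "parallel", "reduction"]


def test_batch_matmul():
    # batch, M and N are parallel; K is the reduction.
    types = expected_iterator_types(4)
    assert types.count("parallel") == 3
    assert types[-1] == "reduction"
```

If `dims` in `llama_functions.py` is instead the rank of the output (batch + M + N = 3), the parallel count would be `dims` rather than `dims - 1`.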
Forward-looking tasks:
Can we use `linalg.matmul` or even `linalg.batch_matmul`? The batch version can do broadcast and transpose!
The `linalg` dialect supports `tensor`. Can we use `tensor` instead of `memref`? We need to figure out bufferization. How does the performance of the `memref`-based version compare with the `tensor`-based one?
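For the performance questions (the `attention()` benchmark and `memref` vs `tensor`), a minimal timing harness might look like the sketch below. It uses only the Python standard library; `time_callable` and `compare` are hypothetical names, and the two lambdas are placeholder workloads standing in for the compiled kernels, not anything from `llama_functions.py`.

```python
import statistics
import time
from typing import Callable


def time_callable(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time in seconds over `repeats` runs."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(variants: dict[str, Callable[[], object]], repeats: int = 5) -> dict[str, float]:
    """Time each named variant and return {name: median seconds}."""
    return {name: time_callable(fn, repeats) for name, fn in variants.items()}


if __name__ == "__main__":
    # Placeholder workloads; swap in the memref-based and tensor-based kernels.
    results = compare({
        "memref": lambda: sum(i * i for i in range(10_000)),
        "tensor": lambda: sum(i * i for i in range(10_000)),
    })
    for name, seconds in results.items():
        print(f"{name}: {seconds * 1e6:.1f} us")
```

Taking the median over several runs reduces the effect of one-off outliers such as JIT compilation or cold caches on the first call.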