Skip to content

Transition AIETarget to use internal aie2xclbin function #459

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jun 28, 2024

Conversation

nirvedhmeshram
Copy link
Contributor

@nirvedhmeshram nirvedhmeshram commented Jun 25, 2024

Since #420 added a internal utility to go from AIE to XCLBIN it makes sense for all flows to use that.
This utility doesn't add any AIE lowering passes so the TranslationPassPipelines are responsible for that. Hence the passes needed for the IR lowered through the AIR flow to be further lowered to AIE are added in its lowering pipeline.

There were some minor additions needed to the utility to get it working e2e for the AIE flow.

We no longer need to have a separate aie2xclbin-disable-threading as we are passing the existing context and the mlir-disable-threading flag gets automatically propagated in this case. And it actually gives an error to try changing it if the passed context is multi-threaded.

With this PR the AIE lowering is becoming a part of the IREE TranslationPassPipeline so the pack-peel lit test that was just passing the air-to-aie conversion had to be watered down to reflect what can actually be lowered to AIE.

Unfortunately, we still also need the mlir-aie wheel as the internal utility is still expecting a mlir-aie install for me_basic.o library here

@makslevental
Copy link
Collaborator

Unfortunately, we still also need the mlir-aie wheel as the internal utility is still expecting a mlir-aie install for me_basic.o library here

The thing is that me_basic.o doesn't really make sense, at least for CI:

static inline void _init() {
  // constructors are called in reverse order of the list
  for (thunk *t = &_ctors_end; t-- != chess_copy(&_ctors_start);)
    (*t)();
}

void _fini() {
  // destructors in forward order
  for (thunk *t = &_dtors_start; t != chess_copy(&_dtors_end); ++t)
    (*t)();
}

So like even though I only vaguely understand what this code accomplishes (constructors? destructors?) I'm pretty it can't possibly apply to our CI executions because that's through Peano. How the linking still succeeds I have no idea either (possibly symbol resolution is deferred to runtime and then since there is no runtime linker on the single cores and the code path is never taken there's no issue 🤷). Regardless, we should adapt LDScript to be more flexible wrt the inclusion of me_basic.o (which we now have the freedom to do...) depending on whether we're using the chess backend or Peano backend. I'm not fluent enough in LD linker scripts to do it blind (without access to an E2E smoketest on my local machine) so while I'm happy to make the change, it'll have to wait until ~Monday.

@nirvedhmeshram
Copy link
Contributor Author

looks like more lowering passes from AIE are needed (I tried a minimal set first) but it fails for large problem sizes.

@nirvedhmeshram nirvedhmeshram marked this pull request as draft June 25, 2024 21:40
@makslevental
Copy link
Collaborator

looks like more lowering passes from AIE are needed (I tried a minimal set first) but it fails for large problem sizes.

<unknown>:0: error: 'aie.tile' op allocated buffers exceeded available memory (99328>65536)

This is (I believe but @jtuyls can authoritatively answer) some fail in objectfifostatefultransform rather than a lack of a pass (indeed the data memory on a tile is 64K and so whoever/whomever/whatever has delegated that much work to some tile is mistaken).

@nirvedhmeshram
Copy link
Contributor Author

nirvedhmeshram commented Jun 26, 2024

looks like more lowering passes from AIE are needed (I tried a minimal set first) but it fails for large problem sizes.

<unknown>:0: error: 'aie.tile' op allocated buffers exceeded available memory (99328>65536)

This is (I believe but @jtuyls can authoritatively answer) some fail in objectfifostatefultransform rather than a lack of a pass (indeed the data memory on a tile is 64K and so whoever/whomever/whatever has delegated that much work to some tile is mistaken).

I think I know whats going on, there is a large size peel-pack test that worked till iree translation but not in the actual AIE lowering, now that the AIE lowering is moving into the iree translation we have problems. I will test this hypothesis by adding one of the failing tests to e2e CI and seeing what happens.

@nirvedhmeshram
Copy link
Contributor Author

Yup, I think this failure is expected when we are moving the translation pipeline to lower through AIE,
https://github.com/nod-ai/iree-amd-aie/actions/runs/9683545497/job/26719580993?pr=463#step:6:4990

@nirvedhmeshram nirvedhmeshram marked this pull request as ready for review June 26, 2024 21:07
@nirvedhmeshram
Copy link
Contributor Author

@newling @makslevental this is ready for review.

Copy link
Contributor

@newling newling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Only the one comment about the other aie2xclbin-* flags, which might also be removable.

At some point we can completely remove large parts (or all) of print_ir_aie2xclbin.sh

@nirvedhmeshram nirvedhmeshram requested a review from newling June 27, 2024 18:47
Copy link
Collaborator

@makslevental makslevental left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not gonna approve since James already asked for changes (not sure how that conflict resolution works...) but basically looks good to me modulo the two small things I commented on.

@nirvedhmeshram
Copy link
Contributor Author

I'm not gonna approve since James already asked for changes (not sure how that conflict resolution works...) but basically looks good to me modulo the two small things I commented on.

I cant make those exactly so let me know if you want me to bring in the pass myself.

@nirvedhmeshram
Copy link
Contributor Author

nirvedhmeshram commented Jun 27, 2024

I made two changes post approval so calling those out. Moved a couple of passes out of aie2xclbin utlity and into the translation pipeline. The reason for this was that in @makslevental 's flow those are not required.
I renamed the top level pipelines to demarcate the mlir-air and mli-aie pass pipelines, this way if the objectfifo work can just directly use the mlir-aie pipeline if it wants to.

@nirvedhmeshram
Copy link
Contributor Author

ouch the pass movment breaks microkernels.

@nirvedhmeshram
Copy link
Contributor Author

ok it was a pass ordering issue and wasn't directly related to micro-kernels.

@nirvedhmeshram nirvedhmeshram merged commit 8bfa5f9 into main Jun 28, 2024
2 checks passed
@nirvedhmeshram nirvedhmeshram deleted the nm_remove_wheel_dep branch June 28, 2024 00:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants