If I switch my model from running transformer blocks in a for loop to running them with `remat_scan`, I take a pretty big performance hit: 36%. See this Colab, particularly when I call `run_transformer_noscan_j` and `run_transformer_withscan_j`. Is this to be expected? In my actual model, decoding is also ~3x slower, though I can't replicate that in the Colab; it might have to do with my having written a traceable version for the real model.

So, is this a bug? How much of a penalty should I expect from using `(remat_)scan`? What's the best way to get a decent balance between compile time and run time? The scanless version of my model takes ~12 minutes to JIT on a single GPU, and long enough on 8 GPUs that I've always given up before it finished, so I need to do something.
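For anyone landing here, a minimal sketch of the trade-off I mean, written in pure JAX (`jax.lax.scan` + `jax.checkpoint`, which is the mechanism underlying Flax's `remat_scan`). The `block`, `run_noscan`, and `run_withscan` names are illustrative, not the functions from my Colab:

```python
import jax
import jax.numpy as jnp

# A toy stand-in for a transformer block: one weight matrix per layer.
def block(x, w):
    return jnp.tanh(x @ w)

def init_params(key, num_layers, dim):
    keys = jax.random.split(key, num_layers)
    # Stack per-layer weights along a leading "layer" axis so scan can consume them.
    return jnp.stack([jax.random.normal(k, (dim, dim)) / jnp.sqrt(dim) for k in keys])

# Variant 1: unrolled Python for-loop. Fast at runtime, but XLA sees every
# layer inlined, so compile time grows with depth.
@jax.jit
def run_noscan(x, stacked_params):
    for w in stacked_params:
        x = block(x, w)
    return x

# Variant 2: lax.scan over rematerialized blocks. Compile time is roughly
# constant in depth, but checkpointing recomputes activations in the backward
# pass, and scan can block some cross-layer XLA optimizations.
@jax.jit
def run_withscan(x, stacked_params):
    def body(carry, w):
        return jax.checkpoint(block)(carry, w), None
    x, _ = jax.lax.scan(body, x, stacked_params)
    return x
```

Both variants compute the same function (scan walks the leading layer axis in the same order as the loop), so any runtime gap between them is purely the cost of scan plus rematerialization.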