
[Bug] Fix memory OOM in Genesis push_differentiable example#228

Merged
hughperkins merged 6 commits into main from hp/test-differneiable-push-mem
Oct 7, 2025


@hughperkins
Collaborator

@hughperkins hughperkins commented Sep 25, 2025

Issue: #

Brief Summary

Before Genesis migrated to gstaichi, in Genesis-Embodied-AI/Genesis#1550, push_differentiable used ~22GB of memory. We can observe this by running while true; do { nvidia-smi | grep 575W; sleep 1; } done in parallel, in a second terminal session:

Screenshot 2025-09-25 at 9 20 59 AM

After migrating to gstaichi, this test used ~32GB of memory:

Screenshot 2025-09-25 at 8 18 56 AM

This PR fixes this regression, so this test uses ~22GB of memory again.
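
For anyone who wants to script the memory check rather than watch the terminal, here is a minimal sketch. It is not part of this PR. It swaps the `nvidia-smi | grep 575W` loop for nvidia-smi's `--query-gpu=memory.used` CSV output, and the helper names (`parse_memory_used_mib`, `query_memory_used_mib`, `watch_peak`) are hypothetical.

```python
import subprocess
import time


def parse_memory_used_mib(csv_output: str) -> list[int]:
    """Parse the output of
    `nvidia-smi --query-gpu=memory.used --format=csv,noheader,nounits`.

    Returns one integer (MiB) per non-empty line, i.e. one per GPU.
    """
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def query_memory_used_mib() -> list[int]:
    """Ask nvidia-smi for the memory currently in use on each GPU, in MiB."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_memory_used_mib(result.stdout)


def watch_peak(duration_s: float, interval_s: float = 1.0) -> int:
    """Poll once per interval and return the highest per-GPU usage seen (MiB)."""
    peak = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        peak = max([peak, *query_memory_used_mib()])
        time.sleep(interval_s)
    return peak
```

Calling `watch_peak(...)` in a second process while the example runs should give a single peak figure to compare before and after the change, assuming the example is the only significant user of the GPU.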

The extra memory was being used because we were parsing incoming kernel parameters multiple times. This PR modifies the code so that each kernel parameter is only processed once.
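
To illustrate the general idea of processing each parameter only once, here is a toy sketch. It is not the code changed in this PR: `ParamExpander`, `expand_once`, the `Params` dataclass, and the `name__field` naming scheme are all hypothetical, and the real change lives in the kernel-argument handling of gstaichi.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Params:
    """Hypothetical dataclass passed as a kernel parameter."""
    a: int
    b: int


class ParamExpander:
    """Expands each named parameter at most once and caches the result."""

    def __init__(self):
        self._cache = {}
        self.expand_calls = 0  # counts how often the expensive step actually runs

    def _expand(self, name, value):
        """Stand-in for the expensive step: flatten a dataclass into its fields."""
        self.expand_calls += 1
        if is_dataclass(value) and not isinstance(value, type):
            return {f"{name}__{f.name}": getattr(value, f.name) for f in fields(value)}
        return {name: value}

    def expand_once(self, name, value):
        """Return the cached expansion for `name`, computing it on first use only."""
        if name not in self._cache:
            self._cache[name] = self._expand(name, value)
        return self._cache[name]


expander = ParamExpander()
params = Params(a=1, b=2)
for _ in range(3):
    # Repeated requests for the same parameter reuse the first expansion.
    expanded = expander.expand_once("params", params)
```

In this sketch the cache is keyed by parameter name, so it assumes a given name always refers to the same value for the lifetime of the expander.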


@hughperkins
Collaborator Author

We will likely replace this with something like #229

=> draft for now.

@hughperkins hughperkins marked this pull request as draft September 27, 2025 08:14
@hughperkins
Collaborator Author

Actually, it's not clear to me that non-static ranges should fail, so let's merge this anyway (albeit likely with @yun-long's suggestion incorporated, to make this a bit more principled).

@hughperkins hughperkins marked this pull request as ready for review October 6, 2025 17:39
@yun-long

yun-long commented Oct 7, 2025

LGTM

@hughperkins
Collaborator Author

Thanks!

@hughperkins hughperkins enabled auto-merge (squash) October 7, 2025 16:01
@hughperkins hughperkins merged commit 76829cd into main Oct 7, 2025
47 checks passed
@hughperkins hughperkins deleted the hp/test-differneiable-push-mem branch October 7, 2025 16:49
@hughperkins
Collaborator Author

Whoops, this code has a bug. Not sure how the unit tests passed 🤔 Anyway, I will create a new PR with a unit test that exercises the bug, and a fix for the bug.

@hughperkins
Collaborator Author

(In case anyone's curious what the bug is, there shouldn't be an extend here:

_added_kwargs, _kwargs_new = extend(CallTransformer._expand_Call_dataclass_kwargs([kwarg_node]))

)

@hughperkins
Collaborator Author

#235

YilingQiao pushed a commit to Genesis-Embodied-AI/Genesis that referenced this pull request Oct 10, 2025
ACMLCZH pushed a commit to ACMLCZH/Genesis that referenced this pull request Oct 13, 2025
jmCabrillana pushed a commit to jmCabrillana/Genesis that referenced this pull request Dec 9, 2025
Kashu7100 pushed a commit to Kashu7100/Genesis that referenced this pull request Jan 26, 2026