Skip to content

[FEATURE] Get hibernation compiling and working#1542

Merged
duburcqa merged 25 commits intoGenesis-Embodied-AI:mainfrom
gasnica:pr/25/811_hibernation
Aug 15, 2025
Merged

[FEATURE] Get hibernation compiling and working#1542
duburcqa merged 25 commits intoGenesis-Embodied-AI:mainfrom
gasnica:pr/25/811_hibernation

Conversation

@gasnica
Copy link
Contributor

@gasnica gasnica commented Aug 12, 2025

Description

Changes:

  • use_contact_islands now compiles
  • use_hibernation now compiles and runs
  • hibernation and waking up is done per-island, instead of per-entity
    • on hibernation, a persistent island is saved; this is done by linking all entities into a circular-list via a new field in ContactIsland
  • rigid_debug.py::Debug class is added to expose assertf() and validate flag (which enables additional validation code that slows sim down)
  • validation functions are put into rigid_validate.py
  • some rarely used functions got longer names by appending '__description_of_what_they_really_do'

Bug fix:

  • this also fixes an out-of-bounds memory write, when island-building stack is overflown

Note:

  • the hibernation code currently uses the original data_oriented classes, and is not migrated and ready to use ndarrays

Other things:

  • num atomic_adds got reduced in some hibernation/wakeup code
  • some variables were renamed to be consistent with code elsewhere -- we can revert that in this PR if you decide this adds confusion. Some such changes:
    • island -> i_island
    • ContactIsland.n_edge -> *.n_edges
    • ContactIsland.n_island -> *.n_islands
  • a good bit of linting was added, some comments
  • RigidSolver specifies _entities type explicitly, because it should override the more generic type of the base class. Not sure how to better do it.

Related Issue

Motivation and Context

Speedup simulation

How Has This Been / Can This Be Tested?

Used runtime asserts & validation to verify all is working on a test scene of 100+ object, hibernating and unhibernating due to new collisions.

Screenshots (if appropriate):

Screenshot 2025-08-11 at 8 55 05 PM

Checklist:

  • I read the CONTRIBUTING document.
  • I followed the Submitting Code Changes section of CONTRIBUTING document.
  • I tagged the title correctly (including BUG FIX/FEATURE/MISC/BREAKING)
  • I updated the documentation accordingly or no change is needed.
  • I tested my changes and added instructions on how to test it for reviewers.
  • I have added tests to cover my changes.
  • All new and existing tests passed.

) # Island already expected to be marked as not hibernated

is_entity_fixed = island_idx == -1
Debug.assertf(0x7AD00007, not is_entity_fixed) # Fixed entities are excluded from hibernation logic
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using a global variable like Debug might be tricky and dangerous for caching? @hughperkins

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, the Debug class is just used to wrap the assert functionality, and assertf is a @classmethod. And I'm also referencing Debug.validate in code -- which is a class-level property, so effectively a global var

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using a global variable like Debug might be tricky and dangerous for caching?

Still, I think @YilingQiao is getting a point. This approve won't be supported anymore I think. But I should be ok for the new approach I suggest, which evaluates function pointers at compile time?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: "printing": remidner that the mere presence of a print in the code can double the runtime latency. We saw this for the func narrow phase kernels: I commented out some conditional prints, that weren't even being executed, and halved the speed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

*halved the latency

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, so, to be fair, this isn't my codebase, so I shouldn't really be expressing an opinion on this point concerning whether to keep the code in the codebase for now or not. I should be strictly keeping to the remit of providing consultation on the extent to which the proposed changes are compatible with future versions of GsTaichi. So yeah, @gasnica you can ignore my opinion as far as keeping the code in the Genesis codebase or not (and it sounds like Alexis is more favorable than I am to your keeping the code here for the time being :) )

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now, there are only two ti.static(False). Can we delete them and only add them back when we systematically add the Debug mechanism?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yeah, @gasnica you can ignore my opinion as far as keeping the code in the Genesis codebase or not (and it sounds like Alexis is more favorable than I am to your keeping the code here for the time being :) )

First, I'm asking for your opinion. It is not "your" codebase, but it does not make your opinion less valuable. Just less binding. Seconds, I'm favorable to nothing except making life easier for maintainer and developers. I don't know if this debug feature would help actually. I'm just listening to the pros and cons and trying to make sure we are aligned with the standpoint (like running in debug all the unit tests systematically). That is all.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@duburcqa

So, there are two things being debated:

  • use of Debug
  • allowing code that isn't being used in the codebase

You are agreeing with me that I have standing on the first point, the use of Debug in the codebase.

However, in this thread I'm stating a strong opinion against putting things in the codebase that arent being used, and I feel I don't have standing on this point.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, Yiling, removing the ti.static(False) -- I missed those two places when removing everything else.

@duburcqa
Copy link
Collaborator

duburcqa commented Aug 13, 2025

import taichi as ti


class Debug:
    debug = False

    id_registry = set()

    @classmethod
    def assertf(cls, id_: int, cond: Any, msg: Any) -> None:
        assert id_ not in cls.id_registry
        cls.id_registry.add(id_)

        @ti.func
        def _impl_assert():
            if ti.static(cls.debug):
                if not cond:
                    print(f"[{ti.u32(id_):#.8x}] ERROR:", msg)

        return _impl_assert


@ti.kernel
def caller(i: ti.i32, val: ti.template()):
    ti.static(Debug.assertf(0x00000001, i > 0, "i not positive"))()
    for j in range(i):
        ti.static(Debug.assertf(0x00000002, j < 5, "j >= 5"))()
    for I in ti.grouped(ti.ndrange(*val.shape)):
        ti.static(Debug.assertf(0xD0000003, (val[I] > 5).all(), f"val[I] <= 5: {val[I]}"))()


ti.init(debug=True)

Debug.debug = True

val = ti.field(ti.types.vector(3, ti.i32), shape=(2, 4))
val.fill(10)
val[0, 0][0] = 2
caller(10, val)
val[1, 0][0] = 3
caller(10, val)

@hughperkins
Copy link
Collaborator

(untagging myself from this PR, so please re-tag me if you respond to any of my comments)

@gasnica gasnica force-pushed the pr/25/811_hibernation branch from c5ac775 to ed8c68e Compare August 13, 2025 20:17
static_rigid_sim_config: ti.template(),
collider_state: array_class.ColliderState,
):
cd_geom_a = ti.static(collider_state.contact_data.geom_a)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep as it was, i.e. directly using collider_state.contact_data.geom_a
I'm not sure if py dataclass support this kind of alias @hughperkins

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@YilingQiao Not that we are using this pattern a lot in SAP coupler I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we need to fix that for SAP coupler. I did not touch that code otherwise it will create a lot of conflicts with Ziheng's ongoing work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, assigning sub structs to variables isn't supported, though not because I've encountered any reason why it might be hard, simply because the consensus was that we don't need to do this currently. I'm not averse to investigating adding support for it, if there is a need for assigning sub structs to variables. (I dont know if it will turn out to be hard or not, since haven't looked into it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverting this change for now,

and disabling formatting locally to avoid messy auto-formatting.

) # Island already expected to be marked as not hibernated

is_entity_fixed = island_idx == -1
Debug.assertf(0x7AD00007, not is_entity_fixed) # Fixed entities are excluded from hibernation logic
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yeah, @gasnica you can ignore my opinion as far as keeping the code in the Genesis codebase or not (and it sounds like Alexis is more favorable than I am to your keeping the code here for the time being :) )

First, I'm asking for your opinion. It is not "your" codebase, but it does not make your opinion less valuable. Just less binding. Seconds, I'm favorable to nothing except making life easier for maintainer and developers. I don't know if this debug feature would help actually. I'm just listening to the pros and cons and trying to make sure we are aligned with the standpoint (like running in debug all the unit tests systematically). That is all.

@duburcqa duburcqa merged commit 1563bc6 into Genesis-Embodied-AI:main Aug 15, 2025
18 checks passed
winnieyangwannan pushed a commit to winnieyangwannan/Genesis that referenced this pull request Sep 14, 2025
Kashu7100 pushed a commit to Kashu7100/Genesis that referenced this pull request Jan 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants