the q_continue target leading into the max reasoning step should be just the next q_halt, not max(q_halt, q_continue)
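
The indexing in this change can be sketched with plain Python lists. This is only an illustration under stated assumptions, not the code in `hrm_with_act.py`: the helper name `q_continue_targets` and the example values are hypothetical, and the real code works on tensors with `maximum` and `cat` along the reasoning-step dimension.

```python
def q_continue_targets(q_halts, q_continues, discount_factor=1.0):
    """Targets for q_continue at steps 0..N-2 (hypothetical helper).

    q_halts and q_continues hold one value per reasoning step, with the
    last entry belonging to the max reasoning step.
    """
    assert len(q_halts) == len(q_continues) >= 2

    # Next step is not the last one: bootstrap from the larger of the
    # next step's q_halt and q_continue.
    q_max = [max(h, c) for h, c in zip(q_halts[1:-1], q_continues[1:-1])]

    # Next step is the max reasoning step: continuing is no longer an
    # option there, so the target is just that step's q_halt.
    targets = q_max + q_halts[-1:]

    return [t * discount_factor for t in targets]


# Made-up values for four reasoning steps.
q_halts = [0.2, 0.4, 0.9, 0.6]
q_continues = [0.5, 0.7, 0.3, 0.8]

targets = q_continue_targets(q_halts, q_continues)
print(targets)
```

With these values, the previous behavior would have used max(0.6, 0.8) = 0.8 as the final target; after this change it is 0.6, the last q_halt. The targets line up one-to-one with `q_continues[:-1]`.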

lucidrains · lucidrains · commit 3bbc8b908bfc · 2025-07-30T11:40:08.000-07:00
diff --git a/HRM/hrm_with_act.py b/HRM/hrm_with_act.py
@@ -394,11 +394,18 @@ def forward(
 
         # q continue is learned using bellman's on max(q_halt, q_continue) of next reasoning step
 
-        q_max_halt_continue = maximum(q_halts, q_continues)
+        mask_value = -torch.finfo(q_continues.dtype).max
+
+        q_max_halt_continue = maximum(
+            q_halts[1:-1],
+            q_continues[1:-1]
+        )
+
+        q_continue_target = cat((q_max_halt_continue, q_halts[-1:])) # at the last step, the q_continue target is just q_halt
 
         q_continue_losses = F.binary_cross_entropy(
             q_continues[:-1],
-            q_max_halt_continue[1:] * self.discount_factor, # they use a discount factor of 1., don't understand why yet
+            q_continue_target * self.discount_factor, # they use a discount factor of 1., don't understand why yet
             reduction = 'none'
         )
 
diff --git a/README.md b/README.md
@@ -2,6 +2,8 @@
 
 ## Hierarchical Reasoning Model (wip)
 
+Explorations into the proposed recurrent [hierarchical reasoning model](https://arxiv.org/abs/2506.21734) by Wang et al. from [Sapient Intelligence](https://www.sapient.inc/). The official repository is [here](https://github.com/sapientinc/HRM).
+
 ### Install
 
 ```bash
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "HRM-pytorch"
-version = "0.0.20"
+version = "0.1.1"
 description = "The proposal from a Singaporean AGI company"
 authors = [
     { name = "Phil Wang", email = "lucidrains@gmail.com" }