Commit 3bbc8b9

the q continue target for the max reasoning step should be just the next q halt
1 parent 6483ded commit 3bbc8b9

3 files changed: 12 additions and 3 deletions

HRM/hrm_with_act.py

Lines changed: 9 additions & 2 deletions

@@ -394,11 +394,18 @@ def forward(

 # q continue is learned using bellman's on max(q_halt, q_continue) of next reasoning step

-q_max_halt_continue = maximum(q_halts, q_continues)
+mask_value = -torch.finfo(q_continues.dtype).max
+
+q_max_halt_continue = maximum(
+    q_halts[1:-1],
+    q_continues[1:-1]
+)
+
+q_continue_target = cat((q_max_halt_continue, q_halts[-1:])) # last step the q_continue target is just q_halt

 q_continue_losses = F.binary_cross_entropy(
     q_continues[:-1],
-    q_max_halt_continue[1:] * self.discount_factor, # they use a discount factor of 1., don't understand why yet
+    q_continue_target * self.discount_factor, # they use a discount factor of 1., don't understand why yet
     reduction = 'none'
 )
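
The indexing change in this hunk is easier to follow on a tiny example. The sketch below is not the repository's code: it assumes a single sample, uses plain Python lists in place of tensors, and the helper name `q_continue_targets` is hypothetical. It only illustrates how the target for each `q_continues[:-1]` entry is assembled after this commit: the max of the next step's `q_halt` and `q_continue` for every step except the last, where the target is the final `q_halt` alone.

```python
def q_continue_targets(q_halts, q_continues, discount_factor=1.0):
    """Build per-step targets for q_continues[:-1] (hypothetical helper, one sample).

    q_halts and q_continues hold one value per reasoning step.
    """
    assert len(q_halts) == len(q_continues) >= 2

    # Intermediate steps: bootstrap from the better of halting or
    # continuing at the next step, i.e. max over indices 1 .. T-2.
    inner = [max(h, c) for h, c in zip(q_halts[1:-1], q_continues[1:-1])]

    # Max reasoning step: continuing is no longer possible, so the
    # target is just the final q_halt.
    targets = inner + q_halts[-1:]

    return [t * discount_factor for t in targets]


q_halts = [0.2, 0.4, 0.9]
q_continues = [0.5, 0.7, 0.95]

targets = q_continue_targets(q_halts, q_continues)
print(targets)  # [0.7, 0.9]
```

With the pre-commit expression `maximum(q_halts, q_continues)[1:]`, the same inputs would give `[0.7, 0.95]`, so the last target would depend on a `q_continue` value at the final step.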

README.md

Lines changed: 2 additions & 0 deletions

@@ -2,6 +2,8 @@

 ## Hierarchical Reasoning Model (wip)

+Explorations into the proposed recurrent [hierarchical reasoning model](https://arxiv.org/abs/2506.21734) by Wang et al. from [Sapient Intelligence](https://www.sapient.inc/). Official repository is [here](https://github.com/sapientinc/HRM)
+
 ### Install

 ```bash

pyproject.toml

Lines changed: 1 addition & 1 deletion

@@ -1,6 +1,6 @@
 [project]
 name = "HRM-pytorch"
-version = "0.0.20"
+version = "0.1.1"
 description = "The proposal from a Singaporean AGI company"
 authors = [
     { name = "Phil Wang", email = "lucidrains@gmail.com" }
