Moment infini #76

PotosnakW · 2025-03-27T04:21:18Z

New files:

momentfm/models/moment.py: supports infini channel mixing via the config.infini_channel_mixing boolean and config.n_series (the number of individual time series). Only forecasting is supported currently.
momentfm/utils/t5_infini.py: contains the 'T5InfiniModel' class. If config.infini_channel_mixing == True, T5InfiniAttention is used; otherwise the default T5Attention is used.

JanekDev

I checked the implementation against the paper and everything looks good! I left some minor stylistic comments and a small comment on the positional bias. Additionally, it's good that infini-moment was moved to a separate file.

JanekDev · 2025-03-27T15:00:25Z

momentfm/models/moment.py


-        x_enc = self.tokenizer(x=x_enc)
+        batch_size, n_channels, seq_len = x_enc.shape


I suggest unifying n_channels and n_series.

JanekDev · 2025-03-27T15:07:07Z

momentfm/models/moment.py

+        x: [batch_size x n_channels x n_patches x d_model]
+        output: [batch_size x n_channels x forecast_horizon]
+        """
+        x = self.flatten(x)   # x: [batch_size, n_series, n_patches, d_model]


Suggesting unification of n_channels and n_series here as well.

JanekDev · 2025-03-27T16:41:52Z

momentfm/utils/t5_infini.py

+            if not self.has_relative_attention_bias:
+                position_bias = torch.zeros(
+                    (1, self.n_channels, self.n_heads, seq_length, key_length), device=hidden_states.device, dtype=hidden_states.dtype
+                ) # Willa - should we use n_channels or just 1?


In the original implementation by Nina there is no channel axis, so the bias is probably broadcast and the position biases are shared between channels. Hence this dimension should probably be 1?
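To illustrate the broadcasting point, here is a minimal sketch, not the PR's code. The tensor sizes and the `scores` stand-in are assumptions made up for the example; only the bias shapes mirror the diff. For a zero bias, a size-1 channel axis gives the same result as an explicit `n_channels` axis, while storing one shared slice.

```python
import torch

# Illustrative sizes only; none of these values come from the PR.
batch_size, n_channels, n_heads, seq_length, key_length = 2, 3, 4, 5, 5

# Hypothetical stand-in for attention scores with an explicit channel axis.
scores = torch.randn(batch_size, n_channels, n_heads, seq_length, key_length)

# Option A (as in the diff): one bias slice per channel.
bias_per_channel = torch.zeros(1, n_channels, n_heads, seq_length, key_length)

# Option B (the suggestion): a single slice that broadcasts over channels.
bias_shared = torch.zeros(1, 1, n_heads, seq_length, key_length)

out_a = scores + bias_per_channel
out_b = scores + bias_shared

# Both options produce the same shape and, for a zero bias, the same values.
print(out_a.shape == out_b.shape)
print(torch.equal(out_a, out_b))
```

Whether sharing is also desired when the bias is learned (has_relative_attention_bias is True) is a separate design question that this sketch does not settle.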

JanekDev · 2025-03-27T16:57:07Z

momentfm/utils/t5_infini.py

+        # Vectorized infini attention computation across channels
+        sigma_k = self.elu(key_states) + 1.0  # [batch_size, n_series, n_heads, n_patch, dim]
+        sigma_k_transposed = sigma_k.transpose(-2, -1) # [batch_size, n_series, n_heads, dim, n_patch]
+        memory_matrix = torch.matmul(sigma_k_transposed, value_states).sum(dim=1).unsqueeze(1) # [batch_size, 1, n_heads, dim, dim] sum over channels then unsqueeze to enable broadcasting over channels


To make this easier to understand, can we split the memory matrix computation into the per-channel memory updates and then sum them on a separate line? The implementation looks correct, by the way!
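One possible shape for that split, as a rough sketch under assumptions: `F.elu` stands in for the module's `self.elu`, the tensor sizes are invented, and `memory_updates` is a hypothetical name. The check at the end compares it against the one-line form from the diff.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only; none of these values come from the PR.
batch_size, n_series, n_heads, n_patch, dim = 2, 3, 4, 6, 8
key_states = torch.randn(batch_size, n_series, n_heads, n_patch, dim)
value_states = torch.randn(batch_size, n_series, n_heads, n_patch, dim)

# F.elu is assumed to match self.elu in the diff.
sigma_k = F.elu(key_states) + 1.0               # [batch, n_series, n_heads, n_patch, dim]
sigma_k_transposed = sigma_k.transpose(-2, -1)  # [batch, n_series, n_heads, dim, n_patch]

# One-line form, as written in the diff.
memory_one_line = torch.matmul(sigma_k_transposed, value_states).sum(dim=1).unsqueeze(1)

# Split form: per-channel memory updates first ...
memory_updates = torch.matmul(sigma_k_transposed, value_states)  # [batch, n_series, n_heads, dim, dim]
# ... then sum over channels, keeping a size-1 axis for broadcasting.
memory_matrix = memory_updates.sum(dim=1, keepdim=True)          # [batch, 1, n_heads, dim, dim]

print(torch.allclose(memory_one_line, memory_matrix))
```

Using `keepdim=True` replaces the `sum(...).unsqueeze(1)` pair and keeps the channel axis visible in the shape comment.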

JanekDev · 2025-03-27T17:08:25Z

momentfm/utils/t5_infini.py

+        z = sigma_k.sum(dim=-2).unsqueeze(-1).sum(dim=1) # [batch_size, n_heads, dim, 1] sum over sequence length and channels
+        z = z.unsqueeze(dim=1) # [batch_size, 1, n_heads, dim, 1]
+        sigma_q = self.elu(query_states) + 1.0 # [batch_size, n_series, n_heads, n_patch, dim]
+        A_mem = (sigma_q @ memory_matrix) / ((sigma_q @ z) + 1e-6) # [batch_size, n_series, n_heads, n_patch, dim]/[batch_size, n_series, n_heads, n_patch, 1] --> [batch_size, n_series, n_heads, n_patch, dim] Adding 1e-6 for preventing division to 0


maybe split this too?
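A possible split, sketched under the same assumptions as the diff's shape comments: `F.elu` stands in for `self.elu`, the sizes are invented, and `retrieved` and `normalizer` are hypothetical names. The result is compared against the one-line expression from the diff.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes only; none of these values come from the PR.
batch_size, n_series, n_heads, n_patch, dim = 2, 3, 4, 6, 8
query_states = torch.randn(batch_size, n_series, n_heads, n_patch, dim)
key_states = torch.randn(batch_size, n_series, n_heads, n_patch, dim)
value_states = torch.randn(batch_size, n_series, n_heads, n_patch, dim)

sigma_k = F.elu(key_states) + 1.0    # strictly positive feature map
sigma_q = F.elu(query_states) + 1.0  # [batch, n_series, n_heads, n_patch, dim]
memory_matrix = (sigma_k.transpose(-2, -1) @ value_states).sum(dim=1, keepdim=True)

# Reference: the expressions as written in the diff.
z_ref = sigma_k.sum(dim=-2).unsqueeze(-1).sum(dim=1).unsqueeze(dim=1)
A_ref = (sigma_q @ memory_matrix) / ((sigma_q @ z_ref) + 1e-6)

# Split form, one step per line.
z = sigma_k.sum(dim=-2)              # sum over patches  -> [batch, n_series, n_heads, dim]
z = z.sum(dim=1)                     # sum over channels -> [batch, n_heads, dim]
z = z.unsqueeze(1).unsqueeze(-1)     # [batch, 1, n_heads, dim, 1]
retrieved = sigma_q @ memory_matrix  # [batch, n_series, n_heads, n_patch, dim]
normalizer = sigma_q @ z + 1e-6      # [batch, n_series, n_heads, n_patch, 1]; 1e-6 avoids division by zero
A_mem = retrieved / normalizer       # [batch, n_series, n_heads, n_patch, dim]

print(torch.allclose(A_mem, A_ref))
```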

Willa Potosnak added 2 commits March 26, 2025 18:00

infini

1c1a80d

infini bool check

35f72fd

mononitogoswami requested review from mononitogoswami, JanekDev and zukowskanina March 27, 2025 13:14

infini_moment

000768a

JanekDev approved these changes Mar 27, 2025

suggestion updates

b76cdb8
