
Attention Refactor (WIP) #627

Open

stefanradev93 wants to merge 8 commits into dev from attention_refactor

Conversation

stefanradev93 (Contributor) commented Jan 29, 2026

This PR refactors the transformers module for internal consistency and directly exposes attention_mask and use_causal_mask in the relevant transformers (see the sketch below). The following changes were made:

  • Transformer building blocks were moved into a dedicated attention module.
  • An abstract base class Transformer was added to easily distinguish transformer summaries from other summary networks.
  • Files were renamed to reflect their semantics.
  • TimeSeriesTransformer can now act as a many-to-many network (e.g., for modeling time-varying targets).

It also prepares the ground for addressing #626.
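
To make the new interface concrete, here is a minimal sketch of how the exposed arguments might be used; the import path, the summary_dim argument, and the exact call signature are assumptions for illustration, not the final API:

```python
import numpy as np
from bayesflow.networks import TimeSeriesTransformer  # assumed import path

# Toy batch: 16 series of length 50 with 3 features each
x = np.random.normal(size=(16, 50, 3)).astype("float32")

# Boolean mask over time steps: True = valid, False = padding
attention_mask = np.ones((16, 50), dtype=bool)
attention_mask[:, 40:] = False  # the last 10 steps are padding

summary_net = TimeSeriesTransformer(summary_dim=8)  # constructor args assumed

# use_causal_mask restricts each position to attend only to earlier ones,
# which is what enables many-to-many use (e.g., time-varying targets);
# attention_mask excludes the padded steps from attention.
summary = summary_net(x, attention_mask=attention_mask, use_causal_mask=True)
```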

@arrjon @paul-buerkner It remains to decide how we want the attention mask to be passed. Should we search for it in the simulator outputs (as we do for other special arguments, such as sample_weights)? This would have the advantage that the mask could be constructed very flexibly.
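
For concreteness, a sketch of the route under discussion: the simulator simply returns the mask under a reserved key, so it can be built with arbitrary, simulator-specific logic (the key name and shapes below are illustrative):

```python
import numpy as np

def simulator(batch_size=16, max_len=50):
    # Draw a different valid length for each series in the batch
    lengths = np.random.randint(10, max_len + 1, size=batch_size)
    x = np.random.normal(size=(batch_size, max_len, 3)).astype("float32")

    # Boolean mask of shape (batch, time): True marks valid steps
    mask = np.arange(max_len)[None, :] < lengths[:, None]
    x[~mask] = 0.0  # zero out padded steps

    # If the approximator searches the outputs for the reserved key
    # "attention_mask" (as for "sample_weights"), this is all the user does:
    return {"x": x, "attention_mask": mask}
```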

stefanradev93 requested a review from arrjon on January 29, 2026, 23:40
codecov bot commented Jan 30, 2026

paul-buerkner (Contributor) commented Jan 30, 2026

Nice! Yes, I would love to search for attention_mask in the simulator output! In fact, I cannot really think of an alternative that would be nearly as nice. Is there any disadvantage to doing so?

stefanradev93 (Contributor, Author) commented Jan 30, 2026

None that I can think of, except for some overhead on the user's side, which may be unavoidable anyway. @arrjon, if you concur, I will proceed with adding the search for the attention mask among the simulation outputs.

arrjon (Member) commented Jan 30, 2026

Sounds good to me as it gives maximal flexibility to the user!

arrjon (Member) left a review comment

LGTM

stefanradev93 (Contributor, Author) commented Jan 31, 2026

The approximator's compute_metrics now accepts mask and attention_mask keyword arguments, which are expected to be part of the simulator output (or None, the default) and are propagated to both the inference and the summary network, depending on the signature of each network.
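
One way to read "depending on the signature of each network" is signature-based filtering; a minimal sketch, assuming a hypothetical helper _filter_kwargs (this is not the PR's actual code):

```python
import inspect

def _filter_kwargs(fn, **candidates):
    """Keep only the keyword arguments that fn's signature declares."""
    params = inspect.signature(fn).parameters
    return {k: v for k, v in candidates.items() if k in params and v is not None}

# Sketch of use inside compute_metrics: each network receives a mask
# only if its call signature declares the corresponding parameter.
# summary_kwargs = _filter_kwargs(self.summary_network.call, attention_mask=attention_mask)
# inference_kwargs = _filter_kwargs(self.inference_network.call, mask=mask)
```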

The case for sampling is tricky, though, as the user now has to provide the attention_mask or mask arguments as **kwargs instead of passing them as part of the conditions dict. This may cause some confusion, but expecting the masks to be part of conditions would invalidate our internal logic...
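
Under this design, a sampling call would look roughly like the following sketch (approximator, x, and the exact sample signature are illustrative):

```python
# Masks travel as explicit keyword arguments rather than inside `conditions`,
# because entries of `conditions` are treated as data to condition on.
samples = approximator.sample(
    num_samples=100,
    conditions={"x": x},            # conditioning data only
    attention_mask=attention_mask,  # forwarded via **kwargs to mask-aware nets
)
```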

paul-buerkner (Contributor) commented:

Can you elaborate on why we cannot use conditions for this purpose?

stefanradev93 (Contributor, Author) commented:

I can solve it with a bunch of checks for now.

paul-buerkner (Contributor) commented Feb 3, 2026 via email

stefanradev93 (Contributor, Author) commented:

@arrjon Can you please check if my latest commit enables the functionality you needed?

