[WIP] Improve variance scaling and reduce perplexity in transformers#22

Draft
Copilot wants to merge 2 commits into main from copilot/reduce-perplexity-transformers

Copilot AI commented May 20, 2026

  • Analyzed transformer v1 and v2 implementations
  • Fix MhaResidualNode.forward() in transformer_v2.py:
    • Add causal context scaling, sqrt(eff_ctx), to offset the variance reduction from softmax averaging
    • Add 1/√2 residual scaling to prevent variance doubling
  • Fix Mlp2ResidualNode.forward() in transformer_v2.py:
    • Add 1/√2 residual scaling to prevent variance doubling
  • Fix create_deep_transformer() in transformer_v2.py:
    • Use proper per-node initialization (dimension-appropriate init for each node type instead of a single global std)
  • Fix SkipConnection.forward() in skip_connection.py:
    • Add 1/√N scaling for N inputs (prevents variance growth, e.g. doubling for two inputs, in the v1 transformer with external skip connections)
  • Add variance tests for v2 transformer nodes
  • Run existing tests to confirm no regressions
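
The scaling factors in the checklist above can be sanity-checked numerically. The following is a minimal standalone sketch, not the code in transformer_v2.py or skip_connection.py: the helper names `scaled_sum` and `causal_uniform_average` are hypothetical, inputs are assumed to be independent with unit variance, attention is assumed to be roughly uniform over the visible positions (as it tends to be near initialization), and `eff_ctx` is assumed to mean the number of positions a token can attend to under the causal mask.

```python
import math
import random
import statistics


def scaled_sum(inputs):
    """Sum N equally sized vectors and divide by sqrt(N) (hypothetical helper)."""
    n = len(inputs)
    scale = 1.0 / math.sqrt(n)
    return [scale * sum(vals) for vals in zip(*inputs)]


def causal_uniform_average(values, rescale=True):
    """Causal running mean: position t averages values[0..t].

    With rescale=True each output is multiplied by sqrt(t + 1), where
    t + 1 is the number of visible positions (assumed meaning of eff_ctx).
    """
    out, running = [], 0.0
    for t, v in enumerate(values):
        running += v
        eff_ctx = t + 1
        mean = running / eff_ctx
        out.append(mean * math.sqrt(eff_ctx) if rescale else mean)
    return out


rng = random.Random(0)
dim = 20000

# Residual add: a unit-variance stream plus an independent unit-variance branch.
stream = [rng.gauss(0.0, 1.0) for _ in range(dim)]
branch = [rng.gauss(0.0, 1.0) for _ in range(dim)]
var_plain = statistics.pvariance([a + b for a, b in zip(stream, branch)])
var_scaled = statistics.pvariance(scaled_sum([stream, branch]))
print(f"plain add: {var_plain:.2f}  scaled add: {var_scaled:.2f}")

# Causal averaging: variance of the last position across many random sequences.
seq_len, trials = 16, 4000
last_raw, last_scaled = [], []
for _ in range(trials):
    seq = [rng.gauss(0.0, 1.0) for _ in range(seq_len)]
    last_raw.append(causal_uniform_average(seq, rescale=False)[-1])
    last_scaled.append(causal_uniform_average(seq, rescale=True)[-1])
var_last_raw = statistics.pvariance(last_raw)
var_last_scaled = statistics.pvariance(last_scaled)
print(f"causal mean: {var_last_raw:.3f}  rescaled: {var_last_scaled:.2f}")
```

Under these assumptions the plain residual add should land near variance 2 and the scaled add near 1, while the unscaled causal mean at position 16 should sit near 1/16 and the rescaled one near 1. If the inputs are correlated, or attention is far from uniform, the factors 1/√N and sqrt(eff_ctx) no longer restore unit variance exactly.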

Copilot AI linked an issue May 20, 2026 that may be closed by this pull request
Agent-Logs-Url: https://github.com/trueagi-io/FabricPC/sessions/31914e42-f5c5-4dda-8439-fc8ad09dffab

Co-authored-by: layomia <9868994+layomia@users.noreply.github.com>

Development

Successfully merging this pull request may close these issues.

Transformer demo: reduce the perplexity