Stochastic Gating #862

kentslaney · 2025-05-07T04:47:12Z

If anyone has more GPUs than ideas, I'd appreciate this being tried (and getting feedback from it). It trains on a small scale and doesn't immediately diverge in loss from the original, but my hope is that it might mitigate mode collapse, which happens late and at scale. That being said, it's a negative result so far. If I get around to trying it at scale myself, I'll update the thread.

Thoughts and discussion without results are welcome as well.

I also have a standalone implementation for anyone without a training setup.

stochastic gating

20c8b4f
