Skip to content

fix(framework): Fix token orphan expiry causing run failure#6916

Open
msheller wants to merge 3 commits intomainfrom
fix-token-orphan-window
Open

fix(framework): Fix token orphan expiry causing run failure#6916
msheller wants to merge 3 commits intomainfrom
fix-token-orphan-window

Conversation

@msheller
Copy link
Copy Markdown
Member

@msheller msheller commented Apr 1, 2026

Issue

In the new RequestToken flow, it is possible for SuperLink to crash after minting a token but before setting the run status, which would cause an eventual failure when the token expires, as the PENDING run would be set to FAILED.

Proposal

Token expiry now only sets run status to FAILED if the run already started. This way, if the SuperLink crashes in the window and leaves an orphaned token, the expiry of that token won't set the run to FAILED.

In current operation, this then means that the SuperExec will loop on failure to get a token for a run until the orphaned token expires, at which point SuperExec will get a token and everything will proceed as normal.

The alternative of a completely atomic operation for minting tokens and setting run-states would be a much larger change and would only save 30 seconds.

NOTE: In review, copilot found an issue where federation_manager.report_run_usage() can be called regardless of if an UPDATE occurred. Present implementations for report_run_usage() are no-ops, so I am opting to treat it as an issue to be addressed later, as the brittleness of the correct behavior (by contract in federation_manager.report_run_usage()) is an independent problem.

Checklist

  • Implement proposed change
  • Write tests
  • Update documentation
  • Address LLM-reviewer comments, if applicable (e.g., GitHub Copilot)
  • Make CI checks pass
  • Ping maintainers on Slack (channel #contributions)

@msheller msheller marked this pull request as ready for review April 1, 2026 00:42
Copilot AI review requested due to automatic review settings April 1, 2026 00:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts SuperLink’s token-expiry cleanup so that expiring an orphaned token does not incorrectly transition a never-started (PENDING) run into a FAILED state, addressing a crash window in the new RequestToken flow.

Changes:

  • Narrow run-failure-on-token-expiry logic in SqlLinkState to only affect runs that have actually started.
  • Add a regression test ensuring token expiry does not fail a PENDING run and that the run can be re-claimed after token cleanup.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
framework/py/flwr/server/superlink/linkstate/sql_linkstate.py Restricts the token-expiry run-failure UPDATE to runs with starting_at != '' to avoid failing PENDING runs.
framework/py/flwr/server/superlink/linkstate/linkstate_test.py Adds a test validating that token expiry doesn’t fail a never-started run and that a new token can be minted afterward.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@msheller msheller requested a review from Copilot April 1, 2026 02:07
@github-actions github-actions bot added the Maintainer Used to determine what PRs (mainly) come from Flower maintainers. label Apr 1, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Member

@panh99 panh99 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Maintainer Used to determine what PRs (mainly) come from Flower maintainers.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants