Skip to content

🐛 Clean tuples dict keys from workers_info in /api/v1/retire_workers.#8996

Merged
jacobtomlinson merged 4 commits intodask:mainfrom
fcourtial:fcourtial/fix-retire-workers-500
Sep 4, 2025
Merged

🐛 Clean tuples dict keys from workers_info in /api/v1/retire_workers.#8996
jacobtomlinson merged 4 commits intodask:mainfrom
fcourtial:fcourtial/fix-retire-workers-500

Conversation

@fcourtial
Copy link
Contributor

Fix JSON serialization error in retire_workers API endpoint

When retiring workers through the HTTP API endpoint /api/v1/retire_workers, the response includes worker metrics that contain tuple keys (e.g., digests_total_since_heartbeat). These tuple keys cannot be JSON serialized, causing a 500 error that breaks clients like the Dask Kubernetes Operator.

This PR:

  • Adds a clean_dict function to delete tuple keys during serialization
  • Preserves the dictionary structure while making it JSON-serializable

Example:

# Before - causes 500 error
{
    "metrics": {
        ("execute", "thread-cpu"): 1
    }
}

# After - properly serialized
{
    "metrics": {}
}

@fcourtial fcourtial requested a review from fjetter as a code owner January 28, 2025 17:31
@fcourtial fcourtial changed the title Fcourtial/fix retire workers 500 🐛 Clean tuples dict keys from workers_info in /api/v1/retire_workers. Jan 28, 2025
Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this

@fcourtial
Copy link
Contributor Author

It should partly solve this issue: #8370

@github-actions
Copy link
Contributor

github-actions bot commented Jan 28, 2025

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    27 files  ± 0      27 suites  ±0   11h 31m 39s ⏱️ - 12m 20s
 4 117 tests + 1   4 004 ✅ + 2    111 💤 ±0  2 ❌  - 1 
51 609 runs  +13  49 320 ✅ +14  2 287 💤 +2  2 ❌  - 3 

For more details on these failures, see this check.

Results for commit 5ffb406. ± Comparison against base commit 0f0adef.

♻️ This comment has been updated with latest results.

@jacobtomlinson
Copy link
Member

I would appreciate @fjetter or @hendrikmakait taking a look at this.

@fcourtial
Copy link
Contributor Author

One question would be, are we supposed to retire a worker that still has digests_total_since_heartbeat? I don't want to fix the symptom only.

@fcourtial
Copy link
Contributor Author

Any update by any luck?

Copy link
Member

@jacobtomlinson jacobtomlinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's get this merged. I was originally concerned that other consumers of the API may be affected by this, but it may just be dask-kubernetes that uses it at this point.

Given nobody has shouted let's get this in.

@jacobtomlinson jacobtomlinson merged commit d132b03 into dask:main Sep 4, 2025
29 of 33 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants