Commit 316dbad
Add exponential backoff to Ax DB retry operations (#5104)
Summary:
Pull Request resolved: #5104
The Axolotl experiment `igfr_h2_toprank_brew_ax_tuning` failed with a MySQL OperationalError (error 1290) during a database failover while saving analysis cards. The MySQL server was temporarily in read-only mode during the master switchover.
The existing `retry_on_exception` decorator on the DB save/update functions in `with_db_settings_base.py` correctly catches `OperationalError` and makes up to 3 attempts, but it had no wait between them (`initial_wait_seconds` was not set). As a result, all 3 attempts fired back-to-back, and all failed because the failover had not yet completed.
This diff adds `initial_wait_seconds=5` to all 7 retry-decorated DB operation functions. This enables exponential backoff between retries:
- 1st attempt: immediate
- 2nd attempt: after 5 second wait
- 3rd attempt: after 10 second wait
This gives a MySQL failover up to 15 seconds of cumulative waiting (5 s + 10 s) to complete before the final attempt, which should be sufficient for typical failover scenarios. The `initial_wait_seconds` parameter is already supported by the `retry_on_exception` decorator in `ax.utils.common.executils`; it was simply not being used.
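As a rough check on the timing argument, the sketch below computes when each attempt starts under the doubling schedule described above (wait before attempt `i` is `initial_wait_seconds * 2 ** (i - 1)`) and whether any attempt lands after a read-only window has ended. It ignores how long each attempt itself takes; the helper names `attempt_offsets` and `survives_failover` and the 8-second failover window are illustrative assumptions, not part of Ax.

```python
from typing import Optional


def attempt_offsets(retries: int, initial_wait_seconds: Optional[float]) -> list[float]:
    """Start time of each attempt, in seconds relative to the first attempt.

    Assumes the wait before attempt i (0-indexed, i > 0) is
    initial_wait_seconds * 2 ** (i - 1), and ignores the duration of each
    attempt. With no initial wait, every attempt starts at t = 0.
    """
    offsets: list[float] = []
    elapsed = 0.0
    for i in range(retries):
        if i > 0 and initial_wait_seconds is not None:
            elapsed += initial_wait_seconds * 2 ** (i - 1)
        offsets.append(elapsed)
    return offsets


def survives_failover(offsets: list[float], read_only_seconds: float) -> bool:
    """True if at least one attempt starts after the read-only window ends."""
    return any(t >= read_only_seconds for t in offsets)


no_wait = attempt_offsets(retries=3, initial_wait_seconds=None)
backoff = attempt_offsets(retries=3, initial_wait_seconds=5)
print(no_wait)  # [0.0, 0.0, 0.0]
print(backoff)  # [0.0, 5.0, 15.0]

# Hypothetical failover that keeps the server read-only for 8 seconds.
print(survives_failover(no_wait, read_only_seconds=8))  # False
print(survives_failover(backoff, read_only_seconds=8))  # True
```

The last attempt starts 15 seconds after the first, so under this model any read-only window shorter than that is outlasted; a longer window would still exhaust all attempts.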
Functions updated:
- `_save_experiment_to_db_if_possible`
- `_save_or_update_trials_in_db_if_possible`
- `_save_generation_strategy_to_db_if_possible`
- `_update_generation_strategy_in_db_if_possible`
- `_update_runner_on_experiment_in_db_if_possible`
- `_update_experiment_properties_in_db`
- `_save_analysis_card_to_db`
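For illustration, here is a minimal sketch of a retry decorator with the same exponential backoff schedule, applied to a simulated save that fails while the server is read-only. It is a hypothetical stand-in, not the implementation of `retry_on_exception` in `ax.utils.common.executils`: the names `retry_with_backoff`, `FakeOperationalError`, and `save_analysis_card`, and the injectable `sleep` argument (used here to record waits instead of blocking), are all assumptions. In this sketch, `retries` counts total attempts.

```python
import functools
import time
from typing import Any, Callable, Optional, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def retry_with_backoff(
    exception_types: tuple[type[BaseException], ...],
    retries: int = 3,
    initial_wait_seconds: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[F], F]:
    """Hypothetical retry decorator (a sketch, not Ax's implementation).

    Calls the wrapped function up to `retries` times. Before attempt i
    (0-indexed, i > 0) it sleeps initial_wait_seconds * 2 ** (i - 1) seconds;
    with initial_wait_seconds=None it retries immediately.
    """

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            for i in range(retries):
                if i > 0 and initial_wait_seconds is not None:
                    sleep(initial_wait_seconds * 2 ** (i - 1))
                try:
                    return func(*args, **kwargs)
                except exception_types:
                    if i == retries - 1:
                        raise  # Out of attempts: surface the last error.
            raise ValueError("retries must be at least 1")

        return wrapper  # type: ignore[return-value]

    return decorator


class FakeOperationalError(Exception):
    """Stand-in for a DB driver's OperationalError (e.g. MySQL error 1290)."""


waits: list[float] = []
calls = {"n": 0}


@retry_with_backoff(
    (FakeOperationalError,), retries=3, initial_wait_seconds=5, sleep=waits.append
)
def save_analysis_card() -> str:
    # Simulate a server that stays read-only for the first two attempts.
    calls["n"] += 1
    if calls["n"] < 3:
        raise FakeOperationalError("server is read-only during failover")
    return "saved"


print(save_analysis_card())  # saved
print(waits)  # [5, 10]
```

Injecting `sleep` keeps the example fast and makes the backoff schedule observable; with the default `time.sleep`, the wrapper would actually block for 5 and then 10 seconds between attempts.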
Reviewed By: mpolson64
Differential Revision: D98166115
fbshipit-source-id: 2d32aa4b26ac3e08cc95ecf8335899e75ca2c86b
1 parent 6715f6e, commit 316dbad
1 file changed
Lines changed: 7 additions & 0 deletions
[Diff table: 7 hunks, each adding one line, at new-file lines 501, 525, 555, 579, 609, 627 and 644; line contents not recoverable]