-
Notifications
You must be signed in to change notification settings - Fork 148
Fix: Ban problematic orders instead of tokens on simulation failures #3949
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
All contributors have signed the CLA ✍️ ✅ |
|
I have read the CLA Document and I hereby sign the CLA |
| pub struct Detector { | ||
| failure_ratio: f64, | ||
| required_measurements: u32, | ||
| counter: Arc<DashMap<order::Uid, OrderStatistics>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The order needs to be dropped from this cache once it is executed. We shouldn't accumulate all the failed orders here indefinitely.
| .and_modify(|counter| { | ||
| counter.attempts += 1; | ||
| counter.fails += u32::from(failure); | ||
| }) | ||
| .or_insert_with(|| OrderStatistics { | ||
| attempts: 1, | ||
| fails: u32::from(failure), | ||
| flagged_unsupported_at: None, | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And we shouldn't accumulate successful orders here for sure.
Disclaimer: I implemented this feature mostly using claude, so please carefully review. The logic fills good now to me. Also, feel free to take ownership of this PR and apply commits on top.
Problem
The current metrics-based bad token detection incorrectly bans entire tokens when settlement simulations fail, even though failures are often caused by solver-specific issues, not the tokens themselves. This leads to legitimate tokens like
WETH or USDC being banned (for specific colocated solvers).
Example scenario:
Settlement simulations can fail for many reasons unrelated to token quality:
Solution
Implement order-level banning with metrics-based detection tracking failures per order UID instead of per token.
Changes
bad_ordersmodule mirroringbad_tokensstructureorder_uids()to solutions for trackingbad_orders_detectedPrometheus metricConfiguration
Note: Simulation-based bad token detection (which directly tests token behavior) remains unchanged and continues to work correctly.
Caveats
I think this PR improves things, preventing to blame tokens for an issue with a specific order. We can have now a similar issue with one problematic order making some other order to be banned because they are part of the same solution.
In practice, this affects only to one solver (is done in the driver), and I don't think will happen often. However, future PRs could enhance the granularity and try to simulate things in isolation if possible to find the culpit of reverts. Never the less, this should be way less intrusive and give fewer false positives than the old check.
Test