fix(router): handle large swaps exceeding route capacity (OSMO-53)#689
fix(router): handle large swaps exceeding route capacity (OSMO-53)#689
Conversation
Previously, errors returned by CalculateTokenOutByTokenIn in the dynamic programming split optimization were silently ignored with a blank identifier. This caused routes with insufficient liquidity (e.g., orderbook pools) to be incorrectly included in the optimal split calculation, resulting in catastrophic slippage for large swaps. This fix explicitly checks for errors and treats routes that error as producing zero output, ensuring they are excluded from the optimal split. Fixes OSMO-53
…pacity When all routes fail at the full swap amount, retry with a smaller probe amount (10%) to identify viable routes. This allows the split algorithm to handle routes with limited capacity (e.g., orderbook pools). Combined with the previous fix in dynamic_splits.go that properly handles route errors, this ensures large swaps are split across working routes rather than failing entirely or using broken routes. Fixes OSMO-53
📝 WalkthroughWalkthroughThe changes introduce a probe fallback mechanism in routing logic where a smaller probe amount (10% of requested) is attempted when initial route calculations fail, propagating a Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
Pre-merge checks❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
router/usecase/optimized_routes.go (1)
41-42: Consider logging discarded probe errors for debugging.The probe errors are intentionally discarded since we only care whether any routes work. However, for debugging production issues with the fallback mechanism, consider logging the probe errors at debug level before discarding.
🔎 Suggested improvement
// Retry route calculation with smaller probe amount - routesWithAmountOut, _ = routes.CalculateTokenOutByTokenIn(ctx, probeCoin) + routesWithAmountOut, probeErrors := routes.CalculateTokenOutByTokenIn(ctx, probeCoin) + if len(probeErrors) > 0 { + // Log at debug level for troubleshooting fallback behavior + // Consider using structured logging here if available + }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
router/usecase/dynamic_splits_test.gois excluded by!**/*_test.go
📒 Files selected for processing (3)
.github/workflows/build.ymlrouter/usecase/dynamic_splits.gorouter/usecase/optimized_routes.go
🧰 Additional context used
🧬 Code graph analysis (2)
router/usecase/dynamic_splits.go (1)
domain/types/coin.go (1)
NewCoin(6-13)
router/usecase/optimized_routes.go (4)
domain/types/coin.go (1)
NewCoin(6-13)ingest/usecase/plugins/orderbook/fillbot/osmosis_swap.go (1)
Denom(35-35)domain/tokens.go (1)
TokenSwapMethodExactIn(76-76)router/usecase/precompute.go (1)
GetPrecomputeOrderOfMagnitude(58-75)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: Run linter
- GitHub Check: Summary
🔇 Additional comments (4)
.github/workflows/build.yml (1)
38-40: LGTM - CI artifact now includes test report.The addition of
report.jsonto the code coverage artifact preserves the JSON test output alongsidecoverage.out. This is useful for downstream tooling or analysis.router/usecase/dynamic_splits.go (1)
221-229: LGTM - Graceful error handling for route calculations.This correctly addresses the root cause where errored routes (e.g., insufficient liquidity in orderbook pools) were previously breaking the split optimization. By caching and returning zero for failed routes, the DP algorithm can continue evaluating other routes, and the downstream check at line 121-123 handles the case where all routes produce zero output.
router/usecase/optimized_routes.go (2)
35-39: Probe amount fallback logic is sound.The fallback to
osmomath.OneInt()when the probe amount is zero handles edge cases wheretokenIn.Amount < 10. This ensures the probe always uses a valid non-zero amount.
52-67: Cache invalidation on probe failure follows established pattern.The cache invalidation logic correctly mirrors the existing implementation in
estimateAndRankSingleRouteQuoteInGivenOut(lines 117-132). This ensures consistency in how route cache failures are handled across both swap directions.
…is used When probe fallback succeeds (routes work at 10% but not full amount), the quote's OutAmount is approximate. Previously this could be returned directly in single-route scenarios, showing users incorrect output amounts. Changes: - Add usedProbeFallback return value to track when probe was triggered - Skip early single-route return when probe was used - Force split algorithm to run for accurate calculations - Return clear error message if swap exceeds available liquidity Addresses CodeRabbit review feedback on OSMO-53
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
router/usecase/router_usecase.go (1)
199-213: Consider clarifying logic when probe fallback was used.When
usedProbeFallbackis true,topSingleRouteQuotehas an inaccurateOutAmountbased on the probe. While the comparison at line 202 should work (split output > probe output), the logic would be clearer if we explicitly always usetopSplitQuotewhen probe was used:finalQuote := topSingleRouteQuote +// When probe fallback was used, always prefer the split quote which has accurate recalculation +if usedProbeFallback { + finalQuote = topSplitQuote +// Otherwise, use split only if it's better than single route +} else if topSplitQuote.GetAmountOut().Amount.GT(topSingleRouteQuote.GetAmountOut().Amount) { -if topSplitQuote.GetAmountOut().Amount.GT(topSingleRouteQuote.GetAmountOut().Amount) { routes := topSplitQuote.GetRoute() r.logger.Debug("split route selected", zap.Int("route_count", len(routes))) finalQuote = topSplitQuote }Note: Line 210 has a pre-existing issue where it always logs "single route selected" even when split was chosen (line 210 is not part of this PR's changes). The log should be conditional or moved inside an else block.
router/usecase/optimized_routes.go (1)
38-65: Probe fallback logic is sound, but consider adding clarity comment.The probe mechanism correctly:
- Triggers only when all routes fail at the full amount
- Uses 10% of the original amount (with fallback to 1) to identify viable routes
- Marks
usedProbeFallback = trueto signal approximate results- Updates
InAmountback to the original for downstream processingHowever, at lines 62-64, updating
InAmountwhile leavingOutAmountfrom the probe creates a quote with mismatched amounts. While this is intentional and properly handled by callers (as verified in router_usecase.go), adding a comment would improve clarity:🔎 Suggested comment for clarity
// Update InAmount to original requested amount for proper handling downstream. // The split algorithm will recalculate actual outputs based on each route's capacity. + // Note: OutAmount remains from the probe calculation and is NOT accurate for the full amount. + // This quote is never returned directly to users - callers must use the split algorithm + // or return an error (see usedProbeFallback checks in GetOptimalQuoteOutGivenIn). for i := range routesWithAmountOut { routesWithAmountOut[i].InAmount = tokenIn.Amount }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
router/usecase/export_test.gois excluded by!**/*_test.go
📒 Files selected for processing (2)
router/usecase/optimized_routes.gorouter/usecase/router_usecase.go
🧰 Additional context used
🧬 Code graph analysis (2)
router/usecase/optimized_routes.go (2)
router/usecase/route/route.go (2)
RouteImpls(34-34)RouteWithOutAmount(36-40)router/usecase/precompute.go (1)
GetPrecomputeOrderOfMagnitude(58-75)
router/usecase/router_usecase.go (4)
domain/router.go (2)
DisableSplitRoutes(137-137)RouterOptions(152-169)router/usecase/route/route.go (1)
RouteImpl(22-32)ingest/types/candidate_route.go (1)
CandidateRoutes(22-26)domain/url.go (1)
GetURLPathFromContext(30-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build
- GitHub Check: Run linter
- GitHub Check: Summary
🔇 Additional comments (9)
router/usecase/router_usecase.go (7)
166-175: LGTM! Critical guard prevents returning inaccurate probe-based quotes.The condition correctly prevents early return when probe fallback was used, ensuring that the split algorithm runs to recalculate accurate outputs. The inline documentation clearly explains why this guard is necessary.
180-183: LGTM! Consistent guard after route filtering.Correctly prevents returning probe-based quotes even after filtering out CosmWasm routes.
293-295: LGTM! Intentional behavior for internal pricing.The comment clearly explains that probe fallback is intentionally ignored here since this function is used for pricing purposes rather than user-facing quotes where accuracy is critical.
373-392: LGTM! Signature correctly updated to propagate probe fallback indicator.The function properly:
- Returns
falsefor the probe flag on all error paths- Propagates
usedProbeFallbackfromestimateAndRankSingleRouteQuoteOutGivenIn- Returns the flag to callers for downstream handling
397-469: LGTM! Comprehensive propagation of probe fallback through route computation.All error paths correctly return
falsefor the probe fallback flag, and the flag is properly propagated fromrankRoutesByDirectQuoteto callers.
502-516: LGTM! Critical safeguard prevents inaccurate quotes in custom direct quote path.The check at lines 511-513 correctly returns an error when probe fallback was used, preventing users from receiving a quote with mismatched input/output amounts. The error message clearly explains the issue and suggests a remedy.
185-197: Error handling correctly accounts for probe fallback limitations.The split algorithm recalculates outputs accurately by dividing the tokenIn into 10 increments and calling
CalculateTokenOutByTokenInfor each route at each increment (lines in dynamic_splits.go). Routes with limited capacity are handled gracefully—if a route fails at any increment, it returns zero output for that increment, and the DP algorithm avoids selecting it.The error handling in this code is correct: when probe fallback was used and the split quote fails, returning an error is appropriate because the single-route quote is based on the smaller probe amount and thus inaccurate for the full requested amount. When probe was not used, returning the single-route quote is safe since it was calculated from the full amount.
router/usecase/optimized_routes.go (2)
19-32: LGTM! Excellent documentation of probe fallback behavior.The documentation clearly explains:
- The return value semantics including the new
usedProbeFallbackflag- The critical caveat that
OutAmountis an approximation when probe fallback is used- Guidance for callers to use the split algorithm for accurate calculations
This addresses the previous review concern about OutAmount mismatch by explicitly documenting the behavior and setting expectations for callers.
66-81: LGTM! Cache invalidation appropriately handles probe failure.When even the probe amount fails, the code:
- Invalidates both candidate and ranked route caches to force recomputation
- Returns the original error to inform the user
- Correctly returns
falseforusedProbeFallbacksince the probe didn't succeedThe cache invalidation is particularly useful for alloyed pools that may rebalance, as explained in the comment. While this could theoretically cause cache thrashing if the same large swap is attempted repeatedly, this is acceptable since the swap is genuinely failing and caches should be refreshed.
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
router/usecase/router_usecase.go (1)
166-216: Potential issue: returning inaccurate quote when probe fallback is used.When
usedProbeFallbackis true,topSingleRouteQuote.GetAmountOut()contains an approximate value based on a smaller probe amount (10%). If the split quote computation succeeds but produces a worse amount than this approximate single route quote (line 202), the code returnstopSingleRouteQuote(lines 207-210), which contains the inaccurate probe-based amount.This violates the principle stated in the comments: when probe fallback is used, the quote's
OutAmountis approximate and we must run the split algorithm for accurate calculations.Consider adding a check before line 202 to handle this case:
🔎 Proposed fix
topSplitQuote, err := getSplitQuote(ctx, rankedRoutes, tokenIn) if err != nil { // If split quote fails and probe fallback was used, we cannot return the single route quote // because its OutAmount is inaccurate. Return the error so users know to try a smaller amount. if usedProbeFallback { return nil, fmt.Errorf("requested amount exceeds available liquidity, try a smaller amount") } // If probe wasn't used, the single route quote is accurate - return it as fallback. return topSingleRouteQuote, nil } finalQuote := topSingleRouteQuote + // If probe fallback was used, we cannot trust topSingleRouteQuote's amount. + // Always use the split quote result when probe fallback occurred. + if usedProbeFallback { + finalQuote = topSplitQuote + } else { // If the split route quote is better than the single route quote, return the split route quote - if topSplitQuote.GetAmountOut().Amount.GT(topSingleRouteQuote.GetAmountOut().Amount) { + if topSplitQuote.GetAmountOut().Amount.GT(topSingleRouteQuote.GetAmountOut().Amount) { + routes := topSplitQuote.GetRoute() + + r.logger.Debug("split route selected", zap.Int("route_count", len(routes))) + + finalQuote = topSplitQuote + } + }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
router/usecase/dynamic_splits_test.gois excluded by!**/*_test.gorouter/usecase/optimized_routes_test.gois excluded by!**/*_test.go
📒 Files selected for processing (1)
router/usecase/router_usecase.go
🧰 Additional context used
🧬 Code graph analysis (1)
router/usecase/router_usecase.go (4)
domain/router.go (3)
DisableSplitRoutes(137-137)Quote(64-83)RouterOptions(152-169)ingest/types/candidate_route.go (1)
CandidateRoutes(22-26)router/usecase/route/route.go (1)
RouteImpl(22-32)domain/url.go (1)
GetURLPathFromContext(30-37)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run linter
- GitHub Check: build
- GitHub Check: Summary
🔇 Additional comments (4)
router/usecase/router_usecase.go (4)
293-295: LGTM: Probe fallback appropriately ignored for pricing.The comment clearly explains that
usedProbeFallbackis intentionally ignored here because this function returns all routes for pricing purposes, where approximate values are acceptable. This is correct for the pricing worker context.
361-393: LGTM: Function signature properly updated.The
rankRoutesByDirectQuotefunction signature and all return statements are correctly updated to propagate theusedProbeFallbackflag. The documentation clearly explains when this flag is set and its implications for callers.
395-469: LGTM: Proper propagation of probe fallback flag.All return paths in
computeAndRankRoutesByDirectQuotecorrectly handle theusedProbeFallbackflag. Error paths appropriately returnfalse, and success paths properly propagate the value from the nestedrankRoutesByDirectQuotecall.
502-514: LGTM: Appropriate error handling for custom quotes.The probe fallback handling in
GetCustomDirectQuoteOutGivenInis correct. Custom direct quotes should return an error rather than an inaccurate approximate value. The error message is clear, actionable, and includes the pool ID for debugging.
Add workflow_dispatch trigger to push-dev-docker-images.yml
Sanitize branch name for Docker tag to replace slashes with dashes.
…-for-large-usdc-to-btc-swaps
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
.github/workflows/push-dev-docker-images.yml (1)
46-51: LGTM! Sanitization correctly handles forward slashes in branch names.The conditional sanitization is necessary because Docker image tags cannot contain forward slashes. This implementation correctly handles branch names like "jason/osmo-53-..." by replacing slashes with dashes.
💡 Optional: Consider comprehensive tag sanitization
For additional robustness, you could sanitize all potentially problematic characters in a single pass:
- # Sanitize branch name for Docker tag only if it contains forward slashes - if [[ "${{ github.ref_name }}" == *"/"* ]]; then - SANITIZED_REF_NAME=$(echo "${{ github.ref_name }}" | sed 's/\//-/g') - else - SANITIZED_REF_NAME="${{ github.ref_name }}" - fi + # Sanitize branch name for Docker tag (replace special chars with dashes) + SANITIZED_REF_NAME=$(echo "${{ github.ref_name }}" | sed 's/[^a-zA-Z0-9._-]/-/g' | sed 's/^[.-]*//')This handles forward slashes plus other edge cases (e.g., tags starting with periods/dashes, other special characters), though the current implementation is sufficient for typical branch naming patterns.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/push-dev-docker-images.yml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Run linter
- GitHub Check: build
- GitHub Check: Summary
🔇 Additional comments (1)
.github/workflows/push-dev-docker-images.yml (1)
52-53: Correct usage of sanitized ref name in Docker tags.Both
DOCKER_IMAGE_TAGandDOCKER_IMAGE_TAG_WITH_DATEcorrectly useSANITIZED_REF_NAMEinstead of the rawgithub.ref_name, ensuring valid Docker tags are generated.
| SHORT_SHA=$(echo ${GITHUB_SHA} | cut -c1-8) | ||
| echo "DOCKER_IMAGE_TAG=${{ github.ref_name }}-${SHORT_SHA}" >> $GITHUB_ENV | ||
| echo "DOCKER_IMAGE_TAG_WITH_DATE=${{ github.ref_name }}-${SHORT_SHA}-$(date +%s)" >> $GITHUB_ENV | ||
| # Sanitize branch name for Docker tag only if it contains forward slashes |
There was a problem hiding this comment.
The workflow was failing - it hadn't been run in months so it seemed that something might have changed either on github's end or somewhere else. This fixed it.
OSMO-53: Implement Option 2 - Only force-add canonical orderbook if it has sufficient capacity for the requested swap amount. Changes: - Add orderbookHasSufficientCapacity() helper function - Modify force-add logic to check capacity before re-adding orderbook - Add OSMO-53 debug logging for troubleshooting - Add unit tests for capacity check (9 test cases) This prevents low-liquidity orderbooks from being repeatedly reconsidered for large swaps that exceed their capacity.
Permanent record of the routing issue investigation, proposed fixes, developer feedback, and staging test procedures for the large USDC-to-BTC swap slippage problem (OSMO-53). Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Fixes critical bug where large USDC→BTC swaps (500K+) resulted in 94% slippage due to router selecting low-liquidity orderbook pool.
Linear Issue
OSMO-53: SQS router causes 94% slippage for large USDC to BTC swaps
Problem
dynamic_splits.goSolution (Three-Part Fix)
Part 1: Error Handling in Split Algorithm
dynamic_splits.go- Routes that error now return 0 output instead of being silently ignored.Part 2: Probe Fallback for Route Ranking
optimized_routes.go- When all routes fail at full amount:Part 3: Prevent Inaccurate Quote Returns (CodeRabbit feedback)
router_usecase.go- When probe fallback is used:usedProbeFallbackflag through the call chain"requested amount exceeds available liquidity, try a smaller amount"Solution Flow
flowchart TD A["User requests swap<br/>e.g. 500K USDC→BTC"] --> B["estimateAndRankSingleRouteQuote()<br/><i>optimized_routes.go</i>"] B --> C{"All routes fail<br/>at full amount?"} C -->|No| D["Return quote +<br/>usedProbeFallback = false"] C -->|Yes| E["🔧 FIX #2: Probe Fallback<br/>Try 10% of amount"] E --> F{"Probe succeeds?"} F -->|No| G["Return error:<br/>no viable routes"] F -->|Yes| H["Set usedProbeFallback = true<br/>Return routes for splitting"] D --> I["GetOptimalQuoteOutGivenIn()<br/><i>router_usecase.go</i>"] H --> I I --> J{"Single route OR<br/>splits disabled?"} J -->|Yes| K{"🔧 FIX #3: Check flag<br/>usedProbeFallback?"} J -->|No| L["Run Split Algorithm<br/><i>dynamic_splits.go</i>"] K -->|No| M["Return single route quote<br/>(OutAmount is accurate)"] K -->|Yes| N["Force split algorithm<br/>(OutAmount was approximate)"] N --> L L --> O["🔧 FIX #1: Error Handling<br/>Routes that error → return 0"] O --> P{"Valid split found?"} P -->|Yes| Q["Return accurate split quote"] P -->|No| R{"usedProbeFallback?"} R -->|Yes| S["Return error:<br/>'amount exceeds liquidity,<br/>try smaller amount'"] R -->|No| T["Return single route<br/>as fallback"] style E fill:#4dabf7,color:#000 style K fill:#4dabf7,color:#000 style O fill:#4dabf7,color:#000 style S fill:#fab005,color:#000Flow Comparison
Files Changed
router/usecase/dynamic_splits.go- Handle route errors in DP algorithmrouter/usecase/optimized_routes.go- Add probe fallback +usedProbeFallbackreturnrouter/usecase/router_usecase.go- Propagate flag, skip early returns when probe usedrouter/usecase/export_test.go- Update test exports for new signaturesrouter/usecase/optimized_routes_test.go- Handle new return valuerouter/usecase/dynamic_splits_test.go- Handle new return valueTesting
Summary by CodeRabbit
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.