fix: Handle rate-limiting exceptions consistently in LLM provider service #3370

JanCizmar · 2026-01-09T10:27:13Z

Summary by CodeRabbit

Release Notes

New Features
- Enhanced AI provider selection with configurable priority filtering
- Provider suspension mechanism with automatic recovery after rate-limit periods
- Test/fake response support for AI providers
Improvements
- Improved rate-limit handling with explicit retry timing information

_{✏️ Tip: You can customize this high-level summary in your review settings.}

coderabbitai · 2026-01-09T10:27:29Z

📝 Walkthrough

Walkthrough

The pull request restructures LlmProviderService with new public methods for provider selection, rate-limit handling, and fake response generation. Concurrently, LlmRateLimitedException's constructor is updated to make the retryAt parameter non-nullable, removing optional handling for retry timing information.

Changes

Cohort / File(s)	Summary
LlmProviderService Core Reorganization `ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt`	Added public methods: `getAll(organizationId)`, `getAllServerProviders()`, `getProviderByName(organizationId, name, priority)` for provider resolution; `getProviderResponse(...)` and `getProviderService(...)` for response generation; `suspendProvider(name, providerId)` for suspension management; `getFakedResponse(config)` for fake provider support. Introduced `repeatWhileProvidersRateLimited(...)` and `repeatWithTimeouts(...)` for centralized retry logic. Added `RATE_LIMIT_SUSPEND_PERIOD_MS` constant and `ProviderInfo` data class. Reorganized provider selection flow with per-name logic, rate-limit checks, and round-robin rotation among candidates.
LlmRateLimitedException Constructor Update `backend/data/src/main/kotlin/io/tolgee/exceptions/LlmRateLimitedException.kt`	Changed `retryAt` parameter from `Long? = null` to non-nullable `Long`, making retry timing information mandatory in exception construction.

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant LlmProviderService
    participant ProviderSelector as Provider<br/>Selection
    participant SuspensionCheck as Suspension<br/>Check
    participant RetryHandler as Retry<br/>Handler
    participant ApiService as AbstractLlmApiService
    participant RestTemplate

    Client->>LlmProviderService: getProviderByName(org, name, priority)
    LlmProviderService->>ProviderSelector: Compute candidates (custom + server providers)
    ProviderSelector-->>LlmProviderService: Candidate list
    LlmProviderService->>SuspensionCheck: Check suspension status
    SuspensionCheck-->>LlmProviderService: Provider (non-suspended)
    Client->>LlmProviderService: getProviderResponse(service, params, config, template)
    LlmProviderService->>RetryHandler: repeatWhileProvidersRateLimited(...)
    loop Until not rate-limited or max attempts
        RetryHandler->>ApiService: Call provider API
        ApiService->>RestTemplate: Execute request
        RestTemplate-->>ApiService: Response or rate-limit error
        ApiService-->>RetryHandler: PromptResult or LlmRateLimitedException
        alt Rate-limited
            RetryHandler->>SuspensionCheck: suspendProvider(name, providerId)
            RetryHandler->>ProviderSelector: Select next candidate
        else Success
            RetryHandler-->>LlmProviderService: PromptResult
        end
    end
    LlmProviderService-->>Client: PromptResult

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Hops through providers with care,
Rate limits caught, suspended with flair,
Retries and fakes dance in the air,
Round-robin selection beyond compare,
LLM logic now handled with finesse!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Title check	⚠️ Warning	The title claims to 'handle rate-limiting exceptions consistently' but the changes show much broader refactoring including provider selection reorganization, public API expansion, fake provider support, and internal state management—only partially about exception handling.	Revise the title to better reflect the scope, such as 'refactor: Reorganize LLM provider selection and add rate-limit retry handling' to accurately capture the primary changes.
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (1 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing touches

📝 Generate docstrings

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt:
- Around line 102-117: The code can proceed with an empty providers list due to
a TOCTOU race and cause division-by-zero/IndexOutOfBounds; move the cache read
and the empty-check into the synchronized(this) block (or re-check
providers.isEmpty() inside synchronized) so the provider list and providerInfo
are read atomically, and if providers is still empty throw a proper exception
(e.g., LlmRateLimitedException or a new descriptive exception) instead of
performing modulus/index access; ensure lastUsed/lastUsedMap logic (lastUsed,
lastUsedIndex, newIndex, providers.get(...), lastUsedMap.set(...)) only runs
when providers.isNotEmpty().

🧹 Nitpick comments (2)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (2)
57-57: ConcurrentHashMap usage with synchronized access.

While ConcurrentHashMap provides thread-safe operations, all accesses to lastUsedMap (lines 111-116) are wrapped in synchronized(this) blocks, making the concurrent collection redundant. A regular mutableMapOf() would suffice given the synchronization, though this doesn't cause issues.

229-233: Good synchronization for cache modification.

The synchronized block ensures atomicity of the read-modify-write operation on the cache, preventing concurrent modifications from being lost. However, synchronizing on this creates a broad lock scope that could lead to contention when multiple threads operate on different provider names simultaneously.

Consider using a more fine-grained locking strategy (e.g., per-provider-name locks) if performance becomes an issue:
private val providerLocks = ConcurrentHashMap<String, Any>()

fun suspendProvider(name: String, providerId: Long, period: Long) {
  val lock = providerLocks.computeIfAbsent(name) { Any() }
  synchronized(lock) {
    val providerInfo = cache.get(name, ProviderInfo::class.java) ?: ProviderInfo()
    providerInfo.suspendMap.set(providerId, currentDateProvider.date.time + period)
    cache.set(name, providerInfo)
  }
}

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 53b818d and 5d80367.

📒 Files selected for processing (2)

backend/data/src/main/kotlin/io/tolgee/batch/MtProviderCatching.kt
ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

🧰 Additional context used

📓 Path-based instructions (2)

**/*.kt

📄 CodeRabbit inference engine (AGENTS.md)

**/*.kt: After modifying JPA entities, run ./gradlew diffChangeLog to generate Liquibase changelog entries (add --no-daemon flag if docker command is not found)
Run specific test suites instead of the bare test task; use :data:test, :server-app:runStandardTests, :server-app:runContextRecreatingTests, :server-app:runWebsocketTests, :server-app:runWithoutEeTests, :ee-test:test, or :security:test with the --tests flag for individual tests
Always run ./gradlew ktlintFormat before commits in backend code

Files:

backend/data/src/main/kotlin/io/tolgee/batch/MtProviderCatching.kt
ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

**/*.{ts,tsx,js,jsx,kt,kts}

⚙️ CodeRabbit configuration file

As part of review, please check if the file follows 'The Stepdown Rule': The most important, high-level concepts should be at the top (like a newspaper headline and opening paragraph). Details should increase as you read downward. Functions should be ordered so that a caller appears before the functions it calls. When working with JavaScript components, we allow the main component to live at the bottom of the file as an exception to the rule.

Files:

backend/data/src/main/kotlin/io/tolgee/batch/MtProviderCatching.kt
ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Build frontend 🏗️
GitHub Check: Frontend static check 🪲
GitHub Check: Build backend 🏗️

🔇 Additional comments (3)

backend/data/src/main/kotlin/io/tolgee/batch/MtProviderCatching.kt (1)

47-47: LGTM! Reasonable default retry delay.

The increase from 100ms to 5000ms (5 seconds) provides a more appropriate default delay for rate-limited scenarios, preventing rapid retry storms and aligning better with typical rate-limiting windows.

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (2)

91-100: Good addition of synchronization for provider filtering.

Wrapping the cache read and provider filtering in a synchronized block ensures atomicity and prevents race conditions where provider suspension state could change mid-operation.

146-150: No changes needed. The LlmRateLimitedException constructor correctly supports the retryAt, params, and cause parameters. The code is using the exception as designed, and retryAt is an existing parameter in the class definition (defined with a default value of null), not a new addition.

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In @ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt:
- Around line 90-117: The synchronized block around provider filtering is
insufficient because lastUsedMap.compute(...) runs outside it, causing a race
where a provider can be selected after being suspended; move the round-robin
update and selection (the logic using lastUsedMap.compute and the final return
of providers.first { it.id == providerId }) into the same synchronized section
that computes `providers` so filtering and selection are atomic; also stop
synchronizing on `name.intern()` and instead use a dedicated lock-per-name
(e.g., a ConcurrentHashMap<String, Any> called `locks` and synchronize on
`locks.getOrPut(name)`), ensuring `suspendProvider()` and other code paths use
the same lock map so suspensions and selections cannot race.

🧹 Nitpick comments (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (1)

124-186: Consider reordering methods to follow the Stepdown Rule.

According to the coding guidelines, "a caller appears before the functions it calls." Since callProvider (line 169) calls repeatWhileProvidersRateLimited (line 124), consider moving callProvider above repeatWhileProvidersRateLimited for better readability.

As per coding guidelines for Kotlin files, the Stepdown Rule should be followed.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d80367 and fa22b41.

📒 Files selected for processing (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

🧰 Additional context used

📓 Path-based instructions (2)

**/*.kt

📄 CodeRabbit inference engine (AGENTS.md)

**/*.kt: After modifying JPA entities, run ./gradlew diffChangeLog to generate Liquibase changelog entries (add --no-daemon flag if docker command is not found)
Run specific test suites instead of the bare test task; use :data:test, :server-app:runStandardTests, :server-app:runContextRecreatingTests, :server-app:runWebsocketTests, :server-app:runWithoutEeTests, :ee-test:test, or :security:test with the --tests flag for individual tests
Always run ./gradlew ktlintFormat before commits in backend code

Files:

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

**/*.{ts,tsx,js,jsx,kt,kts}

⚙️ CodeRabbit configuration file

As part of review, please check if the file follows 'The Stepdown Rule': The most important, high-level concepts should be at the top (like a newspaper headline and opening paragraph). Details should increase as you read downward. Functions should be ordered so that a caller appears before the functions it calls. When working with JavaScript components, we allow the main component to live at the bottom of the file as an exception to the rule.

Files:

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Build backend 🏗️
GitHub Check: Frontend static check 🪲
GitHub Check: Build frontend 🏗️

🔇 Additional comments (3)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (3)

57-57: Good use of ConcurrentHashMap for thread safety.

The change to ConcurrentHashMap ensures thread-safe access to lastUsedMap. However, note that since the round-robin selection logic (lines 111-117) is not inside the synchronized block, you might want to consider moving it inside to ensure atomicity between provider filtering and selection (see comment on lines 90-117).

146-150: Excellent improvement to include retry timing.

Adding the retryAt timestamp to LlmRateLimitedException provides much better feedback to callers about when they can retry. This is consistent with the exception thrown at line 103 when all providers are suspended.

229-233: Good synchronization of cache updates.

Moving the cache update into a synchronized block ensures thread safety when suspending providers. This correctly prevents race conditions between suspension and provider selection. However, as noted in the earlier comment, consider replacing String.intern() with dedicated lock objects.

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (2)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (2)
105-105: Minor simplification available.

The suspend time retrieval can be simplified:
♻️ Suggested simplification
-val closestUnsuspend = providerInfo.suspendMap.map { (_, time) -> time }.min()
+val closestUnsuspend = providerInfo.suspendMap.values.min()
317-321: Consider thread-safe map for suspendMap.

The suspendMap inside ProviderInfo uses a non-thread-safe mutableMapOf(). While current access is protected by synchronized blocks, using a ConcurrentHashMap would provide defense-in-depth and make the code more robust against future modifications that might access it without synchronization.
♻️ Suggested improvement
 companion object {
   data class ProviderInfo(
-    var suspendMap: MutableMap<Long, Long> = mutableMapOf(),
+    var suspendMap: MutableMap<Long, Long> = ConcurrentHashMap(),
   )
 }

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa22b41 and 664c401.

📒 Files selected for processing (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

🧰 Additional context used

📓 Path-based instructions (2)

**/*.kt

📄 CodeRabbit inference engine (AGENTS.md)

**/*.kt: After modifying JPA entities, run ./gradlew diffChangeLog to generate Liquibase changelog entries (add --no-daemon flag if docker command is not found)
Run specific test suites instead of the bare test task; use :data:test, :server-app:runStandardTests, :server-app:runContextRecreatingTests, :server-app:runWebsocketTests, :server-app:runWithoutEeTests, :ee-test:test, or :security:test with the --tests flag for individual tests
Always run ./gradlew ktlintFormat before commits in backend code

Files:

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

**/*.{ts,tsx,js,jsx,kt,kts}

⚙️ CodeRabbit configuration file

As part of review, please check if the file follows 'The Stepdown Rule': The most important, high-level concepts should be at the top (like a newspaper headline and opening paragraph). Details should increase as you read downward. Functions should be ordered so that a caller appears before the functions it calls. When working with JavaScript components, we allow the main component to live at the bottom of the file as an exception to the rule.

Files:

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (24)

GitHub Check: E2E testing ‍🔎 (15, 9)
GitHub Check: E2E testing ‍🔎 (15, 10)
GitHub Check: E2E testing ‍🔎 (15, 12)
GitHub Check: E2E testing ‍🔎 (15, 8)
GitHub Check: E2E testing ‍🔎 (15, 11)
GitHub Check: E2E testing ‍🔎 (15, 14)
GitHub Check: E2E testing ‍🔎 (15, 13)
GitHub Check: E2E testing ‍🔎 (15, 4)
GitHub Check: E2E testing ‍🔎 (15, 1)
GitHub Check: E2E testing ‍🔎 (15, 7)
GitHub Check: E2E testing ‍🔎 (15, 2)
GitHub Check: BT ‍🔎 (server-app:runStandardTests)
GitHub Check: E2E testing ‍🔎 (15, 3)
GitHub Check: E2E testing ‍🔎 (15, 6)
GitHub Check: BT ‍🔎 (ktlint:test)
GitHub Check: E2E testing ‍🔎 (15, 5)
GitHub Check: E2E testing ‍🔎 (15, 0)
GitHub Check: BT ‍🔎 (data:test)
GitHub Check: BT ‍🔎 (server-app:runWithoutEeTests)
GitHub Check: BT ‍🔎 (ee-test:test)
GitHub Check: BT ‍🔎 (server-app:runContextRecreatingTests)
GitHub Check: BT ‍🔎 (security:test)
GitHub Check: BT ‍🔎 (server-app:runWebsocketTests)
GitHub Check: Frontend static check 🪲

🔇 Additional comments (3)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (3)

58-58: LGTM!

Good use of ConcurrentHashMap for thread-safe storage of last-used provider tracking.

114-121: LGTM!

Good use of ConcurrentHashMap.compute() for atomic round-robin advancement. The approach correctly captures an immutable snapshot of providers before updating the index.

150-154: LGTM!

The exception correctly includes a retryAt timestamp consistent with the suspension period, and properly preserves the original error for debugging.

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (2)
168-230: Approve with optional refactoring suggestion.

The provider selection logic is correct: custom providers override server providers by name, priority filtering works as expected, and the round-robin selection via lastUsedMap.compute is atomic and safe. The race condition between the synchronized block exit and the compute operation is acceptable because providers is a local snapshot.

Optional: Replace String.intern() synchronization.

Using name.intern() as a lock object (lines 200, 258) is functional but discouraged as it pollutes the string pool and can cause unexpected contention. Consider using a dedicated lock map instead:
♻️ Suggested refactoring

Add a lock map at the class level:
private val lockMap: ConcurrentHashMap<String, Any> = ConcurrentHashMap()
Then replace synchronization blocks:
// Instead of synchronized(name.intern())
synchronized(lockMap.computeIfAbsent(name) { Any() }) {
  // ... existing logic
}
This provides the same per-name locking without string pool pollution.
60-322: Consider reordering functions per coding guidelines.

The file doesn't fully follow the Stepdown Rule specified in the coding guidelines. Functions should be ordered so callers appear before callees. For example:

getProviderByName (line 168) is called by repeatWhileProvidersRateLimited (line 79)

suspendProvider (line 253) is called by repeatWhileProvidersRateLimited (line 79)

getProviderService (line 243) is called by callProvider (line 60)

Reordering would improve readability by allowing readers to follow the call flow top-to-bottom.

As per coding guidelines, Kotlin files should follow the Stepdown Rule.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 664c401 and f950d86.

📒 Files selected for processing (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

🧰 Additional context used

📓 Path-based instructions (2)

**/*.kt

📄 CodeRabbit inference engine (AGENTS.md)

**/*.kt: After modifying JPA entities, run ./gradlew diffChangeLog to generate Liquibase changelog entries (add --no-daemon flag if docker command is not found)
Run specific test suites instead of the bare test task; use :data:test, :server-app:runStandardTests, :server-app:runContextRecreatingTests, :server-app:runWebsocketTests, :server-app:runWithoutEeTests, :ee-test:test, or :security:test with the --tests flag for individual tests
Always run ./gradlew ktlintFormat before commits in backend code

Files:

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

**/*.{ts,tsx,js,jsx,kt,kts}

⚙️ CodeRabbit configuration file

As part of review, please check if the file follows 'The Stepdown Rule': The most important, high-level concepts should be at the top (like a newspaper headline and opening paragraph). Details should increase as you read downward. Functions should be ordered so that a caller appears before the functions it calls. When working with JavaScript components, we allow the main component to live at the bottom of the file as an exception to the rule.

Files:

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

🧬 Code graph analysis (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (1)

backend/data/src/main/kotlin/io/tolgee/repository/LlmProviderRepository.kt (2)

getAll (9-25)

getAll (12-19)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Frontend static check 🪲
GitHub Check: Build frontend 🏗️
GitHub Check: Build backend 🏗️

🔇 Additional comments (8)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (8)

39-39: LGTM - Good choice for thread-safe map operations.

The ConcurrentHashMap import is appropriate for the thread-safe lastUsedMap implementation and supports the atomic compute operation used for round-robin provider selection.

58-58: LGTM - Thread-safe implementation.

The switch to ConcurrentHashMap ensures thread-safe access to the round-robin tracking map, with atomic compute operations preventing race conditions during provider selection.

60-77: LGTM - Clean orchestration of provider calls.

The new callProvider function effectively coordinates rate-limit handling, timeout retries, and response processing. The fallback logic for attempts (config → service default) and price calculation ensures consistent behavior.

79-106: LGTM - Effective rate-limit handling with improved UX.

The retry logic properly suspends rate-limited providers and surfaces the retryAt timestamp to clients, enabling informed retry behavior. The 60-second suspension period aligns with the exception's retry window.

108-122: LGTM - Solid timeout retry implementation.

The function correctly handles progressive timeout attempts by creating a fresh RestTemplate with the appropriate timeout for each iteration, properly catching timeout-related exceptions.

124-166: LGTM - Well-structured helper functions.

The extracted helpers improve readability and maintainability. Notably, the calculatePrice function correctly excludes cached tokens from input cost (line 160), which optimizes pricing for prompt caching scenarios.

236-241: LGTM - Clever use of negative IDs.

The negative indexing scheme (-(index + 1)) effectively distinguishes server-configured providers from custom providers without requiring separate type flags or additional fields.

253-263: LGTM - Consistent synchronization pattern.

The suspension logic correctly uses the same synchronization pattern as getProviderByName, ensuring thread-safe cache updates. The mutable suspendMap is safe because all access occurs within the synchronized block.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (1)

111-125: Guard edge cases: empty attempts => NPE; cachedTokens > inputTokens => negative price

Both can produce surprising behavior (crash or negative credits).

Proposed fix

 private fun <T> repeatWithTimeouts(
   attempts: List<Int>,
   callback: (restTemplate: RestTemplate) -> T,
 ): T {
+  if (attempts.isEmpty()) {
+    throw InvalidStateException() // (or BadRequestException) - attempts must contain at least one timeout
+  }
   var lastError: Exception? = null
   for (timeout in attempts) {
     val restTemplate = restTemplateBuilder.readTimeout(Duration.ofSeconds(timeout.toLong())).build()
     try {
       return callback(restTemplate)
     } catch (e: ResourceAccessException) {
       lastError = e
     }
   }
   throw FailedDependencyException(Message.LLM_PROVIDER_ERROR, listOf(lastError!!.message), lastError)
 }

 private fun calculatePrice(
   providerConfig: LlmProviderDto,
   usage: PromptResult.Usage?,
 ): Int {
   val tokenPriceInCreditsInput: Double = (providerConfig.tokenPriceInCreditsInput ?: 0.0)
   val tokenPriceInCreditsOutput: Double = (providerConfig.tokenPriceInCreditsOutput ?: 0.0)
   val inputTokens: Long = usage?.inputTokens ?: 0L
   val outputTokens: Long = usage?.outputTokens ?: 0L
   val cachedTokens: Long = usage?.cachedTokens ?: 0L
 
-  val inputPrice = (inputTokens - cachedTokens) * tokenPriceInCreditsInput
+  val billableInputTokens = (inputTokens - cachedTokens).coerceAtLeast(0L)
+  val inputPrice = billableInputTokens * tokenPriceInCreditsInput
   val outputPrice = (outputTokens) * tokenPriceInCreditsOutput
 
   val price = inputPrice + outputPrice
 
   return (price * 100).roundToInt()
 }

Also applies to: 153-169

🤖 Fix all issues with AI agents

In @ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt:
- Around line 171-237: ProviderInfo.suspendMap may include suspend times for
provider IDs that are not in the current candidate set (providersWithPriority),
so computing closestUnsuspend from the whole map can return an incorrect retry
time; fix by restricting the min calculation to suspendMap entries whose keys
are in the current providers list (e.g., get the candidateIds =
providersWithPriority.map { it.id } and compute closestUnsuspend =
providerInfo.suspendMap.filterKeys { it in candidateIds
}.values.min()/minOrNull()), and handle the null case (fall back to throwing
BadRequestException) before throwing LlmRateLimitedException(closestUnsuspend).

🧹 Nitpick comments (2)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (2)

58-61: lockMap can grow unbounded (high-cardinality provider names)

If name is user-controlled / high-cardinality, lockMap becomes a memory leak. Consider a bounded/striped lock strategy (or include org in the lock key if that matches the intended isolation).

139-151: Fake response JSON is OK for tests, but consider returning structured fields
If anything downstream parses PromptResult.response as JSON, string interpolation with config.name could create invalid JSON for unusual names.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f950d86 and ba5a609.

📒 Files selected for processing (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

🧰 Additional context used

📓 Path-based instructions (2)

**/*.kt

📄 CodeRabbit inference engine (AGENTS.md)

**/*.kt: After modifying JPA entities, run ./gradlew diffChangeLog to generate Liquibase changelog entries (add --no-daemon flag if docker command is not found)
Run specific test suites instead of the bare test task; use :data:test, :server-app:runStandardTests, :server-app:runContextRecreatingTests, :server-app:runWebsocketTests, :server-app:runWithoutEeTests, :ee-test:test, or :security:test with the --tests flag for individual tests
Always run ./gradlew ktlintFormat before commits in backend code

Files:

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

**/*.{ts,tsx,js,jsx,kt,kts}

⚙️ CodeRabbit configuration file

As part of review, please check if the file follows 'The Stepdown Rule': The most important, high-level concepts should be at the top (like a newspaper headline and opening paragraph). Details should increase as you read downward. Functions should be ordered so that a caller appears before the functions it calls. When working with JavaScript components, we allow the main component to live at the bottom of the file as an exception to the rule.

Files:

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

🧬 Code graph analysis (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (1)

backend/data/src/main/kotlin/io/tolgee/repository/LlmProviderRepository.kt (2)

getAll (9-25)

getAll (12-19)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)

GitHub Check: Build backend 🏗️
GitHub Check: Build frontend 🏗️
GitHub Check: Frontend static check 🪲

🔇 Additional comments (4)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (4)

63-80: callProvider(...) orchestration looks clean (selection → retries → enrichment)

Nice consolidation of selection + timeout retries + price enrichment in one place.

127-137: Fake-provider switch is nicely centralized

260-270: Good: suspension write is done under the same per-name lock as reads

82-109: Confirm whether hard-coded "60s suspend + 3 provider attempts" matches product expectations

The review comment accurately identifies hardcoded values:

3 retry attempts (line 92: repeat(3))

60-second suspension period (line 97: 60 * 1000)

No delay/backoff between retry attempts

retryAt in thrown exception uses currentDateProvider.date.time + 60 * 1000 (line 105)

However, when getProviderByName() detects all providers are suspended, it throws with the actual closestUnsuspend time from the suspension map (line 221), not the hardcoded 60s. The hardcoded 60s applies only when all 3 attempts exhaust without finding an available provider.

The attempts configuration in LlmProperties allows per-provider timeout specifications but does not control the "3 retry" count or the 60-second suspension period—both remain hardcoded.

No test coverage or product documentation was found to determine if these values align with product expectations. This requires confirmation from the product team or design documentation.

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

…vice - Made `retryAt` in `LlmRateLimitedException` non-nullable with default suspension logic. - Centralized rate limit suspension period using a constant. - Improved exception handling and logging for rate-limited LLM providers.

- Reordered methods logically for better structure and consistency. - Resolved redundant definitions by consolidating duplicated method declarations. - Ensured private constants are properly maintained in the companion object.

coderabbitai

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (1)
57-57: Critical: Race condition on shared mutable state.

The lastUsedMap is accessed without synchronization in getProviderByName (lines 232, 236), which creates a data race when multiple threads call this method concurrently. This violates thread-safety guarantees and can lead to incorrect round-robin provider selection.
🔒 Suggested fix using ConcurrentHashMap
-  private var lastUsedMap: MutableMap<String, Long> = mutableMapOf()
+  private val lastUsedMap: ConcurrentHashMap<String, Long> = ConcurrentHashMap()
Also add the import:
import java.util.concurrent.ConcurrentHashMap
Note: While ConcurrentHashMap makes individual operations thread-safe, the compound read-modify-write operation in getProviderByName (lines 232-236) may still need synchronization if strict round-robin ordering is required under concurrent access. Consider using compute or computeIfPresent for atomic updates.

🤖 Fix all issues with AI agents

In @ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt:
- Around line 173-187: The function repeatWithTimeouts can receive an empty
attempts list, leaving lastError null and causing an NPE when throwing
FailedDependencyException; add a guard at the start of repeatWithTimeouts to
validate attempts is not empty (or set a clear default) and if it is empty throw
a FailedDependencyException or IllegalArgumentException with a descriptive
message, or when throwing FailedDependencyException ensure you supply a non-null
cause/message by checking lastError and providing a fallback error message;
update references in repeatWithTimeouts to use this validation so lastError is
never dereferenced when attempts is empty.

🧹 Nitpick comments (5)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (5)
150-150: Extract magic number to a named constant.

The hardcoded 3 should be extracted to a named constant to improve maintainability and make the retry limit configurable or at least self-documenting.
♻️ Proposed refactor

Add to the companion object:
   companion object {
     private const val RATE_LIMIT_SUSPEND_PERIOD_MS = 60000L
+    private const val MAX_RATE_LIMIT_RETRIES = 3
Then update line 150:
-    repeat(3) {
+    repeat(MAX_RATE_LIMIT_RETRIES) {
232-237: Use idiomatic Kotlin indexing operator.

Line 235 uses .get(newIndex) which is less idiomatic in Kotlin. The [] operator is preferred for list/array access.
♻️ Proposed refactor
-    val provider = providers.get(newIndex)
+    val provider = providers[newIndex]
227-230: Consider adding more context to the exception.

When all providers are suspended, the exception could include the provider name and list of suspended provider IDs to help with debugging.
♻️ Proposed enhancement
     if (providers.isEmpty() && providerInfo?.suspendMap?.isNotEmpty() == true) {
       val closestUnsuspend = providerInfo.suspendMap.map { (_, time) -> time }.min()
-      throw LlmRateLimitedException(closestUnsuspend)
+      throw LlmRateLimitedException(
+        retryAt = closestUnsuspend,
+        params = listOf("All providers for '$name' are currently suspended")
+      )
     }
197-211: Consider simplifying provider selection logic.

The current logic fetches all providers, checks existence, then filters. This could be more direct by filtering immediately, which would be clearer and slightly more efficient.
♻️ Proposed refactor for clarity
-    val providersOfTheName =
-      if (customProviders.find { it.name == name } != null) {
-        customProviders
-      } else {
-        serverProviders
-      }.filter {
-        it.name == name
-      }
+    val customProvidersOfName = customProviders.filter { it.name == name }
+    val providersOfTheName = customProvidersOfName.ifEmpty {
+      serverProviders.filter { it.name == name }
+    }
 
-    val providersWithPriority =
-      if (providersOfTheName.find { it.priority == priority } != null) {
-        providersOfTheName.filter { it.priority == priority }
-      } else {
-        providersOfTheName
-      }
+    val providersWithMatchingPriority = providersOfTheName.filter { it.priority == priority }
+    val providersWithPriority = providersWithMatchingPriority.ifEmpty {
+      providersOfTheName
+    }
59-310: Consider reorganizing methods to follow the Stepdown Rule.

Per the coding guidelines, functions should be ordered so that a caller appears before the functions it calls, with high-level concepts at the top. Currently, helper methods like repeatWhileProvidersRateLimited, repeatWithTimeouts, and getProviderByName appear after CRUD methods but are called by the high-level callProvider method.

Consider this organization:

High-level public API: callProvider (already at top ✓)

CRUD operations: createProvider, updateProvider, deleteProvider, getAll

Core provider operations called by callProvider: repeatWhileProvidersRateLimited, repeatWithTimeouts, getProviderByName

Lower-level helpers: getProviderResponse, getProviderService, suspendProvider, getFakedResponse, calculatePrice

As per coding guidelines.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba5a609 and 0ee4b48.

📒 Files selected for processing (2)

backend/data/src/main/kotlin/io/tolgee/exceptions/LlmRateLimitedException.kt
ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

🧰 Additional context used

📓 Path-based instructions (2)

**/*.kt

📄 CodeRabbit inference engine (AGENTS.md)

**/*.kt: After modifying JPA entities, run ./gradlew diffChangeLog to generate Liquibase changelog entries (add --no-daemon flag if docker command is not found)
Run specific test suites instead of the bare test task; use :data:test, :server-app:runStandardTests, :server-app:runContextRecreatingTests, :server-app:runWebsocketTests, :server-app:runWithoutEeTests, :ee-test:test, or :security:test with the --tests flag for individual tests
Always run ./gradlew ktlintFormat before commits in backend code

Files:

backend/data/src/main/kotlin/io/tolgee/exceptions/LlmRateLimitedException.kt
ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

**/*.{ts,tsx,js,jsx,kt,kts}

⚙️ CodeRabbit configuration file

As part of review, please check if the file follows 'The Stepdown Rule': The most important, high-level concepts should be at the top (like a newspaper headline and opening paragraph). Details should increase as you read downward. Functions should be ordered so that a caller appears before the functions it calls. When working with JavaScript components, we allow the main component to live at the bottom of the file as an exception to the rule.

Files:

backend/data/src/main/kotlin/io/tolgee/exceptions/LlmRateLimitedException.kt
ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

🧬 Code graph analysis (1)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (1)

backend/data/src/main/kotlin/io/tolgee/repository/LlmProviderRepository.kt (2)

getAll (9-25)

getAll (12-19)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Frontend static check 🪲
GitHub Check: Build backend 🏗️

🔇 Additional comments (6)

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt (5)

130-132: LGTM!

The delegation to the repository and DTO mapping is straightforward and correct.

240-250: LGTM!

The routing logic for fake vs. real provider responses is clear and appropriate for testing scenarios.

252-260: LGTM!

The provider type mapping is exhaustive and correct.

280-287: Cache update has potential race condition (likely acceptable).

The get-modify-set pattern on the cache (lines 284-286) is not atomic. If two threads suspend different providers with the same name simultaneously, one update could be lost. However, this is likely acceptable given that:

Suspension timing doesn't need to be perfectly precise

The impact is minimal (slightly delayed unsuspension)

The scenario is relatively rare

If stricter consistency is needed, consider using cache locking or atomic operations.

289-301: LGTM!

The fake response generation is appropriate for testing purposes, with realistic token usage values.

backend/data/src/main/kotlin/io/tolgee/exceptions/LlmRateLimitedException.kt (1)

6-14: All call sites have been properly updated for the breaking API change.

The retryAt parameter is now required and non-nullable. Verification of the codebase confirms that both instantiations of LlmRateLimitedException in LlmProviderService.kt (lines 166 and 229) provide the required retryAt value. No orphaned call sites exist.

coderabbitai · 2026-01-09T15:33:12Z

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt

+  fun <T> repeatWithTimeouts(
+    attempts: List<Int>,
+    callback: (restTemplate: RestTemplate) -> T,
+  ): T {
+    var lastError: Exception? = null
+    for (timeout in attempts) {
+      val restTemplate = restTemplateBuilder.readTimeout(Duration.ofSeconds(timeout.toLong())).build()
+      try {
+        return callback(restTemplate)
+      } catch (e: ResourceAccessException) {
+        lastError = e
+      }
+    }
+    throw FailedDependencyException(Message.LLM_PROVIDER_ERROR, listOf(lastError!!.message), lastError)
+  }


⚠️ Potential issue | 🟡 Minor

Handle edge case of empty attempts list.

If attempts is empty, the loop doesn't execute and lastError remains null, causing a NullPointerException on line 186. While this may be an unlikely scenario, defensive programming would handle it gracefully.

🛡️ Proposed fix

fun <T> repeatWithTimeouts( attempts: List<Int>, callback: (restTemplate: RestTemplate) -> T, ): T { + if (attempts.isEmpty()) { + throw BadRequestException(Message.LLM_PROVIDER_ERROR, listOf("No timeout attempts configured")) + } var lastError: Exception? = null for (timeout in attempts) { val restTemplate = restTemplateBuilder.readTimeout(Duration.ofSeconds(timeout.toLong())).build() try { return callback(restTemplate) } catch (e: ResourceAccessException) { lastError = e } } - throw FailedDependencyException(Message.LLM_PROVIDER_ERROR, listOf(lastError!!.message), lastError) + throw FailedDependencyException(Message.LLM_PROVIDER_ERROR, listOf(lastError?.message), lastError) }

🤖 Prompt for AI Agents

In @ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt around lines 173 - 187, The function repeatWithTimeouts can receive an empty attempts list, leaving lastError null and causing an NPE when throwing FailedDependencyException; add a guard at the start of repeatWithTimeouts to validate attempts is not empty (or set a clear default) and if it is empty throw a FailedDependencyException or IllegalArgumentException with a descriptive message, or when throwing FailedDependencyException ensure you supply a non-null cause/message by checking lastError and providing a fallback error message; update references in repeatWithTimeouts to use this validation so lastError is never dereferenced when attempts is empty.

JanCizmar requested a review from bdshadow January 9, 2026 10:27

coderabbitai bot reviewed Jan 9, 2026

View reviewed changes

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt Outdated Show resolved Hide resolved

bdshadow requested changes Jan 9, 2026

View reviewed changes

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt Outdated Show resolved Hide resolved

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt Outdated Show resolved Hide resolved

coderabbitai bot reviewed Jan 9, 2026

View reviewed changes

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt Outdated Show resolved Hide resolved

coderabbitai bot reviewed Jan 9, 2026

View reviewed changes

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt Outdated Show resolved Hide resolved

coderabbitai bot reviewed Jan 9, 2026

View reviewed changes

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt Outdated Show resolved Hide resolved

ee/backend/app/src/main/kotlin/io/tolgee/ee/service/LlmProviderService.kt Outdated Show resolved Hide resolved

JanCizmar requested a review from bdshadow January 9, 2026 14:26

JanCizmar marked this pull request as draft January 9, 2026 14:41

JanCizmar force-pushed the jancizmar/llm-retry-improvements branch from ba5a609 to a519cd0 Compare January 9, 2026 15:24

JanCizmar changed the title ~~fix: synchronize LLM provider selection and retry logic~~ fix: Handle rate-limiting exceptions consistently in LLM provider service Jan 9, 2026

JanCizmar marked this pull request as ready for review January 9, 2026 15:28

coderabbitai bot reviewed Jan 9, 2026

View reviewed changes

JanCizmar force-pushed the main branch from da9d355 to 599ad9c Compare January 9, 2026 16:35

Uh oh!

fix: Handle rate-limiting exceptions consistently in LLM provider service #3370

Are you sure you want to change the base?

fix: Handle rate-limiting exceptions consistently in LLM provider service #3370

Conversation

JanCizmar commented Jan 9, 2026 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai bot commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Jan 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

JanCizmar commented Jan 9, 2026 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jan 9, 2026 •

edited

Loading