Feat/cache pricing #1884

Siri-Ray · 2025-12-18T09:33:23Z

Summary by CodeRabbit

New Features
- Added token-based credit billing for individual model calls, including support for cache read/write tokens and model tier tracking.
- Enhanced message caching strategy with improved positioning for better performance optimization.
Refactor
- Transitioned credit billing from batch processing to per-call processing for more granular tracking and real-time billing.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

- Added logic to process credit billing for successful completions and user aborts. - Ensured compliance with coding standards, including optional chaining and nullish coalescing for safer property access. - Removed redundant credit billing logic from previous location to streamline the process.

- Implemented two-level caching strategy: - Global static point: caches system prompt across all sessions (index 0) - Session dynamic points: caches recent conversation history (last 3 messages before user query) - Replaced `instanceof` checks with `_getType()` method to properly handle deserialized messages - Added support for caching AI messages with tool calls

…fly-ai/refly into feat/cache-pricing

- Added SyncTokenCreditUsageJobData interface to capture detailed token usage data. - Implemented syncTokenCreditUsage method in CreditService to handle credit billing for individual model calls. - Updated SkillInvokerService to process credit billing directly upon model call completion, simplifying the billing logic. - Enhanced logging for credit billing processes to improve observability.

coderabbitai · 2025-12-18T09:33:37Z

Walkthrough

This PR introduces per-model-call token credit billing and refactors the credit billing flow. A new SyncTokenCreditUsageJobData interface and syncTokenCreditUsage method enable synchronous billing after each model invocation. The skill invoker transitions from batch billing accumulation to immediate per-call billing. Context caching logic is rewritten to apply cache points at two levels.

Changes

Cohort / File(s)	Change Summary
Credit DTO Enhancement `apps/api/src/modules/credit/credit.dto.ts`	Added new exported interface `SyncTokenCreditUsageJobData` with fields for token usage (input, output, cache read/write), billing context, model identifiers, and tier, expanding imports to include `ModelTier`.
Credit Service Billing `apps/api/src/modules/credit/credit.service.ts`	Implemented new public async method `syncTokenCreditUsage()` for per-call token billing. Validates user, checks early-bird status, computes credit cost, applies allowances, and delegates to `deductCreditsAndCreateUsage()`.
Skill Invoker Billing Refactor `apps/api/src/modules/skill/skill-invoker.service.ts`	Replaced batch token billing with per-call synchronous billing. Renamed `processCreditUsageReport()` to `processCreditBilling()`, changed signature to accept individual token counts and provider details, removed bulk-fetch scaffolding, and integrated direct `CreditService.syncTokenCreditUsage()` calls.
Message Context Caching Rewrite `packages/skill-template/src/scheduler/utils/message.ts`	Rewrote caching strategy from fixed trailing subset to two-level positioning (after SystemMessage at index 0 and after last three messages). Updated cache-point application to use `_getType()` for deserialized messages and handle system, human, and AI message types explicitly.

Sequence Diagram

sequenceDiagram
    participant SkillInvoker
    participant CreditService
    participant Database as DB

    SkillInvoker->>SkillInvoker: Invoke model call
    SkillInvoker->>SkillInvoker: Collect tokens (input, output, cache)
    
    SkillInvoker->>CreditService: syncTokenCreditUsage(data)
    activate CreditService
    
    CreditService->>Database: Validate user & check early-bird
    Database-->>CreditService: User & status
    
    CreditService->>CreditService: Compute creditCost<br/>(unit or fixed)
    CreditService->>CreditService: Apply early-bird<br/>allowance
    
    CreditService->>CreditService: Construct modelUsageDetails
    
    alt Zero Amount
        CreditService->>Database: Record zero-amount usage
    else Positive Cost
        CreditService->>CreditService: Deduct credits &<br/>create usage
        CreditService->>Database: Update user balance
    end
    
    Database-->>CreditService: Success
    CreditService-->>SkillInvoker: true/false
    deactivate CreditService
    
    SkillInvoker->>SkillInvoker: Continue execution

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

High-density logic in CreditService.syncTokenCreditUsage(): Early-bird allowance calculation, cost computation logic, and dual paths (zero-amount vs. deduction) require careful validation.
Significant refactoring in SkillInvoker: Method rename, signature change, removed batch scaffolding, and integration of per-call billing require verification that all call sites are updated and billing logic is preserved.
Message caching rewrite: Two-level strategy with message-type switching and array content handling needs validation against existing behavior.
Possible duplication note: The summary indicates syncTokenCreditUsage may be duplicated in the patch—verify the final state in credit.service.ts.

Possibly related PRs

feat: Support credit billing for cached tokens #1848: Adds cacheReadTokens and cacheWriteTokens fields to credit billing data structures.
feat: Add auto model support and default configuration #1763: Modifies the same skill-invoker and credit billing flow with per-call token billing data structures and model routing logic.
fix: Remove duplicate token credit usage #1835: Removes and rewires batch token billing, replacing it with per-call syncTokenCreditUsage pathway.

Suggested reviewers

mrcfps
nettee
lefarcen

Poem

🐰 Tokens flow swift, one by one,
No batch delays when calls are done.
Cache points positioned just so tight,
Billing per-call shines crystal bright! ✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title 'Feat/cache pricing' is vague and does not clearly convey the primary changes. The PR introduces a new token credit usage system with per-call billing, refactored credit service logic, and context caching strategy changes, but the title only hints at 'cache pricing' without being specific about the actual implementation.	Consider using a more descriptive title such as 'Add per-call token credit billing and refactor credit service' or 'Implement single-model token credit usage billing' to better reflect the main changes in the PR.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch feat/cache-pricing

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

apps/api/src/modules/credit/credit.service.ts (1)

1121-1157: Consider extracting credit cost calculation to a shared helper.

The credit cost calculation logic (lines 1121-1157) is nearly identical to the logic in syncBatchTokenCreditUsage (lines 996-1027). This duplication increases maintenance burden and risk of divergence.

🔎 Suggested refactor:

Extract a private helper method:

private calculateCreditCost(
  inputTokens: number,
  outputTokens: number,
  cacheReadTokens: number,
  cacheWriteTokens: number,
  creditBilling: CreditBilling,
): number {
  if (creditBilling.unit === '5k_tokens') {
    const perInputUnit = creditBilling.inputCost || 0;
    const perOutputUnit = creditBilling.outputCost || 0;
    const perCacheReadUnit = creditBilling.cacheReadCost ?? perInputUnit;
    const perCacheWriteUnit = creditBilling.cacheWriteCost ?? perInputUnit;

    const inputCost = (inputTokens / 5000) * perInputUnit;
    const outputCost = (outputTokens / 5000) * perOutputUnit;
    const cacheReadCost = (cacheReadTokens / 5000) * perCacheReadUnit;
    const cacheWriteCost = (cacheWriteTokens / 5000) * perCacheWriteUnit;

    return Math.ceil(inputCost + outputCost + cacheReadCost + cacheWriteCost);
  } else if (creditBilling.unit === '1m_tokens') {
    // Similar calculation with 1000000 divisor
    // ...
  } else {
    return Math.max(creditBilling.inputCost || 0, creditBilling.outputCost || 0);
  }
}

Then both syncTokenCreditUsage and syncBatchTokenCreditUsage can call this helper.

apps/api/src/modules/skill/skill-invoker.service.ts (1)

1525-1542: Clear method signature and documentation.

The refactored method has a clear purpose. However, the parameter type for actualProviderItem is an inline object type which could be extracted for better readability.

🔎 Consider extracting the inline type:

+type ProviderItemBillingInfo = {
+  name?: string;
+  creditBilling?: string;
+  tier?: string;
+  provider?: { name?: string };
+};

 private async processCreditBilling(
   user: User,
   resultId: string,
   version: number,
   inputTokens: number,
   outputTokens: number,
   cacheReadTokens: number,
   cacheWriteTokens: number,
-  actualProviderItem:
-    | { name?: string; creditBilling?: string; tier?: string; provider?: { name?: string } }
-    | null
-    | undefined,
+  actualProviderItem: ProviderItemBillingInfo | null | undefined,
   requestProviderItem?: ProviderItem,
 ): Promise<void> {

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1bba87d and c14ec8a.

📒 Files selected for processing (4)

apps/api/src/modules/credit/credit.dto.ts (1 hunks)
apps/api/src/modules/credit/credit.service.ts (2 hunks)
apps/api/src/modules/skill/skill-invoker.service.ts (5 hunks)
packages/skill-template/src/scheduler/utils/message.ts (3 hunks)

🧰 Additional context used

📓 Path-based instructions (15)

**/*.{js,ts,jsx,tsx}

📄 CodeRabbit inference engine (.cursorrules)

**/*.{js,ts,jsx,tsx}: Always use optional chaining (?.) when accessing object properties
Always use nullish coalescing (??) or default values for potentially undefined values
Always check array existence before using array methods
Always validate object properties before destructuring
Always use single quotes for string literals in JavaScript/TypeScript code

**/*.{js,ts,jsx,tsx}: Use semicolons at the end of statements
Include spaces around operators (e.g., a + b instead of a+b)
Always use curly braces for control statements
Place opening braces on the same line as their statement

**/*.{js,ts,jsx,tsx}: Group import statements in order: React/framework libraries, third-party libraries, internal modules, relative path imports, type imports, style imports
Sort imports alphabetically within each import group
Leave a blank line between import groups
Extract complex logic into custom hooks
Use functional updates for state (e.g., setCount(prev => prev + 1))
Split complex state into multiple state variables rather than single large objects
Use useReducer for complex state logic instead of multiple useState calls

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{js,ts,tsx,jsx,py,java,cpp,c,cs,rb,go,rs,php,swift,kt,scala,r,m,mm,sql}

📄 CodeRabbit inference engine (.cursor/rules/00-language-priority.mdc)

**/*.{js,ts,tsx,jsx,py,java,cpp,c,cs,rb,go,rs,php,swift,kt,scala,r,m,mm,sql}: All code comments MUST be written in English
All variable names, function names, class names, and other identifiers MUST use English words
Comments should be concise and explain 'why' rather than 'what'
Use proper grammar and punctuation in comments
Keep comments up-to-date when code changes
Document complex logic, edge cases, and important implementation details
Use clear, descriptive names that indicate purpose
Avoid abbreviations unless they are universally understood

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{js,ts,tsx,jsx}

📄 CodeRabbit inference engine (.cursor/rules/00-language-priority.mdc)

Use JSDoc style comments for functions and classes in JavaScript/TypeScript

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{js,jsx,ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/01-code-style.mdc)

**/*.{js,jsx,ts,tsx}: Use single quotes for string literals in TypeScript/JavaScript
Always use optional chaining (?.) when accessing object properties in TypeScript/JavaScript
Always use nullish coalescing (??) or default values for potentially undefined values in TypeScript/JavaScript
Always check array existence before using array methods in TypeScript/JavaScript
Validate object properties before destructuring in TypeScript/JavaScript
Use ES6+ features like arrow functions, destructuring, and spread operators in TypeScript/JavaScript
Avoid magic numbers and strings - use named constants in TypeScript/JavaScript
Use async/await instead of raw promises for asynchronous code in TypeScript/JavaScript

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/03-typescript-guidelines.mdc)

**/*.{ts,tsx}: Avoid using any type whenever possible - use unknown type instead with proper type guards
Always define explicit return types for functions, especially for public APIs
Prefer extending existing types over creating entirely new types
Use TypeScript utility types (Partial<T>, Pick<T, K>, Omit<T, K>, Readonly<T>, Record<K, T>) to derive new types
Use union types and intersection types to combine existing types
Always import types explicitly using the import type syntax
Group type imports separately from value imports
Minimize creating local type aliases for imported types

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{js,ts,jsx,tsx,css,json}

📄 CodeRabbit inference engine (.cursor/rules/04-code-formatting.mdc)

Maximum line length of 100 characters

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{js,ts,jsx,tsx,css,json,yml,yaml}

📄 CodeRabbit inference engine (.cursor/rules/04-code-formatting.mdc)

Use 2 spaces for indentation, no tabs

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{js,ts,jsx,tsx,css,json,yml,yaml,md}

📄 CodeRabbit inference engine (.cursor/rules/04-code-formatting.mdc)

No trailing whitespace at the end of lines

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{css,scss,sass,less,js,jsx,ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/09-design-system.mdc)

**/*.{css,scss,sass,less,js,jsx,ts,tsx}: Primary color (#155EEF) should be used for main brand color in buttons, links, and accents
Error color (#F04438) should be used for error states and destructive actions
Success color (#12B76A) should be used for success states and confirmations
Warning color (#F79009) should be used for warnings and important notifications
Info color (#0BA5EC) should be used for informational elements

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{tsx,ts}

📄 CodeRabbit inference engine (.cursor/rules/09-i18n-guidelines.mdc)

**/*.{tsx,ts}: Use the translation wrapper component and useTranslation hook in components
Ensure all user-facing text is translatable

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{tsx,ts,json}

📄 CodeRabbit inference engine (.cursor/rules/09-i18n-guidelines.mdc)

Support dynamic content with placeholders in translations

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{tsx,ts,jsx,js,vue,css,scss,less}

📄 CodeRabbit inference engine (.cursor/rules/11-ui-design-patterns.mdc)

**/*.{tsx,ts,jsx,js,vue,css,scss,less}: Use the primary blue (#155EEF) for main UI elements, CTAs, and active states
Use red (#F04438) only for errors, warnings, and destructive actions
Use green (#12B76A) for success states and confirmations
Use orange (#F79009) for warning states and important notifications
Use blue (#0BA5EC) for informational elements
Primary buttons should be solid with the primary color
Secondary buttons should have a border with transparent or light background
Danger buttons should use the error color
Use consistent padding, border radius, and hover states for all buttons
Follow fixed button sizes based on their importance and context
Use consistent border radius (rounded-lg) for all cards
Apply light shadows (shadow-sm) for card elevation
Maintain consistent padding inside cards (p-4 or p-6)
Use subtle borders for card separation
Ensure proper spacing between card elements
Apply consistent styling to all form inputs
Use clear visual indicators for focus, hover, and error states in form elements
Apply proper spacing between elements using 8px, 16px, 24px increments
Ensure proper alignment of elements (left, center, or right)
Use responsive layouts that work across different device sizes
Maintain a minimum contrast ratio of 4.5:1 for text

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{tsx,ts,jsx,js,vue}

📄 CodeRabbit inference engine (.cursor/rules/11-ui-design-patterns.mdc)

**/*.{tsx,ts,jsx,js,vue}: Include appropriate loading states for async actions in buttons
Group related form elements with appropriate spacing
Provide clear validation feedback for forms
Ensure proper labeling and accessibility for form elements
Ensure all interactive elements are keyboard accessible
Include appropriate ARIA attributes for complex components
Provide alternative text for images and icons
Support screen readers with semantic HTML elements

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/08-contributing-guidelines.mdc)

**/*.{ts,tsx,js,jsx}: Follow the TypeScript/JavaScript style guidelines
Ensure code is well-tested and documented

Files:

apps/api/src/modules/credit/credit.dto.ts
packages/skill-template/src/scheduler/utils/message.ts
apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

apps/api/src/**/*.{controller,service}.ts

📄 CodeRabbit inference engine (.cursor/rules/06-api-structure.mdc)

Implement proper error handling in API modules

Files:

apps/api/src/modules/skill/skill-invoker.service.ts
apps/api/src/modules/credit/credit.service.ts

🧬 Code graph analysis (2)

apps/api/src/modules/credit/credit.dto.ts (1)

packages/openapi-schema/src/types.gen.ts (2)

CreditBilling (6095-6124)

ModelTier (1532-1532)

apps/api/src/modules/credit/credit.service.ts (2)

apps/api/src/modules/credit/credit.dto.ts (2)

SyncTokenCreditUsageJobData (12-29)

ModelUsageDetail (61-71)

packages/utils/src/id.ts (1)

genCreditUsageId (285-288)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: build / Build

🔇 Additional comments (13)

apps/api/src/modules/credit/credit.dto.ts (2)

1-1: LGTM!

The import statement correctly adds ModelTier type using the import type syntax pattern established in the file.

8-29: Well-structured interface with clear documentation.

The SyncTokenCreditUsageJobData interface is well-designed for per-call billing. The JSDoc comments clearly explain the purpose and usage context.

One minor inconsistency to note: cacheReadTokens and cacheWriteTokens are required here but optional in ModelUsageDetail (lines 68-69). This is acceptable since this interface is specifically for billing where cache tokens should always be tracked (even if 0), but worth keeping consistent if these interfaces are meant to be interchangeable.

apps/api/src/modules/credit/credit.service.ts (4)

17-17: LGTM!

Import correctly added for the new DTO type.

1092-1096: Good JSDoc documentation for the new method.

The comment clearly describes the purpose and trigger point for this billing method.

1184-1198: LGTM on the zero-cost early return path.

The early return correctly creates a usage record with amount: 0 and dueAmount tracking the original cost, and returns false to indicate no recharge is required.

1200-1215: Credit deduction path looks correct.

The method properly delegates to deductCreditsAndCreateUsage and returns the requireRecharge flag to allow the caller to handle insufficient credits.
apps/api/src/modules/skill/skill-invoker.service.ts (6)
55-55: LGTM!

Import correctly updated to use the new SyncTokenCreditUsageJobData type.

1067-1084: Per-call billing integration looks correct.

The billing is triggered at the right point (on_chat_model_end) with proper gating via shouldBillCredits. The condition correctly allows billing for successful completions and user aborts (since partial usage should be charged).

1548-1574: Routing logic handles Auto model correctly.

The code properly checks for routed data and fetches the original item's billing configuration when the request came through an Auto model. Good fallback behavior with warning log on failure.

1576-1584: Early return on missing credit billing is appropriate.

The warning log helps with debugging when billing config is missing.

1586-1600: Credit usage data construction is correct.

All required fields from SyncTokenCreditUsageJobData are properly populated. Note that tier from the DTO interface is not being set here - verify if this is intentional.

The SyncTokenCreditUsageJobData interface includes an optional tier?: ModelTier field, but it's not being set in the creditUsageData object. If tier-based billing or reporting is needed, this should be added:
   const creditUsageData: SyncTokenCreditUsageJobData = {
     uid: user.uid,
     resultId,
     version,
     inputTokens,
     outputTokens,
     cacheReadTokens,
     cacheWriteTokens,
     creditBilling,
     modelName: billingModelName || 'unknown',
     actualModelName: actualProviderItem?.name,
     modelProvider: actualProviderItem?.provider?.name,
+    tier: actualProviderItem?.tier as ModelTier | undefined,
     timestamp: new Date(),
   };
1602-1610: Credit billing processing with proper error handling.

The method correctly throws ModelUsageQuotaExceeded when credits are insufficient, which will be caught by the caller and handled appropriately. The info log provides useful debugging information.
packages/skill-template/src/scheduler/utils/message.ts (1)

198-216: LGTM! AI message caching implementation is correct.

The new AI message caching logic properly:

Converts content to text format for caching

Applies the cachePoint marker after the content

Preserves essential properties (tool_calls, additional_kwargs)

Follows the same pattern as system and human message handling

This ensures AI messages in the conversation history can be cached while maintaining their full context.

coderabbitai · 2025-12-18T09:36:52Z

packages/skill-template/src/scheduler/utils/message.ts

+ * Two-level caching strategy:
+ * 1. Global Static Point: After System Prompt (index 0)
+ *    - Shared across all users and sessions
+ *    - Contains: System Prompt + Tool Definitions + Examples
+ *    - Benefits: Write once, reuse globally
+ * 2. Session Dynamic Point: After the second-to-last message (messages.length - 2)
+ *    - Caches the conversation history for the current user
+ *    - Benefits: Reuses multi-turn conversation context within a session
+ *
+ * LangChain AWS uses cachePoint markers:
 * - Format: { cachePoint: { type: 'default' } }
- * - Only cache messages before the final message (not the last user query)
+ * - Place the cachePoint marker AFTER the content to cache
 */


⚠️ Potential issue | 🟡 Minor

Documentation inconsistency with implementation.

Line 122 states the Session Dynamic Point is "After the second-to-last message (messages.length - 2)", but the actual implementation (lines 138-141) caches THREE positions: length-2, length-3, and length-4.

🔎 Apply this diff to correct the documentation:

-* 2. Session Dynamic Point: After the second-to-last message (messages.length - 2) +* 2. Session Dynamic Point: After the last 3 messages except the final user message (indices length-2, length-3, length-4)

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

* Two-level caching strategy:

* 1. Global Static Point: After System Prompt (index 0)

* - Shared across all users and sessions

* - Contains: System Prompt + Tool Definitions + Examples

* - Benefits: Write once, reuse globally

* 2. Session Dynamic Point: After the second-to-last message (messages.length - 2)

* - Caches the conversation history for the current user

* - Benefits: Reuses multi-turn conversation context within a session

*

* LangChain AWS uses cachePoint markers:

* - Format: { cachePoint: { type: 'default' } }

* - Only cache messages before the final message (not the last user query)

* - Place the cachePoint marker AFTER the content to cache

*/

* Two-level caching strategy:

* 1. Global Static Point: After System Prompt (index 0)

* - Shared across all users and sessions

* - Contains: System Prompt + Tool Definitions + Examples

* - Benefits: Write once, reuse globally

* 2. Session Dynamic Point: After the last 3 messages except the final user message (indices length-2, length-3, length-4)

* - Caches the conversation history for the current user

* - Benefits: Reuses multi-turn conversation context within a session

*

* LangChain AWS uses cachePoint markers:

* - Format: { cachePoint: { type: 'default' } }

* - Place the cachePoint marker AFTER the content to cache

*/

🤖 Prompt for AI Agents

In packages/skill-template/src/scheduler/utils/message.ts around lines 117 to 129, the documentation incorrectly states the Session Dynamic Point caches only the second-to-last message (messages.length - 2) while the implementation caches three positions; update the docblock to describe that the Session Dynamic Point places cache markers after the second-to-last, third-to-last, and fourth-to-last messages (messages.length - 2, messages.length - 3, messages.length - 4) and keep the note that cachePoint markers must be placed AFTER the content to cache so the docs match the code.

coderabbitai · 2025-12-18T09:36:53Z

packages/skill-template/src/scheduler/utils/message.ts

+    // Determine if this message should have a cache point
+    // 1. Global Static Point: After System Prompt (index 0)
+    // 2. Session Dynamic Point: After the last 3 messages except the last user message (index -2, -3, -4)
+    const isGlobalStaticPoint = index === 0;
+    const isSessionDynamicPoint =
+      index === messages.length - 2 ||
+      index === messages.length - 3 ||
+      index === messages.length - 4;
+    const shouldAddCachePoint = isGlobalStaticPoint || isSessionDynamicPoint;
+
+    if (!shouldAddCachePoint) {
+      return message;
+    }


⚠️ Potential issue | 🟠 Major

Potential duplicate cache points for short message arrays.

When the message array has 2-4 messages, the system message at index 0 could receive duplicate cache point markers because it matches both isGlobalStaticPoint and one of the isSessionDynamicPoint conditions:

messages.length = 2: index 0 matches both global (index 0) and session (length-2 = 0)

messages.length = 3: index 0 matches both global (index 0) and session (length-3 = 0)

messages.length = 4: index 0 matches both global (index 0) and session (length-4 = 0)

While this won't crash, it applies caching transformation twice to the same message, which could cause unexpected behavior depending on how the cache provider handles duplicate markers.

🔎 Apply this diff to prevent duplicate cache points:

const applyContextCaching = (messages: BaseMessage[]): BaseMessage[] => { if (messages.length <= 1) return messages; return messages.map((message, index) => { // Determine if this message should have a cache point // 1. Global Static Point: After System Prompt (index 0) // 2. Session Dynamic Point: After the last 3 messages except the last user message (index -2, -3, -4) const isGlobalStaticPoint = index === 0; const isSessionDynamicPoint = - index === messages.length - 2 || - index === messages.length - 3 || - index === messages.length - 4; + index > 0 && ( + index === messages.length - 2 || + index === messages.length - 3 || + index === messages.length - 4 + ); const shouldAddCachePoint = isGlobalStaticPoint || isSessionDynamicPoint;

🤖 Prompt for AI Agents

In packages/skill-template/src/scheduler/utils/message.ts around lines 134 to 146, the logic that marks cache points can double-mark index 0 for short messages (length 2–4) because isSessionDynamicPoint includes indices that can equal 0; change the condition so session dynamic points exclude index 0 (e.g., require index > 0) or compute session indices first and dedupe before combining with the global static point, then use the combined unique result to decide shouldAddCachePoint so index 0 cannot be added twice.

coderabbitai · 2025-12-18T09:36:53Z

packages/skill-template/src/scheduler/utils/message.ts

+    // Use _getType() instead of instanceof to handle deserialized messages
+    const messageType = message._getType();


⚠️ Potential issue | 🟡 Minor

Use .getType() method instead of _getType() for better API stability.

Current LangChain documentation recommends using .getType() instead, and official code examples use message.getType(). While _getType() may work, typeguard functions like isAIMessage() are the preferred alternative, especially for proper type checking.

🤖 Prompt for AI Agents

In packages/skill-template/src/scheduler/utils/message.ts around lines 149 to 150, the code uses the internal method _getType() which is unstable; replace the call to message._getType() with the public API message.getType() (or, where appropriate, use the provided typeguard helpers like isAIMessage()/isHumanMessage() for type checks) so the code relies on the supported, stable LangChain API and proper typeguards for deserialized messages.

Siri-Ray and others added 4 commits December 16, 2025 19:22

Merge branch 'fix/prompt-caching-breakpoint' of https://github.com/re…

1b38bf3

…fly-ai/refly into feat/cache-pricing

coderabbitai bot reviewed Dec 18, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feat/cache pricing #1884

Feat/cache pricing #1884

Siri-Ray commented Dec 18, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Dec 18, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot Dec 18, 2025

Uh oh!

coderabbitai bot Dec 18, 2025

Uh oh!

coderabbitai bot Dec 18, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		// Use _getType() instead of instanceof to handle deserialized messages
		const messageType = message._getType();

Feat/cache pricing #1884

Are you sure you want to change the base?

Feat/cache pricing #1884

Conversation

Siri-Ray commented Dec 18, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

Pre-merge checks and finishing touches

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot Dec 18, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Siri-Ray commented Dec 18, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Dec 18, 2025 •

edited

Loading