
docs: improve prompt caching notebook with clearer examples and metrics #289

Merged
PedramNavid merged 1 commit into main from pdrm/docs/prompt-caching-improvements on Dec 17, 2025

@PedramNavid PedramNavid commented Nov 15, 2025

Collaborator

closes #288

  • Separated non-cached baseline from first/second cached calls to show cache creation vs. hit

Type of Change

  • New cookbook
  • Bug fix (fixes an issue in existing cookbook)
  • Documentation update
  • Code quality improvement (refactoring, optimization)
  • Dependency update
  • Other (please describe):

Testing

  • I have tested this cookbook/change locally
  • All cells execute without errors

Note: Pull requests that do not follow these guidelines or lack sufficient detail may be closed. This helps us maintain high-quality, useful cookbooks for the community. If you're unsure about any requirements, please open an issue to discuss before submitting a PR.

closes #288

Enhanced the notebook to better demonstrate prompt caching benefits:
- Separated non-cached baseline from first/second cached calls to show cache creation vs. hit
- Added timestamp to prevent unintended cache reuse across runs
- Clarified that first cached call creates cache, second call benefits from it
- Added explicit cache read/write token metrics to output
- Included performance comparison summary showing ~2x speedup
- Updated explanations to emphasize that prompt caching requires two calls
- Reformatted output display for better readability
- Added visual separator for performance comparison section
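
To illustrate why the timestamp prefix prevents cache reuse across runs, here is a minimal sketch that models a prompt cache as a set of hashes of the exact prefix text. This is a toy model for intuition only, not a description of how the API implements caching; `ToyPrefixCache`, `lookup_or_store`, and the placeholder book text are hypothetical names introduced for this example.

```python
import hashlib


class ToyPrefixCache:
    """Toy stand-in for a prompt cache keyed on the exact prefix text (assumption)."""

    def __init__(self):
        self._seen = set()

    def lookup_or_store(self, prefix: str) -> str:
        # Any change to the prefix, even a few leading characters, changes the key.
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self._seen:
            return "hit"
        self._seen.add(key)
        return "miss (cache created)"


book_content = "It is a truth universally acknowledged..."  # placeholder text
cache = ToyPrefixCache()

# Two notebook runs started at different times get different TIMESTAMP values.
run1_prefix = str(1700000000) + "<book>" + book_content + "</book>"
run2_prefix = str(1700000600) + "<book>" + book_content + "</book>"

print(cache.lookup_or_store(run1_prefix))  # first cached call in run 1
print(cache.lookup_or_store(run1_prefix))  # second cached call in run 1
print(cache.lookup_or_store(run2_prefix))  # first cached call in a later run
```

Within one run the prefix is identical, so the second call can reuse the entry; a later run starts from a different prefix and has to create a new one.
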
@github-actions


Notebook Changes

This PR modifies the following notebooks:

📓 misc/prompt_caching.ipynb

View diff
nbdiff misc/prompt_caching.ipynb (b29ec6f109c0379fa2eb620611a3d504e28fba09) misc/prompt_caching.ipynb (7a7034462ccd08cb33e5c62ee482705f102ca4ad)
--- misc/prompt_caching.ipynb (b29ec6f109c0379fa2eb620611a3d504e28fba09)  (no timestamp)
+++ misc/prompt_caching.ipynb (7a7034462ccd08cb33e5c62ee482705f102ca4ad)  (no timestamp)
## modified /cells/3/source:
@@ -4,4 +4,5 @@ import requests
 from bs4 import BeautifulSoup
 
 client = anthropic.Anthropic()
-MODEL_NAME = "claude-sonnet-4-5"
+MODEL_NAME = "claude-sonnet-4-5"
+TIMESTAMP = int(time.time())

## inserted before /cells/7:
+  markdown cell:
+    source:
+      ### Part 1: Non-cached API Call (Baseline)
+      
+      First, let's make a truly non-cached API call **without** the `cache_control` parameter. This will establish our baseline performance.
+      
+      We'll ask for a short output to keep response generation time low, since prompt caching only affects input processing time.
+  code cell:
+    source:
+      def make_non_cached_api_call():
+          """Make an API call WITHOUT cache_control - no caching enabled."""
+          messages = [
+              {
+                  "role": "user",
+                  "content": [
+                      {
+                          "type": "text",
+                          "text": str(TIMESTAMP) + "<book>" + book_content + "</book>",
+                          # Note: No cache_control parameter here - this is truly non-cached
+                      },
+                      {"type": "text", "text": "What is the title of this book? Only output the title."},
+                  ],
+              }
+          ]
+      
+          start_time = time.time()
+          response = client.messages.create(
+              model=MODEL_NAME,
+              max_tokens=300,
+              messages=messages,
+          )
+          end_time = time.time()
+      
+          return response, end_time - start_time
+      
+      
+      non_cached_response, non_cached_time = make_non_cached_api_call()
+      
+      print(f"Non-cached API call time: {non_cached_time:.2f} seconds")
+      print(f"Input tokens: {non_cached_response.usage.input_tokens}")
+      print(f"Output tokens: {non_cached_response.usage.output_tokens}")
+      
+      print("\nResponse:")
+      print(non_cached_response.content[0].text)
+    outputs:
+      output 0:
+        output_type: stream
+        name: stdout
+        text:
+          Non-cached API call time: 6.86 seconds
+          Input tokens: 187363
+          Output tokens: 8
+          
+          Response:
+          Pride and Prejudice
+  markdown cell:
+    source:
+      ### Part 2: First Cached API Call (Cache Creation)
+      
+      Now let's enable prompt caching by adding `cache_control: {"type": "ephemeral"}` to the book content. 
+      
+      **Important:** The first call with `cache_control` will **create** the cache entry. This initial call will have similar timing to the non-cached call because it still needs to process all tokens. However, it will store them in the cache for future use.
+      
+      Look for the `cache_creation_input_tokens` field in the usage stats to see how many tokens were cached.

## deleted /cells/7:
-  markdown cell:
-    source:
-      ### Part 1: Non-cached API Call
-      
-      First, let's make a non-cached API call. This will load the prompt into the cache so that our subsequent cached API calls can benefit from the prompt caching.
-      
-      We will ask for a short output string to keep the output response time low since the benefit of prompt caching applies only to the input processing time.

## inserted before /cells/8/outputs/0:
+  output:
+    output_type: stream
+    name: stdout
+    text:
+      First cached API call time: 5.96 seconds
+      Input tokens: 16
+      Output tokens: 8
+      Cache creation tokens: 187347
+      
+      Response:
+      Pride and Prejudice
+      
+      Note: This first call creates the cache but doesn't benefit from it yet - timing is similar to non-cached call.

## deleted /cells/8/outputs/0:
-  output:
-    output_type: stream
-    name: stdout
-    text:
-      Non-cached API call time: 20.37 seconds
-      Non-cached API call input tokens: 17
-      Non-cached API call output tokens: 8
-      
-      Summary (non-cached):
-      [TextBlock(text='Pride and Prejudice', type='text')]

## modified /cells/8/source:
@@ -1,12 +1,13 @@
-def make_non_cached_api_call():
+def make_cached_api_call_create():
+    """First call WITH cache_control - creates the cache entry."""
     messages = [
         {
             "role": "user",
             "content": [
                 {
                     "type": "text",
-                    "text": "<book>" + book_content + "</book>",
-                    "cache_control": {"type": "ephemeral"},
+                    "text": str(TIMESTAMP) + "<book>" + book_content + "</book>",
+                    "cache_control": {"type": "ephemeral"},  # This enables caching
                 },
                 {"type": "text", "text": "What is the title of this book? Only output the title."},
             ],
@@ -24,11 +25,18 @@ def make_non_cached_api_call():
     return response, end_time - start_time
 
 
-non_cached_response, non_cached_time = make_non_cached_api_call()
+cached_create_response, cached_create_time = make_cached_api_call_create()
 
-print(f"Non-cached API call time: {non_cached_time:.2f} seconds")
-print(f"Non-cached API call input tokens: {non_cached_response.usage.input_tokens}")
-print(f"Non-cached API call output tokens: {non_cached_response.usage.output_tokens}")
+print(f"First cached API call time: {cached_create_time:.2f} seconds")
+print(f"Input tokens: {cached_create_response.usage.input_tokens}")
+print(f"Output tokens: {cached_create_response.usage.output_tokens}")
+print(
+    f"Cache creation tokens: {getattr(cached_create_response.usage, 'cache_creation_input_tokens', 0)}"
+)
 
-print("\nSummary (non-cached):")
-print(non_cached_response.content)
+print("\nResponse:")
+print(cached_create_response.content[0].text)
+
+print(
+    "\nNote: This first call creates the cache but doesn't benefit from it yet - timing is similar to non-cached call."
+)

## inserted before /cells/9:
+  markdown cell:
+    source:
+      ### Part 3: Second Cached API Call (Cache Hit)
+      
+      Now let's make another API call with the same `cache_control` parameter. Since the cache was created in Part 2, this call will **read from the cache** instead of processing all tokens again.
+      
+      This is where you see the real performance benefit! Look for the `cache_read_input_tokens` field in the usage stats.

## deleted /cells/9:
-  markdown cell:
-    source:
-      ### Part 2: Cached API Call
-      
-      Now, let's make a cached API call. I'll add in the "cache_control": {"type": "ephemeral"} attribute to the content object and add the "prompt-caching-2024-07-31" beta header to the request. This will enable prompt caching for this API call.
-      
-      To keep the output latency constant, we will ask Claude the same question as before. Note that this question is not part of the cached content.

## inserted before /cells/10/outputs/0:
+  output:
+    output_type: stream
+    name: stdout
+    text:
+      Second cached API call time: 3.66 seconds
+      Input tokens: 16
+      Output tokens: 8
+      Cache read tokens: 187347
+      
+      Response:
+      Pride and Prejudice
+      
+      ======================================================================
+      PERFORMANCE COMPARISON
+      ======================================================================
+      Non-cached call:       6.86s
+      First cached call:     5.96s (creates cache)
+      Second cached call:    3.66s (reads from cache)
+      
+      Speedup from caching:  1.9x faster!
+      ======================================================================

## deleted /cells/10/outputs/0:
-  output:
-    output_type: stream
-    name: stdout
-    text:
-      Cached API call time: 2.92 seconds
-      Cached API call input tokens: 17
-      Cached API call output tokens: 8
-      
-      Summary (cached):
-      [TextBlock(text='Pride and Prejudice', type='text')]

## modified /cells/10/source:
@@ -1,12 +1,13 @@
-def make_cached_api_call():
+def make_cached_api_call_hit():
+    """Second call WITH cache_control - reads from existing cache."""
     messages = [
         {
             "role": "user",
             "content": [
                 {
                     "type": "text",
-                    "text": "<book>" + book_content + "</book>",
-                    "cache_control": {"type": "ephemeral"},
+                    "text": str(TIMESTAMP) + "<book>" + book_content + "</book>",
+                    "cache_control": {"type": "ephemeral"},  # Same cache_control as before
                 },
                 {"type": "text", "text": "What is the title of this book? Only output the title."},
             ],
@@ -24,11 +25,21 @@ def make_cached_api_call():
     return response, end_time - start_time
 
 
-cached_response, cached_time = make_cached_api_call()
+cached_hit_response, cached_hit_time = make_cached_api_call_hit()
 
-print(f"Cached API call time: {cached_time:.2f} seconds")
-print(f"Cached API call input tokens: {cached_response.usage.input_tokens}")
-print(f"Cached API call output tokens: {cached_response.usage.output_tokens}")
+print(f"Second cached API call time: {cached_hit_time:.2f} seconds")
+print(f"Input tokens: {cached_hit_response.usage.input_tokens}")
+print(f"Output tokens: {cached_hit_response.usage.output_tokens}")
+print(f"Cache read tokens: {getattr(cached_hit_response.usage, 'cache_read_input_tokens', 0)}")
 
-print("\nSummary (cached):")
-print(cached_response.content)
+print("\nResponse:")
+print(cached_hit_response.content[0].text)
+
+print("\n" + "=" * 70)
+print("PERFORMANCE COMPARISON")
+print("=" * 70)
+print(f"Non-cached call:       {non_cached_time:.2f}s")
+print(f"First cached call:     {cached_create_time:.2f}s (creates cache)")
+print(f"Second cached call:    {cached_hit_time:.2f}s (reads from cache)")
+print(f"\nSpeedup from caching:  {non_cached_time / cached_hit_time:.1f}x faster!")
+print("=" * 70)

## inserted before /cells/11:
+  markdown cell:
+    source:
+      ## Summary of Example 1
+      
+      This example demonstrated three distinct scenarios:
+      
+      1. **Non-cached call** - Without `cache_control`, Claude processes all ~187k tokens normally
+      2. **First cached call** - With `cache_control`, Claude processes all tokens AND stores them in cache (similar timing to non-cached)
+      3. **Second cached call** - With `cache_control`, Claude reads from the existing cache (2-10x faster!)
+      
+      The key insight: **Prompt caching requires two calls to show benefits**
+      - The first call with `cache_control` creates the cache entry
+      - Subsequent calls with the same `cache_control` read from the cache for dramatic speedups
+      
+      This is especially valuable for:
+      - Large documents or codebases that remain constant across multiple queries
+      - System prompts with detailed instructions
+      - Multi-turn conversations (as shown in Example 2 below)

## deleted /cells/11:
-  markdown cell:
-    source:
-      As you can see, the cached API call only took 3.64 seconds total compared to 21.44 seconds for the non-cached API call. This is a significant improvement in overall latency due to caching.

## inserted before /cells/13/outputs/0:
+  output:
+    output_type: stream
+    name: stdout
+    text:
+      
+      Turn 1:
+      User: What is the title of this novel?
+      Assistant: The title of this novel is **Pride and Prejudice** by Jane Austen.
+      User input tokens: 3
+      Output tokens: 22
+      Input tokens (cache read): 0
+      Input tokens (cache write): 187360
+      0.0% of input prompt cached (3 tokens)
+      Time taken: 5.79 seconds
+      
+      Turn 2:
+      User: Who are Mr. and Mrs. Bennet?
+      Assistant: Mr. and Mrs. Bennet are the parents of five daughters: Jane, Elizabeth, Mary, Kitty, and Lydia. 
+      
+      **Mr. Bennet** is described as an intelligent, sarcastic man with "quick parts, sarcastic humour, reserve, and caprice." He tends to be detached and ironic, often amusing himself at his wife's expense, and prefers to spend time in his library rather than deal with family matters.
+      
+      **Mrs. Bennet** is described as "a woman of mean understanding, little information, and uncertain temper." She is nervous, excitable, and foolish, with her main goal in life being to get her daughters married. She lacks the intelligence and social graces of her husband and is often oblivious to her own impropriety.
+      
+      Their contrasting personalities create much of the domestic tension and comedy in the novel. Mr. Bennet married Mrs. Bennet when he was "captivated by youth and beauty," but her weak understanding soon ended any real affection he had for her, leaving him to cope with his disappointment through ironic detachment.
+      User input tokens: 3
+      Output tokens: 247
+      Input tokens (cache read): 187360
+      Input tokens (cache write): 36
+      100.0% of input prompt cached (187363 tokens)
+      Time taken: 8.29 seconds
+      
+      Turn 3:
+      User: What is Netherfield Park?
+      Assistant: **Netherfield Park** is a large estate in Hertfordshire that is rented by Mr. Bingley at the beginning of the novel. 
+      
+      It is located about three miles from Longbourn, the Bennet family's home, making it conveniently close for social visits. The estate becomes the center of much excitement and speculation when Mr. Bingley, a wealthy young bachelor, takes up residence there.
+      
+      Key points about Netherfield:
+      
+      - It's described as a good house with pleasant grounds
+      - Mr. Bingley rents it (rather than owning it), as he has not yet purchased an estate of his own
+      - His sisters, Caroline Bingley and Mrs. Hurst, live with him there, along with Mrs. Hurst's husband
+      - Mr. Darcy, Bingley's close friend, is a frequent visitor
+      - The famous ball where Elizabeth and Darcy have their first significant interactions takes place at Netherfield
+      - Jane Bennet stays there when she falls ill after riding over in the rain, which allows Elizabeth to visit and spend time at the estate
+      
+      Netherfield Park is important to the plot as it brings the wealthy Mr. Bingley (and Mr. Darcy) into the neighborhood, setting the main romantic storylines in motion.
+      User input tokens: 3
+      Output tokens: 293
+      Input tokens (cache read): 187396
+      Input tokens (cache write): 258
+      100.0% of input prompt cached (187399 tokens)
+      Time taken: 10.14 seconds
+      
+      Turn 4:
+      User: What is the main theme of this novel?
+      Assistant: The main themes of **Pride and Prejudice** include:
+      
+      **1. Pride and Prejudice (as the title suggests)**
+      - The novel explores how pride and prejudice create misunderstandings and obstacles to happiness
+      - **Darcy's pride** in his social status initially leads him to insult Elizabeth and look down on her family
+      - **Elizabeth's prejudice** against Darcy (based on first impressions and Wickham's lies) blinds her to his true character
+      - Both must overcome these flaws to find happiness together
+      
+      **2. Class and Social Status**
+      - The rigid class distinctions of Regency England and their impact on relationships and marriage prospects
+      - The tension between wealth, birth, and merit as measures of a person's worth
+      - Lady Catherine's objections to Elizabeth based on her "inferior" connections
+      
+      **3. Marriage and Economics**
+      - The novel examines different motivations for marriage: love (Jane and Bingley), practicality (Charlotte and Mr. Collins), lust and recklessness (Lydia and Wickham), and the ideal combination of respect, affection, and compatibility (Elizabeth and Darcy)
+      - The economic pressures on women to marry well, especially with the entailment of the Bennet estate
+      
+      **4. Self-Knowledge and Personal Growth**
+      - Elizabeth's journey from prejudice to understanding
+      User input tokens: 3
+      Output tokens: 300
+      Input tokens (cache read): 187654
+      Input tokens (cache write): 305
+      100.0% of input prompt cached (187657 tokens)
+      Time taken: 9.53 seconds

## deleted /cells/13/outputs/0:
-  output:
-    output_type: stream
-    name: stdout
-    text:
-      
-      Turn 1:
-      User: What is the title of this novel?
-      Assistant: The title of this novel is "Pride and Prejudice" by Jane Austen.
-      User input tokens: 4
-      Output tokens: 22
-      Input tokens (cache read): 0
-      Input tokens (cache write): 187354
-      0.0% of input prompt cached (4 tokens)
-      Time taken: 20.37 seconds
-      
-      Turn 2:
-      User: Who are Mr. and Mrs. Bennet?
-      Assistant: Mr. and Mrs. Bennet are the parents of five daughters (Jane, Elizabeth, Mary, Kitty, and Lydia) in Pride and Prejudice. 
-      
-      Mr. Bennet is an intelligent but detached father who often retreats to his library to avoid his wife's dramatics. He has a satirical wit and tends to be amused by the follies of others, including his own family members. He shows particular fondness for his second daughter Elizabeth, who shares his sharp mind and wit.
-      
-      Mrs. Bennet is a woman primarily focused on getting her five daughters married to wealthy men. She is described as having "poor nerves" and is often anxious, dramatic, and somewhat foolish. Her main goal in life is to see her daughters well-married, particularly because their family estate is entailed to a male heir (Mr. Collins), meaning her daughters will be left with little financial security after Mr. Bennet's death. She is often described as lacking sophistication and good judgment, which sometimes embarrasses her more sensible daughters, particularly Elizabeth.
-      
-      Their marriage is portrayed as an ill-matched one, where Mr. Bennet married Mrs. Bennet in his youth because of her beauty, only to discover they were incompatible in terms of intellect and character. This serves as a cautionary tale about marrying without proper consideration of character compatibility.
-      User input tokens: 4
-      Output tokens: 297
-      Input tokens (cache read): 187354
-      Input tokens (cache write): 36
-      100.0% of input prompt cached (187358 tokens)
-      Time taken: 7.53 seconds
-      
-      Turn 3:
-      User: What is Netherfield Park?
-      Assistant: Netherfield Park is a large estate near the Bennet family home of Longbourn in the novel. It becomes significant to the plot when it is rented by Mr. Bingley, a wealthy young man who moves into the neighborhood. 
-      
-      The arrival of Mr. Bingley at Netherfield sets much of the novel's plot in motion, as he quickly develops a romantic interest in Jane Bennet, the eldest Bennet daughter. It's also through Netherfield that Elizabeth Bennet first encounters Mr. Darcy, who is staying there as Mr. Bingley's friend.
-      
-      Netherfield serves as an important setting for several key scenes in the novel, including:
-      - The ball where Mr. Darcy first slights Elizabeth
-      - Jane's illness and subsequent stay at Netherfield (where Elizabeth comes to nurse her)
-      - Various social interactions between the Bennets and the Bingley-Darcy party
-      
-      The estate symbolizes wealth and social status in the novel, and its occupancy by Mr. Bingley represents the possibility of social and financial advancement for the Bennet family through marriage. When Bingley suddenly leaves Netherfield, it creates significant disappointment and disruption in the hopes of the Bennet family, particularly for Jane.
-      User input tokens: 4
-      Output tokens: 289
-      Input tokens (cache read): 187390
-      Input tokens (cache write): 308
-      100.0% of input prompt cached (187394 tokens)
-      Time taken: 6.76 seconds
-      
-      Turn 4:
-      User: What is the main theme of this novel?
-      Assistant: The main theme of "Pride and Prejudice" is the interplay between pride and prejudice in relationships, particularly as shown through the central romance between Elizabeth Bennet and Mr. Darcy. However, there are several important related themes:
-      
-      1. Pride and Prejudice:
-      - Darcy's pride in his social position initially makes him appear arrogant and disdainful
-      - Elizabeth's prejudice against Darcy based on first impressions and Wickham's false account
-      - Both characters must overcome these flaws to find happiness together
-      
-      2. Marriage and Social Class:
-      - The pressure on young women to marry well for financial security
-      - The conflict between marrying for love versus social advantage
-      - Different types of marriages are portrayed (Elizabeth/Darcy, Jane/Bingley, Lydia/Wickham, Charlotte/Collins)
-      
-      3. Reputation and Social Expectations:
-      - The importance of reputation in Regency society
-      - How behavior reflects on family honor
-      - The restrictions placed on women in this period
-      
-      4. Personal Growth and Self-Knowledge:
-      - Elizabeth and Darcy both learn to recognize their own faults
-      - The importance of overcoming first impressions
-      - Character development through experience and reflection
-      
-      5. Family and Society:
-      - The role of family connections in determining social status
-      - The impact of family behavior on individual prospects
-      - The balance between
-      User input tokens: 4
-      Output tokens: 300
-      Input tokens (cache read): 187698
-      Input tokens (cache write): 301
-      100.0% of input prompt cached (187702 tokens)
-      Time taken: 7.13 seconds

## modified /cells/13/source:
@@ -44,7 +44,7 @@ conversation_history = ConversationHistory()
 
 # System message containing the book content
 # Note: 'book_content' should be defined elsewhere in the code
-system_message = f"<file_contents> {book_content} </file_contents>"
+system_message = f"{TIMESTAMP} <file_contents> {book_content} </file_contents>"
 
 # Predefined questions for our simulation
 questions = [

Generated by nbdime
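
As a rough way to read the usage numbers in the outputs above, the sketch below adds the three input-side counters together and reports what share was read from cache. It assumes the total input is the sum of regular input tokens, cache-read tokens, and cache-write tokens, so the percentage may differ slightly from the notebook's own "% of input prompt cached" line. `summarize_usage` is a hypothetical helper, not part of the notebook.

```python
def summarize_usage(input_tokens: int, cache_read: int, cache_write: int) -> dict:
    """Combine the three input-side counters into a total and a cache-read share."""
    total = input_tokens + cache_read + cache_write
    pct_read = 100.0 * cache_read / total if total else 0.0
    return {"total_input_tokens": total, "pct_read_from_cache": pct_read}


# Values copied from the Turn 1 and Turn 2 output in the diff above.
turn1 = summarize_usage(input_tokens=3, cache_read=0, cache_write=187360)
turn2 = summarize_usage(input_tokens=3, cache_read=187360, cache_write=36)

for name, stats in (("Turn 1", turn1), ("Turn 2", turn2)):
    print(
        f"{name}: {stats['total_input_tokens']} input tokens, "
        f"{stats['pct_read_from_cache']:.1f}% read from cache"
    )

# Speedup as computed in the notebook: baseline time divided by cache-hit time.
non_cached_time, cached_hit_time = 6.86, 3.66
print(f"Speedup: {non_cached_time / cached_hit_time:.1f}x")
```
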

@PedramNavid PedramNavid left a comment

Collaborator Author


PR Review

Recommendation: APPROVE ✅

Summary

This PR significantly improves the prompt caching notebook by separating the demonstration into three distinct phases (non-cached baseline, cache creation, cache hit), making the caching lifecycle much clearer for readers. The addition of the TIMESTAMP variable to prevent accidental cache hits across runs is a clever technique.

Actionable Feedback (5 items)

  • misc/prompt_caching.ipynb (introduction) - Add Terminal Learning Objectives (TLOs) as bullet points following the cookbook style guide pattern
  • misc/prompt_caching.ipynb (pip install cell) - Use %%capture instead of --quiet for pip install to suppress output
  • misc/prompt_caching.ipynb (after introduction) - Add a Prerequisites section listing required knowledge and tools
  • misc/prompt_caching.ipynb (setup cell) - Add from dotenv import load_dotenv and load_dotenv() to demonstrate best practices for API key handling
  • misc/prompt_caching.ipynb (end of notebook) - Add a Conclusion section that maps back to learning objectives

Detailed Review

Code Quality

  • Functions have clear docstrings explaining their purpose
  • Output formatting improved (shows response.content[0].text instead of raw TextBlock)
  • Defensive getattr() pattern for optional usage fields
  • Variable naming is clear and consistent
  • Code passes ruff formatting/linting
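
For readers unfamiliar with the defensive `getattr()` pattern mentioned above, a minimal sketch follows. The `SimpleNamespace` objects are hypothetical stand-ins for `response.usage`, not real API responses, and `cache_metrics` is an illustrative helper that additionally treats `None` as 0 (the notebook itself only supplies a default for missing attributes).

```python
from types import SimpleNamespace


def cache_metrics(usage) -> dict:
    """Read optional cache counters, treating missing or None values as 0."""
    return {
        "cache_write": getattr(usage, "cache_creation_input_tokens", 0) or 0,
        "cache_read": getattr(usage, "cache_read_input_tokens", 0) or 0,
    }


# Hypothetical usage objects covering three cases: fields present,
# fields absent, and a field present but set to None.
usage_with_cache = SimpleNamespace(
    input_tokens=16, cache_creation_input_tokens=0, cache_read_input_tokens=187347
)
usage_without_fields = SimpleNamespace(input_tokens=187363)
usage_with_none = SimpleNamespace(input_tokens=16, cache_read_input_tokens=None)

for usage in (usage_with_cache, usage_without_fields, usage_with_none):
    print(cache_metrics(usage))
```
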

Security

  • No security issues detected
  • API key handling uses anthropic.Anthropic() which reads from environment
  • No hardcoded secrets or credentials
  • Uses public domain content from Project Gutenberg

Suggestions

  • Add a comment explaining the TIMESTAMP purpose for readers who may not immediately understand why it's needed
  • Consider adding a cost comparison alongside the time comparison to show financial benefits
  • Mention the cache TTL (5 minutes by default, refreshed each time the cache is read) in the explanations so users understand caching behavior in production
  • Example 2 could benefit from expanded introductory text explaining WHY multi-turn caching matters
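
The cost comparison suggested above could look something like the sketch below. The two multipliers are assumptions chosen for illustration (cache writes priced above base input, cache reads priced well below it) and should be checked against the current pricing page; costs are expressed in units of "one base-priced input token" so no dollar figures are implied. `relative_input_cost` is a hypothetical helper.

```python
# ASSUMED relative price multipliers, for illustration only.
CACHE_WRITE_MULTIPLIER = 1.25  # assumption: writing to cache costs more than base input
CACHE_READ_MULTIPLIER = 0.10   # assumption: reading from cache costs a fraction of base


def relative_input_cost(input_tokens: int, cache_write: int = 0, cache_read: int = 0) -> float:
    """Input-side cost in units of one base-priced input token."""
    return (
        input_tokens
        + cache_write * CACHE_WRITE_MULTIPLIER
        + cache_read * CACHE_READ_MULTIPLIER
    )


# Token counts copied from the three Example 1 outputs in the diff.
non_cached = relative_input_cost(input_tokens=187363)
cache_create = relative_input_cost(input_tokens=16, cache_write=187347)
cache_hit = relative_input_cost(input_tokens=16, cache_read=187347)

print(f"Non-cached call:    {non_cached:>12.1f} units")
print(f"First cached call:  {cache_create:>12.1f} units (creates cache)")
print(f"Second cached call: {cache_hit:>12.1f} units (reads from cache)")
```

Under these assumed multipliers the first cached call costs more than the baseline and each later hit costs much less, which mirrors the timing pattern in the performance comparison.
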

Positive Notes

  • Excellent pedagogical structure: The three-part breakdown (baseline → cache creation → cache hit) clearly demonstrates the caching lifecycle
  • Smart TIMESTAMP technique: Prevents accidental cache hits when re-running, ensuring reproducible demonstrations
  • Clear explanations: Markdown cells explain what will happen before each code block
  • Performance comparison table: Provides a satisfying visual summary of benefits
  • Technically correct: Token counts, cache metrics, and performance benefits all accurately demonstrated

🤖 Generated with Claude Code

@PedramNavid PedramNavid requested a review from elie December 1, 2025 22:42
@PedramNavid PedramNavid merged commit eee5442 into main Dec 17, 2025
7 checks passed
@PedramNavid PedramNavid deleted the pdrm/docs/prompt-caching-improvements branch December 17, 2025 00:04

Development

Successfully merging this pull request may close these issues.

prompt_caching example code needs a look

2 participants