Conversation

Contributor

@ChelseyZ ChelseyZ commented Dec 26, 2025

  • Core invariant: Larger context windows alone do not prevent context rot; effective systems must control token noise by selective retrieval and context management to maintain agent accuracy over long-running workflows.
  • Logic removed/simplified: None. This PR is documentation-only; it does not modify existing code paths, so no runtime logic is simplified or removed from the codebase.
  • Why no data loss or regression: The change is purely additive (a new markdown blog under blog/en/...), so it does not alter compiled code, APIs, storage schemas, or runtime behavior. No existing files or exported symbols are modified, and all examples are illustrative, so no path in the repository can change production data or behavior.
  • New capability added: An operational guide covering JIT retrieval, pre-retrieval vector search, and hybrid approaches, with concrete Milvus examples and mitigation strategies (compression, external memory, sub-agents, prompt/tool best practices) to help engineers design systems that prevent context rot. This is documentation-only and affects user guidance rather than code.
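As a reader aid, the pre-retrieval pattern the description refers to can be sketched in a few lines. This is an illustration only, not code from the PR: the `embed` function below is a toy stand-in for a real embedding model, and in a real deployment the ranking step would be a Milvus similarity search rather than an in-memory loop.

```python
import math

def embed(text, dim=8):
    # Toy deterministic "embedding": bucket character trigrams into a
    # fixed-size vector. A stand-in for a real embedding model.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        vec[sum(ord(c) for c in text[i:i + 3]) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def top_k(query, corpus, k=2):
    # Pre-retrieval: rank every stored chunk by cosine similarity to the
    # query and keep only the k best, so the agent's context receives a
    # small, relevant slice instead of the whole corpus.
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, embed(doc))), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]
```

Only the few highest-scoring chunks enter the agent's context, which is the token-noise control the core invariant above calls for.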

@sre-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ChelseyZ

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


coderabbitai bot commented Dec 26, 2025

Walkthrough

Adds a new blog post markdown file describing context engineering strategies to prevent context rot in AI agents using Milvus. The file includes front matter metadata and content covering context rot definitions and causes; retrieval approaches (Just-in-Time retrieval, pre-retrieval vector search, hybrid retrieval); Milvus-specific examples, code snippets, and images; decision guidance for selecting approaches; techniques for when context windows are insufficient (two-stage pipelines, compression, external memory, sub-agents); and prompt/tooling best practices. The post references Zilliz Cloud and provides concrete workflow examples.
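The Just-in-Time retrieval approach mentioned in the walkthrough can be sketched as a class that keeps only lightweight identifiers in the prompt and loads full content on demand. `JustInTimeContext` is a hypothetical name used for illustration, not an API from the post.

```python
class JustInTimeContext:
    """Sketch of JIT retrieval: the agent's context holds only a tiny
    index of identifiers; full content is pulled in the moment a step
    actually needs it, keeping the prompt small."""

    def __init__(self, store):
        self.store = store          # identifier -> full content
        self.index = list(store)    # only lightweight keys stay in context
        self.loaded = {}            # content fetched so far

    def prompt_view(self):
        # What the model "sees" by default: keys plus anything loaded.
        return {"index": self.index, "loaded": self.loaded}

    def fetch(self, key):
        # Pull one item into context on demand, not up front.
        self.loaded[key] = self.store[key]
        return self.store[key]
```

Until `fetch` is called, the prompt carries only file paths or links, mirroring how the post describes Claude Code's tiny index.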

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'publish a new blog: context rot' clearly summarizes the main change—adding a new blog post about context rot and prevention strategies. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5529a59 and a0a7422.

📒 Files selected for processing (1)
  • blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-25T12:37:16.088Z
Learnt from: septemberfd
Repo: milvus-io/community PR: 511
File: blog/en/embedding-first-chunking-second-smarter-rag-retrieval-with-max-min-semantic-chunking.md:7-7
Timestamp: 2025-12-25T12:37:16.088Z
Learning: In milvus-io/community blog posts, the front matter 'cover' field does not require the 'https://' protocol prefix. When editing or adding blog markdown files under the blog directory (e.g., blog/en/...), specify cover URLs without the protocol (the blogging system handles protocol-less URLs). This applies to all markdown files in the blog area.

Applied to files:

  • blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md
🪛 LanguageTool
blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md

[style] ~19-~19: ‘exact same’ might be wordy. Consider a shorter alternative.
Context: ...ysteriously vanish. But if you drop the exact same prompt into a new chat, suddenly the mo...

(EN_WORDINESS_PREMIUM_EXACT_SAME)


[style] ~21-~21: Consider a different verb to strengthen your wording.
Context: ... tokens to 128K, retrieval accuracy can drop by 15–30%. The model still has room, bu...

(DROP_DECLINE)


[style] ~40-~40: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...o when you ask an LLM to operate across extremely large contexts, you’re pushing it into a regi...

(EN_WEAK_ADJECTIVE)


[style] ~56-~56: For conciseness, consider replacing this expression with an adverb.
Context: ...ific item and inserts it into context at the moment it matters—not before. For example,...

(AT_THE_MOMENT)


[style] ~79-~79: This phrase is redundant. Consider writing “relevant”.
Context: ...me, the system retrieves a small set of highly relevant chunks through similarity searches. - ...

(HIGHLY_RELEVANT)


[style] ~98-~98: To elevate your writing, try using a synonym here.
Context: ...Accuracy:* Before a task begins, it’s hard to predict precisely what the model wil...

(HARD_TO)


[grammar] ~100-~100: Use a hyphen to join words.
Context: ...ep or exploratory workflows. So in real world workloads, a hybrid appaorch is th...

(QB_NEW_EN_HYPHEN)


[grammar] ~100-~100: Ensure spelling is correct
Context: .... So in real world workloads, a hybrid appaorch is the optimal solution. - Vector sea...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)


[style] ~308-~308: The phrase ‘in many cases’ is used quite frequently. Consider using a less frequent alternative to set your writing apart.
Context: ...automatically produce better results**; in many cases, it does the opposite. When a model is ...

(IN_MANY_STYLE_CASES)

🪛 markdownlint-cli2 (0.18.1)
blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md

38-38: Images should have alternate text (alt text)

(MD045, no-alt-text)


120-120: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


142-142: Images should have alternate text (alt text)

(MD045, no-alt-text)


145-145: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


243-243: Images should have alternate text (alt text)

(MD045, no-alt-text)


254-254: Images should have alternate text (alt text)

(MD045, no-alt-text)


266-266: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


295-295: Images should have alternate text (alt text)

(MD045, no-alt-text)


318-318: Spaces inside link text

(MD039, no-space-in-links)


318-318: Spaces inside link text

(MD039, no-space-in-links)


318-318: Spaces inside link text

(MD039, no-space-in-links)


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

🧹 Nitpick comments (1)
blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md (1)

19-19: Consider style improvements for clarity and conciseness.

LanguageTool flagged several wordiness and style issues. While not blockers, addressing these will improve readability:

  • Line 19: "exact same" → consider "same"
  • Line 21: "drop by 15–30%" → consider "decline" for stronger wording
  • Line 40: "extremely large contexts" → replace "extremely" with a more precise intensifier
  • Line 56: "at the moment it matters" → consider "when needed" for conciseness
  • Line 79: "highly relevant chunks" → "relevant chunks" (redundant intensifier)
  • Line 98: "it's hard to predict" → consider "difficult to anticipate" to elevate phrasing
  • Line 307: "in many cases" → use less frequent alternative for variety
🔎 Proposed style improvements
-If you've worked with long-running LLM conversations, you've probably had this frustrating moment: halfway through a long thread, the model starts drifting. Answers get vague, reasoning weakens, and key details mysteriously vanish. But if you drop the exact same prompt into a new chat, suddenly the model behaves—focused, accurate, grounded.
+If you've worked with long-running LLM conversations, you've probably had this frustrating moment: halfway through a long thread, the model starts drifting. Answers get vague, reasoning weakens, and key details mysteriously vanish. But if you drop the same prompt into a new chat, suddenly the model behaves—focused, accurate, grounded.

-This isn't the model "getting tired" — it's **context rot**. As a conversation grows, the model has to juggle more information, and its ability to prioritize slowly declines. [Antropic studie](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)s show that as context windows stretch from around 8K tokens to 128K, retrieval accuracy can drop by 15–30%. The model still has room, but it loses track of what matters.
+This isn't the model "getting tired" — it's **context rot**. As a conversation grows, the model has to juggle more information, and its ability to prioritize slowly declines. [Anthropic studies](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) show that as context windows stretch from around 8K tokens to 128K, retrieval accuracy can decline by 15–30%. The model still has room, but it loses track of what matters.

-The root issue comes from the [Transformer architecture](https://zilliz.com/learn/decoding-transformer-models-a-study-of-their-architecture-and-underlying-principles) itself. Every token must compare itself against every other token, forming pairwise attention across the entire sequence. That means computation grows **O(n²)** with context length. Expanding your prompt from 1K tokens to 100K doesn't make the model "work harder"—it multiplies the number of token interactions by **10,000×**. Then there's the problem with the training data. Models see far more short sequences than long ones. So when you ask an LLM to operate across extremely large contexts, you're pushing it into a regime it wasn't heavily trained for.
+The root issue comes from the [Transformer architecture](https://zilliz.com/learn/decoding-transformer-models-a-study-of-their-architecture-and-underlying-principles) itself. Every token must compare itself against every other token, forming pairwise attention across the entire sequence. That means computation grows **O(n²)** with context length. Expanding your prompt from 1K tokens to 100K doesn't make the model "work harder"—it multiplies the number of token interactions by **10,000×**. Then there's the problem with the training data. Models see far more short sequences than long ones. So when you ask an LLM to operate across substantially larger contexts, you're pushing it into a regime it wasn't heavily trained for.

-Instead of stuffing entire codebases or datasets into its context (which greatly increases the chance of drift and forgetting), Claude Code maintains a tiny index: file paths, commands, and documentation links. When the model needs a piece of information, it retrieves that specific item and inserts it into context **at the moment it matters**—not before.
+Instead of stuffing entire codebases or datasets into its context (which greatly increases the chance of drift and forgetting), Claude Code maintains a tiny index: file paths, commands, and documentation links. When the model needs a piece of information, it retrieves that specific item and inserts it into context **when needed**—not before.

-In a typical RAG setup: - Documents are embedded and stored in a vector database, such as Milvus. - At query time, the system retrieves a small set of highly relevant chunks through similarity searches.
+In a typical RAG setup: - Documents are embedded and stored in a vector database, such as Milvus. - At query time, the system retrieves a small set of relevant chunks through similarity searches.

-**Accuracy:** Before a task begins, it's hard to predict precisely what the model will need—especially for multi-step or exploratory workflows.
+**Accuracy:** Before a task begins, it's difficult to predict precisely what the model will need—especially for multi-step or exploratory workflows.

-**However, a bigger context window doesn't automatically produce better results**; in many cases, it does the opposite. When a model is overloaded, fed stale information, or forced through massive prompts, accuracy quietly drifts.
+**However, a bigger context window doesn't automatically produce better results**; often, it does the opposite. When a model is overloaded, fed stale information, or forced through massive prompts, accuracy quietly drifts.

Note: Line 21 also contains a typo: the link text "Antropic studie" misspells "Anthropic", and the trailing "s" sits outside the markdown link syntax.

Also applies to: 21-21, 40-40, 56-56, 79-79, 98-98, 307-307

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 019d603 and 5529a59.

📒 Files selected for processing (1)
  • blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md

🪛 markdownlint-cli2 (0.18.1)
blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md

38-38: Images should have alternate text (alt text)

(MD045, no-alt-text)


120-120: Emphasis used instead of a heading

(MD036, no-emphasis-as-heading)


142-142: Images should have alternate text (alt text)

(MD045, no-alt-text)


145-145: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


228-228: Code block style
Expected: fenced; Actual: indented

(MD046, code-block-style)


243-243: Images should have alternate text (alt text)

(MD045, no-alt-text)


254-254: Images should have alternate text (alt text)

(MD045, no-alt-text)


261-261: Multiple headings with the same content

(MD024, no-duplicate-heading)


265-265: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


294-294: Images should have alternate text (alt text)

(MD045, no-alt-text)


317-317: Spaces inside link text

(MD039, no-space-in-links)


317-317: Spaces inside link text

(MD039, no-space-in-links)


317-317: Spaces inside link text

(MD039, no-space-in-links)
