publish a new blog: context rot #513
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: ChelseyZ

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing
Walkthrough

Adds a new blog post markdown file describing context engineering strategies to prevent context rot in AI agents using Milvus. The file includes front matter metadata and content covering: context rot definitions and causes; retrieval approaches (Just-in-Time retrieval, pre-retrieval vector search, hybrid retrieval); Milvus-specific examples, code snippets, and images; decision guidance for selecting approaches; techniques for when context windows are insufficient (two-stage pipelines, compression, external memory, sub-agents); and prompt/tooling best practices. The post references Zilliz Cloud and provides concrete workflow examples.

Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

Pre-merge checks and finishing touches: ✅ Passed checks (3 passed)

✨ Finishing touches: 🧪 Generate unit tests (beta)
Actionable comments posted: 8
🧹 Nitpick comments (1)
blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md (1)
19-19: Consider style improvements for clarity and conciseness.

LanguageTool flagged several wordiness and style issues. While not blockers, addressing these will improve readability:
- Line 19: "exact same" → consider "same"
- Line 21: "drop by 15–30%" → consider "decline" for stronger wording
- Line 40: "extremely large contexts" → replace "extremely" with a more precise intensifier
- Line 56: "at the moment it matters" → consider "when needed" for conciseness
- Line 79: "highly relevant chunks" → "relevant chunks" (redundant intensifier)
- Line 98: "it's hard to predict" → consider "difficult to anticipate" to elevate phrasing
- Line 307: "in many cases" → use less frequent alternative for variety
🔎 Proposed style improvements

```diff
-If you've worked with long-running LLM conversations, you've probably had this frustrating moment: halfway through a long thread, the model starts drifting. Answers get vague, reasoning weakens, and key details mysteriously vanish. But if you drop the exact same prompt into a new chat, suddenly the model behaves—focused, accurate, grounded.
+If you've worked with long-running LLM conversations, you've probably had this frustrating moment: halfway through a long thread, the model starts drifting. Answers get vague, reasoning weakens, and key details mysteriously vanish. But if you drop the same prompt into a new chat, suddenly the model behaves—focused, accurate, grounded.

-This isn't the model "getting tired" — it's **context rot**. As a conversation grows, the model has to juggle more information, and its ability to prioritize slowly declines. [Antropic studie](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)s show that as context windows stretch from around 8K tokens to 128K, retrieval accuracy can drop by 15–30%. The model still has room, but it loses track of what matters.
+This isn't the model "getting tired" — it's **context rot**. As a conversation grows, the model has to juggle more information, and its ability to prioritize slowly declines. [Anthropic studies](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) show that as context windows stretch from around 8K tokens to 128K, retrieval accuracy can decline by 15–30%. The model still has room, but it loses track of what matters.

-The root issue comes from the [Transformer architecture](https://zilliz.com/learn/decoding-transformer-models-a-study-of-their-architecture-and-underlying-principles) itself. Every token must compare itself against every other token, forming pairwise attention across the entire sequence. That means computation grows **O(n²)** with context length.
-Expanding your prompt from 1K tokens to 100K doesn't make the model "work harder"—it multiplies the number of token interactions by **10,000×**. Then there's the problem with the training data. Models see far more short sequences than long ones. So when you ask an LLM to operate across extremely large contexts, you're pushing it into a regime it wasn't heavily trained for.
+The root issue comes from the [Transformer architecture](https://zilliz.com/learn/decoding-transformer-models-a-study-of-their-architecture-and-underlying-principles) itself. Every token must compare itself against every other token, forming pairwise attention across the entire sequence. That means computation grows **O(n²)** with context length. Expanding your prompt from 1K tokens to 100K doesn't make the model "work harder"—it multiplies the number of token interactions by **10,000×**. Then there's the problem with the training data. Models see far more short sequences than long ones. So when you ask an LLM to operate across substantially larger contexts, you're pushing it into a regime it wasn't heavily trained for.

-Instead of stuffing entire codebases or datasets into its context (which greatly increases the chance of drift and forgetting), Claude Code maintains a tiny index: file paths, commands, and documentation links. When the model needs a piece of information, it retrieves that specific item and inserts it into context **at the moment it matters**—not before.
+Instead of stuffing entire codebases or datasets into its context (which greatly increases the chance of drift and forgetting), Claude Code maintains a tiny index: file paths, commands, and documentation links. When the model needs a piece of information, it retrieves that specific item and inserts it into context **when needed**—not before.

 In a typical RAG setup:
 - Documents are embedded and stored in a vector database, such as Milvus.
-- At query time, the system retrieves a small set of highly relevant chunks through similarity searches.
+- At query time, the system retrieves a small set of relevant chunks through similarity searches.

-**Accuracy:** Before a task begins, it's hard to predict precisely what the model will need—especially for multi-step or exploratory workflows.
+**Accuracy:** Before a task begins, it's difficult to predict precisely what the model will need—especially for multi-step or exploratory workflows.

-**However, a bigger context window doesn't automatically produce better results**; in many cases, it does the opposite. When a model is overloaded, fed stale information, or forced through massive prompts, accuracy quietly drifts.
+**However, a bigger context window doesn't automatically produce better results**; often, it does the opposite. When a model is overloaded, fed stale information, or forced through massive prompts, accuracy quietly drifts.
```

Note: Line 21 also contains a typo: the "Antropic studie" link text should not leave the trailing 's' outside the markdown link syntax.
Also applies to: 21-21, 40-40, 56-56, 79-79, 98-98, 307-307
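As a sanity check on the **O(n²)** figure quoted in the proposed wording, the arithmetic can be sketched directly (pure illustration, not code from the PR):

```python
def pairwise_interactions(n_tokens: int) -> int:
    """Self-attention compares every token against every other token,
    so the number of token-pair interactions grows as n^2."""
    return n_tokens * n_tokens

# Growing a prompt from 1K to 100K tokens multiplies the interaction
# count by (100_000 / 1_000)^2 = 10,000x, matching the blog's claim.
ratio = pairwise_interactions(100_000) // pairwise_interactions(1_000)
print(ratio)  # 10000
```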
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-25T12:37:16.088Z
Learnt from: septemberfd
Repo: milvus-io/community PR: 511
File: blog/en/embedding-first-chunking-second-smarter-rag-retrieval-with-max-min-semantic-chunking.md:7-7
Timestamp: 2025-12-25T12:37:16.088Z
Learning: In milvus-io/community blog posts, the front matter 'cover' field does not require the 'https://' protocol prefix. When editing or adding blog markdown files under the blog directory (e.g., blog/en/...), specify cover URLs without the protocol (the blogging system handles protocol-less URLs). This applies to all markdown files in the blog area.
Applied to files:
blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md
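For reference, the learning above implies front matter of roughly this shape; everything except the protocol-less `cover` convention is an illustrative placeholder, not taken from the PR:

```yaml
---
title: "Keeping AI Agents Grounded: Context Engineering Strategies That Prevent Context Rot Using Milvus"
# Per the learning: omit the "https://" prefix on the cover URL;
# the blogging system resolves protocol-less URLs.
cover: assets.zilliz.com/example-cover-image.png  # hypothetical path
---
```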
🪛 LanguageTool
blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md
[style] ~19-~19: ‘exact same’ might be wordy. Consider a shorter alternative.
Context: ...ysteriously vanish. But if you drop the exact same prompt into a new chat, suddenly the mo...
(EN_WORDINESS_PREMIUM_EXACT_SAME)
[style] ~21-~21: Consider a different verb to strengthen your wording.
Context: ... tokens to 128K, retrieval accuracy can drop by 15–30%. The model still has room, bu...
(DROP_DECLINE)
[style] ~40-~40: As an alternative to the over-used intensifier ‘extremely’, consider replacing this phrase.
Context: ...o when you ask an LLM to operate across extremely large contexts, you’re pushing it into a regi...
(EN_WEAK_ADJECTIVE)
[style] ~56-~56: For conciseness, consider replacing this expression with an adverb.
Context: ...ific item and inserts it into context at the moment it matters—not before. For example,...
(AT_THE_MOMENT)
[style] ~79-~79: This phrase is redundant. Consider writing “relevant”.
Context: ...me, the system retrieves a small set of highly relevant chunks through similarity searches. - ...
(HIGHLY_RELEVANT)
[style] ~98-~98: To elevate your writing, try using a synonym here.
Context: ...Accuracy:* Before a task begins, it’s hard to predict precisely what the model wil...
(HARD_TO)
[grammar] ~100-~100: Use a hyphen to join words.
Context: ...ep or exploratory workflows. So in real world workloads, a hybrid appaorch is th...
(QB_NEW_EN_HYPHEN)
[grammar] ~100-~100: Ensure spelling is correct
Context: .... So in real world workloads, a hybrid appaorch is the optimal solution. - Vector sea...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[style] ~307-~307: The phrase ‘in many cases’ is used quite frequently. Consider using a less frequent alternative to set your writing apart.
Context: ...automatically produce better results**; in many cases, it does the opposite. When a model is ...
(IN_MANY_STYLE_CASES)
🪛 markdownlint-cli2 (0.18.1)
blog/en/keeping-ai-agents-grounded-context-engineering-strategies-that-prevent-context-rot-using-milvus.md
38-38: Images should have alternate text (alt text)
(MD045, no-alt-text)
120-120: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
142-142: Images should have alternate text (alt text)
(MD045, no-alt-text)
145-145: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
228-228: Code block style
Expected: fenced; Actual: indented
(MD046, code-block-style)
243-243: Images should have alternate text (alt text)
(MD045, no-alt-text)
254-254: Images should have alternate text (alt text)
(MD045, no-alt-text)
261-261: Multiple headings with the same content
(MD024, no-duplicate-heading)
265-265: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
294-294: Images should have alternate text (alt text)
(MD045, no-alt-text)
317-317: Spaces inside link text
(MD039, no-space-in-links)
317-317: Spaces inside link text
(MD039, no-space-in-links)
317-317: Spaces inside link text
(MD039, no-space-in-links)
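For concreteness, the MD045, MD040, and MD039 findings above would typically be resolved with edits of this shape; the file paths, alt text, and code shown are illustrative placeholders, not content from the PR:

````markdown
<!-- MD045: images need alternate text -->
![Diagram of the retrieval pipeline](path/to/image.png)

<!-- MD040: fenced code blocks need a language -->
```python
results = client.search(...)
```

<!-- MD039: no spaces inside link text -->
[link text](https://example.com)   <!-- not: [ link text ](https://example.com) -->
````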