
Feature/context management auto compaction #80


Open · wants to merge 7 commits into `main`

Conversation

nicobailon

Add automatic context management for LLM workflows

## Summary

- Implements intelligent context window management to prevent LLM token limit overflows
- Provides two compaction strategies: truncation and LLM summarization
- Adds modern model support with accurate token counting for 2025 models
- Includes comprehensive configuration options and thread safety

## Key Features

- **Automatic monitoring**: Tracks token usage and triggers compaction at configurable thresholds
- **Smart compaction strategies**:
  - Truncation: fast, preserves recent context
  - LLM summarization: intelligent summarization of older content
- **Modern model support**: Updated configurations for GPT-4o, Claude 3.5, Gemini 2.0/2.5
- **Tool integration**: Built-in tools automatically respect `max_tokens` limits
- **Thread safety**: Mutex synchronization for concurrent workflows
- **Flexible configuration**: Extensive options for fine-tuning behavior (see the sketch below)
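For illustration, the feature bullets above might translate into options along these lines. This is a sketch only; the option names and defaults are assumptions, not the PR's actual configuration surface:

```ruby
# Illustrative options hash -- names and defaults are assumptions,
# not the PR's actual API.
context_management = {
  enabled:    true,        # monitor token usage automatically
  strategy:   :summarize,  # :truncate (fast) or :summarize (LLM-based)
  threshold:  0.8,         # compact once usage reaches 80% of the window
  max_tokens: 128_000      # context window budget for the target model
}
```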

## Technical Implementation

- Context management integrated at the workflow level
- Model-specific token counting (tiktoken for OpenAI, character ratios for others; sketched below)
- Automatic model selection for summarization tasks
- Comprehensive test coverage including concurrency and integration tests
- Enhanced documentation with concrete examples
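That token-counting pairing could look roughly like this. A minimal sketch, assuming the `tiktoken_ruby` gem and a common 4-characters-per-token heuristic; neither the helper name nor the ratio is taken from this PR, and `gpt-4o` support depends on the gem version:

```ruby
require "tiktoken_ruby"

CHARS_PER_TOKEN = 4.0 # rough heuristic for models without a local tokenizer

def count_tokens(text, model)
  if model.start_with?("gpt")
    # Exact BPE count via tiktoken (supported OpenAI models only)
    Tiktoken.encoding_for_model(model).encode(text).length
  else
    # Character-ratio estimate for Claude, Gemini, etc.
    (text.length / CHARS_PER_TOKEN).ceil
  end
end

count_tokens("How many tokens am I?", "gpt-4o")            # exact count
count_tokens("How many tokens am I?", "claude-3-5-sonnet") # ~6 (estimate)
```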

This feature enables long-running AI workflows to operate reliably within LLM context windows while preserving conversation quality and coherence.

This update introduces a comprehensive context management system to handle token limits in long-running workflows. Key features include:

- Automatic monitoring and compaction of conversation transcripts when approaching token limits.
- Configurable strategies for context compaction: truncation and LLM summarization.
- New configuration options for context management, including thresholds and maximum tokens.
- Integration of context management into existing tools, allowing for token limits on outputs (illustrated below).

Additionally, the README and documentation have been updated to reflect these changes, and new tests have been added to ensure functionality and concurrency safety.
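As a concrete illustration of the tool-output limits, a truncation helper could look roughly like this. The signature and behavior here are assumptions; the real interface lives in `lib/roast/helpers/content_truncator.rb` and may differ:

```ruby
# Hypothetical helper -- signature and behavior are assumptions, not
# the actual interface of lib/roast/helpers/content_truncator.rb.
def truncate_to_tokens(content, max_tokens:, chars_per_token: 4)
  budget = max_tokens * chars_per_token
  return content if content.length <= budget

  "[output truncated]\n" + content[-budget, budget] # keep the most recent output
end
```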

Relevant files:
- `README.md`: Added documentation for automatic context management.
- `docs/INSTRUMENTATION.md`: Updated to include context management events.
- `lib/roast/helpers/content_truncator.rb`: New helper for truncating content based on token limits.
- `lib/roast/workflow/context_manager.rb`: New class for managing context and compaction logic (see the sketch after this list).
- `lib/roast/workflow/model_config.rb`: Added model configuration for token limits.
- `lib/roast/workflow/base_workflow.rb`: Integrated context management into the workflow.
- `test/roast/workflow/context_manager_test.rb`: New tests for context management functionality.
- `test/roast/workflow/context_management_integration_test.rb`: Integration tests for context management with tools.
- `test/roast/workflow/context_management_concurrency_test.rb`: Tests for concurrency in context management.
- `test/roast/workflow/model_config_test.rb`: Tests for model configuration related to token limits.
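To make the moving parts concrete, here is a rough shape the manager's compaction check could take, including the mutex guard mentioned under thread safety. This is a hypothetical sketch: class, method, and option names are assumptions based on the file list above, not the PR's actual code:

```ruby
# Hypothetical sketch -- names are assumptions, not the PR's code.
class ContextManagerSketch
  def initialize(threshold: 0.8, max_tokens: 128_000)
    @threshold  = threshold
    @max_tokens = max_tokens
    @mutex      = Mutex.new # serializes compaction across workflow threads
  end

  # Called after each message is appended to the transcript (an array of strings).
  def after_message(transcript)
    @mutex.synchronize do
      compact!(transcript) if usage_ratio(transcript) >= @threshold
    end
  end

  private

  def usage_ratio(transcript)
    (transcript.join.length / 4.0) / @max_tokens # crude character-ratio estimate
  end

  def compact!(transcript)
    # Truncation strategy: drop the oldest half, preserving recent context.
    transcript.shift(transcript.size / 2)
  end
end
```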
@obie (Contributor) commented May 26, 2025

Thanks for your contribution!

I know the PR is still in draft, but here's some feedback on the current version:

  1. The PR is based on an old branch.
  2. The new files aren't integrated: the context management files aren't being required by the main library.
  3. Tests are failing.
  4. Implementation issues: the model matching algorithm has a bug where it doesn't correctly match partial model names.
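On point 4, a longest-prefix lookup is one common way to make partial names resolve deterministically. A minimal sketch, with made-up table entries and a made-up helper name, not code from `model_config.rb`:

```ruby
# Illustrative longest-prefix lookup -- entries and helper name are
# assumptions for this sketch, not taken from model_config.rb.
MODEL_LIMITS = {
  "gpt-4o"            => 128_000,
  "gpt-4o-mini"       => 128_000,
  "claude-3-5-sonnet" => 200_000
}.freeze

def limit_for(model)
  # Prefer the longest matching prefix so "gpt-4o-mini-2024-07-18"
  # resolves to "gpt-4o-mini" rather than "gpt-4o".
  key = MODEL_LIMITS.keys
                    .select { |k| model.start_with?(k) }
                    .max_by(&:length)
  key && MODEL_LIMITS[key]
end
```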

…upport

- Added support for arm64-darwin-24 in Gemfile.lock.
- Introduced a comprehensive context management guide detailing usage and testing of the automatic context management feature.
- Updated helpers and workflow files to integrate new context management functionalities, including token counting and compaction strategies.
- Enhanced tests to cover new context management features and ensure thread safety.
…ity with main branch while preserving the new context management features
- Merged latest upstream changes including new features and improvements
- Resolved conflicts by preserving all context management features
- Maintained compatibility with upstream error handling improvements
- Added new upstream functionality while keeping our enhancements
@nicobailon nicobailon marked this pull request as ready for review May 27, 2025 03:12
@nicobailon nicobailon marked this pull request as draft May 27, 2025 03:56
…or resource handling in workflow_runner.rb

- Updated prompt.md to reflect the correct variable name for resource contents.
- Refactored resource handling in workflow_runner.rb to create a dedicated method for resource management based on file presence.
@nicobailon nicobailon marked this pull request as ready for review May 27, 2025 04:19
- Added a new section in README.md detailing automatic context management features, including configuration options and strategies.
- Updated setup instructions for testing with OpenRouter, including API key configuration and example workflows.