feat: Initial implementation for BaseVerifier #1597

Open
wants to merge 2 commits into master

Conversation

@Wendong-Fan (Member) commented Feb 13, 2025

RFC

📋 Problem

The CAMEL framework needs a robust, extensible verification system to validate and assess the quality of LLM responses across various task types. The verifiers module provides this critical functionality.

✔️ Goals

  • Provide a flexible, extensible framework for verifying LLM outputs
  • Support both synchronous and asynchronous verification workflows
  • Enable batch processing with controlled concurrency
  • Implement comprehensive error handling and retry mechanisms
  • Track detailed performance metrics and verification statistics
  • Support resource-aware execution with system monitoring
  • Enable verification across diverse task types (math, logic, software engineering, etc.)

❌ Non-Goals

  • Implement specific verification logic (left to concrete verifier implementations)
  • Handle model-specific preprocessing or post-processing
  • Manage model serving or scaling

💡 Solution

The verifiers module implements a base verification framework with the following key components (a rough interface sketch follows the list):

  1. BaseVerifier: Abstract base class providing core verification infrastructure
    • Configurable retry logic and timeout handling
    • Resource monitoring and adaptive concurrency
    • Metrics collection and performance tracking
  2. VerificationResult: Structured output format containing:
    • Verification status (success/failure/error/timeout)
    • Performance metrics and scores
    • Error messages and metadata
    • Timestamps and duration tracking
  3. VerifierConfig: Configuration management for:
    • Timeout and retry settings
    • Strict mode validation
    • Resource thresholds
  4. Support for diverse task types through TaskType enumeration
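
A minimal sketch of how these pieces could fit together is below. The class names follow the description above, but every field and signature here is an assumption made for illustration, not the final API in this PR.

```python
# Illustrative sketch only: class names follow the PR description, but the
# fields and signatures are assumptions, not the final CAMEL API.
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict, Optional


class VerificationStatus(Enum):
    SUCCESS = "success"
    FAILURE = "failure"
    ERROR = "error"
    TIMEOUT = "timeout"


@dataclass
class VerifierConfig:
    enabled: bool = True       # allow turning verification off entirely
    strict_mode: bool = False  # treat soft failures as hard failures
    timeout: float = 30.0      # per-verification timeout in seconds
    max_retries: int = 3       # retries on transient errors
    max_parallel: int = 8      # upper bound on concurrent verifications


@dataclass
class VerificationResult:
    status: VerificationStatus
    score: Optional[float] = None        # optional quality/confidence score
    error_message: Optional[str] = None  # populated on ERROR or TIMEOUT
    metadata: Dict[str, Any] = field(default_factory=dict)
    timestamp: datetime = field(default_factory=datetime.utcnow)
    duration: float = 0.0                # wall-clock seconds


class BaseVerifier(ABC):
    """Core infrastructure; concrete verifiers only implement the check itself."""

    def __init__(self, config: Optional[VerifierConfig] = None):
        self.config = config or VerifierConfig()

    @abstractmethod
    async def _verify_implementation(self, response: str) -> VerificationResult:
        """Domain-specific verification logic, supplied by subclasses."""
        ...
```

A concrete verifier (e.g. a math or code verifier) would then only need to supply `_verify_implementation`.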

Performance Impact

  • Controlled concurrency through the max_parallel setting (see the example after this list)
  • Resource monitoring (CPU/Memory thresholds)
  • Batch processing capabilities
  • Configurable timeouts and retry mechanisms
  • Performance metrics tracking
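
For illustration, the controls listed above (per-call timeouts, retries, and max_parallel-style bounded concurrency) boil down to patterns like the following. All names in this snippet are hypothetical and only sketch the pattern, not this PR's actual code.

```python
# Standalone illustration of per-call timeouts, retries, and bounded
# concurrency. Names are hypothetical; this is not the PR's code.
import asyncio
from typing import Awaitable, Callable, List


async def check_with_retries(
    check: Callable[[str], Awaitable[bool]],
    response: str,
    timeout: float = 30.0,
    max_retries: int = 3,
) -> bool:
    """Apply a per-call timeout and retry on timeouts up to max_retries."""
    for attempt in range(max_retries + 1):
        try:
            return await asyncio.wait_for(check(response), timeout=timeout)
        except asyncio.TimeoutError:
            if attempt == max_retries:
                return False  # give up after the final retry
    return False


async def check_all(
    check: Callable[[str], Awaitable[bool]],
    responses: List[str],
    max_parallel: int = 8,
) -> List[bool]:
    """Bound in-flight checks to max_parallel using a semaphore."""
    semaphore = asyncio.Semaphore(max_parallel)

    async def _bounded(r: str) -> bool:
        async with semaphore:
            return await check_with_retries(check, r)

    return await asyncio.gather(*(_bounded(r) for r in responses))


if __name__ == "__main__":
    async def is_forty_two(r: str) -> bool:
        await asyncio.sleep(0.1)  # stand-in for real verification work
        return r.strip() == "42"

    print(asyncio.run(check_all(is_forty_two, ["42", "41", " 42 "], max_parallel=2)))
```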

🎨 Alternatives

No alternative solutions are currently proposed for this feature.

📃 Work Estimates

  1. Core Framework Implementation (2 days)
    • Base verifier class
    • Result and configuration models
    • Error handling infrastructure
  2. Testing Infrastructure (1 day)
    • Unit tests
    • Integration tests
    • Performance benchmarks
  3. Documentation (0.5 days)
    • API documentation
    • Usage examples

🗂️ Future Work

  • Caching layer for verification results
  • Improved batch processing algorithms
  • Enhanced resource monitoring
  • Integration with monitoring systems
  • Extended task type support

❓ FAQ

Q: How do I implement a custom verifier? A: Extend the BaseVerifier class and implement the _verify_implementation method.
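
For example, building on the hypothetical sketch in the Solution section (the real signatures in this PR may differ):

```python
# Hypothetical custom verifier; assumes the BaseVerifier/VerificationResult
# sketch from the Solution section. Real signatures may differ.
class ExactMatchVerifier(BaseVerifier):
    """Succeeds only if the response exactly matches the expected answer."""

    def __init__(self, expected: str, config: Optional[VerifierConfig] = None):
        super().__init__(config)
        self.expected = expected

    async def _verify_implementation(self, response: str) -> VerificationResult:
        matched = response.strip() == self.expected.strip()
        return VerificationResult(
            status=VerificationStatus.SUCCESS if matched else VerificationStatus.FAILURE,
            score=1.0 if matched else 0.0,
        )
```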

Q: How does batch verification work? A: Use the verify_batch method with a list of responses. Concurrency is automatically managed.
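
A call could then look roughly like this, using the toy ExactMatchVerifier above and assuming verify_batch is a coroutine that takes a list of responses, as described in this FAQ entry:

```python
# Illustrative call pattern only; assumes verify_batch is async and returns
# one VerificationResult per input response.
import asyncio


async def main() -> None:
    verifier = ExactMatchVerifier(expected="42")
    responses = ["42", "41", "forty-two"]
    results = await verifier.verify_batch(responses)
    for response, result in zip(responses, results):
        print(response, "->", result.status)


asyncio.run(main())
```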

Q: What happens if verification fails? A: The system provides detailed error information and can retry based on configuration.

Q: How can I monitor verification performance? A: Use the VerificationMetrics class which tracks success rates, durations, and error counts.
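
The exact shape of VerificationMetrics is not shown in this description; as a rough guess based on this answer, it could track something like:

```python
# Hypothetical shape of VerificationMetrics, guessed from the description
# above (success rates, durations, error counts); not the PR's actual class.
from dataclasses import dataclass, field
from typing import List


@dataclass
class VerificationMetrics:
    total: int = 0
    successes: int = 0
    errors: int = 0
    durations: List[float] = field(default_factory=list)

    def record(self, success: bool, duration: float, error: bool = False) -> None:
        """Update the counters after each verification call."""
        self.total += 1
        self.successes += int(success)
        self.errors += int(error)
        self.durations.append(duration)

    @property
    def success_rate(self) -> float:
        return self.successes / self.total if self.total else 0.0
```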

Q: Can I disable verification for certain scenarios? A: Yes, use VerifierConfig to enable/disable verification and configure behavior.
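
For example, with the hypothetical VerifierConfig sketched earlier:

```python
# Hypothetical: disable verification for quick experiments, or tighten it for
# production runs. Field names follow the earlier sketch, not the final API.
relaxed = VerifierConfig(enabled=False)
strict = VerifierConfig(strict_mode=True, timeout=5.0, max_retries=0)
```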

Checklist

Go over all the following points, and put an x in all the boxes that apply.

  • I have read the CONTRIBUTION guide (required)
  • I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
  • I have checked if any dependencies need to be added or updated in pyproject.toml and poetry.lock
  • I have updated the tests accordingly (required for a bug fix or a new feature)
  • I have updated the documentation if needed:
  • I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

@Wendong-Fan Wendong-Fan self-assigned this Feb 13, 2025
@Wendong-Fan Wendong-Fan added this to the Sprint 23 milestone Feb 13, 2025
@Wendong-Fan Wendong-Fan marked this pull request as ready for review February 13, 2025 07:21
@Wendong-Fan Wendong-Fan linked an issue Feb 13, 2025 that may be closed by this pull request
@Wendong-Fan (Member, Author) commented Feb 13, 2025

  1. environment base module research & design @lightaime (camel), @cherlix, @SwordFaith
  2. dataset base module research & design @lightaime (camel), @zekun Wang, @shuyhere
  3. verifier base module research & design @Wendong-Fan (camel), @jiemingcheng-hub, @astitwlathe, @For-Chance

then implementation

then talk to other domain teams to unify the interface
each person takes one domain and provides guidance

Target DDL:
18th Feb

@GitHoobar GitHoobar self-requested a review February 14, 2025 04:26
@GitHoobar (Collaborator) left a comment

@Wendong-Fan Looks promising! Left some comments~ (considering future work)

@hallerite (Collaborator) commented Feb 15, 2025

@Wendong-Fan nice work, but I think it could be improved.

From my understanding, the verifiers will be used in two main ways:

  1. to filter synthetic data
  2. to compute a reward signal for RLVR

In both cases, most of the complexity should probably be encapsulated in the environment class. A verifier should mostly just be a function that we can call with an extracted part of the LLM response and that returns "correct", "incorrect", or "timeout" (so we can retry or discard the data point in RLVR). We may even need an extra class for extractors, since extraction can get quite complicated.

If we implement it the way it is currently done, calling the verifier will require defining a lot of things that we will probably have to define for the environment and the data-filtering pipeline anyway, leading to a lot of duplicated code.

Further, I think it makes sense to have one verifier instantiated in the environment and keeping state (as is currently implemented); this is especially important when setting up the verifier is non-trivial. Hence, a setup function should probably be added that does nothing unless overridden, but can be used to start the verifier backend and make sure there is a connection if the verifier is an online service. The same goes for a teardown function to free processes.
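
Roughly, the slimmer interface I have in mind would look like this (all names purely illustrative, nothing here exists in the PR):

```python
# Purely illustrative sketch of the slimmer interface described above;
# none of these names exist in the PR.
from typing import Literal

Verdict = Literal["correct", "incorrect", "timeout"]


class SlimVerifier:
    def setup(self) -> None:
        """No-op by default; override to start a backend or open a connection."""

    def teardown(self) -> None:
        """No-op by default; override to free processes or close connections."""

    def verify(self, extracted_answer: str, reference: str) -> Verdict:
        """Called by the environment with an already-extracted answer."""
        return "correct" if extracted_answer == reference else "incorrect"
```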

What do you guys think? @Wendong-Fan @lightaime

Happy to discuss.

@GitHoobar (Collaborator) commented Feb 16, 2025

> @Wendong-Fan nice work, but I think it could be improved. […]

Hey @hallerite,
Ahh, the current verifier implementation actually serves a different primary purpose: domain-specific verification of LLM responses (MathVerifier, CodeVerifier, etc.). The verifiers are designed to validate outputs against domain-specific criteria, provide detailed verification results, and handle complex verification logic.

Did you think of it as something different or am I missing something? Your points about setup/teardown lifecycle and extraction logic are interesting - how do you see these fitting with domain-specific verification needs?
cc: @Wendong-Fan @lightaime

Development

Successfully merging this pull request may close these issues.

[Feature Request] Implement BaseVerifier
3 participants