feat: Initial implementation for BaseVerifier #1597

Wendong-Fan · 2025-02-13T07:20:28Z

RFC

📋 Problem

The CAMEL framework needs a robust, extensible verification system to validate and assess the quality of LLM responses across various task types. The verifiers module provides this critical functionality.

✔️ Goals

Provide a flexible, extensible framework for verifying LLM outputs
Support both synchronous and asynchronous verification workflows
Enable batch processing with controlled concurrency
Implement comprehensive error handling and retry mechanisms
Track detailed performance metrics and verification statistics
Support resource-aware execution with system monitoring
Enable verification across diverse task types (math, logic, software engineering, etc.)

❌ NonGoals

Implement specific verification logic (left to concrete verifier implementations)
Handle model-specific preprocessing or post-processing
Manage model setting or scaling

💡 Solution

The verifiers module implements a base verification framework with the following key components:

BaseVerifier: Abstract base class providing core verification infrastructure
- Configurable retry logic and timeout handling
- Resource monitoring and adaptive concurrency
- Metrics collection and performance tracking
VerificationResult: Structured output format containing:
- Verification status (success/failure/error/timeout)
- Performance metrics and scores
- Error messages and metadata
- Timestamps and duration tracking
VerifierConfig: Configuration management for:
- Timeout and retry settings
- Strict mode validation
- Resource thresholds
Support for diverse task types through TaskType enumeration

Performance Impact

Controlled concurrency through max_parallel setting
Resource monitoring (CPU/Memory thresholds)
Batch processing capabilities
Configurable timeouts and retry mechanisms
Performance metrics tracking

🎨 Alternatives

No alternative solutions currently exist for this upgrade.

📃 Work Estimates

Core Framework Implementation (2 days)
- Base verifier class
- Result and configuration models
- Error handling infrastructure
Testing Infrastructure (1 days)
- Unit tests
- Integration tests
- Performance benchmarks
Documentation (0.5 days)
- API documentation
- Usage examples

🗂️ Future Work

Caching layer for verification results
Improved batch processing algorithms
Enhanced resource monitoring
Integration with monitoring systems
Extended task type support

❓ FAQ

Q: How do I implement a custom verifier? A: Extend the BaseVerifier class and implement the _verify_implementation method.

Q: How does batch verification work? A: Use the verify_batch method with a list of responses. Concurrency is automatically managed.

Q: What happens if verification fails? A: The system provides detailed error information and can retry based on configuration.

Q: How can I monitor verification performance? A: Use the VerificationMetrics class which tracks success rates, durations, and error counts.

Q: Can I disable verification for certain scenarios? A: Yes, use VerifierConfig to enable/disable verification and configure behavior.

Checklist

Go over all the following points, and put an x in all the boxes that apply.

I have read the CONTRIBUTION guide (required)
I have linked this PR to an issue using the Development section on the right sidebar or by adding Fixes #issue-number in the PR description (required)
I have checked if any dependencies need to be added or updated in pyproject.toml and poetry.lock
I have updated the tests accordingly (required for a bug fix or a new feature)
I have updated the documentation if needed:
I have added examples if this is a new feature

If you are unsure about any of these, don't hesitate to ask. We are here to help!

Wendong-Fan · 2025-02-13T14:49:07Z

environment base module research & design @lightaime (camel), @cherlix, @SwordFaith
dataset base module research & design @lightaime (camel), @zekun Wang, @shuyhere
verifier base module research & design @Wendong-Fan (camel), @jiemingcheng-hub, @astitwlathe, @For-Chance

then implementation

then talk to other domain teams to unify the interface
each one takes one domain and provide guidance

Target DDL:
18th Feb

GitHoobar

@Wendong-Fan Looks promising! Left some comments~ (considering future work)

camel/verifiers/base.py

camel/verifiers/models.py

old-hallerite · 2025-02-15T21:22:10Z

@Wendong-Fan nice work, but I think it could be improved.

From my understanding, the verifiers will be used in two main ways:

to filter synthetic data
to get a reward from RLVR

In both cases, most of the complexity should probably be encapsulated in the environment class and a verifier should mostly just be a function that we can call with an extracted part of the LLM response (we may even need an extra class for extractors, as extraction can get quite complicated) and get a "correct", "incorrect" or "timeout" (to try again or discard the data point in RLVR).

If we implement the way it is currently being done, calling the verifier will take a lot of code for defining everything that we will probably have to define for the environment and data filtering pipeline in the first place, leading to a lot of duplicated code.

Further, I think it makes sense to have one verifier be instantiated in the environment and keeping state (as is currently implemented), which is especially crucial when setting up the verifier is non-trivial, hence a setup function should probably be added that does nothing, if not overridden, but can be used for starting the verifier backend and making sure there is a connection to the verifier if the verifier is an online service. The same thing is true for a tear down function to free processes.

What do you guys think? @Wendong-Fan @lightaime

Happy to discuss.

GitHoobar · 2025-02-16T04:13:35Z

@Wendong-Fan nice work, but I think it could be improved.

From my understanding, the verifiers will be used in two main ways:

to filter synthetic data

to get a reward from RLVR

In both cases, most of the complexity should probably be encapsulated in the environment class and a verifier should mostly just be a function that we can call with an extracted part of the LLM response (we may even need an extra class for extractors, as extraction can get quite complicated) and get a "correct", "incorrect" or "timeout" (to try again or discard the data point in RLVR).

If we implement the way it is currently being done, calling the verifier will take a lot of code for defining everything that we will probably have to define for the environment and data filtering pipeline in the first place, leading to a lot of duplicated code.

Further, I think it makes sense to have one verifier be instantiated in the environment and keeping state (as is currently implemented), which is especially crucial when setting up the verifier is non-trivial, hence a setup function should probably be added that does nothing, if not overridden, but can be used for starting the verifier backend and making sure there is a connection to the verifier if the verifier is an online service. The same thing is true for a tear down function to free processes.

What do you guys think? @Wendong-Fan @lightaime

Happy to discuss.

Hey @hallerite,
Ahh, the current verifier implementation actually serves a different primary purpose - domain-specific verification of LLM responses (MathVerifier, CodeVerifier etc). The verifiers are designed to validate outputs against domain-specific criteria, provide detailed verification results, and handle complex verification logic.

Did you think of it as something different or am I missing something? Your points about setup/teardown lifecycle and extraction logic are interesting - how do you see these fitting with domain-specific verification needs?
cc: @Wendong-Fan @lightaime

…1620) Co-authored-by: Rishabh <[email protected]>

old-hallerite

Thanks @Wendong-Fan,
I left some comments.

camel/verifiers/models.py

camel/datasets/base.py

camel/environments/base.py

camel/extractors/base.py

camel/verifiers/base.py

camel/verifiers/models.py

Wendong-Fan · 2025-02-26T17:39:42Z

Thanks @Wendong-Fan, I left some comments.

Thanks @hallerite 's comment, my implementation principle for the base module is to ensure flexibility and broad usability. That's why you'll notice some optional attributes, they allow users to choose whether or not to use them. IMO for the base module, supporting these attributes could be beneficial

Co-authored-by: Wendong <[email protected]>

old-hallerite

This looks good to me now and ready to be merged.

Co-authored-by: Rishabh <[email protected]> Co-authored-by: hallerite <[email protected]>

feat: Initial implementation for BaseVerifier

2937b1a

Wendong-Fan self-assigned this Feb 13, 2025

Wendong-Fan requested a review from lightaime February 13, 2025 07:20

Wendong-Fan added the New Feature label Feb 13, 2025

Wendong-Fan added this to the Sprint 23 milestone Feb 13, 2025

Wendong-Fan marked this pull request as ready for review February 13, 2025 07:21

Wendong-Fan linked an issue Feb 13, 2025 that may be closed by this pull request

[Feature Request] Implement BaseVerifier #1598

Closed

2 tasks

GitHoobar self-requested a review February 14, 2025 04:26

GitHoobar reviewed Feb 14, 2025

View reviewed changes

camel/verifiers/base.py Show resolved Hide resolved

camel/verifiers/base.py Outdated Show resolved Hide resolved

camel/verifiers/models.py Outdated Show resolved Hide resolved

update based on GitHoobar's review comment

33d7564

feat: Initial implementation for env, dataset, extractor base model (#…

3ee00af

…1620) Co-authored-by: Rishabh <[email protected]>

old-hallerite self-requested a review February 26, 2025 15:00

old-hallerite requested changes Feb 26, 2025

View reviewed changes

old-hallerite and others added 2 commits March 4, 2025 12:14

feat: Generative Dataset Base Class (#1652)

57c7723

Co-authored-by: Wendong <[email protected]>

fix: rename files to not confuse pytest

acd928f

old-hallerite self-assigned this Mar 4, 2025

old-hallerite self-requested a review March 4, 2025 11:30

old-hallerite approved these changes Mar 4, 2025

View reviewed changes

Merge branch 'master' into base_verifier

88a708b

Wendong-Fan merged commit 84185a5 into master Mar 4, 2025
6 checks passed

Wendong-Fan deleted the base_verifier branch March 4, 2025 11:38

ZackZikaiXiao pushed a commit to ZackZikaiXiao/camel that referenced this pull request Mar 23, 2025

feat: Initial implementation for BaseVerifier (camel-ai#1597)

9ee7714

Co-authored-by: Rishabh <[email protected]> Co-authored-by: hallerite <[email protected]>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Initial implementation for BaseVerifier #1597

feat: Initial implementation for BaseVerifier #1597

Uh oh!

Wendong-Fan commented Feb 13, 2025 •

edited

Loading

Uh oh!

Wendong-Fan commented Feb 13, 2025 •

edited

Loading

Uh oh!

GitHoobar left a comment •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

old-hallerite commented Feb 15, 2025 •

edited

Loading

Uh oh!

GitHoobar commented Feb 16, 2025 •

edited

Loading

Uh oh!

old-hallerite left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Wendong-Fan commented Feb 26, 2025

Uh oh!

old-hallerite left a comment

Uh oh!

Uh oh!

Uh oh!

feat: Initial implementation for BaseVerifier #1597

feat: Initial implementation for BaseVerifier #1597

Uh oh!

Conversation

Wendong-Fan commented Feb 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

RFC

📋 Problem

✔️ Goals

❌ NonGoals

💡 Solution

Performance Impact

🎨 Alternatives

📃 Work Estimates

🗂️ Future Work

❓ FAQ

Checklist

Uh oh!

Wendong-Fan commented Feb 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

GitHoobar left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

old-hallerite commented Feb 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

GitHoobar commented Feb 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

old-hallerite left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Wendong-Fan commented Feb 26, 2025

Uh oh!

old-hallerite left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Wendong-Fan commented Feb 13, 2025 •

edited

Loading

Wendong-Fan commented Feb 13, 2025 •

edited

Loading

GitHoobar left a comment •

edited

Loading

old-hallerite commented Feb 15, 2025 •

edited

Loading

GitHoobar commented Feb 16, 2025 •

edited

Loading