test: support running evals in containerized holmesgpt #1261
base: master
Conversation
Results of HolmesGPT evals. Duration: 4m 28s | View workflow logs
🔄 Re-run evals manually
- Option 1: Comment on this PR (optionally with options, one per line)
- Option 2: Trigger via GitHub Actions UI → "Run workflow"
Walkthrough

The PR introduces MCP (Model Context Protocol) server support to the toolset loading system, allowing toolsets to be loaded from YAML files with an mcp_servers block. It adds path attributes to loaded toolsets, expands public exports with additional toolset classes, and adjusts test mocking to permit MCP-type toolsets.
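As a rough sketch of what such a YAML file might look like (the `mcp_servers` and `toolsets` keys come from this PR's discussion; every field inside the block is a hypothetical illustration, not the verified schema):

```yaml
# Hypothetical toolset file with an mcp_servers block; field names inside
# the block are illustrative assumptions, not the confirmed schema.
toolsets: {}

mcp_servers:
  example_server:
    description: "An external MCP server"
    url: "http://example-mcp.default.svc.cluster.local:8000"
```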
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs
Suggested reviewers
Pre-merge checks
- ❌ Failed checks (1 warning, 1 inconclusive)
- ✅ Passed checks (1 passed)
✅ Docker image ready

Use this tag to pull the image for testing.

📋 Copy commands:

```shell
gcloud auth configure-docker us-central1-docker.pkg.dev
docker pull us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:643e773
docker tag us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:643e773 me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:643e773
docker push me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:643e773
```

Patch Helm values in one line (choose the chart you use):

HolmesGPT chart:

```shell
helm upgrade --install holmesgpt ./helm/holmes \
  --set registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set image=holmes-dev:643e773
```

Robusta wrapper chart:

```shell
helm upgrade --install robusta robusta/robusta \
  --reuse-values \
  --set holmes.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set holmes.image=holmes-dev:643e773
```
```python
from holmes.plugins.toolsets.azure_sql.azure_sql_toolset import AzureSQLToolset
from holmes.plugins.toolsets.bash.bash_toolset import BashExecutorToolset
from holmes.plugins.toolsets.coralogix.toolset_coralogix import CoralogixToolset
from holmes.plugins.toolsets.datadog.toolset_datadog_general import (
```
This package import sequence change can be ignored if we introduce #1252.
Actionable comments posted: 0
🧹 Nitpick comments (1)
holmes/plugins/toolsets/__init__.py (1)
149-155: Minor typos in comments. The logic is correct: builtin toolsets are properly typed and their paths are cleared to avoid exposing internal paths.
🔎 Proposed fix for typos

```diff
 # disable built-in toolsets by default, and the user can enable them explicitly in config.
 for toolset in all_toolsets:
-    # It's safe to set type here as we don't have mcp build-in toolsets.
+    # It's safe to set type here as we don't have mcp built-in toolsets.
     toolset.type = ToolsetType.BUILTIN
-    # dont' expose build-in toolsets path
+    # don't expose built-in toolsets path
     toolset.path = None
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- holmes/plugins/toolsets/__init__.py
- tests/llm/utils/mock_toolset.py
- tests/llm/utils/test_case_utils.py
- tests/llm/utils/test_env_vars.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
- tests/llm/utils/test_case_utils.py
- tests/llm/utils/mock_toolset.py
- tests/llm/utils/test_env_vars.py
- holmes/plugins/toolsets/__init__.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Tests: match source structure under tests/
Files:
- tests/llm/utils/test_case_utils.py
- tests/llm/utils/mock_toolset.py
- tests/llm/utils/test_env_vars.py
holmes/plugins/toolsets/**
📄 CodeRabbit inference engine (CLAUDE.md)
Toolsets: organize as holmes/plugins/toolsets/{name}.yaml or {name}/ directories
Files:
holmes/plugins/toolsets/__init__.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: New toolsets require integration tests
📚 Learning: 2025-12-29T08:35:37.668Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: New toolsets require integration tests
Applied to files:
tests/llm/utils/mock_toolset.py
📚 Learning: 2025-12-29T08:35:37.668Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: Applies to holmes/plugins/toolsets/** : Toolsets: organize as `holmes/plugins/toolsets/{name}.yaml` or `{name}/` directories
Applied to files:
- tests/llm/utils/mock_toolset.py
- holmes/plugins/toolsets/__init__.py
📚 Learning: 2025-12-29T08:35:37.668Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : All toolsets MUST return detailed error messages from underlying APIs to enable LLM self-correction, including exact query/command executed, time ranges and parameters used, and full API error response
Applied to files:
tests/llm/utils/mock_toolset.py
🧬 Code graph analysis (3)
tests/llm/utils/test_case_utils.py (6)
- holmes/config.py (1): Config (46-526)
- holmes/core/llm.py (2): DefaultLLM (138-527), models (681-683)
- holmes/core/models.py (2): InvestigateRequest (94-107), WorkloadHealthRequest (219-229)
- holmes/core/prompt.py (1): append_file_to_user_prompt (9-13)
- holmes/core/resource_instruction.py (1): ResourceInstructions (15-17)
- tests/llm/utils/constants.py (1): get_allowed_tags_list (47-49)
tests/llm/utils/mock_toolset.py (4)
- holmes/core/tools.py (1): ToolsetType (152-155)
- holmes/plugins/toolsets/__init__.py (2): load_builtin_toolsets (126-156), load_toolsets_from_file (57-79)
- tests/llm/utils/mock_dal.py (1): load_mock_dal (225-254)
- tests/llm/utils/test_case_utils.py (1): HolmesTestCase (113-144)
holmes/plugins/toolsets/__init__.py (3)
- holmes/core/tools.py (2): Toolset (542-786), ToolsetType (152-155)
- holmes/plugins/toolsets/datadog/toolset_datadog_general.py (1): DatadogGeneralToolset (198-291)
- holmes/plugins/toolsets/investigator/core_investigation.py (1): CoreInvestigationToolset (134-154)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: build
- GitHub Check: llm_evals
- GitHub Check: build (3.12)
- GitHub Check: build (3.11)
- GitHub Check: build (3.10)
🔇 Additional comments (8)
tests/llm/utils/test_env_vars.py (1)
1-6: LGTM! The removal of MODEL_LIST_FILE_LOCATION from this test environment variables file and relocating its import to test_case_utils.py (from holmes.core.llm) aligns with the PR objective of exposing/sharing this constant for containerized deployments.

tests/llm/utils/test_case_utils.py (3)
8-18: LGTM! The import reorganization properly centralizes MODEL_LIST_FILE_LOCATION from holmes.core.llm, enabling shared model configuration for containerized deployments. The additional imports (Config, DefaultLLM, ResourceInstructions) support the new model list handling functions added later in the file.
37-84: LGTM! The model list handling functions are well-structured:
- _model_list_exists() provides a clear warning when the env var is set but the file doesn't exist
- _get_models_from_model_list() properly falls back when the model list is unavailable
- _filter_models_from_env() validates that requested models exist, with helpful error messaging
- get_models() correctly requires CLASSIFIER_MODEL when multiple models are specified
549-553: LGTM! The create_eval_llm function cleanly delegates to Config._get_llm when the model list file exists, enabling containerized deployments to use shared model configuration, while providing a simple fallback for standalone usage.

tests/llm/utils/mock_toolset.py (2)
7-28: LGTM! The import additions (threading, urllib, Path, ToolsetType, load_mock_dal) support the MCP toolset handling and are properly organized at the top of the file per coding guidelines.
704-710: LGTM! The validation logic correctly allows MCP toolsets to bypass the builtin name check. Since MCP toolsets are external servers (not references to builtin toolsets), they should be permitted regardless of whether their names match existing builtins.
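A minimal sketch of that bypass, with simplified stand-ins for ToolsetType and the name check (the real mock_toolset.py logic is more involved):

```python
# Simplified stand-in for the MCP bypass in the name-validation check;
# ToolsetType mirrors holmes/core/tools.py, the rest is illustrative.
from enum import Enum


class ToolsetType(str, Enum):
    BUILTIN = "builtin"
    MCP = "mcp"


def check_toolset_name(name: str, kind: ToolsetType, builtin_names: set[str]) -> None:
    """Raise for unknown names, except for MCP toolsets, which are external."""
    if kind == ToolsetType.MCP:
        return  # external server: no builtin name to match
    if name not in builtin_names:
        raise ValueError(f"Unknown builtin toolset: {name}")
```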
holmes/plugins/toolsets/__init__.py (2)
20-36: LGTM!The new imports for
DatadogGeneralToolset,DatadogTracesToolset,GrafanaToolset, andCoreInvestigationToolsetare correctly organized at the top of the file and align with the toolset ecosystem expansion.
68-78: Approve with minor note on potential name collision. The MCP server loading logic correctly:
- Reads the mcp_servers block from YAML
- Assigns the ToolsetType.MCP type to each entry
- Merges them into toolsets_dict for unified processing
- Sets the source path for traceability

Note that if an MCP server name matches an entry in toolsets, the MCP server will silently overwrite it. This may be intentional (allowing MCP to take precedence), but if explicit warnings for collisions are desired, consider logging when name in toolsets_dict before assignment.
```python
THIS_DIR = os.path.abspath(os.path.dirname(__file__))


def load_toolsets_from_file(
```
There are two implementations that load toolset files: one in toolset_manager, which also loads MCP toolsets, and this one, which until now was used only to load builtin toolsets. I think we should either implement the toolset file loading once and reuse it, or keep them separate but not introduce the MCP logic into the builtin loading function.
OK, I was trying to merge the toolset loading for both places, but the one in toolset_manager requires built_toolsets as context.
I can revert the change in load_toolsets_from_file since it's used specifically for built-in toolsets.
The background is that when HolmesGPT is deployed as a pod in a k8s cluster, we want to share the model configuration via /etc/holmes/config/model_list.yaml and the toolset (MCP) configuration via /etc/holmes/config/custom_toolset.yaml.
Two main changes are included:
Summary by CodeRabbit
New Features
Chores