
Conversation

@mainred (Collaborator) commented Dec 29, 2025

Background: when HolmesGPT is deployed as a pod in a Kubernetes cluster, we want to share the model configuration via /etc/holmes/config/model_list.yaml and the toolset (MCP) configuration via /etc/holmes/config/custom_toolset.yaml.

Two main changes are included:

  • Assign the MCP type to MCP servers when they are loaded from YAML, so an MCP server can be loaded and used without extra changes (see the sketch after this list)
  • Share MODEL_LIST_FILE_LOCATION, which is used in server mode
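
For illustration, here is a minimal sketch of the idea behind the first change, assuming a hypothetical /etc/holmes/config/custom_toolset.yaml layout; the mcp_servers field names (description, url) and the literal "mcp" type string are assumptions, not the actual Holmes schema:

import yaml  # pip install pyyaml

# Hypothetical config content; only the top-level toolsets/mcp_servers split
# mirrors the PR description, everything else is illustrative.
EXAMPLE_CONFIG = """
toolsets:
  kubernetes/logs:
    enabled: true
mcp_servers:
  my_mcp_server:
    description: "Example MCP server"
    url: "http://mcp-server.default.svc.cluster.local:8000"
"""

def tag_mcp_entries(raw_yaml: str) -> dict:
    # Parse the YAML and mark every mcp_servers entry with an "mcp" type so
    # it can flow through the same loading path as ordinary toolsets.
    parsed = yaml.safe_load(raw_yaml) or {}
    toolsets = dict(parsed.get("toolsets") or {})
    for name, definition in (parsed.get("mcp_servers") or {}).items():
        definition["type"] = "mcp"  # stands in for ToolsetType.MCP
        toolsets[name] = definition
    return toolsets

print(tag_mcp_entries(EXAMPLE_CONFIG))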

Summary by CodeRabbit

  • New Features

    • MCP servers can now be configured in YAML toolset files
    • Loaded toolsets now track their source file path
  • Chores

    • Expanded publicly available toolset classes for integration support
    • Updated test utilities to support MCP toolset validation


github-actions bot (Contributor) commented Dec 29, 2025

Results of HolmesGPT evals

Duration: 4m 28s | View workflow logs

  • ask_holmes: 9/9 test cases were successful, 0 regressions
Test suite | Test case | Status
ask | 09_crashpod | ✅
ask | 101_loki_historical_logs_pod_deleted | ✅
ask | 111_pod_names_contain_service | ✅
ask | 12_job_crashing | ✅
ask | 162_get_runbooks | ✅
ask | 176_network_policy_blocking_traffic_no_runbooks | ✅
ask | 24_misconfigured_pvc | ✅
ask | 43_current_datetime_from_prompt | ✅
ask | 61_exact_match_counting | ✅

Legend

  • ✅ the test was successful
  • ➖ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • 🚫 the test was throttled by API rate limits/overload
  • ❌ the test failed and should be fixed before merging the PR

🔄 Re-run evals manually

Option 1: Comment on this PR with /eval:

/eval

Or with options (one per line):

/eval
model: gpt-4o
filter: 09_crashpod
iterations: 5
Option | Description
model | Model(s) to test (default: same as automatic runs)
markers | Pytest markers (default: regression)
filter | Pytest -k filter
iterations | Number of runs, max 10

Option 2: Trigger via GitHub Actions UI → "Run workflow"

coderabbitai bot (Contributor) commented Dec 29, 2025

Walkthrough

The PR introduces MCP (Model Context Protocol) server support to the toolset loading system, allowing toolsets to be loaded from YAML files with an mcp_servers block. It adds path attributes to loaded toolsets, expands public exports with additional toolset classes, and adjusts test mocking to permit MCP-type toolsets.

Changes

  • MCP Toolset Loading & Exports (holmes/plugins/toolsets/__init__.py)
    Adds MCP server recognition in load_toolsets_from_file() to parse the mcp_servers block and inject each entry as a toolset with type=ToolsetType.MCP. Sets a path attribute on all loaded toolsets pointing to the source YAML file. Adds public imports for DatadogGeneralToolset, DatadogTracesToolset, CoreInvestigationToolset, and relocates GrafanaToolset. Reorders USE_LEGACY_KUBERNETES_LOGS in env var imports.
  • Test Validation & Infrastructure (tests/llm/utils/mock_toolset.py)
    Modifies toolset validation to permit MCP-type toolsets referencing non-builtin names by adding a definition.type != ToolsetType.MCP condition before raising the validation error. Reorganizes imports (threading, urllib, Path, ToolsetType, load_mock_dal).
  • Test Import Reorganizations (tests/llm/utils/test_case_utils.py, tests/llm/utils/test_env_vars.py)
    Reorganizes imports in test_case_utils.py to explicitly include yaml, pydantic utilities, and new references (Config, MODEL_LIST_FILE_LOCATION, DefaultLLM, ResourceInstructions, CLASSIFIER_MODEL, MODEL). Removes the MODEL_LIST_FILE_LOCATION environment variable definition from test_env_vars.py.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested reviewers

  • moshemorad
  • RoiGlinik
  • arikalon1

Pre-merge checks

❌ Failed checks (1 warning, 1 inconclusive)

  • Docstring Coverage (⚠️ Warning): Docstring coverage is 50.00%, which is insufficient; the required threshold is 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
  • Title check (❓ Inconclusive): The title mentions "support running evals in containerized holmesgpt", but the actual changes focus on supporting MCP servers in YAML configuration and sharing model/toolset files for containerized deployments, not specifically evaluations. Resolution: consider revising the title to reflect the actual focus (supporting containerized deployment with shared model/toolset configuration files), or clarify whether "evals" refers to a specific internal concept.

✅ Passed checks (1 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit’s high-level summary is enabled.

Comment @coderabbitai help to get the list of available commands and usage tips.

github-actions bot (Contributor) commented Dec 29, 2025

Docker image ready for 643e773 (built in 4m 7s)

⚠️ Warning: does not support ARM (ARM images are built on release only - not on every PR)

Use this tag to pull the image for testing.

📋 Copy commands

⚠️ Temporary images are deleted after 30 days. Copy to a permanent registry before using them:

gcloud auth configure-docker us-central1-docker.pkg.dev
docker pull us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:643e773
docker tag us-central1-docker.pkg.dev/robusta-development/temporary-builds/holmes:643e773 me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:643e773
docker push me-west1-docker.pkg.dev/robusta-development/development/holmes-dev:643e773

Patch Helm values in one line (choose the chart you use):

HolmesGPT chart:

helm upgrade --install holmesgpt ./helm/holmes \
  --set registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set image=holmes-dev:643e773

Robusta wrapper chart:

helm upgrade --install robusta robusta/robusta \
  --reuse-values \
  --set holmes.registry=me-west1-docker.pkg.dev/robusta-development/development \
  --set holmes.image=holmes-dev:643e773

from holmes.plugins.toolsets.azure_sql.azure_sql_toolset import AzureSQLToolset
from holmes.plugins.toolsets.bash.bash_toolset import BashExecutorToolset
from holmes.plugins.toolsets.coralogix.toolset_coralogix import CoralogixToolset
from holmes.plugins.toolsets.datadog.toolset_datadog_general import (
@mainred (Collaborator, Author) commented:
These package import-sequence changes can be ignored if we introduce #1252.

@mainred changed the title from "Chore: support running evals in containerized holmesgpt" to "test: support running evals in containerized holmesgpt" on Dec 29, 2025
coderabbitai bot (Contributor) left a comment
Actionable comments posted: 0

🧹 Nitpick comments (1)
holmes/plugins/toolsets/__init__.py (1)

149-155: Minor typos in comments.

The logic is correct—builtin toolsets are properly typed and their paths are cleared to avoid exposing internal paths.

🔎 Proposed fix for typos
     # disable built-in toolsets by default, and the user can enable them explicitly in config.
     for toolset in all_toolsets:
-        # It's safe to set type here as we don't have mcp build-in toolsets.
+        # It's safe to set type here as we don't have mcp built-in toolsets.
         toolset.type = ToolsetType.BUILTIN
-        # dont' expose build-in toolsets path
+        # don't expose built-in toolsets path
         toolset.path = None
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e6ae36b and b281711.

📒 Files selected for processing (4)
  • holmes/plugins/toolsets/__init__.py
  • tests/llm/utils/mock_toolset.py
  • tests/llm/utils/test_case_utils.py
  • tests/llm/utils/test_env_vars.py
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml)
Type hints required (mypy configuration in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods

Files:

  • tests/llm/utils/test_case_utils.py
  • tests/llm/utils/mock_toolset.py
  • tests/llm/utils/test_env_vars.py
  • holmes/plugins/toolsets/__init__.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Tests: match source structure under tests/

Files:

  • tests/llm/utils/test_case_utils.py
  • tests/llm/utils/mock_toolset.py
  • tests/llm/utils/test_env_vars.py
holmes/plugins/toolsets/**

📄 CodeRabbit inference engine (CLAUDE.md)

Toolsets: organize as holmes/plugins/toolsets/{name}.yaml or {name}/ directories

Files:

  • holmes/plugins/toolsets/__init__.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: New toolsets require integration tests
📚 Learning: 2025-12-29T08:35:37.668Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: New toolsets require integration tests

Applied to files:

  • tests/llm/utils/mock_toolset.py
📚 Learning: 2025-12-29T08:35:37.668Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: Applies to holmes/plugins/toolsets/** : Toolsets: organize as `holmes/plugins/toolsets/{name}.yaml` or `{name}/` directories

Applied to files:

  • tests/llm/utils/mock_toolset.py
  • holmes/plugins/toolsets/__init__.py
📚 Learning: 2025-12-29T08:35:37.668Z
Learnt from: CR
Repo: HolmesGPT/holmesgpt PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-12-29T08:35:37.668Z
Learning: Applies to holmes/plugins/toolsets/**/*.yaml : All toolsets MUST return detailed error messages from underlying APIs to enable LLM self-correction, including exact query/command executed, time ranges and parameters used, and full API error response

Applied to files:

  • tests/llm/utils/mock_toolset.py
🧬 Code graph analysis (3)
tests/llm/utils/test_case_utils.py (6)
holmes/config.py (1)
  • Config (46-526)
holmes/core/llm.py (2)
  • DefaultLLM (138-527)
  • models (681-683)
holmes/core/models.py (2)
  • InvestigateRequest (94-107)
  • WorkloadHealthRequest (219-229)
holmes/core/prompt.py (1)
  • append_file_to_user_prompt (9-13)
holmes/core/resource_instruction.py (1)
  • ResourceInstructions (15-17)
tests/llm/utils/constants.py (1)
  • get_allowed_tags_list (47-49)
tests/llm/utils/mock_toolset.py (4)
holmes/core/tools.py (1)
  • ToolsetType (152-155)
holmes/plugins/toolsets/__init__.py (2)
  • load_builtin_toolsets (126-156)
  • load_toolsets_from_file (57-79)
tests/llm/utils/mock_dal.py (1)
  • load_mock_dal (225-254)
tests/llm/utils/test_case_utils.py (1)
  • HolmesTestCase (113-144)
holmes/plugins/toolsets/__init__.py (3)
holmes/core/tools.py (2)
  • Toolset (542-786)
  • ToolsetType (152-155)
holmes/plugins/toolsets/datadog/toolset_datadog_general.py (1)
  • DatadogGeneralToolset (198-291)
holmes/plugins/toolsets/investigator/core_investigation.py (1)
  • CoreInvestigationToolset (134-154)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build
  • GitHub Check: llm_evals
  • GitHub Check: build (3.12)
  • GitHub Check: build (3.11)
  • GitHub Check: build (3.10)
🔇 Additional comments (8)
tests/llm/utils/test_env_vars.py (1)

1-6: LGTM!

Removing MODEL_LIST_FILE_LOCATION from this test environment-variables file and relocating its import to test_case_utils.py (from holmes.core.llm) aligns with the PR objective of exposing and sharing this constant for containerized deployments.

tests/llm/utils/test_case_utils.py (3)

8-18: LGTM!

The import reorganization properly centralizes MODEL_LIST_FILE_LOCATION from holmes.core.llm, enabling shared model configuration for containerized deployments. The additional imports (Config, DefaultLLM, ResourceInstructions) support the new model list handling functions added later in the file.


37-84: LGTM!

The model list handling functions are well-structured:

  • _model_list_exists() provides clear warning when the env var is set but file doesn't exist
  • _get_models_from_model_list() properly falls back when model list is unavailable
  • _filter_models_from_env() validates that requested models exist with helpful error messaging
  • get_models() correctly requires CLASSIFIER_MODEL when multiple models are specified
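
A rough sketch of how these helpers might fit together, assuming the environment-variable names mentioned in the review (MODEL_LIST_FILE_LOCATION, MODEL, CLASSIFIER_MODEL); the real functions in test_case_utils.py may have different signatures and behavior, and the fallback default model below is illustrative only:

import logging
import os

import yaml  # pip install pyyaml


def _model_list_exists() -> bool:
    # Warn when the env var points at a missing file; otherwise report
    # whether a shared model list is available at all.
    path = os.environ.get("MODEL_LIST_FILE_LOCATION")
    if path and not os.path.isfile(path):
        logging.warning("MODEL_LIST_FILE_LOCATION is set but %s does not exist", path)
        return False
    return bool(path)


def _get_models_from_model_list() -> list[str]:
    # Assumes model_list.yaml is a mapping of model name -> config.
    with open(os.environ["MODEL_LIST_FILE_LOCATION"]) as f:
        return list((yaml.safe_load(f) or {}).keys())


def _filter_models_from_env(available: list[str]) -> list[str]:
    requested = [m for m in os.environ.get("MODEL", "").split(",") if m]
    missing = [m for m in requested if m not in available]
    if missing:
        raise ValueError(f"requested models {missing} are not in the model list {available}")
    return requested or available


def get_models() -> list[str]:
    if _model_list_exists():
        models = _filter_models_from_env(_get_models_from_model_list())
    else:
        models = [m for m in os.environ.get("MODEL", "").split(",") if m] or ["gpt-4o"]  # illustrative default
    if len(models) > 1 and not os.environ.get("CLASSIFIER_MODEL"):
        raise ValueError("CLASSIFIER_MODEL must be set when multiple models are specified")
    return models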

549-553: LGTM!

The create_eval_llm function cleanly delegates to Config._get_llm when the model list file exists, enabling containerized deployments to use shared model configuration, while providing a simple fallback for standalone usage.

tests/llm/utils/mock_toolset.py (2)

7-28: LGTM!

The import additions (threading, urllib, Path, ToolsetType, load_mock_dal) support the MCP toolset handling and are properly organized at the top of the file per coding guidelines.


704-710: LGTM!

The validation logic correctly allows MCP toolsets to bypass the builtin name check. Since MCP toolsets are external servers (not references to builtin toolsets), they should be permitted regardless of whether their names match existing builtins.
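
A simplified sketch of the guard this comment refers to; the enum values, function name, and exception type below are stand-ins for illustration, not the actual mock_toolset.py code:

from enum import Enum


class ToolsetType(str, Enum):
    # Reduced stand-in for holmes.core.tools.ToolsetType; values are assumed.
    BUILTIN = "built_in"
    MCP = "mcp"


def check_toolset_name(name: str, definition_type: ToolsetType, builtin_names: set[str]) -> None:
    # Reject unknown names only when the definition is not an MCP server,
    # since MCP servers are external and need not match a builtin toolset.
    if name not in builtin_names and definition_type != ToolsetType.MCP:
        raise ValueError(f"toolset {name!r} does not reference a builtin toolset")


check_toolset_name("my_mcp_server", ToolsetType.MCP, {"kubernetes/logs"})  # passes silently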

holmes/plugins/toolsets/__init__.py (2)

20-36: LGTM!

The new imports for DatadogGeneralToolset, DatadogTracesToolset, GrafanaToolset, and CoreInvestigationToolset are correctly organized at the top of the file and align with the toolset ecosystem expansion.


68-78: Approve with minor note on potential name collision.

The MCP server loading logic correctly:

  1. Reads the mcp_servers block from YAML
  2. Assigns the ToolsetType.MCP type to each entry
  3. Merges them into toolsets_dict for unified processing
  4. Sets the source path for traceability

Note that if an MCP server name matches an entry in toolsets, the MCP server will silently overwrite it. This may be intentional (allowing MCP to take precedence), but if explicit warnings for collisions are desired, consider logging a warning when name is already present in toolsets_dict before the assignment, as in the sketch below.
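
A minimal sketch of that warning, with variable names mirroring the comment above rather than the actual load_toolsets_from_file code:

import logging
from typing import Any


def merge_mcp_servers(
    toolsets_dict: dict[str, Any],
    mcp_servers: dict[str, Any],
    path: str,
) -> dict[str, Any]:
    # Log a warning before an mcp_servers entry silently replaces a toolset
    # of the same name parsed from the toolsets block.
    for name, definition in mcp_servers.items():
        if name in toolsets_dict:
            logging.warning("MCP server %r overrides a toolset of the same name defined in %s", name, path)
        toolsets_dict[name] = definition
    return toolsets_dict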

THIS_DIR = os.path.abspath(os.path.dirname(__file__))


def load_toolsets_from_file(
A collaborator commented:
There are two implementations that load toolset files: one in the toolset_manager, which also loads MCP servers, and this one, which until now has only been used to load builtin toolsets.

I think we should either implement the toolset file loading once and reuse it, or keep them separate and not introduce the MCP logic into the builtin loading function (see the sketch below for the first option).
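
One possible shape of a shared parser that both call sites could reuse; the function and parameter names here are hypothetical, and the "mcp" string stands in for ToolsetType.MCP:

from typing import Any

import yaml  # pip install pyyaml


def parse_toolsets_file(path: str, include_mcp: bool = False) -> dict[str, Any]:
    # Shared parse step: builtin loading would call this with include_mcp=False,
    # while the toolset_manager path would pass include_mcp=True.
    with open(path) as f:
        parsed = yaml.safe_load(f) or {}
    toolsets: dict[str, Any] = dict(parsed.get("toolsets") or {})
    if include_mcp:
        for name, definition in (parsed.get("mcp_servers") or {}).items():
            definition["type"] = "mcp"
            toolsets[name] = definition
    return toolsets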

@mainred (Collaborator, Author) replied:
OK, I was trying to merge the toolset loading for both places, but the one in toolset_manager requires built_toolsets as context.
I can revert the change in load_toolsets_from_file since it is used specifically for built-in toolsets.
