Conversation
…upport for a new gemini_cli backend: Add Backend
- yields an "error" event with the raw error message instead of a content message, so callers can handle fatal conditions separately.
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Path: .coderabbit.yaml · Review profile: CHILL · Plan: Pro.
📝 Walkthrough

Adds a new Gemini CLI backend (gemini_cli) with workspace/hook IPC, Docker and local subprocess modes, and MCP/workflow support; extends the Copilot backend with Docker-aware permission hooks; introduces native hook adapters for Copilot and Gemini CLI; updates capabilities, CLI wiring, filesystem/docker mounts, validation, docs, and many tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Client
participant GeminiBackend as GeminiCLIBackend
participant GeminiCLI as Gemini CLI
participant HookIPC as .gemini/hook_payload.json
participant HookScript as gemini_cli_hook_script.py
Client->>GeminiBackend: stream_with_tools(messages, tools)
GeminiBackend->>GeminiBackend: write workspace settings.json (hooks)
GeminiBackend->>GeminiCLI: spawn CLI subprocess (stdin)
Note over GeminiCLI,HookScript: CLI invokes BeforeTool/AfterTool hooks
GeminiCLI->>HookScript: execute hook (stdin: tool_event)
HookScript->>HookIPC: read/write hook_payload.json (permission/injection)
HookScript-->>GeminiCLI: emit hook response (approved/denied/inject)
GeminiCLI-->>GeminiBackend: stdout JSON stream events
GeminiBackend->>Client: yield parsed StreamChunk(s)
```

```mermaid
sequenceDiagram
participant Client
participant CopilotBackend
participant CopilotSDK as Copilot SDK
participant PPM as PathPermissionManager
participant MCP as MCP (container)
Client->>CopilotBackend: stream_with_tools(...)
CopilotBackend->>CopilotBackend: build session_hooks (on_pre_tool_use)
CopilotBackend->>CopilotSDK: create_session(hooks)
CopilotSDK->>CopilotBackend: on_pre_tool_use(request)
CopilotBackend->>PPM: pre_tool_use_hook(path/command)
PPM-->>CopilotBackend: allow/deny
alt denied
CopilotBackend-->>CopilotSDK: deny decision
else allowed
CopilotBackend-->>CopilotSDK: allow (or route tool to MCP if docker-mode)
end
CopilotSDK->>Client: stream results
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (13)
massgen/tests/test_concurrent_logging_sessions.py-410-420 (1)
410-420: ⚠️ Potential issue | 🟡 Minor

Add a Google-style docstring to `_slow_save`.

`_slow_save` is a changed function and should include a Google-style docstring for maintainability and guideline compliance.

Suggested patch

```diff
 def _slow_save(self, data):
+    """Persist registry data in two chunks with a delay.
+
+    Args:
+        self: SessionRegistry instance.
+        data: Registry payload to serialize and persist.
+    """
     payload = json.dumps(data, indent=2)
     midpoint = max(1, len(payload) // 2)
     tmp_file = self.registry_path.with_suffix(".slow_tmp")
     with open(tmp_file, "w") as f:
         f.write(payload[:midpoint])
         f.flush()
         time.sleep(0.002)
         f.write(payload[midpoint:])
         f.flush()
     tmp_file.replace(self.registry_path)
```

As per coding guidelines: `**/*.py`: For new or changed functions, include Google-style docstrings.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_concurrent_logging_sessions.py` around lines 410 - 420, Add a Google-style docstring to the _slow_save function describing its purpose, parameters and side effects: explain that it serializes the passed data to JSON with indentation, writes the first half to a temporary file, sleeps to simulate slowness, writes the rest, and atomically replaces the registry file; document the parameter `data` (type: any/serializable), note that it returns None and may raise IO/OSError on file operations, and include a short example or note about the temporary file name if helpful.

massgen/tests/test_task_plan_support.py-13-15 (1)
13-15: ⚠️ Potential issue | 🟡 Minor

Preserve empty-list semantics in `_FakeHost.__init__`.

`if active_tasks` turns `[]` into `None`, so tests can't represent a valid "empty task plan" state.

Suggested fix

```diff
-        self._active_tasks = [dict(t) for t in active_tasks] if active_tasks else None
+        self._active_tasks = (
+            [dict(t) for t in active_tasks] if active_tasks is not None else None
+        )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_task_plan_support.py` around lines 13 - 15, The constructor for _FakeHost incorrectly treats an empty list as None by using "if active_tasks"; change the check to "if active_tasks is not None" so that an explicit empty list is preserved, and keep the defensive copy logic (self._active_tasks = [dict(t) for t in active_tasks] if active_tasks is not None else None) so callers can represent a valid empty-task-plan state; update the __init__ in _FakeHost accordingly.

massgen/tests/test_task_plan_support.py-108-113 (1)
108-113: ⚠️ Potential issue | 🟡 Minor

Assert `plan_updates` is populated before indexing.

Add an explicit check so failures are diagnostic instead of `IndexError`.

Suggested fix

```diff
     assert handled is True
     assert host.pinned_updates, "Expected pinned task plan update"
+    assert host.plan_updates, "Expected host task-plan state update"
     assert host.pinned_updates[-1]["operation"] == "update"
     assert host.pinned_updates[-1]["focused_task_id"] == "task_1"
     assert host.pinned_updates[-1]["tasks"][0]["status"] == "completed"
     assert host.plan_updates[-1]["operation"] == "update"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_task_plan_support.py` around lines 108 - 113, Add an explicit assertion that host.plan_updates is populated before the test indexes its last element: after asserting host.pinned_updates and before accessing host.plan_updates[-1], assert that host.plan_updates (or len(host.plan_updates) > 0) is truthy so failures report a clear assertion instead of raising IndexError when the list is empty; update the test in test_task_plan_support.py where variables handled and host are used to include this check.

massgen/skills/backend-integrator/SKILL.md-669-669 (1)
669-669: ⚠️ Potential issue | 🟡 Minor

Fix markdown formatting issues.
Static analysis flagged several formatting violations:
- Tables at lines 669, 711, 729, 735 need blank lines before and after
- Fenced code block at line 951 needs a language specifier
📝 Proposed fixes
Add blank lines around tables and specify language for code blocks:
````diff
 **Docker mode**: Skip backend sandbox config — the container IS the sandbox. Grant full access inside the container.
+
 ### SDK Wrapper Backend (like Claude Code, Copilot)
-```
+
+```python
 __init__: import SDK, configure options, detect Docker mode
````

Apply similar blank line additions around the tables at lines 669, 711, 729, and 735.
Also applies to: 711-711, 729-729, 735-735, 951-951
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/skills/backend-integrator/SKILL.md` at line 669, The markdown has tables that lack surrounding blank lines and a fenced code block without a language; add a blank line before and after each table occurrence of the header row "| Section | Gate |" (and the other two table blocks referenced) so they are separated from adjacent paragraphs, and change the fenced code block that contains "__init__: import SDK, configure options, detect Docker mode" to use a language specifier (e.g., ```python) on the opening fence; update the three other table occurrences noted (around the same table content) similarly to ensure proper Markdown rendering.

docs/source/user_guide/backends.rst-156-165 (1)
156-165: ⚠️ Potential issue | 🟡 Minor

`gemini_cli` is misclassified as built-in code execution here.

The rest of this page treats Gemini CLI as native shell/filesystem access, not provider-side `enable_code_execution`. Keeping ⭐ in the Code Execution column implies that `enable_code_execution` is valid for this backend when the docs elsewhere only describe shell/bash support.

As per coding guidelines, "Behavior descriptions must match what the code does."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/source/user_guide/backends.rst` around lines 156 - 165, The table entry for gemini_cli incorrectly marks Code Execution as supported (⭐); update the backends.rst table row for gemini_cli to remove or replace the Code Execution star so it reflects native shell/filesystem behavior described elsewhere (i.e., do not advertise provider-side enable_code_execution for gemini_cli); ensure the row now matches the documentation narrative and any other mentions of gemini_cli and enable_code_execution across the docs.

docs/source/reference/yaml_schema.rst-793-797 (1)
793-797: ⚠️ Potential issue | 🟡 Minor

This row shifted the backend note into the `Required` column.

Line 795 now reads like `command_line_docker_network_mode` is "required: Codex/Gemini CLI", while line 796 still says "All with MCP support". Please keep the `Required` cell as `No` and move the backend-specific constraint into the Supported Backends/Description column.

As per coding guidelines, "Ensure proper RST syntax."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/source/reference/yaml_schema.rst` around lines 793 - 797, The table row for `command_line_docker_network_mode` incorrectly put the backend-specific note into the "Required" column; change the "Required" cell back to "No" and move the backend constraint ("Codex, Gemini CLI (Docker mode)") and the note about Docker network mode into the Supported Backends/Description column so the row reads: name `command_line_docker_network_mode`, type `string`, supported backends `Codex, Gemini CLI (Docker mode)` plus the description "Docker network mode: 'bridge', 'host', 'none' (use 'bridge'). All with MCP support", and ensure the RST table cell separators/inline code markup (``command_line_docker_network_mode`` and quoted modes) follow proper RST syntax.

docs/source/user_guide/backends.rst-64-66 (1)
64-66: ⚠️ Potential issue | 🟡 Minor

Remove the duplicated `codex` backend row.

`codex` is already listed earlier in this table, so this second entry makes the backend catalog internally inconsistent and likely displaced the backend that was meant to be added here.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/source/user_guide/backends.rst` around lines 64 - 66, Remove the duplicated backend row that contains "* - ``codex``" in the table: delete that second "codex" entry so the table remains consistent, ensure the subsequent rows (the OpenAI (CLI) / GPT-5.4, GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex, GPT-4.1 lines) are correctly aligned under the intended columns, and verify no other backend was inadvertently displaced by this duplicate removal.

massgen/tests/integration/test_copilot_hook_orchestrator.py-15-24 (1)
15-24: ⚠️ Potential issue | 🟡 Minor

Mark this module as an integration test.

Because this file lives under `massgen/tests/integration/` but none of the tests are tagged `integration`, it won't participate correctly in the repo's opt-in integration test flow.

As per coding guidelines, `{massgen/tests/**/*.py,webui/src/**/*.test.ts}`: Use `@pytest.mark.integration` (opt-in: `--run-integration`), `@pytest.mark.live_api` (opt-in: `--run-live-api`), `@pytest.mark.expensive`, `@pytest.mark.docker`, and `@pytest.mark.asyncio` markers when writing tests.

Suggested change

```diff
 import pytest

+pytestmark = pytest.mark.integration
+
 from massgen.mcp_tools.hooks import (
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/integration/test_copilot_hook_orchestrator.py` around lines 15 - 24, This integration test module isn't tagged for the opt-in integration test run; add a module-level marker by defining pytestmark = pytest.mark.integration (since pytest is already imported) at the top of the file so pytest treats all tests in this module as integration tests; modify or add the pytestmark variable near the existing imports and leave individual test functions (e.g., any tests using GeneralHookManager, CopilotNativeHookAdapter) unchanged.

scripts/test_hook_backends.py-152-155 (1)
152-155: ⚠️ Potential issue | 🟡 Minor

Preserve exception context using chaining.

The ImportError should be chained to the new exception for better debuggability. This aligns with the codebase convention, which uses exception chaining throughout (e.g., in `massgen/__init__.py`, `massgen/cli.py`, `massgen/orchestrator.py`, and others).

Suggested change

```diff
-    except ImportError:
-        raise ValueError("massgen.backend.copilot not available")
+    except ImportError as err:
+        raise ValueError("massgen.backend.copilot not available") from err
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/test_hook_backends.py` around lines 152 - 155, The import failure for COPILOT_SDK_AVAILABLE and CopilotBackend is re-raised as a ValueError without preserving the original ImportError; modify the except block to capture the ImportError (e.g., except ImportError as e) and re-raise the ValueError using exception chaining (raise ValueError("massgen.backend.copilot not available") from e) so the original ImportError context is preserved for debugging.

massgen/orchestrator.py-9701-9703 (1)
9701-9703: ⚠️ Potential issue | 🟡 Minor

Use a Google-style docstring for the new helper method.

`_coerce_answer_content_to_text` is a new changed function but currently has a one-line docstring. Please convert it to Google-style with `Args:` and `Returns:` sections.

As per coding guidelines, `**/*.py`: "For new or changed functions, include Google-style docstrings".
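For illustration only, a minimal sketch of the requested Google-style shape (the parameter name `content` comes from the AI prompt below; the body is elided):

```python
from typing import Any


def _coerce_answer_content_to_text(self, content: Any) -> str:
    """Normalize a structured new_answer payload into plain text.

    Args:
        content: Raw answer content; may already be a string or a
            structured payload (e.g., a list of content blocks).

    Returns:
        str: The answer flattened to a plain-text string.
    """
    ...  # real implementation lives in massgen/orchestrator.py
```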
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/orchestrator.py` around lines 9701 - 9703, The docstring for the helper method _coerce_answer_content_to_text should be expanded to a Google-style docstring: add a short summary line, an Args: section describing the content parameter (type Any and what inputs are expected/handled), and a Returns: section specifying it returns a str (normalized plain-text answer). Keep description concise and place it immediately under the def for _coerce_answer_content_to_text.

massgen/filesystem_manager/_docker_manager.py-115-115 (1)
115-115: ⚠️ Potential issue | 🟡 Minor

Please document `gemini_config` in the constructor credential-mount list.

The feature is implemented, but the `credentials.mount` option list in the constructor docstring still omits it.

📝 Suggested docstring update

```diff
         credentials: Credential management configuration:
             mount: List of credential types to mount
-                ["ssh_keys", "git_config", "gh_config", "npm_config", "pypi_config", "claude_config", "codex_config"]
+                ["ssh_keys", "git_config", "gh_config", "npm_config", "pypi_config", "claude_config", "codex_config", "gemini_config"]
```

Also applies to: 384-393

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/filesystem_manager/_docker_manager.py` at line 115, Update the constructor docstring for the class that sets self.mount_gemini_config to document the new credentials.mount option "gemini_config": locate the constructor docstring where the credentials.mount list/options are documented and add "gemini_config" with a brief description (e.g., mounts Gemini configuration files into the container), and mirror this update in the other constructor/docstring block referenced around the alternative constructor or overload (the region corresponding to lines ~384-393) so both places list "gemini_config" alongside the existing mount options; ensure the term matches the code symbol mount_gemini_config and that the description is concise and consistent.

massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py-231-252 (1)
231-252: ⚠️ Potential issue | 🟡 Minor

Remove duplicate `"directory"` key in path extraction tuple.

The key `"directory"` appears twice in the tuple (lines 236 and 246), making the second occurrence dead code.

🐛 Proposed fix

```diff
     for key in (
         "file_path",
         "absolute_path",
         "path",
         "dir_path",
         "directory",
         "filename",
         "file",
         "notebook_path",
         "target",
         "source",
         "source_path",
         "destination",
         "destination_path",
         "destination_base_path",
-        "directory",
         "dir",
     ):
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py` around lines 231 - 252, The tuple of candidate keys used to extract a path from tool_args contains a duplicate "directory" entry; update the key sequence in the loop that checks tool_args.get(key) (the tuple containing "file_path", "absolute_path", ..., "directory", "dir") to remove the duplicated "directory" so each lookup key is unique, leaving only one "directory" in the list used by the for key in (...) loop and keeping the subsequent logic (isinstance(value, str) and value) unchanged.

massgen/tests/test_gemini_cli_backend.py-618-658 (1)
618-658: ⚠️ Potential issue | 🟡 Minor

Add `@pytest.mark.skipif` for Windows-specific test causing pipeline failure.

The test `test_windows_env_keeps_path_when_node_already_resolved` patches `os.name` to `"nt"` but doesn't skip on non-Windows systems. Unlike `test_windows_cmd_wrapper_rewrites_to_direct_node_launch` (line 521), this test lacks the skipif decorator, causing `NotImplementedError: cannot instantiate 'WindowsPath'` on Linux/macOS CI.

🐛 Proposed fix

```diff
+    @pytest.mark.skipif(
+        sys.platform != "win32",
+        reason="WindowsPath cannot be instantiated on non-Windows",
+    )
     def test_windows_env_keeps_path_when_node_already_resolved(self, backend):
         """Do not prepend node path when node is already resolvable."""
         backend.api_key = None
```
Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_gemini_cli_backend.py` around lines 618 - 658, Add a pytest.skipif decorator to the test_windows_env_keeps_path_when_node_already_resolved test so it only runs on Windows (sys.platform == "win32"), mirroring the existing skip on test_windows_env_adds_node_path_when_missing; update the test function declaration for test_windows_env_keeps_path_when_node_already_resolved and ensure it uses the same condition and reason text, leaving the patches that set os.name to "nt" and the call to backend._build_subprocess_env unchanged.
🧹 Nitpick comments (18)
massgen/tests/test_config_validator.py (1)
1759-1793: Add the same Docker-network test pair for `gemini_cli` for parity.
`config_validator.py` added identical Docker-mode requirements for `gemini_cli`, but this test block only covers `copilot`; a sketch of the missing pair follows.
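A hedged sketch of the missing pair, mirroring the copilot tests described in the prompt below (the exact shape of `result.errors` entries is an assumption; adjust the predicate to the real API):

```python
from massgen.config_validator import ConfigValidator


def _gemini_cli_docker_config(network_mode):
    # Minimal config: gemini_cli backend in Docker execution mode.
    backend = {
        "type": "gemini_cli",
        "model": "gemini-2.5-flash",
        "command_line_execution_mode": "docker",
    }
    if network_mode is not None:
        backend["command_line_docker_network_mode"] = network_mode
    return {"agents": [{"id": "agent_a", "backend": backend}]}


def test_gemini_cli_docker_requires_network_mode():
    result = ConfigValidator().validate_config(_gemini_cli_docker_config(None))
    assert any("command_line_docker_network_mode" in str(e) for e in result.errors)


def test_gemini_cli_docker_with_network_mode_passes():
    result = ConfigValidator().validate_config(_gemini_cli_docker_config("bridge"))
    assert not any("command_line_docker_network_mode" in str(e) for e in result.errors)
```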
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_config_validator.py` around lines 1759 - 1793, Add a parallel test pair for the gemini_cli backend mirroring test_copilot_docker_requires_network_mode and test_copilot_docker_with_network_mode_passes: create two tests (e.g., test_gemini_cli_docker_requires_network_mode and test_gemini_cli_docker_with_network_mode_passes) that build configs with agent.backend.type set to "gemini_cli", command_line_execution_mode "docker", and verify ConfigValidator().validate_config(config) returns an invalid result when command_line_docker_network_mode is missing and no such network_mode error when command_line_docker_network_mode is provided; use the same error-message predicate as the copilot tests (checking "gemini_cli" or "gemini" as appropriate along with "docker" and "network_mode") against result.errors.

massgen/config_validator.py (1)
606-639: Consider consolidating duplicate Docker-network validation blocks.
`codex`, `copilot`, and `gemini_cli` now share the same requirement. A single set-based check would reduce drift risk.

♻️ Small DRY refactor

```diff
-    # Validate Codex Docker mode requirements
-    if backend_type == "codex":
-        execution_mode = backend_config.get("command_line_execution_mode")
-        if execution_mode == "docker":
-            # command_line_docker_network_mode is required for Codex in Docker mode
-            if "command_line_docker_network_mode" not in backend_config:
-                result.add_error(
-                    "Codex backend in Docker mode requires 'command_line_docker_network_mode'",
-                    f"{location}.command_line_docker_network_mode",
-                    "Add 'command_line_docker_network_mode: bridge' (required for Codex Docker execution)",
-                )
-
-    # Validate Copilot Docker mode requirements
-    if backend_type == "copilot":
-        execution_mode = backend_config.get("command_line_execution_mode")
-        if execution_mode == "docker":
-            if "command_line_docker_network_mode" not in backend_config:
-                result.add_error(
-                    "Copilot backend in Docker mode requires 'command_line_docker_network_mode'",
-                    f"{location}.command_line_docker_network_mode",
-                    "Add 'command_line_docker_network_mode: bridge' (required for Copilot Docker execution)",
-                )
-
-    # Validate Gemini CLI Docker mode requirements
-    if backend_type == "gemini_cli":
-        execution_mode = backend_config.get("command_line_execution_mode")
-        if execution_mode == "docker":
-            if "command_line_docker_network_mode" not in backend_config:
-                result.add_error(
-                    "Gemini CLI backend in Docker mode requires 'command_line_docker_network_mode'",
-                    f"{location}.command_line_docker_network_mode",
-                    "Add 'command_line_docker_network_mode: bridge' (required for Gemini CLI Docker execution)",
-                )
+    docker_network_required_backends = {"codex", "copilot", "gemini_cli"}
+    if backend_type in docker_network_required_backends:
+        if backend_config.get("command_line_execution_mode") == "docker":
+            if "command_line_docker_network_mode" not in backend_config:
+                label = backend_type.replace("_", " ").title().replace("Cli", "CLI")
+                result.add_error(
+                    f"{label} backend in Docker mode requires 'command_line_docker_network_mode'",
+                    f"{location}.command_line_docker_network_mode",
+                    f"Add 'command_line_docker_network_mode: bridge' (required for {label} Docker execution)",
+                )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/config_validator.py` around lines 606 - 639, The three nearly identical Docker-network validation blocks for backend_type == "codex"/"copilot"/"gemini_cli" should be consolidated: replace them with one check that tests if backend_type is in the set {"codex", "copilot", "gemini_cli"}, reads execution_mode = backend_config.get("command_line_execution_mode"), and if execution_mode == "docker" ensures "command_line_docker_network_mode" exists on backend_config and calls result.add_error with the same location and hint; use backend_type in the error text or a single generic message to preserve context and keep references to backend_type, backend_config, execution_mode, result.add_error, and location to find and update the code.

massgen/tests/integration/test_copilot_hook_orchestrator.py (1)
85-112: Avoid bypassing `CopilotBackend.__init__` in the integration helper.

Constructing the backend with `__new__` and hand-populating private fields can let init-time wiring regressions slip through. A smoke test that patches the SDK boundary and calls the real constructor would cover adapter setup more realistically; see the sketch below.
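A hedged sketch of such a smoke test. The patch target `massgen.backend.copilot.CopilotClient` and the `_native_hook_adapter` attribute are assumptions (only `_init_native_hook_adapter` is named in this review), so substitute the real SDK entry point:

```python
from unittest.mock import MagicMock, patch


def test_copilot_backend_init_wires_native_hook_adapter():
    # create=True because the exact SDK symbol name is assumed here.
    with patch("massgen.backend.copilot.CopilotClient", MagicMock(), create=True):
        from massgen.backend.copilot import CopilotBackend

        # Run the real constructor so init-time wiring is exercised.
        backend = CopilotBackend(model="gpt-5-mini")
        assert getattr(backend, "_native_hook_adapter", None) is not None
```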
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/integration/test_copilot_hook_orchestrator.py` around lines 85 - 112, Replace the __new__ bypass in _make_copilot_backend with a real construction path that calls CopilotBackend.__init__ under the patched copilot SDK: inside the patch.dict block import CopilotBackend and invoke its constructor (e.g., CopilotBackend(...) with a minimal/mock config or required args) instead of using CopilotBackend.__new__ and manually setting private attributes; ensure you still patch or inject a mock client/filesystem_manager as needed so the real __init__ runs and the call to backend._init_native_hook_adapter (and other init wiring) happens naturally.

massgen/tests/test_docker_skills_write_access.py (1)
55-55: Add a dedicated positive test for `mount_gemini_config`.

Good fixture parity change. Consider adding a `test_build_credential_mounts_supports_gemini_config` case (mirroring Claude/Codex tests) to lock in the new behavior.

➕ Suggested test addition

```diff
+def test_build_credential_mounts_supports_gemini_config(docker_manager, tmp_path, monkeypatch):
+    """gemini_config mount should map ~/.gemini into /home/massgen/.gemini."""
+    fake_home = tmp_path / "home"
+    gemini_dir = fake_home / ".gemini"
+    gemini_dir.mkdir(parents=True)
+    docker_manager.mount_gemini_config = True
+
+    monkeypatch.setattr("massgen.filesystem_manager._docker_manager.Path.home", lambda: fake_home)
+
+    mounts = docker_manager._build_credential_mounts()
+    host_path = str(gemini_dir.resolve())
+
+    assert host_path in mounts
+    assert mounts[host_path]["bind"] == "/home/massgen/.gemini"
+    assert mounts[host_path]["mode"] == "ro"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_docker_skills_write_access.py` at line 55, Add a positive unit test named test_build_credential_mounts_supports_gemini_config that mirrors the Claude/Codex tests: instantiate or use the existing manager fixture, set manager.mount_gemini_config = True, call the method that builds credential mounts (e.g., manager.build_credential_mounts or the equivalent function used elsewhere in the file), and assert that the returned mounts include the Gemini config mount (validate by presence of the expected mount entry or path). Ensure the test uses the same setup/teardown patterns as the other tests in this file and references manager.mount_gemini_config and the build_credential_mounts call so the behavior is locked in.

massgen/tests/integration/test_orchestrator_stream_enforcement.py (1)
44-45: Add `@pytest.mark.integration` marker.

This integration test is missing the `@pytest.mark.integration` marker. Per coding guidelines, integration tests should use this marker for opt-in execution with `--run-integration`.

💡 Suggested fix

```diff
+@pytest.mark.integration
 @pytest.mark.asyncio
 async def test_stream_agent_execution_coerces_structured_new_answer_content(mock_orchestrator):
```

As per coding guidelines: "Use `@pytest.mark.integration` (opt-in: `--run-integration`)... markers when writing tests".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/integration/test_orchestrator_stream_enforcement.py` around lines 44 - 45, The test function test_stream_agent_execution_coerces_structured_new_answer_content is missing the integration marker; add the `@pytest.mark.integration` decorator above its existing `@pytest.mark.asyncio` so the test is opt-in for --run-integration (ensure the decorator is imported/available in the test module and placed directly above the test definition).

massgen/tests/test_cli_cwd_context_flag.py (1)
82-89: Consider using a non-/tmp path to avoid static analysis warnings.

Ruff S108 flags `/tmp/other` as potentially insecure temp usage. While this is a false positive in the test context (it's just a string literal), using a different sentinel path would silence the linter.

💡 Suggested fix

```diff
 def test_apply_cli_cwd_context_path_noop_when_not_set(tmp_path, monkeypatch):
     """Should be a no-op when cwd context mode is not provided."""
     monkeypatch.chdir(tmp_path)
-    config = {"orchestrator": {"context_paths": [{"path": "/tmp/other", "permission": "read"}]}}
+    config = {"orchestrator": {"context_paths": [{"path": "/some/other/path", "permission": "read"}]}}
     apply_cli_cwd_context_path(config, None)
-    assert config["orchestrator"]["context_paths"] == [{"path": "/tmp/other", "permission": "read"}]
+    assert config["orchestrator"]["context_paths"] == [{"path": "/some/other/path", "permission": "read"}]
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_cli_cwd_context_flag.py` around lines 82 - 89, The test test_apply_cli_cwd_context_path_noop_when_not_set uses the literal "/tmp/other" which triggers Ruff S108; update the sentinel path to a non-/tmp value (e.g. "/var/other" or "/nonexistent/other" or "SENTINEL_PATH") so the test still asserts identity but avoids the linter warning; modify the literal in the test where apply_cli_cwd_context_path is called and in the final assert to the new sentinel string.

massgen/tests/integration/test_gemini_cli_hook_orchestrator.py (1)
1-27: Consider adding `@pytest.mark.integration` marker to this module.

Per coding guidelines, integration tests should use the `@pytest.mark.integration` marker for opt-in execution. Since this file tests multi-component orchestration flows, consider adding the marker to the relevant test classes or as a module-level `pytestmark`.

💡 Suggested addition at module level

```diff
 """Integration tests for Gemini CLI hook adapter with orchestrator hook manager.

 These are deterministic, non-API tests that verify the hook adapter correctly
 builds settings.json config and integrates with MassGen's GeneralHookManager.
 """

 from __future__ import annotations

 import json
 from pathlib import Path
 from unittest.mock import patch

+import pytest
+
 from massgen.mcp_tools.hooks import (
     GeneralHookManager,
     HookResult,
     HookType,
     PatternHook,
 )
 from massgen.mcp_tools.native_hook_adapters import GeminiCLINativeHookAdapter
+
+pytestmark = pytest.mark.integration
```

As per coding guidelines: "Use `@pytest.mark.integration` (opt-in: `--run-integration`)... markers when writing tests" and learning "Place integration tests in `massgen/tests/integration/` for multi-component orchestration flows".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/integration/test_gemini_cli_hook_orchestrator.py` around lines 1 - 27, Add the pytest integration marker at the module level so all tests in this file are opt-in; define a module-level pytestmark = pytest.mark.integration (or apply `@pytest.mark.integration` to the test classes) and import pytest if needed so the marker is recognized; update the top of the file near the imports and ensure the marker applies to _AllowHook and any test classes that follow.

massgen/configs/providers/gemini/gemini_cli_local.yaml (1)
1-9: Consider adding a "What happens" comment and using a cost-effective model.

Per coding guidelines, YAML configs should include "What happens" comments explaining the execution flow. Also, the guidelines suggest preferring cost-effective models like `gemini-2.5-flash` over preview models.

💡 Suggested improvement

```diff
 # uv run massgen --config "massgen/configs/providers/gemini/gemini_cli_local.yaml" "whats the deep learning about?"
+# What happens: Single Gemini CLI agent executes locally, streaming responses via the gemini CLI subprocess.
 agents:
   - id: agent_a
     backend:
       type: gemini_cli
-      model: gemini-3-flash-preview
+      model: gemini-2.5-flash
 ui:
   display_type: textual_terminal
   logging_enabled: true
```

As per coding guidelines: "Include 'What happens' comments explaining execution flow" and "Prefer cost-effective models (gpt-5-nano, gpt-5-mini, gemini-2.5-flash)".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/configs/providers/gemini/gemini_cli_local.yaml` around lines 1 - 9, Add a top-level "What happens" comment describing the execution flow for this YAML (how the agents list is used, backend.type=gemini_cli invoking the local CLI adapter, and ui.display_type controlling output) and replace the preview model name in the agents[0].backend.model field (currently "gemini-3-flash-preview") with a cost-effective alternative such as "gemini-2.5-flash" to follow the cost-preference guideline; ensure the comment appears before the agents key and clearly states the single-agent flow and UI logging behavior.

massgen/tests/test_gemini_cli_live.py (1)
180-219: Consider removing `@pytest.mark.live_api` from this test.

This test validates settings.json writing without actually calling the Gemini API—it only exercises local config persistence. The `live_api` marker may cause this test to be skipped unnecessarily in CI environments without API credentials.

Proposed change

```diff
 @pytest.mark.integration
-@pytest.mark.live_api
 @pytest.mark.asyncio
 async def test_settings_json_written_with_hooks_config():
     """Verify settings.json is written correctly when hooks config is set."""
-    _skip_if_no_credentials()
     _skip_if_no_gemini_cli()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_gemini_cli_live.py` around lines 180 - 219, The test test_settings_json_written_with_hooks_config is marked with `@pytest.mark.live_api` but doesn't call the Gemini API; remove the `@pytest.mark.live_api` decorator to avoid unnecessary skipping in CI, keeping `@pytest.mark.integration` and `@pytest.mark.asyncio`; locate the test function (test_settings_json_written_with_hooks_config) in massgen/tests/test_gemini_cli_live.py and delete the `@pytest.mark.live_api` line so the test still uses GeminiCLIBackend, backend._massgen_hooks_config setup, backend._write_workspace_config, and settings.json assertions unchanged.

massgen/tests/test_copilot_live.py (2)
183-211: Log the caught exception for debugging instead of silently passing.

The try-except-pass pattern at lines 203-205 is intentional to verify non-hanging behavior, but logging the exception would help diagnose issues when this test fails unexpectedly.
Proposed enhancement
```diff
     try:
         async for chunk in backend.stream_with_tools(messages, []):
             chunks.append(chunk)
             if chunk.type in ("done", "error"):
                 break
     except Exception:
-        # Exception is acceptable — we just don't want a hang
-        pass
+        # Exception is acceptable — we just don't want a hang
+        import logging
+        logging.debug("Expected exception during unauthenticated access test")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_copilot_live.py` around lines 183 - 211, In test_error_handling_unauthenticated replace the silent except-pass with logging of the caught exception so failures are diagnosable: inside the except block that wraps the async for over backend.stream_with_tools(messages, []) (in test_error_handling_unauthenticated) call logging.exception or pytest.fail with the exception info instead of just pass, ensuring the exception from CopilotBackend.stream_with_tools is recorded for debugging while preserving the test's intent of not hanging.
30-41: Consider logging the exception before skipping.

The blind `Exception` catch at line 35 is acceptable for test skip logic, but logging the error details would aid debugging when authentication checks fail unexpectedly.

Proposed enhancement

```diff
 async def _check_copilot_auth(backend) -> None:
     """Skip test if Copilot is not authenticated."""
     try:
         await backend._ensure_started()
         is_auth, msg = await backend._query_auth_status(context="live_test")
     except Exception as e:
+        import logging
+        logging.debug(f"Copilot auth check failed: {e}")
         pytest.skip(f"Copilot SDK not available or auth check failed: {e}")
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_copilot_live.py` around lines 30 - 41, The test helper _check_copilot_auth swallows exceptions when calling backend._ensure_started and backend._query_auth_status and directly calls pytest.skip; add a log of the caught exception before skipping so failures are visible. Update the except block in _check_copilot_auth to record the exception (e.g., logging.exception or logger.exception) including context like "Copilot SDK not available or auth check failed", ensure the logging module is imported or an existing test logger is used, then call pytest.skip with the same message; reference backend._ensure_started, backend._query_auth_status, and pytest.skip to find the try/except to modify.

massgen/tests/test_vote_only_mode.py (1)
220-241: Mutable class attribute should use a factory or immutable default.

The `config = {}` at line 224 is a mutable class attribute. If any test inadvertently modifies it, the change would persist across instances, potentially causing flaky tests.

Proposed fix

```diff
 class _FakeBackendForToolFiltering:
     """Minimal backend stub for tool-name extraction."""

     filesystem_manager = None
-    config = {}
+    config: dict = None  # type: ignore[assignment]
+
+    def __init__(self):
+        self.config = {}

     @staticmethod
     def extract_tool_name(tool_call):
```

Alternatively, if instance creation is not desired in the test context:

```diff
 class _FakeBackendForToolFiltering:
     """Minimal backend stub for tool-name extraction."""

     filesystem_manager = None
+
+    @property
+    def config(self):
+        return {}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_vote_only_mode.py` around lines 220 - 241, The class _FakeBackendForToolFiltering defines a mutable class attribute config = {} which can leak mutations across tests; change it to an immutable default or initialize per-instance instead—either set config = None (or an immutable empty mapping) at class level and create a fresh dict in an __init__ (or use a dataclass/constructor with a default_factory) so each _FakeBackendForToolFiltering instance gets its own config and tests cannot mutate a shared class dict.

massgen/backend/copilot.py (2)
576-612: Async-to-sync bridging pattern has potential edge cases.

The pattern of calling `asyncio.run()` from within a ThreadPoolExecutor when already in an event loop is functional but complex. Consider documenting the threading implications or extracting this into a utility function for reuse.

The current implementation is correct, but the nested event loop creation could have subtle issues in certain environments (e.g., nested async frameworks).
Consider extracting to a utility
```python
def _run_async_in_sync_context(coro):
    """Run an async coroutine from a sync context, handling event loop presence."""
    import asyncio
    import concurrent.futures

    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None

    if loop and loop.is_running():
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, coro).result()
    return asyncio.run(coro)
```

This could be placed in a utilities module for reuse across backends that need similar sync/async bridging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/backend/copilot.py` around lines 576 - 612, Extract the async-to-sync bridging logic into a reusable helper (e.g., _run_async_in_sync_context) and replace the inline ThreadPoolExecutor/asyncio.run block inside _ppm_permission_callback's "shell" branch; the helper should accept a coroutine (the call to ppm.pre_tool_use_hook("Bash", {"command": cmd})) and return its result, handling the presence of a running event loop by using a ThreadPoolExecutor to call asyncio.run when required, otherwise calling asyncio.run directly, then use that helper to get (allowed, _reason) and proceed as before.
1003-1009: Accessing `ppm.managed_paths` assumes specific PPM structure.

The code accesses `ppm.managed_paths` via `getattr` with a fallback to empty list, but then accesses `.path` and `.permission` attributes on each item. If the PPM structure changes, this could fail silently.

Consider adding explicit type check or try-except

```diff
         if ppm:
-            writable_paths = sorted(str(mp.path) for mp in getattr(ppm, "managed_paths", []) if getattr(mp, "permission", None) == Permission.WRITE)
+            try:
+                writable_paths = sorted(
+                    str(mp.path)
+                    for mp in getattr(ppm, "managed_paths", [])
+                    if getattr(mp, "permission", None) == Permission.WRITE
+                )
+            except (AttributeError, TypeError):
+                logger.debug("[Copilot] Could not extract writable paths from PPM")
+                writable_paths = []
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/backend/copilot.py` around lines 1003 - 1009, The access of ppm.managed_paths in the writable_paths computation assumes each managed path object has .path and .permission; change the logic in the block that reads filesystem_manager.path_permission_manager.managed_paths so it validates items before using their attributes (e.g., ensure ppm and managed_paths exist, iterate safely and for each item check hasattr(item, "path") and hasattr(item, "permission") or isinstance checks, and only include str(item.path) when item.permission == Permission.WRITE); alternatively wrap the extraction in a try/except that logs or skips invalid entries to avoid silent failures while preserving the sorted writable_paths assignment.

massgen/cli.py (1)
2447-2452: GeminiCLIBackend already performs CLI validation in `__init__`; factory-level preflight is optional for consistency.

`GeminiCLIBackend.__init__` (line 59 of `massgen/backend/gemini_cli.py`) already includes a fail-fast preflight via `_find_gemini_cli()`. If the `gemini` CLI is missing, it raises `RuntimeError` with helpful installation instructions. However, for consistency with the factory pattern used in `claude_code` (which validates at the factory level) and to catch missing dependencies earlier, consider adding a factory-level check here as well; see the sketch below. Note that `CodexBackend` has a similar backend-level preflight but no factory-level check, so if implementing this for `gemini_cli`, consider applying the same pattern to `codex` for parity.
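A hedged sketch of what that factory-level preflight could look like (the surrounding factory structure in `massgen/cli.py` is assumed; `GeminiCLIBackend.__init__` raising `RuntimeError` on a missing CLI is per the note above):

```python
def _create_gemini_cli_backend(backend_config: dict):
    """Factory branch sketch: fail fast when the gemini CLI is unavailable."""
    from massgen.backend.gemini_cli import GeminiCLIBackend

    try:
        return GeminiCLIBackend(**backend_config)
    except RuntimeError as err:
        # Missing CLI, e.g. needs: npm install -g @google/gemini-cli
        raise RuntimeError(f"gemini_cli backend unavailable: {err}") from err
```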
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/cli.py` around lines 2447 - 2452, Factory returns GeminiCLIBackend without preflight check; add a factory-level validation to mirror claude_code and surface missing CLI dependency earlier. Before returning GeminiCLIBackend in the factory branch for backend_type "gemini_cli" in massgen/cli.py, call GeminiCLIBackend._find_gemini_cli() (or wrap instantiation in a try/except catching RuntimeError from GeminiCLIBackend.__init__) to propagate the same RuntimeError with installation instructions; consider applying the same pattern to CodexBackend for parity with backend-level preflight checks.

massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py (1)
170-182: Break long line for readability.

This line chains many fallback keys and is difficult to read.
♻️ Proposed refactor
```diff
 def _extract_tool_args(tool_event: dict) -> dict:
     """Support both legacy and current Gemini CLI hook event schemas."""
-    raw_args = tool_event.get("tool_input") or tool_event.get("input") or tool_event.get("toolArgs") or tool_event.get("tool_args") or tool_event.get("arguments") or tool_event.get("parameters") or {}
+    raw_args = (
+        tool_event.get("tool_input")
+        or tool_event.get("input")
+        or tool_event.get("toolArgs")
+        or tool_event.get("tool_args")
+        or tool_event.get("arguments")
+        or tool_event.get("parameters")
+        or {}
+    )
     if isinstance(raw_args, dict):
         return raw_args
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py` around lines 170 - 182, The long fallback chain in _extract_tool_args makes the raw_args assignment hard to read; refactor by replacing the single long expression with a clear sequence: define an ordered list of possible keys (e.g., ["tool_input","input","toolArgs","tool_args","arguments","parameters"]), then iterate over that list to set raw_args from tool_event.get(key) until a non-None value is found (default to {}), preserving the existing subsequent logic that handles dict and JSON-string cases for raw_args; keep the function name _extract_tool_args and the variable raw_args unchanged so callers remain intact.

massgen/backend/gemini_cli.py (2)
1164-1164: Use `!s` conversion flag in f-string instead of explicit `str()` calls.

Per Ruff RUF010, f-strings should use conversion flags for clarity.
♻️ Proposed fix
```diff
-        command = f"{sys.executable} {str(_HOOK_SCRIPT_PATH)} " f"--hook-dir {str(config_dir)} --event BeforeTool"
+        command = f"{sys.executable} {_HOOK_SCRIPT_PATH!s} --hook-dir {config_dir!s} --event BeforeTool"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/backend/gemini_cli.py` at line 1164, The f-string assigned to variable command uses explicit str() calls; update it to use the !s conversion flag for clarity per Ruff RUF010 by removing str(_HOOK_SCRIPT_PATH) and str(config_dir) and instead using {_HOOK_SCRIPT_PATH!s} and {config_dir!s} inside the f-string so the command construction in gemini_cli.py (variable name: command, symbols: _HOOK_SCRIPT_PATH, config_dir) uses conversion flags rather than explicit str() calls.
72-72: Extract API key resolution into separate lines for readability.

This line is very long and chains multiple `or` operations, making it hard to read and maintain.

♻️ Proposed refactor
```diff
-        self.api_key = api_key or os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY") or configured_env.get("GEMINI_API_KEY") or configured_env.get("GOOGLE_API_KEY")
+        self.api_key = (
+            api_key
+            or os.getenv("GEMINI_API_KEY")
+            or os.getenv("GOOGLE_API_KEY")
+            or configured_env.get("GEMINI_API_KEY")
+            or configured_env.get("GOOGLE_API_KEY")
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/backend/gemini_cli.py` at line 72, The long chained assignment to self.api_key (currently using multiple or operations with os.getenv and configured_env) should be split for readability: build an ordered list/tuple of candidate sources (e.g., env_gemini = os.getenv("GEMINI_API_KEY"), env_google = os.getenv("GOOGLE_API_KEY"), cfg_gemini = configured_env.get("GEMINI_API_KEY"), cfg_google = configured_env.get("GOOGLE_API_KEY"), and the api_key param) then iterate or use next(filter(None, candidates), None) to pick the first non-empty value and assign it to self.api_key in the class initializer (constructor) where the original expression lives; keep the same precedence (explicit api_key param first) and preserve behavior if none found.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 35bade69-a7c0-4246-b533-5fcee8e21689
📒 Files selected for processing (50)
- .gitignore
- docs/source/reference/yaml_schema.rst
- docs/source/user_guide/backends.rst
- massgen/backend/__init__.py
- massgen/backend/capabilities.py
- massgen/backend/copilot.py
- massgen/backend/gemini_cli.py
- massgen/cli.py
- massgen/config_validator.py
- massgen/configs/providers/gemini/gemini_cli_local.yaml
- massgen/filesystem_manager/_constants.py
- massgen/filesystem_manager/_docker_manager.py
- massgen/filesystem_manager/_filesystem_manager.py
- massgen/filesystem_manager/_path_permission_manager.py
- massgen/mcp_tools/native_hook_adapters/__init__.py
- massgen/mcp_tools/native_hook_adapters/copilot_adapter.py
- massgen/mcp_tools/native_hook_adapters/gemini_cli_adapter.py
- massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py
- massgen/orchestrator.py
- massgen/skills/backend-integrator/SKILL.md
- massgen/tests/backend/test_copilot.py
- massgen/tests/backend/test_copilot_hooks.py
- massgen/tests/backend/test_copilot_permission_callback.py
- massgen/tests/backend/test_copilot_permission_integration.py
- massgen/tests/integration/test_copilot_hook_orchestrator.py
- massgen/tests/integration/test_gemini_cli_hook_orchestrator.py
- massgen/tests/integration/test_orchestrator_stream_enforcement.py
- massgen/tests/test_backend_capabilities.py
- massgen/tests/test_cli_cwd_context_flag.py
- massgen/tests/test_concurrent_logging_sessions.py
- massgen/tests/test_config_validator.py
- massgen/tests/test_copilot_live.py
- massgen/tests/test_docker_skills_write_access.py
- massgen/tests/test_gemini_cli_backend.py
- massgen/tests/test_gemini_cli_hook_ipc.py
- massgen/tests/test_gemini_cli_hook_script.py
- massgen/tests/test_gemini_cli_live.py
- massgen/tests/test_interactive_controller_commands.py
- massgen/tests/test_native_tools_sandbox_script.py
- massgen/tests/test_snapshot_preservation.py
- massgen/tests/test_subagent_manager_paths.py
- massgen/tests/test_task_plan_support.py
- massgen/tests/test_tui_modes.py
- massgen/tests/test_vote_only_mode.py
- massgen/tests/unit/test_errors.py
- massgen/tests/unit/test_orchestrator_unit.py
- massgen/token_manager/token_manager.py
- pyproject.toml
- scripts/test_hook_backends.py
- scripts/test_native_tools_sandbox.py
.gitignore

```
# Gemini CLI runtime artifacts (project-local, generated by MassGen)
.gemini/settings.json
.gemini/workflow_tool_specs.json
.gemini/custom_tool_specs.json
.gemini/GEMINI.md
.gemini/google_accounts.json
.gemini/oauth_creds.json
.gemini/.env
```
Ignore the full .gemini/ directory to prevent secret leakage.
Line-specific ignores are fragile; new credential/artifact filenames can still slip into git history. Prefer a directory-level ignore.
🔐 Proposed hardening
```diff
-# Gemini CLI runtime artifacts (project-local, generated by MassGen)
-.gemini/settings.json
-.gemini/workflow_tool_specs.json
-.gemini/custom_tool_specs.json
-.gemini/GEMINI.md
-.gemini/google_accounts.json
-.gemini/oauth_creds.json
-.gemini/.env
+# Gemini CLI runtime artifacts (project-local, generated by MassGen)
+.gemini/
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
# Gemini CLI runtime artifacts (project-local, generated by MassGen)
.gemini/
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.gitignore around lines 239 - 246, Replace the fragile per-file ignores with
a directory-level ignore by removing the explicit entries
(.gemini/settings.json, .gemini/workflow_tool_specs.json,
.gemini/custom_tool_specs.json, .gemini/GEMINI.md, .gemini/google_accounts.json,
.gemini/oauth_creds.json, .gemini/.env) and add a single .gemini/ rule in
.gitignore; after updating .gitignore, ensure any already-tracked files under
the .gemini directory are removed from the index (git rm --cached) so the new
.gemini/ rule takes effect.
docs/source/reference/yaml_schema.rst

```rst
GitHub Copilot Backend
~~~~~~~~~~~~~~~~~~~~~~

The GitHub Copilot backend uses the ``github-copilot-sdk`` with native MCP support.
Requires a GitHub Copilot subscription and the Copilot CLI (``gh copilot``).

**Basic Configuration:**

.. code-block:: yaml

   backend:
     type: "copilot"
     model: "gpt-5-mini"  # Also: gpt-4.1, claude-sonnet-4, gemini-2.5-pro

**With MCP Servers and Custom Tools:**

.. code-block:: yaml

   backend:
     type: "copilot"
     model: "gpt-5-mini"
     mcp_servers:
       - name: "filesystem"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-filesystem", "."]
     custom_tools:
       - path: "massgen/tool/_basic"
         function: "two_num_tool"

**Configuration Options:**

- ``copilot_system_message_mode`` (string: "append"|"replace", default: "append"):
  How the system message is applied to the Copilot session.
- ``copilot_permission_policy`` (string: "approve"|"deny", default: "approve"):
  Permission callback policy. "approve" validates paths via PathPermissionManager.
- ``allowed_tools`` / ``exclude_tools``: Backend-level tool filtering.
- ``enable_multimodal_tools`` (bool): Enable read_media/generate_media tools.

**Docker Mode:**

.. code-block:: yaml

   backend:
     type: "copilot"
     model: "gpt-5-mini"
     command_line_execution_mode: "docker"
     command_line_docker_network_mode: "bridge"
```
The schema reference still doesn't cover the new backend keys.
This page introduces Copilot-specific options and adds gemini_cli as a valid backend type, but the backend parameter reference below never adds entries for those Copilot fields and still omits Gemini CLI support from rows users will check first, like cwd and enable_web_search. As written, yaml_schema.rst is no longer the authoritative reference for the new configuration surface.
As per coding guidelines, "For new YAML parameters, update docs/source/reference/yaml_schema.rst."
Also applies to: 687-687
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/source/reference/yaml_schema.rst` around lines 278 - 325, The YAML
schema reference is missing the new Copilot-specific backend keys and the new
backend type "gemini_cli"; update docs/source/reference/yaml_schema.rst to add
authoritative entries for the Copilot options (add keys/copilot fields:
copilot_system_message_mode, copilot_permission_policy, mcp_servers,
custom_tools, allowed_tools, exclude_tools, enable_multimodal_tools,
command_line_execution_mode, command_line_docker_network_mode) and ensure
backend parameter rows (e.g., cwd, enable_web_search) list "copilot" and
"gemini_cli" where applicable; follow existing schema table/row format so these
new fields appear in the backend parameter reference and include short
descriptions and types consistent with the surrounding entries.
| "gemini_cli": BackendCapabilities( | ||
| backend_type="gemini_cli", | ||
| provider_name="Gemini CLI", | ||
| supported_capabilities={ | ||
| "bash", | ||
| "mcp", | ||
| "filesystem_native", | ||
| "web_search", | ||
| }, | ||
| builtin_tools=[ | ||
| "run_shell_command", | ||
| "read_file", | ||
| "write_file", | ||
| "replace", | ||
| "glob", | ||
| "grep_search", | ||
| "list_directory", | ||
| "read_many_files", | ||
| "google_web_search", | ||
| "web_fetch", | ||
| ], | ||
| filesystem_support="native", | ||
| models=[ | ||
| "gemini-3.1-pro-preview", | ||
| "gemini-3-flash-preview", | ||
| "gemini-2.5-pro", | ||
| "gemini-2.5-flash", | ||
| "gemini-2.5-flash-lite", | ||
| ], | ||
| default_model="gemini-3.1-pro-preview", | ||
| env_var="GOOGLE_API_KEY", | ||
| model_release_dates={ | ||
| "gemini-3.1-pro-preview": "2026-02", | ||
| "gemini-3-flash-preview": "2025-12", | ||
| "gemini-2.5-pro": "2025-06", | ||
| "gemini-2.5-flash": "2025-06", | ||
| "gemini-2.5-flash-lite": "2025-06", | ||
| }, | ||
| notes=( | ||
| "Google Gemini CLI subprocess wrapper. Auth: CLI login (gemini) first, " | ||
| "then GOOGLE_API_KEY or GEMINI_API_KEY. Native filesystem via CLI. " | ||
| "Requires: npm install -g @google/gemini-cli. " | ||
| "Docker mode requires command_line_docker_network_mode." | ||
| ), | ||
| ), |
Also classify gemini_cli as an agent-framework backend.
The new registry entry makes gemini_cli a native CLI integration, but AGENT_FRAMEWORK_BACKENDS below still excludes it. is_agent_framework_backend("gemini_cli") therefore returns False, so any code branching on framework/native backends will miss the new backend.
🩹 Suggested follow-up outside this hunk
```diff
 AGENT_FRAMEWORK_BACKENDS = frozenset(
     {
         "claude_code",
         "codex",
         "copilot",
+        "gemini_cli",
     },
 )
```
Verify each finding against the current code and only fix it if needed.
In `@massgen/backend/capabilities.py` around lines 339 - 383, The gemini_cli
BackendCapabilities entry is registered as a native CLI integration but not
included in the agent-framework list, so
is_agent_framework_backend("gemini_cli") returns False; update the
classification by adding "gemini_cli" to the AGENT_FRAMEWORK_BACKENDS set (or
adjust the is_agent_framework_backend logic) so that the backend named
"gemini_cli" (referenced in the BackendCapabilities block) is treated as an
agent-framework backend by functions checking
AGENT_FRAMEWORK_BACKENDS/is_agent_framework_backend.
massgen/backend/copilot.py

```diff
         agent_id = kwargs.get("agent_id", self.config.get("agent_id") or "default")
-        model = kwargs.get("model") or self.config.get("model") or "gpt-4.1"
+        model = kwargs.get("model") or self.config.get("model") or "gpt-5-mini"
```
🧩 Analysis chain
🌐 Web query:
GitHub Copilot SDK gpt-5-mini model availability 2025
💡 Result:
- GPT‑5 mini in GitHub Copilot (2025): GitHub began rolling out GPT‑5 mini to GitHub Copilot in public preview on August 13, 2025, appearing in the model picker (VS Code, github.com Copilot Chat, and GitHub Mobile). [1]
- GA status: GitHub’s “Supported AI models in Copilot” documentation lists GPT‑5 mini as GA (generally available). [2]
- “Copilot SDK” timing vs 2025: The product GitHub calls “Copilot SDK” (language-specific SDKs for programmatic access to Copilot CLI) entered technical preview on January 14, 2026—so it was not available in 2025. [3]
- What was available in 2025 under an SDK name: GitHub released the Copilot Language Server SDK on February 10, 2025 (to integrate Copilot into other editors/IDEs), but that’s a different SDK than the 2026 “Copilot SDK.” [4]
Sources: [1]–[4] correspond to the cited links above.
🏁 Script executed:

```shell
# Check if the copilot.py file exists
fd copilot.py massgen/

# If it exists, examine the context around line 918
if [ -f "massgen/backend/copilot.py" ]; then
  echo "=== File found, checking size ==="
  wc -l massgen/backend/copilot.py
  echo "=== Content around line 918 ==="
  sed -n '910,925p' massgen/backend/copilot.py
fi
```

Repository: massgen/MassGen

Length of output: 904
🏁 Script executed:

```shell
# Search for other model identifier patterns in backend files
rg "gpt-.*-mini|model.*=" massgen/backend/ -A 2 -B 2 --type py | head -50
```

Repository: massgen/MassGen

Length of output: 3468
🏁 Script executed:

```shell
# Check capabilities.py to see if copilot backend is registered
if [ -f "massgen/backend/capabilities.py" ]; then
  echo "=== Checking capabilities.py ==="
  cat -n massgen/backend/capabilities.py | grep -i copilot -A 5 -B 5
fi
```

Repository: massgen/MassGen

Length of output: 1619
Add gpt-5-mini to the supported models list in massgen/backend/capabilities.py.
While gpt-5-mini is a valid, GA model in GitHub Copilot, it is currently set as the default_model in capabilities.py (line 713) but is missing from the models list (line 710). Update the models list to include "gpt-5-mini" to ensure consistency between the default model and supported models, as required by the coding guideline: "If adding new model support → ensure massgen/backend/capabilities.py is updated."
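A minimal sketch of the expected fix, assuming the Copilot entry follows the same BackendCapabilities shape shown for gemini_cli above; fields other than models/default_model are elided and illustrative:

```python
"copilot": BackendCapabilities(
    backend_type="copilot",
    # ... other fields unchanged ...
    models=[
        # existing entries ...
        "gpt-5-mini",  # add so the default below is also a listed model
    ],
    default_model="gpt-5-mini",  # matches the fallback in copilot.py
),
```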
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/backend/copilot.py` at line 918, The default model "gpt-5-mini" used
in massgen/backend/copilot.py is not present in the supported models list;
update the models collection in massgen/backend/capabilities.py by adding
"gpt-5-mini" to the models list (the variable named models) so it matches the
default_model setting (variable default_model) and ensures the new model is
recognized by the capabilities checks.
```python
# Serialize tool args for MassGen hook interface
tool_args = input_data.get("toolArgs", {})
arguments_str = json.dumps(tool_args) if tool_args else "{}"
```
Preserve already-serialized toolArgs.
toolArgs is declared as Any. If Copilot hands this wrapper a JSON string, json.dumps(tool_args) double-encodes it, so downstream hooks receive a quoted string instead of the original object payload and can fail closed on parsing.
🧩 Proposed fix

```diff
-tool_args = input_data.get("toolArgs", {})
-arguments_str = json.dumps(tool_args) if tool_args else "{}"
+tool_args = input_data.get("toolArgs")
+if tool_args in (None, ""):
+    arguments_str = "{}"
+elif isinstance(tool_args, str):
+    arguments_str = tool_args
+else:
+    arguments_str = json.dumps(tool_args)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/mcp_tools/native_hook_adapters/copilot_adapter.py` around lines 97 -
100, The current serialization always calls json.dumps on tool_args (in the
CopilotAdapter code where tool_args = input_data.get("toolArgs", {}) and
arguments_str is set), which double-encodes when tool_args is already a JSON
string; change the logic to detect an already-serialized JSON string and
preserve it as-is (e.g., if isinstance(tool_args, str) use tool_args for
arguments_str, otherwise json.dumps(tool_args)), while still defaulting to "{}"
for empty/None values so downstream hooks receive the original object payload
rather than a quoted, double-encoded string.
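To see the failure mode the fix guards against, here is a standalone illustration (not adapter code) of what json.dumps does to an already-serialized payload:

```python
import json

# toolArgs already serialized upstream:
tool_args = '{"path": "/tmp/demo.txt"}'

encoded = json.dumps(tool_args)
print(encoded)  # "{\"path\": \"/tmp/demo.txt\"}"  <- a JSON string, not an object

# A downstream hook that parses the arguments now gets a str, not a dict:
print(type(json.loads(encoded)))  # <class 'str'>
```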
```python
def test_copilot_docker_with_network_mode_passes(self):
    """Copilot in Docker mode with network_mode should pass."""
    config = {
        "agent": {
            "id": "test-agent",
            "backend": {
                "type": "copilot",
                "model": "gpt-5-mini",
                "command_line_execution_mode": "docker",
                "command_line_docker_network_mode": "bridge",
            },
        },
    }
    validator = ConfigValidator()
    result = validator.validate_config(config)
    # Should not have the network_mode error
    assert not any("copilot" in error.message.lower() and "docker" in error.message.lower() and "network_mode" in error.message.lower() for error in result.errors)
```
“Passes” test is too weak and can hide unrelated validation failures.
test_copilot_docker_with_network_mode_passes should assert overall validity, not just absence of one substring.
✅ Tighten the success assertion

```diff
 validator = ConfigValidator()
 result = validator.validate_config(config)
-# Should not have the network_mode error
+assert result.is_valid()
+# Should not have the network_mode error
 assert not any("copilot" in error.message.lower() and "docker" in error.message.lower() and "network_mode" in error.message.lower() for error in result.errors)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_config_validator.py` around lines 1776 - 1793, The test
test_copilot_docker_with_network_mode_passes is too weak—update it to assert
overall validity by calling ConfigValidator().validate_config(config) and
asserting result.valid (or that result.errors is empty) instead of only checking
that no error message contains specific substrings; locate the test function
name test_copilot_docker_with_network_mode_passes and the validate_config call
on ConfigValidator to change the final assertion to a strict check like assert
result.valid (or assert not result.errors) so unrelated validation failures will
fail the test.
| ("claude_code", "auto", "direct"), | ||
| ("codex", "auto", "direct"), | ||
| ("copilot", "auto", "orchestrator"), | ||
| ("gemini_cli", "auto", "orchestrator"), |
CI blocker: gemini_cli auto-runner expectation is currently wrong.
The provided pipeline failure shows resolve_runner("gemini_cli", "auto") returns "direct", but this test expects "orchestrator" (Line 24). Please align either the resolver logic or this expected tuple before merge.
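If the resolver is the side to change, the fix is likely a one-line membership check; a sketch, assuming resolve_runner dispatches on backend type (the exact body in the script may differ):

```python
def resolve_runner(backend_type: str, runner: str) -> str:
    """Resolve "auto" to a concrete runner (sketch)."""
    if runner != "auto":
        return runner
    # Treat both CLI-wrapper backends identically.
    if backend_type in {"copilot", "gemini_cli"}:
        return "orchestrator"
    return "direct"
```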
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_native_tools_sandbox_script.py` around lines 21 - 24, The
test tuple ("gemini_cli", "auto", "orchestrator") is out of sync with the
resolver behavior: resolve_runner("gemini_cli", "auto") returns "direct" in CI,
so update the test expectation to ("gemini_cli", "auto", "direct") (or if you
intend different behavior, change resolve_runner's logic for the
"gemini_cli"/"auto" case to return "orchestrator"); reference resolve_runner and
the test tuple when making the change so the expectation and implementation
match.
```python
class _FakeHost:
    def __init__(self, active_tasks: list[dict[str, Any]] | None = None) -> None:
        self._active_tasks = [dict(t) for t in active_tasks] if active_tasks else None
        self.pinned_updates: list[dict[str, Any]] = []
        self.plan_updates: list[dict[str, Any]] = []

    def get_active_tasks(self) -> list[dict[str, Any]] | None:
        if self._active_tasks is None:
            return None
        return [dict(t) for t in self._active_tasks]

    def update_pinned_task_plan(
        self,
        *,
        tasks: list[dict[str, Any]],
        focused_task_id: str | None = None,
        operation: str = "create",
        show_notification: bool = True,
    ) -> None:
        self.pinned_updates.append(
            {
                "tasks": [dict(t) for t in tasks],
                "focused_task_id": focused_task_id,
                "operation": operation,
                "show_notification": show_notification,
            },
        )
        self._active_tasks = [dict(t) for t in tasks]

    def update_task_plan(
        self,
        tasks: list[dict[str, Any]],
        plan_id: str | None = None,
        operation: str = "create",
    ) -> None:
        self.plan_updates.append(
            {
                "tasks": [dict(t) for t in tasks],
                "plan_id": plan_id,
                "operation": operation,
            },
        )
        self._active_tasks = [dict(t) for t in tasks]


def _tool(tool_name: str, result_full: str, tool_id: str = "tool_1") -> SimpleNamespace:
    return SimpleNamespace(tool_name=tool_name, result_full=result_full, tool_id=tool_id)


def test_update_task_plan_from_codex_wrapper_create_task_plan() -> None:
    host = _FakeHost()
    result = (
        "{'content': [{'type': 'text', 'text': "
        '\'{"success":true,"operation":"create_task_plan","tasks":[{"id":"task_1","description":"Define scope","status":"pending","priority":"medium","dependencies":[],"metadata":{}}]}\''
        "}], 'structured_content': {'success': True, 'operation': 'create_task_plan', "
        "'tasks': [{'id': 'task_1', 'description': 'Define scope', 'status': 'pending', "
        "'priority': 'medium', 'dependencies': [], 'metadata': {}}]}}"
    )
    tool_data = _tool("planning_agent_a/create_task_plan", result, tool_id="item_5")

    handled = update_task_plan_from_tool(host, tool_data, timeline=None)

    assert handled is True
    assert host.pinned_updates, "Expected pinned task plan update"
    assert host.plan_updates, "Expected host task-plan state update"
    assert host.pinned_updates[-1]["operation"] == "create"
    assert host.pinned_updates[-1]["tasks"][0]["id"] == "task_1"
    assert host.plan_updates[-1]["plan_id"] == "item_5"


def test_update_task_plan_from_codex_wrapper_status_update_patches_cached_tasks() -> None:
    host = _FakeHost(
        active_tasks=[
            {
                "id": "task_1",
                "description": "Define scope",
                "status": "pending",
                "priority": "medium",
                "dependencies": [],
                "metadata": {},
            },
        ],
    )
    result = (
        "{'content': [{'type': 'text', 'text': "
        '\'{"success":true,"operation":"update_task_status","task":{"id":"task_1",'
        '"description":"Define scope","status":"completed","priority":"medium",'
        '"dependencies":[],"metadata":{"completion_notes":"done"}}}\''
        "}], 'structured_content': {'success': True, 'operation': 'update_task_status', "
        "'task': {'id': 'task_1', 'description': 'Define scope', 'status': 'completed', "
        "'priority': 'medium', 'dependencies': [], 'metadata': {'completion_notes': 'done'}}}}"
    )
    tool_data = _tool("planning_agent_a/update_task_status", result, tool_id="item_9")

    handled = update_task_plan_from_tool(host, tool_data, timeline=None)

    assert handled is True
    assert host.pinned_updates, "Expected pinned task plan update"
    assert host.pinned_updates[-1]["operation"] == "update"
    assert host.pinned_updates[-1]["focused_task_id"] == "task_1"
    assert host.pinned_updates[-1]["tasks"][0]["status"] == "completed"
    assert host.plan_updates[-1]["operation"] == "update"
```
🛠️ Refactor suggestion | 🟠 Major
Add Google-style docstrings to new methods/functions in this module.
_FakeHost methods, _tool, and both test functions are new but undocumented.
As per coding guidelines: "**/*.py: For new or changed functions, include Google-style docstrings".
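As one illustration, a Google-style docstring for the _tool helper might read as follows; the wording is a suggestion, not the required text:

```python
def _tool(tool_name: str, result_full: str, tool_id: str = "tool_1") -> SimpleNamespace:
    """Build a minimal stand-in for tool-call data.

    Args:
        tool_name: Namespaced tool name, e.g. "planning_agent_a/create_task_plan".
        result_full: Raw serialized tool result as emitted by the wrapper.
        tool_id: Identifier the handler uses as the plan_id.

    Returns:
        SimpleNamespace exposing tool_name, result_full, and tool_id.
    """
    return SimpleNamespace(tool_name=tool_name, result_full=result_full, tool_id=tool_id)
```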
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_task_plan_support.py` around lines 12 - 113, The tests and
helper class lack Google-style docstrings: add concise Google-style docstrings
to class _FakeHost and its methods get_active_tasks, update_pinned_task_plan,
update_task_plan, the helper function _tool, and the two test functions
test_update_task_plan_from_codex_wrapper_create_task_plan and
test_update_task_plan_from_codex_wrapper_status_update_patches_cached_tasks;
each docstring should describe purpose, parameters (types), return value, and
any important behavior (e.g., that _FakeHost stores updates in
pinned_updates/plan_updates and that _tool returns a SimpleNamespace mimicking
tool data).
```python
if hasattr(backend, "set_general_hook_manager"):
    backend.set_general_hook_manager(manager)
```
The Copilot/Gemini "unit" path can false-pass even if native hooks are broken.
For those backends, Test 2-4 skip set_general_hook_manager() and then call GeneralHookManager.execute_hooks(...) directly, so a broken CopilotNativeHookAdapter or Gemini CLI IPC path would still report PASS here. run_e2e_tests() already has backend-specific branches; run_tests() needs the same kind of native-path coverage or should skip these checks for native-hook backends.
Based on learnings, backend-specific tooling changes should not assume the shared hook path covers native backends; parity tests need to hit the native execution path.
Also applies to: 369-370, 457-458, 540-555, 567-588
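One possible shape for the fix, sketched under the assumption that run_tests knows the backend type and that native-hook backends expose set_general_hook_manager; the routing helper and set name are illustrative:

```python
# Backends whose hooks run through native adapters / IPC rather than the
# shared GeneralHookManager path (illustrative set).
NATIVE_HOOK_BACKENDS = {"copilot", "gemini_cli"}


def select_hook_path(backend, backend_type: str, manager) -> str:
    """Route parity tests through the execution path the backend really uses."""
    if backend_type in NATIVE_HOOK_BACKENDS:
        # Exercise the native adapter instead of calling
        # GeneralHookManager.execute_hooks() directly.
        backend.set_general_hook_manager(manager)
        return "native"
    return "shared"
```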
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/test_hook_backends.py` around lines 303 - 305, The tests call
GeneralHookManager.execute_hooks directly for some backends causing false-passes
for native paths; update run_tests to mirror run_e2e_tests backend-specific
branching: detect native-hook backends (e.g., CopilotNativeHookAdapter, Gemini
CLI IPC) and either (A) call backend.set_general_hook_manager(manager) and
invoke the backend's native execute path so tests exercise the native
IPC/adapter code, or (B) explicitly skip the shared-path parity checks for those
backends and mark them to run native-path checks instead; modify the run_tests
branches at the locations around
set_general_hook_manager/GeneralHookManager.execute_hooks (also apply same
change at the other noted regions) so parity tests hit the native execution path
rather than only the shared GeneralHookManager.
```diff
 Run:
     uv run python scripts/test_native_tools_sandbox.py [--backend TYPE] [--runner TYPE] [--llm-judge]

 Options:
-    --backend    Backend to test: claude_code (default) or codex
+    --backend    Backend to test: claude_code, codex, copilot, or gemini_cli
     --runner     Runner to test: auto, direct, or orchestrator
                  auto uses orchestrator for copilot/gemini_cli and direct otherwise
     --llm-judge  Use LLM to analyze responses for subtle leakage (requires OPENAI_API_KEY)
```
Honor the documented auto runner behavior for Gemini CLI.
The header and CLI help say auto uses the orchestrator for both Copilot and Gemini CLI, but resolve_runner() only special-cases Copilot. Right now --backend gemini_cli --runner auto silently falls back to direct, so this script misses the intended path.
Suggested change

```diff
-    if backend_type == "copilot":
+    if backend_type in {"copilot", "gemini_cli"}:
         return "orchestrator"
```

Also applies to: 91-99, 524-535
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/test_native_tools_sandbox.py` around lines 13 - 20, The CLI's
auto-runner logic ignores the documented behavior for Gemini CLI because
resolve_runner() only special-cases "copilot"; update resolve_runner() (and the
other places with the same logic around the reported ranges) so that when runner
== "auto" and backend is either "copilot" or "gemini_cli" it returns/chooses the
"orchestrator" runner instead of falling back to "direct"; find the conditional
that checks backend == "copilot" and extend it to include backend ==
"gemini_cli" (or check membership in a set like {"copilot","gemini_cli"}) so
auto maps to orchestrator for both backends.
PR Title Format

Your PR title must follow the format: `<type>: <brief description>`

Valid types:

- `fix:` - Bug fixes
- `feat:` - New features
- `breaking:` - Breaking changes
- `docs:` - Documentation updates
- `refactor:` - Code refactoring
- `test:` - Test additions/modifications
- `chore:` - Maintenance tasks
- `perf:` - Performance improvements
- `style:` - Code style changes
- `ci:` - CI/CD configuration changes

Examples:

- `fix: resolve memory leak in data processing`
- `feat: add export to CSV functionality`
- `breaking: change API response format`
- `docs: update installation guide`

Description

Brief description of the changes in this PR

Type of change

- Bug fix (`fix:`) - Non-breaking change which fixes an issue
- New feature (`feat:`) - Non-breaking change which adds functionality
- Breaking change (`breaking:`) - Fix or feature that would cause existing functionality to not work as expected
- Documentation update (`docs:`) - Documentation updates
- Refactor (`refactor:`) - Code changes that neither fix a bug nor add a feature
- Tests (`test:`) - Adding missing tests or correcting existing tests
- Chore (`chore:`) - Maintenance tasks, dependency updates, etc.
- Performance (`perf:`) - Code changes that improve performance
- Style (`style:`) - Changes that do not affect the meaning of the code (formatting, missing semi-colons, etc.)
- CI (`ci:`) - Changes to CI/CD configuration files and scripts
Pre-commit status
How to Test
Add test method for this PR.
Test CLI Command
Write down the test bash command. If there is pre-requests, please emphasize.
Expected Results
Description/screenshots of expected results.
Additional context
Add any other context about the PR here.
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests