Conversation
…upport for a new gemini_cli backend: Add Backend
- yields an "error" event with the raw error message instead of a content message, so callers can handle fatal conditions separately.
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configuration. Configuration used: Path: .coderabbit.yaml · Review profile: CHILL · Plan: Pro.
📝 Walkthrough

Adds a new Gemini CLI backend (gemini_cli) with workspace/hook IPC, Docker and local subprocess modes, and MCP/workflow support; extends the Copilot backend with Docker-aware permission hooks; introduces native hook adapters for Copilot and Gemini CLI; updates capabilities, CLI wiring, filesystem/docker mounts, validation, docs, and many tests.
Sequence Diagram(s)

```mermaid
sequenceDiagram
participant Client
participant GeminiBackend as GeminiCLIBackend
participant GeminiCLI as Gemini CLI
participant HookIPC as .gemini/hook_payload.json
participant HookScript as gemini_cli_hook_script.py
Client->>GeminiBackend: stream_with_tools(messages, tools)
GeminiBackend->>GeminiBackend: write workspace settings.json (hooks)
GeminiBackend->>GeminiCLI: spawn CLI subprocess (stdin)
Note over GeminiCLI,HookScript: CLI invokes BeforeTool/AfterTool hooks
GeminiCLI->>HookScript: execute hook (stdin: tool_event)
HookScript->>HookIPC: read/write hook_payload.json (permission/injection)
HookScript-->>GeminiCLI: emit hook response (approved/denied/inject)
GeminiCLI-->>GeminiBackend: stdout JSON stream events
GeminiBackend->>Client: yield parsed StreamChunk(s)
```

```mermaid
sequenceDiagram
participant Client
participant CopilotBackend
participant CopilotSDK as Copilot SDK
participant PPM as PathPermissionManager
participant MCP as MCP (container)
Client->>CopilotBackend: stream_with_tools(...)
CopilotBackend->>CopilotBackend: build session_hooks (on_pre_tool_use)
CopilotBackend->>CopilotSDK: create_session(hooks)
CopilotSDK->>CopilotBackend: on_pre_tool_use(request)
CopilotBackend->>PPM: pre_tool_use_hook(path/command)
PPM-->>CopilotBackend: allow/deny
alt denied
CopilotBackend-->>CopilotSDK: deny decision
else allowed
CopilotBackend-->>CopilotSDK: allow (or route tool to MCP if docker-mode)
end
CopilotSDK->>Client: stream results
```

Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested reviewers
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical and Major severity comments were prioritized as inline comments.
🟡 Minor comments (13)
massgen/tests/test_concurrent_logging_sessions.py-410-420 (1)
410-420: ⚠️ Potential issue | 🟡 Minor

Add a Google-style docstring to `_slow_save`.

`_slow_save` is a changed function and should include a Google-style docstring for maintainability and guideline compliance.

Suggested patch

```diff
 def _slow_save(self, data):
+    """Persist registry data in two chunks with a delay.
+
+    Args:
+        self: SessionRegistry instance.
+        data: Registry payload to serialize and persist.
+    """
     payload = json.dumps(data, indent=2)
     midpoint = max(1, len(payload) // 2)
     tmp_file = self.registry_path.with_suffix(".slow_tmp")
     with open(tmp_file, "w") as f:
         f.write(payload[:midpoint])
         f.flush()
         time.sleep(0.002)
         f.write(payload[midpoint:])
         f.flush()
     tmp_file.replace(self.registry_path)
```

As per coding guidelines: `**/*.py`: For new or changed functions, include Google-style docstrings.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_concurrent_logging_sessions.py` around lines 410 - 420, Add a Google-style docstring to the _slow_save function describing its purpose, parameters and side effects: explain that it serializes the passed data to JSON with indentation, writes the first half to a temporary file, sleeps to simulate slowness, writes the rest, and atomically replaces the registry file; document the parameter `data` (type: any/serializable), note that it returns None and may raise IO/OSError on file operations, and include a short example or note about the temporary file name if helpful.

massgen/tests/test_task_plan_support.py-13-15 (1)
13-15: ⚠️ Potential issue | 🟡 Minor

Preserve empty-list semantics in `_FakeHost.__init__`.

`if active_tasks` turns `[]` into `None`, so tests can't represent a valid "empty task plan" state.

Suggested fix

```diff
-        self._active_tasks = [dict(t) for t in active_tasks] if active_tasks else None
+        self._active_tasks = (
+            [dict(t) for t in active_tasks] if active_tasks is not None else None
+        )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_task_plan_support.py` around lines 13 - 15, The constructor for _FakeHost incorrectly treats an empty list as None by using "if active_tasks"; change the check to "if active_tasks is not None" so that an explicit empty list is preserved, and keep the defensive copy logic (self._active_tasks = [dict(t) for t in active_tasks] if active_tasks is not None else None) so callers can represent a valid empty-task-plan state; update the __init__ in _FakeHost accordingly.

massgen/tests/test_task_plan_support.py-108-113 (1)
108-113: ⚠️ Potential issue | 🟡 Minor

Assert `plan_updates` is populated before indexing.

Add an explicit check so failures are diagnostic instead of `IndexError`.

Suggested fix

```diff
     assert handled is True
     assert host.pinned_updates, "Expected pinned task plan update"
+    assert host.plan_updates, "Expected host task-plan state update"
     assert host.pinned_updates[-1]["operation"] == "update"
     assert host.pinned_updates[-1]["focused_task_id"] == "task_1"
     assert host.pinned_updates[-1]["tasks"][0]["status"] == "completed"
     assert host.plan_updates[-1]["operation"] == "update"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_task_plan_support.py` around lines 108 - 113, Add an explicit assertion that host.plan_updates is populated before the test indexes its last element: after asserting host.pinned_updates and before accessing host.plan_updates[-1], assert that host.plan_updates (or len(host.plan_updates) > 0) is truthy so failures report a clear assertion instead of raising IndexError when the list is empty; update the test in test_task_plan_support.py where variables handled and host are used to include this check.

massgen/skills/backend-integrator/SKILL.md-669-669 (1)
669-669: ⚠️ Potential issue | 🟡 Minor

Fix markdown formatting issues.
Static analysis flagged several formatting violations:
- Tables at lines 669, 711, 729, 735 need blank lines before and after
- Fenced code block at line 951 needs a language specifier
📝 Proposed fixes
Add blank lines around tables and specify language for code blocks:
````diff
 **Docker mode**: Skip backend sandbox config — the container IS the sandbox. Grant full access inside the container.
+
 ### SDK Wrapper Backend (like Claude Code, Copilot)
-```
+
+```python
 __init__: import SDK, configure options, detect Docker mode
````

Apply similar blank line additions around the tables at lines 669, 711, 729, and 735.
Also applies to: 711-711, 729-729, 735-735, 951-951
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/skills/backend-integrator/SKILL.md` at line 669, The markdown has tables that lack surrounding blank lines and a fenced code block without a language; add a blank line before and after each table occurrence of the header row "| Section | Gate |" (and the other two table blocks referenced) so they are separated from adjacent paragraphs, and change the fenced code block that contains "__init__: import SDK, configure options, detect Docker mode" to use a language specifier (e.g., ```python) on the opening fence; update the three other table occurrences noted (around the same table content) similarly to ensure proper Markdown rendering.

docs/source/user_guide/backends.rst-156-165 (1)
156-165: ⚠️ Potential issue | 🟡 Minor

`gemini_cli` is misclassified as built-in code execution here.

The rest of this page treats Gemini CLI as native shell/filesystem access, not provider-side `enable_code_execution`. Keeping ⭐ in the Code Execution column implies that `enable_code_execution` is valid for this backend when the docs elsewhere only describe shell/bash support.

As per coding guidelines, "Behavior descriptions must match what the code does."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/source/user_guide/backends.rst` around lines 156 - 165, The table entry for gemini_cli incorrectly marks Code Execution as supported (⭐); update the backends.rst table row for gemini_cli to remove or replace the Code Execution star so it reflects native shell/filesystem behavior described elsewhere (i.e., do not advertise provider-side enable_code_execution for gemini_cli); ensure the row now matches the documentation narrative and any other mentions of gemini_cli and enable_code_execution across the docs.

docs/source/reference/yaml_schema.rst-793-797 (1)
793-797: ⚠️ Potential issue | 🟡 Minor

This row shifted the backend note into the `Required` column.

Line 795 now reads like `command_line_docker_network_mode` is "required: Codex/Gemini CLI", while line 796 still says "All with MCP support". Please keep the `Required` cell as `No` and move the backend-specific constraint into the Supported Backends/Description column.

As per coding guidelines, "Ensure proper RST syntax."

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/source/reference/yaml_schema.rst` around lines 793 - 797, The table row for `command_line_docker_network_mode` incorrectly put the backend-specific note into the "Required" column; change the "Required" cell back to "No" and move the backend constraint ("Codex, Gemini CLI (Docker mode)") and the note about Docker network mode into the Supported Backends/Description column so the row reads: name `command_line_docker_network_mode`, type `string`, supported backends `Codex, Gemini CLI (Docker mode)` plus the description "Docker network mode: 'bridge', 'host', 'none' (use 'bridge'). All with MCP support", and ensure the RST table cell separators/inline code markup (``command_line_docker_network_mode`` and quoted modes) follow proper RST syntax.

docs/source/user_guide/backends.rst-64-66 (1)
64-66: ⚠️ Potential issue | 🟡 Minor

Remove the duplicated `codex` backend row.

`codex` is already listed earlier in this table, so this second entry makes the backend catalog internally inconsistent and likely displaced the backend that was meant to be added here.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@docs/source/user_guide/backends.rst` around lines 64 - 66, Remove the duplicated backend row that contains "* - ``codex``" in the table: delete that second "codex" entry so the table remains consistent, ensure the subsequent rows (the OpenAI (CLI) / GPT-5.4, GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex, GPT-4.1 lines) are correctly aligned under the intended columns, and verify no other backend was inadvertently displaced by this duplicate removal.

massgen/tests/integration/test_copilot_hook_orchestrator.py-15-24 (1)
15-24: ⚠️ Potential issue | 🟡 Minor

Mark this module as an integration test.

Because this file lives under `massgen/tests/integration/` but none of the tests are tagged `integration`, it won't participate correctly in the repo's opt-in integration test flow.

As per coding guidelines, `{massgen/tests/**/*.py,webui/src/**/*.test.ts}`: Use `@pytest.mark.integration` (opt-in: `--run-integration`), `@pytest.mark.live_api` (opt-in: `--run-live-api`), `@pytest.mark.expensive`, `@pytest.mark.docker`, and `@pytest.mark.asyncio` markers when writing tests.

Suggested change

```diff
 import pytest

+pytestmark = pytest.mark.integration
+
 from massgen.mcp_tools.hooks import (
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/integration/test_copilot_hook_orchestrator.py` around lines 15 - 24, This integration test module isn't tagged for the opt-in integration test run; add a module-level marker by defining pytestmark = pytest.mark.integration (since pytest is already imported) at the top of the file so pytest treats all tests in this module as integration tests; modify or add the pytestmark variable near the existing imports and leave individual test functions (e.g., any tests using GeneralHookManager, CopilotNativeHookAdapter) unchanged.

scripts/test_hook_backends.py-152-155 (1)
152-155: ⚠️ Potential issue | 🟡 Minor

Preserve exception context using chaining.

The ImportError should be chained to the new exception for better debuggability. This aligns with the codebase convention, which uses exception chaining throughout (e.g., in `massgen/__init__.py`, `massgen/cli.py`, `massgen/orchestrator.py`, and others).

Suggested change

```diff
-    except ImportError:
-        raise ValueError("massgen.backend.copilot not available")
+    except ImportError as err:
+        raise ValueError("massgen.backend.copilot not available") from err
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@scripts/test_hook_backends.py` around lines 152 - 155, The import failure for COPILOT_SDK_AVAILABLE and CopilotBackend is re-raised as a ValueError without preserving the original ImportError; modify the except block to capture the ImportError (e.g., except ImportError as e) and re-raise the ValueError using exception chaining (raise ValueError("massgen.backend.copilot not available") from e) so the original ImportError context is preserved for debugging.

massgen/orchestrator.py-9701-9703 (1)
9701-9703: ⚠️ Potential issue | 🟡 Minor

Use a Google-style docstring for the new helper method.

`_coerce_answer_content_to_text` is a new changed function but currently has a one-line docstring. Please convert it to Google-style with `Args:` and `Returns:` sections.

As per coding guidelines, `**/*.py`: "For new or changed functions, include Google-style docstrings".
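For illustration only, a minimal sketch of the requested Google-style shape (the parameter name `content` comes from the AI prompt below; the body is elided):

```python
from typing import Any


def _coerce_answer_content_to_text(self, content: Any) -> str:
    """Normalize a structured new_answer payload into plain text.

    Args:
        content: Raw answer content; may already be a string or a
            structured payload (e.g., a list of content blocks).

    Returns:
        str: The answer flattened to a plain-text string.
    """
    ...  # real implementation lives in massgen/orchestrator.py
```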
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/orchestrator.py` around lines 9701 - 9703, The docstring for the helper method _coerce_answer_content_to_text should be expanded to a Google-style docstring: add a short summary line, an Args: section describing the content parameter (type Any and what inputs are expected/handled), and a Returns: section specifying it returns a str (normalized plain-text answer). Keep description concise and place it immediately under the def for _coerce_answer_content_to_text.

massgen/filesystem_manager/_docker_manager.py-115-115 (1)
115-115: ⚠️ Potential issue | 🟡 Minor

Please document `gemini_config` in the constructor credential-mount list.

The feature is implemented, but the `credentials.mount` option list in the constructor docstring still omits it.

📝 Suggested docstring update

```diff
         credentials: Credential management configuration:
             mount: List of credential types to mount
-                ["ssh_keys", "git_config", "gh_config", "npm_config", "pypi_config", "claude_config", "codex_config"]
+                ["ssh_keys", "git_config", "gh_config", "npm_config", "pypi_config", "claude_config", "codex_config", "gemini_config"]
```

Also applies to: 384-393

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/filesystem_manager/_docker_manager.py` at line 115, Update the constructor docstring for the class that sets self.mount_gemini_config to document the new credentials.mount option "gemini_config": locate the constructor docstring where the credentials.mount list/options are documented and add "gemini_config" with a brief description (e.g., mounts Gemini configuration files into the container), and mirror this update in the other constructor/docstring block referenced around the alternative constructor or overload (the region corresponding to lines ~384-393) so both places list "gemini_config" alongside the existing mount options; ensure the term matches the code symbol mount_gemini_config and that the description is concise and consistent.

massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py-231-252 (1)
231-252: ⚠️ Potential issue | 🟡 Minor

Remove duplicate `"directory"` key in path extraction tuple.

The key `"directory"` appears twice in the tuple (lines 236 and 246), making the second occurrence dead code.

🐛 Proposed fix

```diff
     for key in (
         "file_path",
         "absolute_path",
         "path",
         "dir_path",
         "directory",
         "filename",
         "file",
         "notebook_path",
         "target",
         "source",
         "source_path",
         "destination",
         "destination_path",
         "destination_base_path",
-        "directory",
         "dir",
     ):
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py` around lines 231 - 252, The tuple of candidate keys used to extract a path from tool_args contains a duplicate "directory" entry; update the key sequence in the loop that checks tool_args.get(key) (the tuple containing "file_path", "absolute_path", ..., "directory", "dir") to remove the duplicated "directory" so each lookup key is unique, leaving only one "directory" in the list used by the for key in (...) loop and keeping the subsequent logic (isinstance(value, str) and value) unchanged.

massgen/tests/test_gemini_cli_backend.py-618-658 (1)
618-658: ⚠️ Potential issue | 🟡 Minor

Add `@pytest.mark.skipif` for Windows-specific test causing pipeline failure.

The test `test_windows_env_keeps_path_when_node_already_resolved` patches `os.name` to `"nt"` but doesn't skip on non-Windows systems. Unlike `test_windows_cmd_wrapper_rewrites_to_direct_node_launch` (line 521), this test lacks the skipif decorator, causing `NotImplementedError: cannot instantiate 'WindowsPath'` on Linux/macOS CI.

🐛 Proposed fix

```diff
+    @pytest.mark.skipif(
+        sys.platform != "win32",
+        reason="WindowsPath cannot be instantiated on non-Windows",
+    )
     def test_windows_env_keeps_path_when_node_already_resolved(self, backend):
         """Do not prepend node path when node is already resolvable."""
         backend.api_key = None
```
Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_gemini_cli_backend.py` around lines 618 - 658, Add a pytest.skipif decorator to the test_windows_env_keeps_path_when_node_already_resolved test so it only runs on Windows (sys.platform == "win32"), mirroring the existing skip on test_windows_env_adds_node_path_when_missing; update the test function declaration for test_windows_env_keeps_path_when_node_already_resolved and ensure it uses the same condition and reason text, leaving the patches that set os.name to "nt" and the call to backend._build_subprocess_env unchanged.
🧹 Nitpick comments (18)
massgen/tests/test_config_validator.py (1)
1759-1793: Add the same Docker-network test pair for `gemini_cli` for parity.
`config_validator.py` added identical Docker-mode requirements for `gemini_cli`, but this test block only covers `copilot`; a sketch of the missing pair follows.
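A hedged sketch of the missing pair, mirroring the copilot tests described in the prompt below (the exact shape of `result.errors` entries is an assumption; adjust the predicate to the real API):

```python
from massgen.config_validator import ConfigValidator


def _gemini_cli_docker_config(network_mode):
    # Minimal config: gemini_cli backend in Docker execution mode.
    backend = {
        "type": "gemini_cli",
        "model": "gemini-2.5-flash",
        "command_line_execution_mode": "docker",
    }
    if network_mode is not None:
        backend["command_line_docker_network_mode"] = network_mode
    return {"agents": [{"id": "agent_a", "backend": backend}]}


def test_gemini_cli_docker_requires_network_mode():
    result = ConfigValidator().validate_config(_gemini_cli_docker_config(None))
    assert any("command_line_docker_network_mode" in str(e) for e in result.errors)


def test_gemini_cli_docker_with_network_mode_passes():
    result = ConfigValidator().validate_config(_gemini_cli_docker_config("bridge"))
    assert not any("command_line_docker_network_mode" in str(e) for e in result.errors)
```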
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_config_validator.py` around lines 1759 - 1793, Add a parallel test pair for the gemini_cli backend mirroring test_copilot_docker_requires_network_mode and test_copilot_docker_with_network_mode_passes: create two tests (e.g., test_gemini_cli_docker_requires_network_mode and test_gemini_cli_docker_with_network_mode_passes) that build configs with agent.backend.type set to "gemini_cli", command_line_execution_mode "docker", and verify ConfigValidator().validate_config(config) returns an invalid result when command_line_docker_network_mode is missing and no such network_mode error when command_line_docker_network_mode is provided; use the same error-message predicate as the copilot tests (checking "gemini_cli" or "gemini" as appropriate along with "docker" and "network_mode") against result.errors.

massgen/config_validator.py (1)
606-639: Consider consolidating duplicate Docker-network validation blocks.
`codex`, `copilot`, and `gemini_cli` now share the same requirement. A single set-based check would reduce drift risk.

♻️ Small DRY refactor

```diff
-    # Validate Codex Docker mode requirements
-    if backend_type == "codex":
-        execution_mode = backend_config.get("command_line_execution_mode")
-        if execution_mode == "docker":
-            # command_line_docker_network_mode is required for Codex in Docker mode
-            if "command_line_docker_network_mode" not in backend_config:
-                result.add_error(
-                    "Codex backend in Docker mode requires 'command_line_docker_network_mode'",
-                    f"{location}.command_line_docker_network_mode",
-                    "Add 'command_line_docker_network_mode: bridge' (required for Codex Docker execution)",
-                )
-
-    # Validate Copilot Docker mode requirements
-    if backend_type == "copilot":
-        execution_mode = backend_config.get("command_line_execution_mode")
-        if execution_mode == "docker":
-            if "command_line_docker_network_mode" not in backend_config:
-                result.add_error(
-                    "Copilot backend in Docker mode requires 'command_line_docker_network_mode'",
-                    f"{location}.command_line_docker_network_mode",
-                    "Add 'command_line_docker_network_mode: bridge' (required for Copilot Docker execution)",
-                )
-
-    # Validate Gemini CLI Docker mode requirements
-    if backend_type == "gemini_cli":
-        execution_mode = backend_config.get("command_line_execution_mode")
-        if execution_mode == "docker":
-            if "command_line_docker_network_mode" not in backend_config:
-                result.add_error(
-                    "Gemini CLI backend in Docker mode requires 'command_line_docker_network_mode'",
-                    f"{location}.command_line_docker_network_mode",
-                    "Add 'command_line_docker_network_mode: bridge' (required for Gemini CLI Docker execution)",
-                )
+    docker_network_required_backends = {"codex", "copilot", "gemini_cli"}
+    if backend_type in docker_network_required_backends:
+        if backend_config.get("command_line_execution_mode") == "docker":
+            if "command_line_docker_network_mode" not in backend_config:
+                label = backend_type.replace("_", " ").title().replace("Cli", "CLI")
+                result.add_error(
+                    f"{label} backend in Docker mode requires 'command_line_docker_network_mode'",
+                    f"{location}.command_line_docker_network_mode",
+                    f"Add 'command_line_docker_network_mode: bridge' (required for {label} Docker execution)",
+                )
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/config_validator.py` around lines 606 - 639, The three nearly identical Docker-network validation blocks for backend_type == "codex"/"copilot"/"gemini_cli" should be consolidated: replace them with one check that tests if backend_type is in the set {"codex", "copilot", "gemini_cli"}, reads execution_mode = backend_config.get("command_line_execution_mode"), and if execution_mode == "docker" ensures "command_line_docker_network_mode" exists on backend_config and calls result.add_error with the same location and hint; use backend_type in the error text or a single generic message to preserve context and keep references to backend_type, backend_config, execution_mode, result.add_error, and location to find and update the code.

massgen/tests/integration/test_copilot_hook_orchestrator.py (1)
85-112: Avoid bypassing `CopilotBackend.__init__` in the integration helper.

Constructing the backend with `__new__` and hand-populating private fields can let init-time wiring regressions slip through. A smoke test that patches the SDK boundary and calls the real constructor would cover adapter setup more realistically; see the sketch below.
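A hedged sketch of such a smoke test. The patch target `massgen.backend.copilot.CopilotClient` and the `_native_hook_adapter` attribute are assumptions (only `_init_native_hook_adapter` is named in this review), so substitute the real SDK entry point:

```python
from unittest.mock import MagicMock, patch


def test_copilot_backend_init_wires_native_hook_adapter():
    # create=True because the exact SDK symbol name is assumed here.
    with patch("massgen.backend.copilot.CopilotClient", MagicMock(), create=True):
        from massgen.backend.copilot import CopilotBackend

        # Run the real constructor so init-time wiring is exercised.
        backend = CopilotBackend(model="gpt-5-mini")
        assert getattr(backend, "_native_hook_adapter", None) is not None
```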
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/integration/test_copilot_hook_orchestrator.py` around lines 85 - 112, Replace the __new__ bypass in _make_copilot_backend with a real construction path that calls CopilotBackend.__init__ under the patched copilot SDK: inside the patch.dict block import CopilotBackend and invoke its constructor (e.g., CopilotBackend(...) with a minimal/mock config or required args) instead of using CopilotBackend.__new__ and manually setting private attributes; ensure you still patch or inject a mock client/filesystem_manager as needed so the real __init__ runs and the call to backend._init_native_hook_adapter (and other init wiring) happens naturally.

massgen/tests/test_docker_skills_write_access.py (1)
55-55: Add a dedicated positive test for `mount_gemini_config`.

Good fixture parity change. Consider adding a `test_build_credential_mounts_supports_gemini_config` case (mirroring Claude/Codex tests) to lock in the new behavior.

➕ Suggested test addition

```diff
+def test_build_credential_mounts_supports_gemini_config(docker_manager, tmp_path, monkeypatch):
+    """gemini_config mount should map ~/.gemini into /home/massgen/.gemini."""
+    fake_home = tmp_path / "home"
+    gemini_dir = fake_home / ".gemini"
+    gemini_dir.mkdir(parents=True)
+    docker_manager.mount_gemini_config = True
+
+    monkeypatch.setattr("massgen.filesystem_manager._docker_manager.Path.home", lambda: fake_home)
+
+    mounts = docker_manager._build_credential_mounts()
+    host_path = str(gemini_dir.resolve())
+
+    assert host_path in mounts
+    assert mounts[host_path]["bind"] == "/home/massgen/.gemini"
+    assert mounts[host_path]["mode"] == "ro"
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_docker_skills_write_access.py` at line 55, Add a positive unit test named test_build_credential_mounts_supports_gemini_config that mirrors the Claude/Codex tests: instantiate or use the existing manager fixture, set manager.mount_gemini_config = True, call the method that builds credential mounts (e.g., manager.build_credential_mounts or the equivalent function used elsewhere in the file), and assert that the returned mounts include the Gemini config mount (validate by presence of the expected mount entry or path). Ensure the test uses the same setup/teardown patterns as the other tests in this file and references manager.mount_gemini_config and the build_credential_mounts call so the behavior is locked in.

massgen/tests/integration/test_orchestrator_stream_enforcement.py (1)
44-45: Add `@pytest.mark.integration` marker.

This integration test is missing the `@pytest.mark.integration` marker. Per coding guidelines, integration tests should use this marker for opt-in execution with `--run-integration`.

💡 Suggested fix

```diff
+@pytest.mark.integration
 @pytest.mark.asyncio
 async def test_stream_agent_execution_coerces_structured_new_answer_content(mock_orchestrator):
```

As per coding guidelines: "Use `@pytest.mark.integration` (opt-in: `--run-integration`)... markers when writing tests".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/integration/test_orchestrator_stream_enforcement.py` around lines 44 - 45, The test function test_stream_agent_execution_coerces_structured_new_answer_content is missing the integration marker; add the `@pytest.mark.integration` decorator above its existing `@pytest.mark.asyncio` so the test is opt-in for --run-integration (ensure the decorator is imported/available in the test module and placed directly above the test definition).

massgen/tests/test_cli_cwd_context_flag.py (1)
82-89: Consider using a non-/tmp path to avoid static analysis warnings.

Ruff S108 flags `/tmp/other` as potentially insecure temp usage. While this is a false positive in the test context (it's just a string literal), using a different sentinel path would silence the linter.

💡 Suggested fix

```diff
 def test_apply_cli_cwd_context_path_noop_when_not_set(tmp_path, monkeypatch):
     """Should be a no-op when cwd context mode is not provided."""
     monkeypatch.chdir(tmp_path)
-    config = {"orchestrator": {"context_paths": [{"path": "/tmp/other", "permission": "read"}]}}
+    config = {"orchestrator": {"context_paths": [{"path": "/some/other/path", "permission": "read"}]}}
     apply_cli_cwd_context_path(config, None)
-    assert config["orchestrator"]["context_paths"] == [{"path": "/tmp/other", "permission": "read"}]
+    assert config["orchestrator"]["context_paths"] == [{"path": "/some/other/path", "permission": "read"}]
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_cli_cwd_context_flag.py` around lines 82 - 89, The test test_apply_cli_cwd_context_path_noop_when_not_set uses the literal "/tmp/other" which triggers Ruff S108; update the sentinel path to a non-/tmp value (e.g. "/var/other" or "/nonexistent/other" or "SENTINEL_PATH") so the test still asserts identity but avoids the linter warning; modify the literal in the test where apply_cli_cwd_context_path is called and in the final assert to the new sentinel string.

massgen/tests/integration/test_gemini_cli_hook_orchestrator.py (1)
1-27: Consider adding `@pytest.mark.integration` marker to this module.

Per coding guidelines, integration tests should use the `@pytest.mark.integration` marker for opt-in execution. Since this file tests multi-component orchestration flows, consider adding the marker to the relevant test classes or as a module-level `pytestmark`.

💡 Suggested addition at module level

```diff
 """Integration tests for Gemini CLI hook adapter with orchestrator hook manager.

 These are deterministic, non-API tests that verify the hook adapter correctly
 builds settings.json config and integrates with MassGen's GeneralHookManager.
 """

 from __future__ import annotations

 import json
 from pathlib import Path
 from unittest.mock import patch

+import pytest
+
 from massgen.mcp_tools.hooks import (
     GeneralHookManager,
     HookResult,
     HookType,
     PatternHook,
 )
 from massgen.mcp_tools.native_hook_adapters import GeminiCLINativeHookAdapter
+
+pytestmark = pytest.mark.integration
```

As per coding guidelines: "Use `@pytest.mark.integration` (opt-in: `--run-integration`)... markers when writing tests" and learning "Place integration tests in `massgen/tests/integration/` for multi-component orchestration flows".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/integration/test_gemini_cli_hook_orchestrator.py` around lines 1 - 27, Add the pytest integration marker at the module level so all tests in this file are opt-in; define a module-level pytestmark = pytest.mark.integration (or apply `@pytest.mark.integration` to the test classes) and import pytest if needed so the marker is recognized; update the top of the file near the imports and ensure the marker applies to _AllowHook and any test classes that follow.

massgen/configs/providers/gemini/gemini_cli_local.yaml (1)
1-9: Consider adding a "What happens" comment and using a cost-effective model.

Per coding guidelines, YAML configs should include "What happens" comments explaining the execution flow. Also, the guidelines suggest preferring cost-effective models like `gemini-2.5-flash` over preview models.

💡 Suggested improvement

```diff
 # uv run massgen --config "massgen/configs/providers/gemini/gemini_cli_local.yaml" "whats the deep learning about?"
+# What happens: Single Gemini CLI agent executes locally, streaming responses via the gemini CLI subprocess.
 agents:
   - id: agent_a
     backend:
       type: gemini_cli
-      model: gemini-3-flash-preview
+      model: gemini-2.5-flash
 ui:
   display_type: textual_terminal
   logging_enabled: true
```

As per coding guidelines: "Include 'What happens' comments explaining execution flow" and "Prefer cost-effective models (gpt-5-nano, gpt-5-mini, gemini-2.5-flash)".

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/configs/providers/gemini/gemini_cli_local.yaml` around lines 1 - 9, Add a top-level "What happens" comment describing the execution flow for this YAML (how the agents list is used, backend.type=gemini_cli invoking the local CLI adapter, and ui.display_type controlling output) and replace the preview model name in the agents[0].backend.model field (currently "gemini-3-flash-preview") with a cost-effective alternative such as "gemini-2.5-flash" to follow the cost-preference guideline; ensure the comment appears before the agents key and clearly states the single-agent flow and UI logging behavior.

massgen/tests/test_gemini_cli_live.py (1)
180-219: Consider removing `@pytest.mark.live_api` from this test.

This test validates settings.json writing without actually calling the Gemini API—it only exercises local config persistence. The `live_api` marker may cause this test to be skipped unnecessarily in CI environments without API credentials.

Proposed change

```diff
 @pytest.mark.integration
-@pytest.mark.live_api
 @pytest.mark.asyncio
 async def test_settings_json_written_with_hooks_config():
     """Verify settings.json is written correctly when hooks config is set."""
-    _skip_if_no_credentials()
     _skip_if_no_gemini_cli()
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_gemini_cli_live.py` around lines 180 - 219, The test test_settings_json_written_with_hooks_config is marked with `@pytest.mark.live_api` but doesn't call the Gemini API; remove the `@pytest.mark.live_api` decorator to avoid unnecessary skipping in CI, keeping `@pytest.mark.integration` and `@pytest.mark.asyncio`; locate the test function (test_settings_json_written_with_hooks_config) in massgen/tests/test_gemini_cli_live.py and delete the `@pytest.mark.live_api` line so the test still uses GeminiCLIBackend, backend._massgen_hooks_config setup, backend._write_workspace_config, and settings.json assertions unchanged.

massgen/tests/test_copilot_live.py (2)
183-211: Log the caught exception for debugging instead of silently passing.

The try-except-pass pattern at lines 203-205 is intentional to verify non-hanging behavior, but logging the exception would help diagnose issues when this test fails unexpectedly.
Proposed enhancement
```diff
     try:
         async for chunk in backend.stream_with_tools(messages, []):
             chunks.append(chunk)
             if chunk.type in ("done", "error"):
                 break
     except Exception:
-        # Exception is acceptable — we just don't want a hang
-        pass
+        # Exception is acceptable — we just don't want a hang
+        import logging
+        logging.debug("Expected exception during unauthenticated access test")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_copilot_live.py` around lines 183 - 211, In test_error_handling_unauthenticated replace the silent except-pass with logging of the caught exception so failures are diagnosable: inside the except block that wraps the async for over backend.stream_with_tools(messages, []) (in test_error_handling_unauthenticated) call logging.exception or pytest.fail with the exception info instead of just pass, ensuring the exception from CopilotBackend.stream_with_tools is recorded for debugging while preserving the test's intent of not hanging.
30-41: Consider logging the exception before skipping.

The blind `Exception` catch at line 35 is acceptable for test skip logic, but logging the error details would aid debugging when authentication checks fail unexpectedly.

Proposed enhancement

```diff
 async def _check_copilot_auth(backend) -> None:
     """Skip test if Copilot is not authenticated."""
     try:
         await backend._ensure_started()
         is_auth, msg = await backend._query_auth_status(context="live_test")
     except Exception as e:
+        import logging
+        logging.debug(f"Copilot auth check failed: {e}")
         pytest.skip(f"Copilot SDK not available or auth check failed: {e}")
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_copilot_live.py` around lines 30 - 41, The test helper _check_copilot_auth swallows exceptions when calling backend._ensure_started and backend._query_auth_status and directly calls pytest.skip; add a log of the caught exception before skipping so failures are visible. Update the except block in _check_copilot_auth to record the exception (e.g., logging.exception or logger.exception) including context like "Copilot SDK not available or auth check failed", ensure the logging module is imported or an existing test logger is used, then call pytest.skip with the same message; reference backend._ensure_started, backend._query_auth_status, and pytest.skip to find the try/except to modify.

massgen/tests/test_vote_only_mode.py (1)
220-241: Mutable class attribute should use a factory or immutable default.

The `config = {}` at line 224 is a mutable class attribute. If any test inadvertently modifies it, the change would persist across instances, potentially causing flaky tests.

Proposed fix

```diff
 class _FakeBackendForToolFiltering:
     """Minimal backend stub for tool-name extraction."""

     filesystem_manager = None
-    config = {}
+    config: dict = None  # type: ignore[assignment]
+
+    def __init__(self):
+        self.config = {}

     @staticmethod
     def extract_tool_name(tool_call):
```

Alternatively, if instance creation is not desired in the test context:

```diff
 class _FakeBackendForToolFiltering:
     """Minimal backend stub for tool-name extraction."""

     filesystem_manager = None
+
+    @property
+    def config(self):
+        return {}
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/tests/test_vote_only_mode.py` around lines 220 - 241, The class _FakeBackendForToolFiltering defines a mutable class attribute config = {} which can leak mutations across tests; change it to an immutable default or initialize per-instance instead—either set config = None (or an immutable empty mapping) at class level and create a fresh dict in an __init__ (or use a dataclass/constructor with a default_factory) so each _FakeBackendForToolFiltering instance gets its own config and tests cannot mutate a shared class dict.

massgen/backend/copilot.py (2)
576-612: Async-to-sync bridging pattern has potential edge cases.

The pattern of calling `asyncio.run()` from within a ThreadPoolExecutor when already in an event loop is functional but complex. Consider documenting the threading implications or extracting this into a utility function for reuse.

The current implementation is correct, but the nested event loop creation could have subtle issues in certain environments (e.g., nested async frameworks).
Consider extracting to a utility
```python
def _run_async_in_sync_context(coro):
    """Run an async coroutine from a sync context, handling event loop presence."""
    import asyncio
    import concurrent.futures

    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None

    if loop and loop.is_running():
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, coro).result()
    return asyncio.run(coro)
```

This could be placed in a utilities module for reuse across backends that need similar sync/async bridging.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/backend/copilot.py` around lines 576 - 612, Extract the async-to-sync bridging logic into a reusable helper (e.g., _run_async_in_sync_context) and replace the inline ThreadPoolExecutor/asyncio.run block inside _ppm_permission_callback's "shell" branch; the helper should accept a coroutine (the call to ppm.pre_tool_use_hook("Bash", {"command": cmd})) and return its result, handling the presence of a running event loop by using a ThreadPoolExecutor to call asyncio.run when required, otherwise calling asyncio.run directly, then use that helper to get (allowed, _reason) and proceed as before.
1003-1009: Accessing `ppm.managed_paths` assumes specific PPM structure.

The code accesses `ppm.managed_paths` via `getattr` with a fallback to empty list, but then accesses `.path` and `.permission` attributes on each item. If the PPM structure changes, this could fail silently.

Consider adding explicit type check or try-except

```diff
         if ppm:
-            writable_paths = sorted(str(mp.path) for mp in getattr(ppm, "managed_paths", []) if getattr(mp, "permission", None) == Permission.WRITE)
+            try:
+                writable_paths = sorted(
+                    str(mp.path)
+                    for mp in getattr(ppm, "managed_paths", [])
+                    if getattr(mp, "permission", None) == Permission.WRITE
+                )
+            except (AttributeError, TypeError):
+                logger.debug("[Copilot] Could not extract writable paths from PPM")
+                writable_paths = []
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/backend/copilot.py` around lines 1003 - 1009, The access of ppm.managed_paths in the writable_paths computation assumes each managed path object has .path and .permission; change the logic in the block that reads filesystem_manager.path_permission_manager.managed_paths so it validates items before using their attributes (e.g., ensure ppm and managed_paths exist, iterate safely and for each item check hasattr(item, "path") and hasattr(item, "permission") or isinstance checks, and only include str(item.path) when item.permission == Permission.WRITE); alternatively wrap the extraction in a try/except that logs or skips invalid entries to avoid silent failures while preserving the sorted writable_paths assignment.

massgen/cli.py (1)
2447-2452: GeminiCLIBackend already performs CLI validation in `__init__`; factory-level preflight is optional for consistency.

`GeminiCLIBackend.__init__` (line 59 of `massgen/backend/gemini_cli.py`) already includes a fail-fast preflight via `_find_gemini_cli()`. If the `gemini` CLI is missing, it raises `RuntimeError` with helpful installation instructions. However, for consistency with the factory pattern used in `claude_code` (which validates at the factory level) and to catch missing dependencies earlier, consider adding a factory-level check here as well; see the sketch below. Note that `CodexBackend` has a similar backend-level preflight but no factory-level check, so if implementing this for `gemini_cli`, consider applying the same pattern to `codex` for parity.
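A hedged sketch of what that factory-level preflight could look like (the surrounding factory structure in `massgen/cli.py` is assumed; `GeminiCLIBackend.__init__` raising `RuntimeError` on a missing CLI is per the note above):

```python
def _create_gemini_cli_backend(backend_config: dict):
    """Factory branch sketch: fail fast when the gemini CLI is unavailable."""
    from massgen.backend.gemini_cli import GeminiCLIBackend

    try:
        return GeminiCLIBackend(**backend_config)
    except RuntimeError as err:
        # Missing CLI, e.g. needs: npm install -g @google/gemini-cli
        raise RuntimeError(f"gemini_cli backend unavailable: {err}") from err
```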
🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/cli.py` around lines 2447 - 2452, Factory returns GeminiCLIBackend without preflight check; add a factory-level validation to mirror claude_code and surface missing CLI dependency earlier. Before returning GeminiCLIBackend in the factory branch for backend_type "gemini_cli" in massgen/cli.py, call GeminiCLIBackend._find_gemini_cli() (or wrap instantiation in a try/except catching RuntimeError from GeminiCLIBackend.__init__) to propagate the same RuntimeError with installation instructions; consider applying the same pattern to CodexBackend for parity with backend-level preflight checks.

massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py (1)
170-182: Break long line for readability.

This line chains many fallback keys and is difficult to read.
♻️ Proposed refactor
```diff
 def _extract_tool_args(tool_event: dict) -> dict:
     """Support both legacy and current Gemini CLI hook event schemas."""
-    raw_args = tool_event.get("tool_input") or tool_event.get("input") or tool_event.get("toolArgs") or tool_event.get("tool_args") or tool_event.get("arguments") or tool_event.get("parameters") or {}
+    raw_args = (
+        tool_event.get("tool_input")
+        or tool_event.get("input")
+        or tool_event.get("toolArgs")
+        or tool_event.get("tool_args")
+        or tool_event.get("arguments")
+        or tool_event.get("parameters")
+        or {}
+    )
     if isinstance(raw_args, dict):
         return raw_args
```

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py` around lines 170 - 182, The long fallback chain in _extract_tool_args makes the raw_args assignment hard to read; refactor by replacing the single long expression with a clear sequence: define an ordered list of possible keys (e.g., ["tool_input","input","toolArgs","tool_args","arguments","parameters"]), then iterate over that list to set raw_args from tool_event.get(key) until a non-None value is found (default to {}), preserving the existing subsequent logic that handles dict and JSON-string cases for raw_args; keep the function name _extract_tool_args and the variable raw_args unchanged so callers remain intact.

massgen/backend/gemini_cli.py (2)
1164-1164: Use `!s` conversion flag in f-string instead of explicit `str()` calls.

Per Ruff RUF010, f-strings should use conversion flags for clarity.
♻️ Proposed fix
```diff
-        command = f"{sys.executable} {str(_HOOK_SCRIPT_PATH)} " f"--hook-dir {str(config_dir)} --event BeforeTool"
+        command = f"{sys.executable} {_HOOK_SCRIPT_PATH!s} --hook-dir {config_dir!s} --event BeforeTool"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/backend/gemini_cli.py` at line 1164, The f-string assigned to variable command uses explicit str() calls; update it to use the !s conversion flag for clarity per Ruff RUF010 by removing str(_HOOK_SCRIPT_PATH) and str(config_dir) and instead using {_HOOK_SCRIPT_PATH!s} and {config_dir!s} inside the f-string so the command construction in gemini_cli.py (variable name: command, symbols: _HOOK_SCRIPT_PATH, config_dir) uses conversion flags rather than explicit str() calls.
72-72: Extract API key resolution into separate lines for readability.

This line is very long and chains multiple `or` operations, making it hard to read and maintain.

♻️ Proposed refactor
```diff
-        self.api_key = api_key or os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY") or configured_env.get("GEMINI_API_KEY") or configured_env.get("GOOGLE_API_KEY")
+        self.api_key = (
+            api_key
+            or os.getenv("GEMINI_API_KEY")
+            or os.getenv("GOOGLE_API_KEY")
+            or configured_env.get("GEMINI_API_KEY")
+            or configured_env.get("GOOGLE_API_KEY")
+        )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@massgen/backend/gemini_cli.py` at line 72, The long chained assignment to self.api_key (currently using multiple or operations with os.getenv and configured_env) should be split for readability: build an ordered list/tuple of candidate sources (e.g., env_gemini = os.getenv("GEMINI_API_KEY"), env_google = os.getenv("GOOGLE_API_KEY"), cfg_gemini = configured_env.get("GEMINI_API_KEY"), cfg_google = configured_env.get("GOOGLE_API_KEY"), and the api_key param) then iterate or use next(filter(None, candidates), None) to pick the first non-empty value and assign it to self.api_key in the class initializer (constructor) where the original expression lives; keep the same precedence (explicit api_key param first) and preserve behavior if none found.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 35bade69-a7c0-4246-b533-5fcee8e21689
📒 Files selected for processing (50)
- .gitignore
- docs/source/reference/yaml_schema.rst
- docs/source/user_guide/backends.rst
- massgen/backend/__init__.py
- massgen/backend/capabilities.py
- massgen/backend/copilot.py
- massgen/backend/gemini_cli.py
- massgen/cli.py
- massgen/config_validator.py
- massgen/configs/providers/gemini/gemini_cli_local.yaml
- massgen/filesystem_manager/_constants.py
- massgen/filesystem_manager/_docker_manager.py
- massgen/filesystem_manager/_filesystem_manager.py
- massgen/filesystem_manager/_path_permission_manager.py
- massgen/mcp_tools/native_hook_adapters/__init__.py
- massgen/mcp_tools/native_hook_adapters/copilot_adapter.py
- massgen/mcp_tools/native_hook_adapters/gemini_cli_adapter.py
- massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py
- massgen/orchestrator.py
- massgen/skills/backend-integrator/SKILL.md
- massgen/tests/backend/test_copilot.py
- massgen/tests/backend/test_copilot_hooks.py
- massgen/tests/backend/test_copilot_permission_callback.py
- massgen/tests/backend/test_copilot_permission_integration.py
- massgen/tests/integration/test_copilot_hook_orchestrator.py
- massgen/tests/integration/test_gemini_cli_hook_orchestrator.py
- massgen/tests/integration/test_orchestrator_stream_enforcement.py
- massgen/tests/test_backend_capabilities.py
- massgen/tests/test_cli_cwd_context_flag.py
- massgen/tests/test_concurrent_logging_sessions.py
- massgen/tests/test_config_validator.py
- massgen/tests/test_copilot_live.py
- massgen/tests/test_docker_skills_write_access.py
- massgen/tests/test_gemini_cli_backend.py
- massgen/tests/test_gemini_cli_hook_ipc.py
- massgen/tests/test_gemini_cli_hook_script.py
- massgen/tests/test_gemini_cli_live.py
- massgen/tests/test_interactive_controller_commands.py
- massgen/tests/test_native_tools_sandbox_script.py
- massgen/tests/test_snapshot_preservation.py
- massgen/tests/test_subagent_manager_paths.py
- massgen/tests/test_task_plan_support.py
- massgen/tests/test_tui_modes.py
- massgen/tests/test_vote_only_mode.py
- massgen/tests/unit/test_errors.py
- massgen/tests/unit/test_orchestrator_unit.py
- massgen/token_manager/token_manager.py
- pyproject.toml
- scripts/test_hook_backends.py
- scripts/test_native_tools_sandbox.py
.gitignore

```
# Gemini CLI runtime artifacts (project-local, generated by MassGen)
.gemini/settings.json
.gemini/workflow_tool_specs.json
.gemini/custom_tool_specs.json
.gemini/GEMINI.md
.gemini/google_accounts.json
.gemini/oauth_creds.json
.gemini/.env
```
Ignore the full .gemini/ directory to prevent secret leakage.
Line-specific ignores are fragile; new credential/artifact filenames can still slip into git history. Prefer a directory-level ignore.
🔐 Proposed hardening
```diff
-# Gemini CLI runtime artifacts (project-local, generated by MassGen)
-.gemini/settings.json
-.gemini/workflow_tool_specs.json
-.gemini/custom_tool_specs.json
-.gemini/GEMINI.md
-.gemini/google_accounts.json
-.gemini/oauth_creds.json
-.gemini/.env
+# Gemini CLI runtime artifacts (project-local, generated by MassGen)
+.gemini/
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```
# Gemini CLI runtime artifacts (project-local, generated by MassGen)
.gemini/
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.gitignore around lines 239 - 246, Replace the fragile per-file ignores with
a directory-level ignore by removing the explicit entries
(.gemini/settings.json, .gemini/workflow_tool_specs.json,
.gemini/custom_tool_specs.json, .gemini/GEMINI.md, .gemini/google_accounts.json,
.gemini/oauth_creds.json, .gemini/.env) and add a single .gemini/ rule in
.gitignore; after updating .gitignore, ensure any already-tracked files under
the .gemini directory are removed from the index (git rm --cached) so the new
.gemini/ rule takes effect.
docs/source/reference/yaml_schema.rst

```rst
GitHub Copilot Backend
~~~~~~~~~~~~~~~~~~~~~~

The GitHub Copilot backend uses the ``github-copilot-sdk`` with native MCP support.
Requires a GitHub Copilot subscription and the Copilot CLI (``gh copilot``).

**Basic Configuration:**

.. code-block:: yaml

   backend:
     type: "copilot"
     model: "gpt-5-mini"  # Also: gpt-4.1, claude-sonnet-4, gemini-2.5-pro

**With MCP Servers and Custom Tools:**

.. code-block:: yaml

   backend:
     type: "copilot"
     model: "gpt-5-mini"
     mcp_servers:
       - name: "filesystem"
         command: "npx"
         args: ["-y", "@modelcontextprotocol/server-filesystem", "."]
     custom_tools:
       - path: "massgen/tool/_basic"
         function: "two_num_tool"

**Configuration Options:**

- ``copilot_system_message_mode`` (string: "append"|"replace", default: "append"):
  How the system message is applied to the Copilot session.
- ``copilot_permission_policy`` (string: "approve"|"deny", default: "approve"):
  Permission callback policy. "approve" validates paths via PathPermissionManager.
- ``allowed_tools`` / ``exclude_tools``: Backend-level tool filtering.
- ``enable_multimodal_tools`` (bool): Enable read_media/generate_media tools.

**Docker Mode:**

.. code-block:: yaml

   backend:
     type: "copilot"
     model: "gpt-5-mini"
     command_line_execution_mode: "docker"
     command_line_docker_network_mode: "bridge"
```
The schema reference still doesn't cover the new backend keys.
This page introduces Copilot-specific options and adds gemini_cli as a valid backend type, but the backend parameter reference below never adds entries for those Copilot fields and still omits Gemini CLI support from rows users will check first, like cwd and enable_web_search. As written, yaml_schema.rst is no longer the authoritative reference for the new configuration surface.
As per coding guidelines, "For new YAML parameters, update docs/source/reference/yaml_schema.rst."
Also applies to: 687-687
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/source/reference/yaml_schema.rst` around lines 278 - 325, The YAML
schema reference is missing the new Copilot-specific backend keys and the new
backend type "gemini_cli"; update docs/source/reference/yaml_schema.rst to add
authoritative entries for the Copilot options (add keys/copilot fields:
copilot_system_message_mode, copilot_permission_policy, mcp_servers,
custom_tools, allowed_tools, exclude_tools, enable_multimodal_tools,
command_line_execution_mode, command_line_docker_network_mode) and ensure
backend parameter rows (e.g., cwd, enable_web_search) list "copilot" and
"gemini_cli" where applicable; follow existing schema table/row format so these
new fields appear in the backend parameter reference and include short
descriptions and types consistent with the surrounding entries.
| "gemini_cli": BackendCapabilities( | ||
| backend_type="gemini_cli", | ||
| provider_name="Gemini CLI", | ||
| supported_capabilities={ | ||
| "bash", | ||
| "mcp", | ||
| "filesystem_native", | ||
| "web_search", | ||
| }, | ||
| builtin_tools=[ | ||
| "run_shell_command", | ||
| "read_file", | ||
| "write_file", | ||
| "replace", | ||
| "glob", | ||
| "grep_search", | ||
| "list_directory", | ||
| "read_many_files", | ||
| "google_web_search", | ||
| "web_fetch", | ||
| ], | ||
| filesystem_support="native", | ||
| models=[ | ||
| "gemini-3.1-pro-preview", | ||
| "gemini-3-flash-preview", | ||
| "gemini-2.5-pro", | ||
| "gemini-2.5-flash", | ||
| "gemini-2.5-flash-lite", | ||
| ], | ||
| default_model="gemini-3.1-pro-preview", | ||
| env_var="GOOGLE_API_KEY", | ||
| model_release_dates={ | ||
| "gemini-3.1-pro-preview": "2026-02", | ||
| "gemini-3-flash-preview": "2025-12", | ||
| "gemini-2.5-pro": "2025-06", | ||
| "gemini-2.5-flash": "2025-06", | ||
| "gemini-2.5-flash-lite": "2025-06", | ||
| }, | ||
| notes=( | ||
| "Google Gemini CLI subprocess wrapper. Auth: CLI login (gemini) first, " | ||
| "then GOOGLE_API_KEY or GEMINI_API_KEY. Native filesystem via CLI. " | ||
| "Requires: npm install -g @google/gemini-cli. " | ||
| "Docker mode requires command_line_docker_network_mode." | ||
| ), | ||
| ), |
Also classify gemini_cli as an agent-framework backend.
The new registry entry makes gemini_cli a native CLI integration, but AGENT_FRAMEWORK_BACKENDS below still excludes it. is_agent_framework_backend("gemini_cli") therefore returns False, so any code branching on framework/native backends will miss the new backend.
🩹 Suggested follow-up outside this hunk
```diff
 AGENT_FRAMEWORK_BACKENDS = frozenset(
     {
         "claude_code",
         "codex",
         "copilot",
+        "gemini_cli",
     },
 )
```
Verify each finding against the current code and only fix it if needed.
In `@massgen/backend/capabilities.py` around lines 339 - 383, The gemini_cli
BackendCapabilities entry is registered as a native CLI integration but not
included in the agent-framework list, so
is_agent_framework_backend("gemini_cli") returns False; update the
classification by adding "gemini_cli" to the AGENT_FRAMEWORK_BACKENDS set (or
adjust the is_agent_framework_backend logic) so that the backend named
"gemini_cli" (referenced in the BackendCapabilities block) is treated as an
agent-framework backend by functions checking
AGENT_FRAMEWORK_BACKENDS/is_agent_framework_backend.
massgen/backend/copilot.py

```diff
         agent_id = kwargs.get("agent_id", self.config.get("agent_id") or "default")
-        model = kwargs.get("model") or self.config.get("model") or "gpt-4.1"
+        model = kwargs.get("model") or self.config.get("model") or "gpt-5-mini"
```
🧩 Analysis chain
🌐 Web query:
GitHub Copilot SDK gpt-5-mini model availability 2025
💡 Result:
- GPT‑5 mini in GitHub Copilot (2025): GitHub began rolling out GPT‑5 mini to GitHub Copilot in public preview on August 13, 2025, appearing in the model picker (VS Code, github.com Copilot Chat, and GitHub Mobile). [1]
- GA status: GitHub’s “Supported AI models in Copilot” documentation lists GPT‑5 mini as GA (generally available). [2]
- “Copilot SDK” timing vs 2025: The product GitHub calls “Copilot SDK” (language-specific SDKs for programmatic access to Copilot CLI) entered technical preview on January 14, 2026—so it was not available in 2025. [3]
- What was available in 2025 under an SDK name: GitHub released the Copilot Language Server SDK on February 10, 2025 (to integrate Copilot into other editors/IDEs), but that’s a different SDK than the 2026 “Copilot SDK.” [4]
Sources: [1]–[4] correspond to the cited links above.
🏁 Script executed:

```shell
# Check if the copilot.py file exists
fd copilot.py massgen/

# If it exists, examine the context around line 918
if [ -f "massgen/backend/copilot.py" ]; then
  echo "=== File found, checking size ==="
  wc -l massgen/backend/copilot.py
  echo "=== Content around line 918 ==="
  sed -n '910,925p' massgen/backend/copilot.py
fi
```

Repository: massgen/MassGen

Length of output: 904
🏁 Script executed:

```shell
# Search for other model identifier patterns in backend files
rg "gpt-.*-mini|model.*=" massgen/backend/ -A 2 -B 2 --type py | head -50
```

Repository: massgen/MassGen

Length of output: 3468
🏁 Script executed:

```shell
# Check capabilities.py to see if copilot backend is registered
if [ -f "massgen/backend/capabilities.py" ]; then
  echo "=== Checking capabilities.py ==="
  cat -n massgen/backend/capabilities.py | grep -i copilot -A 5 -B 5
fi
```

Repository: massgen/MassGen

Length of output: 1619
Add gpt-5-mini to the supported models list in massgen/backend/capabilities.py.
While gpt-5-mini is a valid, GA model in GitHub Copilot, it is currently set as the default_model in capabilities.py (line 713) but is missing from the models list (line 710). Update the models list to include "gpt-5-mini" to ensure consistency between the default model and supported models, as required by the coding guideline: "If adding new model support → ensure massgen/backend/capabilities.py is updated."
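A minimal sketch of the expected fix, assuming the Copilot entry follows the same BackendCapabilities shape shown for gemini_cli above; fields other than models/default_model are elided and illustrative:

```python
"copilot": BackendCapabilities(
    backend_type="copilot",
    # ... other fields unchanged ...
    models=[
        # existing entries ...
        "gpt-5-mini",  # add so the default below is also a listed model
    ],
    default_model="gpt-5-mini",  # matches the fallback in copilot.py
),
```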
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/backend/copilot.py` at line 918, The default model "gpt-5-mini" used
in massgen/backend/copilot.py is not present in the supported models list;
update the models collection in massgen/backend/capabilities.py by adding
"gpt-5-mini" to the models list (the variable named models) so it matches the
default_model setting (variable default_model) and ensures the new model is
recognized by the capabilities checks.
```python
# Serialize tool args for MassGen hook interface
tool_args = input_data.get("toolArgs", {})
arguments_str = json.dumps(tool_args) if tool_args else "{}"
```
Preserve already-serialized toolArgs.
toolArgs is declared as Any. If Copilot hands this wrapper a JSON string, json.dumps(tool_args) double-encodes it, so downstream hooks receive a quoted string instead of the original object payload and can fail closed on parsing.
🧩 Proposed fix

```diff
-tool_args = input_data.get("toolArgs", {})
-arguments_str = json.dumps(tool_args) if tool_args else "{}"
+tool_args = input_data.get("toolArgs")
+if tool_args in (None, ""):
+    arguments_str = "{}"
+elif isinstance(tool_args, str):
+    arguments_str = tool_args
+else:
+    arguments_str = json.dumps(tool_args)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/mcp_tools/native_hook_adapters/copilot_adapter.py` around lines 97 -
100, The current serialization always calls json.dumps on tool_args (in the
CopilotAdapter code where tool_args = input_data.get("toolArgs", {}) and
arguments_str is set), which double-encodes when tool_args is already a JSON
string; change the logic to detect an already-serialized JSON string and
preserve it as-is (e.g., if isinstance(tool_args, str) use tool_args for
arguments_str, otherwise json.dumps(tool_args)), while still defaulting to "{}"
for empty/None values so downstream hooks receive the original object payload
rather than a quoted, double-encoded string.
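To see the failure mode the fix guards against, here is a standalone illustration (not adapter code) of what json.dumps does to an already-serialized payload:

```python
import json

# toolArgs already serialized upstream:
tool_args = '{"path": "/tmp/demo.txt"}'

encoded = json.dumps(tool_args)
print(encoded)  # "{\"path\": \"/tmp/demo.txt\"}"  <- a JSON string, not an object

# A downstream hook that parses the arguments now gets a str, not a dict:
print(type(json.loads(encoded)))  # <class 'str'>
```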
```python
def test_copilot_docker_with_network_mode_passes(self):
    """Copilot in Docker mode with network_mode should pass."""
    config = {
        "agent": {
            "id": "test-agent",
            "backend": {
                "type": "copilot",
                "model": "gpt-5-mini",
                "command_line_execution_mode": "docker",
                "command_line_docker_network_mode": "bridge",
            },
        },
    }
    validator = ConfigValidator()
    result = validator.validate_config(config)
    # Should not have the network_mode error
    assert not any("copilot" in error.message.lower() and "docker" in error.message.lower() and "network_mode" in error.message.lower() for error in result.errors)
```
“Passes” test is too weak and can hide unrelated validation failures.
test_copilot_docker_with_network_mode_passes should assert overall validity, not just absence of one substring.
✅ Tighten the success assertion

```diff
 validator = ConfigValidator()
 result = validator.validate_config(config)
-# Should not have the network_mode error
+assert result.is_valid()
+# Should not have the network_mode error
 assert not any("copilot" in error.message.lower() and "docker" in error.message.lower() and "network_mode" in error.message.lower() for error in result.errors)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_config_validator.py` around lines 1776 - 1793, The test
test_copilot_docker_with_network_mode_passes is too weak—update it to assert
overall validity by calling ConfigValidator().validate_config(config) and
asserting result.valid (or that result.errors is empty) instead of only checking
that no error message contains specific substrings; locate the test function
name test_copilot_docker_with_network_mode_passes and the validate_config call
on ConfigValidator to change the final assertion to a strict check like assert
result.valid (or assert not result.errors) so unrelated validation failures will
fail the test.
| ("claude_code", "auto", "direct"), | ||
| ("codex", "auto", "direct"), | ||
| ("copilot", "auto", "orchestrator"), | ||
| ("gemini_cli", "auto", "orchestrator"), |
CI blocker: gemini_cli auto-runner expectation is currently wrong.
The provided pipeline failure shows resolve_runner("gemini_cli", "auto") returns "direct", but this test expects "orchestrator" (Line 24). Please align either the resolver logic or this expected tuple before merge.
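If the resolver is the side to change, the fix is likely a one-line membership check; a sketch, assuming resolve_runner dispatches on backend type (the exact body in the script may differ):

```python
def resolve_runner(backend_type: str, runner: str) -> str:
    """Resolve "auto" to a concrete runner (sketch)."""
    if runner != "auto":
        return runner
    # Treat both CLI-wrapper backends identically.
    if backend_type in {"copilot", "gemini_cli"}:
        return "orchestrator"
    return "direct"
```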
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_native_tools_sandbox_script.py` around lines 21 - 24, The
test tuple ("gemini_cli", "auto", "orchestrator") is out of sync with the
resolver behavior: resolve_runner("gemini_cli", "auto") returns "direct" in CI,
so update the test expectation to ("gemini_cli", "auto", "direct") (or if you
intend different behavior, change resolve_runner's logic for the
"gemini_cli"/"auto" case to return "orchestrator"); reference resolve_runner and
the test tuple when making the change so the expectation and implementation
match.
```python
class _FakeHost:
    def __init__(self, active_tasks: list[dict[str, Any]] | None = None) -> None:
        self._active_tasks = [dict(t) for t in active_tasks] if active_tasks else None
        self.pinned_updates: list[dict[str, Any]] = []
        self.plan_updates: list[dict[str, Any]] = []

    def get_active_tasks(self) -> list[dict[str, Any]] | None:
        if self._active_tasks is None:
            return None
        return [dict(t) for t in self._active_tasks]

    def update_pinned_task_plan(
        self,
        *,
        tasks: list[dict[str, Any]],
        focused_task_id: str | None = None,
        operation: str = "create",
        show_notification: bool = True,
    ) -> None:
        self.pinned_updates.append(
            {
                "tasks": [dict(t) for t in tasks],
                "focused_task_id": focused_task_id,
                "operation": operation,
                "show_notification": show_notification,
            },
        )
        self._active_tasks = [dict(t) for t in tasks]

    def update_task_plan(
        self,
        tasks: list[dict[str, Any]],
        plan_id: str | None = None,
        operation: str = "create",
    ) -> None:
        self.plan_updates.append(
            {
                "tasks": [dict(t) for t in tasks],
                "plan_id": plan_id,
                "operation": operation,
            },
        )
        self._active_tasks = [dict(t) for t in tasks]


def _tool(tool_name: str, result_full: str, tool_id: str = "tool_1") -> SimpleNamespace:
    return SimpleNamespace(tool_name=tool_name, result_full=result_full, tool_id=tool_id)


def test_update_task_plan_from_codex_wrapper_create_task_plan() -> None:
    host = _FakeHost()
    result = (
        "{'content': [{'type': 'text', 'text': "
        '\'{"success":true,"operation":"create_task_plan","tasks":[{"id":"task_1","description":"Define scope","status":"pending","priority":"medium","dependencies":[],"metadata":{}}]}\''
        "}], 'structured_content': {'success': True, 'operation': 'create_task_plan', "
        "'tasks': [{'id': 'task_1', 'description': 'Define scope', 'status': 'pending', "
        "'priority': 'medium', 'dependencies': [], 'metadata': {}}]}}"
    )
    tool_data = _tool("planning_agent_a/create_task_plan", result, tool_id="item_5")

    handled = update_task_plan_from_tool(host, tool_data, timeline=None)

    assert handled is True
    assert host.pinned_updates, "Expected pinned task plan update"
    assert host.plan_updates, "Expected host task-plan state update"
    assert host.pinned_updates[-1]["operation"] == "create"
    assert host.pinned_updates[-1]["tasks"][0]["id"] == "task_1"
    assert host.plan_updates[-1]["plan_id"] == "item_5"


def test_update_task_plan_from_codex_wrapper_status_update_patches_cached_tasks() -> None:
    host = _FakeHost(
        active_tasks=[
            {
                "id": "task_1",
                "description": "Define scope",
                "status": "pending",
                "priority": "medium",
                "dependencies": [],
                "metadata": {},
            },
        ],
    )
    result = (
        "{'content': [{'type': 'text', 'text': "
        '\'{"success":true,"operation":"update_task_status","task":{"id":"task_1",'
        '"description":"Define scope","status":"completed","priority":"medium",'
        '"dependencies":[],"metadata":{"completion_notes":"done"}}}\''
        "}], 'structured_content': {'success': True, 'operation': 'update_task_status', "
        "'task': {'id': 'task_1', 'description': 'Define scope', 'status': 'completed', "
        "'priority': 'medium', 'dependencies': [], 'metadata': {'completion_notes': 'done'}}}}"
    )
    tool_data = _tool("planning_agent_a/update_task_status", result, tool_id="item_9")

    handled = update_task_plan_from_tool(host, tool_data, timeline=None)

    assert handled is True
    assert host.pinned_updates, "Expected pinned task plan update"
    assert host.pinned_updates[-1]["operation"] == "update"
    assert host.pinned_updates[-1]["focused_task_id"] == "task_1"
    assert host.pinned_updates[-1]["tasks"][0]["status"] == "completed"
    assert host.plan_updates[-1]["operation"] == "update"
```
🛠️ Refactor suggestion | 🟠 Major
Add Google-style docstrings to new methods/functions in this module.
_FakeHost methods, _tool, and both test functions are new but undocumented.
As per coding guidelines: "**/*.py: For new or changed functions, include Google-style docstrings".
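As one illustration, a Google-style docstring for the _tool helper might read as follows; the wording is a suggestion, not the required text:

```python
def _tool(tool_name: str, result_full: str, tool_id: str = "tool_1") -> SimpleNamespace:
    """Build a minimal stand-in for tool-call data.

    Args:
        tool_name: Namespaced tool name, e.g. "planning_agent_a/create_task_plan".
        result_full: Raw serialized tool result as emitted by the wrapper.
        tool_id: Identifier the handler uses as the plan_id.

    Returns:
        SimpleNamespace exposing tool_name, result_full, and tool_id.
    """
    return SimpleNamespace(tool_name=tool_name, result_full=result_full, tool_id=tool_id)
```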
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@massgen/tests/test_task_plan_support.py` around lines 12 - 113, The tests and
helper class lack Google-style docstrings: add concise Google-style docstrings
to class _FakeHost and its methods get_active_tasks, update_pinned_task_plan,
update_task_plan, the helper function _tool, and the two test functions
test_update_task_plan_from_codex_wrapper_create_task_plan and
test_update_task_plan_from_codex_wrapper_status_update_patches_cached_tasks;
each docstring should describe purpose, parameters (types), return value, and
any important behavior (e.g., that _FakeHost stores updates in
pinned_updates/plan_updates and that _tool returns a SimpleNamespace mimicking
tool data).
```python
if hasattr(backend, "set_general_hook_manager"):
    backend.set_general_hook_manager(manager)
```
The Copilot/Gemini "unit" path can false-pass even if native hooks are broken.
For those backends, Test 2-4 skip set_general_hook_manager() and then call GeneralHookManager.execute_hooks(...) directly, so a broken CopilotNativeHookAdapter or Gemini CLI IPC path would still report PASS here. run_e2e_tests() already has backend-specific branches; run_tests() needs the same kind of native-path coverage or should skip these checks for native-hook backends.
Based on learnings, backend-specific tooling changes should not assume the shared hook path covers native backends; parity tests need to hit the native execution path.
Also applies to: 369-370, 457-458, 540-555, 567-588
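One possible shape for the fix, sketched under the assumption that run_tests knows the backend type and that native-hook backends expose set_general_hook_manager; the routing helper and set name are illustrative:

```python
# Backends whose hooks run through native adapters / IPC rather than the
# shared GeneralHookManager path (illustrative set).
NATIVE_HOOK_BACKENDS = {"copilot", "gemini_cli"}


def select_hook_path(backend, backend_type: str, manager) -> str:
    """Route parity tests through the execution path the backend really uses."""
    if backend_type in NATIVE_HOOK_BACKENDS:
        # Exercise the native adapter instead of calling
        # GeneralHookManager.execute_hooks() directly.
        backend.set_general_hook_manager(manager)
        return "native"
    return "shared"
```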
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/test_hook_backends.py` around lines 303 - 305, The tests call
GeneralHookManager.execute_hooks directly for some backends causing false-passes
for native paths; update run_tests to mirror run_e2e_tests backend-specific
branching: detect native-hook backends (e.g., CopilotNativeHookAdapter, Gemini
CLI IPC) and either (A) call backend.set_general_hook_manager(manager) and
invoke the backend's native execute path so tests exercise the native
IPC/adapter code, or (B) explicitly skip the shared-path parity checks for those
backends and mark them to run native-path checks instead; modify the run_tests
branches at the locations around
set_general_hook_manager/GeneralHookManager.execute_hooks (also apply same
change at the other noted regions) so parity tests hit the native execution path
rather than only the shared GeneralHookManager.
```diff
 Run:
     uv run python scripts/test_native_tools_sandbox.py [--backend TYPE] [--runner TYPE] [--llm-judge]

 Options:
-    --backend    Backend to test: claude_code (default) or codex
+    --backend    Backend to test: claude_code, codex, copilot, or gemini_cli
     --runner     Runner to test: auto, direct, or orchestrator
                  auto uses orchestrator for copilot/gemini_cli and direct otherwise
     --llm-judge  Use LLM to analyze responses for subtle leakage (requires OPENAI_API_KEY)
```
Honor the documented auto runner behavior for Gemini CLI.
The header and CLI help say auto uses the orchestrator for both Copilot and Gemini CLI, but resolve_runner() only special-cases Copilot. Right now --backend gemini_cli --runner auto silently falls back to direct, so this script misses the intended path.
Suggested change

```diff
-    if backend_type == "copilot":
+    if backend_type in {"copilot", "gemini_cli"}:
         return "orchestrator"
```

Also applies to: 91-99, 524-535
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/test_native_tools_sandbox.py` around lines 13 - 20, The CLI's
auto-runner logic ignores the documented behavior for Gemini CLI because
resolve_runner() only special-cases "copilot"; update resolve_runner() (and the
other places with the same logic around the reported ranges) so that when runner
== "auto" and backend is either "copilot" or "gemini_cli" it returns/chooses the
"orchestrator" runner instead of falling back to "direct"; find the conditional
that checks backend == "copilot" and extend it to include backend ==
"gemini_cli" (or check membership in a set like {"copilot","gemini_cli"}) so
auto maps to orchestrator for both backends.
PR Title Format

Your PR title must follow the format: `<type>: <brief description>`

Valid types:

- `fix:` - Bug fixes
- `feat:` - New features
- `breaking:` - Breaking changes
- `docs:` - Documentation updates
- `refactor:` - Code refactoring
- `test:` - Test additions/modifications
- `chore:` - Maintenance tasks
- `perf:` - Performance improvements
- `style:` - Code style changes
- `ci:` - CI/CD configuration changes

Examples:

- `fix: resolve memory leak in data processing`
- `feat: add export to CSV functionality`
- `breaking: change API response format`
- `docs: update installation guide`

Description

Brief description of the changes in this PR

Type of change

- Bug fix (`fix:`) - Non-breaking change which fixes an issue
- New feature (`feat:`) - Non-breaking change which adds functionality
- Breaking change (`breaking:`) - Fix or feature that would cause existing functionality to not work as expected
- Documentation update (`docs:`) - Documentation updates
- Refactor (`refactor:`) - Code changes that neither fix a bug nor add a feature
- Tests (`test:`) - Adding missing tests or correcting existing tests
- Chore (`chore:`) - Maintenance tasks, dependency updates, etc.
- Performance (`perf:`) - Code changes that improve performance
- Style (`style:`) - Changes that do not affect the meaning of the code (formatting, missing semi-colons, etc.)
- CI (`ci:`) - Changes to CI/CD configuration files and scripts
Pre-commit status
How to Test
Add test method for this PR.
Test CLI Command
Write down the test bash command. If there is pre-requests, please emphasize.
Expected Results
Description/screenshots of expected results.
Additional context
Add any other context about the PR here.
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests