
feat: Gemini cli clean #999

Merged
ncrispino merged 31 commits into dev/v0.1.64 from gemini_cli_clean on Mar 16, 2026

Conversation

@ncrispino (Collaborator) commented Mar 15, 2026

PR Title Format

Your PR title must follow the format: <type>: <brief description>

Valid types:

  • fix: - Bug fixes
  • feat: - New features
  • breaking: - Breaking changes
  • docs: - Documentation updates
  • refactor: - Code refactoring
  • test: - Test additions/modifications
  • chore: - Maintenance tasks
  • perf: - Performance improvements
  • style: - Code style changes
  • ci: - CI/CD configuration changes

Examples:

  • fix: resolve memory leak in data processing
  • feat: add export to CSV functionality
  • breaking: change API response format
  • docs: update installation guide

Description

Brief description of the changes in this PR

Type of change

  • Bug fix (fix:) - Non-breaking change which fixes an issue
  • New feature (feat:) - Non-breaking change which adds functionality
  • Breaking change (breaking:) - Fix or feature that would cause existing functionality to not work as expected
  • Documentation (docs:) - Documentation updates
  • Code refactoring (refactor:) - Code changes that neither fix a bug nor add a feature
  • Tests (test:) - Adding missing tests or correcting existing tests
  • Chore (chore:) - Maintenance tasks, dependency updates, etc.
  • Performance improvement (perf:) - Code changes that improve performance
  • Code style (style:) - Changes that do not affect the meaning of the code (formatting, missing semi-colons, etc.)
  • CI/CD (ci:) - Changes to CI/CD configuration files and scripts

Checklist

  • I have run pre-commit on my changed files and all checks pass
  • My code follows the style guidelines of this project
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Pre-commit status

# Paste the output of running pre-commit on your changed files:
# uv run pre-commit install
# git diff --name-only HEAD~1 | xargs uv run pre-commit run --files # for last commit
# git diff --name-only origin/<base branch>...HEAD | xargs uv run pre-commit run --files # for all commits in PR
# git add <your file> # if any fixes were applied
# git commit -m "chore: apply pre-commit fixes"
# git push origin <branch-name>

How to Test

Describe the testing approach for this PR.

Test CLI Command

Write down the test bash command. If there are prerequisites, please call them out.

Expected Results

Description/screenshots of expected results.

Additional context

Add any other context about the PR here.

Summary by CodeRabbit

  • New Features

    • Added Gemini CLI backend and Copilot backend support (local & Docker modes), including native hook integrations and workspace IPC.
  • Improvements

    • Enhanced permission and hook plumbing for safer file/shell operations and Docker-aware tool exclusion.
    • Docker-mode validation expanded to Copilot and Gemini CLI.
  • Documentation

    • Updated backend reference and user guides with Gemini CLI and Copilot configuration, auth, and Docker examples.
  • Tests

    • Extensive unit/integration tests added for Gemini CLI, Copilot, hooks, permissions, and orchestrator behavior.

praneeth999 and others added 29 commits March 7, 2026 00:07
…upport for a new gemini_cli backend:

Add Backend
- yields an "error" event with the raw error message instead of a content message, so callers can handle fatal conditions separately.
@coderabbitai bot commented Mar 15, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 59ca0d50-820b-4df8-b57b-2b2983fcfacf

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Adds a new Gemini CLI backend (gemini_cli) with workspace/hook IPC, Docker and local subprocess modes, and MCP/workflow support; extends Copilot backend with Docker-aware permission hooks; introduces native hook adapters for Copilot and Gemini CLI; updates capabilities, CLI wiring, filesystem/docker mounts, validation, docs, and many tests.
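The subprocess-streaming pattern described here can be sketched as follows. This is an illustrative stand-in, not the actual GeminiCLIBackend API: the function name, event shapes, and the one-JSON-object-per-line protocol are assumptions.

```python
import asyncio
import json


async def stream_cli_events(cmd: list[str], prompt: str):
    """Spawn a CLI subprocess, feed the prompt on stdin, and yield
    parsed JSON events from its stdout stream (one object per line)."""
    proc = await asyncio.create_subprocess_exec(
        *cmd,
        stdin=asyncio.subprocess.PIPE,
        stdout=asyncio.subprocess.PIPE,
    )
    proc.stdin.write(prompt.encode())
    await proc.stdin.drain()
    proc.stdin.close()
    async for line in proc.stdout:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Surface malformed lines as error events instead of content,
            # mirroring the "error event" behavior noted in the commits.
            yield {"type": "error", "message": line.decode(errors="replace")}
    await proc.wait()
```

The key design point, consistent with the PR's commit notes, is that fatal or malformed output becomes a distinct "error" event so callers can handle it separately from content.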

Changes

Cohort / File(s) Summary
Backends: new/exports
massgen/backend/gemini_cli.py, massgen/backend/__init__.py, massgen/backend/capabilities.py, massgen/backend/copilot.py
Add GeminiCLIBackend; export it; register gemini_cli capabilities; enhance CopilotBackend with Docker mode, permission hooks, path/command extraction, and session signature writable_paths.
Native hook adapters & hook script
massgen/mcp_tools/native_hook_adapters/gemini_cli_adapter.py, massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py, massgen/mcp_tools/native_hook_adapters/copilot_adapter.py, massgen/mcp_tools/native_hook_adapters/__init__.py
Add GeminiCLINativeHookAdapter (file IPC payloads, settings.json hooks), hook subprocess script for BeforeTool/AfterTool with permission manifest enforcement, and CopilotNativeHookAdapter; expose adapters and SDK-availability helpers.
CLI / config / validation
massgen/cli.py, massgen/config_validator.py, massgen/configs/providers/gemini/*.yaml
Wire gemini_cli into create_backend, agent creation, provider_map, and CLI args; add Docker network_mode validation for copilot and gemini_cli; add example local/docker configs.
Filesystem / Docker manager
massgen/filesystem_manager/_constants.py, massgen/filesystem_manager/_filesystem_manager.py, massgen/filesystem_manager/_docker_manager.py, massgen/filesystem_manager/_path_permission_manager.py
Treat .gemini as metadata/critical/excluded directory; add mount_gemini_config to Docker manager to optionally mount ~/.gemini; PathPermissionManagerHook now inherits PatternHook.
Orchestrator & token mapping
massgen/orchestrator.py, massgen/token_manager/token_manager.py
Large orchestrator changes for answer normalization/versioning, partial/final result methods and timeouts; add gemini_cli provider alias mappings.
Docs & SKILL
docs/source/...yaml_schema.rst, docs/source/user_guide/backends.rst, massgen/skills/backend-integrator/SKILL.md
Document gemini_cli backend, Docker mode examples, MCP/hook notes, and update backend catalogs and SDK/Docker guidance.
Tests (many)
massgen/tests/... (numerous files, incl. copilot and extensive gemini_cli unit/integration/live tests)
Add/extend comprehensive unit, integration, and live tests for Gemini CLI backend, hook IPC/script, Copilot permission/hooks, orchestrator enforcement, Docker-mode behavior, and many helper flows.
Scripts / sandbox / misc
scripts/test_hook_backends.py, scripts/test_native_tools_sandbox.py, pyproject.toml, massgen/docker/*, massgen/token_manager/*
Extend test harness and sandbox runner to support copilot & gemini_cli; add pytest norecursedirs; install @google/gemini-cli in Dockerfiles; token manager provider aliases updated.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant GeminiBackend as GeminiCLIBackend
    participant GeminiCLI as Gemini CLI
    participant HookIPC as .gemini/hook_payload.json
    participant HookScript as gemini_cli_hook_script.py

    Client->>GeminiBackend: stream_with_tools(messages, tools)
    GeminiBackend->>GeminiBackend: write workspace settings.json (hooks)
    GeminiBackend->>GeminiCLI: spawn CLI subprocess (stdin)
    Note over GeminiCLI,HookScript: CLI invokes BeforeTool/AfterTool hooks
    GeminiCLI->>HookScript: execute hook (stdin: tool_event)
    HookScript->>HookIPC: read/write hook_payload.json (permission/injection)
    HookScript-->>GeminiCLI: emit hook response (approved/denied/inject)
    GeminiCLI-->>GeminiBackend: stdout JSON stream events
    GeminiBackend->>Client: yield parsed StreamChunk(s)
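The BeforeTool leg of this flow can be sketched as a standalone hook script. The payload fields and decision shape below are assumptions for illustration, not the actual hook_payload.json schema:

```python
import json
import sys


def decide(event: dict, manifest: dict) -> dict:
    """Return a BeforeTool decision for a tool event.

    Denies tool calls whose target path falls outside the manifest's
    writable paths; approves everything else.
    """
    args = event.get("tool_args", {})
    path = args.get("file_path") or args.get("path") or ""
    allowed = manifest.get("writable_paths", [])
    if path and not any(path.startswith(prefix) for prefix in allowed):
        return {"decision": "deny", "reason": f"path not writable: {path}"}
    return {"decision": "approve"}


if __name__ == "__main__":
    # The CLI pipes the tool event on stdin; the script replies on stdout.
    event = json.load(sys.stdin)
    manifest = {"writable_paths": ["/workspace"]}
    print(json.dumps(decide(event, manifest)))
```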
sequenceDiagram
    participant Client
    participant CopilotBackend
    participant CopilotSDK as Copilot SDK
    participant PPM as PathPermissionManager
    participant MCP as MCP (container)

    Client->>CopilotBackend: stream_with_tools(...)
    CopilotBackend->>CopilotBackend: build session_hooks (on_pre_tool_use)
    CopilotBackend->>CopilotSDK: create_session(hooks)
    CopilotSDK->>CopilotBackend: on_pre_tool_use(request)
    CopilotBackend->>PPM: pre_tool_use_hook(path/command)
    PPM-->>CopilotBackend: allow/deny
    alt denied
        CopilotBackend-->>CopilotSDK: deny decision
    else allowed
        CopilotBackend-->>CopilotSDK: allow (or route tool to MCP if docker-mode)
    end
    CopilotSDK->>Client: stream results
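The allow/deny branch in this diagram boils down to a permission callback. The request shape and decision dict here are assumptions for illustration, not the actual Copilot SDK contract:

```python
def on_pre_tool_use(request: dict, is_path_writable) -> dict:
    """Permission hook in the spirit of the flow above.

    Denies write-class tools targeting paths the permission manager
    rejects; allows everything else.
    """
    tool = request.get("tool_name", "")
    path = (request.get("arguments") or {}).get("path")
    if tool in {"write_file", "edit_file"} and path and not is_path_writable(path):
        return {"decision": "deny", "reason": f"{path} is outside writable paths"}
    return {"decision": "allow"}
```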

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • a5507203
@coderabbitai bot left a comment

Actionable comments posted: 13

Note

Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.

🟡 Minor comments (13)
massgen/tests/test_concurrent_logging_sessions.py-410-420 (1)

410-420: ⚠️ Potential issue | 🟡 Minor

Add a Google-style docstring to _slow_save.

_slow_save is a changed function and should include a Google-style docstring for maintainability and guideline compliance.

Suggested patch
         def _slow_save(self, data):
+            """Persist registry data in two chunks with a delay.
+
+            Args:
+                self: SessionRegistry instance.
+                data: Registry payload to serialize and persist.
+            """
             payload = json.dumps(data, indent=2)
             midpoint = max(1, len(payload) // 2)
             tmp_file = self.registry_path.with_suffix(".slow_tmp")
             with open(tmp_file, "w") as f:
                 f.write(payload[:midpoint])
                 f.flush()
                 time.sleep(0.002)
                 f.write(payload[midpoint:])
                 f.flush()
             tmp_file.replace(self.registry_path)

As per coding guidelines: **/*.py: For new or changed functions, include Google-style docstrings.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_concurrent_logging_sessions.py` around lines 410 - 420,
Add a Google-style docstring to the _slow_save function describing its purpose,
parameters and side effects: explain that it serializes the passed data to JSON
with indentation, writes the first half to a temporary file, sleeps to simulate
slowness, writes the rest, and atomically replaces the registry file; document
the parameter `data` (type: any/serializable), note that it returns None and may
raise IO/OSError on file operations, and include a short example or note about
the temporary file name if helpful.
massgen/tests/test_task_plan_support.py-13-15 (1)

13-15: ⚠️ Potential issue | 🟡 Minor

Preserve empty-list semantics in _FakeHost.__init__.

The truthiness check `if active_tasks` turns `[]` into `None`, so tests can’t represent a valid “empty task plan” state.

Suggested fix
-        self._active_tasks = [dict(t) for t in active_tasks] if active_tasks else None
+        self._active_tasks = (
+            [dict(t) for t in active_tasks] if active_tasks is not None else None
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_task_plan_support.py` around lines 13 - 15, The
constructor for _FakeHost incorrectly treats an empty list as None by using "if
active_tasks"; change the check to "if active_tasks is not None" so that an
explicit empty list is preserved, and keep the defensive copy logic
(self._active_tasks = [dict(t) for t in active_tasks] if active_tasks is not
None else None) so callers can represent a valid empty-task-plan state; update
the __init__ in _FakeHost accordingly.
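The pitfall in miniature, with the buggy truthiness check and the identity check side by side:

```python
def normalize_tasks(active_tasks):
    """Contrast truthiness vs identity checks for an optional list."""
    # Buggy: [] is falsy, so an explicit empty plan collapses to None
    buggy = [dict(t) for t in active_tasks] if active_tasks else None
    # Fixed: only a literal None means "no plan"; [] stays []
    fixed = [dict(t) for t in active_tasks] if active_tasks is not None else None
    return buggy, fixed
```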
massgen/tests/test_task_plan_support.py-108-113 (1)

108-113: ⚠️ Potential issue | 🟡 Minor

Assert plan_updates is populated before indexing.

Add an explicit check so failures are diagnostic instead of IndexError.

Suggested fix
     assert handled is True
     assert host.pinned_updates, "Expected pinned task plan update"
+    assert host.plan_updates, "Expected host task-plan state update"
     assert host.pinned_updates[-1]["operation"] == "update"
     assert host.pinned_updates[-1]["focused_task_id"] == "task_1"
     assert host.pinned_updates[-1]["tasks"][0]["status"] == "completed"
     assert host.plan_updates[-1]["operation"] == "update"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_task_plan_support.py` around lines 108 - 113, Add an
explicit assertion that host.plan_updates is populated before the test indexes
its last element: after asserting host.pinned_updates and before accessing
host.plan_updates[-1], assert that host.plan_updates (or len(host.plan_updates)
> 0) is truthy so failures report a clear assertion instead of raising
IndexError when the list is empty; update the test in test_task_plan_support.py
where variables handled and host are used to include this check.
massgen/skills/backend-integrator/SKILL.md-669-669 (1)

669-669: ⚠️ Potential issue | 🟡 Minor

Fix markdown formatting issues.

Static analysis flagged several formatting violations:

  • Tables at lines 669, 711, 729, 735 need blank lines before and after
  • Fenced code block at line 951 needs a language specifier
📝 Proposed fixes

Add blank lines around tables and specify language for code blocks:

 **Docker mode**: Skip backend sandbox config — the container IS the sandbox. Grant full access inside the container.
+
 ### SDK Wrapper Backend (like Claude Code, Copilot)
-```
+
+```python
 __init__: import SDK, configure options, detect Docker mode

Apply similar blank line additions around the tables at lines 669, 711, 729, and 735.

Also applies to: 711-711, 729-729, 735-735, 951-951

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/skills/backend-integrator/SKILL.md` at line 669, The markdown has
tables that lack surrounding blank lines and a fenced code block without a
language; add a blank line before and after each table occurrence of the header
row "| Section | Gate |" (and the other two table blocks referenced) so they are
separated from adjacent paragraphs, and change the fenced code block that
contains "__init__: import SDK, configure options, detect Docker mode" to use a
language specifier (e.g., ```python) on the opening fence; update the three
other table occurrences noted (around the same table content) similarly to
ensure proper Markdown rendering.
docs/source/user_guide/backends.rst-156-165 (1)

156-165: ⚠️ Potential issue | 🟡 Minor

gemini_cli is misclassified as built-in code execution here.

The rest of this page treats Gemini CLI as native shell/filesystem access, not provider-side enable_code_execution. Keeping ⭐ in the Code Execution column implies that enable_code_execution is valid for this backend when the docs elsewhere only describe shell/bash support.

As per coding guidelines, "Behavior descriptions must match what the code does."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/user_guide/backends.rst` around lines 156 - 165, The table entry
for gemini_cli incorrectly marks Code Execution as supported (⭐); update the
backends.rst table row for gemini_cli to remove or replace the Code Execution
star so it reflects native shell/filesystem behavior described elsewhere (i.e.,
do not advertise provider-side enable_code_execution for gemini_cli); ensure the
row now matches the documentation narrative and any other mentions of gemini_cli
and enable_code_execution across the docs.
docs/source/reference/yaml_schema.rst-793-797 (1)

793-797: ⚠️ Potential issue | 🟡 Minor

This row shifted the backend note into the Required column.

Line 795 now reads like command_line_docker_network_mode is “required: Codex/Gemini CLI”, while Line 796 still says “All with MCP support”. Please keep the Required cell as No and move the backend-specific constraint into the Supported Backends/Description column.

As per coding guidelines, "Ensure proper RST syntax."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/reference/yaml_schema.rst` around lines 793 - 797, The table row
for `command_line_docker_network_mode` incorrectly put the backend-specific note
into the "Required" column; change the "Required" cell back to "No" and move the
backend constraint ("Codex, Gemini CLI (Docker mode)") and the note about Docker
network mode into the Supported Backends/Description column so the row reads:
name `command_line_docker_network_mode`, type `string`, supported backends
`Codex, Gemini CLI (Docker mode)` plus the description "Docker network mode:
'bridge', 'host', 'none' (use 'bridge'). All with MCP support", and ensure the
RST table cell separators/inline code markup
(``command_line_docker_network_mode`` and quoted modes) follow proper RST
syntax.
docs/source/user_guide/backends.rst-64-66 (1)

64-66: ⚠️ Potential issue | 🟡 Minor

Remove the duplicated codex backend row.

codex is already listed earlier in this table, so this second entry makes the backend catalog internally inconsistent and likely displaced the backend that was meant to be added here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/user_guide/backends.rst` around lines 64 - 66, Remove the
duplicated backend row that contains "* - ``codex``" in the table: delete that
second "codex" entry so the table remains consistent, ensure the subsequent rows
(the OpenAI (CLI) / GPT-5.4, GPT-5.3-Codex, GPT-5.2-Codex, GPT-5.1-Codex,
GPT-4.1 lines) are correctly aligned under the intended columns, and verify no
other backend was inadvertently displaced by this duplicate removal.
massgen/tests/integration/test_copilot_hook_orchestrator.py-15-24 (1)

15-24: ⚠️ Potential issue | 🟡 Minor

Mark this module as an integration test.

Because this file lives under massgen/tests/integration/ but none of the tests are tagged integration, it won't participate correctly in the repo's opt-in integration test flow.

Suggested change
 import pytest
 
+pytestmark = pytest.mark.integration
+
 from massgen.mcp_tools.hooks import (
As per coding guidelines, `{massgen/tests/**/*.py,webui/src/**/*.test.ts}`: Use `@pytest.mark.integration` (opt-in: `--run-integration`), `@pytest.mark.live_api` (opt-in: `--run-live-api`), `@pytest.mark.expensive`, `@pytest.mark.docker`, and `@pytest.mark.asyncio` markers when writing tests.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/integration/test_copilot_hook_orchestrator.py` around lines 15
- 24, This integration test module isn't tagged for the opt-in integration test
run; add a module-level marker by defining pytestmark = pytest.mark.integration
(since pytest is already imported) at the top of the file so pytest treats all
tests in this module as integration tests; modify or add the pytestmark variable
near the existing imports and leave individual test functions (e.g., any tests
using GeneralHookManager, CopilotNativeHookAdapter) unchanged.
scripts/test_hook_backends.py-152-155 (1)

152-155: ⚠️ Potential issue | 🟡 Minor

Preserve exception context using chaining.

The ImportError should be chained to the new exception for better debuggability. This aligns with the codebase convention, which uses exception chaining throughout (e.g., in massgen/__init__.py, massgen/cli.py, massgen/orchestrator.py, and others).

Suggested change
-        except ImportError:
-            raise ValueError("massgen.backend.copilot not available")
+        except ImportError as err:
+            raise ValueError("massgen.backend.copilot not available") from err
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/test_hook_backends.py` around lines 152 - 155, The import failure for
COPILOT_SDK_AVAILABLE and CopilotBackend is re-raised as a ValueError without
preserving the original ImportError; modify the except block to capture the
ImportError (e.g., except ImportError as e) and re-raise the ValueError using
exception chaining (raise ValueError("massgen.backend.copilot not available")
from e) so the original ImportError context is preserved for debugging.
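A minimal illustration of why `from err` matters: the original ImportError survives on `__cause__`. The module name below is a deliberate stand-in for the optional SDK import:

```python
def load_optional_backend():
    """Re-raise an ImportError as ValueError while keeping the cause."""
    try:
        import _no_such_copilot_sdk  # stand-in for the optional SDK
    except ImportError as err:
        # "from err" stores the ImportError on __cause__, so tracebacks
        # show "The above exception was the direct cause of ..."
        raise ValueError("massgen.backend.copilot not available") from err


try:
    load_optional_backend()
except ValueError as exc:
    assert isinstance(exc.__cause__, ImportError)
```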
massgen/orchestrator.py-9701-9703 (1)

9701-9703: ⚠️ Potential issue | 🟡 Minor

Use Google-style docstring for the new helper method.

_coerce_answer_content_to_text is a new changed function but currently has a one-line docstring. Please convert to Google-style with Args: and Returns:.

As per coding guidelines, **/*.py: “For new or changed functions, include Google-style docstrings”.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/orchestrator.py` around lines 9701 - 9703, The docstring for the
helper method _coerce_answer_content_to_text should be expanded to a
Google-style docstring: add a short summary line, an Args: section describing
the content parameter (type Any and what inputs are expected/handled), and a
Returns: section specifying it returns a str (normalized plain-text answer).
Keep description concise and place it immediately under the def for
_coerce_answer_content_to_text.
massgen/filesystem_manager/_docker_manager.py-115-115 (1)

115-115: ⚠️ Potential issue | 🟡 Minor

Please document gemini_config in the constructor credential-mount list.

The feature is implemented, but the credentials.mount option list in the constructor docstring still omits it.

📝 Suggested docstring update
             credentials: Credential management configuration:
                 mount: List of credential types to mount
-                    ["ssh_keys", "git_config", "gh_config", "npm_config", "pypi_config", "claude_config", "codex_config"]
+                    ["ssh_keys", "git_config", "gh_config", "npm_config", "pypi_config", "claude_config", "codex_config", "gemini_config"]

Also applies to: 384-393

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/filesystem_manager/_docker_manager.py` at line 115, Update the
constructor docstring for the class that sets self.mount_gemini_config to
document the new credentials.mount option "gemini_config": locate the
constructor docstring where the credentials.mount list/options are documented
and add "gemini_config" with a brief description (e.g., mounts Gemini
configuration files into the container), and mirror this update in the other
constructor/docstring block referenced around the alternative constructor or
overload (the region corresponding to lines ~384-393) so both places list
"gemini_config" alongside the existing mount options; ensure the term matches
the code symbol mount_gemini_config and that the description is concise and
consistent.
massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py-231-252 (1)

231-252: ⚠️ Potential issue | 🟡 Minor

Remove duplicate "directory" key in path extraction tuple.

The key "directory" appears twice in the tuple (lines 236 and 246), making the second occurrence dead code.

🐛 Proposed fix
     for key in (
         "file_path",
         "absolute_path",
         "path",
         "dir_path",
         "directory",
         "filename",
         "file",
         "notebook_path",
         "target",
         "source",
         "source_path",
         "destination",
         "destination_path",
         "destination_base_path",
-        "directory",
         "dir",
     ):
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py` around
lines 231 - 252, The tuple of candidate keys used to extract a path from
tool_args contains a duplicate "directory" entry; update the key sequence in the
loop that checks tool_args.get(key) (the tuple containing "file_path",
"absolute_path", ..., "directory", "dir") to remove the duplicated "directory"
so each lookup key is unique, leaving only one "directory" in the list used by
the for key in (...) loop and keeping the subsequent logic (isinstance(value,
str) and value) unchanged.
massgen/tests/test_gemini_cli_backend.py-618-658 (1)

618-658: ⚠️ Potential issue | 🟡 Minor

Add @pytest.mark.skipif for Windows-specific test causing pipeline failure.

The test test_windows_env_keeps_path_when_node_already_resolved patches os.name to "nt" but doesn't skip on non-Windows systems. Unlike test_windows_cmd_wrapper_rewrites_to_direct_node_launch (line 521), this test lacks the skipif decorator, causing NotImplementedError: cannot instantiate 'WindowsPath' on Linux/macOS CI.

🐛 Proposed fix
+    @pytest.mark.skipif(
+        sys.platform != "win32",
+        reason="WindowsPath cannot be instantiated on non-Windows",
+    )
     def test_windows_env_keeps_path_when_node_already_resolved(self, backend):
         """Do not prepend node path when node is already resolvable."""
         backend.api_key = None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_gemini_cli_backend.py` around lines 618 - 658, Add a
pytest.skipif decorator to the
test_windows_env_keeps_path_when_node_already_resolved test so it only runs on
Windows (sys.platform == "win32"), mirroring the existing skip on
test_windows_env_adds_node_path_when_missing; update the test function
declaration for test_windows_env_keeps_path_when_node_already_resolved and
ensure it uses the same condition and reason text, leaving the patches that set
os.name to "nt" and the call to backend._build_subprocess_env unchanged.
🧹 Nitpick comments (18)
massgen/tests/test_config_validator.py (1)

1759-1793: Add the same Docker-network test pair for gemini_cli for parity.

config_validator.py added identical Docker-mode requirements for gemini_cli, but this test block only covers copilot.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_config_validator.py` around lines 1759 - 1793, Add a
parallel test pair for the gemini_cli backend mirroring
test_copilot_docker_requires_network_mode and
test_copilot_docker_with_network_mode_passes: create two tests (e.g.,
test_gemini_cli_docker_requires_network_mode and
test_gemini_cli_docker_with_network_mode_passes) that build configs with
agent.backend.type set to "gemini_cli", command_line_execution_mode "docker",
and verify ConfigValidator().validate_config(config) returns an invalid result
when command_line_docker_network_mode is missing and no such network_mode error
when command_line_docker_network_mode is provided; use the same error-message
predicate as the copilot tests (checking "gemini_cli" or "gemini" as appropriate
along with "docker" and "network_mode") against result.errors.
massgen/config_validator.py (1)

606-639: Consider consolidating duplicate Docker-network validation blocks.

codex, copilot, and gemini_cli now share the same requirement. A single set-based check would reduce drift risk.

♻️ Small DRY refactor
-        # Validate Codex Docker mode requirements
-        if backend_type == "codex":
-            execution_mode = backend_config.get("command_line_execution_mode")
-            if execution_mode == "docker":
-                # command_line_docker_network_mode is required for Codex in Docker mode
-                if "command_line_docker_network_mode" not in backend_config:
-                    result.add_error(
-                        "Codex backend in Docker mode requires 'command_line_docker_network_mode'",
-                        f"{location}.command_line_docker_network_mode",
-                        "Add 'command_line_docker_network_mode: bridge' (required for Codex Docker execution)",
-                    )
-
-        # Validate Copilot Docker mode requirements
-        if backend_type == "copilot":
-            execution_mode = backend_config.get("command_line_execution_mode")
-            if execution_mode == "docker":
-                if "command_line_docker_network_mode" not in backend_config:
-                    result.add_error(
-                        "Copilot backend in Docker mode requires 'command_line_docker_network_mode'",
-                        f"{location}.command_line_docker_network_mode",
-                        "Add 'command_line_docker_network_mode: bridge' (required for Copilot Docker execution)",
-                    )
-
-        # Validate Gemini CLI Docker mode requirements
-        if backend_type == "gemini_cli":
-            execution_mode = backend_config.get("command_line_execution_mode")
-            if execution_mode == "docker":
-                if "command_line_docker_network_mode" not in backend_config:
-                    result.add_error(
-                        "Gemini CLI backend in Docker mode requires 'command_line_docker_network_mode'",
-                        f"{location}.command_line_docker_network_mode",
-                        "Add 'command_line_docker_network_mode: bridge' (required for Gemini CLI Docker execution)",
-                    )
+        docker_network_required_backends = {"codex", "copilot", "gemini_cli"}
+        if backend_type in docker_network_required_backends:
+            if backend_config.get("command_line_execution_mode") == "docker":
+                if "command_line_docker_network_mode" not in backend_config:
+                    label = backend_type.replace("_", " ").title().replace("Cli", "CLI")
+                    result.add_error(
+                        f"{label} backend in Docker mode requires 'command_line_docker_network_mode'",
+                        f"{location}.command_line_docker_network_mode",
+                        f"Add 'command_line_docker_network_mode: bridge' (required for {label} Docker execution)",
+                    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/config_validator.py` around lines 606 - 639, The three nearly
identical Docker-network validation blocks for backend_type ==
"codex"/"copilot"/"gemini_cli" should be consolidated: replace them with one
check that tests if backend_type is in the set {"codex", "copilot",
"gemini_cli"}, reads execution_mode =
backend_config.get("command_line_execution_mode"), and if execution_mode ==
"docker" ensures "command_line_docker_network_mode" exists on backend_config and
calls result.add_error with the same location and hint; use backend_type in the
error text or a single generic message to preserve context and keep references
to backend_type, backend_config, execution_mode, result.add_error, and location
to find and update the code.
massgen/tests/integration/test_copilot_hook_orchestrator.py (1)

85-112: Avoid bypassing CopilotBackend.__init__ in the integration helper.

Constructing the backend with __new__ and hand-populating private fields can let init-time wiring regressions slip through. A smoke test that patches the SDK boundary and calls the real constructor would cover adapter setup more realistically.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/integration/test_copilot_hook_orchestrator.py` around lines 85
- 112, Replace the __new__ bypass in _make_copilot_backend with a real
construction path that calls CopilotBackend.__init__ under the patched copilot
SDK: inside the patch.dict block import CopilotBackend and invoke its
constructor (e.g., CopilotBackend(...) with a minimal/mock config or required
args) instead of using CopilotBackend.__new__ and manually setting private
attributes; ensure you still patch or inject a mock client/filesystem_manager as
needed so the real __init__ runs and the call to
backend._init_native_hook_adapter (and other init wiring) happens naturally.
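The pitfall here is generic, not Copilot-specific. A minimal, framework-free sketch (class and attribute names are illustrative, not MassGen's) shows why `__new__` bypass lets init-time wiring regressions go unnoticed:

```python
class Backend:
    def __init__(self):
        # Init-time wiring that tests should exercise.
        self._client = object()
        self._hooks_ready = True

# Bypassing __init__ skips the wiring entirely; a regression in
# __init__ would never surface in a test built this way.
bypassed = Backend.__new__(Backend)
assert not hasattr(bypassed, "_client")

# Calling the real constructor exercises the wiring.
real = Backend()
assert real._hooks_ready
```

In the real test, the SDK boundary would be patched first so the genuine `__init__` can run without network or credentials.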
massgen/tests/test_docker_skills_write_access.py (1)

55-55: Add a dedicated positive test for mount_gemini_config.

Good fixture parity change. Consider adding a test_build_credential_mounts_supports_gemini_config case (mirroring Claude/Codex tests) to lock in the new behavior.

➕ Suggested test addition
+def test_build_credential_mounts_supports_gemini_config(docker_manager, tmp_path, monkeypatch):
+    """gemini_config mount should map ~/.gemini into /home/massgen/.gemini."""
+    fake_home = tmp_path / "home"
+    gemini_dir = fake_home / ".gemini"
+    gemini_dir.mkdir(parents=True)
+    docker_manager.mount_gemini_config = True
+
+    monkeypatch.setattr("massgen.filesystem_manager._docker_manager.Path.home", lambda: fake_home)
+
+    mounts = docker_manager._build_credential_mounts()
+    host_path = str(gemini_dir.resolve())
+
+    assert host_path in mounts
+    assert mounts[host_path]["bind"] == "/home/massgen/.gemini"
+    assert mounts[host_path]["mode"] == "ro"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_docker_skills_write_access.py` at line 55, Add a positive
unit test named test_build_credential_mounts_supports_gemini_config that mirrors
the Claude/Codex tests: instantiate or use the existing manager fixture, set
manager.mount_gemini_config = True, call the method that builds credential
mounts (e.g., manager.build_credential_mounts or the equivalent function used
elsewhere in the file), and assert that the returned mounts include the Gemini
config mount (validate by presence of the expected mount entry or path). Ensure
the test uses the same setup/teardown patterns as the other tests in this file
and references manager.mount_gemini_config and the build_credential_mounts call
so the behavior is locked in.
massgen/tests/integration/test_orchestrator_stream_enforcement.py (1)

44-45: Add @pytest.mark.integration marker.

This integration test is missing the @pytest.mark.integration marker. Per coding guidelines, integration tests should use this marker for opt-in execution with --run-integration.

💡 Suggested fix
+@pytest.mark.integration
 @pytest.mark.asyncio
 async def test_stream_agent_execution_coerces_structured_new_answer_content(mock_orchestrator):

As per coding guidelines: "Use @pytest.mark.integration (opt-in: --run-integration)... markers when writing tests".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/integration/test_orchestrator_stream_enforcement.py` around
lines 44 - 45, The test function
test_stream_agent_execution_coerces_structured_new_answer_content is missing the
integration marker; add the `@pytest.mark.integration` decorator above its
existing `@pytest.mark.asyncio` so the test is opt-in for --run-integration
(ensure the decorator is imported/available in the test module and placed
directly above the test definition).
massgen/tests/test_cli_cwd_context_flag.py (1)

82-89: Consider using a non-/tmp path to avoid static analysis warnings.

Ruff S108 flags /tmp/other as potentially insecure temp usage. While this is a false positive in the test context (it's just a string literal), using a different sentinel path would silence the linter.

💡 Suggested fix
 def test_apply_cli_cwd_context_path_noop_when_not_set(tmp_path, monkeypatch):
     """Should be a no-op when cwd context mode is not provided."""
     monkeypatch.chdir(tmp_path)
-    config = {"orchestrator": {"context_paths": [{"path": "/tmp/other", "permission": "read"}]}}
+    config = {"orchestrator": {"context_paths": [{"path": "/some/other/path", "permission": "read"}]}}

     apply_cli_cwd_context_path(config, None)

-    assert config["orchestrator"]["context_paths"] == [{"path": "/tmp/other", "permission": "read"}]
+    assert config["orchestrator"]["context_paths"] == [{"path": "/some/other/path", "permission": "read"}]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_cli_cwd_context_flag.py` around lines 82 - 89, The test
test_apply_cli_cwd_context_path_noop_when_not_set uses the literal "/tmp/other"
which triggers Ruff S108; update the sentinel path to a non-/tmp value (e.g.
"/var/other" or "/nonexistent/other" or "SENTINEL_PATH") so the test still
asserts identity but avoids the linter warning; modify the literal in the test
where apply_cli_cwd_context_path is called and in the final assert to the new
sentinel string.
massgen/tests/integration/test_gemini_cli_hook_orchestrator.py (1)

1-27: Consider adding @pytest.mark.integration marker to this module.

Per coding guidelines, integration tests should use the @pytest.mark.integration marker for opt-in execution. Since this file tests multi-component orchestration flows, consider adding the marker to the relevant test classes or as a module-level pytestmark.

💡 Suggested addition at module level
 """Integration tests for Gemini CLI hook adapter with orchestrator hook manager.

 These are deterministic, non-API tests that verify the hook adapter correctly
 builds settings.json config and integrates with MassGen's GeneralHookManager.
 """

 from __future__ import annotations

 import json
 from pathlib import Path
 from unittest.mock import patch

+import pytest
+
 from massgen.mcp_tools.hooks import (
     GeneralHookManager,
     HookResult,
     HookType,
     PatternHook,
 )
 from massgen.mcp_tools.native_hook_adapters import GeminiCLINativeHookAdapter
+
+pytestmark = pytest.mark.integration

As per coding guidelines: "Use @pytest.mark.integration (opt-in: --run-integration)... markers when writing tests" and learning "Place integration tests in massgen/tests/integration/ for multi-component orchestration flows".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/integration/test_gemini_cli_hook_orchestrator.py` around lines
1 - 27, Add the pytest integration marker at the module level so all tests in
this file are opt-in; define a module-level pytestmark = pytest.mark.integration
(or apply `@pytest.mark.integration` to the test classes) and import pytest if
needed so the marker is recognized; update the top of the file near the imports
and ensure the marker applies to _AllowHook and any test classes that follow.
massgen/configs/providers/gemini/gemini_cli_local.yaml (1)

1-9: Consider adding a "What happens" comment and using a cost-effective model.

Per coding guidelines, YAML configs should include "What happens" comments explaining the execution flow. Also, the guidelines suggest preferring cost-effective models like gemini-2.5-flash over preview models.

💡 Suggested improvement
 # uv run massgen --config "massgen/configs/providers/gemini/gemini_cli_local.yaml" "whats the deep learning about?"
+# What happens: Single Gemini CLI agent executes locally, streaming responses via the gemini CLI subprocess.
 agents:
 - id: agent_a
   backend:
     type: gemini_cli
-    model: gemini-3-flash-preview
+    model: gemini-2.5-flash
 ui:
   display_type: textual_terminal
   logging_enabled: true

As per coding guidelines: "Include 'What happens' comments explaining execution flow" and "Prefer cost-effective models (gpt-5-nano, gpt-5-mini, gemini-2.5-flash)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/configs/providers/gemini/gemini_cli_local.yaml` around lines 1 - 9,
Add a top-level "What happens" comment describing the execution flow for this
YAML (how the agents list is used, backend.type=gemini_cli invoking the local
CLI adapter, and ui.display_type controlling output) and replace the preview
model name in the agents[0].backend.model field (currently
"gemini-3-flash-preview") with a cost-effective alternative such as
"gemini-2.5-flash" to follow the cost-preference guideline; ensure the comment
appears before the agents key and clearly states the single-agent flow and UI
logging behavior.
massgen/tests/test_gemini_cli_live.py (1)

180-219: Consider removing @pytest.mark.live_api from this test.

This test validates settings.json writing without actually calling the Gemini API—it only exercises local config persistence. The live_api marker may cause this test to be skipped unnecessarily in CI environments without API credentials.

Proposed change
 @pytest.mark.integration
-@pytest.mark.live_api
 @pytest.mark.asyncio
 async def test_settings_json_written_with_hooks_config():
     """Verify settings.json is written correctly when hooks config is set."""
-    _skip_if_no_credentials()
     _skip_if_no_gemini_cli()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_gemini_cli_live.py` around lines 180 - 219, The test
test_settings_json_written_with_hooks_config is marked with
`@pytest.mark.live_api` but doesn't call the Gemini API; remove the
`@pytest.mark.live_api` decorator to avoid unnecessary skipping in CI, keeping
`@pytest.mark.integration` and `@pytest.mark.asyncio`; locate the test function
(test_settings_json_written_with_hooks_config) in
massgen/tests/test_gemini_cli_live.py and delete the `@pytest.mark.live_api` line
so the test still uses GeminiCLIBackend, backend._massgen_hooks_config setup,
backend._write_workspace_config, and settings.json assertions unchanged.
massgen/tests/test_copilot_live.py (2)

183-211: Log the caught exception for debugging instead of silently passing.

The try-except-pass pattern at lines 203-205 is intentional to verify non-hanging behavior, but logging the exception would help diagnose issues when this test fails unexpectedly.

Proposed enhancement
     try:
         async for chunk in backend.stream_with_tools(messages, []):
             chunks.append(chunk)
             if chunk.type in ("done", "error"):
                 break
     except Exception:
-        # Exception is acceptable — we just don't want a hang
-        pass
+        # Exception is acceptable — we just don't want a hang
+        import logging
+        logging.debug("Expected exception during unauthenticated access test")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_copilot_live.py` around lines 183 - 211, In
test_error_handling_unauthenticated replace the silent except-pass with logging
of the caught exception so failures are diagnosable: inside the except block
that wraps the async for over backend.stream_with_tools(messages, []) (in
test_error_handling_unauthenticated) call logging.exception or pytest.fail with
the exception info instead of just pass, ensuring the exception from
CopilotBackend.stream_with_tools is recorded for debugging while preserving the
test's intent of not hanging.

30-41: Consider logging the exception before skipping.

The blind Exception catch at line 35 is acceptable for test skip logic, but logging the error details would aid debugging when authentication checks fail unexpectedly.

Proposed enhancement
 async def _check_copilot_auth(backend) -> None:
     """Skip test if Copilot is not authenticated."""
     try:
         await backend._ensure_started()
         is_auth, msg = await backend._query_auth_status(context="live_test")
     except Exception as e:
+        import logging
+        logging.debug(f"Copilot auth check failed: {e}")
         pytest.skip(f"Copilot SDK not available or auth check failed: {e}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_copilot_live.py` around lines 30 - 41, The test helper
_check_copilot_auth swallows exceptions when calling backend._ensure_started and
backend._query_auth_status and directly calls pytest.skip; add a log of the
caught exception before skipping so failures are visible. Update the except
block in _check_copilot_auth to record the exception (e.g., logging.exception or
logger.exception) including context like "Copilot SDK not available or auth
check failed", ensure the logging module is imported or an existing test logger
is used, then call pytest.skip with the same message; reference
backend._ensure_started, backend._query_auth_status, and pytest.skip to find the
try/except to modify.
massgen/tests/test_vote_only_mode.py (1)

220-241: Mutable class attribute should use a factory or immutable default.

The config = {} at line 224 is a mutable class attribute. If any test inadvertently modifies it, the change would persist across instances, potentially causing flaky tests.

Proposed fix
 class _FakeBackendForToolFiltering:
     """Minimal backend stub for tool-name extraction."""

     filesystem_manager = None
-    config = {}
+    config: dict = None  # type: ignore[assignment]
+
+    def __init__(self):
+        self.config = {}

     @staticmethod
     def extract_tool_name(tool_call):

Alternatively, if instance creation is not desired in the test context:

 class _FakeBackendForToolFiltering:
     """Minimal backend stub for tool-name extraction."""

     filesystem_manager = None
+    
+    @property
+    def config(self):
+        return {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_vote_only_mode.py` around lines 220 - 241, The class
_FakeBackendForToolFiltering defines a mutable class attribute config = {} which
can leak mutations across tests; change it to an immutable default or initialize
per-instance instead—either set config = None (or an immutable empty mapping) at
class level and create a fresh dict in an __init__ (or use a
dataclass/constructor with a default_factory) so each
_FakeBackendForToolFiltering instance gets its own config and tests cannot
mutate a shared class dict.
massgen/backend/copilot.py (2)

576-612: Async-to-sync bridging pattern has potential edge cases.

The pattern of calling asyncio.run() from within a ThreadPoolExecutor when already in an event loop is functional but complex. Consider documenting the threading implications or extracting this into a utility function for reuse.

The current implementation is correct, but the nested event loop creation could have subtle issues in certain environments (e.g., nested async frameworks).

Consider extracting to a utility
def _run_async_in_sync_context(coro):
    """Run an async coroutine from a sync context, handling event loop presence."""
    import asyncio
    import concurrent.futures
    
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = None
    
    if loop and loop.is_running():
        with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
            return pool.submit(asyncio.run, coro).result()
    return asyncio.run(coro)

This could be placed in a utilities module for reuse across backends that need similar sync/async bridging.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/backend/copilot.py` around lines 576 - 612, Extract the async-to-sync
bridging logic into a reusable helper (e.g., _run_async_in_sync_context) and
replace the inline ThreadPoolExecutor/asyncio.run block inside
_ppm_permission_callback's "shell" branch; the helper should accept a coroutine
(the call to ppm.pre_tool_use_hook("Bash", {"command": cmd})) and return its
result, handling the presence of a running event loop by using a
ThreadPoolExecutor to call asyncio.run when required, otherwise calling
asyncio.run directly, then use that helper to get (allowed, _reason) and proceed
as before.

1003-1009: Accessing ppm.managed_paths assumes specific PPM structure.

The code accesses ppm.managed_paths via getattr with a fallback to empty list, but then accesses .path and .permission attributes on each item. If the PPM structure changes, this could fail silently.

Consider adding explicit type check or try-except
         if ppm:
-            writable_paths = sorted(str(mp.path) for mp in getattr(ppm, "managed_paths", []) if getattr(mp, "permission", None) == Permission.WRITE)
+            try:
+                writable_paths = sorted(
+                    str(mp.path) 
+                    for mp in getattr(ppm, "managed_paths", []) 
+                    if getattr(mp, "permission", None) == Permission.WRITE
+                )
+            except (AttributeError, TypeError):
+                logger.debug("[Copilot] Could not extract writable paths from PPM")
+                writable_paths = []
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/backend/copilot.py` around lines 1003 - 1009, The access of
ppm.managed_paths in the writable_paths computation assumes each managed path
object has .path and .permission; change the logic in the block that reads
filesystem_manager.path_permission_manager.managed_paths so it validates items
before using their attributes (e.g., ensure ppm and managed_paths exist, iterate
safely and for each item check hasattr(item, "path") and hasattr(item,
"permission") or isinstance checks, and only include str(item.path) when
item.permission == Permission.WRITE); alternatively wrap the extraction in a
try/except that logs or skips invalid entries to avoid silent failures while
preserving the sorted writable_paths assignment.
massgen/cli.py (1)

2447-2452: GeminiCLIBackend already performs CLI validation in __init__; factory-level preflight is optional for consistency.

GeminiCLIBackend.__init__ (line 59 of massgen/backend/gemini_cli.py) already includes a fail-fast preflight via _find_gemini_cli(). If the gemini CLI is missing, it raises RuntimeError with helpful installation instructions. However, for consistency with the factory pattern used in claude_code (which validates at the factory level) and to catch missing dependencies earlier, consider adding a factory-level check here as well. Note that CodexBackend has a similar backend-level preflight but no factory-level check, so if implementing this for gemini_cli, consider applying the same pattern to codex for parity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/cli.py` around lines 2447 - 2452, Factory returns GeminiCLIBackend
without preflight check; add a factory-level validation to mirror claude_code
and surface missing CLI dependency earlier. Before returning GeminiCLIBackend in
the factory branch for backend_type "gemini_cli" in massgen/cli.py, call
GeminiCLIBackend._find_gemini_cli() (or wrap instantiation in a try/except
catching RuntimeError from GeminiCLIBackend.__init__) to propagate the same
RuntimeError with installation instructions; consider applying the same pattern
to CodexBackend for parity with backend-level preflight checks.
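The factory-level preflight being suggested amounts to a PATH check before instantiation. A minimal sketch of the pattern (the function name and hint text are illustrative, not MassGen's actual API):

```python
import shutil

def preflight_cli(binary: str, install_hint: str) -> str:
    """Fail fast if a required CLI binary is not on PATH."""
    path = shutil.which(binary)
    if path is None:
        raise RuntimeError(f"'{binary}' not found on PATH. {install_hint}")
    return path

# A missing binary surfaces immediately, with installation guidance,
# before any backend object is constructed.
try:
    preflight_cli("definitely-not-a-real-cli", "Install with: npm install -g @google/gemini-cli")
except RuntimeError as exc:
    print(exc)
```

Running such a check in the factory branch surfaces the error at config time rather than at first stream, matching the claude_code factory behavior described above.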
massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py (1)

170-182: Break long line for readability.

This line chains many fallback keys and is difficult to read.

♻️ Proposed refactor
 def _extract_tool_args(tool_event: dict) -> dict:
     """Support both legacy and current Gemini CLI hook event schemas."""
-    raw_args = tool_event.get("tool_input") or tool_event.get("input") or tool_event.get("toolArgs") or tool_event.get("tool_args") or tool_event.get("arguments") or tool_event.get("parameters") or {}
+    raw_args = (
+        tool_event.get("tool_input")
+        or tool_event.get("input")
+        or tool_event.get("toolArgs")
+        or tool_event.get("tool_args")
+        or tool_event.get("arguments")
+        or tool_event.get("parameters")
+        or {}
+    )
     if isinstance(raw_args, dict):
         return raw_args
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py` around
lines 170 - 182, The long fallback chain in _extract_tool_args makes the
raw_args assignment hard to read; refactor by replacing the single long
expression with a clear sequence: define an ordered list of possible keys (e.g.,
["tool_input","input","toolArgs","tool_args","arguments","parameters"]), then
iterate over that list to set raw_args from tool_event.get(key) until a non-None
value is found (default to {}), preserving the existing subsequent logic that
handles dict and JSON-string cases for raw_args; keep the function name
_extract_tool_args and the variable raw_args unchanged so callers remain intact.
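The ordered-key iteration the prompt describes can be sketched standalone; this assumes the same dict-or-JSON-string handling the original function performs after the fallback chain:

```python
import json

_ARG_KEYS = ("tool_input", "input", "toolArgs", "tool_args", "arguments", "parameters")

def extract_tool_args(tool_event: dict) -> dict:
    """Return the first non-empty args payload, tolerating dict or JSON-string values."""
    raw_args = next((tool_event.get(k) for k in _ARG_KEYS if tool_event.get(k)), {})
    if isinstance(raw_args, dict):
        return raw_args
    if isinstance(raw_args, str):
        try:
            parsed = json.loads(raw_args)
            return parsed if isinstance(parsed, dict) else {}
        except json.JSONDecodeError:
            return {}
    return {}

print(extract_tool_args({"toolArgs": {"command": "ls"}}))    # {'command': 'ls'}
print(extract_tool_args({"arguments": '{"path": "/tmp"}'}))  # {'path': '/tmp'}
print(extract_tool_args({}))                                 # {}
```

The key tuple makes the precedence order explicit and greppable, which the single long `or` chain obscures.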
massgen/backend/gemini_cli.py (2)

1164-1164: Use !s conversion flag in f-string instead of explicit str() calls.

Per Ruff RUF010, f-strings should use conversion flags for clarity.

♻️ Proposed fix
-        command = f"{sys.executable} {str(_HOOK_SCRIPT_PATH)} " f"--hook-dir {str(config_dir)} --event BeforeTool"
+        command = f"{sys.executable} {_HOOK_SCRIPT_PATH!s} --hook-dir {config_dir!s} --event BeforeTool"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/backend/gemini_cli.py` at line 1164, The f-string assigned to
variable command uses explicit str() calls; update it to use the !s conversion
flag for clarity per Ruff RUF010 by removing str(_HOOK_SCRIPT_PATH) and
str(config_dir) and instead using {_HOOK_SCRIPT_PATH!s} and {config_dir!s}
inside the f-string so the command construction in gemini_cli.py (variable name:
command, symbols: _HOOK_SCRIPT_PATH, config_dir) uses conversion flags rather
than explicit str() calls.

72-72: Extract API key resolution into separate lines for readability.

This line is very long and chains multiple or operations, making it hard to read and maintain.

♻️ Proposed refactor
-        self.api_key = api_key or os.getenv("GEMINI_API_KEY") or os.getenv("GOOGLE_API_KEY") or configured_env.get("GEMINI_API_KEY") or configured_env.get("GOOGLE_API_KEY")
+        self.api_key = (
+            api_key
+            or os.getenv("GEMINI_API_KEY")
+            or os.getenv("GOOGLE_API_KEY")
+            or configured_env.get("GEMINI_API_KEY")
+            or configured_env.get("GOOGLE_API_KEY")
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/backend/gemini_cli.py` at line 72, The long chained assignment to
self.api_key (currently using multiple or operations with os.getenv and
configured_env) should be split for readability: build an ordered list/tuple of
candidate sources (e.g., env_gemini = os.getenv("GEMINI_API_KEY"), env_google =
os.getenv("GOOGLE_API_KEY"), cfg_gemini = configured_env.get("GEMINI_API_KEY"),
cfg_google = configured_env.get("GOOGLE_API_KEY"), and the api_key param) then
iterate or use next(filter(None, candidates), None) to pick the first non-empty
value and assign it to self.api_key in the class initializer (constructor) where
the original expression lives; keep the same precedence (explicit api_key param
first) and preserve behavior if none found.
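The `next(filter(None, ...))` approach the prompt suggests can be sketched standalone; plain dicts stand in for `os.environ` and `configured_env`, and the function name is illustrative:

```python
def resolve_api_key(api_key, env, configured_env):
    """First non-empty value wins, keeping explicit-param-first precedence."""
    candidates = (
        api_key,
        env.get("GEMINI_API_KEY"),
        env.get("GOOGLE_API_KEY"),
        configured_env.get("GEMINI_API_KEY"),
        configured_env.get("GOOGLE_API_KEY"),
    )
    return next(filter(None, candidates), None)

env = {"GOOGLE_API_KEY": "env-google"}
cfg = {"GEMINI_API_KEY": "cfg-gemini"}

print(resolve_api_key("explicit", env, cfg))  # 'explicit'
print(resolve_api_key(None, env, cfg))        # 'env-google'
print(resolve_api_key(None, {}, cfg))         # 'cfg-gemini'
print(resolve_api_key(None, {}, {}))          # None
```

The tuple makes each source and its precedence a single line, so adding or reordering sources stays a one-line change.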

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 35bade69-a7c0-4246-b533-5fcee8e21689

📥 Commits

Reviewing files that changed from the base of the PR and between 6291b25 and 4e3d214.

📒 Files selected for processing (50)
  • .gitignore
  • docs/source/reference/yaml_schema.rst
  • docs/source/user_guide/backends.rst
  • massgen/backend/__init__.py
  • massgen/backend/capabilities.py
  • massgen/backend/copilot.py
  • massgen/backend/gemini_cli.py
  • massgen/cli.py
  • massgen/config_validator.py
  • massgen/configs/providers/gemini/gemini_cli_local.yaml
  • massgen/filesystem_manager/_constants.py
  • massgen/filesystem_manager/_docker_manager.py
  • massgen/filesystem_manager/_filesystem_manager.py
  • massgen/filesystem_manager/_path_permission_manager.py
  • massgen/mcp_tools/native_hook_adapters/__init__.py
  • massgen/mcp_tools/native_hook_adapters/copilot_adapter.py
  • massgen/mcp_tools/native_hook_adapters/gemini_cli_adapter.py
  • massgen/mcp_tools/native_hook_adapters/gemini_cli_hook_script.py
  • massgen/orchestrator.py
  • massgen/skills/backend-integrator/SKILL.md
  • massgen/tests/backend/test_copilot.py
  • massgen/tests/backend/test_copilot_hooks.py
  • massgen/tests/backend/test_copilot_permission_callback.py
  • massgen/tests/backend/test_copilot_permission_integration.py
  • massgen/tests/integration/test_copilot_hook_orchestrator.py
  • massgen/tests/integration/test_gemini_cli_hook_orchestrator.py
  • massgen/tests/integration/test_orchestrator_stream_enforcement.py
  • massgen/tests/test_backend_capabilities.py
  • massgen/tests/test_cli_cwd_context_flag.py
  • massgen/tests/test_concurrent_logging_sessions.py
  • massgen/tests/test_config_validator.py
  • massgen/tests/test_copilot_live.py
  • massgen/tests/test_docker_skills_write_access.py
  • massgen/tests/test_gemini_cli_backend.py
  • massgen/tests/test_gemini_cli_hook_ipc.py
  • massgen/tests/test_gemini_cli_hook_script.py
  • massgen/tests/test_gemini_cli_live.py
  • massgen/tests/test_interactive_controller_commands.py
  • massgen/tests/test_native_tools_sandbox_script.py
  • massgen/tests/test_snapshot_preservation.py
  • massgen/tests/test_subagent_manager_paths.py
  • massgen/tests/test_task_plan_support.py
  • massgen/tests/test_tui_modes.py
  • massgen/tests/test_vote_only_mode.py
  • massgen/tests/unit/test_errors.py
  • massgen/tests/unit/test_orchestrator_unit.py
  • massgen/token_manager/token_manager.py
  • pyproject.toml
  • scripts/test_hook_backends.py
  • scripts/test_native_tools_sandbox.py

Comment on lines +239 to +246
# Gemini CLI runtime artifacts (project-local, generated by MassGen)
.gemini/settings.json
.gemini/workflow_tool_specs.json
.gemini/custom_tool_specs.json
.gemini/GEMINI.md
.gemini/google_accounts.json
.gemini/oauth_creds.json
.gemini/.env
⚠️ Potential issue | 🟠 Major

Ignore the full .gemini/ directory to prevent secret leakage.

Line-specific ignores are fragile; new credential/artifact filenames can still slip into git history. Prefer a directory-level ignore.

🔐 Proposed hardening
-# Gemini CLI runtime artifacts (project-local, generated by MassGen)
-.gemini/settings.json
-.gemini/workflow_tool_specs.json
-.gemini/custom_tool_specs.json
-.gemini/GEMINI.md
-.gemini/google_accounts.json
-.gemini/oauth_creds.json
-.gemini/.env
+# Gemini CLI runtime artifacts (project-local, generated by MassGen)
+.gemini/
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Gemini CLI runtime artifacts (project-local, generated by MassGen)
.gemini/settings.json
.gemini/workflow_tool_specs.json
.gemini/custom_tool_specs.json
.gemini/GEMINI.md
.gemini/google_accounts.json
.gemini/oauth_creds.json
.gemini/.env
# Gemini CLI runtime artifacts (project-local, generated by MassGen)
.gemini/
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.gitignore around lines 239 - 246, Replace the fragile per-file ignores with
a directory-level ignore by removing the explicit entries
(.gemini/settings.json, .gemini/workflow_tool_specs.json,
.gemini/custom_tool_specs.json, .gemini/GEMINI.md, .gemini/google_accounts.json,
.gemini/oauth_creds.json, .gemini/.env) and add a single .gemini/ rule in
.gitignore; after updating .gitignore, ensure any already-tracked files under
the .gemini directory are removed from the index (git rm --cached) so the new
.gemini/ rule takes effect.

Comment on lines +278 to +325
GitHub Copilot Backend
~~~~~~~~~~~~~~~~~~~~~~

The GitHub Copilot backend uses the ``github-copilot-sdk`` with native MCP support.
Requires a GitHub Copilot subscription and the Copilot CLI (``gh copilot``).

**Basic Configuration:**

.. code-block:: yaml

backend:
type: "copilot"
model: "gpt-5-mini" # Also: gpt-4.1, claude-sonnet-4, gemini-2.5-pro

**With MCP Servers and Custom Tools:**

.. code-block:: yaml

backend:
type: "copilot"
model: "gpt-5-mini"
mcp_servers:
- name: "filesystem"
command: "npx"
args: ["-y", "@modelcontextprotocol/server-filesystem", "."]
custom_tools:
- path: "massgen/tool/_basic"
function: "two_num_tool"

**Configuration Options:**

- ``copilot_system_message_mode`` (string: "append"|"replace", default: "append"):
How the system message is applied to the Copilot session.
- ``copilot_permission_policy`` (string: "approve"|"deny", default: "approve"):
Permission callback policy. "approve" validates paths via PathPermissionManager.
- ``allowed_tools`` / ``exclude_tools``: Backend-level tool filtering.
- ``enable_multimodal_tools`` (bool): Enable read_media/generate_media tools.

**Docker Mode:**

.. code-block:: yaml

backend:
type: "copilot"
model: "gpt-5-mini"
command_line_execution_mode: "docker"
command_line_docker_network_mode: "bridge"

⚠️ Potential issue | 🟠 Major

The schema reference still doesn't cover the new backend keys.

This page introduces Copilot-specific options and adds gemini_cli as a valid backend type, but the backend parameter reference below never adds entries for those Copilot fields and still omits Gemini CLI support from rows users will check first, like cwd and enable_web_search. As written, yaml_schema.rst is no longer the authoritative reference for the new configuration surface.

As per coding guidelines, "For new YAML parameters, update docs/source/reference/yaml_schema.rst."

Also applies to: 687-687

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/reference/yaml_schema.rst` around lines 278 - 325, The YAML
schema reference is missing the new Copilot-specific backend keys and the new
backend type "gemini_cli"; update docs/source/reference/yaml_schema.rst to add
authoritative entries for the Copilot options (add keys/copilot fields:
copilot_system_message_mode, copilot_permission_policy, mcp_servers,
custom_tools, allowed_tools, exclude_tools, enable_multimodal_tools,
command_line_execution_mode, command_line_docker_network_mode) and ensure
backend parameter rows (e.g., cwd, enable_web_search) list "copilot" and
"gemini_cli" where applicable; follow existing schema table/row format so these
new fields appear in the backend parameter reference and include short
descriptions and types consistent with the surrounding entries.

Comment on lines +339 to +383
"gemini_cli": BackendCapabilities(
backend_type="gemini_cli",
provider_name="Gemini CLI",
supported_capabilities={
"bash",
"mcp",
"filesystem_native",
"web_search",
},
builtin_tools=[
"run_shell_command",
"read_file",
"write_file",
"replace",
"glob",
"grep_search",
"list_directory",
"read_many_files",
"google_web_search",
"web_fetch",
],
filesystem_support="native",
models=[
"gemini-3.1-pro-preview",
"gemini-3-flash-preview",
"gemini-2.5-pro",
"gemini-2.5-flash",
"gemini-2.5-flash-lite",
],
default_model="gemini-3.1-pro-preview",
env_var="GOOGLE_API_KEY",
model_release_dates={
"gemini-3.1-pro-preview": "2026-02",
"gemini-3-flash-preview": "2025-12",
"gemini-2.5-pro": "2025-06",
"gemini-2.5-flash": "2025-06",
"gemini-2.5-flash-lite": "2025-06",
},
notes=(
"Google Gemini CLI subprocess wrapper. Auth: CLI login (gemini) first, "
"then GOOGLE_API_KEY or GEMINI_API_KEY. Native filesystem via CLI. "
"Requires: npm install -g @google/gemini-cli. "
"Docker mode requires command_line_docker_network_mode."
),
),

⚠️ Potential issue | 🟠 Major

Also classify gemini_cli as an agent-framework backend.

The new registry entry makes gemini_cli a native CLI integration, but AGENT_FRAMEWORK_BACKENDS below still excludes it. is_agent_framework_backend("gemini_cli") therefore returns False, so any code branching on framework/native backends will miss the new backend.

🩹 Suggested follow-up outside this hunk
 AGENT_FRAMEWORK_BACKENDS = frozenset(
     {
         "claude_code",
         "codex",
         "copilot",
+        "gemini_cli",
     },
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/backend/capabilities.py` around lines 339 - 383, The gemini_cli
BackendCapabilities entry is registered as a native CLI integration but not
included in the agent-framework list, so
is_agent_framework_backend("gemini_cli") returns False; update the
classification by adding "gemini_cli" to the AGENT_FRAMEWORK_BACKENDS set (or
adjust the is_agent_framework_backend logic) so that the backend named
"gemini_cli" (referenced in the BackendCapabilities block) is treated as an
agent-framework backend by functions checking
AGENT_FRAMEWORK_BACKENDS/is_agent_framework_backend.
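To illustrate why the classification matters, here is a minimal sketch of the membership check the review describes; the set contents and function name mirror the review comment, not necessarily the actual module:

```python
from typing import FrozenSet

# Sketch of the registry constant the review asks to extend; the real set
# lives in massgen/backend/capabilities.py and may differ.
AGENT_FRAMEWORK_BACKENDS: FrozenSet[str] = frozenset(
    {"claude_code", "codex", "copilot", "gemini_cli"},
)


def is_agent_framework_backend(backend_type: str) -> bool:
    """Return True when the backend is an agent-framework (native CLI) integration."""
    return backend_type in AGENT_FRAMEWORK_BACKENDS
```

Any caller branching on framework vs. native backends sees the new backend only if the string is in the frozenset, which is why omitting `"gemini_cli"` silently routes it down the non-framework path.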


     agent_id = kwargs.get("agent_id", self.config.get("agent_id") or "default")
-    model = kwargs.get("model") or self.config.get("model") or "gpt-4.1"
+    model = kwargs.get("model") or self.config.get("model") or "gpt-5-mini"

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

GitHub Copilot SDK gpt-5-mini model availability 2025

💡 Result:

  • GPT‑5 mini in GitHub Copilot (2025): GitHub began rolling out GPT‑5 mini to GitHub Copilot in public preview on August 13, 2025, appearing in the model picker (VS Code, github.com Copilot Chat, and GitHub Mobile). [1]
  • GA status: GitHub’s “Supported AI models in Copilot” documentation lists GPT‑5 mini as GA (generally available). [2]
  • “Copilot SDK” timing vs 2025: The product GitHub calls “Copilot SDK” (language-specific SDKs for programmatic access to Copilot CLI) entered technical preview on January 14, 2026—so it was not available in 2025. [3]
  • What was available in 2025 under an SDK name: GitHub released the Copilot Language Server SDK on February 10, 2025 (to integrate Copilot into other editors/IDEs), but that’s a different SDK than the 2026 “Copilot SDK.” [4]

Sources: [1]–[4] correspond to the cited links above.


🏁 Script executed:

# Check if the copilot.py file exists
fd copilot.py massgen/

# If it exists, examine the context around line 918
if [ -f "massgen/backend/copilot.py" ]; then
  echo "=== File found, checking size ==="
  wc -l massgen/backend/copilot.py
  echo "=== Content around line 918 ==="
  sed -n '910,925p' massgen/backend/copilot.py
fi

Repository: massgen/MassGen

Length of output: 904


🏁 Script executed:

# Search for other model identifier patterns in backend files
rg "gpt-.*-mini|model.*=" massgen/backend/ -A 2 -B 2 --type py | head -50

Repository: massgen/MassGen

Length of output: 3468


🏁 Script executed:

# Check capabilities.py to see if copilot backend is registered
if [ -f "massgen/backend/capabilities.py" ]; then
  echo "=== Checking capabilities.py ==="
  cat -n massgen/backend/capabilities.py | grep -i copilot -A 5 -B 5
fi

Repository: massgen/MassGen

Length of output: 1619


Add gpt-5-mini to the supported models list in massgen/backend/capabilities.py.

While gpt-5-mini is a valid, GA model in GitHub Copilot, it is currently set as the default_model in capabilities.py (line 713) but is missing from the models list (line 710). Update the models list to include "gpt-5-mini" to ensure consistency between the default model and supported models, as required by the coding guideline: "If adding new model support → ensure massgen/backend/capabilities.py is updated."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/backend/copilot.py` at line 918, The default model "gpt-5-mini" used
in massgen/backend/copilot.py is not present in the supported models list;
update the models collection in massgen/backend/capabilities.py by adding
"gpt-5-mini" to the models list (the variable named models) so it matches the
default_model setting (variable default_model) and ensures the new model is
recognized by the capabilities checks.
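The inconsistency is easy to check mechanically. A simplified stand-in for the registry entry (the model names below are placeholders, not the real registry contents):

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BackendCapabilities:
    """Simplified stand-in for the real capabilities entry (illustrative only)."""

    backend_type: str
    models: List[str]
    default_model: str


def default_model_listed(caps: BackendCapabilities) -> bool:
    """True when default_model also appears in the supported models list."""
    return caps.default_model in caps.models


# Mirrors the reported state: default_model set to a model the list omits.
copilot = BackendCapabilities(
    backend_type="copilot",
    models=["gpt-5", "gpt-4.1"],
    default_model="gpt-5-mini",
)
```

Running `default_model_listed` over every registry entry (e.g. in a unit test) would have caught the mismatch at review time.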

Comment on lines +97 to +100
# Serialize tool args for MassGen hook interface
tool_args = input_data.get("toolArgs", {})
arguments_str = json.dumps(tool_args) if tool_args else "{}"


⚠️ Potential issue | 🟠 Major

Preserve already-serialized toolArgs.

toolArgs is declared as Any. If Copilot hands this wrapper a JSON string, json.dumps(tool_args) double-encodes it, so downstream hooks receive a quoted string instead of the original object payload and can fail closed on parsing.

🧩 Proposed fix
-            tool_args = input_data.get("toolArgs", {})
-            arguments_str = json.dumps(tool_args) if tool_args else "{}"
+            tool_args = input_data.get("toolArgs")
+            if tool_args in (None, ""):
+                arguments_str = "{}"
+            elif isinstance(tool_args, str):
+                arguments_str = tool_args
+            else:
+                arguments_str = json.dumps(tool_args)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/mcp_tools/native_hook_adapters/copilot_adapter.py` around lines 97 -
100, The current serialization always calls json.dumps on tool_args (in the
CopilotAdapter code where tool_args = input_data.get("toolArgs", {}) and
arguments_str is set), which double-encodes when tool_args is already a JSON
string; change the logic to detect an already-serialized JSON string and
preserve it as-is (e.g., if isinstance(tool_args, str) use tool_args for
arguments_str, otherwise json.dumps(tool_args)), while still defaulting to "{}"
for empty/None values so downstream hooks receive the original object payload
rather than a quoted, double-encoded string.
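The hazard is easy to reproduce: `json.dumps` applied to an already-serialized JSON string wraps it in quotes and escapes the inner quotes, so a downstream `json.loads` yields a string instead of an object. A standalone sketch of the guard the proposed fix describes (function name is illustrative, not the adapter's actual API):

```python
import json
from typing import Any


def serialize_tool_args(tool_args: Any) -> str:
    """Serialize toolArgs exactly once.

    Preserves payloads the CLI already JSON-encoded and defaults empty/None
    values to "{}" so downstream hooks always receive valid JSON.
    """
    if tool_args in (None, ""):
        return "{}"
    if isinstance(tool_args, str):
        # Assume the string is already JSON; re-encoding would double-quote it.
        return tool_args
    return json.dumps(tool_args)


already_encoded = '{"path": "/tmp"}'
# json.dumps(already_encoded) would produce a quoted, escaped string,
# which json.loads then decodes back to a str rather than a dict.
```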

Comment on lines +1776 to +1793
def test_copilot_docker_with_network_mode_passes(self):
"""Copilot in Docker mode with network_mode should pass."""
config = {
"agent": {
"id": "test-agent",
"backend": {
"type": "copilot",
"model": "gpt-5-mini",
"command_line_execution_mode": "docker",
"command_line_docker_network_mode": "bridge",
},
},
}
validator = ConfigValidator()
result = validator.validate_config(config)
# Should not have the network_mode error
assert not any("copilot" in error.message.lower() and "docker" in error.message.lower() and "network_mode" in error.message.lower() for error in result.errors)


⚠️ Potential issue | 🟠 Major

“Passes” test is too weak and can hide unrelated validation failures.

test_copilot_docker_with_network_mode_passes should assert overall validity, not just absence of one substring.

✅ Tighten the success assertion
     validator = ConfigValidator()
     result = validator.validate_config(config)
-    # Should not have the network_mode error
+    assert result.is_valid()
+    # Should not have the network_mode error
     assert not any("copilot" in error.message.lower() and "docker" in error.message.lower() and "network_mode" in error.message.lower() for error in result.errors)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_config_validator.py` around lines 1776 - 1793, The test
test_copilot_docker_with_network_mode_passes is too weak—update it to assert
overall validity by calling ConfigValidator().validate_config(config) and
asserting result.valid (or that result.errors is empty) instead of only checking
that no error message contains specific substrings; locate the test function
name test_copilot_docker_with_network_mode_passes and the validate_config call
on ConfigValidator to change the final assertion to a strict check like assert
result.valid (or assert not result.errors) so unrelated validation failures will
fail the test.

Comment on lines +21 to +24
("claude_code", "auto", "direct"),
("codex", "auto", "direct"),
("copilot", "auto", "orchestrator"),
("gemini_cli", "auto", "orchestrator"),

⚠️ Potential issue | 🔴 Critical

CI blocker: gemini_cli auto-runner expectation is currently wrong.

The provided pipeline failure shows resolve_runner("gemini_cli", "auto") returns "direct", but this test expects "orchestrator" (Line 24). Please align either the resolver logic or this expected tuple before merge.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_native_tools_sandbox_script.py` around lines 21 - 24, The
test tuple ("gemini_cli", "auto", "orchestrator") is out of sync with the
resolver behavior: resolve_runner("gemini_cli", "auto") returns "direct" in CI,
so update the test expectation to ("gemini_cli", "auto", "direct") (or if you
intend different behavior, change resolve_runner's logic for the
"gemini_cli"/"auto" case to return "orchestrator"); reference resolve_runner and
the test tuple when making the change so the expectation and implementation
match.

Comment on lines +12 to +113
class _FakeHost:
def __init__(self, active_tasks: list[dict[str, Any]] | None = None) -> None:
self._active_tasks = [dict(t) for t in active_tasks] if active_tasks else None
self.pinned_updates: list[dict[str, Any]] = []
self.plan_updates: list[dict[str, Any]] = []

def get_active_tasks(self) -> list[dict[str, Any]] | None:
if self._active_tasks is None:
return None
return [dict(t) for t in self._active_tasks]

def update_pinned_task_plan(
self,
*,
tasks: list[dict[str, Any]],
focused_task_id: str | None = None,
operation: str = "create",
show_notification: bool = True,
) -> None:
self.pinned_updates.append(
{
"tasks": [dict(t) for t in tasks],
"focused_task_id": focused_task_id,
"operation": operation,
"show_notification": show_notification,
},
)
self._active_tasks = [dict(t) for t in tasks]

def update_task_plan(
self,
tasks: list[dict[str, Any]],
plan_id: str | None = None,
operation: str = "create",
) -> None:
self.plan_updates.append(
{
"tasks": [dict(t) for t in tasks],
"plan_id": plan_id,
"operation": operation,
},
)
self._active_tasks = [dict(t) for t in tasks]


def _tool(tool_name: str, result_full: str, tool_id: str = "tool_1") -> SimpleNamespace:
return SimpleNamespace(tool_name=tool_name, result_full=result_full, tool_id=tool_id)


def test_update_task_plan_from_codex_wrapper_create_task_plan() -> None:
host = _FakeHost()
result = (
"{'content': [{'type': 'text', 'text': "
'\'{"success":true,"operation":"create_task_plan","tasks":[{"id":"task_1","description":"Define scope","status":"pending","priority":"medium","dependencies":[],"metadata":{}}]}\''
"}], 'structured_content': {'success': True, 'operation': 'create_task_plan', "
"'tasks': [{'id': 'task_1', 'description': 'Define scope', 'status': 'pending', "
"'priority': 'medium', 'dependencies': [], 'metadata': {}}]}}"
)
tool_data = _tool("planning_agent_a/create_task_plan", result, tool_id="item_5")

handled = update_task_plan_from_tool(host, tool_data, timeline=None)

assert handled is True
assert host.pinned_updates, "Expected pinned task plan update"
assert host.plan_updates, "Expected host task-plan state update"
assert host.pinned_updates[-1]["operation"] == "create"
assert host.pinned_updates[-1]["tasks"][0]["id"] == "task_1"
assert host.plan_updates[-1]["plan_id"] == "item_5"


def test_update_task_plan_from_codex_wrapper_status_update_patches_cached_tasks() -> None:
host = _FakeHost(
active_tasks=[
{
"id": "task_1",
"description": "Define scope",
"status": "pending",
"priority": "medium",
"dependencies": [],
"metadata": {},
},
],
)
result = (
"{'content': [{'type': 'text', 'text': "
'\'{"success":true,"operation":"update_task_status","task":{"id":"task_1",'
'"description":"Define scope","status":"completed","priority":"medium",'
'"dependencies":[],"metadata":{"completion_notes":"done"}}}\''
"}], 'structured_content': {'success': True, 'operation': 'update_task_status', "
"'task': {'id': 'task_1', 'description': 'Define scope', 'status': 'completed', "
"'priority': 'medium', 'dependencies': [], 'metadata': {'completion_notes': 'done'}}}}"
)
tool_data = _tool("planning_agent_a/update_task_status", result, tool_id="item_9")

handled = update_task_plan_from_tool(host, tool_data, timeline=None)

assert handled is True
assert host.pinned_updates, "Expected pinned task plan update"
assert host.pinned_updates[-1]["operation"] == "update"
assert host.pinned_updates[-1]["focused_task_id"] == "task_1"
assert host.pinned_updates[-1]["tasks"][0]["status"] == "completed"
assert host.plan_updates[-1]["operation"] == "update"

🛠️ Refactor suggestion | 🟠 Major

Add Google-style docstrings to new methods/functions in this module.

_FakeHost methods, _tool, and both test functions are new but undocumented.

As per coding guidelines: "**/*.py: For new or changed functions, include Google-style docstrings".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@massgen/tests/test_task_plan_support.py` around lines 12 - 113, The tests and
helper class lack Google-style docstrings: add concise Google-style docstrings
to class _FakeHost and its methods get_active_tasks, update_pinned_task_plan,
update_task_plan, the helper function _tool, and the two test functions
test_update_task_plan_from_codex_wrapper_create_task_plan and
test_update_task_plan_from_codex_wrapper_status_update_patches_cached_tasks;
each docstring should describe purpose, parameters (types), return value, and
any important behavior (e.g., that _FakeHost stores updates in
pinned_updates/plan_updates and that _tool returns a SimpleNamespace mimicking
tool data).

Comment on lines +303 to 305
if hasattr(backend, "set_general_hook_manager"):
backend.set_general_hook_manager(manager)


⚠️ Potential issue | 🟠 Major

The Copilot/Gemini "unit" path can false-pass even if native hooks are broken.

For those backends, Test 2-4 skip set_general_hook_manager() and then call GeneralHookManager.execute_hooks(...) directly, so a broken CopilotNativeHookAdapter or Gemini CLI IPC path would still report PASS here. run_e2e_tests() already has backend-specific branches; run_tests() needs the same kind of native-path coverage or should skip these checks for native-hook backends.

Based on learnings, backend-specific tooling changes should not assume the shared hook path covers native backends; parity tests need to hit the native execution path.

Also applies to: 369-370, 457-458, 540-555, 567-588

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/test_hook_backends.py` around lines 303 - 305, The tests call
GeneralHookManager.execute_hooks directly for some backends causing false-passes
for native paths; update run_tests to mirror run_e2e_tests backend-specific
branching: detect native-hook backends (e.g., CopilotNativeHookAdapter, Gemini
CLI IPC) and either (A) call backend.set_general_hook_manager(manager) and
invoke the backend's native execute path so tests exercise the native
IPC/adapter code, or (B) explicitly skip the shared-path parity checks for those
backends and mark them to run native-path checks instead; modify the run_tests
branches at the locations around
set_general_hook_manager/GeneralHookManager.execute_hooks (also apply same
change at the other noted regions) so parity tests hit the native execution path
rather than only the shared GeneralHookManager.

Comment on lines +13 to 20
Run:
uv run python scripts/test_native_tools_sandbox.py [--backend TYPE] [--runner TYPE] [--llm-judge]

Options:
--backend Backend to test: claude_code (default) or codex
--backend Backend to test: claude_code, codex, copilot, or gemini_cli
--runner Runner to test: auto, direct, or orchestrator
auto uses orchestrator for copilot/gemini_cli and direct otherwise
--llm-judge Use LLM to analyze responses for subtle leakage (requires OPENAI_API_KEY)

⚠️ Potential issue | 🟠 Major

Honor the documented auto runner behavior for Gemini CLI.

The header and CLI help say auto uses the orchestrator for both Copilot and Gemini CLI, but resolve_runner() only special-cases Copilot. Right now --backend gemini_cli --runner auto silently falls back to direct, so this script misses the intended path.

Suggested change
-    if backend_type == "copilot":
+    if backend_type in {"copilot", "gemini_cli"}:
         return "orchestrator"

Also applies to: 91-99, 524-535

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/test_native_tools_sandbox.py` around lines 13 - 20, The CLI's
auto-runner logic ignores the documented behavior for Gemini CLI because
resolve_runner() only special-cases "copilot"; update resolve_runner() (and the
other places with the same logic around the reported ranges) so that when runner
== "auto" and backend is either "copilot" or "gemini_cli" it returns/chooses the
"orchestrator" runner instead of falling back to "direct"; find the conditional
that checks backend == "copilot" and extend it to include backend ==
"gemini_cli" (or check membership in a set like {"copilot","gemini_cli"}) so
auto maps to orchestrator for both backends.
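A minimal sketch of the resolver with the membership check the suggested change introduces; the function name matches the script, but the body is a reconstruction from the review, not the script's actual implementation:

```python
# Backends whose --runner auto should map to the orchestrator path,
# per the script's documented behavior.
ORCHESTRATOR_AUTO_BACKENDS = {"copilot", "gemini_cli"}


def resolve_runner(backend_type: str, runner: str) -> str:
    """Resolve --runner, mapping "auto" to the documented per-backend default."""
    if runner != "auto":
        return runner
    if backend_type in ORCHESTRATOR_AUTO_BACKENDS:
        return "orchestrator"
    return "direct"
```

With the set-membership check, `--backend gemini_cli --runner auto` resolves to `"orchestrator"` as documented, while explicit `--runner` values still pass through untouched.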

@ncrispino ncrispino mentioned this pull request Mar 16, 2026
9 tasks
@ncrispino ncrispino changed the base branch from main to dev/v0.1.64 March 16, 2026 15:23
@ncrispino ncrispino merged commit 0c09bd7 into dev/v0.1.64 Mar 16, 2026
15 of 19 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Mar 16, 2026
18 tasks