[#907/#909 follow-up] _scan_compose_content rejects all built-ins except comfyui — second gate blocks EXTENSIONS_DIR activation even after #299 fix

## Summary

PR #907 (origin: #299) correctly relaxed the top-level `CORE_SERVICE_IDS` blocklist down to the 4 `ALWAYS_ON_SERVICES`, and PR #909 correctly added the `EXTENSIONS_DIR` branch to `_activate_service`. Together they *should* make built-in extensions activatable via the internal template-apply path. But there's a **second, downstream gate** in the same code that still rejects 19 of 20 built-ins by name — `_scan_compose_content` at `routers/extensions.py:187`.

When `_activate_service` tries to activate a built-in in its `compose.yaml.disabled` state (line 1023), it calls `_scan_compose_content(disabled_compose)`. That function walks the compose file's internal `services:` map and rejects any entry whose key matches `CORE_SERVICE_IDS` with:

```python
for svc_name in services:
    if svc_name in CORE_SERVICE_IDS:
        raise HTTPException(
            status_code=400,
            detail=f"Extension rejected: service name '{svc_name}' conflicts with core service",
        )
```

This is a legitimate **name-collision defense** against a user extension trying to shadow a built-in's service name. PR #907 explicitly kept it, correctly — this check has real security value for the `/install` path (line 856). But when the caller is `_activate_service` activating a **known built-in whose compose file LEGITIMATELY declares its own built-in service name**, the check misfires: the compose IS declaring `n8n: ...` precisely because the extension is named `n8n`.

Result: every built-in whose `compose.yaml` contains its own service id in the `services:` map gets rejected at this gate *before* reaching the rename at line 1032. The only built-in that reliably slips past is **comfyui**, because its `compose.yaml` is a stub with `services: {}` (the real service definitions live in `compose.{nvidia,amd,multigpu}.yaml` GPU overlays).
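To make the misfire concrete, here is a minimal sketch of the name-collision step in isolation. The `CORE_SERVICE_IDS` subset, the `ExtensionRejected` exception, and the helper names are hypothetical stand-ins, not the real code in `routers/extensions.py`; the compose files are modelled as already-parsed dicts.

```python
# Hypothetical subset of core ids; the real set lives in routers/extensions.py.
CORE_SERVICE_IDS = {"n8n", "comfyui", "token-spy", "llama-server"}


class ExtensionRejected(Exception):
    """Stand-in for the HTTPException(400) raised by the real scanner."""


def scan_service_names(compose: dict) -> None:
    # Mirrors the loop quoted above: reject any service key that is a core id.
    for svc_name in compose.get("services") or {}:
        if svc_name in CORE_SERVICE_IDS:
            raise ExtensionRejected(
                f"Extension rejected: service name '{svc_name}' conflicts with core service"
            )


def passes_gate(compose: dict) -> bool:
    """Return True when the compose content survives the name check."""
    try:
        scan_service_names(compose)
        return True
    except ExtensionRejected:
        return False


# A built-in that declares its own id, and a stub with an empty services map.
n8n_compose = {"services": {"n8n": {"image": "example/n8n"}}}
comfyui_stub = {"services": {}}

print("n8n passes:", passes_gate(n8n_compose))
print("comfyui stub passes:", passes_gate(comfyui_stub))
```

The stub passes only because the loop has nothing to iterate over, not because the check recognises it as a built-in.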

So the full state of built-in activatability post-PR-907+909 is:

| Built-in | internal `services:` map | Activatable via `_activate_service`? |
|---|---|---|
| `comfyui` | `{}` (empty stub) | ✅ reaches `os.rename` — but then hits #331 EROFS |
| `n8n` | `{n8n: ...}` | ❌ blocked at `_scan_compose_content:204` with 400 |
| `tts` | `{tts: ...}` | ❌ blocked |
| `whisper` | `{whisper: ...}` | ❌ blocked |
| `langfuse` | `{langfuse-*: ...}` | ❌ blocked (its sub-services include `langfuse`) |
| `openclaw` | `{openclaw: ...}` | ❌ blocked |
| `searxng` | `{searxng: ...}` | ❌ blocked |
| `perplexica` | `{perplexica: ...}` | ❌ blocked |
| `token-spy` | `{token-spy: ...}` | ❌ blocked (verified empirically below) |
| `qdrant` | `{qdrant: ...}` | ❌ blocked |
| `embeddings` | `{embeddings: ...}` | ❌ blocked |
| `ape` | `{ape: ...}` | ❌ blocked |
| `dreamforge` | `{dreamforge: ...}` | ❌ blocked |
| `litellm` | `{litellm: ...}` | ❌ blocked |
| `privacy-shield` | `{privacy-shield: ...}` | ❌ blocked |
| `opencode` | host-only, no compose | N/A |
| `llama-server`, `open-webui`, `dashboard`, `dashboard-api` | always-on | blocked at `_assert_not_core` (correct) |

**Net: 1 of 16 togglable built-ins is activatable past the second gate — and that 1 then immediately hits #331's `:ro`-mount EROFS.** No built-in can actually be activated via the integration branch's template-apply path without fixing BOTH this gate and #331.

## Reproduction

On macOS Apple Silicon, integration branch (`upstream/main` + 17 open PRs):

```bash
# 1. Put token-spy into disabled state
$ docker stop dream-token-spy
$ mv /Volumes/X/dream-server-test/extensions/services/token-spy/compose.yaml \
     /Volumes/X/dream-server-test/extensions/services/token-spy/compose.yaml.disabled

# 2. Apply a template that references token-spy
$ curl -s -X POST -H "Authorization: Bearer $KEY" \
    http://127.0.0.1:3002/api/templates/llm-platform/apply

{
  "template_id": "llm-platform",
  "results": {
    "litellm": "already_enabled",
    "langfuse": "skipped: Extension rejected: service name 'langfuse' conflicts with core service",
    "token-spy": "skipped: Extension rejected: service name 'token-spy' conflicts with core service",
    ...
  }
}
```

token-spy and langfuse both get skipped at the `_scan_compose_content` gate, **not** at `_assert_not_core` (which they correctly pass now). The skip is silent at the HTTP layer (200 OK with per-service error strings), so a user running a template apply just sees "some built-ins skipped with opaque conflicts-with-core-service errors" and no actionable feedback.
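Because the HTTP status is 200 either way, a caller has to inspect the per-service strings to notice the failure. A minimal sketch of that check follows; it assumes the `skipped:` prefix seen in the response above is the convention for every skipped service, which this one response suggests but does not prove. `skipped_services` is a hypothetical helper name.

```python
import json

# Sample body copied from the reproduction above, with the elided entries left out.
SAMPLE_BODY = """
{
  "template_id": "llm-platform",
  "results": {
    "litellm": "already_enabled",
    "langfuse": "skipped: Extension rejected: service name 'langfuse' conflicts with core service",
    "token-spy": "skipped: Extension rejected: service name 'token-spy' conflicts with core service"
  }
}
"""


def skipped_services(body: str) -> dict:
    """Return {service: status} for every result whose status starts with 'skipped:'."""
    results = json.loads(body).get("results", {})
    return {
        svc: status
        for svc, status in results.items()
        if isinstance(status, str) and status.startswith("skipped:")
    }


skipped = skipped_services(SAMPLE_BODY)
for svc, reason in sorted(skipped.items()):
    print(f"{svc}: {reason}")
```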

## Why existing tests don't catch this

- **`TestActivateServiceBuiltinBranch`** (added by #909) uses `monkeypatch` to set up test compose files in `tmp_path` with generic service names like `fakesvc`. Since `fakesvc` isn't in `CORE_SERVICE_IDS`, it slips past `_scan_compose_content` and the test correctly verifies the `EXTENSIONS_DIR` branch. But the test **doesn't exercise a real built-in name**, so it doesn't hit the second gate.

- **`TestAssertNotCoreAllowsBuiltins`** (added by #907) tests `_assert_not_core` **in isolation** and doesn't call `_activate_service` or run a compose file through `_scan_compose_content`. It verifies the name gets past the *first* gate; it doesn't verify it gets past the *second* gate.

Either test would have caught this if it had driven `_activate_service` against a real built-in's real `compose.yaml.disabled`.
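The fixture-naming gap can be illustrated with a small sketch. The `scan` function below is a hypothetical stand-in for the name-collision step only, and `CORE_SERVICE_IDS` is an assumed subset; the point is that a generic fixture name never reaches the rejecting branch, while a fixture named after a real built-in does.

```python
# Hypothetical subset of core ids; the real set lives in routers/extensions.py.
CORE_SERVICE_IDS = {"n8n", "token-spy", "langfuse"}


def scan(compose: dict) -> None:
    """Stand-in for the name-collision step of _scan_compose_content."""
    for name in compose.get("services") or {}:
        if name in CORE_SERVICE_IDS:
            raise ValueError(f"service name '{name}' conflicts with core service")


def blocked_fixtures(fixtures: dict) -> list:
    """Return the sorted fixture ids whose compose content the scanner rejects."""
    blocked = []
    for ext_id, compose in fixtures.items():
        try:
            scan(compose)
        except ValueError:
            blocked.append(ext_id)
    return sorted(blocked)


# Fixture in the style of the existing tests: a name that is not a core id.
generic = {"fakesvc": {"services": {"fakesvc": {}}}}

# Fixtures named after real built-ins, each declaring its own service id.
realistic = {
    "n8n": {"services": {"n8n": {}}},
    "token-spy": {"services": {"token-spy": {}}},
}

print("generic blocked:", blocked_fixtures(generic))
print("realistic blocked:", blocked_fixtures(realistic))
```

A regression test along these lines would need to call the real `_activate_service` with the real compose files; the sketch only shows why the choice of fixture name decides whether the second gate is exercised.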

## Relationship to adjacent issues

- **#299** — Reports the blocklist-misuse at the `validate_service_id()` level. PR #907 fixes that layer. This issue is downstream of #299 — a second check nobody realized existed.
- **#331** — `:ro` mount + `os.rename` crash. Only reachable via **comfyui** on the integration branch because comfyui is the only built-in that gets past the `_scan_compose_content` gate. Fixing just #331 without this gate still leaves 15 built-ins non-activatable.
- **#333** — `/enable`+`/disable` REST handlers miss `EXTENSIONS_DIR` fallback. Orthogonal — #333 is about the PUBLIC endpoints not finding built-ins in the first place; this issue is about even the INTERNAL template-apply path being unable to activate them.

## Suggested fix

Add a `trusted=True` branch to the `_scan_compose_content` call at line 1023 when the compose being scanned is under `EXTENSIONS_DIR` (built-in trust root), same as the `_install_from_library` call at line 856 already does:

```diff
     # Re-scan compose content (TOCTOU prevention)
-    _scan_compose_content(disabled_compose)
+    # Skip the name-collision check for built-in extensions — their compose
+    # files LEGITIMATELY declare their own service id, which is in CORE_SERVICE_IDS
+    # by definition. The other checks (privileged, label spoofing, unsafe volumes)
+    # still run.
+    is_builtin = ext_dir.resolve().is_relative_to(EXTENSIONS_DIR.resolve())
+    _scan_compose_content(disabled_compose, trusted=is_builtin)
```

Then update `_scan_compose_content(trusted=True)` semantics: skip the `CORE_SERVICE_IDS` collision check, keep the `privileged: true` + label-spoofing + unsafe-mount checks (which are still valid defenses for built-ins).
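A minimal sketch of those semantics, under stated assumptions: `scan_compose`, `ScanError`, and the `CORE_SERVICE_IDS` subset are hypothetical, compose content is modelled as a parsed dict, and only the `privileged: true` check is shown as a representative of the defenses that stay on for built-ins.

```python
# Hypothetical subset of core ids; the real set lives in routers/extensions.py.
CORE_SERVICE_IDS = {"n8n", "token-spy"}


class ScanError(Exception):
    """Stand-in for the HTTPException(400) raised by the real scanner."""


def scan_compose(compose: dict, trusted: bool = False) -> None:
    for name, spec in (compose.get("services") or {}).items():
        # Name-collision defense: applied only to untrusted (user) extensions.
        if not trusted and name in CORE_SERVICE_IDS:
            raise ScanError(f"service name '{name}' conflicts with core service")
        # Remaining defenses run regardless of trust; privileged is one example.
        if (spec or {}).get("privileged") is True:
            raise ScanError(f"service '{name}' requests privileged mode")


def rejected(compose: dict, trusted: bool) -> bool:
    """Return True when scan_compose raises for the given trust level."""
    try:
        scan_compose(compose, trusted=trusted)
        return False
    except ScanError:
        return True


builtin = {"services": {"n8n": {"image": "example/n8n"}}}
privileged_builtin = {"services": {"n8n": {"privileged": True}}}

print("user extension shadowing n8n rejected:", rejected(builtin, trusted=False))
print("built-in n8n rejected:", rejected(builtin, trusted=True))
print("privileged built-in rejected:", rejected(privileged_builtin, trusted=True))
```

Defaulting `trusted` to `False` keeps every existing caller on the strict path unless it opts in explicitly.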

Alternatively: gate the collision check on `ext_dir` being under `USER_EXTENSIONS_DIR` only — built-ins get scanned by the other defenses but not the name check.
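For that alternative, the trust decision reduces to a path-containment test. The sketch below uses temporary directories in place of the real `EXTENSIONS_DIR` and `USER_EXTENSIONS_DIR` (the directory names are made up), and `needs_name_check` is a hypothetical helper. It resolves both paths first so a `..` segment cannot misclassify a directory.

```python
import tempfile
from pathlib import Path


def needs_name_check(ext_dir: Path, user_root: Path) -> bool:
    """Apply the collision check only to directories under the user root."""
    # Path.is_relative_to requires Python 3.9 or newer.
    return ext_dir.resolve().is_relative_to(user_root.resolve())


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    builtin_root = base / "extensions" / "services"  # stand-in for EXTENSIONS_DIR
    user_root = base / "user-extensions"             # stand-in for USER_EXTENSIONS_DIR
    (builtin_root / "n8n").mkdir(parents=True)
    (user_root / "my-ext").mkdir(parents=True)

    builtin_checked = needs_name_check(builtin_root / "n8n", user_root)
    user_checked = needs_name_check(user_root / "my-ext", user_root)
    # Spelled via the user root, but resolves into the built-in tree.
    traversal_checked = needs_name_check(
        user_root / ".." / "extensions" / "services" / "n8n", user_root
    )

print("built-in gets name check:", builtin_checked)
print("user extension gets name check:", user_checked)
print("traversal path gets name check:", traversal_checked)
```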

Either approach preserves the security intent of PR #907's "keep `CORE_SERVICE_IDS`" for name-shadowing defense against user extensions, while unblocking the legitimate template-apply path for built-ins.

## Severity

Medium-high. #907's promised behavior ("16 other built-in extensions are now first-class manageable") doesn't ship at the internal activation layer *or* the public REST layer (see #333). Template apply is the only documented user-facing path, and it's broken for 15 of 16 togglable built-ins. Combined with #331, the only one that slips through the name gate then crashes on EROFS. Net: every template in the catalog that contains any togglable built-in either silently skips it with an opaque error, or crashes with HTTP 500.

Not a regression in the direct sense — on `upstream/main` the first gate (`validate_service_id`) blocked everything, so nothing reached the second gate. PR #907 opened the first gate and exposed that the second gate was always there.

## Environment

- macOS Darwin 25.2.0, Apple Silicon M4, 24 GB unified memory
- Integration branch: `local/integration-test` = `upstream/main` + all 17 open yasinBursali PRs (#893–909)
- Install at `/Volumes/X/dream-server-test`
- Reproduced while cross-verifying #331 on Mac (which only triggered on comfyui because every other built-in gets blocked by this earlier gate).

