[#907/#909 follow-up] _scan_compose_content rejects all built-ins except comfyui — second gate blocks EXTENSIONS_DIR activation even after #299 fix #338

@yasinBursali

Summary

PR Light-Heart-Labs#907 (origin: #299) correctly relaxed the top-level CORE_SERVICE_IDS blocklist down to the 4 ALWAYS_ON_SERVICES, and PR Light-Heart-Labs#909 correctly added the EXTENSIONS_DIR branch to _activate_service. Together they should make built-in extensions activatable via the internal template-apply path. But there's a second, downstream gate in the same code that still rejects 19 of 20 built-ins by name — _scan_compose_content at routers/extensions.py:187.

When _activate_service tries to activate a built-in in its compose.yaml.disabled state (line 1023), it calls _scan_compose_content(disabled_compose). That function walks the compose file's internal services: map and rejects any entry whose key matches CORE_SERVICE_IDS with:

for svc_name in services:
    if svc_name in CORE_SERVICE_IDS:
        raise HTTPException(
            status_code=400,
            detail=f"Extension rejected: service name '{svc_name}' conflicts with core service",
        )

This is a legitimate name-collision defense against a user extension trying to shadow a built-in's service name. PR Light-Heart-Labs#907 explicitly kept it, correctly — this check has real security value for the /install path (line 856). But when the caller is _activate_service activating a known built-in whose compose file LEGITIMATELY declares its own built-in service name, the check misfires: the compose IS declaring n8n: ... precisely because the extension is named n8n.

Result: every built-in whose compose.yaml contains its own service id in the services: map gets rejected at this gate before reaching the rename at line 1032. The only built-in that reliably slips past is comfyui, because its compose.yaml is a stub with services: {} (the real service definitions live in compose.{nvidia,amd,multigpu}.yaml GPU overlays).
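The difference between comfyui and the other built-ins can be reproduced in isolation. The following is a minimal sketch, assuming CORE_SERVICE_IDS contains each built-in's id; the set contents, the `ScanRejected` exception, and the `scan_services` and `passes_scan` helpers are illustrative stand-ins, not the actual code in routers/extensions.py.

```python
# Illustrative subset; the real set lives in routers/extensions.py.
CORE_SERVICE_IDS = {"n8n", "token-spy", "comfyui", "llama-server"}


class ScanRejected(Exception):
    """Stand-in for the HTTPException(status_code=400) raised by the real scan."""


def scan_services(services: dict) -> None:
    """Reject any service key that collides with a core service id."""
    for svc_name in services:
        if svc_name in CORE_SERVICE_IDS:
            raise ScanRejected(
                f"Extension rejected: service name '{svc_name}' conflicts with core service"
            )


def passes_scan(services: dict) -> bool:
    """Return True when the services map survives the collision check."""
    try:
        scan_services(services)
        return True
    except ScanRejected:
        return False


# comfyui's stub compose has an empty services map, so the loop body never runs.
comfyui_passes = passes_scan({})
# n8n's compose declares its own id, which is in the set, so it is rejected.
n8n_passes = passes_scan({"n8n": {"image": "example/n8n"}})
```

The empty map passes only because there is nothing to iterate over, not because comfyui is treated as trusted.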

So the full state of built-in activatability post-PR-907+909 is:

| Built-in | Internal services: map | Activatable via _activate_service? |
|---|---|---|
| comfyui | {} (empty stub) | ✅ reaches os.rename, but then hits #331 EROFS |
| n8n | {n8n: ...} | ❌ blocked at _scan_compose_content:204 with 400 |
| tts | {tts: ...} | ❌ blocked |
| whisper | {whisper: ...} | ❌ blocked |
| langfuse | {langfuse-*: ...} | ❌ blocked (its sub-services include langfuse) |
| openclaw | {openclaw: ...} | ❌ blocked |
| searxng | {searxng: ...} | ❌ blocked |
| perplexica | {perplexica: ...} | ❌ blocked |
| token-spy | {token-spy: ...} | ❌ blocked (verified empirically below) |
| qdrant | {qdrant: ...} | ❌ blocked |
| embeddings | {embeddings: ...} | ❌ blocked |
| ape | {ape: ...} | ❌ blocked |
| dreamforge | {dreamforge: ...} | ❌ blocked |
| litellm | {litellm: ...} | ❌ blocked |
| privacy-shield | {privacy-shield: ...} | ❌ blocked |
| opencode | host-only, no compose | N/A |
| llama-server, open-webui, dashboard, dashboard-api | always-on | blocked at _assert_not_core (correct) |

Net: 1 of 16 toggleable built-ins (comfyui) gets past the second gate, and it then immediately hits #331's :ro-mount EROFS. No built-in can actually be activated via the integration branch's template-apply path without fixing BOTH this gate and #331.

Reproduction

On macOS Apple Silicon, integration branch (upstream/main + 17 open PRs):

# 1. Put token-spy into disabled state
$ docker stop dream-token-spy
$ mv /Volumes/X/dream-server-test/extensions/services/token-spy/compose.yaml \
     /Volumes/X/dream-server-test/extensions/services/token-spy/compose.yaml.disabled

# 2. Apply a template that references token-spy
$ curl -s -X POST -H "Authorization: Bearer $KEY" \
    http://127.0.0.1:3002/api/templates/llm-platform/apply

{
  "template_id": "llm-platform",
  "results": {
    "litellm": "already_enabled",
    "langfuse": "skipped: Extension rejected: service name 'langfuse' conflicts with core service",
    "token-spy": "skipped: Extension rejected: service name 'token-spy' conflicts with core service",
    ...
  }
}

token-spy and langfuse both get skipped at the _scan_compose_content gate, not at _assert_not_core (which they now correctly pass). The failure is silent at the HTTP layer: the response is 200 OK with per-service error strings, so a user applying a template sees only that some built-ins were skipped with an opaque "conflicts with core service" message and gets no actionable feedback.
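Because the endpoint returns 200 OK either way, a caller has to inspect the per-service strings to notice the failure. A minimal client-side sketch, assuming the response shape shown in the reproduction above; `split_apply_results` is a hypothetical helper, not part of the dashboard API.

```python
from typing import Dict, Tuple


def split_apply_results(results: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Partition per-service results into (ok, skipped) using the 'skipped:' prefix."""
    ok: Dict[str, str] = {}
    skipped: Dict[str, str] = {}
    for service, status in results.items():
        if isinstance(status, str) and status.startswith("skipped:"):
            # Keep only the reason text that follows the prefix.
            skipped[service] = status[len("skipped:"):].strip()
        else:
            ok[service] = status
    return ok, skipped


# Two entries copied from the reproduction output.
sample = {
    "litellm": "already_enabled",
    "token-spy": "skipped: Extension rejected: service name 'token-spy' conflicts with core service",
}
ok, skipped = split_apply_results(sample)
```

A script driving template apply could treat a non-empty `skipped` map as a failure even though the HTTP status is 200.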

Why existing tests don't catch this

Either test would have caught this if it had driven _activate_service against a real built-in's real compose.yaml.disabled.

Relationship to adjacent issues

Suggested fix

Add a trusted=True branch to the _scan_compose_content call at line 1023 when the compose being scanned is under EXTENSIONS_DIR (built-in trust root), same as the _install_from_library call at line 856 already does:

     # Re-scan compose content (TOCTOU prevention)
-    _scan_compose_content(disabled_compose)
+    # Skip the name-collision check for built-in extensions — their compose
+    # files LEGITIMATELY declare their own service id, which is in CORE_SERVICE_IDS
+    # by definition. The other checks (privileged, label spoofing, unsafe volumes)
+    # still run.
+    is_builtin = ext_dir.resolve().is_relative_to(EXTENSIONS_DIR.resolve())
+    _scan_compose_content(disabled_compose, trusted=is_builtin)

Then update _scan_compose_content(trusted=True) semantics: skip the CORE_SERVICE_IDS collision check, keep the privileged: true + label-spoofing + unsafe-mount checks (which are still valid defenses for built-ins).
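The intended semantics can be sketched as follows. This is an illustration under stated assumptions, not the real function: it takes an already-parsed services map instead of a compose path, `ScanRejected` stands in for HTTPException, the CORE_SERVICE_IDS subset is illustrative, and only the privileged check is shown for the defenses that keep running (its error message is invented for the example).

```python
CORE_SERVICE_IDS = {"n8n", "token-spy"}  # illustrative subset


class ScanRejected(Exception):
    """Stand-in for HTTPException(status_code=400, ...)."""


def scan_compose_services(services: dict, trusted: bool = False) -> None:
    for svc_name, svc_def in services.items():
        # Name-collision check: applied to untrusted (user) extensions only.
        if not trusted and svc_name in CORE_SERVICE_IDS:
            raise ScanRejected(
                f"Extension rejected: service name '{svc_name}' conflicts with core service"
            )
        # Remaining defenses run regardless of trust; only one is sketched here.
        if isinstance(svc_def, dict) and svc_def.get("privileged") is True:
            raise ScanRejected(
                f"Extension rejected: service '{svc_name}' requests privileged mode"
            )


def is_rejected(services: dict, trusted: bool) -> bool:
    """Return True when the scan raises for this services map."""
    try:
        scan_compose_services(services, trusted=trusted)
        return False
    except ScanRejected:
        return True


builtin_ok = not is_rejected({"n8n": {"image": "example/n8n"}}, trusted=True)
user_shadow_blocked = is_rejected({"n8n": {"image": "example/n8n"}}, trusted=False)
builtin_privileged_blocked = is_rejected({"n8n": {"privileged": True}}, trusted=True)
```

Keeping `trusted` as a keyword argument that defaults to False means any caller that is not explicitly updated retains the stricter behavior.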

Alternatively: gate the collision check on ext_dir being under USER_EXTENSIONS_DIR only — built-ins get scanned by the other defenses but not the name check.
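A rough sketch of that path-based gate, assuming Python 3.9+ for `Path.is_relative_to` (the same call the diff above uses). The `needs_collision_check` helper and the directory paths are hypothetical and only illustrate the containment test.

```python
from pathlib import Path


def needs_collision_check(ext_dir: Path, user_extensions_dir: Path) -> bool:
    """Return True only when ext_dir resolves to a location under the user-extension root."""
    # resolve() normalizes symlinks and '..' segments before comparing.
    return ext_dir.resolve().is_relative_to(user_extensions_dir.resolve())


# Hypothetical layout, for illustration only.
USER_ROOT = Path("/srv/demo/user-extensions")
BUILTIN_ROOT = Path("/srv/demo/extensions/services")

user_ext_checked = needs_collision_check(USER_ROOT / "my-ext", USER_ROOT)
builtin_checked = needs_collision_check(BUILTIN_ROOT / "n8n", USER_ROOT)
# A sibling directory sharing a name prefix is not treated as being under the root.
sibling_checked = needs_collision_check(Path("/srv/demo/user-extensions-other/x"), USER_ROOT)
```

Comparing resolved paths component by component avoids the prefix-matching mistake that a plain string `startswith` comparison would make.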

Either approach preserves the security intent of PR Light-Heart-Labs#907's "keep CORE_SERVICE_IDS" for name-shadowing defense against user extensions, while unblocking the legitimate template-apply path for built-ins.

Severity

Medium-high. Light-Heart-Labs#907's promised behavior ("16 other built-in extensions are now first-class manageable") doesn't ship at the internal activation layer or the public REST layer (see #333). Template apply is the only documented user-facing path, and it's broken for 15 of 16 toggleable built-ins. Combined with #331, the only one that slips through the name gate then crashes on EROFS. Net: every template in the catalog that contains any toggleable built-in either silently skips it with an opaque error, or crashes with HTTP 500.

Not a regression in the direct sense — on upstream/main the first gate (validate_service_id) blocked everything, so nothing reached the second gate. PR Light-Heart-Labs#907 opened the first gate and exposed that the second gate was always there.

Environment
