
Conversation

@surcyf123

Route sampling inference to ephemeral Lium GPU pods while keeping sampling, pre-processing, and post-processing logic unchanged.

- New affine/lium_backend.py: provisions a pod, starts a minimal OpenAI-style model server, and exposes it via a public port or SSH tunnel.
- affine/tasks.py: when AFFINE_USE_LIUM=1, sets base_url to the Lium server and relaxes the CHUTES_API_KEY requirement.
- affine/miners.py: bypasses Chutes lookups/filters in Lium mode only.

Usage:
export LIUM_API_KEY=...
export AFFINE_USE_LIUM=1
Optional: export AFFINE_LIUM_GPU="H200,A100"
Run existing commands unchanged (e.g., python -m affine.cli runner).

Compatibility: default behavior is unchanged (Chutes path); Lium is strictly opt-in via the env var.
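
For illustration, a minimal sketch of the opt-in routing described above (the helper name resolve_base_url is hypothetical; ensure_model_server, srv.base_url, and the Chutes URL pattern are taken from the snippets quoted in the review below, so the exact surrounding code in affine/tasks.py may differ):

```python
import os

def resolve_base_url(model: str, slug: str) -> str:
    """Hypothetical helper: pick the inference endpoint, Lium pod when opted in, Chutes otherwise."""
    if os.getenv("AFFINE_USE_LIUM") == "1":
        # Imported lazily so the Lium dependency is only needed in Lium mode.
        from affine.lium_backend import ensure_model_server
        srv = ensure_model_server(model)  # provision or reuse a pod for this model
        return srv.base_url               # public port or local SSH-tunnel endpoint
    # Default Chutes path, unchanged.
    return f"https://{slug}.chutes.ai/v1"
```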

…ffine/lium_backend.py: provision Lium pod, host minimal OpenAI-style server, expose via public port or SSH tunnel
- tasks: in AFFINE_USE_LIUM mode, route Evaluator payload base_url to Lium server; relax CHUTES_API_KEY requirement
- miners: bypass Chutes lookups/filters when AFFINE_USE_LIUM=1

No changes to sampling/pre/post logic; only inference backend swapped when enabled.
Copilot AI review requested due to automatic review settings November 13, 2025 00:21

Copilot AI left a comment


Pull Request Overview

This PR introduces an opt-in Lium GPU backend for model inference while maintaining backward compatibility with the existing Chutes infrastructure. When AFFINE_USE_LIUM=1 is set, the system provisions ephemeral GPU pods on Lium, starts an OpenAI-compatible model server, and routes sampling requests through either public URLs or SSH tunnels.

Key Changes

  • Adds affine/lium_backend.py with pod provisioning, model server deployment, and SSH tunnel management
  • Modifies affine/tasks.py to conditionally route inference requests to Lium servers and relax API key requirements
  • Updates affine/miners.py to bypass Chutes validation checks when operating in Lium mode

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 22 comments.

  • affine/lium_backend.py: New module implementing Lium pod provisioning, inline OpenAI-compatible server creation, SSH tunnel management, and resource cleanup
  • affine/tasks.py: Conditional routing to the Lium backend via ensure_model_server() and relaxed environment variable requirements in Lium mode
  • affine/miners.py: Bypass of Chutes metadata validation (chute name, revision, affine prefix checks) when AFFINE_USE_LIUM=1 is enabled
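
As a rough sketch of the bookkeeping the new module appears to keep (assembled from identifiers visible in the reviewed snippets, namely ModelServer, _MODEL_TO_SERVER, _SSH_TUNNELS, and atexit; the field names and cleanup details are assumptions, not the literal implementation):

```python
import atexit
import subprocess
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class ModelServer:
    pod: Any          # Lium pod handle returned by the SDK
    base_url: str     # OpenAI-style endpoint (public port or local SSH tunnel)

_MODEL_TO_SERVER: Dict[str, ModelServer] = {}   # one live server per model, reused across requests
_SSH_TUNNELS: Dict[int, subprocess.Popen] = {}  # local port -> `ssh -N` tunnel process

@atexit.register
def _cleanup() -> None:
    # Best-effort teardown on interpreter exit: close tunnels; the real module also
    # deletes pods via the Lium API (see the DELETE /pods/{id} snippet reviewed below).
    for proc in _SSH_TUNNELS.values():
        try:
            proc.terminate()
        except Exception:
            pass  # tunnel may already be gone
```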


Comment on lines +199 to +202
pod = _LIUM.wait_ready(pod_dict, timeout=900)
if not pod:
    raise RuntimeError("Lium pod failed to become ready")


Copilot AI Nov 13, 2025


[nitpick] The 900-second (15-minute) timeout for wait_ready is quite long and could cause the application to hang for extended periods if pod provisioning fails. Consider one or more of the following:

  1. Reducing the timeout to a more reasonable value (e.g., 300 seconds / 5 minutes)
  2. Adding logging to inform users that pod provisioning is in progress
  3. Making the timeout configurable via an environment variable

Additionally, the error message "Lium pod failed to become ready" doesn't provide details about why it failed, making debugging difficult.

Suggested change
- pod = _LIUM.wait_ready(pod_dict, timeout=900)
- if not pod:
-     raise RuntimeError("Lium pod failed to become ready")
+ timeout = int(os.getenv("AFFINE_LIUM_POD_TIMEOUT", "300"))
+ print(f"Provisioning Lium pod for model '{model}' (timeout: {timeout}s)...")
+ pod = _LIUM.wait_ready(pod_dict, timeout=timeout)
+ if not pod:
+     pod_info = pod_dict if pod_dict else {}
+     raise RuntimeError(
+         f"Lium pod failed to become ready after {timeout}s. "
+         f"Pod info: {pod_info}"
+     )

Comment on lines +230 to +234
ssh_cmd = [
    "ssh",
    "-i", key,
    "-N",
    "-o", "StrictHostKeyChecking=no",

Copilot AI Nov 13, 2025


[nitpick] Security concern: The SSH command uses -o StrictHostKeyChecking=no, which disables host key verification. This makes the connection vulnerable to man-in-the-middle attacks. While this may be acceptable for ephemeral pods with dynamic IPs, consider documenting this security trade-off or providing an option to use a known_hosts file if the Lium infrastructure uses stable host keys.

Suggested change
- ssh_cmd = [
-     "ssh",
-     "-i", key,
-     "-N",
-     "-o", "StrictHostKeyChecking=no",
+ # SECURITY NOTE: By default, disables SSH host key verification for ephemeral pods.
+ # If your infrastructure uses stable host keys, set AFFINE_KNOWN_HOSTS_PATH to a known_hosts file
+ # to enable verification and mitigate MITM risks.
+ known_hosts_path = os.environ.get("AFFINE_KNOWN_HOSTS_PATH")
+ ssh_cmd = [
+     "ssh",
+     "-i", key,
+     "-N",
+ ]
+ if known_hosts_path:
+     ssh_cmd += [
+         "-o", f"UserKnownHostsFile={known_hosts_path}",
+         "-o", "StrictHostKeyChecking=yes",
+     ]
+ else:
+     ssh_cmd += [
+         "-o", "StrictHostKeyChecking=no",
+     ]
+ ssh_cmd += [

Comment on lines +134 to +135
if os.getenv("AFFINE_USE_LIUM") == "1":
    return {}

Copilot AI Nov 13, 2025


[nitpick] When AFFINE_USE_LIUM=1, this method returns an empty dictionary and skips the CHUTES_API_KEY validation entirely. However, if the Lium backend is being used alongside Chutes (e.g., in a mixed environment), or if the env_vars are needed for other purposes beyond authentication, this could cause issues. Consider documenting this behavior or providing a more targeted exemption that only skips the specific validation that's not needed for Lium mode.
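
One possible shape for such a targeted exemption (a sketch only; the function name required_env_vars and its surrounding context are assumptions, not code from this PR):

```python
import os

def required_env_vars() -> dict:
    """Collect env vars for a run, enforcing CHUTES_API_KEY only when Chutes
    is actually the inference backend (i.e., AFFINE_USE_LIUM is not set)."""
    lium_mode = os.getenv("AFFINE_USE_LIUM") == "1"
    out = {}
    chutes_key = os.getenv("CHUTES_API_KEY")
    if chutes_key:
        # Still pass the key through if present, even in Lium mode, so mixed
        # environments keep working instead of returning an empty dict.
        out["CHUTES_API_KEY"] = chutes_key
    elif not lium_mode:
        raise RuntimeError("CHUTES_API_KEY is required unless AFFINE_USE_LIUM=1")
    return out
```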

Comment on lines +65 to +67
from affine.lium_backend import ensure_model_server
srv = ensure_model_server(miner.model)
payload["base_url"] = srv.base_url

Copilot AI Nov 13, 2025


[nitpick] The import statement from affine.lium_backend import ensure_model_server is inside the conditional block, which is good for avoiding import errors when Lium is not used. However, if ensure_model_server raises an exception during execution, the error handling context may be unclear. Consider adding a try-except block with a descriptive error message:

if os.getenv("AFFINE_USE_LIUM") == "1":
    try:
        from affine.lium_backend import ensure_model_server
        srv = ensure_model_server(miner.model)
        payload["base_url"] = srv.base_url
    except Exception as e:
        raise RuntimeError(f"Failed to initialize Lium backend: {e}") from e
else:
    payload["base_url"] = f"https://{miner.slug}.chutes.ai/v1"
Suggested change
- from affine.lium_backend import ensure_model_server
- srv = ensure_model_server(miner.model)
- payload["base_url"] = srv.base_url
+ try:
+     from affine.lium_backend import ensure_model_server
+     srv = ensure_model_server(miner.model)
+     payload["base_url"] = srv.base_url
+ except Exception as e:
+     raise RuntimeError(f"Failed to initialize Lium backend: {e}") from e



def ensure_model_server(model: str) -> ModelServer:
    """Return a live server for model, provisioning if necessary."""

Copilot AI Nov 13, 2025


Missing input validation for the model parameter. If model is empty or None, it could lead to accessing the wrong cached server or creating an invalid cache entry. Consider adding validation at the start of the function:

if not model:
    raise ValueError("model parameter cannot be empty")
Suggested change
-     """Return a live server for model, provisioning if necessary."""
+     """Return a live server for model, provisioning if necessary."""
+     if not model:
+         raise ValueError("model parameter cannot be empty")

Comment on lines +107 to +117
def _gen(prompt: str, max_new: int, temp: float) -> str:
    encoded = tok(prompt, return_tensors="pt").to(device)
    with torch.inference_mode():
        out = model.generate(
            **encoded,
            do_sample=(temp > 0),
            temperature=max(1e-5, temp),
            max_new_tokens=max_new,
            pad_token_id=tok.pad_token_id,
        )
    return tok.decode(out[0], skip_special_tokens=True)

Copilot AI Nov 13, 2025


The inline server code has a potential issue with the generation logic for decoder-only models. For causal language models, the decode operation will include the input prompt in the output, but the code doesn't strip it. This means the response will contain both the prompt and the completion, which may not match the expected OpenAI API behavior. Consider updating the decoder logic:

def _gen(prompt: str, max_new: int, temp: float) -> str:
    encoded = tok(prompt, return_tensors="pt").to(device)
    input_length = encoded.input_ids.shape[1]
    with torch.inference_mode():
        out = model.generate(
            **encoded,
            do_sample=(temp > 0),
            temperature=max(1e-5, temp),
            max_new_tokens=max_new,
            pad_token_id=tok.pad_token_id,
        )
    # For causal models, skip the input tokens
    decoded = tok.decode(out[0][input_length:], skip_special_tokens=True)
    return decoded

Comment on lines +47 to +61
def _choose_executor() -> ExecutorInfo:
    """Choose a Lium executor, preferring GPU types from AFFINE_LIUM_GPU."""
    prefs = (os.getenv("AFFINE_LIUM_GPU") or "").split(",")
    prefs = [p.strip().upper() for p in prefs if p.strip()]
    exs = _LIUM.ls()
    if prefs:
        for p in prefs:
            for e in exs:
                if e.gpu_type.upper().startswith(p):
                    return e
    # Prefer docker-in-docker for easier server setups
    for e in exs:
        if getattr(e, "docker_in_docker", False):
            return e
    return exs[0]

Copilot AI Nov 13, 2025


The executor list could be empty, which would cause an IndexError on line 61 (return exs[0]). Add validation to handle this case:

if not exs:
    raise RuntimeError("No Lium executors available")
# Prefer docker-in-docker for easier server setups
for e in exs:
    if getattr(e, "docker_in_docker", False):
        return e
return exs[0]

import atexit
import os
import random
import shlex

Copilot AI Nov 13, 2025


Import of 'shlex' is not used.

Suggested change
- import shlex

for port, proc in list(_SSH_TUNNELS.items()):
    try:
        proc.terminate()
    except Exception:

Copilot AI Nov 13, 2025


'except' clause does nothing but pass and there is no explanatory comment.
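
A parallel fix to the one suggested for the pod-deletion loop below could look like this (the wording of the explanatory comment is an assumption about intent):

```python
for port, proc in list(_SSH_TUNNELS.items()):
    try:
        proc.terminate()
    except Exception:
        # Tunnel process may have already exited; cleanup is best-effort.
        pass
```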

for model, srv in list(_MODEL_TO_SERVER.items()):
    try:
        _LIUM._request("DELETE", f"/pods/{srv.pod.id}")
    except Exception:

Copilot AI Nov 13, 2025


'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
-     except Exception:
+     except Exception:
+         # Ignore errors during pod deletion; best-effort cleanup.
