Add Lium-backed inference for sampling (opt-in, no sampling logic changes) #165
Conversation
- affine/lium_backend.py: provision Lium pod, host minimal OpenAI-style server, expose via public port or SSH tunnel
- tasks: in AFFINE_USE_LIUM mode, route Evaluator payload base_url to Lium server; relax CHUTES_API_KEY requirement
- miners: bypass Chutes lookups/filters when AFFINE_USE_LIUM=1

No changes to sampling/pre/post logic; only the inference backend is swapped when enabled.
Pull Request Overview
This PR introduces an opt-in Lium GPU backend for model inference while maintaining backward compatibility with the existing Chutes infrastructure. When AFFINE_USE_LIUM=1 is set, the system provisions ephemeral GPU pods on Lium, starts an OpenAI-compatible model server, and routes sampling requests through either public URLs or SSH tunnels.
Key Changes
- Adds affine/lium_backend.py with pod provisioning, model server deployment, and SSH tunnel management
- Modifies affine/tasks.py to conditionally route inference requests to Lium servers and relax API key requirements
- Updates affine/miners.py to bypass Chutes validation checks when operating in Lium mode
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 22 comments.
| File | Description |
|---|---|
| affine/lium_backend.py | New module implementing Lium pod provisioning, inline OpenAI-compatible server creation, SSH tunnel management, and resource cleanup |
| affine/tasks.py | Conditional routing to Lium backend via ensure_model_server() and relaxed environment variable requirements in Lium mode |
| affine/miners.py | Bypass Chutes metadata validation (chute name, revision, affine prefix checks) when AFFINE_USE_LIUM=1 is enabled |
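For orientation, here is a minimal sketch of the opt-in routing the PR describes, using the ModelServer and ensure_model_server names from the diff. The ModelServer field defaults, the stubbed provisioning, and the route_payload helper name are assumptions for illustration, not code from the PR.

```python
import os
from dataclasses import dataclass

@dataclass
class ModelServer:
    # Assumed shape: the diff only shows callers using .base_url and .pod (with .pod.id).
    base_url: str
    pod: object = None

def ensure_model_server(model: str) -> ModelServer:
    # Stand-in for affine.lium_backend.ensure_model_server: the real function provisions
    # a Lium pod and starts an OpenAI-compatible server; a placeholder keeps this runnable.
    return ModelServer(base_url="http://127.0.0.1:8000/v1")

def route_payload(payload: dict, miner) -> dict:
    # Opt-in switch: only AFFINE_USE_LIUM=1 changes where inference requests are sent.
    if os.getenv("AFFINE_USE_LIUM") == "1":
        srv = ensure_model_server(miner.model)
        payload["base_url"] = srv.base_url
    else:
        # Default Chutes path, unchanged (URL pattern taken from the diff below).
        payload["base_url"] = f"https://{miner.slug}.chutes.ai/v1"
    return payload
```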
```python
pod = _LIUM.wait_ready(pod_dict, timeout=900)
if not pod:
    raise RuntimeError("Lium pod failed to become ready")
```

Copilot AI (Nov 13, 2025)
[nitpick] The 900-second (15-minute) timeout for wait_ready is quite long and could cause the application to hang for extended periods if pod provisioning fails. Consider one or more of the following:
- Reducing the timeout to a more reasonable value (e.g., 300 seconds / 5 minutes)
- Adding logging to inform users that pod provisioning is in progress
- Making the timeout configurable via an environment variable
Additionally, the error message "Lium pod failed to become ready" doesn't provide details about why it failed, making debugging difficult.
Suggested change:

```python
timeout = int(os.getenv("AFFINE_LIUM_POD_TIMEOUT", "300"))
print(f"Provisioning Lium pod for model '{model}' (timeout: {timeout}s)...")
pod = _LIUM.wait_ready(pod_dict, timeout=timeout)
if not pod:
    pod_info = pod_dict if pod_dict else {}
    raise RuntimeError(
        f"Lium pod failed to become ready after {timeout}s. "
        f"Pod info: {pod_info}"
    )
```
```python
ssh_cmd = [
    "ssh",
    "-i", key,
    "-N",
    "-o", "StrictHostKeyChecking=no",
```

Copilot AI (Nov 13, 2025)
[nitpick] Security concern: The SSH command uses -o StrictHostKeyChecking=no, which disables host key verification. This makes the connection vulnerable to man-in-the-middle attacks. While this may be acceptable for ephemeral pods with dynamic IPs, consider documenting this security trade-off or providing an option to use a known_hosts file if the Lium infrastructure uses stable host keys.
Suggested change:

```python
# SECURITY NOTE: By default, disables SSH host key verification for ephemeral pods.
# If your infrastructure uses stable host keys, set AFFINE_KNOWN_HOSTS_PATH to a
# known_hosts file to enable verification and mitigate MITM risks.
known_hosts_path = os.environ.get("AFFINE_KNOWN_HOSTS_PATH")
ssh_cmd = [
    "ssh",
    "-i", key,
    "-N",
]
if known_hosts_path:
    ssh_cmd += [
        "-o", f"UserKnownHostsFile={known_hosts_path}",
        "-o", "StrictHostKeyChecking=yes",
    ]
else:
    ssh_cmd += [
        "-o", "StrictHostKeyChecking=no",
    ]
ssh_cmd += [
```
```python
if os.getenv("AFFINE_USE_LIUM") == "1":
    return {}
```

Copilot AI (Nov 13, 2025)
[nitpick] When AFFINE_USE_LIUM=1, this method returns an empty dictionary and skips the CHUTES_API_KEY validation entirely. However, if the Lium backend is being used alongside Chutes (e.g., in a mixed environment), or if the env_vars are needed for other purposes beyond authentication, this could cause issues. Consider documenting this behavior or providing a more targeted exemption that only skips the specific validation that's not needed for Lium mode.
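No concrete suggestion was attached here; a more targeted exemption might look like the sketch below. The env_vars function name, its surrounding logic, and the error wording are assumptions for illustration only, since the rest of the method is not shown in this diff.

```python
import os

def env_vars() -> dict:
    # Hypothetical targeted exemption: still return whatever credentials exist,
    # but only skip the hard CHUTES_API_KEY requirement when Lium mode is active.
    lium_mode = os.getenv("AFFINE_USE_LIUM") == "1"
    chutes_key = os.getenv("CHUTES_API_KEY")
    if not chutes_key and not lium_mode:
        raise RuntimeError("CHUTES_API_KEY must be set unless AFFINE_USE_LIUM=1")
    env = {}
    if chutes_key:
        env["CHUTES_API_KEY"] = chutes_key
    return env
```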
```python
from affine.lium_backend import ensure_model_server
srv = ensure_model_server(miner.model)
payload["base_url"] = srv.base_url
```

Copilot AI (Nov 13, 2025)
[nitpick] The import statement from affine.lium_backend import ensure_model_server is inside the conditional block, which is good for avoiding import errors when Lium is not used. However, if ensure_model_server raises an exception during execution, the error handling context may be unclear. Consider adding a try-except block with a descriptive error message:
```python
if os.getenv("AFFINE_USE_LIUM") == "1":
    try:
        from affine.lium_backend import ensure_model_server
        srv = ensure_model_server(miner.model)
        payload["base_url"] = srv.base_url
    except Exception as e:
        raise RuntimeError(f"Failed to initialize Lium backend: {e}") from e
else:
    payload["base_url"] = f"https://{miner.slug}.chutes.ai/v1"
```
```python
def ensure_model_server(model: str) -> ModelServer:
    """Return a live server for model, provisioning if necessary."""
```

Copilot AI (Nov 13, 2025)
Missing input validation for the model parameter. If model is empty or None, it could lead to accessing the wrong cached server or creating an invalid cache entry. Consider adding validation at the start of the function:
```python
if not model:
    raise ValueError("model parameter cannot be empty")
```
```python
def _gen(prompt: str, max_new: int, temp: float) -> str:
    encoded = tok(prompt, return_tensors="pt").to(device)
    with torch.inference_mode():
        out = model.generate(
            **encoded,
            do_sample=(temp > 0),
            temperature=max(1e-5, temp),
            max_new_tokens=max_new,
            pad_token_id=tok.pad_token_id,
        )
    return tok.decode(out[0], skip_special_tokens=True)
```

Copilot AI (Nov 13, 2025)
The inline server code has a potential issue with the generation logic for decoder-only models. For causal language models, the decode operation will include the input prompt in the output, but the code doesn't strip it. This means the response will contain both the prompt and the completion, which may not match the expected OpenAI API behavior. Consider updating the decoder logic:
```python
def _gen(prompt: str, max_new: int, temp: float) -> str:
    encoded = tok(prompt, return_tensors="pt").to(device)
    input_length = encoded.input_ids.shape[1]
    with torch.inference_mode():
        out = model.generate(
            **encoded,
            do_sample=(temp > 0),
            temperature=max(1e-5, temp),
            max_new_tokens=max_new,
            pad_token_id=tok.pad_token_id,
        )
    # For causal models, skip the input tokens
    decoded = tok.decode(out[0][input_length:], skip_special_tokens=True)
    return decoded
```

```python
def _choose_executor() -> ExecutorInfo:
    """Choose a Lium executor, preferring GPU types from AFFINE_LIUM_GPU."""
    prefs = (os.getenv("AFFINE_LIUM_GPU") or "").split(",")
    prefs = [p.strip().upper() for p in prefs if p.strip()]
    exs = _LIUM.ls()
    if prefs:
        for p in prefs:
            for e in exs:
                if e.gpu_type.upper().startswith(p):
                    return e
    # Prefer docker-in-docker for easier server setups
    for e in exs:
        if getattr(e, "docker_in_docker", False):
            return e
    return exs[0]
```
Copilot AI (Nov 13, 2025)
The executor list could be empty, which would cause an IndexError on line 61 (return exs[0]). Add validation to handle this case:
```python
if not exs:
    raise RuntimeError("No Lium executors available")
# Prefer docker-in-docker for easier server setups
for e in exs:
    if getattr(e, "docker_in_docker", False):
        return e
return exs[0]
```

```python
import atexit
import os
import random
import shlex
```
Copilot AI (Nov 13, 2025)
Import of 'shlex' is not used.
Suggested change: delete the unused import shlex line.
```python
for port, proc in list(_SSH_TUNNELS.items()):
    try:
        proc.terminate()
    except Exception:
```

Copilot AI (Nov 13, 2025)
'except' clause does nothing but pass and there is no explanatory comment.
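No suggestion was attached here; a brief comment analogous to the one proposed for pod deletion below would address it. The wording is a suggestion, not from the PR:

```python
except Exception:
    # Best-effort cleanup: the tunnel process may already have exited.
```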
```python
for model, srv in list(_MODEL_TO_SERVER.items()):
    try:
        _LIUM._request("DELETE", f"/pods/{srv.pod.id}")
    except Exception:
```

Copilot AI (Nov 13, 2025)
'except' clause does nothing but pass and there is no explanatory comment.
Suggested change:

```python
except Exception:
    # Ignore errors during pod deletion; best-effort cleanup.
```
Route sampling inference to ephemeral Lium GPU pods while keeping sampling, pre- and post-processing unchanged.
- New affine/lium_backend.py: provisions a pod, starts a minimal OpenAI-style model server, and exposes it via a public port or an SSH tunnel (a hypothetical server sketch follows this list).
- affine/tasks.py: when AFFINE_USE_LIUM=1, sets base_url to the Lium server and relaxes the CHUTES_API_KEY requirement.
- affine/miners.py: bypasses Chutes lookups/filters in Lium mode only.
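For reference, a minimal sketch of what such an OpenAI-style completions endpoint could look like is shown below, assuming FastAPI/uvicorn and a generate() callable like the _gen() helper quoted in the review. The PR's actual server code is not reproduced here, so the endpoint path, field names, and port are illustrative assumptions.

```python
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

def generate(prompt: str, max_new: int, temp: float) -> str:
    # Placeholder: on the pod this would call the loaded model (cf. _gen above).
    return prompt

class CompletionRequest(BaseModel):
    model: str = ""
    prompt: str = ""
    max_tokens: int = 256
    temperature: float = 0.0

@app.post("/v1/completions")
def completions(req: CompletionRequest) -> dict:
    # OpenAI-style completions response shape.
    text = generate(req.prompt, req.max_tokens, req.temperature)
    return {
        "object": "text_completion",
        "model": req.model,
        "choices": [{"index": 0, "text": text, "finish_reason": "stop"}],
    }

# Run on the pod with, e.g.: uvicorn server:app --host 0.0.0.0 --port 8000
```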
Usage:
export LIUM_API_KEY=...
export AFFINE_USE_LIUM=1
Optional: export AFFINE_LIUM_GPU="H200,A100"
Run existing commands unchanged (e.g., python -m affine.cli runner).
Compatibility: Default behavior unchanged (Chutes path). Lium is strictly opt-in via env var.