Feat/k3s autoscaler dynamic offering #296
Walkthrough

Dynamic GPU offering selection, warm-pool sizing, and offering caching added to the autoscaler; secure-cloud/rental types and client updated; SSH provisioning switched to system SSH; NodePool/ScalingPolicy controllers and CRDs extended; CLI/Python deployment templates for vLLM/SGLang added; two capacity errors converted to non-fatal warnings.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Scaling as ScalingPolicyController
    participant Offering as OfferingMatcher (OfferingSelector)
    participant Secure as SecureCloudApi / SecureCloudClient
    participant K8s as K8sClient
    participant NodePool as NodePool CR
    Scaling->>K8s: collect_metrics(ns) & list pending GPU pods
    Scaling->>Offering: find_offering_with_fallback(requirements, constraints)
    Offering->>Secure: list_offerings / get_offering (uses cache/TTL)
    Secure-->>Offering: offerings / offering
    Offering-->>Scaling: OfferingResult (Found / UsedFallback / NotFound)
    alt Found or UsedFallback
        Scaling->>K8s: create_node_pool_from_template(offering_id, is_warm_pool)
        K8s-->>NodePool: create NodePool CR
        NodePool->>K8s: persist status.offering_id & labels
        NodePool->>Secure: start_rental / register node with gpu_specs
        Secure-->>NodePool: rental + registration (wireguard info)
        NodePool->>K8s: label node, create events (create_node_pool_event)
    else NotFound
        Scaling->>K8s: emit_no_offering_events(pending pods)
    end
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120+ minutes
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
scripts/autoscaler/rollout.sh (1)
7-52: Help text and example still reference old default namespace

Runtime default is now `basilica-autoscaler` (Line 7), but:
- Line 41: help says default is `basilica-system`.
- Line 51: example uses `--namespace basilica-system`.

Recommend updating those strings to `basilica-autoscaler` to avoid confusion.
♻️ Duplicate comments (1)
crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1)
230-377: Duplicate code with vllm.rs.

As noted in the vllm.rs review, these helper functions (`parse_env_vars`, `create_with_retry`, `is_quota_exceeded`, `extract_quota_message`, `WaitResult`, `wait_for_ready`) are duplicated. Please refer to the refactoring suggestion in vllm.rs.
🧹 Nitpick comments (23)
orchestrator/k8s/services/autoscaler/secrets-template.yaml (1)
1-40: Clarify secrets workflow and avoid secret-scanner noise on SSH key placeholder

- The template structure and placeholders look good and align with the deployment/env vars.
- The SSH key block (Lines 38–40) intentionally mimics a real private key and will be flagged by tools like gitleaks even though it's a placeholder.

Two small suggestions:
- Consider adjusting the placeholder so it doesn't use a real-looking OpenSSH header/footer (or add a repo-level allow rule) to keep CI scanners quieter.
- Since the file is named `secrets-template.yaml`, you might explicitly state that users should copy it to `secrets.yaml` before filling values, to align with the comment on Line 4 and `.gitignore` for `secrets.yaml`.

crates/basilica-autoscaler/src/lib.rs (1)
9-21: Public re-exports for offering matcher look coherent

Exposing `offering_matcher` and re-exporting `GpuRequirements`, `OfferingMatcher*`, `OfferingSelector`, etc., from the crate root is a clean way to make the dynamic-offering functionality available to controllers/runtime without deep module imports. Just be aware this solidifies these type names as part of the public API surface going forward.

crates/basilica-autoscaler/src/api/secure_cloud.rs (1)
263-265: `get_offering` implementation is correct; consider future optimization

The new trait method and implementation are straightforward and reuse `list_offerings`, which keeps circuit-breaker behavior consistent. For now this is fine.

If the offerings list grows large or `get_offering` is used in hot paths, you may eventually want a more direct API call or a cached view in the offering matcher layer to avoid fetching the full list on each lookup.

Also applies to: 431-434
crates/basilica-autoscaler/src/crd/node_pool.rs (1)
386-388: New status fields align with dynamic offering and labeling

- `offering_id: Option<String>` with `#[serde(default, skip_serializing_if = "Option::is_none")]` is backward-compatible and gives you a clear link back to the Basilica offering for dynamic mode.
- `labels_applied: bool` with `#[serde(default)]` is a simple, robust flag to gate GPU label application and avoid races.

Schema-wise this looks good. The main follow-through is on the controller side: ensuring `labels_applied` is reset appropriately if node labels or the selected offering change.

Also applies to: 454-457
crates/basilica-api/src/api/routes/deployments/handlers.rs (1)
74-82: Intentional non-blocking capacity validation looks good.

The change from error to warning allows deployments to proceed when capacity is insufficient, enabling the autoscaler to provision additional nodes dynamically. The structured log fields (`namespace`, `reason`) provide good observability.

Consider adding a metric here to track how often deployments proceed with insufficient capacity; this would help correlate with autoscaler provisioning events.
orchestrator/k8s/services/autoscaler/scaling-policy.yaml (1)
72-80: K3s server URL is hardcoded.

The server URL `https://connect.basilica.ai:6443` is hardcoded. For production use, consider documenting that this needs to be customized, or add a comment indicating this is a placeholder that must be updated per deployment.

crates/basilica-autoscaler/src/controllers/k8s_client.rs (1)
474-474: Potential event name collision when nanoseconds unavailable.

Using `timestamp_nanos_opt().unwrap_or(0)` could create duplicate event names if multiple events are created when nanoseconds are unavailable (returns 0). Consider adding a random suffix or using a UUID fragment:

```diff
-let event_name = format!("{}.{}", pod_name, now.timestamp_nanos_opt().unwrap_or(0));
+let event_name = format!(
+    "{}.{:x}",
+    pod_name,
+    now.timestamp_nanos_opt().unwrap_or_else(|| rand::random::<i64>().unsigned_abs() as i64)
+);
```

Alternatively, use the simpler pattern with millis, which is always available:

```rust
let event_name = format!("{}.{}", pod_name, now.timestamp_millis());
```

crates/basilica-autoscaler/src/metrics.rs (1)
191-199: High cardinality risk with `offering_id` label.

The `offering_selection_total` metric includes `offering_id` as a label. If there are many unique offering IDs (which is likely with dynamic offerings from multiple providers), this could cause metrics cardinality explosion and impact Prometheus performance.

Consider:
- Using a separate counter without `offering_id` for total selections
- Adding `offering_id` only to a lower-frequency metric or log
- Limiting the label to a normalized/bucketed value (e.g., GPU type only)

```diff
 let offering_selection_total = IntCounterVec::new(
     Opts::new(
         "autoscaler_offering_selection_total",
         "Total number of GPU offering selections",
     ),
-    &["policy", "offering_id", "gpu_type"],
+    &["policy", "gpu_type"], // Remove offering_id to avoid cardinality issues
 )
 .unwrap();
```

orchestrator/k8s/services/autoscaler/crds.yaml (1)
676-679: Clarify nullable vs optional behavior for `offeringId`.

The `offeringId` field is marked as `nullable: true` but lacks a `default` value. When using dynamic selection, this field will be populated by the autoscaler. Consider adding a description clarifying that when null/empty, dynamic selection is triggered.

The current description mentions dynamic selection but doesn't explicitly state what happens when `offeringId` is omitted vs explicitly set.

crates/basilica-autoscaler/src/offering_matcher.rs (1)
295-315: Document the prefix matching behavior in user-facing docs.

The forward prefix matching (requested "A100" matches offering "A100SXM4") is useful but may surprise users. Consider adding documentation or an info log when prefix matching is used, so users understand why their pod was scheduled to a specific GPU variant.
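As a rough illustration of that behavior (the function name and test are hypothetical, not the actual offering_matcher API), prefix matching on the normalized model strings reduces to:

```rust
/// Hypothetical sketch: an offering satisfies the request if its normalized
/// model string starts with the normalized requested model.
fn prefix_matches(requested_model: &str, offering_model: &str) -> bool {
    offering_model.starts_with(requested_model)
}

#[test]
fn requested_a100_matches_sxm4_variant() {
    // "A100" matches the "A100SXM4" offering, which is the case that may surprise users.
    assert!(prefix_matches("A100", "A100SXM4"));
    // The reverse is not true: a more specific request does not match a broader model.
    assert!(!prefix_matches("A100SXM4", "A100"));
}
```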
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (1)
561-602: Consider rate-limiting fallback offering lookups.

When `find_offering_with_fallback` repeatedly fails to find a matching offering and falls back, it calls `self.api.get_offering(fallback_id)` on each reconciliation. If the fallback is invalid or unavailable, this generates repeated API calls.

Consider caching the fallback validation result or using the offering cache for fallback lookups.

crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)
crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)
620-624: Annotations wiring looks good; consider test coverage for GPU annotation presence

Conditionally setting `metadata.annotations` to `None` when the map is empty avoids spurious changes for non-GPU workloads, and the behavior is backward-compatible with previous templates that had no annotations.

Given the autoscaler will likely depend on `autoscaler.basilica.ai/gpu-requirements`, consider adding a dedicated test that:
- Uses a `UserDeploymentSpec` with `GpuSpec`.
- Asserts that the pod template metadata includes the `gpu-request.basilica.ai/*` labels and the `autoscaler.basilica.ai/gpu-requirements` annotation.

crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (2)
159-189: Consider extracting shared name sanitization logic.

The model name sanitization logic (lines 161-177) is duplicated in `sglang.rs`. Consider extracting this into a shared helper function in the parent module or `model_size.rs` to reduce duplication.

```rust
// In a shared module (e.g., templates/mod.rs or a new helpers.rs):
pub fn sanitize_model_name(model: &str, max_len: usize) -> String {
    let model_part = model
        .split('/')
        .next_back()
        .unwrap_or(model)
        .to_lowercase()
        .chars()
        .filter_map(|c| {
            if c.is_ascii_alphanumeric() {
                Some(c)
            } else if c == '-' || c == '_' || c == '.' {
                Some('-')
            } else {
                None
            }
        })
        .collect::<String>();
    let sanitized = model_part.trim_matches('-');
    if sanitized.len() > max_len {
        &sanitized[..max_len]
    } else {
        sanitized
    }
    .to_string()
}
```

20-21: Consider pinning the vLLM image to a specific version for better reproducibility.

Using the `:latest` tag can lead to reproducibility issues when the image is updated upstream. While vLLM's official documentation recommends the `:latest` tag, pinning to a specific version (e.g., `vllm/vllm-openai:v0.12.0`) provides more consistent deployments. Alternatively, make this configurable to allow users to specify their preferred version.

crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1)
73-110: Consider making model family patterns extensible.

The hardcoded model family patterns work for known models but may become outdated as new models are released. Consider adding a note or TODO about future extensibility, or accepting this as a best-effort fallback since the regex-based extraction is the primary method.

crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1)
crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1)
20-21: Consider pinning the SGLang Docker image to a specific version.

Using the `:latest` tag compromises reproducibility across deployments. Pin to a specific version (e.g., `v0.4.6-cu125` or the latest stable release available on Docker Hub) or make this configurable via environment variable or command-line option.

crates/basilica-cli/src/cli/commands.rs (2)
808-846: Consider adding basic validation for vLLM numeric / enum-like options

A few options could benefit from clap-level validation for better UX:
- `gpu_memory_utilization: Option<f32>` is documented as `0.0-1.0` but not range-checked.
- `dtype` and `quantization` are free-form `String`s even though they are effectively enums (e.g., `auto`, `float16`, `bfloat16`, etc.).

Recommended (non-blocking) improvements:
- Add a range parser for utilization:

```diff
-    #[arg(long)]
-    pub gpu_memory_utilization: Option<f32>,
+    #[arg(long, value_parser = clap::value_parser!(f32).range(0.0..=1.0))]
+    pub gpu_memory_utilization: Option<f32>,
```

- Optionally later, replace `dtype`/`quantization` with `ValueEnum` types to surface valid values in `--help` and get automatic validation.

848-870: Mirror validation for SGLang `mem_fraction_static` and consider enums for `quantization`

Similar to vLLM:
- `mem_fraction_static: Option<f32>` is documented as `0.0-1.0` but not range-validated.
- `quantization: Option<String>` is conceptually an enum.

Non-blocking suggestion:

```diff
-    #[arg(long)]
-    pub mem_fraction_static: Option<f32>,
+    #[arg(long, value_parser = clap::value_parser!(f32).range(0.0..=1.0))]
+    pub mem_fraction_static: Option<f32>,
```

And optionally later, migrate `quantization` to a `ValueEnum` if you want tighter validation and better `--help`.

examples/20_deploy_sglang_template.py (1)
23-28: Fix redundant f-strings and optionally harden cleanup in the example

- Lines 23 and 41 use f-strings with no interpolations; you can drop the `f` prefix to satisfy Ruff and keep things simple:

```diff
-print(f"SGLang deployment ready!")
+print("SGLang deployment ready!")
 ...
-print(f"Qwen 3B deployment ready!")
+print("Qwen 3B deployment ready!")
```

- Optional: for a slightly more robust example, you could wrap deployment/printing in a `try` and move the `delete()` calls into a `finally` block so deployments are cleaned up even if something fails mid-script.

Also applies to: 41-48
crates/basilica-sdk-python/python/basilica/templates/model_size.py (1)
81-97: Align estimate_memory_from_params docstring with implementation

The function always rounds up to the next multiple of 8GB, not to the nearest 8GB. To avoid confusion for callers, consider tightening the wording:

```diff
     Returns:
-        Estimated VRAM in GB, rounded to nearest 8GB
+        Estimated VRAM in GB, rounded up to the nearest 8GB
```

The conservative "round up" behavior itself seems appropriate for sizing GPUs.
examples/19_deploy_vllm_template.py (1)
23-26: Clean up redundant f-strings and consider making cleanup more robust

- Lines 23 and 41 use f-strings without placeholders; they can be plain strings:

```diff
-print(f"vLLM deployment ready!")
+print("vLLM deployment ready!")
 ...
-print(f"Llama-2-7B deployment ready!")
+print("Llama-2-7B deployment ready!")
```

- Optional: like the SGLang example, you might wrap the deployments in a `try` and move the `delete()` calls into a `finally` so resources are cleaned up if something goes wrong between deploy and teardown.

Also applies to: 41-48
crates/basilica-sdk-python/python/basilica/__init__.py (2)
446-506: Factor shared GPU-sizing logic and avoid Optional[int] in GpuRequirementsSpec calls

The vLLM and SGLang helpers share nearly identical blocks for:
- Estimating GPU requirements,
- Choosing an effective GPU count and model list, and
- Constructing `GpuRequirementsSpec`/`ResourceRequirements`.

Two small refactors would improve maintainability and type ergonomics:
- Use a non-optional local for the effective GPU count (helps type-checkers and makes intent explicit):

```diff
-        # Use user-specified GPU count or auto-detected
-        if gpu_count is None:
-            gpu_count = reqs.gpu_count
-
-        # Use user-specified GPU models or recommended GPU
-        effective_gpu_models = gpu_models if gpu_models else [reqs.recommended_gpu]
+        # Use user-specified GPU count or auto-detected
+        effective_gpu_count = gpu_count if gpu_count is not None else reqs.gpu_count
+
+        # Use user-specified GPU models or recommended GPU
+        effective_gpu_models = gpu_models if gpu_models else [reqs.recommended_gpu]
 ...
-        gpu_spec = GpuRequirementsSpec(
-            count=gpu_count,
+        gpu_spec = GpuRequirementsSpec(
+            count=effective_gpu_count,
             model=effective_gpu_models,
             min_cuda_version=None,
             min_gpu_memory_gb=None,
         )
```

…and mirror the same pattern in `deploy_sglang()`.

- (Optional) Consider extracting a tiny internal helper for the repeated "estimate requirements → pick effective_gpu_count/effective_gpu_models → build gpu_spec/resources" sequence to keep the two methods in sync as you iterate on the heuristics.

Also applies to: 559-572, 600-651, 698-711

517-586: Consider making image and storage configuration overridable rather than hard-coded

Both deploy helpers hard-code:
- Images: `"vllm/vllm-openai:latest"` and `"lmsysorg/sglang:latest"`.
- Storage backend details, including `backend=StorageBackend.R2`, `bucket=""`, and `credentials_secret="basilica-r2-credentials"`.

For long-term operability and self-hosted scenarios, it may be worth:
- Exposing `image` (and maybe `port`) as optional parameters with sensible defaults, so users can pin specific tags or swap images without forking the SDK.
- Making bucket/credentials either:
  - Derived from environment/config, or
  - Optional parameters (with these as defaults) so non-Basilica-R2 environments can still reuse the helpers.

This keeps the "one-liner" ergonomics while avoiding environment-specific assumptions being baked into the SDK.

Also applies to: 664-725
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (32)
- `crates/basilica-api/src/api/routes/deployments/handlers.rs` (2 hunks)
- `crates/basilica-autoscaler/src/api/secure_cloud.rs` (2 hunks)
- `crates/basilica-autoscaler/src/controllers/k8s_client.rs` (3 hunks)
- `crates/basilica-autoscaler/src/controllers/node_pool_controller.rs` (11 hunks)
- `crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs` (13 hunks)
- `crates/basilica-autoscaler/src/crd/node_pool.rs` (2 hunks)
- `crates/basilica-autoscaler/src/crd/scaling_policy.rs` (3 hunks)
- `crates/basilica-autoscaler/src/error.rs` (3 hunks)
- `crates/basilica-autoscaler/src/lib.rs` (2 hunks)
- `crates/basilica-autoscaler/src/metrics.rs` (5 hunks)
- `crates/basilica-autoscaler/src/offering_matcher.rs` (1 hunks)
- `crates/basilica-autoscaler/src/runtime.rs` (6 hunks)
- `crates/basilica-cli/Cargo.toml` (1 hunks)
- `crates/basilica-cli/src/cli/commands.rs` (1 hunks)
- `crates/basilica-cli/src/cli/handlers/deploy/mod.rs` (2 hunks)
- `crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs` (1 hunks)
- `crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs` (1 hunks)
- `crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs` (1 hunks)
- `crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs` (1 hunks)
- `crates/basilica-operator/src/controllers/user_deployment_controller.rs` (2 hunks)
- `crates/basilica-sdk-python/python/basilica/__init__.py` (2 hunks)
- `crates/basilica-sdk-python/python/basilica/templates/__init__.py` (1 hunks)
- `crates/basilica-sdk-python/python/basilica/templates/model_size.py` (1 hunks)
- `examples/19_deploy_vllm_template.py` (1 hunks)
- `examples/20_deploy_sglang_template.py` (1 hunks)
- `orchestrator/k8s/services/autoscaler/.gitignore` (1 hunks)
- `orchestrator/k8s/services/autoscaler/crds.yaml` (4 hunks)
- `orchestrator/k8s/services/autoscaler/deployment.yaml` (1 hunks)
- `orchestrator/k8s/services/autoscaler/namespaces.yaml` (1 hunks)
- `orchestrator/k8s/services/autoscaler/scaling-policy.yaml` (1 hunks)
- `orchestrator/k8s/services/autoscaler/secrets-template.yaml` (1 hunks)
- `scripts/autoscaler/rollout.sh` (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (11)
crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (6)
  - crates/basilica-cli/src/progress/mod.rs (2): complete_spinner_and_clear (137-139), create_spinner (98-109)
  - crates/basilica-sdk-python/python/basilica/__init__.py (1): GpuRequirementsSpec (94-112)
  - crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1): estimate_gpu_requirements (32-50)
  - crates/basilica-cli/src/output/mod.rs (1): json_output (12-17)
  - crates/basilica-sdk-python/python/basilica/exceptions.py (1): DeploymentFailed (237-266)
  - crates/basilica-sdk/src/client.rs (1): timeout (1151-1154)
crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (2)
  - crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1): model (161-176)
  - crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1): model (162-177)
crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs (3)
  - crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1): estimate_gpu_requirements (32-50)
  - crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1): handle_sglang_deploy (36-157)
  - crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1): handle_vllm_deploy (36-157)
crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (3)
  - crates/basilica-cli/src/progress/mod.rs (2): complete_spinner_and_clear (137-139), create_spinner (98-109)
  - crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1): estimate_gpu_requirements (32-50)
  - crates/basilica-cli/src/output/mod.rs (1): json_output (12-17)
crates/basilica-autoscaler/src/metrics.rs (1)
  - crates/basilica-billing/src/domain/rentals.rs (1): offering_id (223-225)
crates/basilica-autoscaler/src/runtime.rs (2)
  - crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (3): new (59-74), clone (41-50), with_offering_selector (84-100)
  - crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (2): new (58-70), clone (42-49)
examples/19_deploy_vllm_template.py (3)
  - crates/basilica-cli/src/github_releases.rs (1): basilica (21-26)
  - crates/basilica-sdk-python/python/basilica/deployment.py (1): url (183-191)
  - crates/basilica-cli/src/cli/handlers/deploy/validation.rs (1): memory (401-402)
examples/20_deploy_sglang_template.py (2)
  - crates/basilica-sdk-python/python/basilica/__init__.py (1): deploy_sglang (600-737)
  - crates/basilica-sdk-python/python/basilica/deployment.py (1): url (183-191)
crates/basilica-sdk-python/python/basilica/__init__.py (3)
  - crates/basilica-sdk-python/python/basilica/deployment.py (3): name (178-180), Deployment (114-453), _from_response (436-453)
  - crates/basilica-sdk-python/python/basilica/templates/model_size.py (1): estimate_gpu_requirements (30-60)
  - crates/basilica-sdk-python/python/basilica/decorators.py (2): deployment (35-37), deployment (129-208)
crates/basilica-sdk-python/python/basilica/templates/__init__.py (1)
  - crates/basilica-sdk-python/python/basilica/templates/model_size.py (2): GpuRequirements (22-27), estimate_gpu_requirements (30-60)
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (1)
  - crates/basilica-autoscaler/src/offering_matcher.rs (1): calculate_nodes_needed (638-658), group_pending_pods_by_requirements (620-635), has_gpu_node_affinity (499-542), from_template (107-120)
🪛 Gitleaks (8.30.0)
orchestrator/k8s/services/autoscaler/secrets-template.yaml
[high] 38-40: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
🪛 Ruff (0.14.8)
examples/19_deploy_vllm_template.py
23-23: f-string without any placeholders
Remove extraneous f prefix
(F541)
41-41: f-string without any placeholders
Remove extraneous f prefix
(F541)
examples/20_deploy_sglang_template.py
23-23: f-string without any placeholders
Remove extraneous f prefix
(F541)
41-41: f-string without any placeholders
Remove extraneous f prefix
(F541)
crates/basilica-sdk-python/python/basilica/__init__.py
520-520: Possible binding to all interfaces
(S104)
551-551: Possible hardcoded password assigned to argument: "credentials_secret"
(S106)
667-667: Possible binding to all interfaces
(S104)
690-690: Possible hardcoded password assigned to argument: "credentials_secret"
(S106)
🔇 Additional comments (64)
crates/basilica-cli/Cargo.toml (2)
66-67: The dependency addition is well-organized.

The placement after `ssh-key` and before internal dependencies follows a logical grouping convention. The comment clearly indicates the purpose, and using a workspace dependency ensures consistent versioning across the monorepo.

66-67: The regex dependency addition is correct and actively used in the codebase.

The `regex` workspace dependency is properly imported and used in `crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs` for extracting parameter counts from model names (matching patterns like "7b", "70b", "0.5b", etc.). The dependency is initialized efficiently using `LazyLock<Regex>`. The addition is appropriate.

orchestrator/k8s/services/autoscaler/.gitignore (1)

1-1: Good call ignoring secrets.yaml

Ignoring the concrete `secrets.yaml` file aligns with using `secrets-template.yaml` and reduces risk of committing real credentials.

orchestrator/k8s/services/autoscaler/namespaces.yaml (1)

1-16: Namespaces and labeling look consistent

The `basilica-autoscaler` and `basilica-ssh-keys` namespaces, plus labels (including `restricted: "true"` for SSH keys), line up with the deployment and secrets usage and give you room for policy enforcement.

orchestrator/k8s/services/autoscaler/deployment.yaml (2)

1-212: Deployment, Service, and config wiring are consistent with new namespace and secrets

- Deployment, Service, and ConfigMap all target `basilica-autoscaler` and share `app: basilica-autoscaler` labels, matching the new namespaces and the `basilica-autoscaler-credentials` Secret.
- Env var `BASILICA_API_KEY` correctly references `basilica-autoscaler-credentials/api-key`, which matches the secrets template.

183-212: ServiceMonitor and PDB configuration look sound

- ServiceMonitor:
  - Selects `app: basilica-autoscaler` and restricts to the `basilica-autoscaler` namespace.
  - Scrapes the `metrics` port on `/metrics` with reasonable interval/timeout settings.
- PodDisruptionBudget:
  - `minAvailable: 1` with a single replica effectively forbids voluntary disruptions, which is a reasonable choice for a leader-elected singleton if you prefer availability over graceful drain upgrades.

No issues from a manifest perspective; just confirm that the strict PDB behavior matches your upgrade/maintenance expectations.
crates/basilica-api/src/api/routes/deployments/handlers.rs (1)
359-368: Consistent non-blocking behavior for scale operations.

The warning pattern matches `validate_deployment_prerequisites` and provides relevant context (`current_replicas`, `new_replicas`, `reason`) for debugging. This aligns well with the autoscaler-driven capacity provisioning model.

orchestrator/k8s/services/autoscaler/scaling-policy.yaml (2)

84-91: Pre-setting `basilica.ai/validated: "true"` may bypass validation.

Setting `basilica.ai/validated: "true"` as a default label on provisioned nodes could skip any node validation logic that depends on this label being set only after successful validation. Ensure this is intentional and that the provisioning flow actually validates nodes before they become schedulable.

62-69: SSH key secrets reference looks correct, but verify namespace access.

The SSH key secret is referenced from a different namespace (`basilica-ssh-keys`). Ensure that the autoscaler's ServiceAccount has RBAC permissions to read secrets from this namespace, or the provisioning will fail at runtime.

crates/basilica-autoscaler/src/crd/scaling_policy.rs (3)

49-52: New offering_constraints field looks good.

The optional `offering_constraints` field with proper serde attributes allows dynamic GPU offering selection while maintaining backward compatibility with existing policies that don't specify constraints.

176-195: Dynamic selection fields are well-structured.

The new fields for dynamic offering selection (`gpu_type`, `min_gpu_count`, `max_hourly_rate`, `preferred_provider`, `region`) provide good flexibility. The defaults and skip_serializing patterns are consistent with the rest of the codebase.

170-175: CRD schema already reflects `offering_id` as optional: no breaking change for ScalingPolicy manifests.

The CRD definition in `orchestrator/k8s/services/autoscaler/crds.yaml` already marks `offeringId` as optional (`nullable: true`) and excludes it from the required fields for `ScalingPolicy.spec.nodeTemplate.secureCloud`. The `#[serde(default)]` attribute on the Rust struct ensures backward compatibility for existing deserialization. No action required on CRD schema updates or existing manifests; they can safely omit the field.

crates/basilica-autoscaler/src/controllers/k8s_client.rs (3)
75-98: New event creation trait methods provide good observability.

The trait methods for creating Pod and NodePool events enable visibility into autoscaler decisions. The signatures are clean and the parameters are sufficient for meaningful event messages.

493-495: Consider using `first_timestamp` and `last_timestamp` for event aggregation.

Setting `first_timestamp: None` and `last_timestamp: None` while using `event_time` is valid for newer-style events, but some older K8s tooling may expect the legacy timestamp fields. The current approach is correct for modern clusters.

509-557: NodePool event creation mirrors Pod event pattern correctly.

The implementation is consistent with `create_pod_event`. The same event name collision concern applies here (line 523).

crates/basilica-autoscaler/src/error.rs (3)

91-96: Enhanced `NoMatchingOffering` error provides useful context.

The struct variant with `gpu_count`, `models`, and `min_memory_gb` fields produces actionable error messages that help operators understand why no offering matched. The error message format is clear.

123-128: New `ApiTimeout` variant is well-structured.

The variant captures the operation name and timeout duration, providing context for debugging timeout issues. This aligns with the broader timeout handling patterns in the autoscaler.

147-159: Retryability classification is appropriate.

Adding `NoMatchingOffering` and `ApiTimeout` to `is_retryable` is correct:
- Offerings may become available as capacity changes
- API timeouts are typically transient

Removing `NoMatchingOffering` from `is_permanent` is consistent with the retryability change.

crates/basilica-autoscaler/src/metrics.rs (3)

44-55: New offering-related metrics provide good observability.

The metrics cover the key aspects of offering selection: successful selections, failures, duration, and cache behavior. This enables correlation between offering selection events and scaling decisions.

210-218: Histogram buckets are well-suited for cache/lookup operations.

The sub-second buckets (0.001s to 1.0s) are appropriate for offering selection, which should be fast when cached. If API calls are expected, you might want to extend the upper buckets (e.g., add 2.0, 5.0).

412-439: Recording methods follow established patterns.

The new methods (`record_offering_selection`, `record_no_matching_offering`, etc.) are consistent with existing metric recording methods in the file. The method signatures provide appropriate context for the metrics.

orchestrator/k8s/services/autoscaler/crds.yaml (3)
380-383: LGTM - `labelsApplied` status field.

The `labelsApplied` boolean with default `false` correctly tracks GPU label application state to avoid race conditions between node readiness and label application.

417-420: LGTM - `offeringId` status field.

Tracking the offering ID in status enables correlation between the provisioned node and the selected GPU offering for debugging and auditing.

715-740: LGTM - `offeringConstraints` structure.

The constraints object provides appropriate filtering capabilities:
- `providers` with empty default (any provider)
- `regions` as nullable array
- `maxHourlyRate` with proper `double` format
- `fallbackOfferingId` for graceful degradation

The field structure aligns with the `OfferingConstraints` Rust struct in `offering_matcher.rs`.

crates/basilica-autoscaler/src/runtime.rs (4)

55-60: LGTM - OfferingMatcher initialization with metrics.

The offering matcher is correctly instantiated with:
- Shared API client via `Arc::clone`
- Default configuration (60s cache TTL, 30s API timeout)
- Metrics integration for cache hit/miss observability

109-125: LGTM - Controller wiring with offering selector.

Both controllers are correctly wired:
- `NodePoolController::with_offering_selector` receives the offering matcher for cache invalidation on rental failures
- `ScalingPolicyController::new` receives it for dynamic offering selection during scale-up

The `Arc::clone` pattern ensures efficient shared ownership.

133-156: LGTM - Context structs with generic parameter.

The generic parameter `S` is correctly added to both context structs:
- `ControllerContext<K, A, P, S>` for NodePool reconciliation
- `ScalingContext<K, A, S>` with the `S: OfferingSelector` bound for ScalingPolicy

This enables type-safe offering selection throughout the reconciliation lifecycle.

183-218: LGTM - NodePool reconcile functions with `MaybeOfferingSelector` bound.

Using `MaybeOfferingSelector` for the NodePool controller is the correct design choice: it allows the controller to optionally invalidate offerings on failure without requiring an offering selector to always be present.

crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (9)
77-101: LGTM - `with_offering_selector` constructor.

The separate constructor for controllers with an offering selector is a clean design pattern. It maintains backward compatibility with the existing `new()` constructor while enabling the new offering-aware behavior.

103-120: LGTM - `invalidate_offering_on_failure` implementation.

The `MaybeOfferingSelector` bound correctly enables optional offering invalidation. The warning log provides visibility into cache invalidation events for debugging failed rentals.

355-399: Good defensive pattern - persisting GPU metadata immediately.

Fetching offering details and persisting to status before starting the rental is a sound crash-recovery pattern. If the process crashes after fetching but before persisting, the next reconciliation will re-fetch.

The event emission for unavailable offerings provides user visibility.

406-415: LGTM - Offering invalidation on rental failure.

Invalidating the offering from the cache after a rental failure prevents the autoscaler from repeatedly attempting to provision with the same failing offering. The offering will reappear after the cache TTL expires, allowing transient issues to resolve.

646-696: Proactive GPU label application prevents scheduling race conditions.

Applying GPU labels immediately when the node is detected (before Ready) is the correct approach. This prevents pods from failing to schedule during the brief window between node readiness and label application.

The retry logic with warning logs and status persistence handles transient label application failures gracefully.

758-767: LGTM - Label re-application during Ready phase.

Re-checking and re-applying labels during health checks handles recovery from the Unhealthy state and ensures labels remain consistent even if they were inadvertently removed.

1194-1226: LGTM - `build_gpu_labels` helper.

The helper correctly:
- Normalizes GPU model names for consistent matching
- Prefers `status.offering_id` over `spec.secure_cloud.offering_id`
- Returns an empty map when no GPU info is available (safe fallback)

1228-1241: LGTM - `labels_need_reapplication` helper.

Simple and correct logic: returns `true` if any expected label is missing or has a different value. Handles the `None` labels case appropriately.
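For reference, the check described here reduces to something like the following sketch (signature and names are illustrative, not the exact helper in node_pool_controller.rs):

```rust
use std::collections::BTreeMap;

/// Sketch: reapplication is needed if any expected label is missing from the
/// node or carries a different value; a node with no labels always needs it.
fn labels_need_reapplication(
    expected: &BTreeMap<String, String>,
    node_labels: Option<&BTreeMap<String, String>>,
) -> bool {
    expected
        .iter()
        .any(|(key, value)| node_labels.and_then(|labels| labels.get(key)) != Some(value))
}
```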
1317-1431: LGTM - Comprehensive test coverage for label helpers.

The tests cover key scenarios:
- All labels present → no reapplication needed
- Missing labels → reapplication needed
- Mismatched values → reapplication needed
- Empty expected set → no reapplication needed
- Node with no labels → reapplication needed
Good edge case coverage.
crates/basilica-autoscaler/src/offering_matcher.rs (9)
38-47: LGTM - GPU model normalization.

The normalization function correctly:
- Converts to uppercase for case-insensitive matching
- Removes non-alphanumeric characters (spaces, hyphens, underscores)
- Produces consistent output for various input formats
This enables flexible user input while ensuring reliable matching.
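A minimal sketch of that normalization, assuming only the uppercase + strip-non-alphanumeric rules described above (not the exact implementation in offering_matcher.rs):

```rust
/// Uppercase the model name and drop anything that is not ASCII alphanumeric,
/// so "a100-sxm4 80gb", "A100 SXM4 80GB", and "A100_SXM4_80GB" all compare equal.
fn normalize_gpu_model(model: &str) -> String {
    model
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .map(|c| c.to_ascii_uppercase())
        .collect()
}
```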
49-72: LGTM - `GpuRequirements` with normalization.

Using `BTreeSet` for `gpu_models` ensures:
- Order-independent equality (important for hash-based grouping)
- Deterministic iteration order
- No duplicate models
Normalizing models in the constructor ensures all comparisons use the normalized form.
122-139: LGTM - `merge_with` for constraint composition.

The merge logic correctly implements "self takes precedence when set":
- Empty providers list falls back to other's providers
- `Option` fields use `or_else` for proper fallback
- This enables template-level defaults with policy-level overrides
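A condensed sketch of that precedence rule (the field set is illustrative, not the full OfferingConstraints struct):

```rust
#[derive(Clone, Default)]
struct Constraints {
    providers: Vec<String>,
    max_hourly_rate: Option<f64>,
    region: Option<String>,
}

impl Constraints {
    /// Policy-level constraints (`self`) win when set; otherwise fall back to
    /// the template-level constraints (`other`).
    fn merge_with(&self, other: &Constraints) -> Constraints {
        Constraints {
            providers: if self.providers.is_empty() {
                other.providers.clone()
            } else {
                self.providers.clone()
            },
            max_hourly_rate: self.max_hourly_rate.or(other.max_hourly_rate),
            region: self.region.clone().or_else(|| other.region.clone()),
        }
    }
}
```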
257-293: LGTM - Double-checked locking for cache refresh.

The implementation correctly:
- Uses read lock for fast-path cache hit
- Acquires write lock and re-checks before fetching (prevents thundering herd)
- Applies timeout to API call to prevent blocking indefinitely
- Records cache hits/misses for observability
The `Arc<Vec<GpuOffering>>` pattern enables efficient sharing of the cached offerings.
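The refresh pattern described here boils down to roughly the following simplified sketch; `Offering` and `fetch_offerings` stand in for the real types and API call, and the timeout/metrics handling is omitted:

```rust
use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::RwLock;

struct Offering;

async fn fetch_offerings() -> Vec<Offering> {
    Vec::new() // placeholder for the secure-cloud API call
}

struct OfferingCache {
    inner: RwLock<Option<(Instant, Arc<Vec<Offering>>)>>,
    ttl: Duration,
}

impl OfferingCache {
    async fn get(&self) -> Arc<Vec<Offering>> {
        // Fast path: shared read lock, return the cached Arc if still fresh.
        if let Some((refreshed_at, data)) = &*self.inner.read().await {
            if refreshed_at.elapsed() < self.ttl {
                return Arc::clone(data);
            }
        }
        // Slow path: take the write lock, then re-check so only one task
        // actually refreshes when many callers see a stale cache at once.
        let mut guard = self.inner.write().await;
        if let Some((refreshed_at, data)) = &*guard {
            if refreshed_at.elapsed() < self.ttl {
                return Arc::clone(data);
            }
        }
        let fresh = Arc::new(fetch_offerings().await);
        *guard = Some((Instant::now(), Arc::clone(&fresh)));
        fresh
    }
}
```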
380-395: LGTM - Copy-on-Write pattern for cache invalidation.

The `Arc::try_unwrap` with fallback to clone is an efficient CoW pattern:
- If no other references exist, reuses the allocation
- If shared, clones only when necessary
- Retains the filtered list in a new Arc
This avoids unnecessary allocations when the cache is exclusively owned.
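In isolation, that copy-on-write step looks roughly like this (simplified; the real code filters `GpuOffering`s rather than plain IDs):

```rust
use std::sync::Arc;

/// Remove one entry from an Arc-wrapped list: reuse the allocation when this
/// is the only reference, otherwise clone before filtering.
fn invalidate(cache: Arc<Vec<String>>, failed_id: &str) -> Arc<Vec<String>> {
    let mut offerings = Arc::try_unwrap(cache).unwrap_or_else(|shared| (*shared).clone());
    offerings.retain(|id| id.as_str() != failed_id);
    Arc::new(offerings)
}
```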
412-419: LGTM - Non-blocking staleness check.

Using `try_read()` and returning `false` when the lock is held prevents the thundering herd problem. When another task holds the write lock (likely refreshing), callers proceed with potentially stale data rather than all blocking.
422-440: LGTM - Offering comparison logic.

The comparison priorities are well-designed:
- Exact GPU count match preferred (avoid over-provisioning)
- Lower hourly rate (cost optimization)
- Fewer GPUs (minimize waste when no exact match)
- Deterministic by ID (stable selection across reconciliations)
Using `total_cmp` for f64 handles NaN gracefully.
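A stripped-down version of that ordering (types and field names here are illustrative, not the crate's actual comparator) can be expressed as a single chained comparison:

```rust
use std::cmp::Ordering;

struct Offering {
    id: String,
    gpu_count: u32,
    hourly_rate: f64,
}

/// Prefer exact GPU-count matches, then cheaper offerings, then fewer GPUs,
/// and finally order by id so the selection is deterministic.
fn compare_offerings(a: &Offering, b: &Offering, requested_gpus: u32) -> Ordering {
    let exact_a = a.gpu_count == requested_gpus;
    let exact_b = b.gpu_count == requested_gpus;
    exact_b
        .cmp(&exact_a) // exact matches sort first
        .then(a.hourly_rate.total_cmp(&b.hourly_rate))
        .then(a.gpu_count.cmp(&b.gpu_count))
        .then_with(|| a.id.cmp(&b.id))
}
```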
637-658: Verify division behavior when offering.gpu_count equals pod_gpu_request.

The calculation `pods_per_node = offering.gpu_count / pod_gpu_request` is correct, but consider adding a test case for when they're equal (e.g., 4-GPU offering, 4-GPU request → 1 pod per node).

Current tests cover 8/2=4 and 4/1=4 but not the edge case where they match exactly.
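For completeness, the math in question (with made-up names, so a sketch rather than the actual helper) and the exact-match edge case:

```rust
/// Nodes of a given offering needed to place `pending_pods`, where each pod
/// asks for `pod_gpu_request` GPUs and each node exposes `offering_gpu_count`.
fn nodes_needed(pending_pods: u32, pod_gpu_request: u32, offering_gpu_count: u32) -> u32 {
    let pods_per_node = (offering_gpu_count / pod_gpu_request).max(1);
    pending_pods.div_ceil(pods_per_node)
}

#[test]
fn exact_match_gives_one_pod_per_node() {
    assert_eq!(nodes_needed(3, 4, 4), 3); // 4-GPU offering, 4-GPU pods: one pod per node
    assert_eq!(nodes_needed(4, 2, 8), 1); // existing 8/2 case: four pods fit on one node
}
```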
660-1136: LGTM - Comprehensive test suite.

The tests provide excellent coverage:
- Normalization edge cases
- Hash equality for requirements
- Offering comparison priorities
- Node calculation scenarios
- Constraint filtering
- Cache behavior (hit, miss, invalidation, refresh)
- Mock API implementations
The mock implementations are minimal but sufficient for testing.
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (7)
94-126: LGTM - Provisioning count guard prevents stuck counter.

This is a good defensive measure. The `pending_scale_up` counter could become stuck if NodePools are deleted externally or if status updates fail. Clamping it to the actual count of provisioning NodePools ensures accurate state representation.

The warning log provides visibility into when this correction occurs.

165-224: LGTM - Rich error handling for NoMatchingOffering.

The error handling correctly:
- Records the `OfferingAvailability` condition with detailed diagnostics
- Persists status before returning the error (ensures the condition is visible)
- Clears the condition on success to prevent stale warnings
- Preserves the original error for the caller
432-490: LGTM - Static vs dynamic offering selection.

The branching logic is clear:
- Static mode: uses the explicit `offering_id` from the template and bypasses dynamic selection
- Dynamic mode: groups pods by requirements and selects offerings per group

Returning early for static mode avoids unnecessary pod grouping.
648-653: Timestamp format includes milliseconds; verify uniqueness guarantee.

The pool name uses `%Y%m%d%H%M%S%3f` (millisecond precision) with an index `i`. If `scale_up_with_offering` is called multiple times within the same millisecond (unlikely but possible under high load), the index differentiates them.

This should be sufficient for uniqueness, but consider using a UUID suffix if collision-resistance becomes a concern.
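For reference, a name built that way looks like the following illustrative helper (not the controller's actual function):

```rust
use chrono::Utc;

/// e.g. "gpu-pool-20240115093042123-0": millisecond-precision timestamp plus
/// the group index `i`, which disambiguates pools created in the same call.
fn pool_name(prefix: &str, i: usize) -> String {
    format!("{}-{}-{}", prefix, Utc::now().format("%Y%m%d%H%M%S%3f"), i)
}
```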
711-759: Good UX - warning events for missing nodeAffinity.

Emitting events to pods that request specific GPU models without nodeAffinity helps users understand potential scheduling issues. The message is clear and actionable.
The per-pod lookup to find the raw Pod is O(n²) but acceptable given typical pending pod counts are small.
806-830: LGTM - Constraint merging logic.

The effective constraints correctly merge template and policy fields:
- Policy constraints take precedence via `merge_with`
- Template fields are used only when policy constraints are absent
- Returns `None` when no constraints are specified at either level
886-891: LGTM - `OfferingResult` enum.

Clean design for representing the three possible outcomes of offering lookup:
- `Found`: normal path, offering selected
- `UsedFallback`: fallback was used, nodes already created
- `NotFound`: no matching offering, caller handles the error

crates/basilica-cli/src/cli/handlers/deploy/mod.rs (2)

12-12: LGTM!

The new `templates` module is correctly declared as `pub` to allow external access to the deployment template handlers.

41-50: LGTM!

The new `Vllm` and `Sglang` deploy actions are cleanly integrated following the existing pattern. The destructuring and delegation to template handlers is consistent with other action handlers in this match block.

crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs (1)

1-13: LGTM!

Clean module structure with appropriate documentation and re-exports. The public API surface is well-organized, exposing the key items (`GpuRequirements`, `estimate_gpu_requirements`, and the deploy handlers) for downstream use.

crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1)

36-157: LGTM!

The main deployment handler is well-structured with clear separation of concerns: GPU requirement estimation, resource building, deployment creation with retry, and status waiting. The flow handles both attached and detached modes appropriately.
crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (2)
194-228: LGTM!

The SGLang command building correctly uses `python -m sglang.launch_server` with appropriate arguments for model path, host, port, and optional parameters like tensor parallelism, context length, quantization, and memory fraction.

36-157: LGTM!

The SGLang deployment handler follows the same well-structured pattern as vLLM, with appropriate framework-specific customizations for the default model, port, and command structure.
crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (3)
23-26: LGTM!

Good use of `LazyLock` for lazy initialization of the compiled regex. The pattern `(\d+\.?\d*)b` correctly captures parameter counts like "7b", "70b", "0.5b", "1.5b" after the input is lowercased.
62-71: LGTM!

The memory estimation formula (2GB per billion parameters × 1.2 overhead, rounded up to 8GB) is a reasonable heuristic for FP16 models with KV cache overhead.
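Stated as code, that heuristic is roughly the following (a sketch, not the exact model_size.rs function):

```rust
/// ~2 GB per billion parameters for FP16 weights, ×1.2 for KV cache and
/// runtime overhead, rounded up to the next multiple of 8 GB.
fn estimate_memory_gb(params_billion: f64) -> u64 {
    let raw_gb = params_billion * 2.0 * 1.2;
    ((raw_gb / 8.0).ceil() * 8.0) as u64
}

#[test]
fn common_sizes() {
    assert_eq!(estimate_memory_gb(7.0), 24); // 7B model: 16.8 GB raw -> 24 GB
    assert_eq!(estimate_memory_gb(70.0), 168);
}
```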
131-187: LGTM!

Comprehensive unit tests covering parameter extraction, memory estimation, GPU count calculation, and GPU recommendations. The tests verify edge cases like fractional parameters (0.5B), capped GPU counts, and various model sizes.
crates/basilica-cli/src/cli/commands.rs (1)
736-762: New `vllm`/`sglang` deploy actions are wired cleanly

The new subcommands and their flattened option groups are consistent with existing clap patterns in this file and give a clear, dedicated surface for templated inference deployments. No issues from this diff alone.
crates/basilica-sdk-python/python/basilica/templates/__init__.py (1)
1-14: Template API re-exports look good

The module docstring and `__all__` surface (`GpuRequirements`, `estimate_gpu_requirements`) are consistent with `model_size.py` and give a clear, minimal public API for templates. No changes needed.
…lity
* Extract namespace definitions to separate namespaces.yaml file
* Add secrets-template.yaml with placeholder credentials documentation
* Add .gitignore to prevent accidental secrets.yaml commits
* Remove ServiceMonitor from deployment.yaml (moved elsewhere)
* Update rollout.sh to use basilica-autoscaler namespace

* Add `basilica deploy vllm` command for vLLM inference server deployment
* Add `basilica deploy sglang` command for SGLang inference server deployment
* Implement model_size module for auto-detecting GPU requirements from model names
* Add TemplateCommonOptions for shared deployment parameters (name, gpu, memory, ttl)
* Add VllmOptions for vLLM-specific settings (tensor_parallel_size, dtype, quantization)
* Add SglangOptions for SGLang-specific settings (context_length, mem_fraction_static)
* Support automatic storage configuration for HuggingFace model caching
* Add regex dependency for parameter extraction from model names

* Add BasilicaClient.deploy_vllm() for one-line vLLM deployments
* Add BasilicaClient.deploy_sglang() for one-line SGLang deployments
* Implement templates/model_size.py for GPU requirement estimation
* Auto-detect GPU count and recommended GPU based on model size
* Support all vLLM options (tensor_parallel_size, dtype, quantization, etc.)
* Support all SGLang options (context_length, mem_fraction_static, etc.)
* Auto-configure persistent storage for HuggingFace model caching
* Auto-generate deployment names from model identifiers

* Add 19_deploy_vllm_template.py demonstrating deploy_vllm() method
* Add 20_deploy_sglang_template.py demonstrating deploy_sglang() method
* Show minimal configuration using defaults
* Show custom model deployment with advanced options
* Include cleanup examples

* Extend AutoscalerError with NoMatchingOffering struct variant for detailed error reporting
* Add ApiTimeout error variant for timeout handling in API operations
* Add get_offering method to SecureCloudApi trait for single offering lookup
* Implement K8s event creation methods for Pod and NodePool visibility
* Add offering selection metrics: selection counts, cache hits/misses, duration histograms
* Add no_matching_offering counter for tracking failed selections

* Implement OfferingMatcher with caching and TTL-based refresh (60s default)
* Add OfferingSelector trait for testability and dependency injection
* Support GPU model matching with normalization and prefix matching
* Add OfferingConstraints for provider/region/rate filtering with fallback support
* Implement pod grouping by GPU requirements for efficient batch scaling
* Add node label constants (basilica.ai/gpu-model, gpu-count, gpu-memory-gb)
* Extract GPU requirements from pod annotations and labels
* Support has_gpu_node_affinity detection for scheduling validation
* Include comprehensive test suite with mock API implementations

* Add offering_constraints field for dynamic offering selection
* Make SecureCloudTemplate.offering_id optional to enable dynamic mode
* Update field descriptions to clarify dynamic selection behavior
* Support provider, region, max_hourly_rate, and fallback_offering_id constraints

…Policy controller
* Add OfferingSelector dependency to controller for dynamic offering lookup
* Implement pod grouping by GPU requirements for batch provisioning
* Support static offering_id mode (bypass dynamic selection) for backwards compatibility
* Add fallback_offering_id support when no dynamic match found
* Emit K8s events to pods when no matching offering is available
* Add OfferingAvailability condition to ScalingPolicy status for diagnostics
* Implement pending_scale_up counter correction to prevent stuck state
* Add millisecond precision to NodePool names for collision avoidance
* Emit MissingNodeAffinity warnings for pods lacking GPU node affinity
* Build effective constraints by merging policy-level and template-level settings

* Add offering_id field to track which GPU offering provisioned the node
* Add labels_applied boolean to track GPU label application state
* Prevents race conditions between node Ready and label application

…ache invalidation
* Add optional OfferingSelector for cache invalidation on rental failures
* Fetch and persist GPU metadata (model, count, memory) from offering API
* Apply GPU labels to nodes before Ready to prevent scheduling race conditions
* Implement labels_need_reapplication check for recovery scenarios
* Build GPU labels using node_labels constants for consistency
* Add OfferingUnavailable event emission when offering not found
* Include GpuInfo in node registration API call
* Fix phase timeout to skip u64::MAX (infinite timeout for Ready/Failed phases)
* Re-apply GPU labels during health checks for crash recovery
* Add comprehensive unit tests for label reapplication logic

* Initialize OfferingMatcher with metrics integration on startup
* Wire offering matcher to NodePool controller for cache invalidation
* Wire offering matcher to ScalingPolicy controller for dynamic selection
* Update controller context structs to support generic offering selector type
* Add MaybeOfferingSelector bounds to reconcile functions for flexibility

…tion
* Update ScalingPolicy CRD with offeringConstraints schema
* Make SecureCloudTemplate.offeringId nullable for dynamic mode
* Add labelsApplied and offeringId fields to NodePool status
* Add ServiceMonitor for Prometheus metrics scraping
* Add default ScalingPolicy manifest with dynamic offering example
* Document offering_constraints fields with usage examples

* Add gpu-request.basilica.ai/count label to pods with GPU requirements
* Add gpu-request.basilica.ai/model label for GPU model preferences
* Add autoscaler.basilica.ai/gpu-requirements annotation with full JSON spec
* Enables autoscaler to extract GPU requirements from pending pods

…ning
* Change capacity validation from error to warning for insufficient capacity
* Allow deployments to proceed with pending pods for autoscaler to detect
* Allow scale operations to proceed even when cluster lacks immediate capacity
* Autoscaler will provision nodes when it detects pending GPU pods

* Format model_size.rs test comment on separate line
* Format sglang.rs generate_sglang_name return expression
* Format sglang.rs print_sglang_success curl example
* Format vllm.rs gpu-memory-utilization argument extension
…dpoint
* Add GpuPricesResponse wrapper struct for paginated API response
* Rename GpuOffering fields to match API (gpu_memory_gb_per_gpu, hourly_rate_per_gpu)
* Add deserialize_string_or_f64 helper for flexible numeric parsing
* Update API endpoint from /v1/secure-cloud/offerings to /secure-cloud/gpu-prices
* Update RentalInfo struct to match API response (deployment_id, ip_address fields)
* Change stop_rental from DELETE to POST /rentals/{id}/stop
* Add convenience accessor methods (gpu_memory_gb, hourly_rate, available)
* Add tests for deserialization with string rates and field aliases
…tcher tests
* Add allow_model_fallback field to OfferingConstraints for flexible GPU selection
* Update accessor methods to use GpuOffering's new convenience methods
* Simplify region matching logic (region is now non-optional)
* Add test_offering helper function to reduce test boilerplate
* Refactor all tests to use test_offering helper for cleaner assertions

* Update NodePoolController to use GpuOffering accessor methods
* Update RentalInfo field mappings (deployment_id, ip_address)
* Add model fallback logic in ScalingPolicyController when allow_model_fallback enabled
* Log warnings when falling back to alternative GPU model

…ntals
* Add health_check_interval_secs to AggregatorConfig (default 60s)
* Update rental_health_check_interval to use configured value
* Rename constant to DEFAULT_SECURE_CLOUD_HEALTH_CHECK_INTERVAL_SECS
* Add ID field to registered SSH key display for easier reference
* Add allowModelFallback field to ScalingPolicy CRD offeringConstraints
* Extract ServiceMonitor to separate servicemonitor.yaml file
* Update scaling-policy.yaml with new scaleUp/scaleDown structure
* Add metrics collection configuration to ScalingPolicy
* Add scaling-policy.template.yaml for reference configuration
* Rename secrets-template.yaml to secrets.template.yaml for consistency
* Add scaling-policy.yaml to .gitignore (now generated from template)
* Update lifecycle config with deleteGracePeriodSeconds and forceDrain
b9ce972 to 0753e19
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (1)
84-127: Address inconsistent `pending_scale_up` handling between error cases

The comment on lines 161–164 states "pending_scale_up remains incremented" when scale_up fails, but this is inconsistent:
- `try_increment_pending_scale_up` atomically increments the counter in the CRD.
- On `scale_up()` success: you increment `status.pending_scale_up` locally (line 171), preserving the change.
- On `AutoscalerError::NoMatchingOffering`: you do not increment `status.pending_scale_up`, then call `update_scaling_policy_status` (line 215) with the unmodified status object, which patches the CRD and resets the counter back to its pre-increment value.
- On other errors: you return immediately without calling `update_scaling_policy_status`, leaving the counter incremented as intended.

If the intent is to keep the counter incremented only when NodePools were actually created and will be handled by the NodePool controller, then the NoMatchingOffering case is correct (do not persist the increment). Otherwise, either explicitly adjust `status.pending_scale_up` in the NoMatchingOffering handler or refine the comment to clarify that the counter is reset only when no offerings match.
♻️ Duplicate comments (3)
crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1)
118-129: GPU recommendation strings do not align with autoscaler offerings.

This issue was already flagged in a previous review. The GPU strings like "NVIDIA-RTX-A4000" and "A100-40GB" don't match the autoscaler's offering matcher, which uses plain model names like "A100" and "H100".
crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1)
241-258: Shared helpers are still duplicated between vLLM and SGLang templates.

`parse_env_vars`, `create_with_retry`, `is_quota_exceeded`, `extract_quota_message`, `WaitResult`, `wait_for_ready`, and `format_phase_message` are effectively the same as in `sglang.rs`. Extracting these into a shared `templates::common` module (or similar) would reduce maintenance and keep behavior aligned across frameworks.

For example:
- Move the shared helpers into `crates/basilica-cli/src/cli/handlers/deploy/templates/common.rs`.
- Re-export them from `templates::mod.rs` if you want a stable public path.
- Update both `vllm.rs` and `sglang.rs` to `use` the shared helpers instead of their own copies.

This is non-blocking but will pay off as more templates are added.

Also applies to: 333-395, 397-478
crates/basilica-cli/src/cli/commands.rs (1)
737-762: Template subcommands still define a separate `--json` flag instead of using the global one

`DeployCommand` defines a global `--json` flag (Line 456–458), and `TemplateCommonOptions` defines another `--json` (Line 803–805), which is flattened into `deploy vllm`/`deploy sglang` (Line 743–747, 756–760). This means:
- `basilica deploy --json vllm …` sets the global flag but the template handlers only see `common.json`.
- `basilica deploy vllm --json …` sets the template-local flag, diverging from other deploy subcommands that rely solely on the global one.

This is inconsistent UX and risks clap argument conflicts.

Recommend removing `json` from `TemplateCommonOptions` and threading the global `DeployCommand.json` into the template handlers instead:

```diff
 pub struct TemplateCommonOptions {
 @@
-    /// Output as JSON
-    #[arg(long)]
-    pub json: bool,
 }
```

Then adjust the deploy handler wiring so `handle_vllm_deploy`/`handle_sglang_deploy` receive `cmd.json` (from `DeployCommand`) as a separate parameter rather than reading `common.json`.

Also applies to: 764-806, 456-458
🧹 Nitpick comments (14)
crates/basilica-autoscaler/src/error.rs (3)
91-96: Richer NoMatchingOffering payload looks good; consider tightening message formatting

The struct-like variant with `gpu_count`, `models`, and `min_memory_gb` is a nice upgrade for observability. One nit: using `{:?}` for `min_memory_gb` plus the `GB` suffix will produce strings like `Some(24)GB` or `NoneGB`, which are a bit awkward to read in logs. If you care about operator UX, consider formatting the optional memory more explicitly (e.g., `"min_memory: {min_memory_gb.map(|v| format!(\"{v}GB\")).unwrap_or_else(|| \"none\".into())}"` or similar), even if that means introducing a small helper or a preformatted field.
123-127: ApiTimeout variant is clear; optionally avoid stringly-typed operations

The dedicated `ApiTimeout { operation, timeout_secs }` variant makes the source and SLA of timeouts more explicit than overloading `PhaseTimeout`, which is good. If the set of operations is finite and relatively stable (e.g., `SecureCloudProvision`, `KubeScaleUp`, etc.), you might later want to replace the `String` with an enum to avoid typos and ease filtering/metrics, but the current shape is fine for now.
145-171: Reclassifying NoMatchingOffering as retryable is a behavioral change; confirm it's intended
Marking NoMatchingOffering { .. } and ApiTimeout { .. } as retryable and removing NoMatchingOffering from is_permanent changes control-flow semantics for callers that rely on these helpers to decide whether to keep scaling or give up. For ApiTimeout this is intuitive, but for NoMatchingOffering it depends on whether the underlying condition is expected to be transient (e.g., provider capacity flapping) or effectively a configuration issue (e.g., impossible GPU requirements) in a given code path.
If some call sites are closer to "unsatisfiable config" than "transient shortage", you may want to either:
- Introduce a distinct permanent variant (e.g., UnfulfillableOfferingRequirements), or
- Gate retries with a counter/backoff to avoid infinite retry loops with noisy logs.
At minimum, worth double-checking the main scaling loops/watchers that use is_retryable / is_permanent to ensure this reclassification doesn't accidentally change a previously-terminal failure into an unbounded retry.
orchestrator/k8s/services/autoscaler/scaling-policy.yaml (2)
66-82: Replace hardcoded SSH key ID with a placeholder.
The SSH key ID 35f0ab63-e983-40b0-b53d-953cf67b53d2 appears to be a real UUID. Templates should use placeholder values to avoid accidental usage of incorrect credentials.
  # SSH key ID registered with your cloud provider
- sshKeyId: "35f0ab63-e983-40b0-b53d-953cf67b53d2"
+ sshKeyId: "<your-ssh-key-id>"  # Replace with your registered SSH key ID
83-108: Consider using a placeholder for the K3s server URL.
The server URL https://connect.basilica.ai:6443 may be environment-specific. A placeholder would make the template more universally applicable.
  # K3s server URL (your cluster's API endpoint)
- serverUrl: "https://connect.basilica.ai:6443"
+ serverUrl: "<your-k3s-server-url>"  # e.g., https://k3s.example.com:6443
crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1)
159-192: Extract shared name generation logic to avoid duplication.
This function is nearly identical to generate_vllm_name in vllm.rs (see relevant code snippets). Consider extracting to a shared utility function. Create a shared helper in the templates module:
// In templates/mod.rs or a new templates/common.rs
pub fn generate_deployment_name(model: &str, prefix: &str) -> String {
    let model_part = model
        .split('/')
        .next_back()
        .unwrap_or(model)
        .to_lowercase()
        .chars()
        .filter_map(|c| {
            if c.is_ascii_alphanumeric() {
                Some(c)
            } else if c == '-' || c == '_' || c == '.' {
                Some('-')
            } else {
                None
            }
        })
        .collect::<String>();
    let sanitized = model_part.trim_matches('-');
    let prefix_part = if sanitized.len() > 40 {
        &sanitized[..40]
    } else {
        sanitized
    };
    let prefix_part = if prefix_part.is_empty() { prefix } else { prefix_part };
    format!("{}-{}-{}", prefix, prefix_part, &uuid::Uuid::new_v4().to_string()[..8])
}
crates/basilica-autoscaler/src/controllers/k8s_client.rs (1)
460-507: Consider adding a random suffix to prevent event name collisions.
Using only the nanosecond timestamp for event name uniqueness could lead to collisions if multiple events are created within the same nanosecond (e.g., during rapid pod processing). Consider adding a short random suffix.
+ use rand::Rng;
  let now = Utc::now();
- let event_name = format!("{}.{}", pod_name, now.timestamp_nanos_opt().unwrap_or(0));
+ let random_suffix: u16 = rand::thread_rng().gen();
+ let event_name = format!(
+     "{}.{}.{}",
+     pod_name,
+     now.timestamp_nanos_opt().unwrap_or(0),
+     random_suffix
+ );
crates/basilica-autoscaler/src/api/secure_cloud.rs (1)
497-500: Consider caching or a dedicated API endpoint for single-offering lookup.
The current implementation calls list_offerings() and then filters, which is O(n) and makes a full API call each time. If single-offering lookups are frequent, consider:
- Adding a server-side endpoint for single offering lookup
- Caching the offerings list with TTL (see the sketch below)
For now, this is acceptable if lookups are infrequent, but worth noting for future optimization.
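A minimal sketch of the TTL-cached variant, kept deliberately generic; GpuOffering is simplified here and the fetch closure stands in for the real list_offerings() call:

use std::time::{Duration, Instant};
use tokio::sync::RwLock;

/// Simplified stand-in for the real offering type.
#[derive(Clone)]
struct GpuOffering {
    id: String,
    hourly_rate: f64,
}

struct OfferingsCache {
    ttl: Duration,
    inner: RwLock<Option<(Instant, Vec<GpuOffering>)>>,
}

impl OfferingsCache {
    fn new(ttl: Duration) -> Self {
        Self { ttl, inner: RwLock::new(None) }
    }

    /// Look up one offering by id, refreshing the full list only when the TTL has expired.
    async fn get_offering<F, Fut>(&self, id: &str, fetch_all: F) -> Option<GpuOffering>
    where
        F: FnOnce() -> Fut,
        Fut: std::future::Future<Output = Vec<GpuOffering>>,
    {
        {
            let guard = self.inner.read().await;
            if let Some((fetched_at, offerings)) = guard.as_ref() {
                if fetched_at.elapsed() < self.ttl {
                    return offerings.iter().find(|o| o.id == id).cloned();
                }
            }
        }
        // Cache miss or expired entry: fetch once, answer from the fresh list, store it.
        let offerings = fetch_all().await;
        let found = offerings.iter().find(|o| o.id == id).cloned();
        *self.inner.write().await = Some((Instant::now(), offerings));
        found
    }
}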
examples/20_deploy_sglang_template.py (1)
23-23: Remove redundant f-strings in constant messages (and optionally harden cleanup).
Lines 23 and 41 use f"...ready!" without any placeholders, which triggers Ruff F541 and is unnecessary. You can drop the f prefix:
-print(f"SGLang deployment ready!")
+print("SGLang deployment ready!")
-print(f"Qwen 3B deployment ready!")
+print("Qwen 3B deployment ready!")
Optionally, you could wrap the body in a try/finally so that deployment.delete() still runs if something fails between the two deployments, but that's not critical for a demo script.
Also applies to: 41-41
orchestrator/k8s/services/autoscaler/secrets.template.yaml (1)
27-40: Template private key block is fine but may keep triggering scanners.
The inlined OpenSSH private key is clearly a placeholder and marked "REPLACE_WITH_YOUR_SSH_PRIVATE_KEY", so functionally this is OK. If CI security scanners (like gitleaks) are noisy on this, consider either:
- Renaming the field to something like ssh-private-key-placeholder, or
- Adding this path/signature to your scanner's allowlist.
That way you keep the helpful scaffold without false-positive noise.
examples/19_deploy_vllm_template.py (1)
23-23: Drop unused f-strings in static messages (and optionally add a finally block).
Lines 23 and 41 use f"..." without interpolation; you can simplify to plain strings to satisfy Ruff F541:
-print(f"vLLM deployment ready!")
+print("vLLM deployment ready!")
-print(f"Llama-2-7B deployment ready!")
+print("Llama-2-7B deployment ready!")
As with the SGLang script, you could also wrap the main body in try/finally so deployment.delete() and deployment2.delete() run even if something fails, but that's optional for an example.
Also applies to: 41-41
crates/basilica-sdk-python/python/basilica/__init__.py (2)
85-113: Consider propagating estimated VRAM into min_gpu_memory_gb for tighter offering constraints
deploy_vllm / deploy_sglang call estimate_gpu_requirements(model) and use:
- reqs.gpu_count to set GpuRequirementsSpec.count
- reqs.recommended_gpu to set candidate models
but leave min_gpu_memory_gb=None in GpuRequirementsSpec (Line 562–564, 701–703). That means offering selection only constrains by count + model string, not by the estimated VRAM requirement you already computed.
If the backend's notion of gpu_type ever includes multiple memory SKUs, this could under-constrain scheduling. A lightweight improvement would be to derive a per-GPU minimum from the estimate, e.g.:
per_gpu_mem = max(1, reqs.memory_gb // max(gpu_count, 1))
gpu_spec = GpuRequirementsSpec(
    count=gpu_count,
    model=effective_gpu_models,
    min_cuda_version=None,
    min_gpu_memory_gb=per_gpu_mem,
)
or at least make min_gpu_memory_gb configurable via an optional parameter so callers can tighten constraints when needed.
Also applies to: 446-599, 600-737
543-556: Static-analysis S104/S106 findings are expected in this context
The Ruff warnings about binding to all interfaces (--host 0.0.0.0) and a "hardcoded password" on credentials_secret refer to:
- vLLM/OpenAI server and SGLang server listening inside the pod (--host 0.0.0.0).
- A Kubernetes secret name ("basilica-r2-credentials"), not an inlined credential.
Given these run in cluster-scoped pods and only reference a secret identifier, there's no obvious security issue. If you want to quiet the lints, you could:
- Centralize the secret name in a constant and document that it’s an identifier, not a secret value.
Also applies to: 683-695
crates/basilica-autoscaler/src/offering_matcher.rs (1)
443-474: Optionally also consider GPU limits when derivinggpu_countfrom pods
get_gpu_count_from_podinspects onlyresources.requests["nvidia.com/gpu"]on containers and init containers. In clusters where GPU requests are expressed solely vialimits(with no matching request), such pods would be ignored by the matcher.If you expect that pattern, you could extend the helper to fall back to
resources.limitswhenrequestsis absent:- if let Some(resources) = &container.resources { - if let Some(requests) = &resources.requests { - if let Some(gpu_quantity) = requests.get("nvidia.com/gpu") { + if let Some(resources) = &container.resources { + let gpu_map = resources + .requests + .as_ref() + .or(resources.limits.as_ref()); + if let Some(gpu_quantity) = gpu_map.and_then(|m| m.get("nvidia.com/gpu")) {Same pattern could be applied to init containers. If your admission / templates always set requests, this is optional.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (36)
crates/basilica-api/src/api/routes/deployments/handlers.rs(2 hunks)crates/basilica-api/src/config/mod.rs(4 hunks)crates/basilica-autoscaler/src/api/secure_cloud.rs(8 hunks)crates/basilica-autoscaler/src/controllers/k8s_client.rs(3 hunks)crates/basilica-autoscaler/src/controllers/node_pool_controller.rs(10 hunks)crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs(13 hunks)crates/basilica-autoscaler/src/crd/node_pool.rs(2 hunks)crates/basilica-autoscaler/src/crd/scaling_policy.rs(3 hunks)crates/basilica-autoscaler/src/error.rs(3 hunks)crates/basilica-autoscaler/src/lib.rs(2 hunks)crates/basilica-autoscaler/src/metrics.rs(5 hunks)crates/basilica-autoscaler/src/offering_matcher.rs(1 hunks)crates/basilica-autoscaler/src/runtime.rs(6 hunks)crates/basilica-cli/Cargo.toml(1 hunks)crates/basilica-cli/src/cli/commands.rs(1 hunks)crates/basilica-cli/src/cli/handlers/deploy/mod.rs(2 hunks)crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs(1 hunks)crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs(1 hunks)crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs(1 hunks)crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs(1 hunks)crates/basilica-cli/src/cli/handlers/ssh_keys.rs(1 hunks)crates/basilica-operator/src/controllers/user_deployment_controller.rs(2 hunks)crates/basilica-sdk-python/python/basilica/__init__.py(2 hunks)crates/basilica-sdk-python/python/basilica/templates/__init__.py(1 hunks)crates/basilica-sdk-python/python/basilica/templates/model_size.py(1 hunks)examples/19_deploy_vllm_template.py(1 hunks)examples/20_deploy_sglang_template.py(1 hunks)orchestrator/k8s/services/autoscaler/.gitignore(1 hunks)orchestrator/k8s/services/autoscaler/crds.yaml(4 hunks)orchestrator/k8s/services/autoscaler/deployment.yaml(0 hunks)orchestrator/k8s/services/autoscaler/namespaces.yaml(1 hunks)orchestrator/k8s/services/autoscaler/scaling-policy.template.yaml(1 hunks)orchestrator/k8s/services/autoscaler/scaling-policy.yaml(1 hunks)orchestrator/k8s/services/autoscaler/secrets.template.yaml(1 hunks)orchestrator/k8s/services/autoscaler/servicemonitor.yaml(1 hunks)scripts/autoscaler/rollout.sh(1 hunks)
💤 Files with no reviewable changes (1)
- orchestrator/k8s/services/autoscaler/deployment.yaml
✅ Files skipped from review due to trivial changes (1)
- orchestrator/k8s/services/autoscaler/servicemonitor.yaml
🚧 Files skipped from review as they are similar to previous changes (11)
- crates/basilica-operator/src/controllers/user_deployment_controller.rs
- crates/basilica-autoscaler/src/lib.rs
- orchestrator/k8s/services/autoscaler/.gitignore
- crates/basilica-cli/src/cli/handlers/deploy/mod.rs
- crates/basilica-api/src/api/routes/deployments/handlers.rs
- crates/basilica-sdk-python/python/basilica/templates/init.py
- crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs
- orchestrator/k8s/services/autoscaler/namespaces.yaml
- crates/basilica-cli/Cargo.toml
- crates/basilica-autoscaler/src/metrics.rs
- scripts/autoscaler/rollout.sh
🧰 Additional context used
🧬 Code graph analysis (8)
crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (2)
crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1)
model(161-176)crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1)
model(162-177)
crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (1)
crates/basilica-autoscaler/src/offering_matcher.rs (1)
normalize_gpu_model(41-47)
examples/19_deploy_vllm_template.py (3)
crates/basilica-cli/src/github_releases.rs (1)
basilica(21-26)crates/basilica-sdk-python/python/basilica/deployment.py (1)
url(183-191)crates/basilica-cli/src/cli/handlers/deploy/validation.rs (1)
memory(401-402)
crates/basilica-sdk-python/python/basilica/__init__.py (2)
crates/basilica-sdk-python/python/basilica/deployment.py (3)
name(178-180)Deployment(114-453)_from_response(436-453)crates/basilica-sdk-python/python/basilica/templates/model_size.py (1)
estimate_gpu_requirements(30-60)
crates/basilica-autoscaler/src/runtime.rs (2)
crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (3)
new(59-74)clone(41-50)with_offering_selector(84-100)crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (2)
new(58-70)clone(42-49)
crates/basilica-sdk-python/python/basilica/templates/model_size.py (1)
crates/basilica-autoscaler/src/api/secure_cloud.rs (1)
gpu_memory_gb(215-217)
crates/basilica-autoscaler/src/api/secure_cloud.rs (1)
crates/basilica-billing/src/domain/rentals.rs (1)
offering_id(223-225)
examples/20_deploy_sglang_template.py (2)
crates/basilica-sdk-python/python/basilica/__init__.py (1)
deploy_sglang(600-737)crates/basilica-sdk-python/python/basilica/deployment.py (1)
url(183-191)
🪛 Gitleaks (8.30.0)
orchestrator/k8s/services/autoscaler/secrets.template.yaml
[high] 38-40: Identified a Private Key, which may compromise cryptographic security and sensitive data encryption.
(private-key)
🪛 Ruff (0.14.8)
examples/19_deploy_vllm_template.py
23-23: f-string without any placeholders
Remove extraneous f prefix
(F541)
41-41: f-string without any placeholders
Remove extraneous f prefix
(F541)
crates/basilica-sdk-python/python/basilica/__init__.py
520-520: Possible binding to all interfaces
(S104)
551-551: Possible hardcoded password assigned to argument: "credentials_secret"
(S106)
667-667: Possible binding to all interfaces
(S104)
690-690: Possible hardcoded password assigned to argument: "credentials_secret"
(S106)
examples/20_deploy_sglang_template.py
23-23: f-string without any placeholders
Remove extraneous f prefix
(F541)
41-41: f-string without any placeholders
Remove extraneous f prefix
(F541)
🔇 Additional comments (38)
crates/basilica-cli/src/cli/handlers/ssh_keys.rs (1)
405-405: LGTM! Good enhancement to display the key ID.This addition provides users with the SSH key ID, which can be useful for debugging or cross-referencing with API calls. The formatting and styling are consistent with the adjacent fields.
crates/basilica-api/src/config/mod.rs (5)
28-29: LGTM!The constant is appropriately named and documented. The 60-second default is reasonable for health check intervals.
129-131: LGTM!The field is properly documented and uses appropriate types. The serde default attribute ensures backward compatibility with existing configuration files that don't specify this field.
141-143: LGTM!The function follows the established pattern in this file for providing serde defaults and correctly returns the constant.
149-149: LGTM!The field initialization correctly uses the default function, maintaining consistency with the existing pattern.
311-313: No remaining usages of the removed constant detected.
The verification confirms that the constant RENTAL_HEALTH_CHECK_INTERVAL_SECS has been completely removed from the codebase with no dangling references. The method rental_health_check_interval() now correctly uses the configurable field self.aggregator.health_check_interval_secs, and all related code has been properly updated. The change is safe to merge.
orchestrator/k8s/services/autoscaler/scaling-policy.yaml (4)
1-27: LGTM - Well-documented policy header and scale-up configuration.
The file header clearly explains the purpose and includes the apply command. The scale-up configuration with pendingPodThreshold: 1 ensures responsive scaling while cooldownSeconds: 60 prevents thrashing.
28-40: LGTM - Reasonable scale-down configuration.
The scale-down parameters provide good defaults: 10-minute idle timeout, 10% GPU utilization threshold, and 5-minute zero-pods duration strike a balance between cost savings and stability.
41-65: LGTM - Dynamic offering constraints well-documented.
The offering constraints section clearly explains each option with helpful inline comments. The allowModelFallback: true default is sensible for availability.
109-115: LGTM - Lifecycle hooks configuration is sensible.
The drain-before-removal with a 10-minute timeout and 30-second grace period provides safe node removal while forceDrain: false respects PDBs.
36-68: LGTM - Clean deployment flow with sensible defaults.The function properly handles optional model input, auto-detects GPU requirements, and provides informative output when auto-detecting. The name generation fallback is appropriate.
230-247: LGTM - Robust environment variable parsing.The
parse_env_varsfunction correctly validates the KEY=VALUE format and provides clear error messages. Usingsplitn(2, '=')properly handles values containing=characters.
292-320: LGTM - Well-configured health checks with appropriate timing.The health check configuration uses sensible values:
- Startup probe with 60 failure threshold allows up to 10 minutes for model loading
- Liveness probe with 60s initial delay gives time for model initialization
- Readiness probe is appropriately more sensitive with 10s period
322-354: LGTM - Solid retry implementation with jitter.The exponential backoff with jitter (0.75-1.25 factor) is well-implemented and follows best practices. The quota detection logic properly surfaces quota errors immediately without retrying.
386-452: LGTM - Well-structured wait loop with phase-aware polling.The wait loop implementation is clean:
- Dynamic sleep duration based on phase is efficient
- Phase transitions trigger informative user messages
- Terminal states (Failed, Terminating) are properly detected
20-27: The use oflmsysorg/sglang:latestaligns with SGLang's official documentation, which explicitly recommends this tag. The current stable release is v0.5.6.post2. No changes needed.Likely an incorrect or invalid review comment.
crates/basilica-autoscaler/src/controllers/k8s_client.rs (2)
75-97: LGTM - Well-designed event trait methods.The trait extensions for pod and node pool events provide good observability hooks. The method signatures appropriately make
pod_uid/pool_uidoptional to handle cases where UIDs may not be available.
509-556: LGTM - Consistent event creation for NodePool resources.The NodePool event implementation correctly uses the CRD's API version (
autoscaler.basilica.ai/v1alpha1) and appropriate action type (Provisioning). The structure mirrors the pod event implementation for consistency.crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (4)
1-50: LGTM - Clean module structure with clear documentation.
The module is well-organized with appropriate documentation. The estimate_gpu_requirements function provides a sensible fallback chain: parameter extraction → model family detection → default.
52-71: LGTM - Reasonable memory estimation heuristics.The formula (2GB × params × 1.2 overhead, rounded up to 8GB) is a reasonable approximation for FP16 inference. The 20% overhead accounts for KV cache and runtime memory.
73-110: LGTM - Good coverage of common model families.The model family detection covers major models (Llama variants, Mixtral, Qwen, Mistral, Phi, Gemma, TinyLlama) with appropriate memory estimates. The 16GB default for unknown models is conservative.
131-187: LGTM - Comprehensive test coverage.The tests thoroughly cover parameter extraction, memory estimation, GPU count calculation, and end-to-end estimation. The floating-point comparisons use appropriate epsilon values.
crates/basilica-autoscaler/src/api/secure_cloud.rs (6)
153-187: LGTM - Robust deserialization for flexible API responses.
The deserialize_string_or_f64 implementation correctly handles string, f64, i64, and u64 inputs, making it resilient to API variations where numeric fields may be serialized as strings (common with decimal types).
196-228: LGTM - Well-structured GpuOffering with backwards compatibility.
The struct uses #[serde(alias = ...)] for backward compatibility with legacy field names (available, hourly_rate). The convenience accessors (gpu_memory_gb(), hourly_rate(), available()) provide a clean API.
329-330: LGTM - Clean trait extension for offering lookup.
Adding get_offering to the trait provides a convenient way to retrieve a specific offering by ID.
543-575: LGTM - Stop rental endpoint correctly uses POST.
The change from DELETE to POST for /secure-cloud/rentals/{id}/stop is appropriate since stopping a rental is an action rather than resource deletion.
767-829: LGTM - Comprehensive deserialization tests.
The tests cover:
- Standard API response format
- Legacy field aliases (available, hourly_rate)
- String-formatted hourly rates (API's Decimal type)
This ensures robust handling of API variations.
230-243: The RentalInfo struct does not have a region field. The struct contains only 8 fields: rental_id, deployment_id, provider, status, ip_address, ssh_command, and hourly_cost. The region field exists in the GpuOffering struct (as a required String), not in RentalInfo.
Likely an incorrect or invalid review comment.
crates/basilica-autoscaler/src/crd/node_pool.rs (1)
386-389: New status fields for offering and label state look consistent and safe.
offering_id: Option<String> and labels_applied: bool are wired with sensible serde defaults and match the camelCase CRD fields (offeringId, labelsApplied). This should integrate cleanly with dynamic offering tracking and GPU label application without breaking existing consumers.
Also applies to: 454-457
crates/basilica-autoscaler/src/crd/scaling_policy.rs (1)
7-7: Dynamic offering fields are well-integrated with the CRD model.
Making SecureCloudTemplate.offering_id optional and adding offering_constraints: Option<OfferingConstraints> on ScalingPolicySpec cleanly express "static versus dynamic" offering selection in the type system. The serde defaults / skip_serializing_if usage match the CRD and should preserve backward compatibility for existing manifests that don't set these fields.
Also applies to: 49-52, 170-201
orchestrator/k8s/services/autoscaler/scaling-policy.template.yaml (1)
11-115: ScalingPolicy template matches the CRD and clearly documents dynamic offering behavior.
The offeringConstraints block and nodeTemplate.secureCloud fields line up with the CRD schema and give a sensible default policy for dynamic GPU provisioning (including clear comments about optional offeringId overrides and cost caps). No issues from a config/CRD compatibility perspective.
crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1)
35-157: vLLM deployment flow, resources, and readiness handling look solid.
The overall flow in handle_vllm_deploy (GPU estimation, resource construction, storage + health checks, exponential-backoff create_with_retry, and phase-aware wait_for_ready) is cohesive and matches the intended UX for template-based deployments. build_vllm_resources and build_vllm_health_check provide sane defaults, and the success output from print_vllm_success gives users clear endpoints and usage examples. I don't see correctness issues in these paths given the surrounding types.
Also applies to: 191-331, 333-478, 480-519
orchestrator/k8s/services/autoscaler/crds.yaml (1)
380-383: CRD schema updates align with Rust CRD structs and dynamic offering semantics.
The additions of status.labelsApplied, status.offeringId, spec.nodeTemplate.secureCloud.offeringId, and spec.offeringConstraints are consistent with the corresponding Rust types (NodePoolStatus, SecureCloudTemplate, ScalingPolicySpec). Defaults and nullability are set appropriately, so these CRDs should round-trip cleanly with the controller code while exposing the new dynamic offering and label-application state.
Also applies to: 417-420, 661-711, 715-744
crates/basilica-autoscaler/src/runtime.rs (1)
23-24: OfferingMatcher integration into controller runtime looks sound
The runtime cleanly injects a single OfferingMatcher instance (with metrics) into both NodePoolController::with_offering_selector and ScalingPolicyController::new, and the updated ControllerContext / ScalingContext generics plus reconcile bounds (S: MaybeOfferingSelector vs S: OfferingSelector) are consistent with the controller impls.
No obvious correctness issues in how the selector is constructed, shared via Arc, or threaded through reconcile/error-policy paths.
Also applies to: 55-61, 109-126, 131-156, 183-218, 221-254
crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (2)
20-33: Offering selector integration and rental failure invalidation are wired correctly
The new generic parameter S with default () plus:
- NodePoolController::new (S = (), offering_selector: None)
- NodePoolController::with_offering_selector (S: OfferingSelector, offering_selector: Some(Arc<S>))
- invalidate_offering_on_failure guarded on Option<Arc<S>> and S: MaybeOfferingSelector
gives a clean way to enable offering-cache invalidation only when an OfferingMatcher is present. The call from handle_provisioning on start_rental failure (Line 406–415) looks correct and ensures repeated rental failures for a given offering_id will evict that entry from the in-memory cache without affecting non-dynamic setups.
No changes needed here.
Also applies to: 35-51, 53-75, 77-101, 103-121, 401-455
646-736: GPU label application and reapplication logic is robust and well-tested
The new helpers:
- build_gpu_labels (Line 1194–1225) centralizes construction of GPU/node/offering labels using normalized GPU model strings and status/spec precedence.
- labels_need_reapplication (Line 1228–1241) correctly detects missing/mismatched labels, with thorough unit tests (Line 1318–1431).
Their use in:
- handle_waiting_for_node to eagerly apply labels as soon as the node appears (before Ready), and to ensure labels are applied before transitioning to Ready.
- handle_ready to re-apply labels when drift is detected.
nicely covers both the initial race window and later drift / recovery scenarios. This should materially reduce "node is Ready but unschedulable due to missing labels" issues.
Also applies to: 738-785, 1194-1241, 1318-1431
crates/basilica-autoscaler/src/offering_matcher.rs (1)
49-72: Core offering matcher design (caching, selection, grouping) looks solid
Highlights:
- GpuRequirements normalizes model names into a BTreeSet, making grouping by requirements deterministic and order-insensitive (Line 49–72, 696–705).
- OfferingMatcher's cache (RwLock<OfferingCache>, TTL, tokio::time::timeout wrapped API calls) with metrics hooks is well-structured (Line 206–221, 228–301).
- Model matching (exact + forward prefix only), memory filtering, and constraint checks are sensibly layered before compare_offerings, which encodes a clear preference order (exact GPU count → cheaper → fewer GPUs → stable ID) (Line 303–357, 423–441).
- Pod grouping and node-count calculation (group_pending_pods_by_requirements, calculate_nodes_needed) are correct for homogenous groups and well tested (Line 620–660, 734–783).
The test suite exercises normalization, equality, comparison ordering, cache behavior, and constraint filters thoroughly, which gives good confidence in the matcher's behavior.
group_pending_pods_by_requirements,calculate_nodes_needed) are correct for homogenous groups and well tested (Line 620–660, 734–783).The test suite exercises normalization, equality, comparison ordering, cache behavior, and constraint filters thoroughly, which gives good confidence in the matcher’s behavior.
Also applies to: 206-221, 228-301, 303-357, 359-421, 423-441, 620-660, 661-905
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (1)
432-490: Dynamic GPU offering–aware scale-up, constraints, and pod visibility are well-structuredThe new dynamic path in
scale_upand helpers:
- Groups pending GPU pods by normalized
GpuRequirements(viagroup_pending_pods_by_requirements) and guards against over‑creating nodes with themax_count/remaininglogic (Line 432–490, 501–559, 630–659).- Uses
OfferingSelector+OfferingConstraintswith a clear precedence/merge strategy (build_effective_constraints) so policy‑level constraints override template hints while still honoring template defaults (Line 501–559, 831–855).- Handles three offering cases explicitly: exact match, relaxed model match (when
allow_model_fallback), and explicit fallbackoffering_id, each with appropriate logging, metrics and, in the fallback case, immediate NodePool creation (Line 561–627, 911–916).- Surfaces user‑visible K8s events for two important failure modes:
- Pods that specify GPU models but lack nodeAffinity (
MissingNodeAffinity) (Line 738–784).- Pods for which no offering could be found (
OfferingUnavailable) (Line 786–829).- Ensures NodePools are always created via
scale_up_with_offeringthat wires the chosenoffering_idintoSecureCloudConfigand tags NodePools with the scaling policy label (Line 661–699, 857–901).Overall this gives a clear, explainable autoscaling behavior around GPU offerings and makes diagnosis much easier for users.
Also applies to: 492-559, 561-627, 630-659, 738-784, 786-829, 831-855, 857-901, 911-916, 918-949
…hain monitor * Add reconnect() method to re-establish blockchain connection on failures * Reuse LegacyRpcMethods client instead of creating new RPC per block fetch * Wrap client and rpc_methods in Arc<RwLock> for thread-safe reconnection * Add is_block_likely_pruned() helper to detect pruned blocks on non-archive nodes * Extract is_insecure_endpoint() helper to reduce code duplication * Fix get_transfers_from_block to be a static method for cleaner code
…and skip behavior * Add MonitorConfig with max_block_retries, block_retention_threshold, max_block_gap, auto_reconnect * Add config validation for new blockchain resilience parameters * Implement automatic block skipping when failures exceed retry threshold * Add reconnection attempts after multiple consecutive connection failures * Detect and handle likely-pruned blocks using bittensor is_block_likely_pruned * Handle large block gaps on service restart by skipping to current block * Export MonitorConfig from blockchain module for use in main * Initialize ChainMonitor with config from BlockchainConfig
…n list * Use only floating_ip for Hyperstack VMs (fixed_ip is VPC-internal only) * Refresh active rentals from provider before listing to get latest IP/status * Poll provider for pending/provisioning rentals to capture IP assignment * Autoscaler will now poll until floating_ip is assigned by Hyperstack
…s and API types * Add WireGuardPeerStatus struct for storing peer config in NodePool status * Add peers field to WireGuardStatus for registration response storage * Add ssh_user field to SecureCloudConfig and SecureCloudTemplate (default: ubuntu) * Add datacenter_id field to SecureCloudTemplate for node registration * Change WireGuardConfig default enabled to true for remote nodes * Refactor NodeRegistrationRequest to use GpuSpecs instead of optional GpuInfo * Update NodeRegistrationResponse to make wireguard optional with peers list * Add WireGuardPeer.allowed_ips() method to compute allowed IPs including pod network * Add get_rental() method to SecureCloudApi trait for polling rental status * Add ListRentalsResponse and RentalListItem for parsing list endpoint * Add ConnectivityCheck error variant for better diagnostics
* Add list_replica_sets_for_deployment to query RS owned by a deployment * Add scale_replica_set to adjust RS replica count via JSON patch * Implement methods in MockK8sClient, KubeClient, and RateLimitedKubeClient * Filter RS by owner references to find deployment-owned replicas
…e ReplicaSets * Use normalize_gpu_models in job, rental, and user_deployment controllers * Add cleanup_stale_replica_sets to scale down old RS with 0 ready replicas * Prevent autoscaler from provisioning nodes for pods with outdated nodeAffinity * Add unit tests for GPU model normalization and deduplication in nodeAffinity
…nerd config * Add PendingPodFilterConfig to limit max pending age and skip stale node affinity * Add filter_pods_by_age to exclude pods pending too long from scaling decisions * Add filter_pods_with_stale_affinity to skip pods whose GPU model matches existing nodes * Refactor NVIDIA container runtime script to use base64 encoding for transfer * Fix containerd v3 config section ordering for default_runtime_name * Add comprehensive unit tests for pending pod filtering behavior
…dling * Return 0 balance for accounts not found on chain instead of parsing errors * Return error instead of 0 when balance parsing fails to prevent silent failures * Extract block_number from finalized transactions for sweep tracking * Add block_number field to TransferReceipt struct * Separate wait_for_finalized and wait_for_success for better error handling
…rage * Add get_pending_sweep query to check for existing pending/submitted sweeps * Add list_stale_sweeps to find sweeps stuck beyond threshold for reconciliation * Pre-compute required_reserve in SweepCalculator to avoid repeated addition * Add required_reserve() accessor for sweep decision visibility
…config * Add max_sweeps_per_cycle config to prevent transaction flooding (default: 50) * Reject default all-zeros AEAD key in production environment * Reject default AEAD key when reconciliation is enabled * Validate AEAD key format: must be exactly 64 hex characters (32 bytes) * Add comprehensive unit tests for AEAD key validation rules
… wallet monitoring * Add idempotency check to skip accounts with pending/submitted sweeps * Re-verify balance before sweep to minimize race condition window * Add reconcile_stale_sweeps to recover sweeps stuck in pending/submitted state * Add monitor_cold_wallet to track treasury balance for operational visibility * Apply max_sweeps_per_cycle rate limit to prevent transaction flooding * Update sweep status to Confirmed on successful transfer with block_number * Mark stale sweeps as Confirmed or Failed based on on-chain balance verification
* Fix PAYMENTS_AEAD_KEY_HEX to PAYMENTS_TREASURY__AEAD_KEY_HEX (nested config) * Add ECR repository for AEAD migration image with lifecycle policy * Add ECS task definition for running migration in Fargate * Add migration tool: standalone Rust binary with AEAD decrypt/re-encrypt logic * Include Dockerfile for building minimal migration container * Document dry-run workflow with aws ecs run-task commands in outputs
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
crates/basilica-payments/src/reconciliation/service.rs (1)
176-177: Extract repeated address preview logic into a helper function.
The pattern address.chars().take(N).collect::<String>() appears throughout this file (lines 176, 193, 205, 216, 230, 260, 287-294, 457-462, 502) and in other files (per relevant code snippets). This violates DRY. Add helper functions to a common module:
fn truncate_account_hex(account_hex: &str) -> String {
    account_hex.chars().take(8).collect()
}

fn truncate_ss58_address(address: &str) -> String {
    address.chars().take(10).collect()
}
Then use throughout:
-let account_preview = account_hex.chars().take(8).collect::<String>();
+let account_preview = truncate_account_hex(account_hex);
-let hotwallet_preview = hotwallet_ss58.chars().take(10).collect::<String>();
+let hotwallet_preview = truncate_ss58_address(&hotwallet_ss58);
Also applies to: 193-194, 205-206, 216-217, 230-231, 260-261, 287-294, 457-462, 502-503
crates/basilica-operator/src/controllers/rental_controller.rs (1)
148-155: Align GPU model normalization with BasilicaQueue per‑model limits
build_node_affinity now normalizes gpu.model via normalize_gpu_models(&gpu.model) for the basilica.ai/gpu-model selector, which is good for scheduling and for how you attribute GPUs to models when walking pod nodeAffinity later in reconcile.
However, the BasilicaQueue per-model limit path below still uses the raw spec.container.resources.gpus.model:
- requested_models is built directly from spec.container.resources.gpus.model (lines around 899–904).
- Per-model caps are looked up with per_model.get(&m) where m is one of those raw strings.
If a user specifies models with memory suffixes or mixed case (e.g. ["A100-40GB", "H100-80GB"]), node affinity and model_gpus accounting will see normalized short names like "A100" / "H100", but requested_models will still be "A100-40GB" / "H100-80GB". A GpuLimits.models map keyed by normalized short names would then never be consulted for such requests, effectively disabling per-model caps for those workloads.
Consider normalizing requested models before enforcing per-model limits, e.g.:
- let requested_models = if spec.container.resources.gpus.model.is_empty() {
-     vec!["any".to_string()]
- } else {
-     spec.container.resources.gpus.model.clone()
- };
+ let requested_models = if spec.container.resources.gpus.model.is_empty() {
+     vec!["any".to_string()]
+ } else {
+     normalize_gpu_models(&spec.container.resources.gpus.model)
+ };
and document that GpuLimits.models keys are expected to use the same normalized short names. This keeps affinity, usage accounting, and caps all speaking the same model vocabulary.
crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)
535-553: Fix invalid GPU model label value for multi‑model specs and surface JSON errors to aid autoscaler debuggingWhen
gpu_confighas multiple models, this block:labels.insert( "gpu-request.basilica.ai/model".to_string(), gpu.model.join(","), );will produce label values like
"A100,H100". Kubernetes label values must be alphanumeric and may only contain-,_, and.as separators; commas are not allowed. A Deployment with such a label will be rejected by the API server for multi‑model specs, breaking those deployments. This was already called out in a previous review.To make the label both valid and consistent with your normalization story, consider:
- Normalizing the models with
normalize_gpu_models(&gpu.model)(same as node affinity), and- Joining with a label‑safe delimiter such as
"-"or".".For example:
- if !gpu.model.is_empty() { - labels.insert( - "gpu-request.basilica.ai/model".to_string(), - gpu.model.join(","), - ); - } + if !gpu.model.is_empty() { + let models = normalize_gpu_models(&gpu.model); + // Kubernetes label values cannot contain commas; join with a label-safe separator. + labels.insert( + "gpu-request.basilica.ai/model".to_string(), + models.join("-"), + ); + }Also, JSON serialization failures for the autoscaler annotation are silently ignored:
if let Ok(json) = serde_json::to_string(gpu) { annotations.insert("autoscaler.basilica.ai/gpu-requirements".to_string(), json); }If this ever fails (schema drift, unexpected values), the autoscaler loses its primary source of truth with no signal. It would be safer to at least emit a debug log:
match serde_json::to_string(gpu) { Ok(json) => { annotations.insert( "autoscaler.basilica.ai/gpu-requirements".to_string(), json, ); } Err(e) => { debug!( error = %e, "Failed to serialize GPU requirements for autoscaler annotation" ); } }Finally, it’s worth adding a small unit test that renders a Deployment with a multi‑model
GpuSpecand asserts both the label (gpu-request.basilica.ai/model) and theautoscaler.basilica.ai/gpu-requirementsannotation are present and valid.Also applies to: 618-625
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (1)
771-771: Fix invalid chrono format specifier for milliseconds.
Line 771 uses %3f which is not a valid chrono strftime specifier. The correct format for millisecond precision is %.3f (with a leading dot). Apply this fix:
- Utc::now().format("%Y%m%d%H%M%S%3f"),
+ Utc::now().format("%Y%m%d%H%M%S%.3f"),
Without this fix, milliseconds will not be included in the timestamp, reducing uniqueness guarantees when creating multiple NodePools rapidly within the same second.
crates/basilica-autoscaler/src/provisioner/ssh.rs (1)
453-456: Forcefully killing apt/dpkg and removing locks is still risky.
This issue was previously flagged: killing dpkg with -9 and removing lock files can corrupt the package database if an active operation is in progress (not just a stale lock).
The wait loop above (lines 443-451) correctly uses fuser to detect lock holders, but then forcefully kills processes and removes locks anyway. A safer approach is to fail with a clear error if locks are still held after the timeout, as suggested in the previous review.
🧹 Nitpick comments (24)
crates/basilica-payments/src/blockchain/client.rs (2)
99-121: Debug format string parsing is fragile.
The fallback relies on the Debug trait's output format ({:?}), which is not a stable API and may change between Rust/library versions. This could cause silent failures if the format changes.
Consider adding a comment documenting this fragility, or alternatively, inspect the raw decoded value's structure more robustly using subxt's Value API (e.g., iterating over composite fields).
220-224: TransferStatus::InBlock appears to be dead code.
The transfer method now waits for finalization and only returns Finalized status. The InBlock variant is never used.
Consider removing it if no longer needed, or add #[allow(dead_code)] with a comment explaining its intended future use.
crates/basilica-autoscaler/src/config.rs (2)
183-205: Consider refactoring to eliminate duplicate defaults.
The Default implementation duplicates the same default values as from_env() (600 and true). This violates DRY and creates maintenance risk if defaults need to change. Consider one of these approaches:
- Remove the Default impl if it's not needed outside tests
- Or have Default::default() delegate to from_env() style defaults via a shared constant
Example refactor:
+const DEFAULT_MAX_PENDING_AGE_SECONDS: u64 = 600;
+const DEFAULT_SKIP_STALE_NODE_AFFINITY: bool = true;
+
 impl PendingPodFilterConfig {
     fn from_env() -> Self {
         Self {
             max_pending_age_seconds: parse_u64_env(
                 "BASILICA_AUTOSCALER_MAX_PENDING_AGE_SECS",
-                600, // 10 minutes default
+                DEFAULT_MAX_PENDING_AGE_SECONDS,
             ),
             skip_stale_node_affinity: parse_bool_env(
                 "BASILICA_AUTOSCALER_SKIP_STALE_NODE_AFFINITY",
-                true,
+                DEFAULT_SKIP_STALE_NODE_AFFINITY,
             ),
         }
     }
 }

 impl Default for PendingPodFilterConfig {
     fn default() -> Self {
         Self {
-            max_pending_age_seconds: 600,
-            skip_stale_node_affinity: true,
+            max_pending_age_seconds: DEFAULT_MAX_PENDING_AGE_SECONDS,
+            skip_stale_node_affinity: DEFAULT_SKIP_STALE_NODE_AFFINITY,
         }
     }
 }
183-205: Consider adding test coverage for the new configuration.
The existing test suite validates other config sections but doesn't cover PendingPodFilterConfig. Adding tests would ensure the environment variable parsing and defaults work as expected. Example test:
#[test]
fn pending_pod_filter_defaults() {
    let config = PendingPodFilterConfig::from_env();
    assert_eq!(config.max_pending_age_seconds, 600);
    assert_eq!(config.skip_stale_node_affinity, true);
}

#[test]
fn pending_pod_filter_in_autoscaler_config() {
    let config = AutoscalerConfig::from_env();
    assert_eq!(config.pending_pod_filter.max_pending_age_seconds, 600);
}
scripts/cloud/migration.tf (1)
136-153: Helpful operator-facing output, but clarify the DRY_RUN default.
The generated commands are well-documented. Note that the first command (DRY RUN) relies on the task definition's default DRY_RUN=true, which is correct. Consider adding a comment in the output itself clarifying that the first command uses the default dry-run mode without needing an override.
  value = <<-EOT
-   # DRY RUN (review logs before proceeding):
+   # DRY RUN (uses default DRY_RUN=true from task definition - review logs before proceeding):
    aws ecs run-task \
scripts/migrations/Dockerfile (1)
14-18: Include Cargo.lock for reproducible builds.
Without a Cargo.lock, each build may pull different dependency versions, potentially introducing subtle behavioral differences in cryptographic operations.
 # Copy migration source
-COPY Cargo.toml migrate_aead_key.rs ./
+COPY Cargo.toml Cargo.lock migrate_aead_key.rs ./
Generate the lock file first:
cd scripts/migrations && cargo generate-lockfile
scripts/migrations/migrate_aead_key.rs (1)
118-122: Consider reducing key material in logs.
Printing the first 8 hex characters (32 bits) of cryptographic keys in logs is low-risk but unnecessary. In production CloudWatch logs, even partial key information adds to the attack surface. Consider reducing to 4 characters or just logging a hash/fingerprint for key identification.
176-206: Consider zeroizing sensitive mnemonic data after use.
The decrypted mnemonic remains in memory after the loop iteration. For defense-in-depth, consider using the zeroize crate to clear sensitive data, though this is optional given the short-lived nature of this migration task.
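If you adopt it, a minimal sketch with the zeroize crate, assuming the decrypted mnemonic is held as a String; decrypt_mnemonic is a hypothetical stand-in for the tool's AEAD decrypt step:

use zeroize::Zeroize;

fn process_account(ciphertext: &[u8]) {
    // Hypothetical stand-in for the real AEAD decryption step.
    let mut mnemonic = decrypt_mnemonic(ciphertext);

    // ... re-encrypt with the new key and write it back to storage ...

    // Overwrite the plaintext buffer before the String is dropped.
    mnemonic.zeroize();
}

fn decrypt_mnemonic(_ciphertext: &[u8]) -> String {
    "abandon abandon ability ...".to_string()
}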
scripts/migrations/Cargo.toml (1)
13-20: Consider pinning dependency versions for reproducibility.
For a migration tool handling sensitive cryptographic operations, pinning exact versions (e.g., aes-gcm = "=0.10.3") or using a committed Cargo.lock ensures identical behavior across builds and environments. This is optional for a one-time tool but recommended for auditability of cryptographic code.
crates/basilica-payments/src/reconciliation/calculator.rs (2)
18-20: Consider returning Result instead of panicking on overflow.
The expect call will panic if target_balance_plancks + estimated_fee_plancks overflows. For a financial reconciliation system, this configuration error should ideally be surfaced to the caller rather than causing a panic. Consider refactoring to return a Result:
- pub fn new(
+ pub fn new(
      minimum_threshold_plancks: u128,
      target_balance_plancks: u128,
      estimated_fee_plancks: u128,
- ) -> Self {
+ ) -> Result<Self, String> {
      let required_reserve_plancks = target_balance_plancks
          .checked_add(estimated_fee_plancks)
-         .expect("target_balance + estimated_fee overflow - invalid configuration");
+         .ok_or_else(|| "target_balance + estimated_fee overflow - invalid configuration".to_string())?;
-     Self {
+     Ok(Self {
          minimum_threshold_plancks,
          target_balance_plancks,
          estimated_fee_plancks,
          required_reserve_plancks,
-     }
+     })
  }
5-8: Document why fields are retained despite being unused.
The #[allow(dead_code)] annotations suppress warnings for target_balance_plancks and estimated_fee_plancks, but these fields are only used during construction to compute required_reserve_plancks. If they serve no other purpose, consider removing them or adding a comment explaining why they're retained (e.g., for debugging, transparency, or API stability).
crates/basilica-payments/src/storage/reconciliation.rs (1)
211-211: Document or justify the hardcoded limit of 100 stale sweeps.
The query limits results to 100 stale sweeps. If more than 100 sweeps become stale, they won't be reconciled in this cycle. Consider whether this is acceptable or if pagination/batching should be implemented.
crates/basilica-payments/src/reconciliation/service.rs (3)
142-165: Rate limiting prevents overload but may defer accounts indefinitely.
The per-cycle sweep limit is good for preventing overload, but accounts processed later in the list could be deferred indefinitely if the limit is consistently reached. Consider implementing round-robin or priority-based account selection to ensure fairness (a rough sketch follows below).
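One way to keep later accounts from starving: rotate a cursor across cycles so every account is eventually reached. The sketch below is self-contained and illustrative; how the cursor is persisted between cycles is left out, and none of the names come from the crate:

/// Pick up to `max_per_cycle` accounts starting at `cursor`, wrapping around,
/// and return the cursor to persist for the next cycle.
fn select_accounts_round_robin(
    accounts: &[String],
    cursor: usize,
    max_per_cycle: usize,
) -> (Vec<String>, usize) {
    if accounts.is_empty() || max_per_cycle == 0 {
        return (Vec::new(), cursor);
    }
    let start = cursor % accounts.len();
    let take = max_per_cycle.min(accounts.len());
    let selected = (0..take)
        .map(|i| accounts[(start + i) % accounts.len()].clone())
        .collect();
    // The next cycle resumes where this one left off.
    (selected, (start + take) % accounts.len())
}

fn main() {
    let accounts: Vec<String> = (1..=5).map(|i| format!("acct-{i}")).collect();
    let (batch1, cursor) = select_accounts_round_robin(&accounts, 0, 2);
    let (batch2, _) = select_accounts_round_robin(&accounts, cursor, 2);
    assert_eq!(batch1, ["acct-1", "acct-2"]);
    assert_eq!(batch2, ["acct-3", "acct-4"]);
}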
324-340: Sweep record uses current_balance instead of initial_balance; verify this is intentional.
Line 333 records current_balance (verified at lines 296-301) rather than initial_balance (passed as parameter). This is correct if the intent is to record the most recent verified balance at sweep initiation, but it differs from the parameter name initial_balance, which could cause confusion. Consider renaming the parameter to reflect its role:
async fn execute_sweep(
    &self,
    account_hex: &str,
-   initial_balance: u128,
+   caller_observed_balance: u128,
    initial_sweep_amount: u128,
) -> Result<()> {
538-583: Document the 80% heuristic for sweep success detection.
The 80% threshold (line 543) is used to determine if a stale sweep likely succeeded based on balance drop. While reasonable, this heuristic could produce false positives/negatives:
False negatives (marking success as failure):
- If a large deposit arrived after the sweep, the balance might not have dropped enough
- This would incorrectly mark a successful sweep as failed
False positives (marking failure as success):
- If the balance dropped due to external transactions unrelated to the sweep
- This would incorrectly mark a failed sweep as confirmed
Consider these improvements:
- Add tx_hash verification: If sweep.tx_hash exists, query the blockchain to verify the transaction status rather than relying solely on balance changes
- Emit metrics: Track reconciliation outcomes (confirmed vs failed) for monitoring
Example improvement:
// If we have a tx_hash, verify it on-chain first if let Some(ref tx_hash) = sweep.tx_hash { match self.blockchain.get_transaction_status(tx_hash).await { Ok(status) if status.is_finalized() => { // Transaction is confirmed on-chain sweep_likely_succeeded = true; } Ok(_) => { // Transaction not finalized, use heuristic } Err(e) => { warn!("Failed to query tx_hash {}: {}, using balance heuristic", tx_hash, e); } } }crates/basilica-operator/src/labels.rs (1)
165-205: Case-insensitive GPU model extraction and normalization are well-designed
Uppercasing once in extract_short_gpu_model and matching against GPU_MODEL_PATTERNS, then layering normalize_gpu_models on top with dedup + order preservation, gives you a clean canonical model vocabulary, and the tests exercise the key cases (mixed case, memory suffixes, duplicates, empty input).
One minor future-proofing suggestion: since matching now assumes product_upper.contains(pattern), it's worth either documenting that GPU_MODEL_PATTERNS entries must be uppercase or normalizing the patterns at compile time to avoid subtle mis-matches if someone later adds a lowercase pattern. Otherwise the implementation and tests look solid.
Also applies to: 453-513
crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)
913-917: ReplicaSet cleanup logic is reasonable but has a few behavioral nuances
cleanup_stale_replica_sets is a good addition to prevent the autoscaler from provisioning capacity for pods belonging to old rollouts: it scopes to ReplicaSets for this deployment, finds the newest revision, and scales older RS down to 0 when ready_replicas == 0 but spec.replicas > 0. Errors are logged and not propagated, keeping reconcile resilient.
- Any older RS that still has non‑ready pods (Pending/Failed) but zero ready replicas will be aggressively scaled to 0. That’s usually what you want for stale rollouts, but it means you can’t keep an old RS around with non‑ready pods for debugging without it being cleaned up on the next reconcile.
- RS without a
deployment.kubernetes.io/revisionannotation are treated as revision0; if there are multiple such RS, they’ll all be considered “newest” and thus skipped. That’s safe, but might be worth documenting or filtering out explicitly iflist_replica_sets_for_deploymentcould ever return unrelated RS.Overall this is a sensible best‑effort cleanup; if you start relying on it heavily, consider adding a small unit/integration test around a mocked
list_replica_sets_for_deploymentto lock in the behavior.Also applies to: 1052-1133
crates/basilica-operator/src/k8s_client.rs (3)
739-744: Consider verifying owner reference kind for accuracy.The filter only checks
r.name == deployment_namebut doesn't verify that the owner reference is actually a Deployment (viar.kind). This could match owner references from other resource types (e.g., a StatefulSet) that happen to have the same name.rs.metadata .owner_references .as_ref() - .map(|refs| refs.iter().any(|r| r.name == deployment_name)) + .map(|refs| refs.iter().any(|r| r.kind == "Deployment" && r.name == deployment_name)) .unwrap_or(false)
753-763: Silent success when ReplicaSet doesn't exist differs from real K8s behavior.The mock returns
Ok(())even when the namespace or ReplicaSet doesn't exist. Real Kubernetes would return a NotFound error. This could cause tests to pass incorrectly when they should fail due to missing resources.Consider returning an error when the ReplicaSet isn't found, consistent with other methods like
update_gpu_rental_status.async fn scale_replica_set(&self, ns: &str, name: &str, replicas: i32) -> Result<()> { let mut map = self.replica_sets.write().await; - if let Some(ns_map) = map.get_mut(ns) { - if let Some(rs) = ns_map.get_mut(name) { - if let Some(spec) = rs.spec.as_mut() { - spec.replicas = Some(replicas); - } - } - } - Ok(()) + let ns_map = map + .get_mut(ns) + .ok_or_else(|| anyhow!("namespace not found: {}", ns))?; + let rs = ns_map + .get_mut(name) + .ok_or_else(|| anyhow!("ReplicaSet not found: {}/{}", ns, name))?; + if let Some(spec) = rs.spec.as_mut() { + spec.replicas = Some(replicas); + } + Ok(()) }
1446-1451: Same owner reference filter issue as in the mock.The filter should also verify
r.kind == "Deployment"to avoid incorrectly matching ReplicaSets owned by other resource types with the same name..filter(|rs| { rs.metadata .owner_references .as_ref() - .map(|refs| refs.iter().any(|r| r.name == deployment_name)) + .map(|refs| refs.iter().any(|r| r.kind == "Deployment" && r.name == deployment_name)) .unwrap_or(false) })crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (1)
1312-1330: Label sanitization may be too permissive.The
sanitize_label_valuefunction (lines 1312-1330) converts non-compliant characters to hyphens but doesn't enforce Kubernetes label value length limits (max 63 characters) or the pattern requirement that values cannot consist solely of hyphens/dots.Consider adding validation:
fn sanitize_label_value(value: &str) -> String { let sanitized: String = value .chars() .map(|c| { if c.is_ascii_alphanumeric() || c == '-' || c == '_' || c == '.' { c } else { '-' } }) .collect(); - // Trim leading/trailing non-alphanumeric chars - sanitized + // Trim leading/trailing non-alphanumeric chars and enforce length limit + let trimmed = sanitized .trim_start_matches(|c: char| !c.is_ascii_alphanumeric()) .trim_end_matches(|c: char| !c.is_ascii_alphanumeric()) - .to_string() + .to_string(); + + // Enforce Kubernetes 63-character limit + if trimmed.len() > 63 { + trimmed[..63].to_string() + } else { + trimmed + } }crates/basilica-autoscaler/src/provisioner/ssh.rs (3)
192-323: Optional: Consider creating key file once outside retry loop.The current implementation creates a new temporary key file on each retry attempt. While functionally correct and the overhead is minimal, you could create it once before the retry loop for slightly better efficiency.
Apply this diff if desired:
+ // Create temp file for private key (must stay alive during command execution) + let key_file = create_key_file(&ssh_config.private_key)?; + for attempt in 0..self.max_retries { if attempt > 0 { debug!(host = %host, attempt = %attempt, "Retrying SSH connection"); tokio::time::sleep(self.retry_delay).await; } - // Create temp file for private key (must stay alive during command execution) - let key_file = create_key_file(&ssh_config.private_key)?; - let mut cmd = Command::new("ssh");
784-793: Consider more robust handshake validation.The current string matching for handshake recency ("seconds ago", "1 minute", "Less than a minute") is fragile and depends on the output format of
wg show, which could vary by version or locale.Consider parsing the numeric
latest-handshakefield fromwg show dumporwg show wg0 latest-handshakesfor more reliable validation, or use a broader regex pattern to extract time values.Example approach using
wg show wg0 latest-handshakes:// In validate_wireguard_connectivity, replace the handshake check with: let handshake_output = self .execute_ssh_command( host, port, ssh_config, "sudo wg show wg0 latest-handshakes 2>/dev/null || true", ) .await?; // Parse timestamps and check if any peer has handshake within last 120s let has_recent_handshake = handshake_output.lines().any(|line| { line.split_whitespace() .nth(1) .and_then(|ts| ts.parse::<i64>().ok()) .map(|ts| { let now = std::time::SystemTime::now() .duration_since(std::time::UNIX_EPOCH) .unwrap() .as_secs() as i64; now - ts < 120 }) .unwrap_or(false) });
1060-1074: Complex sed manipulation could be fragile.The script uses sed to insert configuration lines based on the containerd config version. While functionally correct, this approach is brittle and could fail if:
- The config structure changes in future K3s versions
- The expected section markers don't exist
- Multiple matches exist for the patterns
Consider adding validation after sed operations to ensure the expected content was inserted, or log the config file if sed fails to find expected patterns.
Add validation after sed operations:
```bash
# After inserting default_runtime_name
if ! sudo grep -q 'default_runtime_name' "$TMPL_FILE"; then
    echo "ERROR: Failed to add default_runtime_name to config"
    sudo cat "$TMPL_FILE"
    exit 1
fi
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (20)
- crates/basilica-autoscaler/src/config.rs (4 hunks)
- crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (15 hunks)
- crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (13 hunks)
- crates/basilica-autoscaler/src/offering_matcher.rs (1 hunks)
- crates/basilica-autoscaler/src/provisioner/ssh.rs (16 hunks)
- crates/basilica-operator/src/controllers/job_controller.rs (2 hunks)
- crates/basilica-operator/src/controllers/rental_controller.rs (2 hunks)
- crates/basilica-operator/src/controllers/user_deployment_controller.rs (7 hunks)
- crates/basilica-operator/src/k8s_client.rs (6 hunks)
- crates/basilica-operator/src/labels.rs (3 hunks)
- crates/basilica-payments/src/blockchain/client.rs (3 hunks)
- crates/basilica-payments/src/config.rs (6 hunks)
- crates/basilica-payments/src/reconciliation/calculator.rs (3 hunks)
- crates/basilica-payments/src/reconciliation/service.rs (8 hunks)
- crates/basilica-payments/src/storage/reconciliation.rs (3 hunks)
- scripts/cloud/compute.tf (1 hunks)
- scripts/cloud/migration.tf (1 hunks)
- scripts/migrations/Cargo.toml (1 hunks)
- scripts/migrations/Dockerfile (1 hunks)
- scripts/migrations/migrate_aead_key.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-13T15:34:46.544Z
Learnt from: epappas
Repo: one-covenant/basilica PR: 184
File: crates/basilica-validator/src/rental/billing.rs:129-130
Timestamp: 2025-10-13T15:34:46.544Z
Learning: In the basilica-validator crate, rental_id is a String type (not UUID) throughout the system - in RentalInfo struct, database rentals.id column, and all rental operations. The persistence layer methods should accept &str for rental_id parameters, not &Uuid, since they immediately convert to String for SQL binding anyway.
Applied to files:
crates/basilica-operator/src/controllers/rental_controller.rs
🧬 Code graph analysis (8)
crates/basilica-operator/src/controllers/rental_controller.rs (1)
crates/basilica-operator/src/labels.rs (1)
normalize_gpu_models(197-204)
crates/basilica-payments/src/reconciliation/service.rs (1)
crates/basilica-payments/src/blockchain/client.rs (1)
account_hex(79-79)
crates/basilica-payments/src/storage/reconciliation.rs (2)
crates/basilica-payments/src/reconciliation/service.rs (6)
account_hex(176-176)account_hex(193-193)account_hex(205-205)account_hex(216-216)account_hex(230-230)account_hex(260-260)crates/basilica-payments/src/blockchain/client.rs (1)
account_hex(79-79)
crates/basilica-operator/src/controllers/job_controller.rs (1)
crates/basilica-operator/src/labels.rs (1)
normalize_gpu_models(197-204)
crates/basilica-operator/src/controllers/user_deployment_controller.rs (3)
crates/basilica-operator/src/labels.rs (1)
normalize_gpu_models(197-204)crates/basilica-operator/src/crd/user_deployment.rs (4)
new(338-347)new(379-386)new(400-429)new(500-520)crates/basilica-operator/src/controllers/node_profile_controller.rs (4)
new(28-30)s(260-260)s(265-265)s(400-400)
crates/basilica-payments/src/config.rs (2)
crates/basilica-payments/src/blockchain/monitor.rs (1)
default(102-109)crates/basilica-payments/src/reconciliation/types.rs (1)
default(94-96)
crates/basilica-autoscaler/src/provisioner/ssh.rs (1)
crates/basilica-autoscaler/src/provisioner/wireguard.rs (2)
generate_wg_config(85-120)parse_wg_show_output(123-159)
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (2)
crates/basilica-autoscaler/src/offering_matcher.rs (9)
calculate_nodes_needed(739-759)filter_pods_by_age(644-659)filter_pods_with_stale_affinity(665-714)group_pending_pods_by_requirements(625-640)has_gpu_node_affinity(504-547)new(62-75)new(241-251)default(218-223)from_template(117-131)crates/basilica-autoscaler/src/config.rs (5)
default(199-204)s(224-224)s(232-232)s(239-239)s(246-246)
🔇 Additional comments (45)
crates/basilica-payments/src/blockchain/client.rs (2)
172-197: LGTM on the finalization flow.The updated sequence correctly waits for finalization before extracting block info, then verifies success via
wait_for_success. This provides stronger guarantees about transaction finality and properly captures the block number for the receipt.
81-84: LGTM on handling non-existent accounts.Returning
Ok(0) for accounts not found on-chain is the correct behavior—new accounts without prior transactions won't have storage entries.
crates/basilica-autoscaler/src/config.rs (3)
13-13: LGTM!The new
pending_pod_filter field follows the established pattern for configuration fields and is properly initialized.
Also applies to: 27-27
172-181: LGTM!The struct definition is well-documented and the fields are appropriately typed with clear purpose.
243-248: LGTM!The
parse_u64_env helper follows the same pattern as the existing parse_u16_env and parse_u32_env functions, maintaining consistency across the codebase.
crates/basilica-payments/src/config.rs (8)
79-91: LGTM! Well-documented blockchain resilience configuration.The new fields provide sensible defaults for blockchain monitoring resilience. The documentation clearly explains each field's purpose, and the defaults align with the
MonitorConfigimplementation shown in the relevant code snippets.
93-107: LGTM! Default functions are correct.The default helper functions return the expected values and match the defaults used in the
PaymentsConfig::default()implementation.
142-149: LGTM! Sensible rate limiting for reconciliation.The
max_sweeps_per_cyclefield with a default of 50 provides reasonable protection against transaction flooding during reconciliation cycles.
259-263: LGTM! Default implementation correctly initialized.The new fields are properly initialized in the
Defaultimplementation with values matching the default helper functions.Also applies to: 288-288
378-396: LGTM! Strong security validation for AEAD key.The validation correctly rejects the default all-zeros AEAD key in production or when reconciliation is enabled, with clear error messages guiding users to set a secure key.
398-411: LGTM! Comprehensive AEAD key format validation.The validation correctly ensures the AEAD key is exactly 64 hexadecimal characters (32 bytes) and contains only valid hex digits. The error messages are clear and informative.
419-438: LGTM! Sound validation logic for blockchain configuration.The validation rules are logically correct:
- Ensures positive values for retry and retention settings
- Ensures
max_block_gap >= block_retention_threshold, which makes sense since the maximum allowed gap should accommodate the retention window.

Error messages are clear and include actual values for debugging.
597-668: LGTM! Comprehensive test coverage for AEAD key validation.The tests thoroughly cover all validation scenarios including:
- Rejection in production and with reconciliation enabled
- Valid key acceptance
- Format validation (length and character constraints)
- Permitted scenarios (development without reconciliation)
Test names are clear and assertions verify both error occurrence and message content.
scripts/cloud/compute.tf (1)
277-284: LGTM! Clear documentation of the naming convention.The rename from
PAYMENTS_AEAD_KEY_HEX to PAYMENTS_TREASURY__AEAD_KEY_HEX correctly reflects the nested config structure, and the comments helpfully explain the double-underscore convention for readers unfamiliar with this pattern.
scripts/cloud/migration.tf (1)
77-84: Confirm the hardcoded old key is intentional.Hardcoding
OLD_AEAD_KEY_HEX as all zeros in the Terraform config is appropriate if this was indeed the incorrectly used default key. However, ensure this is the correct key and consider adding a comment in the code (not just the header) confirming this is the known broken state being migrated from.
scripts/migrations/Dockerfile (1)
31-33: LGTM! Good security practice.Running as a non-root user with explicit ownership is the correct approach for containerized tools, especially those handling sensitive cryptographic material.
scripts/migrations/migrate_aead_key.rs (3)
208-221: Good fallback logic for already-migrated detection.The approach of attempting decryption with the new key when the old key fails is a robust way to handle partially-completed migrations and re-runs. This makes the migration idempotent.
76-89: LGTM! Correct nonce generation.
Using rand::thread_rng().fill_bytes() for nonce generation is appropriate for AES-GCM. The 12-byte nonce with CSPRNG ensures uniqueness.
52-74: Validate nonce length before creating Nonce.
The code correctly validates nonce length at line 63-65 before using it at line 67. This prevents panics from Nonce::from_slice on invalid input.
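For readers unfamiliar with the pattern, here is a minimal sketch of the two practices the comments above call out: 12-byte CSPRNG nonces and an explicit length check before Nonce::from_slice. It assumes the aes-gcm and rand crates; the function names are hypothetical and not taken from migrate_aead_key.rs.

```rust
// Illustrative sketch only, not the migration tool's actual code.
use aes_gcm::aead::Aead;
use aes_gcm::{Aes256Gcm, Nonce};
use rand::RngCore;

fn encrypt_with_fresh_nonce(key: &Aes256Gcm, plaintext: &[u8]) -> ([u8; 12], Vec<u8>) {
    // 12-byte nonce from a CSPRNG, as AES-GCM expects
    let mut nonce = [0u8; 12];
    rand::thread_rng().fill_bytes(&mut nonce);
    let ciphertext = key
        .encrypt(Nonce::from_slice(&nonce), plaintext)
        .expect("AES-GCM encryption failed");
    (nonce, ciphertext)
}

fn decrypt_checked(key: &Aes256Gcm, nonce: &[u8], ciphertext: &[u8]) -> Option<Vec<u8>> {
    // Guard the length first: Nonce::from_slice panics on a wrong-sized slice.
    if nonce.len() != 12 {
        return None;
    }
    key.decrypt(Nonce::from_slice(nonce), ciphertext).ok()
}
```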
50-52: LGTM: Clean accessor for computed reserve.The public accessor provides a clean interface to expose the computed reserve threshold, which can be useful for logging, metrics, or debugging.
crates/basilica-payments/src/storage/reconciliation.rs (1)
136-149: LGTM: Idempotency check implementation is correct.The
get_pending_sweepmethod correctly queries for existing pending or submitted sweeps, ordered by most recent first. This enables idempotency checks in the reconciliation service.crates/basilica-payments/src/reconciliation/service.rs (6)
103-111: LGTM: Pre-cycle reconciliation and monitoring enhance reliability.Adding stale sweep reconciliation and cold wallet monitoring before each cycle improves operational visibility and prevents stuck sweeps from accumulating. Error handling is appropriate—logging warnings without stopping the loop ensures the service remains resilient.
262-277: LGTM: Idempotency check prevents duplicate sweeps.Checking for existing pending/submitted sweeps before initiating a new sweep is essential for preventing duplicate transfers, which could lead to incorrect balances or fund loss.
342-363: LGTM: Dry-run mode correctly simulates post-sweep balance.The dry-run implementation properly simulates the balance after sweep using
saturating_sub(line 356), which prevents underflow. This provides accurate simulation without executing blockchain transactions.
377-410: LGTM: Transfer execution with comprehensive error handling.The blockchain transfer execution and post-transfer balance verification are well-structured:
- Logs transaction hash and block number (lines 379-382)
- Fetches and records post-sweep balance (lines 384-393)
- Gracefully handles balance query failures with a warning (lines 387-392)
- Updates sweep status to Confirmed with all relevant data
436-478: LGTM: Cold wallet monitoring provides valuable operational visibility.The cold wallet balance monitoring:
- Correctly converts SS58 address to hex for balance queries
- Logs balance in both human-readable TAO and raw plancks
- Emits metrics for observability
- Handles errors gracefully with warnings
This is valuable for detecting unexpected balance changes or verifying successful sweeps.
517-536: LGTM: Parse error handling prevents incorrect reconciliation.The explicit error handling for parsing
balance_before_plancksandsweep_amount_plancks(lines 517-536) correctly skips sweeps with corrupted data rather than making incorrect reconciliation decisions. Thecontinuestatements ensure the loop processes remaining sweeps.crates/basilica-operator/src/controllers/job_controller.rs (1)
85-116: GPU node affinity normalization for Jobs looks correctUsing
normalize_gpu_models(&gpu.model)for thebasilica.ai/gpu-modelselector keeps Job scheduling consistent with the label/normalization utilities and with user deployments, while still guarding on"miner","validated", and"jobs"node groups. No issues from a correctness or scheduling perspective.crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)
412-476: GPU node affinity normalization and tests are alignedSwitching the
basilica.ai/gpu-modelselector tonormalize_gpu_models(&gpu.model)and adding tests that cover memory‑suffixed inputs ("A100-40GB","H100-80GB") and duplicate entries ensures that:
- Node affinity always uses canonical short model names (e.g.
"A100","H100"), independent of how the UI formats them.- The selector values are deduplicated, reducing label cardinality and avoiding redundant constraints.
The new tests validate both normalization and deduplication behavior end‑to‑end through
render_deployment, so this change looks good.Also applies to: 1978-2073
crates/basilica-operator/src/k8s_client.rs (4)
8-8: LGTM!Import correctly added to support the new ReplicaSet management methods.
150-158: LGTM!Trait method signatures are well-defined and consistent with existing patterns in the K8sClient trait.
181-181: LGTM!Field follows the established pattern for mock state storage.
1820-1835: LGTM!Rate-limited wrapper correctly delegates to the inner client after acquiring the rate limit permit, consistent with all other methods.
crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (2)
442-449: Offering invalidation on rental failure addresses timing gap.The code now invalidates the offering from the cache when rental fails (lines 442-449), which addresses the past review concern about offering availability timing gaps. This is a good defensive measure.
However, note that cache invalidation only affects the in-memory cache and the offering will reappear after TTL expiry. For offerings that are consistently unavailable, consider whether additional retry backoff or persistent invalidation is needed.
770-789: LGTM! Early label application prevents pod scheduling race.Applying GPU labels immediately when the node is detected (before it becomes Ready) is a smart defensive measure. This prevents a race condition where pods might attempt to schedule during the brief window between node Ready and label application.
The validation logic at lines 792-814 ensures labels are present before transitioning to Ready, providing an additional safety layer.
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (2)
115-178: Excellent provisioning guard prevents duplicate VM creation.The provisioning guard (lines 115-147) counts NodePools in provisioning phases and blocks scale-up if any are in-flight (lines 157-178). This is a critical safety feature that prevents race conditions where multiple reconcile loops could create duplicate VMs while waiting for nodes to register.
The correction logic (lines 139-147) also handles stuck counters gracefully.
1019-1050: LGTM! GPU request detection covers both regular and init containers.The
requests_gpufunction properly checks both regular containers (lines 1026-1036) and init containers (lines 1039-1047) for GPU requests. This ensures that pods with GPU requirements in init containers are correctly identified for autoscaling.crates/basilica-autoscaler/src/provisioner/ssh.rs (8)
16-23: LGTM: Clear documentation on key limitations.The documentation appropriately notes that passphrase-protected keys are not supported when using system SSH, which aligns with the architectural choice to use external SSH binary.
96-124: LGTM: Secure temporary key file handling.The implementation correctly creates a temporary file with 0o600 permissions (SSH requirement) and uses appropriate error handling. The
NamedTempFilewill be automatically cleaned up when dropped.
325-345: LGTM: Secure file upload implementation.
The use of base64 encoding prevents command injection and heredoc delimiter collisions, and sudo tee appropriately handles writes to protected system paths.
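As a hedged illustration of why this pattern is injection-safe, the sketch below shows how such an upload command can be assembled on the client side; it assumes the base64 crate, the function name is hypothetical, and it is not the actual ssh.rs implementation.

```rust
// Hypothetical sketch of the base64-over-SSH upload pattern: the payload
// travels as base64, so the remote shell never interprets its contents, and
// `sudo tee` handles root-owned destination paths.
use base64::engine::general_purpose::STANDARD;
use base64::Engine;

fn build_upload_command(remote_path: &str, contents: &[u8]) -> String {
    let encoded = STANDARD.encode(contents);
    format!(
        "echo '{}' | base64 -d | sudo tee '{}' > /dev/null",
        encoded, remote_path
    )
}
```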
490-577: LGTM: Comprehensive network performance tuning.The sysctl configuration is well-documented and appropriate for a K3s GPU cluster with WireGuard networking. The values are reasonable and the configuration includes proper module loading and error handling.
583-660: LGTM: Proper WireGuard key lifecycle management.The implementation correctly preserves existing keys (good for idempotency), validates key format (44 characters), and sets appropriate file permissions (600 for private key, 644 for public key).
662-742: LGTM: Robust WireGuard peer configuration with IP mismatch handling.The implementation properly handles IP address changes by tearing down the existing interface when a mismatch is detected, and includes verification of the configured peers.
861-896: LGTM: Robust hostname and NVIDIA version detection.The implementation properly handles errors in NVIDIA version detection by falling back to "unknown", and uses
tr -d '[:space:]'to ensure clean version strings. The driver and CUDA versions as node labels will be useful for GPU workload scheduling.
1118-1141: LGTM: Basic tests cover core functionality.The tests validate provisioner construction and key file creation. While additional tests for SSH execution and provisioning methods would be beneficial, those are better suited for integration tests given the external dependencies.
…ic format * Add normalize_gpu_model function for consistent uppercase alphanumeric output * Change normalize_gpu_models to use normalize_gpu_model instead of extract_short_gpu_model * Update extract_nfd_labels to use normalize_gpu_model for consistent labeling * Update test assertions to match new normalized format (e.g., A10040GB vs A100) * This ensures operator and autoscaler use identical normalization logic
…icy CRD * Add WarmPoolConfig with VRAM target, cost limits, and hysteresis thresholds * Add WarmPoolStatus for VRAM accounting (total, allocated, idle VRAM in GB) * Add ImagePrepullConfig and PrepullImage types for container image pre-pulling * Add warm_pool and image_prepull fields to ScalingPolicySpec * Add warm_pool status field to ScalingPolicyStatus * Export new types from crd module
…ng logic * Add calculate_vram_metrics for tracking total/allocated/idle VRAM across warm pool nodes * Add calculate_warm_pool_target to determine target nodes based on VRAM and cost constraints * Add evaluate_warm_pool_scaling with hysteresis thresholds to prevent flapping * Add build_warm_pool_status to construct status for CRD updates * Add gpu_type_to_memory mapping for common GPU models * Add comprehensive unit tests for scaling decisions and threshold behavior
…GC timeout * Add skip_failing_deployments config to prevent provisioning for CrashLoopBackOff pods * Increase max_pending_age_seconds default from 600s to 1200s (20 min) * Add FAILED_GC_TIMEOUT constant (10 min) for auto-cleanup of Failed NodePools * Add BASILICA_AUTOSCALER_SKIP_FAILING_DEPLOYMENTS env var support
* Add warm_pool_total_vram_gb gauge for total VRAM across warm pool nodes * Add warm_pool_allocated_vram_gb gauge for allocated VRAM * Add warm_pool_idle_vram_gb gauge for available idle VRAM * Add warm_pool_target_vram_gb gauge for configured VRAM target * Add warm_pool_idle_nodes and warm_pool_target_nodes gauges * Add warm_pool_cost_per_hour gauge for estimated hourly cost * Add warm_pool_scale_decisions_total counter by decision type * Add set_warm_pool_metrics and record_warm_pool_scale_decision methods
* Add filter_pods_from_failing_deployments to skip pods from troubled deployments * Add is_pod_failing to detect CrashLoopBackOff, ImagePullBackOff, and other errors * Add get_pod_owner_uid to group pods by ReplicaSet owner * Add FAILING_WAIT_REASONS constant for container failure detection * Add comprehensive unit tests for failing deployment filtering
…K8sClient * Add delete_node_pool for automatic NodePool CR cleanup in GC flows * Add list_pods_in_namespace to support failing deployment filtering * These methods enable auto-cleanup of Failed/Deleted NodePools
… scale-down * Add evaluate_and_apply_warm_pool to ScalingPolicyController for proactive capacity * Add warm pool metrics collection and status updates to reconciliation loop * Add failing deployment filter to apply_pending_pod_filters using list_pods_in_namespace * Add idle-time-based scale-down for non-warm-pool nodes * Add handle_failed with auto-cleanup after FAILED_GC_TIMEOUT expires * Add handle_deleted to clean up stuck NodePools from before the fix * Delete NodePool CR in handle_terminating after successful cleanup * Add is_warm_pool parameter to scale_up_with_offering for node labeling * Add collect_pods_by_node, get_offering_rates, get_available_offerings helpers * Add count_warm_pool_nodes, count_pending_warm_pool_nodes, find_idle_nodes_for_scale_down * Add select_warm_pool_offering for offering selection based on GPU preferences
…ests * Add warmPool spec and status fields to ScalingPolicy CRD schema * Add imagePrepull spec field with images array for pre-pulling * Add image-prepuller DaemonSet manifest for GPU nodes * Configure scaling-policy.yaml with warm pool enabled (160GB target, $15/hr limit) * Configure PyTorch and vLLM images for pre-pulling on warm pool nodes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
crates/basilica-autoscaler/src/offering_matcher.rs (2)
133-151: allow_model_fallback merge logic uses OR semantics.
Line 148 uses || which means the merged result is true if either constraint allows fallback. This contradicts the doc comment "preferring self's values where set" and differs from other fields that prefer self. This was flagged in a previous review. The fix should change line 148 to use self.allow_model_fallback to maintain consistent precedence behavior.
665-714: GPU model normalization incompatibility between operator and autoscaler.
The stale affinity filter normalizes both pod requests and node labels using normalize_gpu_model. However, node labels are set by the operator using extract_short_gpu_model, which has different behavior. This can cause false negatives in stale affinity detection. This was flagged in a previous review. The codebase should use a shared normalization function or ensure consistent label formats.
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (1)
1071-1076: Fix invalid chrono format specifier (duplicate issue).
Line 1074 uses %3f, which is not a valid chrono strftime specifier. The correct format for millisecond precision is %.3f (with a leading dot).
🔎 Apply this diff to fix the format specifier:

```diff
 let pool_name = format!(
     "{}-{}-{}",
     policy_name,
-    Utc::now().format("%Y%m%d%H%M%S%3f"),
+    Utc::now().format("%Y%m%d%H%M%S%.3f"),
     i
 );
```

Based on learnings, this issue was previously identified but remains unfixed.
crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)
543-548: The comma delimiter issue for multi-model label values remains unaddressed.
The gpu.model.join(",") still produces invalid Kubernetes label values when multiple models are specified (e.g., A100,H100). Kubernetes label values only allow alphanumerics, -, _, and . characters.
🔎 Apply this diff to use a label-safe delimiter:

```diff
 if !gpu.model.is_empty() {
     labels.insert(
         "gpu-request.basilica.ai/model".to_string(),
-        gpu.model.join(","),
+        gpu.model.join("_"),
     );
 }
```
🧹 Nitpick comments (7)
orchestrator/k8s/services/autoscaler/image-prepuller.yaml (2)
37-71: ConfigMap image-prepuller-config is defined but not used. The images are hardcoded in the initContainers (lines 39, 56) while the ConfigMap (lines 100-102) lists the same images but is never mounted or referenced. This creates a maintenance burden where updates require changes in two places.
Consider either:
- Remove the unused ConfigMap if the static approach is intentional
- Use a controller/operator pattern that reads from the ConfigMap and dynamically creates the pods
89-90: Consider using a lower priority class than system-cluster-critical.
system-cluster-critical is typically reserved for essential cluster components like kube-proxy, CNI plugins, and core system pods. Using it for image pre-pulling means these pods could preempt legitimate user workloads during resource pressure. Consider using a lower priority class such as system-node-critical or a custom priority class with an appropriate value, since image pre-pulling is an optimization rather than a critical system function.
orchestrator/k8s/services/autoscaler/scaling-policy.yaml (1)
108-117: Hardcoded environment-specific values may cause deployment issues.
The sshKeyId and datacenterId contain specific UUIDs that appear to be environment-specific. Users deploying this manifest will need to replace these values. Consider either:

- Using placeholder values with clear documentation (e.g., <YOUR-SSH-KEY-ID>)
- Referencing a template file instead (the summary mentions scaling-policy.template.yaml)

```yaml
# SSH key ID registered with your cloud provider
sshKeyId: "<YOUR-SSH-KEY-ID>"         # Replace with your SSH key UUID
datacenterId: "<YOUR-DATACENTER-ID>"  # Replace with your datacenter ID
```
526-573: Consider extracting common event construction logic.
The create_pod_event and create_node_pool_event methods share significant code. A helper method could reduce duplication:
🔎 Optional refactoring suggestion

```rust
async fn create_event(
    &self,
    ns: &str,
    involved_object: ObjectReference,
    event_type: &str,
    reason: &str,
    message: &str,
    action: &str,
) -> Result<()> {
    use chrono::Utc;
    use kube::api::{Api, PostParams};

    let api: Api<Event> = Api::namespaced(self.client.clone(), ns);
    let now = Utc::now();
    let obj_name = involved_object.name.as_deref().unwrap_or("unknown");
    let event_name = format!("{}.{}", obj_name, now.timestamp_nanos_opt().unwrap_or(0));

    let event = Event {
        metadata: ObjectMeta {
            name: Some(event_name),
            namespace: Some(ns.to_string()),
            ..Default::default()
        },
        involved_object,
        type_: Some(event_type.to_string()),
        reason: Some(reason.to_string()),
        message: Some(message.to_string()),
        event_time: Some(MicroTime(now)),
        reporting_component: Some("basilica-autoscaler".to_string()),
        reporting_instance: Some(
            std::env::var("HOSTNAME").unwrap_or_else(|_| "autoscaler".to_string()),
        ),
        action: Some(action.to_string()),
        count: Some(1),
        ..Default::default()
    };

    api.create(&PostParams::default(), &event).await?;
    Ok(())
}
```

This is optional since the current implementation works correctly.
crates/basilica-autoscaler/src/warm_pool.rs (1)
253-260: Normalize function differs from offering_matcher's normalize_gpu_model.
This normalize_gpu_type function filters to alphanumeric only, while offering_matcher::normalize_gpu_model does the same. They appear functionally identical, but having two separate implementations could lead to drift. Consider re-exporting the normalization function from offering_matcher to ensure consistent behavior across modules:

```diff
-/// Normalize GPU type for comparison (uppercase, alphanumeric only)
-fn normalize_gpu_type(gpu_type: &str) -> String {
-    gpu_type
-        .chars()
-        .filter(|c| c.is_ascii_alphanumeric())
-        .collect::<String>()
-        .to_uppercase()
-}
+use crate::offering_matcher::normalize_gpu_model as normalize_gpu_type;
```
366-428: Consider caching pods_by_node to avoid redundant API calls.
If warm pool evaluation ran (lines 338-363), collect_pods_by_node is called again at line 367 for idle scale-down. This results in duplicate Kubernetes API calls to list pods on every node.
🔎 Consider refactoring to reuse pods_by_node: collect pods_by_node once early in the reconciliation flow and pass it to both warm pool and idle scale-down logic to reduce API overhead.
549-552: Consider logging when GPU spec serialization fails.
While serialization of GpuSpec is unlikely to fail, silent failures could make autoscaler issues difficult to diagnose.
🔎 Suggested improvement:

```diff
 // Annotation with full GPU requirements (JSON) for autoscaler
-if let Ok(json) = serde_json::to_string(gpu) {
-    annotations.insert("autoscaler.basilica.ai/gpu-requirements".to_string(), json);
-}
+match serde_json::to_string(gpu) {
+    Ok(json) => {
+        annotations.insert("autoscaler.basilica.ai/gpu-requirements".to_string(), json);
+    }
+    Err(e) => {
+        debug!(error = %e, "Failed to serialize GPU requirements for autoscaler annotation");
+    }
+}
```
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
- crates/basilica-autoscaler/src/config.rs (5 hunks)
- crates/basilica-autoscaler/src/controllers/k8s_client.rs (6 hunks)
- crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (17 hunks)
- crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (13 hunks)
- crates/basilica-autoscaler/src/crd/mod.rs (1 hunks)
- crates/basilica-autoscaler/src/crd/scaling_policy.rs (7 hunks)
- crates/basilica-autoscaler/src/lib.rs (1 hunks)
- crates/basilica-autoscaler/src/metrics.rs (5 hunks)
- crates/basilica-autoscaler/src/offering_matcher.rs (1 hunks)
- crates/basilica-autoscaler/src/warm_pool.rs (1 hunks)
- crates/basilica-operator/src/controllers/node_profile_controller.rs (3 hunks)
- crates/basilica-operator/src/controllers/user_deployment_controller.rs (7 hunks)
- crates/basilica-operator/src/labels.rs (8 hunks)
- orchestrator/k8s/services/autoscaler/crds.yaml (10 hunks)
- orchestrator/k8s/services/autoscaler/image-prepuller.yaml (1 hunks)
- orchestrator/k8s/services/autoscaler/scaling-policy.yaml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- crates/basilica-operator/src/controllers/node_profile_controller.rs
🚧 Files skipped from review as they are similar to previous changes (1)
- crates/basilica-autoscaler/src/lib.rs
🧰 Additional context used
🧬 Code graph analysis (3)
crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)
crates/basilica-operator/src/labels.rs (1)
normalize_gpu_models(222-229)
crates/basilica-autoscaler/src/warm_pool.rs (1)
crates/basilica-autoscaler/src/api/secure_cloud.rs (2)
gpu_memory_gb(216-218)hourly_rate(221-223)
crates/basilica-autoscaler/src/metrics.rs (1)
crates/basilica-billing/src/domain/rentals.rs (1)
offering_id(223-225)
🔇 Additional comments (80)
orchestrator/k8s/services/autoscaler/image-prepuller.yaml (3)
1-10: LGTM!Metadata and labels follow Kubernetes recommended labeling conventions with proper
app.kubernetes.io/*annotations.
11-36: LGTM!The update strategy with
maxUnavailable: 100%is appropriate for a pre-pulling DaemonSet where availability doesn't matter. Security context is well-hardened with non-root user, seccomp profile, and proper group settings.
92-102: Remove the unused ConfigMap definition (lines 92-102).The ConfigMap
image-prepuller-configis not referenced by the DaemonSet or any other resource in the codebase. It duplicates the container images already specified in the initContainers section and serves no purpose.Likely an incorrect or invalid review comment.
crates/basilica-autoscaler/src/crd/mod.rs (1)
7-13: LGTM!The new CRD type exports (
WireGuardPeerStatus,ImagePrepullConfig,PrepullImage,WarmPoolConfig,WarmPoolStatus) are properly organized and align with the dynamic GPU offering and warm-pool features introduced in this PR.orchestrator/k8s/services/autoscaler/scaling-policy.yaml (2)
119-128: Verify the K3s server URL is the intended default.The
serverUrlpoints toconnect.basilica.ai:6443. Ensure this is the correct default endpoint for your deployment environment. Users deploying to different clusters will need to update this value.
66-89: Well-designed warm pool configuration with proper hysteresis.The warm pool configuration includes appropriate safeguards:
- Cost cap (
maxIdleCostPerHour: 15.0) prevents runaway spending- Hysteresis thresholds (80% up, 120% down) prevent scaling oscillation
- Clear documentation for each parameter
Good default settings for production use.
crates/basilica-autoscaler/src/config.rs (3)
172-213: Well-structured configuration with sensible defaults.The
PendingPodFilterConfigis cleanly implemented with:
- Consistent pattern with other config structs (environment-based with defaults)
Defaultimplementation that mirrorsfrom_env()defaults (good for testing)- Clear documentation on each field's purpose
The 20-minute max pending age is a reasonable safeguard against infinite retry loops.
253-258: LGTM!The
parse_u64_envhelper follows the established pattern of other env parsing functions in this file.
227-228: LGTM - reasonable timeout for failed NodePool cleanup.The 10-minute timeout provides adequate time for operators to investigate failures before automatic garbage collection, while preventing indefinite accumulation of failed resources.
crates/basilica-autoscaler/src/controllers/k8s_client.rs (3)
477-524: Event creation implementation looks correct.The implementation properly constructs Kubernetes Events with appropriate fields. A few observations:
- Using
timestamp_nanos_opt().unwrap_or(0)is a safe fallback for event name uniqueness.- The
HOSTNAMEenv var fallback to"autoscaler"is sensible forreporting_instance.
214-219: LGTM!The
delete_node_poolimplementation follows the established pattern in this file.
430-436: LGTM!The
list_pods_in_namespaceimplementation is correct and follows the existing pattern.crates/basilica-autoscaler/src/metrics.rs (4)
43-64: Comprehensive metrics coverage for the new features.The new metrics provide good observability for:
- Offering selection performance and outcomes
- Cache hit/miss rates for debugging performance issues
- Warm pool VRAM utilization and cost tracking
- Scaling decision visibility
The metric naming follows Prometheus conventions consistently.
556-579: LGTM!The
set_warm_pool_metricsmethod cleanly updates all warm pool gauges from the status struct. Using a single method to update all related metrics ensures consistency in the metric snapshot.
220-227: Histogram buckets look appropriate for typical cache/selection operations.The buckets (1ms to 1s) cover the expected range for in-memory cache lookups and quick API calls. If offering selection involves slower external API calls, you may want to add higher buckets (e.g., 2.0, 5.0) in the future based on observed p99 latencies.
201-209: Monitor for high cardinality on the offering_id label.
The offering_selection_total metric includes offering_id as a label. If the number of unique offering IDs grows significantly, this could cause metric cardinality issues in Prometheus. For GPU cloud offerings, this is likely fine (tens of unique IDs), but worth monitoring. If cardinality becomes problematic, consider:

- Removing offering_id from the counter and tracking it separately
- Using a histogram with offering metadata in a separate info metric
crates/basilica-autoscaler/src/warm_pool.rs (8)
9-17: Threshold calculation logic is sound.
The calculate_thresholds function correctly uses saturating conversion to prevent overflow when computing hysteresis thresholds from percentage values. The approach of clamping to u32::MAX before casting is appropriate for preventing wraparound on very large VRAM values.
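For reference, a hedged sketch of this kind of hysteresis-threshold computation; the names and exact arithmetic are illustrative, not the crate's actual signature.

```rust
// Illustrative only: compute scale-up / scale-down VRAM thresholds from
// percentage bands around a target, with saturating math to avoid overflow.
fn hysteresis_thresholds(target_vram_gb: u64, up_pct: u64, down_pct: u64) -> (u64, u64) {
    let scale_up_below = target_vram_gb.saturating_mul(up_pct) / 100;
    let scale_down_above = target_vram_gb.saturating_mul(down_pct) / 100;
    // e.g. a 160 GB target with an 80%/120% band gives (128, 192)
    (scale_up_below, scale_down_above)
}
```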
59-143: VRAM accounting logic is well-structured.The
calculate_vram_metricsfunction correctly:
- Filters only
Readynodes with the warm-pool label- Calculates total VRAM from GPU memory and count
- Tracks allocated VRAM by summing pod GPU requests per node
- Identifies idle nodes (zero allocated GPUs)
- Uses
saturating_subto prevent underflow when computing idle VRAM
145-183: Init container GPU handling correctly uses max instead of sum.
The count_pod_gpu_requests function properly handles init containers by taking the maximum GPU count rather than summing, since init containers run sequentially and don't consume GPUs concurrently. This matches Kubernetes resource accounting semantics.
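A small sketch of that accounting rule for readers less familiar with it (illustrative names, not the actual count_pod_gpu_requests signature):

```rust
// Kubernetes effective pod request: regular containers run together (sum),
// init containers run one at a time (max), and the pod needs the larger of
// the two. Illustrative sketch only.
fn effective_gpu_request(regular: &[u64], init: &[u64]) -> u64 {
    let concurrent: u64 = regular.iter().sum();
    let init_peak: u64 = init.iter().copied().max().unwrap_or(0);
    concurrent.max(init_peak)
}
```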
210-219: Offering comparison uses total node cost, not per-GPU cost.The sorting correctly computes
total_hourly_rate = hourly_rate_per_gpu * gpu_countfor comparison. This ensures offerings are compared by their actual cost impact on the warm pool budget rather than just per-GPU rates.
221-235: Consider edge case: vram_per_node could be zero.
If gpu_memory_gb() returns 0 (the default when gpu_memory_gb_per_gpu is None), the loop will never meet the VRAM target and will be constrained only by max_idle_nodes and cost limits. This may be intentional but could lead to unexpected behavior. Consider adding a guard or debug log when vram_per_node == 0:
🔎 Suggested defensive check

```diff
 if let Some(best_offering) = offerings.first() {
     let vram_per_node = best_offering.gpu_memory_gb() * best_offering.gpu_count;
     let total_hourly_rate = best_offering.hourly_rate_per_gpu * best_offering.gpu_count as f64;
+
+    if vram_per_node == 0 {
+        debug!("Warning: best offering has 0 VRAM per node, cannot meet VRAM target");
+        return WarmPoolTarget {
+            nodes: 0,
+            total_vram_gb: 0,
+            estimated_hourly_cost: 0.0,
+        };
+    }

     while total_vram < target_vram_gb
```
357-360: Node removal selection is simplistic but safe.
select_node_for_removalreturns the first idle node. This is simple but may not be optimal for cost savings (e.g., prefer removing the most expensive idle node). However, given the conservative one-at-a-time scale-down approach at line 315, this is acceptable.
384-400: GPU memory mapping covers common types but may need updates.The
gpu_type_to_memorylookup table includes major GPU types (H100, A100, L40S, RTX4090, etc.) with their correct memory sizes. The fallback to 24GB is conservative.As new GPU types are released, this table may need updates. Consider whether this mapping should be driven by the offerings API data instead of hardcoded values.
402-634: Comprehensive test coverage for warm pool logic.The test suite covers:
- Preference ordering and normalization
- GPU type to memory mappings
- Disabled warm pool behavior
- Scaling decisions (up, down, no-action in hysteresis band)
- Deference to reactive scaling when pending pods exist
- Status building
crates/basilica-autoscaler/src/crd/scaling_policy.rs (4)
45-59: New optional configuration fields properly integrated.The
warm_pool,image_prepull, andoffering_constraintsfields are correctly:
- Marked as
Option<T>for backward compatibility- Using
skip_serializing_if = "Option::is_none"to avoid cluttering serialized output- Using
#[serde(default)]to handle deserialization of older CRs
158-225: WarmPoolConfig has well-chosen defaults.The default values are sensible:
min_idle_vram_gb: 80- equivalent to 1x A100-80GBmax_idle_nodes: 3- reasonable upper boundmax_idle_cost_per_hour: 10.0- prevents runaway costs- Hysteresis band of 80-120% provides good stability
269-315: SecureCloudTemplate properly supports dynamic offering selection.Making
offering_idoptional (line 277) with updated documentation correctly enables dynamic offering selection based on pending pod requirements. The newdatacenter_idandssh_userfields with defaults complete the provisioning configuration.
399-442: WarmPoolStatus captures complete VRAM accounting state.The status struct includes all metrics needed for observability:
- Current VRAM state (total, allocated, idle)
- Target and thresholds
- Node counts and names
- Cost estimation
This enables operators to debug warm pool behavior via
kubectl get scalingpolicy -o yaml.orchestrator/k8s/services/autoscaler/crds.yaml (7)
222-229: NewsshUserfield correctly added to NodePool secureCloud.The field has the correct default value (
ubuntu) matching the Rust struct definition, and appropriate description.
276-285: WireGuard default changed from false to true.This is a potentially breaking change for existing deployments. Nodes created after this change will have WireGuard enabled by default, which may fail if the WireGuard infrastructure isn't set up.
Verify that:
- Existing NodePool CRs without explicit
wireguard.enabledwon't have their behavior changed (Kubernetes preserves existing values)- Documentation is updated to reflect this default change
- The WireGuard setup is robust enough to handle being enabled by default
384-387:labelsAppliedstatus field enables idempotent label management.This boolean flag prevents race conditions between node readiness and GPU label application. The description is clear about its purpose.
482-504: WireGuard peers array structure is complete.The peer status includes all necessary fields:
endpointandpublicKey(required)wireguardIpandvpcSubnet(required for routing)routePodNetwork(optional, defaults to false)
716-774: SecureCloudTemplate in ScalingPolicy correctly mirrors NodePool additions.The
datacenterId,offeringId, andsshUserfields are properly added with matching types and defaults. Note thatdatacenterIdis now required (line 771), which is appropriate for the ScalingPolicy template.
873-916: WarmPool CRD spec matches Rust struct definition.All fields, types, defaults, and descriptions match the
WarmPoolConfigstruct inscaling_policy.rs. The JSON schema validation constraints (minimum: 0.0for integer fields) are correctly applied.
1030-1093: WarmPoolStatus in CRD matches Rust struct definition.All status fields are present with correct types and defaults. This enables
kubectlinspection of warm pool state.crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (15)
21-101: GenericOfferingSelectorintegration is well-designed.The controller now properly supports optional offering selection through:
- Generic type parameter
Swith default()- Separate
new()constructor for non-offering-selector usewith_offering_selector()for full capabilityMaybeOfferingSelectortrait bound for conditional invalidationThis maintains backward compatibility while enabling dynamic offering selection.
103-120: Offering invalidation on rental failure is a good resilience pattern.The
invalidate_offering_on_failuremethod removes failed offerings from cache to prevent retry loops with broken offerings. The method gracefully handles the case when no offering selector is present.
142-148: Phase timeout check correctly skips infinite-timeout phases.Using
u64::MAXas a sentinel for "no timeout" and explicitly checking for it prevents spurious timeouts on Ready, Failed, and Deleted phases.
356-400: GPU metadata persistence before rental prevents data loss.Fetching offering details and persisting to status immediately (line 396-399) ensures GPU metadata survives process crashes. This is critical for correct node labeling after recovery.
Note: The past review comment about "timing gap between fetching offering metadata and starting rental" is addressed by the offering invalidation on failure mechanism, so the concern is mitigated.
402-475: Rental handling correctly prevents duplicate rentals.The flow:
- Check
rental_idfirst before creating new rental (line 404)- Persist
rental_idimmediately after creation (line 460-462)- Handle async IP assignment with retry logic
This prevents orphaned VMs on crash/retry scenarios.
512-541: Node registration includes GPU specs from status.The
gpu_specsstruct correctly uses status values with sensible defaults for driver/CUDA versions. The WireGuard peer information is properly extracted and converted to status format.
633-651: WireGuard peer configuration uses registration response.Converting
WireGuardPeerStatusfrom status toWireGuardPeerfor provisioner configuration ensures peers are configured from the authoritative source (API registration response) rather than a separate API call.The warning at line 647 when peers are empty is helpful for debugging connectivity issues.
765-854: GPU label application race condition is properly addressed.The implementation:
- Applies labels immediately when node is detected (before Ready)
- Tracks
labels_appliedflag in status- Validates labels before transitioning to Ready
- Re-applies if validation fails
This prevents pods from failing to schedule during the brief window between node Ready and label application.
876-885: Label re-application in Ready phase handles recovery scenarios.Verifying and re-applying labels on each Ready reconciliation handles cases where labels were removed (e.g., during Unhealthy recovery or manual intervention).
1030-1082: Failed NodePool GC with configurable timeout.The
handle_failedimplementation:
- Waits for
FAILED_GC_TIMEOUTbefore cleanup- Stops rental to prevent cost leakage
- Deregisters node from API
- Deletes K8s node if created
- Deletes NodePool CR
This prevents stuck Failed resources from accumulating while giving operators time to investigate.
1084-1123: handle_deleted cleans up stuck resources.This handles NodePools that reached Deleted phase but weren't removed (pre-fix). The idempotent cleanup ensures resources are freed even for legacy CRs.
1402-1420: Label sanitization correctly handles Kubernetes constraints.The
sanitize_label_valuefunction:
- Replaces invalid characters with
-- Trims leading/trailing non-alphanumeric characters
- Produces valid Kubernetes label values
1422-1470: GPU label building is comprehensive.The
build_gpu_labelsfunction correctly:
- Sets required
node-type: gpulabel- Normalizes GPU model using
normalize_gpu_model- Sanitizes datacenter value for label compliance
- Prefers status values over spec values for dynamic provisioning
1472-1485: Label comparison correctly detects missing or mismatched labels.The
labels_need_reapplicationfunction checks both existence and value equality, which is necessary for detecting partial label application or value drift.
1561-1676: Comprehensive test coverage for label reapplication logic.Tests cover:
- All labels present and matching
- Missing labels
- Mismatched values
- Empty expected labels
- Node with no labels
crates/basilica-autoscaler/src/offering_matcher.rs (13)
16-29: Node label constants follow existing conventions.The
node_labelsmodule defines constants matching thebasilica.ai/prefix convention used throughout the codebase. Including both standard labels (GPU_MODEL, GPU_COUNT) and autoscaler-specific ones (OFFERING_ID) is appropriate.
42-51: Normalization removes all non-alphanumeric characters.The
normalize_gpu_modelfunction converts to uppercase and strips non-alphanumerics. This provides consistent matching but note that "A100-40GB" becomes "A10040GB" which differs from how the operator'sextract_short_gpu_modelwould handle it.
153-174: MaybeOfferingSelector trait enables optional selector pattern.The blanket implementation for
()(no-op) and the forwarding implementation forT: OfferingSelectorprovides a clean abstraction for controllers that may or may not have an offering selector.
269-305: Cache implementation uses double-checked locking correctly.The
get_offeringsmethod:
- Fast path: read lock, check TTL
- Slow path: write lock, re-check TTL (prevents thundering herd)
- Fetch with timeout
- Update cache
This is a correct and efficient caching pattern.
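For readers unfamiliar with the pattern, a minimal sketch of a double-checked TTL cache over tokio's RwLock follows; the types and method names are illustrative, not the offering_matcher API.

```rust
// Illustrative double-checked TTL cache: readers take the cheap path; only
// one writer refreshes after expiry because the TTL is re-checked under the
// write lock. Not the actual offering_matcher implementation.
use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::RwLock;

struct TtlCache<T> {
    slot: RwLock<Option<(Arc<Vec<T>>, Instant)>>,
    ttl: Duration,
}

impl<T> TtlCache<T> {
    async fn get_or_fetch<F, Fut>(&self, fetch: F) -> Arc<Vec<T>>
    where
        F: FnOnce() -> Fut,
        Fut: std::future::Future<Output = Vec<T>>,
    {
        // Fast path: read lock only.
        if let Some((items, stored_at)) = self.slot.read().await.as_ref() {
            if stored_at.elapsed() < self.ttl {
                return items.clone();
            }
        }
        // Slow path: take the write lock and re-check before refreshing.
        let mut guard = self.slot.write().await;
        if let Some((items, stored_at)) = guard.as_ref() {
            if stored_at.elapsed() < self.ttl {
                return items.clone();
            }
        }
        let fresh = Arc::new(fetch().await);
        *guard = Some((fresh.clone(), Instant::now()));
        fresh
    }
}
```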
307-327: Forward-prefix matching only is a safe choice.The
matches_modelmethod intentionally only supports:
- Exact match: "A100" == "A100"
- Forward prefix: "A100" matches "A100SXM4"
Not supporting reverse prefix (offering "A100" matching request "A100SXM4") prevents incorrect node selection.
385-400: Cache invalidation uses CoW pattern efficiently.The
invalidate_offeringmethod:
- Checks if offering exists (fast path)
- Uses
Arc::try_unwrapto avoid clone when possible- Retains non-matching offerings
This minimizes allocations for the common case.
417-424:is_cache_stalecorrectly handles lock contention.Returning
falsewhen lock can't be acquired prevents thundering herd - if someone else holds the write lock, they're likely refreshing the cache.
427-445: Offering comparison prioritizes exact GPU count match.The
compare_offeringsfunction correctly:
- Prefers exact GPU count match
- Then cheaper hourly rate
- Then lower GPU count (avoid over-provisioning)
- Finally by ID for determinism
Using
total_cmp handles NaN/infinity correctly.
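A hedged sketch of a comparator following that ordering; the Offering fields here are assumptions for illustration, not the crate's exact struct.

```rust
// Illustrative comparator only: exact GPU-count matches sort first, then the
// cheaper rate (total_cmp gives a total order even for NaN), then the smaller
// node, then the ID as a deterministic tie-breaker.
use std::cmp::Ordering;

struct Offering {
    id: String,
    gpu_count: u32,
    hourly_rate_per_gpu: f64,
}

fn compare_offerings(a: &Offering, b: &Offering, requested: u32) -> Ordering {
    let exact = |o: &Offering| o.gpu_count == requested;
    exact(b)
        .cmp(&exact(a))
        .then(a.hourly_rate_per_gpu.total_cmp(&b.hourly_rate_per_gpu))
        .then(a.gpu_count.cmp(&b.gpu_count))
        .then_with(|| a.id.cmp(&b.id))
}
```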
504-547:has_gpu_node_affinitychecks both required and preferred terms.The function correctly examines both
requiredDuringSchedulingIgnoredDuringExecutionandpreferredDuringSchedulingIgnoredDuringExecutionfor GPU-related labels.
717-760: Failing pod detection covers common failure reasons.The
FAILING_WAIT_REASONSlist includes:
- CrashLoopBackOff
- ImagePullBackOff / ErrImagePull
- CreateContainerError / CreateContainerConfigError
- InvalidImageName
This prevents over-provisioning for deployments with fundamental issues.
774-807: Failing deployment filter prevents wasted node provisioning.The
filter_pods_from_failing_deploymentsfunction uses owner UID matching to identify sibling pods from the same ReplicaSet. This prevents provisioning new nodes for deployments already known to be broken.
831-852: Node calculation uses ceiling division correctly.The
calculate_nodes_neededfunction:
- Guards against division by zero
- Uses
div_ceil for correct node count calculation
- Warns when offering has fewer GPUs than pod requests
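For concreteness, a small sketch of that calculation (parameter names are illustrative, not the reviewed function's signature):

```rust
// Illustrative only: ceiling division with a zero guard, e.g. 5 requested
// GPUs on 4-GPU nodes -> 2 nodes; a 0-GPU offering yields 0 instead of
// dividing by zero.
fn nodes_needed(total_gpus_requested: u64, gpus_per_node: u64) -> u64 {
    if gpus_per_node == 0 {
        return 0;
    }
    total_gpus_requested.div_ceil(gpus_per_node)
}
```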
854-1509: Comprehensive test suite covers edge cases.The test module covers:
- Normalization behavior
- Requirements hashing/equality
- Offering comparison
- Node calculation scenarios
- Constraint filtering
- Cache operations
- Age/stale affinity/failing deployment filters
The tests use well-structured helper functions and mock implementations.
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (11)
31-96: LGTM: Clean generic parameterization and constructor design.The addition of the
OfferingSelectorgeneric parameter andpending_pod_filterfield is well-integrated. The dual constructor pattern (newwith defaults +with_pending_pod_filterfor customization) provides good flexibility.
120-183: Excellent safeguards against duplicate provisioning.The provisioning count tracking (lines 120-152) correctly identifies NodePools in transitional phases and resets stuck counters. Blocking scale-up when
provisioning_count > 0(lines 162-183) prevents creating duplicate VMs while waiting for node registration—a critical reliability improvement.
240-275: Good error handling with status update.The code correctly updates the
OfferingAvailabilitycondition before returning theNoMatchingOfferingerror (lines 265-267), ensuring status reflects the failure reason for diagnostics.
437-608: LGTM: Well-structured warm pool evaluation.The warm pool scaling logic is clear and correct:
- Properly skips evaluation when nodes are provisioning (lines 473-492)
- Handles missing offerings gracefully (line 514)
- Correctly differentiates warm pool nodes from reactive nodes via labeling
- Updates status conditions consistently across all decision paths
610-727: LGTM: Efficient metrics collection with clear filter pipeline.The metrics collection correctly:
- Returns both the snapshot and raw filtered pods for downstream processing
- Conditionally fetches namespace pods only when
skip_failing_deploymentsis enabled (lines 621-635)- Applies filters in a sensible order with clear logging of filter impact (lines 715-724)
806-1015: LGTM: Robust dynamic offering selection with multi-tier fallback.The scale-up logic correctly:
- Supports both static (offering_id in config) and dynamic (selector-based) modes
- Refreshes offering cache when stale (lines 873-880)
- Implements a 3-tier fallback strategy (exact match → model fallback → explicit fallback_offering_id) with clear diagnostics
- Respects max_count across all pod groups
- Properly emits events and metrics for observability
1137-1228: LGTM: Clear event emission for user visibility.The event emission methods provide valuable user feedback:
emit_missing_affinity_warnings(lines 1137-1183) alerts users when GPU model requests lack nodeAffinity, preventing scheduling confusionemit_no_offering_events(lines 1185-1228) informs users why their pods remain pending- Both handle errors gracefully with warning logs rather than propagating failures
1230-1254: LGTM: Correct constraint merging with policy precedence.The
build_effective_constraintsmethod properly merges template-level and policy-level constraints, with policy constraints taking precedence when both are set. This provides flexibility while maintaining a clear override hierarchy.
1303-1461: LGTM: Well-implemented warm pool helper functions.The warm pool helpers are correctly implemented:
count_warm_pool_nodesandcount_pending_warm_pool_nodesuse consistent labeling and phase checksfind_idle_nodes_for_scale_downproperly excludes warm pool nodes and checks idle duration + pod presenceselect_warm_pool_offeringimplements sensible preference ordering (preferred types → cost)
1463-1518: LGTM: Correct NodePool creation with warm-pool support.The
create_node_pool_from_templatemethod correctly:
- Applies warm-pool label conditionally based on
is_warm_poolparameter- Propagates
offering_idto the SecureCloudConfig- Extracts and sets
datacenter_idfrom the template when available- Maintains proper labeling for autoscaler management
1528-1588: LGTM: Clean helper types and functions.
OfferingResultenum clearly represents the three possible outcomes of offering selectionrequests_gpucorrectly checks both regular and init containers for GPU requestsadd_conditionfollows the standard Kubernetes pattern for updating or appending conditionscrates/basilica-operator/src/controllers/user_deployment_controller.rs (2)
1055-1133: LGTM – well-designed stale ReplicaSet cleanup.The logic correctly identifies and scales down old ReplicaSets that have pending pods but no ready replicas, which prevents the autoscaler from provisioning nodes for outdated pod specs. Error handling is non-blocking with appropriate debug logging.
1978-2074: Good test coverage for GPU model normalization and deduplication.The tests appropriately verify that:
- Model names with memory suffixes are normalized (e.g.,
A100-40GB→A10040GB)- Case-insensitive deduplication works correctly
- Order is preserved (first occurrence wins)
crates/basilica-operator/src/labels.rs (4)
142-165: Well-implemented normalization function.The implementation correctly:
- Trims whitespace
- Returns a sensible default for empty/whitespace input
- Normalizes to uppercase alphanumeric for consistent matching
222-229: LGTM – correct deduplication with order preservation.
The implementation properly deduplicates normalized models while preserving order (first occurrence wins). The clone() in the filter is necessary for HashSet ownership and is acceptable for typical small model lists.
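A minimal sketch of that normalize-then-dedup behaviour (the bodies here are illustrative, not the reviewed implementation):

```rust
// Illustrative only: uppercase-alphanumeric normalization followed by
// order-preserving deduplication, where the first occurrence wins.
use std::collections::HashSet;

fn normalize_gpu_model(model: &str) -> String {
    model
        .chars()
        .filter(|c| c.is_ascii_alphanumeric())
        .collect::<String>()
        .to_uppercase()
}

fn normalize_gpu_models(models: &[String]) -> Vec<String> {
    let mut seen = HashSet::new();
    models
        .iter()
        .map(|m| normalize_gpu_model(m))
        .filter(|m| seen.insert(m.clone()))
        .collect()
}
```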
478-556: Comprehensive test coverage for normalization functions.The tests appropriately cover:
- Case-insensitive matching in
extract_short_gpu_model- Edge cases for
normalize_gpu_model(empty, whitespace, various formats)- Deduplication, order preservation, and case-insensitive handling in
normalize_gpu_models
310-314: Verify GPU model label normalization strategy across operator and autoscaler.The code at lines 310-314 uses
normalize_gpu_model()to set node labels, converting"Tesla-A100-SXM4-80GB"→"TESLAA100SXM480GB". However, in user_deployment_controller.rs, pod affinity rules normalize user-specified models the same way, so user input"A100"remains"A100".This creates a label mismatch:
- Node label value:
TESLAA100SXM480GB- Pod affinity value:
A100The
extract_short_gpu_model()function exists but is unused. Confirm whether:
- This labeling strategy is intentional (users must request full normalized names)
- The offering_matcher in basilica-autoscaler is the actual matching mechanism (which normalizes both sides using the same function)
- Node labels should instead use
extract_short_gpu_model()for backward compatibility
* Format debug/info/error macro calls with multi-line arguments * Use is_some_and instead of map_or for clearer Option checking * Format method chains and string literals for readability * No functional changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
crates/basilica-payments/src/reconciliation/service.rs (3)
483-587: Balance heuristics remain insufficient for confirming sweep finalization.The stale sweep reconciliation still marks sweeps as
Confirmed using only an 80% balance-drop threshold. As previously noted, this approach doesn't verify transaction finalization or query the blockchain for tx_hash status. In fork/reorg scenarios, a transaction could revert after the balance temporarily changed, yet the code would still mark it as confirmed. The previous review recommended storing finalization status when transfers are submitted, or querying the blockchain client via tx_hash to verify finalization before marking a sweep as Confirmed.
262-277: Idempotency check helps but doesn't eliminate TOCTOU.The check for pending sweeps reduces duplicate sweep attempts. However, the TOCTOU window persists between creating the sweep record (lines 328-344) and executing the transfer (lines 374-378). The past review recommended account-level locking or database constraints to fully close this gap.
295-326: Balance re-verification narrows but doesn't eliminate TOCTOU.The balance re-check before creating the sweep record is an improvement, but as previously noted, a race condition remains between verification and the blockchain transfer execution. Between creating the sweep record and executing the transfer, deposits or other transactions could alter the balance, causing transfer failures or over-sweeping.
🧹 Nitpick comments (8)
crates/basilica-payments/src/config.rs (1)
426-445: Validation logic is sound; consider wording consistency. The validation correctly ensures:
- Positive values for retry counts and retention thresholds
- Logical relationship between max_block_gap and block_retention_threshold

Minor observation: Line 441 uses "should be" while other validation messages use "must be". Since this is a hard validation error (returns Err), consider using "must be" for consistency with other validation messages.

🔎 Optional wording adjustment for consistency:

```diff
         details: format!(
-            "blockchain.max_block_gap ({}) should be >= block_retention_threshold ({})",
+            "blockchain.max_block_gap ({}) must be >= block_retention_threshold ({})",
             self.blockchain.max_block_gap, self.blockchain.block_retention_threshold
         ),
```

crates/basilica-payments/src/blockchain/client.rs (2)
175-208: LGTM! Proper finalization flow with transaction success verification. The sequence of waiting for finalization, extracting block info, and verifying success via wait_for_success is the correct pattern for ensuring transaction integrity.
Minor note: Lines 187-188 use .ok() to silently convert block query failures to None. This is acceptable since block_number is optional, but consider logging a debug/warning if the block query fails to aid troubleshooting.
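A small generic sketch of the suggested logging (not the crate's API), keeping the Option semantics while surfacing the failure:

```rust
// Log at debug level instead of discarding the error with `.ok()`.
fn ok_or_log<T, E: std::fmt::Display>(result: Result<T, E>, what: &str) -> Option<T> {
    match result {
        Ok(value) => Some(value),
        Err(err) => {
            tracing::debug!("{} failed: {}; continuing without it", what, err);
            None
        }
    }
}
```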
102-124: Consider adding context about version compatibility for the Debug format fallback. The string-based fallback parsing (lines 102-124) uses format!("{:?}", value) and relies on unstable Debug representation. Derived Debug formats may change with future Rust versions. While the error handling is appropriately defensive, add a comment explaining which chain versions require this fallback or what specific metadata mismatch it accommodates. This provides clarity for future maintenance and indicates whether updating chain metadata would eliminate this workaround.
402-475: Consider handling externally deleted rentals. The logic correctly prevents duplicate rentals and handles IP polling. However, if a rental exists in status but get_rental returns None (lines 430-434), this could indicate the rental was deleted externally (e.g., manually via provider console). Currently, this returns an error for retry, but the rental may never reappear.
Consider transitioning to Failed phase in this scenario, or implement a retry counter to detect persistent rental absence.
488-504: GPU spec defaults may cause scheduling issues. The defaults (model="GPU", memory_gb=0) could lead to pod scheduling failures if the status doesn't contain the expected GPU metadata. However, since the offering fetch at lines 356-400 should populate these fields, the defaults should rarely be used.
Consider adding debug or warn logging when defaults are applied to detect potential bugs in the offering fetch flow.
553-578: Consider extracting duplicate IP polling logic. The IP polling logic here is nearly identical to lines 404-437 in handle_provisioning. Consider extracting a helper method like ensure_external_ip(&self, ns: &str, pool: &NodePool, status: &mut NodePoolStatus) -> Result<String> to reduce duplication and simplify maintenance.
764-814: Thorough but complex labeling logic. The two-stage label application (early application on detection + validation before Ready transition) correctly prevents the race condition described in comments. However, the deeply nested conditionals make this section complex.
Consider extracting the labeling logic into helper methods like apply_node_labels_if_needed and validate_labels_before_ready to improve readability while preserving the correctness guarantees.
1397-1415: Handle edge case: empty label values. If the input contains no alphanumeric characters, sanitize_label_value returns an empty string. While Kubernetes allows empty label values, this might indicate a data issue upstream.
Consider adding a check to return a default value (e.g., "unknown") or logging a warning when sanitization produces an empty string.
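A minimal sketch of the suggested guard (the character filter shown here is an assumption about what the real sanitize_label_value allows):

```rust
// Keep Kubernetes-safe characters; fall back to "unknown" and warn when the
// result would otherwise be an empty label value.
fn sanitize_label_value(value: &str) -> String {
    let sanitized: String = value
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || matches!(c, '-' | '_' | '.'))
        .collect();
    if sanitized.is_empty() {
        tracing::warn!(input = %value, "label value sanitized to empty; using \"unknown\"");
        "unknown".to_string()
    } else {
        sanitized
    }
}
```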
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (17 hunks)
- crates/basilica-payments/src/blockchain/client.rs (3 hunks)
- crates/basilica-payments/src/config.rs (6 hunks)
- crates/basilica-payments/src/reconciliation/service.rs (8 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
crates/basilica-payments/src/reconciliation/service.rs (1)
crates/basilica-payments/src/blockchain/client.rs (1)
account_hex(79-79)
🔇 Additional comments (24)
crates/basilica-payments/src/config.rs (6)
79-107: LGTM! Well-documented blockchain configuration fields. The new blockchain configuration fields are clearly documented with sensible defaults. The time-based context in comments (e.g., "~51 minutes", "~1 hour") helps operators understand the practical implications of these values.
142-149: LGTM! Good defensive configuration for rate limiting. The max_sweeps_per_cycle field provides important protection against transaction flooding. The default value of 50 strikes a reasonable balance between throughput and safety.
259-262: LGTM! Default values properly initialized. The new configuration fields are correctly initialized with values that match their respective default helper functions.
Also applies to: 288-288
378-397: LGTM! Excellent security validation for AEAD keys. This validation properly prevents the use of default all-zeros AEAD keys in production or when reconciliation is enabled. The error messages clearly guide operators to set a secure key via environment variables.
399-418: LGTM! Comprehensive AEAD key format validation. The validation correctly ensures the AEAD key is exactly 64 hexadecimal characters (32 bytes) and contains only valid hex digits. This prevents configuration errors that would lead to runtime failures.
604-675: LGTM! Comprehensive test coverage for AEAD key validation. The test suite thoroughly covers all validation scenarios:
- Default key rejection in production and with reconciliation enabled
- Valid key acceptance
- Format validation (length and character constraints)
- Development mode allowances
This excellent coverage ensures the security validations work correctly across different environment configurations.
crates/basilica-payments/src/reconciliation/service.rs (3)
103-111: Good addition of pre-cycle health checks. Reconciling stale sweeps and monitoring the cold wallet before each cycle improves operational visibility. Treating failures as warnings ensures the main sweep cycle proceeds even if these checks fail.
142-165: LGTM: Per-cycle sweep rate limiting. The rate limit prevents processing too many sweeps in a single cycle, which is good for load management. The early break with logging provides clear operational feedback.
439-481: Monitor implementation is correct. The cold wallet monitoring logic properly converts SS58 to hex and queries the balance. Both sp_core and subxt are declared as workspace dependencies in Cargo.toml, confirming that the imports (subxt::ext::sp_core::crypto::Ss58Codec and sp_core::sr25519::Public) resolve correctly. The inline use statements and overall structure are sound.
crates/basilica-payments/src/blockchain/client.rs (2)
79-87: LGTM! Proper handling of non-existent accounts. Returning 0 balance for accounts not found on chain is the correct behavior, and the debug logging provides useful diagnostics.
219-219: This is not a breaking change for external consumers. The block_number field addition is safe because TransferReceipt is only constructed internally within client.rs (at line 202), where all fields are already being specified. External code receives the struct as a return value and reads its public fields without constructing instances. If future fields are expected to be added frequently, consider applying #[non_exhaustive] for better forward compatibility, but it's not required for this change.
Likely an incorrect or invalid review comment.
crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (13)
21-120: LGTM: Clean generic design for optional offering selector. The generic parameter approach allows the controller to optionally integrate an offering selector without breaking existing code. The trait bounds are appropriate and the implementation correctly propagates the selector through Clone and constructors.
142-147: LGTM: Clean sentinel value for infinite timeouts. Using u64::MAX as a sentinel for phases without timeout (Ready, Failed, Deleted) is clear and avoids additional Option wrapping.
356-400: Good improvement: Early offering fetch and persistence. Fetching and persisting offering metadata immediately (before rental start) prevents data loss on crashes and provides better visibility through the OfferingUnavailable event. The invalidation logic at line 446 further mitigates the timing gap issue noted in previous reviews.
514-536: LGTM: Correct WireGuard peer mapping. The conversion from API response types to CRD status types is straightforward and handles missing wireGuard data appropriately with unwrap_or_default.
633-651: LGTM: Appropriate warning for missing peers. The warning at line 647 provides good visibility if the registration response lacks peers, which would prevent WireGuard from routing traffic correctly.
877-886: LGTM: Clean label reapplication for recovery scenarios. Using the labels_need_reapplication helper makes this recovery logic concise and clear.
1021-1028: LGTM: Correct lifecycle completion. Deleting the CR after transitioning to Deleted phase correctly triggers finalizer removal and completes the lifecycle.
1030-1077: LGTM: Proper auto-GC for failed NodePools. The timeout-based garbage collection prevents accumulation of failed resources while allowing time for investigation. Error handling appropriately logs warnings without blocking cleanup.
1079-1118: LGTM: Recovery path for stuck Deleted NodePools. This handler provides a clean migration path for NodePools that reached Deleted phase before the CR deletion fix was implemented.
1282-1288: LGTM: Appropriate SSH username resolution. The precedence order and "ubuntu" fallback are sensible defaults for cloud VM provisioning.
1417-1465: LGTM: Comprehensive GPU label construction. The label building logic correctly handles optional fields and applies sanitization where needed. The precedence order (status > spec) is appropriate for dynamic provisioning.
1467-1480: LGTM: Correct label comparison logic. The function correctly identifies when labels need reapplication by checking for missing or mismatched values.
1557-1670: LGTM: Comprehensive test coverage. The tests cover all relevant scenarios for labels_need_reapplication, including edge cases like empty expected labels and nodes without labels.
* Add /target to gitignore for migrations directory * Prevent Rust build artifacts from being tracked in version control
* Acquire RwLock guards in consistent order (client first, then rpc_methods) * Ensures lock ordering is predictable to prevent deadlock scenarios * Follows Rust async lock best practices for concurrent access patterns
* Create row_to_sweep function to convert PgRow to ReconciliationSweep * Replace duplicate mapping closures in list_sweeps and list_sweeps_for_account * Follows DRY principle by consolidating row-to-struct conversion logic
* Replace comma with underscore in gpu-request.basilica.ai/model label * Commas are invalid in Kubernetes label values per K8s naming conventions * Add explicit debug logging for GPU spec JSON serialization failures * Add unit test to verify multi-model GPU labels use valid K8s format
…aults * Add common.rs module with shared retry logic, env parsing, wait functions * Consolidate WaitResult enum, create_with_retry, parse_env_vars, wait_for_ready * Remove duplicated code from sglang.rs and vllm.rs templates * Update default GPU from RTX A4000 (16GB) to A100 (40GB) for model sizing * Simplify GPU recommendation to use canonical A100/H100 model names * Update wait_for_ready to accept service_name parameter for dynamic messages * Export common module from templates mod.rs
Actionable comments posted: 2
♻️ Duplicate comments (1)
crates/basilica-payments/src/storage/reconciliation.rs (1)
200-220: Implementation looks correct; index optimization remains pending. The query logic correctly identifies stale sweeps, excludes dry runs, and uses the row_to_sweep helper for consistent mapping.
However, the index optimization issue raised in the previous review still applies here. The query filters by status, initiated_at, AND dry_run = false (line 209), but the existing idx_sweeps_status (status, initiated_at) index doesn't include dry_run. Consider updating the index to (status, dry_run, initiated_at) for optimal query performance.
🧹 Nitpick comments (10)
crates/basilica-payments/src/storage/reconciliation.rs (1)
166-179: Add composite index to optimize get_pending_sweep query performance. The query filters by account_hex and status IN ('pending', 'submitted'), then orders by initiated_at DESC. Current indexes (idx_sweeps_account and idx_sweeps_status) don't provide optimal coverage. Consider adding a composite index on (account_hex, status, initiated_at) to eliminate in-memory filtering and sorting as the reconciliation_sweeps table grows.
crates/bittensor/src/connect/monitor.rs (1)
111-131: Remove unused parameter for clearer API. The client parameter is explicitly unused (line 115) but still required by callers. Since the block object contains all necessary information, the client parameter should be removed to simplify the API and avoid confusion.
🔎 Proposed refactor to remove unused parameter

```diff
 /// Extract transfers from a block
 async fn get_transfers_from_block(
-    client: &OnlineClient<PolkadotConfig>,
     block: subxt::blocks::Block<PolkadotConfig, OnlineClient<PolkadotConfig>>,
 ) -> Result<Vec<TransferInfo>> {
-    let _ = client; // Used for type consistency, block already contains client ref
     let mut transfers = Vec::new();
```

Then update the two call sites at lines 93 and 107:

```diff
-            Self::get_transfers_from_block(&client, block).await
+            Self::get_transfers_from_block(block).await
```
20-20: Consider pinning the SGLang image version for reproducibility. The default image uses the :latest tag, which can lead to non-deterministic deployments as the underlying image changes over time. Consider using a specific version tag (e.g., lmsysorg/sglang:v0.3.0) or documenting that users should override this default with a pinned version in production.
216-218: Document security implications of trust_remote_code flag. The trust_remote_code flag allows model code execution from untrusted sources, which can pose security risks. Consider adding a warning in the CLI help text or documentation that users should only enable this for trusted models.
240-245: Consider making CPU/memory resources configurable. The CPU and memory values are currently hardcoded (CPU: "4", memory_request: "8Gi"). For flexibility, consider exposing these as optional parameters in TemplateCommonOptions to allow users to tune resources based on their workload requirements.
260-260: Verify 4GB cache size is adequate for model files. The cache is set to 4096 MB (4GB), which may be insufficient for larger models. For example, a 7B model typically requires 14-16GB. Consider increasing the default cache size or making it configurable based on the estimated model size.
crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (3)
20-20: Consider pinning the vLLM image version for reproducibility. The default image uses the :latest tag, which can lead to non-deterministic deployments as the underlying image changes over time. Consider using a specific version tag (e.g., vllm/vllm-openai:v0.6.0) or documenting that users should override this default with a pinned version in production.
215-217: API key passed as command-line argument may be visible in logs. The --api-key flag is passed as a command-line argument, which may be visible in process listings and logs. Consider using environment variables (e.g., VLLM_API_KEY) instead for better security.
227-229: Document security implications of trust_remote_code flag. The trust_remote_code flag allows model code execution from untrusted sources, which can pose security risks. Consider adding a warning in the CLI help text or documentation that users should only enable this for trusted models.
crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)
1984-2131: Strong test coverage for GPU model normalization and label generation. The tests thoroughly verify:
- GPU model normalization in node affinity (removing special characters, uppercasing)
- Case-insensitive deduplication
- Multi-model labels using underscore delimiter (fixing the previous comma issue)
- JSON annotation with full GPU requirements
Optional: Add test for label vs node affinity consistency
Consider adding a test that uses GPU models requiring normalization (e.g., ["A100-40GB", "H100-80GB"]) to verify the label and node affinity values are consistent or intentionally different:

```rust
#[test]
fn test_gpu_labels_and_affinity_consistency() {
    use crate::crd::user_deployment::GpuSpec;

    let spec = UserDeploymentSpec::new(
        "user123".to_string(),
        "gpu-app".to_string(),
        "pytorch:latest".to_string(),
        1,
        8080,
        "/deployments/gpu-app".to_string(),
    )
    .with_resources(ResourceRequirements {
        cpu: "4000m".to_string(),
        memory: "16Gi".to_string(),
        gpus: Some(GpuSpec {
            count: 1,
            model: vec!["A100-40GB".to_string(), "H100-80GB".to_string()],
            min_cuda_version: None,
            min_gpu_memory_gb: None,
        }),
        cpu_request_ratio: 1.0,
    });

    let deployment = render_deployment("gpu-app", "u-user123", &spec, None).unwrap();
    let template = deployment.spec.unwrap().template;
    let template_meta = template.metadata.unwrap();
    let labels = template_meta.labels.unwrap();

    // Check if label matches normalized format or is intentionally different
    let gpu_model_label = labels.get("gpu-request.basilica.ai/model").unwrap();
    // Currently: "A100-40GB_H100-80GB" (raw)
    // Node affinity: ["A10040GB", "H10080GB"] (normalized)
    // Document the expected behavior
    assert!(gpu_model_label == "A100-40GB_H100-80GB" || gpu_model_label == "A10040GB_H10080GB");
}
```

This would document whether the inconsistency is intentional or should be fixed.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
- crates/basilica-cli/src/cli/handlers/deploy/templates/common.rs (1 hunks)
- crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs (1 hunks)
- crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1 hunks)
- crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1 hunks)
- crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1 hunks)
- crates/basilica-operator/src/controllers/user_deployment_controller.rs (7 hunks)
- crates/basilica-payments/src/storage/reconciliation.rs (4 hunks)
- crates/bittensor/src/connect/monitor.rs (3 hunks)
- scripts/migrations/.gitignore (1 hunks)
✅ Files skipped from review due to trivial changes (1)
- scripts/migrations/.gitignore
🧰 Additional context used
🧬 Code graph analysis (5)
crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (2)
- crates/basilica-cli/src/cli/handlers/deploy/templates/common.rs (3): create_with_retry (47-78), parse_env_vars (28-44), wait_for_ready (110-173)
- crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1): estimate_gpu_requirements (32-50)

crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (2)
- crates/basilica-cli/src/cli/handlers/deploy/templates/common.rs (3): create_with_retry (47-78), parse_env_vars (28-44), wait_for_ready (110-173)
- crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1): estimate_gpu_requirements (32-50)

crates/basilica-cli/src/cli/handlers/deploy/templates/common.rs (1)
- crates/basilica-cli/src/progress/mod.rs (2): complete_spinner_and_clear (137-139), create_spinner (98-109)

crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs (3)
- crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1): estimate_gpu_requirements (32-50)
- crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1): handle_sglang_deploy (29-150)
- crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1): handle_vllm_deploy (29-150)

crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)
- crates/basilica-operator/src/labels.rs (1): normalize_gpu_models (222-229)
🔇 Additional comments (38)
crates/basilica-payments/src/storage/reconciliation.rs (1)
195-198: Excellent refactoring to use the centralized helper! This change successfully addresses the previous review feedback about duplicated status mapping logic. The code is now cleaner and more maintainable.
crates/bittensor/src/connect/monitor.rs (15)
13-15: LGTM! The helper correctly identifies insecure endpoints for conditional client creation.
29-30: LGTM! Wrapping clients in Arc<RwLock<...>> correctly enables thread-safe concurrent access and supports the new reconnect() capability.
36-44: LGTM! The constructor properly delegates to helper methods and wraps clients in the appropriate Arc/RwLock containers.
46-54: LGTM! The helper correctly handles both secure and insecure endpoints with appropriate subxt API calls and helpful debug logging.
56-64: LGTM! Consistent with create_client, this helper properly handles endpoint security and constructs the RPC methods client.
66-80: Deadlock issue resolved! The lock ordering has been corrected. Both reconnect() (lines 73-74) and get_transfers_at_block() (lines 99-100) now consistently acquire locks in the same order: client → rpc_methods. This eliminates the deadlock risk identified in the previous review. The implementation also correctly creates new clients before acquiring write locks, ensuring atomic updates only on success.
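For illustration, a minimal sketch of that ordering rule with simplified placeholder types (the real fields hold subxt clients):

```rust
use std::sync::Arc;
use tokio::sync::RwLock;

// Every path locks `client` before `rpc_methods`, so no two tasks can each
// hold one lock while waiting on the other.
struct Monitor {
    client: Arc<RwLock<String>>,      // placeholder for OnlineClient<PolkadotConfig>
    rpc_methods: Arc<RwLock<String>>, // placeholder for the RPC methods client
}

impl Monitor {
    async fn get_transfers_at_block(&self) {
        let _client = self.client.read().await; // 1st: client
        let _rpc = self.rpc_methods.read().await; // 2nd: rpc_methods
        // ... use both guards ...
    }

    async fn reconnect(&self, new_client: String, new_rpc: String) {
        let mut client = self.client.write().await; // same order for writes
        let mut rpc = self.rpc_methods.write().await;
        *client = new_client;
        *rpc = new_rpc;
    }
}
```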
83-87: LGTM! Correctly uses a read lock for this read-only operation.
90-94: LGTM! Correct lock usage and proper delegation to the transfer extraction helper.
97-108: Consistent lock ordering maintained! The explicit comment on line 98 and the implementation (lines 99-100) confirm that the lock ordering now matches reconnect(), preventing the previously identified deadlock hazard. The method correctly acquires read locks in the proper sequence.
134-179: LGTM! The method properly handles both named and unnamed field representations from the subxt event system, with appropriate length validation and error handling via Option.
182-214: LGTM! The polling logic correctly detects new blocks, retrieves transfers, and invokes the callback with proper error propagation.
216-219: LGTM! Simple, correct accessor for the endpoint URL.
221-232: LGTM! The pruning heuristic is logically sound: future blocks return false, and the age calculation correctly determines if retention has been exceeded.
236-260: LGTM! The string-based parsing is inherently tied to subxt's Display format, but the 32-byte validation and test coverage (lines 266-292) provide adequate safeguards for extracting AccountId32 values.
262-293: LGTM! The test validates the account hex parsing logic with appropriate assertions on length, format, and specific byte conversions.
crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (4)
29-150: LGTM: Well-structured deployment orchestration. The deployment flow is comprehensive and handles various scenarios properly:
- GPU auto-detection with user override
- Environment variable parsing
- Retry logic with backoff
- Status polling with timeout handling
- Both attached and detached modes
153-185: LGTM: RFC 1123 compliant name generation. The function correctly sanitizes model names and generates valid deployment names by:
- Extracting the model name without org prefix
- Converting to lowercase and filtering non-compliant characters
- Truncating to prevent excessively long names
- Adding a UUID suffix for uniqueness
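A hedged sketch of that kind of name generation (not the template's exact code; the truncation length, suffix length, and use of the uuid crate are assumptions):

```rust
use uuid::Uuid;

// e.g. "meta-llama/Llama-3.1-8B-Instruct" -> "llama-3-1-8b-instruct-<8 hex chars>"
fn deployment_name_from_model(model: &str) -> String {
    let base = model.rsplit('/').next().unwrap_or(model); // drop the org prefix
    let sanitized: String = base
        .to_lowercase()
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() { c } else { '-' })
        .collect();
    let trimmed: String = sanitized.trim_matches('-').chars().take(30).collect();
    let suffix = Uuid::new_v4().simple().to_string();
    format!("{}-{}", trimmed.trim_end_matches('-'), &suffix[..8])
}
```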
297-334: LGTM: Comprehensive success message with helpful examples. The success message provides all necessary information for users to interact with their SGLang deployment, including:
- Deployment status and endpoints
- API endpoint documentation
- Practical curl example
- Management command references
267-294: The /health endpoint is widely documented and used across SGLang deployments. SGLang implements a GET /health endpoint for basic health checks, and the server starts at http://localhost:30000 by default. Multiple deployment guides confirm using readiness_probe: /health with SGLang. The health check probe configuration in the code is correct and aligns with standard SGLang deployment practices.
crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs (1)
1-14: LGTM: Clean module organization with appropriate exports. The module properly organizes deployment templates and exposes a clean public API through selective re-exports. The documentation clearly describes the module's purpose and supported frameworks.
crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (2)
29-150: LGTM: Consistent deployment orchestration with SGLang template. The deployment flow follows the same robust pattern as the SGLang template, ensuring consistency across inference frameworks. The use of shared utilities from common.rs successfully eliminates code duplication.
307-346: LGTM: Comprehensive success message with OpenAI-compatible examples. The success message effectively guides users on interacting with the vLLM deployment through OpenAI-compatible endpoints, with practical curl examples.
crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (7)
1-27: LGTM: Clean data structures and efficient regex initialization. The module uses appropriate types and efficiently initializes the regex pattern with LazyLock. The GpuRequirements struct provides clear fields for GPU estimation results.
32-50: LGTM: Clear estimation logic with appropriate fallbacks. The function implements a sensible strategy: first attempting to extract parameter counts from the model name, then falling back to model family heuristics.
66-71: LGTM: Reasonable memory estimation heuristic. The 2GB per billion parameters with 20% overhead is a practical rule of thumb for FP16 inference. Rounding to the nearest 8GB provides cleaner resource allocation.
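In numbers, the heuristic reads roughly like this (a sketch of the described rule, not the crate's exact code; the rounding direction may differ):

```rust
// FP16 weights ≈ 2 GB per billion parameters, +20% overhead, rounded to the
// nearest multiple of 8 GB.
fn estimate_memory_gb(params_billions: f64) -> u32 {
    let raw = params_billions * 2.0 * 1.2;
    ((raw / 8.0).round() * 8.0) as u32
}

fn main() {
    assert_eq!(estimate_memory_gb(7.0), 16);   // 16.8 GB -> 16 GB
    assert_eq!(estimate_memory_gb(13.0), 32);  // 31.2 GB -> 32 GB
    assert_eq!(estimate_memory_gb(70.0), 168); // 168 GB  -> 168 GB
}
```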
74-110: LGTM: Comprehensive model family heuristics. The function covers major model families with reasonable memory estimates. The default of 16GB for unknown models is a safe middle-ground choice.
113-116: LGTM: Correct GPU count calculation with reasonable limits. The ceiling division ensures sufficient GPU memory, and the 1-8 GPU clamp prevents unrealistic resource requests.
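And the count step, under the same caveat that this is a sketch of the described logic rather than the crate's code:

```rust
// Ceiling division of required memory by per-GPU memory, clamped to 1..=8 GPUs.
fn gpu_count(required_memory_gb: u32, per_gpu_memory_gb: u32) -> u32 {
    required_memory_gb
        .div_ceil(per_gpu_memory_gb.max(1))
        .clamp(1, 8)
}

fn main() {
    assert_eq!(gpu_count(16, 40), 1);  // fits on a single 40 GB GPU
    assert_eq!(gpu_count(168, 40), 5); // 70B-class model across 5 GPUs
    assert_eq!(gpu_count(400, 40), 8); // capped at 8
}
```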
118-126: LGTM: GPU recommendations now align with autoscaler offerings. The function correctly returns canonical GPU model names ("A100", "H100") without vendor prefixes or memory suffixes, which will properly match the autoscaler's offering matcher after normalization. This addresses the previous review concern.
128-185: LGTM: Comprehensive test coverage. The tests thoroughly verify:
- Parameter extraction from various model name formats
- Memory estimation calculations
- GPU count logic including capping behavior
- End-to-end estimation for real model names
- GPU recommendation logic
crates/basilica-cli/src/cli/handlers/deploy/templates/common.rs (5)
14-25: LGTM: Well-defined constants and status enum. The retry configuration values are reasonable, and the WaitResult enum provides clear deployment status outcomes. Boxing the DeploymentResponse in the Ready variant is a good optimization for enum size.
27-44: LGTM: Robust environment variable parsing. The function correctly handles the KEY=VALUE format and properly uses splitn(2, '=') to allow equals signs in values. Error messages are clear and actionable.
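A sketch of that parsing shape (the error wording and return type here are assumptions, not the template's actual signature):

```rust
// splitn(2, '=') splits only on the first '=', so values may contain '='.
fn parse_env_vars(pairs: &[String]) -> Result<Vec<(String, String)>, String> {
    pairs
        .iter()
        .map(|pair| {
            let mut parts = pair.splitn(2, '=');
            match (parts.next(), parts.next()) {
                (Some(key), Some(value)) if !key.is_empty() => {
                    Ok((key.to_string(), value.to_string()))
                }
                _ => Err(format!("invalid env var '{pair}', expected KEY=VALUE")),
            }
        })
        .collect()
}

fn main() {
    let vars = parse_env_vars(&["HF_TOKEN=abc=123".to_string()]).unwrap();
    assert_eq!(vars[0], ("HF_TOKEN".to_string(), "abc=123".to_string()));
}
```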
80-101: LGTM: Comprehensive quota detection logic. The functions properly identify quota-exceeded scenarios across different error types and status codes (403, 429), with appropriate message extraction for user feedback.
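For illustration only, detection of this kind typically keys off the HTTP status plus the error text (a generic sketch, not the CLI's actual error type):

```rust
// Treat 403/429 responses, or messages mentioning quota, as quota-exceeded.
fn is_quota_exceeded(status: u16, message: &str) -> bool {
    matches!(status, 403 | 429) || message.to_ascii_lowercase().contains("quota")
}

fn main() {
    assert!(is_quota_exceeded(429, "Too Many Requests"));
    assert!(is_quota_exceeded(200, "GPU quota exceeded for plan"));
    assert!(!is_quota_exceeded(500, "internal error"));
}
```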
110-173: LGTM: Robust deployment status polling with phase-aware timing. The function implements effective status monitoring with:
- Timeout handling
- Phase-based progress updates
- Optimized polling intervals based on deployment phase
- Proper terminal state detection (Active, Failed, Terminating)
175-188: LGTM: User-friendly phase messaging. The function provides clear, actionable status messages for each deployment phase, significantly improving the user experience during deployment.
crates/basilica-operator/src/controllers/user_deployment_controller.rs (3)
918-922: Good addition to prevent autoscaler from provisioning nodes for stale pods. Calling cleanup_stale_replica_sets after patching the deployment is a smart optimization. This ensures that old ReplicaSets with outdated pod specs (including stale GPU node affinity requirements) don't cause the autoscaler to provision nodes unnecessarily.
1057-1139: Well-implemented cleanup logic for stale ReplicaSets.The implementation is defensive and correct:
- Only scales down old (non-current) ReplicaSets based on revision annotations
- Only acts on RSs with 0 ready replicas but non-zero desired replicas
- Handles errors gracefully with debug logging
- Early returns minimize overhead on typical reconciliations
The approach of directly scaling RSs to 0 is safe because:
- Only old RSs are targeted (not the current revision)
- The Deployment controller expects old RSs to be scaled down during rollouts
- This just accelerates what Kubernetes would do eventually
535-559: Remove the incorrect autoscaler correlation concern; the label/annotation approach is intentional. The review comment incorrectly assumes the autoscaler correlates pod labels (gpu-request.basilica.ai/model) with node labels. In reality, the autoscaler reads the JSON annotation (autoscaler.basilica.ai/gpu-requirements), not the pod label. The pod label uses raw GPU model names (e.g., A100-40GB_H100-80GB) for human-readable pod identification, while node affinity uses the separate basilica.ai/gpu-model key with normalized model names (e.g., A10040GB_H10080GB) for selector matching. These serve different purposes and are intentionally separate; no inconsistency exists. The 63-character label value limit is unlikely to be exceeded in practice with standard NVIDIA GPU names. If validation is desired, it can be added as an optional enhancement, but this is a minor concern.
Likely an incorrect or invalid review comment.
Summary by CodeRabbit
New Features
Bug Fixes
Improvements