Skip to content

Conversation

@itzlambda
Copy link
Collaborator

@itzlambda itzlambda commented Dec 12, 2025

Summary by CodeRabbit

  • New Features

    • End-to-end VIP rentals: CSV/S3/mock sources, validation, in-memory cache, periodic poller, DB reconciliation, optional billing integration, and SDK/API fields to surface VIP metadata.
    • VIP-specific rental operations and utilities; VIPs are visible in status/logs but protected from standard stop/restart flows.
  • Chores

    • Database migration and indexes to track VIP rentals; health checks exclude VIP rows.
    • Config additions and server startup wiring to enable/mock VIP mode.
  • Chore (CLI)

    • CLI resolver updated to optionally exclude VIP rentals where appropriate.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 12, 2025

Walkthrough

Adds end-to-end VIP rental support: new VIP config, CSV/S3/Mock data sources, in-memory cache, DB-backed rental operations, a reconciliation poller with optional billing, server wiring, CLI/SDK flags, and a DB migration for VIP rentals.

Changes

Cohort / File(s) Summary
VIP Module - Core Implementation
crates/basilica-aggregator/src/vip/types.rs, crates/basilica-aggregator/src/vip/cache.rs, crates/basilica-aggregator/src/vip/csv.rs, crates/basilica-aggregator/src/vip/poller.rs, crates/basilica-aggregator/src/vip/rental_ops.rs, crates/basilica-aggregator/src/vip/task.rs, crates/basilica-aggregator/src/vip/mod.rs
New vip module: data models (CSV row, connection/display, validated machine, rental record), thread-safe in-memory cache with DB rebuild, CSV/S3/Mock data sources, a Poller reconciling source→DB→cache (with pricing & optional billing), rental DB operations, and a periodic task wrapper.
Aggregator Configuration & Dependencies
crates/basilica-aggregator/Cargo.toml, crates/basilica-aggregator/src/config.rs, crates/basilica-aggregator/src/lib.rs
Adds csv, aws-config, aws-sdk-s3, workspace deps (basilica-billing, basilica-protocol). Introduces public VipConfig and adds vip: Option<VipConfig> to Config. Exports vip module.
Aggregator Models & DB
crates/basilica-aggregator/src/models.rs, crates/basilica-aggregator/src/db.rs, crates/basilica-aggregator/src/service.rs
Adds Provider::Vip, VIP machine-id helpers, and Deployment.is_vip: bool; updates DB queries/parsing to read/write is_vip and excludes VIP rentals from non-VIP queries.
API Config, Routes & Server
crates/basilica-api/src/config/mod.rs, crates/basilica-api/src/api/routes/secure_cloud.rs, crates/basilica-api/src/server.rs
Propagates vip config into aggregator, extends rental query rows with is_vip and raw_response, enriches rental list with VIP-derived fields, blocks stopping VIP rentals, and wires VIP poller/task startup (mock vs CSV/S3 selection).
DB Migration
crates/basilica-api/migrations/018_vip_rentals.sql
Adds is_vip boolean column (default FALSE), indexes, and comment documenting vip: provider_instance_id prefix semantics.
CLI & SDK
crates/basilica-cli/src/cli/handlers/gpu_rental.rs, crates/basilica-cli/src/cli/handlers/gpu_rental_helpers.rs, crates/basilica-sdk/src/types.rs
Adds exclude_vip: bool parameter to unified rental resolver and call sites to selectively exclude VIPs; SDK SecureCloudRentalListItem gains is_vip: bool with serde(default).
Project Config
.gitignore
Adds dev/ to ignored paths.

Sequence Diagram(s)

sequenceDiagram
    participant Server as Server Startup
    participant DataSource as DataSource (CSV/S3/Mock)
    participant Cache as VipCache
    participant DB as PostgreSQL
    participant Poller as VipPoller
    participant Billing as BillingClient (optional)

    Server->>DataSource: init (config or mock)
    Server->>Cache: rebuild_from_db()
    Cache->>DB: query active VIP rentals
    DB-->>Cache: VipRental rows
    Server->>Poller: new(config, DataSource, Cache, DB, markup, billing?)
    Server->>Poller: poll_once() [initial]

    Poller->>DataSource: fetch_vip_rows()
    DataSource-->>Poller: Vec<VipCsvRow>

    loop per row
        Poller->>Poller: validate_row()
        alt valid
            Poller->>Cache: get(vip_id)
            alt cache hit
                Poller->>DB: update_vip_rental_metadata() [if needed]
                Poller->>Cache: insert/update
            else cache miss
                Poller->>DB: get_vip_rental_by_machine_id()
                alt db found
                    Poller->>Cache: insert(relinked)
                else new rental
                    Poller->>Poller: prepare_vip_rental()
                    Poller->>Billing: register_with_billing() [optional]
                    Poller->>DB: insert_vip_rental()
                    Poller->>Cache: insert(new)
                end
            end
        else invalid
            Poller->>Poller: skip row
        end
    end

    loop stale rentals
        Poller->>Billing: finalize_rental_billing() [best-effort]
        Poller->>DB: close_vip_rental()
        Poller->>Cache: remove(vip_id)
    end

    Poller-->>Server: PollStats
    Server->>Server: spawn VipPollerTask (periodic)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

  • Focus review on:
    • crates/basilica-aggregator/src/vip/poller.rs (reconciliation, billing integration, metadata diffing)
    • crates/basilica-aggregator/src/vip/rental_ops.rs (pricing, insert/close/archive semantics)
    • crates/basilica-aggregator/src/vip/cache.rs (concurrency, rebuild_from_db, JSON parsing)
    • crates/basilica-api/src/api/routes/secure_cloud.rs and src/server.rs (VIP enrichment, stop guard, startup wiring)
    • Migration 018_vip_rentals.sql (column defaults and indexes)

Possibly related PRs

Suggested reviewers

  • epappas
  • distributedstatemachine

Poem

🐰 I found a CSV and a tiny S3 tree,

prefixed vip: machines hopping for me.
Cache keeps watch while pollers hum along,
billing sings softly and DB stays strong.
A rabbit applauds — new features, hooray! 🎉

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat(aggregator): managed machines sync' clearly describes the main feature addition: synchronization of managed (VIP) machines into the aggregator system.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch managed-machines-sync

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@itzlambda itzlambda marked this pull request as draft December 12, 2025 08:13
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Nitpick comments (13)
crates/basilica-api/migrations/017_vip_rentals.sql (1)

1-18: Consider CREATE INDEX CONCURRENTLY (+ -- no-transaction) to avoid write locks in prod. If secure_cloud_rentals is large/hot, the current CREATE INDEX may block.

+-- no-transaction
 -- VIP Rental Support
@@
-CREATE INDEX IF NOT EXISTS idx_secure_cloud_rentals_is_vip
+CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_secure_cloud_rentals_is_vip
     ON secure_cloud_rentals(is_vip)
     WHERE is_vip = TRUE;
@@
-CREATE INDEX IF NOT EXISTS idx_secure_cloud_rentals_provider_instance_id_vip
+CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_secure_cloud_rentals_provider_instance_id_vip
     ON secure_cloud_rentals(provider_instance_id)
     WHERE provider_instance_id LIKE 'vip:%';
crates/basilica-api/src/api/routes/secure_cloud.rs (1)

585-613: VIP stop prevention logic is correct; consider destructuring the tuple to avoid .0/.1/.2 footguns.

-    let rental: (String, chrono::DateTime<Utc>, bool) = sqlx::query_as(
+    let (owner_user_id, created_at, is_vip): (String, chrono::DateTime<Utc>, bool) = sqlx::query_as(
         "SELECT user_id, created_at, is_vip
          FROM secure_cloud_rentals
          WHERE id = $1",
     )
@@
-    if rental.0 != auth.user_id {
+    if owner_user_id != auth.user_id {
@@
-    if rental.2 {
+    if is_vip {
@@
-    let duration = stop_time.signed_duration_since(rental.1);
+    let duration = stop_time.signed_duration_since(created_at);
crates/basilica-aggregator/src/vip/task.rs (1)

24-69: Consider adding graceful shutdown support.

The polling loop runs indefinitely with no way to stop it gracefully. If the application needs to shut down cleanly (e.g., for zero-downtime deployments), consider accepting a cancellation token or shutdown signal.

+use tokio_util::sync::CancellationToken;
+
 pub struct VipPollerTask {
     poller: Arc<VipPoller>,
     interval: Duration,
+    cancel_token: CancellationToken,
 }

-    pub fn start(self) -> JoinHandle<()> {
+    pub fn start(self) -> JoinHandle<()> {
         tokio::spawn(async move {
             let mut timer = interval(self.interval);
             timer.set_missed_tick_behavior(MissedTickBehavior::Skip);
             // ...
             loop {
-                timer.tick().await;
+                tokio::select! {
+                    _ = self.cancel_token.cancelled() => {
+                        tracing::info!("VIP poller task shutting down");
+                        break;
+                    }
+                    _ = timer.tick() => {}
+                }
                 // ... rest of polling logic
             }
         })
     }
crates/basilica-aggregator/src/db.rs (1)

477-498: Use DeploymentStatus enum instead of string literal for consistency.

Other queries in this file use DeploymentStatus::*.as_str() (e.g., get_all_active_deployments at line 417). Using the enum here ensures consistency and catches typos at compile time.

     pub async fn list_vip_rentals(&self) -> Result<Vec<Deployment>> {
         let rows = sqlx::query(
             r#"
             SELECT id, user_id, provider, provider_instance_id, offering_id, instance_type, location_code,
                    status, hostname, ssh_key_id, ip_address, connection_info, raw_response,
                    error_message, created_at, updated_at, is_vip
             FROM secure_cloud_rentals
-            WHERE is_vip = TRUE AND status != 'deleted'
+            WHERE is_vip = TRUE AND status != $1
             ORDER BY created_at DESC
             "#,
         )
+        .bind(DeploymentStatus::Deleted.as_str())
         .fetch_all(&self.pool)
         .await?;
crates/basilica-aggregator/src/vip/mod.rs (1)

8-13: Consider selective re-exports for clearer public API.

Glob re-exports (pub use *) work but can make it harder to understand the module's public surface. If this is an internal module, it's fine. For external consumers, selective re-exports improve discoverability.

Example of selective re-exports:

pub use cache::VipCache;
pub use poller::{VipPoller, PollStats};
pub use sheets::{GoogleSheetsClient, SheetsError};
pub use task::VipPollerTask;
pub use types::{VipSheetRow, VipCacheEntry};
// etc.
crates/basilica-aggregator/src/config.rs (1)

169-184: Consider validating VIP config in validate() when enabled.

The validate() method checks provider configuration but doesn't validate VIP config. If VIP is enabled but misconfigured (e.g., enabled = true with empty google_sheet_id), the error will only surface at runtime. Adding validation here would provide earlier feedback.

     pub fn validate(&self) -> Result<()> {
         // Check at least one provider is enabled (has auth configured)
         let any_enabled = self.providers.datacrunch.is_enabled()
             || self.providers.hyperstack.is_enabled()
             || self.providers.lambda.is_enabled()
             || self.providers.hydrahost.is_enabled();

         if !any_enabled {
             return Err(AggregatorError::Config(
                 "At least one provider must be configured (has auth)".to_string(),
             ));
         }

+        // Validate VIP config if enabled
+        if self.vip.enabled && !self.vip.is_configured() {
+            return Err(AggregatorError::Config(
+                "VIP is enabled but missing google_sheet_id or google_credentials_path".to_string(),
+            ));
+        }
+
         Ok(())
     }
crates/basilica-aggregator/src/vip/sheets.rs (2)

60-65: Consider adding retry logic or timeout for API calls.

The Google Sheets API call has no timeout or retry logic. Transient network issues could cause the poll cycle to fail. Since the caller (VipPollerTask) continues on error, this may be acceptable, but adding resilience here would reduce error noise in logs.

You could wrap the API call with a timeout and/or use a retry library:

use tokio::time::timeout;

let result = timeout(
    Duration::from_secs(30),
    self.hub.spreadsheets().values_get(&self.sheet_id, range).doit()
).await
    .map_err(|_| SheetsError::Api("Request timed out".to_string()))?
    .map_err(|e| SheetsError::Api(e.to_string()))?;

67-67: Log when sheet returns no data.

If values is None, the code silently returns an empty vector. This could mask configuration issues (wrong sheet ID, permissions, etc.). Consider logging at debug/warn level.

-        let values = result.1.values.unwrap_or_default();
+        let values = match result.1.values {
+            Some(v) => v,
+            None => {
+                tracing::debug!("Google Sheet returned no values - sheet may be empty or inaccessible");
+                return Ok(Vec::new());
+            }
+        };
crates/basilica-aggregator/src/vip/cache.rs (2)

22-80: VipCache API looks solid; consider whether async is necessary for all methods.
Given this is in-memory only, the ergonomics are good; cloning records on read is a reasonable tradeoff for avoiding lock leakage.


86-97: Consider query_as!/query! (or tests) to catch schema drift early.
Right now column/type mismatches will surface only at runtime.

Also applies to: 180-193

crates/basilica-aggregator/src/vip/poller.rs (1)

238-267: Re-link path likely needs a metadata sync as well.
When DB has an existing rental but cache is empty, you re-link without ensuring DB metadata matches the sheet. Consider calling update_vip_rental_metadata here too (even unconditionally).

crates/basilica-aggregator/src/vip/rental_ops.rs (2)

17-25: Define a rounding/quantization policy for marked-up hourly rates (if used for billing).
Currently this can produce arbitrary scale depending on input.


219-238: Assumes at most one active VIP rental per machine; enforce with DB constraints.
Without a uniqueness constraint, fetch_optional can behave unpredictably if duplicates exist.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8b8eb65 and d902b02.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (17)
  • crates/basilica-aggregator/Cargo.toml (1 hunks)
  • crates/basilica-aggregator/src/config.rs (7 hunks)
  • crates/basilica-aggregator/src/db.rs (9 hunks)
  • crates/basilica-aggregator/src/lib.rs (1 hunks)
  • crates/basilica-aggregator/src/models.rs (1 hunks)
  • crates/basilica-aggregator/src/service.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/cache.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/mod.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/poller.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/rental_ops.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/sheets.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/task.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/types.rs (1 hunks)
  • crates/basilica-api/migrations/017_vip_rentals.sql (1 hunks)
  • crates/basilica-api/src/api/routes/secure_cloud.rs (2 hunks)
  • crates/basilica-api/src/config/mod.rs (3 hunks)
  • crates/basilica-api/src/server.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-13T15:34:46.544Z
Learnt from: epappas
Repo: one-covenant/basilica PR: 184
File: crates/basilica-validator/src/rental/billing.rs:129-130
Timestamp: 2025-10-13T15:34:46.544Z
Learning: In the basilica-validator crate, rental_id is a String type (not UUID) throughout the system - in RentalInfo struct, database rentals.id column, and all rental operations. The persistence layer methods should accept &str for rental_id parameters, not &Uuid, since they immediately convert to String for SQL binding anyway.

Applied to files:

  • crates/basilica-aggregator/src/vip/types.rs
  • crates/basilica-api/src/api/routes/secure_cloud.rs
  • crates/basilica-aggregator/src/vip/rental_ops.rs
🧬 Code graph analysis (8)
crates/basilica-aggregator/src/models.rs (1)
crates/basilica-billing/src/domain/rentals.rs (1)
  • provider_instance_id (208-212)
crates/basilica-aggregator/src/vip/task.rs (2)
crates/basilica-aggregator/src/vip/cache.rs (1)
  • new (16-20)
crates/basilica-aggregator/src/vip/poller.rs (1)
  • new (48-60)
crates/basilica-aggregator/src/config.rs (1)
crates/basilica-api/src/config/mod.rs (6)
  • default (54-62)
  • default (76-81)
  • default (95-100)
  • default (114-119)
  • default (142-148)
  • default (185-190)
crates/basilica-aggregator/src/db.rs (2)
crates/basilica-aggregator/src/models.rs (1)
  • format_vip_machine_id (155-157)
crates/basilica-aggregator/src/vip/rental_ops.rs (1)
  • sqlx (80-112)
crates/basilica-aggregator/src/vip/sheets.rs (2)
crates/basilica-aggregator/src/vip/cache.rs (1)
  • new (16-20)
crates/basilica-aggregator/src/vip/poller.rs (1)
  • new (48-60)
crates/basilica-aggregator/src/vip/cache.rs (1)
crates/basilica-aggregator/src/models.rs (2)
  • extract_vip_machine_id (160-162)
  • from (206-214)
crates/basilica-aggregator/src/vip/rental_ops.rs (1)
crates/basilica-aggregator/src/models.rs (1)
  • format_vip_machine_id (155-157)
crates/basilica-aggregator/src/vip/poller.rs (3)
crates/basilica-aggregator/src/vip/rental_ops.rs (5)
  • close_vip_rental (132-217)
  • insert_vip_rental (70-128)
  • prepare_vip_rental (47-66)
  • update_vip_rental_metadata (241-290)
  • sqlx (80-112)
crates/basilica-aggregator/src/vip/cache.rs (2)
  • new (16-20)
  • default (196-198)
crates/basilica-aggregator/src/vip/sheets.rs (1)
  • new (30-53)
🔇 Additional comments (14)
crates/basilica-aggregator/src/lib.rs (1)

12-12: Public vip module export looks fine. Adds the intended integration surface at the crate root.

crates/basilica-aggregator/src/service.rs (1)

528-528: Good: explicit is_vip: false on normal deployments. Prevents accidental VIP semantics on standard provisioning.

crates/basilica-api/src/config/mod.rs (1)

131-135: Config wiring looks correct and backwards-compatible (#[serde(default)]).

Also applies to: 141-148, 350-351

crates/basilica-aggregator/Cargo.toml (1)

46-49: No additional configuration required. google-sheets4 v5 and yup-oauth2 v9 both default to the hyper-rustls TLS stack, which aligns with the workspace's rustls-only strategy for MUSL compatibility. The existing implementation in crates/basilica-aggregator/src/vip/sheets.rs already correctly constructs the hyper_rustls::HttpsConnector, confirming proper async runtime and TLS integration.

crates/basilica-aggregator/src/vip/task.rs (1)

10-13: LGTM! Well-structured polling task.

The struct cleanly encapsulates the poller and interval configuration. Converting poll_interval_secs to Duration at construction time is a good practice.

crates/basilica-aggregator/src/db.rs (2)

167-198: LGTM! Deployment creation properly includes is_vip.

The INSERT statement and binding are correctly updated to support the new is_vip field.


454-475: LGTM! VIP machine lookup by prefixed ID.

The method correctly uses format_vip_machine_id to construct the lookup key, maintaining consistency with how VIP rentals are stored in rental_ops.rs.

crates/basilica-aggregator/src/config.rs (1)

107-141: LGTM! Well-structured VIP configuration.

The VipConfig struct has sensible defaults and the is_configured() method properly validates that all required fields are present when VIP is enabled. The #[serde(default)] attributes ensure backwards compatibility with existing config files.

crates/basilica-aggregator/src/vip/sheets.rs (3)

9-21: LGTM! Comprehensive error types.

The SheetsError enum covers credential handling, authentication, API errors, and row parsing with good context (row number and message).


28-53: LGTM! Service account authentication setup.

The authentication flow correctly reads the service account key, builds the authenticator, and configures the HTTPS connector with native roots.


88-140: LGTM! Clear row parsing with good documentation.

The column mapping is well-documented in the comment. The get_col closure provides consistent error handling for required fields, and optional notes is handled appropriately.

crates/basilica-aggregator/src/vip/types.rs (1)

1-57: Type definitions are clean and appropriately scoped.
Also aligns with system-wide “rental_id is a String” usage. (Based on retrieved learnings.)

crates/basilica-aggregator/src/vip/rental_ops.rs (2)

44-67: rental_id as String is consistent with the broader system. (Based on retrieved learnings.)
Uuid::new_v4().to_string() is fine if UUID format is acceptable for downstream consumers.


130-217: Clarify intent: Is data loss acceptable if archival fails?

Both terminated_secure_cloud_rentals and secure_cloud_rentals tables are properly defined via migrations (015 and 013 respectively), with matching column schemas. However, the function silently loses data if archival fails—archival errors are logged but ignored (line 189-196), while deletion proceeds unconditionally (line 198-208). If archival encounters any error (permission, constraint violation, network issue), the rental record is deleted without being preserved in the archive table.

If this best-effort archival is intentional, document it in a comment. If not, consider:

  • Gating deletion on successful archival
  • Making archival failures fatal to close_vip_rental
  • Adding startup validation that both tables exist and have expected columns

Automatically sync team-managed GPU machines from a Google Sheet
inventory into the rental system with polling, fail-safe updates,
and startup recovery.
@itzlambda itzlambda force-pushed the managed-machines-sync branch from 605ae25 to 57953a2 Compare December 12, 2025 09:06
@itzlambda
Copy link
Collaborator Author

@coderabbitai resolve

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 12, 2025

✅ Actions performed

Comments resolved.

Add ability to test VIP managed machines without Google Sheets:
- VipDataSource trait abstracts data fetching (sheets vs mock)
- MockVipDataSource provides configurable test data
- Enable via `vip.mock_user_id` in config

Also fixes VIP rental display in list rentals endpoint.
Switch VIP machine inventory from Google Sheets API to CSV files:
- Support local CSV files for development/testing
- Support S3-hosted CSV for production
- Remove google-sheets4 and yup-oauth2 dependencies
- Add csv and aws-sdk-s3 dependencies
Update type naming and documentation to reflect CSV data source
instead of Google Sheets terminology throughout the VIP module.
VIP rentals were using a hardcoded 20% markup while secure cloud
rentals used the configurable pricing_config.secure_cloud_markup_percent
(default 10%). This caused billing inconsistency between the two
rental types.

Now VIP rentals receive the same markup percentage from the API's
PricingConfig, ensuring consistent pricing across all secure cloud
rental types.
Remove get_rental_by_vip_machine_id and list_vip_rentals as they have
no call sites in the codebase.
Simplifies VIP row filtering by using a boolean active field (0/1)
instead of comparing against a configurable ready_value string.
Also excludes VIP rentals from health checks since they're manually
managed.
- Add vcpu_count and system_memory_gb fields to VIP CSV schema
- Store and retrieve specs from rental raw_response JSON
- Expose is_vip flag in API response for client awareness
- Prevent users from stopping VIP rentals via exclude_vip filter
Resolve conflicts between VIP rental feature and SSH key schema changes:
- Adopt main's ssh_public_key storage (dropped ssh_key_id FK)
- Preserve VIP rental support (is_vip column, raw_response)
- Renumber VIP migration to 018
- Remove obsolete ssh_key_id nullable migration
Registers VIP rentals with billing service before DB insert and
finalizes billing when machines are removed from CSV.
VIP enablement is now determined by presence of the config rather
than an explicit enabled boolean field.
Allows overriding the default AWS region when fetching VIP CSV from S3.
@itzlambda itzlambda marked this pull request as ready for review December 20, 2025 09:22
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/basilica-cli/src/cli/handlers/gpu_rental.rs (1)

692-694: Inconsistent VIP access control for restart operation.

VIP rentals are excluded from the down command (line 624 uses exclude_vip=true) but included in the restart command (line 692 uses exclude_vip=false). Since both are control operations that modify rental state, this creates an inconsistent authorization boundary.

If VIP rentals are managed machines that users cannot stop, they likely should not be able to restart them either. The comment "though restart may fail" suggests this inconsistency may cause issues.

🔎 Recommended fix
     // Single rental restart (no --all flag as per requirements)
-    // exclude_vip=false: VIP rentals included in selector (though restart may fail for secure cloud)
+    // exclude_vip=true: VIP rentals excluded from restart (consistent with down command)
     let (rental_id, _compute_type) =
-        resolve_target_rental_unified(target, None, &api_client, false).await?;
+        resolve_target_rental_unified(target, None, &api_client, true).await?;
♻️ Duplicate comments (10)
crates/basilica-api/src/server.rs (2)

134-140: VIP rentals should be excluded from billing telemetry.

The query at lines 134-140 sends synthetic telemetry for all running rentals but doesn't exclude VIP rentals. VIP rentals are managed separately by the VIP poller and shouldn't receive synthetic billing telemetry.

This was flagged in a previous review.

         SELECT r.id, COALESCE(o.gpu_count, 1) as gpu_count
         FROM secure_cloud_rentals r
         LEFT JOIN gpu_offerings o ON r.offering_id = o.id
-        WHERE r.status = 'running'
+        WHERE r.status = 'running' AND (r.is_vip = FALSE OR r.is_vip IS NULL)

232-234: VIP rentals should be excluded from credit exhaustion checks.

The credit exhaustion query at line 233 doesn't filter out VIP rentals. VIP rentals have their own billing lifecycle managed by the VIP poller.

This was flagged in a previous review.

-        match sqlx::query_as("SELECT id FROM secure_cloud_rentals WHERE status = 'running'")
+        match sqlx::query_as("SELECT id FROM secure_cloud_rentals WHERE status = 'running' AND (is_vip = FALSE OR is_vip IS NULL)")
crates/basilica-aggregator/src/models.rs (1)

152-153: Add #[serde(default)] to is_vip for backward compatibility.

This issue was raised in a previous review. Without #[serde(default)], deserializing older JSON or test fixtures that lack the is_vip field will fail.

     pub updated_at: DateTime<Utc>,
+    #[serde(default)]
     pub is_vip: bool,
 }
crates/basilica-aggregator/src/vip/cache.rs (1)

115-161: Lifetime bugs in JSON fallbacks and unsafe numeric casts remain unfixed.

The issues previously flagged are still present:

  1. Lines 119, 148: .unwrap_or(&row.ip_address.clone().unwrap_or_default()) takes a reference to a temporary, which may not compile or could cause undefined behavior.
  2. Lines 124, 153, 157, 161: as u16/as u32 casts can silently truncate values from u64.
Recommended fix
         for row in rows {
             if let Some(vip_machine_id) = row
                 .provider_instance_id
                 .as_ref()
                 .and_then(|id| extract_vip_machine_id(id))
             {
                 let vip_machine_id = vip_machine_id.to_string();
+                let ip_fallback = row.ip_address.clone().unwrap_or_default();
+                let instance_type_fallback = row.instance_type.clone().unwrap_or_default();

                 let connection = if let Some(conn_json) = &row.connection_info {
                     let ssh_host = conn_json
                         .get("ssh_host")
                         .and_then(|v| v.as_str())
-                        .unwrap_or(&row.ip_address.clone().unwrap_or_default())
+                        .unwrap_or(&ip_fallback)
                         .to_string();
                     let ssh_port = conn_json
                         .get("ssh_port")
                         .and_then(|v| v.as_u64())
-                        .unwrap_or(22) as u16;
+                        .and_then(|p| u16::try_from(p).ok())
+                        .unwrap_or(22);

Apply similar changes for gpu_count, vcpu_count, and system_memory_gb.

crates/basilica-aggregator/src/vip/poller.rs (5)

93-116: Stale-removal logic remains dangerously narrow.

seen_ids only contains rows that are both active AND valid. Rows present on the sheet but skipped due to active=false or validation failure will be incorrectly treated as removed, triggering rental closure—a destructive side effect.

Populate seen_ids for ALL rows present on the sheet (before filtering), and use a separate collection for active+valid machines.


146-148: poll_success is hardcoded to true even when errors occur.

When process_machine() or remove_stale_rental() fails (lines 120-127, 132-140), the error is logged but poll_success = true is still reported. This makes failures invisible to monitoring systems.

Track errors with a flag and set poll_success = !had_errors.


166-168: Avoid logging full VipCsvRow; it contains user identifiers.

tracing::warn!(row = ?row, ...) can leak assigned_user into logs, which is a compliance/privacy concern.

-            tracing::warn!(row = ?row, "Invalid row: missing vip_machine_id");
+            tracing::warn!("Invalid row: missing vip_machine_id");

206-248: last_seen_at not refreshed when metadata is unchanged.

A machine with unchanged metadata won't have its last_seen_at updated, making it appear stale for observability purposes even though it's actively present in the data source.

Consider always refreshing last_seen_at to maintain an accurate heartbeat.


320-331: hourly_rate changes are silently ignored.

metadata_differs() excludes hourly_rate from comparison, so rate changes in the CSV won't trigger updates. Since hourly_rate is stored in the DB (via raw_response) and used in billing calculations (see apply_markup() in rental_ops.rs), this could lead to billing discrepancies.

Either add hourly_rate to metadata_differs() to propagate changes, or document that rates are only captured at creation time.

crates/basilica-aggregator/src/vip/rental_ops.rs (1)

76-77: Documentation mismatch with actual usage.

This concern was already raised in a previous review: the documentation states this is "typically called from basilica-api after billing registration succeeds," but the poller actually inserts rentals before billing registration, potentially creating orphan DB records if billing fails.

🧹 Nitpick comments (7)
crates/basilica-cli/src/cli/handlers/gpu_rental_helpers.rs (1)

242-244: Consider a more specific error message when VIP filtering excludes all rentals.

When exclude_vip is true and all active rentals are VIP, the generic "No active rentals found" message might confuse users who can see VIP rentals elsewhere. A more specific message could improve the user experience.

🔎 Proposed enhancement for clearer error messaging
+    // Check if we filtered out VIP rentals
+    let has_vip_only = exclude_vip 
+        && secure_rentals.as_ref()
+            .map(|s| s.rentals.iter().all(|r| r.is_vip))
+            .unwrap_or(false);
+
     if unified_items.is_empty() {
-        return Err(eyre!("No active rentals found"));
+        return if has_vip_only {
+            Err(eyre!("No eligible rentals found. Only VIP rentals are active."))
+        } else {
+            Err(eyre!("No active rentals found"))
+        };
     }
crates/basilica-api/migrations/018_vip_rentals.sql (1)

5-18: Consider a CHECK constraint to enforce the is_vip and prefix relationship.

The migration establishes a semantic link between is_vip = TRUE and the vip: prefix in provider_instance_id, but doesn't enforce it at the database level. Adding a CHECK constraint would prevent data inconsistencies if application logic has bugs or if data is modified directly.

🔎 Suggested CHECK constraint
 ALTER TABLE secure_cloud_rentals ADD COLUMN IF NOT EXISTS is_vip BOOLEAN NOT NULL DEFAULT FALSE;
+
+-- Enforce that is_vip flag matches provider_instance_id prefix
+ALTER TABLE secure_cloud_rentals ADD CONSTRAINT chk_vip_prefix_consistency
+    CHECK (
+        (is_vip = TRUE AND provider_instance_id LIKE 'vip:%') OR
+        (is_vip = FALSE AND provider_instance_id NOT LIKE 'vip:%')
+    );
crates/basilica-aggregator/src/vip/types.rs (1)

23-42: Consider gating test_machine with #[cfg(test)] or moving to a test module.

The test_machine constructor is marked pub and available in production builds. If it's only intended for testing, consider:

+#[cfg(test)]
 impl VipCsvRow {
     /// Create a test machine row with sensible defaults
     pub fn test_machine(id: &str, user: &str) -> Self {

Alternatively, if it's needed for integration tests in other crates, the current approach is acceptable.

crates/basilica-aggregator/src/vip/task.rs (1)

24-71: Consider storing the JoinHandle for graceful shutdown.

The start() method returns a JoinHandle<()>, but in server.rs (lines 1042-1043 and 1106-1107), the handle is immediately dropped. While dropped handles don't cancel the task (it continues running), this means:

  1. No way to gracefully stop the poller on server shutdown
  2. No way to detect if the polling task panicked

If graceful shutdown isn't required, this is acceptable. Otherwise, consider storing the handle in the Server struct's background_tasks JoinSet.

crates/basilica-api/src/api/routes/secure_cloud.rs (1)

253-260: Potential precision loss when parsing VIP hourly rate.

The hourly rate is stored as a string in raw_response but parsed to f64, which can lose precision for monetary values. Since Decimal is used elsewhere for rates, consider using Decimal parsing for consistency:

-                    let vip_hourly_rate = raw_response
-                        .as_ref()
-                        .and_then(|r| r.get("hourly_rate"))
-                        .and_then(|v| v.as_str())
-                        .and_then(|s| s.parse::<f64>().ok())
-                        .unwrap_or(0.0);
+                    let vip_hourly_rate = raw_response
+                        .as_ref()
+                        .and_then(|r| r.get("hourly_rate"))
+                        .and_then(|v| v.as_str())
+                        .and_then(|s| s.parse::<Decimal>().ok())
+                        .and_then(|d| d.to_f64())
+                        .unwrap_or(0.0);

This is a minor issue since the final value is f64 anyway for the response, but using Decimal parsing first preserves precision during the parse step.

crates/basilica-aggregator/src/vip/cache.rs (1)

199-203: Return value may be misleading when rows are skipped.

count is set to rows.len() before filtering, but the function returns count after potentially skipping rows that fail extract_vip_machine_id. Consider returning entries.len() to reflect the actual number of records rebuilt into the cache.

Suggested fix
-        tracing::info!(count = count, "Rebuilt VIP cache from database");
+        let rebuilt_count = entries.len();
+        tracing::info!(db_rows = count, rebuilt = rebuilt_count, "Rebuilt VIP cache from database");

-        Ok(count)
+        Ok(rebuilt_count)
crates/basilica-aggregator/src/config.rs (1)

157-164: Consider validating S3 configuration completeness.

has_s3_source() requires both s3_bucket and s3_key to be non-empty, but doesn't verify s3_region. While the AWS SDK can use default region from the credential chain, explicitly documenting or validating this would prevent runtime surprises.

     /// Check if S3 source is configured
+    /// Note: s3_region is optional as AWS SDK falls back to default credential chain region
     pub fn has_s3_source(&self) -> bool {
         self.s3_bucket
             .as_ref()
             .map(|s| !s.is_empty())
             .unwrap_or(false)
             && self.s3_key.as_ref().map(|s| !s.is_empty()).unwrap_or(false)
     }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d902b02 and 42be52a.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (21)
  • .gitignore (1 hunks)
  • crates/basilica-aggregator/Cargo.toml (2 hunks)
  • crates/basilica-aggregator/src/config.rs (7 hunks)
  • crates/basilica-aggregator/src/db.rs (9 hunks)
  • crates/basilica-aggregator/src/lib.rs (1 hunks)
  • crates/basilica-aggregator/src/models.rs (4 hunks)
  • crates/basilica-aggregator/src/service.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/cache.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/csv.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/mod.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/poller.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/rental_ops.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/task.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/types.rs (1 hunks)
  • crates/basilica-api/migrations/018_vip_rentals.sql (1 hunks)
  • crates/basilica-api/src/api/routes/secure_cloud.rs (6 hunks)
  • crates/basilica-api/src/config/mod.rs (3 hunks)
  • crates/basilica-api/src/server.rs (4 hunks)
  • crates/basilica-cli/src/cli/handlers/gpu_rental.rs (4 hunks)
  • crates/basilica-cli/src/cli/handlers/gpu_rental_helpers.rs (3 hunks)
  • crates/basilica-sdk/src/types.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • crates/basilica-aggregator/src/service.rs
  • crates/basilica-aggregator/Cargo.toml
  • crates/basilica-aggregator/src/lib.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-13T15:34:46.544Z
Learnt from: epappas
Repo: one-covenant/basilica PR: 184
File: crates/basilica-validator/src/rental/billing.rs:129-130
Timestamp: 2025-10-13T15:34:46.544Z
Learning: In the basilica-validator crate, rental_id is a String type (not UUID) throughout the system - in RentalInfo struct, database rentals.id column, and all rental operations. The persistence layer methods should accept &str for rental_id parameters, not &Uuid, since they immediately convert to String for SQL binding anyway.

Applied to files:

  • crates/basilica-cli/src/cli/handlers/gpu_rental.rs
  • crates/basilica-sdk/src/types.rs
  • crates/basilica-aggregator/src/vip/cache.rs
  • crates/basilica-aggregator/src/vip/types.rs
  • crates/basilica-aggregator/src/vip/rental_ops.rs
  • crates/basilica-api/src/api/routes/secure_cloud.rs
🧬 Code graph analysis (7)
crates/basilica-cli/src/cli/handlers/gpu_rental.rs (1)
crates/basilica-cli/src/cli/handlers/gpu_rental_helpers.rs (1)
  • resolve_target_rental_unified (131-286)
crates/basilica-aggregator/src/models.rs (1)
crates/basilica-billing/src/domain/rentals.rs (1)
  • provider_instance_id (208-212)
crates/basilica-aggregator/src/vip/cache.rs (2)
crates/basilica-aggregator/src/models.rs (1)
  • extract_vip_machine_id (164-166)
crates/basilica-aggregator/src/vip/rental_ops.rs (1)
  • sqlx (89-124)
crates/basilica-aggregator/src/vip/rental_ops.rs (1)
crates/basilica-aggregator/src/models.rs (1)
  • format_vip_machine_id (159-161)
crates/basilica-aggregator/src/vip/csv.rs (2)
crates/basilica-aggregator/src/vip/poller.rs (1)
  • new (55-71)
crates/basilica-aggregator/src/vip/types.rs (1)
  • test_machine (25-41)
crates/basilica-aggregator/src/vip/poller.rs (1)
crates/basilica-aggregator/src/vip/rental_ops.rs (5)
  • close_vip_rental (144-230)
  • get_vip_rental_by_machine_id (233-251)
  • insert_vip_rental (78-140)
  • update_vip_rental_metadata (254-306)
  • sqlx (89-124)
crates/basilica-api/src/api/routes/secure_cloud.rs (2)
crates/basilica-billing/src/domain/rentals.rs (1)
  • offering_id (223-225)
crates/basilica-api/src/api/middleware/balance.rs (1)
  • hourly_cost_with_markup (94-100)
🔇 Additional comments (36)
.gitignore (1)

137-137: LGTM!

Adding dev/ to .gitignore is appropriate for excluding development-time artifacts from version control. This aligns well with the existing patterns in the file.

crates/basilica-cli/src/cli/handlers/gpu_rental.rs (3)

229-230: LGTM: Appropriate VIP inclusion for status viewing.

Correctly allows VIP rentals to be viewed in status checks. Read-only operations should include VIP rentals for visibility.


362-363: LGTM: Appropriate VIP inclusion for log viewing.

Correctly allows VIP rentals to be viewed in log operations. Read-only operations should include VIP rentals for observability.


624-626: LGTM: Appropriate VIP exclusion for stop operations.

Correctly excludes VIP rentals from user-initiated stop operations. Managed machines should not be controllable by regular users for destructive operations.

crates/basilica-sdk/src/types.rs (1)

566-568: Field implementation is correct; verify stop rental endpoint enforces the is_vip check.

The is_vip: bool field with #[serde(default)] is properly implemented for backward compatibility with clear documentation. However, ensure the stop rental endpoint or service handler explicitly checks this field and rejects stop requests for VIP rentals (managed machines). This is a critical security-related field that requires corresponding authorization logic in the API layer.

crates/basilica-cli/src/cli/handlers/gpu_rental_helpers.rs (3)

686-688: LGTM! VIP access via SSH is correctly preserved.

The implementation correctly passes exclude_vip = false when selecting a rental for SSH access, ensuring VIP rentals remain accessible. The inline comment clearly documents this behavior.


220-223: VIP exclusion logic looks correct for secure cloud.

The implementation correctly skips VIP rentals when exclude_vip is true. Note that this filtering is applied only to secure cloud rentals, not community cloud rentals (lines 191-210), which suggests VIPs are a secure-cloud-only concept.

Verify that:

  1. The is_vip field exists on the secure cloud rental type and is correctly populated
  2. VIP rentals are indeed exclusive to secure cloud (confirming the asymmetric filtering is intentional)
#!/bin/bash
# Description: Verify the is_vip field exists on rental types and confirm VIP is secure-cloud-only

# Find the rental type definition and check for is_vip field
ast-grep --pattern 'struct $_ {
  $$$
  is_vip: $_,
  $$$
}'

# Search for VIP-related code to understand scope
rg -n -C 3 '\bis_vip\b' --type rust

127-127: All call sites of resolve_target_rental_unified have been updated with the new exclude_vip parameter. The function signature requires 4 parameters, and all 5 call sites in the codebase pass the parameter correctly in the proper order, with context-appropriate values (true for the down command, false for other operations).

crates/basilica-api/migrations/018_vip_rentals.sql (3)

5-6: LGTM!

The is_vip column addition is well-designed with proper defaults and idempotency safeguards.


8-11: LGTM!

The partial index on is_vip is an excellent optimization if VIP rentals are uncommon, as it keeps the index small while accelerating VIP-specific queries.


13-16: The partial index on provider_instance_id is redundant. A general index on this column already exists in migration 013. For equality lookups on VIP provider_instance_ids, the existing general index is more efficient since the query planner won't use a partial index when a general index covers the same operation on a larger dataset. Consider removing this partial index to avoid index maintenance overhead.

crates/basilica-aggregator/src/vip/types.rs (1)

1-82: LGTM - Well-structured VIP data models.

The type definitions are clean with appropriate separation between CSV input (VipCsvRow), connection details (VipConnectionInfo), display metadata (VipDisplayInfo), validated state (ValidVipMachine), and cache records (VipRentalRecord). The use of Decimal for hourly_rate ensures precision for monetary values.

crates/basilica-api/src/config/mod.rs (2)

131-146: LGTM - VIP configuration properly integrated.

The vip field is correctly marked with #[serde(default)] for backward compatibility, and the Default implementation initializes it to None. This allows existing configurations to work without the VIP section.


350-350: LGTM - Config propagation is correct.

The VIP configuration is properly cloned and passed through to the aggregator config.

crates/basilica-aggregator/src/db.rs (2)

415-429: LGTM - VIP exclusion in active deployments query.

The get_all_active_deployments correctly excludes VIP rentals with (is_vip = FALSE OR is_vip IS NULL), handling both explicit false values and NULL for backward compatibility with existing rows.


166-198: LGTM - is_vip properly bound in deployment creation.

The create_deployment function correctly binds is_vip at position 17, matching the updated INSERT statement.

crates/basilica-aggregator/src/models.rs (2)

155-166: LGTM - VIP machine ID helpers are well-designed.

The VIP_MACHINE_ID_PREFIX constant and associated helper functions (format_vip_machine_id, extract_vip_machine_id) provide a clean API for handling VIP-specific identifiers stored in provider_instance_id.


15-17: LGTM - Provider::Vip variant properly integrated.

The new Vip variant is correctly added with string conversion support in both as_str() and FromStr, enabling VIP rentals to be stored with provider = "vip".

crates/basilica-aggregator/src/vip/task.rs (1)

35-69: LGTM - Resilient polling loop with good observability.

The loop correctly:

  • Uses MissedTickBehavior::Skip to avoid backlog buildup
  • Measures and logs poll duration
  • Logs changes at INFO level, no-changes at DEBUG level
  • Continues polling after errors (non-fatal handling)
crates/basilica-api/src/server.rs (3)

1137-1144: LGTM - Health check correctly excludes VIP rentals.

The secure cloud health check query properly filters out VIP rentals with WHERE is_vip = FALSE. VIP machines are manually managed and don't need automated health checks.


993-1124: VIP poller startup logic is well-structured.

The initialization properly:

  • Logs configuration state for debugging
  • Handles mock mode separately with clear logging
  • Supports both S3 and local CSV sources
  • Rebuilds cache from DB before starting
  • Performs initial poll before starting periodic task
  • Handles errors gracefully without crashing startup

1362-1398: LGTM - Mock VIP data helper is appropriate for testing.

The create_mock_vip_data function provides realistic test data with sensible GPU configurations and pricing.

crates/basilica-api/src/api/routes/secure_cloud.rs (3)

644-650: LGTM - VIP rental stop guard is appropriate.

VIP rentals are manually managed, so preventing user-initiated stops with a clear error message directing them to support is the correct behavior.


229-305: LGTM - VIP metadata extraction logic is correct.

The conditional handling for VIP vs regular rentals properly extracts GPU details, resource specs, and pricing from different sources (raw_response for VIP, gpu_offerings for regular). The fallback values are sensible.


39-41: LGTM - RentalQueryRow correctly extended for VIP support.

The tuple type alias is updated to include is_vip and raw_response fields, matching the updated SQL query.

crates/basilica-aggregator/src/vip/cache.rs (1)

14-79: LGTM - Cache API is well-designed.

The basic cache operations using RwLock<HashMap> are correctly implemented with appropriate async read/write lock semantics. The get_ids_not_in method elegantly supports stale entry detection for the polling sync logic.

crates/basilica-aggregator/src/vip/mod.rs (1)

1-16: LGTM - Clean module organization.

The module structure follows Rust best practices with clear separation of concerns and convenient re-exports for the public API surface.

crates/basilica-aggregator/src/config.rs (1)

107-181: LGTM - Well-designed configuration structure.

The VipConfig struct provides flexible support for multiple data source modes (mock, local CSV, S3) with appropriate defaults and helper methods. The #[serde(default)] annotations ensure backward compatibility.

crates/basilica-aggregator/src/vip/csv.rs (2)

46-77: LGTM - Clean data source abstraction.

The CsvDataSource provides a well-designed abstraction over local files and S3 with proper async handling. The S3 integration correctly uses the AWS SDK's default credential chain with optional region override.


245-298: LGTM - MockVipDataSource is well-suited for testing.

The mock implementation with Arc<RwLock<Vec>> provides thread-safe mutation methods (set_rows, add_row, remove_row, update_row) that enable comprehensive test scenarios for the poller.

crates/basilica-aggregator/src/vip/poller.rs (2)

54-71: LGTM - Constructor provides clean dependency injection.

The VipPoller::new() constructor accepts all required dependencies including optional billing client, enabling flexible composition and testability.


359-416: LGTM - Billing integration follows correct order of operations.

The billing registration happens BEFORE the DB insert (lines 282-295), ensuring we don't create orphaned DB records if billing fails. The register_with_billing method correctly constructs the protobuf request with appropriate resource specs.

crates/basilica-aggregator/src/vip/rental_ops.rs (4)

1-27: LGTM! Solid error handling and monetary calculation.

The error enum is well-structured with thiserror, and apply_markup correctly uses Decimal for monetary calculations with proper overflow checking and conversion validation.


29-74: LGTM! Proper rental preparation with markup.

The PreparedVipRental struct and prepare_vip_rental function correctly handle rental ID generation and markup application. The use of String for rental_id is consistent with the system-wide convention.


176-221: Verify best-effort archival strategy is acceptable for compliance/audit requirements.

The function archives to terminated_secure_cloud_rentals on a best-effort basis (line 203), logging failures but proceeding with deletion from the active table regardless of archival success. If archiving fails due to database errors or constraints, the rental record is permanently lost after deletion.

Confirm this design is acceptable given your compliance and audit requirements. If the terminated table is critical for billing history or audit trails, consider failing the closure operation when archiving fails.


232-306: LGTM! Clean query functions with proper parameterization.

Both get_vip_rental_by_machine_id and update_vip_rental_metadata are well-implemented with proper SQL parameterization, appropriate filtering, and suitable logging levels.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
crates/basilica-api/src/api/routes/secure_cloud.rs (2)

279-286: Consider logging when VIP hourly_rate is missing or malformed.

Silent fallback to 0.0 could mask data integrity issues in VIP rentals. When the hourly rate cannot be parsed, a warning log would help diagnose issues with VIP rental data.

🔎 Proposed improvement
                     // VIP hourly rate from raw_response (marked up rate stored at insert time)
-                    let vip_hourly_rate = raw_response
+                    let vip_hourly_rate = match raw_response
                         .as_ref()
                         .and_then(|r| r.get("hourly_rate"))
                         .and_then(|v| v.as_str())
                         .and_then(|s| s.parse::<f64>().ok())
-                        .unwrap_or(0.0);
+                    {
+                        Some(rate) => rate,
+                        None => {
+                            tracing::warn!(
+                                "VIP rental {} missing or invalid hourly_rate in raw_response, defaulting to 0.0",
+                                rental_id
+                            );
+                            0.0
+                        }
+                    };

647-676: VIP stop guard correctly implemented.

The guard appropriately blocks user-initiated stops for VIP rentals with a clear error message.

Minor readability suggestion: consider using a named struct instead of tuple indices (rental.0, rental.2) to improve code clarity, though this follows the existing pattern in the codebase.

🔎 Optional: Use named struct for clarity
+    struct RentalOwnership {
+        user_id: String,
+        created_at: chrono::DateTime<Utc>,
+        is_vip: bool,
+    }
+
     // 1. Get rental and verify ownership
-    let rental: (String, chrono::DateTime<Utc>, bool) = sqlx::query_as(
+    let rental: RentalOwnership = sqlx::query_as(
         "SELECT user_id, created_at, is_vip
          FROM secure_cloud_rentals
          WHERE id = $1",
     )
     // ... fetch logic ...

-    if rental.0 != auth.user_id {
+    if rental.user_id != auth.user_id {
         // ...
     }

-    if rental.2 {
+    if rental.is_vip {
         // ...
     }

-    let duration = stop_time.signed_duration_since(rental.1);
+    let duration = stop_time.signed_duration_since(rental.created_at);
crates/basilica-aggregator/src/vip/csv.rs (2)

110-131: Consider validating CSV header columns.

The parser assumes a specific column order but doesn't validate the header row matches expectations. If the CSV column order differs, data will be silently parsed into wrong fields.

Optional enhancement to validate headers

Before the loop at line 117, add header validation:

fn parse_csv(&self, content: &str) -> Result<Vec<VipCsvRow>, DataSourceError> {
    let mut reader = csv::Reader::from_reader(content.as_bytes());
    
    // Validate headers
    let headers = reader.headers()
        .map_err(|e| DataSourceError::CsvParse(e.to_string()))?;
    let expected = vec![
        "vip_machine_id", "assigned_user", "active", "ssh_host", "ssh_port",
        "ssh_user", "gpu_type", "gpu_count", "region", "hourly_rate",
        "vcpu_count", "system_memory_gb", "notes"
    ];
    if headers.len() < expected.len() - 1 {  // -1 because notes is optional
        return Err(DataSourceError::CsvParse(
            format!("Expected at least {} columns, found {}", expected.len() - 1, headers.len())
        ));
    }
    
    let mut rows = Vec::new();
    // ... rest of method
}

156-165: Consider case-insensitive parsing for the active field.

The current implementation only accepts lowercase "true"/"false". CSV data from various sources might use "True", "FALSE", etc.

Optional: make parsing case-insensitive
-        let active = match active_str.as_str() {
-            "1" | "true" => true,
-            "0" | "false" => false,
+        let active = match active_str.to_lowercase().as_str() {
+            "1" | "true" => true,
+            "0" | "false" => false,
             other => {
                 return Err(DataSourceError::RowParse {
                     row: row_num,
                     message: format!("Invalid 'active' value '{}', expected 0/1", other),
                 })
             }
         };
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42be52a and e847385.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • crates/basilica-aggregator/src/vip/csv.rs (1 hunks)
  • crates/basilica-api/src/api/routes/secure_cloud.rs (6 hunks)
  • crates/basilica-api/src/config/mod.rs (3 hunks)
  • crates/basilica-api/src/server.rs (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/basilica-api/src/config/mod.rs
  • crates/basilica-api/src/server.rs
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-13T15:34:46.544Z
Learnt from: epappas
Repo: one-covenant/basilica PR: 184
File: crates/basilica-validator/src/rental/billing.rs:129-130
Timestamp: 2025-10-13T15:34:46.544Z
Learning: In the basilica-validator crate, rental_id is a String type (not UUID) throughout the system - in RentalInfo struct, database rentals.id column, and all rental operations. The persistence layer methods should accept &str for rental_id parameters, not &Uuid, since they immediately convert to String for SQL binding anyway.

Applied to files:

  • crates/basilica-api/src/api/routes/secure_cloud.rs
🧬 Code graph analysis (2)
crates/basilica-api/src/api/routes/secure_cloud.rs (2)
crates/basilica-api/src/api/query.rs (1)
  • gpu_type (21-25)
crates/basilica-api/src/api/middleware/balance.rs (1)
  • hourly_cost_with_markup (94-100)
crates/basilica-aggregator/src/vip/csv.rs (1)
crates/basilica-aggregator/src/vip/types.rs (1)
  • test_machine (25-41)
🔇 Additional comments (8)
crates/basilica-api/src/api/routes/secure_cloud.rs (3)

39-41: LGTM!

The type alias correctly extends RentalQueryRow with the new VIP-related fields, maintaining proper ordering that matches the SQL query.


178-196: LGTM!

The SQL query correctly fetches the new VIP fields. The LEFT JOIN is appropriate since VIP rentals may not have corresponding gpu_offerings entries.


359-359: LGTM!

The is_vip field is correctly propagated to the response item.

crates/basilica-aggregator/src/vip/csv.rs (5)

9-27: LGTM! Clean error model and trait design.

The error types comprehensively cover all failure modes (file I/O, CSV parsing, row-level errors, S3, UTF-8), and the async trait is properly bounded for concurrent use.


174-176: The previous concern about these fields has been resolved.

The doc comment at line 137 correctly documents only notes as optional, and the implementation correctly treats vcpu_count and system_memory_gb as required fields using get_col(). The code and documentation are now consistent.


182-217: Excellent error handling for field parsing.

Each numeric field parse includes the row number and specific field name in error messages, making CSV validation issues easy to diagnose. Using Decimal for hourly_rate is the correct choice for financial data.


245-298: LGTM! Comprehensive mock implementation for testing.

The mock provides all necessary mutations (add, remove, update, set) and correctly uses Arc<RwLock<>> for thread-safe access. Cloning the vec on fetch (line 296) is appropriate for test scenarios where data volumes are small.


64-66: The region configuration is correct. The AWS SDK for Rust uses aws_sdk_s3::config::Region::new() to create Region instances for the config builder method, and this pattern works correctly with both static and dynamic region strings.

Prevent silent mutations of critical VIP rental fields by detecting when
immutable data (assigned user, connection info, display specs) changes
mid-rental. Conflicting rentals are now closed rather than updated.

Also adds terminated rental lookup to block ID reuse with different data,
fixes billing finalization to block on errors, and properly rolls back
DB inserts when billing registration fails.
Only finalize+close when assigned_user changes; update SSH info, GPU
specs, and other metadata in-place to avoid premature billing stops.
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
crates/basilica-aggregator/src/vip/rental_ops.rs (1)

76-78: Update documentation to reflect actual execution order.

The docstring states this is "typically called from basilica-api after billing registration succeeds" (line 77), but the poller actually calls insert_vip_rental first at line 623, then attempts billing registration at line 634. If billing fails, delete_vip_rental is called to rollback (lines 642-650). The documentation should reflect this DB-first, billing-second pattern with rollback on failure.

crates/basilica-aggregator/src/vip/cache.rs (1)

114-204: Fix lifetime bugs in JSON fallbacks and use checked numeric conversions.

Lines 119, 148: unwrap_or(&row.ip_address.clone().unwrap_or_default()) and similar patterns take references to temporaries, causing lifetime issues.

Lines 124, 153, 157, 161: Numeric casts using as u16 / as u32 can silently truncate large values from JSON.

Additionally, line 205 logs count (total rows fetched) rather than entries.len() (rows successfully inserted), which can differ if some rows fail validation.

crates/basilica-aggregator/src/vip/poller.rs (2)

103-199: Track all sheet rows separately from validation/processing outcomes to prevent incorrect stale removal.

seen_ids (line 125) is only populated for machines that both validate successfully (line 151) AND don't error during processing. This means CSV rows that are present but temporarily invalid or encounter transient processing errors will be treated as "stale" at line 167 and have their rentals closed—a destructive operation triggered by data quality issues rather than actual removal from the sheet.

Additionally, poll_success is hardcoded to true at line 184 even when process_machine (line 155) or remove_stale_rental (line 170) return errors, making failures invisible to monitoring.


201-207: Avoid logging full VipCsvRow; it contains user identifiers.

Line 205: tracing::warn!(row = ?row, ...) logs the complete VipCsvRow, which includes assigned_user—a user identifier that should not appear in logs. Log only vip_machine_id instead, consistent with lines 209 and 213.

🧹 Nitpick comments (1)
crates/basilica-aggregator/src/vip/poller.rs (1)

494-503: Refresh last_seen_at on every poll cycle for accurate heartbeat tracking.

When a cached rental's metadata hasn't changed (line 503), last_seen_at is not updated, making the field stale for observability and debugging. While this doesn't affect functional behavior (stale removal is based on seen_ids, not last_seen_at), updating last_seen_at on every poll cycle would provide accurate "last seen" timestamps for monitoring.

🔎 Proposed fix: Always refresh last_seen_at
         // Normalize cached hourly rate to marked-up value
         if cached.display.hourly_rate != expected_marked_up {
             let mut updated = cached.clone();
             updated.display.hourly_rate = expected_marked_up;
             updated.last_seen_at = Utc::now();
             self.cache.insert(updated).await;
+        } else {
+            // Refresh last_seen_at even when metadata matches
+            let mut updated = cached.clone();
+            updated.last_seen_at = Utc::now();
+            self.cache.insert(updated).await;
         }

         return Ok(ProcessOutcome::Seen);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e847385 and abc78fb.

📒 Files selected for processing (3)
  • crates/basilica-aggregator/src/vip/cache.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/poller.rs (1 hunks)
  • crates/basilica-aggregator/src/vip/rental_ops.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-13T15:34:46.544Z
Learnt from: epappas
Repo: one-covenant/basilica PR: 184
File: crates/basilica-validator/src/rental/billing.rs:129-130
Timestamp: 2025-10-13T15:34:46.544Z
Learning: In the basilica-validator crate, rental_id is a String type (not UUID) throughout the system - in RentalInfo struct, database rentals.id column, and all rental operations. The persistence layer methods should accept &str for rental_id parameters, not &Uuid, since they immediately convert to String for SQL binding anyway.

Applied to files:

  • crates/basilica-aggregator/src/vip/rental_ops.rs
  • crates/basilica-aggregator/src/vip/cache.rs
🧬 Code graph analysis (2)
crates/basilica-aggregator/src/vip/rental_ops.rs (1)
crates/basilica-aggregator/src/models.rs (1)
  • format_vip_machine_id (159-161)
crates/basilica-aggregator/src/vip/poller.rs (3)
crates/basilica-aggregator/src/models.rs (2)
  • format_vip_machine_id (159-161)
  • from (210-218)
crates/basilica-aggregator/src/vip/rental_ops.rs (5)
  • apply_markup (19-27)
  • close_vip_rental (173-259)
  • delete_vip_rental (144-169)
  • insert_vip_rental (78-140)
  • sqlx (89-124)
crates/basilica-aggregator/src/vip/csv.rs (1)
  • new (253-257)
🔇 Additional comments (13)
crates/basilica-aggregator/src/vip/rental_ops.rs (6)

18-27: LGTM! Proper overflow protection.

The markup calculation correctly uses checked_mul to prevent overflow and handles conversion failures appropriately.


49-74: LGTM!

The preparation logic correctly generates a rental ID, applies markup, and bundles all necessary data for insertion.


142-169: LGTM!

The rollback deletion correctly removes the rental without archiving, which is appropriate since the rental never fully started due to billing failure.


171-259: LGTM!

The three-step close operation (update → archive → delete) is properly structured. Best-effort archiving with non-fatal errors is a reasonable approach.


261-280: LGTM!

The query correctly filters for active VIP rentals and returns the necessary identifiers.


282-335: LGTM!

The metadata update correctly refreshes connection and display information. Note that hourly_rate is stored in raw_response but doesn't trigger billing recalculations—this appears intentional as rates are applied only at rental creation.

crates/basilica-aggregator/src/vip/cache.rs (2)

9-79: LGTM!

The cache API provides all necessary operations with proper async synchronization using RwLock.


210-230: LGTM!

The database row struct and Default implementation follow standard patterns.

crates/basilica-aggregator/src/vip/poller.rs (5)

21-31: LGTM!

The error enum provides appropriate variants for all failure modes in the poller lifecycle.


33-69: LGTM!

The stats and snapshot types provide clear tracking of poll outcomes and proper separation between DB and domain models.


71-101: LGTM!

The poller structure uses proper dependency injection, and the optional billing client enables graceful degradation when billing is unavailable.


236-284: LGTM!

The helper functions correctly apply markup, compare user assignments, and match metadata. The parse_decimal function handles multiple JSON representations gracefully.


697-807: LGTM!

The billing integration correctly sequences operations (billing finalization before DB close) and handles errors appropriately. Blocking closes on billing finalization ensures users are charged correctly, with retries on subsequent poll cycles.

@itzlambda itzlambda merged commit d32e2df into main Dec 21, 2025
17 checks passed
@itzlambda itzlambda deleted the managed-machines-sync branch December 21, 2025 17:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants