
Conversation


@itzlambda commented Dec 17, 2025

Revenue is now attributed by telemetry event timestamp instead of rental completion time, enabling accurate revenue tracking for ongoing rentals.

Key changes:

  • Updated refresh_miner_revenue_summary SQL function for progressive recognition
  • Simplified API to use date strings (YYYY-MM-DD) instead of timestamps
  • Added validation requiring period_end to be before today

Summary by CodeRabbit

  • New Features

    • Progressive miner revenue recognition now derives from telemetry events using event-timestamp attribution
    • Validation to detect and prevent overlapping period calculations, with human-readable overlap details
  • Bug Fixes

    • Corrected average revenue/hour and rental-duration calculations to aggregate per-rental (fixes event-weighted bias)
    • Increased cost precision to reduce rounding drift
  • Breaking Changes

    • API inputs changed from RFC3339 timestamps to YYYY-MM-DD date strings; period end must be before today



coderabbitai bot commented Dec 17, 2025

Walkthrough

Replaces end-time attribution with progressive revenue recognition from telemetry events, changes gRPC date inputs to YYYY-MM-DD strings with strict validation, adds overlap detection to prevent duplicate period calculations, updates storage/repository APIs, and adds/fixes multiple DB migrations and a CLI script.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Protocol & gRPC**<br>`crates/basilica-protocol/proto/billing.proto`, `crates/basilica-billing/src/grpc/billing_service.rs` | Request fields `period_start`/`period_end` switched from `google.protobuf.Timestamp` to string (YYYY-MM-DD). The gRPC handler now validates presence/format, converts to day-boundary UTC timestamps, enforces ordering and the end-before-today rule, and errors early on invalid input. |
| **Database migrations (revenue logic & fixes)**<br>`crates/basilica-billing/migrations/033_progressive_miner_revenue.sql`, `crates/basilica-billing/migrations/034_fix_revenue_avg_calculations.sql`, `crates/basilica-billing/migrations/035_fix_total_cost_precision.sql` | Added/updated `billing.refresh_miner_revenue_summary(...)` implementations to use `telemetry_processed` events for progressive recognition, with append-only inserts, per-rental averaging fixes, grouping by (node_id, validator_id, miner_hotkey, miner_uid), and `rentals.total_cost` precision adjusted to DECIMAL(20,8). Each migration includes detailed function comments and diagnostics. |
| **Storage & Domain Layer**<br>`crates/basilica-billing/src/storage/miner_revenue.rs`, `crates/basilica-billing/src/storage/mod.rs`, `crates/basilica-billing/src/domain/miner_revenue.rs` | Introduced an `OverlappingPeriod` struct and a `find_overlapping_periods()` repository method; implemented the SQL query mapping for overlaps. The domain service now checks for overlapping periods and returns a formatted `InvalidState` error when overlaps exist before proceeding. Exported `OverlappingPeriod` via the storage mod. |
| **Tooling / Scripts**<br>`scripts/billing/refresh_miner_revenue.sh` | Updated help text, examples, and validation messages to require YYYY-MM-DD date inputs and note the end-date-before-today constraint; adjusted usage examples accordingly. |

Sequence Diagram

sequenceDiagram
    actor Client
    participant gRPC as gRPC Handler\n(billing_service)
    participant Domain as Domain Service\n(MinerRevenueService)
    participant Repo as Repository\n(MinerRevenueRepository)
    participant DB as Database

    Client->>gRPC: RefreshMinerRevenueSummary(period_start, period_end as YYYY-MM-DD)
    gRPC->>gRPC: Validate format & presence\nConvert to day-boundary UTC timestamps\nCheck period_start <= period_end and end < today
    alt invalid input
        gRPC-->>Client: Error (invalid format/order/missing)
    else valid input
        gRPC->>Domain: refresh_summary(period_start_ts, period_end_ts, version)
        Domain->>Repo: find_overlapping_periods(period_start_ts, period_end_ts)
        Repo->>DB: Query overlapping periods
        DB-->>Repo: Matching periods (rows)
        Repo-->>Domain: Vec<OverlappingPeriod>
        alt overlaps found
            Domain-->>Client: Error (InvalidState with overlap details)
        else no overlaps
            Domain->>Repo: refresh_summary_for_period(...)
            Repo->>DB: Call billing.refresh_miner_revenue_summary(...)
            DB->>DB: Aggregate telemetry_processed events\nInsert append-only rows into miner_revenue_summary
            DB-->>Repo: Inserted row count
            Repo-->>Domain: Success(result)
            Domain-->>Client: Success (row count)
        end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay special attention to:
    • SQL in migrations 033/034: aggregation correctness, event filters, grouping keys, and append-only semantics.
    • Precision change in migration 035 and any downstream effects on calculations or casting.
    • Repository SQL for find_overlapping_periods: correctness of overlap logic and row counting.
    • gRPC date parsing/conversion and edge cases around timezone/day-boundaries and "end before today" rule.

Suggested reviewers

  • distributedstatemachine

Poem

🐰 I nibble dates in YYYY-MM-DD,
I tally hops from telemetry's spree,
I spot overlapping burrows before they bloom,
I tuck progressive carrots into the revenue room,
Hooray — neat rows and no more gloom! 🌿

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'feat(billing): implement progressive miner revenue recognition' accurately summarizes the main objective: enabling progressive (telemetry-based) revenue recognition instead of completion-time-based attribution. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 80.49%, which is sufficient. The required threshold is 80.00%. |


Update miner revenue summary to attribute revenue by telemetry event
timestamp instead of rental end_time. This enables accurate revenue
calculation for ongoing rentals that span time window boundaries.
@itzlambda itzlambda marked this pull request as ready for review December 17, 2025 09:39

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
crates/basilica-protocol/proto/billing.proto (1)

276-279: API type inconsistency between request messages.

RefreshMinerRevenueSummaryRequest uses string for period_start/period_end, while GetMinerRevenueSummaryRequest (lines 293-294) still uses google.protobuf.Timestamp. This inconsistency may confuse API consumers expecting a uniform interface.

If intentional (e.g., refresh operates on whole days while get supports arbitrary timestamps), consider adding a comment explaining the rationale.

crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (1)

69-78: Consider indexing for query performance.

The WHERE clause filters on multiple JSON fields (incremental_cost, credits_deducted, timestamp). For large billing_events tables, consider adding a partial or expression index:

CREATE INDEX CONCURRENTLY idx_billing_events_telemetry_timestamp 
ON billing.billing_events ((event_data->>'timestamp')::timestamptz)
WHERE event_type = 'telemetry_processed' 
  AND event_data->>'incremental_cost' IS NOT NULL;

This would significantly improve query performance for progressive revenue calculations.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3118197 and e3b3841.

⛔ Files ignored due to path filters (1)
  • crates/basilica-protocol/src/gen/basilica.billing.v1.rs is excluded by !**/gen/**
📒 Files selected for processing (3)
  • crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (1 hunks)
  • crates/basilica-billing/src/grpc/billing_service.rs (1 hunks)
  • crates/basilica-protocol/proto/billing.proto (1 hunks)
🔇 Additional comments (3)
crates/basilica-billing/src/grpc/billing_service.rs (1)

1055-1095: Date validation and conversion logic looks correct.

The validation flow properly:

  1. Checks presence of both fields
  2. Parses dates with explicit format
  3. Validates period_end < today
  4. Converts to day-boundary timestamps
  5. Validates period_start <= period_end

The use of .expect("valid time") is safe since the time components are compile-time constants.
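For intuition, the validation flow described above can be sketched in Python (illustrative only: the real handler is Rust, and the function name `parse_period` and the half-open end-plus-one-day boundary convention are assumptions, not taken from the PR):

```python
from datetime import datetime, timedelta, timezone

def parse_period(period_start: str, period_end: str):
    """Validate YYYY-MM-DD inputs and convert them to day-boundary UTC
    timestamps, mirroring the checks listed above."""
    # Presence of both fields
    if not period_start or not period_end:
        raise ValueError("period_start and period_end are required")
    # Parse with an explicit format
    try:
        start = datetime.strptime(period_start, "%Y-%m-%d").date()
        end = datetime.strptime(period_end, "%Y-%m-%d").date()
    except ValueError:
        raise ValueError("dates must use the YYYY-MM-DD format")
    # period_end must be a fully elapsed day
    if end >= datetime.now(timezone.utc).date():
        raise ValueError("period_end must be before today")
    # Ordering
    if start > end:
        raise ValueError("period_start must be <= period_end")
    # Day-boundary UTC timestamps covering [start 00:00, end + 1 day 00:00)
    start_ts = datetime(start.year, start.month, start.day, tzinfo=timezone.utc)
    nxt = end + timedelta(days=1)
    end_ts = datetime(nxt.year, nxt.month, nxt.day, tzinfo=timezone.utc)
    return start_ts, end_ts
```

A half-open `[start, end + 1 day)` window is only one common convention; the PR's actual boundary semantics should be confirmed against the handler.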

crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (2)

53-55: Status aggregation uses current rental state, not state during the period.

completed_rentals and failed_rentals are determined by r.status at query execution time, not the rental's status during [p_period_start, p_period_end). A rental that was active during the period but completed afterward will be counted as "completed" for that historical window.

If this is intentional (revenue categorization should reflect final outcome), the current approach is fine. Otherwise, consider joining to a status history table or clarifying this behavior in the function comment.


21-28: No idempotency protection for repeated calls with the same period.

The function is append-only (as documented), but calling it twice with the same p_period_start/p_period_end will create duplicate summary rows. If this is a reconciliation job that might be retried, consider either:

  1. Adding a unique constraint on (node_id, validator_id, miner_hotkey, miner_uid, period_start, period_end, computation_version), or
  2. Adding an ON CONFLICT DO NOTHING clause, or
  3. Checking for existing rows before inserting

If duplicates are acceptable (e.g., computed_at differentiates snapshots), this is fine as-is.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (1)
crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (1)

68-68: Previously flagged: Unvalidated UUID and timestamp casts.

These casts were already flagged in the previous review:

  • Line 68: be.entity_id::uuid
  • Lines 73-74: (be.event_data->>'timestamp')::timestamptz

Both casts can cause runtime failures with malformed data and should be addressed per the previous review's recommendations.

Also applies to: 73-74

🧹 Nitpick comments (1)
crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (1)

61-61: Potential NULL in average hourly rate calculation.

If base_price_per_gpu or gpu_count are NULL, the multiplication yields NULL and AVG will exclude those rows. This may be intentional, but consider making it explicit.

If you want to handle NULLs explicitly:

-        AVG(r.base_price_per_gpu * r.gpu_count) AS avg_hourly_rate,
+        AVG(COALESCE(r.base_price_per_gpu, 0) * COALESCE(r.gpu_count, 0)) AS avg_hourly_rate,

Or filter out NULLs:

     WHERE be.event_type = 'telemetry_processed'
       ...
+      AND r.base_price_per_gpu IS NOT NULL
+      AND r.gpu_count IS NOT NULL
📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between e3b3841 and 223d1a7.

📒 Files selected for processing (1)
  • crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (1 hunks)
🔇 Additional comments (2)
crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (2)

13-21: LGTM!

The function signature and variable declaration are appropriate for the progressive revenue recognition use case. Input validation is handled at the API layer as indicated by the related context.


85-88: LGTM!

The return logic using GET DIAGNOSTICS and ROW_COUNT is the standard pattern for returning the number of inserted rows.

Prevents calculating revenue for periods that overlap with existing
calculations to ensure data integrity.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
crates/basilica-billing/src/domain/miner_revenue.rs (1)

82-107: Potential TOCTOU race between overlap check and insert.

The overlap check and the subsequent refresh_summary_for_period call are not atomic. If multiple processes run concurrently, overlapping summaries could be inserted between these two operations.

If this endpoint is invoked by a single admin or scheduled job, this is acceptable. Otherwise, consider adding a database-level constraint or advisory lock in the stored function to enforce non-overlapping periods atomically.
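One way to make the advisory-lock suggestion concrete is to derive a stable 64-bit key from the period bounds and take `pg_advisory_xact_lock` on it in the same transaction as the overlap check and insert. A hedged Python sketch of the key derivation (the function name and the overall scheme are hypothetical, not part of this PR):

```python
import hashlib

def advisory_lock_key(period_start: str, period_end: str) -> int:
    """Derive a deterministic signed 64-bit key (the range PostgreSQL
    advisory locks accept) from the period bounds, so concurrent refreshes
    of the same period serialize on SELECT pg_advisory_xact_lock(key)."""
    digest = hashlib.sha256(f"{period_start}|{period_end}".encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)
```

Note that keying on the exact period only serializes identical requests; a single constant key (or a per-tenant key) would serialize all refreshes and also close the race between overlapping-but-different periods, at the cost of throughput.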

📜 Review details


📥 Commits

Reviewing files that changed from the base of the PR and between 223d1a7 and 45148aa.

📒 Files selected for processing (3)
  • crates/basilica-billing/src/domain/miner_revenue.rs (1 hunks)
  • crates/basilica-billing/src/storage/miner_revenue.rs (3 hunks)
  • crates/basilica-billing/src/storage/mod.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
crates/basilica-billing/src/storage/miner_revenue.rs (1)
crates/basilica-billing/src/domain/miner_revenue.rs (1)
  • new (32-34)
🔇 Additional comments (4)
crates/basilica-billing/src/storage/mod.rs (1)

24-27: LGTM!

The export of OverlappingPeriod follows the existing pattern and properly exposes the new type for use by the domain layer.

crates/basilica-billing/src/storage/miner_revenue.rs (3)

42-49: LGTM!

The OverlappingPeriod struct is appropriately designed with correct types for the SQL result mapping (i64 for COUNT(*) result).


87-94: LGTM!

The trait method signature follows the established pattern and properly expresses the overlap detection contract.


418-452: LGTM!

The interval overlap detection logic is correct: period_start <= $2 AND period_end >= $1 correctly identifies all existing periods that overlap with the requested [period_start, period_end] range. Parameter binding order matches the SQL placeholders, and the error handling follows the established pattern.
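The predicate can be sanity-checked outside SQL; a minimal Python rendering of the same closed-interval test:

```python
def overlaps(a_start, a_end, b_start, b_end) -> bool:
    """Closed intervals [a_start, a_end] and [b_start, b_end] overlap iff
    a_start <= b_end and a_end >= b_start -- the same condition as the SQL
    `period_start <= $2 AND period_end >= $1`."""
    return a_start <= b_end and a_end >= b_start
```

With closed intervals, periods that merely touch at a boundary count as overlapping, which is the conservative choice for duplicate-revenue protection.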


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 11

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
scripts/autoscaler/rollout.sh (1)

41-41: Update the help text to match the new default namespace.

The help text still references the old default value "basilica-system", but the actual default was changed to "basilica-autoscaler" on line 7.

Apply this diff to fix the help text:

-            echo "  --namespace NAMESPACE     Kubernetes namespace (default: basilica-system)"
+            echo "  --namespace NAMESPACE     Kubernetes namespace (default: basilica-autoscaler)"
orchestrator/k8s/plugins/nvidia-device-plugin.yml (1)

1-91: Inconsistency: PR objectives don't match file changes.

The PR description mentions "progressive miner revenue recognition," "SQL function," "billing gRPC/proto," and "YYYY-MM-DD date strings," but this file contains Kubernetes configuration for the NVIDIA device plugin. This appears to be either:

  1. The wrong file included in the PR
  2. The wrong PR description
  3. Leftover test data

Please verify that this file should be part of this PR.

crates/basilica-autoscaler/src/crd/scaling_policy.rs (1)

170-208: Breaking change in SecureCloudTemplate is scoped but requires documentation.

datacenter_id is now required within secureCloud, but since secureCloud itself is optional, existing ScalingPolicy resources without this section remain unaffected. However, any configuration that specifies secureCloud must now include datacenterId or deserialization will fail.

Existing manifests in the repository have been properly updated (scaling-policy.yaml, scaling-policy.template.yaml, crds.yaml all include datacenterId). No breaking change documentation was found in CHANGELOG.md—add a migration note documenting this requirement for external users.

♻️ Duplicate comments (7)
crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (6)

22-44: Overlap detection doesn't fully prevent duplicates.

While the PR adds overlap detection at the service layer, the append-only semantics still allow exact duplicates when called multiple times with identical (period_start, period_end, computation_version). The overlap check prevents overlapping periods but not identical periods.

Consider adding a unique constraint and ON CONFLICT DO NOTHING:

-- Add unique constraint (in this or a follow-up migration):
ALTER TABLE billing.miner_revenue_summary 
ADD CONSTRAINT miner_revenue_summary_unique 
UNIQUE (node_id, validator_id, miner_hotkey, miner_uid, 
        period_start, period_end, computation_version);

-- Then modify INSERT:
INSERT INTO billing.miner_revenue_summary (...)
SELECT ...
ON CONFLICT (node_id, validator_id, miner_hotkey, miner_uid, 
             period_start, period_end, computation_version) 
DO NOTHING;

68-68: Unvalidated UUID cast can cause runtime failures.

The cast be.entity_id::uuid will fail the entire function if entity_id contains malformed UUID strings. Consider defensive validation:

JOIN billing.rentals r ON r.rental_id = CASE 
    WHEN be.entity_id ~ '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$' 
    THEN be.entity_id::uuid 
    ELSE NULL 
END

Or add upstream data integrity constraints to ensure entity_id is always a valid UUID.

Can you verify whether upstream code guarantees entity_id is always a valid UUID string, or if defensive handling is needed here?


49-49: Unvalidated INTEGER cast for miner_uid can fail.

Lines 49 and 83 cast r.metadata->>'miner_uid' to INTEGER without validation. Malformed or non-numeric values will cause the function to fail.

Add defensive handling:

-- In SELECT clause (line 49):
NULLIF(r.metadata->>'miner_uid', '')::INTEGER AS miner_uid,

-- Add WHERE filter to exclude invalid values:
AND r.metadata->>'miner_uid' ~ '^\d+$'

-- Update GROUP BY (line 83) to match:
NULLIF(r.metadata->>'miner_uid', '')::INTEGER

Also applies to: 83-83


57-59: Unvalidated DECIMAL casts for revenue metrics.

Lines 57 and 59 cast JSON fields to DECIMAL without validation. While incremental_cost has a NULL check (line 70), malformed numeric data will still cause failures. The gpu_hours field has no NULL check.

Add validation to the WHERE clause:

WHERE be.event_type = 'telemetry_processed'
  AND be.event_data->>'incremental_cost' IS NOT NULL
  AND be.event_data->>'incremental_cost' ~ '^\d+\.?\d*$'
  AND be.event_data->'usage_metrics'->>'gpu_hours' IS NOT NULL
  AND be.event_data->'usage_metrics'->>'gpu_hours' ~ '^\d+\.?\d*$'
  AND (be.event_data->>'credits_deducted')::boolean = true

71-71: Unvalidated boolean cast can fail.

Line 71 casts credits_deducted to boolean without checking if the field exists or is valid. Missing or malformed data will cause failures.

Use a safer pattern:

-- Replace:
AND (be.event_data->>'credits_deducted')::boolean = true

-- With:
AND be.event_data->>'credits_deducted' IS NOT NULL
AND lower(be.event_data->>'credits_deducted') IN ('true', 't', '1')

73-74: Unvalidated timestamp cast can cause failures.

Lines 73-74 cast be.event_data->>'timestamp' to timestamptz without validation. Malformed timestamp strings will fail the entire function.

Add defensive handling:

-- Add to WHERE clause before the comparisons:
AND be.event_data->>'timestamp' IS NOT NULL
-- The cast will still fail on invalid formats, so consider:
AND be.event_data->>'timestamp' ~ '^\d{4}-\d{2}-\d{2}'

Or rely on upstream data validation to ensure timestamps are always well-formed.

Can you verify whether upstream validation ensures event_data->>'timestamp' is always a valid timestamp string?

crates/basilica-api/src/api/routes/deployments/handlers.rs (1)

359-368: Same capacity validation concern applies to scale operations.

Similar to the deployment creation flow (lines 74-82), scale operations now proceed despite insufficient capacity. Users may scale up successfully per the API but observe that desired != ready replicas persist if the autoscaler cannot provision capacity.

This is less critical than deployment creation since scale operations are typically monitored more closely, but consider:

  • Returning a 202 Accepted instead of 200 OK when capacity is insufficient
  • Including a warning field in the response
  • Documenting this behavior for API consumers
🧹 Nitpick comments (32)
crates/basilica-api/src/api/routes/secure_cloud.rs (2)

162-173: Consider parallelizing the refresh calls for better latency.

The refresh loop iterates sequentially over all active rentals. For users with many active rentals, this could add noticeable latency to the list endpoint. Consider using futures::future::join_all or futures::stream::FuturesUnordered to refresh in parallel.

-    // 2. Refresh active rentals from provider to get latest IP/status
-    for (rental_id, status) in &rental_ids {
-        if status == "running" || status == "provisioning" || status == "pending" {
-            if let Err(e) = state.aggregator_service.get_deployment(rental_id).await {
-                tracing::debug!(
-                    "Failed to refresh rental {} from provider (may be expected): {}",
-                    rental_id,
-                    e
-                );
-            }
-        }
-    }
+    // 2. Refresh active rentals from provider to get latest IP/status (in parallel)
+    let refresh_futures: Vec<_> = rental_ids
+        .iter()
+        .filter(|(_, status)| {
+            status == "running" || status == "provisioning" || status == "pending"
+        })
+        .map(|(rental_id, _)| async {
+            if let Err(e) = state.aggregator_service.get_deployment(rental_id).await {
+                tracing::debug!(
+                    "Failed to refresh rental {} from provider (may be expected): {}",
+                    rental_id,
+                    e
+                );
+            }
+        })
+        .collect();
+    futures::future::join_all(refresh_futures).await;

164-164: Consider using constants for status values.

The status strings "running", "provisioning", and "pending" are used for comparison. Using constants or an enum would reduce the risk of typos and make it easier to refactor if status values change.

orchestrator/k8s/services/autoscaler/secrets.template.yaml (1)

1-5: Consider adding operational guidance to the template.

While the basic usage and security warnings are good, consider documenting:

  • How to obtain each secret value (e.g., where to find the K3s node token on the server)
  • Secret rotation procedures and recommended frequency
  • Which components/ServiceAccounts consume these secrets
  • Any RBAC configurations that must be in place

This would improve the operator experience and reduce the risk of misconfiguration.

scripts/cloud/modules/ecs-service/main.tf (1)

220-220: Minor formatting adjustment.

The closing brace placement is adjusted for more compact formatting. This is a cosmetic change with no functional impact.

orchestrator/k8s/services/autoscaler/servicemonitor.yaml (1)

15-16: Consider whether Prometheus needs discovery labels.

Depending on your Prometheus Operator configuration, the Prometheus instance may require specific labels on this ServiceMonitor to discover it (e.g., release: prometheus or prometheus: main). Verify that your Prometheus configuration will select this ServiceMonitor.

crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (3)

25-26: Consider matching case-insensitively in the regex.

The model name is already lowercased before matching, so the current pattern works. However, the doc comment mentions matching "7B" (uppercase), which could be misleading. Consider updating the doc comment to reflect that matching happens after lowercasing.


66-71: Verify memory rounding behavior for edge cases.

The formula (base as u32).div_ceil(8) * 8 truncates to whole GB and then rounds up to the nearest multiple of 8: for example, 16.8 → 16 → div_ceil(8) = 2 → 16GB. The test comment on line 146 writes the same rounding as (16+7)/8*8 = 16GB; for non-negative integers this classic idiom is equivalent to div_ceil, but keeping two formulations in the comments invites confusion. The tests pass and the implementation is correct; consider updating the test comments to use the div_ceil formulation.
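For reference, the rounding both comments describe behaves like this (a Python transliteration; `round_up_to_8gb` is a hypothetical name, not from the codebase):

```python
def round_up_to_8gb(base_gb: float) -> int:
    """Truncate to whole GB, then round up to the nearest multiple of 8,
    mirroring `(base as u32).div_ceil(8) * 8` in the Rust code."""
    n = int(base_gb)        # 16.8 -> 16 (truncation happens before rounding)
    return -(-n // 8) * 8   # ceil(n / 8) * 8; 16 -> 16, 17 -> 24
```

The truncate-then-round order explains why 16.8GB maps to 16GB rather than 24GB.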


73-110: Model family heuristics are reasonable but may need maintenance.

The hardcoded model families will require updates as new models are released. Consider documenting this as a known limitation or extracting these to a configuration if they need frequent updates.

crates/basilica-sdk-python/python/basilica/__init__.py (3)

510-515: Consider moving uuid import to module level.

The import uuid statement inside the function is executed on each call when name is None. Moving it to the module level would be more conventional and slightly more efficient.

 import os
 import re
 import time
+import uuid
 from pathlib import Path

Then remove the inline imports at lines 512 and 658.


543-556: Significant code duplication between deploy_vllm and deploy_sglang storage specs.

The storage spec construction is nearly identical between both methods. Consider extracting a helper function to reduce duplication.

def _build_model_cache_storage_spec() -> StorageSpec:
    """Build storage spec for model caching."""
    return StorageSpec(
        persistent=PersistentStorageSpec(
            enabled=True,
            backend=StorageBackend.R2,
            bucket="",
            credentials_secret="basilica-r2-credentials",
            sync_interval_ms=1000,
            cache_size_mb=4096,
            mount_path="/root/.cache",
        )
    )

Also applies to: 682-695


558-571: GPU spec and resource construction is also duplicated.

Both methods construct GpuRequirementsSpec and ResourceRequirements identically. Consider a shared helper.

Also applies to: 697-710

crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1)

230-247: Env var parsing is correct but could be more permissive.

The current implementation requires a value after =. Consider allowing empty values (e.g., DEBUG=) which are valid environment variables.

 fn parse_env_vars(env: &[String]) -> Result<HashMap<String, String>, DeployError> {
     let mut map = HashMap::new();
 
     for entry in env {
         let mut parts = entry.splitn(2, '=');
         let key = parts.next().ok_or_else(|| DeployError::Validation {
             message: format!("Invalid env var format: '{}'", entry),
         })?;
-        let value = parts.next().ok_or_else(|| DeployError::Validation {
-            message: format!("Invalid env var format: '{}'. Use KEY=VALUE", entry),
-        })?;
+        let value = parts.next().unwrap_or("");
 
         map.insert(key.to_string(), value.to_string());
     }
 
     Ok(map)
 }
examples/19_deploy_vllm_template.py (2)

23-23: Remove extraneous f prefix from strings without placeholders.

Lines 23 and 41 use f-strings but contain no interpolated values. Remove the f prefix for clarity.

-print(f"vLLM deployment ready!")
+print("vLLM deployment ready!")
-print(f"Llama-2-7B deployment ready!")
+print("Llama-2-7B deployment ready!")

Also applies to: 41-41


45-49: Consider adding error handling for cleanup.

If the first deployment.delete() fails, deployment2 will remain undeleted. For production examples, wrapping cleanup in try/finally is more robust—though acceptable for a quick demo.

examples/20_deploy_sglang_template.py (1)

23-23: Remove extraneous f prefix from strings without placeholders.

Same issue as the vLLM example—these f-strings have no interpolated values.

-print(f"SGLang deployment ready!")
+print("SGLang deployment ready!")
-print(f"Qwen 3B deployment ready!")
+print("Qwen 3B deployment ready!")

Also applies to: 41-41

crates/basilica-sdk-python/python/basilica/templates/model_size.py (1)

100-136: Model family fallbacks provide reasonable coverage.

The hardcoded model families handle common cases. Consider documenting that this list may need periodic updates as new models are released.

crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1)

20-21: Pin the vLLM Docker image to a specific version instead of :latest.

Using :latest can result in non-reproducible deployments and unexpected breaking changes when the upstream image is updated. Pin to a stable release version to ensure consistent behavior across deployments.

crates/basilica-cli/src/cli/commands.rs (1)

736-762: Consider enforcing default model values in code.

The documentation comments specify default models ("Qwen/Qwen3-0.6B" for vLLM, "Qwen/Qwen2.5-0.5B-Instruct" for SGLang), but the model field is typed as Option<String>. The defaults aren't enforced at the type level, which means the handlers must implement this logic separately.

This is acceptable but increases the chance of inconsistency if the defaults are updated in one place but not the other.

crates/basilica-payments/src/blockchain/monitor.rs (2)

131-159: Consider removing async from constructors.

Both new() and with_config() are marked async but contain no .await operations. They also always succeed, making the Result wrapper unnecessary.

If API consistency with other async constructors in the codebase is not a concern, consider simplifying:

-    pub async fn new(
+    pub fn new(
         repos: PgRepos,
         endpoints: Vec<String>,
         metrics: Option<Arc<PaymentsMetricsSystem>>,
-    ) -> Result<Self> {
-        Ok(Self {
+    ) -> Self {
+        Self {
             repos,
             endpoints,
             metrics,
             config: MonitorConfig::default(),
-        })
+        }
     }

-    pub async fn with_config(
+    pub fn with_config(
         repos: PgRepos,
         endpoints: Vec<String>,
         metrics: Option<Arc<PaymentsMetricsSystem>>,
         config: MonitorConfig,
-    ) -> Result<Self> {
-        Ok(Self {
+    ) -> Self {
+        Self {
             repos,
             endpoints,
             metrics,
             config,
-        })
+        }
     }

This would require updating call sites to remove .await and ?.


239-257: Initial skip on resume doesn't persist to database.

When skipping ahead due to a large gap at lines 246-247, the new next_block value isn't persisted to the database immediately. If the service crashes before the first successful block scan in the main loop, the same skip decision would repeat on restart.

While the main loop's gap check at lines 340-345 handles persistence, consider persisting here for clearer state management and to avoid redundant warning logs on restart.

                             if gap > config.max_block_gap {
                                 warn!(
                                     "Block gap {} exceeds max_block_gap {}, skipping to current block {}. \
                                      WARNING: Deposits between blocks {} and {} may be missed!",
                                     gap, config.max_block_gap, current_chain_block, block, current_chain_block
                                 );
+                                // Persist the skip immediately to avoid repeating on restart
+                                if let Err(e) = repos
+                                    .update_last_scanned_block("payments_monitor", current_chain_block)
+                                    .await
+                                {
+                                    error!("Failed to persist initial skip progress: {}", e);
+                                }
                                 current_chain_block
crates/bittensor/src/connect/monitor.rs (2)

110-114: Unused parameter creates misleading API.

The client parameter is immediately discarded. The comment mentions "type consistency" but this adds confusion. If the block already contains the client reference internally, consider removing the parameter:

     async fn get_transfers_from_block(
-        client: &OnlineClient<PolkadotConfig>,
         block: subxt::blocks::Block<PolkadotConfig, OnlineClient<PolkadotConfig>>,
     ) -> Result<Vec<TransferInfo>> {
-        let _ = client; // Used for type consistency, block already contains client ref
         let mut transfers = Vec::new();

Then update callers to pass only block. Alternatively, if the parameter serves as documentation of the dependency, use #[allow(unused)] with a clearer comment.


90-94: Read lock held across multiple await points.

The read lock on client is held during network I/O in at_latest() and get_transfers_from_block(). While tokio::sync::RwLock handles this correctly, long-running reads will delay reconnect() since it needs write locks.

Consider cloning the client before the async operations if reconnect latency becomes an issue, though the current approach is acceptable for typical workloads.
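A minimal sketch of the clone-then-release pattern, using std::sync::RwLock in place of tokio's (the guard shape is the same) and a hypothetical `Client` standing in for the subxt `OnlineClient`, which is `Clone`:

```rust
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the subxt client; the real OnlineClient is Clone.
#[derive(Clone)]
struct Client {
    endpoint: String,
}

// Clone the client under a short-lived read lock, then drop the guard
// before any long-running network work so a writer (reconnect) isn't blocked.
fn snapshot_client(shared: &Arc<RwLock<Client>>) -> Client {
    let guard = shared.read().expect("lock poisoned");
    guard.clone()
    // guard dropped here; slow operations proceed on the clone
}

fn main() {
    let shared = Arc::new(RwLock::new(Client {
        endpoint: "ws://localhost:9944".into(),
    }));
    let snapshot = snapshot_client(&shared);
    // The write lock is immediately available again.
    assert!(shared.try_write().is_ok());
    println!("cloned endpoint: {}", snapshot.endpoint);
}
```

The trade-off is that the clone may go stale if a reconnect swaps the client mid-operation, which is usually acceptable: the failed call simply retries against the refreshed client on the next iteration.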

crates/basilica-autoscaler/src/provisioner/ssh.rs (3)

336-340: Consider escaping remote_path to prevent shell interpretation.

If remote_path ever contains spaces or shell metacharacters, the command will break. While current usage appears safe with hardcoded paths, escaping would improve robustness.

+        // Escape remote_path for shell safety
+        let escaped_path = remote_path.replace('\'', "'\\''");
         let command = format!(
-            "echo '{}' | base64 -d | sudo tee {} > /dev/null && sudo chmod {} {}",
-            encoded, remote_path, mode, remote_path
+            "echo '{}' | base64 -d | sudo tee '{}' > /dev/null && sudo chmod {} '{}'",
+            encoded, escaped_path, mode, escaped_path
         );

452-455: Aggressive lock cleanup may corrupt dpkg state in rare cases.

Force-killing apt/dpkg and removing locks while operations are in progress could leave the package database inconsistent. The dpkg --configure -a helps recover, but consider logging a warning when this path is taken.

This is a pragmatic choice for automation reliability, but adding visibility when forced cleanup occurs would aid debugging:

 # Kill any stale apt/dpkg processes and clean up locks if still held
-sudo killall -9 apt apt-get dpkg 2>/dev/null || true
-sudo rm -f /var/lib/dpkg/lock-frontend /var/lib/apt/lists/lock 2>/dev/null || true
+if sudo fuser /var/lib/dpkg/lock-frontend >/dev/null 2>&1; then
+    echo "WARNING: Force-killing stale dpkg processes"
+    sudo killall -9 apt apt-get dpkg 2>/dev/null || true
+    sudo rm -f /var/lib/dpkg/lock-frontend /var/lib/apt/lists/lock 2>/dev/null || true
+fi
 sudo dpkg --configure -a 2>/dev/null || true

828-843: URL parsing is fragile for URLs with paths.

The simple trim_start_matches approach would break if api_server_url includes a path (e.g., https://host:6443/api). Consider extracting only host and port.

-        let api_host = api_server_url
-            .trim_start_matches("https://")
-            .trim_start_matches("http://");
+        let api_host = api_server_url
+            .trim_start_matches("https://")
+            .trim_start_matches("http://")
+            .split('/')
+            .next()
+            .unwrap_or(api_server_url);
crates/basilica-autoscaler/src/controllers/k8s_client.rs (2)

500-501: Consider making action a parameter instead of hardcoding.

The action field is hardcoded to "Scheduling", but this method may be used for various event types (e.g., eviction, scaling decisions). Consider adding action as a method parameter for flexibility.

     async fn create_pod_event(
         &self,
         pod_ns: &str,
         pod_name: &str,
         pod_uid: Option<&str>,
         event_type: &str,
         reason: &str,
         message: &str,
+        action: &str,
     ) -> Result<()>;

509-556: Implementation is correct; consider extracting shared logic.

The NodePool event creation mirrors the Pod event implementation. Both methods share significant code that could be extracted into a helper, but this is a minor refactor opportunity.

crates/basilica-autoscaler/src/api/secure_cloud.rs (2)

214-229: Consider returning Option<u32> from gpu_memory_gb() instead of defaulting to 0.

The accessor returns 0 when gpu_memory_gb_per_gpu is None, which could be confused with an actual 0 GB value. Returning Option<u32> would make the absence of data explicit.
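A small sketch of the two accessor styles, with a hypothetical `GpuOffering` struct standing in for the real type:

```rust
// Hypothetical offering shape; the field name mirrors the review comment.
struct GpuOffering {
    gpu_memory_gb_per_gpu: Option<u32>,
}

impl GpuOffering {
    // Defaulting accessor: 0 GB is indistinguishable from "unknown".
    fn gpu_memory_gb_defaulted(&self) -> u32 {
        self.gpu_memory_gb_per_gpu.unwrap_or(0)
    }

    // Explicit accessor: callers must decide how to treat missing data.
    fn gpu_memory_gb(&self) -> Option<u32> {
        self.gpu_memory_gb_per_gpu
    }
}

fn main() {
    let unknown = GpuOffering { gpu_memory_gb_per_gpu: None };
    let zero = GpuOffering { gpu_memory_gb_per_gpu: Some(0) };
    // With the defaulting accessor the two cases collapse:
    assert_eq!(unknown.gpu_memory_gb_defaulted(), zero.gpu_memory_gb_defaulted());
    // With Option they stay distinct:
    assert_ne!(unknown.gpu_memory_gb(), zero.gpu_memory_gb());
    println!("ok");
}
```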


589-634: Client-side filtering for get_rental may be inefficient at scale.

The implementation lists all rentals and filters client-side. If the number of active rentals grows large, this becomes inefficient. Consider requesting a direct GET endpoint from the API if this becomes a bottleneck.

crates/basilica-operator/src/controllers/user_deployment_controller.rs (1)

534-552: Consider logging serialization failures.

The GPU requirements annotation uses serde_json::to_string(gpu) but silently ignores errors with if let Ok(json). If serialization fails, the annotation will be missing without any indication of why.

Consider logging a warning when serialization fails to aid debugging:

-        if let Ok(json) = serde_json::to_string(gpu) {
-            annotations.insert("autoscaler.basilica.ai/gpu-requirements".to_string(), json);
-        }
+        match serde_json::to_string(gpu) {
+            Ok(json) => {
+                annotations.insert("autoscaler.basilica.ai/gpu-requirements".to_string(), json);
+            }
+            Err(e) => {
+                tracing::warn!(
+                    instance = %instance_name,
+                    error = %e,
+                    "Failed to serialize GPU requirements to JSON"
+                );
+            }
+        }
crates/basilica-autoscaler/src/provisioner/wireguard.rs (1)

163-164: Consider whether public_key truly needs #[allow(dead_code)].

The public_key field is marked as dead code yet is used in tests (line 217). If test-only usage is the sole reason the field exists, a comment saying so would be clearer than the bare allow; if it is (or will be) used in non-test code, the allow directive can be removed.

crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (1)

633-651: Consider logging peer count for observability.

The warning for empty peers is good, but logging the actual peer count when peers exist would help with debugging WireGuard connectivity issues.

 if peers.is_empty() {
     warn!(pool = %name, "No peers in registration response, WireGuard may not route traffic");
+} else {
+    debug!(pool = %name, peer_count = peers.len(), "Configuring WireGuard with peers");
 }
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45148aa and 2adcefe.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • crates/basilica-protocol/src/gen/basilica.billing.v1.rs is excluded by !**/gen/**
📒 Files selected for processing (70)
  • crates/basilica-aggregator/src/providers/mod.rs (1 hunks)
  • crates/basilica-api/src/api/routes/deployments/handlers.rs (2 hunks)
  • crates/basilica-api/src/api/routes/secure_cloud.rs (4 hunks)
  • crates/basilica-api/src/config/mod.rs (4 hunks)
  • crates/basilica-autoscaler/Cargo.toml (1 hunks)
  • crates/basilica-autoscaler/src/api/mod.rs (1 hunks)
  • crates/basilica-autoscaler/src/api/secure_cloud.rs (11 hunks)
  • crates/basilica-autoscaler/src/controllers/k8s_client.rs (3 hunks)
  • crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (15 hunks)
  • crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (14 hunks)
  • crates/basilica-autoscaler/src/crd/mod.rs (1 hunks)
  • crates/basilica-autoscaler/src/crd/node_pool.rs (6 hunks)
  • crates/basilica-autoscaler/src/crd/scaling_policy.rs (4 hunks)
  • crates/basilica-autoscaler/src/error.rs (4 hunks)
  • crates/basilica-autoscaler/src/lib.rs (2 hunks)
  • crates/basilica-autoscaler/src/metrics.rs (5 hunks)
  • crates/basilica-autoscaler/src/offering_matcher.rs (1 hunks)
  • crates/basilica-autoscaler/src/provisioner/ssh.rs (15 hunks)
  • crates/basilica-autoscaler/src/provisioner/wireguard.rs (4 hunks)
  • crates/basilica-autoscaler/src/runtime.rs (6 hunks)
  • crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (1 hunks)
  • crates/basilica-billing/src/domain/miner_revenue.rs (1 hunks)
  • crates/basilica-billing/src/grpc/billing_service.rs (1 hunks)
  • crates/basilica-billing/src/storage/miner_revenue.rs (3 hunks)
  • crates/basilica-billing/src/storage/mod.rs (1 hunks)
  • crates/basilica-cli/Cargo.toml (1 hunks)
  • crates/basilica-cli/src/cli/commands.rs (1 hunks)
  • crates/basilica-cli/src/cli/handlers/deploy/mod.rs (2 hunks)
  • crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs (1 hunks)
  • crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1 hunks)
  • crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1 hunks)
  • crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1 hunks)
  • crates/basilica-cli/src/cli/handlers/ssh_keys.rs (1 hunks)
  • crates/basilica-operator/src/controllers/user_deployment_controller.rs (2 hunks)
  • crates/basilica-payments/src/blockchain/mod.rs (1 hunks)
  • crates/basilica-payments/src/blockchain/monitor.rs (11 hunks)
  • crates/basilica-payments/src/config.rs (3 hunks)
  • crates/basilica-payments/src/main.rs (2 hunks)
  • crates/basilica-protocol/proto/billing.proto (1 hunks)
  • crates/basilica-sdk-python/python/basilica/__init__.py (2 hunks)
  • crates/basilica-sdk-python/python/basilica/templates/__init__.py (1 hunks)
  • crates/basilica-sdk-python/python/basilica/templates/model_size.py (1 hunks)
  • crates/bittensor/src/connect/monitor.rs (3 hunks)
  • examples/19_deploy_vllm_template.py (1 hunks)
  • examples/20_deploy_sglang_template.py (1 hunks)
  • orchestrator/ansible/playbooks/04-maintain/api-ssh-access.yml (1 hunks)
  • orchestrator/ansible/roles/api_ssh_access/defaults/main.yml (1 hunks)
  • orchestrator/ansible/roles/api_ssh_access/tasks/main.yml (1 hunks)
  • orchestrator/ansible/roles/api_ssh_access/templates/sudoers.j2 (1 hunks)
  • orchestrator/cloud/modules/vpc-peering/main.tf (2 hunks)
  • orchestrator/k8s/plugins/nvidia-device-plugin.yml (2 hunks)
  • orchestrator/k8s/services/autoscaler/.gitignore (1 hunks)
  • orchestrator/k8s/services/autoscaler/crds.yaml (7 hunks)
  • orchestrator/k8s/services/autoscaler/deployment.yaml (0 hunks)
  • orchestrator/k8s/services/autoscaler/namespaces.yaml (1 hunks)
  • orchestrator/k8s/services/autoscaler/scaling-policy.template.yaml (1 hunks)
  • orchestrator/k8s/services/autoscaler/scaling-policy.yaml (1 hunks)
  • orchestrator/k8s/services/autoscaler/secrets.template.yaml (1 hunks)
  • orchestrator/k8s/services/autoscaler/servicemonitor.yaml (1 hunks)
  • scripts/autoscaler/Dockerfile (1 hunks)
  • scripts/autoscaler/rollout.sh (1 hunks)
  • scripts/cloud/main.tf (1 hunks)
  • scripts/cloud/modules/ecs-service/main.tf (3 hunks)
  • scripts/cloud/modules/ecs-task/main.tf (3 hunks)
  • scripts/cloud/modules/ecs-task/variables.tf (2 hunks)
  • scripts/cloud/modules/networking/main.tf (2 hunks)
  • scripts/cloud/modules/networking/variables.tf (1 hunks)
  • scripts/cloud/modules/rds/main.tf (2 hunks)
  • scripts/cloud/terraform.tfvars.example (1 hunks)
  • scripts/cloud/variables.tf (1 hunks)
💤 Files with no reviewable changes (1)
  • orchestrator/k8s/services/autoscaler/deployment.yaml
✅ Files skipped from review due to trivial changes (4)
  • scripts/cloud/modules/rds/main.tf
  • orchestrator/k8s/services/autoscaler/.gitignore
  • scripts/cloud/modules/ecs-task/main.tf
  • scripts/cloud/modules/ecs-task/variables.tf
🚧 Files skipped from review as they are similar to previous changes (2)
  • crates/basilica-protocol/proto/billing.proto
  • crates/basilica-billing/src/grpc/billing_service.rs
🧰 Additional context used
🧬 Code graph analysis (16)
crates/basilica-payments/src/main.rs (1)
crates/basilica-payments/src/blockchain/monitor.rs (2)
  • from (113-120)
  • with_config (147-159)
crates/basilica-sdk-python/python/basilica/templates/model_size.py (1)
crates/basilica-autoscaler/src/api/secure_cloud.rs (1)
  • gpu_memory_gb (216-218)
crates/basilica-sdk-python/python/basilica/templates/__init__.py (1)
crates/basilica-sdk-python/python/basilica/templates/model_size.py (2)
  • GpuRequirements (22-27)
  • estimate_gpu_requirements (30-60)
crates/basilica-api/src/api/routes/secure_cloud.rs (2)
crates/basilica-api/src/api/extractors/ownership.rs (5)
  • sqlx (99-99)
  • sqlx (225-225)
  • sqlx (376-376)
  • sqlx (390-390)
  • sqlx (412-412)
crates/basilica-api/src/server.rs (2)
  • sqlx (912-912)
  • sqlx (1002-1002)
crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (4)
crates/basilica-cli/src/progress/mod.rs (2)
  • complete_spinner_and_clear (137-139)
  • create_spinner (98-109)
crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1)
  • estimate_gpu_requirements (32-50)
crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1)
  • model (162-177)
crates/basilica-cli/src/output/mod.rs (1)
  • json_output (12-17)
crates/basilica-autoscaler/src/crd/scaling_policy.rs (1)
crates/basilica-autoscaler/src/crd/node_pool.rs (1)
  • default_cloud_ssh_user (131-133)
crates/basilica-autoscaler/src/provisioner/wireguard.rs (1)
crates/basilica-autoscaler/src/api/secure_cloud.rs (1)
  • allowed_ips (334-341)
crates/basilica-sdk-python/python/basilica/__init__.py (2)
crates/basilica-sdk-python/python/basilica/deployment.py (5)
  • name (178-180)
  • Deployment (114-453)
  • _from_response (436-453)
  • wait_until_ready (293-372)
  • refresh (403-425)
crates/basilica-sdk-python/python/basilica/templates/model_size.py (1)
  • estimate_gpu_requirements (30-60)
crates/basilica-cli/src/cli/handlers/deploy/templates/mod.rs (3)
crates/basilica-cli/src/cli/handlers/deploy/templates/model_size.rs (1)
  • estimate_gpu_requirements (32-50)
crates/basilica-cli/src/cli/handlers/deploy/templates/sglang.rs (1)
  • handle_sglang_deploy (36-157)
crates/basilica-cli/src/cli/handlers/deploy/templates/vllm.rs (1)
  • handle_vllm_deploy (36-157)
crates/basilica-billing/src/storage/miner_revenue.rs (4)
crates/basilica-billing/src/domain/miner_revenue.rs (1)
  • new (32-34)
crates/basilica-billing/src/grpc/billing_service.rs (1)
  • new (51-98)
crates/basilica-billing/src/domain/rentals.rs (1)
  • new (352-356)
crates/basilica-billing/src/domain/billing_handlers.rs (1)
  • new (29-39)
examples/20_deploy_sglang_template.py (2)
crates/basilica-sdk-python/python/basilica/__init__.py (1)
  • deploy_sglang (600-737)
crates/basilica-sdk-python/python/basilica/deployment.py (1)
  • url (183-191)
crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (2)
crates/basilica-autoscaler/src/offering_matcher.rs (3)
  • new (62-75)
  • new (241-251)
  • normalize_gpu_model (45-51)
crates/basilica-autoscaler/src/provisioner/k3s.rs (1)
  • sanitize_label_value (25-30)
crates/basilica-autoscaler/src/runtime.rs (2)
crates/basilica-autoscaler/src/controllers/node_pool_controller.rs (2)
  • new (59-74)
  • with_offering_selector (84-100)
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (1)
  • new (58-70)
crates/basilica-autoscaler/src/controllers/scaling_policy_controller.rs (1)
crates/basilica-autoscaler/src/offering_matcher.rs (5)
  • calculate_nodes_needed (643-663)
  • group_pending_pods_by_requirements (625-640)
  • new (62-75)
  • new (241-251)
  • from_template (117-131)
crates/bittensor/src/connect/monitor.rs (2)
crates/bittensor/src/connect/pool.rs (2)
  • new (34-42)
  • new (229-236)
crates/bittensor/src/connect/state.rs (2)
  • new (89-97)
  • new (373-381)
crates/basilica-payments/src/blockchain/monitor.rs (1)
crates/bittensor/src/connect/monitor.rs (1)
  • is_block_likely_pruned (222-231)
🪛 Checkov (3.2.334)
orchestrator/k8s/plugins/nvidia-device-plugin.yml

[medium] 12-91: Minimize the admission of root containers

(CKV_K8S_23)

🪛 Ruff (0.14.8)
examples/19_deploy_vllm_template.py

23-23: f-string without any placeholders

Remove extraneous f prefix

(F541)


41-41: f-string without any placeholders

Remove extraneous f prefix

(F541)

crates/basilica-sdk-python/python/basilica/__init__.py

520-520: Possible binding to all interfaces

(S104)


551-551: Possible hardcoded password assigned to argument: "credentials_secret"

(S106)


667-667: Possible binding to all interfaces

(S104)


690-690: Possible hardcoded password assigned to argument: "credentials_secret"

(S106)

examples/20_deploy_sglang_template.py

23-23: f-string without any placeholders

Remove extraneous f prefix

(F541)


41-41: f-string without any placeholders

Remove extraneous f prefix

(F541)

@itzlambda
Collaborator Author

@coderabbitai resolve

@coderabbitai
Contributor

coderabbitai bot commented Dec 17, 2025

✅ Actions performed

Comments resolved.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (7)
crates/basilica-billing/src/grpc/billing_service.rs (1)

1078-1088: Boundary semantics mismatch already flagged.

This concern about inclusive vs. exclusive upper bound semantics between Rust (23:59:59.999999) and SQL (< p_period_end) was raised in a previous review. Ensure alignment across proto comments, Rust, and SQL migration before merging.
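To make the boundary mismatch concrete, here is a sketch in plain integer nanoseconds (values illustrative only): an inclusive microsecond-precision cutoff can drop events that land in the last fraction of a microsecond of the day, while an exclusive next-midnight bound, matching the SQL's `< p_period_end`, keeps them.

```rust
// Timestamps as nanoseconds since midnight, for illustration only.
const MICRO: u64 = 1_000;
const DAY_NS: u64 = 24 * 3_600 * 1_000_000_000;

// Inclusive bound at 23:59:59.999999 (microsecond precision).
fn in_period_inclusive_micros(t_ns: u64) -> bool {
    t_ns <= DAY_NS - MICRO
}

// Exclusive bound at next-day midnight, as the SQL `< p_period_end` does.
fn in_period_exclusive(t_ns: u64) -> bool {
    t_ns < DAY_NS
}

fn main() {
    // An event half a microsecond before midnight:
    let late_event = DAY_NS - MICRO / 2;
    assert!(!in_period_inclusive_micros(late_event)); // silently dropped
    assert!(in_period_exclusive(late_event)); // correctly included
    println!("exclusive bound covers the full day");
}
```

Aligning on the exclusive form on both sides avoids the gap entirely and composes cleanly into adjacent periods with no overlap.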

crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (6)

22-44: Missing duplicate prevention for append-only semantics.

The function claims "append-only, never update existing" behavior (line 22), but there's no mechanism to prevent duplicate inserts. If called multiple times with the same parameters, it will create duplicate rows for the same (node_id, validator_id, miner_hotkey, miner_uid, period_start, period_end) combination.

Consider adding a unique constraint and ON CONFLICT handling:

-- In the migration, add:
ALTER TABLE billing.miner_revenue_summary 
ADD CONSTRAINT miner_revenue_summary_unique 
UNIQUE (node_id, validator_id, miner_hotkey, miner_uid, period_start, period_end, computation_version);

-- Then modify the INSERT:
INSERT INTO billing.miner_revenue_summary (...)
SELECT ...
ON CONFLICT (node_id, validator_id, miner_hotkey, miner_uid, period_start, period_end, computation_version) 
DO NOTHING;

49-49: Unvalidated cast to INTEGER for miner_uid.

Line 49 casts r.metadata->>'miner_uid' to INTEGER without validation. If the metadata contains a malformed or non-numeric value, the entire function will fail.

Add defensive handling:

-        (r.metadata->>'miner_uid')::INTEGER AS miner_uid,
+        NULLIF(r.metadata->>'miner_uid', '')::INTEGER AS miner_uid,

And add a WHERE filter to exclude invalid values:

     WHERE be.event_type = 'telemetry_processed'
       AND be.event_data->>'incremental_cost' IS NOT NULL
+      AND r.metadata->>'miner_uid' ~ '^\d+$'

57-59: Unvalidated DECIMAL casts for revenue metrics.

Lines 57 and 59 cast JSON fields to DECIMAL without validation:

  • incremental_cost has a NULL check on line 70, but malformed data will still cause failures
  • gpu_hours has no NULL check and could be missing or malformed

Add validation to the WHERE clause:

     WHERE be.event_type = 'telemetry_processed'
       AND be.event_data->>'incremental_cost' IS NOT NULL
+      AND be.event_data->>'incremental_cost' ~ '^\d+\.?\d*$'
+      AND be.event_data->'usage_metrics'->>'gpu_hours' IS NOT NULL
+      AND be.event_data->'usage_metrics'->>'gpu_hours' ~ '^\d+\.?\d*$'

68-68: Unvalidated UUID cast.

Line 68 assumes all entity_id values are valid UUID strings with be.entity_id::uuid. A malformed value will fail the entire function.

Consider defensive handling:

-    JOIN billing.rentals r ON r.rental_id = be.entity_id::uuid
+    JOIN billing.rentals r ON r.rental_id = CASE 
+        WHEN be.entity_id ~* '^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$' 
+        THEN be.entity_id::uuid 
+        ELSE NULL 
+    END

Or ensure upstream data integrity guarantees.


71-71: Unvalidated boolean cast for credits_deducted.

Line 71 casts credits_deducted to boolean without first checking if the field exists or is valid. Malformed or missing data will cause the function to fail.

Add a NULL check and use a safer pattern:

     WHERE be.event_type = 'telemetry_processed'
       AND be.event_data->>'incremental_cost' IS NOT NULL
-      AND (be.event_data->>'credits_deducted')::boolean = true
+      AND be.event_data->>'credits_deducted' IS NOT NULL
+      AND LOWER(be.event_data->>'credits_deducted') IN ('true', 't', '1')

73-74: Unvalidated timestamp casts.

Lines 73-74 cast be.event_data->>'timestamp' to timestamptz without validation. A malformed timestamp value will fail the entire function.

Add validation to the WHERE clause before the cast:

     WHERE be.event_type = 'telemetry_processed'
       AND be.event_data->>'incremental_cost' IS NOT NULL
+      AND be.event_data->>'timestamp' IS NOT NULL
       AND (be.event_data->>'credits_deducted')::boolean = true

Consider wrapping the cast in a CASE statement, or catching the failure in a BEGIN … EXCEPTION block, to handle invalid timestamps gracefully.

🧹 Nitpick comments (1)
crates/basilica-billing/src/storage/miner_revenue.rs (1)

48-48: Consider using or documenting row_count.

The row_count field is populated from the SQL query but is not currently used in the error message formatting in domain/miner_revenue.rs (lines 91-96). Consider either including it in the overlap details for better diagnostics (e.g., "...contains N summaries") or adding a doc comment explaining its intended future use.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2adcefe and eaaf205.

⛔ Files ignored due to path filters (1)
  • crates/basilica-protocol/src/gen/basilica.billing.v1.rs is excluded by !**/gen/**
📒 Files selected for processing (7)
  • crates/basilica-billing/migrations/033_progressive_miner_revenue.sql (1 hunks)
  • crates/basilica-billing/src/domain/miner_revenue.rs (1 hunks)
  • crates/basilica-billing/src/grpc/billing_service.rs (1 hunks)
  • crates/basilica-billing/src/storage/miner_revenue.rs (3 hunks)
  • crates/basilica-billing/src/storage/mod.rs (1 hunks)
  • crates/basilica-protocol/proto/billing.proto (1 hunks)
  • scripts/billing/refresh_miner_revenue.sh (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • crates/basilica-protocol/proto/billing.proto
🧰 Additional context used
🧬 Code graph analysis (1)
crates/basilica-billing/src/storage/miner_revenue.rs (2)
crates/basilica-billing/src/domain/miner_revenue.rs (1)
  • new (32-34)
crates/basilica-billing/src/grpc/billing_service.rs (1)
  • new (51-98)
🔇 Additional comments (7)
crates/basilica-billing/src/storage/mod.rs (1)

24-27: LGTM!

The OverlappingPeriod re-export correctly exposes the new type for use by the domain layer's overlap validation logic.

crates/basilica-billing/src/domain/miner_revenue.rs (1)

82-107: LGTM!

The overlap validation logic correctly prevents duplicate revenue calculations by querying for existing periods before proceeding. The human-readable error message with formatted overlap details provides useful diagnostic information.

scripts/billing/refresh_miner_revenue.sh (1)

5-6: LGTM!

Documentation and usage examples correctly updated to reflect the new date-based input format (YYYY-MM-DD). The script appropriately delegates validation to the gRPC service.

Also applies to: 31-31, 34-35, 40-40, 50-51

crates/basilica-billing/src/grpc/billing_service.rs (1)

1055-1076: LGTM!

The date validation is comprehensive:

  • Presence checks for required fields
  • Strict YYYY-MM-DD format parsing
  • Future date prevention (period_end >= today)
  • Logical ordering validation (period_start <= period_end)

Clear error messages aid debugging.

Also applies to: 1090-1095
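The validation steps above can be sketched without chrono; this is an illustrative stand-alone check (not the service's actual code) that relies on well-formed YYYY-MM-DD strings sorting lexicographically in chronological order, with `today` passed in as a parameter to keep the function deterministic:

```rust
// Minimal YYYY-MM-DD shape check (no month/day range or leap-year logic).
fn is_ymd(s: &str) -> bool {
    let b = s.as_bytes();
    b.len() == 10
        && b[4] == b'-'
        && b[7] == b'-'
        && b.iter()
            .enumerate()
            .all(|(i, &c)| if i == 4 || i == 7 { true } else { c.is_ascii_digit() })
}

// Mirrors the checks the review describes: presence/format, ordering,
// and period_end strictly before today.
fn validate_period(start: &str, end: &str, today: &str) -> Result<(), String> {
    if !is_ymd(start) || !is_ymd(end) {
        return Err("dates must be YYYY-MM-DD".into());
    }
    // Well-formed ISO dates compare lexicographically in date order.
    if start > end {
        return Err("period_start must be <= period_end".into());
    }
    if end >= today {
        return Err("period_end must be before today".into());
    }
    Ok(())
}

fn main() {
    assert!(validate_period("2025-12-01", "2025-12-10", "2025-12-17").is_ok());
    assert!(validate_period("2025-12-10", "2025-12-01", "2025-12-17").is_err());
    assert!(validate_period("2025-12-01", "2025-12-17", "2025-12-17").is_err());
    assert!(validate_period("not-a-date", "2025-12-10", "2025-12-17").is_err());
    println!("validation checks pass");
}
```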

crates/basilica-billing/src/storage/miner_revenue.rs (3)

42-49: LGTM!

The OverlappingPeriod struct captures all necessary information for overlap detection and error reporting. Including row_count is useful for understanding the magnitude of potential data impact.


87-93: LGTM!

The trait method signature is well-designed, accepting the period boundaries and returning a vector of overlapping periods for flexible handling by the caller.


419-452: LGTM!

The SQL overlap detection correctly uses the standard interval overlap formula (existing.start <= requested.end AND existing.end >= requested.start). The parameterized query is safe, and grouping by period boundaries with computed_at provides clear overlap identification.
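The interval overlap formula can be exercised in isolation; a sketch using ISO date strings (which compare lexicographically in chronological order), with inclusive bounds as in the query:

```rust
// Inclusive interval overlap: existing.start <= requested.end
// AND existing.end >= requested.start (the formula the query uses).
fn overlaps(existing: (&str, &str), requested: (&str, &str)) -> bool {
    existing.0 <= requested.1 && existing.1 >= requested.0
}

fn main() {
    let requested = ("2025-12-05", "2025-12-10");
    assert!(overlaps(("2025-12-01", "2025-12-05"), requested)); // touches start
    assert!(overlaps(("2025-12-08", "2025-12-20"), requested)); // partial
    assert!(overlaps(("2025-12-01", "2025-12-31"), requested)); // contains
    assert!(!overlaps(("2025-11-01", "2025-11-30"), requested)); // disjoint
    println!("overlap predicate behaves as expected");
}
```

Note that with inclusive bounds, back-to-back periods sharing a boundary date count as overlapping, which is the conservative choice for duplicate prevention.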

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (1)
crates/basilica-billing/migrations/035_fix_total_cost_precision.sql (1)

15-16: Comment reinforces the precision inconsistency.

This comment states "8 decimal places to match CreditBalance precision," which conflicts with the earlier statement (lines 3-4) that "CreditBalance uses 6 decimal places internally." Ensure the comment accurately reflects the actual CreditBalance precision once verified.

🧹 Nitpick comments (1)
crates/basilica-billing/migrations/035_fix_total_cost_precision.sql (1)

12-13: Consider operational impact and migration safety.

The ALTER COLUMN operation may cause table locking on the billing.rentals table, which could impact production availability if the table is large. Consider:

  1. Performance: Test migration time on a production-sized dataset
  2. Downtime: Evaluate if this requires a maintenance window
  3. Rollback: Add a corresponding down-migration for rollback capability
  4. Data validation: Verify existing data remains valid after the precision change

Run the following script to estimate the table size and potential impact:

#!/bin/bash
# Description: Check for migration patterns and rental table usage

# Check for rollback/down migrations pattern
echo "=== Migration rollback patterns ==="
rg -n "down|rollback|revert" crates/basilica-billing/migrations/ -g "*.sql" | head -20

# Look for rental table size estimation queries or monitoring
echo -e "\n=== Rental table references ==="
rg -n "billing\.rentals" --type rust -C 3
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between eaaf205 and b9b187c.

📒 Files selected for processing (2)
  • crates/basilica-billing/migrations/034_fix_revenue_avg_calculations.sql (1 hunks)
  • crates/basilica-billing/migrations/035_fix_total_cost_precision.sql (1 hunks)
🔇 Additional comments (3)
crates/basilica-billing/migrations/034_fix_revenue_avg_calculations.sql (3)

1-10: Clear documentation of the bug fix rationale.

The header comments effectively explain the per-event weighting bug and the CTE-based solution. This provides valuable context for future maintainers.


54-82: Join and aggregation logic correctly implements per-rental averaging.

The INNER JOIN with billing_agg ensures only rentals with credited events in the period are counted. The FILTER (WHERE r.end_time IS NOT NULL) on avg_rental_duration_hours correctly excludes ongoing rentals from duration calculations. The GROUP BY clause properly includes all non-aggregated expressions.


89-102: Comprehensive function documentation.

The COMMENT ON FUNCTION clearly documents the append-only semantics, grouping keys, progressive recognition behavior, and the specific fix in this migration. This helps future developers understand the function's purpose and constraints.

The function correctly uses a CTE to ensure per-rental averaging rather than per-event weighting. While the SQL function itself doesn't enforce uniqueness (relying on append-only semantics), overlap detection is properly implemented in the application layer—the domain layer checks find_overlapping_periods() before invoking the function and rejects calculations with existing overlapping periods.

@itzlambda itzlambda merged commit 83f16b1 into main Dec 19, 2025
14 checks passed
@itzlambda itzlambda deleted the progressive-revenue branch December 19, 2025 06:31