
[FEATURE] Dynamic Port Addition for Active Rentals #218

@epappas

Description


System Architecture: Dynamic Port Addition for Active Rentals

Version: 1.0
Date: 2025-10-29
Status: Design Phase
Criticality: HIGH - Affects Miner Payouts and Billing Accuracy

Executive Summary

This document outlines the system architecture for implementing dynamic port addition to active GPU rentals in the Basilica network. The feature lets users add port mappings to running containers after the initial rental is created, accommodating evolving application requirements without terminating the rental.

Key Constraint: Docker containers do not support runtime port modification. This architecture addresses this fundamental limitation through container recreation with comprehensive state preservation.

Financial Impact: This feature directly affects billing accuracy and miner payouts. All implementations must maintain strict audit trails and idempotency guarantees.

Table of Contents

  1. Current System Analysis
  2. Technical Constraints & Challenges
  3. Proposed Architecture
  4. Component Design
  5. Data Model Changes
  6. API Specification
  7. Implementation Phases
  8. Security Considerations
  9. Testing Strategy
  10. Rollout Plan
  11. Appendices

1. Current System Analysis

1.1 Existing Architecture

Port Management Flow:

User Request → API Validation → Validator → SSH to Node → Docker Run with Ports → Container Inspect → Store Mappings

Key Components:

  • API Layer (basilica-api): Handles rental creation via POST /rentals
  • Validator Layer (basilica-validator): Orchestrates container deployment
  • Container Client: Executes Docker commands over SSH
  • Database: SQLite storing rental metadata (port mappings in JSON)
  • Billing System: Tracks resource usage via telemetry

Current Limitations:

  • Port mappings are immutable after container creation
  • No API endpoints for rental modification
  • No container update infrastructure
  • Port mappings stored only in container spec (not separately indexed)

1.2 Port Mapping Data Flow

StartRentalApiRequest.ports → PortMappingRequest[]
    ↓
ContainerSpec.ports → PortMapping[]
    ↓
docker run -p {host_port}:{container_port}/{protocol}
    ↓
docker inspect → Extract actual host ports
    ↓
Store in DB: rentals.container_spec (JSON) + user_rentals.port_mappings (JSON)

1.3 Billing Integration Points

Current Billing Events:

  • RentalStart: Initial rental creation
  • RentalEnd: Rental termination
  • Telemetry: Ongoing resource usage (60s intervals)

Missing:

  • ResourceUpdate: Configuration changes (needed for port additions)
  • Port-specific audit events
  • Cost recalculation for configuration changes

2. Technical Constraints & Challenges

2.1 Docker Limitations

Core Constraint: Docker does not support adding ports to running containers.

Technical Reasons:

  • Port bindings are established in the container's network namespace at creation time
  • That network configuration is immutable once the container exists
  • Adding ports therefore requires recreating the container with a new configuration

Implications:

  • Brief service interruption required (estimated 5-10 seconds)
  • State preservation critical for stateful applications
  • Data persistence must be guaranteed

2.2 State Preservation Requirements

Must Preserve:

  • Container filesystem state (application data)
  • Running processes (if possible via checkpoint/restore)
  • Environment variables
  • Volume mounts
  • Network connections (will be momentarily interrupted)

Cannot Preserve:

  • Active TCP connections (will be reset)
  • In-memory application state (unless checkpointed)
  • Process IDs

2.3 Distributed System Challenges

Consistency Requirements:

  1. Database state must match actual container configuration
  2. Billing events must accurately reflect configuration changes
  3. Monitoring systems must track state transitions
  4. Multiple concurrent updates must be serialized

Failure Scenarios:

  • Database update succeeds, container recreation fails
  • Container recreation succeeds, billing notification fails
  • Network partition during operation
  • Node unavailability during update

2.4 Financial Accuracy Requirements

Billing Integrity:

  • All port additions must be audited
  • Idempotency required to prevent duplicate charges
  • Configuration changes must be timestamped precisely
  • Rollback must not create billing inconsistencies

Miner Payout Impact:

  • Port additions do not affect per-hour resource pricing
  • The audit trail nevertheless ensures accurate rental lifecycle tracking
  • Telemetry must continue uninterrupted

3. Proposed Architecture

3.1 Solution Overview

Approach: Container Recreation with State Preservation

Strategy:

  1. Validate port addition request
  2. Create checkpoint of container state
  3. Stop existing container gracefully
  4. Create new container with updated port configuration
  5. Restore state from checkpoint
  6. Update all tracking systems (database, billing, monitoring)
  7. Provide rollback on any failure

3.2 High-Level Flow

┌─────────────┐
│   User      │
│  Request    │
└──────┬──────┘
       │ POST /rentals/:id/ports
       ↓
┌─────────────────────┐
│  API Gateway        │
│  - Auth validation  │
│  - Ownership check  │
│  - Port validation  │
└──────┬──────────────┘
       │
       ↓
┌─────────────────────┐
│  Port Update        │
│  Coordinator        │
│  - Idempotency      │
│  - State machine    │
│  - Rollback logic   │
└──────┬──────────────┘
       │
       ├─→ Database: Begin transaction
       │
       ├─→ Container Manager: Prepare update
       │   ├─→ SSH to node
       │   ├─→ Checkpoint state (if supported)
       │   ├─→ Stop container
       │   ├─→ Create new container with new ports
       │   └─→ Verify health
       │
       ├─→ Billing: Emit ResourceUpdate event
       │
       └─→ Database: Commit transaction

3.3 Component Responsibilities

API Layer:

  • Endpoint: POST /rentals/:id/ports
  • Request validation (port ranges, conflicts)
  • Authentication & authorization
  • Rate limiting

Port Update Coordinator (NEW):

  • Orchestrates multi-step update process
  • Maintains operation state machine
  • Implements idempotency
  • Handles rollback on failures

Container Manager:

  • Executes container recreation
  • Manages state preservation
  • Validates port availability on host
  • Reports success/failure

Billing Integration:

  • Emits ResourceUpdate events
  • Records audit trail
  • Ensures idempotency

Database Layer:

  • Stores port addition operations
  • Tracks operation state
  • Maintains audit log

3.4 State Machine

┌─────────┐
│ Pending │
└────┬────┘
     │
     ↓
┌─────────────────┐
│ Validating      │ (Check port conflicts, availability)
└────┬────────────┘
     │
     ↓
┌─────────────────┐
│ Checkpointing   │ (Optional: Save container state)
└────┬────────────┘
     │
     ↓
┌─────────────────┐
│ Stopping        │ (Graceful stop with timeout)
└────┬────────────┘
     │
     ↓
┌─────────────────┐
│ Recreating      │ (Create container with new ports)
└────┬────────────┘
     │
     ↓
┌─────────────────┐
│ Verifying       │ (Health check new container)
└────┬────────────┘
     │
     ↓
┌─────────────────┐
│ Completed       │
└─────────────────┘

     │ (Any failure)
     ↓
┌─────────────────┐
│ RollingBack     │
└────┬────────────┘
     │
     ↓
┌─────────────────┐
│ Failed          │
└─────────────────┘
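The diagram above can be written down as a small transition table. A minimal sketch follows; the enum and method names are illustrative stand-ins, not the shipped Basilica types:

```rust
/// Operation states from the diagram above (hypothetical sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum UpdateState {
    Pending,
    Validating,
    Checkpointing,
    Stopping,
    Recreating,
    Verifying,
    Completed,
    RollingBack,
    Failed,
}

impl UpdateState {
    /// Next state when the current step succeeds; terminal states are absorbing.
    fn on_success(self) -> UpdateState {
        use UpdateState::*;
        match self {
            Pending => Validating,
            Validating => Checkpointing,
            Checkpointing => Stopping,
            Stopping => Recreating,
            Recreating => Verifying,
            Verifying => Completed,
            RollingBack => Failed, // a "successful" rollback still ends in Failed
            s => s,
        }
    }

    /// Any failure before completion diverts to RollingBack, then Failed.
    fn on_failure(self) -> UpdateState {
        use UpdateState::*;
        match self {
            Completed | Failed => self,
            RollingBack => Failed,
            _ => RollingBack,
        }
    }
}

fn main() {
    // Happy path walks Pending -> ... -> Completed
    let mut state = UpdateState::Pending;
    while state != UpdateState::Completed {
        state = state.on_success();
    }
    assert_eq!(state, UpdateState::Completed);
}
```

Persisting the current variant in `port_operations.operation_state` after each transition is what allows an interrupted operation to be resumed or rolled back.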

4. Component Design

4.1 API Handler

File: crates/basilica-api/src/api/routes/rentals.rs

/// Add ports to an existing rental
/// POST /rentals/:id/ports
pub async fn add_rental_ports(
    State(state): State<AppState>,
    owned_rental: OwnedRental,
    Json(request): Json<AddPortsRequest>,
) -> Result<Json<PortUpdateResponse>>

Responsibilities:

  • Validate request format
  • Check ownership via OwnedRental extractor
  • Validate port specifications (range, protocol)
  • Check for duplicate ports
  • Delegate to validator client
  • Return operation status

Validation Rules:

  • Port range: 1-65535
  • No duplicate ports in request
  • No conflicts with existing ports
  • Protocol must be "tcp" or "udp"
  • Maximum 10 ports per request (configurable)
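The rules above might be implemented roughly as follows; `PortMappingRequest` here is a local stand-in for the SDK type, and the limit constant mirrors the configurable default:

```rust
use std::collections::HashSet;

// Local stand-in for the SDK's PortMappingRequest
#[derive(Debug, Clone)]
struct PortMappingRequest {
    container_port: u16, // 0 is invalid; u16 caps the range at 65535
    host_port: u16,      // 0 = let Docker pick an ephemeral host port
    protocol: String,    // "tcp" or "udp"
}

const MAX_PORTS_PER_REQUEST: usize = 10; // configurable default

fn validate_ports(
    requested: &[PortMappingRequest],
    existing_container_ports: &HashSet<u16>,
) -> Result<(), String> {
    if requested.is_empty() || requested.len() > MAX_PORTS_PER_REQUEST {
        return Err(format!("between 1 and {MAX_PORTS_PER_REQUEST} ports per request"));
    }
    let mut seen = HashSet::new();
    for p in requested {
        if p.container_port == 0 {
            return Err("container_port must be in 1-65535".into());
        }
        if p.protocol != "tcp" && p.protocol != "udp" {
            return Err(format!("unsupported protocol: {}", p.protocol));
        }
        if !seen.insert(p.container_port) {
            return Err(format!("duplicate container port {} in request", p.container_port));
        }
        if existing_container_ports.contains(&p.container_port) {
            return Err(format!("container port {} already mapped", p.container_port));
        }
    }
    Ok(())
}

fn main() {
    let existing: HashSet<u16> = [22].into_iter().collect();
    let req = vec![PortMappingRequest {
        container_port: 9090,
        host_port: 9090,
        protocol: "tcp".into(),
    }];
    assert!(validate_ports(&req, &existing).is_ok());
}
```

Using `u16` for ports makes the 65535 upper bound unrepresentable rather than merely checked.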

4.2 Port Update Coordinator

File: crates/basilica-validator/src/rental/port_updater.rs (NEW)

pub struct PortUpdateCoordinator {
    container_client: Arc<ContainerClient>,
    persistence: Arc<dyn Persistence>,
    billing_client: Option<Arc<BillingClient>>,
    metrics: Arc<dyn MetricsRecorder>,
}

impl PortUpdateCoordinator {
    /// Add ports to existing rental with full state preservation
    pub async fn add_ports(
        &self,
        rental_id: &str,
        new_ports: Vec<PortMapping>,
        initiated_by: &str,
    ) -> Result<PortUpdateResult, RentalError>
}

Responsibilities:

  • Implement idempotency (check for duplicate operations)
  • Execute state machine transitions
  • Coordinate container recreation
  • Emit billing events
  • Handle rollback on failures
  • Update database transactionally

Key Methods:

  • add_ports(): Main entry point
  • validate_port_availability(): Check host port conflicts
  • checkpoint_container(): Save container state (optional)
  • recreate_container(): Execute container recreation
  • verify_container_health(): Ensure container is healthy
  • rollback(): Restore previous state on failure
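A highly simplified sketch of the rollback-on-failure control flow the coordinator implements. The step functions are stand-ins for the methods listed above; the real methods are async and carry rental context:

```rust
type Step = fn() -> Result<(), String>;

/// Run each named step in order; on the first failure, attempt rollback
/// and surface the failing step's error.
fn run_update(steps: &[(&str, Step)], rollback: Step) -> Result<(), String> {
    for (name, step) in steps {
        if let Err(e) = step() {
            // Best-effort rollback; a rollback failure would be recorded separately
            let _ = rollback();
            return Err(format!("{name} failed: {e}"));
        }
    }
    Ok(())
}

// Stand-in steps for demonstration
fn ok_step() -> Result<(), String> { Ok(()) }
fn failing_step() -> Result<(), String> { Err("host port busy".into()) }
fn noop_rollback() -> Result<(), String> { Ok(()) }

fn main() {
    let happy: &[(&str, Step)] = &[("validate", ok_step), ("recreate", ok_step)];
    assert!(run_update(happy, noop_rollback).is_ok());

    let sad: &[(&str, Step)] = &[("validate", ok_step), ("recreate", failing_step)];
    assert!(run_update(sad, noop_rollback).is_err());
}
```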

4.3 Container Recreation Logic

File: crates/basilica-validator/src/rental/container_client.rs (EXTEND)

impl ContainerClient {
    /// Recreate container with updated port configuration
    /// Preserves volumes and environment variables
    pub async fn recreate_with_new_ports(
        &self,
        rental: &Rental,
        additional_ports: Vec<PortMapping>,
    ) -> Result<ContainerInfo, ContainerError>
}

Process:

  1. Extract current container configuration via docker inspect
  2. Merge existing ports with new ports
  3. Stop container gracefully (docker stop -t 30)
  4. Remove stopped container (docker rm)
  5. Create new container with merged ports (docker run)
  6. Verify container is running (docker inspect)
  7. Return updated container info

Data Preservation:

  • Use existing volume mounts (automatically preserved)
  • Reuse same environment variables
  • Maintain same container image
  • Preserve same working directory
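Step 2 (merging ports) could be as simple as the sketch below; `PortMapping` is a local stand-in, and deduplication here is by (container port, protocol) pair:

```rust
#[derive(Debug, Clone, PartialEq)]
struct PortMapping {
    host_port: u16,
    container_port: u16,
    protocol: String,
}

/// Merge new ports into the existing set, skipping any mapping whose
/// (container_port, protocol) pair is already present.
fn merge_ports(existing: &[PortMapping], additional: &[PortMapping]) -> Vec<PortMapping> {
    let mut merged = existing.to_vec();
    for p in additional {
        let already = merged
            .iter()
            .any(|m| m.container_port == p.container_port && m.protocol == p.protocol);
        if !already {
            merged.push(p.clone());
        }
    }
    merged
}

/// Render one mapping as a `docker run -p` argument: host:container/protocol.
fn to_publish_arg(p: &PortMapping) -> String {
    format!("{}:{}/{}", p.host_port, p.container_port, p.protocol)
}

fn main() {
    let existing = vec![PortMapping { host_port: 45000, container_port: 22, protocol: "tcp".into() }];
    let additional = vec![PortMapping { host_port: 9090, container_port: 9090, protocol: "tcp".into() }];
    let merged = merge_ports(&existing, &additional);
    assert_eq!(merged.len(), 2);
    assert_eq!(to_publish_arg(&merged[1]), "9090:9090/tcp");
}
```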

4.4 Database Schema Extension

Migration: 002_add_port_operations.sql

-- Track port addition operations for audit and idempotency
CREATE TABLE IF NOT EXISTS port_operations (
    id TEXT PRIMARY KEY,
    rental_id TEXT NOT NULL,
    operation_type TEXT NOT NULL,  -- 'add_ports', 'remove_ports'
    operation_state TEXT NOT NULL, -- State machine state
    requested_ports TEXT NOT NULL, -- JSON array of PortMapping
    initiated_by TEXT NOT NULL,    -- User ID
    idempotency_key TEXT UNIQUE NOT NULL,
    started_at TEXT NOT NULL,
    completed_at TEXT,
    error_message TEXT,
    rollback_attempted INTEGER DEFAULT 0,
    metadata TEXT,                 -- JSON for additional context
    FOREIGN KEY (rental_id) REFERENCES rentals(id) ON DELETE CASCADE
);

CREATE INDEX IF NOT EXISTS idx_port_ops_rental ON port_operations(rental_id);
CREATE INDEX IF NOT EXISTS idx_port_ops_state ON port_operations(operation_state);
CREATE INDEX IF NOT EXISTS idx_port_ops_idempotency ON port_operations(idempotency_key);

-- Extend rentals table to track port history
ALTER TABLE rentals ADD COLUMN port_history TEXT DEFAULT '[]';

4.5 Billing Integration

Event Generation:

// Emit ResourceUpdate event
let event = BillingEvent {
    event_type: "resource_update",
    entity_type: "rental",
    entity_id: rental_id.clone(),
    user_id: Some(user_id),
    event_data: json!({
        "update_type": "port_addition",
        "previous_ports": previous_ports,
        "new_ports": new_ports,
        "added_ports": added_ports,
        "operation_id": operation_id,
        "timestamp": Utc::now(),
        "initiated_by": user_id,
    }),
    metadata: json!({
        "rental_state": "active",
        "downtime_seconds": downtime_duration.as_secs(),
    }),
    created_by: user_id.clone(),
    created_at: Utc::now(),
};

billing_client.record_event(event).await?;

Idempotency Key:

fn generate_port_update_idempotency_key(
    rental_id: &str,
    ports: &[PortMapping],
    timestamp: DateTime<Utc>,
) -> String {
    let ports_hash = hash_ports(ports); // Deterministic hash
    format!("{}:port_update:{}:{}", rental_id, ports_hash, timestamp.timestamp())
}
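`hash_ports` is not defined in this document. One plausible sketch sorts the mappings first so the key is independent of request order. Note that `DefaultHasher` is not guaranteed stable across Rust releases; a key persisted across deployments would want a stable hash such as SHA-256 over a canonical encoding:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Local stand-in for the rental PortMapping type
#[derive(Clone, Hash, PartialEq, Eq, PartialOrd, Ord)]
struct PortMapping {
    container_port: u16,
    host_port: u16,
    protocol: String,
}

/// Deterministic, order-independent hash of a set of port mappings.
/// Caution: DefaultHasher output may change between Rust versions;
/// use a stable hash (e.g. SHA-256) for durable idempotency keys.
fn hash_ports(ports: &[PortMapping]) -> u64 {
    let mut sorted = ports.to_vec();
    sorted.sort(); // request order must not change the idempotency key
    let mut hasher = DefaultHasher::new();
    sorted.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let a = vec![
        PortMapping { container_port: 80, host_port: 8080, protocol: "tcp".into() },
        PortMapping { container_port: 90, host_port: 9090, protocol: "tcp".into() },
    ];
    let mut b = a.clone();
    b.reverse();
    // Same mappings in a different order yield the same key component
    assert_eq!(hash_ports(&a), hash_ports(&b));
}
```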

5. Data Model Changes

5.1 New Types

AddPortsRequest:

pub struct AddPortsRequest {
    pub ports: Vec<PortMappingRequest>,
    pub reason: Option<String>, // Optional user-provided reason
}

PortUpdateResponse:

pub struct PortUpdateResponse {
    pub operation_id: String,
    pub rental_id: String,
    pub status: PortUpdateStatus,
    pub added_ports: Vec<PortMapping>,
    pub estimated_downtime_seconds: u32,
    pub message: String,
}

pub enum PortUpdateStatus {
    Pending,
    InProgress,
    Completed,
    Failed,
}

PortOperation:

pub struct PortOperation {
    pub id: String,
    pub rental_id: String,
    pub operation_type: PortOperationType,
    pub operation_state: PortOperationState,
    pub requested_ports: Vec<PortMapping>,
    pub initiated_by: String,
    pub idempotency_key: String,
    pub started_at: DateTime<Utc>,
    pub completed_at: Option<DateTime<Utc>>,
    pub error_message: Option<String>,
    pub rollback_attempted: bool,
    pub metadata: serde_json::Value,
}

5.2 Extended Types

Rental (extended):

pub struct Rental {
    // ... existing fields ...
    pub port_history: Vec<PortHistoryEntry>,
}

pub struct PortHistoryEntry {
    pub timestamp: DateTime<Utc>,
    pub operation_type: String,
    pub ports: Vec<PortMapping>,
    pub initiated_by: String,
}

6. API Specification

6.1 Add Ports Endpoint

Endpoint: POST /rentals/:rental_id/ports

Authentication: Required (JWT or API Key)

Authorization: User must own the rental

Request Body:

{
  "ports": [
    {
      "container_port": 8080,
      "host_port": 0,
      "protocol": "tcp"
    },
    {
      "container_port": 9090,
      "host_port": 9090,
      "protocol": "tcp"
    }
  ],
  "reason": "Adding Prometheus endpoint for monitoring"
}

Success Response (202 Accepted):

{
  "operation_id": "op-uuid",
  "rental_id": "rental-uuid",
  "status": "in_progress",
  "added_ports": [
    {
      "container_port": 8080,
      "host_port": 45123,
      "protocol": "tcp"
    },
    {
      "container_port": 9090,
      "host_port": 9090,
      "protocol": "tcp"
    }
  ],
  "estimated_downtime_seconds": 8,
  "message": "Port addition in progress. Container will be recreated with new ports."
}

Error Responses:

400 Bad Request:

{
  "error": "bad_request",
  "message": "Invalid port specification",
  "details": "Port 80 is already mapped in this rental"
}

404 Not Found:

{
  "error": "not_found",
  "message": "Rental not found or you don't have access"
}

409 Conflict:

{
  "error": "conflict",
  "message": "Port operation already in progress for this rental",
  "operation_id": "op-existing-uuid"
}

503 Service Unavailable:

{
  "error": "service_unavailable",
  "message": "Unable to connect to node. Please try again later."
}

6.2 Get Port Operation Status

Endpoint: GET /rentals/:rental_id/ports/operations/:operation_id

Response:

{
  "operation_id": "op-uuid",
  "rental_id": "rental-uuid",
  "status": "completed",
  "started_at": "2025-10-29T12:00:00Z",
  "completed_at": "2025-10-29T12:00:08Z",
  "added_ports": [...],
  "message": "Port addition completed successfully"
}

6.3 List Port Operations

Endpoint: GET /rentals/:rental_id/ports/operations

Response:

{
  "operations": [
    {
      "operation_id": "op-uuid-1",
      "operation_type": "add_ports",
      "status": "completed",
      "started_at": "2025-10-29T12:00:00Z",
      "completed_at": "2025-10-29T12:00:08Z"
    }
  ],
  "total": 1
}

7. Implementation Phases

Phase 1: Foundation (Week 1-2, ~40 hours)

Objectives:

  • Database schema migration
  • Core types and data structures
  • Basic validation logic

Tasks:

  1. Create migration 002_add_port_operations.sql (2h)
  2. Define new types in basilica-sdk (4h)
    • AddPortsRequest
    • PortUpdateResponse
    • PortOperation
  3. Implement validation logic (8h)
    • Port range validation
    • Conflict detection
    • Protocol validation
  4. Add database repository methods (8h)
    • create_port_operation()
    • update_operation_state()
    • get_port_operation()
  5. Write unit tests for validation (8h)
  6. Code review and refinement (10h)

Deliverables:

  • Migration script tested and applied
  • Types defined and documented
  • Validation tests passing
  • Repository methods tested

Success Criteria:

  • All tests pass
  • Migration runs successfully on test database
  • Code review approved

Phase 2: Container Recreation Logic (Week 3-4, ~50 hours)

Objectives:

  • Implement container recreation with state preservation
  • Add rollback capability
  • Integrate with existing container client

Tasks:

  1. Extend ContainerClient with recreation method (12h)
    • recreate_with_new_ports()
    • Extract existing configuration
    • Merge port configurations
  2. Implement state preservation (10h)
    • Volume preservation
    • Environment variable retention
    • Configuration backup
  3. Add rollback logic (10h)
    • Restore original container on failure
    • Cleanup partial changes
  4. Implement health verification (6h)
    • Post-recreation health checks
    • Timeout handling
  5. Write integration tests (12h)
    • Test container recreation
    • Test state preservation
    • Test rollback scenarios

Deliverables:

  • Container recreation working end-to-end
  • Rollback tested and verified
  • Integration tests passing

Success Criteria:

  • Container can be recreated with new ports
  • Volumes and environment preserved
  • Rollback works correctly on failures
  • All tests pass

Phase 3: Coordinator & Orchestration (Week 5-6, ~45 hours)

Objectives:

  • Implement port update coordinator
  • State machine implementation
  • Idempotency handling

Tasks:

  1. Create PortUpdateCoordinator struct (8h)
    • State machine transitions
    • Operation lifecycle management
  2. Implement idempotency (8h)
    • Key generation
    • Duplicate detection
    • Operation resumption
  3. Add transaction management (10h)
    • Database transaction wrapper
    • Atomic state updates
    • Consistency guarantees
  4. Implement coordinator methods (12h)
    • add_ports()
    • validate_port_availability()
    • execute_update()
    • handle_rollback()
  5. Write unit tests (7h)

Deliverables:

  • Coordinator fully functional
  • Idempotency working
  • State machine tested

Success Criteria:

  • Coordinator can orchestrate full update flow
  • Idempotency prevents duplicate operations
  • State machine handles all transitions
  • Tests verify all paths

Phase 4: API Integration (Week 7, ~30 hours)

Objectives:

  • Add API endpoints
  • Integrate with auth system
  • Add rate limiting

Tasks:

  1. Implement API handler (8h)
    • add_rental_ports()
    • Request validation
    • Response formatting
  2. Add endpoint to router (2h)
  3. Integrate with ownership validation (4h)
  4. Add rate limiting (4h)
    • Per-user limits
    • Per-rental limits
  5. Write API tests (8h)
    • Happy path
    • Error cases
    • Auth failures
  6. Update OpenAPI spec (4h)

Deliverables:

  • API endpoint functional
  • Auth and rate limiting working
  • API tests passing
  • Documentation updated

Success Criteria:

  • Endpoint accessible and secured
  • All validation working
  • Rate limiting prevents abuse
  • Tests cover all scenarios

Phase 5: Billing Integration (Week 8, ~25 hours)

Objectives:

  • Emit billing events
  • Ensure audit trail
  • Verify idempotency

Tasks:

  1. Implement event emission (6h)
    • ResourceUpdate event
    • Event data structure
  2. Add idempotency key generation (4h)
  3. Test billing integration (8h)
    • Event recording
    • Duplicate prevention
    • Audit trail verification
  4. Add monitoring metrics (4h)
    • Operation counters
    • Success/failure rates
    • Duration histograms
  5. Documentation (3h)

Deliverables:

  • Billing events emitted correctly
  • Audit trail complete
  • Metrics visible

Success Criteria:

  • Events recorded in billing system
  • Idempotency prevents duplicates
  • Metrics show operation status
  • Audit trail verifiable

Phase 6: CLI Support (Week 9, ~20 hours)

Objectives:

  • Add CLI commands
  • User-friendly interface
  • Interactive mode

Tasks:

  1. Add port-add command (8h)
    • Argument parsing
    • API client integration
    • Output formatting
  2. Add operation status command (4h)
  3. Add interactive port selection (4h)
  4. Write CLI tests (4h)

Deliverables:

  • CLI commands functional
  • Help documentation complete
  • Tests passing

Success Criteria:

  • Commands work as expected
  • Error messages clear
  • Interactive mode intuitive

Phase 7: Testing & Hardening (Week 10-11, ~50 hours)

Objectives:

  • Comprehensive testing
  • Performance validation
  • Security audit

Tasks:

  1. End-to-end testing (16h)
    • Full flow testing
    • Multi-rental scenarios
    • Concurrent operations
  2. Failure scenario testing (12h)
    • Network failures
    • Node unavailability
    • Database failures
  3. Performance testing (8h)
    • Operation duration
    • Resource usage
    • Concurrent load
  4. Security audit (8h)
    • Authorization checks
    • Input validation
    • Injection prevention
  5. Bug fixes and refinement (6h)

Deliverables:

  • All tests passing
  • Performance benchmarks met
  • Security review complete

Success Criteria:

  • 95%+ test coverage
  • <10 second average operation time
  • Zero critical security issues
  • All edge cases handled

Phase 8: Documentation & Deployment (Week 12, ~20 hours)

Objectives:

  • Complete documentation
  • Deployment preparation
  • Monitoring setup

Tasks:

  1. API documentation (6h)
    • OpenAPI spec
    • Examples
    • Error codes
  2. User guide (6h)
    • CLI examples
    • API examples
    • Troubleshooting
  3. Deployment checklist (4h)
    • Migration steps
    • Rollback plan
    • Monitoring setup
  4. Training materials (4h)

Deliverables:

  • Documentation complete
  • Deployment ready
  • Monitoring configured

Success Criteria:

  • Documentation clear and complete
  • Deployment checklist validated
  • Team trained on new feature

8. Security Considerations

8.1 Authentication & Authorization

Requirements:

  • User must be authenticated (JWT or API key)
  • User must own the rental (verified via OwnedRental)
  • Rate limiting applied per user and per rental

Implementation:

// Ownership validation via extractor
pub async fn add_rental_ports(
    owned_rental: OwnedRental, // Validates ownership
    Json(request): Json<AddPortsRequest>,
) -> Result<Json<PortUpdateResponse>>

8.2 Input Validation

Port Validation:

  • Range: 1-65535
  • No privileged ports (<1024) without admin flag
  • No duplicate container ports
  • No conflicts with existing mappings
  • Protocol must be "tcp" or "udp"

Request Limits:

  • Maximum 10 ports per request
  • Maximum 50 total ports per rental
  • Rate limit: 5 requests per hour per user

8.3 Command Injection Prevention

Docker Command Sanitization:

use std::process::Command;

// Always build arguments individually; never interpolate user input
// into a shell string
fn build_docker_run_command(config: &ContainerConfig) -> Command {
    let mut cmd = Command::new("docker");
    cmd.arg("run");

    // Add each port as its own argument to prevent injection;
    // Docker's publish syntax is host_port:container_port/protocol
    for port in &config.ports {
        cmd.arg("-p");
        cmd.arg(format!("{}:{}/{}",
            port.host_port,
            port.container_port,
            port.protocol
        ));
    }

    cmd
}

8.4 Resource Limits

Prevent Abuse:

  • Maximum concurrent port operations per user: 3
  • Operation timeout: 5 minutes
  • Retry limit: 3 attempts
  • Cooldown period: 1 minute between operations

8.5 Audit Trail

Log All Operations:

  • Operation initiation (who, when, what)
  • State transitions
  • Failures and errors
  • Rollback attempts
  • Final outcome

Retention:

  • Port operations: 90 days
  • Audit events: 1 year
  • Billing events: Permanent

9. Testing Strategy

9.1 Unit Tests

Coverage Areas:

  • Validation logic (port ranges, conflicts, protocols)
  • Idempotency key generation
  • State machine transitions
  • Port merging logic
  • Error handling

Test Cases:

  • Valid port addition request
  • Invalid port ranges
  • Duplicate ports
  • Protocol validation
  • Conflict detection
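For example, the "Invalid port ranges" case above reduces to boundary checks; a sketch with an illustrative helper name:

```rust
/// Valid user-facing port numbers are 1-65535. The check runs on a wider
/// integer so out-of-range JSON input can be rejected explicitly rather
/// than silently truncated.
fn is_valid_port(port: u32) -> bool {
    (1..=65_535).contains(&port)
}

fn main() {
    assert!(!is_valid_port(0));      // below range
    assert!(is_valid_port(1));       // lower boundary
    assert!(is_valid_port(65_535));  // upper boundary
    assert!(!is_valid_port(65_536)); // above range
}
```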

9.2 Integration Tests

Coverage Areas:

  • Database operations
  • Container recreation
  • Rollback mechanisms
  • Billing event emission
  • API endpoint functionality

Test Cases:

  • Successful port addition end-to-end
  • Container recreation with state preservation
  • Rollback on container failure
  • Rollback on billing failure
  • Idempotency (duplicate requests)

9.3 System Tests

Coverage Areas:

  • Full API to container flow
  • Multiple concurrent operations
  • Failure scenarios
  • Performance under load

Test Cases:

  • Add ports to active rental
  • Multiple users adding ports concurrently
  • Network failure during operation
  • Node unavailability
  • Database connection loss

9.4 Performance Tests

Metrics:

  • Average operation duration: <10 seconds
  • P95 operation duration: <15 seconds
  • P99 operation duration: <30 seconds
  • Throughput: 100 operations/minute per validator
  • Concurrent operations: 10 simultaneous

Load Testing:

  • 1000 port additions in 1 hour
  • 10 concurrent port additions
  • Sustained load over 24 hours

9.5 Security Tests

Test Cases:

  • Unauthorized access attempts
  • Cross-user rental access
  • Command injection attempts
  • Port range boundary conditions
  • Rate limit enforcement
  • SQL injection in operation metadata

9.6 Failure Recovery Tests

Scenarios:

  • Container creation fails
  • Network disconnects during operation
  • Node becomes unavailable
  • Database transaction fails
  • Billing service unavailable

Expected Outcomes:

  • Graceful rollback to previous state
  • No orphaned containers
  • Database consistency maintained
  • Clear error messages to user

10. Rollout Plan

10.1 Phase 1: Internal Testing (Week 13)

Scope:

  • Deploy to internal testing environment
  • Manual testing by engineering team
  • Validation of all use cases

Success Criteria:

  • All test cases pass
  • No critical bugs
  • Performance targets met

10.2 Phase 2: Beta Release (Week 14-15)

Scope:

  • Enable for select beta users
  • Monitor closely for issues
  • Gather user feedback

Selection Criteria:

  • Trusted users with non-critical workloads
  • Users who have expressed interest
  • Maximum 50 beta users

Monitoring:

  • Operation success/failure rates
  • Average duration metrics
  • User feedback collection
  • Billing accuracy verification

10.3 Phase 3: Limited Release (Week 16-17)

Scope:

  • Enable for 10% of user base
  • Continue monitoring and refinement
  • Document common issues

Rollout Strategy:

  • Gradual percentage increase
  • User opt-in option
  • Clear communication about feature

Rollback Criteria:

  • Failure rate above 5%
  • Critical security issue
  • Billing inconsistencies
  • Performance degradation

10.4 Phase 4: General Availability (Week 18+)

Scope:

  • Enable for all users
  • Full documentation published
  • Support team trained

Communication:

  • Blog post announcement
  • Email to all users
  • Documentation updates
  • CLI help text updates

Ongoing Monitoring:

  • Daily operation metrics review
  • Weekly success rate analysis
  • Monthly performance review
  • Quarterly security audit

11. Appendices

Appendix A: Error Codes

| Code | Message | HTTP Status | Retry |
|------|---------|-------------|-------|
| PORT_INVALID_RANGE | Port must be between 1-65535 | 400 | No |
| PORT_ALREADY_MAPPED | Port already exists in rental | 400 | No |
| PORT_CONFLICT | Port conflicts with existing mapping | 409 | No |
| PORT_LIMIT_EXCEEDED | Maximum ports limit reached | 400 | No |
| OPERATION_IN_PROGRESS | Port operation already running | 409 | Yes |
| RENTAL_NOT_ACTIVE | Rental must be in active state | 400 | No |
| NODE_UNAVAILABLE | Cannot connect to node | 503 | Yes |
| CONTAINER_RECREATION_FAILED | Failed to recreate container | 500 | Yes |
| ROLLBACK_FAILED | Failed to restore previous state | 500 | No |
| BILLING_UNAVAILABLE | Billing service unreachable | 503 | Yes |

Appendix B: Metrics

Operation Metrics:

  • port_operations_total{status, rental_id} - Counter
  • port_operation_duration_seconds{status} - Histogram
  • port_operation_failures_total{reason} - Counter
  • active_port_operations - Gauge

Container Metrics:

  • container_recreations_total{status} - Counter
  • container_recreation_duration_seconds - Histogram
  • container_rollbacks_total{success} - Counter

Billing Metrics:

  • billing_events_emitted_total{event_type} - Counter
  • billing_event_failures_total - Counter

Appendix C: Configuration

New Configuration Options:

[rental.port_operations]
enabled = true
max_ports_per_rental = 50
max_ports_per_request = 10
max_concurrent_operations_per_user = 3
operation_timeout_seconds = 300
container_stop_timeout_seconds = 30
health_check_retries = 3
health_check_interval_seconds = 2
rate_limit_per_hour = 5
cooldown_seconds = 60
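These options might map onto a typed config struct whose `Default` mirrors the values above; the struct and field names are assumptions mirroring the TOML keys:

```rust
// Hypothetical typed view of [rental.port_operations]
#[derive(Debug, Clone)]
pub struct PortOperationsConfig {
    pub enabled: bool,
    pub max_ports_per_rental: u32,
    pub max_ports_per_request: u32,
    pub max_concurrent_operations_per_user: u32,
    pub operation_timeout_seconds: u64,
    pub container_stop_timeout_seconds: u64,
    pub health_check_retries: u32,
    pub health_check_interval_seconds: u64,
    pub rate_limit_per_hour: u32,
    pub cooldown_seconds: u64,
}

impl Default for PortOperationsConfig {
    /// Defaults match the sample TOML above.
    fn default() -> Self {
        Self {
            enabled: true,
            max_ports_per_rental: 50,
            max_ports_per_request: 10,
            max_concurrent_operations_per_user: 3,
            operation_timeout_seconds: 300,
            container_stop_timeout_seconds: 30,
            health_check_retries: 3,
            health_check_interval_seconds: 2,
            rate_limit_per_hour: 5,
            cooldown_seconds: 60,
        }
    }
}

fn main() {
    let cfg = PortOperationsConfig::default();
    assert!(cfg.enabled);
    assert_eq!(cfg.max_ports_per_request, 10);
}
```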

Appendix D: Glossary

  • Port Mapping: Association between a host port and container port
  • Container Recreation: Process of stopping and creating a new container with updated configuration
  • State Preservation: Maintaining application data during container recreation
  • Idempotency: Property ensuring that duplicate requests have the same effect as a single request
  • Rollback: Restoring previous state after operation failure
  • Port Conflict: Situation where requested port is already in use
  • Operation State Machine: Defined states and transitions for port operations

Appendix E: Implementation Risks

| Risk | Probability | Impact | Mitigation |
|------|-------------|--------|------------|
| Container data loss during recreation | Medium | High | Use volumes for all persistent data |
| Billing inconsistency | Low | Critical | Implement robust idempotency |
| Extended downtime | Low | Medium | Optimize recreation process, add timeouts |
| Port conflicts | Medium | Low | Pre-validate availability |
| Node unavailability | Medium | Medium | Implement retry with backoff |
| Concurrent operation conflicts | Medium | Medium | Use database locks |
| Rollback failures | Low | High | Test extensively, add manual recovery |

Appendix F: Success Metrics

Operational Metrics:

  • Success rate: >95%
  • Average duration: <10 seconds
  • P95 duration: <15 seconds
  • Rollback rate: <5%
  • Billing accuracy: 100%

User Satisfaction:

  • Feature adoption: >30% of active users
  • Support tickets: <10 per week
  • User rating: >4/5

Business Metrics:

  • Zero billing disputes related to port operations
  • Zero miner payout errors
  • Uptime maintained at >99.9%

Labels: enhancement (New feature or request)