System Architecture: Dynamic Port Addition for Active Rentals
Version: 1.0
Date: 2025-10-29
Status: Design Phase
Criticality: HIGH - Affects Miner Payouts and Billing Accuracy
Executive Summary
This document outlines the system architecture for implementing dynamic port addition to active GPU rentals in the Basilica network. This feature enables users to add additional port mappings to running containers after initial rental creation, addressing evolving application requirements without rental termination.
Key Constraint: Docker containers do not support runtime port modification. This architecture addresses this fundamental limitation through container recreation with comprehensive state preservation.
Financial Impact: This feature directly affects billing accuracy and miner payouts. All implementations must maintain strict audit trails and idempotency guarantees.
Table of Contents
- Current System Analysis
- Technical Constraints & Challenges
- Proposed Architecture
- Component Design
- Data Model Changes
- API Specification
- Implementation Phases
- Security Considerations
- Testing Strategy
- Rollout Plan
- Appendices
1. Current System Analysis
1.1 Existing Architecture
Port Management Flow:
User Request → API Validation → Validator → SSH to Node → Docker Run with Ports → Container Inspect → Store Mappings
Key Components:
- API Layer (basilica-api): Handles rental creation via POST /rentals
- Validator Layer (basilica-validator): Orchestrates container deployment
- Container Client: Executes Docker commands over SSH
- Database: SQLite storing rental metadata (port mappings in JSON)
- Billing System: Tracks resource usage via telemetry
Current Limitations:
- Port mappings are immutable after container creation
- No API endpoints for rental modification
- No container update infrastructure
- Port mappings stored only in container spec (not separately indexed)
1.2 Port Mapping Data Flow
StartRentalApiRequest.ports → PortMappingRequest[]
↓
ContainerSpec.ports → PortMapping[]
↓
docker run -p {host_port}:{container_port}/{protocol}
↓
docker inspect → Extract actual host ports
↓
Store in DB: rentals.container_spec (JSON) + user_rentals.port_mappings (JSON)
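The inspect step above (extracting the actual host ports Docker assigned) can be sketched by parsing `docker port`-style output. A minimal, hypothetical helper, assuming the CLI's `container_port/proto -> ip:host_port` line format:

```rust
/// One resolved mapping, mirroring the PortMapping shape used in this doc.
#[derive(Debug, PartialEq)]
struct ResolvedPort {
    container_port: u16,
    host_port: u16,
    protocol: String,
}

/// Parse lines in the `docker port` output format,
/// e.g. "8080/tcp -> 0.0.0.0:45123". Malformed lines are skipped.
fn parse_docker_port_output(output: &str) -> Vec<ResolvedPort> {
    output
        .lines()
        .filter_map(|line| {
            let (left, right) = line.split_once(" -> ")?;
            let (cport, proto) = left.trim().split_once('/')?;
            // rsplit on ':' so IPv6 addresses like "[::]:45123" still work
            let hport = right.trim().rsplit_once(':')?.1;
            Some(ResolvedPort {
                container_port: cport.parse().ok()?,
                host_port: hport.parse().ok()?,
                protocol: proto.to_string(),
            })
        })
        .collect()
}

fn main() {
    let out = "8080/tcp -> 0.0.0.0:45123\n9090/tcp -> 0.0.0.0:9090";
    let ports = parse_docker_port_output(out);
    assert_eq!(ports.len(), 2);
    assert_eq!(ports[0].host_port, 45123);
}
```

The real client may read the `docker inspect` JSON instead; the point is only that the assigned host ports must be read back after creation, since `host_port: 0` requests dynamic assignment.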
1.3 Billing Integration Points
Current Billing Events:
- RentalStart: Initial rental creation
- RentalEnd: Rental termination
- Telemetry: Ongoing resource usage (60s intervals)
Missing:
- ResourceUpdate: Configuration changes (needed for port additions)
- Port-specific audit events
- Cost recalculation for configuration changes
2. Technical Constraints & Challenges
2.1 Docker Limitations
Core Constraint: Docker does not support adding ports to running containers.
Technical Reasons:
- Port bindings are set in container's network namespace at creation
- Network namespace is immutable once created
- Requires container recreation with new configuration
Implications:
- Brief service interruption required (estimated 5-10 seconds)
- State preservation critical for stateful applications
- Data persistence must be guaranteed
2.2 State Preservation Requirements
Must Preserve:
- Container filesystem state (application data)
- Running processes (if possible via checkpoint/restore)
- Environment variables
- Volume mounts
- Network connections (will be momentarily interrupted)
Cannot Preserve:
- Active TCP connections (will be reset)
- In-memory application state (unless checkpointed)
- Process IDs
2.3 Distributed System Challenges
Consistency Requirements:
- Database state must match actual container configuration
- Billing events must accurately reflect configuration changes
- Monitoring systems must track state transitions
- Multiple concurrent updates must be serialized
Failure Scenarios:
- Database update succeeds, container recreation fails
- Container recreation succeeds, billing notification fails
- Network partition during operation
- Node unavailability during update
2.4 Financial Accuracy Requirements
Billing Integrity:
- All port additions must be audited
- Idempotency required to prevent duplicate charges
- Configuration changes must be timestamped precisely
- Rollback must not create billing inconsistencies
Miner Payout Impact:
- Port additions do not affect per-hour resource pricing
- But audit trail ensures accurate rental lifecycle tracking
- Telemetry must continue uninterrupted
3. Proposed Architecture
3.1 Solution Overview
Approach: Container Recreation with State Preservation
Strategy:
- Validate port addition request
- Create checkpoint of container state
- Stop existing container gracefully
- Create new container with updated port configuration
- Restore state from checkpoint
- Update all tracking systems (database, billing, monitoring)
- Provide rollback on any failure
3.2 High-Level Flow
┌─────────────┐
│ User │
│ Request │
└──────┬──────┘
│ POST /rentals/:id/ports
↓
┌─────────────────────┐
│ API Gateway │
│ - Auth validation │
│ - Ownership check │
│ - Port validation │
└──────┬──────────────┘
│
↓
┌─────────────────────┐
│ Port Update │
│ Coordinator │
│ - Idempotency │
│ - State machine │
│ - Rollback logic │
└──────┬──────────────┘
│
├─→ Database: Begin transaction
│
├─→ Container Manager: Prepare update
│ ├─→ SSH to node
│ ├─→ Checkpoint state (if supported)
│ ├─→ Stop container
│ ├─→ Create new container with new ports
│ └─→ Verify health
│
├─→ Billing: Emit ResourceUpdate event
│
└─→ Database: Commit transaction
3.3 Component Responsibilities
API Layer:
- Endpoint: POST /rentals/:id/ports
- Request validation (port ranges, conflicts)
- Authentication & authorization
- Rate limiting
Port Update Coordinator (NEW):
- Orchestrates multi-step update process
- Maintains operation state machine
- Implements idempotency
- Handles rollback on failures
Container Manager:
- Executes container recreation
- Manages state preservation
- Validates port availability on host
- Reports success/failure
Billing Integration:
- Emits ResourceUpdate events
- Records audit trail
- Ensures idempotency
Database Layer:
- Stores port addition operations
- Tracks operation state
- Maintains audit log
3.4 State Machine
┌─────────┐
│ Pending │
└────┬────┘
│
↓
┌─────────────────┐
│ Validating │ (Check port conflicts, availability)
└────┬────────────┘
│
↓
┌─────────────────┐
│ Checkpointing │ (Optional: Save container state)
└────┬────────────┘
│
↓
┌─────────────────┐
│ Stopping │ (Graceful stop with timeout)
└────┬────────────┘
│
↓
┌─────────────────┐
│ Recreating │ (Create container with new ports)
└────┬────────────┘
│
↓
┌─────────────────┐
│ Verifying │ (Health check new container)
└────┬────────────┘
│
↓
┌─────────────────┐
│ Completed │
└─────────────────┘
│ (Any failure)
↓
┌─────────────────┐
│ RollingBack │
└────┬────────────┘
│
↓
┌─────────────────┐
│ Failed │
└─────────────────┘
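The diagram maps naturally onto an enum with explicit transitions. A minimal sketch (state names taken from the diagram; the method names are illustrative, not the actual implementation):

```rust
/// States from the port-operation state machine above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OpState {
    Pending,
    Validating,
    Checkpointing,
    Stopping,
    Recreating,
    Verifying,
    Completed,
    RollingBack,
    Failed,
}

impl OpState {
    /// Next state when the current step succeeds; terminal states are fixed.
    fn on_success(self) -> OpState {
        use OpState::*;
        match self {
            Pending => Validating,
            Validating => Checkpointing,
            Checkpointing => Stopping,
            Stopping => Recreating,
            Recreating => Verifying,
            Verifying => Completed,
            RollingBack => Failed, // rollback finished: operation ends Failed
            s => s,                // Completed / Failed stay put
        }
    }

    /// Any failure in a non-terminal state enters RollingBack.
    fn on_failure(self) -> OpState {
        use OpState::*;
        match self {
            Completed | Failed | RollingBack => self,
            _ => RollingBack,
        }
    }
}

fn main() {
    // Happy path walks Pending → … → Completed.
    let mut s = OpState::Pending;
    while s != OpState::Completed {
        s = s.on_success();
    }
    // A failure mid-recreation diverts into rollback.
    assert_eq!(OpState::Recreating.on_failure(), OpState::RollingBack);
}
```

Encoding the transitions as a total function keeps illegal jumps (e.g. Pending → Completed) unrepresentable, which simplifies persisting `operation_state` per the schema in §4.4.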
4. Component Design
4.1 API Handler
File: crates/basilica-api/src/api/routes/rentals.rs
/// Add ports to an existing rental
/// POST /rentals/:id/ports
pub async fn add_rental_ports(
State(state): State<AppState>,
owned_rental: OwnedRental,
Json(request): Json<AddPortsRequest>,
) -> Result<Json<PortUpdateResponse>>
Responsibilities:
- Validate request format
- Check ownership via OwnedRental extractor
- Validate port specifications (range, protocol)
- Check for duplicate ports
- Delegate to validator client
- Return operation status
Validation Rules:
- Port range: 1-65535
- No duplicate ports in request
- No conflicts with existing ports
- Protocol must be "tcp" or "udp"
- Maximum 10 ports per request (configurable)
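A sketch of these rules as a validation function (error strings and the request limit are illustrative; the handler's real types may differ):

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
struct PortMapping {
    container_port: u16,
    host_port: u16, // 0 = dynamically assigned
    protocol: String,
}

const MAX_PORTS_PER_REQUEST: usize = 10;

/// Validate a batch of requested ports against the rules in §4.1.
fn validate_ports(requested: &[PortMapping], existing: &[PortMapping]) -> Result<(), String> {
    if requested.is_empty() || requested.len() > MAX_PORTS_PER_REQUEST {
        return Err(format!("request must contain 1-{} ports", MAX_PORTS_PER_REQUEST));
    }
    let mut seen = HashSet::new();
    for p in requested {
        // u16 already bounds ports at 65535; only 0 is invalid.
        if p.container_port == 0 {
            return Err("port must be between 1-65535".into());
        }
        if p.protocol != "tcp" && p.protocol != "udp" {
            return Err(format!("unsupported protocol: {}", p.protocol));
        }
        if !seen.insert(p.container_port) {
            return Err(format!("duplicate container port {} in request", p.container_port));
        }
        if existing.iter().any(|e| e.container_port == p.container_port) {
            return Err(format!("port {} is already mapped in this rental", p.container_port));
        }
    }
    Ok(())
}

fn main() {
    let tcp = |c: u16| PortMapping { container_port: c, host_port: 0, protocol: "tcp".to_string() };
    assert!(validate_ports(&[tcp(8080)], &[]).is_ok());
    assert!(validate_ports(&[tcp(8080)], &[tcp(8080)]).is_err()); // conflict with existing
    assert!(validate_ports(&[tcp(1), tcp(1)], &[]).is_err());     // duplicate in request
}
```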
4.2 Port Update Coordinator
File: crates/basilica-validator/src/rental/port_updater.rs (NEW)
pub struct PortUpdateCoordinator {
container_client: Arc<ContainerClient>,
persistence: Arc<dyn Persistence>,
billing_client: Option<Arc<BillingClient>>,
metrics: Arc<dyn MetricsRecorder>,
}
impl PortUpdateCoordinator {
/// Add ports to existing rental with full state preservation
pub async fn add_ports(
&self,
rental_id: &str,
new_ports: Vec<PortMapping>,
initiated_by: &str,
) -> Result<PortUpdateResult, RentalError>
}
Responsibilities:
- Implement idempotency (check for duplicate operations)
- Execute state machine transitions
- Coordinate container recreation
- Emit billing events
- Handle rollback on failures
- Update database transactionally
Key Methods:
- add_ports(): Main entry point
- validate_port_availability(): Check host port conflicts
- checkpoint_container(): Save container state (optional)
- recreate_container(): Execute container recreation
- verify_container_health(): Ensure container is healthy
- rollback(): Restore previous state on failure
4.3 Container Recreation Logic
File: crates/basilica-validator/src/rental/container_client.rs (EXTEND)
impl ContainerClient {
/// Recreate container with updated port configuration
/// Preserves volumes and environment variables
pub async fn recreate_with_new_ports(
&self,
rental: &Rental,
additional_ports: Vec<PortMapping>,
) -> Result<ContainerInfo, ContainerError>
}
Process:
- Extract current container configuration via docker inspect
- Merge existing ports with new ports
- Stop container gracefully (docker stop -t 30)
- Remove stopped container (docker rm)
- Create new container with merged ports (docker run)
- Verify container is running (docker inspect)
- Return updated container info
Data Preservation:
- Use existing volume mounts (automatically preserved)
- Reuse same environment variables
- Maintain same container image
- Preserve same working directory
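The port-merge step in the process above could look like the following sketch (hypothetical helper; existing mappings win on a duplicate container port/protocol pair, so a re-submitted port cannot clobber an established binding):

```rust
#[derive(Debug, Clone, PartialEq)]
struct PortMapping {
    container_port: u16,
    host_port: u16,
    protocol: String,
}

/// Merge existing mappings with newly requested ones; any container
/// port/protocol pair that is already mapped is kept as-is.
fn merge_ports(existing: &[PortMapping], additional: &[PortMapping]) -> Vec<PortMapping> {
    let mut merged = existing.to_vec();
    for p in additional {
        let taken = merged
            .iter()
            .any(|e| e.container_port == p.container_port && e.protocol == p.protocol);
        if !taken {
            merged.push(p.clone());
        }
    }
    merged
}

/// Render merged mappings as `-p host:container/proto` docker run args.
fn to_publish_args(ports: &[PortMapping]) -> Vec<String> {
    ports
        .iter()
        .map(|p| format!("{}:{}/{}", p.host_port, p.container_port, p.protocol))
        .collect()
}

fn main() {
    let existing = vec![PortMapping { container_port: 80, host_port: 8080, protocol: "tcp".into() }];
    let add = vec![PortMapping { container_port: 9090, host_port: 9090, protocol: "tcp".into() }];
    let merged = merge_ports(&existing, &add);
    assert_eq!(merged.len(), 2);
    assert_eq!(to_publish_args(&merged)[0], "8080:80/tcp");
}
```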
4.4 Database Schema Extension
Migration: 002_add_port_operations.sql
-- Track port addition operations for audit and idempotency
CREATE TABLE IF NOT EXISTS port_operations (
id TEXT PRIMARY KEY,
rental_id TEXT NOT NULL,
operation_type TEXT NOT NULL, -- 'add_ports', 'remove_ports'
operation_state TEXT NOT NULL, -- State machine state
requested_ports TEXT NOT NULL, -- JSON array of PortMapping
initiated_by TEXT NOT NULL, -- User ID
idempotency_key TEXT UNIQUE NOT NULL,
started_at TEXT NOT NULL,
completed_at TEXT,
error_message TEXT,
rollback_attempted INTEGER DEFAULT 0,
metadata TEXT, -- JSON for additional context
FOREIGN KEY (rental_id) REFERENCES rentals(id) ON DELETE CASCADE
);
CREATE INDEX IF NOT EXISTS idx_port_ops_rental ON port_operations(rental_id);
CREATE INDEX IF NOT EXISTS idx_port_ops_state ON port_operations(operation_state);
CREATE INDEX IF NOT EXISTS idx_port_ops_idempotency ON port_operations(idempotency_key);
-- Extend rentals table to track port history
ALTER TABLE rentals ADD COLUMN port_history TEXT DEFAULT '[]';
4.5 Billing Integration
Event Generation:
// Emit ResourceUpdate event
let event = BillingEvent {
event_type: "resource_update",
entity_type: "rental",
entity_id: rental_id.clone(),
user_id: Some(user_id),
event_data: json!({
"update_type": "port_addition",
"previous_ports": previous_ports,
"new_ports": new_ports,
"added_ports": added_ports,
"operation_id": operation_id,
"timestamp": Utc::now(),
"initiated_by": user_id,
}),
metadata: json!({
"rental_state": "active",
"downtime_seconds": downtime_duration.as_secs(),
}),
created_by: user_id.clone(),
created_at: Utc::now(),
};
billing_client.record_event(event).await?;
Idempotency Key:
fn generate_port_update_idempotency_key(
rental_id: &str,
ports: &[PortMapping],
timestamp: DateTime<Utc>,
) -> String {
let ports_hash = hash_ports(ports); // Deterministic hash
format!("{}:port_update:{}:{}", rental_id, ports_hash, timestamp.timestamp())
}
5. Data Model Changes
5.1 New Types
AddPortsRequest:
pub struct AddPortsRequest {
pub ports: Vec<PortMappingRequest>,
pub reason: Option<String>, // Optional user-provided reason
}
PortUpdateResponse:
pub struct PortUpdateResponse {
pub operation_id: String,
pub rental_id: String,
pub status: PortUpdateStatus,
pub added_ports: Vec<PortMapping>,
pub estimated_downtime_seconds: u32,
pub message: String,
}
pub enum PortUpdateStatus {
Pending,
InProgress,
Completed,
Failed,
}
PortOperation:
pub struct PortOperation {
pub id: String,
pub rental_id: String,
pub operation_type: PortOperationType,
pub operation_state: PortOperationState,
pub requested_ports: Vec<PortMapping>,
pub initiated_by: String,
pub idempotency_key: String,
pub started_at: DateTime<Utc>,
pub completed_at: Option<DateTime<Utc>>,
pub error_message: Option<String>,
pub rollback_attempted: bool,
pub metadata: serde_json::Value,
}
5.2 Extended Types
Rental (extended):
pub struct Rental {
// ... existing fields ...
pub port_history: Vec<PortHistoryEntry>,
}
pub struct PortHistoryEntry {
pub timestamp: DateTime<Utc>,
pub operation_type: String,
pub ports: Vec<PortMapping>,
pub initiated_by: String,
}
6. API Specification
6.1 Add Ports Endpoint
Endpoint: POST /rentals/:rental_id/ports
Authentication: Required (JWT or API Key)
Authorization: User must own the rental
Request Body:
{
"ports": [
{
"container_port": 8080,
"host_port": 0,
"protocol": "tcp"
},
{
"container_port": 9090,
"host_port": 9090,
"protocol": "tcp"
}
],
"reason": "Adding Prometheus endpoint for monitoring"
}
Success Response (202 Accepted):
{
"operation_id": "op-uuid",
"rental_id": "rental-uuid",
"status": "in_progress",
"added_ports": [
{
"container_port": 8080,
"host_port": 45123,
"protocol": "tcp"
},
{
"container_port": 9090,
"host_port": 9090,
"protocol": "tcp"
}
],
"estimated_downtime_seconds": 8,
"message": "Port addition in progress. Container will be recreated with new ports."
}
Error Responses:
400 Bad Request:
{
"error": "bad_request",
"message": "Invalid port specification",
"details": "Port 80 is already mapped in this rental"
}
404 Not Found:
{
"error": "not_found",
"message": "Rental not found or you don't have access"
}
409 Conflict:
{
"error": "conflict",
"message": "Port operation already in progress for this rental",
"operation_id": "op-existing-uuid"
}
503 Service Unavailable:
{
"error": "service_unavailable",
"message": "Unable to connect to node. Please try again later."
}
6.2 Get Port Operation Status
Endpoint: GET /rentals/:rental_id/ports/operations/:operation_id
Response:
{
"operation_id": "op-uuid",
"rental_id": "rental-uuid",
"status": "completed",
"started_at": "2025-10-29T12:00:00Z",
"completed_at": "2025-10-29T12:00:08Z",
"added_ports": [...],
"message": "Port addition completed successfully"
}
6.3 List Port Operations
Endpoint: GET /rentals/:rental_id/ports/operations
Response:
{
"operations": [
{
"operation_id": "op-uuid-1",
"operation_type": "add_ports",
"status": "completed",
"started_at": "2025-10-29T12:00:00Z",
"completed_at": "2025-10-29T12:00:08Z"
}
],
"total": 1
}
7. Implementation Phases
Phase 1: Foundation (Week 1-2, ~40 hours)
Objectives:
- Database schema migration
- Core types and data structures
- Basic validation logic
Tasks:
- Create migration 002_add_port_operations.sql (2h)
- Define new types in basilica-sdk (4h)
  - AddPortsRequest
  - PortUpdateResponse
  - PortOperation
- Implement validation logic (8h)
- Port range validation
- Conflict detection
- Protocol validation
- Add database repository methods (8h)
  - create_port_operation()
  - update_operation_state()
  - get_port_operation()
- Write unit tests for validation (8h)
- Code review and refinement (10h)
Deliverables:
- Migration script tested and applied
- Types defined and documented
- Validation tests passing
- Repository methods tested
Success Criteria:
- All tests pass
- Migration runs successfully on test database
- Code review approved
Phase 2: Container Recreation Logic (Week 3-4, ~50 hours)
Objectives:
- Implement container recreation with state preservation
- Add rollback capability
- Integrate with existing container client
Tasks:
- Extend ContainerClient with recreation method (12h)
  - recreate_with_new_ports()
  - Extract existing configuration
  - Merge port configurations
- Implement state preservation (10h)
- Volume preservation
- Environment variable retention
- Configuration backup
- Add rollback logic (10h)
- Restore original container on failure
- Cleanup partial changes
- Implement health verification (6h)
- Post-recreation health checks
- Timeout handling
- Write integration tests (12h)
- Test container recreation
- Test state preservation
- Test rollback scenarios
Deliverables:
- Container recreation working end-to-end
- Rollback tested and verified
- Integration tests passing
Success Criteria:
- Container can be recreated with new ports
- Volumes and environment preserved
- Rollback works correctly on failures
- All tests pass
Phase 3: Coordinator & Orchestration (Week 5-6, ~45 hours)
Objectives:
- Implement port update coordinator
- State machine implementation
- Idempotency handling
Tasks:
- Create PortUpdateCoordinator struct (8h)
  - State machine transitions
- Operation lifecycle management
- Implement idempotency (8h)
- Key generation
- Duplicate detection
- Operation resumption
- Add transaction management (10h)
- Database transaction wrapper
- Atomic state updates
- Consistency guarantees
- Implement coordinator methods (12h)
  - add_ports()
  - validate_port_availability()
  - execute_update()
  - handle_rollback()
- Write unit tests (7h)
Deliverables:
- Coordinator fully functional
- Idempotency working
- State machine tested
Success Criteria:
- Coordinator can orchestrate full update flow
- Idempotency prevents duplicate operations
- State machine handles all transitions
- Tests verify all paths
Phase 4: API Integration (Week 7, ~30 hours)
Objectives:
- Add API endpoints
- Integrate with auth system
- Add rate limiting
Tasks:
- Implement API handler (8h)
  - add_rental_ports()
  - Request validation
- Response formatting
- Add endpoint to router (2h)
- Integrate with ownership validation (4h)
- Add rate limiting (4h)
- Per-user limits
- Per-rental limits
- Write API tests (8h)
- Happy path
- Error cases
- Auth failures
- Update OpenAPI spec (4h)
Deliverables:
- API endpoint functional
- Auth and rate limiting working
- API tests passing
- Documentation updated
Success Criteria:
- Endpoint accessible and secured
- All validation working
- Rate limiting prevents abuse
- Tests cover all scenarios
Phase 5: Billing Integration (Week 8, ~25 hours)
Objectives:
- Emit billing events
- Ensure audit trail
- Verify idempotency
Tasks:
- Implement event emission (6h)
  - ResourceUpdate event
  - Event data structure
- Add idempotency key generation (4h)
- Test billing integration (8h)
- Event recording
- Duplicate prevention
- Audit trail verification
- Add monitoring metrics (4h)
- Operation counters
- Success/failure rates
- Duration histograms
- Documentation (3h)
Deliverables:
- Billing events emitted correctly
- Audit trail complete
- Metrics visible
Success Criteria:
- Events recorded in billing system
- Idempotency prevents duplicates
- Metrics show operation status
- Audit trail verifiable
Phase 6: CLI Support (Week 9, ~20 hours)
Objectives:
- Add CLI commands
- User-friendly interface
- Interactive mode
Tasks:
- Add port-add command (8h)
  - Argument parsing
- API client integration
- Output formatting
- Add operation status command (4h)
- Add interactive port selection (4h)
- Write CLI tests (4h)
Deliverables:
- CLI commands functional
- Help documentation complete
- Tests passing
Success Criteria:
- Commands work as expected
- Error messages clear
- Interactive mode intuitive
Phase 7: Testing & Hardening (Week 10-11, ~50 hours)
Objectives:
- Comprehensive testing
- Performance validation
- Security audit
Tasks:
- End-to-end testing (16h)
- Full flow testing
- Multi-rental scenarios
- Concurrent operations
- Failure scenario testing (12h)
- Network failures
- Node unavailability
- Database failures
- Performance testing (8h)
- Operation duration
- Resource usage
- Concurrent load
- Security audit (8h)
- Authorization checks
- Input validation
- Injection prevention
- Bug fixes and refinement (6h)
Deliverables:
- All tests passing
- Performance benchmarks met
- Security review complete
Success Criteria:
- 95%+ test coverage
- <10 second average operation time
- Zero critical security issues
- All edge cases handled
Phase 8: Documentation & Deployment (Week 12, ~20 hours)
Objectives:
- Complete documentation
- Deployment preparation
- Monitoring setup
Tasks:
- API documentation (6h)
- OpenAPI spec
- Examples
- Error codes
- User guide (6h)
- CLI examples
- API examples
- Troubleshooting
- Deployment checklist (4h)
- Migration steps
- Rollback plan
- Monitoring setup
- Training materials (4h)
Deliverables:
- Documentation complete
- Deployment ready
- Monitoring configured
Success Criteria:
- Documentation clear and complete
- Deployment checklist validated
- Team trained on new feature
8. Security Considerations
8.1 Authentication & Authorization
Requirements:
- User must be authenticated (JWT or API key)
- User must own the rental (verified via OwnedRental)
- Rate limiting applied per user and per rental
Implementation:
// Ownership validation via extractor
pub async fn add_rental_ports(
owned_rental: OwnedRental, // Validates ownership
Json(request): Json<AddPortsRequest>,
) -> Result<Json<PortUpdateResponse>>
8.2 Input Validation
Port Validation:
- Range: 1-65535
- No privileged ports (<1024) without admin flag
- No duplicate container ports
- No conflicts with existing mappings
- Protocol must be "tcp" or "udp"
Request Limits:
- Maximum 10 ports per request
- Maximum 50 total ports per rental
- Rate limit: 5 requests per hour per user
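The per-user rate limit could be enforced with a simple fixed-window counter. A standard-library-only sketch (a production deployment would likely use shared middleware or a datastore-backed limiter instead):

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Fixed-window rate limiter: at most `limit` operations per `window` per key.
struct RateLimiter {
    limit: u32,
    window: Duration,
    counters: HashMap<String, (Instant, u32)>, // key -> (window start, count)
}

impl RateLimiter {
    fn new(limit: u32, window: Duration) -> Self {
        Self { limit, window, counters: HashMap::new() }
    }

    /// Returns true if the request is allowed, false if rate-limited.
    fn check(&mut self, user_id: &str) -> bool {
        let now = Instant::now();
        let entry = self.counters.entry(user_id.to_string()).or_insert((now, 0));
        if now.duration_since(entry.0) >= self.window {
            *entry = (now, 0); // window expired: start a fresh one
        }
        if entry.1 < self.limit {
            entry.1 += 1;
            true
        } else {
            false
        }
    }
}

fn main() {
    // 5 requests/hour, matching the limit above.
    let mut rl = RateLimiter::new(5, Duration::from_secs(3600));
    for _ in 0..5 {
        assert!(rl.check("user-1"));
    }
    assert!(!rl.check("user-1")); // sixth request in the window is rejected
    assert!(rl.check("user-2"));  // other users are unaffected
}
```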
8.3 Command Injection Prevention
Docker Command Sanitization:
// Always use parameterized commands
fn build_docker_run_command(config: &ContainerConfig) -> Command {
let mut cmd = Command::new("docker");
cmd.arg("run");
// Add ports individually to prevent injection
for port in &config.ports {
cmd.arg("-p");
cmd.arg(format!("{}:{}:{}",
port.host_port,
port.container_port,
port.protocol
));
}
cmd
}
8.4 Resource Limits
Prevent Abuse:
- Maximum concurrent port operations per user: 3
- Operation timeout: 5 minutes
- Retry limit: 3 attempts
- Cooldown period: 1 minute between operations
8.5 Audit Trail
Log All Operations:
- Operation initiation (who, when, what)
- State transitions
- Failures and errors
- Rollback attempts
- Final outcome
Retention:
- Port operations: 90 days
- Audit events: 1 year
- Billing events: Permanent
9. Testing Strategy
9.1 Unit Tests
Coverage Areas:
- Validation logic (port ranges, conflicts, protocols)
- Idempotency key generation
- State machine transitions
- Port merging logic
- Error handling
Test Cases:
- Valid port addition request
- Invalid port ranges
- Duplicate ports
- Protocol validation
- Conflict detection
9.2 Integration Tests
Coverage Areas:
- Database operations
- Container recreation
- Rollback mechanisms
- Billing event emission
- API endpoint functionality
Test Cases:
- Successful port addition end-to-end
- Container recreation with state preservation
- Rollback on container failure
- Rollback on billing failure
- Idempotency (duplicate requests)
9.3 System Tests
Coverage Areas:
- Full API to container flow
- Multiple concurrent operations
- Failure scenarios
- Performance under load
Test Cases:
- Add ports to active rental
- Multiple users adding ports concurrently
- Network failure during operation
- Node unavailability
- Database connection loss
9.4 Performance Tests
Metrics:
- Average operation duration: <10 seconds
- P95 operation duration: <15 seconds
- P99 operation duration: <30 seconds
- Throughput: 100 operations/minute per validator
- Concurrent operations: 10 simultaneous
Load Testing:
- 1000 port additions in 1 hour
- 10 concurrent port additions
- Sustained load over 24 hours
9.5 Security Tests
Test Cases:
- Unauthorized access attempts
- Cross-user rental access
- Command injection attempts
- Port range boundary conditions
- Rate limit enforcement
- SQL injection in operation metadata
9.6 Failure Recovery Tests
Scenarios:
- Container creation fails
- Network disconnects during operation
- Node becomes unavailable
- Database transaction fails
- Billing service unavailable
Expected Outcomes:
- Graceful rollback to previous state
- No orphaned containers
- Database consistency maintained
- Clear error messages to user
10. Rollout Plan
10.1 Phase 1: Internal Testing (Week 13)
Scope:
- Deploy to internal testing environment
- Manual testing by engineering team
- Validation of all use cases
Success Criteria:
- All test cases pass
- No critical bugs
- Performance targets met
10.2 Phase 2: Beta Release (Week 14-15)
Scope:
- Enable for select beta users
- Monitor closely for issues
- Gather user feedback
Selection Criteria:
- Trusted users with non-critical workloads
- Users who have expressed interest
- Maximum 50 beta users
Monitoring:
- Operation success/failure rates
- Average duration metrics
- User feedback collection
- Billing accuracy verification
10.3 Phase 3: Limited Release (Week 16-17)
Scope:
- Enable for 10% of user base
- Continue monitoring and refinement
- Document common issues
Rollout Strategy:
- Gradual percentage increase
- User opt-in option
- Clear communication about feature
Rollback Criteria:
- Failure rate above 5%
- Critical security issue
- Billing inconsistencies
- Performance degradation
10.4 Phase 4: General Availability (Week 18+)
Scope:
- Enable for all users
- Full documentation published
- Support team trained
Communication:
- Blog post announcement
- Email to all users
- Documentation updates
- CLI help text updates
Ongoing Monitoring:
- Daily operation metrics review
- Weekly success rate analysis
- Monthly performance review
- Quarterly security audit
11. Appendices
Appendix A: Error Codes
| Code | Message | HTTP Status | Retry |
|---|---|---|---|
| PORT_INVALID_RANGE | Port must be between 1-65535 | 400 | No |
| PORT_ALREADY_MAPPED | Port already exists in rental | 400 | No |
| PORT_CONFLICT | Port conflicts with existing mapping | 409 | No |
| PORT_LIMIT_EXCEEDED | Maximum ports limit reached | 400 | No |
| OPERATION_IN_PROGRESS | Port operation already running | 409 | Yes |
| RENTAL_NOT_ACTIVE | Rental must be in active state | 400 | No |
| NODE_UNAVAILABLE | Cannot connect to node | 503 | Yes |
| CONTAINER_RECREATION_FAILED | Failed to recreate container | 500 | Yes |
| ROLLBACK_FAILED | Failed to restore previous state | 500 | No |
| BILLING_UNAVAILABLE | Billing service unreachable | 503 | Yes |
Appendix B: Metrics
Operation Metrics:
- port_operations_total{status, rental_id} - Counter
- port_operation_duration_seconds{status} - Histogram
- port_operation_failures_total{reason} - Counter
- active_port_operations - Gauge
Container Metrics:
- container_recreations_total{status} - Counter
- container_recreation_duration_seconds - Histogram
- container_rollbacks_total{success} - Counter
Billing Metrics:
- billing_events_emitted_total{event_type} - Counter
- billing_event_failures_total - Counter
Appendix C: Configuration
New Configuration Options:
[rental.port_operations]
enabled = true
max_ports_per_rental = 50
max_ports_per_request = 10
max_concurrent_operations_per_user = 3
operation_timeout_seconds = 300
container_stop_timeout_seconds = 30
health_check_retries = 3
health_check_interval_seconds = 2
rate_limit_per_hour = 5
cooldown_seconds = 60
Appendix D: Glossary
- Port Mapping: Association between a host port and container port
- Container Recreation: Process of stopping and creating a new container with updated configuration
- State Preservation: Maintaining application data during container recreation
- Idempotency: Property ensuring duplicate requests have same effect as single request
- Rollback: Restoring previous state after operation failure
- Port Conflict: Situation where requested port is already in use
- Operation State Machine: Defined states and transitions for port operations
Appendix E: Implementation Risks
| Risk | Probability | Impact | Mitigation |
|---|---|---|---|
| Container data loss during recreation | Medium | High | Use volumes for all persistent data |
| Billing inconsistency | Low | Critical | Implement robust idempotency |
| Extended downtime | Low | Medium | Optimize recreation process, add timeouts |
| Port conflicts | Medium | Low | Pre-validate availability |
| Node unavailability | Medium | Medium | Implement retry with backoff |
| Concurrent operation conflicts | Medium | Medium | Use database locks |
| Rollback failures | Low | High | Test extensively, add manual recovery |
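The retry-with-backoff mitigation listed for node unavailability can be sketched as follows (the attempt limit mirrors the retry limit in §8.4; the delays are illustrative):

```rust
use std::thread::sleep;
use std::time::Duration;

/// Retry a fallible operation up to `max_attempts` times, doubling the
/// delay between attempts (1x, 2x, 4x the base delay, ...).
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op() {
            Ok(v) => return Ok(v),
            Err(e) => {
                attempt += 1;
                if attempt >= max_attempts {
                    return Err(e); // budget exhausted: surface the last error
                }
                sleep(base_delay * 2u32.pow(attempt - 1));
            }
        }
    }
}

fn main() {
    // Simulate a node that answers on the third attempt.
    let mut calls = 0u32;
    let res: Result<u32, &str> = retry_with_backoff(3, Duration::from_millis(1), || {
        calls += 1;
        if calls < 3 { Err("node unavailable") } else { Ok(calls) }
    });
    assert_eq!(res, Ok(3));
}
```

In an async validator this would use a non-blocking sleep and likely add jitter, so many rentals retrying against the same node don't synchronize their attempts.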
Appendix F: Success Metrics
Operational Metrics:
- Success rate: >95%
- Average duration: <10 seconds
- P95 duration: <15 seconds
- Rollback rate: <5%
- Billing accuracy: 100%
User Satisfaction:
- Feature adoption: >30% of active users
- Support tickets: <10 per week
- User rating: >4/5
Business Metrics:
- Zero billing disputes related to port operations
- Zero miner payout errors
- Uptime maintained at >99.9%