Scope: This file records historical analysis sessions and design notes. It is not the canonical development guide. For setup and quality gates, use
DEVELOPMENT.md and CONTRIBUTING.md. For current ports and operator behavior, see CLAUDE.md and the source tree.
Port terminology (do not confuse):
- RustFS inside a Tenant (Services created by the operator): S3 API 9000, RustFS Console UI 9001 (see src/types/v1alpha1/tenant/services.rs).
- Operator HTTP Console (cargo run -- console, default 9090): a separate management API for the operator itself, not the same as the Tenant's {tenant}-console Service.
See CHANGELOG.md for complete list of bugs found and fixed.
Key Discovery (historical—since fixed in this repo): Through analysis of RustFS source and early operator output, several mismatches were found versus RustFS defaults, including:
- Wrong RustFS service ports in older operator revisions (e.g. IO 90 instead of 9000, console 9090 instead of 9001 for the in-cluster RustFS Console Service)
- Missing environment variables
- Non-standard volume paths
Methodology: Analyzed RustFS repository at ~/git/rustfs to verify correct implementation.
Added comprehensive Kubernetes scheduling capabilities to Pool struct.
Design Decision: Use SchedulingConfig struct with #[serde(flatten)]
- Better code organization
- Maintains flat YAML structure
- Follows industry patterns (MongoDB, PostgreSQL operators)
See architecture-decisions.md for detailed rationale.
Critical Finding: All pools form ONE unified RustFS cluster, not independent storage tiers.
From RustFS source code analysis (~/git/rustfs):
1. Unified Cluster Architecture (crates/ecstore/src/pools.rs):
- All pools combined into ONE RUSTFS_VOLUMES environment variable
- Single distributed hash ring across all volumes
- No pool independence
2. Uniform Erasure Coding (crates/ecstore/src/erasure.rs):
- Reed-Solomon erasure coding across ALL volumes
- Shards distributed uniformly (no preference for fast disks)
- Parity calculated for total drive count across all pools
3. No Storage Class Awareness (crates/ecstore/src/config/storageclass.rs):
- Storage class controls PARITY levels (EC:4, EC:2), NOT disk selection
- Does NOT control data placement or prefer certain disks
- No hot/warm/cold data awareness
4. External Tiering Only (crates/ecstore/src/tier/tier.rs):
- Tiering = transitioning to EXTERNAL cloud storage
- Types: TierType::S3, TierType::Azure, TierType::GCS
- NOT for internal disk class differentiation
Problem: Mixing NVMe/SSD/HDD in one Tenant
What Actually Happens:
- Object is erasure-coded into shards
- Shards distributed across ALL volumes (NVMe + SSD + HDD)
- Write completes when ALL shards written (limited by slowest = HDD)
- Read requires fetching shards (limited by slowest = HDD)
- Result: Entire cluster performs at HDD speed, NVMe wasted
Conclusion: Do NOT mix storage classes for "performance tiers" - it doesn't work.
✅ What Works:
- Cluster expansion (add pools for capacity)
- Geographic distribution (compliance/DR, not performance)
- Spot vs on-demand (compute cost, same storage class)
- Same class, different sizes (utilize mixed hardware)
- Resource differentiation (CPU/memory per pool)
❌ What Doesn't Work:
- NVMe for hot data, HDD for cold data
- Storage performance tiering via multi-pool
- Automatic intelligent data placement
For Real Tiering: Use RustFS lifecycle policies to external cloud storage (S3 Glacier, Azure Cool, GCS Nearline).
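For instance, assuming RustFS accepts the standard S3 lifecycle-configuration shape, a transition rule might look like the following. This is illustrative: `COLD-TIER` is a placeholder for whatever remote tier (S3/Azure/GCS) you have registered.

```json
{
  "Rules": [
    {
      "ID": "archive-old-logs",
      "Status": "Enabled",
      "Filter": { "Prefix": "logs/" },
      "Transitions": [
        { "Days": 90, "StorageClass": "COLD-TIER" }
      ]
    }
  ]
}
```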
All implementation decisions verified against official RustFS source code, not assumptions.
Sources:
- RustFS constants: crates/config/src/constants/app.rs
- RustFS config: rustfs/src/config/mod.rs
- RustFS Helm chart: helm/rustfs/
- Use recommended labels (app.kubernetes.io/name, etc.)
- Server-side apply for idempotency
- Owner references for garbage collection
- Industry-standard CRD patterns
- All new fields are Option&lt;T&gt;
- Use #[serde(flatten)] to avoid breaking YAML structure
- Maintain existing behavior by default
- Clear, accurate examples
- Prominent warnings about gotchas
- Comprehensive documentation
- Prevent costly mistakes (storage class mixing)
- Test resource structure creation
- Test field propagation (scheduling, RBAC, etc.)
- Test edge cases (None values, overrides)
- Currently: 47 library unit tests (run cargo test --all for the exact count), all passing
- Deploy actual Tenant
- Verify RustFS cluster formation
- Test multi-pool behavior
- Validate RUSTFS_VOLUMES expansion
src/
├── types/
│ └── v1alpha1/
│ ├── pool.rs (SchedulingConfig + Pool)
│ ├── persistence.rs
│ ├── tenant.rs
│ └── tenant/
│ ├── rbac.rs (RBAC factory methods)
│ ├── services.rs (Service factory methods)
│ └── workloads.rs (StatefulSet factory methods)
├── reconcile.rs (reconciliation logic)
└── context.rs (Kubernetes API wrapper)
Each resource type has a factory method on Tenant:
- RBAC: new_role(), new_service_account(), new_role_binding()
- Services: new_io_service(), new_console_service(), new_headless_service()
- Workloads: new_statefulset(pool)
This keeps logic organized and testable.
❌ Don't: Create pools with different storage classes for "performance tiering"
```yaml
pools:
  - name: fast
    storageClassName: nvme # ← Don't mix
  - name: slow
    storageClassName: hdd  # ← Performance tiers
```

✅ Do: Use same storage class, different sizes

```yaml
pools:
  - name: large
    storageClassName: ssd # ← Same class
    storage: 10Ti
  - name: small
    storageClassName: ssd # ← Same class
    storage: 2Ti
```

❌ Don't: Think pools are independent clusters
✅ Do: Understand all pools form ONE unified cluster via RUSTFS_VOLUMES
Always set in operator (users don't need to):
- RUSTFS_VOLUMES (generated)
- RUSTFS_ADDRESS (auto-set)
- RUSTFS_CONSOLE_ADDRESS (auto-set)
- RUSTFS_CONSOLE_ENABLE (auto-set)
- Status field population
- Configuration secret mounting
- Image pull policy application
- Health probes
- Per-pool status tracking
- Dynamic pool addition API
- Pool decommissioning automation
- Pool-specific service endpoints
- Advanced topology awareness
Last Updated: 2026-03-28