Summary
Add support for distributed stream processing across multiple nodes/processes, enabling horizontal scaling, parallel processing, and fault tolerance through partitioned streams.
Problem Statement
Currently, Cortex.Streams runs entirely within a single process:
- No horizontal scaling: Cannot distribute load across multiple machines
- Single point of failure: If the process crashes, all processing stops
- Limited throughput: Bounded by single-machine resources (CPU, memory, network)
- No partition-based parallelism: Cannot process different keys in parallel across nodes
Current Behavior
// Everything runs in a single process
var stream = StreamBuilder<Order>.CreateNewStream("OrderProcessor")
.Stream(kafkaSource) // Single consumer
.Aggregate(...) // Single aggregator
.Sink(...) // Single sink
.Build();
// If this process crashes, all processing stops
// Cannot scale beyond one machine's capacity
Impact
Without distributed mode:
- Cannot handle high-throughput production workloads (millions of events/sec)
- No horizontal scaling for growing data volumes
- Limited fault tolerance (checkpointing helps, but recovery takes time)
- Cannot leverage cloud elasticity
Technical Considerations
-
Exactly-Once with Distribution: Combine with checkpointing for exactly-once across workers.
-
State Migration: When partitions move between workers, state must be migrated.
-
Ordering Guarantees: Events for the same key should be processed in order (within a partition).
-
Backpressure Across Workers: Need flow control for shuffle operations.
-
Network Partitions: Handle split-brain scenarios gracefully.
-
Skewed Partitions: Monitor and alert on hot partitions.
Non-Goals (v1)
- SQL query layer across distributed state
- Automatic partition splitting for hot keys
- Cross-datacenter replication
- Custom partition assignment strategies (beyond key-based)
References
Summary
Add support for distributed stream processing across multiple nodes/processes, enabling horizontal scaling, parallel processing, and fault tolerance through partitioned streams.
Problem Statement
Currently, Cortex.Streams runs entirely within a single process:
Current Behavior
Impact
Without distributed mode:
Technical Considerations
Exactly-Once with Distribution: Combine with checkpointing for exactly-once across workers.
State Migration: When partitions move between workers, state must be migrated.
Ordering Guarantees: Events for the same key should be processed in order (within a partition).
Backpressure Across Workers: Need flow control for shuffle operations.
Network Partitions: Handle split-brain scenarios gracefully.
Skewed Partitions: Monitor and alert on hot partitions.
Non-Goals (v1)
References