Skip to content

Commit 28372fd

Browse files
authored
feat: Add backoff support for retries. (#267)
1 parent a0ea80c commit 28372fd

File tree

16 files changed

+778
-58
lines changed

16 files changed

+778
-58
lines changed

.formatter.exs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@ spark_locals_without_parens = [
99
around: 2,
1010
around: 3,
1111
async?: 1,
12+
backoff: 1,
1213
batch_size: 1,
1314
before_all: 1,
1415
collect: 1,

documentation/dsls/DSL-Reactor.md

Lines changed: 3 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -2306,8 +2306,9 @@ end
23062306
|------|------|---------|------|
23072307
| [`description`](#reactor-step-description){: #reactor-step-description } | `String.t` | | An optional description for the step. |
23082308
| [`run`](#reactor-step-run){: #reactor-step-run } | `(any -> any) \| mfa \| (any, any -> any) \| mfa` | | Provide an anonymous function which implements a `run/1-2` callback. Cannot be provided at the same time as the `impl` argument. |
2309-
| [`undo`](#reactor-step-undo){: #reactor-step-undo } | `(any -> any) \| mfa \| (any, any -> any) \| mfa \| (any, any, any -> any) \| mfa` | | Provide an anonymous function which implements a `undo/1-3` callback. Cannot be provided at the same time as the `impl` argument. |
2310-
| [`compensate`](#reactor-step-compensate){: #reactor-step-compensate } | `(any -> any) \| mfa \| (any, any -> any) \| mfa \| (any, any, any -> any) \| mfa` | | Provide an anonymous function which implements a `compensate/1-3` callback. Cannot be provided at the same time as the `impl` argument. |
2309+
| [`undo`](#reactor-step-undo){: #reactor-step-undo } | `(any -> any) \| mfa \| (any, any -> any) \| mfa \| (any, any, any -> any) \| mfa` | | Provide an anonymous function which implements a `undo/1..3` callback. Cannot be provided at the same time as the `impl` argument. |
2310+
| [`compensate`](#reactor-step-compensate){: #reactor-step-compensate } | `(any -> any) \| mfa \| (any, any -> any) \| mfa \| (any, any, any -> any) \| mfa` | | Provide an anonymous function which implements a `compensate/1..3` callback. Cannot be provided at the same time as the `impl` argument. |
2311+
| [`backoff`](#reactor-step-backoff){: #reactor-step-backoff } | `(any -> any) \| mfa \| (any, any -> any) \| mfa \| (any, any, any -> any) \| mfa` | | Provide an anonymous function which implements a `backoff/1..3` callback. Cannot be provided at the same time as the `impl` argument. |
23112312
| [`max_retries`](#reactor-step-max_retries){: #reactor-step-max_retries } | `:infinity \| non_neg_integer` | `:infinity` | The maximum number of times that the step can be retried before failing. Only used when the result of the `compensate` callback is `:retry`. |
23122313
| [`async?`](#reactor-step-async?){: #reactor-step-async? } | `boolean` | `true` | When set to true the step will be executed asynchronously via Reactor's `TaskSupervisor`. |
23132314
| [`transform`](#reactor-step-transform){: #reactor-step-transform } | `(any -> any) \| module \| nil` | | An optional transformation function which can be used to modify the entire argument map before it is passed to the step. |

documentation/how-to/performance-optimization.md

Lines changed: 197 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -556,6 +556,200 @@ end
556556
```
557557

558558

559+
## Backoff Strategies for External Services
560+
561+
### 1. Choosing the Right Backoff Strategy
562+
563+
When dealing with external services, proper backoff strategies can significantly improve both reliability and performance:
564+
565+
```elixir
566+
defmodule SmartAPIStep do
567+
use Reactor.Step
568+
569+
@impl true
570+
def run(args, _context, _options) do
571+
case HTTPoison.get(args.url) do
572+
{:ok, %{status_code: 200} = response} ->
573+
{:ok, response.body}
574+
{:ok, %{status_code: 429}} ->
575+
{:error, %{type: :rate_limit, retry_after: get_retry_after(response)}}
576+
{:ok, %{status_code: 503}} ->
577+
{:error, %{type: :service_unavailable}}
578+
{:error, %{reason: :timeout}} ->
579+
{:error, %{type: :network_timeout}}
580+
other ->
581+
{:error, other}
582+
end
583+
end
584+
585+
@impl true
586+
def compensate(error, _args, _context, _options) do
587+
case error do
588+
%{type: type} when type in [:rate_limit, :service_unavailable, :network_timeout] ->
589+
:retry
590+
_other ->
591+
:ok # Don't retry permanent errors
592+
end
593+
end
594+
595+
@impl true
596+
# Respect server's Retry-After header
597+
def backoff(%{type: :rate_limit, retry_after: seconds}, _args, context, _step) when is_integer(seconds), do: (seconds + 1) * 1000
598+
599+
# No Retry-After header - use conservative fixed delay
600+
def backoff(%{type: :rate_limit}, _args, _context, _step), do: 30_000
601+
602+
# Service temporarily down - exponential backoff with jitter
603+
def backoff(%{type: :service_unavailable}, _args, context, _step) do
604+
retry_count = Map.get(context, :current_try, 0)
605+
base_delay = :math.pow(2, retry_count) * 1000 |> round()
606+
jitter = :rand.uniform(1000) # 0-1000ms jitter
607+
min(base_delay + jitter, 60_000) # Cap at 1 minute
608+
end
609+
610+
def backoff(_, _args, _context, _step), do: :now
611+
612+
defp get_retry_after(response) do
613+
case HTTPoison.Response.get_header(response, "retry-after") do
614+
[value] -> String.to_integer(value)
615+
_ -> nil
616+
end
617+
end
618+
end
619+
```
620+
621+
### 2. Backoff Impact on Throughput
622+
623+
Understanding how backoff affects system throughput is crucial for performance tuning:
624+
625+
```elixir
626+
defmodule BackoffBenchmark do
627+
def compare_backoff_strategies do
628+
# Simulate API that fails 30% of the time
629+
defmodule FlakeyAPI do
630+
use Reactor.Step
631+
632+
@impl true
633+
def run(_args, _context, _options) do
634+
if :rand.uniform() < 0.3 do
635+
{:error, %{type: :transient_failure}}
636+
else
637+
{:ok, "success"}
638+
end
639+
end
640+
641+
@impl true
642+
def compensate(_error, _args, _context, _options), do: :retry
643+
end
644+
645+
# No backoff strategy
646+
defmodule NoBackoffAPI do
647+
use Reactor.Step
648+
defdelegate run(args, context, options), to: FlakeyAPI
649+
defdelegate compensate(error, args, context, options), to: FlakeyAPI
650+
651+
@impl true
652+
def backoff(_error, _args, _context, _step), do: :now
653+
end
654+
655+
# Fixed backoff strategy
656+
defmodule FixedBackoffAPI do
657+
use Reactor.Step
658+
defdelegate run(args, context, options), to: FlakeyAPI
659+
defdelegate compensate(error, args, context, options), to: FlakeyAPI
660+
661+
@impl true
662+
def backoff(_error, _args, _context, _step), do: 100 # 100ms fixed delay
663+
end
664+
665+
data = Enum.to_list(1..100)
666+
667+
Benchee.run(
668+
%{
669+
"no_backoff" => fn ->
670+
reactor = build_reactor(NoBackoffAPI, data)
671+
Reactor.run(reactor, %{items: data}, %{}, max_concurrency: 10)
672+
end,
673+
"fixed_backoff" => fn ->
674+
reactor = build_reactor(FixedBackoffAPI, data)
675+
Reactor.run(reactor, %{items: data}, %{}, max_concurrency: 10)
676+
end
677+
},
678+
warmup: 1,
679+
time: 3
680+
)
681+
end
682+
end
683+
```
684+
685+
### 3. Adaptive Backoff for Complex Scenarios
686+
687+
For sophisticated systems, implement adaptive backoff that responds to system conditions:
688+
689+
```elixir
690+
defmodule AdaptiveBackoffStep do
691+
use Reactor.Step
692+
use GenServer
693+
694+
# Track error rates in a GenServer
695+
def start_link(opts) do
696+
GenServer.start_link(__MODULE__, %{}, name: __MODULE__)
697+
end
698+
699+
@impl true
700+
def init(state), do: {:ok, %{error_count: 0, success_count: 0, window_start: System.monotonic_time()}}
701+
702+
@impl true
703+
def run(args, _context, _options) do
704+
result = call_external_service(args)
705+
706+
case result do
707+
{:ok, data} ->
708+
GenServer.cast(__MODULE__, :record_success)
709+
{:ok, data}
710+
{:error, reason} ->
711+
GenServer.cast(__MODULE__, :record_error)
712+
{:error, reason}
713+
end
714+
end
715+
716+
@impl true
717+
def compensate(_error, _args, _context, _options), do: :retry
718+
719+
@impl true
720+
def backoff(_error, _args, _context, _step) do
721+
error_rate = GenServer.call(__MODULE__, :get_error_rate)
722+
723+
base_delay = case error_rate do
724+
rate when rate > 0.5 -> 10_000 # High error rate - long delays
725+
rate when rate > 0.2 -> 5_000 # Moderate error rate - medium delays
726+
rate when rate > 0.1 -> 2_000 # Low error rate - short delays
727+
_ -> 500 # Very low error rate - minimal delays
728+
end
729+
730+
# Add jitter to prevent thundering herd
731+
jitter = :rand.uniform(1000)
732+
base_delay + jitter
733+
end
734+
735+
# GenServer callbacks for tracking error rates
736+
@impl true
737+
def handle_cast(:record_success, state) do
738+
{:noreply, %{state | success_count: state.success_count + 1}}
739+
end
740+
741+
def handle_cast(:record_error, state) do
742+
{:noreply, %{state | error_count: state.error_count + 1}}
743+
end
744+
745+
def handle_call(:get_error_rate, _from, state) do
746+
total = state.error_count + state.success_count
747+
rate = if total > 0, do: state.error_count / total, else: 0.0
748+
{:reply, rate, state}
749+
end
750+
end
751+
```
752+
559753
## Performance Benchmarking
560754

561755
### 1. Using Benchee for Reactor Performance Testing
@@ -754,33 +948,8 @@ end
754948
- Map steps with batch sizes that are too large
755949
- Streams not being processed lazily
756950

757-
**Problem**: External service rate limiting errors
758-
**Solution**: Use compensation with exponential backoff and reduce `max_concurrency`:
759-
760-
```elixir
761-
step :api_call do
762-
run fn args, _context ->
763-
call_external_api(args)
764-
end
765-
766-
compensate fn _args, context ->
767-
# Use current_try for exponential backoff
768-
retry_count = context.current_try
769-
delay_ms = :math.pow(2, retry_count) * 1000 |> round()
770-
771-
Process.sleep(delay_ms)
772-
:retry
773-
end
774-
end
775-
776-
# Also reduce concurrency to respect service limits
777-
{:ok, result} = Reactor.run(
778-
APIReactor,
779-
inputs,
780-
%{},
781-
max_concurrency: 5 # Lower concurrency for rate-limited APIs
782-
)
783-
```
951+
**Problem**: External service rate limiting errors
952+
**Solution**: Use backoff strategies and reduce `max_concurrency`.
784953

785954
**Problem**: Inconsistent performance
786955
**Solution**: Ensure proper resource isolation and monitoring:
@@ -826,4 +995,4 @@ Optimizing Reactor performance requires understanding your workload characterist
826995
- [Building Async Workflows](../tutorials/03-async-workflows.md) - Understanding Reactor's concurrency model
827996
- [Data Processing Pipelines](data-pipelines.md) - Efficient batch processing patterns
828997
- [Testing Strategies](testing-strategies.md) - Performance testing approaches
829-
- [Debugging Workflows](debugging-workflows.md) - Performance monitoring and profiling
998+
- [Debugging Workflows](debugging-workflows.md) - Performance monitoring and profiling

documentation/reference/glossary.md

Lines changed: 11 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -94,17 +94,25 @@ This glossary defines key terms, concepts, and technical vocabulary used through
9494

9595
## Error Handling Concepts
9696

97+
**Backoff** - Delay mechanism that adds intelligent waiting periods between retry attempts to prevent overwhelming external services and improve system stability. Backoff delays are minimum delays - actual retry timing may be longer as the executor prioritises processing ready steps before checking for expired backoffs.
98+
99+
**Backoff Callback** - Optional `c:Reactor.Step.backoff/4` callback that determines retry delay based on arguments, context, step metadata, and error reason.
100+
97101
**Compensation Logic** - Step-level error handling that decides whether to retry, continue with alternative values, or fail.
98102

99103
**Error Classification** - Categorisation of errors using splode library into Invalid, Internal, Unknown, and Validation types.
100104

101-
**Exponential Backoff** - Retry strategy where delay increases exponentially with each retry attempt.
105+
**Exponential Backoff** - Retry strategy where delay increases exponentially with each retry attempt (1s, 2s, 4s, 8s...), commonly used for network issues and service overload.
106+
107+
**Fixed Backoff** - Retry strategy using consistent delays between attempts, often used for rate limiting when reset intervals are known.
108+
109+
**Jittered Backoff** - Backoff strategy that adds randomness to delays to prevent thundering herd problems when multiple clients retry simultaneously.
102110

103111
**Max Retries** - Configuration limiting how many times a step can be retried through compensation.
104112

105-
**Retry Logic** - Automatic re-execution of failed steps when compensation returns `:retry`.
113+
**Retry Logic** - Automatic re-execution of failed steps when compensation returns `:retry`, now enhanced with optional backoff delays.
106114

107-
**Three-Tier Error Handling** - Reactor's approach using compensation (retry), undo (rollback), and global rollback levels.
115+
**Three-Tier Error Handling** - Reactor's approach using compensation (retry), backoff (delay), undo (rollback), and global rollback levels.
108116

109117
**Undo Stack** - Last-in-first-out collection of successfully completed undoable steps for rollback purposes.
110118

0 commit comments

Comments
 (0)