Retries

A retry strategy defines the "cooling-off" period for a circuit breaker: it controls when the breaker attempts to recover by transitioning from the OPEN state to the HALF_OPEN state. Choosing the right strategy means balancing fast recovery against giving a struggling service enough time to heal.

Retry Type	Transition Timing	Best For...
Always	Immediately	Non-critical services where immediate retries are acceptable.
Never	Manual only	When recovery requires manual intervention by an operator.
Cooldown	After a fixed delay	A simple, predictable wait time before attempting recovery.
Backoff	After an exponentially increasing delay	An adaptive approach that waits longer after repeated failures.

Always

This strategy moves the circuit to HALF_OPEN immediately after it opens, so the next call after a trip is already treated as a recovery attempt.

Use with Caution

Always can be dangerous because it invites a "thundering herd" problem: many clients retry simultaneously and overwhelm a service that is trying to recover. It is best reserved for non-critical services where failures are known to be very brief.

from fluxgate import CircuitBreaker
from fluxgate.retries import Always

# Not generally recommended.
cb = CircuitBreaker(name="api", retry=Always(), ...)

Never

This strategy keeps the circuit in the OPEN state indefinitely until it is manually reset. This is useful when a service requires human intervention to be fixed.

from fluxgate import CircuitBreaker
from fluxgate.retries import Never

cb = CircuitBreaker(name="api", retry=Never(), ...)

# An operator must manually reset the breaker after fixing the service.
cb.reset()

Cooldown

This is a simple and common strategy that waits for a fixed duration (in seconds) before moving to HALF_OPEN.

It's a good default choice for services that tend to recover in a predictable amount of time.

from fluxgate import CircuitBreaker
from fluxgate.retries import Cooldown

# Wait for 60 seconds after the circuit opens before each recovery attempt.
cb = CircuitBreaker(
    name="api",
    retry=Cooldown(duration=60.0),
    ...
)
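
The fixed-delay rule can be pictured with a few lines of plain Python. This is only an illustrative sketch, not the library's implementation: the helper name `cooldown_elapsed` and its parameters are hypothetical, and it assumes the breaker records a monotonic timestamp when it opens.

```python
import time
from typing import Optional


def cooldown_elapsed(opened_at: float, duration: float, now: Optional[float] = None) -> bool:
    """Return True once `duration` seconds have passed since the breaker opened.

    Hypothetical helper for illustration; `opened_at` and `now` are
    timestamps on the same monotonic clock.
    """
    if now is None:
        now = time.monotonic()
    return now - opened_at >= duration


# The breaker opened at t=100s with a 60s cooldown.
print(cooldown_elapsed(100.0, 60.0, now=130.0))  # False: still cooling off
print(cooldown_elapsed(100.0, 60.0, now=160.0))  # True: eligible for HALF_OPEN
```

A monotonic clock is used here rather than wall-clock time so that system clock adjustments cannot shorten or lengthen the wait.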

Backoff

This is the most robust and recommended strategy. It increases the wait time exponentially after each consecutive failure, giving a struggling service more and more time to recover.

The wait time is calculated as initial * (multiplier ** consecutive_failures), capped at max_duration. The exponent is 0 for the first wait, so the first wait equals initial.

from fluxgate import CircuitBreaker
from fluxgate.retries import Backoff

# The wait time starts at 10s, doubles after each failed recovery attempt,
# and is capped at a maximum of 300s.
cb = CircuitBreaker(
    name="api",
    retry=Backoff(
        initial=10.0,
        multiplier=2.0,
        max_duration=300.0
    ),
    ...
)
# Sequence of wait times:
# 1st attempt -> 10s
# 2nd attempt -> 20s
# 3rd attempt -> 40s
# 4th attempt -> 80s
# 5th attempt -> 160s
# 6th+ attempt -> 300s (capped by max_duration)
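
The sequence above can be reproduced by hand. The following is a minimal sketch of the formula as stated in this section, with the cap applied via `min`; the function name `backoff_wait` is hypothetical and the exponent is assumed to start at 0 for the first wait.

```python
def backoff_wait(
    consecutive_failures: int,
    initial: float = 10.0,
    multiplier: float = 2.0,
    max_duration: float = 300.0,
) -> float:
    """Wait time in seconds before the next recovery attempt (illustrative only)."""
    return min(initial * multiplier ** consecutive_failures, max_duration)


# The exponent starts at 0, so the first wait equals `initial`.
waits = [backoff_wait(n) for n in range(7)]
print(waits)  # [10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

Without the cap the sixth wait would be 320s, which is why every wait from the sixth onward is held at 300s.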

Choosing the Right Retry Strategy

Comparison

Feature	Always	Never	Cooldown	Backoff
Recovery	Immediate	Manual	Fixed Delay	Exponential Delay
Service Load	High	None	Medium	Low
Handles Repeated Failures?	No	N/A	No	Yes
Complexity	Very Simple	Very Simple	Simple	Medium
Recommended?	No	For special cases	Good default	Recommended
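
To make the "Service Load" row concrete, the sketch below counts how many recovery attempts each strategy would make during a 10-minute outage. It is a back-of-the-envelope model, not library code: `probes_during_outage` is a hypothetical helper, the settings are the ones used in the examples on this page (a 60s cooldown and a 10s/2x/300s backoff), and the time taken by each probe itself is ignored.

```python
from itertools import accumulate


def probes_during_outage(waits, outage_seconds):
    """Count recovery attempts that start before the outage ends.

    `waits` lists the wait before each successive attempt, so the
    running sum gives the time at which each attempt starts.
    """
    return sum(1 for start in accumulate(waits) if start <= outage_seconds)


cooldown_waits = [60.0] * 20
backoff_waits = [min(10.0 * 2.0 ** n, 300.0) for n in range(20)]

print(probes_during_outage(cooldown_waits, 600.0))  # 10
print(probes_during_outage(backoff_waits, 600.0))   # 5
```

With these settings the backoff probes at 10s, 30s, 70s, 150s, and 310s, while the cooldown probes once a minute. For a very short outage the same backoff would probe sooner than the cooldown, so the lower load only shows up when failures persist.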

When should I use `Always`?

Only for non-critical services where failures are known to be extremely brief and the service can handle a high volume of retries.

When should I use `Never`?

When a service requires manual intervention to fix. The circuit breaker will not attempt to recover on its own.

Use case: During a planned deployment or when a service is taken down for maintenance.

When should I use `Cooldown`?

This is a great, simple default. It's best when you have a general idea of how long the service takes to recover.

Use case: Protecting a service that has a predictable recovery time, like an external API with a fixed rate-limiting window.

When should I use `Backoff`?

This is the recommended strategy for most use cases. It gracefully backs off from a struggling service, giving it more time to recover after repeated failures.

Use case: Protecting a critical downstream service that may be slow to restart or recover from an outage.

A Note on Jitter

Always Add Jitter

For both Cooldown and Backoff, it is highly recommended to add jitter. Jitter adds a small amount of randomness to the wait time, which helps prevent a "thundering herd" scenario where multiple instances of your service all try to recover at the exact same time.

from fluxgate.retries import Cooldown, Backoff

# With jitter_ratio=0.1, a 60s cooldown varies randomly by up to +/- 6s (54s to 66s).
retry_cooldown = Cooldown(duration=60.0, jitter_ratio=0.1)

# The same applies to each step of the backoff.
retry_backoff = Backoff(initial=10.0, jitter_ratio=0.1)
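
The effect of a jitter ratio can be sketched in plain Python. This assumes the jitter is drawn uniformly from the range duration * (1 +/- jitter_ratio), which matches the 54s to 66s range quoted above; the actual distribution used by the library is not stated here, and `apply_jitter` is a hypothetical name.

```python
import random


def apply_jitter(duration: float, jitter_ratio: float, rng: random.Random) -> float:
    """Return `duration` shifted by a uniform random offset of up to +/- ratio * duration."""
    offset = duration * jitter_ratio
    return duration + rng.uniform(-offset, offset)


rng = random.Random(42)  # seeded only to make the sketch repeatable
samples = [apply_jitter(60.0, 0.1, rng) for _ in range(1000)]

# Every sample stays inside the 54s to 66s window, but instances no longer
# share one exact retry time.
print(min(samples) >= 54.0, max(samples) <= 66.0)  # True True
```

Because each instance draws its own offset, a fleet of breakers that opened at the same moment spreads its recovery attempts across a 12-second window instead of firing together.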

Next Steps

Permits: Configure how many "probe" calls are allowed during the HALF_OPEN recovery state.
Trippers: Define the conditions that cause the circuit to trip in the first place.