Circuit breakers are entirely in-memory (`internal/resilience/circuitbreaker.go`). When the server restarts, every breaker resets to closed. If a domain's MX was broken and its breaker was open before the restart, the server immediately resumes delivery attempts against it until the failure threshold (5 failures) is reached again and the breaker re-opens.
This creates a burst of unnecessary connections to broken servers on every restart, and delays delivery to other domains while workers are tied up on known-bad destinations.
Current behavior
```
restart → all breakers closed → 5 failed attempts per broken domain → breaker opens again
```
With N broken domains queued, that's 5×N wasted delivery attempts before things stabilize.
Suggestion
Persist breaker state (domain, state, failure count, last failure time) to Redis alongside the queue data. On startup, restore breaker state for domains that were recently open. A simple hash per domain with a TTL matching the breaker timeout would work; there is no need for a full state machine in Redis.
Alternatively, a lighter approach: on startup, check `delivery_log` for domains with recent consecutive failures and pre-open their breakers.