fix(logical-backup): wait for PG connectivity before running backup#3069
Open
aslafy-z wants to merge 2 commits into zalando:master from
Conversation
The backup script connects to the target PostgreSQL pod immediately after resolving its IP via the Kubernetes API. When NetworkPolicy is enforced via iptables, a newly-created pod's IP may not yet be present in the destination node's ingress allow lists, causing cross-node connections to be rejected until the next policy sync. This adds a pg_isready retry loop before the dump starts, with configurable retries and delay via LOGICAL_BACKUP_CONNECT_RETRIES (default: 10) and LOGICAL_BACKUP_CONNECT_RETRY_DELAY (default: 2s). Signed-off-by: Zadkiel AHARONIAN <zaharonian@ccl-consulting.fr>
Document the new environment variables that control the pg_isready retry loop added in the previous commit. These are passed via the existing logical_backup_cronjob_environment_secret mechanism. Signed-off-by: Zadkiel AHARONIAN <zaharonian@ccl-consulting.fr>
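To illustrate the mechanism described above, here is a hedged sketch of a Kubernetes Secret that could supply these variables; the Secret name is illustrative, but the key/value shape follows the standard Secret-as-environment pattern used by `logical_backup_cronjob_environment_secret`:

```yaml
apiVersion: v1
kind: Secret
metadata:
  # Illustrative name; this is what logical_backup_cronjob_environment_secret
  # would be set to in the operator configuration.
  name: logical-backup-env
type: Opaque
stringData:
  LOGICAL_BACKUP_CONNECT_RETRIES: "10"    # number of pg_isready attempts
  LOGICAL_BACKUP_CONNECT_RETRY_DELAY: "2" # seconds between attempts
```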
Problem description
The logical-backup script (`dump.sh`) connects to the target PostgreSQL pod immediately after resolving its IP via the Kubernetes API. When NetworkPolicy is enforced via iptables, a newly-created pod's IP may not yet be present in the destination node's ingress allow lists, causing cross-node connections to be rejected until the next policy sync.

This manifests as intermittent `Connection refused` errors even though the target PostgreSQL pod is healthy and listening. The rejection happens at the network layer on the destination node, not at PostgreSQL. The race window is typically under 1 second, but is enough to cause consistent failures because `dump.sh` connects with zero delay after pod startup.

This PR adds a `pg_isready` retry loop before the dump starts. It returns as soon as the connection succeeds, adding near-zero overhead when connectivity is immediate. Retry count and delay are configurable via `LOGICAL_BACKUP_CONNECT_RETRIES` (default: 10) and `LOGICAL_BACKUP_CONNECT_RETRY_DELAY` (default: 2s), compatible with `logical_backup_cronjob_environment_secret`.

Linked issues
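The retry loop described above can be sketched roughly as follows. The environment variable names and defaults mirror the PR description; the `wait_for_pg` helper and its probe-as-argument shape are illustrative, not the exact code in `dump.sh`:

```shell
#!/usr/bin/env bash
# Sketch of a pg_isready retry loop with configurable retries and delay.
# LOGICAL_BACKUP_CONNECT_RETRIES / LOGICAL_BACKUP_CONNECT_RETRY_DELAY come
# from the PR; the helper function itself is a hypothetical illustration.
set -euo pipefail

RETRIES="${LOGICAL_BACKUP_CONNECT_RETRIES:-10}"
DELAY="${LOGICAL_BACKUP_CONNECT_RETRY_DELAY:-2}"

# Run the given probe command up to $RETRIES times, sleeping $DELAY seconds
# between attempts. Returns 0 on the first success (near-zero overhead when
# connectivity is immediate), 1 if every attempt fails.
wait_for_pg() {
    local attempt
    for ((attempt = 1; attempt <= RETRIES; attempt++)); do
        if "$@"; then
            return 0
        fi
        echo "probe attempt ${attempt}/${RETRIES} failed; retrying in ${DELAY}s" >&2
        sleep "$DELAY"
    done
    return 1
}

# In dump.sh the probe would be pg_isready against the resolved pod, e.g.:
#   wait_for_pg pg_isready -h "$PGHOST" -p "$PGPORT" -U "$PGUSER"
```

Because the loop exits on the first successful probe, the only cost in the healthy case is a single `pg_isready` invocation before the dump begins.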
`iptables-restore` without `--noflush` causes full chain rebuilds on every pod event, creating a race window where new pod IPs are not yet in the destination node's ingress ipsets. Milestoned for kube-router v2.2.0, but any iptables-based CNI that does full restores on pod events can trigger the same issue.

Checklist

- No changes to the `acid.zalan.do` api package.
- Uses the existing `logical_backup_cronjob_environment_secret`