Fix SSH tunnel not being closed after remote server reboot (#406) #407
Open
debba wants to merge 2 commits into
…oot (#406)
SSH tunnels were cached in the global TUNNELS map but never stopped or removed, so a tunnel outlived the connection that owned it. When the remote host rebooted, the tunnel died while still holding its local forward port; the next reconnect reused the stale map entry and failed with "port already in use". Only restarting the app cleared the map.
- ssh_tunnel: reap the killed system-ssh child in stop(); add is_alive() and remove_tunnel() helpers.
- commands: skip and discard dead tunnels on reuse; tear down the tunnel in disconnect_connection.
- health_check: tear down the tunnel when a connection exceeds the failure threshold (the path triggered by a server reboot).
Closes #406
What was happening
When a MySQL connection went through an SSH tunnel and the remote server rebooted, Tabularis correctly detected the drop and cleared the connection indicator. But trying to reconnect failed with a "port already in use" error, and the only way out was to restart the app.
The reason is that SSH tunnels are cached in a global map (keyed by user@host:port:remote->port) and reused across connects, but nothing ever tore them down. A tunnel could outlive the connection it belonged to. Once the remote host rebooted, the tunnel died while its local forward port was still held by the (now defunct) ssh process, so the next reconnect happily picked up the stale map entry, pointed at that dead port, and blew up. Restarting the app "fixed" it only because that wiped the map.
What I changed
ssh_tunnel.rs
- stop() now also reaps the killed system-ssh child (wait()), so it doesn't linger as a zombie still holding the forwarded port.
- Added is_alive() to tell whether a cached tunnel is still usable (detects a system-ssh child that has already exited).
- Added remove_tunnel(key) to stop a tunnel and drop it from the map (no-op if it isn't there).
commands.rs
- Cached tunnels are checked with is_alive() before reuse. A dead one is removed and a fresh tunnel is created instead of reusing the broken port.
- disconnect_connection tears the tunnel down so it no longer survives the connection.
health_check.rs
- Tears down the tunnel when a connection exceeds the failure threshold (the path triggered by a server reboot).
Tests
Added unit tests in ssh_tunnel.rs for is_alive() on both backends (russh flag + system-ssh exited child), for stop() reaping the child, and for remove_tunnel (both the missing-key no-op and real removal). Full ssh_tunnel suite passes and the backend builds clean.
Notes
Tunnels are shared by key, so two connections to the same DB over the same SSH host share one tunnel. Tearing it down on disconnect affects both, which is the intended behaviour here since a reboot takes them all down anyway.
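For readers who want the shape of the fix without opening the diff, here is a minimal sketch of the liveness check, the reaping stop(), and the remove-then-recreate path. It is not the code in this PR. The names is_alive, stop, and remove_tunnel come from the description above, but their signatures here are assumptions. The Tunnel struct, the Tunnels alias, from_child, and get_or_create are hypothetical, and the map is passed in explicitly rather than being a global. Only the system-ssh backend is modelled; the russh flag is left out.

```rust
use std::collections::HashMap;
use std::io;
use std::process::Child;

/// Hypothetical stand-in for a cached tunnel backed by a system `ssh` child.
pub struct Tunnel {
    pub local_port: u16,
    child: Child,
}

impl Tunnel {
    /// Hypothetical constructor; the real code would spawn `ssh` itself.
    pub fn from_child(local_port: u16, child: Child) -> Self {
        Tunnel { local_port, child }
    }

    /// A tunnel is usable only while its child process is still running.
    pub fn is_alive(&mut self) -> bool {
        // try_wait() yields Ok(None) while the child has not exited yet.
        matches!(self.child.try_wait(), Ok(None))
    }

    /// Kill the child, then reap it so it does not linger as a zombie.
    pub fn stop(&mut self) {
        // kill() may fail if the child already exited; reap it either way.
        let _ = self.child.kill();
        let _ = self.child.wait();
    }
}

/// Assumed map shape: tunnel key -> tunnel.
pub type Tunnels = HashMap<String, Tunnel>;

/// Stop the tunnel stored under `key` and drop it from the map.
/// Returns false (and does nothing) when the key is absent.
pub fn remove_tunnel(tunnels: &mut Tunnels, key: &str) -> bool {
    match tunnels.remove(key) {
        Some(mut tunnel) => {
            tunnel.stop();
            true
        }
        None => false,
    }
}

/// Reuse a cached tunnel only if it is alive; otherwise discard the stale
/// entry and create a fresh one with the caller-supplied `spawn` closure.
pub fn get_or_create<F>(tunnels: &mut Tunnels, key: &str, spawn: F) -> io::Result<u16>
where
    F: FnOnce() -> io::Result<Tunnel>,
{
    if let Some(tunnel) = tunnels.get_mut(key) {
        if tunnel.is_alive() {
            return Ok(tunnel.local_port);
        }
    }
    // Either there was no entry or it was dead: clear it before re-creating.
    remove_tunnel(tunnels, key);
    let tunnel = spawn()?;
    let port = tunnel.local_port;
    tunnels.insert(key.to_string(), tunnel);
    Ok(port)
}
```

In this sketch a disconnect or a health-check failure would call remove_tunnel with the connection's key. Because the key is shared, that stops the tunnel for every connection using it, which matches the note above.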