[Bug]: UI WebSocket disconnects/reconnects continuously over mildly lossy or packet-reordering links (e.g. WireGuard), despite a healthy server #3054

### Before submitting

- [x] I searched existing issues and did not find a duplicate.
- [x] I included enough detail to reproduce or investigate the problem.

### Area

apps/web

### Steps to reproduce

## Summary
The browser↔server UI WebSocket (`/ws`) drops and reconnects every few seconds when the client is on a link with mild packet **reordering and/or loss** — in our case a WireGuard "road-warrior" tunnel (OPNsense `if_wg`). The underlying TCP connection stays **alive** the whole time, and other long-lived apps over the same link (IMAP, plain HTTPS) are unaffected. The WS layer declares "disconnected" on brief stalls that TCP recovers from on its own.

This also intermittently leaves a thread stuck on "awaiting input" with no rendered content — the orphaned-thread behavior in #313 — which appears to be a *downstream symptom* of this same reconnect churn.

## Environment
- t3code **v0.0.27** (server), behind Caddy (reverse proxy, HTTP/2; HTTP/3 also tested).
- Clients: **Android Chrome and desktop Chrome**, both reaching the server over a WireGuard tunnel terminating on an OPNsense firewall.

## Evidence it is *not* the server, proxy, or network config
1. **Server/proxy are stable.** An authenticated WS client run *from the server host* — both directly to the t3 process (loopback) and through Caddy — held **70 s with zero drops**, while the user's browser was dropping during that same window.
2. **It's specific to the lossy path, measured live.** `ss -ti` on the host, comparing all established connections at one instant:

   | Path | `reord_seen` | `retrans` (current/total) |
   |---|---|---|
   | **WireGuard client (the browser)** | **25** (peaked at 206) | **0/336** |
   | non-WG connections (incl. public internet) | **0** | 0/4 – 0/15 |
   | loopback | 0 | minimal |

   The WG connection also showed `cwnd` collapsed to ~7 segments and ~27 ms of jitter, but the TCP socket stayed **established** and transferred 60+ MB. Jittery, not dead.
3. **Not MTU / offload.** MSS is correctly clamped (`mss:1360`, `pmtu:1500`); NIC hardware offload (TSO/LRO/CRC) is disabled. Oversized-packet black-holing is ruled out.

**Conclusion:** TCP survives the reordering/loss, so the teardown most likely originates in the **WebSocket keepalive/heartbeat** layer rather than in the transport.
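The per-connection figures above come from the info line that `ss -ti` prints for each socket. For anyone who wants to collect the same comparison on their own host, here is a minimal sketch of pulling those fields out of one such line. The field names (`mss`, `cwnd`, `retrans`, `reord_seen`) are the ones `ss` prints; the `TcpInfo` type and the `parseSsInfo` function are hypothetical names invented for this example, and treating an absent `reord_seen` as zero is an assumption.

```typescript
// Hypothetical helper, not part of t3code or iproute2.
interface TcpInfo {
  mss: number | undefined;
  cwnd: number | undefined;
  retransCurrent: number | undefined; // left side of retrans:X/Y
  retransTotal: number | undefined;   // right side of retrans:X/Y
  reordSeen: number;                  // assumed 0 when the field is absent
}

function parseSsInfo(line: string): TcpInfo {
  // Return the first capture group of `re` as a number, if it matches.
  const num = (re: RegExp): number | undefined => {
    const m = re.exec(line);
    return m ? Number(m[1]) : undefined;
  };
  // \b keeps "mss" from matching inside "rcvmss"/"advmss" and
  // "retrans" from matching inside "bytes_retrans".
  const retrans = /\bretrans:(\d+)\/(\d+)/.exec(line);
  return {
    mss: num(/\bmss:(\d+)/),
    cwnd: num(/\bcwnd:(\d+)/),
    retransCurrent: retrans ? Number(retrans[1]) : undefined,
    retransTotal: retrans ? Number(retrans[2]) : undefined,
    reordSeen: num(/\breord_seen:(\d+)/) ?? 0,
  };
}

// Sample line assembled from the values quoted in this issue,
// not a verbatim capture of `ss -ti` output.
const wgSample = parseSsInfo("mss:1360 pmtu:1500 cwnd:7 retrans:0/336 reord_seen:25");
```

Running this over every established connection and sorting by `reordSeen` should reproduce the kind of table shown above.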

## Suspected root cause
The WS keepalive is too aggressive for imperfect links — a single brief stall (reordering or a retransmit) trips a disconnect instead of being ridden out. Apps without an aggressive heartbeat over the identical tunnel are fine.
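To make the difference concrete, here is a minimal, timer-free sketch of a heartbeat that tears down on the first silent interval versus one that rides out a short stall. I have not read t3code's keepalive code, so every name here (`HeartbeatOptions`, `HeartbeatMonitor`, `survives`) and the 5000 ms interval are assumptions for illustration only.

```typescript
// Hypothetical sketch; not t3code's actual keepalive implementation.
interface HeartbeatOptions {
  intervalMs: number; // ping period; would drive a timer in real code, unused here
  maxMissed: number;  // consecutive silent intervals that trigger teardown
}

class HeartbeatMonitor {
  private readonly opts: HeartbeatOptions;
  private missed = 0;
  private sawActivity = true; // a freshly opened socket counts as alive

  constructor(opts: HeartbeatOptions) {
    this.opts = opts;
  }

  // Call when a pong (or any inbound frame) arrives.
  onActivity(): void {
    this.sawActivity = true;
  }

  // Call once per ping interval. Resets the miss counter if anything
  // arrived since the previous tick, otherwise counts one more miss.
  onTick(): "alive" | "dead" {
    this.missed = this.sawActivity ? 0 : this.missed + 1;
    this.sawActivity = false;
    return this.missed >= this.opts.maxMissed ? "dead" : "alive";
  }
}

// Replay a sequence of intervals (true = a pong arrived in that interval)
// and report whether the connection would have been kept.
function survives(pongPerInterval: boolean[], opts: HeartbeatOptions): boolean {
  const monitor = new HeartbeatMonitor(opts);
  for (const gotPong of pongPerInterval) {
    if (gotPong) monitor.onActivity();
    if (monitor.onTick() === "dead") return false;
  }
  return true;
}

// A two-interval stall that TCP recovers from on its own.
const briefStall = [true, false, false, true];
const keptWhenStrict = survives(briefStall, { intervalMs: 5000, maxMissed: 1 });   // false
const keptWhenTolerant = survives(briefStall, { intervalMs: 5000, maxMissed: 3 }); // true
```

With `maxMissed: 1` the first silent interval ends the connection, which matches the behavior described above. With `maxMissed: 3` the same stall is absorbed and the counter resets once traffic resumes.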

## Requested change
1. Make the WS keepalive **tolerant of brief stalls/reordering** before tearing down: a longer ping timeout, several missed beats before declaring the connection dead, and ideally a **configurable** timeout for users on high-latency/VPN/mobile links.
2. Ensure a reconnect **re-attaches cleanly without orphaning a pending thread**. The reconnect churn described here appears to be what triggers #313.
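For item 2, one possible approach (an assumption on my part, not a description of how t3code works today) is to number the events sent for a thread and let a reconnecting client ask for everything after the last sequence number it saw. The sketch below shows only the bookkeeping; `ThreadEvent`, `ReplayBuffer`, and the string payload are hypothetical.

```typescript
// Hypothetical sketch of resume-after-reconnect bookkeeping.
interface ThreadEvent {
  seq: number;
  payload: string;
}

class ReplayBuffer {
  private readonly capacity: number;
  private events: ThreadEvent[] = [];
  private nextSeq = 1;

  constructor(capacity: number) {
    this.capacity = capacity;
  }

  // Record an outgoing event, evicting the oldest once over capacity.
  append(payload: string): ThreadEvent {
    const event: ThreadEvent = { seq: this.nextSeq++, payload };
    this.events.push(event);
    if (this.events.length > this.capacity) this.events.shift();
    return event;
  }

  // Events the client missed since `lastSeenSeq`. Returns null when the
  // gap reaches past the oldest buffered event, meaning the client needs
  // a full resync instead of a replay.
  since(lastSeenSeq: number): ThreadEvent[] | null {
    const first = this.events[0];
    const oldest = first ? first.seq : this.nextSeq;
    if (lastSeenSeq + 1 < oldest) return null;
    return this.events.filter((e) => e.seq > lastSeenSeq);
  }
}

// A client that saw event 1, dropped, and reconnected after events 2 and 3.
const threadBuffer = new ReplayBuffer(100);
threadBuffer.append("assistant started");
threadBuffer.append("assistant chunk");
threadBuffer.append("awaiting input");
const missedEvents = threadBuffer.since(1); // events with seq 2 and 3
```

The `null` case matters as much as the replay: a client that has been away too long should get an explicit "resync" signal rather than silently sitting on "awaiting input".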

## Related
- **#313** — reconnect orphaning a pending thread; the visible symptom of this churn.
- **#2579** — frequent disconnects on the **OpenCode provider** SSE connection. Different layer (provider, not the UI WebSocket), but the same theme: connection handling that doesn't tolerate imperfect links, no heartbeat, opaque "disconnected" messaging. A shared resilience philosophy would help both.
- Provider-side WS issues #765 / #2924 appear separate.

## Repro
Use t3code over any link with mild loss + packet reordering. On a Linux box you can emulate it on the client (or a gateway):
```shell
# Emulate a mildly lossy link: 30 ms delay with 20 ms jitter, 5% reordering, 0.2% loss
sudo tc qdisc add dev <iface> root netem delay 30ms 20ms reorder 5% loss 0.2%
```
Open a thread and watch the UI cycle "disconnected → reconnected" every few seconds while the page is otherwise reachable. Remove with `sudo tc qdisc del dev <iface> root`.


### Expected behavior

The UI WebSocket should ride out brief stalls and reordering that TCP itself recovers from, and a reconnect, when one does happen, should re-attach without orphaning a pending thread. The concrete proposal is under "Requested change" in "Steps to reproduce".

### Actual behavior

The UI WebSocket (`/ws`) drops and reconnects every few seconds over the WireGuard link, even though the underlying TCP connection stays established and the server and proxy are healthy. Intermittently a thread is left stuck on "awaiting input" with no rendered content (see #313). Full details are in the Summary, Environment, and Evidence sections under "Steps to reproduce".

### Impact

Minor bug or occasional failure

### Version or commit

v0.0.27 (server)

### Environment

_No response_

### Logs or stack traces

```shell

```

### Screenshots, recordings, or supporting files

_No response_

### Workaround

_No response_
