Fix #189: process worker fork-safety on macOS Tahoe#207
Open
brentrager wants to merge 3 commits into
Open
Conversation
Owner
|
Hey! Thanks so much for sharing the fix. It seems like a lot of different thing in here are bundled together - of completely different nature. As you suggested yourself, I think it would be a good idea to split this into two PRs. |
65b1833 to
89b3469
Compare
Author
|
Done — split into two PRs as you suggested:
I force-pushed |
…r stderr Two changes for the macOS Tahoe segfault reported in issue Maxteabag#189: 1. process_worker_client.py: drop the spawn -> fork fallback when running on darwin. macOS Tahoe (26+) widened the set of system libraries that abort if initialized after fork() — CoreFoundation, Security.framework, libsystem_trace, libsystem_info — so any library the parent has loaded (psycopg, getaddrinfo, setproctitle, NSCharacterSet) can crash the forked child. Letting spawn fail and surfacing it to the caller lets the in-process fallback in query_execution take over instead. Cost: query cancellation isn't available when spawn fails. Gain: no hard crashes ("Worker connection closed.") on macOS. 2. process_worker.py: redirect the worker's stdout/stderr to a log file and enable faulthandler at entry. The worker can SIGPIPE on libpq notice writes when its inherited FDs point to a Textual-managed pipe, and any future C-level fault on the worker side is otherwise invisible to the parent (the pipe just dies). The log path honors $SQLIT_WORKER_LOG and defaults to <tmpdir>/sqlit-worker.log. Refs the celery (Tahoe + setproctitle, celery#9894) and gunicorn (Tahoe + NSCharacterSet, gunicorn#3526) reports for the broader fork- safety regression on Tahoe.
The first two commits made the macOS Tahoe segfault non-fatal by disabling the fork fallback, but every spawn after Textual booted still failed with `bad value(s) in fds_to_keep` — leaving the user with a "Process worker unavailable; falling back" popup and the slower in-process executor on every query. Root cause: `multiprocessing.spawn` snapshots the parent's open file descriptors at spawn time. Once Textual's `App.__init__` has registered its signal-handler pipes and event-loop FDs, several of those are not inheritable, and spawn aborts before the child even runs. The previous lazy creation in the idle scheduler always hit this race. Fix: spawn the worker in `cli.py` *before* `SSMSTUI(...)` is constructed. That's the latest moment at which the parent's FD set is still clean. The new `_prewarm_process_worker(runtime)` helper returns a live `ProcessWorkerClient`, or `None` if the worker is disabled / mocked / spawn raised — in which case we fall through to the lazy path inside the UI (with the in-process fallback on darwin). The pre-spawned client is passed into `SSMSTUI` via a new `process_worker_client` kwarg and stashed as `self._process_worker_client`. `ProcessWorkerLifecycleMixin` already returns the cached client first, so no other changes are needed. Verified on macOS Tahoe 26.4.1 / Python 3.14 / PostgreSQL over TLS: the popup no longer appears and queries run in the worker process.
Covers the load-bearing behaviors introduced by the prior commits: - _maybe_fallback_start re-raises on darwin (commit 1) - _maybe_fallback_start still re-raises non-fds_to_keep errors verbatim - _open_worker_log honors $SQLIT_WORKER_LOG and survives an unwritable path - _prewarm_process_worker respects runtime.process_worker, runtime.mock, and swallows spawn exceptions so startup never crashes (commit 3) - SSMSTUI(process_worker_client=client) stashes the client on self._process_worker_client, which is what ProcessWorkerLifecycleMixin consults first
89b3469 to
6016067
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Thanks again for sqlit — it's the cleanest of the SQL TUIs I've tried.
Fixes #189. macOS Tahoe (26.4+) widened the set of system libraries that abort when initialized after
fork(), which has been quietly breaking quite a few Python projects (celery, gunicorn, psycopg2 all have open issues from the last few months — links below). For sqlit, the symptom is that the process worker either hangs withWorker connection closed.or segfaults the child when expanding a folder or running a query.This PR has been split per your request (#207 review) — the psycopg v3 driver swap and its supporting adapter/test changes now live in #222.
Commits
Drop the spawn → fork fallback on darwin; redirect worker stderr. When
multiprocessing.spawnfails withbad value(s) in fds_to_keep, the current code falls back tofork(). On Tahoe that's no longer safe — CoreFoundation, Security.framework, libsystem_trace'sos_log, and libsystem_info'sgetaddrinfoall now abort if they were loaded by the parent before fork. Letting spawn fail and surfacing it to the caller lets the in-process fallback take over instead. The worker also now redirects its stdout/stderr to$SQLIT_WORKER_LOG(defaults to<tmpdir>/sqlit-worker.log) and enablesfaulthandler, so future C-level faults aren't invisible behind a closed pipe — that's what let me find the segfault site in the first place.Pre-spawn the worker in
cli.pybeforeApp.__init__. This is the long-term fix from the issue addendum.multiprocessing.spawnsnapshots the parent's open FDs at spawn time, and Textual'sApp.__init__registers signal-handler pipes that aren't inheritable — that's thebad value(s) in fds_to_keepValueError. The previous lazy creation inidle_scheduler.py:_warmwas racing this. Spawning beforeSSMSTUI(...)is constructed is the latest moment at which the FD set is still clean. The pre-spawned client is plumbed in via a newprocess_worker_clientkwarg onSSMSTUI;ProcessWorkerLifecycleMixinalready returns the cached client first, so no other changes are needed. If pre-spawn fails for any reason, the lazy path still runs (and on darwin, that path now correctly falls back to in-process execution per commit 1).Unit tests for the new code paths. New
tests/unit/test_process_worker_fallback.pyandtests/unit/test_cli_prewarm.pycover the darwin re-raise, the worker-log env override, the prewarm gating onruntime.process_worker/runtime.mock, prewarm exception-swallowing, and the SSMSTUI kwarg plumbing.Result on macOS Tahoe 26.4.1 / Python 3.14
Worker connection closed.Tests
uv run pytest tests/unit— 837 passed, 1 skipped, including the 9 new tests in this PR.Related upstream regressions on Tahoe (for context — this isn't sqlit-specific)
'fork'is broken: change to `'forkserver' || 'spawn'python/cpython#84559 — CPython 3.14 makesforkserverdefault on POSIX (partial mitigation)Tradeoffs to flag
process_worker=True(the default) — there's no longer an idle-only path on the first launch. Cost: one extra Python subprocess at TUI startup. Benefit: no race, no popup, worker is hot.Happy to iterate on either commit if you'd prefer a different approach.