
test_kqueue_pipe_peer_close_uaf hangs at 120s watchdog on Linux backend (master) #170

Description

@arr2036

Repro

test_kqueue_pipe_peer_close_uaf hangs at the 120s watchdog on every Linux CI variant (release-gcc, release-clang, release-musl-gcc, debug-asan, debug-tsan). Reproduces on master (cfe5dd0) without any of the Windows branch changes.

The test (test/kqueue.c:635) is straightforward:

4 worker threads, each looping for ~500ms:

  int p[2], kq;
  struct kevent kev;
  pipe(p);
  kq = kqueue();
  EV_SET(&kev, p[1], EVFILT_WRITE, EV_ADD, 0, 0, NULL);
  kevent(kq, &kev, 1, NULL, 0, NULL);
  close(p[0]); close(p[1]); close(kq);
  /* repeat */

After joining the workers it calls libkqueue_drain_pending_close() to wait for the monitoring thread to retire every closed kq before the test returns.

CI sample run: https://github.com/arr2036/libkqueue/actions/runs/25345514171 (release-gcc job; same shape on the asan / tsan jobs).

What I think is happening

Two things stack here. The first is now fixed in arr2036/libkqueue@windows-iter-debug and explains roughly half the strands; the second is still open.

(a) Monitoring-thread tid race — fixed in the branch. A previous refactor (b388b1c, "linux/platform: cleanup acquires kq_mtx, drop CANCEL_LOCKED machinery") symmetrised the cleanup handler's lock/unlock for tsan-lockset cleanliness, which moved the kq_mtx unlock ahead of monitoring_thread_cleanup clearing monitoring_tid. That opened a window where a racing kqueue() could read a still-set monitoring_tid and F_SETOWN_EX its close-detect signal to a TID about to die. The kernel discards thread-directed RT signals when the target thread exits, so the close-detect signal for that kq was lost and the kq was stranded in kq_list; libkqueue_drain_pending_close then spun until it hit its 1M-iteration cap.

A follow-up (67aea58, "linux/platform: drop pthread_detach") papered over a related symptom but didn't address the underlying unlock-too-early move. V1 (pre-b388b1c) held kq_mtx continuously through pthread_detach, making linux_libkqueue_free's tid-read atomic with respect to the detach. The branch restores V1 and re-introduces thread_exit_state so the cleanup handler still knows whether to acquire kq_mtx itself or inherit it from a cancelled context. The tools/tsan.supp suppression (race:monitoring_thread_cleanup, added in 6465a87 + reaffirmed in 3f831b3) covers the lockset asymmetry that motivated the original refactor.

(b) Residual hang — still open. Even with V1 restored, every Linux job in the CI run above still hits the 120s watchdog at this same test. The watchdog's stack-dump tool can't attach under Ubuntu CI's default kernel.yama.ptrace_scope = 1 (eu-stack: dwfl_thread_getframes tid N: Operation not permitted), so I don't have a backtrace yet. It's unclear whether the hang is in libkqueue_drain_pending_close (so still some kq stranded, different cause from (a)) or somewhere else under the worker join.

Asks

  • Reproduce locally with kernel.yama.ptrace_scope = 0 so gdb can attach and see which threads are blocked where. The hang is reliable on every Ubuntu 24.04 runner, so it should reproduce on a developer workstation under the same kernel/glibc.
  • Once the blocked thread is known: is the residual strand a different code path the V1 restoration didn't catch, or is it user-side (e.g. the close(kq) somehow racing the monitoring thread's EVFILT_PROC handling for a child process the test doesn't actually spawn)?

Branch with the V1 restoration

arr2036/libkqueue@windows-iter-debug, commit arr2036@4bad3d6

(Lives on a Windows-port branch but the linux/platform.c change is self-contained.)
