Skip to content

Fix Io\Poll memory-safety issues#22316

Open
iliaal wants to merge 1 commit into
php:masterfrom
iliaal:fix/io-poll-memory-safety
Open

Fix Io\Poll memory-safety issues#22316
iliaal wants to merge 1 commit into
php:masterfrom
iliaal:fix/io-poll-memory-safety

Conversation

@iliaal

@iliaal iliaal commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Fixes memory-safety bugs in the new Io\Poll API, found by review and confirmed under valgrind.

  • Use-after-free: a Watcher kept a raw pointer to its Context's poll context with no reference, so dropping the Context while still holding a Watcher made remove()/modify() touch freed memory. The Context now clears its watchers (active=false, poll_ctx=NULL) before destruction, so those calls throw InactiveWatcherException.
  • Descriptor leak: StreamPollHandle referenced the stream resource in its constructor but never released it. Released in the handle cleanup.
  • Missing get_gc on Watcher and Context, so cycles through Watcher::$data leaked. Added for both.
  • clone of a Context/Watcher/StreamPollHandle went through the default handler, which copied the backing poll context and watcher map by pointer and double-freed them. All three are now uncloneable.
  • Calling __construct() twice on a Context or StreamPollHandle replaced the backing state without releasing the first, leaking it. Now throws if already constructed.
  • add()/modify()/remove()/wait() accepted a NULL ctx and forwarded it to php_poll_set_error(), which dereferenced it. The userland layer already gates on an active context before reaching the C API, so these now assert a non-NULL ctx.

Tests cover the clone guard, the double-construct guard, the watcher-outlives-context UAF, the fd release, and the get_gc cycle.

The wait()-error recording and the kqueue buffer cap from the original PR were split into #22326 and #22327.

Comment thread main/poll/poll_backend_kqueue.c Outdated
/* New FD, create new event */
ZEND_ASSERT(unique_events < max_events);
if (unique_events >= max_events) {
continue;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems continue also skips the oneshot bookkeeping below it, so any oneshot fd dropped past the cap leaves the backend's tracking desynced. Gate only the buffer write on unique_events < max_events and still fall through to the tracking.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to dig into it

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the continue also skipped the oneshot tracking. Gated only the buffer write now, so the bookkeeping runs regardless of the cap.

Comment thread main/poll/poll_backend_poll.c Outdated
php_poll_set_current_errno_error(ctx);
return -1;
}
if (nfds == 0) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I guess you can just merge as if (...) else if (...) here wdyt?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@iliaal iliaal force-pushed the fix/io-poll-memory-safety branch from ef34a6e to 71ddf7d Compare June 15, 2026 12:41
@iliaal

iliaal commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@ndossche fixes pushed: clone now throws (Context/Watcher/StreamPollHandle marked uncloneable), double __construct throws instead of leaking, and php_poll_set_error() is NULL-safe so the add/modify/remove/wait guards no longer deref a NULL ctx. get_gc and the watcher-after-context-destroy UAF were already in the PR with tests (poll_watcher_gc_cycle, poll_watcher_outlives_context).

Couldn't reproduce the getAvailableBackends() foreach UAF under valgrind on Linux (poll and epoll), even mutating the array mid-iteration. Which platform/backend and build (ASAN?) did you hit it on?

@ndossche

Copy link
Copy Markdown
Member

Couldn't reproduce the getAvailableBackends() foreach UAF under valgrind on Linux (poll and epoll), even mutating the array mid-iteration. Which platform/backend and build (ASAN?) did you hit it on?

See Slack, I already fixed that yesterday ;)

I'll have a review look at this PR this evening.

@ndossche ndossche left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most of this seems right and straight-forward

Comment thread main/poll/poll_core.c Outdated
/* Create new poll context */
PHPAPI php_poll_ctx *php_poll_create_by_name(const char *preferred_backend, uint32_t flags)
{
if (!preferred_backend) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this defensive NULL check necessary? We could just make it part of the API contract that preferred_backend must not be NULL, which seems reasonable to me.

@iliaal iliaal Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dropped.

Comment thread main/poll/php_poll_internal.h Outdated
static inline void php_poll_set_error(php_poll_ctx *ctx, php_poll_error error)
{
ctx->last_error = error;
if (ctx) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was added for the inverted NULL on the {add,modify,remove,...} APIs. But I wonder whether it should check for a NULL ctx in the first place. We could make it part of the API contract that they should not be called on a NULL ctx.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, it's all new code so no reason to keep the NULL tolerance. Made non-NULL ctx the contract: ZEND_ASSERT(ctx) in set_max_events_hint, add, modify, remove, wait, and dropped the set_error guard.

Comment thread ext/standard/io_poll.c Outdated
}

zend_get_gc_buffer_use(gc_buffer, table, n);
return zend_std_get_properties(obj);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just be return NULL due to the class being final and with no properties.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, returns NULL now (final class, no declared properties).

Comment thread ext/standard/io_poll.c Outdated
}

zend_get_gc_buffer_use(gc_buffer, table, n);
return zend_std_get_properties(obj);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could just be return NULL due to the class being final and with no properties.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment thread main/poll/poll_backend_kqueue.c Outdated
events[unique_events].revents = revents;
events[unique_events].data = data;
unique_events++;
if (unique_events < max_events) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this check risk silently ignoring some new events?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes for oneshot, no for level-triggered. Dropped level-triggered fds stay ready and re-fire on the next wait(). Oneshot is the gap: kevent() has already disarmed them, so one past the cap is consumed but never delivered.

It happens because grouped mode dequeues up to max_events * 2 kevents (headroom to coalesce a read+write pair) while the caller buffer only holds max_events. Capping the kevent() count to max_events makes unique_events unable to exceed the buffer, so nothing is consumed-and-dropped; the tradeoff is a read+write fd may split across two wait() calls instead of coalescing. Want that? I can't valgrind kqueue here (no macOS), so it'd ride on macOS CI.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to think about this one - not sure if this is the right solution.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you separate to its own PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, pulled the buffer cap into #22327 with the oneshot-bookkeeping change that goes with it. Out of scope for this one, and we can settle the right approach there.

@iliaal iliaal force-pushed the fix/io-poll-memory-safety branch from 71ddf7d to dc8d0ce Compare June 15, 2026 20:59
Comment thread main/poll/poll_backend_epoll.c Outdated
events[i].revents = epoll_events_from_native(backend_data->events[i].events);
events[i].data = backend_data->events[i].data.ptr;
}
} else if (nfds < 0) {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I remember that I was setting it in php_poll_wait but then dropped. I think it should be done as it makes sense. That said, this should apply to all backends I think.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed it should be uniform. Pulled the wait()-error recording into its own PR, #22326, covering epoll, poll and kqueue (eventport and wsapoll already record). Kept it per-backend rather than central in php_poll_wait because wsapoll sets its error from WSAGetLastError(), not errno, so a central php_poll_set_current_errno_error() would clobber the Windows error.

@bukka

bukka commented Jun 15, 2026

Copy link
Copy Markdown
Member

Can you please sepearate the kqueue thing to itw own PR and the nfds checks as well as the first one needs more checking and the second one needs some other work and would like to have it done in a different commit. The rest looks fine.

Several memory-safety issues in the new Io\Poll API, found by review and
confirmed under valgrind:

- Watcher kept a raw pointer to its Context's php_poll_ctx with no
  reference, so dropping the Context while holding a Watcher left
  remove()/modify() dereferencing freed memory (use-after-free). The
  Context now neutralizes its watchers (active=false, poll_ctx=NULL)
  before it is destroyed, so those calls throw InactiveWatcherException.
- StreamPollHandle took a reference on the stream resource in the
  constructor but never released it, leaking the descriptor for the
  rest of the request. Store the zend_resource and release it in the
  handle cleanup; the php_stream may already be freed by then (e.g.
  the user closed it), so the cleanup must not dereference it.
- Watcher and Context had no get_gc handler, so reference cycles through
  Watcher::$data were uncollectable. Add get_gc for both.
- Context, Watcher and StreamPollHandle were cloneable through the
  default handler, which shallow-copied the backing php_poll_ctx and the
  watcher map by pointer and double-freed them on destruction. Mark all
  three uncloneable.
- Calling __construct() a second time on a Context or StreamPollHandle
  replaced the backing context or handle data without releasing the
  first, leaking it. Throw if the object is already constructed.
- The add(), modify(), remove() and wait() entry points accepted a NULL
  ctx and forwarded it to php_poll_set_error(), which dereferenced it.
  The userland layer already gates on an active context before reaching
  the C API, so assert a non-NULL ctx in those entry points instead.

Closes phpGH-22316
@iliaal iliaal force-pushed the fix/io-poll-memory-safety branch from dc8d0ce to 64c39ca Compare June 15, 2026 22:03
@iliaal iliaal changed the title Fix Io\Poll memory-safety and error-handling issues Fix Io\Poll memory-safety issues Jun 15, 2026
@iliaal iliaal requested review from bukka and ndossche June 16, 2026 00:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants