Skip to content

use reserved index in debug messenger and report create rollback#1938

Merged
charles-lunarg merged 1 commit into
KhronosGroup:mainfrom
aizu-m:debug-create-rollback-index
Jun 19, 2026
Merged

use reserved index in debug messenger and report create rollback#1938
charles-lunarg merged 1 commit into
KhronosGroup:mainfrom
aizu-m:debug-create-rollback-index

Conversation

@aizu-m

@aizu-m aizu-m commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

terminator_CreateDebugUtilsMessengerEXT, the rollback at out::

179  uint32_t *pNextIndex = loader_instance_heap_alloc(...);   // malloc, uninitialised
232  *pNextIndex = next_index;                                 // only on success
250  ...messengers_list.list[*pNextIndex].status = VK_FALSE;   // read on the error path

*pNextIndex is set once, at the tail of the success path, but the error
rollback reads it. A failure before that point (a driver returning an
error from CreateDebugUtilsMessengerEXT, or the new_dbg_func_node
allocation failing) leaves it holding whatever malloc returned. If the
pNextIndex allocation itself failed while the instance already owns a
messenger, pNextIndex is NULL and the short-circuit dereferences it. On
32-bit the index * sizeof in the capacity check can also wrap the write
out of the list.

The destroy loop just above already uses the reserved local next_index.
Release the slot the same way, and only when one was reserved.
terminator_CreateDebugReportCallbackEXT had the same rollback.

*pNextIndex is only assigned on the success path, so the error rollback
read an indeterminate value (and dereferenced NULL when the pNextIndex
allocation itself failed). Release the reserved slot with next_index, the
same local the ICD destroy loop above already uses, and only when a slot
was actually reserved.
@ci-tester-lunarg

Copy link
Copy Markdown

Author aizu-m not on autobuild list. Waiting for curator authorization before starting CI build.

1 similar comment
@ci-tester-lunarg

Copy link
Copy Markdown

Author aizu-m not on autobuild list. Waiting for curator authorization before starting CI build.

@ci-tester-lunarg

Copy link
Copy Markdown

CI Vulkan-Loader build queued with queue ID 10146.

@ci-tester-lunarg

Copy link
Copy Markdown

CI Vulkan-Loader build # 3569 running.

@ci-tester-lunarg

Copy link
Copy Markdown

CI Vulkan-Loader build # 3569 passed.

@charles-lunarg charles-lunarg left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another good catch.

I have to say, after a week of small errors that were found by inspection rather than crash, it is quite apparent how easy it is for invalid code to continue. OSS-Fuzz found quite a few issues when it was added, but it mainly focused on invalid JSON input since that is the only thing that was fuzzed.

@charles-lunarg charles-lunarg merged commit b14a119 into KhronosGroup:main Jun 19, 2026
51 checks passed
@aizu-m

aizu-m commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Yeah, these all turned up reading the error and rollback paths rather than from a crash. The success path gets exercised constantly so it stays solid. It's the cleanup-on-failure branches that quietly drift, since nothing routinely runs them. A fuzzer that could force allocation failures would probably surface a batch of these in one go.

@charles-lunarg

Copy link
Copy Markdown
Collaborator

So there are tests which exercise the OOM paths. https://github.com/KhronosGroup/Vulkan-Loader/blob/main/tests/loader_alloc_callback_tests.cpp

The problem is that they aren't exhaustive, and the failures purely derive from heap allocation failures which is malloc and realloc. Alloca isn't tested nor is OOM error originating in drivers (because the test drivers don't use the allocation callbacks, they are mocks.) So some work improving alloc_callback_tests would be welcome.

@aizu-m

aizu-m commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

Makes sense. The fail-after-N count already reaches the pNextIndex alloc, so that trigger's within range of the current harness; the one it can't hit is a driver erroring out of its own create, since the mock drivers never go through the callbacks. I'll have a look at making the test drivers honour the allocation callbacks and adding cases that walk the count across the messenger and report create paths, so the whole family gets covered rather than the single instance.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants