[libcudacxx] Add support for wider types in fill_bytes #8333

Open
pciolkosz wants to merge 2 commits into NVIDIA:main from pciolkosz:wider_types_in_fill_bytes

@pciolkosz
Contributor

Summary

  • Generalize fill_bytes to accept uint16_t and uint32_t fill values in addition to uint8_t, matching the CUDA driver's cuMemsetD8Async/cuMemsetD16Async/cuMemsetD32Async capabilities.
  • Template __fill_bytes_impl on the fill value type, with a static_assert restricting to 1, 2, or 4 byte values.
  • For wider fill values, validate that the destination size in bytes is a multiple of the fill value size.
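The three points above can be sketched on the host as follows. This is a minimal illustration, not the code from this PR: `fill_bytes_sketch` is a hypothetical name, it writes to ordinary host memory with `std::memcpy`, and it reports a bad size by throwing `std::invalid_argument`, which is an assumption about how validation might surface. In a device implementation the loop would presumably be replaced by a call to `cuMemsetD8Async`, `cuMemsetD16Async`, or `cuMemsetD32Async`, which take an element count rather than a byte count.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical host-side stand-in for a templated fill; not the PR's implementation.
template <class ValueTy>
void fill_bytes_sketch(void* dst, std::size_t size_bytes, ValueTy value)
{
  // Compile-time restriction to the widths the driver memset family supports.
  static_assert(sizeof(ValueTy) == 1 || sizeof(ValueTy) == 2 || sizeof(ValueTy) == 4,
                "Fill value must be 1, 2, or 4 bytes");

  // Run-time check: a wider pattern must tile the destination exactly.
  if (size_bytes % sizeof(ValueTy) != 0)
  {
    throw std::invalid_argument("destination size must be a multiple of the fill value size");
  }

  // Copy the pattern once per element; memcpy avoids alignment assumptions.
  auto* out = static_cast<unsigned char*>(dst);
  for (std::size_t i = 0; i < size_bytes; i += sizeof(ValueTy))
  {
    std::memcpy(out + i, &value, sizeof(ValueTy));
  }
}
```

With this sketch, filling a 16-byte buffer with a `std::uint32_t` pattern succeeds, while asking for a 3-byte fill with a `std::uint16_t` pattern is rejected before any byte is written.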

Test plan

  • Existing fill_bytes tests continue to pass (uint8_t path unchanged)
  • Header tests build across all compiler configurations
  • Add tests for uint16_t and uint32_t fill patterns

@pciolkosz pciolkosz requested a review from a team as a code owner April 8, 2026 22:36
@pciolkosz pciolkosz requested a review from wmaxey April 8, 2026 22:36
@github-project-automation github-project-automation bot moved this to Todo in CCCL Apr 8, 2026
@cccl-authenticator-app cccl-authenticator-app bot moved this from Todo to In Review in CCCL Apr 8, 2026
Test fill_bytes with uint16_t and uint32_t patterns on device and
pinned memory, plus an mdspan test with uint32_t.
@github-actions
Contributor

github-actions bot commented Apr 9, 2026

🥳 CI Workflow Results

🟩 Finished in 2h 25m: Pass: 100%/108 | Total: 1d 23h | Max: 2h 15m | Hits: 99%/281468

See results here.

Comment on lines +52 to +53
static_assert(sizeof(_ValueTy) == 1 || sizeof(_ValueTy) == 2 || sizeof(_ValueTy) == 4,
"Fill value must be 1, 2, or 4 bytes (matching CUDA driver memset support)");
Contributor

This check is already done in __memsetAsync() (though the message there could be improved).
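To illustrate the point about duplication, here is a small sketch in which the size check lives only in the lower-level helper and the outer wrapper simply forwards to it. Both `memset_width_sketch` and `fill_width_sketch` are hypothetical names; the real `__memsetAsync()` is not shown in this thread, so its signature and assertion message are not reproduced here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical low-level helper: the only place the width restriction is stated.
template <class ValueTy>
constexpr int memset_width_sketch()
{
  static_assert(sizeof(ValueTy) == 1 || sizeof(ValueTy) == 2 || sizeof(ValueTy) == 4,
                "fill value must be 1, 2, or 4 bytes wide");
  // Width in bits, i.e. which of the D8/D16/D32 variants would be selected.
  return static_cast<int>(sizeof(ValueTy)) * 8;
}

// Hypothetical outer wrapper: no second static_assert is needed, because
// instantiating the helper with an unsupported type already fails to compile.
template <class ValueTy>
constexpr int fill_width_sketch(ValueTy)
{
  return memset_width_sketch<ValueTy>();
}
```

The trade-off is in diagnostics: an assertion in the outer function fires closer to the user's call site, while relying on the inner one keeps a single source of truth for the supported widths.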

Projects

Status: In Review

2 participants