fbcode/torchft/torchft#313

Open
facebook-github-bot wants to merge 1 commit into main from export-D91299742
Conversation

@facebook-github-bot
Contributor

Reviewed By: JuanBesa

Differential Revision: D91299742

@meta-cla bot added the CLA Signed label (managed by the Meta Open Source bot) on Feb 3, 2026

@JiwaniZakir left a comment


The renaming of i_b to _i_b and o_b to _o_b across the Triton kernels (_fused_kernel_quantize_into_fp8, _fused_kernel_dequantize_from_fp8, and _fused_kernel_reduce_fp8) correctly signals that the loop counter itself is never referenced inside the loop body; the actual iteration is driven by col_offsets.

However, it's worth double-checking that col_offsets actually advances each iteration. Scanning the loop bodies, col_offsets appears to be re-initialized to tl.arange(0, BLOCK_SIZE) before each loop but never incremented within it, which would mean every iteration loads and processes the same block.

If this is intentional (e.g., col_offsets advances via a side effect not visible in this diff), that should be made explicit with a comment. If it's a pre-existing bug, renaming to _i_b makes the loop counter look intentionally unused while the real issue (a missing col_offsets += BLOCK_SIZE) goes unaddressed. This warrants a closer look before merging.
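To make the concern concrete, here is a minimal pure-Python sketch of the blocked-loop pattern the comment describes. This is not the actual Triton kernel code (which is not visible in this diff); the names col_offsets, BLOCK_SIZE, and _i_b mirror the review comment, and the two functions contrast the suspected buggy pattern with the suggested fix.

```python
def process_blocks_buggy(data, BLOCK_SIZE):
    """Suspected bug: col_offsets is set once (like tl.arange(0, BLOCK_SIZE))
    and never advanced, so every iteration re-reads the first block."""
    n_cols = len(data)
    n_blocks = (n_cols + BLOCK_SIZE - 1) // BLOCK_SIZE
    col_offsets = list(range(BLOCK_SIZE))  # initialized before the loop
    out = []
    for _i_b in range(n_blocks):  # loop counter intentionally unused
        # col_offsets never changes here: same block processed each time.
        out.extend(data[i] for i in col_offsets if i < n_cols)
    return out


def process_blocks_fixed(data, BLOCK_SIZE):
    """Fix suggested in the comment: advance col_offsets by BLOCK_SIZE
    at the end of each iteration so successive blocks are visited."""
    n_cols = len(data)
    n_blocks = (n_cols + BLOCK_SIZE - 1) // BLOCK_SIZE
    col_offsets = list(range(BLOCK_SIZE))
    out = []
    for _i_b in range(n_blocks):
        # The `i < n_cols` guard plays the role of the Triton load mask.
        out.extend(data[i] for i in col_offsets if i < n_cols)
        col_offsets = [i + BLOCK_SIZE for i in col_offsets]  # col_offsets += BLOCK_SIZE
    return out
```

With data = [0, 1, 2, 3, 4, 5] and BLOCK_SIZE = 2, the buggy version produces [0, 1, 0, 1, 0, 1] while the fixed version produces [0, 1, 2, 3, 4, 5], which is exactly the "same block every iteration" symptom the comment warns about.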


Labels

CLA Signed, fb-exported, meta-exported

2 participants