Enable mlir attention for RDNA gfx10/11 #4772

Draft
klin2024 wants to merge 2 commits into develop from enable_RDNA_attention

Conversation

@klin2024
Contributor

Motivation

On RDNA, MLIR attention is not enabled by default, causing attention to be computed as separate unfused ops: GEMM → scale → softmax → GEMM.

Technical Details

Enabling MLIR attention fuses the above ops into a single kernel for RDNA gfx10/11, which can significantly improve performance.
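For reference, the unfused sequence described above can be sketched in NumPy as the four separate ops the PR aims to fuse. This is an illustrative sketch of the attention math only (names and shapes are hypothetical, not MIGraphX or rocMLIR internals):

```python
import numpy as np

def attention_unfused(q, k, v):
    """Attention computed as four separate ops: GEMM -> scale -> softmax -> GEMM."""
    s = q @ k.T                                   # GEMM 1: QK^T scores
    s = s / np.sqrt(q.shape[-1])                  # scale by 1/sqrt(head_dim)
    p = np.exp(s - s.max(-1, keepdims=True))      # numerically stable softmax
    p = p / p.sum(-1, keepdims=True)
    return p @ v                                  # GEMM 2: weighted values
```

With MLIR attention enabled, this whole sequence runs as one fused kernel instead of four kernel launches with intermediate tensors written to memory, which is where the performance gain comes from.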

Changelog Category

Add a CHANGELOG.md entry for any option other than Not Applicable

    • Added: New functionality.
    • Changed: Changes to existing functionality.
    • Removed: Functionality or support that has been removed. (Compared to a previous release)
    • Optimized: Component performance that has been optimized or improved.
    • Resolved Issues: Known issues from a previous version that have been resolved.
    • Not Applicable: This PR is not to be included in the changelog.

On RDNA, MLIR attention is not enabled by default, so attention runs as
separate GEMM + scale + softmax + GEMM ops without fusion. Enabling MLIR
attention fuses these ops into a single kernel, which can significantly
improve performance.
@klin2024 klin2024 changed the title enable mlir attention for RDNA gfx10/11 Enable mlir attention for RDNA gfx10/11 Apr 10, 2026

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants