[plugin][OOT Qwen3.5][GDN] add GDN packed decode fast path by zejunchen-zejun · Pull Request #508 · ROCm/ATOM

zejunchen-zejun · 2026-04-07T14:07:27Z

Adds a packed recurrent attention Triton kernel as a decode fast path, and enables the pure non-spec decode GDN fast path in the OOT plugin backend for Qwen3.5/Qwen3Next.

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

Copilot

Pull request overview

This PR adds a packed recurrent decode “fast path” for the GDN attention backend in the vLLM OOT plugin, and enables it for the pure non-speculative decode case (targeting Qwen3.5/Qwen3Next on vLLM 0.19.0).

Changes:

Introduces a non-spec decode helper (_forward_core_decode_non_spec) that uses fused_recurrent_gated_delta_rule_packed_decode.
Gates the packed-decode path behind vllm_envs.VLLM_ENABLE_FLA_PACKED_RECURRENT_DECODE and runtime metadata conditions (no spec masks, decode-only).
Updates KV cache access to use compilation_config[layer_name].kv_cache directly (removing virtual_engine indexing).
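
The gating described in the second change can be sketched as below. This is a minimal illustration, not the code from the PR: `DecodeMetadata`, its field names, and `use_packed_decode` are hypothetical stand-ins, and the real flag is read from `vllm_envs.VLLM_ENABLE_FLA_PACKED_RECURRENT_DECODE` rather than passed as an argument.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DecodeMetadata:
    """Hypothetical stand-in for the backend's attention metadata."""

    num_prefills: int
    num_decodes: int
    # None means no speculative-decoding masks are attached to this batch.
    spec_sequence_masks: Optional[Sequence[bool]] = None


def use_packed_decode(flag_enabled: bool, meta: DecodeMetadata) -> bool:
    """Return True only for a pure, non-speculative decode batch."""
    if not flag_enabled:
        # The environment flag is off, so always take the regular path.
        return False
    if meta.spec_sequence_masks is not None:
        # Speculative decoding is active, so the fast path does not apply.
        return False
    # Decode-only: no prefill tokens and at least one decode request.
    return meta.num_prefills == 0 and meta.num_decodes > 0
```

Under these assumptions, any batch that mixes prefill with decode, or that carries spec masks, falls back to the existing path, so the fast path stays opt-in and narrowly scoped.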

atom/plugin/vllm/attention_backend/attention_gdn.py

[plugin][OOT Qwen3.5][GDN][vLLM 0.19.0] add GDN packed decode fast path, enable the pure non-spec decode GDN fast path into the OOT plugin backend for Qwen3.5/Qwen3Next

8072363

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

Copilot AI review requested due to automatic review settings April 7, 2026 14:07

zejunchen-zejun marked this pull request as draft April 7, 2026 14:07

Copilot started reviewing on behalf of zejunchen-zejun April 7, 2026 14:09

Copilot AI reviewed Apr 7, 2026

atom/plugin/vllm/attention_backend/attention_gdn.py

zejunchen-zejun changed the title ~~[plugin][OOT Qwen3.5][GDN][vLLM 0.19.0] add GDN packed decode fast path, enable the pure non-spec decode GDN fast path into the OOT plugin backend for Qwen3.5/Qwen3Next~~ [plugin][OOT Qwen3.5][GDN] add GDN packed decode fast path Apr 7, 2026

remove cat op and use contiguous buffer for mixed_qkv

799dc44

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>

zejunchen-zejun wants to merge 2 commits into main from zejun/vLLM_19_qwen3.5_fast_path_0407

2 participants
