[plugin][OOT Qwen3.5][GDN] add GDN packed decode fast path#508

Draft
zejunchen-zejun wants to merge 2 commits into main from zejun/vLLM_19_qwen3.5_fast_path_0407

@zejunchen-zejun commented Apr 7, 2026

Use a packed recurrent attention Triton kernel for the fast path.

Enable the pure non-spec decode GDN fast path in the OOT plugin backend for Qwen3.5/Qwen3Next.

Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>
Copilot AI review requested due to automatic review settings April 7, 2026 14:07
@zejunchen-zejun marked this pull request as draft April 7, 2026 14:07

Copilot AI left a comment

Pull request overview

This PR adds a packed recurrent decode “fast path” for the GDN attention backend in the vLLM OOT plugin, and enables it for the pure non-speculative decode case (targeting Qwen3.5/Qwen3Next on vLLM 0.19.0).

Changes:

  • Introduces a non-spec decode helper (_forward_core_decode_non_spec) that uses fused_recurrent_gated_delta_rule_packed_decode.
  • Gates the packed-decode path behind vllm_envs.VLLM_ENABLE_FLA_PACKED_RECURRENT_DECODE and runtime metadata conditions (no spec masks, decode-only).
  • Updates KV cache access to use compilation_config[layer_name].kv_cache directly (removing virtual_engine indexing).
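The gating described in the second bullet can be sketched roughly as follows. This is an illustration only, not the PR's code: `DecodeMetadata`, its fields (`num_prefills`, `num_decodes`, `spec_sequence_masks`), and `use_packed_decode` are hypothetical names, and the flag is passed in as a plain boolean rather than read from `vllm_envs`.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DecodeMetadata:
    """Hypothetical stand-in for the backend's attention metadata."""
    num_prefills: int
    num_decodes: int
    # None means no speculative-decoding masks are present for this batch.
    spec_sequence_masks: Optional[Sequence[bool]] = None


def use_packed_decode(flag_enabled: bool, meta: DecodeMetadata) -> bool:
    """Return True only for a pure, non-speculative decode batch."""
    if not flag_enabled:
        # Stand-in for VLLM_ENABLE_FLA_PACKED_RECURRENT_DECODE being off.
        return False
    if meta.spec_sequence_masks is not None:
        # Speculative decoding is active: stay on the existing path.
        return False
    # Decode-only: no prefill requests and at least one decode request.
    return meta.num_prefills == 0 and meta.num_decodes > 0
```

Under these assumptions, any batch that mixes prefill and decode requests, or carries spec masks, falls through to the existing path, so the fast path never has to handle those cases.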

@zejunchen-zejun changed the title from "[plugin][OOT Qwen3.5][GDN][vLLM 0.19.0] add GDN packed decode fast path, enable the pure non-spec decode GDN fast path into the OOT plugin backend for Qwen3.5/Qwen3Next" to "[plugin][OOT Qwen3.5][GDN] add GDN packed decode fast path" Apr 7, 2026
Signed-off-by: zejunchen-zejun <zejun.chen@amd.com>