[Feat][Plugin] Enable DeepSeek-V3.2 for vLLM OOT Plugin by kliuae-amd · Pull Request #494 · ROCm/ATOM

kliuae-amd · 2026-04-06T19:36:15Z

Motivation

This PR follows up on #399 and continues the work of adding DeepSeek-V3.2 support to ATOM's vLLM plugin mode.
It currently includes the changes from #399, so the diff should become much smaller once #399 is merged upstream.

Technical Details

Test Plan

Accuracy test with lm_eval

Model: deepseek-ai/DeepSeek-V3.2

Server command:

ATOM_DISABLE_VLLM_PLUGIN=0 \
ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=0 \
vllm serve deepseek-ai/DeepSeek-V3.2 \
  -tp 8 \
  --gpu-memory-utilization 0.8 \
  --no-enable-prefix-caching \
  --disable-uvicorn-access-log \
  --trust-remote-code \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
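  --kv-cache-dtype auto \
  --block-size 1

Run the command once with --kv-cache-dtype auto (the bf16 KV cache results below) and once with --kv-cache-dtype fp8 (the fp8 KV cache results).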
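Because the accuracy and performance runs sweep the KV-cache dtype, the two server invocations can be scripted. The sketch below is a hypothetical Python helper (build_serve_cmd is not part of ATOM or vLLM) that rebuilds the server command above for a given --kv-cache-dtype value and prints a copy-pasteable shell line. It only assembles the command; it does not launch the server.

```python
import shlex

# Hypothetical helper: rebuilds the `vllm serve` command from this PR's test
# plan for one KV-cache dtype. Flags and env vars are copied from the command
# above; the function itself is illustrative, not part of ATOM or vLLM.
MODEL = "deepseek-ai/DeepSeek-V3.2"


def build_serve_cmd(kv_cache_dtype: str) -> tuple[dict[str, str], list[str]]:
    """Return (extra env vars, argv) for `vllm serve` with the given KV-cache dtype."""
    if kv_cache_dtype not in ("auto", "fp8"):
        raise ValueError(f"unsupported kv_cache_dtype: {kv_cache_dtype!r}")
    env = {
        "ATOM_DISABLE_VLLM_PLUGIN": "0",
        "ATOM_DISABLE_VLLM_PLUGIN_ATTENTION": "0",
    }
    argv = [
        "vllm", "serve", MODEL,
        "-tp", "8",
        "--gpu-memory-utilization", "0.8",
        "--no-enable-prefix-caching",
        "--disable-uvicorn-access-log",
        "--trust-remote-code",
        "--compilation-config", '{"cudagraph_mode": "FULL_AND_PIECEWISE"}',
        "--kv-cache-dtype", kv_cache_dtype,
        "--block-size", "1",
    ]
    return env, argv


if __name__ == "__main__":
    for dtype in ("auto", "fp8"):
        env, argv = build_serve_cmd(dtype)
        prefix = " ".join(f"{k}={v}" for k, v in env.items())
        # shlex.join single-quotes the JSON --compilation-config argument.
        print(prefix, shlex.join(argv))
```

Returning argv as a list rather than a single string keeps the JSON compilation config intact if the command is later passed to subprocess.run.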

Test Result

lm_eval command:

lm_eval --model local-completions --model_args model=deepseek-ai/DeepSeek-V3.2,base_url=http://localhost:8000/v1/completions --batch_size 100 --tasks gsm8k --num_fewshot 20
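The same evaluation can presumably also be driven from Python. This is a minimal sketch assuming lm-evaluation-harness exposes lm_eval.simple_evaluate with model, model_args, tasks, num_fewshot and batch_size arguments that mirror the CLI flags above; format_model_args and run_gsm8k are hypothetical helper names, and run_gsm8k needs lm_eval installed and the server above running.

```python
# Minimal sketch, assuming lm_eval.simple_evaluate mirrors the CLI flags used
# above. format_model_args and run_gsm8k are hypothetical helper names.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "deepseek-ai/DeepSeek-V3.2"


def format_model_args(args: dict[str, str]) -> str:
    """Join key/value pairs into the comma-separated model_args string."""
    for key, value in args.items():
        # Commas separate pairs, so they cannot appear inside a key or value;
        # an '=' inside a key would also make the pair ambiguous.
        if "," in key or "," in value or "=" in key:
            raise ValueError(f"cannot encode {key}={value!r} in model_args")
    return ",".join(f"{k}={v}" for k, v in args.items())


def run_gsm8k(model: str = MODEL, base_url: str = BASE_URL):
    """Run 20-shot GSM8K against a running vLLM server (requires lm_eval)."""
    import lm_eval  # imported lazily so format_model_args works without lm_eval

    return lm_eval.simple_evaluate(
        model="local-completions",
        model_args=format_model_args({"model": model, "base_url": base_url}),
        tasks=["gsm8k"],
        num_fewshot=20,
        batch_size=100,
    )
```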

vLLM plugin, bf16 KV cache

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	20	exact_match	↑	0.9545	±	0.0057
		strict-match	20	exact_match	↑	0.9545	±	0.0057

vLLM plugin, fp8 KV cache

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
gsm8k	3	flexible-extract	20	exact_match	↑	0.9431	±	0.0064
		strict-match	20	exact_match	↑	0.9393	±	0.0066
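To judge whether the fp8 KV cache accuracy drop is larger than run-to-run noise, the reported standard errors can be recomputed and compared. This worked check assumes the GSM8K test split has 1319 questions and that the reported stderr is the sample standard error of a 0/1 score, sqrt(p(1 - p) / (n - 1)); binary_stderr and z_score are illustrative names.

```python
import math

# Worked check of the lm_eval numbers above. Assumes the GSM8K test split has
# 1319 questions and that stderr is the sample standard error of 0/1 scores.
N_GSM8K = 1319


def binary_stderr(p: float, n: int = N_GSM8K) -> float:
    """Sample standard error of the mean of n Bernoulli scores with mean p."""
    return math.sqrt(p * (1.0 - p) / (n - 1))


def z_score(p_a: float, se_a: float, p_b: float, se_b: float) -> float:
    """Accuracy difference divided by its standard error.

    Treats the two runs as independent. Both runs score the same questions,
    so a paired per-question test would be tighter.
    """
    return (p_a - p_b) / math.sqrt(se_a**2 + se_b**2)


if __name__ == "__main__":
    for label, p in [("bf16", 0.9545), ("fp8 flexible", 0.9431), ("fp8 strict", 0.9393)]:
        print(f"{label}: stderr = {binary_stderr(p):.4f}")
    print(f"flexible-extract z = {z_score(0.9545, 0.0057, 0.9431, 0.0064):.2f}")
    print(f"strict-match z     = {z_score(0.9545, 0.0057, 0.9393, 0.0066):.2f}")
```

Under these assumptions the recomputed standard errors match the reported 0.0057, 0.0064 and 0.0066, and the bf16 vs fp8 gaps come out at about 1.33 (flexible-extract) and 1.74 (strict-match) standard errors, both below the usual 1.96 threshold for the independent-runs approximation.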

Performance on MI355X, TP8 (Δ = vLLM plugin throughput relative to ATOM; positive means the plugin is faster)

ISL/OSL	Concurrency	KV cache	vLLM plugin req/s	ATOM req/s	Δ req/s	vLLM plugin total tok/s	ATOM total tok/s	Δ tok/s
1k/1k	128	fp8	3.51	3.80	-7.63%	7194.55	7786.85	-7.61%
1k/1k	128	bf16	3.47	3.58	-3.07%	7099.09	7326.96	-3.11%
1k/1k	64	fp8	2.28	2.32	-1.72%	4677.17	4744.63	-1.42%
1k/1k	64	bf16	2.26	2.22	+1.80%	4621.56	4538.04	+1.84%
8k/1k	128	fp8	1.42	1.59	-10.69%	13077.77	14614.80	-10.52%
8k/1k	128	bf16	1.39	1.03	+34.95%	12831.04	9505.53	+34.99%
8k/1k	64	fp8	1.16	1.22	-4.92%	10683.58	11288.62	-5.36%
8k/1k	64	bf16	1.16	0.86	+34.88%	10731.82	7962.13	+34.79%
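The relative columns in the table above can be reproduced from the raw throughput numbers. The sketch below recomputes the deltas as (plugin / ATOM - 1) × 100 and also checks that total tok/s divided by req/s is close to ISL + OSL. It assumes "1k" means 1024 tokens and "8k" means 8192 tokens, which the PR does not state; rel_delta_pct and tokens_per_request are illustrative names.

```python
# Recomputes the relative-throughput columns of the performance table and
# checks tokens per request against ISL + OSL. Assumes 1k = 1024 and
# 8k = 8192 tokens (an assumption, not stated in the PR).
ROWS = [
    # (ISL, OSL, concurrency, kv, plugin_rps, atom_rps, plugin_tps, atom_tps)
    (1024, 1024, 128, "fp8", 3.51, 3.80, 7194.55, 7786.85),
    (1024, 1024, 128, "bf16", 3.47, 3.58, 7099.09, 7326.96),
    (1024, 1024, 64, "fp8", 2.28, 2.32, 4677.17, 4744.63),
    (1024, 1024, 64, "bf16", 2.26, 2.22, 4621.56, 4538.04),
    (8192, 1024, 128, "fp8", 1.42, 1.59, 13077.77, 14614.80),
    (8192, 1024, 128, "bf16", 1.39, 1.03, 12831.04, 9505.53),
    (8192, 1024, 64, "fp8", 1.16, 1.22, 10683.58, 11288.62),
    (8192, 1024, 64, "bf16", 1.16, 0.86, 10731.82, 7962.13),
]


def rel_delta_pct(plugin: float, atom: float) -> float:
    """Plugin throughput relative to ATOM, in percent (positive = plugin faster)."""
    return (plugin / atom - 1.0) * 100.0


def tokens_per_request(total_tps: float, rps: float) -> float:
    """Average input + output tokens per completed request."""
    return total_tps / rps


if __name__ == "__main__":
    for isl, osl, conc, kv, p_rps, a_rps, p_tps, a_tps in ROWS:
        print(
            f"{isl}/{osl} c={conc} {kv}: "
            f"req/s {rel_delta_pct(p_rps, a_rps):+.2f}%, "
            f"tok/s {rel_delta_pct(p_tps, a_tps):+.2f}%, "
            f"plugin tok/req {tokens_per_request(p_tps, p_rps):.0f} "
            f"(ISL+OSL = {isl + osl})"
        )
```

Every row lands within about half a percent of 2048 or 9216 tokens per request, consistent with the 1024/8192 reading. The large positive deltas in the 8k/1k bf16 rows come from ATOM's bf16 throughput (1.03 and 0.86 req/s) being well below its fp8 throughput, while the plugin's bf16 and fp8 numbers are close.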

Submission Checklist

Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>

kliuae and others added 30 commits (March 20, 2026 07:01). All commits except the two "Merge branch 'main' into plugin_sparse_mla" commits are signed off by kliuae <kuanfu.liu@embeddedllm.com>.

16799d3  add sparse mla support for vllm plugin
da94028  remove redundant metadata
7d3c44b  inject sparse indexer methods
8c01a47  support bf16 kv cache only
935ca09  sync main
4911f42  disable persistent mla for fp8 kvcache
cce218a  clean up
0f02329  clean up
343b4ef  format
7747dd0  format
9cfaf45  add sparse mla marker and tqdm
1ce5029  Merge main
4837ca5  add glm5 recipe
0ebc9bb  merge main
a2e74a4  merge main
eb7ca29  keep indexer not converted
995a402  Merge branch 'main' into plugin_sparse_mla
d7a3d93  Merge branch 'main' into plugin_sparse_mla
54540ec  forward compat
ea03a71  support deepseek v3.2
c2ed1e4  merge main
6fd6fa0  merge plugin_sparse_mla
f6fc57e  format
4ded383  clean up
1d2fbe4  format
9829c52  move sparse mla modules to plugin
b532a6d  clean up
5aa7def  format
05fb61d  merge main
916f361  merge plugin_mla_sparse

kliuae-amd wants to merge 30 commits into main from kliuae/plugin_deepseekv32