[Feat][Plugin] Enable DeepSeek-V3.2 for vLLM OOT Plugin #494

Draft
kliuae-amd wants to merge 30 commits into main from kliuae/plugin_deepseekv32
@kliuae-amd

Motivation

This PR is a follow-up to #399 and adds DeepSeek-V3.2 support to ATOM's vLLM plugin mode.
It currently contains the changes from #399, so the diff should become much simpler once #399 is merged upstream.

Technical Details

Test Plan

Accuracy test with lm_eval

Model: deepseek-ai/DeepSeek-V3.2

Server command:

ATOM_DISABLE_VLLM_PLUGIN=0 \
ATOM_DISABLE_VLLM_PLUGIN_ATTENTION=0 \
vllm serve deepseek-ai/DeepSeek-V3.2 \
  -tp 8 \
  --gpu-memory-utilization 0.8 \
  --no-enable-prefix-caching \
  --disable-uvicorn-access-log \
  --trust-remote-code \
  --compilation-config '{"cudagraph_mode": "FULL_AND_PIECEWISE"}' \
  --kv-cache-dtype {auto, fp8} \
  --block-size 1
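
Before starting the full lm_eval run, it can help to confirm that the server answers on the completions endpoint. The following is a minimal sketch, not part of this PR: it assumes the server is reachable at the same `http://localhost:8000/v1/completions` URL used by the lm_eval command, and the helper names `build_completion_request` and `smoke_test` as well as the default prompt are hypothetical.

```python
import json
import urllib.request

# Assumption: same endpoint and model name as in the lm_eval command of this PR.
BASE_URL = "http://localhost:8000/v1/completions"
MODEL = "deepseek-ai/DeepSeek-V3.2"


def build_completion_request(prompt, max_tokens=16, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) a POST request for the OpenAI-compatible completions API."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding keeps the check repeatable
    }
    return urllib.request.Request(
        base_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def smoke_test(prompt="2 + 2 =", timeout=60.0):
    """Send one short completion request and return the generated text."""
    req = build_completion_request(prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

Calling `smoke_test()` once the server has finished loading should return a short completion string; a connection error at this point usually means the server is not up yet or is listening on a different port.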

Test Result

lm_eval command

lm_eval --model local-completions --model_args model=deepseek-ai/DeepSeek-V3.2,base_url=http://localhost:8000/v1/completions --batch_size 100 --tasks gsm8k --num_fewshot 20

vLLM plugin, bf16 kv cache

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|3|flexible-extract|20|exact_match|↑|0.9545|±|0.0057|
| | |strict-match|20|exact_match|↑|0.9545|±|0.0057|

vLLM plugin, fp8 kv cache

|Tasks|Version|Filter|n-shot|Metric| |Value| |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|3|flexible-extract|20|exact_match|↑|0.9431|±|0.0064|
| | |strict-match|20|exact_match|↑|0.9393|±|0.0066|

Performance on MI355X, TP8

|ISL/OSL|Concurrency|KV Cache|vLLM Plugin Req/s|ATOM Req/s|vLLM Plugin over ATOM (Req/s)|vLLM Plugin Total tok/s|ATOM Total tok/s|vLLM Plugin over ATOM (tok/s)|
|-------|----------:|--------|----------------:|---------:|----------------------------:|----------------------:|---------------:|----------------------------:|
|1k/1k|128|fp8|3.51|3.80|-7.63%|7194.55|7786.85|-7.61%|
|1k/1k|128|bf16|3.47|3.58|-3.07%|7099.09|7326.96|-3.11%|
|1k/1k|64|fp8|2.28|2.32|-1.72%|4677.17|4744.63|-1.42%|
|1k/1k|64|bf16|2.26|2.22|+1.80%|4621.56|4538.04|+1.84%|
|8k/1k|128|fp8|1.42|1.59|-10.69%|13077.77|14614.80|-10.52%|
|8k/1k|128|bf16|1.39|1.03|+34.95%|12831.04|9505.53|+34.99%|
|8k/1k|64|fp8|1.16|1.22|-4.92%|10683.58|11288.62|-5.36%|
|8k/1k|64|bf16|1.16|0.86|+34.88%|10731.82|7962.13|+34.79%|
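
The "vLLM Plugin over ATOM" columns appear to be the relative difference of the plugin figure against the ATOM figure, i.e. (plugin - ATOM) / ATOM. The sketch below recomputes that ratio for a few rows copied from the table; the function name `relative_diff_pct` and the row layout are illustrative choices, not something defined in this PR.

```python
def relative_diff_pct(plugin, atom):
    """Relative difference of the plugin figure over the ATOM baseline, in percent."""
    return (plugin - atom) / atom * 100.0


# (ISL/OSL, concurrency, KV cache, plugin req/s, ATOM req/s, plugin tok/s, ATOM tok/s)
# Values copied from the performance table above.
ROWS = [
    ("1k/1k", 128, "fp8", 3.51, 3.80, 7194.55, 7786.85),
    ("1k/1k", 64, "bf16", 2.26, 2.22, 4621.56, 4538.04),
    ("8k/1k", 64, "bf16", 1.16, 0.86, 10731.82, 7962.13),
]

for isl_osl, conc, kv, p_req, a_req, p_tok, a_tok in ROWS:
    req_diff = relative_diff_pct(p_req, a_req)
    tok_diff = relative_diff_pct(p_tok, a_tok)
    # Signed formatting mirrors the +/- convention used in the table.
    print(f"{isl_osl} c={conc} {kv}: req/s {req_diff:+.2f}%, tok/s {tok_diff:+.2f}%")
```

Negative values mean the plugin path is slower than native ATOM for that configuration, positive values mean it is faster.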

Submission Checklist

kliuae and others added 30 commits March 20, 2026 07:01
Signed-off-by: kliuae <kuanfu.liu@embeddedllm.com>
3 participants