
Accuracy regression in DeepSeek-R1-MXFP4 (KV FP8) on MI35x after commit a56b520 in SGLang #2656

@bingxche


Summary

AITER commit a56b5206aea55c5464c29bcd25eb4e5a6e4be273 (PR #2434) introduced a severe accuracy regression on MI35x for the DeepSeek-R1-MXFP4 model with an FP8 KV cache. On the GSM8K few-shot completion benchmark, accuracy drops from 0.94–0.96 to 0.03.

Details

Test: sglang/test/registered/amd/accuracy/mi35x/test_deepseek_r1_mxfp4_kv_fp8_eval_mi35x.py (GSM8K 200-question, 5-shot, --kv-cache-dtype fp8_e4m3, TP=8)
Threshold: 0.93

| Condition | Accuracy | Notes |
|---|---|---|
| Before the commit | 0.94–0.96 | Consistently above threshold |
| At the commit | 0.03 | Catastrophic drop |
| After reverting gfx950-GEMM-AFP4WFP4-N=7168-K=2304.json | 0.905–0.93 | Partial recovery (2/3 runs at 0.915, 1/3 at 0.93) |

Reproduction

Pull the Docker image rocm/sgl-dev:v0.5.10rc0-rocm700-mi35x-20260406, reinstall AITER in /sgl-workspace/aiter, and run the following from the SGLang repository root:

    export SGLANG_AMD_CI=1
    export SGLANG_IS_IN_CI=1
    export SGLANG_IS_IN_CI_AMD=1
    export SGLANG_USE_AITER=1
    python3 -m pytest \
      test/registered/amd/accuracy/mi35x/test_deepseek_r1_mxfp4_kv_fp8_eval_mi35x.py \
      -v -s
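Because accuracy varies from run to run (the partial-recovery row in the table spans 0.905–0.93 across three runs), it can help to repeat the test and keep each run's log for comparison. A minimal sketch, assuming bash and assuming the pytest test exits non-zero when accuracy falls below the 0.93 threshold; run_repeated is a hypothetical helper, not part of SGLang or AITER:

```shell
#!/usr/bin/env bash
# Sketch: repeat an accuracy run several times and keep one log per run.
# run_repeated is a hypothetical helper, not part of SGLang or AITER.
set -euo pipefail

# Usage: run_repeated <n> <log_dir> <command> [args...]
# Runs <command> n times, writing run i's stdout and stderr to
# <log_dir>/run_<i>.log, then prints how many runs exited with status 0.
run_repeated() {
  local n="$1" log_dir="$2"
  shift 2
  mkdir -p "$log_dir"
  local passed=0 i
  for ((i = 1; i <= n; i++)); do
    # A failing run is counted, not fatal, so the remaining runs still execute.
    if "$@" >"${log_dir}/run_${i}.log" 2>&1; then
      passed=$((passed + 1))
    fi
  done
  echo "${passed}/${n} runs passed"
}
```

For example, with the environment variables above exported, `run_repeated 3 gsm8k_logs python3 -m pytest test/registered/amd/accuracy/mi35x/test_deepseek_r1_mxfp4_kv_fp8_eval_mi35x.py -v -s` would run the test three times and report how many runs passed, leaving the per-run accuracy output in gsm8k_logs/.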

Root Cause Analysis

Thanks to @1am9trash for pointing out this lead and to @yctseng0211 for bisecting the AITER commits.

Reverting only the Triton GEMM config for shape N=7168, K=2304 (run from the AITER checkout) recovers most of the accuracy:

    git checkout a56b520^ -- "aiter/ops/triton/configs/gemm/gfx950-GEMM-AFP4WFP4-N=7168-K=2304.json"

However, even with this revert, accuracy (0.905–0.93) still falls short of the expected 0.94–0.96 range, which suggests that other config changes in the same commit may also contribute to the regression.
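One way to narrow down the remaining contributors is to list every GEMM config that a56b520 touched and revert them one at a time, rerunning the GSM8K test after each revert. A minimal sketch, assuming bash, that it runs from the root of the AITER checkout (/sgl-workspace/aiter), and that the relevant configs all live under aiter/ops/triton/configs/gemm; changed_gemm_configs and revert_config are hypothetical helpers:

```shell
#!/usr/bin/env bash
# Sketch: helpers for reverting the GEMM configs touched by a commit one at a time.
# Assumes the current directory is the root of an AITER git checkout.
# Function names are hypothetical, not part of AITER or SGLang.
set -euo pipefail

CONFIG_DIR="aiter/ops/triton/configs/gemm"

# Print every file under CONFIG_DIR that <commit> added, modified, or deleted,
# one path per line, relative to the repository root.
changed_gemm_configs() {
  local commit="$1"
  git diff --name-only "${commit}^" "${commit}" -- "$CONFIG_DIR"
}

# Restore <file> in the working tree to its content in <commit>'s parent.
# If the file did not exist in the parent (i.e. <commit> added it), delete it.
revert_config() {
  local commit="$1" file="$2"
  if git cat-file -e "${commit}^:${file}" 2>/dev/null; then
    git checkout "${commit}^" -- "$file"
  else
    rm -f -- "$file"
  fi
}
```

For each path printed by `changed_gemm_configs a56b520`, one could call `revert_config a56b520 <path>`, rerun the test, and then restore the file with `git checkout HEAD -- <path>`. Depending on how AITER is installed, a reinstall may be needed before a config change takes effect.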

Expected Behavior

Accuracy on GSM8K should remain in the 0.94–0.96 range, consistently above the 0.93 threshold.

Environment

  • GPU: AMD MI35x (8x)
  • Model: amd/DeepSeek-R1-MXFP4-Preview
  • Attention backend: aiter
  • KV cache dtype: fp8_e4m3
