
[BugFix] enable deepseek r1 fp4 #527

Merged

valarLip merged 1 commit into main from lingzha/enable-dpsk-fp4
Apr 15, 2026

[BugFix] enable deepseek r1 fp4#527
valarLip merged 1 commit intomainfrom
lingzha/enable-dpsk-fp4

Conversation

@ZLkanyo009 (Contributor) commented Apr 9, 2026

Motivation

For FP4 DeepSeek, the attention part is fully BF16 while the MoE part is FP4. Therefore, the scale for the attention part is None. For regular DeepSeek FP8, the attention part is also FP8 and requires quantization. However, the case where scale is None was previously ignored. This PR fixes that bug.
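A minimal sketch of the guard described above (function and parameter names are hypothetical, not the actual sglang code): the kv_b_proj weight scale is only bound when it exists, so the FP4 case, where the BF16 attention weights carry no scale, no longer trips over a None value.

```python
def maybe_assign_w_scale(attn_w_scale, kv_b_proj_w_scale):
    """Hypothetical illustration of the PR's None-scale guard.

    For FP8 DeepSeek, kv_b_proj_w_scale is a real tensor/float and gets
    bound. For FP4 DeepSeek the attention part is BF16, so the scale is
    None and must simply be left alone instead of dereferenced.
    """
    if kv_b_proj_w_scale is not None and attn_w_scale is None:
        return kv_b_proj_w_scale
    return attn_w_scale
```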

Command

export AITER_QUICK_REDUCE_QUANTIZATION=INT4
export SGLANG_AITER_FP8_PREFILL_ATTN=0
export SGLANG_USE_AITER=1
export ATOM_ENABLE_DS_QKNORM_QUANT_FUSION=1
 
model_path=/workspace/model/DeepSeek-R1-0528-MXFP4/
export PYTHONPATH=/workspace/dpsk-r1-fp4/sglang/python:/workspace/dpsk-r1-fp4/ATOM_oot/ATOM
 
 
export SGLANG_PROFILE_RECORD_SHAPES=1
export SGLANG_PROFILE_WITH_STACK=1
export SGLANG_TORCH_PROFILER_DIR=/workspace/dpsk-r1-fp4/sglang/profile_log
 
# export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.model_wrapper
export SGLANG_EXTERNAL_MODEL_PACKAGE=atom.plugin.sglang.models

export ATOM_PROFILE_MLA_ABSORBED_BMM=1

TORCHINDUCTOR_COMPILE_THREADS=128 python3 -m sglang.launch_server \
    --model-path $model_path \
    --host localhost \
    --port 8000 \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --kv-cache-dtype fp8_e4m3 \
    --mem-fraction-static 0.9 \
    --page-size 1 \
    --disable-radix-cache \
    --skip-server-warmup > log.serve.atom.oot.fp4.log 2>&1

@ZLkanyo009 ZLkanyo009 force-pushed the lingzha/enable-dpsk-fp4 branch 2 times, most recently from b3dd131 to 06eedc6 Compare April 13, 2026 03:06
@ZLkanyo009 ZLkanyo009 requested a review from zhuyuhua-v April 13, 2026 03:06
@zhuyuhua-v (Collaborator) previously approved these changes Apr 13, 2026, leaving a comment:

LGTM

wuhuikx previously approved these changes Apr 15, 2026
@ZLkanyo009 ZLkanyo009 dismissed stale reviews from wuhuikx and zhuyuhua-v via 43c2854 April 15, 2026 08:00
@ZLkanyo009 ZLkanyo009 force-pushed the lingzha/enable-dpsk-fp4 branch 2 times, most recently from 43c2854 to 7023d3d Compare April 15, 2026 08:02
@wuhuikx (Collaborator) commented Apr 15, 2026

Please help review: @zhuyuhua-v @ZhiweiYan-96 @XiaobingSuper

@ZLkanyo009 ZLkanyo009 force-pushed the lingzha/enable-dpsk-fp4 branch from 7023d3d to 6a5d2fb Compare April 15, 2026 08:46
if kv_b_proj_w_scale is not None and attn.w_scale is None:
    attn.w_scale = bind_or_assign(attn.w_scale, kv_b_proj_w_scale)
    if _is_hip:
        attn.w_scale *= 2.0
Collaborator:

Let's remove this attn.w_scale *= 2.0. It looks like legacy sglang code; FP8 DeepSeek weights shouldn't apply this *2 logic directly under the _is_hip flag.

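A hedged sketch of what the quoted snippet looks like with the HIP doubling removed, as the comment above requests. bind_or_assign is stubbed here so the example is self-contained; the real helper lives in sglang, and the 0.25 scale is an arbitrary illustrative value.

```python
def bind_or_assign(target, source):
    # Stub of the helper: keep the existing binding if present,
    # otherwise take the new value.
    return source if target is None else target

class Attn:
    w_scale = None

attn = Attn()
kv_b_proj_w_scale = 0.25  # example FP8 scale; None in the FP4/BF16 attention case

# Guard from the PR, with the `_is_hip` *2 branch dropped as requested:
if kv_b_proj_w_scale is not None and attn.w_scale is None:
    attn.w_scale = bind_or_assign(attn.w_scale, kv_b_proj_w_scale)
```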
@ZLkanyo009 (Contributor, Author):

Done.

@wuhuikx wuhuikx requested review from valarLip April 15, 2026 10:55
@valarLip valarLip merged commit 876706c into main Apr 15, 2026
16 of 29 checks passed
@valarLip valarLip deleted the lingzha/enable-dpsk-fp4 branch April 15, 2026 12:04

5 participants