Fix layer norm backward release3.4 v2 #78792

Merged
sneaxiy merged 4 commits into PaddlePaddle:release/3.4 from zhengshengning:fix_layer_norm_backward_34_v2
Apr 24, 2026
Conversation

Contributor

@zhengshengning zhengshengning commented Apr 24, 2026

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

dev PR: #78794

Revert the earlier revert (restoring the code that had been reverted) and add the change to GammaBetaBackwardCUDAKernelTemplate.

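Root cause: GammaBetaBackwardCUDAKernelTemplate uses block_dim_x=32 and block_dim_y=32, i.e. 1024 threads per block. On the Blackwell (SM 100) architecture the compiler allocates more registers per thread, so 1024 threads × registers per thread exceeds the SM's register file capacity. The kernel never executes at all ("too many resources requested for launch"), and the output buffer keeps its uninitialized garbage data (Inf/NaN). The same kernel works on H cards (Hopper, SM 90) because its register usage is lower there.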
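
The register arithmetic behind this failure can be sketched as follows. This is a minimal illustration, not code from the PR: MaxRegsPerThread, DemoKernel, and FitsOnDevice are hypothetical names, and the actual budget should be read from the device at runtime rather than assumed.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Largest per-thread register count that still lets a block of
// `threads_per_block` threads fit into `regs_per_block` registers.
// Ignores allocation granularity, so the real limit can be slightly lower.
constexpr int MaxRegsPerThread(int regs_per_block, int threads_per_block) {
  return regs_per_block / threads_per_block;
}

// Hypothetical stand-in kernel; not Paddle's kernel.
__global__ void DemoKernel(float* out, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) out[i] = 2.0f * static_cast<float>(i);
}

// Returns true if DemoKernel can be launched with `threads_per_block`
// threads on the current device, according to the limit the runtime reports.
bool FitsOnDevice(int threads_per_block) {
  int device = 0;
  cudaDeviceProp prop;
  cudaFuncAttributes attr;
  if (cudaGetDevice(&device) != cudaSuccess ||
      cudaGetDeviceProperties(&prop, device) != cudaSuccess ||
      cudaFuncGetAttributes(&attr, DemoKernel) != cudaSuccess) {
    return false;
  }
  std::printf("regs/thread used: %d, budget at %d threads: %d, "
              "max threads/block for this kernel: %d\n",
              attr.numRegs, threads_per_block,
              MaxRegsPerThread(prop.regsPerBlock, threads_per_block),
              attr.maxThreadsPerBlock);
  return threads_per_block <= attr.maxThreadsPerBlock;
}
```

With an assumed budget of 65,536 registers per block, a 1024-thread block leaves 64 registers per thread. If the compiler uses more than that for a given architecture, attr.maxThreadsPerBlock drops below 1024 and a 32×32 launch is rejected.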

Fix: add __launch_bounds__(block_dim_x * block_dim_y) to GammaBetaBackwardCUDAKernelTemplate.
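
A minimal sketch of this kind of fix, under stated assumptions: ColumnSumKernel and LaunchColumnSum are hypothetical names for a simple column reduction, not Paddle's GammaBetaBackwardCUDAKernelTemplate. The __launch_bounds__ qualifier tells the compiler the maximum block size, so it has to keep per-thread register use low enough (spilling if needed) for that block size to be launchable.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: sums each column of a row-major [rows x cols] matrix.
// The launch bound equals the block size the host code actually uses.
template <int kBlockDimX, int kBlockDimY>
__global__ void __launch_bounds__(kBlockDimX * kBlockDimY)
ColumnSumKernel(const float* in, float* out, int rows, int cols) {
  // +1 padding avoids shared-memory bank conflicts on the column read below.
  __shared__ float partial[kBlockDimY][kBlockDimX + 1];
  int col = blockIdx.x * kBlockDimX + threadIdx.x;

  // Each thread accumulates a strided subset of the rows of its column.
  float sum = 0.0f;
  if (col < cols) {
    for (int r = threadIdx.y; r < rows; r += kBlockDimY) {
      sum += in[r * cols + col];
    }
  }
  partial[threadIdx.y][threadIdx.x] = sum;
  __syncthreads();

  // The first row of threads combines the partial sums for each column.
  if (threadIdx.y == 0 && col < cols) {
    float total = 0.0f;
    for (int y = 0; y < kBlockDimY; ++y) total += partial[y][threadIdx.x];
    out[col] = total;
  }
}

// Launches with a 32x32 block and returns the launch status, so that a
// rejected launch is reported instead of leaving `out` silently untouched.
cudaError_t LaunchColumnSum(const float* in, float* out, int rows, int cols,
                            cudaStream_t stream) {
  constexpr int kX = 32, kY = 32;
  dim3 block(kX, kY);
  dim3 grid((cols + kX - 1) / kX);
  ColumnSumKernel<kX, kY><<<grid, block, 0, stream>>>(in, out, rows, cols);
  return cudaGetLastError();
}
```

Checking cudaGetLastError() right after the launch is what surfaces cudaErrorLaunchOutOfResources ("too many resources requested for launch"); without that check the caller only sees whatever was already in the output buffer.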

Whether it causes precision changes


paddle-bot Bot commented Apr 24, 2026

Your PR has been submitted. Thanks for your contribution to this open-source project!
Please wait for the CI results first. See the Paddle CI Manual for details.

@codecov-commenter

Codecov Report

❌ Patch coverage is 90.00000% with 6 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (release/3.4@2675023). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...addle/fluid/ir_adaptor/translator/op_translator.cc | 87.09% | 4 Missing ⚠️ |
| .../core/distributed/auto_parallel/inferspmd_utils.cc | 77.77% | 2 Missing ⚠️ |

Additional details and impacted files
@@              Coverage Diff               @@
##             release/3.4   #78792   +/-   ##
==============================================
  Coverage               ?   90.00%           
==============================================
  Files                  ?        6           
  Lines                  ?       60           
  Branches               ?        0           
==============================================
  Hits                   ?       54           
  Misses                 ?        6           
  Partials               ?        0           

Collaborator

@sneaxiy sneaxiy left a comment

LGTM

@sneaxiy sneaxiy merged commit 1524203 into PaddlePaddle:release/3.4 Apr 24, 2026
120 of 126 checks passed
5 participants