Fix layer norm backward release3.4 v2 #78792
Merged
sneaxiy merged 4 commits into PaddlePaddle:release/3.4 on Apr 24, 2026
Your PR was submitted successfully. Thank you for contributing to this open-source project!
wanghuancoder approved these changes on Apr 24, 2026
Codecov Report
❌ Patch coverage is …
Additional details and impacted files
@@             Coverage Diff              @@
##             release/3.4   #78792   +/-   ##
==============================================
  Coverage               ?   90.00%
==============================================
  Files                  ?        6
  Lines                  ?       60
  Branches               ?        0
==============================================
  Hits                   ?       54
  Misses                 ?        6
  Partials               ?        0
PR Category
Operator Mechanism
PR Types
Bug fixes
Description
Dev PR: #78794
Re-applies the code that was previously reverted (a revert of the revert) and adds the change to GammaBetaBackwardCUDAKernelTemplate.
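Root cause: GammaBetaBackwardCUDAKernelTemplate is launched with block_dim_x=32 and block_dim_y=32, i.e. 1024 threads per block. On the Blackwell (SM 100) architecture the compiler assigns more registers to each thread, so 1024 threads × registers per thread exceeds the SM's register file capacity. The kernel never runs (the launch fails with "too many resources requested for launch"), and the output buffer keeps its uninitialized garbage contents (Inf/NaN). The same kernel works on H-series cards (Hopper, SM 90) because its register usage there is lower.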
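The register arithmetic behind the root cause can be sketched as a small host-side helper. This is only an illustration: the helper names are hypothetical, the 65,536-register per-block limit used in the usage note is an assumed value (the real limit should be read from `cudaDeviceProp::regsPerBlock` on the target device), and the PR does not state the actual per-thread register counts on either architecture.

```cpp
#include <cassert>

// Upper bound on registers per thread that still lets one block launch,
// given the device's per-block register limit. Ignores the hardware's
// register allocation granularity, so the real bound can be slightly lower.
int MaxRegsPerThread(int regs_per_block, int threads_per_block) {
  assert(regs_per_block > 0 && threads_per_block > 0);
  return regs_per_block / threads_per_block;
}

// True if a block of `threads_per_block` threads, each using
// `regs_per_thread` registers, fits within `regs_per_block`.
bool BlockFitsInRegisterFile(int regs_per_thread, int threads_per_block,
                             int regs_per_block) {
  return static_cast<long long>(regs_per_thread) * threads_per_block <=
         regs_per_block;
}
```

With the assumed limit of 65,536 registers, `MaxRegsPerThread(65536, 32 * 32)` yields 64, so a 32×32 block only launches if the compiler keeps the kernel at or below 64 registers per thread. Any higher count, for example a hypothetical 72, makes `BlockFitsInRegisterFile(72, 1024, 65536)` return false, which corresponds to the failed launch described above.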
Fix: annotate GammaBetaBackwardCUDAKernelTemplate with __launch_bounds__(block_dim_x * block_dim_y).
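A minimal sketch of this kind of fix follows, using a hypothetical `ColumnSumKernel` as a stand-in; it is not the Paddle kernel, and its body, parameters, and sizes are assumptions made for illustration. `__launch_bounds__` tells the compiler the largest block size the kernel will be launched with, so it caps per-thread register usage to keep such a block launchable. The host code also shows two checks that would likely have surfaced the original problem earlier: reading the compiled register count with `cudaFuncGetAttributes`, and checking `cudaGetLastError()` right after the launch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical column-wise reduction; NOT GammaBetaBackwardCUDAKernelTemplate.
// The launch bound is derived from the same template parameters that fix the
// block shape, so the compiler must fit kBlockDimX * kBlockDimY threads.
template <int kBlockDimX, int kBlockDimY>
__global__ void __launch_bounds__(kBlockDimX * kBlockDimY)
ColumnSumKernel(const float* in, float* out, int rows, int cols) {
  __shared__ float partial[kBlockDimY][kBlockDimX];
  int col = blockIdx.x * kBlockDimX + threadIdx.x;
  float acc = 0.f;
  if (col < cols) {
    // Each thread accumulates a strided subset of the rows of its column.
    for (int r = threadIdx.y; r < rows; r += kBlockDimY) {
      acc += in[r * cols + col];
    }
  }
  partial[threadIdx.y][threadIdx.x] = acc;
  __syncthreads();
  // The first row of threads combines the partial sums for each column.
  if (threadIdx.y == 0 && col < cols) {
    float sum = 0.f;
    for (int y = 0; y < kBlockDimY; ++y) sum += partial[y][threadIdx.x];
    out[col] = sum;
  }
}

int main() {
  constexpr int kX = 32, kY = 32;  // 1024 threads per block, as in the PR
  const int rows = 128, cols = 64;  // arbitrary sizes for the sketch
  float *in = nullptr, *out = nullptr;
  cudaMalloc(&in, rows * cols * sizeof(float));
  cudaMalloc(&out, cols * sizeof(float));
  cudaMemset(in, 0, rows * cols * sizeof(float));

  // Report how many registers the compiler assigned per thread.
  cudaFuncAttributes attr;
  cudaFuncGetAttributes(&attr, ColumnSumKernel<kX, kY>);
  std::printf("registers per thread: %d\n", attr.numRegs);

  dim3 block(kX, kY);
  dim3 grid((cols + kX - 1) / kX);
  ColumnSumKernel<kX, kY><<<grid, block>>>(in, out, rows, cols);

  // A launch that exceeds resource limits is only reported through the
  // runtime error state; unchecked, the failure goes unnoticed and `out`
  // keeps whatever it held before.
  cudaError_t err = cudaGetLastError();
  if (err != cudaSuccess) {
    std::printf("launch failed: %s\n", cudaGetErrorString(err));
    return 1;
  }
  cudaDeviceSynchronize();
  cudaFree(in);
  cudaFree(out);
  return 0;
}
```

Deriving the bound from the block-shape template parameters keeps the two in sync if the block shape is changed later. The trade-off is that a tighter register cap may cause the compiler to spill registers to local memory.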
Does this change numerical precision?
No