[XPU] Fix precision for paddle.Tensor.__getitem__ backward pass by YqGe585 · Pull Request #78772 · PaddlePaddle/Paddle

YqGe585 · 2026-04-23T13:16:03Z

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

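Background

On XPU, paddle.Tensor.__getitem__ with advanced indexing (a Tensor used as the index) produces correct forward results, but the backward gradient deviates in precision and does not match the GPU result.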
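
For reference, the backward pass of advanced indexing is a scatter-add: each upstream gradient element is added into the input position it was gathered from, so repeated indices must accumulate. A minimal pure-Python sketch of that expected behavior for the 1-D case follows; the function names are illustrative and are not Paddle APIs.

```python
def getitem_forward(x, index):
    """Gather: out[i] = x[index[i]] (1-D advanced indexing)."""
    return [x[i] for i in index]


def getitem_backward_reference(x_len, index, grad_out):
    """Scatter-add: grad_x[index[i]] += grad_out[i].

    Positions that appear several times in `index` receive the sum of
    all matching upstream gradients.
    """
    grad_x = [0.0] * x_len
    for i, g in zip(index, grad_out):
        grad_x[i] += g
    return grad_x


x = [10.0, 20.0, 30.0, 40.0]
index = [1, 1, 3]  # index 1 is repeated
out = getitem_forward(x, index)
grad_x = getitem_backward_reference(len(x), index, [1.0, 1.0, 1.0])
print(out)     # [20.0, 20.0, 40.0]
print(grad_x)  # [0.0, 2.0, 0.0, 1.0]
```

The repeated index 1 is the case that matters here: its gradient must be 2.0, not 1.0.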

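Root cause

The backward kernel IndexElementwiseGetGradKernel (paddle/phi/kernels/xpu/index_elementwise_get_grad_kernel.cc) calls xpu::index_elementwise_get_grad, which has two problems:

No atomic scatter-add: when accumulate=true (the standard backpropagation case) and the indices contain duplicates, the missing atomic operations cause a race condition and the gradients are accumulated incorrectly.
No int64_t support: the XPU SDK has no <long, long> specialization of index_elementwise_get_grad, so int64_t outputs contain wrong values.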
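
The first problem can be illustrated with a deterministic simulation. The sketch below is not the XPU SDK code; it models one possible interleaving of a non-atomic read-modify-write, in which every lane reads the destination before any lane writes back, and compares it with a correct sequential accumulation. Real hardware scheduling is nondeterministic, so actual wrong values may differ.

```python
def scatter_add_sequential(size, index, updates):
    """Correct accumulation: each update is applied to the latest value."""
    out = [0.0] * size
    for i, u in zip(index, updates):
        out[i] += u
    return out


def scatter_add_racy(size, index, updates):
    """Models a lost-update race (one hypothetical schedule).

    Phase 1: every lane reads its destination and computes old + update.
    Phase 2: every lane writes its result back, so lanes that share a
    destination overwrite each other instead of accumulating.
    """
    out = [0.0] * size
    staged = [out[i] + u for i, u in zip(index, updates)]
    for i, v in zip(index, staged):
        out[i] = v
    return out


index = [2, 2, 2, 0]  # position 2 is hit three times
updates = [1.0, 2.0, 3.0, 4.0]
correct = scatter_add_sequential(3, index, updates)
racy = scatter_add_racy(3, index, updates)
print(correct)  # [4.0, 0.0, 6.0]
print(racy)     # [4.0, 0.0, 3.0]
```

With unique indices both functions agree, which is consistent with the failure only appearing when indices repeat.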

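Fix

Following the implementation in index_put_grad_kernel.cc, when accumulate=true && slice_offset==0 (the standard backpropagation path) or when the output type is int64_t, the kernel now uses XPUDealWithIndices + xpu::scatter_nd instead of the original xpu::index_elementwise_get_grad call:

xpu::scatter_nd (is_overwrite=false) performs an atomic scatter-add, so it handles duplicate indices correctly
It also supports int64_t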
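
The dispatch condition and the accumulate semantics described above can be sketched as follows. This is a pure-Python model under stated assumptions: `use_scatter_nd_path` and `scatter_nd_add` are hypothetical helpers, the dtype is represented as a string, the output buffer is assumed to start at zero, and none of this reflects the actual signatures of XPUDealWithIndices or xpu::scatter_nd.

```python
def use_scatter_nd_path(accumulate, slice_offset, out_dtype):
    """Condition from the description: standard backward path, or int64 output."""
    return (accumulate and slice_offset == 0) or out_dtype == "int64"


def scatter_nd_add(shape, indices, updates):
    """Accumulating scatter into a zero-initialised 2-D buffer.

    Each (row, col) pair in `indices` receives `out[row][col] += update`,
    which corresponds to non-overwrite (accumulate) behaviour.
    """
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for (r, c), u in zip(indices, updates):
        out[r][c] += u
    return out


print(use_scatter_nd_path(True, 0, "float32"))   # True
print(use_scatter_nd_path(True, 4, "float32"))   # False
print(use_scatter_nd_path(False, 0, "int64"))    # True

grad = scatter_nd_add((2, 3), [(0, 1), (0, 1), (1, 2)], [5, 7, 1])
print(grad)  # [[0, 12, 0], [0, 0, 1]]
```

The duplicated coordinate (0, 1) receives 5 + 7 = 12, which is the result an overwriting scatter would get wrong.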

In addition, INT64 support is added for index_elementwise_get_grad in xpu3_op_list.cc.

Verification

Of 38 test configurations, 37 failed the backward gradient precision check before the fix; all pass after it (verified for float32, float16, bfloat16, int32, int64, and int8).

Does this change precision?

Yes: the backward gradient of paddle.Tensor.__getitem__ on XPU is now aligned with GPU.

…r atomic scatter-add in backward pass
- Add INT64 to xpu3_op_list.cc for index_elementwise_get_grad
- Replace xpu::index_elementwise_get_grad with xpu::scatter_nd for accumulate=true cases to fix race conditions on duplicate indices
- Also use scatter_nd for int64 output (no SDK specialization)
- Fix resolves backward gradient precision gaps across all dtypes (float32, float16, bfloat16, int32, int64, int8)

paddle-bot · 2026-04-23T13:16:11Z

Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

YqGe585 · 2026-04-23T16:18:30Z

/re-run all-failed

YqGe585 · 2026-04-24T08:42:08Z

/re-run all-failed

YqGe585 · 2026-04-26T10:06:55Z

/re-run all-failed

YqGe585 · 2026-04-27T04:02:46Z

/re-run all-failed

YqGe585 · 2026-04-27T07:06:53Z

/re-run all-failed

YqGe585 · 2026-04-27T16:04:04Z

/re-run all-failed

YqGe585 wants to merge 1 commit into PaddlePaddle:develop from YqGe585:xpu-api-fixer/GEY-81-xpu-precision
