[XPU] Fix precision for paddle.Tensor.__getitem__ backward pass#78772

Open
YqGe585 wants to merge 1 commit into PaddlePaddle:develop from YqGe585:xpu-api-fixer/GEY-81-xpu-precision

Member

YqGe585 commented Apr 23, 2026

PR Category

Operator Mechanism

PR Types

Bug fixes

Description

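Background

On XPU, advanced indexing through paddle.Tensor.__getitem__ (using a Tensor as the index) produces correct forward results, but the backward gradient shows precision deviations and does not match the GPU result.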

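Root cause

The backward kernel IndexElementwiseGetGradKernel (paddle/phi/kernels/xpu/index_elementwise_get_grad_kernel.cc) calls xpu::index_elementwise_get_grad, which has two problems:

  1. Scatter-add without atomic operations: when accumulate=true (the standard backpropagation case) and the indices contain duplicates, the lack of atomic operations causes a race condition and yields incorrectly accumulated gradients.
  2. No int64_t support: the XPU SDK has no <long, long> specialization of index_elementwise_get_grad, so int64_t outputs contain wrong values.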
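To make the first problem concrete, here is a minimal pure-Python sketch of the gradient of `y = x[index]` for a 1-D `x`. It is an illustration only: both function names are hypothetical, and the "lost update" variant is a deterministic model of one possible bad interleaving (the last writer wins), not a reproduction of the actual XPU kernel behavior.

```python
def getitem_backward_reference(grad_out, index, input_size):
    """Reference gradient of y = x[index]: scatter-add grad_out into zeros."""
    grad_x = [0.0] * input_size
    for g, i in zip(grad_out, index):
        grad_x[i] += g  # contributions to the same slot must accumulate
    return grad_x


def getitem_backward_lost_update(grad_out, index, input_size):
    """Hypothetical model of a non-atomic scatter-add on duplicate indices.

    Every "thread" reads the initial value (0.0), adds its contribution,
    and writes back, so only the last write to a slot survives.
    """
    grad_x = [0.0] * input_size
    for g, i in zip(grad_out, index):
        grad_x[i] = 0.0 + g  # stale read of the initial value, then write
    return grad_x


grad_out = [1.0, 1.0, 1.0, 1.0]
index = [0, 2, 2, 2]  # index 2 appears three times

print(getitem_backward_reference(grad_out, index, 4))    # [1.0, 0.0, 3.0, 0.0]
print(getitem_backward_lost_update(grad_out, index, 4))  # [1.0, 0.0, 1.0, 0.0]
```

With unique indices the two functions agree, which is consistent with the description above that the error only appears when indices repeat.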

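Fix

Following the implementation in index_put_grad_kernel.cc, when accumulate=true && slice_offset==0 (the standard backpropagation path) or the output type is int64_t, the kernel now uses XPUDealWithIndices + xpu::scatter_nd in place of the original xpu::index_elementwise_get_grad call:

  • xpu::scatter_nd (is_overwrite=false) natively performs an atomic scatter-add, so duplicate indices are handled correctly
  • It also supports int64_t

In addition, INT64 support was added for index_elementwise_get_grad in xpu3_op_list.cc.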
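The dispatch condition and the scatter-add semantics described above can be sketched in pure Python. This is a model under stated assumptions, not the kernel code: `use_scatter_nd_path` and `scatter_nd_model` are hypothetical names, the dtype is passed as a plain string, and the real `xpu::scatter_nd` signature is not reproduced here.

```python
def use_scatter_nd_path(accumulate, slice_offset, out_dtype):
    """Hypothetical predicate mirroring the condition stated in this PR."""
    return bool((accumulate and slice_offset == 0) or out_dtype == "int64")


def scatter_nd_model(shape, indices, updates, is_overwrite):
    """Toy 2-D scatter: write `updates` into a zero matrix at `indices`.

    is_overwrite=False accumulates duplicates (scatter-add);
    is_overwrite=True keeps only the last update for each position.
    """
    rows, cols = shape
    out = [[0] * cols for _ in range(rows)]
    for (r, c), u in zip(indices, updates):
        if is_overwrite:
            out[r][c] = u
        else:
            out[r][c] += u
    return out


indices = [(0, 1), (0, 1), (1, 0)]  # (0, 1) is a duplicate
updates = [5, 7, 2**40]             # 2**40 needs a 64-bit integer type

print(use_scatter_nd_path(True, 0, "float32"))               # True
print(use_scatter_nd_path(True, 4, "float32"))               # False
print(scatter_nd_model((2, 2), indices, updates, False))     # [[0, 12], [1099511627776, 0]]
```

The accumulate mode is the one that matches gradient semantics: the duplicate position receives 5 + 7 = 12 rather than only the last value.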

Validation

Of 38 test configurations, 37 failed the backward-gradient precision check before the fix; all pass after it (verified for float32, float16, bfloat16, int32, int64, and int8).

Does this change precision?

Yes. The backward gradient of paddle.Tensor.__getitem__ on XPU is now aligned with GPU.

…r atomic scatter-add in backward pass

- Add INT64 to xpu3_op_list.cc for index_elementwise_get_grad
- Replace xpu::index_elementwise_get_grad with xpu::scatter_nd for
  accumulate=true cases to fix race conditions on duplicate indices
- Also use scatter_nd for int64 output (no SDK specialization)
- Fix resolves backward gradient precision gaps across all dtypes
  (float32, float16, bfloat16, int32, int64, int8)

paddle-bot Bot commented Apr 23, 2026

Your PR has been submitted. Thanks for your contribution!
Please wait for the CI results first. See the Paddle CI Manual for details.

Member Author

YqGe585 commented Apr 23, 2026

/re-run all-failed

Member Author

YqGe585 commented Apr 24, 2026

/re-run all-failed

Member Author

YqGe585 commented Apr 26, 2026

/re-run all-failed

Member Author

YqGe585 commented Apr 27, 2026

/re-run all-failed

Member Author

YqGe585 commented Apr 27, 2026

/re-run all-failed

Member Author

YqGe585 commented Apr 27, 2026

/re-run all-failed
