Skip to content

Introduce a TensorWrapper protection mechanism that holds an additional strong reference for pylayer#78669

Merged
sneaxiy merged 3 commits intoPaddlePaddle:developfrom
adam-xiaoyao:pylayer-tensor-wrapper-fix
Apr 24, 2026
Merged

Introduce a TensorWrapper protection mechanism that holds an additional strong reference for pylayer#78669
sneaxiy merged 3 commits intoPaddlePaddle:developfrom
adam-xiaoyao:pylayer-tensor-wrapper-fix

Conversation

@adam-xiaoyao
Copy link
Copy Markdown
Contributor

PR Category

Execute Infrastructure

PR Types

Bug fixes

Description

For pylayer, Introduce a TensorWrapper protection mechanism that holds an additional strong reference to the DenseTensor at the C++ layer.

是否引起精度变化

@paddle-bot
Copy link
Copy Markdown

paddle-bot Bot commented Apr 14, 2026

你的PR提交成功,感谢你对开源项目的贡献!
请关注后续CI自动化测试结果,详情请参考Paddle-CI手册
Your PR has been submitted. Thanks for your contribution!
Please wait for the result of CI firstly. See Paddle CI Manual for details.

@adam-xiaoyao adam-xiaoyao force-pushed the pylayer-tensor-wrapper-fix branch 3 times, most recently from 359d8fa to 75a9396 Compare April 14, 2026 06:30
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

2 similar comments
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@adam-xiaoyao adam-xiaoyao force-pushed the pylayer-tensor-wrapper-fix branch 3 times, most recently from 9110683 to b91245e Compare April 15, 2026 07:48
@adam-xiaoyao adam-xiaoyao force-pushed the pylayer-tensor-wrapper-fix branch 6 times, most recently from 54030eb to dca931b Compare April 16, 2026 02:54
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 16, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@4174bb8). Learn more about missing BASE report.

Additional details and impacted files
@@             Coverage Diff             @@
##             develop    #78669   +/-   ##
===========================================
  Coverage           ?   100.00%           
===========================================
  Files              ?         1           
  Lines              ?         2           
  Branches           ?         0           
===========================================
  Hits               ?         2           
  Misses             ?         0           
  Partials           ?         0           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@adam-xiaoyao adam-xiaoyao force-pushed the pylayer-tensor-wrapper-fix branch from dca931b to 1246b3f Compare April 16, 2026 07:33
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

1 similar comment
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@adam-xiaoyao adam-xiaoyao changed the title Pylayer tensor wrapper fix Introduce a TensorWrapper protection mechanism that holds an additional strong reference Apr 17, 2026
@adam-xiaoyao adam-xiaoyao changed the title Introduce a TensorWrapper protection mechanism that holds an additional strong reference Introduce a TensorWrapper protection mechanism that holds an additional strong reference for pylayer Apr 17, 2026
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

1 similar comment
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@adam-xiaoyao adam-xiaoyao force-pushed the pylayer-tensor-wrapper-fix branch 3 times, most recently from e875578 to 7f6a718 Compare April 20, 2026 06:18
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

2 similar comments
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

Comment on lines +272 to +273
VLOG(6) << "Clearing PyLayer tensor wrappers";
SetIsTensorWrappersCleared(true);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里不应该SetIsTensorWrappersCleared(true);如果你讲vector clear掉,同时Py_DECREF(self->container);的话,才可以。但不建议这么做。这里可以不动。或者说这里与我们要做的事情无关~

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

�OK

Comment thread paddle/fluid/pybind/eager_py_layer.cc Outdated
Comment on lines +770 to +806

// If tensor_hold_helper is non-empty, some tensors may have been cleared by
// _clear_dataptr(). Iterate the top-level container tuple and restore any
// null impl from the corresponding entry in tensor_hold_helper.
// tensor_hold_helper is ordered by the DenseTensors found during deep
// traversal in set_container; for the common case (flat tuple of tensors)
// the k-th tensor in the tuple maps to tensor_hold_helper[k].
if (!self->tensor_hold_helper.empty()) {
Py_ssize_t size = PyTuple_Size(self->container);
PyObject* recovered_container = PyTuple_New(size);
Py_ssize_t holder_idx = 0;
for (Py_ssize_t i = 0; i < size; ++i) {
PyObject* item = PyTuple_GetItem(self->container, i);
if (item && PyCheckTensor(item)) {
TensorObject* tensor_obj = reinterpret_cast<TensorObject*>(item);
if (!tensor_obj->tensor.impl() &&
holder_idx <
static_cast<Py_ssize_t>(self->tensor_hold_helper.size()) &&
self->tensor_hold_helper[holder_idx]) {
// Tensor was cleared by _clear_dataptr; restore impl from holder.
paddle::Tensor recovered;
recovered.set_impl(self->tensor_hold_helper[holder_idx]);
PyTuple_SET_ITEM(
recovered_container, i, paddle::pybind::ToPyObject(recovered));
++holder_idx;
continue;
}
++holder_idx;
Py_INCREF(item);
PyTuple_SET_ITEM(recovered_container, i, item);
} else {
Py_INCREF(item);
PyTuple_SET_ITEM(recovered_container, i, item);
}
}
return recovered_container;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里没有必要吧?直接return self->container;就可以。

Copy link
Copy Markdown
Contributor Author

@adam-xiaoyao adam-xiaoyao Apr 21, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

只是存到了tensor_hold_helper里, 在self->container里并没有。 因此这里有必要。

Comment on lines +431 to +432
if ctx.closure_cells:
_restore_freed_closure_tensors(ctx)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里还需要保留吗?_restore_freed_closure_tensors是因为有提前释放的问题,孙东对recompute做的修复。有了以上代码,这里我觉得可以直接删掉。你可以找孙东确认一下。

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这里是需要的, 我们在release3.3分支上改了的, 当时孙东有参与的。这里主要处理的是recompute的函数为闭包的情况。

@adam-xiaoyao adam-xiaoyao force-pushed the pylayer-tensor-wrapper-fix branch 3 times, most recently from f6f1bcc to 6186972 Compare April 22, 2026 17:16
Problem:
PyLayer's save_for_backward only holds Python-level references, but
_clear_dataptr releases the C++ DenseTensor data, causing
backward failure in pipeline parallel scenarios.

Solution:
Add C++ TensorWrapper storage to GradNodePyLayer. Registration is done
in pylayer_method_apply after ctx->grad_node is set, because forward()
is called at line 408 before grad_node is assigned at line 587 - any
registration attempted during forward would silently skip due to the
null weak_ptr.

Changes:
- py_layer_node.h: Add saved_tensor_wrappers_ member
- py_layer_node.cc: Implement AddSavedTensorWrapper / ClearTensorWrappers
- eager_py_layer.cc: Register TensorWrappers in apply() after grad_node
  is set; recover cleared tensors in tensor_properties_get_container()
- recompute.py: Minor compatibility update

Tests: 7 test cases covering core pipeline-parallel scenarios
@adam-xiaoyao adam-xiaoyao force-pushed the pylayer-tensor-wrapper-fix branch 2 times, most recently from 5d05692 to 2e10cda Compare April 23, 2026 06:53
@adam-xiaoyao adam-xiaoyao force-pushed the pylayer-tensor-wrapper-fix branch from 2e10cda to 5a3c9b3 Compare April 23, 2026 07:39
@adam-xiaoyao
Copy link
Copy Markdown
Contributor Author

/re-run all-failed

Copy link
Copy Markdown
Collaborator

@sneaxiy sneaxiy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM for placement new.

@sneaxiy sneaxiy merged commit a5120d0 into PaddlePaddle:develop Apr 24, 2026
133 of 140 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants