Introduce a TensorWrapper protection mechanism that holds an additional strong reference for PyLayer (#78669)
Conversation
Your PR was submitted successfully. Thank you for contributing to this open-source project!
Force-pushed from 359d8fa to 75a9396.
/re-run all-failed
Force-pushed from 9110683 to b91245e.
Force-pushed from 54030eb to dca931b.
/re-run all-failed
Codecov Report
✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##           develop    #78669   +/- ##
===========================================
  Coverage         ?   100.00%
===========================================
  Files            ?         1
  Lines            ?         2
  Branches         ?         0
===========================================
  Hits             ?         2
  Misses           ?         0
  Partials         ?         0
```
Force-pushed from dca931b to 1246b3f.
/re-run all-failed
Force-pushed from e875578 to 7f6a718.
/re-run all-failed
```cpp
VLOG(6) << "Clearing PyLayer tensor wrappers";
SetIsTensorWrappersCleared(true);
```
SetIsTensorWrappersCleared(true) should not be called here. It would only be correct if you also cleared the vector and called Py_DECREF(self->container), and that is not recommended. This line can be left unchanged; in any case it is unrelated to what we are trying to do here.
```cpp
// If tensor_hold_helper is non-empty, some tensors may have been cleared by
// _clear_dataptr(). Iterate the top-level container tuple and restore any
// null impl from the corresponding entry in tensor_hold_helper.
// tensor_hold_helper is ordered by the DenseTensors found during deep
// traversal in set_container; for the common case (flat tuple of tensors)
// the k-th tensor in the tuple maps to tensor_hold_helper[k].
if (!self->tensor_hold_helper.empty()) {
  Py_ssize_t size = PyTuple_Size(self->container);
  PyObject* recovered_container = PyTuple_New(size);
  Py_ssize_t holder_idx = 0;
  for (Py_ssize_t i = 0; i < size; ++i) {
    PyObject* item = PyTuple_GetItem(self->container, i);
    if (item && PyCheckTensor(item)) {
      TensorObject* tensor_obj = reinterpret_cast<TensorObject*>(item);
      if (!tensor_obj->tensor.impl() &&
          holder_idx <
              static_cast<Py_ssize_t>(self->tensor_hold_helper.size()) &&
          self->tensor_hold_helper[holder_idx]) {
        // Tensor was cleared by _clear_dataptr; restore impl from holder.
        paddle::Tensor recovered;
        recovered.set_impl(self->tensor_hold_helper[holder_idx]);
        PyTuple_SET_ITEM(
            recovered_container, i, paddle::pybind::ToPyObject(recovered));
        ++holder_idx;
        continue;
      }
      ++holder_idx;
      Py_INCREF(item);
      PyTuple_SET_ITEM(recovered_container, i, item);
    } else {
      Py_INCREF(item);
      PyTuple_SET_ITEM(recovered_container, i, item);
    }
  }
  return recovered_container;
}
```
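The index-mapping assumption in this recovery loop (in a flat tuple, the k-th tensor slot corresponds to tensor_hold_helper[k]) can be sketched in plain Python. The `recover_container` function and the dict-based slots below are hypothetical stand-ins for the C++ types, not Paddle code:

```python
# Toy model of the recovery loop: a flat container of "tensor" slots,
# some of which had their impl cleared, restored from a parallel list
# of strong references saved in traversal order.

def recover_container(container, holders):
    """Rebuild the container, restoring any cleared slot from holders."""
    recovered = []
    holder_idx = 0
    for item in container:
        if (item["impl"] is None
                and holder_idx < len(holders)
                and holders[holder_idx] is not None):
            # Slot was cleared; restore its impl from the holder.
            recovered.append({"impl": holders[holder_idx]})
        else:
            recovered.append(item)
        holder_idx += 1  # advance in lockstep with the tensor slots
    return recovered

container = [{"impl": "a"}, {"impl": None}, {"impl": "c"}]
holders = ["a_saved", "b_saved", "c_saved"]
result = recover_container(container, holders)
# Only the cleared middle slot is replaced, from holders[1].
```

Note that this one-to-one mapping only holds for a flat tuple of tensors; nested containers rely on the deep-traversal order in set_container matching the top-level iteration order.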
Is this really necessary? Just `return self->container;` should be enough.
The impl was only saved into tensor_hold_helper; it is no longer present in self->container. So this is necessary.
```python
if ctx.closure_cells:
    _restore_freed_closure_tensors(ctx)
```
Does this still need to be kept? _restore_freed_closure_tensors was added because of a premature-release problem; it was Sun Dong's fix for recompute. With the code above, I think this can simply be deleted. You can confirm with Sun Dong.
This is needed. We changed this on the release3.3 branch, and Sun Dong was involved at the time. It mainly handles the case where the recompute function is a closure.
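For readers unfamiliar with the closure case being discussed: when the recompute function is a closure, its captured tensors live in cell objects attached to the function, not in its argument list, so argument-based save/restore machinery never sees them. A plain-Python sketch of that mechanism (illustrative only; this is not Paddle's _restore_freed_closure_tensors):

```python
def make_forward(weight):
    # `weight` is captured in a closure cell, not passed as an argument,
    # so any machinery that only inspects arguments never sees it.
    def forward(x):
        return x * weight
    return forward

fn = make_forward(3)
cell = fn.__closure__[0]  # the cell object holding `weight`

# Since Python 3.7, cell_contents is writable, which is what a
# closure-restore helper relies on: rewrite the captured value in place.
cell.cell_contents = 5
```

After rewriting the cell, subsequent calls to `fn` see the restored value without rebuilding the function.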
Force-pushed from f6f1bcc to 6186972.
Problem: PyLayer's save_for_backward only holds Python-level references, but _clear_dataptr releases the C++ DenseTensor data, causing backward failure in pipeline-parallel scenarios.

Solution: Add C++ TensorWrapper storage to GradNodePyLayer. Registration is done in pylayer_method_apply after ctx->grad_node is set, because forward() is called at line 408 before grad_node is assigned at line 587; any registration attempted during forward() would silently skip due to the null weak_ptr.

Changes:
- py_layer_node.h: add the saved_tensor_wrappers_ member
- py_layer_node.cc: implement AddSavedTensorWrapper / ClearTensorWrappers
- eager_py_layer.cc: register TensorWrappers in apply() after grad_node is set; recover cleared tensors in tensor_properties_get_container()
- recompute.py: minor compatibility update

Tests: 7 test cases covering core pipeline-parallel scenarios.
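The failure mode described above, where a Python-level reference survives while the underlying C++ buffer is released, can be mimicked with a toy model: a tensor whose data can be cleared, plus a grad node that keeps an extra strong reference at registration time. This is a conceptual sketch only; ToyTensor and ToyGradNode are hypothetical stand-ins, not Paddle's classes:

```python
class ToyTensor:
    """Stand-in for a Python tensor object wrapping a C++ impl."""
    def __init__(self, data):
        self.impl = data  # plays the role of the DenseTensor impl

    def _clear_dataptr(self):
        self.impl = None  # releases the underlying data

class ToyGradNode:
    """Stand-in for GradNodePyLayer with saved_tensor_wrappers_."""
    def __init__(self):
        self.saved_tensor_wrappers = []

    def add_saved_tensor_wrapper(self, tensor):
        # Hold an extra strong reference to the impl at registration time,
        # so clearing the Python-side tensor cannot free the data.
        self.saved_tensor_wrappers.append(tensor.impl)

t = ToyTensor([1.0, 2.0])
node = ToyGradNode()
node.add_saved_tensor_wrapper(t)  # register after grad_node is set

t._clear_dataptr()                # a pipeline pass clears the data
assert t.impl is None

# Backward-time recovery: restore the impl from the saved wrapper.
t.impl = node.saved_tensor_wrappers[0]
```

The key point the sketch illustrates is ordering: the wrapper must be registered after the grad node exists (here, before `_clear_dataptr` runs), otherwise there is nothing holding the data alive.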
Force-pushed from 5d05692 to 2e10cda.
Force-pushed from 2e10cda to 5a3c9b3.
/re-run all-failed
sneaxiy left a comment:
LGTM for placement new.
PR Category
Execute Infrastructure
PR Types
Bug fixes
Description
For PyLayer, introduce a TensorWrapper protection mechanism that holds an additional strong reference to the DenseTensor at the C++ layer.
Does this change numerical accuracy?
No.