[Mirror] Tpetra: Modifying import/export buffer caching for DistObject#75
[Mirror] Tpetra: Modifying import/export buffer caching for DistObject#75
Conversation
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
…DistObject Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Adding an exports_parentView to the DistObject that can be reused if a new export is performed and needs more memory than allocated in the current view but not more than allocated in the parent view. Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
|
CDash for AT1 results [Only accessible from Sandia networks] |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to cache DualView allocations in DistObject using "parent views" to minimize expensive reallocations during communication. It adds a CMake option, Tpetra_ENABLE_Legacy_Import_Shrink, to control whether buffers should be reallocated when they are significantly larger than needed. The changes include updates to DistObject, MultiVector, and CooMatrix to utilize this caching, along with new methods to query and clear these buffers. Review feedback identifies critical bugs in the reallocDualViewIfNeeded implementation where aliasing between parent and current views could lead to null view operations or the loss of view labels. Additionally, improvements were suggested to ensure consistency in how export buffer sizes are reported and cleared.
| current_view = dual_view_type (); | ||
| auto d_view = Kokkos::subview (parent_view.view_device(), range_type (0, newSize)); | ||
| auto h_view = Kokkos::subview (parent_view.view_host(), range_type (0, newSize)); | ||
| current_view = Kokkos::DualView<ValueType*, DeviceType>(d_view, h_view); | ||
| return false; // we did not reallocate |
There was a problem hiding this comment.
The assignment current_view = dual_view_type (); on line 75 is dangerous when current_view and parent_view are aliased (i.e., they refer to the same DualView object). If they are aliased, clearing current_view will also clear parent_view, causing the subsequent Kokkos::subview calls on parent_view to operate on a null view. Since this function explicitly supports aliasing, this line should be removed.
auto d_view = Kokkos::subview (parent_view.view_device(), range_type (0, newSize));
auto h_view = Kokkos::subview (parent_view.view_host(), range_type (0, newSize));
current_view = Kokkos::DualView<ValueType*, DeviceType>(d_view, h_view);
return false; // we did not reallocate| current_view = dual_view_type (); // free first, in order to save memory | ||
|
|
||
| // If current size is 0, the DualView's Views likely lack a label. | ||
| dv = dual_view_type (curSize == 0 ? newLabel : dv.view_device().label (), newSize); | ||
| parent_view = dual_view_type (curSize == 0 ? newLabel : current_view.view_device().label (), newSize); | ||
| current_view = parent_view; | ||
| return true; // we did reallocate |
There was a problem hiding this comment.
The label is accessed from current_view on line 89 after it has been cleared on line 86. This results in the loss of the original label when curSize > 0. Additionally, the assignment current_view = dual_view_type () is redundant as it is overwritten on line 90. The label should be captured before any views are cleared.
const std::string label = (curSize == 0 ? newLabel : current_view.view_device().label ());
parent_view = dual_view_type(); // free first, in order to save memory
// If current size is 0, the DualView's Views likely lack a label.
parent_view = dual_view_type (label, newSize);
current_view = parent_view;
return true; // we did reallocate| current_view = dual_view_type (); // free first, in order to save memory | ||
| parent_view = dual_view_type(); // free first, in order to save memory | ||
| // If current size is 0, the DualView's Views likely lack a label. | ||
| parent_view = dual_view_type (curSize == 0 ? newLabel : current_view.view_device().label (), newSize); | ||
| current_view = parent_view; | ||
| return true; // we did reallocate |
There was a problem hiding this comment.
Similar to the previous block, the label is lost because current_view is cleared before its label is accessed. The assignment to current_view on line 117 is also redundant.
const std::string label = (curSize == 0 ? newLabel : current_view.view_device().label ());
parent_view = dual_view_type(); // free first, in order to save memory
// If current size is 0, the DualView's Views likely lack a label.
parent_view = dual_view_type (label, newSize);
current_view = parent_view;
return true; // we did reallocate| void DistObject<Packet, LocalOrdinal, GlobalOrdinal, Node>::clearExports() { | ||
| this->exports_.realloc(0); | ||
| } |
There was a problem hiding this comment.
clearExports() should clear exports_parentView_ as well. If only exports_ is cleared, the underlying cached allocation in exports_parentView_ remains, which fails to free the memory as intended by a 'clear' operation.
void DistObject<Packet, LocalOrdinal, GlobalOrdinal, Node>::clearExports() {
this->exports_parentView_.realloc(0);
this->exports_ = this->exports_parentView_;
}|
|
||
| template <class Packet, class LocalOrdinal, class GlobalOrdinal, class Node> | ||
| size_t DistObject<Packet, LocalOrdinal, GlobalOrdinal, Node>::getSizeOfExports() const { | ||
| return this->exports_.view_device().extent(0); |
There was a problem hiding this comment.
This should return the size of exports_parentView_ to be consistent with getSizeOfImports(). Returning the size of the exports_ subview only gives the size of the last communication, whereas the user likely expects the total allocated buffer capacity when querying the object's state.
return this->exports_parentView_.view_device().extent(0);
Automated mirror of upstream PR trilinos#14159 This PR does a few things:
Issue: trilinos#14158
Tests: Yes!
Stakeholder feedback: From @vbrunini via email