Skip to content

[Mirror] Tpetra: Modifying import/export buffer caching for DistObject#75

Open
csiefer2 wants to merge 8 commits intodevelopfrom
pr-mirror-14159
Open

[Mirror] Tpetra: Modifying import/export buffer caching for DistObject#75
csiefer2 wants to merge 8 commits intodevelopfrom
pr-mirror-14159

Conversation

@csiefer2
Copy link
Copy Markdown
Owner

@csiefer2 csiefer2 commented Apr 8, 2026

Automated mirror of upstream PR trilinos#14159 This PR does a few things:

  1. Modifies DistObject to keep a parent of the imports view so "small" allocations just get a subview.
  2. Removes the "downsizing" in DistObject. This can be re-enabled with -DTpetra_ENABLE_Legacy_Import_Shrink=ON
  3. Adds the ability to manually query import/export size and then resize those buffers to zero.
  4. Adds a unit tests for the new features.

Issue: trilinos#14158
Tests: Yes!
Stakeholder feedback: From @vbrunini via email

csiefer2 and others added 8 commits July 30, 2025 13:27
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
…DistObject

Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Signed-off-by: Chris Siefert <csiefer@sandia.gov>
Adding an exports_parentView to the DistObject that can
be reused if a new export is performed and needs more memory
than allocated in the current view but not more than allocated
in the parent view.

Signed-off-by: Luc Berger-Vergiat <lberge@sandia.gov>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 8, 2026

CDash for AT1 results [Only accessible from Sandia networks]
CDash for AT2 results [Currently only accessible from Sandia networks]

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a mechanism to cache DualView allocations in DistObject using "parent views" to minimize expensive reallocations during communication. It adds a CMake option, Tpetra_ENABLE_Legacy_Import_Shrink, to control whether buffers should be reallocated when they are significantly larger than needed. The changes include updates to DistObject, MultiVector, and CooMatrix to utilize this caching, along with new methods to query and clear these buffers. Review feedback identifies critical bugs in the reallocDualViewIfNeeded implementation where aliasing between parent and current views could lead to null view operations or the loss of view labels. Additionally, improvements were suggested to ensure consistency in how export buffer sizes are reported and cleared.

Comment on lines +75 to +79
current_view = dual_view_type ();
auto d_view = Kokkos::subview (parent_view.view_device(), range_type (0, newSize));
auto h_view = Kokkos::subview (parent_view.view_host(), range_type (0, newSize));
current_view = Kokkos::DualView<ValueType*, DeviceType>(d_view, h_view);
return false; // we did not reallocate
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The assignment current_view = dual_view_type (); on line 75 is dangerous when current_view and parent_view are aliased (i.e., they refer to the same DualView object). If they are aliased, clearing current_view will also clear parent_view, causing the subsequent Kokkos::subview calls on parent_view to operate on a null view. Since this function explicitly supports aliasing, this line should be removed.

    auto d_view = Kokkos::subview (parent_view.view_device(), range_type (0, newSize));
    auto h_view = Kokkos::subview (parent_view.view_host(), range_type (0, newSize));
    current_view = Kokkos::DualView<ValueType*, DeviceType>(d_view, h_view);
    return false; // we did not reallocate

Comment on lines +86 to 91
current_view = dual_view_type (); // free first, in order to save memory

// If current size is 0, the DualView's Views likely lack a label.
dv = dual_view_type (curSize == 0 ? newLabel : dv.view_device().label (), newSize);
parent_view = dual_view_type (curSize == 0 ? newLabel : current_view.view_device().label (), newSize);
current_view = parent_view;
return true; // we did reallocate
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The label is accessed from current_view on line 89 after it has been cleared on line 86. This results in the loss of the original label when curSize > 0. Additionally, the assignment current_view = dual_view_type () is redundant as it is overwritten on line 90. The label should be captured before any views are cleared.

    const std::string label = (curSize == 0 ? newLabel : current_view.view_device().label ());
    parent_view = dual_view_type(); // free first, in order to save memory
    // If current size is 0, the DualView's Views likely lack a label.
    parent_view = dual_view_type (label, newSize);
    current_view = parent_view;
    return true; // we did reallocate

Comment on lines +117 to +122
current_view = dual_view_type (); // free first, in order to save memory
parent_view = dual_view_type(); // free first, in order to save memory
// If current size is 0, the DualView's Views likely lack a label.
parent_view = dual_view_type (curSize == 0 ? newLabel : current_view.view_device().label (), newSize);
current_view = parent_view;
return true; // we did reallocate
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Similar to the previous block, the label is lost because current_view is cleared before its label is accessed. The assignment to current_view on line 117 is also redundant.

    const std::string label = (curSize == 0 ? newLabel : current_view.view_device().label ());
    parent_view = dual_view_type(); // free first, in order to save memory
    // If current size is 0, the DualView's Views likely lack a label.
    parent_view = dual_view_type (label, newSize);
    current_view = parent_view;
    return true; // we did reallocate

Comment on lines +2042 to +2044
void DistObject<Packet, LocalOrdinal, GlobalOrdinal, Node>::clearExports() {
this->exports_.realloc(0);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

clearExports() should clear exports_parentView_ as well. If only exports_ is cleared, the underlying cached allocation in exports_parentView_ remains, which fails to free the memory as intended by a 'clear' operation.

void DistObject<Packet, LocalOrdinal, GlobalOrdinal, Node>::clearExports() {
  this->exports_parentView_.realloc(0);
  this->exports_ = this->exports_parentView_;
}


template <class Packet, class LocalOrdinal, class GlobalOrdinal, class Node>
size_t DistObject<Packet, LocalOrdinal, GlobalOrdinal, Node>::getSizeOfExports() const {
return this->exports_.view_device().extent(0);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This should return the size of exports_parentView_ to be consistent with getSizeOfImports(). Returning the size of the exports_ subview only gives the size of the last communication, whereas the user likely expects the total allocated buffer capacity when querying the object's state.

  return this->exports_parentView_.view_device().extent(0);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants