Skip to content

[Mirror] 16.2 backports#85

Open
csiefer2 wants to merge 14 commits intodevelopfrom
pr-mirror-15138
Open

[Mirror] 16.2 backports#85
csiefer2 wants to merge 14 commits intodevelopfrom
pr-mirror-15138

Conversation

@csiefer2
Copy link
Copy Markdown
Owner

sebrowne and others added 13 commits November 24, 2025 14:09
Signed-off-by: Samuel E. Browne <sebrown@sandia.gov>
Zoltan_ch_simple_zoltan_serial fails without this flag on GCC14
_without_ MPI enabled.  It was getting added via the strong C warnings
flags, so with DEVELOPMENT_MODE disabled it doesn't get added.  Force it
on here.

Signed-off-by: Samuel E. Browne <sebrown@sandia.gov>
This adds new Tpetra replacements for some sparse 3-tensor utilities
that have now disappeared from the Epetra removal.

Signed-off-by: Eric Phipps <etphipp@sandia.gov>
…okhos_tpetra_utilities

Stokhos:  Add Tpetra utilities to replace now missing Epetra utilities (cherry-pick to 16.2 branch)
* Amesos2: Pardiso with debug statements

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Amesos2: print X after solve

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Amesos2:KLU2 print A, b, & x for debuging

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Amesos2: print map for Pardiso/KLU2 from solve_impl

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Amesos2: turn of debug message for default KLU2 & Pardiso and remove some debugging printf

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Amesos2: switching to std::cout

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Amesos2: comments about debug and message levels

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

---------

Signed-off-by: iyamazaki <iyamaza@sandia.gov>
* Amesos2: Css free memory before symbolic, if needed

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Amesos2: test re-symbolic with same solver object

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Amesos2: cleanup PardisoMKL before re-symbolic

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Amesos2: guard time prints with verbose for pardiso & css with Amesos2_timer

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

---------

Signed-off-by: iyamazaki <iyamaza@sandia.gov>
* Stratimikos : trying to set Amesos2 parameters

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Stratimikos : setting Amesos2 (KLU2) parameter from test

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

* Stratimikos: update comments about Amesos2 parameter setting

Signed-off-by: iyamazaki <iyamaza@sandia.gov>

---------

Signed-off-by: iyamazaki <iyamaza@sandia.gov>
This applies the patch file from the trilinos/github
issue spack/spack-packages#2931

Signed-off-by: Alan Williams <william@sandia.gov>
… trilinoscoupling and tpetra simultaneously' to being 'true'. This change serves the XYCE team in some of its situations (trilinoscouping=ON but tpetra=OFF) where the absence of this conditioning causes the trilinos configuration to fail.

Signed-off-by: Ernesto Prudencio <eeprude@sandia.gov>
…r comparison with 0.0:

TEUCHOS_TEST_FOR_EXCEPTION_CLASS_FUNC(pivot == 0.0, std::runtime_error,...

The variable 'pivot' is a scalar template argument that can be a complex number,
so this comparison should be to the zero provided by Teuchos::ScalarTraits class.

Signed-off-by: Heidi Thornquist <hkthorn@sandia.gov>
Signed-off-by: Christian Glusa <caglusa@sandia.gov>
@github-actions
Copy link
Copy Markdown

CDash for AT1 results [Only accessible from Sandia networks]
CDash for AT2 results [Currently only accessible from Sandia networks]

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request updates Trilinos to version 16.2.0 and introduces several enhancements across packages, including new Tpetra utilities for Stokhos, improved memory management and debugging for Amesos2 solvers, and build system refinements. Feedback highlights the need for proper MPI root-rank guards on new debug print statements in Amesos2 to avoid garbled output. Additionally, improvements are suggested for the efficiency of Tpetra graph construction in Stokhos, specifically regarding row capacity estimation and batching index insertions.

Comment on lines +175 to +187
if(debug_level_ > 0) {
std::cout << " ** Amesos2::KLU2::numericFactorization failed with status = ";
if(data_.common_.status == KLU_OK)
std::cout << "KLU_OK **\n";
else if (data_.common_.status == KLU_SINGULAR)
std::cout << "KLU_SINGULAR **\n";
else if (data_.common_.status == KLU_OUT_OF_MEMORY)
std::cout << "KLU_OUT_OF_MEMORY **\n";
else if (data_.common_.status == KLU_INVALID)
std::cout << "KLU_INVALID **\n";
else if (data_.common_.status == KLU_TOO_LARGE)
std::cout << "KLU_TOO_LARGE **\n";
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This debug block should be guarded by if (this->root_). Since KLU2 is a serial solver, data_.numeric_ will be nullptr on all non-root ranks. Without the guard, every non-root rank will print a failure message to std::cout even if the factorization succeeded on the root rank, leading to confusing and incorrect error logs in parallel environments.

        if(this->root_ && debug_level_ > 0) {
          std::cout << " ** Amesos2::KLU2::numericFactorization failed with status = ";
          if(data_.common_.status == KLU_OK)
            std::cout << "KLU_OK **\n";
          else if (data_.common_.status == KLU_SINGULAR)
            std::cout << "KLU_SINGULAR **\n";
          else if (data_.common_.status == KLU_OUT_OF_MEMORY)
            std::cout << "KLU_OUT_OF_MEMORY **\n";
          else if (data_.common_.status == KLU_INVALID)
            std::cout << "KLU_INVALID **\n";
          else if (data_.common_.status == KLU_TOO_LARGE) 
            std::cout << "KLU_TOO_LARGE **\n";
        }

Comment on lines +221 to +228
if (debug_level_ == 1) {
B->description();
} else {
Teuchos::RCP<Teuchos::FancyOStream> fancy = Teuchos::fancyOStream(Teuchos::rcpFromRef(std::cout));
B->getMap()->describe(*fancy, Teuchos::VERB_EXTREME);
std::cout << std::endl;
B->describe(*fancy, Teuchos::VERB_EXTREME);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The call to B->description() on line 222 is a no-op because it returns a string that is not used. Additionally, the describe block in the else branch is not guarded by this->root_, which will cause all MPI ranks to print to std::cout simultaneously, leading to garbled output in parallel. This issue also occurs at lines 403-410 and 504-508.

    if (this->root_) {
      if (debug_level_ == 1) {
        std::cout << B->description() << std::endl;
      } else {
        Teuchos::RCP<Teuchos::FancyOStream> fancy = Teuchos::fancyOStream(Teuchos::rcpFromRef(std::cout));
        B->getMap()->describe(*fancy, Teuchos::VERB_EXTREME);
        std::cout << std::endl;
        B->describe(*fancy, Teuchos::VERB_EXTREME);
      }
    }

Comment on lines +213 to +220
if (debug_level_ == 1) {
B->description();
} else {
Teuchos::RCP<Teuchos::FancyOStream> fancy = Teuchos::fancyOStream(Teuchos::rcpFromRef(std::cout));
B->getMap()->describe(*fancy, Teuchos::VERB_EXTREME);
std::cout << std::endl;
B->describe(*fancy, Teuchos::VERB_EXTREME);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to the KLU2 implementation, B->description() is a no-op and the describe calls should be guarded by if (this->root_) to avoid interleaved output from multiple MPI ranks. This issue also occurs at lines 279-286 and 555-559.

      if (this->root_) {
        if (debug_level_ == 1) {
          std::cout << B->description() << std::endl;
        } else {
          Teuchos::RCP<Teuchos::FancyOStream> fancy = Teuchos::fancyOStream(Teuchos::rcpFromRef(std::cout));
          B->getMap()->describe(*fancy, Teuchos::VERB_EXTREME);
          std::cout << std::endl;
          B->describe(*fancy, Teuchos::VERB_EXTREME);
        }
      }

Comment on lines +52 to +57
for (typename Cijk_type::ik_iterator k_it = Cijk.k_begin(i_it); k_it != Cijk.k_end(i_it); ++k_it) {
for (typename Cijk_type::ikj_iterator j_it = Cijk.j_begin(k_it); j_it != Cijk.j_end(k_it); ++j_it) {
++num_entry;
}
max_num_entry = (num_entry > max_num_entry) ? num_entry : max_num_entry;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Updating max_num_entry inside the k_it loop is inefficient. It should be updated once per row i after the k_it loop completes. Furthermore, num_entry counts all entries including duplicates across different k values, which can lead to a significant overestimate of the required row capacity in Tpetra::CrsGraph, potentially causing excessive memory allocation.

Suggested change
for (typename Cijk_type::ik_iterator k_it = Cijk.k_begin(i_it); k_it != Cijk.k_end(i_it); ++k_it) {
for (typename Cijk_type::ikj_iterator j_it = Cijk.j_begin(k_it); j_it != Cijk.j_end(k_it); ++j_it) {
++num_entry;
}
max_num_entry = (num_entry > max_num_entry) ? num_entry : max_num_entry;
}
for (typename Cijk_type::ik_iterator k_it = Cijk.k_begin(i_it); k_it != Cijk.k_end(i_it); ++k_it) {
for (typename Cijk_type::ikj_iterator j_it = Cijk.j_begin(k_it); j_it != Cijk.j_end(k_it); ++j_it) {
++num_entry;
}
}
max_num_entry = (num_entry > max_num_entry) ? num_entry : max_num_entry;

Comment on lines +69 to +77
for (typename Cijk_type::i_iterator i_it=Cijk.i_begin(); i_it!=Cijk.i_end(); ++i_it) {
TpetraGlobalOrdinal i = index(i_it);
for (typename Cijk_type::ik_iterator k_it = Cijk.k_begin(i_it); k_it != Cijk.k_end(i_it); ++k_it) {
for (typename Cijk_type::ikj_iterator j_it = Cijk.j_begin(k_it); j_it != Cijk.j_end(k_it); ++j_it) {
TpetraGlobalOrdinal j = index(j_it);
graph->insertGlobalIndices(i, arrayView(&j, 1));
}
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Inserting global indices one by one in a triple loop is highly inefficient for Tpetra::CrsGraph. It is recommended to accumulate all unique column indices for a given row i and perform a single insertGlobalIndices call per row. This also avoids redundant insertions of the same (i, j) pair for different k values.

Signed-off-by: Christian Glusa <caglusa@sandia.gov>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants