Conversation
Signed-off-by: Samuel E. Browne <sebrown@sandia.gov>
Zoltan_ch_simple_zoltan_serial fails without this flag on GCC14 _without_ MPI enabled. It was getting added via the strong C warnings flags, so with DEVELOPMENT_MODE disabled it doesn't get added. Force it on here. Signed-off-by: Samuel E. Browne <sebrown@sandia.gov>
Update version(s) for 16.2.0 release
This adds new Tpetra replacements for some sparse 3-tensor utilities that have now disappeared from the Epetra removal. Signed-off-by: Eric Phipps <etphipp@sandia.gov>
…okhos_tpetra_utilities Stokhos: Add Tpetra utilities to replace now missing Epetra utilities (cherry-pick to 16.2 branch)
* Amesos2: Pardiso with debug statements Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Amesos2: print X after solve Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Amesos2:KLU2 print A, b, & x for debuging Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Amesos2: print map for Pardiso/KLU2 from solve_impl Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Amesos2: turn of debug message for default KLU2 & Pardiso and remove some debugging printf Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Amesos2: switching to std::cout Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Amesos2: comments about debug and message levels Signed-off-by: iyamazaki <iyamaza@sandia.gov> --------- Signed-off-by: iyamazaki <iyamaza@sandia.gov>
* Amesos2: Css free memory before symbolic, if needed Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Amesos2: test re-symbolic with same solver object Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Amesos2: cleanup PardisoMKL before re-symbolic Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Amesos2: guard time prints with verbose for pardiso & css with Amesos2_timer Signed-off-by: iyamazaki <iyamaza@sandia.gov> --------- Signed-off-by: iyamazaki <iyamaza@sandia.gov>
* Stratimikos : trying to set Amesos2 parameters Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Stratimikos : setting Amesos2 (KLU2) parameter from test Signed-off-by: iyamazaki <iyamaza@sandia.gov> * Stratimikos: update comments about Amesos2 parameter setting Signed-off-by: iyamazaki <iyamaza@sandia.gov> --------- Signed-off-by: iyamazaki <iyamaza@sandia.gov>
Amesos2 : Patch into release-16-2
This applies the patch file from the trilinos/github issue spack/spack-packages#2931 Signed-off-by: Alan Williams <william@sandia.gov>
… trilinoscoupling and tpetra simultaneously' to being 'true'. This change serves the XYCE team in some of its situations (trilinoscouping=ON but tpetra=OFF) where the absence of this conditioning causes the trilinos configuration to fail. Signed-off-by: Ernesto Prudencio <eeprude@sandia.gov>
…r comparison with 0.0: TEUCHOS_TEST_FOR_EXCEPTION_CLASS_FUNC(pivot == 0.0, std::runtime_error,... The variable 'pivot' is a scalar template argument that can be a complex number, so this comparison should be to the zero provided by Teuchos::ScalarTraits class. Signed-off-by: Heidi Thornquist <hkthorn@sandia.gov>
Signed-off-by: Christian Glusa <caglusa@sandia.gov>
|
CDash for AT1 results [Only accessible from Sandia networks] |
There was a problem hiding this comment.
Code Review
This pull request updates Trilinos to version 16.2.0 and introduces several enhancements across packages, including new Tpetra utilities for Stokhos, improved memory management and debugging for Amesos2 solvers, and build system refinements. Feedback highlights the need for proper MPI root-rank guards on new debug print statements in Amesos2 to avoid garbled output. Additionally, improvements are suggested for the efficiency of Tpetra graph construction in Stokhos, specifically regarding row capacity estimation and batching index insertions.
| if(debug_level_ > 0) { | ||
| std::cout << " ** Amesos2::KLU2::numericFactorization failed with status = "; | ||
| if(data_.common_.status == KLU_OK) | ||
| std::cout << "KLU_OK **\n"; | ||
| else if (data_.common_.status == KLU_SINGULAR) | ||
| std::cout << "KLU_SINGULAR **\n"; | ||
| else if (data_.common_.status == KLU_OUT_OF_MEMORY) | ||
| std::cout << "KLU_OUT_OF_MEMORY **\n"; | ||
| else if (data_.common_.status == KLU_INVALID) | ||
| std::cout << "KLU_INVALID **\n"; | ||
| else if (data_.common_.status == KLU_TOO_LARGE) | ||
| std::cout << "KLU_TOO_LARGE **\n"; | ||
| } |
There was a problem hiding this comment.
This debug block should be guarded by if (this->root_). Since KLU2 is a serial solver, data_.numeric_ will be nullptr on all non-root ranks. Without the guard, every non-root rank will print a failure message to std::cout even if the factorization succeeded on the root rank, leading to confusing and incorrect error logs in parallel environments.
if(this->root_ && debug_level_ > 0) {
std::cout << " ** Amesos2::KLU2::numericFactorization failed with status = ";
if(data_.common_.status == KLU_OK)
std::cout << "KLU_OK **\n";
else if (data_.common_.status == KLU_SINGULAR)
std::cout << "KLU_SINGULAR **\n";
else if (data_.common_.status == KLU_OUT_OF_MEMORY)
std::cout << "KLU_OUT_OF_MEMORY **\n";
else if (data_.common_.status == KLU_INVALID)
std::cout << "KLU_INVALID **\n";
else if (data_.common_.status == KLU_TOO_LARGE)
std::cout << "KLU_TOO_LARGE **\n";
}| if (debug_level_ == 1) { | ||
| B->description(); | ||
| } else { | ||
| Teuchos::RCP<Teuchos::FancyOStream> fancy = Teuchos::fancyOStream(Teuchos::rcpFromRef(std::cout)); | ||
| B->getMap()->describe(*fancy, Teuchos::VERB_EXTREME); | ||
| std::cout << std::endl; | ||
| B->describe(*fancy, Teuchos::VERB_EXTREME); | ||
| } |
There was a problem hiding this comment.
The call to B->description() on line 222 is a no-op because it returns a string that is not used. Additionally, the describe block in the else branch is not guarded by this->root_, which will cause all MPI ranks to print to std::cout simultaneously, leading to garbled output in parallel. This issue also occurs at lines 403-410 and 504-508.
if (this->root_) {
if (debug_level_ == 1) {
std::cout << B->description() << std::endl;
} else {
Teuchos::RCP<Teuchos::FancyOStream> fancy = Teuchos::fancyOStream(Teuchos::rcpFromRef(std::cout));
B->getMap()->describe(*fancy, Teuchos::VERB_EXTREME);
std::cout << std::endl;
B->describe(*fancy, Teuchos::VERB_EXTREME);
}
}| if (debug_level_ == 1) { | ||
| B->description(); | ||
| } else { | ||
| Teuchos::RCP<Teuchos::FancyOStream> fancy = Teuchos::fancyOStream(Teuchos::rcpFromRef(std::cout)); | ||
| B->getMap()->describe(*fancy, Teuchos::VERB_EXTREME); | ||
| std::cout << std::endl; | ||
| B->describe(*fancy, Teuchos::VERB_EXTREME); | ||
| } |
There was a problem hiding this comment.
Similar to the KLU2 implementation, B->description() is a no-op and the describe calls should be guarded by if (this->root_) to avoid interleaved output from multiple MPI ranks. This issue also occurs at lines 279-286 and 555-559.
if (this->root_) {
if (debug_level_ == 1) {
std::cout << B->description() << std::endl;
} else {
Teuchos::RCP<Teuchos::FancyOStream> fancy = Teuchos::fancyOStream(Teuchos::rcpFromRef(std::cout));
B->getMap()->describe(*fancy, Teuchos::VERB_EXTREME);
std::cout << std::endl;
B->describe(*fancy, Teuchos::VERB_EXTREME);
}
}| for (typename Cijk_type::ik_iterator k_it = Cijk.k_begin(i_it); k_it != Cijk.k_end(i_it); ++k_it) { | ||
| for (typename Cijk_type::ikj_iterator j_it = Cijk.j_begin(k_it); j_it != Cijk.j_end(k_it); ++j_it) { | ||
| ++num_entry; | ||
| } | ||
| max_num_entry = (num_entry > max_num_entry) ? num_entry : max_num_entry; | ||
| } |
There was a problem hiding this comment.
Updating max_num_entry inside the k_it loop is inefficient. It should be updated once per row i after the k_it loop completes. Furthermore, num_entry counts all entries including duplicates across different k values, which can lead to a significant overestimate of the required row capacity in Tpetra::CrsGraph, potentially causing excessive memory allocation.
| for (typename Cijk_type::ik_iterator k_it = Cijk.k_begin(i_it); k_it != Cijk.k_end(i_it); ++k_it) { | |
| for (typename Cijk_type::ikj_iterator j_it = Cijk.j_begin(k_it); j_it != Cijk.j_end(k_it); ++j_it) { | |
| ++num_entry; | |
| } | |
| max_num_entry = (num_entry > max_num_entry) ? num_entry : max_num_entry; | |
| } | |
| for (typename Cijk_type::ik_iterator k_it = Cijk.k_begin(i_it); k_it != Cijk.k_end(i_it); ++k_it) { | |
| for (typename Cijk_type::ikj_iterator j_it = Cijk.j_begin(k_it); j_it != Cijk.j_end(k_it); ++j_it) { | |
| ++num_entry; | |
| } | |
| } | |
| max_num_entry = (num_entry > max_num_entry) ? num_entry : max_num_entry; |
| for (typename Cijk_type::i_iterator i_it=Cijk.i_begin(); i_it!=Cijk.i_end(); ++i_it) { | ||
| TpetraGlobalOrdinal i = index(i_it); | ||
| for (typename Cijk_type::ik_iterator k_it = Cijk.k_begin(i_it); k_it != Cijk.k_end(i_it); ++k_it) { | ||
| for (typename Cijk_type::ikj_iterator j_it = Cijk.j_begin(k_it); j_it != Cijk.j_end(k_it); ++j_it) { | ||
| TpetraGlobalOrdinal j = index(j_it); | ||
| graph->insertGlobalIndices(i, arrayView(&j, 1)); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Inserting global indices one by one in a triple loop is highly inefficient for Tpetra::CrsGraph. It is recommended to accumulate all unique column indices for a given row i and perform a single insertGlobalIndices call per row. This also avoids redundant insertions of the same (i, j) pair for different k values.
Signed-off-by: Christian Glusa <caglusa@sandia.gov>
Automated mirror of upstream PR trilinos#15138 @trilinos/
Motivation
Backport changes from develop to the 16.2 release branch in preparation for the 16.2 patch release