|
1 | 1 | OpenBLAS ChangeLog |
| 2 | +==================================================================== |
| 3 | +Version 0.3.32 |
| 4 | +23-Mar-2026 |
| 5 | + |
| 6 | +general: |
| 7 | + - Moved the preliminary support for a Web Assembly target to its own WASM |
| 8 | + architecture and WASM128_GENERIC target |
| 9 | + - Fixed a potential performance difference between dedicated compilation for |
| 10 | + a target and its representation in DYNAMIC_ARCH builds by making additional |
| 11 | + cpu-specific parameters available to the DYNAMIC_ARCH configuration |
| 12 | + - Fixed the reimplementation of LAPACK ?GESV to conform to the reference (i.e. |
| 13 | + compute the LU factorization even when NRHS is zero) |
| 14 | + - Improved the error message that is displayed when the compile-time allocation |
| 15 | + of memory buffers is exceeded |
| 16 | + - Fixed a problem with non-serialized accesses to parallelized SYRK by concurrent |
| 17 | + callers |
| 18 | + - Fixed an ABI mismatch in the internal version of CDOT/ZDOT used by the C fallback |
| 19 | + versions of the LAPACK source |
| 20 | + - Improved the f_check script for detecting the Fortran compiler to handle embedded |
| 21 | + dashes in path names |
| 22 | + - Fixed several memory access issues in the utests that were detected by Address |
| 23 | + Sanitizer |
| 24 | + - Fixed Makefile errors in cases where only a subset of precision types was selected |
| 25 | + - Fixed missing function errors in Makefile builds without LAPACK or without threads |
| 26 | + - Fixed a syntax error in the benchmarks Makefile |
| 27 | + - Fixed compiler warnings in the CBLAS testsuite |
| 28 | + - Fixed the OpenMP compiler option used with the Intel Ifx compiler |
| 29 | + - Updated the README sections on supported cpus and operating systems, and added |
| 30 | + notes pertaining to JAVA |
| 31 | + - Updated the documentation page for supported BLAS-like extensions |
| 32 | + - Included fixes from the Reference-LAPACK project:
| 33 | + - Improved step length selection in the fallback path of ?LAED4 |
| 34 | + (Reference-LAPACK PR 1191) |
| 35 | + - Rounded up LWORK and removed redundant type conversions in the GVD
| 36 | + functions (Reference-LAPACK PR 1202)
| 37 | + - Fixed internal errors being ignored in the calculation of selected eigenvalues
| 38 | + (Reference-LAPACK PR 1204)
| 39 | + |
| 40 | +arm64: |
| 41 | + - Fixed a potential miscompilation of the SDOT/DDOT/DSDOT kernels |
| 42 | + - Fixed DYNAMIC_ARCH compilation with CMake and compilers lacking SVE support |
| 43 | + - Improved the performance of BGEMM and SBGEMM kernels for Neoverse V2 |
| 44 | + - Added optimized SSUM and DSUM kernels for Neoverse N1 |
| 45 | + - Added preliminary support for Neoverse V3 cpus as NEOVERSEV2 |
| 46 | + - Added cpu autodetection of Cortex A725 and X925 cpus |
| 47 | + - Fixed a CMake build problem with flang on Mac OS |
| 48 | + - Fixed build problems with gcc versions 12 and earlier that do not support fp16 |
| 49 | + - Fixed compilation of GEMM kernels for VORTEXM4/ARMV9SME without multithreading |
| 50 | + - Fixed the optimized CDOT/ZDOT kernel to compile with LLVM under Windows on Arm |
| 51 | + - Renamed the copy of the DllMain function used in static linking on MS Windows to |
| 52 | + OpenBLASDllMain to avoid symbol name conflicts with other libraries |
| 53 | + |
| 54 | +loongarch64:
| 55 | + - Fixed POTRF returning wrong results on LA464 due to an incorrect parameter setting
| 56 | + |
| 57 | +power: |
| 58 | + - Fixed compilation problems caused by missing support for half-precision floats (FP16) |
| 59 | + - Fixed a potential miscompilation of the POWER10 DGEMV kernel by limiting its optimization |
| 60 | + level |
| 61 | + - Fixed a SCAL issue on PPCG4/PPC970 running Linux |
| 62 | + - Worked around a SCAL issue on PPC970 running FreeBSD by switching to the generic C kernels |
| 63 | + |
| 64 | +riscv64: |
| 65 | + - Optimized the CROT/ZROT kernel for vector length 128 in the non-unit stride path |
| 66 | + - Improved SBGEMM/SHGEMM and related helper functions for type conversion |
| 67 | + - Fixed probing for BFLOAT16 support in DYNAMIC_ARCH cpu detection at runtime |
| 68 | + |
| 69 | +x86_64: |
| 70 | + - Fixed a potential miscompilation (by gcc 15.x) of the AVX512 SGEMM kernel for "small" |
| 71 | + matrix sizes |
| 72 | + - Fixed the SROT and DROT kernels for Haswell to have consistent (FMA) rounding |
| 73 | + in the main loop and tail call |
| 74 | + - Added automatic detection of Intel Arrow Lake H/U, Panther Lake and Jasper Lake |
| 75 | + - Added automatic detection of Intel Emerald Rapids and upcoming cpu models |
| 76 | + - Updated the cache size translation table in the cpu model autodetection code |
| 77 | + - Improved cpu detection fallback to also include Nehalem as a non-AVX option |
| 78 | + - Fixed a Makefile build issue with clang and the SkylakeX SGEMM kernel |
| 79 | + - Renamed the copy of the DllMain function used in static linking on MS Windows to |
| 80 | + OpenBLASDllMain to avoid symbol name conflicts with other libraries |
| 81 | + |
| 82 | +wasm: |
| 83 | + - Added optimized intrinsics kernels for SGEMM and DGEMM as well as DOT, ROT and SUM |
| 84 | + |
2 | 85 | ==================================================================== |
3 | 86 | Version 0.3.31 |
4 | | -15-Jan-2025 |
| 87 | +15-Jan-2026 |
5 | 88 |
|
6 | 89 | general: |
7 | 90 | - reverted a matrix partitioning optimization from 0.3.30 that could lead to |
|