Commit 52178f7

Merge pull request #5703 from martin-frbg/changelog0332
Update Changelog for 0.3.32
2 parents: f88aa7d + 6137054

1 file changed (+84, -1 lines)

Changelog.txt

Lines changed: 84 additions & 1 deletion
@@ -1,7 +1,90 @@
 OpenBLAS ChangeLog
+====================================================================
+Version 0.3.32
+23-Mar-2026
+
+general:
+- Moved the preliminary support for a Web Assembly target to its own WASM
+  architecture and WASM128_GENERIC target
+- Fixed a potential performance difference between dedicated compilation for
+  a target and its representation in DYNAMIC_ARCH builds by making additional
+  cpu-specific parameters available to the DYNAMIC_ARCH configuration
+- Fixed the reimplementation of LAPACK ?GESV to conform to the reference (i.e.
+  compute the LU factorization even when NRHS is zero)
+- Improved the error message that is displayed when the compile-time allocation
+  of memory buffers is exceeded
+- Fixed a problem with non-serialized accesses to parallelized SYRK by concurrent
+  callers
+- Fixed an ABI mismatch in the internal version of CDOT/ZDOT used by the C fallback
+  versions of the LAPACK source
+- Improved the f_check script for detecting the Fortran compiler to handle embedded
+  dashes in path names
+- Fixed several memory access issues in the utests that were detected by Address
+  Sanitizer
+- Fixed Makefile errors in cases where only a subset of precision types was selected
+- Fixed missing function errors in Makefile builds without LAPACK or without threads
+- Fixed a syntax error in the benchmarks Makefile
+- Fixed compiler warnings in the CBLAS testsuite
+- Fixed the OpenMP compiler option used with the Intel Ifx compiler
+- Updated the README sections on supported cpus and operating systems, and added
+  notes pertaining to JAVA
+- Updated the documentation page for supported BLAS-like extensions
+- included fixes from the Reference-LAPACK project:
+  - Improved step length selection in the fallback path of ?LAED4
+    (Reference-LAPACK PR 1191)
+  - Rounding up of LWORK and removal of redundant type conversions in the GVD
+    functions (Reference-LAPACK PR 1202)
+  - internal errors were getting ignored in calculation of selected eigenvalues
+    (Reference-LAPACK PR 1204)
+
+arm64:
+- Fixed a potential miscompilation of the SDOT/DDOT/DSDOT kernels
+- Fixed DYNAMIC_ARCH compilation with CMake and compilers lacking SVE support
+- Improved the performance of BGEMM and SBGEMM kernels for Neoverse V2
+- Added optimized SSUM and DSUM kernels for Neoverse N1
+- Added preliminary support for Neoverse V3 cpus as NEOVERSEV2
+- Added cpu autodetection of Cortex A725 and X925 cpus
+- Fixed a CMake build problem with flang on Mac OS
+- Fixed build problems with gcc versions 12 and earlier that do not support fp16
+- Fixed compilation of GEMM kernels for VORTEXM4/ARMV9SME without multithreading
+- Fixed the optimized CDOT/ZDOT kernel to compile with LLVM under Windows on Arm
+- Renamed the copy of the DllMain function used in static linking on MS Windows to
+  OpenBLASDllMain to avoid symbol name conflicts with other libraries
+
+loongarch64:
+- fixed POTRF returning wrong results on LA464 due to a wrong parameter setting
+
+power:
+- Fixed compilation problems caused by missing support for half-precision floats (FP16)
+- Fixed a potential miscompilation of the POWER10 DGEMV kernel by limiting its optimization
+  level
+- Fixed a SCAL issue on PPCG4/PPC970 running Linux
+- Worked around a SCAL issue on PPC970 running FreeBSD by switching to the generic C kernels
+
+riscv64:
+- Optimized the CROT/ZROT kernel for vector length 128 in the non-unit stride path
+- Improved SBGEMM/SHGEMM and related helper functions for type conversion
+- Fixed probing for BFLOAT16 support in DYNAMIC_ARCH cpu detection at runtime
+
+x86_64:
+- Fixed a potential miscompilation (by gcc 15.x) of the AVX512 SGEMM kernel for "small"
+  matrix sizes
+- Fixed the SROT and DROT kernels for Haswell to have consistent (FMA) rounding
+  in the main loop and tail call
+- Added automatic detection of Intel Arrow Lake H/U, Panther Lake and Jasper Lake
+- Added automatic detection of Intel Emerald Rapids and upcoming cpu models
+- Updated the cache size translation table in the cpu model autodetection code
+- Improved cpu detection fallback to also include Nehalem as a non-AVX option
+- Fixed a Makefile build issue with clang and the SkylakeX SGEMM kernel
+- Renamed the copy of the DllMain function used in static linking on MS Windows to
+  OpenBLASDllMain to avoid symbol name conflicts with other libraries
+
+wasm:
+- Added optimized intrinsics kernels for SGEMM and DGEMM as well as DOT, ROT and SUM
+
 ====================================================================
 Version 0.3.31
-15-Jan-2025
+15-Jan-2026
 
 general:
 - reverted a matrix partitioning optimization from 0.3.30 that could lead to
