Releases: ROCm/rocMLIR
rocm-7.2.2
ROCm release v7.2
rocm-7.2.1
ROCm release v7.2
rocm-7.2.0
What's Changed
- Add option to disable rocprof in perfRunner by @mirza-halilcevic in #1975
- Enhance README to include attention kernel support by @dorde-antic in #2000
- Ensure CI fails without performance report generation by @dorde-antic in #2021
- rocmlir-tuning-driver: run compilation passes from scratch by @dhernandez0 in #2013
- Add rock.blockwise_load_tile to encapsulate the logic of loads by @dhernandez0 in #1988
- Introduce native arch support in arch database by @mirza-halilcevic in #1790
- Don't pick gfx1101 on nightly and weekly runs by @dorde-antic in #2024
- Lower 1D and 2D backward convolution ops using tosa::CustomOp by @pabloantoniom in #2016
- Use `tosa.mul` for the broadcasting instead of `tosa.add` by @umangyadav in #2032
- fix PrGemmSplitK test to actually use split-k > 1 by @dhernandez0 in #2037
- Direct to LDS by @dhernandez0 in #1906
- Refactor common tosa functions into `tosaUtils` by @umangyadav in #2035
- Annotate liveness pass (+ reuseLDS pass changes) by @dhernandez0 in #2015
- [EXTERNAL][SROA] Add Stored Value Size Check for Tree-Structured Merge by @justinrosner in #2041
- Missing rockCompiler dependency on `hsa-runtime64` for the `find_package()` command by @apwojcik in #2038
- Add fp4 MFMA instruction and selection logic by @umangyadav in #2045
- Check for multiple fusion root ops by @umangyadav in #2042
- Add missing direct to LDS tests by @dhernandez0 in #2046
- Add g+g problem configs to tier1 model list by @bdevorem in #2023
- Tuning problems - onnxruntime examples collected on MI350 by @ahsan-ca in #2001
- Add support for backwards Convolution3D lowering using `tosa::CustomOp` by @pabloantoniom in #2034
- Quick tune strix navi48 by @ahsan-ca in #2014
- Use rock.blockwise_load_tile for attention by @dhernandez0 in #1980
- Add proper CPU lowering for 2D `tosa::CustomOp` with arbitrary stride, dilation, and input padding by @justinrosner in #2007
- Fix minor things to remove warnings and follow coding standard by @dhernandez0 in #2051
- Fix BlockwiseGemmAccel bug by @dhernandez0 in #2050
- Append GEMM+GEMM, strix navi48, onnxruntime examples on MI350 in appropriate files by @dorde-antic in #2052
- Refactor Python files to align with enabled Python tidy and formatter by @dorde-antic in #1997
- Trigger Github Actions CI if py files are changed by @dorde-antic in #2058
- Flake8 issues after merge by @dorde-antic in #2061
- Bug fix related to perfconfig attribute by @dorde-antic in #2065
- Fix usage of `host-pipeline=highlevel` by @justinrosner in #2020
- Fix Bwd Data Conv fusion check by @justinrosner in #2039
- Tuning speedup by @mirza-halilcevic in #2064
- Extend flake8 ignore list by @dorde-antic in #2067
- Correct minNumCU for Navi2X by @justinrosner in #2055
- Use explicit dtypes in tuning smoke tests by @umangyadav in #2068
- Add support for `migraphx.equal` op by @justinrosner in #2060
- Check launch configurations in `RockToGPU` pass by @pabloantoniom in #2054
- Add scale parameter for all Rock Gemm Ops by @umangyadav in #2049
- CI: Clean space on agent before running any commands by @leo-amd in #2066
- Update jenkins docker to rocm7.0 by @djramic in #1998
- Add support to generate scaled GEMM generation from rocmlir-gen by @umangyadav in #2056
- Fix nightly CI issues when rocprof is disabled by @mirza-halilcevic in #2057
- add multi threaded compilation for rocmlir-tuning-driver by @umangyadav in #2071
- Update LSESeqLen1 pattern matching by @justinrosner in #2074
- Update TosaToRock for new attention patterns by @justinrosner in #1973
- Use explicit elementType in blockwiseLoadTile by @umangyadav in #2078
- October upstream merge by @justinrosner in #2027
- Add conversion pass to handle gpu.memcpy on 4bits by @umangyadav in #2040
- Add gfx950 to Jenkins CI by @djramic in #2006
- Add support for nested pipelining by @dhernandez0 in #2069
- Refactor BlockwiseMatrixParamsAttr by @dhernandez0 in #2059
- Fix addPassThroughIndices for the `pos=0` case by @umangyadav in #2083
- Limit the number of LIT workers on gfx950 by @justinrosner in #2085
- Applicability refactor by @dhernandez0 in #2086
- [UPSTREAM] Pull down upstream changes to fix gfx950 functional failure by @justinrosner in #2082
- Allow tile size of 16 (m or n) for decoding phase by @dhernandez0 in #2095
- Changes to calculate subbyte packed LDS size by @umangyadav in #2081
- Add lowering for the FP4 scaled Gemms by @umangyadav in #2077
- Add Fp4 lowering from MIGraphX Dialect by @umangyadav in #2089
- CI: Workaround for docker pull by @leo-amd in #2100
- Pipelining for attention by @dhernandez0 in #1990
- Fix MIXR Fp4 splitK Random float tests by @umangyadav in #2102
- GQA optimization migraphx integration by @dhernandez0 in #2093
- Limit maxGlobalToLDSVectorLen according to available hardware by @justinrosner in #2097
- Do not run benchmark-config.mlir if there's no wmma or mfma by @dhernandez0 in #2105
- Try kpackPerBlock 2 and 4 for attention by @dhernandez0 in #2107
- Fix split-k initialization behavior by @justinrosner in #2103
- Add normalization to relDiff computation by @justinrosner in #2104
- Add new causal mask pattern match in TosaToRock by @justinrosner in #2080
- Fix 'Cannot scavenge register in FI elimination' crash by @justinrosner in #2111
- Use bigger MLIR_N_REPEATS, WARMUP_ITERATIONS and SLEEP_US for perfRunner by @dhernandez0 in #2113
- Fix num_cu bug in convert-rock-to-gpu by @dhernandez0 in #2114
- Modify perfConfigs to tune for nPerWave by @dhernandez0 in #2088
- Revert workaround for docker pull by @umangyadav in #2115
- Add logic to do instruction cache flushing before doing each benchmarking iteration by @umangyadav in #2110
- Enable scheduleV2 for the Fp4 GEMMs by @umangyadav in #2112
- Use constrained random range for bf16/f16 E2E tests by @justinrosner in #2116
- MIGraphX flash decoding (splitKV) integration by @justinrosner in #2101
- Enable DirectToLDS for the FP4 GEMMs by @umangyadav in #2117
- Report error when invalid transformMap is created by @umangyadav in #2120
- Diff-base detection in Github Actions CI by @dorde-antic in #2122
- Fetch number of XCCs via HIP by @mirza-halilcevic in #2048
- Fix nonAccel perfconfig by @dhernandez0 in #2124
- Quick-tuning parameter lookup improvements by @mirza-halilcevic in #1994
- Fix usage of `isModuleFusible` in rocmlir-tuning-driver by @justinrosner in #2070
- Reduce noise in rocmlir-tuning-driver when measuring small kernels by @pabloantoniom in #2119
- [EXTERNAL] Skip ROCM Integration tests by @justinrosner in #2134
- CI: Fix for reference is not a tree by @leo-amd in #2106
- Use constrained range for the scaled gemms on PR tests by @umangyadav in https://gith...
rocm-7.1.1
ROCm release v7.1.1
rocm-7.1
What's Changed
- Fix multi buffer test on gfx950 by @djramic in #1912
- Update Docker image to 6.4.2 rocm version by @stefankoncarevic in #1916
- Fix Docker image tag (rocm6.4 instead of rocm6.4.2) by @stefankoncarevic in #1917
- Allow softmax type conversion to happen before or after elementwise ops in attention by @umangyadav in #1911
- Remove GPUToMIGraphX passes by @umangyadav in #1921
- Find first gemm index after fusing linalg.generic ops by @dhernandez0 in #1922
- Obtain new Tier1 tuning problems from MIGraphX by @aarushjain29 in #1873
- [CI] Improve error handling and validation in Jenkins pipeline, tuna-script and tuningRunner by @dorde-antic in #1913
- [CI] Increases the number of lit workers based on the GPU arch. by @stefankoncarevic in #1919
- tuna-script: validate tuning file for the presence of data by @dorde-antic in #1929
- Fix llvm::SmallVector error on x86 MSVC by @sooknarine in #1932
- Remove redundant attributes from Rock ops by @justinrosner in #1910
- Update minCU count for MI308 by @umangyadav in #1927
- Introduce new quick tune lists based on Tier1 configs and separated by architecture by @mirza-halilcevic in #1907
- Fix wrong check in rocmlir-gen and other bugs in perfRunner by @dhernandez0 in #1936
- Parameter Sweeps for Attention: Check all outputs, log failures, avoid kernel repeats by @dorde-antic in #1914
- [EXTERNAL] Cherry pick fix for const folding of immediate args by @umangyadav in #1939
- Use target branch for the premerge checks by @umangyadav in #1942
- [DO NOT SQUASH] rock.global_load_to_lds for direct to LDS by @dhernandez0 in #1905
- Align parameterSweeps with new layout handling in perfRunner by @dorde-antic in #1940
- Update tests to exclude unsupported tests on Navi2x by @umangyadav in #1943
- Change SeqLen to 384 from 1 in accuracy checker scripts by @umangyadav in #1948
- Add regularization for multiple linalgs in preSoftmaxBody in Attention Ops by @umangyadav in #1950
- Fix multi_buffer LIT test and correct lit.cfg files by @justinrosner in #1952
- Fix incorrect fusion check by @justinrosner in #1956
- Add backwards data convolution op to MIGraphX dialect by @justinrosner in #1946
- Changed node selection by @leo-amd in #1881
- Refactor and fix creation of ElementWise Region for Gemm+Gemm like ops by @umangyadav in #1960
- Fix rocmlir-gen device selection by @djramic in #1964
- Refactor BlockwiseGemmAccelOp to take registers as well by @dhernandez0 in #1926
- Upstream merge 56 by @dorde-antic in #1934
- Jenkins: Robust SCM checkout (handles shallow fetch) + clearer stage layout by @leo-amd in #1963
- Update mixr-gemm-gemm tests for unsupported arch by @justinrosner in #1968
- Add ninja compile and link pools by @trixirt in #1953
- CI. Do not reboot nodes on stages failures by @leo-amd in #1970
- CI: Add Retry Logic for SCM Network Failures by @leo-amd in #1971
- CI: Enable Fail-Fast for Parallel Pipeline Stages by @leo-amd in #1972
- Use migraphx image in migraphx CI stage by @mirza-halilcevic in #1976
- Attention: split-kv implementation by @dhernandez0 in #1895
- Add/update `getEffects` for Rock ops by @justinrosner in #1959
- Improve GPU results validation for subnorm values by @justinrosner in #1962
- gemm+gemm split-k by @dhernandez0 in #1969
- Allow split-k for bwd-weight ops by @justinrosner in #1955
- CI: Improve resiliency by retrying stages on agent failure by @leo-amd in #1981
- Remove irrelevant outdated examples by @umangyadav in #1985
- Implement python script for handling new configs by @dorde-antic in #1924
- Remove reverse_grid by @dhernandez0 in #1987
- Add remove alloc pass by @justinrosner in #1992
- Add TosaToRock support for transpose_conv2d by @justinrosner in #1951
- Revert workaround for createFirstGemmNegInfPadding on gfx11 by @dhernandez0 in #1993
- Use real data type after input fusions in attention using `getInputFusionElementType` by @pabloantoniom in #1982
- Improve tuning-driver by @mirza-halilcevic in #1966
- Update conv creation to use prefill flags by @justinrosner in #1949
- Python tidy and formatter by @dorde-antic in #1978
- Update Dockerfiles to rocm 7.0 by @djramic in #1991
- Group Query Attention (GQA) optimization by @dhernandez0 in #1984
- Fix recursion error in parameterSweeps by @justinrosner in #1995
- September Upstream merge by @umangyadav in #1974
- Add verifier for migraphx.reshape by @justinrosner in #1999
- [EXTERNAL] Fix v_mov_b16_t16 index in folding pass by @justinrosner in #2011
- Fix silent parameterSweeps errors and issues in V4R1 path by @justinrosner in #2009
- Update MI350 quick-tune lists by @mirza-halilcevic in #2008
- CI: Exclude f32 Attention Configs for Navi by @dorde-antic in #2003
- Move CSE out of MIGraphXToTosaPass by @justinrosner in #2012
- Add LIT test for gfx1201 backend bug by @justinrosner in #2018
- [EXTERNAL] Undo changes in AMDGPUPromoteAlloca in order to unblock our CI by @pabloantoniom in #2028
- [7.1][EXTERNAL][SROA] Add Stored Value Size Check for Tree-Structured Merge by @justinrosner in #2044
New Contributors
- @sooknarine made their first contribution in #1932
Full Changelog: rocm-7.0.2...rocm-7.1
rocm-7.0.2
What's Changed
- [7.0][UPSTREAM BACKPORT] Fix runtime unrolling when cascaded GEPs present by @justinrosner in #1996
Full Changelog: rocm-7.0.1...rocm-7.0.2
rocm-7.0.1
What's Changed
- Add E2E test for the OCP Fp8 fused kernel with QuantizeLinear and DeQuantizeLinear by @umangyadav in #1747
- [TOSA] Set `accType` to Float16 for the Fp8 types by @umangyadav in #1745
- Remove scheduling barrier hack for LDS barrier lowering by @dhernandez0 in #1749
- Fixes for group conv emit-key by @dhernandez0 in #1748
- Fix performance for non-standard layouts by @dhernandez0 in #1741
- [6.4]Fix bug when both A and B are broadcasted (FoldBroadcast pass) by @dhernandez0 in #1744
- [TOSA] Fix accType for the Quant Convolutions as well by @umangyadav in #1752
- [6.4] Update gfx12 target in AmdArchDB by @TedThemistokleous in #1746
- Add Fp8 to quick-tuning by @djramic in #1753
- Add bf16 to tuning runner by @djramic in #1739
- Enable output swizzle for multiple outputs by @dhernandez0 in #1750
- Use AddDim for unit input dimensions to help getMaxVectorization() by @dhernandez0 in #1755
- [DO NOT SQUASH] Enable atomic add bf16 reduction and split-k for Navi4x by @dhernandez0 in #1732
- Enable bf16 atomic add for gfx950 by @dhernandez0 in #1734
- Add test from SWDEV-518130 by @dhernandez0 in #1757
- [6.4]fix compilation with HIP SDK 6.3 for Windows by @apwojcik in #1742
- Add lookup for more layouts in PerfRunner and Add an option for verifying each perfConfig with tuningRunner by @umangyadav in #1758
- Rocmlir tuning driver datatype fix by @dorde-antic in #1761
- [CI] Added gfx942 architecture to the 'Tune MLIR kernels' stage by @stefankoncarevic in #1733
- Fix dependency graph creation in RockPipeline and not generate loops with negative iterations by @umangyadav in #1760
- Fix GlobalLoad 4b lowering by @dhernandez0 in #1764
- Improve performance of quantizelinear for int4 by @dhernandez0 in #1706
- Add fp8 convolution to the tuning runner by @djramic in #1738
- Introduce perfConfig V3 with param to select different schedule by @umangyadav in #1767
- Support for causal attention and more strict checks for KV-Cache by @dhernandez0 in #1770
- Fix generateMlirDriverCommandLine for attention in perfRunner by @dhernandez0 in #1773
- Remove hasValidChip() from ConvGenerator by @dorde-antic in #1771
- Use MLIR based kernels for verification in MIGraphX stage by @umangyadav in #1766
- [DO NOT SQUASH] March LLVM upstream merge by @dhernandez0 in #1763
- Add requirements.txt file and modify Dockerfile by @dorde-antic in #1776
- Add checks for uid and devices by @causten in #1777
- Fix Dockerfile URL for requirements.txt by @stefankoncarevic in #1778
- Adjust Dockerfile for Separate hip-python Installation by @stefankoncarevic in #1781
- Skip unsupported datatypes in perfRunner by @djramic in #1780
- Fix initialization for split-k by @dhernandez0 in #1784
- Use hip-python API instead of rocm_agent_enumerator by @dorde-antic in #1762
- Recover split-k fusion tests removed in last upstream merge by @dhernandez0 in #1785
- Add hip-python to requirements.txt and update LLVM version by @dhernandez0 in #1787
- Fix split-k fusion when there are two or more consecutive `linalg.generic` ops by @dhernandez0 in #1782
- Remove Machine Names Due to Security Team Advisory by @stefankoncarevic in #1788
- Remove fp8 check on nightly CI. by @stefankoncarevic in #1789
- [DO NOT SQUASH] upstream merge for sprint 48 by @dhernandez0 in #1786
- Move requirements.txt -> pip_requirements.txt due to issues with cget by @dhernandez0 in #1792
- Python script for testing metrics and plotting correlations by @dorde-antic in #1769
- Fix attention bugs (swap thread and iter when Q LDS is bypassed and bf16 tests) by @dhernandez0 in #1797
- Sort Dimensions based on Layout in case of input fusion by @umangyadav in #1793
- Fix kernel generation when kernelRepeats are more than 1 by @umangyadav in #1799
- Workaround issue 1802 by @dhernandez0 in #1800
- Add Gemm+Elementwise+Gemm support by @dhernandez0 in #1774
- Add dependencies for rocprofv3 by @djramic in #1801
- Remove perfTest from Jenkins by @dhernandez0 in #1803
- Add Tier1 model configs to rocMLIR by @dorde-antic in #1794
- GEMM+GEMM migraphx integration by @dhernandez0 in #1791
- Fix for issue 1802 workaround by @dhernandez0 in #1806
- Update MI300 quick-tuning list by @mirza-halilcevic in #1765
- gemm+gemm: extend allowed types by @dhernandez0 in #1795
- Bump Dockerfiles to rocm-6.4 by @dorde-antic in #1808
- Disable code coverage on nightly and weekly CI, and expand it to run on WMMA by @mirza-halilcevic in #1813
- Fix grep ROCM_VERSION in Docker image build by @djramic in #1814
- Add GEMM scheduleV2 by @umangyadav in #1772
- Modify Tier1 models tuning problems by @dorde-antic in #1810
- Prepare Jenkinsfiles for rocm-6.4 by @dorde-antic in #1809
- Remove unused files by @dhernandez0 in #1804
- Update AmdArchDb.cpp with gfx950 target info by @mirza-halilcevic in #1802
- Add pybind11 to pip_requirements.txt by @mirza-halilcevic in #1816
- Use migraphx.greater instead of migraphx.greater_or_equal by @dhernandez0 in #1827
- Change rounding mode for FP32 to Fp16 truncation by @umangyadav in #1833
- Implement with_attn_bias in AttentionConfiguration by @dorde-antic in #1834
- Add rocprofv3 to perfRunner by @djramic in #1779
- Fix rocm version in migraphx CI docker image by @djramic in #1837
- Upstream merge sprint 50 by @djramic in #1815
- [CI] Set 3600s test timeout and update LIT worker configuration by @stefankoncarevic in #1832
- Remove hardcoded value for render group id in Dockerfile by @umangyadav in #1839
- add back render group but do not assign GID by @umangyadav in #1843
- Causal attention by @dhernandez0 in #1829
- Correct rocprof invocation in fusion benchmarking path. by @stefankoncarevic in #1841
- conv+gemm support by @dhernandez0 in #1820
- Problem config for tier 1 models by @aarushjain29 in #1836
- conv+gemm migraphx integration by @dhernandez0 in #1823
- Separate new Tier1 tuning problems by @dorde-antic in #1849
- Disable test temporarily to pass CI by @umangyadav in #1850
- Implement GQA in AttentionConfiguration by @dorde-antic in #1847
- Correct layout map access in MLIROnlyConfig by @stefankoncarevic in #1855
- Add missing LDS barriers to attention by @dhernandez0 in #1853
- Causal masking: migraphx integration by @dhernandez0 in #1831
- Updated ATTN_TEST_PARAMETERS in reportUtils.py by @stefankoncarevic in #1858
- [CLONE] Add CI node checks and retries. Refactored the pipeline to resolve compilation errors and address incorrect syntax by @umangyadav in #1835
- Modify CI to use Tier1 and rotate through configs by @dorde-antic in #1840
- Allow retries for failing tests / Remove failing tests by @dorde-antic in #1819
- Print rocm version and permissions for `/dev/dri` `/dev/kfd` by @umangyadav in https://github.com/ROCm/rocM...
rocm-6.4.3
What's Changed
- No changes since rocm-6.4.2
rocm-6.4.2
What's Changed
- [6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
- [6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
- [HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
- [BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
- [BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825
Full Changelog: rocm-6.4.0...rocm-6.4.2
rocm-6.4.1
What's Changed
- [6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
- [6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
- [HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
- [BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
- [BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825
Full Changelog: rocm-6.4.0...rocm-6.4.1