Releases · ROCm/rocMLIR

Add option to disable rocprof in perfRunner by @mirza-halilcevic in #1975
Enhance README to include attention kernel support by @dorde-antic in #2000
Ensure CI fails without performance report generation by @dorde-antic in #2021
rocmlir-tuning-driver: run compilation passes from scratch by @dhernandez0 in #2013
Add rock.blockwise_load_tile to encapsulate the logic of loads by @dhernandez0 in #1988
Introduce native arch support in arch database by @mirza-halilcevic in #1790
Don't pick gfx1101 on nightly and weekly runs by @dorde-antic in #2024
Lower 1D and 2D backward convolution ops using tosa::CustomOp by @pabloantoniom in #2016
Use tosa.mul for the broadcasting instead of tosa.add by @umangyadav in #2032
fix PrGemmSplitK test to actually use split-k > 1 by @dhernandez0 in #2037
Direct to LDS by @dhernandez0 in #1906
Refactor common tosa functions into tosaUtils by @umangyadav in #2035
Annotate liveness pass (+ reuseLDS pass changes) by @dhernandez0 in #2015
[EXTERNAL][SROA] Add Stored Value Size Check for Tree-Structured Merge by @justinrosner in #2041
Missing rockCompiler dependency on hsa-runtime64 for the find_package() command by @apwojcik in #2038
Add fp4 MFMA instruction and selection logic by @umangyadav in #2045
Check for multiple fusion root ops by @umangyadav in #2042
Add missing direct to LDS tests by @dhernandez0 in #2046
Add g+g problem configs to tier1 model list by @bdevorem in #2023
Tuning problems - onnxruntime examples collected on MI350 by @ahsan-ca in #2001
Add support for backwards Convolution3D lowering using tosa::CustomOp by @pabloantoniom in #2034
Quick tune strix navi48 by @ahsan-ca in #2014
Use rock.blockwise_load_tile for attention by @dhernandez0 in #1980
Add proper CPU lowering for 2D tosa::CustomOp with arbitrary stride, dilation, and input padding by @justinrosner in #2007
Fix minor things to remove warnings and follow coding standard by @dhernandez0 in #2051
Fix BlockwiseGemmAccel bug by @dhernandez0 in #2050
Append GEMM+GEMM, strix navi48, onnxruntime examples on MI350 in appropriate files by @dorde-antic in #2052
Refactor Python files to align with enabled Python tidy and formatter by @dorde-antic in #1997
Trigger Github Actions CI if py files are changed by @dorde-antic in #2058
Flake8 issues after merge by @dorde-antic in #2061
Bug fix related to perfconfig attribute by @dorde-antic in #2065
Fix usage of host-pipeline=highlevel by @justinrosner in #2020
Fix Bwd Data Conv fusion check by @justinrosner in #2039
Tuning speedup by @mirza-halilcevic in #2064
Extend flake8 ignore list by @dorde-antic in #2067
Correct minNumCU for Navi2X by @justinrosner in #2055
Use explicit dtypes in tuning smoke tests by @umangyadav in #2068
Add support for migraphx.equal op by @justinrosner in #2060
Check launch configurations in RockToGPU pass by @pabloantoniom in #2054
Add scale parameter for all Rock Gemm Ops by @umangyadav in #2049
CI: Clean space on agent before running any commands by @leo-amd in #2066
Update jenkins docker to rocm7.0 by @djramic in #1998
Add support to generate scaled GEMM generation from rocmlir-gen by @umangyadav in #2056
Fix nightly CI issues when rocprof is disabled by @mirza-halilcevic in #2057
add multi threaded compilation for rocmlir-tuning-driver by @umangyadav in #2071
Update LSESeqLen1 pattern matching by @justinrosner in #2074
Update TosaToRock for new attention patterns by @justinrosner in #1973
Use explicit elementType in blockwiseLoadTile by @umangyadav in #2078
October upstream merge by @justinrosner in #2027
Add conversion pass to handle gpu.memcpy on 4bits by @umangyadav in #2040
Add gfx950 to Jenkins CI by @djramic in #2006
Add support for nested pipelining by @dhernandez0 in #2069
Refactor BlockwiseMatrixParamsAttr by @dhernandez0 in #2059
Fix addPassThroughIndices for the pos=0 case by @umangyadav in #2083
Limit the number of LIT workers on gfx950 by @justinrosner in #2085
Applicability refactor by @dhernandez0 in #2086
[UPSTREAM] Pull down upstream changes to fix gfx950 functional failure by @justinrosner in #2082
Allow tile size of 16 (m or n) for decoding phase by @dhernandez0 in #2095
Changes to calculate subbyte packed LDS size by @umangyadav in #2081
Add lowering for the FP4 scaled Gemms by @umangyadav in #2077
Add Fp4 lowering from MIGraphX Dialect by @umangyadav in #2089
CI: Workaround for docker pull by @leo-amd in #2100
Pipelining for attention by @dhernandez0 in #1990
Fix MIXR Fp4 splitK Random float tests by @umangyadav in #2102
GQA optimization migraphx integration by @dhernandez0 in #2093
Limit maxGlobalToLDSVectorLen according to available hardware by @justinrosner in #2097
Do not run benchmark-config.mlir if there's no wmma or mfma by @dhernandez0 in #2105
Try kpackPerBlock 2 and 4 for attention by @dhernandez0 in #2107
Fix split-k initialization behavior by @justinrosner in #2103
Add normalization to relDiff computation by @justinrosner in #2104
Add new causal mask pattern match in TosaToRock by @justinrosner in #2080
Fix 'Cannot scavenge register in FI elimination' crash by @justinrosner in #2111
Use bigger MLIR_N_REPEATS, WARMUP_ITERATIONS and SLEEP_US for perfRunner by @dhernandez0 in #2113
Fix num_cu bug in convert-rock-to-gpu by @dhernandez0 in #2114
Modify perfConfigs to tune for nPerWave by @dhernandez0 in #2088
Revert workaround for docker pull by @umangyadav in #2115
Add logic to do instruction cache flushing before doing each benchmarking iteration by @umangyadav in #2110
Enable scheduleV2 for the Fp4 GEMMs by @umangyadav in #2112
Use constrained random range for bf16/f16 E2E tests by @justinrosner in #2116
MIGraphX flash decoding (splitKV) integration by @justinrosner in #2101
Enable DirectToLDS for the FP4 GEMMs by @umangyadav in #2117
Report error when invalid transformMap is created by @umangyadav in #2120
Diff-base detection in Github Actions CI by @dorde-antic in #2122
Fetch number of XCCs via HIP by @mirza-halilcevic in #2048
Fix nonAccel perfconfig by @dhernandez0 in #2124
Quick-tuning parameter lookup improvements by @mirza-halilcevic in #1994
Fix usage of isModuleFusible in rocmlir-tuning-driver by @justinrosner in #2070
Reduce noise in rocmlir-tuning-driver when measuring small kernels by @pabloantoniom in #2119
[EXTERNAL] Skip ROCM Integration tests by @justinrosner in #2134
CI: Fix for reference is not a tree by @leo-amd in #2106
Use constrained range for the scaled gemms on PR tests by @umangyadav in https://gith...

Contributors

dhernandez0, bdevorem, and 10 other contributors

26 Nov 04:47

rocm-ci

rocm-7.1.1

3d7e854

rocm-7.1.1

ROCm release v7.1.1

04 Nov 23:50

causten

rocm-7.1

3d7e854

rocm-7.1

What's Changed

Fix multi buffer test on gfx950 by @djramic in #1912
Update Docker image to 6.4.2 rocm version by @stefankoncarevic in #1916
Fix Docker image tag (rocm6.4 instead of rocm6.4.2) by @stefankoncarevic in #1917
Allow softmax type conversion to happen before or after elementwise ops in attention by @umangyadav in #1911
Remove GPUToMIGraphX passes by @umangyadav in #1921
Find first gemm index after fusing linalg.generic ops by @dhernandez0 in #1922
Obtain new Tier1 tuning problems from MIGraphX by @aarushjain29 in #1873
[CI] Improve error handling and validation in Jenkins pipeline, tuna-script and tuningRunner by @dorde-antic in #1913
[CI] Increases the number of lit workers based on the GPU arch. by @stefankoncarevic in #1919
tuna-script: validate tuning file for the presence of data by @dorde-antic in #1929
Fix llvm::SmallVector error on x86 MSVC by @sooknarine in #1932
Remove redundant attributes from Rock ops by @justinrosner in #1910
Update minCU count for MI308 by @umangyadav in #1927
Introduce new quick tune lists based on Tier1 configs and separated by architecture by @mirza-halilcevic in #1907
Fix wrong check in rocmlir-gen and other bugs in perfRunner by @dhernandez0 in #1936
Parameter Sweeps for Attention: Check all outputs, log failures, avoid kernel repeats by @dorde-antic in #1914
[EXTERNAL] Cherry pick fix for const folding of immediate args by @umangyadav in #1939
Use target branch for the premerge checks by @umangyadav in #1942
[DO NOT SQUASH] rock.global_load_to_lds for direct to LDS by @dhernandez0 in #1905
Align parameterSweeps with new layout handling in perfRunner by @dorde-antic in #1940
Update tests to exclude unsupported tests on Navi2x by @umangyadav in #1943
Change SeqLen to 384 from 1 in accuracy checker scripts by @umangyadav in #1948
Add regularization for multiple linalgs in preSoftmaxBody in Attention Ops by @umangyadav in #1950
Fix multi_buffer LIT test and correct lit.cfg files by @justinrosner in #1952
Fix incorrect fusion check by @justinrosner in #1956
Add backwards data convolution op to MIGraphX dialect by @justinrosner in #1946
Changed node selection by @leo-amd in #1881
Refactor and fix creation of ElementWise Region for Gemm+Gemm like ops by @umangyadav in #1960
Fix rocmlir-gen device selection by @djramic in #1964
Refactor BlockwiseGemmAccelOp to take registers as well by @dhernandez0 in #1926
Upstream merge 56 by @dorde-antic in #1934
Jenkins: Robust SCM checkout (handles shallow fetch) + clearer stage layout by @leo-amd in #1963
Update mixr-gemm-gemm tests for unsupported arch by @justinrosner in #1968
Add ninja compile and link pools by @trixirt in #1953
CI. Do not reboot nodes on stages failures by @leo-amd in #1970
CI: Add Retry Logic for SCM Network Failures by @leo-amd in #1971
CI: Enable Fail-Fast for Parallel Pipeline Stages by @leo-amd in #1972
Use migraphx image in migraphx CI stage by @mirza-halilcevic in #1976
Attention: split-kv implementation by @dhernandez0 in #1895
Add/update getEffects for Rock ops by @justinrosner in #1959
Improve GPU results validation for subnorm values by @justinrosner in #1962
gemm+gemm split-k by @dhernandez0 in #1969
Allow split-k for bwd-weight ops by @justinrosner in #1955
CI: Improve resiliency by retrying stages on agent failure by @leo-amd in #1981
Remove irrelevant outdated examples by @umangyadav in #1985
Implement python script for handling new configs by @dorde-antic in #1924
Remove reverse_grid by @dhernandez0 in #1987
Add remove alloc pass by @justinrosner in #1992
Add TosaToRock support for transpose_conv2d by @justinrosner in #1951
Revert workaround for createFirstGemmNegInfPadding on gfx11 by @dhernandez0 in #1993
Use real data type after input fusions in attention using getInputFusionElementType by @pabloantoniom in #1982
Improve tuning-driver by @mirza-halilcevic in #1966
Update conv creation to use prefill flags by @justinrosner in #1949
Python tidy and formatter by @dorde-antic in #1978
Update Dockerfiles to rocm 7.0 by @djramic in #1991
Group Query Attention (GQA) optimization by @dhernandez0 in #1984
Fix recursion error in parameterSweeps by @justinrosner in #1995
September Upstream merge by @umangyadav in #1974
Add verifier for migraphx.reshape by @justinrosner in #1999
[EXTERNAL] Fix v_mov_b16_t16 index in folding pass by @justinrosner in #2011
Fix silent parameterSweeps errors and issues in V4R1 path by @justinrosner in #2009
Update MI350 quick-tune lists by @mirza-halilcevic in #2008
CI: Exclude f32 Attention Configs for Navi by @dorde-antic in #2003
Move CSE out of MIGraphXToTosaPass by @justinrosner in #2012
Add LIT test for gfx1201 backend bug by @justinrosner in #2018
[EXTERNAL] Undo changes in AMDGPUPromoteAlloca in order to unblock our CI by @pabloantoniom in #2028
[7.1][EXTERNAL][SROA] Add Stored Value Size Check for Tree-Structured Merge by @justinrosner in #2044

New Contributors

@sooknarine made their first contribution in #1932

Full Changelog: rocm-7.0.2...rocm-7.1

Contributors

sooknarine, trixirt, and 10 other contributors

14 Oct 23:59

causten

rocm-7.0.2

d0bcd5c

rocm-7.0.2

What's Changed

[7.0][UPSTREAM BACKPORT] Fix runtime unrolling when cascaded GEPs present by @justinrosner in #1996

Full Changelog: rocm-7.0.1...rocm-7.0.2

Contributors

justinrosner

19 Sep 17:29

causten

rocm-7.0.1

ac10652

rocm-7.0.1

What's Changed

Add E2E test for the OCP Fp8 fused kernel with QuantizeLinear and DeQuantizeLinear by @umangyadav in #1747
[TOSA] Set accType to Float16 for the Fp8 types by @umangyadav in #1745
Remove scheduling barrier hack for LDS barrier lowering by @dhernandez0 in #1749
Fixes for group conv emit-key by @dhernandez0 in #1748
Fix performance for non-standard layouts by @dhernandez0 in #1741
[6.4]Fix bug when both A and B are broadcasted (FoldBroadcast pass) by @dhernandez0 in #1744
[TOSA] Fix accType for the Quant Convolutions as well by @umangyadav in #1752
[6.4] Update gfx12 target in AmdArchDB by @TedThemistokleous in #1746
Add Fp8 to quick-tuning by @djramic in #1753
Add bf16 to tuning runner by @djramic in #1739
Enable output swizzle for multiple outputs by @dhernandez0 in #1750
Use AddDim for unit input dimensions to help getMaxVectorization() by @dhernandez0 in #1755
[DO NOT SQUASH] Enable atomic add bf16 reduction and split-k for Navi4x by @dhernandez0 in #1732
Enable bf16 atomic add for gfx950 by @dhernandez0 in #1734
Add test from SWDEV-518130 by @dhernandez0 in #1757
[6.4]fix compilation with HIP SDK 6.3 for Windows by @apwojcik in #1742
Add lookup for more layouts in PerfRunner and Add an option for verifying each perfConfig with tuningRunner by @umangyadav in #1758
Rocmlir tuning driver datatype fix by @dorde-antic in #1761
[CI] Added gfx942 architecture to the 'Tune MLIR kernels' stage by @stefankoncarevic in #1733
Fix dependency graph creation in RockPipeline and not generate loops with negative iterations by @umangyadav in #1760
Fix GlobalLoad 4b lowering by @dhernandez0 in #1764
Improve performance of quantizelinear for int4 by @dhernandez0 in #1706
Add fp8 convolution to the tuning runner by @djramic in #1738
Introduce perfConfig V3 with param to select different schedule by @umangyadav in #1767
Support for causal attention and more strict checks for KV-Cache by @dhernandez0 in #1770
Fix generateMlirDriverCommandLine for attention in perfRunner by @dhernandez0 in #1773
Remove hasValidChip() from ConvGenerator by @dorde-antic in #1771
Use MLIR based kernels for verification in MIGraphX stage by @umangyadav in #1766
[DO NOT SQUASH] March LLVM upstream merge by @dhernandez0 in #1763
Add requirements.txt file and modify Dockerfile by @dorde-antic in #1776
Add checks for uid and devices by @causten in #1777
Fix Dockerfile URL for requirements.txt by @stefankoncarevic in #1778
Adjust Dockerfile for Separate hip-python Installation by @stefankoncarevic in #1781
Skip unsupported datatypes in perfRunner by @djramic in #1780
Fix initialization for split-k by @dhernandez0 in #1784
Use hip-python API instead of rocm_agent_enumerator by @dorde-antic in #1762
Recover split-k fusion tests removed in last upstream merge by @dhernandez0 in #1785
Add hip-python to requirements.txt and update LLVM version by @dhernandez0 in #1787
Fix split-k fusion when there are two or more consecutive linalg.generic ops by @dhernandez0 in #1782
Remove Machine Names Due to Security Team Advisory by @stefankoncarevic in #1788
Remove fp8 check on nightly CI. by @stefankoncarevic in #1789
[DO NOT SQUASH] upstream merge for sprint 48 by @dhernandez0 in #1786
Move requirements.txt -> pip_requirements.txt due to issues with cget by @dhernandez0 in #1792
Python script for testing metrics and plotting correlations by @dorde-antic in #1769
Fix attention bugs (swap thread and iter when Q LDS is bypassed and bf16 tests) by @dhernandez0 in #1797
Sort Dimensions based on Layout in case of input fusion by @umangyadav in #1793
Fix kernel generation when kernelRepeats are more than 1 by @umangyadav in #1799
Workaround issue 1802 by @dhernandez0 in #1800
Add Gemm+Elementwise+Gemm support by @dhernandez0 in #1774
Add dependencies for rocprofv3 by @djramic in #1801
Remove perfTest from Jenkins by @dhernandez0 in #1803
Add Tier1 model configs to rocMLIR by @dorde-antic in #1794
GEMM+GEMM migraphx integration by @dhernandez0 in #1791
Fix for issue 1802 workaround by @dhernandez0 in #1806
Update MI300 quick-tuning list by @mirza-halilcevic in #1765
gemm+gemm: extend allowed types by @dhernandez0 in #1795
Bump Dockerfiles to rocm-6.4 by @dorde-antic in #1808
Disable code coverage on nightly and weekly CI, and expand it to run on WMMA by @mirza-halilcevic in #1813
Fix grep ROCM_VERSION in Docker image build by @djramic in #1814
Add GEMM scheduleV2 by @umangyadav in #1772
Modify Tier1 models tuning problems by @dorde-antic in #1810
Prepare Jenkinsfiles for rocm-6.4 by @dorde-antic in #1809
Remove unused files by @dhernandez0 in #1804
Update AmdArchDb.cpp with gfx950 target info by @mirza-halilcevic in #1802
Add pybind11 to pip_requirements.txt by @mirza-halilcevic in #1816
Use migraphx.greater instead of migraphx.greater_or_equal by @dhernandez0 in #1827
Change rounding mode for FP32 to Fp16 truncation by @umangyadav in #1833
Implement with_attn_bias in AttentionConfiguration by @dorde-antic in #1834
Add rocprofv3 to perfRunner by @djramic in #1779
Fix rocm version in migraphx CI docker image by @djramic in #1837
Upstream merge sprint 50 by @djramic in #1815
[CI] Set 3600s test timeout and update LIT worker configuration by @stefankoncarevic in #1832
Remove hardcoded value for render group id in Dockerfile by @umangyadav in #1839
add back render group but do not assign GID by @umangyadav in #1843
Causal attention by @dhernandez0 in #1829
Correct rocprof invocation in fusion benchmarking path. by @stefankoncarevic in #1841
conv+gemm support by @dhernandez0 in #1820
Problem config for tier 1 models by @aarushjain29 in #1836
conv+gemm migraphx integration by @dhernandez0 in #1823
Separate new Tier1 tuning problems by @dorde-antic in #1849
Disable test temporarily to pass CI by @umangyadav in #1850
Implement GQA in AttentionConfiguration by @dorde-antic in #1847
Correct layout map access in MLIROnlyConfig by @stefankoncarevic in #1855
Add missing LDS barriers to attention by @dhernandez0 in #1853
Causal masking: migraphx integration by @dhernandez0 in #1831
Updated ATTN_TEST_PARAMETERS in reportUtils.py by @stefankoncarevic in #1858
[CLONE] Add CI node checks and retries. Refactored the pipeline to resolve compilation errors and address incorrect syntax by @umangyadav in #1835
Modify CI to use Tier1 and rotate through configs by @dorde-antic in #1840
Allow retries for failing tests / Remove failing tests by @dorde-antic in #1819
Print rocm version and permissions for /dev/dri /dev/kfd by @umangyadav in https://github.com/ROCm/rocM...

Contributors

causten, dhernandez0, and 9 other contributors

07 Aug 14:56

causten

rocm-6.4.3

88b9b7c

rocm-6.4.3

What's Changed

No changes since rocm-6.4.2

21 Jul 19:55

causten

rocm-6.4.2

88b9b7c

rocm-6.4.2

What's Changed

[6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
[6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
[HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
[BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
[BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825

Full Changelog: rocm-6.4.0...rocm-6.4.2

Contributors

umangyadav and mirza-halilcevic

20 May 15:53

causten

rocm-6.4.1

88b9b7c

rocm-6.4.1

What's Changed

[6.4][BACKPORT] Update MI300 quick-tuning list by @mirza-halilcevic in #1812
[6.4][Backport] Backport some attention bugfixes + causal attention by @umangyadav in #1811
[HOTFIX][BACKPORT] Manually add missing perf config for MI200 to avoid perf regression by @umangyadav in #1818
[BACKPORT] Bump LLVM to pick fixes for Gfx12 Hazards by @umangyadav in #1824
[BACKPORT] Keep python3.6 for SLES, RHEL builds by @umangyadav in #1825

Full Changelog: rocm-6.4.0...rocm-6.4.1

Contributors

umangyadav and mirza-halilcevic
