
[AIROCMLIR-551] Completely support as_underlying_shape and as_logical_shape#2265

Merged
Mr-Anyone merged 17 commits into develop from pr-migraphx-as-underlying-logical on Apr 12, 2026

Conversation

@Mr-Anyone
Member

Motivation

Support transposed memory layouts and long strides in both as_underlying_shape and as_logical_shape.

Furthermore, as_logical_shape supports broadcasting.

Technical Details

This PR essentially implements two earlier commits: #2198 and this one.

as_logical_shape is implemented as follows:

  1. Reshape from the flat memory layout to the memory-layout shape
  2. Compute the stride permutation, and invert it to match the logical shape
  3. If the memory-layout tensor is larger than the logical shape in some dimension, the layout has long strides, so we emit tensor.extract_slice to recover the logical shape
  4. Broadcast if needed
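The permutation step (2) can be sketched in Python. This is an illustrative model, not the PR's C++ code; the helper name and the example strides are made up:

```python
def logical_permutation(strides):
    """Memory-layout order sorts dimensions by descending stride;
    inverting that order gives the transpose that recovers the
    logical shape from the memory-layout tensor."""
    memory_order = sorted(range(len(strides)), key=lambda d: -strides[d])
    inverse = [0] * len(strides)
    for to, frm in enumerate(memory_order):
        inverse[frm] = to
    return memory_order, inverse

# Strides 1x20x4: dim 1 is outermost in memory, then dim 2, then dim 0.
order, inverse = logical_permutation([1, 20, 4])
print(order, inverse)  # [1, 2, 0] [2, 0, 1]
```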

as_underlying_shape is implemented as follows:

  1. Reshape to the memory layout based on the stride permutation
  2. If there are long strides, emit a tensor.insert_slice into a tensor.empty
  3. If there is broadcasting, error out
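The long-stride check in step 2 amounts to comparing the extent the strides address against the element count. A rough Python sketch (the helper name is illustrative, not from the PR), using the 4x5x24 / 288x24x1 shape that appears in one of the new test cases:

```python
def underlying_extent(shape, strides):
    """Return the element extent the underlying buffer must hold, and
    whether the strides leave gaps (a "long stride" layout), in which
    case the lowering emits tensor.insert_slice into a larger
    tensor.empty. Zero strides (broadcast writes) are rejected."""
    if 0 in strides:
        raise ValueError("broadcast is unsupported in as_underlying_shape")
    extent = 1 + sum((d - 1) * s for d, s in zip(shape, strides))
    numel = 1
    for d in shape:
        numel *= d
    return extent, extent > numel

# <4x5x24xf16, 288x24x1>: 984 elements of backing store for 480 values.
print(underlying_extent([4, 5, 24], [288, 24, 1]))  # (984, True)
# Contiguous strides 120x24x1 address exactly numel elements.
print(underlying_extent([4, 5, 24], [120, 24, 1]))  # (480, False)
```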

Test Plan

Added e2e and lit tests.

Test Result

Both the e2e and lit tests passed.

Submission Checklist

@Mr-Anyone Mr-Anyone requested a review from causten as a code owner March 2, 2026 21:15
@Mr-Anyone
Member Author

Mr-Anyone commented Mar 2, 2026

Also note that the code structure is very similar to the one in the TOSA pipeline, which may make review easier.

// CHECK-DAG: %[[expanded:.*]] = tensor.expand_shape %[[arg1]]
// CHECK-DAG: %[[expanded_0:.*]] = tensor.expand_shape %[[arg0]]
// CHECK-DAG: linalg.sub ins(%[[expanded_0]], %[[expanded]] {{.*}})
// CHECK-DAG: linalg.sub ins(%[[arg0]], %[[arg1]] {{.*}})
Member Author

@Mr-Anyone Mr-Anyone Mar 2, 2026


This actually improves the previous IR. Notice that the input shape !migraphx.shaped<16xf32, 1> gets converted into tensor<16xf32>, which is already the memory-layout shape, so we don't emit tensor.expand_shape.

Before:

  func.func @func_sub(%arg0: tensor<16xf32>, %arg1: tensor<16xf32>) -> tensor<16xf32> {
    %expanded = tensor.expand_shape %arg1 [[0]] output_shape [16] : tensor<16xf32> into tensor<16xf32>
    %expanded_0 = tensor.expand_shape %arg0 [[0]] output_shape [16] : tensor<16xf32> into tensor<16xf32>
    %0 = tensor.empty() : tensor<16xf32>
    %1 = linalg.sub ins(%expanded_0, %expanded : tensor<16xf32>, tensor<16xf32>) outs(%0 : tensor<16xf32>) -> tensor<16xf32>
    %collapsed = tensor.collapse_shape %1 [[0]] : tensor<16xf32> into tensor<16xf32>
    return %collapsed : tensor<16xf32>
  }

After:

  func.func @func_sub(%arg0: tensor<16xf32>, %arg1: tensor<16xf32>) -> tensor<16xf32> {
    %0 = tensor.empty() : tensor<16xf32>
    %1 = linalg.sub ins(%arg0, %arg1 : tensor<16xf32>, tensor<16xf32>) outs(%0 : tensor<16xf32>) -> tensor<16xf32>
    return %1 : tensor<16xf32>
  }

Comment on lines +167 to +168
/// %empty = tensor.empty() : ....
/// %inserted_slice = tensor.insert_slice %actual_data into %empty ...
Contributor


Is the expand-strides case the only one that will produce an empty tensor + insert_slice? This seems quite broad, and potentially susceptible to matching unwanted patterns.

Member Author


This is indeed true, but my understanding is that empty tensor + insert_slice has the same semantics as rock.expand_strides: rock.expand_strides is equivalent to extract_slice (of the result) + memcpy.

I think we have to emit an attribute during the migraphx-to-linalg conversion if we only want some cases of tensor.insert_slice to be lowered into rock.

Member Author


Also, Copilot gave a pretty good comment on this one as well; I forgot to check the stride, offset, and sizes attributes on tensor.insert_slice.

Contributor


I think in the TosaToRock version we used a tosa.custom_op so that we weren't just pattern-matching empty tensor + insert_slice.

Member Author


I think it is hard to emit one single op for these operations in linalg, because my understanding is that we only want to write to half of the memory; we don't want to load from or write to the other half.

A similar problem was discussed on LLVM Discourse.

Contributor


I don't have a problem with using multiple ops to represent this, my main concern is if there are ever going to be other situations that emit empty tensor + insert_slice that we don't want to lower to rock.expand_strides.

Here are a few examples:

%empty = tensor.empty() : tensor<8x8xf16>
%result = tensor.insert_slice %src into %empty[2, 0][4, 8][1, 1]
    : tensor<4x8xf16> into tensor<8x8xf16>

In this case the data should go at row offset 2 but rock.expand_strides would place it at row 0.

%empty = tensor.empty() : tensor<8x8xf16>
%result = tensor.insert_slice %src into %empty[0, 0][4, 8][2, 1]
   : tensor<4x8xf16> into tensor<8x8xf16>

This interleaves source rows with gaps (stride-2 insertion). I believe that rock.expand_strides would do a contiguous copy instead?

I think you have some handling right now for these cases, but this still seems fragile to me. Does it make sense to add some kind of metadata to the insert_slice (or tensor.empty) upon creation?
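The two cases above can be modeled in pure Python to make the concern concrete. This is a hypothetical sketch of tensor.insert_slice semantics, not the compiler's code; a contiguous copy starting at row 0 (the worry about rock.expand_strides) would match neither result:

```python
def insert_slice_2d(src, dst_shape, offsets, strides):
    """Insert a 2-D src into a zero-filled dst_shape tensor at the
    given per-dim offsets, stepping by the given per-dim strides."""
    rows, cols = dst_shape
    dst = [[0] * cols for _ in range(rows)]
    for i, row in enumerate(src):
        for j, v in enumerate(row):
            dst[offsets[0] + i * strides[0]][offsets[1] + j * strides[1]] = v
    return dst

src = [[1] * 8 for _ in range(4)]  # stands in for tensor<4x8xf16>

# Case 1: row offset 2 -- data lands in rows 2..5, not at row 0.
a = insert_slice_2d(src, (8, 8), (2, 0), (1, 1))
# Case 2: row stride 2 -- data interleaves into rows 0, 2, 4, 6.
b = insert_slice_2d(src, (8, 8), (0, 0), (2, 1))

print([i for i, r in enumerate(a) if any(r)])  # [2, 3, 4, 5]
print([i for i, r in enumerate(b) if any(r)])  # [0, 2, 4, 6]
```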

Member Author


I have added an attribute/metadata on the tensor.insert_slice.

@Mr-Anyone Mr-Anyone requested a review from justinrosner March 4, 2026 20:39
return op.emitOpError("unsupported conversion to underlying shape");

// Trivial case, the input tensor is the target memory layout tensor
if (inTensorType == resultTensorType){
Contributor


We are missing clang-format?

// Verify that memoryLayoutType is >= transposedType in all dimensions.
RankedTensorType transposedType =
cast<RankedTensorType>(transposed.getType());
if (llvm::any_of(llvm::enumerate(memoryLayoutType.getShape(),
Contributor


This will print an error for every failed dimension right? I think it would be better to print a message just on the first error

Member Author


In that case, I am not sure llvm::any_of is the right tool here; there is a side effect depending on hasErroredOut, which is not great. Do you think a for loop would be better?

Member


llvm::any_of does an early exit, I think, so it shouldn't print the error message multiple times.

Contributor

Copilot AI left a comment


Pull request overview

This PR extends MIGraphX shape materialization to fully support transposed layouts, long/expanded strides, and (for reads) broadcasted strides via improved as_logical_shape / as_underlying_shape lowering, and adds a Rock-side lowering path for expanded-stride materializations.

Changes:

  • Update migraphx.mlir.as.logical.shape lowering to reshape into memory layout, invert stride permutation via transpose, slice for long strides, and broadcast as needed.
  • Update migraphx.mlir.as.underlying.shape lowering to transpose into memory layout and materialize long strides via tensor.insert_slice into a larger tensor.empty (broadcast writes remain unsupported).
  • Add Linalg→Rock conversion to rewrite eligible tensor.insert_slice-into-tensor.empty patterns into rock.expand_strides, and add/adjust lit + e2e tests for non-contiguous / long-stride cases.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

Summary per file:

  • mlir/lib/Conversion/MIGraphXToLinalg/MIGraphXToLinalg.cpp: Implements the new as_logical_shape / as_underlying_shape lowering logic for transpose, long strides, and broadcasting (read-side).
  • mlir/lib/Conversion/LinalgToRock/LinalgToRockPass.cpp: Marks certain tensor.insert_slice patterns dynamically illegal to force conversion into Rock ops.
  • mlir/lib/Conversion/LinalgToRock/LinalgToRock.cpp: Adds a conversion pattern rewriting tensor.insert_slice (into a single-use tensor.empty) into rock.expand_strides.
  • mlir/test/fusion/pr-e2e/mixr-non-contiguous-strides-sub.mlir: Adds an end-to-end regression covering non-contiguous output strides.
  • mlir/test/Conversion/MIGraphXToLinalg/mixr-to-linalg-ops.mlir: Updates existing checks and adds new coverage for transposed, broadcasted, sliced, and combined stride scenarios.
  • mlir/test/Conversion/MIGraphXToLinalg/migraphx-to-linalg-not-implemented.mlir: Removes a no-longer-relevant "broadcast not implemented" negative test.
  • mlir/test/Conversion/MIGraphXToLinalg/migraphx-to-linalg-non-contiguous-stride.mlir: Adds new lit tests for long-stride legalization + error cases.
  • mlir/test/Conversion/LinalgToRock/linalg-to-rock-expand-strides.mlir: Adds lit tests validating the tensor.insert_slice to rock.expand_strides lowering.
Comments suppressed due to low confidence (1)

mlir/lib/Conversion/MIGraphXToLinalg/MIGraphXToLinalg.cpp:83

  • hasTranspose |= (from != static_cast<int32_t>(to)) mixes int64_t/size_t with an int32_t cast. This should compare using the same width (e.g., cast to int64_t) to avoid truncation and keep the intent clear.
  for (auto [to, from] : llvm::enumerate(inversePermutation)) {
    permutation[from] = to;
    transposedShape[from] = memoryType.getShape()[to];
    hasTranspose |= (from != static_cast<int32_t>(to));
  }


@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from 8b142a9 to b951489 Compare March 9, 2026 16:32
@Mr-Anyone Mr-Anyone requested a review from umangyadav March 9, 2026 16:33
@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from b951489 to 8640f0d Compare March 9, 2026 20:01
@Mr-Anyone Mr-Anyone requested a review from umangyadav March 13, 2026 20:37
@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from 8640f0d to eb548f6 Compare March 18, 2026 13:29
@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from eb548f6 to 66fe4ed Compare March 26, 2026 16:02
@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from 66fe4ed to 8a3c02c Compare March 30, 2026 14:53
@Mr-Anyone Mr-Anyone mentioned this pull request Mar 30, 2026
1 task
@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from 8a3c02c to 9729376 Compare April 6, 2026 13:40
Comment on lines +412 to +416
func.func @mlir_dot_sigmoid(%arg0: !migraphx.shaped<4x5x16xf16, 80x16x1>, %arg1: !migraphx.shaped<4x16x24xf16, 384x24x1>) -> !migraphx.shaped<4x5x24xf16, 288x24x1> attributes {arch = "gfx1201", kernel = "mixr", num_cu = 32 : i64} {
%0 = migraphx.dot %arg0, %arg1 : <4x5x16xf16, 80x16x1>, <4x16x24xf16, 384x24x1> -> <4x5x24xf16, 120x24x1>
%1 = migraphx.sigmoid %0 : <4x5x24xf16, 120x24x1> -> <4x5x24xf16, 288x24x1>
return %1 : !migraphx.shaped<4x5x24xf16, 288x24x1>
}
Member Author


This testcase comes from #2225

@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from 379799b to 1dadc1e Compare April 7, 2026 13:48
auto expandOp = rock::ExpandStridesOp::create(rewriter, loc, op.getType(),
adaptor.getSource(), alloc);
rewriter.replaceOp(op, expandOp);
rewriter.eraseOp(tensorEmpty);
Member


Better to assert that tensor::EmptyOp doesn't have any other use before erasing

Member Author


Done. Using a similar approach to the linalg.generic convolution lowering, we can let dead-code elimination clean this up.

@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch 2 times, most recently from dad2540 to f831e77 Compare April 10, 2026 13:32
Contributor

@justinrosner justinrosner left a comment


Overall the changes look good to me now. Just need some resolution on #2265 (comment)

@Mr-Anyone
Member Author

Overall the changes look good to me now. Just need some resolution on #2265 (comment)

In this case, I made a change in AsUnderlyingShapeConverter to emit rock.is_expand_strides to tell the linalg-to-rock conversion whether it is an expand_strides.

@Mr-Anyone Mr-Anyone requested a review from umangyadav April 10, 2026 19:23
@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from 9c08c9e to 3004622 Compare April 10, 2026 19:23
@Mr-Anyone Mr-Anyone requested a review from justinrosner April 10, 2026 19:23
@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from 3004622 to b63c436 Compare April 11, 2026 15:29
@Mr-Anyone Mr-Anyone force-pushed the pr-migraphx-as-underlying-logical branch from 1108dbc to a1374db Compare April 11, 2026 22:05
@Mr-Anyone Mr-Anyone merged commit 321803c into develop Apr 12, 2026
14 of 15 checks passed
@Mr-Anyone Mr-Anyone deleted the pr-migraphx-as-underlying-logical branch April 12, 2026 03:09