[AIROCMLIR-551] Completely support as_underlying_shape and as_logical_shape#2265
Conversation
Also, note that the code structure is very similar to the one found in the TOSA pipeline, if that makes review easier.
// CHECK-DAG: %[[expanded:.*]] = tensor.expand_shape %[[arg1]]
// CHECK-DAG: %[[expanded_0:.*]] = tensor.expand_shape %[[arg0]]
// CHECK-DAG: linalg.sub ins(%[[expanded_0]], %[[expanded]] {{.*}})
// CHECK-DAG: linalg.sub ins(%[[arg0]], %[[arg1]] {{.*}})
This actually improves the previous IR. Notice that the input shape !migraphx.shaped<16xf32, 1> gets converted into tensor<16xf32>, which is the memory-layout shape. In that case, we don't emit tensor.expand_shape.
Before:
func.func @func_sub(%arg0: tensor<16xf32>, %arg1: tensor<16xf32>) -> tensor<16xf32> {
%expanded = tensor.expand_shape %arg1 [[0]] output_shape [16] : tensor<16xf32> into tensor<16xf32>
%expanded_0 = tensor.expand_shape %arg0 [[0]] output_shape [16] : tensor<16xf32> into tensor<16xf32>
%0 = tensor.empty() : tensor<16xf32>
%1 = linalg.sub ins(%expanded_0, %expanded : tensor<16xf32>, tensor<16xf32>) outs(%0 : tensor<16xf32>) -> tensor<16xf32>
%collapsed = tensor.collapse_shape %1 [[0]] : tensor<16xf32> into tensor<16xf32>
return %collapsed : tensor<16xf32>
}

After:
func.func @func_sub(%arg0: tensor<16xf32>, %arg1: tensor<16xf32>) -> tensor<16xf32> {
%0 = tensor.empty() : tensor<16xf32>
%1 = linalg.sub ins(%arg0, %arg1 : tensor<16xf32>, tensor<16xf32>) outs(%0 : tensor<16xf32>) -> tensor<16xf32>
return %1 : tensor<16xf32>
}

/// %empty = tensor.empty() : ....
/// %inserted_slice = tensor.insert_slice %actual_data into %empty ...
Is the expand-strides case the only one that will produce an empty tensor + insert_slice? This seems quite broad, and potentially susceptible to pattern-matching unwanted scenarios?
This is indeed true, but my understanding is that empty tensor + insert_slice is the same as a rock.expand_strides. rock.expand_strides is the same as extract_slice (of the result) + memcpy, which I think has the same semantics as above.

I think we have to emit an attribute during migraphx-to-linalg if we only want some cases of tensor.insert_slice to be lowered into rock.

Also, I think Copilot gave a pretty good comment on this one as well: I forgot to check the stride, offset, and sizes attributes of tensor.insert_slice.
I think in the TosaToRock version we used a tosa.custom_op so that we weren't just matching empty tensor + insert_slice
I think it is hard to emit one single op for this operation in linalg, because my understanding is that we only want to write to half of the memory; we don't want to load/write the other half.

A similar problem was discussed on LLVM Discourse.
I don't have a problem with using multiple ops to represent this; my main concern is whether there will ever be other situations that emit empty tensor + insert_slice that we don't want to lower to rock.expand_strides.
Here are a few examples:
%empty = tensor.empty() : tensor<8x8xf16>
%result = tensor.insert_slice %src into %empty[2, 0][4, 8][1, 1]
: tensor<4x8xf16> into tensor<8x8xf16>

In this case the data should go at row offset 2, but rock.expand_strides would place it at row 0.
%empty = tensor.empty() : tensor<8x8xf16>
%result = tensor.insert_slice %src into %empty[0, 0][4, 8][2, 1]
: tensor<4x8xf16> into tensor<8x8xf16>

This interleaves source rows with gaps (stride-2 insertion). I believe that rock.expand_strides would do a contiguous copy instead?
I think you have some handling right now for these cases, but this still seems fragile to me. Does it make sense to add some kind of metadata to the insert_slice (or tensor.empty) upon creation?
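For what it's worth, the two counterexamples can be simulated outside MLIR. Below is a small pure-Python sketch (the `insert_slice` helper is hypothetical, written only to mimic `tensor.insert_slice` offset/stride semantics on smaller tensors); it shows why a plain contiguous copy into the destination, roughly what the discussion says `rock.expand_strides` does, would give a different result in both cases.

```python
# Hypothetical helper mimicking tensor.insert_slice on 2-D Python lists:
# writes src into dest at the given offsets/strides, leaving every other
# element of dest untouched.
def insert_slice(src, dest, offsets, strides):
    rows, cols = len(src), len(src[0])
    out = [row[:] for row in dest]
    for i in range(rows):
        for j in range(cols):
            out[offsets[0] + i * strides[0]][offsets[1] + j * strides[1]] = src[i][j]
    return out

src = [[1, 2], [3, 4]]
empty = [[0] * 4 for _ in range(4)]

# Offset case: data lands at row 2, not row 0.
shifted = insert_slice(src, empty, offsets=(2, 0), strides=(1, 1))
assert shifted[2][:2] == [1, 2] and shifted[0][:2] == [0, 0]

# Stride-2 case: source rows are interleaved with untouched rows.
gapped = insert_slice(src, empty, offsets=(0, 0), strides=(2, 1))
assert gapped[0][:2] == [1, 2] and gapped[1] == [0, 0, 0, 0] and gapped[2][:2] == [3, 4]
```

A contiguous copy would instead place `src` starting at element (0, 0) with unit strides, so both `shifted` and `gapped` differ from it.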
I have added an attribute/metadata to the tensor.insert_slice.
mlir/test/Conversion/MIGraphXToLinalg/migraphx-to-linalg-non-contiguous-stride.mlir
return op.emitOpError("unsupported conversion to underlying shape");

// Trivial case, the input tensor is the target memory layout tensor
if (inTensorType == resultTensorType){
Are we missing clang-format here?
// Verify that memoryLayoutType is >= transposedType in all dimensions.
RankedTensorType transposedType =
    cast<RankedTensorType>(transposed.getType());
if (llvm::any_of(llvm::enumerate(memoryLayoutType.getShape(),
This will print an error for every failed dimension, right? I think it would be better to print a message just on the first error.

In that case, I am not sure llvm::any_of is the right tool here. There is a side effect dependent on hasErroredOut, which is not great. Do you think a for loop would be better?

llvm::any_of should do an early exit, I think, so it shouldn't print the error message multiple times.
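Short-circuiting predicates behave the same way in most languages; here is a small Python analogy (names are illustrative, not from the patch) showing that `any` stops at the first failing dimension, so a side-effecting predicate reports only one error:

```python
def check_dims(memory_shape, transposed_shape, report):
    # Like llvm::any_of, Python's any() short-circuits: the predicate
    # (and its error-reporting side effect) stops at the first failure.
    def too_big(i, mem, tr):
        if mem < tr:
            report(f"dim {i}: memory {mem} < transposed {tr}")
            return True
        return False
    return any(too_big(i, m, t)
               for i, (m, t) in enumerate(zip(memory_shape, transposed_shape)))

errors = []
assert check_dims([4, 2, 2], [4, 8, 8], errors.append)
assert errors == ["dim 1: memory 2 < transposed 8"]  # only the first failure
```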
Pull request overview
This PR extends MIGraphX shape materialization to fully support transposed layouts, long/expanded strides, and (for reads) broadcasted strides via improved as_logical_shape / as_underlying_shape lowering, and adds a Rock-side lowering path for expanded-stride materializations.
Changes:
- Update `migraphx.mlir.as.logical.shape` lowering to reshape into memory layout, invert stride permutation via transpose, slice for long strides, and broadcast as needed.
- Update `migraphx.mlir.as.underlying.shape` lowering to transpose into memory layout and materialize long strides via `tensor.insert_slice` into a larger `tensor.empty` (broadcast writes remain unsupported).
- Add a Linalg→Rock conversion to rewrite eligible `tensor.insert_slice`-into-`tensor.empty` patterns into `rock.expand_strides`, and add/adjust lit + e2e tests for non-contiguous / long-stride cases.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| mlir/lib/Conversion/MIGraphXToLinalg/MIGraphXToLinalg.cpp | Implements the new as_logical_shape / as_underlying_shape lowering logic for transpose, long strides, and broadcasting (read-side). |
| mlir/lib/Conversion/LinalgToRock/LinalgToRockPass.cpp | Marks certain tensor.insert_slice patterns dynamically illegal to force conversion into Rock ops. |
| mlir/lib/Conversion/LinalgToRock/LinalgToRock.cpp | Adds a conversion pattern rewriting tensor.insert_slice (into a single-use tensor.empty) into rock.expand_strides. |
| mlir/test/fusion/pr-e2e/mixr-non-contiguous-strides-sub.mlir | Adds an end-to-end regression covering non-contiguous output strides. |
| mlir/test/Conversion/MIGraphXToLinalg/mixr-to-linalg-ops.mlir | Updates existing checks and adds new coverage for transposed, broadcasted, sliced, and combined stride scenarios. |
| mlir/test/Conversion/MIGraphXToLinalg/migraphx-to-linalg-not-implemented.mlir | Removes a no-longer-relevant “broadcast not implemented” negative test. |
| mlir/test/Conversion/MIGraphXToLinalg/migraphx-to-linalg-non-contiguous-stride.mlir | Adds new lit tests for long-stride legalization + error cases. |
| mlir/test/Conversion/LinalgToRock/linalg-to-rock-expand-strides.mlir | Adds lit tests validating tensor.insert_slice → rock.expand_strides lowering. |
Comments suppressed due to low confidence (1)
mlir/lib/Conversion/MIGraphXToLinalg/MIGraphXToLinalg.cpp:83
`hasTranspose |= (from != static_cast<int32_t>(to))` mixes `int64_t`/`size_t` with an `int32_t` cast. This should compare using the same width (e.g., cast `to` to `int64_t`) to avoid truncation and keep the intent clear.
for (auto [to, from] : llvm::enumerate(inversePermutation)) {
permutation[from] = to;
transposedShape[from] = memoryType.getShape()[to];
hasTranspose |= (from != static_cast<int32_t>(to));
}
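The truncation hazard Copilot flags can be demonstrated in isolation; this Python sketch (the `to_int32` helper is made up, emulating a C++ `static_cast<int32_t>` on a 64-bit value) shows a large index whose low 32 bits are zero comparing equal after truncation:

```python
def to_int32(x):
    # Emulate C++ static_cast<int32_t>: keep the low 32 bits, two's complement.
    x &= 0xFFFFFFFF
    return x - 2**32 if x >= 2**31 else x

to = 2**32   # a large 64-bit index (e.g., a size_t from enumerate)
frm = 0      # "from" is a Python keyword, hence the rename

assert frm != to                  # correct 64-bit comparison
assert frm == to_int32(to)        # after truncation, the difference vanishes
assert to_int32(2**31) == -2**31  # truncation can also flip the sign
```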
func.func @mlir_dot_sigmoid(%arg0: !migraphx.shaped<4x5x16xf16, 80x16x1>, %arg1: !migraphx.shaped<4x16x24xf16, 384x24x1>) -> !migraphx.shaped<4x5x24xf16, 288x24x1> attributes {arch = "gfx1201", kernel = "mixr", num_cu = 32 : i64} {
  %0 = migraphx.dot %arg0, %arg1 : <4x5x16xf16, 80x16x1>, <4x16x24xf16, 384x24x1> -> <4x5x24xf16, 120x24x1>
  %1 = migraphx.sigmoid %0 : <4x5x24xf16, 120x24x1> -> <4x5x24xf16, 288x24x1>
  return %1 : !migraphx.shaped<4x5x24xf16, 288x24x1>
}
auto expandOp = rock::ExpandStridesOp::create(rewriter, loc, op.getType(),
                                              adaptor.getSource(), alloc);
rewriter.replaceOp(op, expandOp);
rewriter.eraseOp(tensorEmpty);
Better to assert that tensor::EmptyOp doesn't have any other use before erasing
Done. Using a similar approach to the linalg.generic convolution lowering, we can let dead-code elimination clean this up.
justinrosner left a comment
Overall the changes look good to me now. Just need some resolution on #2265 (comment)
In this case, I made a change in AsUnderlyingShapeConverter to emit rock.is_expand_strides to tell linalg -> rock whether it is an expand_strides.
Co-authored-by: Justin Rosner <justin.rosner@amd.com>
Motivation
Support transposed memory layouts and long strides in both as_underlying_shape and as_logical_shape.
Furthermore, in as_logical_shape, broadcasting is supported.

Technical Details
This PR essentially implements these two commits: #2198 and this one.
as_logical_shape is implemented as the following:
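Conceptually (a sketch of the semantics only, not the actual MLIR lowering; the helper name is made up): reading a !migraphx.shaped value is a strided gather from the flat underlying buffer, where permuted strides model transpose, over-long strides skip padding, and stride 0 models broadcast.

```python
# Sketch of the read-side (as_logical_shape) semantics: gather the logical
# tensor from the flat underlying buffer using the declared shape/strides.
def as_logical(buffer, shape, strides):
    def gather(dim, base):
        if dim == len(shape):
            return buffer[base]
        return [gather(dim + 1, base + i * strides[dim]) for i in range(shape[dim])]
    return gather(0, 0)

buf = list(range(6))  # underlying memory: [0, 1, 2, 3, 4, 5]
assert as_logical(buf, [2, 3], [3, 1]) == [[0, 1, 2], [3, 4, 5]]  # row-major
assert as_logical(buf, [3, 2], [1, 3]) == [[0, 3], [1, 4], [2, 5]]  # transposed
assert as_logical(buf, [2, 2], [3, 0]) == [[0, 0], [3, 3]]          # broadcast dim
assert as_logical(list(range(8)), [2, 2], [4, 1]) == [[0, 1], [4, 5]]  # long stride
```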
as_underlying_shape is implemented as the following:
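The write side can be sketched the same way (again illustrative Python under the PR description's assumptions, not the actual lowering): long strides mean the underlying buffer is larger than the logical element count, so the logical values are scattered into a fresh buffer, matching the tensor.insert_slice-into-tensor.empty pattern discussed above.

```python
# Sketch of the write-side (as_underlying_shape) semantics: scatter the
# logical tensor into a fresh flat buffer at stride-determined positions.
def as_underlying(logical, shape, strides, buffer_size):
    buf = [0] * buffer_size  # stands in for tensor.empty()
    def scatter(dim, base, vals):
        if dim == len(shape):
            buf[base] = vals
            return
        for i in range(shape[dim]):
            scatter(dim + 1, base + i * strides[dim], vals[i])
    scatter(0, 0, logical)
    return buf

# Logical 2x2 written with long row stride 4: rows land at offsets 0 and 4,
# and positions 2-3 of each row stay untouched (padding).
assert as_underlying([[1, 2], [3, 4]], [2, 2], [4, 1], 8) == [1, 2, 0, 0, 3, 4, 0, 0]
```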
Test Plan
Added e2e and lit tests.
Test Result
Both e2e and lit tests pass.
Submission Checklist