Add _throw_dmrs device override for reshape of views#3095

Open
Abdelrahman912 wants to merge 2 commits into JuliaGPU:master from Abdelrahman912:add_reshape_view_dispatch

@Abdelrahman912
Contributor

Problem

reshape(@view(data[1:n*n]), (n, n)) fails to compile on the GPU. @view creates a SubArray, which has no specialized reshape method on the device, so the call falls back to Base's generic _reshape. That path calls _throw_dmrs, which builds its DimensionMismatch message by string interpolation, and constructing a string at run time is unsupported on the GPU. The compiler reports:

Reason: unsupported call to an external C function
Stacktrace:
 [1] _string_n
   @ ./strings/string.jl:109
 [2] StringMemory
   @ ./iobuffer.jl:167
 [3] dec
   @ ./intfuncs.jl:918
 [4] #string#403
   @ ./intfuncs.jl:1000
 [5] multiple call sites
   @ unknown:0
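
The dispatch path described above can be observed on the host. The following is a CPU-only sketch with plain Arrays (the names data and n follow the snippet in the description; their values here are assumptions chosen for illustration). It does not reproduce the GPU compilation failure, only the types involved and the error branch that triggers it.

```julia
n = 3
data = collect(Float32, 1:16)

v = @view data[1:n*n]      # a SubArray over the first n*n elements
A = reshape(v, (n, n))     # no specialized method, so Base's generic path is used

println(typeof(v) <: SubArray)
println(typeof(A) <: Base.ReshapedArray)

# A mismatched size takes the error branch, which is where the
# interpolated DimensionMismatch message gets built.
err = try
    reshape(v, (n, n + 1))
catch e
    e
end
println(err isa DimensionMismatch)
```

On the host the error branch is harmless because strings can be allocated freely. On the device the branch must still compile even when it is never taken, which is why the failure appears for valid sizes as well.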

Base's generic fallback and its error helper:

function _reshape(parent::AbstractArray, dims::Dims)
  n = length(parent)
  prod(dims) == n || _throw_dmrs(n, "size", dims)
  __reshape((parent, IndexStyle(parent)), dims)
end

@noinline function _throw_dmrs(n, str, dims)
  # The interpolated message requires run-time String construction, which is what fails on the GPU.
  throw(DimensionMismatch("parent has $n elements, which is incompatible with $str $dims"))
end
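
One way to keep such an error path free of run-time string construction is to throw with a constant message. The sketch below runs on the CPU and is not the PR's actual diff: throw_dmrs_static and checked_reshape are hypothetical names introduced here, and whether a constant-message throw is sufficient on the device depends on how the GPU compiler lowers exceptions.

```julia
# Hypothetical stand-in for Base._throw_dmrs. The message is a constant,
# so no integers or tuples are converted to strings at run time.
@noinline function throw_dmrs_static(n, str, dims)
    throw(DimensionMismatch("parent length is incompatible with the requested dims"))
end

# Hypothetical mirror of the size check in Base's generic _reshape,
# routed through the helper above.
function checked_reshape(parent::AbstractArray, dims::Dims)
    n = length(parent)
    prod(dims) == n || throw_dmrs_static(n, "size", dims)
    return reshape(parent, dims)
end

data = collect(1.0:16.0)
A = checked_reshape(view(data, 1:9), (3, 3))   # sizes agree, no error path taken

caught = try
    checked_reshape(view(data, 1:8), (3, 3))   # 8 elements cannot fill a 3x3 array
    nothing
catch err
    err
end
```

The trade-off is a less informative message: the element count and the requested dims no longer appear in the error text.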

@codecov

codecov bot commented Apr 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.42%. Comparing base (6ccd4b4) to head (774cffc).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3095      +/-   ##
==========================================
- Coverage   90.43%   90.42%   -0.01%     
==========================================
  Files         141      141              
  Lines       12025    12025              
==========================================
- Hits        10875    10874       -1     
- Misses       1150     1151       +1     

Contributor

@github-actions github-actions bot left a comment

CUDA.jl Benchmarks

| Benchmark suite | Current: 774cffc | Previous: 6ccd4b4 | Ratio |
|---|---|---|---|
| array/accumulate/Float32/1d | 101843 ns | 101723 ns | 1.00 |
| array/accumulate/Float32/dims=1 | 77138 ns | 76608 ns | 1.01 |
| array/accumulate/Float32/dims=1L | 1585554 ns | 1585294 ns | 1.00 |
| array/accumulate/Float32/dims=2 | 144144 ns | 143948 ns | 1.00 |
| array/accumulate/Float32/dims=2L | 657967.5 ns | 657945 ns | 1.00 |
| array/accumulate/Int64/1d | 118678 ns | 118967 ns | 1.00 |
| array/accumulate/Int64/dims=1 | 80133 ns | 79956 ns | 1.00 |
| array/accumulate/Int64/dims=1L | 1706215 ns | 1694445 ns | 1.01 |
| array/accumulate/Int64/dims=2 | 156239.5 ns | 156040 ns | 1.00 |
| array/accumulate/Int64/dims=2L | 962068 ns | 961840 ns | 1.00 |
| array/broadcast | 20708 ns | 20347 ns | 1.02 |
| array/construct | 1330.4 ns | 1311.9 ns | 1.01 |
| array/copy | 19016 ns | 18931 ns | 1.00 |
| array/copyto!/cpu_to_gpu | 215947 ns | 215113 ns | 1.00 |
| array/copyto!/gpu_to_cpu | 284326 ns | 283517 ns | 1.00 |
| array/copyto!/gpu_to_gpu | 11431.5 ns | 11647 ns | 0.98 |
| array/iteration/findall/bool | 131568 ns | 132615 ns | 0.99 |
| array/iteration/findall/int | 149234 ns | 149623 ns | 1.00 |
| array/iteration/findfirst/bool | 81883.5 ns | 82175 ns | 1.00 |
| array/iteration/findfirst/int | 83535.5 ns | 84437 ns | 0.99 |
| array/iteration/findmin/1d | 89031.5 ns | 87647 ns | 1.02 |
| array/iteration/findmin/2d | 117635 ns | 117309 ns | 1.00 |
| array/iteration/logical | 200232.5 ns | 203627.5 ns | 0.98 |
| array/iteration/scalar | 67840 ns | 68729 ns | 0.99 |
| array/permutedims/2d | 52486 ns | 52820 ns | 0.99 |
| array/permutedims/3d | 53326 ns | 52914 ns | 1.01 |
| array/permutedims/4d | 52208 ns | 51983 ns | 1.00 |
| array/random/rand/Float32 | 13239 ns | 13104 ns | 1.01 |
| array/random/rand/Int64 | 37304 ns | 37312 ns | 1.00 |
| array/random/rand!/Float32 | 8615 ns | 8603.333333333334 ns | 1.00 |
| array/random/rand!/Int64 | 34462 ns | 34156 ns | 1.01 |
| array/random/randn/Float32 | 44189.5 ns | 38723.5 ns | 1.14 |
| array/random/randn!/Float32 | 31115 ns | 31520 ns | 0.99 |
| array/reductions/mapreduce/Float32/1d | 34760 ns | 35427 ns | 0.98 |
| array/reductions/mapreduce/Float32/dims=1 | 49713 ns | 49562 ns | 1.00 |
| array/reductions/mapreduce/Float32/dims=1L | 52089 ns | 51766 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2 | 57176 ns | 56838 ns | 1.01 |
| array/reductions/mapreduce/Float32/dims=2L | 70359 ns | 69604 ns | 1.01 |
| array/reductions/mapreduce/Int64/1d | 43204 ns | 43423 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=1 | 42820 ns | 44694.5 ns | 0.96 |
| array/reductions/mapreduce/Int64/dims=1L | 87995 ns | 87805 ns | 1.00 |
| array/reductions/mapreduce/Int64/dims=2 | 59706 ns | 60051.5 ns | 0.99 |
| array/reductions/mapreduce/Int64/dims=2L | 85032 ns | 85186 ns | 1.00 |
| array/reductions/reduce/Float32/1d | 35128 ns | 35458 ns | 0.99 |
| array/reductions/reduce/Float32/dims=1 | 40488 ns | 46307.5 ns | 0.87 |
| array/reductions/reduce/Float32/dims=1L | 52249 ns | 52046 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2 | 57013 ns | 57117 ns | 1.00 |
| array/reductions/reduce/Float32/dims=2L | 69997 ns | 70127.5 ns | 1.00 |
| array/reductions/reduce/Int64/1d | 43669 ns | 41220 ns | 1.06 |
| array/reductions/reduce/Int64/dims=1 | 44428.5 ns | 51895 ns | 0.86 |
| array/reductions/reduce/Int64/dims=1L | 87860 ns | 87739 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2 | 59799 ns | 59630 ns | 1.00 |
| array/reductions/reduce/Int64/dims=2L | 84792 ns | 84743.5 ns | 1.00 |
| array/reverse/1d | 18598 ns | 18349 ns | 1.01 |
| array/reverse/1dL | 69158 ns | 68960 ns | 1.00 |
| array/reverse/1dL_inplace | 65919 ns | 65909 ns | 1.00 |
| array/reverse/1d_inplace | 10278 ns | 8540.833333333332 ns | 1.20 |
| array/reverse/2d | 20844 ns | 20881 ns | 1.00 |
| array/reverse/2dL | 72918.5 ns | 72996 ns | 1.00 |
| array/reverse/2dL_inplace | 65969 ns | 65926 ns | 1.00 |
| array/reverse/2d_inplace | 11211 ns | 10076 ns | 1.11 |
| array/sorting/1d | 2733057 ns | 2735188.5 ns | 1.00 |
| array/sorting/2d | 1074904 ns | 1069206 ns | 1.01 |
| array/sorting/by | 3301743 ns | 3304125.5 ns | 1.00 |
| cuda/synchronization/context/auto | 1167.8 ns | 1176.2 ns | 0.99 |
| cuda/synchronization/context/blocking | 921.0277777777778 ns | 924.5869565217391 ns | 1.00 |
| cuda/synchronization/context/nonblocking | 8133.8 ns | 6942.1 ns | 1.17 |
| cuda/synchronization/stream/auto | 997.7333333333333 ns | 999.9375 ns | 1.00 |
| cuda/synchronization/stream/blocking | 802.5567010309278 ns | 787.7961165048544 ns | 1.02 |
| cuda/synchronization/stream/nonblocking | 7348.2 ns | 7168.6 ns | 1.03 |
| integration/byval/reference | 144066 ns | 143982 ns | 1.00 |
| integration/byval/slices=1 | 146060 ns | 145868 ns | 1.00 |
| integration/byval/slices=2 | 284760 ns | 284528 ns | 1.00 |
| integration/byval/slices=3 | 423409 ns | 422970 ns | 1.00 |
| integration/cudadevrt | 102664 ns | 102612 ns | 1.00 |
| integration/volumerhs | 9442040.5 ns | 9440461 ns | 1.00 |
| kernel/indexing | 13355 ns | 13181 ns | 1.01 |
| kernel/indexing_checked | 14001 ns | 14081 ns | 0.99 |
| kernel/launch | 2113 ns | 2150.777777777778 ns | 0.98 |
| kernel/occupancy | 670.566037735849 ns | 672 ns | 1.00 |
| kernel/rand | 14516 ns | 14396 ns | 1.01 |
| latency/import | 3809768988.5 ns | 3814290062.5 ns | 1.00 |
| latency/precompile | 4591409363 ns | 4590207670.5 ns | 1.00 |
| latency/ttfp | 4386579833 ns | 4409319020 ns | 0.99 |

This comment was automatically generated by workflow using github-action-benchmark.
