GCN: generate machine code via external AMDGPU_LLVM_Backend_jll#857
vchuravy wants to merge 2 commits
Codecov Report

Coverage diff, main vs. #857:

| | main | #857 | +/- |
|---|---|---|---|
| Coverage | 80.32% | 80.27% | -0.06% |
| Files | 25 | 25 | |
| Lines | 4777 | 4810 | +33 |
| Hits | 3837 | 3861 | +24 |
| Misses | 940 | 949 | +9 |
@maleadt did you do anything special to stop older versions of CUDA from using GPUCompiler without NVPTX_LLVM_Backend_jll being loaded?
Add a compat bound to the registry. Looking back at the problems that caused, I think it would be better to temporarily support both the in-process and external LLVM path, deprecating the former until we cut a breaking release.
Mirror the NVPTX_LLVM_Backend_jll approach for AMDGPU: override `mcgen` for `GCNCompilerTarget` to emit machine code through the external, up-to-date `llc` from AMDGPU_LLVM_Backend_jll instead of the in-process LLVM back-end.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Instead of erroring when AMDGPU_LLVM_Backend_jll is not loaded, fall back to the (deprecated) in-process LLVM back-end. This keeps existing consumers working until the external back-end can be required in the next breaking release. A deprecation warning nudges users to load the jll.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
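The fallback behaviour described in this commit message can be pictured with a small sketch. This is not the code from the PR: `select_backend` and the `external_available` flag are hypothetical names used only for illustration, and the real implementation decides inside the `mcgen` override rather than returning a symbol.

```julia
# Hypothetical sketch of "prefer the external back-end, otherwise warn and fall back".
# `select_backend` and its argument are illustrative names, not GPUCompiler API.
function select_backend(external_available::Bool)
    if external_available
        return :external_llc
    end
    # `maxlog=1` keeps the warning from repeating on every compilation.
    @warn "AMDGPU_LLVM_Backend_jll is not loaded; falling back to the deprecated in-process LLVM back-end." maxlog=1
    return :in_process
end
```

Emitting a warning instead of throwing keeps existing consumers working, which is the stated goal until the external back-end can be required in a breaking release.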
Force-pushed from 51874b1 to 73956a4.
    # simulate AMDGPU_LLVM_Backend_jll not being loaded
    pkg = Base.PkgId(Base.UUID("cc5c0156-bd05-5a77-8a68-bb0aafb29019"),
                     "AMDGPU_LLVM_Backend_jll")
    saved = get(Base.loaded_modules, pkg, nothing)
    try
        delete!(Base.loaded_modules, pkg)
        @test !GPUCompiler.isavailable(GPUCompiler.AMDGPU_LLVM_Backend_jll)
@maleadt this is a fairly horrifying way of testing this. Any better ideas?
Mirror the `NVPTX_LLVM_Backend_jll` approach (see `src/ptx.jl`) for AMDGPU: generate GCN machine code through the external, up-to-date `llc` from `AMDGPU_LLVM_Backend_jll` rather than the in-process LLVM back-end.

Changes

- `src/gcn.jl`: add an `AMDGPU_LLVM_Backend_jll` `LazyModule` and an `@unlocked mcgen(::CompilerJob{GCNCompilerTarget}, …)` override that writes the module to a `.bc`, runs the external `llc` (`-mtriple`/`-mcpu=dev_isa`/`-mattr=features`/`--relocation-model=pic`/`-filetype`), surfaces stderr diagnostics, and reads back the asm/object.
- `Project.toml`: `AMDGPU_LLVM_Backend_jll` weakdep + `compat = "22"`.
- `test/Project.toml`: add as a test dep.
- `test/runtests.jl`: load it and skip the `gcn` testsuite when `!AMDGPU_LLVM_Backend_jll.is_available()`.

Notes

- `llvm_machine`/`llvm_datalayout` are kept as-is, so the middle end still requires the in-process AMDGPU back-end (`:AMDGPU in LLVM.backends()`, already guarded by `test/gcn.jl`); only final machine-code emission goes through the external `llc`. Full decoupling would require a hardcoded datalayout, which is awkward because AMDGPU's datalayout already carries non-integral spaces (`…-ni:7:8:9`) that GPUCompiler then extends with `-ni:10:11:12:13`.
- Depends on `AMDGPU_LLVM_Backend_jll` being registered (New package: AMDGPU_LLVM_Backend_jll v22.1.7+0 JuliaRegistries/General#158674); until that merges the project will not resolve.

🤖 Generated with Claude Code
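As a rough illustration of the external `llc` step listed under Changes, the sketch below builds the command line and captures stderr. It is an assumption-laden outline, not the PR's `mcgen` override: `llc_command` and `run_with_diagnostics` are hypothetical helper names, the path to `llc` is passed in as a plain string rather than obtained from the jll, and writing and reading the `.bc` and output files is left out.

```julia
# Hypothetical helpers; names and signatures are illustrative only.

# Build the `llc` invocation from the flags named in the PR description.
function llc_command(llc::AbstractString, input::AbstractString, output::AbstractString;
                     triple::AbstractString, cpu::AbstractString,
                     features::AbstractString="", filetype::Symbol=:obj)
    filetype in (:asm, :obj) || throw(ArgumentError("filetype must be :asm or :obj"))
    args = String[
        "-mtriple=$triple",
        "-mcpu=$cpu",
        "--relocation-model=pic",
        "-filetype=$filetype",
    ]
    # Only pass -mattr when there are features to enable or disable.
    isempty(features) || push!(args, "-mattr=$features")
    return `$llc $args -o $output $input`
end

# Run a command without throwing on failure and return (succeeded, stderr text),
# so the caller can surface diagnostics in its own error message.
function run_with_diagnostics(cmd::Cmd)
    err = IOBuffer()
    proc = run(pipeline(ignorestatus(cmd); stderr=err))
    return success(proc), String(take!(err))
end
```

A caller would write the LLVM module to a temporary `.bc` file, run `run_with_diagnostics(llc_command(...))`, raise an error containing the captured stderr on failure, and otherwise read the output file back as a string (asm) or bytes (object).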