This repository was archived by the owner on Apr 8, 2026. It is now read-only.

WGPU backend for cross-platform GPU compression (Metal, Vulkan, WebGPU) #2

@noahgift

Description

Summary

Add a WGPU backend alongside the existing cudarc backend to enable GPU-accelerated compression on all major platforms.

Current State

trueno-zram GPU support:
├── cudarc (CUDA) → NVIDIA Linux only
└── No Mac, AMD, Intel, or browser support

Proposed State

trueno-zram GPU backends:
├── cudarc (CUDA) → NVIDIA Linux/Windows (existing)
├── wgpu (Metal) → macOS (Apple Silicon + Intel)
├── wgpu (Vulkan) → Linux AMD/Intel, Android
├── wgpu (DX12) → Windows AMD/Intel
└── wgpu (WebGPU) → Browsers (WASM)

Target Platforms

Platform               GPU API               Status
Linux + NVIDIA         CUDA                  ✅ Implemented
Linux + AMD/Intel      Vulkan via WGPU       🔲 Planned
macOS (Intel)          Metal via WGPU        🔲 Planned
macOS (Apple Silicon)  Metal via WGPU        🔲 Planned
Windows                DX12/Vulkan via WGPU  🔲 Planned
Browser/WASM           WebGPU via WGPU       🔲 Planned

Implementation

Dependencies

[dependencies]
wgpu = "23"
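
Since the acceptance criteria call for a wgpu feature flag, the dependency would likely be optional and gated. A sketch of that Cargo.toml shape (the feature name `gpu-wgpu` is illustrative, not decided):

```toml
[dependencies]
wgpu = { version = "23", optional = true }

[features]
# Hypothetical flag name: enables the portable WGPU backend;
# the existing CUDA path stays behind its own flag.
gpu-wgpu = ["dep:wgpu"]
```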

Backend Selection

pub enum GpuBackend {
    Cuda(CudaContext),      // Existing
    Wgpu(wgpu::Device),     // New
}

impl GpuBackend {
    pub fn auto_detect() -> Option<Self> {
        // 1. Try CUDA first (fastest on NVIDIA)
        if let Ok(cuda) = CudaContext::new() {
            return Some(GpuBackend::Cuda(cuda));
        }
        // 2. Fall back to WGPU (Metal, Vulkan, DX12)
        if let Ok(wgpu) = WgpuContext::new() {
            return Some(GpuBackend::Wgpu(wgpu));
        }
        None
    }
}
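
The fallback chain above can be exercised without GPU hardware. A minimal self-contained sketch, with stand-in constructors for the issue's `CudaContext`/`WgpuContext` (both hypothetical here; CUDA is made to "fail" so detection lands on the WGPU branch):

```rust
// Stand-ins for the real context types; construction results simulate a
// machine with no NVIDIA driver but a working WGPU adapter.
struct CudaContext;
struct WgpuContext;

impl CudaContext {
    fn new() -> Result<Self, &'static str> {
        Err("no NVIDIA driver") // pretend CUDA is unavailable
    }
}

impl WgpuContext {
    fn new() -> Result<Self, &'static str> {
        Ok(WgpuContext) // pretend a WGPU adapter was found
    }
}

enum GpuBackend {
    Cuda(CudaContext),
    Wgpu(WgpuContext),
}

impl GpuBackend {
    fn auto_detect() -> Option<Self> {
        // CUDA first (fastest on NVIDIA), then the portable WGPU path.
        if let Ok(cuda) = CudaContext::new() {
            return Some(GpuBackend::Cuda(cuda));
        }
        if let Ok(wgpu) = WgpuContext::new() {
            return Some(GpuBackend::Wgpu(wgpu));
        }
        None // caller falls back to CPU compression
    }
}

fn main() {
    match GpuBackend::auto_detect() {
        Some(GpuBackend::Cuda(_)) => println!("backend: cuda"),
        Some(GpuBackend::Wgpu(_)) => println!("backend: wgpu"),
        None => println!("backend: cpu"),
    }
}
```

If `auto_detect` returns `None`, compression stays on the existing CPU path.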

Compute Shader (WGSL)

@group(0) @binding(0) var<storage, read> input_pages: array<u32>;
@group(0) @binding(1) var<storage, read_write> output: array<u32>;

@compute @workgroup_size(256)
fn lz4_compress(@builtin(global_invocation_id) id: vec3<u32>) {
    let page_idx = id.x;
    // One 4 KiB page = 1024 u32 words; guard stray invocations in the
    // final workgroup when the page count is not a multiple of 256.
    let num_pages = arrayLength(&input_pages) / 1024u;
    if (page_idx >= num_pages) {
        return;
    }
    // LZ4 compression kernel body goes here (one invocation per page).
}
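
With `@workgroup_size(256)` and one invocation per page, the host side dispatches `ceil(num_pages / 256)` workgroups; extra invocations in a partial final workgroup should return early in the shader. A sketch of that host-side calculation (function name illustrative):

```rust
/// Number of workgroups to dispatch for `num_pages` pages, matching the
/// shader's @workgroup_size(256) (one invocation compresses one page).
fn workgroup_count(num_pages: u32) -> u32 {
    // Ceiling division so a partial final workgroup is still dispatched.
    num_pages.div_ceil(256)
}

fn main() {
    // e.g. 1000 pages -> 4 workgroups (3 full + 1 partial).
    println!("{}", workgroup_count(1000));
}
```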

Use Cases

  1. Intel Mac via SSH - Metal GPU for coverage builds
  2. Apple Silicon Macs - Native Metal acceleration
  3. AMD GPU workstations - Vulkan backend
  4. Browser-based tools - WebGPU for web IDEs
  5. Edge devices - Portable WASM deployment

Acceptance Criteria

  • Add wgpu feature flag
  • Implement WgpuContext with device auto-detection
  • Port LZ4 compress kernel to WGSL
  • Port Zstd compress kernel to WGSL
  • Benchmark Metal vs CUDA performance
  • Test on Intel Mac (ssh mac)
  • WASM compilation works

Performance Targets

Platform           Target Throughput
CUDA (RTX 4090)    60 GB/s
Metal (M1 Max)     30 GB/s
Vulkan (RX 7900)   40 GB/s
WebGPU (browser)   10 GB/s
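
For scale: assuming 4 KiB zram pages and decimal GB (1e9 bytes) — both assumptions, since the issue does not pin these down — the targets translate to page rates as follows (illustrative helper, not project code):

```rust
/// Approximate 4 KiB pages per second for a throughput given in GB/s
/// (decimal GB, i.e. 1e9 bytes; 4096-byte pages assumed).
fn pages_per_second(gb_per_s: f64) -> f64 {
    gb_per_s * 1e9 / 4096.0
}

fn main() {
    // The 60 GB/s CUDA target is roughly 14.6 million pages per second.
    println!("{:.1}", pages_per_second(60.0));
}
```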

Labels

enhancement, gpu, cross-platform, wgpu
