feat(burn): cargo workspace + dia-burn placeholder backend (blocked on burn-onnx upstream) #12

Open: uqio wants to merge 4 commits into main from feat/burn


uqio (Collaborator) commented May 10, 2026

Summary

  • Promote the repo to a Cargo workspace, with the existing crate moved to crates/dia-core, as Phase A groundwork for the backend split: crates/dia-{ort,tch,burn} sit as siblings around the algorithm core. The package keeps its diarization name; only the directory moved.
  • Add dia-burn, a documented placeholder for a pure-Rust burn-onnx backend targeting platforms ORT can't ship prebuilts to (powerpc64, riscv64, s390x, i686, wasm32-*). Inference itself is not yet wired up: burn-onnx 0.21.0-pre.5 rejects pyannote/segmentation-3.0 at codegen and emits non-compiling Rust for wespeaker (full repro recipe in the crate README + build.rs).
  • Convert dia-ort / dia-tch into thin re-export shims over diarization with the right features pre-activated. Downstream gets a single cargo add dia-ort / cargo add dia-tch entry-point; the actual cfg-gated modules still live in dia-core for now and migrate physically in a follow-up.
  • Update CI to thread -p diarization through every workspace-rooted cargo invocation (cargo-hack sweeps, cross builds, ep-link-check, docs.rs-equivalent doc build, tarpaulin, both miri jobs, both SDE jobs, sanitizer, AVX-asserting parity tests). Without this, a virtual workspace --no-default-features becomes a no-op and feature flags resolve against an empty manifest. New dia-burn (standalone) CI job builds + tests it from its own crate dir (workspace-excluded — links = "tch" collision with parent tch = 0.24 vs burn's optional burn-tch ^0.22).
  • Drop the stale silero-vad feature reference from the docs.rs and tarpaulin commands, and lift [profile.bench] up to the workspace root so it's actually honored.
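
The re-export shim idea from the third bullet can be sketched in a single file. This is a minimal illustration under stated assumptions, not the crates' actual source: the inline modules `diarization_core` and `dia_ort_shim` stand in for the two separate crates, and `SegmentModel::backend` is a hypothetical method that exists only to make the sketch observable.

```rust
// Stand-in for the `diarization` package (crates/dia-core).
// In the real layout this is a separate crate; an inline module is used
// here only so the sketch compiles as one file.
pub mod diarization_core {
    /// Hypothetical placeholder for a backend-specific type that the core
    /// crate would keep behind a Cargo feature.
    #[derive(Debug, Default)]
    pub struct SegmentModel;

    impl SegmentModel {
        /// Hypothetical method, present only for illustration.
        pub fn backend(&self) -> &'static str {
            "ort"
        }
    }
}

// Stand-in for the `dia-ort` shim crate: nothing but a glob re-export,
// so callers reach the core types through the shim's path.
pub mod dia_ort_shim {
    pub use super::diarization_core::*;
}
```

With this shape, a caller writes `dia_ort_shim::SegmentModel::default()` and never names the core module directly, which is what lets the gated code move between crates later without changing downstream imports.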

Scope deliberately deferred

  • Physically moving cfg-gated ORT and tch code out of crates/dia-core/src/embed/ and segment/ into the sibling crates. The current setup tightens the public contract (downstream stops feature-flag wrangling) without an atomic 2k-line move. Follow-up PR.
  • Working burn segmentation. Both burn-onnx codegen failures (model-side: If-op rank propagation; runtime-side: Resize-op codegen) are upstream-track. The crate's public surface is stable so the eventual swap won't be a breaking change.

Not planned

  • Renaming the diarization package to dia-core. Per maintainer call we keep the diarization name; the directory layout (crates/dia-core/) is purely workspace-organizational.

Test plan

  • cargo build --workspace clean (no profile-warning noise after the bench-profile move)
  • cargo test --workspace --lib — 532 dia-core tests pass, dia-ort/dia-tch shims import cleanly, dia-burn lib (workspace-excluded) builds + 2 stub tests pass when run from crates/dia-burn/
  • cargo clippy -p dia-ort -p dia-tch clean
  • cargo clippy inside crates/dia-burn clean
  • cargo fmt --check clean
  • Verify the new dia-burn (standalone) CI job goes green
  • Verify the workspace-aware cargo-hack feature sweeps (build, test, clippy) still terminate in reasonable time on the runners
  • Verify tch-compile-check (-p dia-tch) and ep-link-check matrix (-p diarization) succeed
  • cargo build --features unstable-onnx-codegen from crates/dia-burn/ reproduces the wespeaker codegen → compile failure documented in the README

🤖 Generated with Claude Code

uqio and others added 4 commits May 10, 2026 23:03
…e A)

Foundation for the planned dia-ort / dia-tch / dia-burn split. This
commit is mechanical: move all current crate contents into
`crates/dia-core/`, set up the workspace at the repo root, and
preserve every API + behavior. No test regressions.

Layout change:
- `src/`              → `crates/dia-core/src/`
- `tests/`            → `crates/dia-core/tests/`
- `examples/`         → `crates/dia-core/examples/`
- `benches/`          → `crates/dia-core/benches/`
- `build.rs`          → `crates/dia-core/build.rs`
- `models/`           → `crates/dia-core/models/`
- `README.md`         → `crates/dia-core/README.md`
- `Cargo.toml`        → `crates/dia-core/Cargo.toml`
- New root `Cargo.toml` declares `[workspace]` + `[workspace.lints]`
  (extracted from the package-level `[workspace.lints.rust]` that
  was inadvertently making `dia-core` its own workspace root).
- `crates/dia-core/tests/parity` stays excluded as a sub-Cargo
  (uv-managed parity harness, lives outside workspace resolution).

Why now: enables Phase B/C — extracting `dia-ort`, `dia-tch`, and
the new `dia-burn` (pure-Rust burn-onnx-backed inference) into
sibling crates with isolated dep graphs. The current single-crate
layout makes burn integration impossible because cargo's
`links = "tch"` collision check evaluates burn 0.21's optional
`burn-tch` (links to tch ^0.22) against our `tch = "0.24"` even
when neither feature activates. Per-crate isolation breaks that
graph-level conflict.

Phase A in this commit. Phase B (extract per-backend crates) +
Phase C (working dia-burn segmentation backend) are follow-up
commits on this branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds two empty workspace members alongside `crates/dia-core/`:

- `crates/dia-ort/` — ONNX Runtime backend. Will host
  `SegmentModel` + `EmbedModel` (ort path) + `ep` + `ort_serde`
  in subsequent commits.
- `crates/dia-tch/` — TorchScript / libtorch backend. Will host
  `EmbedModel::from_torchscript_file` + the `tch` `EmbedInner`
  variant.

`tch` is gated behind a feature in `dia-tch` so a contributor
without `LIBTORCH` can still build the workspace; `torch-sys` only
links when the feature activates. Same pattern as the previous
top-level `tch` feature.
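
On the Rust side, feature gating of this kind usually comes down to `#[cfg(feature = "...")]` on the items that need the optional dependency. The following is a rough sketch, assuming a feature named `tch` as in the commit message; the `torchscript` module and its `available` function are hypothetical names, and the real crate's gated items will differ.

```rust
/// Compiled only when the crate is built with `--features tch`.
/// In a real crate this module would be the only place that names the
/// optional dependency, so nothing links against it otherwise.
#[cfg(feature = "tch")]
pub mod torchscript {
    pub fn available() -> bool {
        true
    }
}

/// Fallback compiled when the feature is off, so callers can still
/// query availability without any native library present.
#[cfg(not(feature = "tch"))]
pub mod torchscript {
    pub fn available() -> bool {
        false
    }
}
```

Built without the feature, `torchscript::available()` returns `false` and no libtorch-related code is compiled at all.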

Both crates depend on `diarization = { path = "../dia-core" }` —
the dia-core package is still named `diarization` mid-migration so
internal `use crate::*` and downstream `use diarization::*` keep
compiling at every step. The package rename to `dia-core` lands
when the meta-crate `diarization` arrives in a later step.

No code moves in this commit — empty `lib.rs` placeholders only.
Phase B step 2 starts moving ort-coupled code from `crates/dia-core/`
into `crates/dia-ort/`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… on burn-onnx upstream)

Adds `crates/dia-burn` as a workspace-excluded sibling crate — separate
resolution graph dodges the `links = "tch"` collision between the
parent's `tch = 0.24` and `burn 0.21.0-pre.5`'s optional `burn-tch ^0.22`.

The crate is intentionally a documented stub today. Both dia ONNX
models hit upstream `burn-onnx` 0.21.0-pre.5 codegen bugs:

- pyannote/segmentation-3.0: `If`-op rank propagation gap makes the
  first `Conv1d` translator see rank-4 instead of rank-3 → codegen
  exits with no Rust emitted.
- wespeaker_resnet34_lm: codegen succeeds (606 LoC + 25 MB .bpk)
  but emits an OOB array index in the `Resize` lowering → does not
  compile.

Both are upstream-track. The full codegen + weight-load pipeline is
preserved behind an `unstable-onnx-codegen` feature so contributors
can flip one flag and reproduce the failures. Default builds get a
working stub: `BurnEmbedModel::from_embedded()` constructs cleanly,
`embed_chunk_with_frame_mask` returns `NotYetImplemented`, and the
public types/consts mirror dia-ort's contract so the eventual swap
won't be a breaking change.
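
The stub behavior described above could look roughly like the following. Only the names `BurnEmbedModel`, `from_embedded`, `embed_chunk_with_frame_mask`, and `NotYetImplemented` come from this commit message; the `BurnError` enum, the argument types, and the return types are assumptions made for the sketch and may not match the crate.

```rust
use std::fmt;

/// Error type for the sketch. Modelling `NotYetImplemented` as an enum
/// variant of a `BurnError` type is an assumption.
#[derive(Debug, PartialEq, Eq)]
pub enum BurnError {
    NotYetImplemented,
}

impl fmt::Display for BurnError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            BurnError::NotYetImplemented => {
                write!(f, "burn inference is not wired up yet")
            }
        }
    }
}

impl std::error::Error for BurnError {}

/// Placeholder model: holds no weights in this sketch.
#[derive(Debug, Default)]
pub struct BurnEmbedModel;

impl BurnEmbedModel {
    /// Constructs the stub. The infallible return type is an assumption.
    pub fn from_embedded() -> Self {
        BurnEmbedModel
    }

    /// Hypothetical signature: takes audio samples and a per-frame mask,
    /// and always reports that inference is not implemented.
    pub fn embed_chunk_with_frame_mask(
        &self,
        _samples: &[f32],
        _frame_mask: &[bool],
    ) -> Result<Vec<f32>, BurnError> {
        Err(BurnError::NotYetImplemented)
    }
}
```

Because the error is returned through the same `Result` a working backend would use, callers can be written against the final signature today and only the `Err` branch changes meaning once inference lands.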

README.md walks through the failures + fix paths in detail.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…e through CI

Phase B step 1 follow-up: dia-ort and dia-tch are now thin re-export
shims over `diarization` (a.k.a. dia-core) with the relevant features
pre-activated. Downstream users get a single `cargo add dia-ort` /
`cargo add dia-tch` entry-point instead of feature-flag wrangling on
the umbrella crate; the cfg-gated ORT and tch modules still live
physically in dia-core for now and migrate across in step 2.

CI changes for the workspace move:
- All workspace-rooted cargo invocations targeting dia-core's feature
  surface now pass `-p diarization` explicitly (cargo-hack sweeps,
  cross builds, ep-link-check, docs.rs-equivalent doc build,
  tarpaulin coverage, the AVX2/AVX512 SDE jobs, both miri jobs, the
  sanitizer pass, and the AVX-asserting parity test runs). Without
  `-p`, a virtual workspace `--no-default-features` becomes a no-op
  and `--features X` resolves against a feature-less manifest.
- New `dia-burn (standalone)` CI job that builds + tests `dia-burn`
  from its own crate dir. It's workspace-excluded by design (links =
  "tch" collision between the parent's `tch = 0.24` and burn's
  optional `burn-tch ^0.22`) so a plain `cargo build --workspace`
  never touches it; without this job, regressions there would ship
  undetected.
- Drop the stale `silero-vad` feature reference from the docs.rs-
  equivalent doc build and tarpaulin coverage run (the feature was
  removed when silero became a registry dev-dep on `chore/cleanup-ci`,
  but the workflow lines were missed).
- `tch-compile-check` now points at `-p dia-tch` to match the new
  shim layout.

Profile cleanup:
- Move `[profile.bench]` from `crates/dia-core/Cargo.toml` to the
  workspace root. Cargo only honors `[profile.*]` at the workspace
  root once a package becomes a member; the in-crate copy was being
  silently ignored, surfacing as a "profiles for the non root
  package will be ignored" warning on every workspace build.

Local verification: 532 dia-core tests pass, dia-burn standalone
tests pass, dia-ort/dia-tch clippy clean, fmt --check clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>