Skip to content

feat: optional serde support for the descriptor vocabulary (0.1.1)#7

Merged
al8n merged 8 commits into
mainfrom
feat/serde-support
May 22, 2026
Merged

feat: optional serde support for the descriptor vocabulary (0.1.1)#7
al8n merged 8 commits into
mainfrom
feat/serde-support

Conversation

@uqio

@uqio uqio commented May 21, 2026

Copy link
Copy Markdown
Contributor

Adds an off-by-default serde feature providing Serialize/Deserialize across the descriptor vocabulary, then bumps to 0.1.1.

Representation (option C — mirrors the storage backends)

Chosen so a serde-json value matches what the sqlx / mongodb / async-graphql backends already store:

  • Open codec / format enums (codec::{Video,Audio,Subtitle}Codec, container::Format, subtitle::Format, audio::{ChannelLayout, SampleFormat, ContainerFormat}) → canonical as_str() slug; Other("x265")"x265" (no {"Other": …} wrapper).
  • Closed FFmpeg-coded enums (color::{Matrix, Primaries, Transfer, DynamicRange, ChromaLocation, DcpTargetGamut}, pixel_format::PixelFormat, frame::{Rotation, FieldOrder, StereoMode}, subtitle::TrackOrigin, audio::BitRateMode) + disposition::TrackDispositionto_u32() integer. Both round-trips total (unknown slug → Other, unknown code → Unknown).
  • Plain structs (color::Info + HDR/mastering sub-structs, frame::{Dimensions, Rect, Rational, SampleAspectRatio, FrameRate}, audio::{Loudness, Tags, Device}) → field derive.
  • Validated structs (capture::GeoLocation, audio::Fingerprint, audio::CoverArt) → deserialize routed through their checking constructors (via private shadow + try_from), so out-of-range / invariant-violating input is rejected, not materialised.
  • lang::Language → canonical BCP-47 string ("en-US", "zh-Hant-TW", "und").

Feature wiring

  • serde = ["dep:serde", "smol_str?/serde", "bytes?/serde"]; serde pulled default-features = false, features = ["derive"].
  • Orthogonal to the capability tiers: no-alloc Copy types gain serde under bare --features serde; heap-tier types when alloc/std is also on. No serde alloc/std feature needed (borrowed-&str visitors).
  • Centralised enum impls in src/serde_impls.rs; struct derives at their definition sites.

Verification

  • cargo hack --each-feature check — all 23 combos green (incl. serde in isolation).
  • RUSTFLAGS=-Dwarnings cargo test --features serde — 159 tests pass (5 new serde round-trip tests: open-slug, closed-code, structs, BCP-47 language, validated-struct rejection).
  • Explicit combos green under -Dwarnings: --no-default-features --features serde (no-alloc), serde,alloc, serde (std), serde,buffa.
  • cargo fmt --check clean.

… feature)

Adds an off-by-default `serde` feature providing `Serialize`/`Deserialize`
across the descriptor types. The wire shape mirrors the storage backends
(sqlx / mongodb / async-graphql) so a serde value matches their column
representation:

- open codec/format enums serialize as their `as_str()` slug (no
  `{"Other": ...}` wrapper); closed FFmpeg-coded enums + `TrackDisposition`
  as their `to_u32()` integer. Both round-trips total.
- plain structs derive serde directly; validated structs (`GeoLocation`,
  `Fingerprint`, `CoverArt`) deserialize through their checking
  constructors so invariant-violating values are rejected.
- `lang::Language` (de)serializes as its canonical BCP-47 string.

The feature is orthogonal to the capability tiers: no-alloc Copy types
(colour / pixel-format / frame geometry / disposition) gain serde under
bare `--features serde`; heap-tier types when `alloc`/`std` is also on
(forwarding `serde` to `smol_str` / `bytes`). serde stays no-std (no
`alloc`/`std` serde feature needed — borrowed-`&str` visitors).

Centralised enum impls in `serde_impls.rs`; struct derives at their
definition sites. Bumps to 0.1.1.
@codecov

codecov Bot commented May 21, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.44444% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
mediaframe/src/serde_impls.rs 91.66% 2 Missing ⚠️

📢 Thoughts on this report? Let us know!

Apply the const-block convention for feature-gated trait impls: the
validated structs (`Fingerprint`, `CoverArt`, `GeoLocation`) and
`Language` now carry their `serde` impls in a single

    #[cfg(feature = "serde")]
    #[cfg_attr(docsrs, doc(cfg(feature = "serde")))]
    const _: () = { ... };

block instead of scattering `#[cfg(feature = "serde")]` across a
module-level shadow struct + `TryFrom` impl + the type's derive. The
validate-on-deserialize shadow is now private to the block (no longer
pollutes the module namespace), one gate covers both directions, and the
docs.rs feature badge is attached.

Plain structs (single `cfg_attr(derive(...))`) and the centralised enum
impls (already grouped in the gated `serde_impls` module) are unchanged —
they are already single-gate groupings. Adds a CoverArt round-trip test
(its Serialize is now hand-written). No behaviour change; 159 tests green.
uqio added 5 commits May 22, 2026 12:22
…ject corrupt codes on strictly-closed enums

Two codex findings on PR #7:

1. [high] `SampleFormat` round-trip lost `Unknown(u32)` values.
   The type has BOTH `Unknown(u32)` (lossless numeric escape) AND `Other(SmolStr)`
   (lossless string escape), but `serde_via_str!` routed everything through
   `as_str()`/`FromStr`. `Unknown(12345)` serialized to `"unknown"`, then
   deserialized via `FromStr` into `Other(SmolStr("unknown"))` — the original
   FFmpeg code was destroyed.

   Fix: bespoke serde impl (anonymous-const block, mirrors the project's
   feature-gated impl pattern):
     - serialize: `Unknown(v)` → numeric `v`; every other variant
       (named slugs + `Other`) → string slug via `as_str()`.
     - deserialize: `deserialize_any` accepts either a u32 (→ `from_u32`,
       preserving `Unknown`) or a string (→ `from_str`, preserving `Other`).
     - Includes `visit_u64`/`visit_i64` so formats that emit non-`u32`
       integers (ron / toml / yaml) work too; out-of-range values are
       rejected with a clear error.

2. [medium] `TrackOrigin` / `BitRateMode` silently canonicalised corrupt
   wire codes. Both are strictly-closed (no `Unknown` arm); their
   `from_u32` falls back to the default for any out-of-range code, so
   `serde_via_code!` would deserialize `999` as `TrackOrigin::Embedded`
   or `BitRateMode::Cbr` — adversarial input passes silently as valid.

   Fix: added `pub const fn try_from_u32(v: u32) -> Option<Self>` to each
   (strict counterpart to `from_u32`). New `serde_via_code_strict!` macro
   uses it and emits a serde error on `None`. The lossless `serde_via_code!`
   stays for enums that actually have an `Unknown(u32)` escape.

Tests (in `serde_impls::tests`):
- `sample_format_preserves_unknown_u32` — checks all three arms round-trip
  (named slug, `Other` slug, `Unknown(v)` for 12_345 / 0xDEADBEEF / u32::MAX)
  and that a numeric input matching a named variant's code canonicalises
  (matches `from_u32` contract).
- `closed_coded_enums_reject_unknown_codes` — every variant round-trips,
  `999` and `3` are rejected as serde errors.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  check -p mediaframe                                         ✓
  check -p mediaframe --no-default-features --features alloc,serde ✓
  check -p mediaframe --all-features                           ✓
  test  -p mediaframe --features serde --lib                  161 pass (+2 new)
…t)` for sparse-deserializable structs

Two codex findings on round 2:

1. [medium] `SampleFormat` deserialize was JSON-only — `deserialize_any`
   doesn't work on non-self-describing formats (bincode/postcard). The
   crate advertised generic serde, so binary-driver users would silently
   fail to round-trip.

   Fix: branch on `Serializer::is_human_readable()` / `Deserializer::
   is_human_readable()`:
     - Self-describing (JSON/YAML/RON/TOML): existing bare-value shape —
       `Unknown(v)` → number, named/Other → string, visitor uses
       `deserialize_any`.
     - Non-self-describing (bincode/postcard/etc): explicit 2-variant
       tagged `BinaryWire<'a>{ Code(u32), Slug(Cow<'a, str>) }`. The
       enum derive picks a compact discriminator; consumers don't need
       `deserialize_any`.

   New regression test `sample_format_postcard_binary_roundtrip` exercises
   the binary branch via postcard for all three arms (named / Other / Unknown).
   Added `postcard 1` to dev-dependencies (alloc-only — keeps the no-std
   build matrix unchanged).

2. [medium] Default-backed metadata structs rejected sparse JSON. `Tags`
   (and the related `Device` / `Loudness`) document empty/zero as the
   absent sentinel, but the serde derive treated every field as required.
   A JSON object containing only `{"title":"x"}` failed to decode, and
   any future field addition would break older persisted records.

   Fix: added `serde(default)` at the container level on each of `Tags`,
   `Device`, `Loudness` so missing fields route through the type-level
   `Default` impl. The other VOs (color/frame value objects) keep their
   strict-required shape — those fields are wire data, not absence
   sentinels, and silently defaulting them would mask data corruption.

   New regression test `sparse_json_uses_serde_default_on_default_backed_structs`
   asserts a one-field JSON deserializes correctly for all three structs.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features serde --lib                    163 pass (+2 new)
  check -p mediaframe                                          ✓
  check -p mediaframe --features arbitrary,serde               ✓
  check -p mediaframe --no-default-features --features alloc,serde ✓
  check -p mediaframe --all-features                           ✓
… contract

Codex round 3 flagged docs/code mismatch: CHANGELOG + README + module doc
all claimed "both round-trips total" and grouped `TrackOrigin` /
`BitRateMode` with lossless `Unknown(u32)` enums — but round 2 routed
those two through `serde_via_code_strict!`, which rejects unknown codes.
A reader of the docs would expect forward-compat round-tripping; the
impl actually fails on out-of-range input.

Split the docs into three explicit buckets:
  1. Open string enums (slug + `Other` escape).
  2. Closed FFmpeg-coded with lossless `Unknown(u32)` arm.
  3. Strictly-closed (`TrackOrigin`, `BitRateMode`) — reject unknown
     codes; `try_from_u32` backs the rejection.

Module doc additionally calls out the `SampleFormat` bespoke (dual
`Unknown(u32)` + `Other(SmolStr)`, human-readable vs binary branch).
README + CHANGELOG bullet refactored to match.

No code change (the rejection behavior is intentional per round 2).
Test suite unchanged: 163 pass.
Round 4 caught the last docs/code mismatch: CHANGELOG's open-enum bullet
still listed `audio::SampleFormat` alongside the slug-only enums, claiming
it serializes as a canonical `as_str()` string. That's wrong — the
bespoke impl emits `Unknown(v)` as a numeric code on human-readable
formats and uses the `BinaryWire::{Code, Slug}` tagged enum on binary
formats.

Removed `SampleFormat` from the slug-only bullet; added a dedicated
bullet documenting the dual-escape (`Unknown(u32)` + `Other(SmolStr)`)
contract and the human-readable vs binary wire split — matching
`serde_impls.rs` and the README.

No code change.
…slug-fallback claim

Two doc-contract nits from round 5:

1. README's `serde` bullet still gave broad "open enums → slug / coded
   enums → to_u32" rules that implicitly swept in `SampleFormat`. Replaced
   with an explicit per-type sub-list: open enums, Unknown-arm coded
   enums, strictly-closed coded enums, the `SampleFormat` bespoke, and
   `lang::Language` — each with its actual wire shape.

2. CHANGELOG's Unknown-arm coded-enum bullet claimed "unknown slugs →
   `Other`" — but those enums have no `Other` arm and accept only
   integers. Dropped the slug clause; the `Other`-fallback wording
   belongs only to the open string enums (where it already appears).

No code change.
… the no-alloc tier

CI failed `--no-default-features --features serde` on test (ubuntu) +
build/clippy (macos) with `error: unused macro definition:
serde_via_code_strict`.

`serde_via_code_strict!`'s only two invocations — `subtitle::TrackOrigin`
and `audio::BitRateMode` — are both heap-tier, gated on
`any(feature = "std", feature = "alloc")`. Under bare `--features serde`
(the no-alloc capability tier) both are cfg'd out, leaving the macro
defined-but-unused → `unused_macros` → `-D warnings` error.

Added `#[allow(unused_macros)]`, exactly as the sibling `serde_via_str!`
macro already carries for the same reason (its codec/format invocations
are also heap-tier-only).

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds an off-by-default serde feature to the mediaframe crate so the descriptor vocabulary can be serialized/deserialized in a way that matches existing storage backend representations, and bumps the crate version to 0.1.1.

Changes:

  • Introduces serde feature wiring in Cargo.toml (with serde as an optional no-std dependency) plus std-only dev-deps for round-trip tests.
  • Implements centralized Serialize/Deserialize for descriptor enums (slug-based for “open” enums, integer-code for FFmpeg-coded enums, strict rejection for closed enums, bespoke dual-mode handling for audio::SampleFormat).
  • Adds/updates serde derives and validate-on-deserialize implementations for structs (including serde(default) to support sparse JSON for certain metadata structs) and documents the feature in README + changelog.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.

Show a summary per file
File Description
README.md Documents the new optional serde feature and its wire representation rules.
mediaframe/src/subtitle/track_origin.rs Adds try_from_u32 to support strict serde deserialization.
mediaframe/src/serde_impls.rs Centralizes serde impls for enums and adds serde round-trip tests (std-only).
mediaframe/src/lib.rs Wires in the serde_impls module behind feature = "serde".
mediaframe/src/lang.rs Adds serde (de)serialization for Language as canonical BCP-47 strings.
mediaframe/src/frame/mod.rs Adds serde derives for core frame primitives (Dimensions, Rect, Rational, etc.).
mediaframe/src/color.rs Adds serde derives for color/HDR structs.
mediaframe/src/capture/geo.rs Adds validate-on-deserialize serde impls for GeoLocation.
mediaframe/src/capture/device.rs Adds serde derives + serde(default) for sparse JSON compatibility.
mediaframe/src/audio/tags.rs Adds serde derives + serde(default) for sparse JSON compatibility.
mediaframe/src/audio/loudness.rs Adds serde derives + serde(default) for sparse JSON compatibility.
mediaframe/src/audio/fingerprint.rs Adds validate-on-deserialize serde impls for Fingerprint.
mediaframe/src/audio/cover_art.rs Adds validate-on-deserialize serde impls for CoverArt.
mediaframe/src/audio/bit_rate_mode.rs Adds try_from_u32 to support strict serde deserialization.
mediaframe/Cargo.toml Bumps version to 0.1.1, adds serde feature + serde dependency and dev-deps for tests.
CHANGELOG.md Adds 0.1.1 entry describing the new serde feature and wire shapes.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@al8n al8n merged commit f60a94c into main May 22, 2026
44 checks passed
@al8n al8n deleted the feat/serde-support branch May 22, 2026 01:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants