Skip to content

feat: optional arbitrary + quickcheck support for the descriptor vocabulary#8

Merged
al8n merged 30 commits into
mainfrom
feat/arbitrary
May 22, 2026
Merged

feat: optional arbitrary + quickcheck support for the descriptor vocabulary#8
al8n merged 30 commits into
mainfrom
feat/arbitrary

Conversation

@uqio
Copy link
Copy Markdown
Contributor

@uqio uqio commented May 21, 2026

Adds two optional test-data-generation features for the 39 descriptor-vocabulary types, plus a audio::Tags type fix. (PR #9 was folded into this branch — feat/quickcheck retired; this is the single reviewable PR.)

Features

  • arbitrary — hand-written arbitrary::Arbitrary impls via the types' public constructors. Std-only (the arbitrary crate is std-based). Module arbitrary_impls/.
  • quickcheck — native quickcheck::Arbitrary via the published quickcheck-richderive 0.2 derive (#[quickcheck(arbitrary = "…")]fn(&mut Gen) -> Self helpers). Module quickcheck_helpers/. Independent of arbitrary.

Both cover the identical 39-type surface as the serde feature.

audio::Tags redesign (folded in)

Resolves a type/wire-codec contradiction surfaced during review:

  • Numeric fields (year/track_number/track_total/disc_number/disc_total): Option<u16> → bare u16, 0 = absent. The buffa codec already proto-zero-elided these, so Some(0) silently decoded to None — the Option was a lie. Bare u16 makes type + wire agree.
  • language: Option<SmolStr> placeholder → Option<crate::lang::Language> (closes the TODO(lang)) — parsed BCP-47.

Generation strategies

  • Coded enums: from_u32; small enums weighted (named-variants / code-weighted); large enums range-weighted (Matrix hand-written for the DOMAIN_EXT_BASE Bt601 domain variant).
  • Open string enums: curated slug ⇆ Other, all construction through FromStr so generated values are canonical (no malformed Other("h264")).
  • Validated try_new types built valid-by-construction; Loudness floats kept finite (NaN/inf break JSON round-trip); SampleFormat 3-way (named / Unknown(u32) / Other).
  • Reachability tests assert every named variant / arm / domain code is actually generated.

Codex adversarial review

Hardened through a multi-round /codex:adversarial-review loop on both feature halves (enum-reachability, canonical-value, no-std-feature-honesty, finite-float, Tags.language, and finally the Tags numeric redesign).

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0)

  • test -p mediaframe --all-features — 875 pass
  • check — default / no-default+arbitrary / no-default+serde / quickcheck,serde,buffa
  • hack --each-feature check
  • fmt --check

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2026

uqio added a commit that referenced this pull request May 22, 2026
…ded enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #8:

1. [high] `Matrix`/`Primaries`/`Transfer`/`PixelFormat` were still on
   `arb_via_code!` (uniform u32 → from_u32). Their named codes cluster in
   low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so uniform u32 reaches a named
   variant essentially never — fuzz/property tests over colour + pixel
   descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `arb_via_code_weighted_range!(Ty, max_named = N)` macro —
   50/50 between `int_in_range(0..=max_named)` (lands on named variants
   most of the time; gaps fall on `Unknown`, fine) and full-range u32
   (broad `Unknown` exercise). Applied to the four large enums; `max_named`
   read from each `to_u32` match. `TrackDisposition` stays on `arb_via_code!`
   — it's bitflags, every u32 bit pattern is meaningful.

2. [medium] The bespoke `SampleFormat` generator's named branch only
   sampled 6 of the 12 named slugs, so `dbl`/`u8p`/`s32p`/`dblp`/`s64`/
   `s64p` were reachable only via the rare numeric branch drawing their
   exact 0..=11 code.

   Fix: `SLUGS` const now lists all 12 canonical slugs.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed (HashSet of `as_str()`), plus
  `Unknown(_)` and `Other(_)`, over 4096 rounds.
- `reachability_range_weighted_enums_hit_named_codes` (new) — asserts
  the range-weighted enums hit a broad named-code set: Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes
  over 8192 rounds.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features arbitrary --lib              163 pass
  check -p mediaframe / --features arbitrary,serde / --all-features  ✓
uqio added a commit that referenced this pull request May 22, 2026
…d enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #9 (mirror of the PR #8 fixes):

1. [medium] `matrix`/`primaries`/`transfer`/`pixel_format` helpers still
   decoded a uniform `u32` through `from_u32`. Their named codes cluster
   in low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so QuickCheck `u32::arbitrary`
   reaches a named variant essentially never — properties over colour /
   pixel descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `qc_via_code_weighted_range!(fn => Ty, max_named = N)` macro
   — 50/50 between an in-range code (`u32 % (max+1)`, since `quickcheck::
   Gen` has no `int_in_range`) and a full-range `u32`. `track_disposition`
   stays on `arb_via_code!` — bitflags, every bit pattern meaningful.

2. [medium] The bespoke `sample_format` helper's named arm only sampled
   6 of 12 slugs; `dbl`/`u8p`/`s32p`/`dblp`/`s64`/`s64p` were reachable
   only via the rare numeric arm. `SLUGS` now lists all 12.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed + `Unknown(_)` + `Other(_)`.
- `reachability_range_weighted_enums_hit_named_codes` (new) — Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              163 pass
  check -p mediaframe / --features arbitrary,quickcheck / --all-features  ✓
Base automatically changed from feat/serde-support to main May 22, 2026 01:36
uqio added a commit that referenced this pull request May 22, 2026
…ded enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #8:

1. [high] `Matrix`/`Primaries`/`Transfer`/`PixelFormat` were still on
   `arb_via_code!` (uniform u32 → from_u32). Their named codes cluster in
   low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so uniform u32 reaches a named
   variant essentially never — fuzz/property tests over colour + pixel
   descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `arb_via_code_weighted_range!(Ty, max_named = N)` macro —
   50/50 between `int_in_range(0..=max_named)` (lands on named variants
   most of the time; gaps fall on `Unknown`, fine) and full-range u32
   (broad `Unknown` exercise). Applied to the four large enums; `max_named`
   read from each `to_u32` match. `TrackDisposition` stays on `arb_via_code!`
   — it's bitflags, every u32 bit pattern is meaningful.

2. [medium] The bespoke `SampleFormat` generator's named branch only
   sampled 6 of the 12 named slugs, so `dbl`/`u8p`/`s32p`/`dblp`/`s64`/
   `s64p` were reachable only via the rare numeric branch drawing their
   exact 0..=11 code.

   Fix: `SLUGS` const now lists all 12 canonical slugs.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed (HashSet of `as_str()`), plus
  `Unknown(_)` and `Other(_)`, over 4096 rounds.
- `reachability_range_weighted_enums_hit_named_codes` (new) — asserts
  the range-weighted enums hit a broad named-code set: Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes
  over 8192 rounds.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features arbitrary --lib              163 pass
  check -p mediaframe / --features arbitrary,serde / --all-features  ✓
@uqio uqio force-pushed the feat/arbitrary branch from 056a245 to a7499df Compare May 22, 2026 01:37
uqio added a commit that referenced this pull request May 22, 2026
…d enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #9 (mirror of the PR #8 fixes):

1. [medium] `matrix`/`primaries`/`transfer`/`pixel_format` helpers still
   decoded a uniform `u32` through `from_u32`. Their named codes cluster
   in low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so QuickCheck `u32::arbitrary`
   reaches a named variant essentially never — properties over colour /
   pixel descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `qc_via_code_weighted_range!(fn => Ty, max_named = N)` macro
   — 50/50 between an in-range code (`u32 % (max+1)`, since `quickcheck::
   Gen` has no `int_in_range`) and a full-range `u32`. `track_disposition`
   stays on `arb_via_code!` — bitflags, every bit pattern meaningful.

2. [medium] The bespoke `sample_format` helper's named arm only sampled
   6 of 12 slugs; `dbl`/`u8p`/`s32p`/`dblp`/`s64`/`s64p` were reachable
   only via the rare numeric arm. `SLUGS` now lists all 12.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed + `Unknown(_)` + `Other(_)`.
- `reachability_range_weighted_enums_hit_named_codes` (new) — Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              163 pass
  check -p mediaframe / --features arbitrary,quickcheck / --all-features  ✓
uqio added a commit that referenced this pull request May 22, 2026
…ly values

Codex round-4 finding on PR #9: `quickcheck_helpers::composite::tags`
populated 12 of the 13 `Tags` fields but never `language`, so every
generated `Tags` had `language == None` — property tests could not catch
language-tag loss / corruption.

Fix: the `tags` helper now generates `language` too, 50/50 between `None`
and `Some(<arbitrary string>)`.

Also carried the PR #8 round-4 canonical-value fix into the quickcheck
helpers (same latent bug, separate file tree):
  - `qc_open_string_enum!` and the bespoke `sample_format` helper built
    `Other(SmolStr::from(s))` directly; a string equal to a named slug
    produced a malformed `Other("h264")` that serde canonicalises to
    `H264` — not round-trippable.
  - Both now route string construction through `FromStr` (the
    canonicalising constructor), so every generated value is canonical.

Tests:
- `reachability_tags_language_hits_none_and_some` — asserts both `None`
  and `Some(_)` are observed.
- `arbitrary_values_survive_serde_round_trip` (gated on `serde`) — 4096
  rounds, `SampleFormat` + `VideoCodec` round-trip through `serde_json`.

This branch was also rebased onto the updated `feat/arbitrary` (which
itself rebased onto the serde-fixed `feat/serde-support`) — keeps the
stack coherent so reviews see the real combined state.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              164 pass
  test -p mediaframe --features quickcheck,serde --lib        174 pass
  check -p mediaframe --all-features                          ✓
uqio added 9 commits May 22, 2026 13:47
Adds the `arbitrary` feature + `arbitrary 1` optional dep and the empty
module skeleton: `arbitrary_impls/{mod.rs, strings.rs, coded.rs, composite.rs}`.

Cluster ownership (each owns its own file; cross-cluster type cascades resolve
through the module's items):
  strings.rs   — open string enums w/ `Other(SmolStr)` (codec×3, container,
                 subtitle::Format, audio open formats).
  coded.rs     — closed `from_u32` coded enums + colour / frame / pixel-format
                 / disposition structs and enums.
  composite.rs — audio composite metadata, capture, language.

Shared `arb_via_code!` and `arb_open_string_enum!` macros live in `mod.rs`.

This is the baseline commit; the three cluster files are stub-only — the
actual impls land in three parallel follow-up commits, one per cluster.
…tructs

Coded enums via the `arb_via_code!` macro (single `from_u32` round-trip
covers every named variant + the `Unknown(u32)` / `Reserved(_)` arms):
  - color::{Matrix, Primaries, Transfer, DynamicRange, ChromaLocation,
            DcpTargetGamut}
  - pixel_format::PixelFormat
  - frame::{Rotation, FieldOrder, StereoMode}
  - subtitle::TrackOrigin
  - audio::BitRateMode
  - disposition::TrackDisposition (bitflags w/ lossless to_u32/from_u32)

Colour structs (hand-written via public `new`):
  - color::Info               (Primaries, Transfer, Matrix, DynamicRange,
                               ChromaLocation — cascades through cluster's
                               own coded-enum impls)
  - color::ContentLightLevel  (u32, u32)
  - color::ChromaCoord        (u32, u32)
  - color::MasteringDisplay   ([ChromaCoord; 3], ChromaCoord, u32, u32)
  - color::HdrStaticMetadata  (Option<MasteringDisplay>,
                               Option<ContentLightLevel>)
  - color::DolbyVisionConfig  (u8, u8, bool, bool, u8)

Frame structs (hand-written via public `new`):
  - frame::Dimensions         (u32, u32)
  - frame::Rect               (u32, u32, u32, u32)
  - frame::Rational           (u32, NonZeroU32 — uses
                               `NonZeroU32::arbitrary`, supplied by the
                               `arbitrary` crate, so the fuzzer skips
                               the zero corpus byte instead of clamping)
  - frame::SampleAspectRatio  (u32, NonZeroU32)
  - frame::FrameRate          (Rational, bool)

No constructor shapes differed from the brief.
Types covered:
  * audio::Loudness     — plain Loudness::new(f32×4) over arbitrary f32s
  * audio::Fingerprint  — try_new(algorithm, value); algorithm
                          guaranteed non-empty via 'x' fallback so the
                          expect() is sound; empty value allowed
  * audio::CoverArt     — try_new(mime, data); mime falls back to
                          'application/octet-stream' and data to [0u8]
                          so both non-empty invariants hold by
                          construction
  * audio::Tags         — Tags::new() + maybe_*/with_* builders
                          covering 12 fields: title, artist,
                          album_artist, album, composer, genre,
                          comment (SmolStr empty=absent) + year,
                          track_number, track_total, disc_number,
                          disc_total (Option<u16>, None≠Some(0))
  * capture::Device     — Device::new() + with_make + with_model;
                          SmolStr empty=absent passes through
  * capture::GeoLocation — try_new(lat, lon, Option<altitude>);
                          lat/lon built with int_in_range so they're
                          in [-90,90]/[-180,180] by construction
                          (1/100° resolution); altitude is always
                          finite f32 when Some, so the expect()
                          holds
  * lang::Language      — from_bcp47 on a curated 12-tag list
                          (und, en, en-US, es, fr, de, ja, zh-Hant-TW,
                          pt-BR, ar, ru, ko) covering language-only,
                          language+region, language+script+region,
                          and the und sentinel

Validation strategy: for every try_new type, valid inputs are
produced FIRST (in-range ints via int_in_range, non-empty strings
via fallback constants) before .expect() — fuzz input never reaches
a validator in a way that could panic.

Verified: cargo fmt --check / RUSTFLAGS=-Dwarnings cargo check -p
mediaframe --features arbitrary / RUSTFLAGS=-Dwarnings cargo check
-p mediaframe (default).
…tle/audio)

Fills in mediaframe/src/arbitrary_impls/strings.rs with hand-written

`impl arbitrary::Arbitrary` blocks for the eight open string enums in

the descriptor vocabulary.

Strategy: `super::arb_open_string_enum!` with ~6 curated canonical

FFmpeg / file-extension slugs per type (drawn from each type's own

`as_str()` match). The macro flips a coin between round-tripping a

named slug through the type's total `FromStr` and constructing the

`Other(SmolStr)` lossless-escape arm from an arbitrary string — every

type covered here satisfies the macro's preconditions (`Other(SmolStr)`

arm + `FromStr` with `Err = core::convert::Infallible`).

Types covered:

  - codec::VideoCodec (h264, hevc, av1, vp9, mpeg4, prores)

  - codec::AudioCodec (aac, mp3, opus, flac, ac3, alac)

  - codec::SubtitleCodec (srt, ass, ssa, webvtt, mov_text, dvb_subtitle)

  - container::Format (mp4, mkv, webm, mov, avi, mpegts)

  - subtitle::Format (srt, webvtt, ass, ssa, mov_text, ttml)

  - audio::ChannelLayout (mono, stereo, 5.1, 7.1, quad, 5.0)

  - audio::SampleFormat (s16, s32, flt, s16p, fltp, u8)

  - audio::ContainerFormat (mp3, aac, flac, wav, m4a, opus)

Nothing skipped: every type listed in the cluster brief is an open

string enum (audio::SampleFormat is mixed — has both `Unknown(u32)` and

`Other(SmolStr)` — but the brief specifies open-string strategy for the

mixed case so the `Other` arm gets exercised, since the closed-coded

path is covered by cluster B types). Coded-only enums (no `Other`) are

left for cluster B's `arb_via_code!`.
Adds a deterministic test harness (`drive`) that mixes a seed into a 4 KiB
buffer and feeds it through `Unstructured` across N rounds. Five tests gated
on `#[cfg(test)]` (the test module is unconditional, but the wider
`arbitrary_impls` module is `#[cfg(feature = "arbitrary")]`):

  - geo_location_invariant_lat_lon_in_range          (256 rounds)
  - fingerprint_invariant_algorithm_non_empty        (256 rounds)
  - cover_art_invariant_mime_and_data_non_empty      (256 rounds)
  - smoke_yields_values_for_representative_types     (64 rounds, 5 types)
  - coded_enums_roundtrip_through_code               (128 rounds, 4 enums)

Verifies what hand-written `Arbitrary` impls can get wrong: validated
constructors' invariants survive arbitrary fuzz bytes (no `.unwrap()` panics
on attacker input), and coded enums round-trip through their `to_u32` /
`from_u32` code losslessly.
…nums + bespoke SampleFormat 3-way

Two Codex round-1 findings on the descriptor-vocabulary 'arbitrary::Arbitrary'
impls:

  [medium] arb_via_code! almost never reaches named variants of low-
  cardinality coded enums. Feeding a uniform u32 into 'from_u32' for an
  enum whose named codes are 0/1/2 lands on a named variant
  ~3-in-4-billion of the time; everything else collapses to the default
  (closed enums) or 'Unknown(_)' (open enums). Most named variants are
  effectively unreachable in fuzz / property-driven runs.

  [medium] SampleFormat::Unknown(u32) is unreachable. SampleFormat
  carries BOTH 'Unknown(u32)' AND 'Other(SmolStr)'; the shared
  'arb_open_string_enum!' macro only exercises curated slugs and
  'Other', skipping the numeric-escape arm entirely.

Fix — two new macros in arbitrary_impls/mod.rs:

  arb_via_named_variants!   strictly closed coded enums (no
                            'Unknown(u32)' arm). Picks uniformly from
                            the named variants via 'Unstructured::choose'.
                            Applied to 'audio::BitRateMode' and
                            'subtitle::TrackOrigin'.

  arb_via_code_weighted!    closed coded enums WITH 'Unknown(u32)' and
                            < 10 named variants. 50/50 between named
                            'choose' and arbitrary-u32 'from_u32'.
                            Applied to 'color::{DynamicRange,
                            ChromaLocation, DcpTargetGamut}' and
                            'frame::{Rotation, FieldOrder, StereoMode}'.

'arb_via_code!' is retained for the large coded enums where uniform u32
genuinely exercises the named ranges ('color::{Matrix, Primaries,
Transfer}', 'pixel_format::PixelFormat') and for the bitflags
('disposition::TrackDisposition').

For 'SampleFormat' a bespoke 3-way generator lives next to the open-
string-enum invocations in strings.rs: pick a curated slug via
'FromStr', a uniform u32 via 'from_u32' (so 'Unknown(v)' is hit on
every non-canonical code, ~99.99% of the value space), or an arbitrary
'String' wrapped in 'SmolStr::from' for 'Other'.

Reachability tests live next to the shared 'drive' helper in
arbitrary_impls/mod.rs. A new 'drive_per_round' constructs a fresh
'Unstructured' (seeded by a SplitMix64-ish per-round state) for every
round, avoiding the single-buffer exhaustion that biases every per-
round decode to the all-zero fallback. Tests assert:

  * 'BitRateMode' and 'TrackOrigin' visit every named variant.
  * 'Rotation' visits every named variant AND 'Unknown(_)'.
  * 'SampleFormat' visits all three arms (named / Unknown / Other).

Verified: 'cargo fmt -p mediaframe --check', 'cargo test -p mediaframe
--features arbitrary --lib' (162 passed), and 'cargo check' across
default / arbitrary+serde / no-default+arbitrary+alloc / all-features,
all with '-Dwarnings'.
…ded enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #8:

1. [high] `Matrix`/`Primaries`/`Transfer`/`PixelFormat` were still on
   `arb_via_code!` (uniform u32 → from_u32). Their named codes cluster in
   low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so uniform u32 reaches a named
   variant essentially never — fuzz/property tests over colour + pixel
   descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `arb_via_code_weighted_range!(Ty, max_named = N)` macro —
   50/50 between `int_in_range(0..=max_named)` (lands on named variants
   most of the time; gaps fall on `Unknown`, fine) and full-range u32
   (broad `Unknown` exercise). Applied to the four large enums; `max_named`
   read from each `to_u32` match. `TrackDisposition` stays on `arb_via_code!`
   — it's bitflags, every u32 bit pattern is meaningful.

2. [medium] The bespoke `SampleFormat` generator's named branch only
   sampled 6 of the 12 named slugs, so `dbl`/`u8p`/`s32p`/`dblp`/`s64`/
   `s64p` were reachable only via the rare numeric branch drawing their
   exact 0..=11 code.

   Fix: `SLUGS` const now lists all 12 canonical slugs.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed (HashSet of `as_str()`), plus
  `Unknown(_)` and `Other(_)`, over 4096 rounds.
- `reachability_range_weighted_enums_hit_named_codes` (new) — asserts
  the range-weighted enums hit a broad named-code set: Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes
  over 8192 rounds.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features arbitrary --lib              163 pass
  check -p mediaframe / --features arbitrary,serde / --all-features  ✓
…bility

Codex round-3 finding: `Matrix` has a mediaframe-domain variant `Bt601`
at `DOMAIN_EXT_BASE` (`0x8000_0000`) — far outside the H.273 `0..=17`
range the round-2 `arb_via_code_weighted_range!` macro biases toward, so
`Bt601` was reachable only via the full-`u32` fallback (~1-in-8.6-billion);
the reachability test missed it (counted only codes ≤ 17).

Fix: `Matrix` is now hand-written (out of the range macro) — a 3-way
`int_in_range(0..=2)` dispatch: in-range H.273 code / the domain-ext code
(`DOMAIN_EXT_BASE`, decodes to `Bt601`) / full-range `u32`. `Matrix` was
the only coded enum with a domain variant — Primaries / Transfer /
PixelFormat have none, so they stay on the range macro.

Test `reachability_range_weighted_enums_hit_named_codes` strengthened to
assert `Matrix` observed `DOMAIN_EXT_BASE` specifically.

163 tests pass; `--all-features` clean.
…rippable) values

Codex round-4 finding: arbitrary-generated `SampleFormat` values commonly
lost identity through a serde round-trip — two causes:

1. The bespoke `SampleFormat` arbitrary impl built `Other(SmolStr::from(
   <arbitrary String>))` directly. When the string happened to equal a
   named slug (e.g. `"s16"`), that produced a malformed `Other("s16")`
   which serde canonicalises to `S16` — not round-trippable.

   Fix: the `Other` branch now routes through `FromStr` (the canonicalising
   constructor — a named slug → the named variant, only a non-named slug →
   `Other`). Every generated value is now canonical. The `arb_open_string_
   enum!` macro had the identical latent bug for `codec::*` / `container::
   Format` / `subtitle::Format` etc. (`Other("h264")` → `H264` on
   round-trip) — fixed the same way: both macro branches go through
   `FromStr`.

2. The `Unknown(u32)` concern in the finding was an artefact of the stale
   PR base — `feat/arbitrary` was branched off the pre-fix `feat/serde-
   support`. Rebasing onto the current serde tip brings in the bespoke
   `SampleFormat` serde (commit "preserve SampleFormat::Unknown(u32)")
   which preserves `Unknown(v)` numerically. No code change needed for
   this half — the rebase resolves it.

New test `arbitrary_values_survive_serde_round_trip` (gated on
`feature = "serde"`) — 4096 rounds asserting `SampleFormat` and
`VideoCodec` arbitrary values round-trip through `serde_json` unchanged.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features arbitrary,serde --lib          173 pass
  check -p mediaframe / --no-default-features arbitrary,alloc / --all-features  ✓
@uqio uqio force-pushed the feat/arbitrary branch from a7499df to 97bf4ff Compare May 22, 2026 01:48
uqio added a commit that referenced this pull request May 22, 2026
…d enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #9 (mirror of the PR #8 fixes):

1. [medium] `matrix`/`primaries`/`transfer`/`pixel_format` helpers still
   decoded a uniform `u32` through `from_u32`. Their named codes cluster
   in low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so QuickCheck `u32::arbitrary`
   reaches a named variant essentially never — properties over colour /
   pixel descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `qc_via_code_weighted_range!(fn => Ty, max_named = N)` macro
   — 50/50 between an in-range code (`u32 % (max+1)`, since `quickcheck::
   Gen` has no `int_in_range`) and a full-range `u32`. `track_disposition`
   stays on `arb_via_code!` — bitflags, every bit pattern meaningful.

2. [medium] The bespoke `sample_format` helper's named arm only sampled
   6 of 12 slugs; `dbl`/`u8p`/`s32p`/`dblp`/`s64`/`s64p` were reachable
   only via the rare numeric arm. `SLUGS` now lists all 12.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed + `Unknown(_)` + `Other(_)`.
- `reachability_range_weighted_enums_hit_named_codes` (new) — Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              163 pass
  check -p mediaframe / --features arbitrary,quickcheck / --all-features  ✓
uqio added a commit that referenced this pull request May 22, 2026
…ly values

Codex round-4 finding on PR #9: `quickcheck_helpers::composite::tags`
populated 12 of the 13 `Tags` fields but never `language`, so every
generated `Tags` had `language == None` — property tests could not catch
language-tag loss / corruption.

Fix: the `tags` helper now generates `language` too, 50/50 between `None`
and `Some(<arbitrary string>)`.

Also carried the PR #8 round-4 canonical-value fix into the quickcheck
helpers (same latent bug, separate file tree):
  - `qc_open_string_enum!` and the bespoke `sample_format` helper built
    `Other(SmolStr::from(s))` directly; a string equal to a named slug
    produced a malformed `Other("h264")` that serde canonicalises to
    `H264` — not round-trippable.
  - Both now route string construction through `FromStr` (the
    canonicalising constructor), so every generated value is canonical.

Tests:
- `reachability_tags_language_hits_none_and_some` — asserts both `None`
  and `Some(_)` are observed.
- `arbitrary_values_survive_serde_round_trip` (gated on `serde`) — 4096
  rounds, `SampleFormat` + `VideoCodec` round-trip through `serde_json`.

This branch was also rebased onto the updated `feat/arbitrary` (which
itself rebased onto the serde-fixed `feat/serde-support`) — keeps the
stack coherent so reviews see the real combined state.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              164 pass
  test -p mediaframe --features quickcheck,serde --lib        174 pass
  check -p mediaframe --all-features                          ✓
…Loudness floats

Two codex round-5 findings on PR #8:

1. [high] The reachability tests used `std::collections::HashSet`. Under
   `--no-default-features --features arbitrary` the crate is no-std
   (`arbitrary` pulls only `alloc`), where `lib.rs` aliases `alloc as std`
   — and `alloc` has no `HashSet`. The cargo-hack feature-powerset CI
   path `--no-default-features --features arbitrary` failed to compile
   the test build.

   Fix: switched all reachability-test sets to `BTreeSet` (in
   `alloc::collections`, works on every tier). `BitRateMode` / `TrackOrigin`
   aren't `Ord`, so those two sets key on `to_u32()`. Also replaced a
   `&str::to_string()` (needs `ToString`, not in the no-std prelude) with
   `String::from`.

2. [medium] `Loudness`'s `arbitrary` impl filled its four `f32` fields via
   `f32::arbitrary`, which builds floats from raw bits — NaN / ±inf are
   common. `Loudness` derives serde; JSON serializes a non-finite `f32` as
   `null`, which then fails to deserialize back into `f32`. Arbitrary
   `Loudness` values did not reliably survive a serde round-trip.

   Fix: `Loudness` fields are now generated finite — a bounded integer
   `[-10_000_000, 10_000_000]` mapped to `f32 / 100.0` (finite, range
   [-100_000, 100_000], covers every real EBU R128 scalar). Extended
   `arbitrary_values_survive_serde_round_trip` to round-trip `Loudness`
   too (it previously covered only `SampleFormat` + `VideoCodec`).

Verified (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --no-default-features --features arbitrary --lib   159 pass
  test -p mediaframe --features arbitrary,serde --lib                   173 pass
  hack --each-feature check -p mediaframe                               ✓
  check -p mediaframe --all-features                                    ✓
uqio added a commit that referenced this pull request May 22, 2026
…d enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #9 (mirror of the PR #8 fixes):

1. [medium] `matrix`/`primaries`/`transfer`/`pixel_format` helpers still
   decoded a uniform `u32` through `from_u32`. Their named codes cluster
   in low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so QuickCheck `u32::arbitrary`
   reaches a named variant essentially never — properties over colour /
   pixel descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `qc_via_code_weighted_range!(fn => Ty, max_named = N)` macro
   — 50/50 between an in-range code (`u32 % (max+1)`, since `quickcheck::
   Gen` has no `int_in_range`) and a full-range `u32`. `track_disposition`
   stays on `arb_via_code!` — bitflags, every bit pattern meaningful.

2. [medium] The bespoke `sample_format` helper's named arm only sampled
   6 of 12 slugs; `dbl`/`u8p`/`s32p`/`dblp`/`s64`/`s64p` were reachable
   only via the rare numeric arm. `SLUGS` now lists all 12.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed + `Unknown(_)` + `Other(_)`.
- `reachability_range_weighted_enums_hit_named_codes` (new) — Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              163 pass
  check -p mediaframe / --features arbitrary,quickcheck / --all-features  ✓
uqio added a commit that referenced this pull request May 22, 2026
…ly values

Codex round-4 finding on PR #9: `quickcheck_helpers::composite::tags`
populated 12 of the 13 `Tags` fields but never `language`, so every
generated `Tags` had `language == None` — property tests could not catch
language-tag loss / corruption.

Fix: the `tags` helper now generates `language` too, 50/50 between `None`
and `Some(<arbitrary string>)`.

Also carried the PR #8 round-4 canonical-value fix into the quickcheck
helpers (same latent bug, separate file tree):
  - `qc_open_string_enum!` and the bespoke `sample_format` helper built
    `Other(SmolStr::from(s))` directly; a string equal to a named slug
    produced a malformed `Other("h264")` that serde canonicalises to
    `H264` — not round-trippable.
  - Both now route string construction through `FromStr` (the
    canonicalising constructor), so every generated value is canonical.

Tests:
- `reachability_tags_language_hits_none_and_some` — asserts both `None`
  and `Some(_)` are observed.
- `arbitrary_values_survive_serde_round_trip` (gated on `serde`) — 4096
  rounds, `SampleFormat` + `VideoCodec` round-trip through `serde_json`.

This branch was also rebased onto the updated `feat/arbitrary` (which
itself rebased onto the serde-fixed `feat/serde-support`) — keeps the
stack coherent so reviews see the real combined state.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              164 pass
  test -p mediaframe --features quickcheck,serde --lib        174 pass
  check -p mediaframe --all-features                          ✓
uqio added a commit that referenced this pull request May 22, 2026
…de round-trip coverage

Carries the PR #8 round-5 `Loudness` fix into the quickcheck helpers (same
latent bug, separate file tree): the `loudness` helper filled its four
`f32` fields via `f32::arbitrary`, which yields NaN / ±inf. `Loudness`
derives serde; JSON serializes a non-finite `f32` as `null`, which fails
to deserialize back into `f32` — quickcheck-generated `Loudness` values
did not reliably survive a serde round-trip.

Fix: the `loudness` helper generates finite fields — a bounded `i32`
(`rem_euclid` → non-negative, shifted into `[-10_000_000, 10_000_000]`)
mapped to `f32 / 100.0`, finite in [-100_000, 100_000]. Extended
`arbitrary_values_survive_serde_round_trip` to round-trip `Loudness` too.

(The `HashSet` → `BTreeSet` half of the round-5 finding is PR #8-only —
`feat/quickcheck`'s tests run under the `quickcheck` feature, which
implies `std`; the no-std-capable `arbitrary` tests are inherited from
`feat/arbitrary` already fixed there.)

Also rebased onto the round-5-fixed `feat/arbitrary` to keep the stack
coherent.

Verified (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck,serde --lib    174 pass
  hack --each-feature check -p mediaframe                 ✓
  check -p mediaframe --all-features                      ✓
uqio added 2 commits May 22, 2026 14:16
…are it honestly

Codex round-6 finding: the `arbitrary` feature was declared
`["alloc", "dep:arbitrary"]` with comments describing an alloc-tier /
no-std setup — but the `arbitrary` crate (1.4.2) is genuinely std-based:
it exposes no `alloc`/`no_std` feature and its crate root uses `std`
directly. So `--no-default-features --features arbitrary` only kept
*mediaframe's own* crate root `#![no_std]` on a host target; a real
no-std target would still fail to compile the `arbitrary` dependency.
The advertised no-std support was illusory.

Fix: `arbitrary` is honestly std-only. The feature now declares
`["std", "dep:arbitrary"]` (std includes alloc, so the String/Vec/byte
generation is unaffected) and the comment states it serves host-side
fuzzing only, never an embedded target. No code change to the impls —
they already used `std`-aliased paths that now resolve to real `std`.

Dropped the now-false "no-std" rationale from two round-5 test comments
(`BTreeSet`, `String::from`) — both remain correct under `std`, only the
justification needed correcting.

Verified (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  check -p mediaframe --no-default-features --features arbitrary   ✓
  test  -p mediaframe --features arbitrary,serde --lib             173 pass
  hack --each-feature check -p mediaframe                          ✓
  check -p mediaframe --all-features                               ✓
The round-6 commit fixed the `arbitrary` *feature* comment but missed the
`arbitrary` *dependency* comment, which still claimed the crate is
"no-alloc-capable by default". It is not — `arbitrary` 1.x is std-based.
Comment now matches reality; no code / manifest-value change.
@uqio uqio force-pushed the feat/arbitrary branch from 2f14d0d to c78efbe Compare May 22, 2026 02:18
uqio added a commit that referenced this pull request May 22, 2026
…d enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #9 (mirror of the PR #8 fixes):

1. [medium] `matrix`/`primaries`/`transfer`/`pixel_format` helpers still
   decoded a uniform `u32` through `from_u32`. Their named codes cluster
   in low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so QuickCheck `u32::arbitrary`
   reaches a named variant essentially never — properties over colour /
   pixel descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `qc_via_code_weighted_range!(fn => Ty, max_named = N)` macro
   — 50/50 between an in-range code (`u32 % (max+1)`, since `quickcheck::
   Gen` has no `int_in_range`) and a full-range `u32`. `track_disposition`
   stays on `arb_via_code!` — bitflags, every bit pattern meaningful.

2. [medium] The bespoke `sample_format` helper's named arm only sampled
   6 of 12 slugs; `dbl`/`u8p`/`s32p`/`dblp`/`s64`/`s64p` were reachable
   only via the rare numeric arm. `SLUGS` now lists all 12.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed + `Unknown(_)` + `Other(_)`.
- `reachability_range_weighted_enums_hit_named_codes` (new) — Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              163 pass
  check -p mediaframe / --features arbitrary,quickcheck / --all-features  ✓
uqio added a commit that referenced this pull request May 22, 2026
…ly values

Codex round-4 finding on PR #9: `quickcheck_helpers::composite::tags`
populated 12 of the 13 `Tags` fields but never `language`, so every
generated `Tags` had `language == None` — property tests could not catch
language-tag loss / corruption.

Fix: the `tags` helper now generates `language` too, 50/50 between `None`
and `Some(<arbitrary string>)`.

Also carried the PR #8 round-4 canonical-value fix into the quickcheck
helpers (same latent bug, separate file tree):
  - `qc_open_string_enum!` and the bespoke `sample_format` helper built
    `Other(SmolStr::from(s))` directly; a string equal to a named slug
    produced a malformed `Other("h264")` that serde canonicalises to
    `H264` — not round-trippable.
  - Both now route string construction through `FromStr` (the
    canonicalising constructor), so every generated value is canonical.

Tests:
- `reachability_tags_language_hits_none_and_some` — asserts both `None`
  and `Some(_)` are observed.
- `arbitrary_values_survive_serde_round_trip` (gated on `serde`) — 4096
  rounds, `SampleFormat` + `VideoCodec` round-trip through `serde_json`.

This branch was also rebased onto the updated `feat/arbitrary` (which
itself rebased onto the serde-fixed `feat/serde-support`) — keeps the
stack coherent so reviews see the real combined state.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              164 pass
  test -p mediaframe --features quickcheck,serde --lib        174 pass
  check -p mediaframe --all-features                          ✓
uqio added a commit that referenced this pull request May 22, 2026
…de round-trip coverage

Carries the PR #8 round-5 `Loudness` fix into the quickcheck helpers (same
latent bug, separate file tree): the `loudness` helper filled its four
`f32` fields via `f32::arbitrary`, which yields NaN / ±inf. `Loudness`
derives serde; JSON serializes a non-finite `f32` as `null`, which fails
to deserialize back into `f32` — quickcheck-generated `Loudness` values
did not reliably survive a serde round-trip.

Fix: the `loudness` helper generates finite fields — a bounded `i32`
(`rem_euclid` → non-negative, shifted into `[-10_000_000, 10_000_000]`)
mapped to `f32 / 100.0`, finite in [-100_000, 100_000]. Extended
`arbitrary_values_survive_serde_round_trip` to round-trip `Loudness` too.

(The `HashSet` → `BTreeSet` half of the round-5 finding is PR #8-only —
`feat/quickcheck`'s tests run under the `quickcheck` feature, which
implies `std`; the no-std-capable `arbitrary` tests are inherited from
`feat/arbitrary` already fixed there.)

Also rebased onto the round-5-fixed `feat/arbitrary` to keep the stack
coherent.

Verified (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck,serde --lib    174 pass
  hack --each-feature check -p mediaframe                 ✓
  check -p mediaframe --all-features                      ✓
Codex round-7 finding: the `arbitrary::Arbitrary` impl for `Tags` started
from `Tags::new()` and populated 12 of the 13 fields but never `language`
(`Option<SmolStr>`, serialized as buffa field 13). Every arbitrary-driven
`Tags` had `language == None`, leaving that wire/storage path outside the
fuzzing surface.

(The quickcheck-helper `Tags` generator was already fixed for this in
round 4; this is the same gap in the separate `arbitrary_impls` tree.)

Fix: the `Tags` impl now generates `language` — an arbitrary
`Option<String>` mapped to `Option<SmolStr>` through `.maybe_language(...)`,
covering both `None` and `Some(_)`.

New test `reachability_tags_language_hits_none_and_some` asserts both
states are observed.

Verified (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features arbitrary --lib            164 pass
  test -p mediaframe --features arbitrary,serde --lib      174 pass
  check -p mediaframe --all-features                       ✓
uqio added a commit that referenced this pull request May 22, 2026
…d enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #9 (mirror of the PR #8 fixes):

1. [medium] `matrix`/`primaries`/`transfer`/`pixel_format` helpers still
   decoded a uniform `u32` through `from_u32`. Their named codes cluster
   in low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so QuickCheck `u32::arbitrary`
   reaches a named variant essentially never — properties over colour /
   pixel descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `qc_via_code_weighted_range!(fn => Ty, max_named = N)` macro
   — 50/50 between an in-range code (`u32 % (max+1)`, since `quickcheck::
   Gen` has no `int_in_range`) and a full-range `u32`. `track_disposition`
   stays on `arb_via_code!` — bitflags, every bit pattern meaningful.

2. [medium] The bespoke `sample_format` helper's named arm only sampled
   6 of 12 slugs; `dbl`/`u8p`/`s32p`/`dblp`/`s64`/`s64p` were reachable
   only via the rare numeric arm. `SLUGS` now lists all 12.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed + `Unknown(_)` + `Other(_)`.
- `reachability_range_weighted_enums_hit_named_codes` (new) — Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              163 pass
  check -p mediaframe / --features arbitrary,quickcheck / --all-features  ✓
uqio added a commit that referenced this pull request May 22, 2026
…ly values

Codex round-4 finding on PR #9: `quickcheck_helpers::composite::tags`
populated 12 of the 13 `Tags` fields but never `language`, so every
generated `Tags` had `language == None` — property tests could not catch
language-tag loss / corruption.

Fix: the `tags` helper now generates `language` too, 50/50 between `None`
and `Some(<arbitrary string>)`.

Also carried the PR #8 round-4 canonical-value fix into the quickcheck
helpers (same latent bug, separate file tree):
  - `qc_open_string_enum!` and the bespoke `sample_format` helper built
    `Other(SmolStr::from(s))` directly; a string equal to a named slug
    produced a malformed `Other("h264")` that serde canonicalises to
    `H264` — not round-trippable.
  - Both now route string construction through `FromStr` (the
    canonicalising constructor), so every generated value is canonical.

Tests:
- `reachability_tags_language_hits_none_and_some` — asserts both `None`
  and `Some(_)` are observed.
- `arbitrary_values_survive_serde_round_trip` (gated on `serde`) — 4096
  rounds, `SampleFormat` + `VideoCodec` round-trip through `serde_json`.

This branch was also rebased onto the updated `feat/arbitrary` (which
itself rebased onto the serde-fixed `feat/serde-support`) — keeps the
stack coherent so reviews see the real combined state.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              164 pass
  test -p mediaframe --features quickcheck,serde --lib        174 pass
  check -p mediaframe --all-features                          ✓
uqio added a commit that referenced this pull request May 22, 2026
…de round-trip coverage

Carries the PR #8 round-5 `Loudness` fix into the quickcheck helpers (same
latent bug, separate file tree): the `loudness` helper filled its four
`f32` fields via `f32::arbitrary`, which yields NaN / ±inf. `Loudness`
derives serde; JSON serializes a non-finite `f32` as `null`, which fails
to deserialize back into `f32` — quickcheck-generated `Loudness` values
did not reliably survive a serde round-trip.

Fix: the `loudness` helper generates finite fields — a bounded `i32`
(`rem_euclid` → non-negative, shifted into `[-10_000_000, 10_000_000]`)
mapped to `f32 / 100.0`, finite in [-100_000, 100_000]. Extended
`arbitrary_values_survive_serde_round_trip` to round-trip `Loudness` too.

(The `HashSet` → `BTreeSet` half of the round-5 finding is PR #8-only —
`feat/quickcheck`'s tests run under the `quickcheck` feature, which
implies `std`; the no-std-capable `arbitrary` tests are inherited from
`feat/arbitrary` already fixed there.)

Also rebased onto the round-5-fixed `feat/arbitrary` to keep the stack
coherent.

Verified (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck,serde --lib    174 pass
  hack --each-feature check -p mediaframe                 ✓
  check -p mediaframe --all-features                      ✓
uqio added 15 commits May 22, 2026 14:31
Codex round-8 finding: the round-7 `Tags.language` generator mapped an
arbitrary `Option<String>` straight to `Option<SmolStr>`, which permits
`Some("")`. buffa writes field 13 only for a non-empty language and
decodes empty input back to `None` — so a generated `Some("")` would not
survive a wire round-trip and the reachability test's `Some(_)` was not
equivalent to the serialized-present state it claimed to cover.

Fix: the generator now `.filter(|s| !s.is_empty())` before `.maybe_
language(...)`, so it yields only `None` or `Some(non-empty)`. The
reachability test is tightened to assert each generated `Some(s)` has
`!s.is_empty()` and to require a non-empty `Some` was observed.

Verified (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features arbitrary,serde --lib    174 pass
Drops the previous byte-bridge approach in favor of native
`quickcheck::Arbitrary` impls via the `quickcheck-richderive` derive. Per-type
`#[quickcheck(with = "…")]` points at `fn(&mut Gen) -> Self` helpers that own
the generation strategy; the derive emits the actual `impl` blocks.

Feature wiring:
  quickcheck = ["std", "dep:quickcheck", "dep:quickcheck-richderive"]
Deps: quickcheck 1 + quickcheck-richderive 0.1, both optional.

`mediaframe::quickcheck_helpers/` module is `pub` (the derive's `with` paths
must be reachable from each type's definition site). Sub-modules:
  strings.rs   — cluster A (open string enums)
  coded.rs     — cluster B (closed coded enums + colour/frame structs)
  composite.rs — cluster C (audio composite + capture + lang)

Tests live in `mod.rs` (5 — invariant + roundtrip + smoke; mirror the
`arbitrary_impls/` test set but drive `quickcheck::Gen` directly).

The cluster files are stub-only here; the actual derives + helpers land in
three parallel follow-up commits, one per cluster, before the matrix is
re-verified.

The two features are now independent — `quickcheck` does NOT pull `arbitrary`.
…rive

Adds container-level `#[cfg_attr(feature = "quickcheck", derive(::quickcheck_richderive::Arbitrary), quickcheck(with = "…"))]` to every cluster-A open string enum, plus the matching `fn(&mut Gen) -> T` helper bodies in `quickcheck_helpers/strings.rs`. The derive emits the `impl quickcheck::Arbitrary` blocks; the helpers run a 50/50 split between round-tripping a curated slug through total `FromStr` and constructing `Other(SmolStr::from(<arbitrary String>))`.

Types covered (8) → helpers / curated-slug count:
  - codec::VideoCodec        → strings::video_codec            (6 slugs)
  - codec::AudioCodec        → strings::audio_codec            (6 slugs)
  - codec::SubtitleCodec     → strings::subtitle_codec         (6 slugs)
  - container::Format        → strings::container_format       (6 slugs)
  - subtitle::Format         → strings::subtitle_format        (6 slugs)
  - audio::ChannelLayout     → strings::channel_layout         (6 slugs)
  - audio::SampleFormat      → strings::sample_format          (6 slugs)
  - audio::ContainerFormat   → strings::audio_container_format (6 slugs)

Slug picks mirror `arbitrary_impls/strings.rs` for consistency. A small `qc_open_string_enum!` macro inside `strings.rs` factors the shared body; per-type `pub(crate) fn` surface is preserved so each `with = "…"` path resolves.
…via quickcheck-richderive

Adds the native `quickcheck::Arbitrary` derive surface for the 24
descriptor types in cluster B, via container-level
`#[cfg_attr(feature = "quickcheck", derive(::quickcheck_richderive::Arbitrary),
quickcheck(with = "crate::quickcheck_helpers::coded::<snake>"))]`.

Helper bodies live in `mediaframe/src/quickcheck_helpers/coded.rs` (the
former comment-only stub).

Strategy per group:
- 13 coded enums via the `arb_via_code!` macro — each helper is
  `T::from_u32(u32::arbitrary(g))`, total coverage of
  `Unknown(u32)` / `Reserved(_)` arms in one line per type.
- 11 structs build via the type's public `new(...)` ctor with each
  field via `<FieldT>::arbitrary(g)`. Cascade structs (`Info`,
  `MasteringDisplay`, `HdrStaticMetadata`, `FrameRate`) call sister
  helpers within the file. `Rational` / `SampleAspectRatio` use
  `core::num::NonZeroU32::arbitrary(g)` (quickcheck 1.x has the
  impl), preserving the non-zero denom invariant by construction.
  `HdrStaticMetadata` rolls each `Option<…>` via a coin flip rather
  than going through a not-yet-in-scope `Option<T>: Arbitrary`.

Coded enums (13):
  color::{Matrix, Primaries, Transfer, DynamicRange,
          ChromaLocation, DcpTargetGamut},
  pixel_format::PixelFormat,
  frame::{Rotation, FieldOrder, StereoMode},
  subtitle::TrackOrigin,
  audio::BitRateMode,
  disposition::TrackDisposition (bitflags struct, `from_u32` =
  `from_bits_retain` — unknown bits round-trip).

Structs (11):
  color::{Info, ContentLightLevel, ChromaCoord, MasteringDisplay,
          HdrStaticMetadata, DolbyVisionConfig},
  frame::{Dimensions, Rect, Rational, SampleAspectRatio, FrameRate}.

Default builds are unaffected (everything is gated by
`cfg(feature = "quickcheck")`). Under `--features quickcheck` the
type-by-type surface compiles clean; the lib-test failures that
remain are all in cluster A/C types referenced by the shared smoke
test in `quickcheck_helpers/mod.rs` and will resolve when those
subagents land their helpers.
…ickcheck-richderive

Wires native quickcheck::Arbitrary on the 7 cluster-C composite types via
quickcheck-richderive's container-level `#[quickcheck(with = "...")]` →
`crate::quickcheck_helpers::composite::<name>` helper functions.

Types:
  - audio::Loudness     — plain `new(f32, f32, f32, f32)`; fields pass through.
  - audio::Fingerprint  — `try_new(algo, value)`; algo non-empty fallback "x"
                           so `expect` is sound; empty `value` is allowed.
  - audio::CoverArt     — `try_new(mime, data)`; mime fallback
                           "application/octet-stream", data fallback `[0u8]`
                           so both non-empty preconditions hold.
  - audio::Tags         — `new()` + 12 builder setters (title/artist/
                           album_artist/album/composer/genre/comment +
                           year/track_number/track_total/disc_number/
                           disc_total); same representative subset as the
                           arbitrary_impls cluster.
  - capture::Device     — `new()` + with_make / with_model; empty SmolStr
                           means absent so arbitrary strings pass through.
  - capture::GeoLocation — `try_new(lat, lon, altitude)`; coordinates built
                            in-range via `i32.rem_euclid(...) - shift`
                            (quickcheck::Gen has no int_in_range);
                            altitude 50/50 None vs in-band finite f32.
  - lang::Language      — `from_bcp47(...)` with a curated 12-tag table
                           via `g.choose(...)` — und sentinel + language-only
                           + language+region + language+script+region.

Validation strategy is identical to the arbitrary_impls Cluster C: build
VALID inputs FIRST, then call `try_new(...).expect(...)`. Never feed
attacker-controlled fuzz into a fallible ctor + `unwrap`.

Default build unaffected — everything is gated by `feature = "quickcheck"`.
…ary`)

richderive 0.2 reshuffles the attribute surface to mirror serde: `with` is now
a module pair (`mod::arbitrary` + `mod::shrink`); the single-fn override moved
to `arbitrary`. All 39 descriptor types use the gen-half-only override, so the
mechanical sweep is `quickcheck(with = "…")` → `quickcheck(arbitrary = "…")`
across 17 type-def files + 3 doc-comment references in `quickcheck_helpers/*`.

The helpers in `quickcheck_helpers/{strings,coded,composite}.rs` are unchanged
— they're still single `fn(&mut Gen) -> T` functions, which is the contract
for the new `arbitrary = "fn"` attribute.

Dep flipped to a git ref on the branch pending the 0.2.0 publish — once that
lands, this becomes `quickcheck-richderive = { version = "0.2", optional = true }`
in a follow-up.

Matrix (RUSTFLAGS=-Dwarnings):
  cargo fmt -p mediaframe -- --check                              ✓
  cargo test -p mediaframe --features quickcheck --lib            159 pass
  cargo check -p mediaframe --all-features                        ✓
…enums + bespoke SampleFormat 3-way

The first-pass quickcheck helpers all routed coded enums through
`<Ty>::from_u32(u32::arbitrary(g))`, which is fine for large enums
(matrix / primaries / transfer / pixel format) but collapses small ones:

  - `BitRateMode` (3 named codes 0/1/2, no `Unknown` arm) hits a named
    variant only when the random u32 happens to be 0/1/2 — ~3-in-4 G —
    so `Vbr` / `Abr` were effectively unreachable.
  - `TrackOrigin` (same shape, 3 named, no `Unknown`).
  - `Rotation` / `FieldOrder` / `StereoMode` / `DynamicRange` /
    `ChromaLocation` / `DcpTargetGamut` — closed + `Unknown(u32)` arm,
    < 10 named variants; uniform u32 decode virtually never names them.

`SampleFormat` was also misclassified: routed through the 2-arm
curated-slug-vs-`Other` macro (`qc_open_string_enum!`) even though it
has a *third* `Unknown(u32)` arm, so the `Unknown` arm was unreachable.

Two new internal helper macros next to `arb_via_code!`:

  - `qc_via_named_variants!` — strictly-closed enums; pick uniformly
    from a `const NAMED: &[Ty]` via `g.choose`.
  - `qc_via_code_weighted!` — closed + `Unknown(u32)`; 50/50 between
    `choose` over named and `from_u32(u32::arbitrary(g))` (so the
    `Unknown` arm stays reachable).

`arb_via_code!` stays for the large enums (matrix / primaries /
transfer / pixel format / disposition bitflags) where uniform-u32
decode hits named code points densely.

`SampleFormat` gets a bespoke 3-way: `g.choose(&[0u8, 1, 2])` over a
sentinel arm-tag slice (quickcheck::Gen has no `int_in_range`) →
named slug via `FromStr` / `from_u32(u32::arbitrary(g))` /
`Other(SmolStr)` from an arbitrary String.

New reachability tests in `quickcheck_helpers::tests` (use the existing
`drive(size, rounds, body)` harness):

  - reachability_small_coded_enums_hit_all_named
  - reachability_weighted_rotation_hits_named_and_unknown
  - reachability_sample_format_reaches_all_three_arms

All 162 lib tests pass under `--features quickcheck`; default /
`quickcheck,serde` / `arbitrary,quickcheck` / `--all-features` all
clean under `RUSTFLAGS=-Dwarnings`.
…d enums + all 12 SampleFormat slugs

Codex round-2 findings on PR #9 (mirror of the PR #8 fixes):

1. [medium] `matrix`/`primaries`/`transfer`/`pixel_format` helpers still
   decoded a uniform `u32` through `from_u32`. Their named codes cluster
   in low integers (Matrix 0..=17, Primaries 0..=22, Transfer 0..=18,
   PixelFormat 270 codes over 0..=947), so QuickCheck `u32::arbitrary`
   reaches a named variant essentially never — properties over colour /
   pixel descriptors ran almost entirely on the `Unknown(_)` path.

   Fix: new `qc_via_code_weighted_range!(fn => Ty, max_named = N)` macro
   — 50/50 between an in-range code (`u32 % (max+1)`, since `quickcheck::
   Gen` has no `int_in_range`) and a full-range `u32`. `track_disposition`
   stays on `arb_via_code!` — bitflags, every bit pattern meaningful.

2. [medium] The bespoke `sample_format` helper's named arm only sampled
   6 of 12 slugs; `dbl`/`u8p`/`s32p`/`dblp`/`s64`/`s64p` were reachable
   only via the rare numeric arm. `SLUGS` now lists all 12.

Tests:
- `reachability_sample_format_all_named_plus_arms` — strengthened to
  assert all 12 named variants observed + `Unknown(_)` + `Other(_)`.
- `reachability_range_weighted_enums_hit_named_codes` (new) — Matrix ≥10,
  Primaries ≥8, Transfer ≥10, PixelFormat ≥40 distinct named codes.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              163 pass
  check -p mediaframe / --features arbitrary,quickcheck / --all-features  ✓
…ability

Codex round-3 finding: `Matrix` has a mediaframe-domain variant `Bt601`
at `DOMAIN_EXT_BASE` (`0x8000_0000`) — far outside the H.273 `0..=17`
range the round-2 `qc_via_code_weighted_range!` macro biases toward. So
`Bt601` was only reachable via the full-`u32` fallback (~1-in-8.6-billion),
and the reachability test missed it entirely (it only counted codes ≤ 17).

Fix: `Matrix` is now hand-written (out of the range macro) — a 3-way pick:
in-range H.273 code (`u32 % 18`) / the domain-ext code (`DOMAIN_EXT_BASE`,
decodes to `Bt601`) / full-range `u32` (broad `Unknown` exercise).
`Matrix` was the only coded enum with a domain variant — Primaries /
Transfer / PixelFormat have none, so they stay on the range macro.

Test `reachability_range_weighted_enums_hit_named_codes` strengthened to
assert `matrix` observed `DOMAIN_EXT_BASE` specifically.

163 tests pass; `--all-features` clean.
…ly values

Codex round-4 finding on PR #9: `quickcheck_helpers::composite::tags`
populated 12 of the 13 `Tags` fields but never `language`, so every
generated `Tags` had `language == None` — property tests could not catch
language-tag loss / corruption.

Fix: the `tags` helper now generates `language` too, 50/50 between `None`
and `Some(<arbitrary string>)`.

Also carried the PR #8 round-4 canonical-value fix into the quickcheck
helpers (same latent bug, separate file tree):
  - `qc_open_string_enum!` and the bespoke `sample_format` helper built
    `Other(SmolStr::from(s))` directly; a string equal to a named slug
    produced a malformed `Other("h264")` that serde canonicalises to
    `H264` — not round-trippable.
  - Both now route string construction through `FromStr` (the
    canonicalising constructor), so every generated value is canonical.

Tests:
- `reachability_tags_language_hits_none_and_some` — asserts both `None`
  and `Some(_)` are observed.
- `arbitrary_values_survive_serde_round_trip` (gated on `serde`) — 4096
  rounds, `SampleFormat` + `VideoCodec` round-trip through `serde_json`.

This branch was also rebased onto the updated `feat/arbitrary` (which
itself rebased onto the serde-fixed `feat/serde-support`) — keeps the
stack coherent so reviews see the real combined state.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck --lib              164 pass
  test -p mediaframe --features quickcheck,serde --lib        174 pass
  check -p mediaframe --all-features                          ✓
…de round-trip coverage

Carries the PR #8 round-5 `Loudness` fix into the quickcheck helpers (same
latent bug, separate file tree): the `loudness` helper filled its four
`f32` fields via `f32::arbitrary`, which yields NaN / ±inf. `Loudness`
derives serde; JSON serializes a non-finite `f32` as `null`, which fails
to deserialize back into `f32` — quickcheck-generated `Loudness` values
did not reliably survive a serde round-trip.

Fix: the `loudness` helper generates finite fields — a bounded `i32`
(`rem_euclid` → non-negative, shifted into `[-10_000_000, 10_000_000]`)
mapped to `f32 / 100.0`, finite in [-100_000, 100_000]. Extended
`arbitrary_values_survive_serde_round_trip` to round-trip `Loudness` too.

(The `HashSet` → `BTreeSet` half of the round-5 finding is PR #8-only —
`feat/quickcheck`'s tests run under the `quickcheck` feature, which
implies `std`; the no-std-capable `arbitrary` tests are inherited from
`feat/arbitrary` already fixed there.)

Also rebased onto the round-5-fixed `feat/arbitrary` to keep the stack
coherent.

Verified (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --features quickcheck,serde --lib    174 pass
  hack --each-feature check -p mediaframe                 ✓
  check -p mediaframe --all-features                      ✓
Carries the PR #8 round-8 fix into the quickcheck `tags` helper (same
latent bug, separate file tree): the helper produced
`Some(SmolStr::from(<arbitrary string>))`, which permits `Some("")`.
buffa writes field 13 only for a non-empty language and decodes empty
back to `None`, so a generated `Some("")` would not survive a wire
round-trip.

Fix: the helper now yields `Some` only for a non-empty string
(`(!s.is_empty()).then(...)`), else `None`. The reachability test is
tightened to assert each generated `Some(s)` has `!s.is_empty()`.

Verified: cargo +1.95.0 test -p mediaframe --features quickcheck,serde
--lib — 174 pass; --all-features clean.
…guage>` for `language`

Resolves the codex `Some(0)` finding fundamentally — by fixing the `Tags`
type rather than constraining the generators — and closes the `language`
`TODO(lang)`.

Numeric fields (`year` / `track_number` / `track_total` / `disc_number` /
`disc_total`): `Option<u16>` → bare `u16` with `0` = absent. The buffa
wire codec already proto-zero-elided these fields (`v != 0` guards), so
the old `Option` was a type/codec contradiction: `Some(0)` encoded to
nothing and decoded back to `None`. Bare `u16` makes the type agree with
the wire — `0` is unambiguously "absent" (track/disc numbers are 1-based;
no real release tags year 0). Matches the `SmolStr` empty-is-absent
convention the string fields already use. Drops the now-needless
`maybe_* / update_* / clear_*` numeric vocabulary; `set_year(0)` clears.

`language`: `Option<SmolStr>` placeholder → `Option<crate::lang::Language>`
(the `TODO(lang)`). Parsed BCP-47 instead of a raw string; `None` = no
tag, `Some(Language)` = a parsed tag (possibly `und`).

Why not `jiff` for `year`: `jiff` is std-only (mediaframe is no-std-
capable) and has no plain calendar-year type (smallest is `civil::Date`,
which would force a fake month/day). A release-year metadata tag is just
a number — bare `u16` is the honest representation.

Touched:
- `audio/tags.rs` — struct, `new`, accessors (`year() -> u16`,
  `language() -> Option<Language>`), builders/setters; tests rewritten.
- `buffa.rs` — `Tags` codec: numeric encode/decode now bare `u16`;
  `language` encodes `Language::to_bcp47()`, decodes via `from_bcp47`
  (lenient `unwrap_or_default`, mirroring the standalone `Language`
  codec). Round-trip test gains a `0`-sentinel case.
- `arbitrary_impls/composite.rs` + `quickcheck_helpers/composite.rs` —
  `Tags` generators emit bare `u16` (incl. `0`, now wire-canonical) and
  `Option<Language>` from the curated BCP-47 generator. The codex
  `Some(0)` / `Some("")` findings are gone — there is no `Option<u16>`
  and no empty-string language.
- reachability tests updated for the new shape.

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test  -p mediaframe --all-features --lib            875 pass
  check -p mediaframe {default, no-default+arbitrary, no-default+serde,
        quickcheck+serde+buffa}                       ✓
  hack --each-feature check -p mediaframe             ✓
@uqio uqio changed the title feat: optional arbitrary support for the descriptor vocabulary feat: optional arbitrary + quickcheck support for the descriptor vocabulary May 22, 2026
uqio added 2 commits May 22, 2026 15:26
… (golden rule §9)

Applies the serde golden rule (rust-type-conventions §9): an absent
`Option` field serializes to an *omitted key*, never an explicit `null`.

- `Tags.language`, `HdrStaticMetadata.{mastering, content_light}` —
  `#[cfg_attr(feature = "serde", serde(skip_serializing_if =
  "Option::is_none"))]`. `HdrStaticMetadata` additionally gains
  container-level `serde(default)` (whole struct has a meaningful
  all-`None` `Default`), so an omitted field still deserializes —
  `skip_serializing_if` + `serde(default)` are a pair.
- `GeoLocation` has a hand-written `Serialize`; it unconditionally wrote
  `altitude` (emitting `"altitude":null` when absent). Now skips the
  field when `None` via `SerializeStruct::skip_field`. Its deserialize
  `Shadow` already had `#[serde(default)]` on `altitude`.

This also closes the round-9/10 concern about old `Option<u16>`-shape
`null` payloads: with the no-`null` rule there is no `null` to migrate —
and the `Tags` numerics are bare `u16` now anyway.

`Tags.language` doc notes the `Language` normalization (variant /
extension subtags fold to the language/script/region core) — lossless
for real embedded-metadata language tags.

New test `absent_option_fields_are_omitted_not_null` asserts the emitted
JSON for a default `Tags` / `HdrStaticMetadata` / `GeoLocation` contains
no `null` and no absent-field key (`HdrStaticMetadata::default()` → `{}`).

Matrix (RUSTFLAGS=-Dwarnings cargo +1.95.0):
  test -p mediaframe --all-features --lib   876 pass
  check -p mediaframe --all-features        ✓
…ec.rs`

CI `xtask-check` failed: `mediaframe/src/codec.rs is stale vs the
vendored FFmpeg table`.

`codec.rs` is generated by `cargo xtask gen-codec`. The `quickcheck`
feature derives `quickcheck_richderive::Arbitrary` on the three codec
enums via a `#[cfg_attr(feature = "quickcheck", derive(…),
quickcheck(arbitrary = "…"))]` on each type — but that attribute was
hand-added directly to `codec.rs`, which the generator does not emit. So
the on-disk file diverged from `gen-codec`'s output and the byte-for-byte
freshness diff in `xtask check` failed.

Fix: `build_codec_enum` now emits the `quickcheck` `cfg_attr` itself,
routing each enum to its `quickcheck_helpers::strings::{video,audio,
subtitle}_codec` helper. Re-running `gen-codec` produces a `codec.rs`
byte-identical to the current file (the hand-added attr matched), so
`xtask check` is green and future regenerations preserve it.

(`serde` / `arbitrary` need no codegen change — codec support for those
is macro-based in `serde_impls.rs` / `arbitrary_impls/`, not a derive on
the type, so it never touched `codec.rs`.)

Verified: `cargo xtask gen-codec` → no `codec.rs` diff; `cargo xtask
check` → "codec enums and FFmpeg n8.1 are in two-way sync".
@al8n al8n merged commit 704072e into main May 22, 2026
42 checks passed
@al8n al8n deleted the feat/arbitrary branch May 22, 2026 03:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants