50 changes: 50 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,56 @@ All notable changes to this project will be documented in this file.
The format follows [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and the project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.2.1] - 2026-05-09

Closes [#3] (delivered via PR [#9]). Additive release: cargo-compatible
with `whispercpp = "0.2"` consumers; no changes to `whispercpp-sys` (the
`-sys` crate stays at `0.2.0`).

### Added

- `Segments<'a>` and `Tokens<'state>` iterator types in `whispercpp::state`,
  plus `State::segments_iter()` and `Segment::tokens_iter()` constructors.
  Both implement `Iterator + ExactSizeIterator + DoubleEndedIterator +
  FusedIterator`.
- `IntoIterator` impls so the standard collection idioms work:
  - `for seg in &state { ... }`: `IntoIterator for &State`.
  - `for tok in seg { ... }`: `IntoIterator for Segment<'a>` (by-value;
    `Segment` is `Copy`, so the consumption is cheap).
  - `for tok in &seg { ... }`: `IntoIterator for &Segment<'a>`.
- `Tokens<'state>` owns a copy of the `Segment` (which is `Copy`), so adapter
  chains like `state.segments_iter().flat_map(|s| s.tokens_iter())` compile
  (the inner iterator does not borrow the closure-local `Segment` value).
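
The shape described above can be sketched without the real crate. The following is a minimal model, not the crate's implementation: `State` here is a hypothetical stand-in backed by a `Vec<Vec<u32>>` instead of FFI pointers, tokens are plain `u32` ids, only `Segments` carries the full trait set, and the `&Segment` impl is left out for brevity. It is meant only to show why an owned, `Copy` segment inside `Tokens` lets the `flat_map` chain typecheck.

```rust
use std::iter::FusedIterator;

// Hypothetical stand-in for the FFI-backed state: token ids per segment.
pub struct State {
    segments: Vec<Vec<u32>>,
}

// Cheap `Copy` view of one segment, tied to the parent `State`.
#[derive(Clone, Copy)]
pub struct Segment<'a> {
    state: &'a State,
    index: usize,
}

pub struct Segments<'a> {
    state: &'a State,
    next: usize,
    end: usize,
}

// Owns a copy of the `Segment`, so it never borrows a closure-local value.
pub struct Tokens<'a> {
    segment: Segment<'a>,
    next: usize,
    end: usize,
}

impl State {
    pub fn new(segments: Vec<Vec<u32>>) -> Self {
        State { segments }
    }
    pub fn segments_iter(&self) -> Segments<'_> {
        Segments { state: self, next: 0, end: self.segments.len() }
    }
}

impl<'a> Segment<'a> {
    // Returns `Tokens<'a>`: tied to the parent state, not to `&self`.
    pub fn tokens_iter(&self) -> Tokens<'a> {
        let end = self.state.segments[self.index].len();
        Tokens { segment: *self, next: 0, end }
    }
}

impl<'a> Iterator for Segments<'a> {
    type Item = Segment<'a>;
    fn next(&mut self) -> Option<Segment<'a>> {
        if self.next >= self.end {
            return None;
        }
        let seg = Segment { state: self.state, index: self.next };
        self.next += 1;
        Some(seg)
    }
    fn size_hint(&self) -> (usize, Option<usize>) {
        let n = self.end - self.next;
        (n, Some(n))
    }
}

impl<'a> DoubleEndedIterator for Segments<'a> {
    fn next_back(&mut self) -> Option<Segment<'a>> {
        if self.next >= self.end {
            return None;
        }
        self.end -= 1;
        Some(Segment { state: self.state, index: self.end })
    }
}

impl ExactSizeIterator for Segments<'_> {}
impl FusedIterator for Segments<'_> {}

impl<'a> Iterator for Tokens<'a> {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.next >= self.end {
            return None;
        }
        let tok = self.segment.state.segments[self.segment.index][self.next];
        self.next += 1;
        Some(tok)
    }
}

impl<'a> IntoIterator for &'a State {
    type Item = Segment<'a>;
    type IntoIter = Segments<'a>;
    fn into_iter(self) -> Segments<'a> {
        self.segments_iter()
    }
}

impl<'a> IntoIterator for Segment<'a> {
    type Item = u32;
    type IntoIter = Tokens<'a>;
    fn into_iter(self) -> Tokens<'a> {
        self.tokens_iter()
    }
}

// The adapter chain from the changelog entry, against the stand-in types.
pub fn all_tokens(state: &State) -> Vec<u32> {
    state.segments_iter().flat_map(|s| s.tokens_iter()).collect()
}
```

If `tokens_iter` instead returned an iterator borrowing `&self`, the closure passed to `flat_map` would try to return a borrow of its own parameter `s`, which the compiler rejects.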

### Performance

- `Segments::next` and `Tokens::next` inline the construction / pointer
  projection rather than calling back through `State::segment(i)` /
  `Segment::token(j)`, saving one `n_segments()` / `n_tokens()` FFI call
  per yielded item. The iterator's captured `end` plus the `&self` borrow
  chain (`State::full` requires `&mut self`) make the per-call bounds check
  redundant. The saving matters most for `Tokens`: typical states have
  hundreds to low thousands of tokens per `State::full` invocation.
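
To illustrate the saving, here is a small sketch under stated assumptions: `FakeState` is a hypothetical stand-in in which `n_tokens()` plays the role of the FFI count accessor and a `Cell` counter records how often it is hit. None of these names come from the crate itself.

```rust
use std::cell::Cell;

// Hypothetical stand-in for the C side of the binding.
pub struct FakeState {
    tokens: Vec<u32>,
    count_calls: Cell<usize>,
}

impl FakeState {
    pub fn new(tokens: Vec<u32>) -> Self {
        FakeState { tokens, count_calls: Cell::new(0) }
    }

    // Models the `n_tokens()` FFI call and tallies each invocation.
    pub fn n_tokens(&self) -> usize {
        self.count_calls.set(self.count_calls.get() + 1);
        self.tokens.len()
    }

    // Checked accessor: re-queries the count on every call.
    pub fn token(&self, j: usize) -> Option<u32> {
        if j < self.n_tokens() {
            Some(self.tokens[j])
        } else {
            None
        }
    }

    // Iterator constructor: queries the count exactly once.
    pub fn tokens_iter(&self) -> TokensIter<'_> {
        TokensIter { state: self, next: 0, end: self.n_tokens() }
    }

    pub fn count_calls(&self) -> usize {
        self.count_calls.get()
    }
}

pub struct TokensIter<'a> {
    state: &'a FakeState,
    next: usize,
    end: usize,
}

impl Iterator for TokensIter<'_> {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        if self.next >= self.end {
            return None;
        }
        // `end` was captured at construction; the shared borrow on the
        // state keeps the count fixed, so no per-item re-query is needed.
        let tok = self.state.tokens[self.next];
        self.next += 1;
        Some(tok)
    }
}
```

In this model, walking three tokens through `token(j)` costs three count queries, while the iterator costs one regardless of length.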

### Internal

- New test fixture (`PoisonedStateFixture`) and `Context::dangling_for_test`
  (`unsafe fn`) so iterator behaviour can be exercised without a real model
  file. The fixture's RAII guard holds a `ManuallyDrop<Arc<Context>>` clone
  alongside the State, keeping the `Context` refcount permanently above
  zero so `whisper_free` is never called on the dangling pointer. The
  invariant is destructor-managed, so it survives test panics.
- `safety_audit.rs` matrix gains a `segments_iter` / `tokens_iter` row
  walking all ten safety axes; inlined-FFI projection rationale documented
  under axis #1 (throw).
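
The leak-by-composition idea can be modelled in isolation. This is a sketch with hypothetical names (`FakeContext`, `FakeState`, `Fixture`, `FREES`), not the crate's fixture: the `Drop` impl bumps a counter where the real `Context` would call `whisper_free`.

```rust
use std::mem::ManuallyDrop;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

// Counts how many times the stand-in "free" has run.
pub static FREES: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for a context whose destructor must never run.
pub struct FakeContext;

impl Drop for FakeContext {
    fn drop(&mut self) {
        FREES.fetch_add(1, Ordering::SeqCst);
    }
}

// Hypothetical stand-in for a state that holds a plain `Arc` to the context.
pub struct FakeState {
    ctx: Arc<FakeContext>,
}

// Guard: the extra `ManuallyDrop` clone is never released, so the
// strong count cannot reach zero and the destructor never runs.
pub struct Fixture {
    pub state: FakeState,
    _leak: ManuallyDrop<Arc<FakeContext>>,
}

impl Fixture {
    pub fn new() -> Self {
        let ctx = Arc::new(FakeContext);
        let leak = ManuallyDrop::new(Arc::clone(&ctx));
        Fixture { state: FakeState { ctx }, _leak: leak }
    }

    pub fn strong_count(&self) -> usize {
        Arc::strong_count(&self.state.ctx)
    }
}

pub fn frees() -> usize {
    FREES.load(Ordering::SeqCst)
}
```

Dropping a `Fixture`, whether normally or during unwinding, releases only the state's `Arc`; the `ManuallyDrop` clone keeps the count at one, which is why no explicit `mem::forget` call is needed at the end of each test.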

### Fixed

No bug fixes — this is a feature-only release.

[#3]: https://github.com/Findit-AI/whispercpp/issues/3
[#9]: https://github.com/Findit-AI/whispercpp/pull/9

## [0.2.0] - 2026-05-09

Two feature streams (DTW timestamps in [#7], the issue-#2 accessors in [#8])
2 changes: 1 addition & 1 deletion whispercpp/Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "whispercpp"
-version = "0.2.0"
+version = "0.2.1"
edition.workspace = true
rust-version.workspace = true
license.workspace = true
42 changes: 42 additions & 0 deletions whispercpp/src/context.rs
@@ -993,6 +993,48 @@ impl Context {
self.ptr.as_ptr()
}

/// Internal test-only constructor. Builds a `Context`
/// whose `ptr` is `NonNull::dangling` — useful for unit
/// tests that exercise Rust-side logic (e.g. iterator
/// drivers on a poisoned `State`) without needing a real
/// model file.
///
/// # Safety
///
/// The returned `Context`'s [`Drop`] impl
/// unconditionally invokes `whisper_free(self.ptr.as_ptr())`,
/// which would dereference `NonNull::dangling()` — UB. The
/// caller MUST guarantee that the returned value (or any
/// `Arc<Context>` derived from it) is `core::mem::forget`'d
/// before its drop runs. The unsafety is on this
/// constructor (not the resulting `Context`), so every
/// call site must acknowledge the precondition with an
/// `unsafe` block. The compiler requires the block but
/// cannot check that the value is actually leaked.
///
/// `unsafe fn` is preferred over returning
/// `ManuallyDrop<Self>` because production code (and the
/// `State::poisoned_for_test` helper that consumes this)
/// expects an `Arc<Context>`, not an
/// `Arc<ManuallyDrop<Context>>`. The two are different
/// types — `ManuallyDrop` IS sufficient to suppress the
/// inner `Context`'s drop (its destructor is a no-op),
/// so the UB itself would be prevented; the issue is
/// purely API-shape compatibility with the production
/// `Arc<Context>` field on `State`. The
/// `PoisonedStateFixture` test guard sidesteps this by
/// holding a separate `ManuallyDrop<Arc<Context>>` clone
/// alongside the State — the `unsafe fn` here is the
/// raw-pointer producer, the guard handles the leak
/// invariant via composition.
#[cfg(test)]
pub(crate) unsafe fn dangling_for_test() -> Self {
Self {
ptr: NonNull::<sys::whisper_context>::dangling(),
lost: AtomicBool::new(false),
full_lock: Mutex::new(()),
}
}

/// Internal: mark this Context as poisoned because a
/// `State::full` on one of its States returned a
/// `WhisperError::StateLost`. Subsequent
2 changes: 1 addition & 1 deletion whispercpp/src/lib.rs
@@ -25,7 +25,7 @@ pub use params::{
MAX_BEAM_SIZE, MAX_INITIAL_TS_S, MAX_N_THREADS, MAX_TEMPERATURE, MIN_TEMPERATURE_INC, Params,
SamplingStrategy,
};
-pub use state::{Segment, State, Token};
+pub use state::{Segment, Segments, State, Token, Tokens};

/// Linked libwhisper version string (e.g. `"1.8.4"`).
///
53 changes: 53 additions & 0 deletions whispercpp/src/safety_audit.rs
@@ -382,6 +382,59 @@
//! during inference). ✓
//! 10. model-bound: state-bound (correct). ✓
//!
//! ### `segments_iter(&self)` / `Segment::tokens_iter(&self)`
//! 1. throw: `Segments::next` and `Tokens::next` inline
//! the pointer-projection (private-field access on
//! `State.ptr` / `Segment.state`) instead of calling
//! back through `State::segment(i)` / `Segment::token(j)`,
//! which would re-call `n_segments()` / `n_tokens()`
//! (FFI) for their bounds check. The captured `end`
//! field plus the `&self` borrow chain make the
//! bounds-check redundant: `State::full` requires
//! `&mut self`, so the count cannot change while any
//! iterator borrow is alive. The inlined unsafe FFI
//! is `whisper_full_get_token_data_from_state` —
//! pure C accessor, no allocation, no throw. ✓
//! 2. sync: `State` is `!Sync`. `&self` on the iterator
//! permits multiple iterators alive simultaneously,
//! which is sound because the underlying buffers are
//! immutable for the borrow's duration (`State::full`
//! requires `&mut self`, ruled out by the borrow
//! checker while any `&self` iterator exists). ✓
//! 3. alloc: iterator state is two i32s + a borrow; no
//! Rust-side alloc. ✓
//! 4. lifetime: `Segments<'a>` ties yielded `Segment<'a>`
//! to the source `&'a State`. `Tokens<'state>` owns a
//! copied `Segment<'state>` (which is `Copy`), so
//! adapter composition like
//! `state.segments_iter().flat_map(|s|
//! s.tokens_iter())` typechecks (the iterator does
//! not borrow a closure-local `Segment`). The
//! `'state` lifetime still ties yielded item pointer
//! projections to the parent `State`. Yielded `Token`
//! is value-typed (owned snapshot) so has no further
//! lifetime constraint. ✓
//! 5. linkage: no new FFI symbols. ✓
//! 6. sentinels: `next` and `end` index counters;
//! bounded at construction by `n_segments()` /
//! `n_tokens()`. `next_back` (DoubleEndedIterator)
//! decrements `end`; the `next < end` guard at the
//! top of every direction's call rejects the
//! converged-cursor case. ✓
//! 7. log pollution: no log path. ✓
//! 8. error bounds: iterator yields `Option<Item>`; no
//! error variant. ✓
//! 9. race: `!Sync` rules out concurrent `&self` from
//! two threads. ✓
//! 10. model-bound: yields whatever `State::full`
//! produced; bound to the loaded model implicitly via
//! the parent state. ✓
//!
//! `IntoIterator` impls (`for &State`, `for Segment`,
//! `for &Segment`) delegate to the existing
//! `segments_iter` / `tokens_iter` constructors, so they
//! inherit the same axis coverage — no separate row.
//!
//! ### `full(&mut self, ...)`
//! Pre-existing FFI surface; the audit here is on the
//! preflight additions: