fix(asr): harden v3 multilingual chunk seams #604
Draft
vdt4534 wants to merge 1 commit into
f7f58f5 to bfa14a1
Member: I actually overhauled this change in #596.
0af1b54 to 01f2b62
Author: Updated this draft after your #596 overhaul. The previous version of this PR is superseded; the current branch is now based on …. I also added a fresh note on #594 with the newer fixture results and timing numbers.
Summary
This updates the existing stacked draft after the #596 overhaul. I retested the current #596 head (bfa14a17) with a larger fixture set. The no-mel direction still looks right and keeps the fast path, but a few longer, more varied fixtures still drift into English at chunk seams. This patch keeps the same basic direction and speed profile, then adds a v3-only acoustic boundary/warmup path for no-mel batch chunks.
It does not use language hints, vocabulary filtering, language-specific token rules, or a second decode.
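To make the "warmup" idea concrete, here is a minimal sketch under stated assumptions, not the PR's implementation: a chunk is decoded with some preceding audio prepended as context, and any token whose start frame falls inside that prepended region is discarded because the previous chunk already emitted it. `DecodedToken`, `trimWarmup`, and the frame-index convention are hypothetical names for illustration, not FluidAudio API.

```swift
/// Hypothetical token record (not FluidAudio's type). `frame` is the token's
/// start frame relative to the beginning of the decoder input, warmup included.
struct DecodedToken {
    let id: Int
    let frame: Int
}

/// Hypothetical helper: drops tokens that start inside the warmup prefix and
/// rebases the remaining frame indices so they are relative to the chunk's own start.
func trimWarmup(_ tokens: [DecodedToken], warmupFrames: Int) -> [DecodedToken] {
    tokens
        .filter { $0.frame >= warmupFrames }
        .map { DecodedToken(id: $0.id, frame: $0.frame - warmupFrames) }
}

// 8 frames of warmup were prepended; the token at frame 2 belongs to the previous chunk.
let decoded = [
    DecodedToken(id: 11, frame: 2),
    DecodedToken(id: 42, frame: 8),
    DecodedToken(id: 7, frame: 15),
]
let kept = trimWarmup(decoded, warmupFrames: 8)
print(kept.map { $0.id })    // [42, 7]
print(kept.map { $0.frame }) // [0, 7]
```

Trimming by frame index rather than by token content keeps the rule language-agnostic, in line with the constraint above of no language hints or token-specific rules.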
What changed
When melChunkContext == false, choose chunk starts from local acoustic boundaries instead of only a fixed grid.
Why
On the current #596 branch, the original notes_1408_clean.wav repro is fixed, and speed is much better than the old pinned baseline. The expanded fixtures showed remaining drift:
gouvernements and entreprises...
départemental, the camions...
the degree of pénétration...
and Montepulciano... Chypre and Samos
The failure mechanism does not look French-specific. These look like seam/decoder-context issues where v3's multilingual prior falls into English after a bad boundary. The fix only looks at audio energy, frame indices, and model token confidence.
Validation
20 fixtures locally:
Timing: bb96003, bfa14a17, no-mel
Known target snippets with this PR:
notes_1408_clean: faire un effort sur le progrès
wwii_belgique_fr: En Belgique... a été jour férié
user2_2026-05-12: blouses médicales blanches et portant
gouvernements et entreprises ont revu
oblasts occupés de Donetsk... A decade on
départemental les camions rouges
degré de pénétration
et les invite... et à boire
Checks:
swift test --filter ChunkProcessorTests
swift test --filter TdtRefactoredComponentsTests
swift test --filter TdtDecoderV2Tests
swift build -c release --product fluidaudiocli
git diff --check
Fixtures
The already-uploaded issue fixtures are enough to reproduce part of this. I also tested additional longer clips. The LibriVox clips are in the public domain and can be attached, or regenerated as clipped WAVs, if useful. The private Amnis/voice fixtures are not committed here; I can attach specific ones only if we decide they are okay to make public.