fix(asr): align acoustic and semantic feature lengths to prevent tensor mismatch by JasonOA888 · Pull Request #309 · microsoft/VibeVoice

JasonOA888 · 2026-04-02T01:13:00Z

Fixes #220

Root Cause

The acoustic and semantic tokenizers use different encoder architectures with different downsampling ratios. For certain audio lengths, they produce slightly different frame counts (e.g. 228 vs 223 frames).
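To illustrate how two encoders can disagree on frame count for the same input, here is a rough arithmetic sketch. It uses the standard output-length formula for a strided 1-D convolution (dilation 1). The layer configurations below are invented for illustration and are not VibeVoice's actual tokenizer settings; the helper names `conv_out_len` and `encoder_frames` are likewise hypothetical.

```python
def conv_out_len(n, kernel, stride, padding=0):
    """Output length of a 1-D convolution with dilation 1."""
    return (n + 2 * padding - kernel) // stride + 1


def encoder_frames(n, layers):
    """Apply the length formula through a stack of (kernel, stride, padding) layers."""
    for kernel, stride, padding in layers:
        n = conv_out_len(n, kernel, stride, padding)
    return n


# Hypothetical stacks: same nominal stride (2 * 4 = 8), different padding.
ENCODER_A = [(4, 2, 1), (8, 4, 2)]
ENCODER_B = [(4, 2, 0), (8, 4, 0)]

num_samples = 1000
frames_a = encoder_frames(num_samples, ENCODER_A)
frames_b = encoder_frames(num_samples, ENCODER_B)
print(frames_a, frames_b)  # 125 123
```

Even with the same overall stride, padding and kernel choices shift the result by a few frames, which is the kind of small gap reported here (228 vs 223).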

When the element-wise addition acoustic_features + semantic_features is attempted with mismatched temporal dimensions, PyTorch raises:

RuntimeError: The size of tensor a (228) must match the size of tensor b (223)

Fix

Truncate both feature sequences to the minimum common length before the addition, applied consistently to:

Short-audio path (direct processing): align before combining
Long-audio path (streaming): align after segment concatenation

The alignment logic already existed for the streaming path but was missing from the direct processing path.
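A minimal sketch of the truncate-then-add step, not the exact code in this PR. It assumes features are shaped `(batch, frames, hidden)` with time on dimension 1; the helper name `align_and_add` and the hidden size of 64 are hypothetical.

```python
import torch


def align_and_add(acoustic, semantic, time_dim=1):
    """Truncate both tensors to their shared length along time_dim, then add."""
    min_len = min(acoustic.shape[time_dim], semantic.shape[time_dim])
    # narrow(dim, start, length) returns a view of the first min_len frames.
    acoustic = acoustic.narrow(time_dim, 0, min_len)
    semantic = semantic.narrow(time_dim, 0, min_len)
    return acoustic + semantic


# Reproduce the mismatch from the issue: 228 vs 223 frames.
acoustic_features = torch.randn(1, 228, 64)  # assumed layout: (batch, frames, hidden)
semantic_features = torch.randn(1, 223, 64)

combined = align_and_add(acoustic_features, semantic_features)
print(combined.shape)  # torch.Size([1, 223, 64])
```

Truncating to the shorter sequence drops at most a few trailing frames from the longer one, which avoids padding one stream with synthetic values.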

…or mismatch

The acoustic and semantic tokenizers use different encoder architectures with different downsampling ratios. When processing audio of certain lengths, they can produce slightly different frame counts (e.g. 228 vs 223 frames), causing a tensor size mismatch when the features are combined via element-wise addition.

This fix truncates both feature sequences to the minimum length before combination, applied consistently to both the short-audio (direct) and long-audio (streaming) code paths.

Fixes microsoft#220

JasonOA888 wants to merge 1 commit into microsoft:main from JasonOA888:fix/issue-220-tensor-mismatch
