fix(voice-button): stabilize speech flow and clarify replace mode #315

Open
SonyLeo wants to merge 6 commits into opentiny:develop from SonyLeo:docs/update-speech-config

@SonyLeo
Collaborator

@SonyLeo SonyLeo commented Mar 18, 2026

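Background

This change focuses on the VoiceButton speech recognition flow and its documentation. The goal is to unify the "append" and "replace" input experiences and to fix several edge cases in speech sessions.

The main issues addressed:

  • autoReplace behaved unintuitively: with it enabled, some scenarios still looked like "append"
  • In continuous mode, the session semantics of recognition results were unclear and easily confused with replace mode
  • When recording was stopped manually, the final transcript that the browser delivers afterwards could be lost
  • After speechConfig was updated dynamically, the underlying speech handler could keep using the old configuration
  • The demo and docs did not clearly explain replace mode, continuous mode, or the lang option

Main changes

1. Tighten write-back semantics

  • With autoReplace: false, recognition results are still appended to the end of the input
  • With autoReplace: true, the recognition result replaces the entire input content while recording
  • With continuous: true, the replacement content is the accumulated recognition result of the current recording session, not just the last sentence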
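
The append / replace semantics above can be sketched roughly as follows. This is an illustration only, not the component's code: `applyTranscript`, `WriteBackOptions`, and the idea of passing in the text that was present before recording started are all assumptions made for the example.

```typescript
// Hypothetical helper (not from the PR): computes the next input value.
interface WriteBackOptions {
  autoReplace: boolean
}

function applyTranscript(
  textBeforeRecording: string,
  sessionTranscript: string,
  options: WriteBackOptions,
): string {
  if (options.autoReplace) {
    // Replace mode: the whole input is overwritten by the transcript
    // accumulated in the current recording session.
    return sessionTranscript
  }
  // Append mode: the transcript is added to the end of the existing text.
  return textBeforeRecording + sessionTranscript
}
```

With `continuous: true`, `sessionTranscript` would be the accumulated result of the whole session, so replace mode keeps showing every sentence recognized so far rather than only the latest one.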

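2. Improve Web Speech result handling

  • Parse SpeechRecognitionEvent incrementally
  • Distinguish final from interim results so that history is not concatenated repeatedly in continuous mode
  • Unify transcript accumulation within a session so that replace mode and continuous mode behave consistently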
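
A minimal sketch of incremental parsing, assuming the event shape of the Web Speech API (`resultIndex`, `results[i].isFinal`, `results[i][0].transcript`). The `*Like` interfaces and the `SessionTranscript` class are hypothetical stand-ins defined here so the example runs outside a browser; they are not the PR's `parseSpeechRecognitionResult`.

```typescript
// Minimal structural stand-ins for the browser types (assumption).
interface AlternativeLike {
  transcript: string
}
interface ResultLike {
  isFinal: boolean
  0: AlternativeLike
}
interface RecognitionEventLike {
  resultIndex: number
  results: ArrayLike<ResultLike>
}

class SessionTranscript {
  private finalized = ''

  // Returns the text to display: accumulated finals plus the current partial.
  update(event: RecognitionEventLike): string {
    let interim = ''
    // Start at resultIndex so earlier results are not processed again.
    for (let i = event.resultIndex; i < event.results.length; i++) {
      const result = event.results[i]
      if (!result) continue
      const text = result[0].transcript
      if (result.isFinal) {
        this.finalized += text // committed once, never re-appended
      } else {
        interim += text // may still change in a later event
      }
    }
    return this.finalized + interim
  }

  reset(): void {
    this.finalized = ''
  }
}
```

Reading only from `resultIndex` onward is what keeps already finalized text from being concatenated a second time when `results` grows during a continuous session.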

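3. Fix stop / end lifecycle issues

  • A manual stop() no longer clears the event handlers early
  • The native onend now performs all cleanup, so the last transcript segment is not lost
  • Callback order is adjusted to keep the recording state, the end event, and the final recognition result consistent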
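
The lifecycle fix can be illustrated with a small sketch. `RecognizerLike` is a simplified, hypothetical stand-in for the browser recognizer (its `onresult` receives a plain string here), and `SpeechSession` is not the PR's handler; the point is only that `stop()` leaves handlers attached and cleanup happens in `onend`.

```typescript
// Simplified stand-in for a speech recognizer (assumption).
interface RecognizerLike {
  onresult: ((transcript: string) => void) | null
  onend: (() => void) | null
  stop(): void
}

class SpeechSession {
  private transcript = ''
  private readonly recognizer: RecognizerLike
  private readonly onEnd: (finalTranscript: string) => void

  constructor(recognizer: RecognizerLike, onEnd: (finalTranscript: string) => void) {
    this.recognizer = recognizer
    this.onEnd = onEnd
    recognizer.onresult = (text) => {
      this.transcript += text
    }
    recognizer.onend = () => {
      // Cleanup happens only here, after any late result has arrived.
      const finalTranscript = this.transcript
      this.cleanup()
      this.onEnd(finalTranscript)
    }
  }

  stop(): void {
    // Only request the stop; handlers stay attached so a result delivered
    // between stop() and onend is still captured.
    this.recognizer.stop()
  }

  private cleanup(): void {
    this.recognizer.onresult = null
    this.recognizer.onend = null
    this.transcript = ''
  }
}
```

If `stop()` cleared `onresult` itself, a result delivered between the stop request and the end event would have nowhere to go, which is the loss described above.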

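4. Apply the latest speechConfig

  • Avoid taking a one-time snapshot of speechConfig when the component initializes
  • lang, continuous, interimResults, customHandler, and similar options now take effect with their latest values on the next start()
  • Fixes the underlying handler still using old parameters after the speech configuration is switched at runtime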
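
One way to avoid the stale snapshot is to hand the handler a getter and resolve the configuration inside `start()`. The sketch below assumes that approach; `SpeechConfigLike`, `LazyConfigHandler`, and the default values are hypothetical and chosen only for the example.

```typescript
// Hypothetical subset of the speech config (assumption).
interface SpeechConfigLike {
  lang?: string
  continuous?: boolean
  interimResults?: boolean
}

class LazyConfigHandler {
  private readonly getConfig: () => SpeechConfigLike

  constructor(getConfig: () => SpeechConfigLike) {
    // Store the getter, not a copy of the config.
    this.getConfig = getConfig
  }

  // Resolves the config at call time, so each session sees the latest values.
  start(): Required<SpeechConfigLike> {
    const config = this.getConfig()
    return {
      lang: config.lang ?? 'en-US', // default is an assumption
      continuous: config.continuous ?? false, // default is an assumption
      interimResults: config.interimResults ?? true, // default is an assumption
    }
  }
}
```

In a Vue component the getter could simply read `props.speechConfig`, so no watcher is needed for options that only matter at the start of a session.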

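5. Update docs and examples

  • Change the demo presentation from "mixed input / continuous recognition" to "append mode / replace mode"
  • Document the behavior when autoReplace and continuous are combined
  • Add configuration notes and common example values for speechConfig.lang
  • Update the related VoiceButton / SpeechConfig doc comments

Impact

  • After this change, autoReplace: true explicitly means "replace the whole input"
  • If the input already contains manually typed content, enabling replace mode overwrites it with the current recording result
  • If continuous: true is also enabled, the input keeps updating with the accumulated recognition result of the current recording session

Verification

  • pnpm.cmd -F @opentiny/tiny-robot build
  • pnpm.cmd -F tiny-robot-test build
  • pnpm.cmd -F tiny-robot-test test -- src/voice-button/index.spec.ts
  • pnpm.cmd -F docs build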

Summary by CodeRabbit

  • New Features

    • Voice input modes renamed to "Append" and "Replace" with matching UI; Replace updates content live, Append adds to end.
    • Interim transcription now reflects accumulated finalized text plus current partial results for smoother live feedback.
  • Documentation

    • Updated voice recognition docs, demo and language guidance; clarified voice-button handler behavior and speech-config options.
  • Bug Fixes

    • Improved placeholder, focus and session handling to avoid stale or duplicated transcription.

@coderabbitai

coderabbitai Bot commented Mar 18, 2026

Walkthrough

Renames voice modes from mixed|continuous to append|replace, changes demo/docs and VoiceButton props/types, adds session-based transcript accumulation and auto-replace behavior, and updates Web Speech handling to parse/accumulate finals and emit interim/final with updated callbacks.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Demo: docs/demos/sender/voice-input.vue, docs/src/components/sender.md | Replaced mode values/labels with append/replace, updated UI text, placeholders, demo descriptions, added lang tips, and tightened VoiceButton prop/type docs. |
| Voice Button Implementation: packages/components/src/sender-actions/voice-button/index.vue, packages/components/src/sender-actions/voice-button/speech.types.ts | Added session state (committedTranscript etc.), helpers to reset/append/replace transcripts, conditional autoReplace flows for onStart/onInterim/onFinal/onEnd/onError, and removed onVoiceButtonClick from SpeechConfig. |
| Web Speech Handler: packages/components/src/sender-actions/voice-button/webSpeechHandler.ts | Added parseSpeechRecognitionResult, tracking of finalizedTranscript, accumulate final pieces, emit interim as accumulated finals+partial, invoke onEnd with finalized transcript, and reset session state across start/stop/error. |

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Button as VoiceButton
    participant Speech as SpeechRecognition
    participant Editor as Editor/Document

    rect rgba(255, 200, 100, 0.5)
    Note over User,Editor: Legacy / append-on-final flow
    User->>Button: Press record
    Button->>Speech: start()
    Speech-->>Button: onresult (interim)
    Button->>Button: emit interim event
    Speech-->>Button: onresult (final)
    Button->>Editor: append final transcript (if autoInsert)
    Button->>Editor: focus
    end

    rect rgba(100, 200, 255, 0.5)
    Note over User,Editor: New autoReplace / accumulated flow
    User->>Button: Press record
    Button->>Speech: start() (autoReplace config)
    Speech-->>Button: onresult (finals + interim)
    Button->>Button: parseSpeechRecognitionResult()
    Button->>Button: update finalizedTranscript
    Button->>Editor: replace text at speechRange with interim (finals+partial)
    Speech-->>Button: onresult (final)
    Button->>Editor: replace text at speechRange with merged final transcript
    Button->>Button: reset session
    Button->>Editor: focus
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped and listened, soft and snappy,
I learned to append or swap — how happy!
Finals gather, whispers fill the line,
Replace or add — each chunk aligns.
A tiny rabbit taps and sends on time.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'fix(voice-button): stabilize speech flow and clarify replace mode' accurately reflects the main changes, which involve stabilizing speech event handling and clarifying the replace mode behavior in the voice button component. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@github-actions
Contributor

github-actions Bot commented Mar 18, 2026

✅ Preview build completed successfully!

Preview will be automatically removed when this PR is closed.

@github-actions
Contributor

github-actions Bot commented Mar 18, 2026

@SonyLeo SonyLeo linked an issue Mar 18, 2026 that may be closed by this pull request
@SonyLeo SonyLeo force-pushed the docs/update-speech-config branch from 4c60cae to 855ae9e on March 18, 2026 09:25
@SonyLeo SonyLeo marked this pull request as ready for review March 20, 2026 06:50
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
packages/components/src/sender-actions/voice-button/index.vue (1)

25-57: Consider simplifying speechRange state.

The speechRange.to field is tracked (line 54) but never read—the replacement always uses docSize as the end position. Since from is always 0 for autoReplace mode, you could simplify to just a boolean flag indicating whether a recording session has started.

However, if this tracking is intentional for future features (e.g., partial replacement from cursor position), feel free to keep it.

♻️ Optional simplification
-const speechRange = ref<{ from: number; to: number } | null>(null)
+const hasStartedRecording = ref(false)

 const resetSpeechRange = () => {
-  speechRange.value = null
+  hasStartedRecording.value = false
 }

 const insertTranscript = (transcript: string) => {
   // ... early returns ...
   
   // autoReplace mode: replace the entire input content
-  if (speechRange.value === null) {
-    speechRange.value = {
-      from: 0,
-      to: 0,
-    }
-  }
+  hasStartedRecording.value = true

   const docSize = editorInstance.state.doc.content.size
-  const tr = editorInstance.state.tr.insertText(transcript, speechRange.value.from, docSize)
+  const tr = editorInstance.state.tr.insertText(transcript, 0, docSize)
   editorInstance.view.dispatch(tr)

-  speechRange.value = {
-    from: speechRange.value.from,
-    to: speechRange.value.from + transcript.length,
-  }
   editorInstance.commands.focus('end')
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/components/src/sender-actions/voice-button/index.vue` around lines
25 - 57, The speechRange.to field is never read in insertTranscript (autoReplace
uses docSize), so simplify state by removing speechRange.to and either replace
speechRange with a boolean like speechStarted or keep only speechRange.from;
update insertTranscript to initialize and check that flag (e.g., speechStarted
or speechRange.from === 0) and use speechRange.from (or 0) as the insert start,
and update the single state field accordingly; ensure references to
speechRange.value.to are removed and editorInstance, autoReplace, and
insertTranscript behavior remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/components/sender.md`:
- Line 1: The file docs/src/components/sender.md contains a UTF-8 BOM (U+FEFF)
at the start; remove the BOM so the file begins with the first markdown
character and re-save the file as UTF-8 without BOM (e.g., strip the leading
U+FEFF or re-encode), ensuring consistency with speech.types.ts which was
corrected earlier.

In `@packages/components/src/sender-actions/voice-button/speech.types.ts`:
- Line 1: The file speech.types.ts contains a UTF‑8 BOM at the very start;
remove the leading BOM character so the file begins with the first TypeScript
token, re-save the file as UTF‑8 without BOM, and re-commit; if your editor adds
BOMs automatically, update its save settings (or run a one-time clean-up
command) to ensure speech.types.ts stays UTF‑8 without BOM.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cc096f6e-f1c4-4f6d-b278-1a831a223417

📥 Commits

Reviewing files that changed from the base of the PR and between 16fd9bf and 855ae9e.

📒 Files selected for processing (4)
  • docs/demos/sender/voice-input.vue
  • docs/src/components/sender.md
  • packages/components/src/sender-actions/voice-button/index.vue
  • packages/components/src/sender-actions/voice-button/speech.types.ts

Comment thread docs/src/components/sender.md
Comment thread packages/components/src/sender-actions/voice-button/speech.types.ts Outdated
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/src/components/sender.md`:
- Around line 167-173: The markdown table for the `lang` examples is missing the
required blank line before the admonition terminator (`:::`), causing the
terminator to be parsed as a table row; edit the `sender.md` section containing
the `| 值 | 说明 |` table and insert a single empty line after the last table row
(after the `| `en-US` | 美式英语 |` row) before the `:::` terminator so the
admonition closes correctly.

In `@packages/components/src/sender-actions/voice-button/webSpeechHandler.ts`:
- Around line 151-156: The manual stop is clearing event handlers before the
SpeechRecognition final onresult can arrive; change the stop flow so you call
this.recognition.stop() first and allow the native onresult→onend sequence to
run (letting onend invoke cleanup())—do not call cleanup() or
resetSessionTranscript() before recognition.stop(); also remove the eager
callbacks.onEnd() invocation at useSpeechHandler.ts line where callbacks.onEnd()
is called (around useSpeechHandler.ts:99) so the final transcript
(finalizedTranscript) is accumulated by the native events and delivered before
cleanup resets handlers/state.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 00f621fb-799e-4808-bf33-16701a4df123

📥 Commits

Reviewing files that changed from the base of the PR and between 855ae9e and 986c63c.

📒 Files selected for processing (5)
  • docs/demos/sender/voice-input.vue
  • docs/src/components/sender.md
  • packages/components/src/sender-actions/voice-button/index.vue
  • packages/components/src/sender-actions/voice-button/speech.types.ts
  • packages/components/src/sender-actions/voice-button/webSpeechHandler.ts

Comment thread docs/src/components/sender.md Outdated
Comment thread packages/components/src/sender-actions/voice-button/webSpeechHandler.ts Outdated
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
docs/src/components/sender.md (1)

162-173: ⚠️ Potential issue | 🟡 Minor

Close the lang tip block.

The :::tip lang 语言说明 block never terminates, so everything after the table can be parsed inside the admonition. Add the missing closing ::: after the last row.

📝 Suggested markdown fix
 :::tip lang 语言说明
 `lang` 用于指定语音识别语言,建议显式传入,并与页面的 `html lang` 保持一致,避免页面语言和浏览器环境语言不一致时出现识别偏差。
 
 常见取值示例:
 
 | 值 | 说明 |
 | --- | --- |
 | `en` | 英语 |
 | `zh` | 中文 |
 | `zh-CN` | 简体中文 |
 | `en-US` | 美式英语 |
+
+:::
 
 #### 自定义语音服务
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/src/components/sender.md` around lines 162 - 173, The tip admonition
starting with ":::tip lang 语言说明" is not closed, causing the remainder of the
document to be included inside the block; locate the ":::tip lang 语言说明" block in
docs/src/components/sender.md (the block containing the language table) and add
the missing closing marker ":::" immediately after the table's last row so the
admonition terminates properly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@packages/components/src/sender-actions/voice-button/index.vue`:
- Around line 62-64: The current speechOptions constant snapshots
props.speechConfig once and becomes stale; make it reactive by replacing the
plain object with a computed or a reactive wrapper that derives from
props.speechConfig (e.g., replace const speechOptions = {...props.speechConfig}
with a computed(() => ({...props.speechConfig})) or reactive copy), and add a
watcher on props.speechConfig to recreate the speech recognition handler (the
customHandler / recognition handler creation logic) whenever speechConfig
changes so the handler uses updated lang, continuous, interimResults, and
customHandler settings.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 6c69696d-a33a-4695-96e5-6e20d44d5b43

📥 Commits

Reviewing files that changed from the base of the PR and between 986c63c and f0a4ca2.

📒 Files selected for processing (4)
  • docs/demos/sender/voice-input.vue
  • docs/src/components/sender.md
  • packages/components/src/sender-actions/voice-button/index.vue
  • packages/components/src/sender-actions/voice-button/speech.types.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • docs/demos/sender/voice-input.vue

Comment thread packages/components/src/sender-actions/voice-button/index.vue Outdated
@SonyLeo SonyLeo changed the title from "feat: add autoReplace support for sender and update voice button document" to "fix(voice-button): stabilize speech flow and clarify replace mode" Apr 16, 2026
@SonyLeo SonyLeo linked an issue Apr 16, 2026 that may be closed by this pull request

Development

Successfully merging this pull request may close these issues.

🐛 [Bug]: Voice input tends to produce duplicated text
[docs enhancement] speech config default lang

1 participant